Wanwen Zeng

Featured in: biomedcentral.com, biorxiv.org, nature.com, genomebiology.biomedcentral.com

Articles

CREATE: cell-type-specific cis-regulatory element identification via discrete embedding | Nature Communications

1 month ago | nature.com | Xuejian Cui |Qijin Yin |Zijing Gao |Zhen Li |Xiaoyang Chen |Shengquan Chen | +4 more

Cis-regulatory elements (CREs), including enhancers, silencers, promoters and insulators, play pivotal roles in orchestrating gene regulatory mechanisms that drive complex biological traits. However, current approaches for CRE identification are predominantly sequence-based and typically focus on individual CRE types, limiting insights into their cell-type-specific functions and regulatory dynamics. Here, we present CREATE, a multimodal deep learning framework based on Vector Quantized Variational AutoEncoder, tailored for comprehensive CRE identification and characterization. CREATE integrates genomic sequences, chromatin accessibility, and chromatin interaction data to generate discrete CRE embeddings, enabling accurate multi-class classification and robust characterization of CREs. CREATE excels in identifying cell-type-specific CREs, and provides quantitative and interpretable insights into CRE-specific features, uncovering the underlying regulatory codes. By facilitating large-scale prediction of CREs in specific cell types, CREATE enhances the recognition of disease- or phenotype-associated biological variabilities of CREs, thus advancing our understanding of gene regulatory landscapes and their roles in health and disease.

Cui et al. present CREATE, a platform for the identification of multi-class cell-type-specific CREs by integrating multi-omics data. CREATE interpretably extracts discrete CRE embeddings, quantitatively unveils CRE-specific features, and effectively enables large-scale prediction of CREs.
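
To make the idea of a "discrete embedding" concrete, the following is a minimal sketch of the generic vector-quantization step used in a Vector Quantized Variational AutoEncoder: each continuous encoder output is replaced by its nearest entry in a codebook, and the index of that entry is the discrete code. This is not CREATE's implementation; the codebook values, the two-dimensional vectors, and the function names `squared_distance` and `quantize` are assumptions made purely for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(vector, codebook):
    """Return (index, code_vector) of the codebook entry nearest to `vector`."""
    index = min(
        range(len(codebook)),
        key=lambda i: squared_distance(vector, codebook[i]),
    )
    return index, codebook[index]


# Hypothetical codebook with three 2-dimensional entries (illustrative values).
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 2.0]]

# Hypothetical continuous encoder outputs for three input regions.
encoder_outputs = [[0.1, -0.2], [0.9, 1.2], [0.2, 1.7]]

# The discrete embedding of each region is the index of its nearest code.
codes = [quantize(v, codebook)[0] for v in encoder_outputs]
print(codes)
```

In a trained model the codebook is learned jointly with the encoder and decoder; here it is fixed so that the nearest-neighbour lookup can be read in isolation.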
EpiGePT: a pretrained transformer-based language model for context-specific human epigenomics - Genome Biology

Dec 18, 2024 | genomebiology.biomedcentral.com | Zijing Gao |Qiao Liu |Wanwen Zeng |Rui Jiang

We used three different datasets in the experiments. For chromatin accessibility data, we downloaded DNase BAM files across 129 human biosamples from the ENCODE project [21] (Additional file 2: Table S6). We divided the human hg19 genome into 200-bp non-overlapping bins and assigned a label to each bin in each cell type. For the regression design, we pooled the BAM files of multiple replicates for a cell type and obtained the raw read count \(n_{lk}\) for bin \(l\) in cell type \(k\).
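
The binning and pooling described above can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' pipeline: reads are represented only by 0-based start positions on a single chromosome, each read is assigned to the bin containing its start position (the excerpt does not say how reads are assigned to bins), and the names `bin_index` and `pooled_bin_counts` are hypothetical.

```python
BIN_SIZE = 200  # bin width in bp, as stated in the text


def bin_index(position, bin_size=BIN_SIZE):
    """Map a 0-based coordinate to the index of its non-overlapping bin."""
    return position // bin_size


def pooled_bin_counts(replicates, chrom_length, bin_size=BIN_SIZE):
    """Pool reads from all replicates of one cell type and count them per bin.

    `replicates` is a list of lists of read start positions (an assumed,
    simplified stand-in for BAM files). The returned list holds the raw
    count n_lk for each bin l of this cell type k.
    """
    n_bins = (chrom_length + bin_size - 1) // bin_size  # ceiling division
    counts = [0] * n_bins
    for read_starts in replicates:
        for pos in read_starts:
            if 0 <= pos < chrom_length:  # ignore out-of-range positions
                counts[bin_index(pos, bin_size)] += 1
    return counts


# Two hypothetical replicates for one cell type on a 1,000-bp toy chromosome.
replicate_1 = [5, 150, 210, 799]
replicate_2 = [199, 200, 450]
counts = pooled_bin_counts([replicate_1, replicate_2], chrom_length=1000)
print(counts)  # [3, 2, 1, 1, 0]
```

With real data, the read positions would come from the pooled BAM files, and the same counting would be repeated per chromosome and per cell type.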
EpiGePT: a Pretrained Transformer model for epigenomics

Feb 3, 2024 | biorxiv.org | Zijing Gao |Qiao Liu |Wanwen Zeng |Rui Jiang

Abstract: The inherent similarities between natural language and biological sequences have given rise to great interest in adapting the transformer-based large language models (LLMs) underlying recent breakthroughs in natural language processing (references) for applications in genomics.
EpiGePT: a Pretrained Transformer model for epigenomics

Jul 18, 2023 | biorxiv.org | Zijing Gao |Qiao Liu |Wanwen Zeng |Rui Jiang
