-
Dec 5, 2024 |
nature.com | James Zou
Abstract: Large-scale gene-expression data are being leveraged to pretrain models that implicitly learn gene and cellular functions. However, such models require extensive data curation and training. Here we explore a much simpler alternative: leveraging ChatGPT embeddings of genes based on the literature.
-
Nov 21, 2024 |
nature.com | Mehran Karimzadeh | Helen Li | Olivier Elemento | James Zou | Fereydoun Hormozdiari | Babak Alipanahi
Abstract: Liquid biopsies have the potential to revolutionize cancer care through non-invasive early detection of tumors. Developing a robust liquid biopsy test requires collecting high-dimensional data from a large number of blood samples across heterogeneous groups of patients. We propose that the generative capability of variational auto-encoders enables learning a robust and generalizable signature of blood-based biomarkers.
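A core ingredient of a variational auto-encoder is the reparameterization trick and the closed-form KL term of the ELBO. The sketch below (a minimal NumPy illustration, not the paper's implementation; shapes and names are assumptions) shows both:

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_var):
    """Sample z = mu + sigma * eps, keeping the sampling step differentiable."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_divergence(mu, log_var):
    """Closed-form KL(q(z|x) || N(0, I)) per sample."""
    return -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var), axis=1)

# Toy encoder outputs for a batch of 4 samples with a 2-dim latent space.
mu = np.zeros((4, 2))
log_var = np.zeros((4, 2))
z = reparameterize(mu, log_var)
kl = kl_divergence(mu, log_var)  # exactly zero when q equals the prior
```

In a trained model, `mu` and `log_var` would come from an encoder network over the blood-sample features, and the KL term regularizes the latent signature toward the prior.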
-
Oct 29, 2024 |
biorxiv.org | James Zou
Abstract: Predicting how perturbation of a target gene affects the expression of other genes is a critical component of understanding cell biology. This is a challenging prediction problem as the model must capture complex gene-gene relationships and the output is high-dimensional and sparse.
-
Jun 10, 2024 |
cell.com | Moritz Gerstung | David C Liu | Marzyeh Ghassemi | James Zou
Experts discuss the challenges and opportunities of using artificial intelligence (AI) to study the evolution of cancer cells and their microenvironment, improve diagnosis, predict treatment response, and ensure responsible implementation in the clinic.
-
May 17, 2024 |
biorxiv.org | Kevin Wu | Howard Chang | James Zou
Abstract: Language models have enabled a new era of biological sequence modeling. However, extracting meaningful sequence-level embeddings from these models remains challenging. In this work, we introduce ProteinCLIP, which applies contrastive learning between a protein's amino acid sequence and curated text describing its function. ProteinCLIP thus learns to take a pre-trained protein language model's sequence embedding and refines it to produce a function-centric embedding.
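CLIP-style contrastive training pairs each sequence embedding with its matched text embedding and pushes mismatched pairs apart via a symmetric cross-entropy over cosine similarities. A minimal NumPy sketch of that loss (not the ProteinCLIP code; the temperature value and function names are assumptions):

```python
import numpy as np

def log_softmax(x, axis):
    """Numerically stable log-softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def clip_style_loss(seq_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of (sequence, text) pairs."""
    s = seq_emb / np.linalg.norm(seq_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = s @ t.T / temperature  # scaled cosine similarities
    n = logits.shape[0]
    # Matched pairs sit on the diagonal; average both retrieval directions.
    loss_s2t = -np.trace(log_softmax(logits, axis=1)) / n
    loss_t2s = -np.trace(log_softmax(logits, axis=0)) / n
    return 0.5 * (loss_s2t + loss_t2s)

# Perfectly aligned toy pairs: the loss is near zero.
loss = clip_style_loss(np.eye(3), np.eye(3))
```

Minimizing this loss aligns each protein's sequence embedding with the text describing its function while separating it from the other proteins in the batch.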
-
Apr 26, 2024 |
arxiv.org | James Zou
Talking Nonsense: Probing Large Language Models' Understanding of Adversarial Gibberish Inputs, by Valeriia Cherepanova and James Zou. Abstract: Large language models (LLMs) exhibit an excellent ability to understand human languages, but do they also understand their own language that appears gibberish to us?
-
Mar 12, 2024 |
hdsr.mitpress.mit.edu | Lingjiao Chen | Matei Zaharia | James Zou
GPT-3.5 and GPT-4 are the two most widely used large language model (LLM) services. However, when and how these models are updated over time is opaque. Here, we evaluate the March 2023 and June 2023 versions of GPT-3.5 and GPT-4 on several diverse tasks: 1) math problems, 2) sensitive/dangerous questions, 3) opinion surveys, 4) multi-hop knowledge-intensive questions, 5) generating code, 6) U.S. Medical License tests, and 7) visual reasoning.
-
Mar 5, 2024 |
biorxiv.org | James Zou
Abstract: There has been significant recent progress in leveraging large-scale gene expression data to develop foundation models for single-cell biology. Models such as Geneformer and scGPT implicitly learn gene and cellular functions from the gene expression profiles of millions of cells, which requires extensive data curation and resource-intensive training. Here we explore a much simpler alternative by leveraging ChatGPT embeddings of genes based on the literature.
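Once each gene has a literature-derived embedding vector, downstream analyses reduce to simple vector operations such as similarity-based retrieval. A toy sketch of that step (random vectors stand in for real embeddings from an embedding API; gene choices and dimensions are illustrative assumptions):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical literature-based gene embeddings (random placeholders).
rng = np.random.default_rng(0)
genes = ["TP53", "BRCA1", "GAPDH"]
embeddings = {g: rng.standard_normal(16) for g in genes}

# Rank the other genes by similarity to a query gene.
query = "TP53"
ranking = sorted(
    (g for g in genes if g != query),
    key=lambda g: cosine_similarity(embeddings[query], embeddings[g]),
    reverse=True,
)
```

With real embeddings, genes whose literature descriptions are functionally related would rank near each other, which is the signal such models exploit in place of expression-based pretraining.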
-
Feb 4, 2024 |
nature.com | James Zou
Abstract: The ability to computationally generate novel yet physically foldable protein structures could lead to new biological discoveries and new treatments targeting currently incurable diseases. Despite recent advances in protein structure prediction, directly generating diverse, novel protein structures from neural networks remains difficult. In this work, we present a diffusion-based generative model that generates protein backbone structures via a procedure inspired by the natural folding process.
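Diffusion models are trained against a forward process that gradually corrupts clean data toward noise; the closed form q(x_t | x_0) = N(sqrt(ᾱ_t) x_0, (1 − ᾱ_t) I) lets any timestep be sampled directly. A minimal NumPy sketch of that forward step (the linear schedule, array shapes, and feature interpretation are assumptions, not the paper's code):

```python
import numpy as np

def forward_diffuse(x0, t, betas, rng):
    """Sample x_t ~ q(x_t | x_0) directly, without iterating over steps."""
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)[t]  # cumulative signal retention at step t
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * noise

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)  # assumed linear noise schedule
x0 = rng.standard_normal((8, 6))       # e.g. per-residue backbone features
xt = forward_diffuse(x0, t=999, betas=betas, rng=rng)  # nearly pure noise
```

Generation then runs the learned reverse process from noise back toward a clean backbone, which is the sense in which sampling mirrors a folding-like trajectory.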
-
Feb 2, 2024 |
biorxiv.org | Eric Sun | Rong Ma | James Zou
Abstract: Spatially resolved single-cell transcriptomics has provided unprecedented insights into gene expression in situ, particularly in the context of cell interactions and tissue organization. However, current technologies for profiling spatial gene expression at single-cell resolution are generally limited to the measurement of a small number of genes.