
Ben Langmead
Articles
-
Nov 2, 2024 |
biorxiv.org | Nathaniel Brown | Vikram S. Shivakumar | Ben Langmead | Johns Hopkins
Abstract: Compressed full-text indexes enable efficient sequence classification against a pangenome or tree-of-life index. Past work on compressed-index classification used matching statistics or pseudo-matching lengths to capture the fine-grained co-linearity of exact matches. But these fail to capture coarse-grained information about whether seeds appear co-linearly in the reference. We present a novel approach that additionally obtains coarse-grained co-linearity ("chain") statistics.
-
May 30, 2024 |
biorxiv.org | Omar Ahmed | Christina Boucher | Ben Langmead | Johns Hopkins
Abstract: Taxonomic sequence classification is a computational problem central to the study of metagenomics and evolution. Advances in compressed indexing with the r-index enable full-text pattern matching against large sequence collections. But the data structures that link pattern sequences to their clades of origin still do not scale well to large collections.
-
Nov 17, 2023 |
biorxiv.org | Ben Langmead | Johns Hopkins
Abstract: Centrifuger is an efficient taxonomic classification method that compares sequencing reads against a microbial genome database. In Centrifuger, the Burrows-Wheeler transformed genome sequences are losslessly compressed using a novel scheme called run-block compression. Run-block compression achieves sublinear space complexity and is effective at compressing diverse microbial databases like RefSeq while supporting fast rank queries.
-
Jul 6, 2023 |
genome.cshlp.org | Daniel Baker | Ben Langmead | Johns Hopkins
Corresponding author; email: langmea{at}cs.jhu.edu
Abstract: A genomic sketch is a small, probabilistic representation of the set of k-mers in a sequencing dataset. Sketches are building blocks for large-scale analyses that consider similarities between many pairs of sequences or sequence collections. While existing tools can easily compare 10,000s of genomes, relevant datasets can reach millions of sequences and beyond.
-
May 31, 2023 |
genome.cshlp.org | Omar Ahmed | Massimiliano Rossi | Christina Boucher | Ben Langmead
Corresponding author; email: omaryfekry{at}gmail.com
Abstract: Tools that classify sequencing reads against a database of reference sequences require efficient index data structures. The r-index is a compressed full-text index that answers substring presence/absence, count and locate queries in space proportional to the amount of distinct sequence in the database: O(r) space where r is the number of Burrows-Wheeler runs.
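As a rough illustration of the quantity r mentioned in the abstract above, the following minimal sketch computes a Burrows-Wheeler transform by naively sorting rotations and counts its runs of equal characters. It is not the r-index or any code from the listed papers; the function names `bwt` and `count_runs`, the `$` sentinel, and the toy inputs are assumptions made for this example only.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Naive Burrows-Wheeler transform via sorted rotations.

    Illustrative only: this takes O(n^2 log n) time, whereas real
    indexes build the BWT with suffix-array based methods.
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in text")
    s = text + sentinel
    # Sort all cyclic rotations; '$' sorts before the letters A-Z and a-z.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The BWT is the last column of the sorted rotation matrix.
    return "".join(rot[-1] for rot in rotations)


def count_runs(s: str) -> int:
    """Number of maximal runs of equal characters (r when s is a BWT)."""
    if not s:
        return 0
    return 1 + sum(1 for a, b in zip(s, s[1:]) if a != b)


if __name__ == "__main__":
    print(bwt("banana"))               # annb$aa
    print(count_runs(bwt("banana")))   # 5

    # A highly repetitive toy "collection": r stays far below the length.
    repetitive = "ACGT" * 8
    print(len(repetitive), count_runs(bwt(repetitive)))
```

On repetitive input the run count grows much more slowly than the text length, which is the intuition behind storing an index in space proportional to r rather than to the total sequence length.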