

ETH tool MetaGraph revolutionises search in DNA data

Oct 10, 2025 | Health, Research

Researchers at ETH Zurich have developed a new tool called MetaGraph that makes it possible to search large quantities of sequenced DNA data efficiently, precisely and cost-effectively. It organizes the data into indexes, making it easily searchable. As open-source software, it offers a wide range of applications in biomedical research.

DNA sequencing has revolutionized biomedicine, for example in the detection of rare hereditary diseases or tumor mutations. New methods such as next-generation sequencing have led to breakthroughs, including rapid analysis of the SARS-CoV-2 genome. Public databases such as the Sequence Read Archive and the European Nucleotide Archive store around 100 petabytes of sequence data, comparable to the volume of all text on the Internet.

Until now, searching this data required a great deal of computing power, because complete data sets had to be downloaded first. MetaGraph solves this with a full-text search similar to an Internet search engine: users enter a sequence and, within seconds or minutes, receive hits showing where it occurs. The tool is inexpensive: the index representing all public sequences fits on a few hard drives, and queries cost at most $0.74 per megabase.
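To put the quoted price ceiling into perspective, here is a small worked example in Python. It is only a sketch: it assumes the upper bound scales linearly with query length and that one megabase is 1,000,000 bases. The function name and the 30,000-base query (roughly the length of a coronavirus genome) are chosen for illustration and do not come from the study.

```python
# Upper bound on query cost quoted in the article (US dollars per megabase).
MAX_COST_PER_MEGABASE_USD = 0.74
BASES_PER_MEGABASE = 1_000_000


def max_query_cost_usd(query_length_bases):
    """Estimate the maximum cost of a query, assuming linear scaling with length."""
    return query_length_bases / BASES_PER_MEGABASE * MAX_COST_PER_MEGABASE_USD


# Illustrative query of 30,000 bases (an assumption, not a figure from the study).
cost = max_query_cost_usd(30_000)
print(f"Upper bound: ${cost:.4f}")
```

Under these assumptions, a query of that size would cost a little over two cents at most.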

Symbolic image. Credits: freepik

In a study published in the journal Nature on October 8, the researchers describe how it works: MetaGraph indexes the data and compresses it by a factor of 300, using mathematical graphs that create a matrix-like structure. It links raw data and metadata without loss of information and scales with growing data volumes.
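The idea of an index that links sequence content to metadata in a matrix-like way can be illustrated with a toy example. The following Python sketch is not MetaGraph's implementation and performs no compression; it assumes a simple scheme in which each sequence is cut into short overlapping fragments (k-mers) and every fragment is mapped to the labels of the samples that contain it. The fragment length, function names, sample labels and sequences are all hypothetical.

```python
from collections import defaultdict

K = 4  # toy fragment length, chosen only to keep the example small


def kmers(seq, k=K):
    """Yield every overlapping substring of length k."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(samples, k=K):
    """Map each k-mer to the set of sample labels that contain it.

    Conceptually this is a sparse binary matrix: rows are k-mers,
    columns are sample labels (the metadata).
    """
    index = defaultdict(set)
    for label, seq in samples.items():
        for kmer in kmers(seq, k):
            index[kmer].add(label)
    return index


def query(index, seq, k=K, min_fraction=1.0):
    """Return labels matching at least min_fraction of the query's k-mers."""
    query_kmers = list(kmers(seq, k))
    if not query_kmers:
        return {}
    counts = defaultdict(int)
    for kmer in query_kmers:
        for label in index.get(kmer, ()):
            counts[label] += 1
    total = len(query_kmers)
    return {label: n / total for label, n in counts.items()
            if n / total >= min_fraction}


# Hypothetical samples, invented for illustration.
samples = {
    "sample_A": "ACGTACGTGA",
    "sample_B": "TTGACGTGAC",
}
index = build_index(samples)
exact_hits = query(index, "TACGTG")                      # all fragments must match
partial_hits = query(index, "TACGTG", min_fraction=0.5)  # half is enough
```

In this sketch, a query never touches the original sequences: it only looks up fragments in the index, which is why such a search can answer without downloading complete data sets. Lowering `min_fraction` trades precision for sensitivity, so samples that share only part of the query are also reported.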

The tool can accelerate genetic research, for example on little-researched pathogens, pandemics or antibiotic resistance, by identifying resistance genes or bacteriophages. In development since 2020, MetaGraph is already usable and indexes almost half of all available DNA, RNA and protein sequences, from viruses to humans. The rest is to follow by the end of the year. It is also suitable for pharmaceutical companies working with internal data and could even be used privately in the future, for example to identify plants.

Original Paper:

Karasikov, M., Mustafa, H., Danciu, D., Kulkov, O., Zimmermann, M., Barber, C., Rätsch, G., & Kahles, A.: Efficient and accurate search in petabase-scale sequence repositories. Nature 2025, doi:10.1038/s41586-025-09603-w


Editor: X-Press Journalistenbüro GbR

Gender Notice. The personal designations used in this text always refer equally to female, male and diverse persons. Double/triple naming and gendered designations are avoided for better readability.