Next Generation Sequencing has unleashed unprecedented growth in genomic knowledge. The exponential growth of genomic literature since 1980 makes it increasingly difficult for clinical pathologists and geneticists to keep up with the new variants uncovered and documented through research.
The figure above shows the number of articles in PubMed that have “gene”, “genome”, “NGS”, “sequencing”, “DNA”, “variant”, or “RNA” in the title or abstract. In 2016, almost 200,000 new articles with one of these keywords were added to PubMed, compared to just over 78,000 in 1996. Admittedly, this is only a subset of the genomic literature in PubMed, as many genomic articles don’t include these keywords in the title or abstract. Nonetheless, the chart gives a good view of the growth in genomic research articles.
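For readers who want to reproduce a count like this, here is a minimal sketch of how such a query could be built against NCBI's public E-utilities ESearch endpoint. It is not necessarily the query behind our chart: the `[tiab]` (title/abstract) and `[dp]` (publication date) field tags are standard PubMed search tags, but filtering by publication year rather than by the date a record was added to PubMed is an assumption, and the helper names `build_term` and `build_count_url` are hypothetical.

```python
from urllib.parse import urlencode

# Keywords listed in the paragraph above.
KEYWORDS = ["gene", "genome", "NGS", "sequencing", "DNA", "variant", "RNA"]

# NCBI E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_term(year, keywords=KEYWORDS):
    """Build a PubMed query: any keyword in the title/abstract, published in `year`.

    Assumption: publication date ([dp]) stands in for "added to PubMed".
    """
    keyword_clause = " OR ".join(f"{kw}[tiab]" for kw in keywords)
    return f"({keyword_clause}) AND {year}[dp]"


def build_count_url(year):
    """Return an ESearch URL that asks only for the number of matching records."""
    params = {
        "db": "pubmed",
        "term": build_term(year),
        "rettype": "count",
        "retmode": "json",
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # Print the URLs for the two years compared in the text; fetching them
    # (for example with urllib.request) returns the hit count for each year.
    for year in (1996, 2016):
        print(year, build_count_url(year))
```

The sketch only builds the request URLs, so it runs without network access. Counts returned today may differ from the figures quoted above, since PubMed is continually updated.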
With this unprecedented growth in genomic research, it is becoming increasingly difficult for human curators to search for and keep up with the literature tied to specific genes and variants. Pathologists, geneticists and variant curators already spend too much time on incomplete literature searches for their VUSs (Variants of Uncertain Significance). An intractable problem like this requires a different view and a new set of tools to move variant interpretation forward.
Over the last 3 years, Genomenon has tackled the variant curation problem by indexing the full text of every genomic article and creating a searchable database for every disease-gene-variant combination. To date, we’ve indexed over 3.3 million genomic articles to fully cover somatic cancer, genetic cancer, cardiomyopathy and infertility, and we expect to add another 3 million articles to our database over the next 6 months to cover the entirety of genetic diseases.
With Mastermind, our genomic search engine, we don’t interpret variants – we believe humans are better equipped to make the calls on every variant. We’ve focused on getting pathologists and geneticists to an immediately accessible, comprehensive and prioritized set of literature for any disease-gene-variant data set they want to search, so they can quickly make the variant interpretation and aren’t spending their time spelunking to find the relevant literature.