Automating Genetic Literature Searches in Excel

Automating Genetic Literature Searches in Excel

Mastermind supports the ability to generate queries based on URL variables, which can be easily manipulated in spreadsheet software such as Microsoft Excel. The URLs can then take users to selected content in Mastermind based on modifiable input fields for disease, gene, variant and PubMed identification number (PMID).  In this post, we will detail how best to programmatically generate URLS in third party spreadsheet software  for any disease, gene, variant and/or specific article combination.

Generating Content-Specific Mastermind URLs

An example of a Mastermind URL is shown below.

Specifying genes The first element to note is the “gene=fbn1” string. Exchanging “fbn1” in this string with any other HGNC approved gene symbol will modify the URL to take you to the correct page in the Mastermind software. Information about the HGNC nomenclature can be found here.

Specifying disease The second element to note is the “disease=marfan%20syndrome” string. Diseases in Mastermind are pattered according to the Medical Subject Headings (MeSH) ontology. Because URLs do not accept spaces, the standard replacement string “%20” is used for all spaces. Commas that appear in MeSH terms remain unchanged. Information about the MeSH ontology for diseases can be found here and searches for specific entries can be performed here. Note that you may search the entire Mastermind database for articles matching your specified gene of interest by replacing the disease search term with “all” as in “disease=all” or otherwise by eliminating the entire “disease=marfan%20syndrome” string altogether.

Specifying variants The third element to note is the “mutation=C1039G” string. This permits the user to go directly to a known variant for the specified gene. Note that the variant is described at the protein level using the single letter amino acid code without any prepended “p.” as typifies HGVS nomenclature. Additional information on HGVS nomenclature can be found here. Please refer to our blog post describing in greater detail how to characterize additional variants.

Making Mastermind hyperlinks in Excel Many labs use Excel software to organize the tabulated data generated by their bioinformatic pipelines. An auto-generated data column with hyperlinks into the Mastermind database can be generated to facilitate variant interpretation workflows. The two Excel functions required to do this are HYPERLINK and CONCATENATE, with one cell each designated for a Gene and Variant input:

=HYPERLINK(CONCATENATE("",A1,"&mutation=",B1,"&mutation_source=fulltext"),"MM Link")

This formula will take the gene name (in lowercase HGNC format as described above) from Excel cell A1 and the variant name (in uppercase according to Mastermind’s variant nomenclature) from Excel cell B1. Note that in the hyperlink being assembled above, “https” has been converted to “http” and the “/#” string has been eliminated to create a viable URL that Excel will recognize and will redirect to the proper and secure Mastermind URL.

Hyperlinks will be generated in the Excel cell where the formula was inserted. The keyboard shortcut to open a hyperlink in Sheets is Alt+Enter. With Google Chrome set as your default browser and your Mastermind user credentials having previously been entered, the Mastermind software should automatically open to the specified content permitting accurate, high-volume variant interpretation.


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s