NIH-Funded Project Aims to Build a ‘Google’ for Biomedical Data

Nineteen institutions across the country are working to integrate years of data, ranging from electronic health records to genomic sequences.

July 31, 2019 | By Ruth Hailu | STAT

Every year, the National Institutes of Health spends billions of dollars on biomedical research, ranging from basic science investigations into cell processes to clinical trials. The results are published in journals and presented at academic meetings, and then researchers build on those findings and move on to their next project.

But what happens to the data that's collected, and what more could we learn from it? If we aggregated all the data from countless years of research, might we learn something new about ourselves, the diseases that affect us, and possible treatments?

That’s the hope behind the Biomedical Data Translator program, launched by the NIH in 2016: to create a “Google” for biomedical data that could sift through hundreds of separate data sources to help researchers connect “dots” in datasets with distinct formats and peculiarities.

“There is a lot of information that is currently available, through publications, and through databases … and, at the end of the day, it’s really too much for a human to be trying to mine through and make sense of,” said Christine Colvis, who leads the translator initiative at the National Center for Advancing Translational Sciences.

The program has awarded about $17.5 million to 19 institutions across the country. They are working to integrate years of data that had previously been spread across a variety of platforms, ranging from electronic health records to genomic sequences, and then to apply new machine learning tools to help organize and reason through that wealth of information.

This means that, unlike Google, the Translator would be able to make connections between datasets that had not previously been associated with each other. In the words of Colvis, the “translator would find that A is connected to B and that B is connected to C … and so on and show you how A is connected to Z using over 100 sources of data.”
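To make Colvis's A-to-Z description concrete, here is a minimal sketch, assuming data from several sources has already been merged into a single graph of (subject, object, source) links. The node names and source labels are hypothetical placeholders, links are treated as symmetric for simplicity, and breadth-first search is a generic stand-in chosen for illustration, not the Translator's actual reasoning method.

```python
from collections import deque

# Hypothetical toy links: (subject, object, source_id). Real Translator
# data sources, identifiers, and schemas are NOT modeled here.
EDGES = [
    ("A", "B", "source_1"),
    ("B", "C", "source_2"),
    ("C", "Z", "source_3"),
    ("A", "D", "source_2"),
    ("D", "E", "source_1"),
]


def build_graph(edges):
    """Merge links from every source into one undirected adjacency map."""
    graph = {}
    for subj, obj, source in edges:
        graph.setdefault(subj, []).append((obj, source))
        graph.setdefault(obj, []).append((subj, source))
    return graph


def find_path(graph, start, goal):
    """Breadth-first search: return the shortest chain of
    (from_node, to_node, source) hops from start to goal, or None."""
    queue = deque([start])
    came_from = {start: None}  # node -> (previous node, source of that link)
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for neighbor, source in graph.get(node, []):
            if neighbor not in came_from:
                came_from[neighbor] = (node, source)
                queue.append(neighbor)
    if goal not in came_from:
        return None
    # Walk back from the goal to the start to recover the hops in order.
    hops = []
    node = goal
    while came_from[node] is not None:
        prev, source = came_from[node]
        hops.append((prev, node, source))
        node = prev
    return list(reversed(hops))


if __name__ == "__main__":
    graph = build_graph(EDGES)
    for prev, node, source in find_path(graph, "A", "Z"):
        print(f"{prev} -> {node} (via {source})")
```

Run as a script, this prints three hops, A -> B, B -> C, and C -> Z, each tagged with the hypothetical source it came from, which is the kind of traceable chain Colvis describes. No single source in the toy data links A to Z directly.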

The size and collaborative nature of the project are part of what makes it unique, but also part of what makes it successful, according to Colvis. Each group attends online meetings three or four times a week and travels to "hackathons," held twice a year, where all the teams gather for a week to troubleshoot and discuss ideas in person.

“This project is actually trying to do a technical feasibility study: Is it feasible, first of all, to combine the different data sources into something common enough where you can ask questions,” said Will Byrd, a member of the team working on the project at the University of Alabama at Birmingham. “And once you’ve done that, is it then possible to do reasoning over this so you can then extract patterns or latent information and do things like drug re-purposing or drug discovery? That’s the hope.”

That initial feasibility phase of the project is set to end this year.

The selection process was also unusual, requiring applicants to work out the answers to a series of puzzles.

“You could go to the website and solve the puzzle, and each puzzle you solved unlocked one page of a PDF document that was the request for proposals. If you solved all the puzzles, then you could submit,” Byrd said. “They were trying to attract teams that had certain types of problem-solving skills.”

Some of the participating institutions work on data architecture, which includes standardizing the format of the various datasets.

“The problem is [that] these databases are not in a common format — they were never designed to work together,” Byrd said. “NIH has spent billions of dollars to collect these data, but people can’t access it. If you are a biologist or a physician-scientist, you have no chance, unless you’re like an expert programmer.”

Others are building on this work to create a platform that lets researchers sift through the information and interpret the results of their searches, work that would usually require a computational scientist. This platform could help researchers understand the mechanisms behind rare diseases and find treatments for them, and could also help explain why certain drugs and compounds work.

Stefano Rensi, a research engineer who works on the project at Stanford University, said that their group’s main focus is to create a tool to help drug industry researchers scan the scientific literature to better understand the biological mechanisms of compounds they’ve identified as possible therapies.

“Maybe you do a large-scale medical screen and you get a bunch of hits and basically it tells you what chemicals are working, but it doesn’t necessarily tell you why they’re working,” Rensi said. “You’d be surprised the number of times that drugs and compounds come out, and how little we actually know about what they do.”

Other projects focus directly on improving patient outcomes, such as an AI system being developed at UAB. That team's leader, Matt Might, recently used the system to help identify the cause of his son's life-threatening symptoms.

While the institutions’ research areas may differ, all involved emphasized the importance of the collaborative nature of the project.

“It’s basically like having a menu of great ideas from all the smartest people around the country,” Rensi said.

The physical separation of the groups makes it more challenging to keep priorities aligned, but weekly meetings and hackathons help keep members focused on the bigger picture while they work independently at their home institutions, said Paul Clemons, an institute scientist at the Broad Institute in Cambridge, Mass.

"We've allowed the priorities of the overall program to help drive what the next round of milestones are," Clemons said. "It's a sort of constant rebalancing, and re-prioritization across all the teams to help us all move in a shared direction."

The goals of this program are ambitious, including the ability to show "every disease that has symptom X and/or affects a particular cell type," but those involved remain steadfast in their hope to make lasting change in the way patients are treated.

"In the end, we do hope that we will have a system that can really help to augment human reasoning, and really advance the development of therapies so that we can get more treatments to more patients more quickly," Colvis said.
