A federated ecosystem for sharing genomic, clinical data


Truly innovative scientific discoveries will only be available to the general population if researchers and clinicians can access and make comparisons across data from millions of individuals.

Excerpt from Science Magazine, 10 Jun 2016


Early data-sharing efforts have led to improved variant interpretation and development of treatments for rare diseases and some cancer types. However, such benefits will only be available to the general population if researchers and clinicians can access and make comparisons across data from millions of individuals.

Despite much progress, genomic and clinical data are still generally collected and studied in silos: by disease, by institution, and by country. Regulatory data-privacy requirements do not seamlessly lend themselves to the secure sharing of data within and across institutions and countries. Current practices in research and medicine hinder the sharing of data in ways that tangibly recognize an individual’s contributions. Tools and analytical methods are nonstandardized and incompatible, and the data are often stored in incompatible file formats. If we stay this course, the likely outcome will be an assortment of balkanized systems akin to those developed for U.S. electronic health records, which, although designed to advance human health by sharing clinical data across institutions, have by all measures fallen short of that goal because of a lack of interoperability.

A FEDERATED DATA ECOSYSTEM. The Global Alliance for Genomics and Health (GA4GH) was established in 2013 to enable responsible and effective sharing of genomic and clinical data in a way that is as simple as using the World Wide Web. GA4GH, which now brings together hundreds of individuals and organizations, was built on the hypothesis that the data underlying genomic medicine must be federated. That is, whereas data may be distributed across many databases and computers around the world, they must be virtually connected through software interfaces that allow seamless, authorized access. In contrast to large centralized data repositories, a federated system will allow legal data control to remain within the originating jurisdiction (see the figure). International consortia such as the International Cancer Genome Consortium (ICGC) have already adopted federated databases because the model allows local databases to maintain autonomy.
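
To make the federated idea concrete, the following is a minimal sketch in Python, not GA4GH software. It assumes a toy model in which each database answers a yes/no question about whether it holds a given variant and enforces its own access list. The names `Node`, `has_variant`, and `federated_query`, along with the variant strings and jurisdictions, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A locally controlled database (hypothetical). Records stay inside it."""
    jurisdiction: str
    variants: set
    authorized_users: set = field(default_factory=set)

    def has_variant(self, user: str, variant: str) -> bool:
        # Each node applies its own access policy before answering.
        if user not in self.authorized_users:
            raise PermissionError(f"{user} is not authorized in {self.jurisdiction}")
        return variant in self.variants


def federated_query(nodes, user, variant):
    """Send the same question to every node and collect only the answers.

    Returns a dict mapping jurisdiction to True, False, or None (access denied).
    No underlying records are copied to the caller.
    """
    answers = {}
    for node in nodes:
        try:
            answers[node.jurisdiction] = node.has_variant(user, variant)
        except PermissionError:
            answers[node.jurisdiction] = None
    return answers


# Illustrative data only; these are not real records.
nodes = [
    Node("CA", {"chr1:100A>T"}, {"alice"}),
    Node("UK", {"chr2:5G>C"}, {"alice"}),
    Node("DE", {"chr1:100A>T"}, set()),
]
result = federated_query(nodes, "alice", "chr1:100A>T")
```

In this sketch the caller learns only which jurisdictions report the variant, while each node decides independently whether to respond, which mirrors the local autonomy described above.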

TOOL DEVELOPMENT AND USE. As a first step, the GA4GH Regulatory and Ethics Working Group (REWG) developed a framework document that provides basic principles and core elements for responsible data sharing and is founded on Article 27 of the 1948 Universal Declaration of Human Rights. This focus on human rights represents a paradigm shift with respect to data sharing, as most previous discussions focused solely on protection from harm without acknowledging the right to benefit from the fruits of scientific and medical advances. In practical terms, increased data sharing will enable researchers to make better predictions about disease risk, prevention, and treatment by virtue of having access to larger data sets. And through data exchanges that link the clinical and research communities, clinicians will be able to make better precision medicine decisions for individual patients.