Nature magazine discusses the ‘new era in scientific data sharing’



What are the pros and cons of pooling and sharing clinically relevant genomic information?

Nature Editorial – 21 June 2016
Excerpt below:

 

When doctors in Ottawa saw a child with an unusual developmental disorder last year, they were stumped. Their patient had an abnormally small head and face and had been slow to develop. They sequenced the child’s genome hoping to find a genetic explanation, but came up with too many possible candidate genes to pinpoint a likely culprit. This still happens a lot in medicine: people with rare problems go undiagnosed. And that’s one reason behind a big push in science in recent years — the pooling and sharing of clinically relevant information.

In the Ottawa case, the doctors got lucky. They were able to search a database that contained information about other patients with undiagnosed diseases, and when they did so they found a second person with similar symptoms — and an identical mutation in one gene, EFTUD2. The finding allowed the Ottawa doctors to diagnose their patient with a disease called mandibulofacial dysostosis with microcephaly, and to begin to understand why mutations in EFTUD2 cause the disease’s symptoms.

That’s the upside of the new era of data sharing. But there is a possible downside too: invasion of privacy. Massive genetic studies in countries such as the United States, Qatar, Saudi Arabia and Brazil are collecting genetic data on millions of people, so there is a chance that a person’s identity could be dragged from those data, especially if they are linked to clinical information, such as medical history. The risk is that someone who volunteers their DNA could see their medical problems opened to public scrutiny.

This is a legitimate concern for many researchers, and is one reason why data sharing is easier said than done. Others include the lingering sense of ownership, and the career benefits offered to those who have privileged access. Those concerns relate to the standard model of data sharing, in which different groups of scientists deposit their results into centralized databases. This model has had some success, but researchers have already encountered problems, such as how to grant and control access to the pooled information.

Pooling it in the first place becomes more difficult as the data sets get larger and the underlying techniques more varied. Imagine the difficulty of finding a specific book by gathering all the contents of a dozen different national libraries and then devising a way to integrate the numerous ways in which they are filed, tracked, recorded and made available. It would be much easier to ask each library whether it holds that book. What if data sharing in science could go the same way?