Posted on July 08, 2019
“Data is the new oil” has been the slogan of recent years, explaining the tremendous rise in power and the data hunger of tech companies such as Facebook. Emails, contacts, social interactions, online activity, payments, hobbies, interests, location profiles - everything is collected. However, there are increasing concerns about current practices for handling these vast treasures of data, particularly third-party sharing, exploitation for unethical purposes and leakage of insufficiently secured data. These practices leave customers with uncomfortable feelings and spur initiatives advocating for more privacy and security regulations.
Meanwhile, consumer genetic testing companies are advertising a range of low-priced products offering personalised health or ancestry information from nothing more than a bit of your saliva. Many are wondering: wouldn't one of those kits be a good present for friends, relatives or themselves? Working in the Bio-IT field, I feel that many are underestimating the amount of highly personal information that can be gained from genetic testing, and are therefore overlooking the consequences of sharing their genetic data with these companies. With that in mind, let's take a look at what genetic information can reveal about you.
Due to the imperfections of biology, no two humans have identical DNA. Even identical twins that start with the same DNA from a single fertilized egg will accumulate mutations over time. Some occur so early in embryonic development that the majority of cells will inherit them, leading to small but detectable differences in their genetic code (1). DNA therefore is a durable and unique marker of human identity, and much more precise than other biometric markers such as fingerprints.
There exist various databases collecting genomic information from consumer genetic tests. As shown in a 2018 study, your genetic data can be used to reveal your identity with impressive accuracy. At the time of the study, it was estimated that 60% of people of European descent could be matched to a third cousin or closer relative based on existing data. With more and more genomic data becoming available, these numbers can grow rapidly.
Figure 1: A hypothetical family tree showing the reach of public genetic databases. In this example, one of your third cousins, who shares the same great-great-grandparent as you, posts their test results and makes you identifiable. Figure by Aparna Nathan (article)
Already, these techniques are being increasingly utilised by law enforcement. In May 2019, big discussions were sparked by a case of violent assault where DNA was uploaded to a database for comparison with available genetic data, leading to the identification of a suspect. Critics fear that this technique will be increasingly used for less serious crimes, particularly as there is astonishingly little oversight regarding your own right to genetic privacy if you are identified in this manner. The success of this technique shows: your genetic data can never be anonymous.
DNA phenotyping is the task of predicting a person’s physical features using only genetic data. This is a very active research area. It is already possible to determine eye color, hair color and skin color with high confidence. A 2017 study successfully used genomic data to predict facial structure, voice, biological age, height, weight and other human physical traits. There are already companies that commercially offer to predict the physical appearance of an unknown person from only their DNA, as obtained for example from a tiny bit of saliva on a cigarette butt.
Figure 2: Actual (left) and predicted from DNA (right) faces (from this study).
When asked, people usually underestimate how much of these physical traits is due to genetic factors rather than environment. Heritability is the scientific term describing how much of the variation in a trait, for example differences in body height, can be attributed solely to genetic differences. The heritability of eye color, for instance, is 95%, so differences in eye color are almost completely determined by genes. The heritability for body weight (!) and height is 80% and 70%, respectively. Consequently, it can be expected that advances in DNA phenotyping will make it possible to derive increasingly accurate biometric information from genetic data in the near future.
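As a rough illustration of what a heritability figure expresses, the sketch below computes it as the share of a trait's total variance that is due to genetic variance. It assumes a deliberately simple model in which genetic and environmental contributions are independent and simply add up. The function name and the numbers are illustrative only and are not taken from any study.

```python
def heritability(genetic_variance: float, environmental_variance: float) -> float:
    """Share of total trait variance attributable to genetic differences.

    Assumption: total variance = genetic + environmental (no interaction).
    """
    total = genetic_variance + environmental_variance
    if genetic_variance < 0 or environmental_variance < 0 or total == 0:
        raise ValueError("variances must be non-negative and not both zero")
    return genetic_variance / total


# Illustrative numbers: 49 units of genetic and 21 units of environmental variance.
print(heritability(49.0, 21.0))
```

Note that a heritability value describes differences within a population. It does not say how much of any one person's trait comes from their genes.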
A long-standing question, not only among behavioural scientists, is how much of our personal character is determined by genetics and how much is due to our environment (e.g. education). A growing number of studies suggest that the majority of psychological differences can be attributed to genetic factors. As psychologist Robert Plomin writes in his 2018 book “Blueprint”:
In summary, parents matter, schools matter, and life experiences matter, but they don’t make a difference in shaping who we are. DNA is the only thing that makes a substantial systemic difference, accounting for 50 percent of the variance in psychological traits. The rest comes down to chance environmental experiences that do not have long-term effects.
Some of the traits that have already been examined for heritability include school achievement (60%), verbal ability (60%), spatial ability (70%) and ability to remember faces (60%). All of these attributes are, to a substantial degree, encoded in our genetic data. As with phenotypic information, it is therefore increasingly possible to predict a person's behavioural traits based on DNA alone.
Some hereditary diseases such as cystic fibrosis, sickle cell disease, and haemophilia are related to abnormalities in a single gene, and can be identified by reading out your DNA. Many other disorders are related to multiple genes in combination with environmental factors. These are known as complex disorders.
A relatively new development is the possibility of determining an individual's risk for common complex disorders like coronary artery disease, atrial fibrillation and type 2 diabetes. So-called polygenic risk scores now make it possible to quantify a significant fraction of the increased risk for these diseases. This is done by combining multiple genetic markers into a single score to predict lifetime outcomes. For example, a polygenic risk score could be established for the onset of the severe mental illness schizophrenia (80% heritability). These scores can be helpful information for doctors when considering treatment options. However, information about disabilities or mental health deserves the highest protection. Think about how employers, insurers or financial institutions might act if they knew that an onset of schizophrenia was likely. As of today, no company that holds genetic data is obliged to tell you about the findings and judgements it derives from that data.
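To make the idea of combining multiple genetic markers into a single score more concrete, here is a minimal sketch of the weighted-sum form such scores commonly take. The marker names, weights and genotype are made up for illustration. Real scores use far more markers, with weights estimated from large association studies.

```python
from typing import Mapping


def polygenic_risk_score(genotype: Mapping[str, int],
                         weights: Mapping[str, float]) -> float:
    """Weighted sum of risk-allele counts (0, 1 or 2 copies per marker)."""
    score = 0.0
    for marker, weight in weights.items():
        # Markers missing from the test result are counted as 0 copies here.
        # This is a simplification made for the sketch.
        copies = genotype.get(marker, 0)
        if copies not in (0, 1, 2):
            raise ValueError(f"invalid allele count for {marker}: {copies}")
        score += weight * copies
    return score


# Hypothetical markers and effect weights, not taken from any real study.
example_weights = {"marker_a": 0.5, "marker_b": 0.25, "marker_c": -0.125}
example_genotype = {"marker_a": 2, "marker_b": 1, "marker_c": 0}

print(polygenic_risk_score(example_genotype, example_weights))
```

The raw number only becomes meaningful when it is compared with the distribution of scores in a reference population, for example to say that someone falls into the highest few percent.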
Following the laws of inheritance, your parents, siblings and children each share roughly half of their genetic information with you. Other relatives also share significant parts: you share 12.5% of your DNA with a first cousin, and about 0.8% with a third cousin. Therefore, sharing your genetic data always means sharing information about your relatives. Genetic testing companies inevitably obtain more information than they imply, and you should always consider the consequences for your relatives’ interests and rights.
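The cousin percentages follow from simple halving: each generation passes on half of a parent's DNA. As a rough sketch, the expected shared fraction can be computed as below. The function name is my own, and the calculation assumes full cousins who descend from one shared ancestral couple and are not related in any other way.

```python
def expected_shared_fraction(cousin_degree: int) -> float:
    """Expected fraction of autosomal DNA shared by full n-th cousins.

    Assumption: one shared ancestral couple and no other relatedness.
    Real values scatter around this average because inheritance is random.
    """
    if cousin_degree < 1:
        raise ValueError("cousin_degree must be 1 or greater")
    # Each cousin is (degree + 1) generations below the shared couple,
    # so the path through one ancestor involves 2 * (degree + 1) halvings.
    halvings = 2 * (cousin_degree + 1)
    # There are two shared ancestors (the couple), hence the factor 2.
    return 2 * 0.5 ** halvings


for degree in (1, 2, 3):
    print(f"cousin degree {degree}: {expected_shared_fraction(degree):.4f}")
```

For first and third cousins this gives 0.125 and about 0.0078, i.e. 12.5% and roughly 0.8%.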
It cannot be overstated: intensely private and personal information - in a surprising breadth and depth - can be gained from genetic data. It can be summarised in this oversimplified formula:
Genetic data = fingerprint + passport + appearance + behaviour + health information
So whom should you trust with the handling of your genetic data? We have become more and more accustomed to announcements of "we are sorry that we have to report a cybersecurity incident", from even the most technologically capable firms. So is it worth demanding higher security standards for genetic testing companies? Given the power of genetic data, I think higher security requirements are a must. A leak of other personal information is bad, but at least you can react. You can change passwords, email addresses, credit card numbers, and in an emergency even your home address. But you cannot change your genetic code.
In June 2019 it became known that personal, financial and medical data from the two largest American genetic testing companies, LabCorp and Quest Diagnostics, had been leaked, affecting about 20 million patients in total. Both companies had shared this data with a third-party provider, which was then breached. It is a simple truth: the more parties have access to sensitive data, the more likely an eventual data leak becomes. Furthermore, consumer genetic testing companies tend to have a second business as data brokers, selling your genetic data to third parties.
Consumers and patients have every reason to demand a right to genetic data privacy and security. The GDPR explicitly names “genetic data” as a special category of personal data, which is a step in the right direction, but greater awareness of what we are giving away with this kind of data is of the utmost importance in this fast-growing market. Not every company has data security as its top priority, so consumers need to do their own due diligence to make sure there is sufficient trust before sharing the deep information inherent in genetic data.
(1) Consumer genetic tests will typically read out only the highly-variable sites (polymorphisms) instead of the complete DNA. For anyone except identical twins these tests will give significant differences. Some information might only be extracted from more comprehensive “genomic” data. But due to the growing possibilities for imputation, i.e. increasing the information from limited available data using the data from others, the partial genetic data from consumer genetic tests is often sufficient.