SAN FRANCISCO, CALIFORNIA—Researchers who have assembled a trove of genetic and medical data on 100,000 northern Californians unveiled their initial findings here this week at the annual meeting of the American Society of Human Genetics (ASHG). The effort, which may be the largest such "biobank" in the United States, has already yielded an intriguing connection between mortality and telomeres, the protective DNA sequences that cap chromosome ends, and found new links between genetic variants and disease traits. And that's just the beginning, say the biobank's curators at Kaiser Permanente (KP), the giant health care organization.
The project is one of many that aim to collect medical and DNA data on vast numbers of people and look for links between diseases, lifestyle factors, traits, and genes. More than a decade ago, the company deCODE genetics in Iceland led the way with a biobank now holding data on 140,000 Icelanders. The UK Biobank, which has enrolled 500,000 people but hasn't yet tested their DNA, may be the largest such study in the world.
Although several large U.S. biobanks are under way, for instance at the Veterans Health Administration and Harvard University, KP's research division in Oakland, California, and collaborators at the University of California, San Francisco (UCSF) got a jump-start when they received a 2-year, $25 million National Institutes of Health (NIH) grant from the 2009 Recovery Act for an aging study. The team scanned for hundreds of thousands of DNA markers along the genomes of 100,000 Californian adults in the company's health care system. The researchers also measured the length of participants' telomeres, which typically shorten every time a cell divides until they reach a point that triggers the cell to enter a senescent state.
The idea is to link such genetic information with clinical data from the electronic medical records of the biobank's volunteers. (The participants, who also answered health surveys, averaged 63 years old, and 81% were white and the rest Asian, Latino, or African American.) For example, researchers using the Kaiser Permanente biobank have verified previously reported links between certain genetic markers, known as SNPs (single-nucleotide polymorphisms), and cholesterol measurements tied to heart disease risk. The data have revealed new SNPs that may influence cholesterol levels as well. Moreover, for some of the known cholesterol-linked SNPs, the association was much stronger than in the original work, and stronger than in any previous, similar SNP study, says UCSF human geneticist Neil Risch, co-leader of the aging study with Catherine Schaefer, director of KP's Research Program on Genes, Environment, and Health. This is probably because of the biobank's large size and consistent, high-quality clinical information, an advantage over analyses that pool smaller, separate studies, Risch says.
On the telomere front, the KP team has verified that these DNA caps tend to be shorter in older people and in those who smoke and drink alcohol, but didn't confirm other previously reported links. For example, they didn't observe that telomeres were longer in people who exercised more. They did find an association between having short telomeres and an individual's risk of dying, another finding reported earlier in smaller studies. But the KP team hasn't yet determined whether short telomeres somehow cause death directly or merely reflect other factors that contribute to mortality, a question that remains controversial. (Some companies, including one co-founded by UCSF researcher and Nobel laureate Elizabeth Blackburn, whose lab measured telomeres for the KP study, are offering telomere tests even though critics say the value of such measurements isn't yet clear.)
The KP biobank, which will draw on a variety of anonymized data from patients' medical records, ranging from medications to brain images, is also open to outside researchers. "This is obviously a very rich set of data that we want to be widely used," Schaefer says. Her team will deposit a data set in dbGaP, an NIH database for sharing SNP data sets. Researchers can also apply to collaborate with the Kaiser Permanente team. Exactly how the data will be used will be "up to the creativity and ingenuity of lots of people," Risch says. For example, researchers could use geographical databases on air pollution to look for links between illness and pollution. The biobank may also grow: a total of 200,000 KP members have donated biological samples and 430,000 have filled out a survey saying they're interested in participating.
"It's great. They have a huge data set," says Aravinda Chakravarti, a human geneticist at Johns Hopkins University in Baltimore, Maryland, who is already discussing collaboration with KP. However, he expressed reservations about the general push to link genes to diseases—at the ASHG meeting, many talks discussed efforts to sequence part or all of peoples' genomes to uncover rarer disease genes than SNP studies can find. "The problem in our field is that we're making lists" of disease genes, Chakravarti says. Like some others, he would like to see more emphasis on understanding the biology of how those genes function and cause illness.