Efficient Genome-wide Association in Biobanks Using Topic Modeling Identifies Multiple Novel Disease Loci
Abstract Biobanks and national registries represent a powerful tool for genomic discovery, but rely on diagnostic codes that can be unreliable and fail to capture relationships between related diagnoses. We developed an efficient means of conducting genome-wide association studies using combinations...
Main Authors: | Thomas H. McCoy, Victor M. Castro, Leslie A. Snapper, Kamber L. Hart, Roy H. Perlis |
---|---|
Format: | Book |
Published: | BMC, 2017-08-01. |
Subjects: | |
Online Access: | Connect to this object online. |
MARC
LEADER | 00000 am a22000003u 4500 | ||
---|---|---|---|
001 | doaj_c87f11294cbf4a7c8d9a2d297bcc84e3 | ||
042 | |a dc | ||
100 | 1 | 0 | |a Thomas H. McCoy |e author |
700 | 1 | 0 | |a Victor M. Castro |e author |
700 | 1 | 0 | |a Leslie A. Snapper |e author |
700 | 1 | 0 | |a Kamber L. Hart |e author |
700 | 1 | 0 | |a Roy H. Perlis |e author |
245 | 0 | 0 | |a Efficient Genome-wide Association in Biobanks Using Topic Modeling Identifies Multiple Novel Disease Loci |
260 | |b BMC, |c 2017-08-01T00:00:00Z. | ||
500 | |a 10.2119/molmed.2017.00100 | ||
500 | |a 1076-1551 | ||
500 | |a 1528-3658 | ||
520 | |a Abstract Biobanks and national registries represent a powerful tool for genomic discovery, but rely on diagnostic codes that can be unreliable and fail to capture relationships between related diagnoses. We developed an efficient means of conducting genome-wide association studies using combinations of diagnostic codes from electronic health records for 10,845 participants in a biobanking program at two large academic medical centers. Specifically, we applied latent Dirichlet allocation to fit 50 disease topics based on diagnostic codes, then conducted a genome-wide common-variant association for each topic. In sensitivity analysis, these results were contrasted with those obtained from traditional single-diagnosis phenome-wide association analysis, as well as those in which only a subset of diagnostic codes were included per topic. In meta-analysis across three biobank cohorts, we identified 23 disease-associated loci with p < 1e-15, including previously associated autoimmune disease loci. In all cases, observed significant associations were of greater magnitude than single phenome-wide diagnostic codes, and incorporation of less strongly loading diagnostic codes enhanced association. This strategy provides a more efficient means of identifying phenome-wide associations in biobanks with coded clinical data. | ||
546 | |a EN | ||
690 | |a Therapeutics. Pharmacology | ||
690 | |a RM1-950 | ||
690 | |a Biochemistry | ||
690 | |a QD415-436 | ||
655 | 7 | |a article |2 local | |
786 | 0 | |n Molecular Medicine, Vol 23, Iss 1, Pp 285-294 (2017) | |
787 | 0 | |n http://link.springer.com/article/10.2119/molmed.2017.00100 | |
787 | 0 | |n https://doaj.org/toc/1076-1551 | |
787 | 0 | |n https://doaj.org/toc/1528-3658 | |
856 | 4 | 1 | |u https://doaj.org/article/c87f11294cbf4a7c8d9a2d297bcc84e3 |z Connect to this object online. |