Efficient Genome-wide Association in Biobanks Using Topic Modeling Identifies Multiple Novel Disease Loci

Abstract Biobanks and national registries represent a powerful tool for genomic discovery, but rely on diagnostic codes that can be unreliable and fail to capture relationships between related diagnoses. We developed an efficient means of conducting genome-wide association studies using combinations...

Bibliographic Details
Main Authors: Thomas H. McCoy (Author), Victor M. Castro (Author), Leslie A. Snapper (Author), Kamber L. Hart (Author), Roy H. Perlis (Author)
Format: Book
Published: BMC, 2017-08-01.

MARC

LEADER 00000 am a22000003u 4500
001 doaj_c87f11294cbf4a7c8d9a2d297bcc84e3
042 |a dc 
100 1 0 |a Thomas H. McCoy  |e author 
700 1 0 |a Victor M. Castro  |e author 
700 1 0 |a Leslie A. Snapper  |e author 
700 1 0 |a Kamber L. Hart  |e author 
700 1 0 |a Roy H. Perlis  |e author 
245 0 0 |a Efficient Genome-wide Association in Biobanks Using Topic Modeling Identifies Multiple Novel Disease Loci 
260 |b BMC,   |c 2017-08-01T00:00:00Z. 
500 |a 10.2119/molmed.2017.00100 
500 |a 1076-1551 
500 |a 1528-3658 
520 |a Abstract Biobanks and national registries represent a powerful tool for genomic discovery, but rely on diagnostic codes that can be unreliable and fail to capture relationships between related diagnoses. We developed an efficient means of conducting genome-wide association studies using combinations of diagnostic codes from electronic health records for 10,845 participants in a biobanking program at two large academic medical centers. Specifically, we applied latent Dirichlet allocation to fit 50 disease topics based on diagnostic codes, then conducted a genome-wide common-variant association for each topic. In sensitivity analysis, these results were contrasted with those obtained from traditional single-diagnosis phenome-wide association analysis, as well as those in which only a subset of diagnostic codes were included per topic. In meta-analysis across three biobank cohorts, we identified 23 disease-associated loci with p < 1e-15, including previously associated autoimmune disease loci. In all cases, observed significant associations were of greater magnitude than single phenome-wide diagnostic codes, and incorporation of less strongly loading diagnostic codes enhanced association. This strategy provides a more efficient means of identifying phenome-wide associations in biobanks with coded clinical data. 
546 |a EN 
690 |a Therapeutics. Pharmacology 
690 |a RM1-950 
690 |a Biochemistry 
690 |a QD415-436 
655 7 |a article  |2 local 
786 0 |n Molecular Medicine, Vol 23, Iss 1, Pp 285-294 (2017) 
787 0 |n http://link.springer.com/article/10.2119/molmed.2017.00100 
787 0 |n https://doaj.org/toc/1076-1551 
787 0 |n https://doaj.org/toc/1528-3658 
856 4 1 |u https://doaj.org/article/c87f11294cbf4a7c8d9a2d297bcc84e3  |z Connect to this object online.