Selecting precise reference normal tissue samples for cancer research using a deep learning approach

Bibliographic Details
Main Authors:	William Z. D. Zeng (Author), Benjamin S. Glicksberg (Author), Yangyan Li (Author), Bin Chen (Author)
Format:	Article
Published:	BMC, 2019-01-01T00:00:00Z.
Subjects:	article
Online Access:	https://doaj.org/article/237a0a3cc7a94e479f8aed2f41a43ba5

MARC


LEADER	00000 am a22000003u 4500
001	doaj_237a0a3cc7a94e479f8aed2f41a43ba5
042			|a dc
100	1	0	|a William Z. D. Zeng |e author
700	1	0	|a Benjamin S. Glicksberg |e author
700	1	0	|a Yangyan Li |e author
700	1	0	|a Bin Chen |e author
245	0	0	|a Selecting precise reference normal tissue samples for cancer research using a deep learning approach
260			|b BMC, |c 2019-01-01T00:00:00Z.
500			|a 10.1186/s12920-018-0463-6
500			|a 1755-8794
520			|a Abstract Background Normal tissue samples are often employed as a control for understanding disease mechanisms; however, collecting matched normal tissues from patients is difficult in many instances. In cancer research, for example, open cancer resources such as TCGA and TARGET do not provide matched tissue samples for every cancer or cancer subtype. The recent GTEx project has profiled samples from healthy individuals, providing an excellent resource for this field, yet whether GTEx samples can serve as the reference remains unanswered. Methods We analyze RNA-Seq data processed with the same computational pipeline and systematically evaluate GTEx as a potential reference resource. We use the cancers that have adjacent normal tissues in TCGA as a benchmark for the evaluation. To correlate tumor and normal samples, we explore three feature sets: top varying genes, reduced features from principal component analysis, and encoded features from an autoencoder neural network. We first evaluate whether these methods can identify the correct tissue of origin from GTEx for a given cancer, and then ask whether disease expression signatures derived from TCGA are consistent with those derived from GTEx. Results Among 32 TCGA cancers, 18 have fewer than 10 matched adjacent normal tissue samples. Among the three methods, the autoencoder performed best in predicting tissue of origin, correctly predicting 12 of 14 cancers. The two cancers were misclassified because none of the normal samples from GTEx correlate well with any of their tumor samples. This suggests that GTEx has matched tissues for the majority of cancers, but not all. When using the autoencoder to select proper normal samples for creating disease signatures, we found that signatures derived from normal samples selected from GTEx are consistent with those derived from adjacent samples from TCGA in many cases. Interestingly, choosing the top 50 most correlated samples regardless of tissue type performed reasonably well, or even better, in some cancers. Conclusions Our findings demonstrate that samples from GTEx can serve as reference normal samples for cancers, especially those that do not have adjacent tissue samples available. A deep-learning-based approach holds promise for selecting proper normal samples.
546			|a EN
690			|a Drug repositioning
690			|a Deep learning
690			|a Autoencoder
690			|a Disease signatures
690			|a Internal medicine
690			|a RC31-1245
690			|a Genetics
690			|a QH426-470
655	7		|a article |2 local
786	0		|n BMC Medical Genomics, Vol 12, Iss S1, Pp 179-189 (2019)
787	0		|n http://link.springer.com/article/10.1186/s12920-018-0463-6
787	0		|n https://doaj.org/toc/1755-8794
856	4	1	|u https://doaj.org/article/237a0a3cc7a94e479f8aed2f41a43ba5 |z Connect to this object online.
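The abstract describes correlating tumor samples with GTEx normal samples in a reduced feature space, predicting the tissue of origin, and alternatively taking the top 50 most correlated normal samples regardless of tissue. The Python sketch below illustrates that workflow under stated assumptions; it is not the authors' implementation. It substitutes PCA (one of the three feature sets the abstract names) for the autoencoder, uses Pearson correlation, scores each tissue by the median tumor-normal correlation, and ranks normal samples by their mean correlation to the tumor cohort. All of these choices, the function names (pca_features, correlate, predict_tissue, top_k_normals), and the synthetic data are hypothetical.

```python
import numpy as np

# Hypothetical sketch: PCA stands in for the autoencoder; not the authors' code.


def pca_features(expr: np.ndarray, n_components: int) -> np.ndarray:
    """Project samples (rows) onto their top principal components via SVD."""
    centered = expr - expr.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def correlate(tumor_feats: np.ndarray, normal_feats: np.ndarray) -> np.ndarray:
    """Pearson correlation between every tumor row and every normal row."""
    t = tumor_feats - tumor_feats.mean(axis=1, keepdims=True)
    n = normal_feats - normal_feats.mean(axis=1, keepdims=True)
    t /= np.linalg.norm(t, axis=1, keepdims=True)
    n /= np.linalg.norm(n, axis=1, keepdims=True)
    return t @ n.T  # shape: (n_tumor, n_normal)


def predict_tissue(corr: np.ndarray, normal_tissues) -> str:
    """Assumed rule: pick the tissue whose normal samples have the highest
    median correlation with the tumor cohort."""
    labels = np.asarray(normal_tissues)
    scores = {
        str(t): float(np.median(corr[:, labels == t])) for t in np.unique(labels)
    }
    return max(scores, key=scores.get)


def top_k_normals(corr: np.ndarray, k: int = 50) -> np.ndarray:
    """Indices of the k normal samples with the highest mean correlation
    to the tumor samples, ignoring tissue labels (assumed ranking rule)."""
    mean_corr = corr.mean(axis=0)
    return np.argsort(mean_corr)[::-1][: min(k, mean_corr.size)]


# Synthetic demo: 10 tumors built from tissue A's baseline profile.
rng = np.random.default_rng(0)
n_genes = 200
base_a, base_b = rng.normal(size=n_genes), rng.normal(size=n_genes)
normals = np.vstack([
    base_a + 0.3 * rng.normal(size=(20, n_genes)),  # tissue A normals
    base_b + 0.3 * rng.normal(size=(20, n_genes)),  # tissue B normals
])
tissues = ["A"] * 20 + ["B"] * 20
tumors = base_a + 0.5 * rng.normal(size=(10, n_genes))

# Fit PCA jointly on tumors and normals so both share one feature space.
feats = pca_features(np.vstack([tumors, normals]), n_components=5)
corr = correlate(feats[:10], feats[10:])
predicted = predict_tissue(corr, tissues)
selected = top_k_normals(corr, k=20)
print(predicted, sorted(selected.tolist()))
```

Because the synthetic tumors share tissue A's baseline profile, the sketch should favor tissue A and select the 20 tissue-A normal samples. Real RNA-Seq data would first need normalization and gene filtering, which are omitted here.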