ac4C-AFL: A high-precision identification of human mRNA N4-acetylcytidine sites based on adaptive feature representation learning

RNA N4-acetylcytidine (ac4C) is a highly conserved RNA modification that plays a crucial role in controlling mRNA stability, processing, and translation. Consequently, accurate identification of ac4C sites across the genome is critical for understanding gene expression regulation mechanisms. In this...

Bibliographic Details
Main Authors: Nhat Truong Pham (Author), Annie Terrina Terrance (Author), Young-Jun Jeon (Author), Rajan Rakkiyappan (Author), Balachandran Manavalan (Author)
Format: Book
Published: Elsevier, 2024-06-01T00:00:00Z.

MARC

LEADER 00000 am a22000003u 4500
001 doaj_5e2f10b80c5e4b079927da0b80724e8a
042 |a dc 
100 1 0 |a Nhat Truong Pham  |e author 
700 1 0 |a Annie Terrina Terrance  |e author 
700 1 0 |a Young-Jun Jeon  |e author 
700 1 0 |a Rajan Rakkiyappan  |e author 
700 1 0 |a Balachandran Manavalan  |e author 
245 0 0 |a ac4C-AFL: A high-precision identification of human mRNA N4-acetylcytidine sites based on adaptive feature representation learning 
260 |b Elsevier,   |c 2024-06-01T00:00:00Z. 
500 |a 2162-2531 
500 |a 10.1016/j.omtn.2024.102192 
520 |a RNA N4-acetylcytidine (ac4C) is a highly conserved RNA modification that plays a crucial role in controlling mRNA stability, processing, and translation. Consequently, accurate identification of ac4C sites across the genome is critical for understanding gene expression regulation mechanisms. In this study, we have developed ac4C-AFL, a bioinformatics tool that precisely identifies ac4C sites from primary RNA sequences. In ac4C-AFL, we identified the optimal sequence length for model building and implemented an adaptive feature representation strategy that is capable of extracting the most representative features from RNA. To identify the most relevant features, we proposed a novel ensemble feature importance scoring strategy to rank features effectively. We then used this information to conduct a sequential forward search, which determines the optimal feature set for each of the 16 sequence-derived feature descriptors individually. Utilizing these optimal feature descriptors, we constructed 176 baseline models using 11 popular classifiers. The most efficient baseline models were identified using the two-step feature selection approach, and their predicted scores were integrated and used to train an appropriate classifier, yielding the final prediction model. Our rigorous cross-validations and independent tests demonstrate that ac4C-AFL surpasses contemporary tools in predicting ac4C sites. Moreover, we have developed a publicly accessible web server at https://balalab-skku.org/ac4C-AFL/. 
546 |a EN 
690 |a Bioinformatics 
690 |a N4-acetylcytidine 
690 |a ensemble feature importance scoring strategy 
690 |a two-step feature selection 
690 |a bioinformatics 
690 |a machine learning 
690 |a Therapeutics. Pharmacology 
690 |a RM1-950 
655 7 |a article  |2 local 
786 0 |n Molecular Therapy: Nucleic Acids, Vol 35, Iss 2, Pp 102192- (2024) 
787 0 |n http://www.sciencedirect.com/science/article/pii/S2162253124000799 
787 0 |n https://doaj.org/toc/2162-2531 
856 4 1 |u https://doaj.org/article/5e2f10b80c5e4b079927da0b80724e8a  |z Connect to this object online.
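
The abstract in field 520 describes ranking features with an ensemble importance score and then running a sequential forward search over that ranking. The record gives no implementation details, so the following is only a minimal sketch of one common reading of those two steps. The function names, the mean-rank aggregation rule, and the toy scoring function are all assumptions made for illustration and are not taken from ac4C-AFL.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def ensemble_rank(score_tables: Sequence[Dict[str, float]]) -> List[str]:
    """Rank features by their mean rank across several importance scorers.

    Assumption: the abstract does not say how individual importance
    scores are combined; averaging ranks is one simple choice.
    """
    features = sorted(score_tables[0])
    mean_rank = {f: 0.0 for f in features}
    for table in score_tables:
        # Higher importance means a better (lower) rank position.
        ordered = sorted(features, key=lambda f: -table[f])
        for position, f in enumerate(ordered):
            mean_rank[f] += position / len(score_tables)
    return sorted(features, key=lambda f: mean_rank[f])


def sequential_forward_search(
    ranked: Sequence[str],
    evaluate: Callable[[List[str]], float],
) -> Tuple[List[str], float]:
    """Add features in rank order and keep the best-scoring prefix."""
    best_subset: List[str] = []
    best_score = float("-inf")
    current: List[str] = []
    for feature in ranked:
        current.append(feature)
        score = evaluate(current)
        if score > best_score:
            best_subset, best_score = list(current), score
    return best_subset, best_score


# Toy importance scores from three hypothetical scorers.
tables = [
    {"a": 0.2, "b": 0.9, "c": 0.5},
    {"a": 0.1, "b": 0.7, "c": 0.8},
    {"a": 0.3, "b": 0.6, "c": 0.4},
]
ranked = ensemble_rank(tables)


def toy_evaluate(subset: List[str]) -> float:
    # Stand-in for a cross-validated metric: reward the informative
    # features "b" and "c", with a small penalty per feature used.
    useful = {"b", "c"}
    return sum(f in useful for f in subset) - 0.1 * len(subset)


subset, score = sequential_forward_search(ranked, toy_evaluate)

# Model count stated in the abstract: 16 descriptors and 11 classifiers,
# which is consistent with one baseline model per descriptor/classifier pair.
n_baseline_models = 16 * 11
```

In practice `toy_evaluate` would be replaced by a cross-validated performance measure for a chosen classifier, and the search would be repeated once per feature descriptor.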