Development of new statistical method likely to produce insights into human diseases
Jingyi “Jessica” Li’s UCLA research group, the Junction of Statistics and Biology (JSB), developed a new statistical method, called AIDE, to recover full-length sequences of messenger RNA (mRNA) molecules from data generated by second-generation RNA sequencing (RNA-seq) technology, which in the last decade has revolutionized the study of the transcriptome, the collection of all mRNA molecules in a biological sample.
Li, a UCLA associate professor of statistics, and colleagues published their new research as the cover story in the December issue of Genome Research, a major international journal of genomics.
Alternative splicing, an important process in molecular biology that is known to play a critical role in many human diseases, produces more than one mRNA sequence from a single gene; these distinct sequences are referred to as mRNA isoforms. Accurate identification and quantification of mRNA isoforms is critical for biomedical researchers to understand disease mechanisms and to discover new biomarkers with the potential for treating diseases such as breast cancer and melanoma.
Despite continual efforts among bioinformatics researchers to develop isoform discovery methods for second-generation RNA-seq data, accurate isoform discovery in a transcriptome-wide manner remains theoretically and computationally challenging for humans and other complex organisms, and “existing methods have no control on the false positive rate in their predicted isoforms,” Li said. The prevalence of falsely predicted isoforms has made it far less feasible for biologists to study isoforms from RNA-seq data and has inflated their experimental validation costs, she said.
To address this challenge, Wei Vivian Li, a former JSB member and currently an assistant professor of biostatistics and epidemiology at Rutgers University, led the development of AIDE to increase the precision and robustness of isoform discovery. AIDE is the first isoform discovery method that enables the control of false isoform discoveries by employing a statistical testing procedure, which ensures that the discovered isoforms contribute significantly to explaining the observed RNA-seq data, Li said. AIDE is also unique in its selective leverage of knowledge from annotation databases in a data-adaptive manner to assist its isoform discovery.
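To make the idea of a statistical test for isoform discovery concrete, here is a minimal Python sketch of one common approach, a likelihood ratio test that asks whether adding a candidate isoform significantly improves the fit to the reads. It is an illustration under stated assumptions, not AIDE's actual implementation: the mixture model, the function names `em_loglik` and `lrt_pvalue`, the toy read probabilities, and the use of a chi-square reference distribution with one degree of freedom are all hypothetical choices made for this example.

```python
import math

def em_loglik(cond, n_iter=200):
    """Fit mixture proportions by EM and return (proportions, log-likelihood).

    cond[i][j] is an assumed, strictly positive P(read i | isoform j).
    """
    k = len(cond[0])
    pi = [1.0 / k] * k
    for _ in range(n_iter):
        counts = [0.0] * k
        for row in cond:
            denom = sum(p * a for p, a in zip(pi, row))
            for j in range(k):
                counts[j] += pi[j] * row[j] / denom  # expected read share
        pi = [c / len(cond) for c in counts]
    ll = sum(math.log(sum(p * a for p, a in zip(pi, row))) for row in cond)
    return pi, ll

def lrt_pvalue(cond, candidate):
    """P-value for adding isoform column `candidate` to the remaining ones."""
    reduced = [[a for j, a in enumerate(row) if j != candidate] for row in cond]
    _, ll_null = em_loglik(reduced)   # model without the candidate
    _, ll_full = em_loglik(cond)      # model with the candidate
    stat = max(0.0, 2.0 * (ll_full - ll_null))
    # Survival function of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(stat / 2.0))

# Toy data: column 0 is a known isoform, column 1 is the candidate.
supported = [[0.9, 0.1]] * 30 + [[0.1, 0.9]] * 30   # half the reads favor the candidate
unsupported = [[0.9, 0.1]] * 60                     # no read favors the candidate

p_supported = lrt_pvalue(supported, candidate=1)
p_unsupported = lrt_pvalue(unsupported, candidate=1)
print(p_supported < 0.05, p_unsupported > 0.5)
```

In this toy setting the candidate isoform would be kept only when its p-value falls below a chosen threshold, which is what lets a procedure of this kind limit false discoveries. For real analyses, use the AIDE software package itself.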
“Compared with state-of-the-art methods, AIDE was shown to achieve more precise discoveries of full-length mRNA isoforms, and this advantage will make AIDE a useful tool for biologists to identify novel isoforms from disease samples with high confidence and to save experimental validation costs,” Li said.
The successful development of AIDE is the result of a collaboration with Hubing Shi’s laboratory, which conducted experimental validation, and Xin Tong, who contributed to the original idea of the AIDE method.
The software package of AIDE is available at https://github.com/Vivianstats/AIDE.
Here is a description of the journal’s cover artwork:
AIDE (the robot) is a robust statistical method that assembles short RNA-seq reads (blocks) into full-length mRNA isoforms (citadels). For this task, AIDE is the first bioinformatics tool that uses statistical testing (the magnifier, representing the p-value) to examine the assembled isoforms to control false discoveries (the citadels that do not pass the test are trashed), thus allowing users to discover novel isoforms with high confidence. AIDE will facilitate transcriptomic studies at the full-length isoform level from abundant second-generation short RNA-seq data. (Cover artwork by Zhongke Magic Color Enterprise, with conceptual input from Wei Vivian Li, Shan Li, Hubing Shi, and Jingyi Jessica Li.)
Article by Professor Jingyi Jessica Li and Stuart Wolpert.