An Explainable Deep Learning Framework Integrating DNA Sequence and Transcription Initiation Signals for Gene Expression Prediction

Written on 1 July 2026
by Jianbo Qiao

ACS Synth Biol. 2026 Jul 1. doi: 10.1021/acssynbio.6c00275. Online ahead of print.

ABSTRACT

Gene expression is a fundamental process in cellular function and organismal phenotype formation, governed by intricate and finely tuned regulatory mechanisms. Deciphering the regulatory code of gene expression and elucidating the transcriptional effects of the genome remain critical challenges in human genetics. While numerous deep learning approaches have been developed to predict gene expression levels, most prioritize DNA sequence information, often overlooking the impact of transcription initiation signals and lacking interpretability regarding both model behavior and feature contributions. In this study, we present an interpretable deep learning framework that combines convolutional neural networks with a gated recurrent unit (GRU). This method efficiently predicts gene expression levels by integrating DNA sequence data, mRNA half-life features, and transcription initiation signals. We further validate the model on GM12878, HepG2, and K562 data sets, demonstrating its robustness across different human cell lines. Moreover, interpretability analysis shows that the model provides valuable insights into gene expression mechanisms, underscoring its practical utility as a tool for understanding and predicting gene expression levels.
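
The abstract does not say how the three input types are encoded or combined, so the following is only a minimal sketch of one common way such inputs could be prepared for a convolutional model. The function names, the base ordering "ACGT", and the choice to append the initiation signal as a fifth per-position channel are all assumptions made for illustration, not the authors' method.

```python
from typing import List, Sequence

BASES = "ACGT"  # assumed channel order; the abstract does not specify one


def one_hot(seq: str) -> List[List[float]]:
    """Encode a DNA string as one row of four 0/1 values per base.

    Bases outside A/C/G/T (for example N) become an all-zero row.
    """
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def add_signal_channel(
    encoded: List[List[float]], signal: Sequence[float]
) -> List[List[float]]:
    """Append a per-position signal value (hypothetically, a transcription
    initiation track) to each one-hot row as an extra channel."""
    if len(encoded) != len(signal):
        raise ValueError("sequence and signal must have the same length")
    return [row + [float(s)] for row, s in zip(encoded, signal)]


def append_gene_features(
    pooled: Sequence[float], half_life_features: Sequence[float]
) -> List[float]:
    """Concatenate a pooled sequence representation with per-gene features
    (hypothetically, mRNA half-life features) before a final prediction layer."""
    return list(pooled) + list(half_life_features)


# Illustrative values only; these are not data from the study.
features = add_signal_channel(one_hot("ACGTN"), [0.0, 0.2, 1.5, 0.3, 0.0])
```

In this sketch each position ends up with five values (four for the base, one for the signal), which a convolutional layer could scan along the sequence axis, while the per-gene features would be joined later, after the sequence has been summarized.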

PMID: 42384944 | DOI: 10.1021/acssynbio.6c00275