Learning to Discover Regulatory Elements for Gene Expression Prediction
February 19, 2025
Authors: Xingyu Su, Haiyang Yu, Degui Zhi, Shuiwang Ji
cs.AI
Abstract
We consider the problem of predicting gene expression from DNA sequences. A
key challenge of this task is to find the regulatory elements that control gene
expression. Here, we introduce Seq2Exp, a Sequence to Expression network
explicitly designed to discover and extract regulatory elements that drive
target gene expression, thereby improving the accuracy of gene expression
prediction. Our approach captures the causal relationship between epigenomic
signals, DNA sequences and their associated regulatory elements. Specifically,
we propose to decompose the epigenomic signals and the DNA sequence conditioned
on the causal active regulatory elements, and apply an information bottleneck
with the Beta distribution to combine their effects while filtering out
non-causal components. Our experiments demonstrate that Seq2Exp outperforms
existing baselines in gene expression prediction tasks and discovers more
influential regions than commonly used statistical methods for peak
detection such as MACS3. The source code is released as part of the AIRS
library (https://github.com/divelab/AIRS/).
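
The abstract's "information bottleneck with the Beta distribution" can be illustrated with a small sketch. This is not the authors' implementation (see the AIRS repository for that); it only assumes a common reading of the idea: each sequence position gets a soft mask value drawn from a Beta distribution, and a KL penalty pulls those distributions toward a sparse prior so that few positions are kept. The function names, the prior `(1.0, 9.0)`, and the example parameters are all hypothetical.

```python
import math
import random

def digamma(x):
    """Digamma for x > 0: upward recurrence, then an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))
    return result + math.log(x) - 0.5 / x - series

def log_beta_fn(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def kl_beta(a1, b1, a2, b2):
    """Closed-form KL( Beta(a1, b1) || Beta(a2, b2) )."""
    return (log_beta_fn(a2, b2) - log_beta_fn(a1, b1)
            + (a1 - a2) * digamma(a1)
            + (b1 - b2) * digamma(b1)
            + (a2 - a1 + b2 - b1) * digamma(a1 + b1))

def sample_soft_mask(params, rng):
    """Draw one mask value in [0, 1] per position from its Beta(alpha, beta)."""
    return [rng.betavariate(a, b) for a, b in params]

def bottleneck_penalty(params, prior=(1.0, 9.0)):
    """Mean KL to a sparse prior (assumed here; prior mean is 0.1)."""
    return sum(kl_beta(a, b, *prior) for a, b in params) / len(params)

# Hypothetical per-position Beta parameters, e.g. produced by a mask generator
# that reads the DNA sequence and epigenomic signals.
params = [(8.0, 2.0), (1.0, 9.0), (0.5, 6.0), (5.0, 5.0)]
rng = random.Random(0)
mask = sample_soft_mask(params, rng)
penalty = bottleneck_penalty(params)
# A training objective would then look like: task_loss + lambda * penalty,
# where task_loss is computed on the mask-weighted sequence features.
```

In a real training loop the mask samples would need to be differentiable with respect to the Beta parameters, for example through a reparameterized sampler in an autodiff framework; `random.betavariate` is used here only to keep the sketch dependency-free.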