Impact of genetic variation on the activity of gene promoters in Arabidopsis thaliana
- Sequence variation in gene promoters influences the binding of transcription factors. In plants, this often leads to differential gene expression across natural accessions and crop cultivars. Some of these differences propagate through molecular networks and give rise to macroscopic phenotypes. However, the link between variation in a promoter's sequence and variation in its activity is not yet well understood. In this project, we apply deep learning to 728 genotypes of Arabidopsis thaliana to shed light on some aspects of that link. Convolutional neural networks were successfully implemented to predict the likelihood of a gene being expressed from its promoter sequence. These networks were also able to highlight known and putative new sequence motifs causal for the expression of genes. We tested our algorithms in various scenarios, including single and multiple point mutations as well as indels, on synthetic and real promoter sequences, and estimated the corresponding performance characteristics. Finally, we showed that the decision boundary used to classify genes as expressed or non-expressed depends on the sensitivity of the transcriptome profiling assay, and that changing it affects the algorithm’s performance.
Author: | Fritz Forbang Peleke |
---|---|
Advisor: | Röbbe Wünschiers, Jedrzej Jakub Szymanski, Mary-Ann Blätke |
Document Type: | Master's Thesis |
Language: | English |
Year of Completion: | 2020 |
Granting Institution: | Hochschule Mittweida |
Release Date: | 2023/03/30 |
GND Keyword: | Crop plants (Kulturpflanzen); Gene expression (Genexpression); Machine learning (Maschinelles Lernen) |
Note: | Print copy held in the reference collection (Printexemplar Präsenzbestand) |
Institutes: | Angewandte Computer‐ und Biowissenschaften |
DDC classes: | 572.865 Gene expression |
Open Access: | Within the university only |
Licence (German): | Protected by copyright |