Refine
Document Type
- Master's Thesis (17)
- Bachelor Thesis (11)
- Conference Proceeding (11)
- Diploma Thesis (2)
Year of publication
- 2020 (41) (remove)
Language
- English (41) (remove)
Keywords
- Blockchain (14)
- Maschinelles Lernen (5)
- Bitcoin (4)
- Ethereum (4)
- Bioinformatik (3)
- Internet der Dinge (3)
- Algorithmus (2)
- Bildgebendes Verfahren (2)
- Bildverarbeitung (2)
- Biomedizin (2)
Institute
Gold cyanidation is a process by which gold is removed from low-grade ore. Due to its efficiency it has found widespread application around the world, including Peru. The process requires free cyanide in high concentration. After the gold extraction is completed, free cyanide as well as metal cyanide complexes remain in the effluent of gold mines and refineries. Often these effluents are kept in storage ponds where they pose considerable risk to health and environ-ment. Thus, it is preferable to degrade cyanide to minimize the risk of exposure. In the context of this thesis cyanide degradation was explored in a UV-light based prototype. Degradation with a combination of hydrogen peroxide and UV-light has proven to be very effective at degrading cyanide concentrations of 100 mg/L and 1000 mg/L. Furthermore, the presence of ammonia as a degradation product could also be confirmed. Membrane distillation may provide an alternative to cyanide destruction in the form of cyanide recovery. Promising results were gathered from several membrane experiment.
Footage of organoids taken by means of fluorescence microscopy and segmented as well as triangulated by image analysis software like LimeSeg and Mastodon often needs to be visualized in aesthetic manner for presentation of the results in scientific papers, talks and demonstrations. The goal of this work was to create a simple to use addon “Biobox” for the open source 3D – visualization package “Blender” which would allow to import triangulated 3D data with animation over time (4D), produced by image analysis software, and optimize it for efficient usage. ”Biobox” offers several visualization tools for the creation of rendered images and animation videos by biologists.
The optimization of imported data was performed by using Blender intern modifiers. The optimized data can then be visualized by using several tools built for visualizing the organoid in frozen, animated and semi-transparent manners. A dynamic link for object selection and dynamic data exchange between Blender and Mastodon was developed. Additionally, a user interface was developed for manual correction errors of segmentation and steering the object detection algorithms of LimeSeg. The benchmark of the developed addon “Biobox” was performed on real scientific data. The benchmark test demonstrated that developed optimization result in significant (~5 fold) decrease of RAM usage and acceleration of visualization more than 160 times.
Robust soft learning vector quantization (RSLVQ) is a probabilistic approach of Learning vector quantization (LVQ) algorithm. Basically, the RSLVQ approach describes its functionality with respect to Gaussian mixture model and its cost function is defined in terms of likelihood ratio. Our thesis work involves an approach of modifying standard RSLVQ with non-Gaussian density functions like logistic, lognormal, and Cauchy (referred as PLVQ). In this approach, we derive new update rules for prototypes using gradient of cost function with respect to non-Gaussian density functions. We also derive new learning rules for the model parameters like s and s, by differentiating the cost function with respect to parameters. The main goal of the thesis is to compare the performance results of PLVQ model with Gaussian-RSLVQ model. Therefore, the performance of these classification models have been tested on the Iris and Seeds dataset. To visualize the results of the classification models in an adequate way, the Principal component analysis (PCA) technique has been used.
This paper examines the communication channels used by innovation projects at the ProtoSpace Hamburg, when engaging with stakeholders, and tries to answer the thesis question whether new media channels improve the chances of success for innovation projects, when used for this communication. Expert interviews with eight experts in com-munication, innovation and stakeholder management were conducted and then analyzed through the application of Mayring´s qualitative content analysis, in order to answer the posed question.
The number of Internet of Things (IoT) devices is increasing rapidly. The Trustless Incentivized Remote Node Network, in short IN3 (Incubed), enables trustworthy and fast access to a blockchain for a large number of low-performance IoT devices. Although currently IN3 only supports the verification of Ethereum data, it is not limited to one blockchain due to modularity. This thesis describes the fundamentals, the concept and the implementation of the Bitcoin verification in IN3.
In this thesis two novel methods for removing undesired background illumination are de-veloped. These include a wavelet analysis based approach and an enhancement of a deep learning method. These methods have been compared with conventional methods, using real confocal microscopy images and synthetic generated microscopy images. These synthetic images were created utilizing a generator introduced in this thesis.
Glycans play an important role in the intracellular interactions of pathogenic bacteria. Pathogenic bacteria possess binding proteins capable of recognizing certain sugar motifs on other cells, which are found in glycan structures. Artificial carbohydrate synthesis allows scientists to recreate those sugar motifs in a rational, precise, and pure form. However, due to the high specificity of sugar-binding proteins, known as lectins, to glycan structures, methods for identifying suitable binding agents need to be developed. To tackle this hurdle, the Fraunhofer Institute for Cell Therapy and Immunology (Fraunhofer IZI) and the Max-Planck Institute of Colloids and Interfaces (MPIKG) developed a binding assay for the high throughput testing of sugar motifs that are presented on modular scaffolds formed by the assembly of four DNA strands into simple, branched DNA nanostructures. The first generation of this assay was used in combination with bacteria that express a fluorescent protein as a proof-of-concept. Here, the assay was optimized to be used with bacteria not possessing a marker gene for a fluorescent protein by staining their genomic DNA with SYBR® Green. For the binding assay, DNA nanostructures were combined with artificially synthesized mannose polymers, typical targets for many lectins on the surface of bacteria, presenting them in a defined constellation to bind bacteria strongly due to multivalent cooperativity. The testing of multiple mannose polymers identified monomeric mannose with a 5’-carbon linker and 1,2-linked dimeric mannose with linker as the best binding candidates for E. coli, presumably due to binding with the FimH protein on the surface. Despite similarities between the FimH proteins of E. coli and K. pneumoniae, binding was only observed between E. coli and the different sugar molecules on DNA structures. Furthermore, the degree of free movement seemed to affect the binding of mannose polymers to targeted proteins, since when utilizing a more flexible DNA nanostructure, an increase in binding could be observed. An alternative to the simple DNA nanostructures described above is the use of larger, more complex DNA origami structures consisting of several hundred strands. DNA origami structures are capable of carrying dozens of modifications at the same time. The results for the DNA origami structure showed a successful functionalization with up to 71 1,2-linked dimeric mannose with linker molecules. These results point towards a solution for the high-throughput analysis of potential binding agents for pathogenic bacteria e.g. as an alternative treatment for antibiotic-resistant.
Drought is one of the most common and dangerous threats plants have to face, costing the global agricultural sector billions of dollars every year and leading to the loss of tons of harvest. Until people drastically reduce their consumption of animal products or cellular agriculture comes of age, more and more crops will need to be produced to sustain the ever growing human population. Even then, as more areas on earth are becoming prone to drought due to climate change, we may still have to find or breed plant varieties more suitable to grow and prosper in these changing environments.
Plants respond to drought stress with a complex interplay of hormones, transcription factors, and many other functional or regulatory proteins and mapping out this web of agents is no trivial task. In the last two to three decades or so, machine learning has become immensely popular and is increasingly used to find patterns in situations that are too complex for the human mind to overlook. Even though much of the hype is focused on the latest developments in deep learning, relatively simple methods often yield superior results, especially when data is limited and expensive to gather.
This Master Thesis, conducted at the IPK in Gatersleben, develops an approach for shedding light on the phenotypic and transcriptomic processes that occur when a plant is subjected to stress. It centers around a random forest feature selection algorithm and although it is used here to illuminate drought stress response in Arabidopsis thaliana, it can be applied to all kinds of stresses in all kinds of plants.
Genetic sequence variations at the level of gene promoters influence the binding of transcription factors. In plants, this often leads to differential gene expression across natural accessions and crop cultivars. Some of these differences are propagated through molecular networks and lead to macroscopic phenotypes. However, the link between promoter sequence variation and the variation of its activity is not yet well understood. In this project, we use the power of deep learning in 728 genotypes of Arabidopsis thaliana to shed light on some aspects of that link. Convolutional neural networks were successfully implemented to predict the likelihood of a gene being expressed from its promoter sequence. These networks were also capable of highlighting known and putative new sequence motifs causal for the expression of genes. We tested our algorithms in various scenarios, including single and multiple point mutations, as well as indels on synthetic and real promoter sequences and the respective performance characteristics of the algorithm have been estimated. Finally, we showed that the decision boundary to classify genes as expressed and non-expressed depends on the sensitivity of the transcriptome profiling assay and changing it has an impact on the algorithm’s performance.
Data streams change their statistical behaviour over the time. These changes can occur gradually or abruptly with unforeseen reasons, which may effect the expected outcome. Thus it is important to detect concept drift as soon as it occurs. In this thesis we chose distance based methodology to detect presence of concept drift in the data streams. We used generalized learning vector quantization(GLVQ) and generalized matrix learning vector quantization( GMLVQ) classifiers for distance calculation between prototypes and data points. Chi-square and Kolmogorov–Smirnov tests are used to compare the distance distributions of test and train data sets to indicate the drift presence.
Anomaly Detection is a very acute technical problem among various business enterprises. In this thesis a combination of the Growing Neural Gas and the Generalized Matrix Learning Vector Quantization is presented as a solution based on collected theoretical and practical knowledge. The whole network is described and implemented along with references and experimental results. The proposed model is carefully documented and all the further open researching questions are stated for future investigations.
In response to prevailing environmental conditions, Arabidopsis thaliana plants must increase their photosynthetic capacity to acclimate to potential harmful environmental high light stress. In order to measure these changes in acclimation capacity, different high throughput imaging-based methods can be used. In this master thesis we studied different Arabidopsis thaliana knockout mutants-and accessions in their capacity to acclimate to potential harmful environmental high light and cold temperature conditions using a high throughput phenotyping system with an integrated chlorophyll fluorescence measurement system. In order to determine the acclimation capacity, Arabidopsis thaliana knockout mutants of previously not high light assigned genes as well as accessions of two different haplotype groups with a reference and alternative allele from different countries of origin were grown under switching high light and temperature environmental conditions. Photosynthetic analysis showed that knockout mutant plants did differ in their Photosystem II operating efficiency during an increased light irradiance switch but did not significantly differ a week later under the same circumstances from the wildtype. High throughput phenotyping of haplotype accessions revealed significant better acclimation capacity in non-photochemical quenching and steady-state photosynthetic efficiency in Russian domiciled accessions with an altered SPPA gene during high light and cold stress.
Financial fraud for banks can be a reason for huge monetary losses. Studies have shown that, if not mitigated, financial fraud can lead to bankruptcy for big financial institutions and even insolvency for individuals. Credit card fraud is a type of financial fraud that is ever growing. In the future, these numbers are expected to increase exponentially and that’s why a lot of researchers are focusing on machine learning techniques for detecting frauds. This task, however, is not a simple task. There are mainly two reasons
• varying behaviour in committing fraud
• high level of imbalance in the dataset (the majority of normal or genuine cases largely outnumbers the number of fraudulent cases)
A predictive model usually tends to be biased towards the majority of samples, in an unbalanced dataset, when this dataset is provided as an input to a predictive model.
In this Thesis this problem is tackled by implementing a data-level approach where different resampling methods such as undersampling, oversampling, and hybrid strategies along with bagging and boosting algorithmic approaches have been applied to a highly skewed dataset with 492 idetified frauds out of 284,807 transactions.
Predictive modelling algorithms like Logistic Regression, Random Forest, and XGBoost have been implemented along with different resampling techniques to predict fraudulent transactions.
The performance of the predictive models was evaluated based on Receiver Operating CharacteristicArea under the curve (AUC-ROC), Precision Recall Area under the Curve (AUC-PR), Precision, Recall, F1 score metrics.
Convolutional Neural network (CNN) has been one of most powerful and popular preprocessing techniques employed for image classification problems. Here, we use other signal processing techniques like Fourier transform and wavelet transform to preprocess the images in conjunction with different classifiers like MLP, LVQ, GLVQ and GMLVQ and compare its performance with CNN.
Vicia faba leaves and calli were transformed using CRISPR Cas RNP. Two kinds of CPP fused SpyCas9 were used with sgRNA7, sgRNA5 or sgRNA13 targeting PDS exon 1, PDS exon 2 or MgCh exon 3 respectively. RNP were applied using high pressure spraying, biolistic delivery, incubation in RNP solution and infiltration of leaf tissue. A PCR and restriction enzyme based approach was used for detection of mutation. Screening of 679 E. coli colonies containing the cloned fragments resulted in detection of 14 mutations. Most of the 14 mutations were deletions of sizes 150, 500 or 730 bp. 5 out of the 14 mutations were point mutations located two to three bp upstream of PAM.
Tokenization projects are currently very present when it comes to new blockchain technologies. After explaining the fundamentals of cross-chain interaction, the bachelor thesis will focus on tokenizing technology for Bitcoin on Ethereum. To get a more practical context, implementing the currently most successful decentralized tokenization project is described.
In bioinformatics one important task is to distinguish between native and mirror protein models based on the structural information. This information can be obtained from the atomic coordinates of the protein backbone. This thesis tackles the problem of distinction of these conformations, looking at the statistics of the dihedral angles’ distribution regarding the protein backbone. This distribution is visualized in Ramachandran plots. By means of an interpretable machine learning classification method – Generalized Matrix Learning Vector Quantization – we are able to distinguish between native and mirror protein models with high accuracy. Further, the classifier model supplies supplementary information on the important distributional regions for distinction, like α-helices and β-strands.
A Protein is a large molecule that consists of a vast number of atoms; one can only imagine the complexity of such a molecule. Protein is a series of amino acids that bind to each other to form specific sequences known as peptide chains. Proteins fold into three-dimensional conformations (or so-called protein’s native structure) to perform their functions. However, not every protein folds into a correct structure as a result of mutations occurring in their amino acid sequences. Consequently, this mutation causes many protein misfolding diseases. Protein folding is a severe problem in the biological field. Predicting changes in protein stability free energy in relation to the amino acid mutation (ΔΔG) aids to better comprehend the driving forces underlying how proteins fold to their native structures. Therefore, measuring the difference in Gibbs free energy provides more insight as to how protein folding occurs. Consequently, this knowledge might prove beneficial in designing new drugs to treat protein misfolding related diseases. The protein-energy profile aids in understanding the sequential, structural, and functional relationship, by assigning an energy profile to a protein structure. Additionally, measuring the changes in the protein-energy profile consequent to the mutation (ΔΔE) by using an approach derived from statistical physics will lead us to comprehend the protein structure thoroughly. In this work, we attempt to prove that ΔΔE values will be approximate to ΔΔG values, which can lead the future studies to consider that the energy profile is a good predictor of protein binding affinity as Gibbs free energy to solve the protein folding problem.
he automatic comparison of RNA/DNA or rather nucleotide sequences is a complex task requiring careful design due to the computational complexity. While alignment-based models suffer from computational costs in time, alignment-free models have to deal with appropriate data preprocessing and consistently designed mathematical data comparison. This work deals with the latter strategy. In particular, a systematic categorization is proposed, which emphasizes two key concepts that have to be combined for a successful comparison analysis: 1) the data transformation comprising adequate mathematical sequence coding and feature extraction, and 2) the subsequent (dis-)similarity evaluation of the transformed data by means of problem specific but mathematically consistent proximity measures. Respective approaches of different categories
of the introduced scheme are examined with regard to their suitability to distinguish natural RNA virus sequences from artificially generated ones encompassing varying degrees of biological feature preservation. The challenge in this application is the limited additional biological information available, such that the decision has to be made solely on the basis of the sequences and their
inherent structural characteristics. To address this, the present work focuses on interpretable, dissimilarity based classification models of machine learning, namely variants of Learning Vector Quantizers. These methods are known to be robust and highly interpretable, and therefore,
allow to evaluate the applied data transformations together with the chosen proximity measure with respect to the given discrimination task. First analysis results are provided and discussed, serving as a starting point for more in-depth analysis of this problem in the future.