This thesis comprehensively explores factors contributing to malaria-induced anemia and severe malarial anemia (SMA). The study utilizes a comprehensive dataset to investigate immunological interactions, genetic variations, and temporal dynamics. Findings highlight the complex interplay between immune markers, genetic traits, and cohort-specific influences. Notably, age, HIV status, and genetic variations emerge as crucial factors influencing anemia risk. The incorporation of Poisson regression models sheds light on the genetic underpinnings of SMA, emphasizing the need for personalized interventions. Overall, this research provides valuable insights into the multifaceted nature of malaria-induced complications, paving the way for further molecular investigations and targeted interventions.
As new sensors are added to VR headsets, more data can be collected. This introduces a new potential threat to user privacy. We focused on the feasibility of extracting personal information from eye-tracking. To achieve this, we designed a preliminary user study focusing on the pupil response to audio stimuli. We tested a variety of machine learning models on the collected data to determine the feasibility of obtaining information such as the age or gender of the participant. Several of the experiments show promise for obtaining this information. We were able to extract with reasonable certainty whether caffeine was consumed and the gender of the participant. This demonstrates the unknown threat that embedded sensors pose to users. Further studies are planned to verify the results.
Computationally solving eigenvalue problems is a central problem in numerical analysis and as such has been the subject of extensive study. In this thesis we present four different methods to compute eigenvalues, each with its own characteristics, strengths and weaknesses. After formally introducing the methods we use them in various numerical experiments to test speed of convergence, stability as well as performance when used to compute eigenfaces, denoise images and compute the eigenvector centrality measure of a graph.
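The abstract does not name the four methods it compares. Purely as an illustration of what an iterative eigenvalue method looks like, the following is a minimal sketch of power iteration, a standard textbook technique that may or may not be among them. The function name, tolerance and iteration limit are assumptions made for this example.

```python
import math


def power_iteration(matrix, iterations=200, tol=1e-12):
    """Estimate the dominant eigenvalue and eigenvector of a square matrix.

    `matrix` is a list of rows. Illustrative sketch only; convergence
    requires a unique eigenvalue of largest magnitude.
    """
    n = len(matrix)
    vec = [1.0 / math.sqrt(n)] * n  # normalized starting vector
    eigenvalue = 0.0
    for _ in range(iterations):
        # Multiply the matrix by the current vector.
        product = [sum(row[j] * vec[j] for j in range(n)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in product))
        if norm == 0.0:
            raise ValueError("starting vector was mapped to zero")
        new_vec = [x / norm for x in product]
        # Rayleigh quotient v^T A v for the unit vector new_vec.
        new_eigenvalue = sum(
            new_vec[i] * sum(matrix[i][j] * new_vec[j] for j in range(n))
            for i in range(n)
        )
        converged = abs(new_eigenvalue - eigenvalue) < tol
        vec, eigenvalue = new_vec, new_eigenvalue
        if converged:
            break
    return eigenvalue, vec


value, vector = power_iteration([[2.0, 0.0], [0.0, 1.0]])
```

Applied to the adjacency matrix of a connected, non-bipartite graph, the same routine yields the eigenvector centrality mentioned in the abstract; on bipartite graphs plain power iteration can oscillate instead of converging.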
Robust soft learning vector quantization (RSLVQ) is a probabilistic variant of the learning vector quantization (LVQ) algorithm. The RSLVQ approach describes its functionality in terms of a Gaussian mixture model, and its cost function is defined as a likelihood ratio. Our thesis work modifies standard RSLVQ with non-Gaussian density functions such as the logistic, lognormal, and Cauchy densities (referred to as PLVQ). In this approach, we derive new update rules for the prototypes using the gradient of the cost function with respect to the non-Gaussian density functions. We also derive new learning rules for the model parameters like s and s, by differentiating the cost function with respect to these parameters. The main goal of the thesis is to compare the performance of the PLVQ model with that of the Gaussian RSLVQ model. Therefore, the performance of these classification models has been tested on the Iris and Seeds datasets. To visualize the results of the classification models adequately, principal component analysis (PCA) has been used.
Machine learning models for time series have always been a special topic of interest due to the unique structure of the data. Recently, the introduction of attention improved the capabilities of recurrent neural networks and transformers with respect to learning tasks such as machine translation. However, these models are usually subsymbolic architectures, making their inner workings hard to interpret without comprehensive tools. In contrast, interpretable models such as learning vector quantization make their decision process more transparent. This thesis tries to merge attention as a machine learning function with learning vector quantization to better handle time series data. A design for such a model is proposed and tested with a dataset used in connection with attention-based transformers. Although the proposed model did not yield the expected results, this work outlines improvements for further research on this approach.
In this paper, we conduct experiments to optimize the learning rates for the Generalized Learning Vector Quantization (GLVQ) model. Our approach leverages insights from cognitive science rooted in the profound intricacies of human thinking. Recognizing that human-like thinking has propelled humankind to its current state, we explore the applicability of cognitive science principles in enhancing machine learning. Prior research has demonstrated promising results when applying learning rate methods inspired by cognitive science to Learning Vector Quantization (LVQ) models. In this study, we extend this approach to GLVQ models. Specifically, we examine five distinct cognitive science-inspired GLVQ variants: Conditional Probability (CP), Dual Factor Heuristic (DFH), Middle Symmetry (MS), Loose Symmetry (LS), and Loose Symmetry with Rarity (LSR). Our experiments involve a comprehensive analysis of the performance of these cognitive science-derived learning rate techniques across various datasets, aiming to identify optimal settings and variants of cognitive science GLVQ model training. Through this research, we seek to unlock new avenues for enhancing the learning process in machine learning models by drawing inspiration from the rich complexities of human cognition. Keywords: machine learning, GLVQ, cognitive science, cognitive bias, learning rate optimization, optimizers, human-like learning, Conditional Probability (CP), Dual Factor Heuristic (DFH), Middle Symmetry (MS), Loose Symmetry (LS), Loose Symmetry with Rarity (LSR).
Adversarial robustness of a nearest prototype classifier supports safe deployment in sensitive fields of application. Much research has been conducted on the robustness of artificial neural networks against adversarial attacks, whereas nearest prototype classifiers have not seen similar successes. This thesis presents the learning dynamics and numerical stability of the Crammer normalization and the Hein normalization for adversarial robustness of nearest prototype classifiers. The results of the conducted experiments are reported and analyzed to assess the bounds given by Saralajew et al. and Hein et al. for adversarial robustness of nearest prototype classifiers.
Traditional user management on the Internet has historically required individuals to give up control over their identities. In contrast, decentralized solutions promise to empower users and foster decentralized interactions. Over the last few years, the development of decentralized accounts and tokens has significantly increased, aiming at broader user adoption and shared social economies.
This thesis delves into smart contract standards and social infrastructure for Ethereum-based blockchains to enable identity-based data exchange between abstracted blockchain accounts. In this regard, the standardization landscapes of account and social token developments were analyzed in-depth to form guidelines that allow users to retain complete control over their data and grant access selectively.
Based on the evaluations, a pioneering Solidity standard is presented, natively integrating consensual restrictive on-chain assets for abstracted blockchain accounts. Further, the architecture of a decentralized messaging service has been defined to outline how new token and account concepts can be intertwined with efficient and minimal data-sharing principles to ensure security and privacy, while merging traditional server environments with global ledgers.
Analysis of the Forensic Preparation of Biometric Facial Features for Digital User Authentication
(2023)
Biometrics has become a popular method of securing access to data as it eliminates the need for users to remember a password. Although exploiting the vulnerabilities of biometric systems increased with their usage, these could also be helpful during criminal casework.
This thesis aims to evaluate approaches to bypass electronic devices with forged faces to access data for law enforcement. Here, obtaining the necessary data in a timely manner is critical. However, unlocking the devices with a password can take several years with a brute force attack. Consequently, biometrics could be a quicker alternative for unlocking.
Various approaches were examined to bypass current face recognition technologies. The first approaches included printing the user's face on regular paper and aimed to unlock devices performing face recognition in the visible spectrum. Further approaches consisted of printing the user's infrared image and creating three-dimensional masks to bypass devices performing face recognition in the near-infrared. Additionally, the underlying software responsible for face recognition was reverse-engineered to get information about its operation mode.
The experiments demonstrate that forged faces can partly bypass face recognition and obtain secured data. Devices performing face recognition in the visible spectrum can be unlocked with a printed image of the user's face. Regarding devices with advanced near-infrared face recognition, only one could be bypassed with a three-dimensional face mask. In addition, its underlying software provided evidence about the demands of face recognition. Other devices under attack remained locked, and their software provided no clues.
Analysis of Continuous Learning Strategies at the Example of Replay-Based Text Classification
(2023)
Continuous learning is a research field that has grown significantly in recent years due to highly complex machine and deep learning models. Whereas static models need to be retrained entirely from scratch when new data become available, continuous models progressively adapt to new data, saving computational resources. In this context, this work analyzes parameters impacting replay-based continuous learning approaches using the example of a data-incremental text classification task with an MLP and an LSTM. Generally, it was found that replay improves the results compared to naive approaches but does not reach the performance of a static model. In particular, performance increased with more replayed examples, and the number of training iterations has a significant influence, as it can partly control the stability-plasticity trade-off. In contrast, balancing the buffer and the strategy for selecting examples to store in the replay buffer were found to have only a minor impact on the results in the present case.
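The replay idea described in this abstract can be sketched as a fixed-size buffer of past examples that is mixed into each new training batch. The sketch below is an assumption-laden illustration rather than the thesis's implementation: the class and function names are hypothetical, and reservoir sampling is chosen here only as one simple selection strategy.

```python
import random


class ReplayBuffer:
    """Fixed-capacity store of past examples (hypothetical helper).

    Uses reservoir sampling so that every example seen so far has the
    same probability of being kept; this choice is an assumption.
    """

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.items = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, example):
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(example)
        else:
            # Replace a stored example with probability capacity / seen.
            slot = self.rng.randrange(self.seen)
            if slot < self.capacity:
                self.items[slot] = example

    def sample(self, k):
        # Draw up to k stored examples without replacement.
        return self.rng.sample(self.items, min(k, len(self.items)))


def build_training_batch(new_examples, buffer, n_replay):
    """Combine new data with replayed examples, then store the new data."""
    batch = list(new_examples) + buffer.sample(n_replay)
    for example in new_examples:
        buffer.add(example)
    return batch


buffer = ReplayBuffer(capacity=3)
first = build_training_batch([("good film", 1), ("bad film", 0)], buffer, n_replay=2)
second = build_training_batch([("great plot", 1)], buffer, n_replay=2)
```

The `n_replay` argument corresponds to the number of replayed examples, which the abstract reports as one of the more influential parameters.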
RNA tertiary contact interactions between RNA tetraloops and their receptors stabilize the folding of ribosomal RNA and support the maturation of the ribosome. Here we use FRET assisted structure prediction to develop structural models of two ribosomal tertiary contacts, one consisting of a kissing loop and a GAAA tetraloop and one consisting of the tetraloop receptor (TLR) and a GAAA tetraloop. We build bound and unbound states of the ribosomal contacts de novo, label the RNA in silico and compute FRET histograms based on MD simulations and accessible contact volume (ACV) calculations. The predicted mean FRET efficiency from molecular dynamics (MD) simulations and ACV determination show agreement for the KL-TLGAAA construct. The KL construct revealed too high FRET efficiency and artificial dye behavior, which requires further investigation of the model. In the case of the TLR, the importance of the correct dye and construct parameters in the modeling was shown, which also leads to a renewed modeling. This hybrid approach of experiment and simulation will promote the elucidation of dynamic RNA tertiary contacts and accelerate the discovery of novel RNA interactions as potential future drug targets.
To investigate the effects of climate change on interactions within ecosystems, a microcosm experiment was conducted. The effects of temperature increase and predator diversity on Collembola communities and their decomposition rate were investigated. The predators used were mites and Chilopods, whose predation effects on several response variables were analysed. This data included Collembola abundance, biomass and body mass as well as basal respiration and microbial biomass carbon. These response variables were tested against the predictors in several models. Temperature showed high significance in interaction with mite abundance in almost all models. Furthermore, the results of the basal respiration and microbial biomass carbon support the suggestion of a trophic cascade within the animal interaction.
This scientific work reveals the potential for the development of the renewable energy market, which arises for several reasons: the unstable political situation in the world, rising energy prices, environmental degradation and the growing demand of German residents for government measures to reduce the negative impact on the environment. This work is related to business planning and development using strategies based on the above reasons. The purpose of the study is to develop methods for successfully regulating the market for renewable resources to solve the problem of environmental pollution through the promotion of environmentally friendly products. The work explores the driving forces and problems hindering the development of the market for renewable resources. The problems raised concern all interested parties, from consumers and producers to the state body for regulating and stimulating the industry. An analysis was also made of the methods of environmentally oriented companies and the tools they use to strengthen their positions in the market. Based on the data obtained from the conducted research, a concept and business strategy for a new environmentally oriented business consulting company, “Sun generation”, was created. The idea of the new company is to involve all parties using marketing tools, creating a healthy competitive environment among commercial companies and benefiting not only the companies themselves but also the end user of the products and the German government.
The occurrence of prostate cancer (PCa) has been rising consistently for three decades, and PCa remains the third leading cause of cancer-related deaths after lung and bowel cancer in Germany. Despite new methods of early detection, such as prostate-specific antigen (PSA) testing, it remains the most common cancer in German men, with over 63,400 new diagnoses in Germany every year, and exhibits high prevalence in other countries of Northern and Western Europe as well [64]. Men over the age of 70 are most commonly affected by the lethal disease, whereas onset before the age of 50 is rare. The malignant prostate tumor can be cured by surgery or radiation as long as the cancer has not reached the stage of metastasis, in which other therapeutic methods have to be employed [14] [15]. In the metastatic phase, the patient usually exhibits symptoms when the tumor's size affects the urethra or the cancer spreads to other tissue, often the bones [16].
The high prevalence of this disease underlines the importance of further research into methods of prognosis and diagnosis, in which the identification of further biomarkers in PCa is a major topic of scientific analysis. For this task, high-throughput RNA sequencing of the transcriptome (the RNA molecules of an organism or specific cell type) is frequently used [66]. RNA sequencing, or RNA-Seq for short, offers the possibility of transcriptome assessment, enabling the identification of transcriptional aberrations in diseases as well as uncharacterized RNA species such as non-coding RNAs (ncRNAs), which remain undetected by conventional methods [49]. To facilitate interpretation of the sequenced reads, they are assembled to reconstruct the transcriptome as close to the original state as possible, thus enabling rapid detection of relevant biomolecules in the data [49]. Transcriptomic studies often require highly accurate and complete gene annotations on the reference genome of the examined organism. However, most gene annotations and reference genomes are far from complete and lack a multitude of as yet unidentified protein-coding and non-coding genes and transcripts. Therefore, refinement of reference genomes and annotations by inclusion of novel sequences, discovered in high-quality transcriptome assemblies, is necessary [24].
Glycans play an important role in the intracellular interactions of pathogenic bacteria. Pathogenic bacteria possess binding proteins capable of recognizing certain sugar motifs on other cells, which are found in glycan structures. Artificial carbohydrate synthesis allows scientists to recreate those sugar motifs in a rational, precise, and pure form. However, due to the high specificity of sugar-binding proteins, known as lectins, to glycan structures, methods for identifying suitable binding agents need to be developed. To tackle this hurdle, the Fraunhofer Institute for Cell Therapy and Immunology (Fraunhofer IZI) and the Max-Planck Institute of Colloids and Interfaces (MPIKG) developed a binding assay for the high throughput testing of sugar motifs that are presented on modular scaffolds formed by the assembly of four DNA strands into simple, branched DNA nanostructures. The first generation of this assay was used in combination with bacteria that express a fluorescent protein as a proof-of-concept. Here, the assay was optimized to be used with bacteria not possessing a marker gene for a fluorescent protein by staining their genomic DNA with SYBR® Green. For the binding assay, DNA nanostructures were combined with artificially synthesized mannose polymers, typical targets for many lectins on the surface of bacteria, presenting them in a defined constellation to bind bacteria strongly due to multivalent cooperativity. The testing of multiple mannose polymers identified monomeric mannose with a 5’-carbon linker and 1,2-linked dimeric mannose with linker as the best binding candidates for E. coli, presumably due to binding with the FimH protein on the surface. Despite similarities between the FimH proteins of E. coli and K. pneumoniae, binding was only observed between E. coli and the different sugar molecules on DNA structures. 
Furthermore, the degree of free movement seemed to affect the binding of mannose polymers to targeted proteins, since when utilizing a more flexible DNA nanostructure, an increase in binding could be observed. An alternative to the simple DNA nanostructures described above is the use of larger, more complex DNA origami structures consisting of several hundred strands. DNA origami structures are capable of carrying dozens of modifications at the same time. The results for the DNA origami structure showed a successful functionalization with up to 71 1,2-linked dimeric mannose with linker molecules. These results point towards a solution for the high-throughput analysis of potential binding agents for pathogenic bacteria e.g. as an alternative treatment for antibiotic-resistant.
Cryptorchidism is the most common disorder of sex development in dogs. It describes a failure of one or both testes to descend into the scrotum in due time. It is a heritable multifactorial disease. In this work, selected dogs of a German sheep poodle breed were sequenced with nanopore sequencing and subsequently examined for genetic variations correlating with cryptorchidism. The relationships of the studied dogs were also analyzed and visualized.
Assessment of COI and 16S for insect species identification to determine the diet of city bats
(2023)
Despite the numerous benefits of urbanization to human living conditions, urbanization has also negatively affected humans, their environment, and other organisms that share urban habitats with humans. While some wild animals avoid living in urban areas, others are more tolerant of or even prefer life in urban habitats. There are more than 1,400 species of bats in the world. Therefore, they have the potential to contribute significantly to the mammalian biodiversity in urban areas. Insectivorous bat species play a key role in agriculture by improving yields and reducing chemical pesticide costs. Using metabarcoding, it is possible to determine the prey consumed by these nocturnal mammals based on the DNA fragments in their fecal pellets. This study aimed to evaluate COI and 16S metabarcodes for insect species identification to determine the diet of metropolitan bats. For this purpose, COI and 16S metabarcodes were extracted, amplified, and sequenced from 65 bat feces samples collected in the Berlin metropolitan area. Following a taxonomic annotation, I found that 73% of all identified insects could only be detected using the COI method, while 15% could only be recovered using the 16S approach. Just 12% of all detected insects were identified by both markers. According to this result, COI is more suitable for the taxonomic identification of insects from bat feces. However, given the bias of COI primers, it is recommended to use both markers for a more precise estimation of species diversity. Additionally, based on the insect species identified, I noticed that urban bats fed mainly on Diptera, Coleoptera, and Lepidoptera. The bat species Nyctalus noctula was most abundant in the samples. Its diet analysis revealed that 91% of the samples contained the insect species Chironomus plumosus. 14 pest insect species were also found in its diet.
Our current research aims to establish a complete ribonucleic acid (RNA) production line from plasmid design to purification of in vitro transcribed RNA and labeling of RNA. RNA is the central molecule within the central dogma of molecular biology and is involved in most essential processes within a cell[1]. In many cases, only compact three-dimensional structures of the respective RNA are able to fulfill their function. In this context, RNA tertiary contacts such as kissing loops and pseudoknots are essential to stabilize three-dimensional folding[2]. We will produce a tertiary contact consisting of a kissing loop and a GAAA tetraloop that occurs in eukaryotic ribosomal RNA[3,4]. The RNA sequence is integrated into a vector plasmid. Subsequently, the plasmid is amplified in E. coli. After subsequent plasmid purification steps, the RNA sequence will be transcribed in vitro[5,6]. In order for the RNA to be used for Förster resonance energy transfer (FRET) experiments at the single-molecule level, fluorescent dyes must be coupled to the RNA molecule[7].
Recently, deep neural network architectures designed to work on graph-structured data have been attracting attention and are being implemented in various domains and applications. However, while research on learning representations (feature embeddings) from graph data is picking up pace, constructing graphs from a dataset remains a challenge. The ability to map the data to lower dimensions makes the task easier and simplifies the application of many operations. The graph neural network (GNN) is one of the novel neural network models attracting attention, as it performs strongly in applications such as recommender systems, social networks, chemical synthesis, and many more. This thesis discusses a unique approach to a fundamental task on graphs: node classification. The feature embedding for a node is aggregated by applying a recurrent neural network (RNN); a GNN model is then trained to classify a node with the help of the aggregated features, and Q-learning supports the optimization of the shape of the neural networks. This thesis starts with the working principles of the feedforward neural network and of recurrent units such as the simple RNN, long short-term memory (LSTM), and the gated recurrent unit (GRU), followed by concepts of reinforcement learning (RL) and the Q-learning algorithm. An overview of the fundamentals of graphs, followed by the GNN architecture and workflow, is given subsequently. Some basic GNN models are then discussed briefly before the thesis turns to the technical implementation details, the output of the model, and a comparison with a few other models such as GraphSage and Graph attention network (GAN).
In the past few years, social media has become the most popular communication software, replacing phone calls, text messages, television and even advertisements. Social media has become the most important channel for spreading opinions. As a result of this trend, many politicians have also started to operate social media (Wang, Tsai, & Chen 2019). This study was conducted in order to understand whether there was an intercandidate agenda-setting effect between the Facebook posts of legislative candidates and presidential candidates during the election period, and whether the legislative candidates' Facebook posts were influenced by the presidential candidates' Facebook posts. The target population of this study was the three presidential candidates in Taiwan's 2020 presidential election — Dr. Tsai Ing-Wen, Mr. Han Kuo-Yu, and Mr. James Soong — as well as the 36 legislative candidates in Taipei, Taichung, and Kaohsiung.
The study focused on Facebook posts from 1st November 2019 to 10th January 2020, the 10 weeks before the voting day. Text mining and cosine similarity were used to organize the posts and compare the similarity between posts. Finally, the similarity between posts was presented as a line graph.
The study revealed that there was an inter-candidate agenda-setting effect between legislative candidate posts and presidential candidate posts, and that Dr. Tsai Ing-Wen, who was also the incumbent president during the campaign, was the most influential Facebook poster during the entire election.
Future research is proposed on the inter-candidate agenda-setting effect only analyzing the similarity of posts among the candidates to discuss the influence of the candidates' Facebook agenda-setting during a specific election period.
This is the first study in which the Facebook posts of Taiwanese politicians are analyzed and their relationships systematically compared across multiple degrees, which opens up a whole new subject for future elections in Taiwan.
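The cosine similarity comparison of posts mentioned in this abstract can be illustrated with a small bag-of-words sketch. The whitespace tokenization, the function names and the sample strings are assumptions made for the example; the study's actual text-mining pipeline is not described in the abstract and would need a proper word segmenter for Chinese-language posts.

```python
import math
from collections import Counter


def term_counts(text):
    """Turn a post into a bag-of-words vector (naive whitespace tokens)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors."""
    # Only terms present in both vectors contribute to the dot product.
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty post is treated as similar to nothing
    return dot / (norm_a * norm_b)


presidential_post = term_counts("energy policy reform")
legislative_post = term_counts("energy policy debate")
similarity = cosine_similarity(presidential_post, legislative_post)
```

Computing this value per week for each pair of candidates gives the kind of series that can be plotted as a line graph.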
As the cryptocurrency ecosystem rapidly grows, interoperability has become increasingly crucial, enabling assets and data to interact seamlessly across multiple chains. This work describes the concept and implementation of a trustless connection between the Bitcoin Lightning Network and EVM-compatible blockchains, allowing the transfer of assets between the two ecosystems. Establishing such a connection can significantly contribute to the growth of both ecosystems, as they can benefit from each other’s advantages and open up new possibilities.
This thesis investigates the efficacy of four machine learning algorithms, namely linear regression, decision tree, random forest and neural network, in the task of lead scoring. Specifically, the study evaluates the performance of these algorithms using datasets without sampling, with random under-sampling, and with over-sampling using SMOTE. The performance of each algorithm is measured using various performance metrics, including accuracy, AUC-ROC, specificity, sensitivity, precision, recall, F1 score, and G-mean. The results indicate that models trained on the dataset without sampling achieved higher accuracy than those trained on the dataset with either random under-sampling or over-sampling using SMOTE. However, the neural network demonstrated remarkable results on each dataset compared to the other algorithms. These findings provide valuable insights into the effectiveness of machine learning algorithms for lead scoring tasks, particularly when using different sampling techniques. The findings of this study can aid lead management practices in selecting the most suitable algorithm and sampling technique for their needs. Furthermore, the study contributes to the literature by providing a comprehensive evaluation of the performance of machine learning algorithms for lead scoring tasks. This thesis has practical implications for businesses looking to improve their lead management practices, and future research could extend the analysis to other machine learning algorithms or more extensive datasets.
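Several of the metrics listed in this abstract follow directly from the binary confusion matrix. The sketch below uses the standard definitions; the function name and the toy labels are assumptions for illustration and are not taken from the thesis.

```python
import math


def binary_metrics(y_true, y_pred):
    """Compute common metrics for binary labels (1 = positive class).

    Assumes both classes occur in y_true and at least one positive
    prediction is made, so no denominator is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    sensitivity = tp / (tp + fn)  # also called recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
        # G-mean balances performance on both classes.
        "g_mean": math.sqrt(sensitivity * specificity),
    }


# Toy imbalanced example: 3 converted leads out of 10.
labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
predictions = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
scores = binary_metrics(labels, predictions)
```

On imbalanced data, accuracy can stay high even when the minority class is poorly recognized, which is why sensitivity, specificity and G-mean are usually reported alongside it when comparing sampling strategies.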
How Covid-19 impacts the workplace of knowledge workers in a pandemic and post pandemic world
(2021)
This master's thesis covers the topic of the workplace. The focus lies on the coronavirus pandemic and how it has affected and will continue to affect the workplaces of knowledge workers. To this end, the workplace as a research area is described holistically, followed by the presentation of gathered secondary data and of in-depth interviews conducted by the author. The secondary and primary data agree that the workplace as people know it will change after the pandemic. The most likely outcome is the hybrid workplace concept, which mixes the home office, the office and, alternatively, third places. Companies have to be equipped and prepared for these changes. The importance of the office will increase, and the office has to be redesigned in order to meet the needs of the knowledge workers who will eventually come back to it.
In machine learning, Learning Vector Quantization (LVQ) is well known as supervised vector quantization. LVQ has been studied to generate optimal reference vectors because of its simple and fast learning algorithm [2]. In many classification tasks, different variants are considered while training a model, and considering large margin variants of LVQ helps to obtain significant results [20]. Large margin LVQ (LMLVQ) aims to maximize the distance between the decision hyperplane and the data points. In this thesis, a comparison of different variants of Generalized Learning Vector Quantization (GLVQ) and large margin LVQ is presented, along with visualization, implementation and experimental results.
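As background for the GLVQ variants compared in this thesis, the standard GLVQ classifier function relates the distance to the closest prototype with the correct label, d+, to the distance to the closest prototype with a different label, d-: mu = (d+ - d-) / (d+ + d-). The sketch below evaluates this quantity with squared Euclidean distance; the function names and toy prototypes are assumptions for illustration, and the large margin variants studied in the thesis are not reproduced here.

```python
def squared_distance(x, w):
    """Squared Euclidean distance between a sample and a prototype."""
    return sum((xi - wi) ** 2 for xi, wi in zip(x, w))


def glvq_mu(x, label, prototypes):
    """Relative distance difference used by GLVQ.

    `prototypes` is a list of (vector, label) pairs containing at least
    one prototype of the sample's class and one of another class.
    The result lies in [-1, 1]; negative values mean the sample is
    closer to a prototype of its own class, i.e. correctly classified.
    """
    d_plus = min(squared_distance(x, w) for w, c in prototypes if c == label)
    d_minus = min(squared_distance(x, w) for w, c in prototypes if c != label)
    return (d_plus - d_minus) / (d_plus + d_minus)


prototypes = [((1.0, 0.0), 0), ((3.0, 0.0), 1)]
mu = glvq_mu((0.0, 0.0), 0, prototypes)
```

GLVQ training minimizes a monotone function of this value over all samples, so pushing mu towards -1 corresponds to placing samples well inside the region of their own class.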
With the growing market of cryptocurrencies, blockchain is becoming central to various research areas relevant from a mathematical and cryptographic point of view. Moreover, it is capable of transforming the traditional methods involving centralized network operations into decentralized peer-to-peer functionalities. At the same time, it provides an alternative to digital payments in a robust and tamperproof manner by adding the element of cryptography, consequently making it traversable for an individual who is a part of the blockchain network. Furthermore, for a blockchain to be optimal and efficient, it must handle the blockchain trilemma of security, decentralization, and scalability constraints in an effective manner. Algorand, a blockchain cryptocurrency protocol intended to solve blockchain’s trilemma, has been studied and discussed. It is a permissionless (public) blockchain protocol and uses pure proof of stake as its consensus mechanism.
There are multiple ways to gain information about an individual and their health status, but an increasingly popular field in medicine is the analysis of human breath, which carries a lot of information about metabolic processes within the individual's body. The information in exhaled breath consists of volatile (organic) compounds (VOCs). These VOCs are products of metabolic processes within the individual's body and thus might be indicators of diseases disturbing those processes. The compounds are detected by mass-spectrometric (MS) or ion-mobility spectrometric (IMS) techniques, so the analysis of these compounds is not limited to exhaled breath. The resulting data is spectral data, capturing the concentrations of the VOCs indirectly through intensities. About 3000 VOCs [1] have already been identified in human exhaled breath. The number of research papers on VOC analysis and detection has risen almost steadily over the last decade. Furthermore, the technique used to identify VOCs could also be used to capture biomarkers from alien species within the individual's body. Extracting VOCs from an individual can be done by non-invasive or minimally invasive techniques. However, the manual identification of VOCs and biomarkers related to a certain disease or infection is not feasible due to the complexity of the sample and often unknown metabolic products, so automated techniques are needed. [1–4] To establish breath analysis as a diagnostic tool, machine learning methods could be used. Machine learning has become a popular and common technique for dealing with medical data because it allows rapid analysis. Taking this advantage, breath analysis using machine learning could become the method of choice for diagnosis, keeping in mind that conventional methods are laboratory-based and thus sometimes need several days to identify the organism when trying to detect a bacterial infection. [5]
In this work, a protocol for portable nanopore sequencing of DNA from pollen collected from honey bees, bumble bees, and wild bees was developed. DNA metabarcoding is applied to identify genera within the mixed DNA samples. The DNA extraction and ITS and ITS2 PCR parameters tested for this purpose were applied to the collected pollen sample and the amplicons were then decoded using the Flongle sequencer adapter from Oxford Nanopore Technologies. It is shown that the main pollinator resources at the different sites can be identified in percentage proportions. The protocol generated in this study can be used for further ecological questions.
Drought is one of the most common and dangerous threats plants have to face, costing the global agricultural sector billions of dollars every year and leading to the loss of tons of harvest. Until people drastically reduce their consumption of animal products or cellular agriculture comes of age, more and more crops will need to be produced to sustain the ever growing human population. Even then, as more areas on earth are becoming prone to drought due to climate change, we may still have to find or breed plant varieties more suitable to grow and prosper in these changing environments.
Plants respond to drought stress with a complex interplay of hormones, transcription factors, and many other functional or regulatory proteins and mapping out this web of agents is no trivial task. In the last two to three decades or so, machine learning has become immensely popular and is increasingly used to find patterns in situations that are too complex for the human mind to overlook. Even though much of the hype is focused on the latest developments in deep learning, relatively simple methods often yield superior results, especially when data is limited and expensive to gather.
This Master Thesis, conducted at the IPK in Gatersleben, develops an approach for shedding light on the phenotypic and transcriptomic processes that occur when a plant is subjected to stress. It centers around a random forest feature selection algorithm and although it is used here to illuminate drought stress response in Arabidopsis thaliana, it can be applied to all kinds of stresses in all kinds of plants.
Genetic sequence variations at the level of gene promoters influence the binding of transcription factors. In plants, this often leads to differential gene expression across natural accessions and crop cultivars. Some of these differences are propagated through molecular networks and lead to macroscopic phenotypes. However, the link between promoter sequence variation and the variation of its activity is not yet well understood. In this project, we use the power of deep learning in 728 genotypes of Arabidopsis thaliana to shed light on some aspects of that link. Convolutional neural networks were successfully implemented to predict the likelihood of a gene being expressed from its promoter sequence. These networks were also capable of highlighting known and putative new sequence motifs causal for the expression of genes. We tested our algorithms in various scenarios, including single and multiple point mutations, as well as indels on synthetic and real promoter sequences and the respective performance characteristics of the algorithm have been estimated. Finally, we showed that the decision boundary to classify genes as expressed and non-expressed depends on the sensitivity of the transcriptome profiling assay and changing it has an impact on the algorithm’s performance.
Data streams change their statistical behaviour over time. These changes can occur gradually or abruptly for unforeseen reasons, which may affect the expected outcome. It is therefore important to detect concept drift as soon as it occurs. In this thesis we chose a distance-based methodology to detect the presence of concept drift in data streams. We used generalized learning vector quantization (GLVQ) and generalized matrix learning vector quantization (GMLVQ) classifiers to calculate the distances between prototypes and data points. Chi-square and Kolmogorov–Smirnov tests are used to compare the distance distributions of the test and training data sets to indicate the presence of drift.
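The comparison of distance distributions described in this abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the distance values are made up, the function names are hypothetical, and the decision threshold uses the standard large-sample approximation of the two-sample Kolmogorov–Smirnov critical value, which is only rough for samples this small.

```python
from bisect import bisect_right
from math import log, sqrt


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    for x in a + b:  # the empirical CDFs only change at observed values
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def drift_detected(train_dist, test_dist, alpha=0.05):
    """Flag drift when the KS statistic exceeds the asymptotic critical value."""
    n, m = len(train_dist), len(test_dist)
    c_alpha = sqrt(-0.5 * log(alpha / 2))  # about 1.358 for alpha = 0.05
    threshold = c_alpha * sqrt((n + m) / (n * m))
    return ks_statistic(train_dist, test_dist) > threshold


# Hypothetical distances between data points and their nearest prototype.
train_dist = [0.9, 1.0, 1.1, 1.2, 1.3]
test_dist_same = [0.95, 1.05, 1.15, 1.25, 1.3]
test_dist_drift = [2.0, 2.1, 2.2, 2.4, 2.5]

print(drift_detected(train_dist, test_dist_same))
print(drift_detected(train_dist, test_dist_drift))
```

In practice the distances would come from a trained GLVQ or GMLVQ model, and a library routine such as a tested two-sample KS implementation would normally replace the hand-written statistic.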
Anomaly detection is a pressing technical problem for many business enterprises. In this thesis, a combination of Growing Neural Gas and Generalized Matrix Learning Vector Quantization is presented as a solution based on collected theoretical and practical knowledge. The whole network is described and implemented along with references and experimental results. The proposed model is carefully documented, and the remaining open research questions are stated for future investigation.
Pollinating insects are of vital importance for the ecosystem, and their drastic decline has severe consequences for the environment and humankind. Understanding their interaction networks is the first step towards preserving these highly complex systems. For that purpose, the following study describes a protocol for the investigation of honey bee pollen samples from different agro-environmental areas by DNA extraction, PCR amplification and nanopore sequencing of the barcode regions rbcL and ITS. It was shown that the most abundant species were classified consistently by both DNA barcodes, while species richness was enhanced by single-barcode detection of less abundant species. The analysis of the different landscape variables showed a decline of species richness, Shannon diversity index, and species evenness with increasing organic crop area. However, sampling was only carried out in August, and further investigations are suggested to give a more complete picture of honey bee foraging throughout the seasons.
In response to prevailing environmental conditions, Arabidopsis thaliana plants must increase their photosynthetic capacity to acclimate to potentially harmful high light stress. To measure these changes in acclimation capacity, different high-throughput imaging-based methods can be used. In this master thesis we studied the capacity of different Arabidopsis thaliana knockout mutants and accessions to acclimate to potentially harmful high light and cold temperature conditions using a high-throughput phenotyping system with an integrated chlorophyll fluorescence measurement system. To determine the acclimation capacity, Arabidopsis thaliana knockout mutants of genes not previously associated with high light, as well as accessions of two different haplotype groups with a reference and an alternative allele from different countries of origin, were grown under switching high light and temperature conditions. Photosynthetic analysis showed that knockout mutant plants differed from the wildtype in their Photosystem II operating efficiency during a switch to increased light irradiance, but did not differ significantly from it a week later under the same circumstances. High-throughput phenotyping of haplotype accessions revealed significantly better acclimation capacity in non-photochemical quenching and steady-state photosynthetic efficiency in accessions of Russian origin with an altered SPPA gene during high light and cold stress.
Sequences are an important data structure in molecular biology, but unfortunately it is difficult for most machine learning algorithms to handle them, as they rely on vectorial data. Recent approaches include methods that rely on proximity data, such as median and relational Learning Vector Quantization. However, many of them are limited in the size of the data they are able to handle. A standard method to generate vectorial features for sequence data does not exist yet. Consequently, a way to make sequence data accessible to preferably interpretable machine learning algorithms needs to be found. This thesis will therefore investigate a new approach called the Sensor Response Principle, which is being adapted to protein sequences. Accordingly, sequence similarity is measured via pairwise sequence alignments with different sequence alignment algorithms and various substitution matrices. The measurements are then used as input for learning with the Generalized Learning Vector Quantization algorithm. A special focus lies on sequence length variability as it is suspected to affect the sequence alignment score and therefore the discriminative quality of the generated feature vectors. Specific datasets were generated from the Pfam protein family database to address this question. Further, the impact of the number of references and choice of substitution matrices is examined.
In this thesis, we focus on using machine learning to automate manual or rule-based processes for the deduplication task of the data integration process in an enterprise customer experience program. We study the underlying theoretical foundations of the most widely used machine learning algorithms, including logistic regression, random forests, extreme gradient boosting trees, support vector machines, and generalized matrix learning vector quantization. We then apply those algorithms to a real, private data set and use standard evaluation metrics for classification, such as confusion matrix, precision, and recall, area under the precision-recall curve, and area under the Receiver Operating Characteristic curve to compare their performances and results.
Differentiation is ubiquitous in the field of mathematics and especially in the field of Machine learning for calculations in gradient-based models. Calculating gradients might be complex and require handling multiple variables. Supervised Learning Vector Quantization models, which are used for classification tasks, also use the Stochastic Gradient Descent method for optimizing their cost functions. There are various methods to calculate these gradients or derivatives, namely Manual Differentiation, Numeric Differentiation, Symbolic Differentiation, and Automatic Differentiation. In this thesis, we evaluate each of the methods mentioned earlier for calculating derivatives and also compare the use of these methods for the variants of Generalized Learning Vector Quantization algorithms.
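To make the difference between the approaches named in this abstract concrete, the sketch below computes the same derivative three ways for a one-dimensional squared distance between a data point x and a prototype w. It is an illustrative example only: the `Dual` class, the function names and the choice of distance are assumptions, not taken from the thesis, and symbolic differentiation is omitted.

```python
class Dual:
    """Number a + b*eps with eps**2 == 0; the b part carries the derivative."""

    def __init__(self, value, deriv=0.0):
        self.value, self.deriv = value, deriv

    @staticmethod
    def _lift(other):
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        other = self._lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __sub__(self, other):
        other = self._lift(other)
        return Dual(self.value - other.value, self.deriv - other.deriv)

    def __rsub__(self, other):
        return self._lift(other) - self

    def __mul__(self, other):
        other = self._lift(other)
        return Dual(self.value * other.value,
                    self.value * other.deriv + self.deriv * other.value)

    __rmul__ = __mul__


def squared_distance(x, w):
    """d(x, w) = (x - w)**2 for scalars (or Dual numbers)."""
    diff = x - w
    return diff * diff


def manual_grad(x, w):
    """Derivative with respect to w, worked out by hand: -2 (x - w)."""
    return -2.0 * (x - w)


def numeric_grad(x, w, h=1e-6):
    """Central finite difference; approximate and sensitive to the step h."""
    return (squared_distance(x, w + h) - squared_distance(x, w - h)) / (2 * h)


def automatic_grad(x, w):
    """Forward-mode automatic differentiation: seed dw/dw = 1 and propagate."""
    return squared_distance(x, Dual(w, 1.0)).deriv


print(manual_grad(3.0, 1.0), numeric_grad(3.0, 1.0), automatic_grad(3.0, 1.0))
```

The manual and automatic results agree exactly here, while the numeric one carries a small approximation error; frameworks used for gradient-based training typically rely on reverse-mode rather than forward-mode automatic differentiation.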
Financial fraud can cause huge monetary losses for banks. Studies have shown that, if not mitigated, financial fraud can lead to bankruptcy for big financial institutions and even insolvency for individuals. Credit card fraud is a type of financial fraud that is ever growing. In the future, these numbers are expected to increase exponentially, which is why many researchers are focusing on machine learning techniques for detecting fraud. This task, however, is not simple. There are mainly two reasons:
• varying behaviour in committing fraud
• high level of imbalance in the dataset (the majority of normal or genuine cases largely outnumbers the number of fraudulent cases)
A predictive model trained on an unbalanced dataset usually tends to be biased towards the majority class.
In this thesis, this problem is tackled by implementing a data-level approach in which different resampling methods, such as undersampling, oversampling, and hybrid strategies, along with bagging and boosting algorithmic approaches, have been applied to a highly skewed dataset with 492 identified frauds out of 284,807 transactions.
Predictive modelling algorithms like Logistic Regression, Random Forest, and XGBoost have been implemented along with different resampling techniques to predict fraudulent transactions.
The performance of the predictive models was evaluated using the area under the Receiver Operating Characteristic curve (AUC-ROC), the area under the Precision-Recall curve (AUC-PR), precision, recall, and F1 score.
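The class counts quoted in this abstract already show why accuracy alone is misleading on such data. The following minimal sketch uses only those two numbers from the abstract; the helper names, the toy label list and the random undersampling routine are illustrative assumptions and not the thesis's pipeline.

```python
import random


def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from confusion-matrix counts (0.0 if undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def random_undersample(labels, seed=0):
    """Indices of all minority (1) samples plus an equally large random
    subset of the majority (0) samples."""
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    return sorted(minority + rng.sample(majority, len(minority)))


n_total, n_fraud = 284_807, 492  # counts stated in the abstract

# A trivial model that labels every transaction as genuine:
accuracy = (n_total - n_fraud) / n_total
precision, recall, f1 = precision_recall_f1(tp=0, fp=0, fn=n_fraud)
print(f"accuracy={accuracy:.4f} recall={recall:.1f} f1={f1:.1f}")

# Toy label vector (hypothetical) to show the balancing effect of undersampling.
toy_labels = [0] * 100 + [1] * 5
balanced = random_undersample(toy_labels)
print(len(balanced), sum(toy_labels[i] for i in balanced))
```

The trivial model reaches an accuracy above 99.8 % while detecting no fraud at all, which is why the abstract relies on precision, recall, F1 and the two AUC measures instead.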
Embeddings for Product Data
(2022)
The e-commerce industry has grown exponentially in the last decade, with giants like Amazon, eBay, Aliexpress, and Walmart selling billions of products. Machine learning techniques can be used within the e-commerce domain to improve the overall customer journey on a platform and increase sales. Product data, in particular, can be used for various applications, such as product similarity, clustering, recommendation, and price estimation. For product data to be used in such applications, we have to perform feature engineering. The idea is to transform the products into feature vectors before training a machine learning model on them. In this thesis, we propose an approach to create representations for heterogeneous product data from Unite’s platform in the form of structured tabular records. These tables consist of attributes carrying different kinds of information, ranging from product IDs to long descriptions. Our model combines popular deep learning approaches used in natural language processing to create numerical representations for all products; these arrays or matrices contain mostly non-zero elements and are called dense representations. To evaluate the quality of these feature vectors, we validate how well the similarities between products are captured by the dense representations. The evaluations are divided into two categories. The first category directly compares the similarities between individual products. The second category uses the dense vectors as inputs to any of the above-mentioned applications and then evaluates their quality based on the accuracy or performance of that application. As a result, we explain the impact of different steps within our model on the quality of the learned representations.
Convolutional neural networks (CNNs) have been among the most powerful and popular preprocessing techniques employed for image classification problems. Here, we use other signal processing techniques, such as the Fourier transform and the wavelet transform, to preprocess the images in conjunction with different classifiers such as MLP, LVQ, GLVQ and GMLVQ, and compare their performance with that of a CNN.
Purpose: The study aims to determine the incentives for German SMEs to offshore their business activities to India and China.
Design: This study is based on a quantitative approach. Primary and secondary data are used in the study. The data were collected from individuals working in different SMEs in Germany who have relevant offshoring experience. Theories from articles, peer-reviewed journals, and relevant books were consulted throughout the study.
Findings: The findings suggest that the benefits and advantages of an offshoring strategy in India and China are cost efficiency and technology. Moreover, the challenges that firms face while executing an offshoring strategy are the cultural mix, especially language and cultural barriers, security issues, and loss of market performance.
Originality and Value: The study on the incentives of German SMEs to offshore business activities to India and China enables me to understand why companies are interested in an offshoring strategy in low-cost countries for expanding their business, while evaluating the challenges, merits and demerits of offshoring.
Over the past few years, wind and solar power plants have increasingly contributed to energy production. However, due to fluctuating energy sources, the energy production data contain disruptions. Such disrupted data lead to poor prediction performance and need to be replaced by estimated values. In this thesis, we provide a comparative study of estimating disrupted online data based on the data of similar groups of power plants. We apply three estimation techniques, namely mean, interpolation, and k-nearest neighbor (KNN), to estimate the disruptions in the training data. We then apply four clustering algorithms, namely k-means, neural gas, hierarchical agglomerative clustering, and affinity propagation, with two similarity measures, namely Euclidean distance and dynamic time warping (DTW), to form groups of power plants and compare the results. Experimental results show that when KNN estimation is applied to the data and neural gas or agglomerative clustering with DTW is used to cluster them, the cluster quality scores and execution time are better than for the other combinations. Therefore, we choose KNN estimation to reconstruct the disrupted online data in each group of similar power plants.
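One of the two similarity measures named in this abstract, dynamic time warping, can be sketched in a few lines. This is the textbook dynamic-programming formulation with an absolute-difference local cost, written here only for illustration; the function name and the toy production curves are assumptions, and the thesis may use a different variant or library.

```python
def dtw_distance(series_a, series_b):
    """Classic O(n*m) dynamic time warping distance between two sequences."""
    n, m = len(series_a), len(series_b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of the first i and first j points
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(series_a[i - 1] - series_b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # repeat point of b
                                    cost[i][j - 1],      # repeat point of a
                                    cost[i - 1][j - 1])  # advance both
    return cost[n][m]


# Hypothetical production curves: the second is the first with one value held longer.
plant_a = [1, 2, 3]
plant_b = [1, 2, 2, 3]
print(dtw_distance(plant_a, plant_b))
```

Unlike the Euclidean distance, this measure accepts series of different lengths and tolerates local shifts in time, which is one plausible reason for comparing the two when grouping power plants.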
In the past few years, generative models have become an interesting topic in the field of Machine Learning (ML). The Variational Autoencoder (VAE) is one of the popular frameworks of generative models, based on the work of D. P. Kingma and M. Welling [6] [7]. As an alternative to the VAE, the authors in [12] proposed and implemented an Information Theoretic Learning (ITL) based Autoencoder. The VAE and the ITL Autoencoder are a combination of neural networks and probabilistic graphical models (PGM) [7]. In modern statistics it is difficult to compute approximations of probability densities. In this thesis we make use of the Variational Inference (VI) technique from machine learning, which approximates distributions through optimization. The closeness between the distributions is measured by information theoretic divergence measures such as the Kullback-Leibler, Euclidean and Cauchy-Schwarz divergences. We study theoretical and experimental results of two different frameworks of generative models which generate images of MNIST handwritten characters [8] and the Yale face database B [3]. The results obtained show that the proposed VAE and ITL Autoencoder are capable of generating the underlying structure of the example datasets.
Digital data is rising day by day and so is the need for intelligent, automated data processing in daily life. In addition to this, in machine learning, a secure and accurate way to classify data is important. This holds utmost importance in certain fields, e.g. in medical data analysis. Moreover, in order to avoid severe consequences, the accuracy and reliability of the classification are equally important. So if the classification is not reliable, instead of accepting the wrongly classified data point, it is better to reject such a data point. This can be done with the help of some strategies by using them on top of a trained model or including them directly in the objective function of the desired training model. We discuss such strategies and analyze the results on data sets in this thesis.
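The abstract does not name the rejection strategies it studies. As one common example of a strategy applied on top of a trained model, the sketch below rejects a data point whenever the highest class score falls below a fixed threshold; the function name, the threshold value and the score vectors are all hypothetical.

```python
def classify_with_reject(class_scores, threshold=0.8):
    """Return the index of the top-scoring class, or None to reject the
    data point when that score is below the threshold."""
    best = max(range(len(class_scores)), key=class_scores.__getitem__)
    return best if class_scores[best] >= threshold else None


# Hypothetical class-probability outputs of an already trained classifier.
confident = [0.05, 0.90, 0.05]
uncertain = [0.40, 0.35, 0.25]

print(classify_with_reject(confident))  # accepted: class index 1
print(classify_with_reject(uncertain))  # rejected: None
```

Raising the threshold trades coverage for reliability: more points are rejected, but the accepted ones are classified with higher confidence.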
This scientific work deals with current opportunities in business development. The purpose of the work is to study and analyze an organization's development strategy. The subject of the study is the mechanism by which an organization's development strategy is formed, together with an understanding of business development and its core methodologies and branches. The thesis is based on the operations of a real engineering company, and the main part of the research could be applied in practice. The main goal of the thesis is to derive recommendations for implementing strategic changes to the organization's development strategy.
Applications and Potential Impacts of Blockchain Technology in Logistics and Supply Chain Areas
(2022)
The motive of the present thesis is to analyze the applications and potential impacts of blockchain technology in the logistics and supply chain areas. For this purpose, the literature from different sources has been used to analyze and get an overview of the current status and role of blockchain technology within the logistics and supply chain areas. Different use cases, as well as pilot projects from organizations all over the world and also from Germany, have been included. Suggestions for further applications and implementations of blockchain technology along with their potential impacts have been made. Additionally, the cost of implementing blockchain-based solutions and applications has been estimated along with providing recommendations and suggestions for important and key points to be considered before preparing and deciding to implement blockchain-based solutions in any organization.
Influenza A viruses are responsible for the outbreak of epidemics as well as pandemics worldwide. The surface protein neuraminidase of this virus is responsible, among other things, for the release of virions from the cell and is thus of interest in pharmacological research. The aim of this work is to gain knowledge about evolutionary changes in sequences of influenza A neuraminidase through different methods. First, EVcouplings was used with the goal of identifying evolutionary couplings within the protein sequences, but this analysis was unsuccessful, probably due to the great sequence length of neuraminidase. Second, the natural vector method was used for sequence embedding, in the hope of visualizing the sequential progression of the virus protein over time. Last, interpretable machine learning methods were applied to examine whether the data can be classified by year and whether the extracted information conforms to the results of the EVcouplings analysis. In addition to the class label year, other labels such as groups or subtypes were used in classification, with varying results. For balanced classes the machine learning models performed adequately, but this was not the case for imbalanced data. Groups and subtypes can be classified with high accuracy, which was not the case for years, continents or hosts. To identify the minimal number of features necessary for linear separation of neuraminidase group 1 subtypes, a logistic regression was finally performed, resulting in the identification of 15 combinations of nine amino acid frequencies. Since neither the sequence embedding nor the machine learning methods showed neuraminidase evolution over time, further research is necessary, for example with a focus on one subtype with balanced data.
Noise in the oceans is a constantly increasing factor. The growing industrialisation due to shipping, offshore wind parks, seismic studies and other anthropogenic noise is putting the ecosystem under immense stress. The focus of this thesis is on the assessment of continuous underwater noise from ships. Based on existing strategies in air as well as underwater, and a comparison of both, an alternative strategy for the assessment of continuous noise from ships is given. The concept developed is based on published, scientifically observed responses of animals to ship passes with an indication of an effect range. A model is created to describe the strategy using publicly available data for cargo ships as an example. The results are summarized in maps depicting the affected area for an MRU of the OSPAR II region and the MPA “Borkum Riffgrund”. The strategy is discussed and evaluated on the basis of these results. From this, further improvements and the need for additional information in publicly available data on vessel traffic are derived.
Due to the intractability of the Discrete Logarithm Problem (DLP), it has been widely used in the field of cryptography, and the security of several cryptosystems is based on the hardness of computing discrete logarithms. In this paper, we start with topics from Number Theory and Abstract Algebra, as they enable one to study the nature of discrete logarithms in a comprehensive way, and then concentrate on the application and computation of discrete logarithms. Applications of discrete logarithms such as the Diffie-Hellman key exchange and the ElGamal signature scheme, and several attacks on the DLP such as the Baby-step Giant-step method, the Silver-Pohlig-Hellman algorithm, etc., have been analyzed. We also focus on elliptic curves along with the discrete logarithm over an elliptic curve. Attacks on the elliptic curve discrete logarithm problem (ECDLP) have been discussed. Moreover, the extension of several discrete-logarithm-based protocols to elliptic curves, such as the elliptic curve digital signature algorithm (ECDSA), has also been discussed.
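The Baby-step Giant-step method mentioned in this abstract can be sketched as follows for the multiplicative group modulo a prime. This is a standard textbook formulation rather than the thesis's own code; the function name and the small example parameters are chosen here for illustration, and `pow` with a negative exponent requires Python 3.8 or later.

```python
from math import isqrt


def baby_step_giant_step(g, h, p):
    """Return some x with g**x == h (mod p), or None if none exists.
    Assumes p is prime and g is not divisible by p."""
    m = isqrt(p - 1) + 1              # m*m exceeds the group order p - 1
    baby = {}
    value = 1
    for j in range(m):                # baby steps: store g**j -> j
        baby.setdefault(value, j)
        value = value * g % p
    factor = pow(g, -m, p)            # g**(-m) mod p (modular inverse)
    gamma = h % p
    for i in range(m):                # giant steps: compare h * g**(-i*m)
        if gamma in baby:
            return i * m + baby[gamma]
        gamma = gamma * factor % p
    return None


# Small illustrative instance; real cryptographic groups are vastly larger.
p, g = 1019, 2
h = pow(g, 777, p)
x = baby_step_giant_step(g, h, p)
print(x, pow(g, x, p) == h)
```

The method needs roughly sqrt(p) time and memory, compared with about p steps for exhaustive search, which is why group sizes in practice are chosen so large that even the square root is out of reach.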
This master thesis studies customer behavior using the example of the skin care brand Nivea. It presents the theoretical basis for the subsequent research on marketing, customer behavior, and the proper conduct of marketing research. This is followed by an analysis of the German market. Since Nivea is a brand of the Beiersdorf company, Beiersdorf’s activities and operations are described. The main idea of the work is to analyze the behavior of Nivea customers. Therefore, the work contains extensive research on the brand along with its micro- and macroenvironment. An in-depth interview and a survey were also conducted to understand customers’ current needs. Based on all the results, the author proposes some ideas for the Nivea brand.
Probabilistic Micropayments
(2022)
Probabilistic micropayments are an important cryptographic research topic in electronic commerce. They merit research aimed at efficient algorithms with low transaction costs and high computational speed. To delve into the topic, it is vital to scrutinize cryptographic preliminaries such as hash functions and digital signatures. This thesis investigates the important probabilistic methods based on a centralized or decentralized network. Firstly, centralized methods such as lottery-based tickets, Payword, coin-flipping, and MR2 are described, and an approach based on blind signatures is also discussed. Then, decentralized methods such as MICROPAY3, a transferable scheme on the blockchain network, along with an efficient model for cryptocurrencies, are explained. We then compare the different probabilistic micropayment methods and address their drawbacks with a new technique. To put the results of the theoretical analysis of the different methods into context, we analyze the attacks that reduce the security and, therefore, the efficiency of the system. In particular, we discuss various methods for detecting the occurrence of double-spending and eclipse attacks.
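The basic idea behind lottery-style probabilistic micropayments can be shown with a toy sketch: instead of settling every tiny payment, the payer hands over a ticket that pays a larger amount with a small probability, so that the expected value equals the micropayment. The code below is a simplified illustration and not one of the schemes named in the abstract; it omits commitments, signatures and double-spending protection, and the function names, the secret and the amounts are hypothetical.

```python
import hashlib


def ticket_wins(merchant_secret: bytes, customer_nonce: bytes, win_one_in: int) -> bool:
    """Hash-based coin flip: the ticket wins with probability about 1/win_one_in.
    In a real protocol both inputs would be committed to before being revealed."""
    digest = hashlib.sha256(merchant_secret + customer_nonce).digest()
    return int.from_bytes(digest, "big") % win_one_in == 0


def expected_payment(macro_value, win_one_in):
    """Expected value of one ticket: macro_value paid with probability 1/win_one_in."""
    return macro_value / win_one_in


def simulate(n_tickets, win_one_in, macro_value):
    """Total amount actually settled after n_tickets lottery tickets."""
    secret = b"hypothetical-merchant-secret"
    wins = sum(ticket_wins(secret, str(i).encode(), win_one_in)
               for i in range(n_tickets))
    return wins * macro_value


# 20,000 micropayments of 1 cent each, settled as 100-cent tickets that win 1 in 100.
total = simulate(n_tickets=20_000, win_one_in=100, macro_value=100)
print(expected_payment(100, 100), total)
```

Only the winning tickets, roughly one in a hundred here, cause an actual transaction, which is how such schemes reduce per-payment transaction costs while paying the merchant the right amount on average.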