Refine
Document Type
- Master's Thesis (14)
- Bachelor Thesis (4)
Year of publication
- 2022 (18) (remove)
Language
- English (18) (remove)
Keywords
- Maschinelles Lernen (7)
- Kryptologie (3)
- Blockchain (2)
- Vektorquantisierung (2)
- Virtuelle Währung (2)
- Algorithmus (1)
- Biene <Gattung> (1)
- Biotechnologie (1)
- DNA Barcoding (1)
- DNS (1)
Institute
- Angewandte Computer‐ und Biowissenschaften (18) (remove)
With the growing market of cryptocurrencies, blockchain is becoming central to various research areas relevant from a mathematical and cryptographic point of view. Moreover, it is capable of transforming the traditional methods involving centralized network operations into decentralized peer-to-peer functionalities. At the same time, it provides an alternative to digital payments in a robust and tamperproof manner by adding the element of cryptography, consequently making it traversable for an individual who is a part of the blockchain network. Furthermore, for a blockchain to be optimal and efficient, it must handle the blockchain trilemma of security, decentralization, and scalability constraints in an effective manner. Algorand, a blockchain cryptocurrency protocol intended to solve blockchain’s trilemma, has been studied and discussed. It is a permissionless (public) blockchain protocol and uses pure proof of stake as its consensus mechanism.
The aim of this bachelor thesis is to find out how the use of artificial intelligence, specifically the one used in combat situations, can increase the playing time or even the replay value of games in the action role-playing genre. Thereby, it focuses mainly on combat situations between a player and an artificial intelligence.
To begin with, this bachelor thesis examines the action role-playing genre in order to find a suitable definition for it. Accordingly, action role-playing games involve titles that send the player on a hero’s journey-like adventure in which they must prove their skills in combat against virtual opponents. The greatest challenge of these real-time battles comes from the required quick reflexes, skill queries and hand-eye coordination.
Next, six means of increasing the replayability of a game are explored: Experience and Nostalgia, Variety and Randomness, Goals and Completion, Difficulty, Learning, and Social Aspect. The paper then proceeds to give an explanation for the term Artificial Intelligence and examines the various methods used to create intelligent behavior as well as the general advancement of the research field. Special attention is given to the implementation methods of Finite State Machines and Behavior Trees, as they are the most widely used methods for creating behavioral patterns of virtual characters.
Finally, a study conducted as part of the bachelor thesis is described, which compares a mathematically balanced artificial intelligence with a behaviorally balanced one in terms of game performance regarding the willingness of test subjects to purchase and play through the game as well as its replay value. The thesis concludes with the findings that while the behavioral approach is more promising than the mathematical approach, a combination of the two methods ultimately leads to the best outcome. Furthermore, the study shows that the use of artificial intelligence to individualize gaming experiences is promising for the future of the gaming industry.
Pollinating insects are of vital importance for the ecosystem and their drastic decline imposes severe consequences for the environment and humankind. The comprehension of their interaction networks is the first step in order to preserve these highly complex systems. For that purpose, the following study describes a protocol for the investigation of honey bee pollen samples from different agro-environmental areas by DNA extraction, PCR amplification and nanopore sequencing of the barcode regions rbcL and ITS. It was shown, that the most abundant species were classified consistently by both DNA barcodes, while species richness was enhanced by single-barcode detection of less abundant species. The analysis of the the different landscape variables exhibited a decline of species richness, Shannon diversity index, and species evenness with increasing organic crop area. However, sampling was only carried out in August and further investigations are suggested to display a more complete picture of honey bee foraging throughout the seasons.
Sequences are an important data structure in molecular biology, but unfortunately it is difficult for most machine learning algorithms to handle them, as they rely on vectorial data. Recent approaches include methods that rely on proximity data, such as median and relational Learning Vector Quantization. However, many of them are limited in the size of the data they are able to handle. A standard method to generate vectorial features for sequence data does not exist yet. Consequently, a way to make sequence data accessible to preferably interpretable machine learning algorithms needs to be found. This thesis will therefore investigate a new approach called the Sensor Response Principle, which is being adapted to protein sequences. Accordingly, sequence similarity is measured via pairwise sequence alignments with different sequence alignment algorithms and various substitution matrices. The measurements are then used as input for learning with the Generalized Learning Vector Quantization algorithm. A special focus lies on sequence length variability as it is suspected to affect the sequence alignment score and therefore the discriminative quality of the generated feature vectors. Specific datasets were generated from the Pfam protein family database to address this question. Further, the impact of the number of references and choice of substitution matrices is examined.
In this thesis, we focus on using machine learning to automate manual or rule-based processes for the deduplication task of the data integration process in an enterprise customer experience program. We study the underlying theoretical foundations of the most widely used machine learning algorithms, including logistic regression, random forests, extreme gradient boosting trees, support vector machines, and generalized matrix learning vector quantization. We then apply those algorithms to a real, private data set and use standard evaluation metrics for classification, such as confusion matrix, precision, and recall, area under the precision-recall curve, and area under the Receiver Operating Characteristic curve to compare their performances and results.
Differentiation is ubiquitous in the field of mathematics and especially in the field of Machine learning for calculations in gradient-based models. Calculating gradients might be complex and require handling multiple variables. Supervised Learning Vector Quantization models, which are used for classification tasks, also use the Stochastic Gradient Descent method for optimizing their cost functions. There are various methods to calculate these gradients or derivatives, namely Manual Differentiation, Numeric Differentiation, Symbolic Differentiation, and Automatic Differentiation. In this thesis, we evaluate each of the methods mentioned earlier for calculating derivatives and also compare the use of these methods for the variants of Generalized Learning Vector Quantization algorithms.
Embeddings for Product Data
(2022)
The E-commerce industry has grown exponentially in the last decade, with giants like Amazon, eBay, Aliexpress, and Walmart selling billions of products. Machine learning techniques can be used within the e-commerce domain to improve the overall customer journey on a platform and increase sales. Product data, in specific, can be used for various applications, such as product similarity, clustering, recommendation, and price estimation. For data from these products to be used for such applications, we have to perform feature engineering. The idea is to transform these products into feature vectors before training a machine learning model on them. In this thesis, we propose an approach to create representations for heterogeneous product data from Unite’s platform in the form of structured tabular records. These tables consist of attributes having different information ranging from product-ids to long descriptions. Our model combines popular deep learning approaches used in natural language processing to create numerical representations, which contain mostly non-zeros elements in an array or matrix called as dense representation for all products. To evaluate the quality of these feature vectors, we validate how well the similarities between products are captured by these dense representations. The evaluations are further divided into two categories. The first category directly compares the similarities between individual products. On the other hand, the second category uses these dense vectors in any of the above- mentioned applications as inputs. It then evaluates the quality of these dense representation vectors based on the accuracy or performance of the defined application. As result, we explain the impact of different steps within our model on the quality of these learned representations.
Studying and understanding the metabolism of plants is essential to better adapt them to future climate conditions. Computational models of plant metabolism can guide this process by providing a platform for fast and resource-saving in silico analyses. The reconstruction of these models can follow kinetic or stoichiometric approaches with Flux Balance Analysis being one of the most common one for stoichiometric models. Advances in metabolic modelling over the years include the increasing number of compartments, the automation of the reconstruction process, the modelling of plant-environment interactions and genetic variants or temporally and spatially resolved models. In addition, there is a growing focus on introducing synthetic pathways in plants to increase their agricultural potential regarding yield, growth and nutritional value. One example is the β-hydroxyaspartate cycle (BHAC) to bypass photorespiration. After the implementation in a stoichiometric C3 plant model, in silico flux analyses can help to understand the resulting metabolic changes. When comparing with in vivo experiments with BHAC plants, the metabolic model can reproduce most results with exceptions regarding growth and oxaloacetate. To evaluate whether the BHAC is suitable to establish a synthetic C4 cycle, the pathway is implemented in a two-cell type model that is capable of running a C4 cycle. The results show that the BHAC is only beneficial under light limitation in the bundle sheath cell. An additional engineering target for improved performance of plants is malate synthase. This work serves as the basis for further analyses combining the different factors boosting the advantages of the BHAC and for in vivo experiments in C3 and C4 plants.
In the past few years Generative models have become an interesting topic in the field of Machine Learning (ML). Variational Autoencoder (VAE) is one of the popular frameworks of generative models based on the work of D.P Kingma and M. Welling [6] [7]. As an alternative to VAE the authors in [12] proposed and implemented Information Theoretic Learning (ITL) based Autoencoder. VAE and ITL Autoencoder are a combination of the neural networks and probabilistic graphical models (PGM) [7]. In modern statistics it is difficult to compute the approximation ofthe probability densities. In this paper we make use of Variational Inference (VI) technique from machine learning that approximate the distributions through optimization. The closeness between the distributions are measured by the information theoretic divergence measures such as Kullbach-Liebler, Euclidean and Cauchy Schwarz divergences. In this thesis, we study theoretical and experimental results of two different frameworks of generative models which generate images of MNIST handwritten characters [8] and Yale face database B [3]. The results obtained show that the proposed VAE and ITL Autoencoder are capable of generating the underlying structure of the example datasets
Digital data is rising day by day and so is the need for intelligent, automated data processing in daily life. In addition to this, in machine learning, a secure and accurate way to classify data is important. This holds utmost importance in certain fields, e.g. in medical data analysis. Moreover, in order to avoid severe consequences, the accuracy and reliability of the classification are equally important. So if the classification is not reliable, instead of accepting the wrongly classified data point, it is better to reject such a data point. This can be done with the help of some strategies by using them on top of a trained model or including them directly in the objective function of the desired training model. We discuss such strategies and analyze the results on data sets in this thesis.