This thesis investigates the efficacy of four machine learning algorithms, namely linear regression, decision tree, random forest and neural network, in the task of lead scoring. Specifically, the study evaluates the performance of these algorithms on datasets without sampling, with random under-sampling, and with over-sampling using SMOTE. The performance of each algorithm is measured using various performance metrics, including accuracy, AUC-ROC, specificity, sensitivity, precision, recall, F1 score, and G-mean. The results indicate that models trained on the dataset without sampling achieved higher accuracy than those trained on the dataset with either random under-sampling or over-sampling using SMOTE. However, the neural network demonstrated remarkable results on each dataset compared to the other algorithms. These findings provide valuable insights into the effectiveness of machine learning algorithms for lead scoring tasks, particularly when using different sampling techniques. The findings of this study can aid lead management practitioners in selecting the most suitable algorithm and sampling technique for their needs. Furthermore, the study contributes to the literature by providing a comprehensive evaluation of the performance of machine learning algorithms for lead scoring tasks. This thesis has practical implications for businesses looking to improve their lead management practices, and future research could extend the analysis to other machine learning algorithms or more extensive datasets.
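The metrics named in this abstract can all be derived from the four counts of a binary confusion matrix. The following is a minimal sketch, not the thesis's own code: the function name `binary_metrics` and the example counts are hypothetical and chosen only to show how accuracy can stay high on an imbalanced dataset while the G-mean drops.

```python
import math

def _ratio(numerator, denominator):
    # Return 0.0 instead of raising when a class or prediction is absent.
    return numerator / denominator if denominator else 0.0

def binary_metrics(tp, fp, tn, fn):
    """Compute common classification metrics from confusion-matrix counts."""
    sensitivity = _ratio(tp, tp + fn)  # identical to recall
    specificity = _ratio(tn, tn + fp)
    precision = _ratio(tp, tp + fp)
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": _ratio(2 * precision * sensitivity, precision + sensitivity),
        # G-mean is the geometric mean of sensitivity and specificity.
        "g_mean": math.sqrt(sensitivity * specificity),
    }

# Hypothetical imbalanced example: 20 positive leads among 1000 records.
metrics = binary_metrics(tp=10, fp=40, tn=940, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because sensitivity and recall are the same quantity, the sketch reports them once. Comparing accuracy with G-mean in this way is one reason to report several metrics when sampling techniques are being compared.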
The aim of this master's thesis is to evaluate OpenPose, a real-time multi-person 2D pose estimation framework. The research question is down to which pixel size a person is, in general, correctly detected and rendered by the system with a confidence of more than 50%. To answer this question, a study with seven participants was conducted. The collected data show that the sought confidence value is reached at a body height between 110 px and 150 px in digital images of people.
This thesis deals with the derivation and use of an alternative dissimilarity measure in the Neural Gas algorithm. First, selected algorithms are introduced and placed within the field of vector quantizers. Then the so-called tangent metric is motivated mathematically, and its presumed advantages over other metrics are examined experimentally using artificially generated and real-world examples. Furthermore, the runtime complexity and observed limitations of the new algorithm are examined in more detail.
In machine learning, Learning Vector Quantization (LVQ) is well known as a supervised form of vector quantization. LVQ has been studied as a way to generate optimal reference vectors because of its simple and fast learning algorithm [2]. In many classification tasks, different variants are considered while training a model, and considering large margin variants of LVQ helps to obtain significant results [20]. Large margin LVQ (LMLVQ) aims to maximize the distance between the decision hyperplane and the data points. In this thesis, a comparison of different variants of Generalized Learning Vector Quantization (GLVQ) and large margin LVQ is presented along with visualization, implementation and experimental results.
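As background for the comparison, GLVQ is commonly formulated around a relative distance difference, mu(x) = (d+ - d-) / (d+ + d-), where d+ is the squared distance to the closest prototype of the correct class and d- the squared distance to the closest prototype of any other class. The following is a minimal sketch of that standard quantity, not the thesis's implementation; the function names and the two prototypes are hypothetical.

```python
def squared_euclidean(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def glvq_mu(x, label, prototypes):
    """Relative distance difference of GLVQ for one labelled sample.

    prototypes is a list of (vector, class_label) pairs and must contain
    at least one prototype of the sample's class and one of another class.
    The result lies in [-1, 1]; negative values mean correct classification.
    """
    d_plus = min(squared_euclidean(x, w) for w, c in prototypes if c == label)
    d_minus = min(squared_euclidean(x, w) for w, c in prototypes if c != label)
    return (d_plus - d_minus) / (d_plus + d_minus)

# Hypothetical prototypes: one per class on a line.
prototypes = [((0.0, 0.0), "a"), ((4.0, 0.0), "b")]
mu_correct = glvq_mu((1.0, 0.0), "a", prototypes)  # sample close to its own class
mu_wrong = glvq_mu((3.0, 0.0), "a", prototypes)    # sample close to the other class
print(mu_correct, mu_wrong)
```

The more negative mu is, the further the sample lies on the correct side of the decision boundary, which is the quantity that margin-based variants try to push away from zero.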
This thesis deals with the AI-supported classification of wing images of different species of the family Calliphoridae, also known as blow flies. The main goal is classification by genus as well as by species. In addition, an automatic landmark detection on fly wings is to be developed and then used as a feature extractor for the classification model. Different methods of image processing and machine learning are applied, combined, and analyzed and compared with respect to their results.
There are multiple ways to gain information about an individual and their health status, but an increasingly popular field in medicine is the analysis of human breath, which carries a lot of information about metabolic processes within the individual's body. The information in exhaled breath consists of volatile (organic) compounds (VOCs). These VOCs are products of metabolic processes within the individual's body and thus might be an indicator of diseases that disturb those processes. The compounds are detected by mass-spectrometric (MS) or ion-mobility spectrometric (IMS) techniques, so the analysis of these compounds is not limited to exhaled breath. The resulting data is spectral data, which captures the concentrations of the VOCs indirectly through intensities. About 3000 VOCs [1] have already been identified in human exhaled breath. The number of research papers on VOC analysis and detection has risen almost steadily over the last decade. Furthermore, the technique used to identify VOCs could also be used to capture biomarkers from foreign species within the individual's body. Extracting VOCs from an individual can be done with non-invasive or minimally invasive techniques. However, the manual identification of VOCs and biomarkers related to a certain disease or infection is not feasible due to the complexity of the sample and the often unknown metabolic products, so automated techniques are needed. [1–4] To establish breath analysis as a diagnostic tool, machine learning methods could be used. Machine learning has become a popular and common technique for dealing with medical data because it allows rapid analysis. Given this advantage, breath analysis using machine learning could become the method of choice for diagnosis, keeping in mind that conventional methods are laboratory-based and therefore sometimes need several days to identify the organism when trying to detect a bacterial infection. [5]
In recent years, videos have appeared on the internet showing politicians giving strange speeches and celebrities in pornographic films. The public calls this video phenomenon deepfakes, because the videos are in fact fake, produced with the help of "deep learning", a form of machine learning. Many people fear that the misuse of these videos, above all for fake news, could have serious consequences. For them, this technology is a nightmare come true in a world in which fake videos spread chaos. This thesis examines several emerging software programs that make it possible to combine speech synthesis and film manipulation. The author considers positive applications of these technologies as well as their potential negative consequences.
Drought is one of the most common and dangerous threats plants have to face, costing the global agricultural sector billions of dollars every year and leading to the loss of tons of harvest. Until people drastically reduce their consumption of animal products or cellular agriculture comes of age, more and more crops will need to be produced to sustain the ever growing human population. Even then, as more areas on earth are becoming prone to drought due to climate change, we may still have to find or breed plant varieties more suitable to grow and prosper in these changing environments.
Plants respond to drought stress with a complex interplay of hormones, transcription factors, and many other functional or regulatory proteins, and mapping out this web of agents is no trivial task. In the last two to three decades, machine learning has become immensely popular and is increasingly used to find patterns in situations that are too complex for the human mind to grasp. Even though much of the hype is focused on the latest developments in deep learning, relatively simple methods often yield superior results, especially when data is limited and expensive to gather.
This master's thesis, conducted at the IPK in Gatersleben, develops an approach for shedding light on the phenotypic and transcriptomic processes that occur when a plant is subjected to stress. It centers on a random forest feature selection algorithm, and although it is used here to illuminate the drought stress response in Arabidopsis thaliana, it can be applied to all kinds of stresses in all kinds of plants.
Genetic sequence variations at the level of gene promoters influence the binding of transcription factors. In plants, this often leads to differential gene expression across natural accessions and crop cultivars. Some of these differences are propagated through molecular networks and lead to macroscopic phenotypes. However, the link between promoter sequence variation and the variation of its activity is not yet well understood. In this project, we use the power of deep learning in 728 genotypes of Arabidopsis thaliana to shed light on some aspects of that link. Convolutional neural networks were successfully implemented to predict the likelihood of a gene being expressed from its promoter sequence. These networks were also capable of highlighting known and putative new sequence motifs causal for the expression of genes. We tested our algorithms in various scenarios, including single and multiple point mutations, as well as indels on synthetic and real promoter sequences and the respective performance characteristics of the algorithm have been estimated. Finally, we showed that the decision boundary to classify genes as expressed and non-expressed depends on the sensitivity of the transcriptome profiling assay and changing it has an impact on the algorithm’s performance.
Data streams change their statistical behaviour over time. These changes can occur gradually or abruptly for unforeseen reasons, which may affect the expected outcome. Thus it is important to detect concept drift as soon as it occurs. In this thesis we chose a distance-based methodology to detect the presence of concept drift in data streams. We used generalized learning vector quantization (GLVQ) and generalized matrix learning vector quantization (GMLVQ) classifiers to calculate the distances between prototypes and data points. Chi-square and Kolmogorov–Smirnov tests are used to compare the distance distributions of the test and training data sets to indicate the presence of drift.
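The comparison of distance distributions can be illustrated with a two-sample Kolmogorov–Smirnov test. The following is a minimal sketch under stated assumptions, not the procedure used in the thesis: the function names, the significance level, and the distance values are hypothetical, and the decision uses the usual large-sample approximation of the critical value rather than an exact p-value.

```python
import math
from bisect import bisect_right

def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    largest_gap = 0.0
    for value in a + b:
        cdf_a = bisect_right(a, value) / len(a)
        cdf_b = bisect_right(b, value) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap

def drift_detected(train_distances, test_distances, alpha=0.05):
    """Flag drift when the KS statistic exceeds the asymptotic critical value."""
    n, m = len(train_distances), len(test_distances)
    # Large-sample approximation: c(alpha) * sqrt((n + m) / (n * m)).
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2))
    critical = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(train_distances, test_distances) > critical

# Hypothetical prototype-to-sample distances, standing in for classifier output.
train = [i / 10 for i in range(50)]
shifted = [d + 3.0 for d in train]  # the test window drifted away from the prototypes

print(drift_detected(train, list(train)))  # same distribution
print(drift_detected(train, shifted))      # shifted distribution
```

In a streaming setting, the test window would be refreshed as new data arrives and the check repeated, so that a drift is reported soon after the distance distribution starts to differ from the training reference.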