The volume of digital data grows every day, and with it the need for intelligent, automated data processing in daily life. In machine learning, classifying data safely and accurately is therefore important, above all in sensitive fields such as medical data analysis, where a wrong decision can have severe consequences. For this reason, the reliability of a classification matters as much as its accuracy: if a classification is not reliable, it is better to reject the data point than to accept a possibly wrong label. Such a reject option can be realized either as a strategy applied on top of an already trained model or by including it directly in the objective function of the model being trained. This thesis discusses such strategies and analyzes their results on data sets.
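A reject option applied on top of an already trained model can be as simple as thresholding the model's confidence, in the spirit of Chow's rule. The sketch below assumes the model outputs class probabilities; the function names and the threshold value are illustrative and not taken from the thesis.

```python
from typing import List, Optional, Sequence, Tuple


def classify_with_reject(probs: Sequence[Sequence[float]], threshold: float) -> List[Optional[int]]:
    """Return the arg-max class per sample, or None (reject) if its probability is below threshold."""
    decisions: List[Optional[int]] = []
    for p in probs:
        best = max(range(len(p)), key=lambda k: p[k])
        decisions.append(best if p[best] >= threshold else None)
    return decisions


def accuracy_and_rejection_rate(decisions: Sequence[Optional[int]], labels: Sequence[int]) -> Tuple[float, float]:
    """Accuracy on the accepted samples only, plus the fraction of samples rejected."""
    accepted = [(d, y) for d, y in zip(decisions, labels) if d is not None]
    rejection_rate = 1 - len(accepted) / len(labels)
    accuracy = sum(d == y for d, y in accepted) / len(accepted) if accepted else float("nan")
    return accuracy, rejection_rate


# Hypothetical model outputs for four samples with true labels [0, 1, 1, 0].
probs = [[0.9, 0.1], [0.55, 0.45], [0.2, 0.8], [0.4, 0.6]]
decisions = classify_with_reject(probs, threshold=0.7)  # [0, None, 1, None]
```

Raising the threshold rejects more points and typically raises accuracy on the accepted ones; the trade-off between the two quantities is what such strategies try to balance.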
This thesis investigates the efficacy of four machine learning algorithms, namely linear regression, decision tree, random forest and neural network, for the task of lead scoring. Specifically, the algorithms are evaluated on the dataset without sampling, with random under-sampling, and with over-sampling using SMOTE. The performance of each algorithm is measured with several metrics: accuracy, AUC-ROC, specificity, sensitivity, precision, recall, F1 score, and G-mean. The results indicate that models trained on the dataset without sampling achieved higher accuracy than those trained with either random under-sampling or SMOTE over-sampling. The neural network, however, showed remarkable results on every dataset compared with the other algorithms. These findings provide insight into the effectiveness of machine learning algorithms for lead scoring, particularly under different sampling techniques, and can help lead management practitioners select the most suitable algorithm and sampling technique for their needs. The study also contributes to the literature a comprehensive evaluation of machine learning algorithms for lead scoring and has practical implications for businesses seeking to improve their lead management. Future research could extend the analysis to other machine learning algorithms or larger datasets.
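Most of the threshold-based metrics listed above follow directly from the binary confusion matrix (AUC-ROC is the exception, since it needs the model's scores). The sketch below uses the standard definitions; treating a converting lead as the positive class is an assumption made only for illustration.

```python
import math
from dataclasses import dataclass
from typing import Dict


@dataclass
class Confusion:
    """Binary confusion-matrix counts (assumed positive class: converting lead)."""
    tp: int
    fp: int
    tn: int
    fn: int


def _safe_div(num: float, den: float) -> float:
    # Return 0.0 instead of raising when a class never occurs or is never predicted.
    return num / den if den else 0.0


def binary_metrics(c: Confusion) -> Dict[str, float]:
    sensitivity = _safe_div(c.tp, c.tp + c.fn)  # identical to recall (true positive rate)
    specificity = _safe_div(c.tn, c.tn + c.fp)  # true negative rate
    precision = _safe_div(c.tp, c.tp + c.fp)
    return {
        "accuracy": _safe_div(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": _safe_div(2 * precision * sensitivity, precision + sensitivity),
        # G-mean: geometric mean of the two per-class recalls; 0 if either class is missed entirely.
        "g_mean": math.sqrt(sensitivity * specificity),
    }


# A model that always predicts "no conversion" on a 95:5 split.
m = binary_metrics(Confusion(tp=0, fp=0, tn=95, fn=5))
# m["accuracy"] == 0.95, m["g_mean"] == 0.0
```

The usage example shows why accuracy alone can be misleading on imbalanced data, and why metrics such as sensitivity and G-mean are reported alongside it.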
For the automated planning and control of a plant, software developed over many years and continuously refined by UTIKAL Automation GmbH & Co is used.
It is based on "classical" rules or heuristics for controlling and monitoring the processes, e.g. checking machine occupancy, preventing collisions between transport carts, and coordinating the trips of several transport carts. It achieves good to very good productivity and throughput in a plant; the goal of this thesis, however, is to increase these further and to raise the degree of automation by applying machine learning (deep reinforcement learning). This concerns productivity and throughput as well as, ideally, intelligent intervention in undesired or unexpected situations, caused for example by malfunctions.
This bachelor's thesis deals with machine learning in the context of autonomous driving. Its goal is to train a control mechanism for a simulated vehicle using machine learning methods, specifically deep reinforcement learning. To this end, the fundamentals of autonomous driving and machine learning are explained first. Using the Unity engine and the ML-Agents Toolkit, scenes were created in which agents are trained. In various scenes of differing complexity and with different tasks, the agents learn to steer a simulated vehicle and to complete the respective task. To control the vehicle, the agent must handle both longitudinal and lateral control. Tasks include, for example, stopping in a target area, avoiding obstacles, or following a given track. The results showed that a simulated vehicle can be steered by a control mechanism trained with deep reinforcement learning. In most scenes, the agents behaved well. The results also provided insight into which factors are particularly important during learning; among other things, the choice of a good reward function proved decisive.
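Since the choice of reward function proved decisive, it may help to see what a reward for the "stop in a target area" task could look like. The sketch below is hypothetical: the state fields, weights, radius and speed threshold are illustrative assumptions, not the reward used in the thesis. In an ML-Agents agent, which is typically written in C#, equivalent logic would be applied through the agent's `AddReward` and `EndEpisode` calls.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class VehicleState:
    x: float
    y: float
    speed: float  # signed longitudinal speed in m/s (hypothetical state representation)


def stop_in_target_reward(prev: VehicleState, curr: VehicleState,
                          target: Tuple[float, float], target_radius: float = 1.0,
                          collided: bool = False) -> Tuple[float, bool]:
    """Return (reward, episode_done) for one simulation step; all weights are illustrative."""
    if collided:
        return -1.0, True  # hard penalty and episode end on collision
    d_prev = math.dist((prev.x, prev.y), target)
    d_curr = math.dist((curr.x, curr.y), target)
    reward = 0.1 * (d_prev - d_curr)  # shaping term: reward progress towards the target
    reward -= 0.001                   # small per-step penalty discourages dawdling
    if d_curr <= target_radius and abs(curr.speed) < 0.1:
        return reward + 1.0, True     # success: (almost) stopped inside the target area
    return reward, False
```

The dense shaping term gives the agent a learning signal long before it first reaches the target, while the sparse success bonus defines the actual goal; if the two are badly balanced, the agent may learn to exploit the shaping term instead of completing the task.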
Machine learning models for time series have long been a topic of special interest because of the unique structure of the data. Recently, the introduction of attention has improved the capabilities of recurrent neural networks and transformers on learning tasks such as machine translation. However, these models are usually subsymbolic architectures whose inner workings are hard to interpret without comprehensive tools. In contrast, interpretable models such as learning vector quantization make their decision process more transparent. This thesis attempts to combine attention as a machine learning function with learning vector quantization to better handle time series data. A design for such a model is proposed and tested on a dataset that has previously been used with attention-based transformers. Although the proposed model did not yield the expected results, this work outlines improvements for further research on this approach.
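The abstract does not describe the proposed architecture in detail, so the following is only one illustrative way of combining the two ideas, not the thesis's design: a time series is pooled into a single vector with scaled dot-product attention (using the time steps themselves as keys and values, and a query vector that would be learned in practice), and the pooled vector is then classified by its nearest prototype, as in LVQ. Training of the query and the prototypes is omitted.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(series: Sequence[Vector], query: Vector) -> Tuple[Vector, List[float]]:
    """Scaled dot-product attention of one query over all time steps.

    Returns the attention-weighted sum of the time steps and the weights themselves.
    """
    dim = len(query)
    scores = [sum(q * x for q, x in zip(query, step)) / math.sqrt(dim) for step in series]
    weights = softmax(scores)
    pooled = [sum(w * step[j] for w, step in zip(weights, series)) for j in range(dim)]
    return pooled, weights


def nearest_prototype(v: Vector, prototypes: Sequence[Vector], labels: Sequence[int]) -> int:
    """LVQ-style decision: label of the prototype with the smallest squared Euclidean distance."""
    dists = [sum((a - b) ** 2 for a, b in zip(v, p)) for p in prototypes]
    return labels[min(range(len(prototypes)), key=dists.__getitem__)]
```

In such a combination, both the attention weights (which time steps mattered) and the winning prototype (which reference pattern the input resembles) can be inspected, which is the kind of transparency the abstract attributes to LVQ.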
Analysis of Continuous Learning Strategies at the Example of Replay-Based Text Classification
(2023)
Continuous learning is a research field that has gained considerable momentum in recent years because of increasingly complex machine and deep learning models. Whereas static models must be retrained entirely from scratch whenever new data become available, continuous models adapt progressively to new data, saving computational resources. In this context, this work analyzes the parameters that affect replay-based continuous learning approaches, using a data-incremental text classification task with an MLP and an LSTM as an example. In general, replay was found to improve the results compared with naive approaches, but it does not reach the performance of a static model. In particular, performance increased with the number of replayed examples, and the number of training iterations has a significant influence because it can partly control the stability-plasticity trade-off. In contrast, balancing the buffer and the strategy for selecting which examples to store in the replay buffer had only a minor impact on the results in the present case.
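The core mechanics of replay can be sketched in a few lines. The example below assumes reservoir sampling as the strategy for selecting which examples to store; this is only one of several possible selection strategies, and the abstract does not say which ones were compared. The class and function names and the `n_replay` parameter (the number of replayed examples mixed into each batch) are hypothetical.

```python
import random
from typing import Generic, List, Sequence, TypeVar

T = TypeVar("T")


class ReservoirReplayBuffer(Generic[T]):
    """Fixed-size buffer; reservoir sampling keeps a uniform sample of all examples seen so far."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        self.capacity = capacity
        self.items: List[T] = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, item: T) -> None:
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)
        else:
            # Replace a stored example with probability capacity / seen.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = item

    def sample(self, k: int) -> List[T]:
        return self.rng.sample(self.items, min(k, len(self.items)))


def mixed_batch(new_examples: Sequence[T], buffer: ReservoirReplayBuffer[T], n_replay: int) -> List[T]:
    """Combine incoming examples with n_replay replayed ones, then store the new examples."""
    batch = list(new_examples) + buffer.sample(n_replay)
    for ex in new_examples:
        buffer.add(ex)
    return batch
```

Training on such mixed batches lets the model keep seeing old data while adapting to new data; increasing `n_replay` or the number of training iterations per increment shifts the balance between stability and plasticity.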
In machine learning, Learning Vector Quantization (LVQ) is a well-known supervised learning method. Because its learning algorithm is simple and fast, LVQ has been studied as a means of generating optimal reference vectors [12]. Different variants of LVQ are used to train models for many classification tasks. This thesis discusses two of these variants, Generalized Matrix Learning Vector Quantization (GMLVQ) and Generalized Tangent Learning Vector Quantization (GTLVQ). A transfer learning technique for different LVQ variants is then implemented and visualized, and the results are compared on different datasets.
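For orientation, the distance measures that distinguish the two variants can be written as follows, in the form commonly used in the LVQ literature (the notation here is not taken from the thesis). GMLVQ learns a matrix Ω, so that the relevance matrix Λ = ΩᵀΩ is positive semi-definite; GTLVQ attaches to each prototype w an affine subspace spanned by an orthonormal basis W and measures the distance to that subspace. Both variants are trained by minimizing the GLVQ cost E, where d⁺ is the distance to the closest prototype of the correct class, d⁻ the distance to the closest prototype of any other class, and Φ a monotonically increasing function such as the identity or a sigmoid.

```latex
d_{\Omega}(\mathbf{x}, \mathbf{w}) = (\mathbf{x} - \mathbf{w})^{\top} \Omega^{\top} \Omega \, (\mathbf{x} - \mathbf{w}) = \bigl\lVert \Omega (\mathbf{x} - \mathbf{w}) \bigr\rVert^{2}

d_{T}(\mathbf{x}, \mathbf{w}, W) = \min_{\boldsymbol{\theta}} \bigl\lVert \mathbf{x} - (\mathbf{w} + W \boldsymbol{\theta}) \bigr\rVert^{2} = \bigl\lVert (I - W W^{\top}) (\mathbf{x} - \mathbf{w}) \bigr\rVert^{2}, \qquad W^{\top} W = I

\mu(\mathbf{x}) = \frac{d^{+}(\mathbf{x}) - d^{-}(\mathbf{x})}{d^{+}(\mathbf{x}) + d^{-}(\mathbf{x})}, \qquad E = \sum_{i} \Phi\bigl(\mu(\mathbf{x}_i)\bigr)
```

The closed form for the tangent distance follows because, for orthonormal W, the minimizing coefficients are θ = Wᵀ(x − w), leaving only the component of x − w orthogonal to the subspace. Since μ(x) < 0 exactly when the correct prototype is closer, minimizing E pushes samples towards correct classification.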
Prototype-based classification methods such as Generalized Matrix Learning Vector Quantization (GMLVQ) are simple and easy to implement. In (deep) multilayer perceptrons (MLPs), which rely on a non-linearity for classification and regression learning, an appropriate choice of activation function plays an important role in performance. In this thesis, non-linear activation functions that have proven successful in MLPs are investigated for use in GMLVQ in order to realize a non-linear mapping. Their influence on the performance of the model, with respect to accuracy and convergence rate, is analyzed, and the experimental results are documented.
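The abstract does not state exactly where the activation function enters the model. One plausible reading, assumed here purely for illustration, is to apply an element-wise activation f after the linear projection Ω, giving the distance d(x, w) = ‖f(Ωx) − f(Ωw)‖². With the identity as f this reduces to the standard GMLVQ distance ‖Ω(x − w)‖². The selection of activations in `ACTIVATIONS` is illustrative, not the set examined in the thesis.

```python
import math
from typing import Callable, Dict, List, Sequence

Matrix = Sequence[Sequence[float]]


def project(omega: Matrix, v: Sequence[float]) -> List[float]:
    """Matrix-vector product Omega @ v."""
    return [sum(o * x for o, x in zip(row, v)) for row in omega]


def nonlinear_gmlvq_distance(x: Sequence[float], w: Sequence[float], omega: Matrix,
                             act: Callable[[float], float]) -> float:
    """Squared Euclidean distance between f(Omega x) and f(Omega w) (assumed formulation)."""
    fx = [act(z) for z in project(omega, x)]
    fw = [act(z) for z in project(omega, w)]
    return sum((a - b) ** 2 for a, b in zip(fx, fw))


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow in exp for large |z|).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


ACTIVATIONS: Dict[str, Callable[[float], float]] = {
    "identity": lambda z: z,          # recovers the standard GMLVQ distance
    "relu": lambda z: max(0.0, z),
    "tanh": math.tanh,
    "sigmoid": _sigmoid,
    "swish": lambda z: z * _sigmoid(z),
}
```

Because f is applied to x and w separately, the distance is no longer a function of x − w alone, which is what makes the mapping non-linear; bounded activations such as tanh or the sigmoid also saturate, which can affect the convergence rate during training.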
Crowd-Powered Medical Diagnosis: The Potential of Crowdsourcing for Patients with Rare Diseases
(2023)
With the recent rise of medical crowdsourcing platforms, patients with chronic illnesses increasingly broadcast their medical records to obtain an explanation for their complex health conditions. By providing access to a vast pool of diverse medical knowledge, crowdsourcing platforms have the potential to change the way patients receive a medical diagnosis. To further the understanding of crowdsourcing as an emerging phenomenon in health care, we developed a conceptual model that specifies a set of variables, and we contextualize the various factors that drive participants to exert effort. For this purpose, we gathered and examined a unique dataset from the platform CrowdMed.com that comprises tasks of diagnosing rare medical conditions. By promoting crowdsourcing as a robust and non-discriminatory alternative to seeking help from traditional physicians, we contribute to the acceptance and adoption of crowdsourcing services in health economics.