Document Type
- Master's Thesis (78)
- Bachelor Thesis (41)
- Diploma Thesis (1)
Language
- English (120)
Keywords
- Machine learning (23)
- Blockchain (9)
- Vector quantization (9)
- Algorithm (7)
- Bioinformatics (5)
- Deep learning (5)
- Graph theory (5)
- Neural network (5)
- Cryptology (4)
- Software development (4)
- Video game (4)
- Virtual currency (4)
- Imaging technique (3)
- Biotechnology (3)
- DNA Barcoding (3)
- Artificial intelligence (3)
- Learning system (3)
- RNA (3)
- Sequence analysis <chemistry> (3)
- Vector (3)
- Time series (3)
- Image processing (2)
- Biomarker (2)
- Biomedicine (2)
- Bitcoin (2)
- CRISPR/Cas method (2)
- Cluster analysis (2)
- DNA (2)
- Internet of Things (2)
- Cryptorchidism (2)
- Nanoparticle (2)
- Object recognition (2)
- Faba bean (1)
- Ammonium compounds (1)
- Amyloid (1)
- Anemia (1)
- Bee <genus> (1)
- Biochemistry (1)
- Biomarker, Cancer <medicine> (1)
- Biometrics (1)
- Blender <software> (1)
- Soil organism (1)
- COVID-19 (1)
- Cluster, Cluster analysis (1)
- Cluster <data analysis> (1)
- Coding theory (1)
- Computer forensics (1)
- Computer security (1)
- Cyanides (1)
- DNA, Sex determination (1)
- Database (1)
- Database system (1)
- Discrete logarithm (1)
- Document processing (1)
- Drought stress (1)
- Real-time system (1)
- Eigenvalue problem (1)
- Electronic Commerce (1)
- Electricity generation (1)
- Electrostimulation, Stem cell, Bone formation (1)
- Epidemiology (1)
- Augmented reality <computer science> (1)
- Ore processing (1)
- Extraction (1)
- Wetland (1)
- Bats (1)
- Fluorescence resonance energy transfer (1)
- Fluorescent labeling (1)
- Research (1)
- Generative Adversarial Network (1)
- Gene expression (1)
- Face recognition (1)
- Glucosinolates, Crucifers, Proteins, Hydrolysate (1)
- Gold ore (1)
- Graph (1)
- Identity management (1)
- Immunological diagnostics (1)
- Influenza A virus (1)
- Child (1)
- Climate change (1)
- Control theory, Stability, Steering theory (1)
- Cryptanalysis (1)
- Cryptosystem (1)
- Spherical gap (1)
- Crop plants (1)
- Agriculture (1)
- Habitat (1)
- Linear code (1)
- Pneumonia (1)
- Malaria (1)
- Machine vision (1)
- Mathematical model (1)
- Medicine (1)
- Messenger RNA (1)
- Metric <mathematics> (1)
- Microfinance (1)
- Microorganism (1)
- Nanostructure (1)
- Numerical mathematics (1)
- Optical spectroscopy (1)
- Oxidation (1)
- Pandemic (1)
- Password (1)
- Pathogenic bacteria (1)
- Patient (1)
- Peer-to-peer network (1)
- Pesticide (1)
- Plants (1)
- Photoreceptor, Retinal degeneration (1)
- Photosynthesis (1)
- Planning (1)
- Polynomial (1)
- Polynomial, Graph theory (1)
- Polysaccharides (1)
- Programming (1)
- Project management (1)
- Project planning (1)
- Prostate cancer (1)
- Protein biosynthesis (1)
- Proteins (1)
- Protein folding (1)
- Realistic computer graphics (1)
- Regularization (1)
- Role-playing game (1)
- Satellite communication (1)
- Satellite technology (1)
- Social Media (1)
- Software (1)
- Gap flow (1)
- Nitrogen compounds (1)
- Stochastic model (1)
- Metabolism (1)
- Systems medicine (1)
- Transcription factor (1)
- Tutte polynomial (1)
- Environmental pollution (1)
- Vector Association (1)
- Virtual reality (1)
- Probability theory (1)
- Wildlife (1)
- Time series, Vector, Hankel matrix (1)
- Time series analysis (1)
- Time travel (1)
- Random graph (1)
- Ecosystem (1)
Institute
- Angewandte Computer‐ und Biowissenschaften (120)
Drought is one of the most common and dangerous threats that plants face, costing the global agricultural sector billions of dollars every year and causing the loss of tons of harvest. Until people drastically reduce their consumption of animal products or cellular agriculture comes of age, more and more crops will need to be produced to sustain the ever-growing human population. Even then, as more areas on Earth become prone to drought due to climate change, we may still have to find or breed plant varieties better suited to growing and prospering in these changing environments.
Plants respond to drought stress with a complex interplay of hormones, transcription factors, and many other functional or regulatory proteins, and mapping out this web of agents is no trivial task. Over the last two to three decades, machine learning has become immensely popular and is increasingly used to find patterns in situations too complex for the human mind to grasp. Even though much of the hype focuses on the latest developments in deep learning, relatively simple methods often yield superior results, especially when data are limited and expensive to gather.
This Master's thesis, conducted at the IPK in Gatersleben, develops an approach for shedding light on the phenotypic and transcriptomic processes that occur when a plant is subjected to stress. It centers on a random forest feature selection algorithm; although it is used here to illuminate the drought stress response in Arabidopsis thaliana, it can be applied to all kinds of stresses in all kinds of plants.
There are multiple ways to gain information about an individual and their health status, and the analysis of human breath, which carries a great deal of information about metabolic processes within the individual's body, has become an increasingly popular field in medicine. This information is carried by the volatile (organic) compounds (VOCs) in exhaled breath. Because VOCs are products of metabolic processes within the body, they might indicate diseases that disturb those processes. The compounds can be detected by mass spectrometry (MS) or ion mobility spectrometry (IMS), so their analysis is not restricted to exhaled breath. The resulting data are spectral data, which capture VOC concentrations indirectly through intensities. About 3000 VOCs have already been identified in human exhaled breath [1], and the number of research papers on VOC analysis and detection has risen nearly constantly over the last decade¹. Furthermore, the same technique could be used to capture biomarkers of foreign species within the individual's body. VOCs can be sampled from an individual with non-invasive or minimally invasive techniques. However, manually identifying the VOCs and biomarkers related to a certain disease or infection is not feasible due to the complexity of the samples and the often unknown metabolic products, so automated techniques are needed [1–4]. To establish breath analysis as a diagnostic tool, machine learning methods could be used. Machine learning has become a popular and common technique for medical data because it enables rapid analysis. Taking advantage of this, breath analysis using machine learning could become the method of choice for diagnosis, especially since conventional methods are laboratory-based and, when detecting a bacterial infection, sometimes need several days to identify the organism [5].
The games industry has grown significantly over the last 30 years. Projects are getting bigger and more expensive, making it essential to plan, structure and track them more efficiently.
The growth of projects has increased the administrative workload for producers, project managers and leads, who have to monitor and control progress in order to keep a permanent overview of the project. This is often accompanied by a lack of insight into the project and of basic communication within the team. Therefore, the goal of this thesis is to enhance conventional project management methods with process structures that occur frequently in game development.
This thesis first elaborates on what project management in the games industry actually is. Here, methods are considered, especially agile methods and progress-tracking practices, which were created for software development and have become a standard in game development. Subsequently, an example is used to demonstrate how process management can function within the development of video games. Based on this, an ideal is depicted, which is implemented and used in a tool at the German games studio KING Art GmbH. This ideal is compared with expert interviews in order to verify its general validity in the industry.
By integrating process structures, the administrative effort can be reduced and communication within game development simplified, while the current project status is permanently presented. This benefits project management and leads as well as the entire team. Further application tests of this theory would have to be organized to check scalability and to draw comparisons with other applications.
Because the Discrete Logarithm Problem (DLP) is believed to be intractable in suitably chosen groups, it has been widely used in cryptography, and the security of several cryptosystems rests on the hardness of computing discrete logarithms. In this paper, we start with topics in number theory and abstract algebra, which allow a comprehensive study of the nature of discrete logarithms, and then concentrate on their application and computation. Applications such as the Diffie-Hellman key exchange and the ElGamal signature scheme, as well as several attacks on the DLP such as the baby-step giant-step method and the Silver-Pohlig-Hellman algorithm, are analyzed. We also focus on elliptic curves and the discrete logarithm over elliptic curves, and discuss attacks on the elliptic curve discrete logarithm problem (ECDLP). Moreover, the extension of several discrete-logarithm-based protocols to elliptic curves, such as the Elliptic Curve Digital Signature Algorithm (ECDSA), is discussed.
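The baby-step giant-step attack trades memory for time: writing the unknown exponent as x = i·m + j with m ≈ √(p − 1) reduces the search from about p steps to about 2√p. The following is a minimal Python sketch for a prime modulus p; the function name and the toy parameters p = 101, g = 2 are illustrative assumptions, not taken from the thesis, and real deployments use groups far too large for this attack to succeed.

```python
from math import isqrt
from typing import Dict, Optional


def baby_step_giant_step(g: int, h: int, p: int) -> Optional[int]:
    """Solve g**x ≡ h (mod p) for x, assuming p is prime.

    Returns the smallest such x in [0, p - 2], or None if no solution exists.
    Time and memory are O(sqrt(p)), versus O(p) for trying every exponent.
    """
    n = p - 1                     # order of the multiplicative group mod p
    m = isqrt(n) + 1              # m * m > n, so x = i*m + j with 0 <= i, j < m
    # Baby steps: remember g^j for j = 0 .. m-1 (keep the smallest j per value).
    baby: Dict[int, int] = {}
    e = 1
    for j in range(m):
        baby.setdefault(e, j)
        e = e * g % p
    # Giant steps: compare h * g^(-i*m) against the table for i = 0 .. m-1.
    giant = pow(g, -m, p)         # g^(-m) mod p; negative exponents need Python 3.8+
    gamma = h % p
    for i in range(m):
        if gamma in baby:
            return i * m + baby[gamma]
        gamma = gamma * giant % p
    return None


# Toy Diffie-Hellman exchange; p = 101 is far too small for real use.
p, g = 101, 2                        # 2 generates the whole group mod 101
a, b = 37, 58                        # hypothetical private keys
A, B = pow(g, a, p), pow(g, b, p)    # public values sent over the network
assert pow(B, a, p) == pow(A, b, p)  # both sides derive the same shared secret
# With such a small p, an eavesdropper recovers a private key from A directly:
assert baby_step_giant_step(g, A, p) == a
```

The same attack applies to any cyclic group, including elliptic-curve groups, which is why group sizes are chosen so that √(group order) steps remain infeasible.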
Cancer is one of the main causes of death in developed countries, and cancer treatment heavily depends on successful early detection and diagnosis. Tumor biomarkers are helpful for early diagnosis. The goal of this discovery method is to identify genetic variations as well as changes in gene expression or activity that can be linked to a typical cancer state.
First, several cancer gene signaling pathways were introduced and then combined. From them, 27 candidate genes were selected, and several expression difference matrices were established through the analysis of multiple data sets in the GEO database. Testing the candidate genes in these matrices identified five genes, PLA1A, MMP14, CCND1, BIRC5 and MYC, that have the potential to be tumor biomarkers. Two of these genes are discussed further: PLA1A is a potential biomarker for prostate cancer, and MMP14 can be considered a biomarker for non-small cell lung cancer (NSCLC).
Finally, the significance of this study and the potential value of the two genes are discussed, and prospects for future research in this direction are outlined.
Blockchain is a technology with the capability to change the way the world operates. As promising as this may be, there are still many challenges that do not exist, or are far simpler to solve, in conventional software solutions. Services offered over the blockchain suffer from so-called block confirmation times, during which the customer simply has to wait until the transaction is confirmed. In this paper, possible solutions to that problem are examined, and challenges arising from the specific characteristics of the Ethereum blockchain are analyzed.
Machine learning models for time series have always been a special topic of interest due to the unique structure of the data. Recently, the introduction of attention improved the capabilities of recurrent neural networks and transformers on learning tasks such as machine translation. However, these models are usually subsymbolic architectures, making their inner workings hard to interpret without comprehensive tools. In contrast, interpretable models such as learning vector quantization are more transparent in their decision process. This thesis tries to merge attention, as a machine learning function, with learning vector quantization to better handle time series data. A design for such a model is proposed and tested on a dataset used in connection with attention-based transformers. Although the proposed model did not yield the expected results, this work outlines improvements for further research on this approach.
Analysis of Continuous Learning Strategies at the Example of Replay-Based Text Classification
(2023)
Continuous learning is a research field that has grown significantly in recent years due to highly complex machine and deep learning models. Whereas static models need to be retrained entirely from scratch when new data become available, continuous models progressively adapt to new data, saving computational resources. In this context, this work analyzes parameters affecting replay-based continuous learning approaches, using a data-incremental text classification task with an MLP and an LSTM as an example. Generally, replay was found to improve results compared with naive approaches, but it does not reach the performance of a static model. Performance mainly increased with more replayed examples, and the number of training iterations has a significant influence because it can partly control the stability-plasticity trade-off. In contrast, balancing the buffer and the strategy for selecting which examples to store in the replay buffer were found to have only a minor impact on the results in the present case.
Stability of control systems is one of the central subjects in control theory. The classical asymptotic stability theorem states that the norm of the residual between the state trajectory and the equilibrium tends to zero in the limit. Unfortunately, it does not in general allow computing a concrete rate of convergence, particularly due to algorithmic uncertainty related to the numerical imperfections of floating-point arithmetic. This work proposes revisiting asymptotic stability theory with the aim of computing convergence rates using constructive analysis, a mathematical tool that realizes an equivalence between certain theorems and computational algorithms. Consequently, it also offers a framework that allows numerical imperfections to be controlled in a coherent and formal way. The overall goal of the study also matches the trend of introducing formal verification tools into control theory. Besides existing approaches, constructive analysis, as suggested in this work, can also be considered for the formal verification of control systems. A computational example is provided that demonstrates the extraction of a convergence certificate for example dynamical systems.
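Replay-based continual learning keeps a bounded buffer of past examples and mixes some of them into each new training batch. The sketch below assumes reservoir sampling as the storage strategy (one of several possible selection strategies; the abstract does not say which ones were compared) and uses hypothetical names (`ReservoirReplayBuffer`, `make_training_batch`); model training itself is left out.

```python
import random
from typing import Generic, List, Sequence, TypeVar

T = TypeVar("T")


class ReservoirReplayBuffer(Generic[T]):
    """Bounded replay memory filled by reservoir sampling (Algorithm R).

    After n examples have been offered, each of them is stored with
    probability capacity / n, independently of its arrival time.
    """

    def __init__(self, capacity: int, seed: int = 0) -> None:
        self.capacity = capacity
        self.items: List[T] = []
        self.n_seen = 0
        self.rng = random.Random(seed)

    def add(self, item: T) -> None:
        self.n_seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)              # buffer not full yet: always store
        else:
            j = self.rng.randrange(self.n_seen)  # uniform index in [0, n_seen)
            if j < self.capacity:
                self.items[j] = item             # overwrite a random stored example

    def sample(self, k: int) -> List[T]:
        """Draw up to k distinct stored examples for rehearsal."""
        return self.rng.sample(self.items, min(k, len(self.items)))


def make_training_batch(new_examples: Sequence[T], buffer: ReservoirReplayBuffer[T],
                        n_replay: int) -> List[T]:
    """Mix the incoming examples with replayed old ones, then remember the new ones."""
    batch = list(new_examples) + buffer.sample(n_replay)
    for example in new_examples:
        buffer.add(example)
    return batch
```

The number of replayed examples per batch (`n_replay`) is the knob the abstract reports as most influential; swapping reservoir sampling for a class-balanced policy would only change `add`.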
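As an illustration only (the example systems used in the thesis are not described here), the difference between an asymptotic statement and a convergence certificate can be seen on a scalar linear system: the certificate is an explicit function T(ε) stating how long to wait until the state is within ε of the equilibrium, assuming x_0 ≠ 0 and ε > 0.

```latex
\dot{x}(t) = -a\,x(t),\; a > 0
\;\Longrightarrow\;
|x(t)| = e^{-a t}\,|x_0| \xrightarrow[t \to \infty]{} 0,
\qquad
T(\varepsilon) = \max\Big\{0,\ \tfrac{1}{a}\ln\frac{|x_0|}{\varepsilon}\Big\}
\;\text{ gives }\;
t \ge T(\varepsilon) \;\Rightarrow\; |x(t)| \le \varepsilon .
```

The classical theorem only asserts the limit; T(ε) is the computable content that a constructive proof would have to deliver.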
Analysis of the Forensic Preparation of Biometric Facial Features for Digital User Authentication
(2023)
Biometrics has become a popular method of securing access to data because it eliminates the need for users to remember a password. Although attacks exploiting the vulnerabilities of biometric systems have increased along with their usage, these vulnerabilities could also be helpful in criminal casework.
This thesis aims to evaluate approaches for bypassing the face recognition of electronic devices with forged faces so that law enforcement can access their data. Here, obtaining the necessary data in a timely manner is critical. However, unlocking a device protected by a password can take several years with a brute-force attack. Consequently, biometrics could be a quicker alternative for unlocking.
Various approaches to bypassing current face recognition technologies were examined. The first approaches involved printing the user's face on regular paper and targeted devices that perform face recognition in the visible spectrum. Further approaches consisted of printing the user's infrared image and creating three-dimensional masks to bypass devices that perform face recognition in the near-infrared. Additionally, the underlying software responsible for face recognition was reverse-engineered to obtain information about its mode of operation.
The experiments demonstrate that forged faces can partly bypass face recognition and give access to secured data. Devices performing face recognition in the visible spectrum can be unlocked with a printed image of the user's face. Of the devices with advanced near-infrared face recognition, only one could be bypassed, using a three-dimensional face mask. In addition, its underlying software provided evidence about the requirements of its face recognition. The other devices under attack remained locked, and their software provided no clues.
Abstract not available
In this thesis, we focus on using machine learning to automate manual or rule-based processes for the deduplication step of data integration in an enterprise customer experience program. We study the theoretical foundations of the most widely used machine learning algorithms, including logistic regression, random forests, extreme gradient boosting trees, support vector machines, and generalized matrix learning vector quantization. We then apply these algorithms to a real, private data set and compare their performance using standard evaluation metrics for classification, such as the confusion matrix, precision, recall, the area under the precision-recall curve, and the area under the Receiver Operating Characteristic (ROC) curve.
As new sensors are added to VR headsets, more data can be collected, which introduces a new potential threat to user privacy. We focused on the feasibility of extracting personal information from eye tracking. To this end, we designed a preliminary user study focusing on the pupil response to audio stimuli. We applied a variety of machine learning models to the collected data to determine whether information such as the participant's age or gender can be obtained. Several of the experiments show promise for obtaining this information: we were able to determine with reasonable certainty whether the participant had consumed caffeine, as well as the participant's gender. This demonstrates the largely unrecognized threat that embedded sensors pose to users. Further studies are planned to verify the results.
Many companies use machine learning techniques to support decision-making and automate business processes by learning from the data they have. In this thesis, we investigate the theory behind the machine learning algorithms most widely used in practice for solving classification and regression problems.
In particular, the following algorithms were chosen for the classification problem: Logistic Regression, Decision Trees, Random Forest, Support Vector Machine (SVM), and Learning Vector Quantization (LVQ). For the regression problem, Decision Trees, Random Forest and Gradient Boosted Trees were used. We then apply these algorithms to real company data and compare their performance and results.
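The evaluation metrics listed above can be computed directly from predicted labels and scores. The sketch below assumes the deduplication task is framed as binary classification of record pairs (1 = duplicate, 0 = distinct); the function names are hypothetical, and the pairwise ROC AUC is quadratic in the number of examples, so it suits small evaluation sets only.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count TP, FP, FN, TN for binary labels (1 = duplicate pair, 0 = distinct)."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            counts["tp"] += 1
        elif t == 0 and p == 1:
            counts["fp"] += 1
        elif t == 1 and p == 0:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


def precision_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Precision = TP / (TP + FP), recall = TP / (TP + FN); 0.0 when undefined."""
    c = confusion_counts(y_true, y_pred)
    predicted_pos = c["tp"] + c["fp"]
    actual_pos = c["tp"] + c["fn"]
    precision = c["tp"] / predicted_pos if predicted_pos else 0.0
    recall = c["tp"] / actual_pos if actual_pos else 0.0
    return precision, recall


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 1/2).

    Requires at least one positive and one negative example.
    """
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0 for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))
```

For example, labels `[1, 1, 0, 0, 1, 0]` with predictions `[1, 0, 0, 1, 1, 0]` give two true positives, one false positive and one false negative, so precision and recall are both 2/3.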
Assessment of COI and 16S for insect species identification to determine the diet of city bats
(2023)
Despite its numerous benefits for human living conditions, urbanization has also negatively affected humans, their environment, and other organisms that share urban habitats with humans. While some wild animals avoid urban areas, others are more tolerant of them or even prefer urban habitats. With more than 1,400 bat species worldwide, bats have the potential to contribute significantly to mammalian biodiversity in urban areas. Insectivorous bat species play a key role in agriculture by improving yields and reducing chemical pesticide costs. Using metabarcoding, the prey consumed by these nocturnal mammals can be determined from the DNA fragments in their fecal pellets. This study aimed to evaluate COI and 16S metabarcodes for insect species identification to determine the diet of metropolitan bats. For this purpose, DNA was extracted from 65 bat fecal samples collected in the Berlin metropolitan area, and the COI and 16S metabarcodes were amplified and sequenced. After taxonomic annotation, I found that 73% of all identified insects could only be detected with the COI marker and 15% only with the 16S marker, while just 12% were identified by both markers. According to this result, COI is more suitable for the taxonomic identification of insects from bat feces. However, given the bias of COI primers, using both markers is recommended for a more precise estimation of species diversity. Additionally, based on the insect species identified, I noticed that urban bats fed mainly on Diptera, Coleoptera, and Lepidoptera. The bat species Nyctalus noctula was the most abundant in the samples. Its diet analysis revealed that 91% of the samples contained the insect species Chironomus plumosus, and 14 pest insect species were also found in its diet.
The GeoFlow II experiment aims to replicate Earth's core dynamics using a rotating spherical container with controlled temperature differences and simulated gravity. During the GeoFlow II campaign, a massive dataset of images was collected, necessitating an automated system for image processing and fluid flow visualization in the northern hemisphere of the spherical container. On this basis, we aim to detect the special structures appearing in the post-processed images. Recognizing YOLOv5's proficiency in object detection, we apply the YOLOv5 model to this task.
Gold cyanidation is a process by which gold is extracted from low-grade ore. Due to its efficiency, it has found widespread application around the world, including in Peru. The process requires free cyanide in high concentrations. After gold extraction is completed, free cyanide as well as metal cyanide complexes remain in the effluent of gold mines and refineries. These effluents are often kept in storage ponds, where they pose a considerable risk to health and the environment. It is therefore preferable to degrade the cyanide to minimize the risk of exposure. In this thesis, cyanide degradation was explored in a UV-light-based prototype. Degradation with a combination of hydrogen peroxide and UV light proved very effective at cyanide concentrations of 100 mg/L and 1000 mg/L. Furthermore, the presence of ammonia as a degradation product could be confirmed. Membrane distillation may provide an alternative to cyanide destruction in the form of cyanide recovery, and several membrane experiments gave promising results.
Biological ammonium oxidation is a central component of the global nitrogen cycle. Given the enormous amounts of nitrogen of anthropogenic origin in the environment, removing reactive nitrogen is in the interest of the environment and of public health. This thesis investigates conditions for anaerobic ammonium oxidation with nitrate in an anammox reactor. Two laboratory reactors were operated and monitored for a total of 116 days, containing only ammonium and nitrate as electron donor and acceptor. In addition, batch cultures were grown with cells from one reactor, and their gas composition was examined as a function of different properties. A range of analytical quantification methods was used, and it could be shown that degradation takes place under these conditions.
Current research on this reaction is sparse, which lends this bachelor's thesis its relevance.
As the cryptocurrency ecosystem grows rapidly, interoperability, which enables assets and data to interact seamlessly across multiple chains, has become increasingly crucial. This work describes the concept and implementation of a trustless connection between the Bitcoin Lightning Network and EVM-compatible blockchains that allows assets to be transferred between the two ecosystems. Establishing such a connection can contribute significantly to the growth of both ecosystems, as each can benefit from the other's advantages and new possibilities can emerge.
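The abstract does not describe the mechanism behind the trustless connection. One common building block for such cross-chain transfers is the hash time-locked contract (HTLC): Lightning payments are already forwarded as HTLCs locked to a SHA-256 payment hash, and a contract on an EVM chain can lock funds to the same hash. The following Python sketch models only that lock logic, with hypothetical names and abstract time units; whether the thesis uses this design is an assumption.

```python
import hashlib
import secrets
from dataclasses import dataclass


@dataclass
class HashTimeLock:
    """Funds the receiver can claim with a SHA-256 preimage before `expiry`,
    and the sender can reclaim afterwards (hypothetical, simplified model)."""
    payment_hash: bytes   # sha256(preimage), shared by both legs of the transfer
    expiry: int           # e.g. a block height; the unit is an assumption
    claimed: bool = False

    def claim(self, preimage: bytes, now: int) -> bool:
        """Receiver path: succeeds only before expiry and with the correct preimage."""
        if self.claimed or now >= self.expiry:
            return False
        if hashlib.sha256(preimage).digest() != self.payment_hash:
            return False
        self.claimed = True
        return True

    def can_refund(self, now: int) -> bool:
        """Sender path: unclaimed funds become refundable once the lock has expired."""
        return not self.claimed and now >= self.expiry


# Usage: both chains lock funds to the same hash. The secret holder claims first,
# which reveals the preimage; the counterparty reuses it on the other chain.
preimage = secrets.token_bytes(32)
payment_hash = hashlib.sha256(preimage).digest()
lock_claimed_first = HashTimeLock(payment_hash, expiry=100)   # shorter timeout
lock_claimed_second = HashTimeLock(payment_hash, expiry=200)  # leaves time to react
assert lock_claimed_first.claim(preimage, now=50)
assert lock_claimed_second.claim(preimage, now=60)
```

Giving the lock that is claimed second the longer timeout is what keeps the exchange safe for the party that learns the preimage last.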
In response to prevailing environmental conditions, Arabidopsis thaliana plants must increase their photosynthetic capacity to acclimate to potentially harmful high-light stress. Different high-throughput, imaging-based methods can be used to measure these changes in acclimation capacity. In this master's thesis, we studied the capacity of different Arabidopsis thaliana knockout mutants and accessions to acclimate to potentially harmful high-light and cold temperature conditions, using a high-throughput phenotyping system with an integrated chlorophyll fluorescence measurement system. To determine the acclimation capacity, knockout mutants of genes not previously associated with high light, as well as accessions from different countries of origin belonging to two haplotype groups (with a reference and an alternative allele), were grown under switching high-light and temperature conditions. Photosynthetic analysis showed that the knockout mutants differed from the wild type in their Photosystem II operating efficiency during a switch to increased light irradiance, but did not differ significantly from it a week later under the same conditions. High-throughput phenotyping of the haplotype accessions revealed a significantly better acclimation capacity, in terms of non-photochemical quenching and steady-state photosynthetic efficiency, in accessions originating from Russia with an altered SPPA gene during high-light and cold stress.
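The photosynthetic parameters named above are usually derived from a few chlorophyll fluorescence levels. A minimal sketch using the standard definitions (F0 and Fm from dark-adapted leaves, Fs and Fm′ measured in the light); the function names and example values are hypothetical, and the phenotyping system's own processing pipeline may compute them differently.

```python
def fv_over_fm(f0: float, fm: float) -> float:
    """Maximum quantum efficiency of PSII in dark-adapted leaves: (Fm - F0) / Fm."""
    return (fm - f0) / fm


def phi_psii(fs: float, fm_prime: float) -> float:
    """PSII operating efficiency in the light: (Fm' - Fs) / Fm'."""
    return (fm_prime - fs) / fm_prime


def npq(fm: float, fm_prime: float) -> float:
    """Non-photochemical quenching in Stern-Volmer form: (Fm - Fm') / Fm'."""
    return (fm - fm_prime) / fm_prime


# Hypothetical readings for one plant before and after a high-light switch.
dark = {"f0": 200.0, "fm": 1000.0}
light = {"fs": 400.0, "fm_prime": 800.0}
print(fv_over_fm(**dark), phi_psii(**light), npq(dark["fm"], light["fm_prime"]))
```

Comparing ΦPSII and NPQ per genotype before and after a light or temperature switch is one straightforward way to express the acclimation differences the abstract reports.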
We investigate the folding and thermodynamic stability of a tertiary contact in the ribosomal ribonucleic acid (rRNA) of baker's yeast, which is thought to be essential for ribosome maturation in eukaryotes at lower temperatures [1]. Ribosomes are cellular machines essential for all living organisms. RNA is at the center of these machines and is responsible for translating genetic information into proteins [2,3]. The rRNA tertiary contact of interest was only recently discovered in Zurich by the research group of Vikram Govind Panse. Gerhardy et al. [1] showed in vitro that, within the 60S pre-ribosome and under defined metal ion concentrations, the tertiary contact becomes visible between a GAAA tetraloop and a kissing loop motif. Our aim is to understand this RNA structure, especially the formation of the rRNA tertiary contact, in terms of thermodynamics and kinetics under various experimental conditions, such as temperature and the concentrations of the metal ions K(I), Na(I) and Mg(II). To this end, we use optical spectroscopy, such as UV/Vis spectroscopy, and ensemble Förster (fluorescence) resonance energy transfer (FRET) folding studies. Our findings will help to further characterize this newly discovered ribosomal RNA contact and to elucidate its function within the ribosomal maturation process.
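For ensemble FRET, the transfer efficiency E depends steeply on the donor-acceptor distance r through the Förster radius R0, which is why FRET can report whether a tertiary contact has formed, assuming the dyes are placed so that contact formation brings them closer. A minimal Python sketch of the standard relations; the Förster radius in the example is a hypothetical value, not the one for the labelled rRNA construct.

```python
def fret_efficiency(r: float, r0: float) -> float:
    """Förster efficiency for donor-acceptor distance r and Förster radius r0 (same units)."""
    return 1.0 / (1.0 + (r / r0) ** 6)


def distance_from_efficiency(e: float, r0: float) -> float:
    """Invert the Förster relation: r = r0 * (1/E - 1)^(1/6), valid for 0 < E < 1."""
    return r0 * (1.0 / e - 1.0) ** (1.0 / 6.0)


def efficiency_from_donor_quenching(f_da: float, f_d: float) -> float:
    """Ensemble estimate E = 1 - F_DA / F_D from donor intensity with and without acceptor."""
    return 1.0 - f_da / f_d


R0_NM = 5.0  # hypothetical Förster radius in nm
for r in (3.0, 5.0, 7.0):  # closer dyes (contact formed) -> higher efficiency
    print(f"r = {r} nm -> E = {fret_efficiency(r, R0_NM):.3f}")
```

Because of the sixth-power dependence, E changes most sharply around r ≈ R0, so the dye pair is usually chosen with R0 close to the expected distance change.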
In today's market, handling textual data in internal and external processes has become increasingly important and complex for certain companies. In this context, the thesis aims to support the analysis of similarities among textual documents by analyzing the relationships among them. The proposed analysis process includes discovering similarities among these financial documents as well as possible patterns. The proposal is based on exploiting and extending existing approaches and combining them with well-known cluster analysis techniques. Moreover, a software tool has been implemented to evaluate the proposed approach, and it was tested on EDGAR filings using qualitative criteria.
A common updating rule for the k-means and Neural Gas algorithms can be obtained with a generalized Expectation Maximization method. This result is used to derive two variants of these methods. Using a similarity measure, specifically the Gaussian function, provides another clustering alternative to the aforementioned methods. The main benefit of the Gaussian function is that it inherently looks for a common cluster center for similar data points (depending on the value of the parameter s). In several experiments, we observe similar behaviour of the batch and the proposed variants. We also show useful results for the "alternative" similarity method, particularly when there is no clue about the number of clusters in the data sets.
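The abstract does not name the similarity measure. A common baseline for document similarity is cosine similarity between term-frequency vectors, and the resulting similarity matrix can feed a standard clustering method. The sketch below assumes that baseline, uses a deliberately simple tokenizer, and uses hypothetical function names and example sentences.

```python
import math
import re
from collections import Counter
from typing import List


def term_vector(text: str) -> Counter:
    """Bag-of-words term frequencies over lower-cased alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """cos(a, b) = a . b / (|a| |b|); 0.0 if either document has no tokens."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_matrix(docs: List[str]) -> List[List[float]]:
    """Pairwise document similarities, e.g. as input to hierarchical clustering."""
    vecs = [term_vector(d) for d in docs]
    return [[cosine_similarity(u, v) for v in vecs] for u in vecs]


docs = ["Net revenue increased.", "Revenue increased, net.", "Risk factors: litigation."]
for row in similarity_matrix(docs):
    print([round(s, 2) for s in row])
```

Real filings would typically add stop-word removal and TF-IDF weighting so that boilerplate terms shared by every filing do not dominate the similarity.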
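The shared update rule can be read as: every prototype moves to a weighted mean of the data, and the methods differ only in the weights. A minimal batch sketch under that reading, with hard nearest-prototype weights for k-means, rank-based weights exp(−rank/λ) for Neural Gas, and a Gaussian similarity weight with width s; the exact EM-derived variants from the thesis are not reproduced, and all names are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Point = Sequence[float]
Weight = Callable[[int, Point, List[List[float]]], float]


def sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def batch_update(data: List[Point], protos: List[List[float]], weight: Weight) -> List[List[float]]:
    """One batch step: every prototype becomes the weighted mean of all data points."""
    new_protos = []
    for j, w in enumerate(protos):
        resp = [weight(j, x, protos) for x in data]
        total = sum(resp)
        if total == 0.0:  # prototype received no weight: leave it where it is
            new_protos.append(list(w))
            continue
        new_protos.append([sum(r * x[d] for r, x in zip(resp, data)) / total
                           for d in range(len(w))])
    return new_protos


def kmeans_weight(j: int, x: Point, protos: List[List[float]]) -> float:
    """Hard assignment: weight 1 for the closest prototype only (ties go to the first)."""
    dists = [sq_dist(x, w) for w in protos]
    return 1.0 if j == dists.index(min(dists)) else 0.0


def neural_gas_weight(lam: float) -> Weight:
    """Rank-based weight exp(-rank / lam); lam -> 0 recovers batch k-means."""
    def weight(j: int, x: Point, protos: List[List[float]]) -> float:
        dj = sq_dist(x, protos[j])
        rank = sum(1 for w in protos if sq_dist(x, w) < dj)  # number of closer prototypes
        return math.exp(-rank / lam)
    return weight


def gaussian_weight(s: float) -> Weight:
    """Similarity weight exp(-||x - w||^2 / (2 s^2)); nearby prototypes drift together."""
    def weight(j: int, x: Point, protos: List[List[float]]) -> float:
        return math.exp(-sq_dist(x, protos[j]) / (2.0 * s * s))
    return weight


data = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
print(batch_update(data, [[1.0, 1.0], [9.0, 9.0]], kmeans_weight))
```

With the Gaussian weight, two prototypes started inside the same group converge to the same center, which is one way to see why that variant can help when the number of clusters is unknown.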
In this paper, we conduct experiments to optimize the learning rates for the Generalized Learning Vector Quantization (GLVQ) model. Our approach leverages insights from cognitive science rooted in the profound intricacies of human thinking. Recognizing that human-like thinking has propelled humankind to its current state, we explore the applicability of cognitive science principles in enhancing machine learning. Prior research has demonstrated promising results when applying learning rate methods inspired by cognitive science to Learning Vector Quantization (LVQ) models. In this study, we extend this approach to GLVQ models. Specifically, we examine five distinct cognitive science-inspired GLVQ variants: Conditional Probability (CP), Dual Factor Heuristic (DFH), Middle Symmetry (MS), Loose Symmetry (LS), and Loose Symmetry with Rarity (LSR). Our experiments involve a comprehensive analysis of the performance of these cognitive science-derived learning rate techniques across various datasets, aiming to identify optimal settings and variants of cognitive science GLVQ model training. Through this research, we seek to unlock new avenues for enhancing the learning process in machine learning models by drawing inspiration from the rich complexities of human cognition.
Keywords: machine learning, GLVQ, cognitive science, cognitive bias, learning rate optimization, optimizers, human-like learning, Conditional Probability (CP), Dual Factor Heuristic (DFH), Middle Symmetry (MS), Loose Symmetry (LS), Loose Symmetry with Rarity (LSR).
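For reference, the standard GLVQ step attracts the closest correct prototype and repels the closest wrong one, each scaled by the derivative of μ = (d⁺ − d⁻)/(d⁺ + d⁻). The sketch below uses squared Euclidean distance and the identity transfer function. The learning rate is a plain callable `learning_rate(t)`, a hypothetical hook where cognitively inspired schedules such as CP or DFH would plug in. Their formulas are not given in the abstract and are not reproduced here.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def closest_pair(x, y, protos, labels) -> Tuple[int, int, float, float]:
    """Indices and squared distances of the closest correct (+) and wrong (-) prototype."""
    d = [sq_dist(x, w) for w in protos]
    plus = min((i for i, lab in enumerate(labels) if lab == y), key=d.__getitem__)
    minus = min((i for i, lab in enumerate(labels) if lab != y), key=d.__getitem__)
    return plus, minus, d[plus], d[minus]


def glvq_mu(x, y, protos, labels) -> float:
    """Relative distance difference in [-1, 1]; negative means correctly classified."""
    _, _, dp, dm = closest_pair(x, y, protos, labels)
    return (dp - dm) / (dp + dm)


def glvq_step(x, y, protos: List[Vector], labels, lr: float) -> None:
    """One stochastic gradient step on mu (identity transfer function f(mu) = mu)."""
    plus, minus, dp, dm = closest_pair(x, y, protos, labels)
    denom = (dp + dm) ** 2
    if denom == 0.0:          # x coincides with both prototypes; gradient undefined
        return
    gp = 4.0 * dm / denom     # -dmu/dw+ = gp * (x - w+)
    gm = 4.0 * dp / denom     # -dmu/dw- = -gm * (x - w-)
    protos[plus] = [w + lr * gp * (xi - w) for w, xi in zip(protos[plus], x)]     # attract
    protos[minus] = [w - lr * gm * (xi - w) for w, xi in zip(protos[minus], x)]   # repel


def train_glvq(data, targets, protos: List[Vector], labels, epochs: int,
               learning_rate: Callable[[int], float]) -> List[Vector]:
    """learning_rate(t) is the hypothetical hook for a step-dependent schedule."""
    t = 0
    for _ in range(epochs):
        for x, y in zip(data, targets):
            glvq_step(x, y, protos, labels, learning_rate(t))
            t += 1
    return protos


def predict(x, protos, labels):
    d = [sq_dist(x, w) for w in protos]
    return labels[d.index(min(d))]
```

Because every schedule enters only through `learning_rate(t)`, different schedules can be compared on the same data and initialization, as the abstract describes.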
Convolutional neural networks (CNNs) have been among the most powerful and popular preprocessing techniques employed for image classification problems. Here, we use other signal processing techniques, such as the Fourier transform and the wavelet transform, to preprocess the images in conjunction with different classifiers, such as MLP, LVQ, GLVQ and GMLVQ, and compare their performance with that of a CNN.
Adversarial robustness of a nearest prototype classifier is a prerequisite for its safe deployment in sensitive fields of application. Much research has been conducted on the robustness of artificial neural networks against adversarial attacks, whereas research on nearest prototype classifiers has not seen comparable progress. This thesis presents the learning dynamics and numerical stability of the Crammer normalization and the Hein normalization with respect to the adversarial robustness of nearest prototype classifiers. The results of the experiments are reported and analyzed to verify the bounds given by Saralajew et al. and Hein et al. for the adversarial robustness of nearest prototype classifiers.
Differentiation is ubiquitous in mathematics, and especially in machine learning, where gradients are calculated for gradient-based models. Calculating gradients can be complex and may require handling multiple variables. Supervised Learning Vector Quantization models, which are used for classification tasks, also use Stochastic Gradient Descent to optimize their cost functions. There are various methods for calculating these gradients or derivatives, namely Manual Differentiation, Numeric Differentiation, Symbolic Differentiation, and Automatic Differentiation. In this thesis, we evaluate each of these methods for calculating derivatives and compare their use for variants of the Generalized Learning Vector Quantization algorithm.
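A simple robustness guarantee for nearest prototype classifiers follows from the triangle inequality: moving x by δ changes every Euclidean distance by at most ‖δ‖, so the decision cannot flip while ‖δ‖ < (d⁻ − d⁺)/2, the hypothesis margin used in margin analyses of LVQ. This sketch computes only that bound; it is not the specific bounds of Saralajew et al. or Hein et al., and the function names are hypothetical.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hypothesis_margin(x, y, protos, labels) -> float:
    """(d- - d+) / 2, with d+ (d-) the distance to the closest prototype of the
    correct (any other) class.

    If positive, no perturbation with Euclidean norm below this value can change
    the nearest-prototype decision. A negative value means x is misclassified.
    """
    d_plus = min(euclidean(x, w) for w, lab in zip(protos, labels) if lab == y)
    d_minus = min(euclidean(x, w) for w, lab in zip(protos, labels) if lab != y)
    return (d_minus - d_plus) / 2.0


def predict(x, protos, labels):
    d = [euclidean(x, w) for w in protos]
    return labels[d.index(min(d))]


protos, labels = [(1.0, 0.0), (5.0, 0.0)], [0, 1]
print(hypothesis_margin((0.0, 0.0), 0, protos, labels))  # certified radius for x = (0, 0)
```

The bound is conservative: for x = (0, 0) it certifies a radius of 2, while the decision boundary (the bisector at x₁ = 3) is actually 3 away.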
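To make the comparison concrete, the sketch below differentiates the GLVQ relative distance μ with respect to a one-dimensional correct prototype in three of the four ways named above: by hand (chain rule), numerically (central differences), and automatically (forward mode with dual numbers). Symbolic differentiation would need a computer algebra system and is omitted; the 1-D setting and all names are simplifying assumptions.

```python
from dataclasses import dataclass
from typing import Union

Number = Union[float, "Dual"]


@dataclass
class Dual:
    """Forward-mode automatic differentiation: val + der * eps with eps^2 = 0."""
    val: float
    der: float = 0.0

    def __add__(self, other: Number) -> "Dual":
        o = lift(other)
        return Dual(self.val + o.val, self.der + o.der)

    __radd__ = __add__

    def __sub__(self, other: Number) -> "Dual":
        o = lift(other)
        return Dual(self.val - o.val, self.der - o.der)

    def __rsub__(self, other: Number) -> "Dual":
        return lift(other) - self

    def __mul__(self, other: Number) -> "Dual":
        o = lift(other)
        return Dual(self.val * o.val, self.der * o.val + self.val * o.der)

    __rmul__ = __mul__

    def __truediv__(self, other: Number) -> "Dual":
        o = lift(other)
        return Dual(self.val / o.val, (self.der * o.val - self.val * o.der) / (o.val * o.val))

    def __rtruediv__(self, other: Number) -> "Dual":
        return lift(other) / self


def lift(v: Number) -> Dual:
    return v if isinstance(v, Dual) else Dual(float(v))


def mu_of_w(w, x: float, w_minus: float):
    """GLVQ relative distance for sample x, correct prototype w, wrong prototype w_minus."""
    d_plus = (x - w) * (x - w)
    d_minus = (x - w_minus) ** 2
    return (d_plus - d_minus) / (d_plus + d_minus)


def manual_derivative(w: float, x: float, w_minus: float) -> float:
    """Chain rule by hand: dmu/dd+ = 2 d- / (d+ + d-)^2 and dd+/dw = -2 (x - w)."""
    d_plus, d_minus = (x - w) ** 2, (x - w_minus) ** 2
    return 2.0 * d_minus / (d_plus + d_minus) ** 2 * (-2.0 * (x - w))


def numeric_derivative(w: float, x: float, w_minus: float, h: float = 1e-6) -> float:
    """Central finite difference; easy to write but subject to truncation and round-off."""
    return (mu_of_w(w + h, x, w_minus) - mu_of_w(w - h, x, w_minus)) / (2.0 * h)


def automatic_derivative(w: float, x: float, w_minus: float) -> float:
    """Seed der = 1 for the variable of interest and read off the derivative."""
    return mu_of_w(Dual(w, 1.0), x, w_minus).der
```

The dual-number version reuses `mu_of_w` unchanged and is exact up to floating-point rounding, while the finite difference needs a step size `h` chosen to balance truncation against cancellation error.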
In the past few years, generative models have become an interesting topic in the field of Machine Learning (ML). The Variational Autoencoder (VAE) is one of the popular frameworks for generative models, based on the work of D. P. Kingma and M. Welling [6] [7]. As an alternative to the VAE, the authors of [12] proposed and implemented an Information Theoretic Learning (ITL) based autoencoder. The VAE and the ITL autoencoder combine neural networks and probabilistic graphical models (PGM) [7]. In modern statistics, it is difficult to compute approximations of probability densities. In this paper, we make use of Variational Inference (VI), a machine learning technique that approximates distributions through optimization. The closeness between distributions is measured by information theoretic divergence measures such as the Kullback-Leibler, Euclidean and Cauchy-Schwarz divergences. In this thesis, we study theoretical and experimental results of two different generative model frameworks, which generate images of MNIST handwritten characters [8] and the Yale face database B [3]. The results show that the proposed VAE and ITL autoencoder are capable of generating the underlying structure of the example datasets.
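For univariate Gaussians, the divergences named above have closed forms, which makes them easy to check numerically. A minimal sketch, assuming one-dimensional Gaussians given by mean and variance (VAE latents are usually diagonal Gaussians, where the KL term sums over dimensions); the function names are hypothetical, and the Euclidean and Cauchy-Schwarz forms rely on the Gaussian product integral ∫N(x; m1, v1)N(x; m2, v2)dx = N(m1; m2, v1 + v2).

```python
import math


def kl_gaussians(m1: float, v1: float, m2: float, v2: float) -> float:
    """KL(N(m1, v1) || N(m2, v2)) for univariate Gaussians with variances v1, v2 > 0."""
    return 0.5 * (math.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)


def kl_to_standard_normal(m: float, v: float) -> float:
    """The usual VAE regulariser KL(N(m, v) || N(0, 1)) = 0.5 * (m^2 + v - 1 - ln v)."""
    return 0.5 * (m * m + v - 1.0 - math.log(v))


def gaussian_overlap(m1: float, v1: float, m2: float, v2: float) -> float:
    """Closed form of the integral of N(x; m1, v1) * N(x; m2, v2) over x."""
    v = v1 + v2
    return math.exp(-(m1 - m2) ** 2 / (2.0 * v)) / math.sqrt(2.0 * math.pi * v)


def cauchy_schwarz_divergence(m1: float, v1: float, m2: float, v2: float) -> float:
    """D_CS = -ln( int pq / sqrt(int p^2 * int q^2) ); zero iff p == q."""
    pq = gaussian_overlap(m1, v1, m2, v2)
    pp = gaussian_overlap(m1, v1, m1, v1)
    qq = gaussian_overlap(m2, v2, m2, v2)
    return -math.log(pq / math.sqrt(pp * qq))


def euclidean_divergence(m1: float, v1: float, m2: float, v2: float) -> float:
    """D_ED = int (p - q)^2 dx = int p^2 - 2 int pq + int q^2."""
    return (gaussian_overlap(m1, v1, m1, v1) - 2.0 * gaussian_overlap(m1, v1, m2, v2)
            + gaussian_overlap(m2, v2, m2, v2))
```

Unlike KL, both the Cauchy-Schwarz and the Euclidean divergence are symmetric in p and q, which is one practical difference between the ITL autoencoder and the standard VAE objective.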
Studying and understanding plant metabolism is essential to better adapt plants to future climate conditions. Computational models of plant metabolism can guide this process by providing a platform for fast and resource-saving in silico analyses. These models can be reconstructed following kinetic or stoichiometric approaches, with Flux Balance Analysis being one of the most common methods for stoichiometric models. Advances in metabolic modelling over the years include an increasing number of compartments, the automation of the reconstruction process, the modelling of plant-environment interactions and genetic variants, and temporally and spatially resolved models. In addition, there is a growing focus on introducing synthetic pathways into plants to increase their agricultural potential regarding yield, growth and nutritional value. One example is the β-hydroxyaspartate cycle (BHAC), which bypasses photorespiration. After implementing the BHAC in a stoichiometric C3 plant model, in silico flux analyses can help to understand the resulting metabolic changes. When compared with in vivo experiments on BHAC plants, the metabolic model reproduces most results, with exceptions regarding growth and oxaloacetate. To evaluate whether the BHAC is suitable for establishing a synthetic C4 cycle, the pathway is implemented in a two-cell-type model that is capable of running a C4 cycle. The results show that the BHAC is only beneficial under light limitation in the bundle sheath cell. Malate synthase is an additional engineering target for improved plant performance. This work serves as the basis for further analyses combining the different factors that boost the advantages of the BHAC, and for in vivo experiments in C3 and C4 plants.
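Flux Balance Analysis treats a stoichiometric model as a linear program: maximise an objective flux subject to the steady-state constraint S v = 0 and lower and upper bounds on each flux. The sketch below solves a hypothetical four-reaction toy network with SciPy's `linprog`; it is not the C3 or two-cell C4 model from the thesis, and the network, bounds and objective are invented for illustration.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network (NOT the thesis's plant model).
# Metabolites (rows): A, B
# Reactions (columns): R1: -> A (uptake), R2: A -> B, R3: A -> (secretion), R4: B -> (biomass)
S = np.array([
    [1, -1, -1,  0],   # A
    [0,  1,  0, -1],   # B
], dtype=float)
bounds = [(0, 10), (0, 6), (0, 1000), (0, 1000)]  # R2 capacity is limited to 6

# FBA: maximise the biomass flux v4 subject to S v = 0 and the flux bounds.
# linprog minimises, so the objective vector is negated.
c = np.array([0.0, 0.0, 0.0, -1.0])
res = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]), bounds=bounds, method="highs")

print("status:", res.status, "max biomass flux:", -res.fun)
print("one optimal flux distribution:", res.x)
```

Because the capacity of R2 is the bottleneck, the optimal biomass flux equals that bound (6 here), while the split between uptake and secretion is not unique. Such alternative optima are typical of FBA solutions.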
In machine learning, Learning Vector Quantization (LVQ) is a well-known supervised vector quantization method. LVQ has been studied as a way to generate optimal reference vectors because its learning algorithm is simple and fast [2]. In many classification tasks, different variants are considered while training a model, and considering large-margin variants of LVQ helps to obtain significant results [20]. Large margin LVQ (LMLVQ) aims to maximize the distance between the decision hyperplane and the data points. In this thesis, different variants of Generalized Learning Vector Quantization (GLVQ) and large margin LVQ are compared, along with visualization, implementation and experimental results.
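As an illustration of the quantities involved, the sketch below trains a minimal GLVQ model on hypothetical two-class Gaussian data. It uses one prototype per class, the squared Euclidean distance, an identity transfer function and plain stochastic gradient descent. It then measures each sample's Euclidean distance to the decision hyperplane, which for two prototypes is their perpendicular bisector. This margin is the kind of quantity that large margin LVQ variants aim to maximize; the code does not implement LMLVQ itself, and all names, hyperparameters and data are assumptions.

```python
import numpy as np

def train_glvq(X, y, lr=0.1, epochs=20, seed=0):
    """Minimal GLVQ: one prototype per class, squared Euclidean distance,
    identity transfer function, SGD on mu = (d+ - d-) / (d+ + d-)."""
    rng = np.random.default_rng(seed)
    classes = np.unique(y)
    W = np.array([X[y == c].mean(axis=0) for c in classes])  # class-mean initialisation
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            x = X[i]
            d = np.sum((W - x) ** 2, axis=1)
            own = np.flatnonzero(classes == y[i])[0]
            others = np.flatnonzero(classes != y[i])
            rival = others[np.argmin(d[others])]  # closest prototype of a wrong class
            dp, dm = d[own], d[rival]
            scale = 4.0 / (dp + dm) ** 2
            W[own] += lr * scale * dm * (x - W[own])      # attract the correct prototype
            W[rival] -= lr * scale * dp * (x - W[rival])  # repel the closest wrong one
    return classes, W

def predict(classes, W, X):
    """Nearest-prototype classification."""
    d = ((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=2)
    return classes[np.argmin(d, axis=1)]

def hyperplane_margin(x, w1, w2):
    """Euclidean distance from x to the perpendicular bisector of w1 and w2."""
    d1, d2 = np.sum((x - w1) ** 2), np.sum((x - w2) ** 2)
    return abs(d2 - d1) / (2 * np.linalg.norm(w1 - w2))

# Hypothetical two-class toy data
rng = np.random.default_rng(1)
X = np.vstack([rng.normal([-2.0, 0.0], 0.5, (50, 2)),
               rng.normal([2.0, 0.0], 0.5, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

classes, W = train_glvq(X, y)
accuracy = np.mean(predict(classes, W, X) == y)
margins = np.array([hyperplane_margin(x, W[0], W[1]) for x in X])
print("accuracy:", accuracy, "smallest margin:", margins.min())
```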
In this work, we identify similarities between Adversarial Examples and Counterfactual Explanations, extend the differences stated in previous works to further aspects such as dimensionality and transferability, and try to observe these similarities and differences for different classifiers on tabular and image data. We note that this topic is an open discussion; the work presented here is not definitive and can be extended or modified in the future if new discoveries are made.
A relatively new research field of neuroscience, called Connectomics, aims to achieve a full understanding and mapping of the neural circuits and fine neuronal structures of the nervous system in a variety of organisms. This detailed information will provide insight into how our brain is influenced by different genetic and psychiatric diseases, how memory traces are stored, and how ageing influences our brain structure. It is beyond question that new methods for data acquisition will produce large amounts of neuronal image data. This data will exceed the zettabyte range and is impossible to annotate manually for visualization and analysis. Nowadays, machine learning algorithms, and especially deep convolutional neural networks, are heavily used in medical imaging and computer vision, which brings the opportunity to design fully automated pipelines for image analysis. This work presents a new automated workflow based on three major parts: image processing using consecutive deep convolutional networks, a pixel-grouping step called connected components, and 3D visualization via neuroglancer, to achieve a dense three-dimensional reconstruction of neurons from EM image data.
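The pixel-grouping step can be illustrated with SciPy's `ndimage.label`, which assigns one integer label to each connected region of a binary volume. The small volume below is hypothetical; in a real pipeline the mask would come from thresholding the networks' output, and the choice of 6- versus 26-connectivity is our assumption, not a detail stated in the thesis.

```python
import numpy as np
from scipy import ndimage

# Hypothetical binary volume (z, y, x), e.g. a thresholded network output.
mask = np.zeros((4, 6, 6), dtype=bool)
mask[0:2, 0:2, 0:2] = True   # first object: 2 x 2 x 2 = 8 voxels
mask[2:4, 3:6, 3:6] = True   # second object: 2 x 3 x 3 = 18 voxels, not touching the first

# Face connectivity (6-neighbourhood); connectivity=3 would give the 26-neighbourhood.
structure = ndimage.generate_binary_structure(rank=3, connectivity=1)
labels, n_objects = ndimage.label(mask, structure=structure)

sizes = np.bincount(labels.ravel())[1:]  # voxel count per label, skipping background 0
print(n_objects, sizes)
```

Each resulting label corresponds to one candidate neuron segment, which is the form a viewer such as neuroglancer can display as a segmentation layer.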
In this master thesis, we define a new bivariate polynomial, which we call the defensive alliance polynomial and denote by da(G; x; y). It is a generalization of the alliance polynomial and the strong alliance polynomial. We show the relation between da(G; x; y) and the alliance, the strong alliance and the induced connected subgraph polynomials, as well as the cut vertex sets polynomial. We investigate the information about G that is encoded in da(G; x; y). We discuss the defensive alliance polynomial for path graphs, cycle graphs, star graphs, double star graphs, complete graphs, complete bipartite graphs, regular graphs, wheel graphs, open wheel graphs, friendship graphs, triangular book graphs and quadrilateral book graphs. We also prove that these classes of graphs are characterized by their defensive alliance polynomials. We present the defensive alliance polynomial of the graph formed by attaching a vertex to a complete graph. Finally, we show two pairs of graphs that are not characterized by the alliance polynomial but are characterized by the defensive alliance polynomial.
We also present three notes on results in the literature: the first improves a bound, and the other two are counterexamples.
In this thesis, two novel methods for removing undesired background illumination are developed: a wavelet-analysis-based approach and an enhancement of a deep learning method. These methods have been compared with conventional methods using real confocal microscopy images and synthetically generated microscopy images. The synthetic images were created with a generator introduced in this thesis.
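One simple wavelet-style way to estimate slowly varying background illumination is to keep only the coarsest Haar approximation of the image and subtract it. The sketch below is a minimal stand-in under that assumption, not the thesis's wavelet method. It uses the fact that reconstructing a separable 2-D Haar decomposition from its coarse approximation coefficients alone gives block means. The function names, the number of levels and the test image are hypothetical.

```python
import numpy as np

def haar_background(img, levels):
    """Background estimate from the coarse Haar approximation.

    Reconstructing a separable 2-D Haar decomposition from only its level-`levels`
    approximation coefficients yields the mean of every 2**levels x 2**levels block,
    so the estimate is a piecewise-constant block-mean image.
    """
    b = 2 ** levels
    h, w = img.shape
    if h % b or w % b:
        raise ValueError("image sides must be multiples of 2**levels")
    block_means = img.reshape(h // b, b, w // b, b).mean(axis=(1, 3))
    return np.repeat(np.repeat(block_means, b, axis=0), b, axis=1)

def remove_background(img, levels=3):
    """Subtract the coarse background and clip negative intensities to zero."""
    return np.clip(img - haar_background(img, levels), 0.0, None)

# Hypothetical 64 x 64 image: smooth illumination ramp plus a small bright structure
xx = np.tile(np.arange(64, dtype=float), (64, 1))
img = 0.01 * xx
img[30:32, 30:32] += 5.0
corrected = remove_background(img, levels=3)
print(corrected.max(), corrected[:16, :16].max())
```

The block-shaped background is a known weakness of a pure Haar approximation; smoother wavelets or more levels trade block artefacts against how much fine structure leaks into the background estimate.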
In the field of satellites, it is common practice to combine multiple ground stations into one network to increase communication times with satellites. This work focuses on TIM, an international academic collaborative project. Important criteria for this project are elaborated and used to evaluate existing ground station networks. The evaluation concludes that no appropriate solution is available for this specific use case, and a solution is therefore proposed. The proposed ground station network software is described in detail and evaluated.
Traditional user management on the Internet has historically required individuals to give up control over their identities. In contrast, decentralized solutions promise to empower users and foster decentralized interactions. Over the last few years, the development of decentralized accounts and tokens has significantly increased, aiming at broader user adoption and shared social economies.
This thesis delves into smart contract standards and social infrastructure for Ethereum-based blockchains to enable identity-based data exchange between abstracted blockchain accounts. In this regard, the standardization landscapes of account and social token developments were analyzed in depth to form guidelines that allow users to retain complete control over their data and grant access selectively.
Based on the evaluations, a pioneering Solidity standard is presented, natively integrating consensual restrictive on-chain assets for abstracted blockchain accounts. Further, the architecture of a decentralized messaging service has been defined to outline how new token and account concepts can be intertwined with efficient and minimal data-sharing principles to ensure security and privacy, while merging traditional server environments with global ledgers.
The objective of this Bachelor Project is to create a tool that supports forensic investigators during IT forensic interventions. It uses Kismet as the base program and adds functionality to it via the plugin interface. The thesis explains how to install the plugin and how it works, and gives a recommendation on how to use it. To convey the underlying basics, an introduction to WLAN and Bluetooth is given. The tests performed with the new plugin and their results are described. Based on these results, it is briefly discussed why the tool is suitable for locating Wi-Fi devices, especially access points, but not Bluetooth devices. Finally, a few ideas on how to improve the tool and on what could be researched further in this area are provided.
Object detection and classification is an active field of research in machine learning and computer vision. Depending on the application, there are different limitations to adjust to, but also possibilities to take advantage of. In this thesis, we focus on the classification and detection of video sequences recorded at night-time. The proposed method is robust since it uses image thresholding [8], which is commonly used in other methods, together with histograms of oriented gradients (HOG) [37] as features and a support vector machine (SVM) [74] as classifier. It is important that the features extracted from the images are robust and distinctive enough to help the classifier distinguish between high-beam and low-beam. The classifier is the part of the object detection that predicts which of the two groups a test image belongs to; in our case, whether an image belongs to a high-beam or a low-beam sequence.
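The core of a HOG feature is a magnitude-weighted histogram of gradient orientations. The sketch below computes such a histogram for a single cell only; a full HOG descriptor as in [37] additionally splits the image into a grid of cells, normalises over overlapping blocks and concatenates the results before they are passed to the SVM [74]. The cell-only simplification, the 9-bin default and the test patches are our assumptions.

```python
import numpy as np

def orientation_histogram(cell, n_bins=9):
    """HOG-style descriptor of a single cell: unsigned gradient orientations
    (0-180 degrees) binned into n_bins, weighted by gradient magnitude, L2-normalised."""
    gy, gx = np.gradient(cell.astype(float))  # derivatives along rows (y) and columns (x)
    magnitude = np.hypot(gx, gy)
    angle = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    bins = np.minimum((angle // (180.0 / n_bins)).astype(int), n_bins - 1)
    hist = np.bincount(bins.ravel(), weights=magnitude.ravel(), minlength=n_bins)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist

vertical_edge = np.zeros((8, 8))
vertical_edge[:, 4:] = 1.0      # intensity changes along x
horizontal_edge = np.zeros((8, 8))
horizontal_edge[4:, :] = 1.0    # intensity changes along y
print(np.argmax(orientation_histogram(vertical_edge)),
      np.argmax(orientation_histogram(horizontal_edge)))
```

A vertical edge puts its weight into the 0-degree bin and a horizontal edge into the 80-100 degree bin, which is the kind of structural difference the SVM can exploit when separating high-beam from low-beam frames.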
In the present bachelor thesis, nanopore sequencing and Illumina sequencing were compared using pollen DNA collected from honeybees and bumble bees. For this purpose, nanopore sequencing was performed with MinION sequencers, and the generated reads were analysed with bash programming. A quantitative and a qualitative BLAST run (based on ITS2 sequences) were performed. The results confirm the error probability of nanopore sequencing described in the literature. Nevertheless, similar sample preferences of the bees could be observed with both sequencing methods, allowing ecological conclusions.
Classification label security determines the extent to which labels predicted by a classifier can be trusted. The uncertainty surrounding a classification label is quantified by the security with which the classification is made. Classification label security is therefore very significant for decision-making whenever we are faced with a classification task. This thesis investigates how to determine classification label security using the fuzzy probabilistic assignments of Fuzzy c-means. The investigation is accompanied by implementation, experimentation, visualization and documentation of the results.
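Fuzzy c-means assigns every sample a membership in each cluster, and these memberships sum to one per sample. A minimal sketch of how such assignments could yield a security score is shown below. The cluster centers are taken as given (the iterative FCM center updates are not shown), and using the largest membership as the security value is a hypothetical choice, not necessarily the definition used in the thesis.

```python
import numpy as np

def fcm_memberships(X, centers, m=2.0, eps=1e-12):
    """Fuzzy c-means memberships for fixed cluster centers:
    u[i, k] = 1 / sum_j (d_ik / d_ij) ** (2 / (m - 1)), with Euclidean distances d."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + eps
    ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
    return 1.0 / ratio.sum(axis=2)  # each row sums to 1

def label_security(U):
    """Hypothetical security score: the largest membership of each sample."""
    return U.max(axis=1)

centers = np.array([[0.0, 0.0], [4.0, 0.0]])        # hypothetical cluster centers
X = np.array([[0.1, 0.0], [2.0, 0.0], [3.5, 0.0]])  # near A, undecided, near B
U = fcm_memberships(X, centers)
print(U.round(3), label_security(U).round(3))
```

A sample halfway between the two centers receives memberships of 0.5 each, so its label would be flagged as insecure, whereas samples close to a center get a security near 1.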
This thesis deals with the development of a methodology and concept to analyse targeted attacks against IIoT/IoT devices. Building on the established background knowledge about honeypots, fileless malware and injection techniques, a methodology is created that leads to the concept of a honeypot analysis system. The system is designed to analyse and detect novel threats such as fileless attacks, which are often used by Advanced Persistent Threats. The system is partially implemented and then evaluated by performing a simulated fileless attack. Its effectiveness is discussed and rated based on the results.
In this work, a protocol for portable nanopore sequencing of DNA from pollen collected from honey bees, bumble bees, and wild bees was developed. DNA metabarcoding is applied to identify genera within the mixed DNA samples. The DNA extraction and the ITS and ITS2 PCR parameters tested for this purpose were applied to the collected pollen samples, and the amplicons were then sequenced using the Flongle sequencer adapter from Oxford Nanopore Technologies. It is shown that the main pollinator resources at the different sites can be identified and expressed as percentage proportions. The protocol developed in this study can be used for further ecological questions.
The endogenous steroid hormone 17β-estradiol (E2) is a central player in a wide range of physiological and behavioral processes and diseases in vertebrates. As a consequence, it is a main target for molecular design and drug discovery efforts in medicine and environmental sciences, which requires in-depth knowledge of protein-ligand binding processes. This work develops a bioinformatic framework based on local and global structure similarity for the characterization of E2-protein interactions in all 35 publicly available three-dimensional structures of estradiol-protein complexes. Subsequently, it uses the gained data to identify four geometrically conserved estradiol-binding residue motifs, against which the Protein Data Bank is queried. This database query yields 15 hits in seven protein structures. Five of these structures do not contain E2 as a ligand and had thus not been included in this work’s initial data set. One of these newly detected structures is structurally and functionally dissimilar, as well as evolutionarily distant, from all other proteins analyzed in this work. However, whether this protein actually binds estradiol must be analyzed further. Finally, geometrically conserved E2-protein interactions are identified, and a new research direction using these conserved interaction ensembles to detect novel estradiol targets is proposed.
Data streams change their statistical behaviour over time. These changes can occur gradually or abruptly for unforeseen reasons, which may affect the expected outcome. It is therefore important to detect concept drift as soon as it occurs. In this thesis, we chose a distance-based methodology to detect the presence of concept drift in data streams. We used generalized learning vector quantization (GLVQ) and generalized matrix learning vector quantization (GMLVQ) classifiers to calculate distances between prototypes and data points. Chi-square and Kolmogorov–Smirnov tests are used to compare the distance distributions of the test and training data sets and thus indicate the presence of drift.
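The distance-based idea can be sketched in a few lines: compute each sample's distance to its closest prototype, then compare the reference and incoming distance distributions with a two-sample Kolmogorov-Smirnov test (SciPy's `ks_2samp`). The prototypes below are fixed, hypothetical values standing in for a trained GLVQ model. The chi-square variant and GMLVQ's learned relevance matrix are not shown, and the significance level is an assumption.

```python
import numpy as np
from scipy.stats import ks_2samp

def distances_to_prototypes(X, W):
    """Squared Euclidean distance of every sample to its closest prototype."""
    d = ((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=2)
    return d.min(axis=1)

def detect_drift(X_ref, X_new, W, alpha=0.05):
    """Two-sample Kolmogorov-Smirnov test on the distance distributions."""
    result = ks_2samp(distances_to_prototypes(X_ref, W),
                      distances_to_prototypes(X_new, W))
    return result.pvalue < alpha, result.pvalue

rng = np.random.default_rng(0)
W = np.array([[-2.0, 0.0], [2.0, 0.0]])  # hypothetical prototypes of a trained GLVQ model
X_train = rng.normal(0.0, 0.5, (500, 2)) + W[rng.integers(0, 2, 500)]
X_drifted = rng.normal(0.0, 0.5, (500, 2)) + W[rng.integers(0, 2, 500)] + [0.0, 1.5]
print(detect_drift(X_train, X_drifted, W))
```

In a streaming setting, the same test would be applied to a sliding window of recent samples against a reference window from training time.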
DropConnect (the generalization of Dropout) is a very simple regularization technique that was introduced a few years ago and has become extremely popular because of its simplicity and effectiveness. In this thesis, a suitable architecture for applying DropConnect to Learning Vector Quantization networks is proposed, along with a reference implementation and experimental results. In many classification tasks, the uncertainty of the model is a vital piece of information for experts. Methods to extract uncertainty and stability using DropConnect are also proposed, and the corresponding experimental results are documented.
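A minimal sketch of how DropConnect-style sampling can yield a stability estimate for a nearest-prototype classifier is shown below. The assumption (ours, not necessarily the thesis's architecture) is that each connection between an input feature and a prototype is dropped with probability p, so that the feature is left out of that prototype's squared distance. Repeating the prediction many times gives vote frequencies per class. Prototypes, labels and p are hypothetical.

```python
import numpy as np

def dropconnect_votes(x, W, proto_labels, p=0.2, n_passes=200, seed=0):
    """Monte-Carlo DropConnect for a nearest-prototype classifier (illustrative).

    In every pass each (prototype, feature) connection is dropped with
    probability p, i.e. that feature is left out of that prototype's squared
    distance. The returned vote frequencies indicate how stable the decision is.
    """
    rng = np.random.default_rng(seed)
    counts = {}
    for _ in range(n_passes):
        keep = rng.random(W.shape) >= p               # Bernoulli mask, True = connection kept
        d = np.sum(keep * (x - W) ** 2, axis=1)       # masked squared Euclidean distances
        label = proto_labels[int(np.argmin(d))]
        counts[label] = counts.get(label, 0) + 1
    return {label: c / n_passes for label, c in counts.items()}

W = np.array([[0.0, 0.0], [3.0, 3.0]])  # hypothetical prototypes
labels = [0, 1]
print(dropconnect_votes(np.array([0.2, 0.1]), W, labels))  # clear case
print(dropconnect_votes(np.array([1.5, 1.5]), W, labels))  # ambiguous case
```

A sample close to one prototype keeps its label in almost every pass, while a sample between the prototypes receives split votes, which can be read as higher model uncertainty.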
Large bone defects are a major clinical problem that disproportionately affects the elderly, particularly in developed countries, where this population is growing fastest. Current treatments include autologous and allogeneic bone grafts, bone elongation with the Ilizarov technique, bone graft substitutes, and electrical stimulation. Each of these approaches enjoys varying degrees of success; however, each also has its associated problems and complications. A new, still experimental treatment is Tissue Engineering, which combines scaffolds, osteogenic stem cells and growth factors and is showing encouraging early results in preclinical and initial clinical studies.
Electrical stimulation has been shown to enhance bone healing by promoting mesenchymal stem cell migration, proliferation, and differentiation. In the present study, we combine Tissue Engineering with Electrical Stimulation and hypothesize that this combined approach will have a synergistic effect resulting in enhanced new bone formation. In our in vitro experiments, we observed that the levels of electrical stimulation we tested had no cytotoxic effect and instead increased osteogenic differentiation, as determined by enhanced expression of the osteogenic marker Alkaline Phosphatase. These findings support our hypothesis by demonstrating that electrical stimulation promotes bone formation in the tissue-engineering environment. The bioinformatics part of this project consisted of gene network analysis, identification of the top 10 osteogenic markers and analysis of gene-gene interactions. We observed that, in studies of stem cells from both human and rat, the genes BMPR1A, BMP5, TGFβR1, SMAD4, SMAD2, BMP4, BMP7, RUNX3, and CDKN1A are associated with osteogenesis and interact with each other. We observed a total of 31 interactions for human and 29 interactions for rat stem cells. While this approach needs to be proven experimentally, we believe that these in vitro and in silico analyses can complement each other and, in doing so, contribute to the field of bone healing research.
Embeddings for Product Data
(2022)
The e-commerce industry has grown exponentially in the last decade, with giants like Amazon, eBay, AliExpress, and Walmart selling billions of products. Machine learning techniques can be used within the e-commerce domain to improve the overall customer journey on a platform and to increase sales. Product data in particular can be used for various applications, such as product similarity, clustering, recommendation, and price estimation. For product data to be used in such applications, we have to perform feature engineering: the products are transformed into feature vectors before a machine learning model is trained on them. In this thesis, we propose an approach to create representations for heterogeneous product data from Unite’s platform, which is available as structured tabular records. These tables consist of attributes containing different information, ranging from product IDs to long descriptions. Our model combines popular deep learning approaches from natural language processing to create numerical representations for all products in which most elements of the array or matrix are non-zero, so-called dense representations. To evaluate the quality of these feature vectors, we validate how well the similarities between products are captured by the dense representations. The evaluations are divided into two categories. The first category directly compares the similarities between individual products. The second category uses the dense vectors as inputs to one of the above-mentioned applications and evaluates their quality based on the accuracy or performance of that application. As a result, we explain the impact of the different steps within our model on the quality of the learned representations.
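The first evaluation category, comparing similarities between individual products, typically boils down to a nearest-neighbour search under cosine similarity. The sketch below assumes the dense vectors have already been produced by some embedding model; the four vectors are hypothetical and do not come from the thesis's model or data.

```python
import numpy as np

def most_similar(embeddings, query_index, k=3):
    """Indices and cosine similarities of the k products closest to the query
    product in embedding space (the query itself is excluded)."""
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = unit @ unit[query_index]
    sims[query_index] = -np.inf
    order = np.argsort(-sims)[:k]
    return order, sims[order]

# Hypothetical dense vectors for four products
emb = np.array([
    [0.9, 0.1, 0.3],
    [0.8, 0.2, 0.35],
    [0.1, 0.9, 0.0],
    [0.0, 0.2, 0.95],
])
idx, sims = most_similar(emb, query_index=0, k=2)
print(idx, sims.round(3))
```

Normalising once and using a matrix product keeps the search cheap; for millions of products, an approximate nearest-neighbour index would replace the full product.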
Aminoacyl-tRNA synthetases (aaRSs) are key enzymes in the process of protein biosynthesis, charging tRNA molecules with their corresponding amino acid. Whereas adenosine phosphate fixation is common to all aaRSs, recognition of the respective amino acid to ensure correct translation poses a complex task, which is still not understood to its full extent. Using all aaRS structures in the Protein Data Bank (PDB), this thesis reveals further details about the specificity-determining interactions of each aaRS. Moreover, inspection of the similarities between these enzymes using the structure-derived interaction data reinforces the sequence-based evolutionary trace of aaRSs to a certain degree: the concurrent development of two distinct Classes of aaRS is apparent at the functional level, and previously determined evolutionary subclasses coincide altogether with specific aminoacyl recognition in each aaRS Type. Still, discrimination of amino acids in aaRSs involves a multitude of further relevant mechanisms. Finally, analysis of specificity-relevant binding site interactions sheds light on how aaRSs evolved to distinguish different amino acids.
In this work, the task is to cluster microarray gene expression data of the cyanobacterium Nostoc PCC 7120 to detect messenger RNA (mRNA) degradation patterns. The aim is to find characteristic degradation patterns caused by specific enzymes (ribonucleases), allowing further biological investigation of the underlying biochemical mechanisms. mRNA degradation is part of the regulation of gene expression because it controls the amount and longevity of the mRNA available for translation into proteins. A particular class of RNA-degrading enzymes are exoribonucleases, which degrade the molecule from its ends, whereby degradation from the 5’ end, the 3’ end or both ends is theoretically possible.
In this investigation, the information about exoribonucleolytic degradation is given in a microarray data set containing expression values of 1,251 genes. The data set provides gene expression vectors containing the expression values of up to ten short distinct sections of a gene, ordered from the gene’s 5’ end to its 3’ end. For each gene, expression vectors are available for both nitrogen-fixing and non-nitrogen-fixing conditions, which have to be considered separately for biological reasons. Accordingly, after filtering and preprocessing, two data sets for clustering are obtained, consisting of 133 ten-dimensional expression vectors. The similarity of the expression vectors is judged by a new correlation-based similarity measure and compared with the results obtained using the Euclidean distance. A non-linear transformation of the correlations was applied to obtain a dissimilarity measure. By choosing the parameters of this transformation, the user can differentiate between negatively and positively correlated gene expression vectors and adjust the measure adequately to the noise level of the gene expression values.
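The exact transformation used in the thesis is not given in this summary, so the sketch below uses a hypothetical one to show the principle. Pearson correlations between expression profiles are mapped to dissimilarities. A `signed` switch decides whether anti-correlated profiles are treated as maximally distant or as similar, and an exponent `beta` shapes how strongly small differences (for example noise-level ones) are compressed. Both parameters and the toy profiles are our assumptions.

```python
import numpy as np

def correlation_dissimilarity(X, beta=1.0, signed=True):
    """Pairwise dissimilarities between the rows of X from Pearson correlations r.

    signed=True : d = ((1 - r) / 2) ** beta  -> anti-correlated profiles are maximally distant
    signed=False: d = (1 - |r|) ** beta      -> anti-correlated profiles count as similar
    beta > 1 shrinks small dissimilarities (highly correlated pairs) towards zero,
    which can make the measure more tolerant of noise. Hypothetical transformation.
    """
    r = np.clip(np.corrcoef(X), -1.0, 1.0)
    base = (1.0 - r) / 2.0 if signed else 1.0 - np.abs(r)
    return base ** beta

# Hypothetical 4-section expression profiles (5' -> 3')
X = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],   # perfectly correlated with row 0
    [4.0, 3.0, 2.0, 1.0],   # perfectly anti-correlated with row 0
])
print(correlation_dissimilarity(X).round(3))
print(correlation_dissimilarity(X, signed=False).round(3))
```

Unlike the Euclidean distance, such a correlation-based measure ignores the absolute expression level and compares only the shape of the profile along the gene, which is what matters for degradation patterns.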
Clustering was performed using Affinity Propagation (AP). The number of clusters obtained by AP depends on the so-called self-similarity of the data vectors. This dependence was used to identify stable cluster solutions by self-similarity control. To evaluate the clustering results, Median Fuzzy c-Means (M-FCM) was used. Further, several cluster validity measures are applied, and visual inspections by t-distributed Stochastic Neighbor Embedding (t-SNE) as well as cluster visualizations are provided for the mathematical interpretation of the clusters.
To validate the clustering results biologically, the found data structure is checked for biological adequacy. A deeper investigation into the mechanisms behind mRNA degradation was achieved using an RNA-Seq data set. Its 40 bp (base pair) long reads for non-nitrogen-fixing and nitrogen-fixing conditions were assembled using the bacteria-specific ab initio assembly of Rockhopper, yielding the mRNA (transcript) sequences of the clustered genes. The untranslated regions (UTRs) are investigated further, based on the assumption that exoribonucleases recognize specific transcript sequences outside the annotated gene regions as their binding sites. These UTRs need to be analyzed regarding sequence similarity using motif-finding algorithms.
To investigate the effects of climate change on interactions within ecosystems, a microcosm experiment was conducted. The effects of temperature increase and predator diversity on Collembola communities and their decomposition rate were investigated. The predators used were mites and chilopods, whose predation effects on several response variables were analysed. These response variables included Collembola abundance, biomass and body mass, as well as basal respiration and microbial biomass carbon, and were tested against the predictors in several models. Temperature showed high significance in interaction with mite abundance in almost all models. Furthermore, the results for basal respiration and microbial biomass carbon support the suggestion of a trophic cascade within the animal interactions.
This thesis comprehensively explores factors contributing to malaria-induced anemia and severe malarial anemia (SMA). The study uses an extensive dataset to investigate immunological interactions, genetic variations, and temporal dynamics. Findings highlight the complex interplay between immune markers, genetic traits, and cohort-specific influences. Notably, age, HIV status, and genetic variations emerge as crucial factors influencing anemia risk. The incorporation of Poisson regression models sheds light on the genetic underpinnings of SMA, emphasizing the need for personalized interventions. Overall, this research provides valuable insights into the multifaceted nature of malaria-induced complications, paving the way for further molecular investigations and targeted interventions.
The aim of this bachelor thesis was to establish extracytoplasmic function (ECF) σ factors as synthetic genetic regulators for biotechnological and synthetic biology applications in the newly emerging model organism Vibrio natriegens. For this purpose, synthetic genetic circuits were engineered on plasmids as a test set-up for the investigated ECFs and their target promoters. The resulting plasmid library consisted of reporter plasmids carrying the target promoter fused to a lux cassette, a set of high-copy ECF plasmids and a backup set of lower-copy ECF plasmids. First, the high-copy plasmids were transformed into V. natriegens to test their functionality at different inducer levels, which yielded good inducibility for a few strains but excessive ECF expression in most. For this reason, the set of lower-copy plasmids was used for combinatorial co-transformation to investigate the ECFs for cross-talk with unspecific ECF target promoters. Switching to the lower-copy plasmid set appeared to be partly helpful, although much room for fine-tuning the circuits remains. The knowledge gained can be used to achieve higher success rates when engineering synthetic circuits for various applications in V. natriegens, by using the ECFs recommended here as suitable synthetic genetic regulators.
Fermat proposed Fermat’s little theorem in 1640, but a proof was not officially published until 1736. In this thesis, we mainly focus on different proofs of Fermat’s little theorem, such as the combinatorial proof by counting necklaces, multinomial proofs, the proof by modular arithmetic, the dynamical systems proof and the group theory proof. We also concentrate on the generalizations of Fermat’s little theorem given by Euler and Laplace. Euler was the first to publish a proof of Fermat’s little theorem, and we go through three different proofs he gave. The theorem has many applications in mathematics and cryptography. We focus on its applications in cryptography, such as primality testing and public-key cryptography. A primality test determines whether a given number n is prime or composite. We concentrate on the Fermat primality test and the Miller-Rabin primality test, which is an extension of the Fermat primality test. We also discuss the most widely used public-key cryptosystem, the RSA algorithm, named after its developers R. Rivest, A. Shamir, and L. Adleman. The algorithm was published in 1978 and depends heavily on Fermat’s little theorem.
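The three applications can be illustrated with Python's built-in three-argument `pow`. The sketch below shows a Fermat test, a Miller-Rabin test and textbook RSA with tiny primes (insecure, illustration only). The Carmichael number 561 shows why Miller-Rabin is needed: base 2 passes the Fermat test for 561 but is a Miller-Rabin witness of compositeness. The chosen bases and RSA parameters are illustrative assumptions; `pow(e, -1, phi)` requires Python 3.8 or later.

```python
def fermat_probable_prime(n, bases=(2, 3, 5, 7)):
    """Fermat test: for a prime n, a^(n-1) = 1 (mod n) for every base 1 < a < n.
    A base violating this proves n composite; passing all bases proves nothing."""
    if n < 4:
        return n in (2, 3)
    return all(pow(a, n - 1, n) == 1 for a in bases if a < n)

def miller_rabin(n, bases=(2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)):
    """Miller-Rabin test; with these twelve prime bases it is known to be
    deterministic for all n < 2**64."""
    if n < 2:
        return False
    for p in bases:
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:          # write n - 1 = 2^s * d with d odd
        d //= 2
        s += 1
    for a in bases:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False       # a is a witness that n is composite
    return True

def toy_rsa(p=61, q=53, e=17, message=65):
    """Textbook RSA with tiny primes (insecure, for illustration only)."""
    n, phi = p * q, (p - 1) * (q - 1)
    d = pow(e, -1, phi)        # private exponent: e * d = 1 (mod phi)
    ciphertext = pow(message, e, n)
    return n, d, ciphertext, pow(ciphertext, d, n)

# 561 = 3 * 11 * 17 is a Carmichael number: base 2 fools Fermat but not Miller-Rabin
print(fermat_probable_prime(561, bases=(2,)), miller_rabin(561, bases=(2,)))  # True False
print(toy_rsa())
```

Decryption recovers the message because e * d = 1 (mod phi(n)), which is where Euler's generalization of Fermat's little theorem enters the correctness proof of RSA.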
Financial fraud can cause huge monetary losses for banks. Studies have shown that, if not mitigated, financial fraud can lead to bankruptcy for big financial institutions and even insolvency for individuals. Credit card fraud is a type of financial fraud that is ever growing. These numbers are expected to increase exponentially in the future, which is why many researchers are focusing on machine learning techniques for detecting fraud. This task, however, is not simple, mainly for two reasons:
• varying behaviour in committing fraud
• high level of imbalance in the dataset (normal or genuine cases far outnumber fraudulent cases)
When an imbalanced dataset is provided as input, a predictive model usually tends to be biased towards the majority class.
In this thesis, this problem is tackled with a data-level approach in which different resampling methods, such as undersampling, oversampling and hybrid strategies, along with bagging and boosting algorithmic approaches, are applied to a highly skewed dataset with 492 identified frauds out of 284,807 transactions.
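The two simplest resampling strategies can be sketched with NumPy alone. Random undersampling discards majority samples, while random oversampling duplicates minority samples until both classes are equally large. The toy data below is hypothetical, and synthetic oversampling such as SMOTE or hybrid strategies are not shown. In practice, resampling should be applied only to the training split so that the evaluation data keeps its real class ratio.

```python
import numpy as np

def random_undersample(X, y, minority=1, seed=0):
    """Keep every minority sample and an equally large random subset of the majority."""
    rng = np.random.default_rng(seed)
    minority_idx = np.flatnonzero(y == minority)
    majority_idx = np.flatnonzero(y != minority)
    kept = rng.choice(majority_idx, size=len(minority_idx), replace=False)
    idx = rng.permutation(np.concatenate([minority_idx, kept]))
    return X[idx], y[idx]

def random_oversample(X, y, minority=1, seed=0):
    """Duplicate randomly drawn minority samples until both classes are equally large."""
    rng = np.random.default_rng(seed)
    minority_idx = np.flatnonzero(y == minority)
    majority_idx = np.flatnonzero(y != minority)
    extra = rng.choice(minority_idx, size=len(majority_idx) - len(minority_idx), replace=True)
    idx = rng.permutation(np.concatenate([majority_idx, minority_idx, extra]))
    return X[idx], y[idx]

# Hypothetical toy data: 1,000 transactions, 10 of them fraudulent (label 1)
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 3))
y = np.zeros(1000, dtype=int)
y[:10] = 1

X_under, y_under = random_undersample(X, y)
X_over, y_over = random_oversample(X, y)
print(np.bincount(y_under), np.bincount(y_over))  # [10 10] [990 990]
```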
Predictive modelling algorithms like Logistic Regression, Random Forest, and XGBoost have been implemented along with different resampling techniques to predict fraudulent transactions.
The performance of the predictive models was evaluated based on the Receiver Operating Characteristic Area Under the Curve (AUC-ROC), the Precision-Recall Area Under the Curve (AUC-PR), precision, recall and the F1 score.
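For reference, precision, recall, F1 and AUC-ROC can be computed from first principles as sketched below. The rank-based AUC (the probability that a random fraud is scored higher than a random genuine transaction) needs memory proportional to the number of positive-negative pairs, so it only suits small examples. AUC-PR is not shown, and the four-sample example is hypothetical.

```python
import numpy as np

def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for the positive (fraud) class 1."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall > 0 else 0.0
    return precision, recall, f1

def roc_auc(y_true, scores):
    """AUC-ROC as the probability that a randomly chosen positive is scored higher
    than a randomly chosen negative (ties count one half); O(n_pos * n_neg) memory."""
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    greater = np.sum(pos[:, None] > neg[None, :])
    ties = np.sum(pos[:, None] == neg[None, :])
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

y_true = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])
print(roc_auc(y_true, scores))                                   # 0.75
print(precision_recall_f1(y_true, (scores >= 0.5).astype(int)))  # precision 1.0, recall 0.5
```

On a dataset as skewed as the one described, accuracy alone is misleading (predicting "genuine" for every transaction is correct more than 99.8% of the time), which is why these class-sensitive metrics are used.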
Our current research aims to establish a complete ribonucleic acid (RNA) production line, from plasmid design to the purification of in vitro transcribed RNA and RNA labeling. RNA is the central molecule within the central dogma of molecular biology and is involved in most essential processes within a cell [1]. In many cases, an RNA can fulfill its function only in a compact three-dimensional structure. In this context, RNA tertiary contacts such as kissing loops and pseudoknots are essential to stabilize three-dimensional folding [2]. We will produce a tertiary contact consisting of a kissing loop and a GAAA tetraloop that occurs in eukaryotic ribosomal RNA [3,4]. The RNA sequence is integrated into a vector plasmid, and the plasmid is then amplified in E. coli. After subsequent plasmid purification steps, the RNA sequence will be transcribed in vitro [5,6]. For the RNA to be used in Förster resonance energy transfer (FRET) experiments at the single-molecule level, fluorescent dyes must be coupled to the RNA molecule [7].
The epithelial membrane proteins (EMP1-3), which belong to the peripheral myelin protein 22-kDa (PMP22) family, are involved in epithelial differentiation. EMP2 was found to be a downstream target gene of the tumor suppressor gene HOPX, a homeobox-containing gene. Additionally, dysregulation of EMP2 has been observed in various cancers, but the function of EMP2 in human lung cancer has not yet been clarified.
In this study, real-time RT-PCR, Western blot and cytoblock analyses were performed to analyze the expression of EMP2. Gain-of-function was achieved by stable transfection with an EMP2 expression vector and loss-of-function by siRNA knockdown. Stable transfection led to overexpression of EMP2 at both mRNA and protein levels in the transfected cell lines H1299 and H2170.
Functional assays, including proliferation, colony formation, migration and invasion assays, as well as cell cycle analyses were performed after stable transfection. Ectopic EMP2 expression resulted in reduced cell proliferation, migration and invasion, as well as a G1 cell cycle arrest. After the EMP2 gene was silenced by siRNA knockdown, inhibition of cell invasiveness was observed. These phenomena were accompanied by reduced AKT, mTOR and p38 activities.
Taken together, the data suggest that the epithelial membrane protein 2 (EMP2) is a tumor suppressor and exerts its tumor suppressive function by inhibiting AKT and MAPK signaling pathways in human lung cancer cells.
Simulating complex physical systems involves solving nonlinear partial differential equations (PDEs), which can be very expensive. Generative Adversarial Networks (GANs) have recently been used to generate solutions of PDE-governed complex systems without having to solve the equations numerically.
However, concerns have been raised that a standard GAN cannot capture some important physical and statistical properties of a complex PDE-governed system, alongside other concerns such as difficult and unstable training, the noisy appearance of generated samples, and the lack of robust methods for assessing sample quality apart from visual examination. In this thesis, a standard GAN is trained on a data set of heat transfer images. We show that the generated data set captures the true distribution of the training data with respect to both visual and statistical properties, specifically the vertical statistical profile. Furthermore, we construct a GAN model that can be conditioned on a variance-induced class label. We show that the variance threshold t = 0.01 yields a good conditional class label, such that the generated images comply with the given conditions at an accuracy rate of 96%.
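The variance-induced class label and the vertical statistical profile can be sketched in a few lines. This is a minimal illustration under stated assumptions: an image is a list of rows of pixel values, its label is assumed to be 1 when the pixel variance exceeds the threshold t and 0 otherwise, and the vertical profile is assumed to be the mean intensity of each row. The function names are hypothetical, and the thesis may define both quantities differently.

```python
from statistics import mean, pvariance


def variance_label(image, t=0.01):
    """Hypothetical class label: 1 if the pixel variance exceeds t, else 0."""
    pixels = [value for row in image for value in row]
    return int(pvariance(pixels) > t)


def vertical_profile(image):
    """Mean intensity of each image row, from top to bottom."""
    return [mean(row) for row in image]


# Usage: a flat image gets label 0, a striped image gets label 1.
flat = [[0.5, 0.5], [0.5, 0.5]]
striped = [[0.0, 1.0], [0.0, 1.0]]
print(variance_label(flat), variance_label(striped))  # 0 1
print(vertical_profile(striped))  # [0.5, 0.5]
```

Comparing the vertical profiles of real and generated images, averaged over a data set, is one simple way to check the statistical agreement described above.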
In the following study, we evaluated how a simple autoencoder can be used to train a Generalized Learning Vector Quantization (GLVQ) classifier. Specifically, we showed that the bottleneck of an autoencoder serves as an "information filter" that tries to best represent the desired output in that particular layer, in the statistical sense of mutual information.
The autoencoder was trained on a purely unsupervised task and benefited from learning feature representations. As a result, the model achieved a considerably higher accuracy. The model was implemented and tuned using TensorFlow [1].
An additional study was dedicated to improving the traditional GLVQ algorithm from sklearn-lvq [2] by using the bottleneck of an autoencoder.
The study revealed the potential of autoencoder bottlenecks as a pre-processing tool for improving the accuracy of GLVQ. Specifically, the model reached an accuracy of 75%, compared to about 62% for the original GLVQ. Nevertheless, the research exposed the need for further improvement of the model for the present problem.
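For orientation, the core of GLVQ, which the study applies to bottleneck features, can be sketched as follows. This is a minimal pure-Python sketch of the textbook GLVQ cost with an identity transfer function, not the sklearn-lvq implementation; the function names, the learning rate and the toy prototypes are hypothetical. In the setting of the study, `x` would be the autoencoder bottleneck representation of a sample.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def closest(x, prototypes, labels, label, same):
    """Index of the nearest prototype whose label equals (same=True) or differs from `label`."""
    candidates = [j for j, c in enumerate(labels) if (c == label) == same]
    return min(candidates, key=lambda j: sq_dist(x, prototypes[j]))


def glvq_mu(x, label, prototypes, labels):
    """Relative distance difference mu = (d+ - d-) / (d+ + d-); negative means correct."""
    d_plus = sq_dist(x, prototypes[closest(x, prototypes, labels, label, True)])
    d_minus = sq_dist(x, prototypes[closest(x, prototypes, labels, label, False)])
    return (d_plus - d_minus) / (d_plus + d_minus)


def glvq_step(x, label, prototypes, labels, lr=0.1):
    """One stochastic gradient step on mu; updates `prototypes` in place."""
    jp = closest(x, prototypes, labels, label, True)
    jm = closest(x, prototypes, labels, label, False)
    d_plus = sq_dist(x, prototypes[jp])
    d_minus = sq_dist(x, prototypes[jm])
    denom = (d_plus + d_minus) ** 2
    gp = 4 * d_minus / denom  # scale for attracting the correct prototype
    gm = 4 * d_plus / denom   # scale for repelling the wrong prototype
    prototypes[jp] = [w + lr * gp * (xi - w) for w, xi in zip(prototypes[jp], x)]
    prototypes[jm] = [w - lr * gm * (xi - w) for w, xi in zip(prototypes[jm], x)]


def predict(x, prototypes, labels):
    """Winner-takes-all: label of the nearest prototype."""
    best = min(range(len(prototypes)), key=lambda j: sq_dist(x, prototypes[j]))
    return labels[best]
```

Each step moves the closest correct prototype toward the sample and the closest incorrect one away from it, which pushes mu below zero for correctly classified samples.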
In this work, a transgenic zebrafish line that expresses the fluorophore dsRed under the control of the endogenous zebrafish cochlin promoter was to be established using the CRISPR/Cas9 system. dsRed was cloned into a pBluescript vector, followed by cloning of the cochlin locus into this vector. This bait construct was then to be microinjected into wild-type AB zebrafish embryos. Microinjection of Cas9 mRNA, single guide RNA and a bait construct was practiced with the tyrosinase gene, which was disrupted using CRISPR/Cas9.
Pollinating insects are of vital importance for the ecosystem, and their drastic decline has severe consequences for the environment and humankind. Understanding their interaction networks is the first step toward preserving these highly complex systems. For that purpose, this study describes a protocol for investigating honey bee pollen samples from different agro-environmental areas by DNA extraction, PCR amplification and nanopore sequencing of the barcode regions rbcL and ITS. It was shown that the most abundant species were classified consistently by both DNA barcodes, while species richness was increased by single-barcode detection of less abundant species. The analysis of the different landscape variables showed a decline in species richness, Shannon diversity index and species evenness with increasing organic crop area. However, sampling was only carried out in August, and further investigations are suggested to give a more complete picture of honey bee foraging throughout the seasons.
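The diversity measures named above can be computed from per-species abundances. The sketch below assumes the common definitions, the Shannon index H' = -Σ p_i ln p_i and Pielou's evenness J' = H' / ln S with S the number of observed species, and assumes that read counts per species serve as abundances; the thesis may use other conventions, for example a different logarithm base.

```python
import math


def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over species with a nonzero count."""
    total = sum(counts)
    props = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in props)


def pielou_evenness(counts):
    """Pielou's evenness J' = H' / ln(S), S = number of observed species.

    Evenness is undefined for a single species; 0.0 is returned by convention here.
    """
    richness = sum(1 for c in counts if c > 0)
    if richness < 2:
        return 0.0
    return shannon_index(counts) / math.log(richness)
```

Richness is simply the number of nonzero entries, so all three quantities compared across landscape variables follow from one abundance vector per sample.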
This work presents genetic sex determination of ancient DNA samples based on a simple mathematical algorithm that considers the number of reads mapped to the autosomes and to the X and Y chromosomes. The algorithm is implemented in a command-line tool, SiD. SiD is used to determine the sex of 16 samples that were shotgun sequenced and captured with a 1240k panel.
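SiD's exact formula is not given in this summary. As a rough illustration of the general idea of comparing read counts on autosomes, X and Y, the sketch below normalizes X-chromosome reads by chromosome length relative to the autosomes, so that the ratio is expected to be near 1 for XX and near 0.5 for XY individuals. The functions, the 0.75 cut-off and the minimum-read check are hypothetical and are not SiD's actual algorithm.

```python
def x_dosage_ratio(x_reads, x_length, auto_reads, auto_length):
    """Reads per base on chromosome X divided by reads per base on the autosomes."""
    return (x_reads / x_length) / (auto_reads / auto_length)


def call_sex(x_reads, y_reads, auto_reads, x_length, auto_length,
             cutoff=0.75, min_reads=1000):
    """Hypothetical sex call from mapped read counts (not SiD's algorithm).

    XX individuals carry as many X copies as autosome copies (ratio near 1),
    XY individuals half as many (ratio near 0.5); the cutoff is the midpoint.
    """
    if x_reads + y_reads + auto_reads < min_reads:
        return "undetermined"  # too little data for a reliable call
    ratio = x_dosage_ratio(x_reads, x_length, auto_reads, auto_length)
    if ratio >= cutoff:
        return "XX"
    # A low X dosage should be supported by reads mapping to chromosome Y.
    return "XY" if y_reads > 0 else "undetermined"
```

Real chromosome lengths would come from the reference genome used for mapping; they are left as parameters here.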
Cryptorchidism is the most common disorder of sex development in dogs. It describes the failure of one or both testes to descend into the scrotum in due time. It is a heritable, multifactorial disease. In this work, selected dogs of the German Sheep Poodle breed were sequenced with nanopore sequencing and subsequently examined for genetic variations correlating with cryptorchidism. The relationships of the studied dogs were also analyzed and visualized.
Prototype-based classification methods like Generalized Matrix Learning Vector Quantization (GMLVQ) are simple and easy to implement. An appropriate choice of activation function plays an important role in the performance of (deep) multilayer perceptrons (MLPs), which rely on a non-linearity for classification and regression learning. In this thesis, non-linear activation functions that are known to work well in MLPs are investigated for use in GMLVQ to realize a non-linear mapping. The influence of these activation functions on the model's performance with respect to accuracy and convergence rate is analyzed, and the experimental results are documented.
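Standard GMLVQ uses the adaptive distance d(x, w) = (x - w)^T Ω^T Ω (x - w) = ||Ω(x - w)||². One possible way to insert an MLP-style activation σ, assumed here for illustration and not necessarily the formulation used in the thesis, is to compare σ(Ωx) with σ(Ωw); with the identity activation this reduces to the standard distance. The activation table and function names are hypothetical.

```python
import math


def mat_vec(omega, v):
    """Matrix-vector product Omega v for a list-of-rows matrix."""
    return [sum(o * x for o, x in zip(row, v)) for row in omega]


ACTIVATIONS = {
    "identity": lambda z: z,
    "sigmoid": lambda z: 1.0 / (1.0 + math.exp(-z)),
    "tanh": math.tanh,
    "relu": lambda z: max(0.0, z),
}


def gmlvq_distance(x, w, omega, activation="identity"):
    """Squared distance between sigma(Omega x) and sigma(Omega w).

    With the identity activation this equals (x - w)^T Omega^T Omega (x - w).
    """
    f = ACTIVATIONS[activation]
    px = [f(z) for z in mat_vec(omega, x)]
    pw = [f(z) for z in mat_vec(omega, w)]
    return sum((a - b) ** 2 for a, b in zip(px, pw))
```

Because σ is applied after the linear projection, saturating functions such as tanh compress large projected differences, which is one way the choice of activation can affect accuracy and convergence.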
Classification of time series has received considerable interest over the past years due to many real-life applications, such as environmental modeling, speech recognition, and computer vision.
In my thesis, I focus on the classification of time series by LVQ classifiers. To learn a classifier, we need a training set. In our case, every data point in the training set contains a sequence (an ordered set) of feature vectors. Thus, the first task is to construct a new feature vector (or matrix) for each sequence.
Inspired by [2], I use Hankel matrices to construct the new feature vectors. This choice follows from the basic assumption that each time series is generated by one unknown Linear Time-Invariant (LTI) system or by a set of such systems.
After generating the new feature vectors from Hankel matrices, I use two approaches to learn a classifier: Generalized Learning Vector Quantization (GLVQ) and the median variant of Generalized Learning Vector Quantization (mGLVQ).
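A Hankel matrix stacks shifted windows of a sequence so that every anti-diagonal is constant. Hankel matrices are a standard tool for LTI systems because, for noise-free output of an autonomous LTI system, their rank is bounded by the system order. The sketch builds the matrix for a scalar series and the block version for a sequence of feature vectors; the window size `num_rows` is a hypothetical free parameter, and any further processing used in the thesis or in [2], such as normalization, is not shown.

```python
def hankel_matrix(series, num_rows):
    """Hankel matrix H[i][j] = series[i + j] of a scalar time series."""
    num_cols = len(series) - num_rows + 1
    if num_rows < 1 or num_cols < 1:
        raise ValueError("num_rows must be between 1 and len(series)")
    return [[series[i + j] for j in range(num_cols)] for i in range(num_rows)]


def block_hankel_matrix(vectors, num_rows):
    """Block Hankel matrix of a sequence of d-dimensional feature vectors.

    Block (i, j) is the column vector vectors[i + j], so the result has
    num_rows * d rows and len(vectors) - num_rows + 1 columns.
    """
    num_cols = len(vectors) - num_rows + 1
    if num_rows < 1 or num_cols < 1:
        raise ValueError("num_rows must be between 1 and len(vectors)")
    dim = len(vectors[0])
    return [[vectors[i + j][k] for j in range(num_cols)]
            for i in range(num_rows) for k in range(dim)]
```

For example, `hankel_matrix([1, 2, 3, 4, 5], 3)` returns `[[1, 2, 3], [2, 3, 4], [3, 4, 5]]`.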
The automatic comparison of RNA/DNA or, more precisely, nucleotide sequences is a complex task that requires careful design because of its computational complexity. While alignment-based models suffer from high computational costs in time, alignment-free models have to deal with appropriate data preprocessing and consistently designed mathematical data comparison. This work deals with the latter strategy. In particular, a systematic categorization is proposed that emphasizes two key concepts that have to be combined for a successful comparison analysis: 1) the data transformation, comprising adequate mathematical sequence coding and feature extraction, and 2) the subsequent (dis-)similarity evaluation of the transformed data by means of problem-specific but mathematically consistent proximity measures. Approaches from the different categories of the introduced scheme are examined with regard to their suitability for distinguishing natural RNA virus sequences from artificially generated ones that preserve biological features to varying degrees. The challenge in this application is the limited additional biological information available, so that the decision has to be made solely on the basis of the sequences and their inherent structural characteristics. To address this, the present work focuses on interpretable, dissimilarity-based machine learning classification models, namely variants of Learning Vector Quantizers. These methods are known to be robust and highly interpretable and therefore allow the applied data transformations, together with the chosen proximity measure, to be evaluated with respect to the given discrimination task. First analysis results are provided and discussed, serving as a starting point for a more in-depth analysis of this problem in the future.
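As a simple instance of the two-step scheme (data transformation, then proximity measure), the sketch maps a sequence to its normalized k-mer frequency vector and compares two sequences with the Euclidean distance. The choice of k = 3, the RNA alphabet and the Euclidean distance are illustrative assumptions, not necessarily among the approaches examined in the thesis.

```python
import math
from collections import Counter
from itertools import product


def kmer_profile(seq, k=3, alphabet="ACGU"):
    """Relative frequencies of all len(alphabet)**k k-mers, in a fixed order.

    DNA input is mapped to RNA (T -> U); k-mers with other symbols are skipped.
    """
    seq = seq.upper().replace("T", "U")
    valid = set(alphabet)
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1)
                     if set(seq[i:i + k]) <= valid)
    total = sum(counts.values()) or 1
    return [counts["".join(p)] / total for p in product(alphabet, repeat=k)]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def kmer_dissimilarity(s1, s2, k=3):
    """Transformation step (k-mer coding) followed by a proximity measure."""
    return euclidean(kmer_profile(s1, k), kmer_profile(s2, k))
```

Swapping `euclidean` for another measure changes only the second step of the scheme, which is what makes the combination of coding and proximity measure easy to compare systematically.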
RNA tertiary contact interactions between RNA tetraloops and their receptors stabilize the folding of ribosomal RNA and support the maturation of the ribosome. Here, we use FRET-assisted structure prediction to develop structural models of two ribosomal tertiary contacts, one consisting of a kissing loop (KL) and a GAAA tetraloop and one consisting of the tetraloop receptor (TLR) and a GAAA tetraloop. We build bound and unbound states of the ribosomal contacts de novo, label the RNA in silico and compute FRET histograms based on molecular dynamics (MD) simulations and accessible contact volume (ACV) calculations. The mean FRET efficiencies predicted from MD simulations and from ACV calculations agree for the KL-TLGAAA construct. The KL construct showed an excessively high FRET efficiency and artificial dye behavior, which requires further investigation of the model. In the case of the TLR, the importance of correct dye and construct parameters in the modeling was shown, which also calls for renewed modeling. This hybrid approach of experiment and simulation will promote the elucidation of dynamic RNA tertiary contacts and accelerate the discovery of novel RNA interactions as potential future drug targets.
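FRET histograms of this kind rest on the Förster relation E = 1 / (1 + (R/R0)^6), where R is the donor-acceptor distance and R0 the Förster radius of the dye pair. The sketch evaluates E for distances sampled, for example, from MD frames or ACV dye positions, and bins the efficiencies. R0 is a parameter because it depends on the dyes; the function names and the binning are hypothetical, and averaging E over sampled distances generally differs from evaluating E at the mean distance.

```python
def fret_efficiency(r, r0):
    """Förster efficiency for donor-acceptor distance r and Förster radius r0."""
    return 1.0 / (1.0 + (r / r0) ** 6)


def mean_fret(distances, r0):
    """Average efficiency over sampled distances (e.g. MD frames or dye positions)."""
    effs = [fret_efficiency(r, r0) for r in distances]
    return sum(effs) / len(effs)


def fret_histogram(distances, r0, bins=10):
    """Counts of efficiencies in `bins` equal-width bins on [0, 1]."""
    counts = [0] * bins
    for r in distances:
        e = fret_efficiency(r, r0)
        counts[min(int(e * bins), bins - 1)] += 1  # E = 1.0 goes to the last bin
    return counts
```

At R = R0 the efficiency is exactly 0.5, and the steep sixth-power dependence is why small errors in modeled dye positions can shift the predicted efficiency considerably.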
Brassica oleracea, like all cruciferous plants, has a defense mechanism against natural enemies: chemical compounds formed by the enzymatic degradation of glucosinolates. In the presence of epithiospecifier proteins (ESPs), the hydrolysis of glucosinolates forms epithionitriles or nitriles, depending on the glucosinolate structure. This research showed that three predicted ESP sequences taken from the NCBI database play a role in the enzymatic hydrolysis of glucosinolates in Brassica oleracea.
Obesity is a major public health issue in many countries, and its development leads to many severe conditions. In males, visceral adipose tissue (VAT) is the dominant type of adipose tissue (AT), commonly called fat. Estrogens play an important role in many pathological processes.
In this study, the estrogen receptor subtype ER-beta is activated in VAT by treatment with KB, a specific ligand, and the metabolic effect of this treatment on VAT is investigated using bioinformatics methods.
To this end, I applied several bioinformatics methods, such as differential gene expression analysis, pathway analysis, RNA splicing analysis and SNP calling, to predict the effect of KB treatment on VAT. A list of candidate genes, pathways and SNPs was identified, which could provide clues to the genetic mechanism underlying the effect of KB treatment. The results show that KB treatment has a significant effect on VAT.
Genetic sequence variations at the level of gene promoters influence the binding of transcription factors. In plants, this often leads to differential gene expression across natural accessions and crop cultivars. Some of these differences are propagated through molecular networks and lead to macroscopic phenotypes. However, the link between promoter sequence variation and variation in promoter activity is not yet well understood. In this project, we apply deep learning to 728 genotypes of Arabidopsis thaliana to shed light on some aspects of that link. Convolutional neural networks were successfully implemented to predict the likelihood of a gene being expressed from its promoter sequence. These networks were also capable of highlighting known and putative new sequence motifs causal for the expression of genes. We tested our algorithms in various scenarios, including single and multiple point mutations as well as indels on synthetic and real promoter sequences, and estimated the respective performance characteristics of the algorithm. Finally, we showed that the decision boundary for classifying genes as expressed or non-expressed depends on the sensitivity of the transcriptome profiling assay, and that changing it has an impact on the algorithm’s performance.
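Convolutional networks on DNA commonly take one-hot encoded sequences as input, and point-mutation scenarios can be generated systematically. The sketch assumes this common encoding (the summary does not state the encoding used) and shows how all single-base substitutions of a promoter can be enumerated for in silico mutagenesis; the trained network itself is not part of the sketch, and the names are hypothetical.

```python
BASES = "ACGT"


def one_hot(seq):
    """Encode DNA as one 4-element indicator vector (A, C, G, T) per position.

    Ambiguous bases such as N become all-zero vectors.
    """
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]


def point_mutants(seq):
    """Yield (position, alternative base, mutant sequence) for all substitutions."""
    seq = seq.upper()
    for i, ref in enumerate(seq):
        for alt in BASES:
            if alt != ref:
                yield i, alt, seq[:i] + alt + seq[i + 1:]
```

With a trained network, the difference between its prediction for each mutant and for the reference sequence gives a per-position effect map, which is one way putative motifs can be highlighted.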
In this work, a second version of the Python implementation of an algorithm called Probabilistic Regulation of Metabolism (PROM) was created and applied to the metabolic model iSynCJ816 for the organism Synechocystis sp. PCC 6803. A cross-validation was performed to determine the minimal amount of expression data needed to produce meaningful results with the PROM algorithm. An unsuccessful attempt to reproduce the results of a method called Integrated and Deduced Regulation of Metabolism (IDREAM) is documented, and possible causes of the failure are discussed.
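As described by its original authors (Chandrasekaran and Price), PROM estimates from binarized expression data the probability that a target gene is on when its regulating transcription factor (TF) is off, and uses it to scale the flux bounds of reactions that depend on the target. The sketch shows only that estimate and the bound scaling; binarization, the selection of regulatory interactions and the flux balance optimization are omitted, and the function names are hypothetical.

```python
def prom_probability(tf_states, target_states):
    """Estimate P(target on | TF off) from binarized expression samples.

    tf_states, target_states: equal-length lists of 0/1 (off/on) per sample.
    Returns 1.0 when the TF is never observed off (no evidence of repression).
    """
    off = [t for f, t in zip(tf_states, target_states) if f == 0]
    if not off:
        return 1.0
    return sum(off) / len(off)


def constrained_bound(v_max, probability):
    """Upper flux bound of a target-dependent reaction after TF knockout."""
    return v_max * probability
```

The estimate rests only on samples in which the TF is off, so with little expression data it is based on very few observations, which is one reason a cross-validation of the required data amount is informative.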
The aim of this bachelor thesis is to find out how artificial intelligence, specifically the kind used in combat situations, can increase the playing time or even the replay value of games in the action role-playing genre. The thesis focuses mainly on combat situations between a player and an artificial intelligence.
To begin with, this bachelor thesis examines the action role-playing genre in order to find a suitable definition for it. According to this definition, action role-playing games are titles that send the player on an adventure resembling a hero’s journey, in which they must prove their skills in combat against virtual opponents. The greatest challenge of these real-time battles comes from the required quick reflexes, skill checks and hand-eye coordination.
Next, six means of increasing the replayability of a game are explored: Experience and Nostalgia, Variety and Randomness, Goals and Completion, Difficulty, Learning, and Social Aspect. The thesis then explains the term Artificial Intelligence and examines the various methods used to create intelligent behavior as well as the general advancement of the research field. Special attention is given to Finite State Machines and Behavior Trees, as they are the most widely used methods for creating the behavioral patterns of virtual characters.
Finally, a study conducted as part of the bachelor thesis is described, which compares a mathematically balanced artificial intelligence with a behaviorally balanced one in terms of game performance, measured by the test subjects' willingness to purchase and play through the game, and in terms of replay value. The thesis concludes that while the behavioral approach is more promising than the mathematical one, a combination of the two methods ultimately leads to the best outcome. Furthermore, the study shows that using artificial intelligence to individualize gaming experiences is promising for the future of the gaming industry.
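To make the Finite State Machine idea concrete, the sketch below models a melee enemy with four states and a transition function evaluated once per game tick. The states, thresholds and the hysteresis rule are hypothetical and are not the AI used in the study. A Behavior Tree would express similar logic as a prioritized selector of conditions and actions evaluated every tick.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    CHASE = auto()
    ATTACK = auto()
    FLEE = auto()


# Hypothetical tuning values; distances in world units, health in [0, 1].
ATTACK_RANGE = 2.0
SIGHT_RANGE = 10.0
FLEE_HEALTH = 0.25
RECOVER_HEALTH = 0.5


def next_state(state, distance, health):
    """Transition function of a simple enemy combat FSM."""
    if state is State.FLEE:
        # Hysteresis: keep fleeing until health has recovered.
        return State.FLEE if health < RECOVER_HEALTH else State.IDLE
    if health <= FLEE_HEALTH:
        return State.FLEE
    if distance <= ATTACK_RANGE:
        return State.ATTACK
    if distance <= SIGHT_RANGE:
        return State.CHASE
    return State.IDLE
```

Tuning values such as these are the kind of parameters a mathematical balancing approach would adjust, whereas behavioral balancing would change which transitions exist.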
The loss of photoreceptors is a major cause of visual impairment and blindness, and no cure is currently established. Photoreceptor replacement in mouse models of retinal degeneration is currently being investigated as a potential future therapy. To evaluate visual function in mice before and after treatment, two vision-based behavioral tests (optomotor tracking and the light/dark box) were investigated, including their ability to distinguish between rod and cone photoreceptor function. Both methods turned out to provide an objective and reliable readout of vision in wild-type mice and in mice with vision impairment due to retinal degeneration. The capability of the methods to detect slight improvements in vision has to be evaluated further.
Therefore, options for improving the established tests and an idea for a new test paradigm are introduced.
Footage of organoids taken by fluorescence microscopy, and segmented and triangulated by image analysis software such as LimeSeg and Mastodon, often needs to be visualized in an aesthetic manner to present the results in scientific papers, talks and demonstrations. The goal of this work was to create an easy-to-use add-on, “Biobox”, for the open-source 3D visualization package “Blender” that allows triangulated 3D data with animation over time (4D), produced by image analysis software, to be imported and optimized for efficient use. “Biobox” offers several visualization tools that biologists can use to create rendered images and animation videos.
The imported data were optimized using Blender's built-in modifiers. The optimized data can then be visualized with several tools built for displaying the organoid in frozen, animated and semi-transparent form. A dynamic link for object selection and dynamic data exchange between Blender and Mastodon was developed. Additionally, a user interface was developed for manually correcting segmentation errors and steering the object detection algorithms of LimeSeg. The developed add-on “Biobox” was benchmarked on real scientific data. The benchmark demonstrated that the developed optimization results in a significant (~5-fold) decrease in RAM usage and accelerates visualization by a factor of more than 160.
Social media platforms play an increasing role in marketing, politics and police affairs because they can strongly influence opinions. So-called “opinion leaders” exert their influence in a given network and shape the opinions of other users. Identifying central nodes in a social graph has been of interest for decades. However, not all centrality measures were developed for social media platforms. Many were built for social graphs that did not include additional metrics (e.g. “likes”, “shares”). Nevertheless, these metrics play a crucial role on modern platforms. Hence, outdated measures need to be adjusted and additional metrics need to be integrated to ensure the best possible results.
The amount of digital data is rising day by day, and so is the need for intelligent, automated data processing in daily life. In machine learning, a secure and accurate way to classify data is therefore important. This holds particular importance in certain fields, e.g. medical data analysis, where the accuracy and the reliability of a classification are equally important in order to avoid severe consequences. If a classification is not reliable, it is better to reject the data point than to accept a possibly wrong classification. This can be done with strategies that are applied on top of a trained model or included directly in the objective function of the model being trained. We discuss such strategies in this thesis and analyze their results on data sets.
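A common way to add a reject option on top of a trained model, assumed here as an example, is Chow's rule: accept the most probable class only if its estimated probability reaches a threshold. The sketch also reports the two quantities that such strategies trade off, the accuracy on accepted points and the rejection rate. The threshold and the probability estimates are hypothetical inputs; strategies built into the training objective are not covered.

```python
def classify_with_reject(probabilities, threshold=0.8):
    """Return the index of the most probable class, or None to reject.

    probabilities: class-membership estimates from an already trained model.
    """
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return best if probabilities[best] >= threshold else None


def accuracy_rejection(prob_rows, labels, threshold):
    """Accuracy on accepted points and fraction of rejected points."""
    decisions = [classify_with_reject(p, threshold) for p in prob_rows]
    accepted = [(d, y) for d, y in zip(decisions, labels) if d is not None]
    rejected = 1 - len(accepted) / len(labels)
    acc = sum(d == y for d, y in accepted) / len(accepted) if accepted else float("nan")
    return acc, rejected
```

Sweeping the threshold from 0 to 1 traces an accuracy-rejection curve, a convenient way to compare reject strategies on the same data set.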
Glycans play an important role in the intercellular interactions of pathogenic bacteria. Pathogenic bacteria possess binding proteins capable of recognizing certain sugar motifs found in glycan structures on other cells. Artificial carbohydrate synthesis allows scientists to recreate those sugar motifs in a rational, precise and pure form. However, because sugar-binding proteins, known as lectins, are highly specific for particular glycan structures, methods for identifying suitable binding agents need to be developed. To tackle this hurdle, the Fraunhofer Institute for Cell Therapy and Immunology (Fraunhofer IZI) and the Max Planck Institute of Colloids and Interfaces (MPIKG) developed a binding assay for the high-throughput testing of sugar motifs presented on modular scaffolds, which are formed by the assembly of four DNA strands into simple, branched DNA nanostructures. The first generation of this assay was used as a proof of concept in combination with bacteria that express a fluorescent protein. Here, the assay was optimized for use with bacteria that do not possess a marker gene for a fluorescent protein by staining their genomic DNA with SYBR® Green. For the binding assay, DNA nanostructures were combined with artificially synthesized mannose polymers, typical targets of many lectins on the surface of bacteria, presenting them in a defined constellation so that they bind bacteria strongly through multivalent cooperativity. Testing multiple mannose polymers identified monomeric mannose with a 5’-carbon linker and 1,2-linked dimeric mannose with linker as the best binding candidates for E. coli, presumably due to binding to the FimH protein on its surface. Despite similarities between the FimH proteins of E. coli and K. pneumoniae, binding to the different sugar molecules on DNA structures was only observed for E. coli.
Furthermore, the degree of free movement seemed to affect the binding of mannose polymers to the targeted proteins, since binding increased when a more flexible DNA nanostructure was used. An alternative to the simple DNA nanostructures described above is the use of larger, more complex DNA origami structures consisting of several hundred strands. DNA origami structures are capable of carrying dozens of modifications at the same time. The results for the DNA origami structure showed successful functionalization with up to 71 molecules of 1,2-linked dimeric mannose with linker. These results point toward a solution for the high-throughput analysis of potential binding agents for pathogenic bacteria, e.g. as an alternative treatment for infections with antibiotic-resistant bacteria.
Introducing natural adversarial observations to a Deep Reinforcement Learning agent for Atari Games
(2021)
Deep learning methods are known to be vulnerable to adversarial attacks. Since Deep Reinforcement Learning agents are based on these methods, they are susceptible to tiny changes in the input data. Three methods for generating adversarial examples are introduced and applied to agents trained to play Atari games. The attacks either target single inputs or can be applied universally to all possible inputs of the agents. They successfully shifted the predictions toward a single action or lowered the agent’s confidence in certain actions, respectively. All proposed methods had a severe impact on the agent’s performance while producing invisible adversarial perturbations. Since natural-looking adversarial observations should be completely hidden from a human evaluator, the negative impact on the agents' performance should also be undetectable. Several variants of the proposed methods were tested to fulfil all of these criteria. Overall, seven generated observations for two of the three Atari games are classified as natural-looking adversarial observations.
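The three attack methods are not named in this summary. As a generic illustration of an attack on a single input, the sketch applies one targeted step of the fast gradient sign method (FGSM) to a toy linear policy whose action scores are W·x. For such a linear model, the gradient of the score margin between the target action and the currently chosen action is simply W[target] - W[current], so no automatic differentiation is needed. The policy, epsilon and the pixel range [0, 1] are assumptions; real agents use deep networks on stacked frames.

```python
def sign(v):
    """Sign of a number as -1, 0 or 1."""
    return (v > 0) - (v < 0)


def scores(weights, x):
    """Linear action scores W x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def greedy_action(weights, x):
    """Action with the highest score."""
    s = scores(weights, x)
    return max(range(len(s)), key=s.__getitem__)


def fgsm_targeted(weights, x, target, eps=0.05):
    """One FGSM step that raises the target action's score margin.

    Each input value moves by at most eps and is clipped to [0, 1].
    """
    current = greedy_action(weights, x)
    grad = [wt - wc for wt, wc in zip(weights[target], weights[current])]
    return [min(1.0, max(0.0, xi + eps * sign(g))) for xi, g in zip(x, grad)]
```

Because every value changes by at most eps, the perturbation stays small in the max-norm, which is the property that makes such observations hard to spot for a human evaluator.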
In this work, we discuss the key role that “conflict minerals” (gold, coltan, cobalt, tin, tungsten) play in global supply chains and high-technology industries, and the issues surrounding their extraction and trade in their countries of origin, particularly in the African Congo Basin and the Great Lakes Region. We discuss ongoing international efforts to combat violence, child labour and human rights violations at mineral extraction sites, particularly in the Democratic Republic of the Congo (DRC), where very large mineral reserves have been discovered. We present the OECD Due Diligence Guidance for Responsible Supply Chains of Minerals from Conflict-Affected and High-Risk Areas, and the GOTS MineralTrace mineral proof-of-origin and trade chain certification solution developed by ibes AG in Germany, which automates and simplifies the implementation of the OECD Guidance. We discuss a pilot project in the DRC involving the GOTS GoldTrace application, which is based on the MineralTrace platform. We point out MineralTrace’s benefits and limitations. We analyse possible solutions to these limitations, including blockchain-based systems for transactional information exchange and record keeping, and finally we propose a new MineralTrace Application Programming Interface (API) that solves current limitations, introduces configuration flexibility for client applications, introduces workflow flexibility to adapt MineralTrace to any country or region, and simplifies data export functionality.
Since its foundation as an application of algebra, coding theory has been gaining importance day by day. For instance, any communication system needs the concepts of coding theory to function efficiently. In this thesis, the reader will find an introductory explanation of linear codes and binary Hamming codes, including some of the algebraic tools devised for their applications. All the described software applications are verified with SageMath 9.0 on Hochschule Mittweida’s JupyterHub.
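As a small companion to the introduction of linear codes, the binary Hamming [7, 4] code can be written out with a systematic generator matrix G = [I_4 | P] and parity-check matrix H = [P^T | I_3] over GF(2). The sketch uses plain Python rather than SageMath, and the particular choice of P is one of several equivalent ones. Because the seven columns of H are distinct and nonzero, the syndrome of a word with a single bit error equals the column of H at the error position.

```python
# Systematic generator G = [I_4 | P] and parity-check H = [P^T | I_3] over GF(2).
P = [[1, 1, 0],
     [1, 0, 1],
     [0, 1, 1],
     [1, 1, 1]]
G = [[int(i == j) for j in range(4)] + P[i] for i in range(4)]
H = [[P[j][i] for j in range(4)] + [int(i == k) for k in range(3)] for i in range(3)]


def encode(message):
    """Codeword c = m G (mod 2) for a 4-bit message."""
    return [sum(m * g for m, g in zip(message, col)) % 2 for col in zip(*G)]


def syndrome(word):
    """s = H c^T (mod 2); zero for every valid codeword."""
    return [sum(h * c for h, c in zip(row, word)) % 2 for row in H]


def decode(word):
    """Correct up to one bit error and return the 4 message bits."""
    s = syndrome(word)
    word = list(word)
    if any(s):
        # The syndrome equals the column of H at the error position.
        columns = [list(col) for col in zip(*H)]
        word[columns.index(s)] ^= 1
    return word[:4]
```

Since G is systematic, the first four bits of a codeword are the message itself, so decoding reduces to correcting the error and reading off those bits.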
Neural networks have become one of the most powerful algorithms for learning from big data sets, and they are used extensively for classification. But the deeper the network, the less interpretable the model. Although many methods exist to explain the output of such networks, the lack of interpretability makes them black boxes. On the other hand, prototype-based machine learning algorithms are known to be interpretable and robust.
Therefore, the aim of this thesis is to find a way to interpret the functioning of the neural networks by introducing a prototype layer to the neural network architecture. This prototype layer will train alongside the neural network and help us interpret the model. We present architectures of neural networks consisting of autoencoders and prototypes that perform activity recognition from heart rates extracted from ECG signals. These prototypes represent the different activity groups that the heart rates belong to and thereby aid in interpretability.
Vicia faba leaves and calli were transformed using CRISPR/Cas ribonucleoproteins (RNPs). Two kinds of SpyCas9 fused to cell-penetrating peptides (CPPs) were used with sgRNA7, sgRNA5 or sgRNA13, targeting PDS exon 1, PDS exon 2 or MgCh exon 3, respectively. The RNPs were applied by high-pressure spraying, biolistic delivery, incubation in RNP solution and infiltration of leaf tissue. An approach based on PCR and restriction enzymes was used to detect mutations. Screening of 679 E. coli colonies containing the cloned fragments resulted in the detection of 14 mutations. Most of the 14 mutations were deletions of 150, 500 or 730 bp. Five of the 14 mutations were point mutations located two to three bp upstream of the PAM.
In bioinformatics, one important task is to distinguish between native and mirror protein models based on structural information. This information can be obtained from the atomic coordinates of the protein backbone. This thesis tackles the problem of distinguishing these conformations by looking at the statistics of the distribution of the backbone dihedral angles. This distribution is visualized in Ramachandran plots. By means of an interpretable machine learning classification method, Generalized Matrix Learning Vector Quantization, we are able to distinguish between native and mirror protein models with high accuracy. Further, the classifier model provides additional information on the distributional regions that are important for the distinction, such as α-helices and β-strands.
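In the Ramachandran representation, φ is the dihedral angle C(i-1), N(i), Cα(i), C(i) and ψ is N(i), Cα(i), C(i), N(i+1). The sketch computes a dihedral angle from four atomic positions with the standard atan2 formula and shows that reflecting a structure inverts the sign of the angle, which is why native and mirror models populate different regions of the Ramachandran plot. It is a pure-Python illustration, not the feature extraction pipeline of the thesis, and the coordinates are made up.

```python
import math


def sub(a, b):
    return [x - y for x, y in zip(a, b)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees (range -180 to 180) defined by four atoms."""
    b1, b2, b3 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b1, b2), cross(b2, b3)
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def mirror(points):
    """Reflect coordinates through the yz-plane (x -> -x)."""
    return [[-p[0], p[1], p[2]] for p in points]
```

For the four points (1, 0, 0), (0, 0, 0), (0, 0, 1) and (0, 1, 1), the angle is +90 degrees, and for their mirror image it is -90 degrees.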
The Tutte polynomial is an important tool in graph theory. This paper provides an introduction to this two-variable polynomial using the spanning subgraph and rank-generating polynomials. The equivalence of the definitions is shown in detail, as are evaluations and derivatives. Properties and examples of the polynomial, i.e. universality, coefficient relations, closed forms and recurrence relations, are presented. Moreover, the thesis describes the connection between the dichromate and other significant polynomials.
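The spanning subgraph expansion T(G; x, y) = Σ_{A ⊆ E} (x - 1)^(r(E) - r(A)) (y - 1)^(|A| - r(A)), with rank r(A) = |V| - k(A) and k(A) the number of connected components of (V, A), can be evaluated directly for small graphs. The brute-force sketch below enumerates all 2^|E| edge subsets, so it is only meant for checking small examples such as the triangle, whose Tutte polynomial is x² + x + y. Vertices are assumed to be numbered 0 to n - 1, and the function names are hypothetical.

```python
from itertools import combinations


def rank(num_vertices, edges):
    """Graph rank r(A) = |V| - (number of connected components of (V, A))."""
    parent = list(range(num_vertices))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    r = 0
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:  # each merge of two components raises the rank by one
            parent[ru] = rv
            r += 1
    return r


def tutte(num_vertices, edges, x, y):
    """Evaluate T(G; x, y) via the spanning subgraph (rank-generating) expansion."""
    r_e = rank(num_vertices, edges)
    total = 0
    for size in range(len(edges) + 1):
        for subset in combinations(edges, size):
            r_a = rank(num_vertices, subset)
            total += (x - 1) ** (r_e - r_a) * (y - 1) ** (size - r_a)
    return total
```

Classical evaluations make convenient checks: for the triangle, `tutte(3, [(0, 1), (1, 2), (0, 2)], 1, 1)` counts its 3 spanning trees and the value at (2, 2) is 2^|E| = 8.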
A classical topic in the theory of random graphs is the probability that a given random graph has at least one isolated vertex. Isolated nodes have a large impact on social networks, which can be modeled by random graphs. We present the distribution of the number of isolated vertices using the probability generating function. We discuss the relationship between isolated edges and extended cut polynomials as well as extended matching polynomials using the principle of inclusion-exclusion. We introduce an algorithm based on colored graphs for general graphs and apply it to the components of a graph as well. Finally, we apply the idea to special classes of graphs such as cycles, bipartite graphs, paths and others. We discuss a recursive procedure based on analogous coloring rules for ladder and fan graphs.
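The inclusion-exclusion argument can be checked numerically on small graphs. The sketch assumes the model in which each edge of a fixed graph is present independently with probability p (for the complete graph this is the Erdős–Rényi model G(n, p)); a set S of vertices is entirely isolated exactly when every edge incident to S is absent. It is a brute-force version over all vertex subsets, not the colored-graph algorithm of the thesis, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def prob_no_isolated(num_vertices, edges, p):
    """P(no isolated vertex) when each edge is present independently with prob. p.

    Inclusion-exclusion over vertex sets S: all vertices of S are isolated
    exactly when every edge with an endpoint in S is absent.
    """
    q = 1 - p
    total = 0
    for k in range(num_vertices + 1):
        for subset in combinations(range(num_vertices), k):
            s = set(subset)
            incident = sum(1 for u, v in edges if u in s or v in s)
            total += (-1) ** k * q ** incident
    return total


def prob_at_least_one_isolated(num_vertices, edges, p):
    return 1 - prob_no_isolated(num_vertices, edges, p)


# Usage with exact arithmetic: the triangle with p = 1/2.
print(prob_no_isolated(3, [(0, 1), (1, 2), (0, 2)], Fraction(1, 2)))  # 1/2
```

The enumeration is exponential in the number of vertices, which is exactly why structured algorithms for special graph classes are of interest.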
This thesis investigates the efficacy of four machine learning algorithms, namely linear regression, decision tree, random forest and neural network, in the task of lead scoring. Specifically, the study evaluates the performance of these algorithms on datasets without sampling, with random under-sampling, and with over-sampling using SMOTE. The performance of each algorithm is measured using various metrics, including accuracy, AUC-ROC, specificity, sensitivity, precision, recall, F1 score, and G-mean. The results indicate that models trained on the dataset without sampling achieved higher accuracy than those trained on the dataset with either random under-sampling or over-sampling using SMOTE. However, the neural network achieved remarkable results on every dataset compared to the other algorithms. These findings provide valuable insights into the effectiveness of machine learning algorithms for lead scoring, particularly when different sampling techniques are used, and can help lead management practitioners select the most suitable algorithm and sampling technique for their needs. Furthermore, the study contributes to the literature by providing a comprehensive evaluation of the performance of machine learning algorithms for lead scoring. This thesis has practical implications for businesses looking to improve their lead management practices, and future research could extend the analysis to other machine learning algorithms or larger datasets.
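Most of the threshold-based metrics listed above follow from the four confusion-matrix counts. The sketch computes them, taking the positive class to be a converting lead (an assumption); AUC-ROC needs scores rather than counts and is not included. The counts in the usage lines are made up to show why accuracy alone can mislead on imbalanced lead data: accuracy is 0.95 although half of the positive leads are missed, while the G-mean, the square root of sensitivity times specificity, drops to about 0.71.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Threshold-based metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # also called recall or true positive rate
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
        "g_mean": math.sqrt(sensitivity * specificity),
    }


# Usage with made-up counts for an imbalanced data set.
m = binary_metrics(tp=5, fp=0, tn=90, fn=5)
print(m["accuracy"], m["sensitivity"])  # 0.95 0.5
```

Zero denominators (for example, a model that never predicts the positive class) are not handled in this sketch.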
With the growing market for cryptocurrencies, blockchain is becoming central to various research areas that are relevant from a mathematical and cryptographic point of view. Moreover, it is capable of transforming traditional methods involving centralized network operations into decentralized peer-to-peer functionality. At the same time, it provides a robust and tamper-proof alternative for digital payments by adding the element of cryptography, consequently making it traversable for any individual who is part of the blockchain network. Furthermore, for a blockchain to be optimal and efficient, it must effectively handle the blockchain trilemma of security, decentralization and scalability constraints. Algorand, a blockchain cryptocurrency protocol intended to solve the blockchain trilemma, is studied and discussed. It is a permissionless (public) blockchain protocol and uses pure proof of stake as its consensus mechanism.
Soft Learning Vector Quantisation (SLVQ) and Robust Soft Learning Vector Quantisation (RSLVQ) are supervised data classification methods that have been applied successfully to real-world classification problems. The performance of SLVQ and RSLVQ, however, degrades when they are applied to more complicated classification problems. In this thesis, we introduce modifications to SLVQ and RSLVQ in order to obtain more capable versions of them. Several possible modifications of SLVQ and RSLVQ are considered; some of them are not sufficiently successful and have been included for the sake of completeness. The results of the thesis are numerous and include Tangent Soft Learning Vector Quantisation-Strong (TSLVQ-S), together with its more stable version Tangent Robust Soft Learning Vector Quantisation-Strong (TRSLVQ-S), Attraction Soft Learning Vector Quantisation (ASLVQ) and Grassmannian Soft Learning Vector Quantisation (GSLVQ).
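RSLVQ models the data as a mixture of Gaussians centred on the prototypes and classifies with the resulting class posteriors. The sketch computes these posteriors assuming equal priors and a single shared width sigma, the usual textbook setting; it does not include the tangent, attraction or Grassmannian modifications proposed in the thesis, and the function name is hypothetical.

```python
import math


def rslvq_posterior(x, prototypes, labels, sigma=1.0):
    """Class posteriors P(y | x) of an RSLVQ-style Gaussian mixture.

    Each prototype is an isotropic Gaussian component with equal prior and
    shared width sigma; a class sums the responsibilities of its prototypes.
    """
    logits = [-sum((a - b) ** 2 for a, b in zip(x, w)) / (2 * sigma ** 2)
              for w in prototypes]
    m = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(z - m) for z in logits]
    total = sum(weights)
    posterior = {}
    for wgt, c in zip(weights, labels):
        posterior[c] = posterior.get(c, 0.0) + wgt / total
    return posterior
```

Training RSLVQ maximizes the log of the correct-class posterior, so its prototype updates are weighted by these soft assignments instead of the hard winner-takes-all rule of GLVQ.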
Mathematics Behind the Zcash
(2020)
Among all the cryptocurrencies developed since Bitcoin, Zcash stands out as the strongest, providing both transparency and anonymity for transactions and their users by deploying the strong mathematics of zk-SNARKs.
We discuss zero-knowledge proofs, which are a basic building block providing the functionality of zk-SNARKs. They include the Schnorr and sigma protocols in interactive and non-interactive versions. Non-interactive proofs are further used in Zcash transactions, where the validity of a sent transaction is proved by a cryptographic proof.
Further, we deploy zk-SNARK proofs that use a common reference string as a public parameter when a transaction is made. The proof allows the sender to prove that she knows a secret for an instance such that the proof is succinct, can be verified very efficiently and does not leak the secret. Non-malleability, small proofs and very efficient verification make zk-SNARKs a classic tool in Zcash. Since we deal with NP problems, we consider elliptic curve cryptography, which provides the same security as RSA with smaller parameter sizes.
Lastly, we explain the Zcash transaction process after a coin is minted: the corresponding transaction completely hides the sender, the receiver and the amount of the transaction by using a zero-knowledge proof.
As future considerations, we discuss improvements that can be made in terms of decentralization and efficiency by comparison with the top-ranked cryptocurrencies Ethereum and Monero, privacy preservation against the threat of quantum computers, and enhancements to shielded transactions.
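The Schnorr protocol mentioned in this summary can be made non-interactive with the Fiat-Shamir heuristic, in which the verifier's random challenge is replaced by a hash of the public values. The sketch is a toy illustration only: the group (p = 23, q = 11, g = 4) is far too small to be secure, the hash-to-challenge encoding is an assumption, and this is a Schnorr proof of knowledge of a discrete logarithm, not the zk-SNARK construction used by Zcash.

```python
import hashlib
import secrets

# Toy subgroup of order Q = 11 in Z_23^*, generated by G = 4. Insecure, illustration only.
P, Q, G = 23, 11, 4


def challenge(y, t):
    """Fiat-Shamir: derive the challenge from a hash of the public values."""
    digest = hashlib.sha256(f"{G}|{y}|{t}".encode()).digest()
    return int.from_bytes(digest, "big") % Q


def prove(x):
    """Prove knowledge of x with y = G**x mod P, without revealing x."""
    y = pow(G, x, P)
    r = secrets.randbelow(Q)  # fresh secret nonce
    t = pow(G, r, P)          # commitment
    c = challenge(y, t)
    s = (r + c * x) % Q       # response
    return y, (t, s)


def verify(y, proof):
    """Accept iff G**s == t * y**c (mod P)."""
    t, s = proof
    c = challenge(y, t)
    return pow(G, s, P) == (t * pow(y, c, P)) % P
```

Completeness follows from G^s = G^r · G^(c·x) = t · y^c, which holds because G has order Q; the random nonce r masks x in the response s.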
Anomaly detection is a pressing technical problem for many business enterprises. In this thesis, a combination of Growing Neural Gas and Generalized Matrix Learning Vector Quantization is presented as a solution based on collected theoretical and practical knowledge. The whole network is described and implemented, along with references and experimental results. The proposed model is carefully documented, and the remaining open research questions are stated for future investigation.
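The GMLVQ part of such a model rests on an adaptive distance, d(x, w) = (x - w)^T Omega^T Omega (x - w), and on the relative distance mu = (d+ - d-) / (d+ + d-) between a sample and its closest correct and closest incorrect prototype. The sketch below shows these two quantities and one gradient step on the prototypes with Omega held fixed. How the thesis couples GMLVQ with Growing Neural Gas is not reproduced here, and the function names and learning rate are illustrative assumptions.

```python
import numpy as np

def gmlvq_distance(x, w, omega):
    """Adaptive squared distance d(x, w) = (x - w)^T Omega^T Omega (x - w)."""
    diff = omega @ (x - w)
    return float(diff @ diff)

def closest_pair(x, y, prototypes, labels, omega):
    """Indices and distances of the closest same-label (d+) and other-label (d-) prototypes."""
    d = np.array([gmlvq_distance(x, w, omega) for w in prototypes])
    same = labels == y
    j_plus = int(np.flatnonzero(same)[np.argmin(d[same])])
    j_minus = int(np.flatnonzero(~same)[np.argmin(d[~same])])
    return j_plus, j_minus, d[j_plus], d[j_minus]

def relative_distance(x, y, prototypes, labels, omega):
    """mu in [-1, 1]; negative means x is classified correctly."""
    _, _, dp, dm = closest_pair(x, y, prototypes, labels, omega)
    return (dp - dm) / (dp + dm)

def prototype_step(x, y, prototypes, labels, omega, lr=0.05):
    """One gradient-descent step on mu w.r.t. the two winning prototypes (Omega fixed)."""
    jp, jm, dp, dm = closest_pair(x, y, prototypes, labels, omega)
    lam = omega.T @ omega
    denom = (dp + dm) ** 2
    new = prototypes.copy()
    new[jp] += lr * (4 * dm / denom) * lam @ (x - prototypes[jp])   # attract correct prototype
    new[jm] -= lr * (4 * dp / denom) * lam @ (x - prototypes[jm])   # repel wrong prototype
    return new
```

With Omega equal to the identity this reduces to plain GLVQ; in full GMLVQ, Omega is adapted by a gradient step of its own.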
A protein is a large molecule that consists of a vast number of atoms; one can only imagine the complexity of such a molecule. A protein is a series of amino acids that bind to each other in specific sequences known as peptide chains. Proteins fold into three-dimensional conformations (their so-called native structure) to perform their functions. However, not every protein folds into its correct structure, because mutations can occur in its amino acid sequence; such mutations cause many protein misfolding diseases. Protein folding is therefore a major problem in biology. Predicting the change in protein stability free energy caused by an amino acid mutation (ΔΔG) helps us better understand the driving forces that make proteins fold into their native structures, so measuring this difference in Gibbs free energy provides more insight into how protein folding occurs. This knowledge might prove beneficial in designing new drugs to treat diseases related to protein misfolding. The protein energy profile aids in understanding the relationship between sequence, structure and function by assigning an energy profile to a protein structure. Additionally, measuring the change in the protein energy profile caused by a mutation (ΔΔE), using an approach derived from statistical physics, can help us understand protein structure more thoroughly. In this work, we attempt to show that ΔΔE values approximate ΔΔG values, which could lead future studies to consider the energy profile as good a predictor of protein binding affinity as Gibbs free energy for solving the protein folding problem.
Cryptorchidism is a condition in which one or both testes do not descend properly into the scrotum. With a prevalence of up to 10%, cryptorchidism is one of the most common birth defects of the male genital tract. Despite the associated health risks and the economic damage resulting from surgery and breeding losses, studies on canine cryptorchidism and its causes are relatively rare. In this study, a relational database of genetic causes of cryptorchidism was established and used as a basis for identifying candidate genes. Associated regions were analysed by nanopore sequencing with the goal of identifying genetic variants correlated with cryptorchidism in the German Sheep Poodle.
This Bachelor thesis investigates whether the learning rules of the Hebbian, Oja and BCM neuron models converge to their fixed points and whether these fixed points are stable. Existing research is presented in a structured manner using consistent notation. Hebbian learning is neither convergent nor stable. Oja learning converges to a stable fixed point, namely the eigenvector corresponding to the largest eigenvalue of the covariance matrix of the input data. BCM learning converges to a fixed point that is stable when the inputs are assumed to follow a discrete distribution of orthogonal vectors occurring with equal probability. Hebbian learning therefore cannot be used in applications where convergence to a stable fixed point is required. Furthermore, this Bachelor thesis concludes that explicitly determining the fixed points of the BCM learning rule involves extensive calculation, so other methods for verifying the stability of possible fixed points should be considered.
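The contrast between Hebbian and Oja learning can be reproduced numerically. The sketch below assumes a single linear neuron y = w·x, a hypothetical 2-D Gaussian input with covariance [[3, 1], [1, 1]] and a learning rate of 0.005, and trains both rules on the same samples: the Hebbian weight norm keeps growing, while Oja's weight vector settles near the unit eigenvector of the largest eigenvalue of the covariance matrix. The BCM rule is not included.

```python
import numpy as np

def hebb_step(w, x, lr):
    """Plain Hebbian rule: dw = lr * y * x; nothing limits the growth of ||w||."""
    y = w @ x
    return w + lr * y * x

def oja_step(w, x, lr):
    """Oja's rule: dw = lr * y * (x - y * w); the -y^2 w decay keeps ||w|| near 1."""
    y = w @ x
    return w + lr * y * (x - y * w)

def train(step, data, lr, w0):
    w = np.array(w0, dtype=float)
    for x in data:
        w = step(w, x, lr)
    return w

rng = np.random.default_rng(0)
cov = np.array([[3.0, 1.0], [1.0, 1.0]])             # assumed toy input covariance
data = rng.multivariate_normal(np.zeros(2), cov, size=20_000)

w_oja = train(oja_step, data, lr=0.005, w0=[1.0, 0.0])
w_hebb = train(hebb_step, data[:2_000], lr=0.005, w0=[1.0, 0.0])

top_eigvec = np.linalg.eigh(cov)[1][:, -1]            # eigh sorts eigenvalues ascending
```

The Hebbian run uses fewer samples only to keep the exponentially growing norm within a readable range; it grows without bound either way.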
Solving eigenvalue problems computationally is a central problem in numerical analysis and, as such, has been studied extensively. In this thesis, we present four different methods for computing eigenvalues, each with its own characteristics, strengths and weaknesses. After formally introducing the methods, we use them in various numerical experiments to test their speed of convergence and stability, as well as their performance when computing eigenfaces, denoising images and computing the eigenvector centrality of a graph.
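Power iteration is one classic way to compute a dominant eigenpair; the abstract does not name its four methods, so this is not claimed to be one of them. As a sketch of the eigenvector-centrality experiment, it is applied here to the adjacency matrix of a hypothetical four-node graph (a triangle 0-1-2 with vertex 3 attached to vertex 2). Because this graph is connected and non-bipartite, its dominant eigenvalue is strictly largest in magnitude with a positive eigenvector, so iterating from a positive start vector converges to the centrality scores.

```python
import numpy as np

def power_iteration(A, tol=1e-12, max_iter=10_000):
    """Dominant eigenpair of A by repeated multiplication and normalisation.
    Assumes one eigenvalue is strictly largest in magnitude."""
    v = np.ones(A.shape[0]) / np.sqrt(A.shape[0])    # positive start vector
    lam = 0.0
    for _ in range(max_iter):
        w = A @ v
        v_new = w / np.linalg.norm(w)
        lam = float(v_new @ A @ v_new)               # Rayleigh quotient estimate
        if np.linalg.norm(v_new - v) < tol:
            return lam, v_new
        v = v_new
    return lam, v

# Hypothetical graph: triangle 0-1-2 plus vertex 3 attached to vertex 2
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
A = np.zeros((4, 4))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

lam, centrality = power_iteration(A)                 # vertex 2 gets the highest score
```

The convergence rate depends on the ratio of the second-largest to the largest eigenvalue magnitude, which is one of the characteristics such comparisons typically measure.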
In the practice of software engineering, project managers often face the problem of software project scheduling, which is related to the resource-constrained project scheduling problem. In software project scheduling, the main resources are the employees, each with a certain skill set and salary. The main purpose of software project scheduling is to assign the tasks of a project to the available employees such that the total cost and duration of the project are minimized while the constraints of software project scheduling are fulfilled. The software project scheduling problem (SPSP) is a complex combinatorial optimization problem whose search space grows exponentially as the number of tasks and employees increases, which makes SPSP NP-hard. Because its goal is to minimize both the total cost and the duration of the project, SPSP is a multi-objective problem. Many algorithms have been proposed that claim near-optimal results for NP-hard problems, but only a few of them yield feasible sets of solutions for SPSP, so more efficient algorithms that produce feasible, high-quality results are still needed.
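To make the two objectives concrete, the sketch below evaluates one candidate schedule under a commonly used dedication-matrix model, which may differ from the thesis's exact formulation: dedication[i, j] is the fraction of employee i's time spent on task j, a task's duration is its effort divided by its total dedication, and cost sums salary times dedicated time. For simplicity it assumes that tasks run one after another and ignores skill and overwork constraints; all names are hypothetical. A Pareto-dominance check shows how two (cost, duration) pairs are compared in the multi-objective setting.

```python
import numpy as np

def evaluate_schedule(dedication, effort, salary):
    """Return (cost, duration) of one candidate schedule, or (inf, inf) if infeasible.

    dedication[i, j]: fraction of employee i's time spent on task j (hypothetical model)
    effort[j]:        person-months required by task j
    salary[i]:        monthly salary of employee i
    Simplification: tasks run one after another (no task precedence graph)."""
    staffing = dedication.sum(axis=0)                # total dedication per task
    if np.any(staffing <= 0):
        return np.inf, np.inf                        # some task has nobody assigned
    task_duration = effort / staffing                # months per task
    cost = float(salary @ (dedication @ task_duration))
    return cost, float(task_duration.sum())

def dominates(a, b):
    """Pareto dominance for minimisation: a is no worse in every objective and better in one."""
    return all(ai <= bi for ai, bi in zip(a, b)) and any(ai < bi for ai, bi in zip(a, b))
```

A multi-objective search keeps the set of schedules that no other evaluated schedule dominates, rather than a single best solution.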
Nowadays, many optimization problems are solved with nature-inspired algorithms, because these algorithms balance exploration and exploitation of the search space. Several of them have been used for solving SPSP, e.g. genetic algorithms, the Ant Colony Optimization algorithm (ACO) and the Firefly algorithm.
Nature-inspired algorithms such as particle swarm optimization, genetic algorithms and ACO provide more promising results than naive and greedy algorithms. However, there is still room for improvement. The main purpose of this research is to use the bat algorithm to obtain efficient solutions for SPSP. In this work, a modified bat algorithm is implemented that uses a different random-walk approach. The contributions of this thesis are to: (1) adapt and apply a modified multi-objective bat algorithm to solve SPSP efficiently, (2) adapt and apply other nature-inspired algorithms, such as genetic algorithms, to SPSP, and (3) compare and analyze the results obtained by these nature-inspired algorithms and draw conclusions.
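For orientation, the following is a minimal sketch of the basic single-objective bat algorithm: frequency-tuned velocities, a loudness-scaled random walk around the best solution, and loudness and pulse-rate updates when a move is accepted. It is not the thesis's modified multi-objective variant or its random-walk scheme; the parameter values, the walk scale 0.1 * (ub - lb), and the sphere function standing in for an SPSP objective are all assumptions.

```python
import numpy as np

def sphere(x):
    """Stand-in objective (sum of squares); an SPSP fitness function would go here."""
    return float(np.sum(x ** 2))

def bat_algorithm(obj, dim, lb, ub, n_bats=30, n_iter=300, f_min=0.0, f_max=2.0,
                  alpha=0.9, gamma=0.9, r0=0.5, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(lb, ub, size=(n_bats, dim))      # bat positions
    v = np.zeros((n_bats, dim))                      # bat velocities
    loud = np.ones(n_bats)                           # loudness A_i
    pulse = np.zeros(n_bats)                         # pulse emission rate r_i
    fit = np.array([obj(xi) for xi in x])
    best_x, best_f = x[np.argmin(fit)].copy(), float(fit.min())
    v_max = ub - lb
    for t in range(1, n_iter + 1):
        for i in range(n_bats):
            freq = f_min + (f_max - f_min) * rng.random()
            # Velocity update; implementations differ in the sign of this term.
            # Here it pulls the bat toward the current best, clipped to stay bounded.
            v[i] = np.clip(v[i] + (best_x - x[i]) * freq, -v_max, v_max)
            cand = x[i] + v[i]
            if rng.random() > pulse[i]:
                # Local random walk around the best; the 0.1 * (ub - lb) scale is an assumption
                cand = best_x + 0.1 * (ub - lb) * loud.mean() * rng.standard_normal(dim)
            cand = np.clip(cand, lb, ub)
            f_cand = obj(cand)
            # Accept with probability A_i if not worse than bat i's own solution
            # (some formulations compare against the global best instead)
            if f_cand <= fit[i] and rng.random() < loud[i]:
                x[i], fit[i] = cand, f_cand
                loud[i] *= alpha                             # bat gets quieter
                pulse[i] = r0 * (1 - np.exp(-gamma * t))     # and pulses more often
            if f_cand < best_f:
                best_x, best_f = cand.copy(), f_cand
    return best_x, best_f
```

Early on, low pulse rates make the random walk dominate (exploration around the best); as bats succeed, they become quieter and pulse more, shifting toward velocity-driven moves and finer local steps.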
Path decompositions of graphs have received considerable interest over the past decades because of their applications in algorithmic graph theory and in real-life problems. To compute a path decomposition of small width, we use different heuristic approaches; one of the most useful methods is due to Bodlaender and Kloks. In this thesis, we focus on the computation, applications, transformation and approximation of path decompositions of small width.
It is easy to convert a path decomposition into a nice path decomposition of the same width, which is more convenient for computing graph parameters such as independent sets and chromatic polynomials. Inspired by [28], we give an algorithm that computes the chromatic polynomial of a graph via a nice path decomposition of small width.
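The conversion into a nice path decomposition can be sketched as follows: between consecutive bags, vertices that leave are forgotten one at a time before vertices that enter are introduced one at a time, so every intermediate bag is a subset of an original bag and the width cannot increase. The helper names are hypothetical, and the sketch does not implement the thesis's chromatic-polynomial algorithm; it only produces the kind of nice path decomposition such an algorithm would traverse.

```python
def is_path_decomposition(bags, vertices, edges):
    """Check the path decomposition axioms for a sequence of bags."""
    if any(not any(v in b for b in bags) for v in vertices):
        return False                                  # some vertex is in no bag
    if any(not any(u in b and v in b for b in bags) for u, v in edges):
        return False                                  # some edge is in no bag
    for v in vertices:                                # bags containing v must be consecutive
        idx = [i for i, b in enumerate(bags) if v in b]
        if idx[-1] - idx[0] + 1 != len(idx):
            return False
    return True

def width(bags):
    return max(len(b) for b in bags) - 1

def to_nice(bags):
    """Nice path decomposition: starts and ends with an empty bag, and consecutive
    bags differ by exactly one introduced or forgotten vertex."""
    nice, current = [frozenset()], set()
    for bag in list(bags) + [set()]:
        for v in sorted(current - set(bag)):          # forget first, so width never grows
            current.discard(v)
            nice.append(frozenset(current))
        for v in sorted(set(bag) - current):          # then introduce
            current.add(v)
            nice.append(frozenset(current))
    return nice
```

For the path graph 0-1-2 with bags [{0, 1}, {1, 2}], this yields the sequence of bags: empty, {0}, {0, 1}, {1}, {1, 2}, {2}, empty.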
The Infinica product suite consists of multiple individual microservice applications, mainly gathered around the Infinica Process Engine, which allows the execution of highly individualised process definitions. To estimate process performance, a layered queuing network approach was applied. As a first step, this required implementing a basic modelling framework. The implemented framework was then used to evaluate the applicability of the approach by creating two models and comparing them with actual performance measurements. Although the calculated results deviated from the expected results, the analysis showed that the differences may stem from an inaccurate model. Nevertheless, the general approach seems appropriate for the given application as well as for microservices in general, especially when extended with advanced modelling techniques, as the analysed model results appear consistent.
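Layered queuing network solvers commonly build on Mean Value Analysis (MVA). As background, the following sketch shows exact MVA for a single-class closed queuing network with one layer; it is not the thesis's modelling framework, and the function name and parameters are illustrative. Each iteration adds one user, applies the arrival theorem to obtain residence times, and uses Little's law for throughput and queue lengths.

```python
def mva(demands, n_users, think_time=0.0):
    """Exact Mean Value Analysis for a single-class closed queuing network.

    demands[k]: total service demand of one request at queuing station k
    Returns (throughput, residence times per station, mean queue lengths per station)."""
    queue = [0.0] * len(demands)
    throughput, residence = 0.0, [0.0] * len(demands)
    for n in range(1, n_users + 1):
        # Arrival theorem: an arriving request sees the queues of a network with n - 1 users
        residence = [d * (1.0 + q) for d, q in zip(demands, queue)]
        throughput = n / (think_time + sum(residence))   # Little's law over the whole cycle
        queue = [throughput * r for r in residence]      # Little's law per station
    return throughput, residence, queue
```

A layered model repeats this kind of analysis per layer, feeding the response times of lower-layer services back in as service demands of the layer above.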
Robust soft learning vector quantization (RSLVQ) is a probabilistic variant of the learning vector quantization (LVQ) algorithm. RSLVQ models the data with a Gaussian mixture model, and its cost function is defined in terms of a likelihood ratio. This thesis modifies standard RSLVQ by using non-Gaussian density functions such as the logistic, lognormal and Cauchy distributions (referred to as PLVQ). In this approach, we derive new update rules for the prototypes from the gradient of the cost function with respect to the non-Gaussian density functions. We also derive new learning rules for the model parameters such as s and s by differentiating the cost function with respect to these parameters. The main goal of the thesis is to compare the performance of the PLVQ model with that of the Gaussian RSLVQ model. To this end, the classification models are tested on the Iris and Seeds datasets. To visualize the classification results adequately, principal component analysis (PCA) is used.
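The update rules can be sketched generically. With equal priors and a shared width parameter, the gradient of the log-likelihood ratio moves each correct-class prototype by (P_y(j|x) - P(j|x)) and each wrong-class prototype by -P(j|x), both multiplied by the gradient of the log-density with respect to that prototype. The Gaussian kernel gives standard RSLVQ; the Cauchy-shaped kernel (1 + ||x - w||^2 / gamma^2)^-1 is an assumed parameterisation for illustration, not necessarily the one derived in the thesis, and the width parameters are kept fixed rather than learned.

```python
import numpy as np

def gaussian_kernel(x, W, sigma):
    """Unnormalised Gaussian: log p(x|j) up to a constant, and d log p / d w_j."""
    diff = x - W                                     # shape (n_prototypes, dim)
    log_p = -np.sum(diff ** 2, axis=1) / (2 * sigma ** 2)
    return log_p, diff / sigma ** 2

def cauchy_kernel(x, W, gamma):
    """Assumed Cauchy-shaped kernel (1 + ||x - w||^2 / gamma^2)^-1 and its log-gradient."""
    diff = x - W
    s = 1 + np.sum(diff ** 2, axis=1) / gamma ** 2
    return -np.log(s), 2 * diff / (gamma ** 2 * s[:, None])

def rslvq_step(x, y, W, labels, kernel, width, lr=0.05):
    """One stochastic gradient-ascent step on the log-likelihood ratio log p(x, y|W) / p(x|W).
    Equal priors and a shared width are assumed, so normalising constants cancel."""
    log_p, grad = kernel(x, W, width)
    p = np.exp(log_p - log_p.max())                  # shift for numerical stability
    correct = labels == y
    post_all = p / p.sum()                           # P(j | x) over all prototypes
    post_y = np.where(correct, p, 0.0) / p[correct].sum()   # P_y(j | x) within class y
    coeff = np.where(correct, post_y - post_all, -post_all)
    return W + lr * coeff[:, None] * grad
```

Swapping the kernel function is the only change needed to try another density shape, which mirrors the comparison between PLVQ and Gaussian RSLVQ described above.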