VQ-VAE is a successful generative model which can perform lossy compression. It combines deep learning with vector quantization to achieve a discrete compressed representation of the data. We explore using different vector quantization techniques with VQ-VAE, mainly neural gas and fuzzy c-means. Moreover, VQ-VAE contains a non-differentiable discrete mapping, which we examine, and we propose changes to the original VQ-VAE loss to fit the alternative vector quantization techniques.
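The non-differentiable step mentioned above is the nearest-neighbour assignment of encoder outputs to codebook vectors. As a minimal illustration (a numpy sketch of standard vector quantization, not the thesis code):

```python
import numpy as np

def vector_quantize(z, codebook):
    """Map each encoder output to its nearest codebook entry
    (the discrete, non-differentiable step)."""
    # Pairwise squared Euclidean distances, shape (N, K)
    d = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    indices = d.argmin(axis=1)        # discrete codes
    return codebook[indices], indices

# Toy example: three encoder outputs, a codebook of two vectors
codebook = np.array([[0.0, 0.0], [1.0, 1.0]])
z = np.array([[0.1, -0.1], [0.9, 1.2], [0.2, 0.3]])
z_q, idx = vector_quantize(z, codebook)   # idx -> [0, 1, 0]
```

Because `argmin` has no useful gradient, the original VQ-VAE copies gradients through this step with a straight-through estimator; alternative quantizers such as neural gas or fuzzy c-means need corresponding changes to the loss.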
Protein structures are essential elements in every biological system evolved on earth, where they function as stabilizing elements, signal transducers, or replication machineries. They consist of linear-bonded amino acids, which determine the three-dimensional structure of the protein, whereas the structure in turn determines the function. The native and biologically active structure of a protein can be understood as the folding state of a polypeptide chain at the global minimum of free energy.
By means of protein energy profiling, an approach derived from statistical physics, it is possible to assign a so-called energy profile to a protein structure. Such an energy profile describes the local energetic interaction features of every amino acid within the structure and introduces an energetic point of view on proteins, instead of a structural or sequential one.
This work aims to offer a perspective on the question of how we may extract pattern information from energy profiles. The concrete subjects are energy-mapped Pfam family alignments and investigations on finding motifs or patterns in discretized energy profile segments.
In machine learning, Learning Vector Quantization (LVQ) is a well-known supervised learning method. LVQ has been studied as a means of generating optimal reference vectors because of its simple and fast learning algorithm [12]. In many classification tasks, different variants of LVQ are considered while training a model. In this thesis, two variants of LVQ, Generalized Matrix Learning Vector Quantization (GMLVQ) and Generalized Tangent Learning Vector Quantization (GTLVQ), are discussed. A transfer learning technique for the different LVQ variants is then implemented and visualized, and the results are compared on different datasets.
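The attract/repel intuition shared by all LVQ variants can be illustrated with the classic LVQ1 update rule (a textbook sketch; the GMLVQ and GTLVQ variants discussed in the thesis additionally learn a metric or tangent distances):

```python
import numpy as np

def lvq1_update(prototype, x, label_matches, lr=0.1):
    """One LVQ1 step: attract the winning prototype if its class
    matches the sample's class, repel it otherwise."""
    sign = 1.0 if label_matches else -1.0
    return prototype + sign * lr * (x - prototype)

p = np.array([0.0, 0.0])
x = np.array([1.0, 0.0])
p_attracted = lvq1_update(p, x, True)    # moves toward x
p_repelled  = lvq1_update(p, x, False)   # moves away from x
```

GMLVQ replaces the implicit Euclidean distance here with a learned matrix metric, which is also what makes the models interpretable: the metric reveals which features drive classification.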
Community-acquired pneumonia (CAP) is a very common, infectious, and sometimes lethal disease. As such, it is associated with high costs of diagnosis and treatment. To reduce health care costs in this area, diagnosis and treatment must become cheaper to conduct with no loss in predictive accuracy. One effective way of doing so would be the identification of easily detectable and highly specific transcriptomic markers, which would reduce the amount of laboratory work required while possibly enhancing diagnostic capability.
Transcriptomic whole-blood data derived from the PROGRESS study were combined with several documented features such as age, smoking status, and the SOFA score. The analysis pipeline included processing by self-organizing maps for dimensionality and noise reduction, as well as diffusion pseudotime (DPT). Pseudotime enabled modelling a disease course of CAP, where each sample represented a state/time point in the modelled course. Both methods combined resulted in a proposed disease course of CAP, described by 1476 marker genes. An additional gene-set analysis also provided information about the immune-related functions of these marker genes.
Influenza A viruses are responsible for the outbreak of epidemics as well as pandemics worldwide. The surface protein neuraminidase of this virus is responsible, among other things, for the release of virions from the cell and is thus of interest in pharmacological research. The aim of this work is to gain knowledge about evolutionary changes in sequences of influenza A neuraminidase through different methods. First, EVcouplings is used with the goal of identifying evolutionary couplings within the protein sequences, but this analysis was unsuccessful, probably due to the great sequence length of neuraminidase. Second, the natural vector method is used for sequence embedding, in the hope of visualizing the sequential progression of the virus protein over time. Last, interpretable machine learning methods are applied to examine whether the data can be classified by year and whether the extracted information conforms to the results of the EVcouplings analysis. In addition to the class label year, other labels such as groups or subtypes are used in classification, with varying results. For balanced classes the machine learning models performed adequately, but this was not the case for imbalanced data. Groups and subtypes can be classified with high accuracy, which was not the case for years, continents, or hosts. To identify the minimal number of features necessary for linear separation of neuraminidase group 1 subtypes, a logistic regression was finally performed, resulting in the identification of 15 combinations of nine amino acid frequencies. Since neither the sequence embedding nor the machine learning methods showed neuraminidase evolution over time, further research is necessary, for example with a focus on one subtype with balanced data.
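The natural vector method mentioned above embeds a sequence by recording, for each letter of the alphabet, its count, mean position, and normalized central moments. A minimal sketch of one common formulation (the normalization and the default alphabet are assumptions; for neuraminidase the 20-letter amino acid alphabet would be passed in):

```python
def natural_vector(seq, alphabet="ACGT", max_moment=2):
    """Natural-vector embedding: per letter, the count, the mean
    position, and normalized central moments.  One common
    formulation; not necessarily the thesis's exact setup."""
    L = len(seq)
    vec = []
    for a in alphabet:
        pos = [i + 1 for i, c in enumerate(seq) if c == a]  # 1-based
        n = len(pos)
        mu = sum(pos) / n if n else 0.0
        vec += [n, mu]
        for j in range(2, max_moment + 1):
            Dj = (sum((p - mu) ** j for p in pos)
                  / (n ** (j - 1) * L ** (j - 1))) if n else 0.0
            vec.append(Dj)
    return vec

v = natural_vector("ACGTAC")   # 4 letters x (count, mean, D2) = 12 features
```

The resulting fixed-length vectors make sequences of different lengths directly comparable, which is what allows them to be fed into the classifiers discussed above.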
Proteins are involved in almost every aspect of life, mediating a wide range of cellular tasks. The protein sequence dictates the spatial arrangement of the residues and thus ultimately the function of a protein. Huge effort is put into cumbersome structure elucidation experiments, which yield models describing the observed spatial conformation of a protein, enabling users to predict its function, to understand its mode of action, or to design tailored drugs to cure diseases caused by misfolded or misregulated proteins.
However, the results of structure determination experiments are merely models of reality, made under simplifying assumptions and sometimes containing major undetected errors. Moreover, such experiments are resource-demanding and cannot meet the actual demand. Thus, scientists predict the structure of proteins in silico, resulting in models that are even more prone to error.
In consequence, structural biologists have searched for a practicable definition of structure quality, and over the last two decades several model quality assessment programs have emerged, measuring the local and global quality of particular structures. Seven representatives were studied regarding the paradigms they follow and the features they use to describe the quality of residues. Their predictions were compared, showing that there is almost no common ground among the tools.
Is there a way to combine their statements anyway?
Finally, the accumulated knowledge was used to design a novel evaluation tool addressing the problems previously spotted. Thereby, high quality of its predictions as well as superior usability was key. The strategy was compared to existing approaches and evaluated on suitable datasets.
The theoretical foundations of enterprise management using information technology were reviewed; the effectiveness of the use of information systems in the enterprise was analyzed; and ways of improving the enterprise management mechanism using information systems were developed (using the example of Mars Wrigley Confectionery Belarus).
Not available.
This master's thesis studies customer behavior using the example of the skin care brand Nivea. It presents the theoretical basis for the subsequent research on marketing, customer behavior, and the proper conduct of marketing research, followed by an analysis of the German market. Since Nivea is a brand of the Beiersdorf company, Beiersdorf's activities and operations are described. The main goal of the thesis is to analyze the customer behavior of Nivea; accordingly, it contains extensive research on the brand along with its micro- and macroenvironment. An in-depth interview and a survey were also conducted to understand customers' current needs. Based on all the results, the author proposes some ideas for the Nivea brand.
This study shows the potential of the make-or-buy theory in several scenarios: production, assembly, and development. These possibilities are evaluated based on Bosch's core competencies. A decision model is developed to support the decision-making process. Based on these results, serial production at RBAC in China is planned and suggestions for setting up the assembly line are given.
Since the expression of the titin-Hsp27 construct and the subsequent purification yielded no satisfactory results, atomic force microscopy could not be performed. A structure model for the protein sequence was therefore developed using different bioinformatic methods: template searches with different BLAST runs and freely available software such as SwissModel, Pcons, ModWeb, and other tools. Nevertheless, the generated model is not the native conformation and has to be analyzed with other software until a stable conformation of the structure can be predicted. Given the time available, the generated model is a good approach to the aim of this master's thesis.
As widely discussed in the literature, spatial patterns of amino acids, so-called structural motifs, play an important role in protein function. The functionally responsible part of a protein often lies in an evolutionarily highly conserved spatial arrangement of only a few amino acids, which are held in place tightly by the rest of the structure. In general, these motifs can mediate various functional interactions, such as DNA/RNA targeting and binding, ligand interactions, substrate catalysis, and stabilization of the protein structure.
Hence, characterizing and identifying such conserved structural motifs can contribute to the understanding of structure-function relationships in diverse protein families. For this reason, and because of the rapidly increasing number of solved protein structures, it is highly desirable to identify, understand, and, moreover, search for structurally scattered amino acid motifs. The aim of this work was the development and implementation of a matching algorithm to search for such small structural motifs in large sets of target structures. Furthermore, motif matches were extensively analyzed, statistically assessed, and functionally classified. Following a novel approach, hierarchical clustering was combined with functional classification and used to deduce evolutionary structure-function relationships. The proposed methods were combined and implemented in a feature-rich and easy-to-use command-line software tool, which is freely available and contributes to the field of structural bioinformatics research.
The emerging Internet of Things (IoT) technology interconnects billions of embedded devices with each other. These embedded devices are internet-enabled and collect, share, and analyze data without any human intervention. The integration of IoT technology into the human environment, such as industry, agriculture, and the health sector, is expected to improve the way of life and business. However, the emerging technology poses challenges and numerous security threats. On these grounds, it is imperative to strengthen the security of IoT technology to avoid any compromise that affects human life. Instead of implementing traditional cryptosystems on IoT devices, an elliptic curve cryptosystem (ECC) is used to meet the limited resources of the devices. ECC is an elliptic-curve-based public-key cryptography which provides equivalent security with shorter key sizes compared to other cryptosystems such as Rivest–Shamir–Adleman (RSA). The security of ECC hinges on the hardness of solving the elliptic curve discrete logarithm problem (ECDLP). ECC is faster and easier to implement and also consumes less power and bandwidth. Due to these benefits, ECC is incorporated in internationally recognized standards for lightweight applications.
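The ECDLP mentioned above asks for the scalar k given points P and kP. A toy double-and-add scalar multiplication over a small prime field illustrates the one-way operation (illustrative parameters only, nowhere near the ~256-bit fields of real standardized curves):

```python
# Toy curve y^2 = x^3 + 2x + 3 over F_97 -- for illustration only.
P_MOD, A = 97, 2

def ec_add(P, Q):
    """Add two curve points (None = point at infinity)."""
    if P is None: return Q
    if Q is None: return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None                                  # vertical line
    if P == Q:
        m = (3 * x1 * x1 + A) * pow(2 * y1, -1, P_MOD)   # tangent slope
    else:
        m = (y2 - y1) * pow(x2 - x1, -1, P_MOD)          # chord slope
    x3 = (m * m - x1 - x2) % P_MOD
    return (x3, (m * (x1 - x3) - y1) % P_MOD)

def scalar_mult(k, P):
    """Double-and-add: computes kP.  Recovering k from P and kP
    is the ECDLP on which ECC security rests."""
    R = None
    while k:
        if k & 1:
            R = ec_add(R, P)
        P = ec_add(P, P)
        k >>= 1
    return R

G = (0, 10)               # a point on the toy curve
pub = scalar_mult(5, G)   # "public key" for "private key" 5
```

Computing `pub` takes O(log k) group operations, while recovering 5 from `G` and `pub` requires solving the ECDLP, which is what lets ECC match RSA security at much shorter key sizes.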
Recently, deep neural network architectures designed to work on graph-structured data have been attracting notice and are being implemented in various domains and applications. Learning representations (feature embeddings) from graph data is gaining pace in research, while constructing graphs from datasets remains a challenge. The ability to map the data to lower dimensions further eases the task and makes many operations convenient to apply. The graph neural network (GNN) is one of the novel neural network models catching attention, as it performs strongly in various applications such as recommender systems, social networks, chemical synthesis, and many more. This thesis discusses a unique approach to a fundamental task on graphs: node classification. The feature embedding for a node is aggregated by applying a recurrent neural network (RNN), then a GNN model is trained to classify a node with the help of the aggregated features, and Q-learning supports optimizing the shape of the neural networks. The thesis starts with the working principles of the feedforward neural network and recurrent units such as the simple RNN, long short-term memory (LSTM), and gated recurrent unit (GRU), followed by the concepts of reinforcement learning (RL) and the Q-learning algorithm. An overview of the fundamentals of graphs, followed by the GNN architecture and workflow, is discussed subsequently. Some basic GNN models are discussed briefly before the thesis approaches the technical implementation details, the output of the model, and a comparison with a few other models such as GraphSAGE and the graph attention network (GAT).
Over the past few years, wind and solar power plants have increasingly contributed to energy production. However, due to fluctuating energy sources, the energy production data contain disruptions. Such disrupted data lead to wrong prediction performance and need to be estimated from other values. In this thesis, we provide a comparative study on estimating online disrupted data based on the data of similar groups of power plants. We apply three estimation techniques, namely mean, interpolation, and k-nearest neighbors (KNN), to estimate the disruptions on the training data. We then apply four clustering algorithms, namely k-means, neural gas, hierarchical agglomerative clustering, and affinity propagation, with two similarity measures, Euclidean distance and dynamic time warping (DTW), to form groups of power plants and compare the results. Experimental results show that when KNN estimation is applied to the data, and neural gas and agglomerative clustering with DTW are used to cluster it, the cluster quality scores and execution time are better than for the alternatives. Therefore, we choose KNN estimation to reconstruct the online disrupted data in each group of similar power plants.
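The KNN estimation favoured in the experiments can be sketched as follows (a minimal illustration of k-nearest-neighbor imputation across power plants, not the thesis pipeline; plant layout and distance choice are assumptions):

```python
import numpy as np

def knn_impute(X, k=2):
    """Fill NaNs in each row (one power plant per row) with the mean
    of the k nearest rows, where distance is the RMSE over the
    columns both rows have observed."""
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    for i, row in enumerate(X):
        miss = np.isnan(row)
        if not miss.any():
            continue
        dists = []
        for j, other in enumerate(X):
            common = ~np.isnan(row) & ~np.isnan(other)
            if j != i and common.any():
                d = np.sqrt(((row[common] - other[common]) ** 2).mean())
                dists.append((d, j))
        neighbors = [j for _, j in sorted(dists)[:k]]
        for c in np.where(miss)[0]:
            vals = [X[j, c] for j in neighbors if not np.isnan(X[j, c])]
            if vals:
                filled[i, c] = np.mean(vals)
    return filled

X = [[1.0, 2.0, 3.0],       # plant A, fully observed
     [1.0, 2.0, np.nan],    # plant B, one disrupted value
     [10.0, 20.0, 30.0]]    # plant C, a dissimilar plant
filled = knn_impute(X, k=1)  # B's gap is filled from its nearest plant, A
```

Restricting the neighbor search to a cluster of similar plants, as the thesis does, keeps dissimilar plants like C from contaminating the estimate.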
The application described in this thesis has been designed and built to help nurses and other medical personnel around the world access a real-time database for storing patient records such as patient name, patient ID, patient age and date of birth, and the symptoms the patient is experiencing. A real-time database is a live database where all changes are reflected across all devices accessing it. This application will be beneficial especially in countries where access to a computer or medical equipment is not always possible. A phone is always ready to use and within reach, so users of this application will be able to access the data at any time and place. We will be able to add a new patient or search for existing patients. In addition, this application allows us to take RAW medical images that can be used to identify anomalies in blood samples. RAW images are important for this application because they are uncompressed, which means they do not lose any quality or detail. The users of this application are the medical personnel taking care of the patients. These users have to create a profile on the database in order to use the application, since their data, such as the user ID, are used to control how data are retrieved and stored. We will also discuss the current and future features of this application and the benefits it offers medical personnel as well as patients. Finally, we will go over the implementation of such an application from both a hardware and a software perspective.
Sequences are an important data structure in molecular biology, but unfortunately it is difficult for most machine learning algorithms to handle them, as they rely on vectorial data. Recent approaches include methods that rely on proximity data, such as median and relational Learning Vector Quantization. However, many of them are limited in the size of the data they are able to handle. A standard method to generate vectorial features for sequence data does not exist yet. Consequently, a way to make sequence data accessible to preferably interpretable machine learning algorithms needs to be found. This thesis will therefore investigate a new approach called the Sensor Response Principle, which is being adapted to protein sequences. Accordingly, sequence similarity is measured via pairwise sequence alignments with different sequence alignment algorithms and various substitution matrices. The measurements are then used as input for learning with the Generalized Learning Vector Quantization algorithm. A special focus lies on sequence length variability as it is suspected to affect the sequence alignment score and therefore the discriminative quality of the generated feature vectors. Specific datasets were generated from the Pfam protein family database to address this question. Further, the impact of the number of references and choice of substitution matrices is examined.
The occurrence of prostate cancer (PCa) has been rising consistently for three decades, and it remains the third leading cause of cancer-related deaths after lung and bowel cancer in Germany. Despite new methods of early detection, such as prostate-specific antigen (PSA) testing, it persists as the most common cancer in German men, with over 63,400 new diagnoses in Germany every year, and exhibits high prevalence in other countries of Northern and Western Europe as well [64]. Men over the age of 70 are most commonly affected by the lethal disease, whereas an onset before 50 is rare. The malignant prostate tumor can be cured through operation or irradiation as long as the cancer has not reached the stage of metastasis, in which other therapeutic methods have to be employed [14] [15]. In the metastatic phase, the patient usually exhibits symptoms when the tumor's size affects the urethra or the cancer spreads to other tissue, often the bones [16].
The high prevalence of this disease underlines the importance of further research into prognosis and diagnosis methods, whereby the identification of further biomarkers in PCa poses a major topic of scientific analysis. For this task, high-throughput RNA sequencing of the transcriptome (the RNA molecules of an organism or specific cell type) is frequently exploited [66]. RNA sequencing, or RNA-Seq for short, offers the possibility of transcriptome assessment, enabling the identification of transcriptional aberrations in diseases as well as uncharacterized RNA species, such as non-coding RNAs (ncRNAs), which remain undetected by conventional methods [49]. To facilitate interpretation of the sequenced reads, they are assembled to reconstruct the transcriptome as close to the original state as possible, thus enabling rapid detection of relevant biomolecules in the data [49]. Transcriptomic studies often require highly accurate and complete gene annotations on the reference genome of the examined organism. However, most gene annotations and reference genomes are far from complete, containing a multitude of unidentified protein-coding and non-coding genes and transcripts. Therefore, refinement of reference genomes and annotations by the inclusion of novel sequences discovered in high-quality transcriptome assemblies is necessary [24].
Several algorithms have been proposed for testing series-parallel graphs in linear time. We give alternative algorithms for testing series-parallel graphs, computing their tree decompositions, and computing the independence number when the input is an undirected biconnected series-parallel graph; these algorithms run in (approximately) linear time.
Probabilistic Micropayments
(2022)
Probabilistic micropayments are an important cryptographic research topic in electronic commerce, with the potential to yield efficient algorithms with low transaction costs and high computation speed. To delve into the topic, it is vital to scrutinize cryptographic preliminaries such as hash functions and digital signatures. This thesis investigates important probabilistic methods based on centralized or decentralized networks. First, centralized schemes such as lottery-based tickets, PayWord, coin-flipping, and MR2 are described, and an approach based on blind signatures is also discussed. Then, decentralized methods such as MICROPAY3 and a transferable scheme on the blockchain network, along with an efficient model for cryptocurrencies, are explained. We then compare the different probabilistic micropayment methods and improve their drawbacks with a new technique. To set the results from the theoretical analysis of the different methods into context, we analyze the attacks that reduce the security and, therefore, the efficiency of the system. In particular, we discuss various methods for detecting double-spending and eclipse attacks.
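The lottery-ticket idea behind several of these schemes can be sketched with a hash commitment: the merchant commits to a secret, the payer contributes its own, and the combined hash decides whether a ticket becomes a real payment (a toy illustration under assumed parameters, not a complete scheme; signatures and settlement are omitted):

```python
import hashlib
import secrets

def commit(secret: bytes) -> str:
    """Hash commitment: published first, binding but hiding."""
    return hashlib.sha256(secret).hexdigest()

def ticket_wins(payer_secret: bytes, merchant_secret: bytes, p_inv: int) -> bool:
    """A ticket pays out with probability ~1/p_inv: both parties'
    secrets are hashed together and reduced mod p_inv, so neither
    side alone can bias the lottery."""
    h = hashlib.sha256(payer_secret + merchant_secret).digest()
    return int.from_bytes(h, "big") % p_inv == 0

# One micropayment round (toy): merchant commits first, payer responds;
# only ~1/100 of tickets turn into on-ledger payments worth 100x the value.
merchant_secret = secrets.token_bytes(16)
c = commit(merchant_secret)                       # sent to the payer first
payer_secret = secrets.token_bytes(16)
win = ticket_wins(payer_secret, merchant_secret, 100)
```

Because only winning tickets ever reach the ledger, the expected payment is correct while per-transaction costs shrink by the factor p_inv, which is exactly the efficiency argument the compared schemes build on.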