Workload Optimization Techniques for Password Guessing Algorithms on Distributed Computing Platforms
(2019)
This thesis covers several ways to optimize distributed computing platforms for cryptanalytic purposes. After an introduction to password storage, password guessing attacks, and distributed computing in general, a set of initial benchmark results for a variety of devices is analyzed. The results shown are based mainly on the open-source password recovery tool Hashcat. The second part of this work presents an algorithmic implementation for information retrieval and workload generation. This thesis can be used for the conception of a distributed computing system, inventory analysis of available hardware devices, runtime and cost estimates for specific jobs, and finally strategic workload distribution.
The following is a description and outline of the work done at the Cornell Lab of Ornithology developing Nation Feathers VR, a virtual reality game for learning about bird calls and songs. The goal was to develop a game which is intuitive, educational and entertaining. Furthermore, the software needed to be structured in a way that allows for feasible future expansion. This required careful data saving and retrieval. The game gives the player an opportunity to learn and apply that knowledge, all while maintaining a shorter runtime in order to reduce the total time spent in the virtual world. This is meant to prevent any discomfort to the player that may result from extended use of the VR headset.
VQ-VAE is a successful generative model which can perform lossy compression. It combines deep learning with vector quantization to achieve a discrete compressed representation of the data. We explore using different vector quantization techniques with VQ-VAE, mainly neural gas and fuzzy c-means. Moreover, VQ-VAE consists of a non-differentiable discrete mapping which we will explore and propose changes to the original VQ-VAE loss to fit the alternative vector quantization techniques.
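The difference between the hard assignment used in standard vector quantization and a fuzzy alternative can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names, the toy codebook, and the fuzzifier value m = 2 are assumptions chosen for illustration. Hard assignment picks the single nearest codeword, while fuzzy c-means spreads a membership of total weight 1 over all codewords.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def hard_assign(z, codebook):
    """Index of the nearest codeword (hard vector quantization)."""
    return min(range(len(codebook)),
               key=lambda k: squared_distance(z, codebook[k]))


def fuzzy_memberships(z, codebook, m=2.0):
    """Fuzzy c-means memberships of z to every codeword.

    Uses u_k = 1 / sum_j (d_k / d_j)^(2 / (m - 1)), written here in terms
    of squared distances. The fuzzifier m > 1 is an assumed parameter.
    """
    d2 = [squared_distance(z, c) for c in codebook]
    # If z coincides with a codeword, it belongs to it completely.
    for k, d in enumerate(d2):
        if d == 0.0:
            return [1.0 if j == k else 0.0 for j in range(len(codebook))]
    exponent = 1.0 / (m - 1.0)
    inverse = [(1.0 / d) ** exponent for d in d2]
    total = sum(inverse)
    return [v / total for v in inverse]


# Hypothetical toy codebook and encoder output.
codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]]
z = [0.9, 1.2]
nearest = hard_assign(z, codebook)
memberships = fuzzy_memberships(z, codebook)
```

The hard assignment is piecewise constant in z and therefore has no useful gradient, which is the non-differentiable mapping the abstract refers to; the fuzzy memberships vary smoothly with z.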
The number of Internet of Things (IoT) devices is increasing rapidly. The Trustless Incentivized Remote Node Network, in short IN3 (Incubed), enables trustworthy and fast access to a blockchain for a large number of low-performance IoT devices. Although currently IN3 only supports the verification of Ethereum data, it is not limited to one blockchain due to modularity. This thesis describes the fundamentals, the concept and the implementation of the Bitcoin verification in IN3.
In this study, the properties of the superabsorbent polymer Broadleaf P4 were investigated with the aim of applying the polymer in constructed wetlands. Applying the polymer in constructed wetlands is intended to improve the removal of pesticides. To this end, the polymer was placed in lab-scale wetlands together with pumice, and these wetlands were compared to a control wetland filled with gravel. The wetlands were run for several weeks, during which nutrient removal was recorded. Before the pesticides were added to the wetland beds, the polymer was also tested for its ability to adsorb them.
In this thesis, we implement, correct, and modify the compartmental model described in “Transmission Dynamics of Large Coronavirus Disease Outbreak in Homeless Shelter, Chicago, Illinois, USA, 2020”. Our objective is to engage in reading and understanding scientific literature, reproduce the results, and modify or generalize an existing mathematical model. We provide an overview of epidemiological models, focusing on simple compartmental SEIR models. We correct inaccuracies and misprints in the original implementation and use the limited-memory Broyden–Fletcher–Goldfarb–Shanno algorithm to fit the model’s parameters. Furthermore, we modify the model by introducing an additional compartment. The resulting model has a more intuitive interpretation and relies on fewer assumptions. We also perform the fitting process for this alternative model. Finally, we demonstrate the advantages of our modified implementations and discuss other possible approaches.
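For readers unfamiliar with compartmental models, a basic SEIR model can be sketched as follows. This is a generic textbook formulation integrated with a forward Euler step, not the shelter model from the cited paper and not the modified model of the thesis; all parameter values and names are assumptions for illustration.

```python
def seir_step(state, beta, sigma, gamma, dt):
    """Advance a basic SEIR model by one forward Euler step.

    state: (S, E, I, R) compartment sizes
    beta:  transmission rate, sigma: rate E -> I, gamma: rate I -> R
    """
    s, e, i, r = state
    n = s + e + i + r
    new_exposed = beta * s * i / n * dt
    new_infectious = sigma * e * dt
    new_recovered = gamma * i * dt
    return (
        s - new_exposed,
        e + new_exposed - new_infectious,
        i + new_infectious - new_recovered,
        r + new_recovered,
    )


def simulate(state, beta, sigma, gamma, dt, steps):
    """Return the list of states visited, including the initial state."""
    trajectory = [state]
    for _ in range(steps):
        state = seir_step(state, beta, sigma, gamma, dt)
        trajectory.append(state)
    return trajectory


# Hypothetical parameters: 1000 people, one initially infectious.
trajectory = simulate((999.0, 0.0, 1.0, 0.0),
                      beta=0.5, sigma=0.2, gamma=0.1, dt=0.1, steps=1000)
```

Fitting such a model means choosing beta, sigma, and gamma so that the simulated trajectory matches observed case counts, which is where an optimizer such as L-BFGS comes in.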
In machine learning, Learning Vector Quantization (LVQ) is well known as a supervised learning method. LVQ has been studied as a way to generate optimal reference vectors because of its simple and fast learning algorithm [12]. In many classification tasks, different variants of LVQ are considered when training a model. This thesis discusses two variants of LVQ, Generalized Matrix Learning Vector Quantization (GMLVQ) and Generalized Tangent Learning Vector Quantization (GTLVQ). A transfer learning technique for the different LVQ variants is then implemented and visualized, and the results are compared on different datasets.
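The core idea shared by LVQ variants can be shown with the basic LVQ1 update rule. The sketch below is not GMLVQ or GTLVQ, which add an adaptive metric and tangent distances respectively; the function name, learning rate, and toy prototypes are assumptions. The nearest prototype is attracted to a sample with the same label and repelled from a sample with a different label.

```python
def lvq1_update(prototypes, labels, x, y, learning_rate=0.1):
    """Apply one LVQ1 step in place and return the index of the winner.

    prototypes: list of prototype vectors (lists of floats)
    labels:     class label of each prototype
    x, y:       training sample and its class label
    """
    winner = min(
        range(len(prototypes)),
        key=lambda j: sum((a - w) ** 2 for a, w in zip(x, prototypes[j])),
    )
    # Attract the winner if the labels agree, repel it otherwise.
    sign = 1.0 if labels[winner] == y else -1.0
    prototypes[winner] = [
        w + sign * learning_rate * (a - w)
        for w, a in zip(prototypes[winner], x)
    ]
    return winner


# Hypothetical two-class example with one prototype per class.
prototypes = [[0.0, 0.0], [2.0, 2.0]]
labels = [0, 1]
winner = lvq1_update(prototypes, labels, x=[1.0, 0.0], y=0)
```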
Community-acquired pneumonia (CAP) is a very common, infectious, and sometimes lethal disease. It is therefore associated with high costs of diagnosis and treatment. To reduce health care costs, diagnosis and treatment must become cheaper with no loss in predictive accuracy. One effective way to achieve this would be to identify easily detectable and highly specific transcriptomic markers, which would reduce the laboratory work required and possibly enhance diagnostic capability.
Transcriptomic whole-blood data derived from the PROGRESS study were combined with several documented features such as age, smoking status, and the SOFA score. The analysis pipeline included processing by self-organizing maps for dimensionality and noise reduction, as well as diffusion pseudotime (DPT). Pseudotime made it possible to model a disease course of CAP in which each sample represents a state or time point. Combined, the two methods yielded a proposed disease course of CAP, described by 1476 marker genes. An additional gene set analysis provided information about the immune-related functions of these marker genes.
Influenza A viruses are responsible for the outbreak of epidemics as well as pandemics worldwide. The surface protein neuraminidase of this virus is responsible, among other things, for the release of virions from the cell and is thus of interest in pharmacological research. The aim of this work is to gain knowledge about evolutionary changes in sequences of influenza A neuraminidase through different methods. First, EVcouplings is used with the goal of identifying evolutionary couplings within the protein sequences, but this analysis was unsuccessful, probably because of the great sequence length of neuraminidase. Second, the natural vector method is used for sequence embedding, in the hope of visualizing the sequential progression of the virus protein over time. Last, interpretable machine learning methods are applied to examine whether the data can be classified by year and whether the extracted information conforms to the results of the EVcouplings analysis. In addition to the class label year, other labels such as groups or subtypes are used in classification, with varying results. For balanced classes the machine learning models performed adequately, but this was not the case for imbalanced data. Groups and subtypes can be classified with high accuracy, which was not the case for years, continents, or hosts. To identify the minimal number of features necessary for linear separation of neuraminidase group 1 subtypes, a logistic regression was finally performed, resulting in the identification of 15 combinations of nine amino acid frequencies. Since neither the sequence embedding nor the machine learning methods showed neuraminidase evolution over time, further research is necessary, for example with a focus on one subtype with balanced data.
Tokenization projects currently feature prominently among new blockchain technologies. After explaining the fundamentals of cross-chain interaction, this bachelor thesis focuses on technology for tokenizing Bitcoin on Ethereum. To provide a more practical context, the implementation of the currently most successful decentralized tokenization project is described.
This thesis proposes a solution to the practical problem of supervising relatively basic mechanical processes in robotics by means of computer vision. Supervision is performed by comparing the tracked movement with a known, ideal recording of the movement that acts as a model.
First, this thesis analyzes possible approaches to the problem regarding data structures and representation, ways of extracting the data from the recording and ways to compare the data sets of two recordings. Then, a specific solution is implemented in C++ and explained.
The emerging Internet of Things (IoT) technology interconnects billions of embedded devices. These internet-enabled devices collect, share, and analyze data without any human intervention. The integration of IoT technology into human environments such as industry, agriculture, and the health sector is expected to improve both everyday life and business. The emerging technology also poses challenges and numerous security threats. It is therefore essential to strengthen the security of IoT technology to avoid any compromise that affects human life. Instead of implementing traditional cryptosystems on IoT devices, an elliptic curve cryptosystem (ECC) is used to suit the limited resources of the devices. ECC is an elliptic curve-based form of public-key cryptography that provides equivalent security with a shorter key size than other cryptosystems such as Rivest–Shamir–Adleman (RSA). The security of ECC hinges on the hardness of the elliptic curve discrete logarithm problem (ECDLP). ECC is faster and easier to implement and also consumes less power and bandwidth. Because of these benefits, ECC is incorporated in internationally recognized standards for lightweight applications.
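The group operations behind ECC can be sketched on a toy curve. The example below uses the small textbook curve y^2 = x^3 + 2x + 2 over the field of integers modulo 17 with base point (5, 1), whose order is 19; these parameters are an assumption for illustration and are far too small for real use, where standardized curves with roughly 256-bit fields are employed. The final lines show a Diffie-Hellman style key agreement with hypothetical private keys.

```python
P_MOD = 17        # field prime of the toy curve
A, B = 2, 2       # curve coefficients: y^2 = x^3 + A*x + B (mod P_MOD)
G = (5, 1)        # base point on the curve, of order 19


def inv_mod(x):
    """Modular inverse via Fermat's little theorem (P_MOD is prime)."""
    return pow(x % P_MOD, P_MOD - 2, P_MOD)


def point_add(p1, p2):
    """Add two curve points; None represents the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None  # p2 is the inverse of p1
    if p1 == p2:
        slope = (3 * x1 * x1 + A) * inv_mod(2 * y1) % P_MOD
    else:
        slope = (y2 - y1) * inv_mod(x2 - x1) % P_MOD
    x3 = (slope * slope - x1 - x2) % P_MOD
    y3 = (slope * (x1 - x3) - y1) % P_MOD
    return (x3, y3)


def scalar_mult(k, point):
    """Compute k * point with the double-and-add method."""
    result = None
    addend = point
    while k > 0:
        if k & 1:
            result = point_add(result, addend)
        addend = point_add(addend, addend)
        k >>= 1
    return result


# Key agreement with hypothetical private keys 3 and 7.
alice_public = scalar_mult(3, G)
bob_public = scalar_mult(7, G)
shared_by_alice = scalar_mult(3, bob_public)
shared_by_bob = scalar_mult(7, alice_public)
```

Recovering a private key k from the public point k * G is the ECDLP mentioned above; on this toy curve it can be solved by trying all 19 multiples, which is why real curves use far larger groups.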
Recently, deep neural network architectures designed to work on graph-structured data have been attracting attention and are being implemented in various domains and applications. However, although research on learning representations (feature embeddings) from graph data is gathering pace, constructing graphs from a dataset remains a challenge. The ability to map the data to lower dimensions makes the task easier and simplifies many operations. The graph neural network (GNN) is a novel neural network model that is attracting attention because it performs strongly in applications such as recommender systems, social networks, chemical synthesis, and many more. This thesis discusses a unique approach to a fundamental task on graphs: node classification. The feature embedding for a node is aggregated by applying a recurrent neural network (RNN), a GNN model is then trained to classify a node with the help of the aggregated features, and Q-learning helps optimize the shape of the neural networks. The thesis starts with the working principles of the feedforward neural network and of recurrent units such as the simple RNN, long short-term memory (LSTM), and the gated recurrent unit (GRU), followed by the concepts of reinforcement learning (RL) and the Q-learning algorithm. An overview of the fundamentals of graphs follows, and then the GNN architecture and workflow are discussed. Some basic GNN models are briefly discussed before the thesis turns to the technical implementation details, the output of the model, and a comparison with a few other models such as GraphSage and the Graph attention network (GAN).
Over the past few years, wind and solar power plants have increasingly contributed to energy production. However, because the energy sources fluctuate, the energy production data contain disruptions. Such disrupted data degrade prediction performance, so the affected values need to be estimated. In this thesis, we provide a comparative study of estimating disrupted online data based on the data of similar groups of power plants. We apply three estimation techniques, namely mean, interpolation, and k-nearest neighbor (KNN), to estimate the disruptions in the training data. We then apply four clustering algorithms, namely k-means, neural gas, hierarchical agglomerative clustering, and affinity propagation, with two similarity measures, namely Euclidean distance and dynamic time warping (DTW), to form groups of power plants and compare the results. Experimental results show that when KNN estimation is applied to the data and neural gas or agglomerative clustering with DTW is used to cluster it, the cluster quality scores and execution time are better than for the other combinations. We therefore choose KNN estimation to reconstruct the disrupted online data within each group of similar power plants.
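Dynamic time warping, one of the two similarity measures named above, can be sketched with the classic dynamic programming recurrence. This is a generic formulation with absolute difference as the local cost and no warping window; it is not necessarily the variant used in the thesis, and the sample series are hypothetical.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences.

    cost[i][j] holds the cheapest alignment of a[:i] with b[:j], where
    each step may advance in a, in b, or in both.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]


# Hypothetical production curves: same shape, one stretched in time.
plant_a = [1.0, 2.0, 3.0]
plant_b = [1.0, 2.0, 2.0, 3.0]
distance = dtw_distance(plant_a, plant_b)
```

Unlike the Euclidean distance, this measure is defined for series of different lengths and tolerates time shifts, at the price of quadratic running time in the series length.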
The application described in this thesis was designed and built to help nurses and other medical personnel around the world access a real-time database that stores patient records such as patient name, patient ID, patient age and date of birth, and the symptoms the patient is experiencing. A real-time database is a live database in which all changes are reflected across all devices accessing it. The application will be especially beneficial in countries where access to a computer or medical equipment is not always possible. Because a phone is always ready to use and within reach, users of this application will be able to access the data at any time and place. They can add a new patient or search for existing patients. In addition, the application allows users to take RAW medical images that can be used to identify anomalies in a blood sample. RAW images are important for this application because they are uncompressed, which means they do not lose any quality or detail. The users of this application are the medical personnel who take care of the patients. These users have to create a profile in the database in order to use the application, since their data, such as the user ID, is used to control how data is retrieved and stored. We also discuss the current and future features of this application, as well as its benefits for medical personnel and patients. Finally, we go over the implementation of the application from both a hardware and a software perspective.
Sequences are an important data structure in molecular biology, but unfortunately it is difficult for most machine learning algorithms to handle them, as they rely on vectorial data. Recent approaches include methods that rely on proximity data, such as median and relational Learning Vector Quantization. However, many of them are limited in the size of the data they are able to handle. A standard method to generate vectorial features for sequence data does not exist yet. Consequently, a way to make sequence data accessible to preferably interpretable machine learning algorithms needs to be found. This thesis will therefore investigate a new approach called the Sensor Response Principle, which is being adapted to protein sequences. Accordingly, sequence similarity is measured via pairwise sequence alignments with different sequence alignment algorithms and various substitution matrices. The measurements are then used as input for learning with the Generalized Learning Vector Quantization algorithm. A special focus lies on sequence length variability as it is suspected to affect the sequence alignment score and therefore the discriminative quality of the generated feature vectors. Specific datasets were generated from the Pfam protein family database to address this question. Further, the impact of the number of references and choice of substitution matrices is examined.
The occurrence of prostate cancer (PCa) has been rising consistently for three decades, and PCa remains the third leading cause of cancer-related deaths in Germany after lung and bowel cancer. Despite new methods of early detection, such as prostate-specific antigen (PSA) testing, it remains the most common cancer in German men, with over 63,400 new diagnoses in Germany every year, and exhibits high prevalence in other countries of Northern and Western Europe as well [64]. Men over the age of 70 are most commonly affected by the lethal disease, whereas onset before the age of 50 is rare. The malignant prostate tumor can be cured by surgery or irradiation as long as the cancer has not reached the stage of metastasis, at which point other therapeutic methods have to be employed [14] [15]. In the metastatic phase, the patient usually exhibits symptoms when the tumor's size affects the urethra or the cancer spreads to other tissue, often the bones [16].
The high prevalence of this disease underlines the importance of further research into prognostic and diagnostic methods, in which the identification of further biomarkers in PCa is a major topic of scientific analysis. For this task, high-throughput RNA sequencing of the transcriptome (the RNA molecules of an organism or specific cell type) is frequently used [66]. RNA sequencing, or RNA-Seq for short, makes transcriptome assessment possible, enabling the identification of transcriptional aberrations in diseases as well as uncharacterized RNA species such as non-coding RNAs (ncRNAs), which remain undetected by conventional methods [49]. To ease interpretation, the sequenced reads are assembled to reconstruct the transcriptome as close to its original state as possible, thus enabling rapid detection of relevant biomolecules in the data [49]. Transcriptomic studies often require highly accurate and complete gene annotations on the reference genome of the examined organism. However, most gene annotations and reference genomes are far from complete and lack a multitude of as yet unidentified protein-coding and non-coding genes and transcripts. Therefore, reference genomes and annotations need to be refined by including novel sequences discovered in high-quality transcriptome assemblies [24].
Several algorithms have been proposed for testing series-parallel graphs in linear time. We give alternative algorithms for testing series-parallel graphs and for computing their tree decompositions and independence number when the input is an undirected biconnected series-parallel graph; these algorithms run in (approximately) linear time.
In this thesis, novel proteases were isolated from psychrotolerant bacterial strains and characterized with respect to their biochemical properties. Furthermore, genes of S8 family proteases were amplified, and differences in the amino acid sequence could be linked to the biochemical properties of the proteases.
Probabilistic Micropayments
(2022)
Probabilistic micropayments are an important cryptography research topic in electronic commerce. They merit research aimed at efficient algorithms with low transaction costs and high computational speed. To delve into the topic, it is vital to examine cryptographic preliminaries such as hash functions and digital signatures. This thesis investigates the important probabilistic methods based on a centralized or decentralized network. First, centralized network methods such as lottery-based tickets, Payword, coin-flipping, and MR2 are described, and an approach based on blind signatures is also discussed. Then, decentralized network methods such as MICROPAY3, a transferable scheme on the blockchain network, along with an efficient model for cryptocurrencies, are explained. We then compare the different probabilistic micropayment methods and address their drawbacks with a new technique. To put the results of the theoretical analysis of the different methods into context, we analyze the attacks that reduce the security and, therefore, the efficiency of the system. In particular, we discuss various methods for detecting the occurrence of double-spending and eclipse attacks.
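The basic idea of a lottery-based probabilistic micropayment can be sketched as follows. Instead of settling every tiny payment, the payer issues tickets that win with probability p and pay out value / p, so the expected value of each ticket equals the micropayment while only winning tickets incur a transaction. This is a simplified illustration, not any of the schemes named above: the function names and the single shared secret are assumptions, and real schemes derive the draw from commitments by both parties so that neither can bias it.

```python
import hashlib


def ticket_wins(ticket_id, secret, win_probability):
    """Decide deterministically whether a ticket wins.

    The draw is the first 64 bits of a SHA-256 hash, compared as an
    integer against a threshold proportional to win_probability.
    """
    digest = hashlib.sha256(f"{secret}:{ticket_id}".encode()).digest()
    draw = int.from_bytes(digest[:8], "big")
    return draw < int(win_probability * 2 ** 64)


def settle(num_tickets, micro_value, win_probability, secret):
    """Return (number of winning tickets, total amount actually paid)."""
    payout = micro_value / win_probability
    wins = sum(
        ticket_wins(ticket_id, secret, win_probability)
        for ticket_id in range(num_tickets)
    )
    return wins, wins * payout


# Hypothetical setting: 10,000 payments of 0.01 units, 1% win chance.
wins, total_paid = settle(10_000, micro_value=0.01,
                          win_probability=0.01, secret="demo-secret")
```

With these assumed numbers, each winning ticket pays one full unit, and on average about one ticket in a hundred needs to be settled.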