When a ship enters waterways that are restricted in height or width, or by another vessel, its behaviour changes. The most evident effect of navigating in shallow water is squat, which has led to several groundings: because of pressure differences, the vessel is pulled down into the water and its trim changes. Another shallow-water effect is speed loss due to increased resistance, which can reduce the maximum speed by up to 50 percent. In general, the behaviour of a ship in shallow water is said to be sluggish, meaning that it is more difficult to navigate; this affects, among other things, the radius of the turning circle. Sailing parallel to a nearby bank affects the lateral force and the yaw moment. Interaction with other ships has effects similar to bank effects but is more complex, since more parameters play a major role. In this thesis, each of these effects is researched by studying several papers by renowned researchers.
Several models are developed that are consistent with the inherent model of forces and moments of the simulation program. The challenges and obstacles that arose during modelling and implementation are pointed out, and solutions or approaches are given.
The aim of this master thesis is to describe the key factors of successful energy efficiency projects. In particular, local conditions of such projects in Kazakhstan will be emphasized and a country-specific guideline will be provided at the end. The following topics will be covered in this thesis: energy efficiency technologies, financing, and capacities. The first part examines the energy efficiency approaches and their potential in the local industry. The second part deals with available financing methods, their specific characteristics and appropriateness for overcoming investment barriers in Kazakhstan. The third part of the master thesis concerns necessary project capacities. The application of the three elements for successful project implementation is described in the end.
The expression of the titin-Hsp27 construct and its subsequent purification did not yield satisfactory results, so the atomic force microscopy experiments could not be carried out. A structure model for the protein sequence can be established by using different bioinformatic methods. These comprise a template search with several BLAST runs and freely available software such as SwissModel, Pcons, ModWeb and other tools. Nevertheless, the generated model is not the native conformation and has to be analyzed with other software until a stable conformation of the structure can be predicted. Given the time available, the generated model is a good approximation for the aim of this master thesis.
Proteins are macromolecules that consist of linearly bonded amino acids. They are essential elements in various metabolic processes. The three-dimensional structure of a protein is determined by the order of amino acids, also referred to as the protein sequence. This conformation corresponds to the structural state in which the protein is functionally active. However, the relationships between protein sequence, structure and function are not yet fully understood. Additionally, information about structural properties or even the entire protein structure is crucial for understanding the dynamics that define protein functionality and mechanisms. From this, the role of a protein in its molecular context can be described closely. For instance, interactions can be investigated and comprehended as a dynamic biological network that is sensitive to alterations, i.e. changes caused by diseases. Such knowledge can aid in drug design, where compounds need to be specifically tailored and adjusted to their molecular targets. Protein energy profile-based methods can be applied to investigate protein structures with respect to dynamics and alterations. The publications enclosed with this work discuss in general the scientific potential of energy profile-based techniques and algorithms. On the one hand, changes in stability caused by protein mutations and protein-ligand interactions are discussed in the context of energy profiles. On the other hand, energetic relations to protein sequence, structure and function are elucidated in detail. Finally, the presented discussions focus on recent enhancements of the eProS (energy profile suite) database and toolbox. eProS freely provides all elucidated methodologies to the scientific community. Thus, one can address biological questions with the presented methods at hand. Additionally, eProS provides annotations related to external databases. This ensures a broad view of biological data and information.
In particular, energetic characteristics can be identified which contribute to a protein’s structure and function.
Many people take part in more than one competition. The competitions also differ in kind, ranging from local events with a small number of participants to international tournaments watched by many viewers. Naturally, a system is needed to assess and compare success across various competitions.
Existing ranking systems are usually specialized to fit their application area. More general ranking methods also exist and can be applied to a wide spectrum of competition fields. However, these ranking methods are still not universal and do not cover some important features of the competitions.
A totally new ranking system has been developed within the present master thesis. Its primary purpose is to evaluate and measure prestige gained by participants in competitions. The main contribution of the thesis consists of an original mathematical model that makes the ranking system unique.
The developed ranking system claims to be universal and interdisciplinary. It is based on the fundamental element that distinguishes competition from non-competition areas, namely standings that rank the participants according to their performance. The universality and interdisciplinarity of the ranking system make cross-disciplinary comparisons possible, which are usually very subjective and difficult to implement.
The contribution of the master thesis extends beyond the theoretical area. A ranking software that fully implements this novel ranking system has been designed and developed. The software makes the practical benefits of the ranking system immediately available to potential application areas such as sports clubs and universities.
And finally, the developed ranking system offers a new viewpoint to the competitions – as a way of gaining prestige, rather than the traditional viewpoint of demonstrating mastery.
This master thesis investigates a new method for the feature extraction of gray-scale images, the so-called "Non-Euclidean Principal Component Analysis" [1]. In this method, the standard inner product of Euclidean space is replaced by a semi-inner product in the well-known learning rules of Oja and Sanger. The new method is compared with standard principal component analysis (PCA) by extracting features (feature vectors) from different databases with class labels, and it is judged by the accuracies of "Border Sensitive Generalized Learning Vector Quantization" (BSGLVQ), "Feed Forward Neural Networks" (FFNN) and "Support Vector Machines" (SVM).
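To make the role of the inner product in Oja's rule concrete, here is a minimal sketch in plain Python. It is not the thesis's implementation: the function names, the toy data and the learning-rate settings are assumptions chosen for illustration, and `sip_lp` is one textbook semi-inner product (for the l_p norm) that may differ from the one used in the thesis. The `inner` argument marks the place where the Euclidean dot product would be swapped out.

```python
import math


def dot(x, w):
    """Standard Euclidean inner product."""
    return sum(a * b for a, b in zip(x, w))


def sip_lp(x, w, p=2.0):
    """A semi-inner product [x, w] for the l_p norm (p > 1).

    Illustrative assumption only; for p = 2 it reduces to the dot product.
    """
    norm = sum(abs(b) ** p for b in w) ** (1.0 / p)
    if norm == 0.0:
        return 0.0
    s = sum(a * abs(b) ** (p - 1) * math.copysign(1.0, b) for a, b in zip(x, w))
    return s / norm ** (p - 2)


def oja_first_component(data, inner=dot, lr=0.002, epochs=100):
    """Oja's rule for one component: w <- w + lr * y * (x - y * w), y = inner(x, w)."""
    w = [1.0] + [0.0] * (len(data[0]) - 1)
    for _ in range(epochs):
        for x in data:
            y = inner(x, w)
            w = [wi + lr * y * (xi - y * wi) for xi, wi in zip(x, w)]
    return w


# Deterministic toy data: large variance along (1, 1), small variance along (1, -1).
data = []
for k in range(200):
    t = 3.0 * math.sin(k)
    s = 0.3 * math.cos(7 * k)
    data.append([t + s, t - s])

w = oja_first_component(data)  # Euclidean case: w aligns with the direction (1, 1)
```

With the default `inner=dot` this is the classical rule. Passing, for example, `lambda x, w: sip_lp(x, w, p=3.0)` is one way to experiment with a non-Euclidean variant, without any claim that this matches the learning rule studied in the thesis.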
This study shows the potential of the make-or-buy theory in several scenarios: production, assembly and development. These possibilities are evaluated on the basis of Bosch's core competencies. A decision model is developed to support the decision-making process. Based on these results, the serial production at RBAC in China is planned, and suggestions for setting up the assembly line are given.
In this work, a novelty detection framework provided by M. Filippone and G. Sanguinetti is considered, which is especially useful when only a few training samples are available. It is restricted to Gaussian mixture models and makes use of information theory, applying the Kullback-Leibler divergence. Two variations of the framework are presented, applying the symmetric Hellinger divergence and a statistical likelihood approach.
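The difference between the two divergences named above can be illustrated with the closed forms for two univariate Gaussians. This is only a sketch under a simplifying assumption: the framework works with Gaussian mixture models, for which no such closed form exists in general, and the function names are hypothetical.

```python
import math


def kl_gauss(mu1, sigma1, mu2, sigma2):
    """Kullback-Leibler divergence KL(N(mu1, sigma1^2) || N(mu2, sigma2^2))."""
    return (math.log(sigma2 / sigma1)
            + (sigma1 ** 2 + (mu1 - mu2) ** 2) / (2.0 * sigma2 ** 2)
            - 0.5)


def hellinger_sq_gauss(mu1, sigma1, mu2, sigma2):
    """Squared Hellinger distance between two univariate Gaussians (symmetric)."""
    var_sum = sigma1 ** 2 + sigma2 ** 2
    bc = (math.sqrt(2.0 * sigma1 * sigma2 / var_sum)
          * math.exp(-((mu1 - mu2) ** 2) / (4.0 * var_sum)))
    return 1.0 - bc


# KL depends on the argument order, the Hellinger distance does not.
kl_forward = kl_gauss(0.0, 1.0, 1.0, 2.0)
kl_backward = kl_gauss(1.0, 2.0, 0.0, 1.0)
h_forward = hellinger_sq_gauss(0.0, 1.0, 1.0, 2.0)
h_backward = hellinger_sq_gauss(1.0, 2.0, 0.0, 1.0)
```

The squared Hellinger distance is bounded between 0 and 1 and symmetric in its arguments, whereas the Kullback-Leibler divergence is unbounded and asymmetric.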
For the first time, it was discovered that ultraviolet radiation with a wavelength of 200 to 400 nm (maximum 365 nm), radiated from a distance of 40 cm (intensity: 3500 mW/cm²) onto PMMA, altered its surface wettability as well as its roughness at the nanoscale, as observed with an atomic force microscope (AFM). The roughness rises and falls again within a short time (1-2 days) after 75 min and 180 min of irradiation. However, during the next 10 days the roughness stabilized, and there was no influence of UV whether the PMMA was stored in air or in a glass Petri dish.
As widely discussed in the literature, spatial patterns of amino acids, so-called structural motifs, play an important role in protein function. The functionally responsible part of a protein often lies in an evolutionarily highly conserved spatial arrangement of only a few amino acids, which are held tightly in place by the rest of the structure. In general, these motifs can mediate various functional interactions, such as DNA/RNA targeting and binding, ligand interactions, substrate catalysis, and stabilization of the protein structure.
Hence, characterizing and identifying such conserved structural motifs can contribute to the understanding of structure-function relationships in diverse protein families. For this reason, and because of the rapidly increasing number of solved protein structures, it is highly desirable to identify, understand and search for structurally scattered amino acid motifs. The aim of this work was the development and implementation of a matching algorithm to search for such small structural motifs in large sets of target structures. Furthermore, motif matches were extensively analyzed, statistically assessed and functionally classified. Following a novel approach, hierarchical clustering was combined with functional classification and used to deduce evolutionary structure-function relationships. The proposed methods were combined and implemented in a feature-rich and easy-to-use command-line software tool, which is freely available and contributes to the field of structural bioinformatics research.
Protein structures are essential elements in every biological system evolved on earth, where they function as stabilizing elements, signal transducers or replication machineries. They consist of linearly bonded amino acids, which determine the three-dimensional structure of the protein, while the structure in turn determines the function. The native and biologically active structure of a protein can be understood as the folding state of a polypeptide chain at the global minimum of free energy.
By means of protein energy profiling, an approach derived from statistical physics, it is possible to assign a so-called energy profile to a protein structure. Such an energy profile describes the local energetic interaction features of every amino acid within the structure and introduces an energetic point of view on proteins, instead of a structural or sequential one.
This work aims to give a perspective on the question of how pattern information may be gained from energy profiles. The concrete subjects are energy-mapped Pfam family alignments and investigations into finding motifs or patterns in discretized energy profile segments.
Proteins are involved in almost every aspect of life, mediating a wide range of cellular tasks. The protein sequence dictates the spatial arrangement of the residues and thus ultimately the function of a protein. Huge effort is put into cumbersome structure elucidation experiments, which yield models describing the observed spatial conformation of a protein and enable users to predict its function, to understand its mode of action or to design tailored drugs to cure diseases caused by misfolded or misregulated proteins.
However, the results of structure determination experiments are merely models of reality, made under simplifying assumptions and sometimes containing major undetected errors. Moreover, such experiments are resource-demanding and cannot meet the actual demand.
Thus, scientists are predicting the structure of proteins in silico, resulting in models that are even more prone to error.
As a consequence, structural biologists are looking for a practicable definition of structure quality, and over the last two decades several model quality assessment programs have emerged that measure the local and global quality of particular structures. Seven representatives were studied with regard to the paradigms they follow and the features they use to describe the quality of residues. Their predictions were compared, showing that there is almost no common ground among the tools.
Is there a way to combine their statements anyway?
Finally, the accumulated knowledge was used to design a novel evaluation tool that addresses the problems previously spotted. High quality of its predictions as well as superior usability were key. The strategy was compared to existing approaches and evaluated on suitable datasets.
This thesis analyses the critical success factors for the approval of European industrial products in India, using the example of a product developed and manufactured in Europe for the Indian rolling stock industry. The key topics considered in detail are: Which standards are currently officially used for the approval process in India and in Europe? Capturing the current situation. Comparison of the technical approval standards between IR (Indian Railways) standards and
The almost complete transcription of the human genome yields a high number of transcripts that do not encode proteins. However, the functional elucidation of long non-coding RNAs (lncRNAs) in particular is still difficult. Secondary structure analysis is assumed to be a possible method to detect functional relationships of lncRNAs on a large scale, but it is still time-consuming and error-prone. GRAPHCLUST, currently the most suitable clustering tool based on RNA secondary structure analysis, mainly lacks an efficient method for interpreting its results. Hence, an independent and interactive RNA clustering interpretation tool was developed to allow visualisation and efficient analysis of RNA clustering results.
A variety of methods have been used to describe natural systems and cellular functions. Most use continuous systems with differential equations. Based upon the neighbourhood relations in graphs and the complex interactions in cellular automata a mathematical model was designed and implemented as an application user interface. This discrete approach called graph automata was utilised to simulate diffusion processes and chemical kinetics. The progression of diffusion in cellular environments was described and resulted in a discrepancy of 20% in comparison to experimental results. Different chemical kinetics were simulated and found to be as accurate as their continuous counterparts. The proposed model appears to be a highly scalable and modular approach to simulate natural systems.
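As a rough illustration of how diffusion can be simulated in discrete steps on a graph, the following sketch updates node concentrations synchronously from their neighbours. It is a generic scheme, not the graph automata model of the thesis; the function names, the path graph and the exchange rate are assumptions made for the example.

```python
def diffusion_step(conc, edges, rate):
    """One synchronous update: every edge moves rate * (difference) downhill."""
    delta = [0.0] * len(conc)
    for i, j in edges:
        flow = rate * (conc[i] - conc[j])  # positive flow goes from node i to node j
        delta[i] -= flow
        delta[j] += flow
    return [c + d for c, d in zip(conc, delta)]


def simulate(conc, edges, rate, steps):
    """Apply diffusion_step repeatedly and return the final concentrations."""
    for _ in range(steps):
        conc = diffusion_step(conc, edges, rate)
    return conc


# Path graph with five nodes; all substance starts in node 0.
path_edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
start = [1.0, 0.0, 0.0, 0.0, 0.0]
final = simulate(start, path_edges, rate=0.2, steps=500)
```

Because every flow is subtracted from one node and added to its neighbour, the total amount of substance is conserved, and on a connected graph the concentrations approach a uniform distribution as long as the rate is small relative to the node degrees.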
not available
not available
This thesis investigated the generation of laser induced periodic surface structures (LIPSS) using femtosecond laser irradiation at a central wavelength of 775 nm.
The metals stainless steel and copper as well as a semiconducting thin film, ITO on glass substrate, were investigated. The impact of the processing parameters was studied for single and multiple pulse irradiation to determine the ablation threshold of the materials and the different types of LIPSS. These observations allowed the optimisation of area structuring with regard to processing speed and LIPSS quality.
The feasibility of the LIPSS generation in dynamic, real time polarisation control was then explored. By using a fast response, liquid-crystal polarisation rotation device, the direction of the linear polarisation of the laser beam could be dynamically controlled and synchronised to the scanning during laser processing. As a result, a range of complex micro- and nano-scale patterns with orthogonal direction of LIPSS were created. The samples were analysed using optical and electron microscopy. The orientation of the LIPSS was determined also from detection of light diffracted by the LIPSS.
Finally, two applications of large-area LIPSS patterning were demonstrated: information encoding on metals and periodic structuring of a thin-film conducting oxide for solar cells.
This master’s thesis was written in cooperation with the Spanish company sí-internships. Its main objective is to develop an effective promotion strategy for this startup while spending as few financial resources as possible. To this end, extensive research on the current internal, external and integral market situation is carried out. Building on the results of this analysis, promotional objectives are determined and a target audience is chosen. Finally, a promotion strategy is established.
Cancer is one of the main causes of death in developed countries, and cancer treatment heavily depends on successful early detection and diagnosis. Tumor biomarkers are helpful for early diagnosis. The goal of this discovery method is to identify genetic variations as well as changes in gene expression or activity that can be linked to a typical cancer state.
First, several cancer gene signaling pathways were introduced and then combined, and 27 candidate genes were selected. Through the analysis of several data sets in the GEO database, a few expression difference matrices were established. The candidate genes were tested in these matrices, and five genes (PLA1A, MMP14, CCND1, BIRC5 and MYC) were found to have the potential to be tumor biomarkers. Two of these genes are discussed further: PLA1A is a potential biomarker for prostate cancer, and MMP14 can be considered a biomarker for NSC lung cancer.
Finally, the significance of this study and the potential value of the two genes are discussed, and prospects for future research in this direction are outlined.
Large bone defects are a major clinical problem that disproportionately affects the elderly, particularly in developed countries, where this population is the fastest growing. Current treatments include autologous and allogenous bone grafts, bone elongation with the Ilizarov technique, bone graft substitutes, and electrical stimulation. Each of these approaches enjoys varying degrees of success; however, each also has its associated problems and complications. A new, still experimental treatment is Tissue Engineering, which combines scaffolds, osteogenic stem cells and growth factors and is showing encouraging early results in preclinical and initial clinical studies.
Electrical stimulation has been shown to enhance bone healing by promoting mesenchymal stem cell migration, proliferation, and differentiation. In the present study we combine Tissue Engineering with Electrical Stimulation and hypothesize that this combined approach will have a synergistic effect resulting in enhanced new bone formation. In our in vitro experiments we observed that the levels of electrical stimulation we tested had no cytotoxic effect and instead increased osteogenic differentiation, as determined by enhanced expression of the osteogenic marker Alkaline Phosphatase. These findings support our hypothesis by demonstrating that electrical stimulation promotes bone formation in the tissue-engineering environment. The bioinformatics part of this project consisted of gene network analysis, identification of the top 10 osteogenic markers and analysis of gene-gene interactions. We observed that in studies of stem cells from both human and rat, the genes BMPR1A, BMP5, TGFβR1, SMAD4, SMAD2, BMP4, BMP7, RUNX3, and CDKN1A are associated with osteogenesis and interact with each other. We observed a total of 31 interactions for human and 29 interactions for rat stem cells. While this approach needs to be proven experimentally, we believe that these in vitro and in silico analyses could complement each other and in doing so contribute to the field of bone healing research.
Classification of time series has received considerable interest over the past years due to many real-life applications, such as environmental modeling, speech recognition, and computer vision.
In my thesis, I focus on the classification of time series by LVQ classifiers. To learn a classifier, we need a training set. In our case, every data point in the training set contains a sequence (an ordered set) of feature vectors. Thus, the first task is to construct a new feature vector (or matrix) for each sequence.
Inspired by [2], I use Hankel matrices to construct the new feature vectors. This choice comes from a basic assumption that each time series is generated by a single or a set of unknown Linear Time Invariant (LTI) systems.
After generating new feature vectors by Hankel matrices, I use two approaches to learn a classifier: Generalized Learning Vector Quantization (GLVQ) and the median variant of Generalized Learning Vector Quantization (mGLVQ).
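The Hankel construction mentioned above can be sketched for a scalar time series as follows. This is a minimal illustration, not the thesis's feature pipeline: the function name and the choice of the number of rows are assumptions, and sequences of feature vectors would need a block Hankel matrix instead.

```python
def hankel(seq, rows):
    """Build the Hankel matrix H with H[i][j] = seq[i + j] from a scalar sequence."""
    cols = len(seq) - rows + 1
    if rows < 1 or cols < 1:
        raise ValueError("rows must be between 1 and len(seq)")
    return [[seq[i + j] for j in range(cols)] for i in range(rows)]


H = hankel([1, 2, 3, 4, 5], rows=3)
# H == [[1, 2, 3],
#       [2, 3, 4],
#       [3, 4, 5]]

# A sequence generated by the first-order linear recursion x[k+1] = 2 * x[k]
# gives proportional rows, i.e. a Hankel matrix of rank 1.
G = hankel([1, 2, 4, 8], rows=2)
```

The second example hints at why Hankel matrices suit the LTI assumption: the rank of the matrix is bounded by the order of the linear system that generated the sequence.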
Stability of control systems is one of the central subjects in control theory. The classical asymptotic stability theorem states that the norm of the residual between the state trajectory and the equilibrium is zero in the limit. Unfortunately, it does not in general allow computing a concrete rate of convergence, particularly due to algorithmic uncertainty, which is related to numerical imperfections of floating-point arithmetic. This work proposes to revisit asymptotic stability theory with the aim of computing convergence rates using constructive analysis, a mathematical tool that realizes equivalence between certain theorems and computation algorithms. Consequently, it also offers a framework that allows controlling numerical imperfections in a coherent and formal way. The overall goal of the current study also matches the trend of introducing formal verification tools into control theory. Besides existing approaches, constructive analysis, as suggested in this work, can also be considered for formal verification of control systems. A computational example is provided that demonstrates extraction of a convergence certificate for example dynamical systems.
It is possible to obtain a common updating rule for the k-means and Neural Gas algorithms by using a generalized Expectation Maximization method. This result is used to derive two variants of these methods. The use of a similarity measure, specifically the Gaussian function, provides another clustering alternative to the aforementioned methods. The main benefit of using the Gaussian function is that it inherently looks for a common cluster center for similar data points (depending on the value of the parameter s). In different experiments we report similar behaviour of the batch and proposed variants. We also show some useful results for the “alternative” similarity method, specifically when there is no clue about the number of clusters in the data sets.
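One familiar way to see k-means and Neural Gas as instances of a single update is the rank-weighted batch rule sketched below, in which each prototype becomes a weighted mean of the data and only the weighting function differs. This is the commonly used batch formulation, written under that assumption; it is not claimed to be the derivation or the variants from the thesis, and all names and the toy data are hypothetical.

```python
import math


def sq_dist(x, w):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(x, w))


def batch_update(data, prototypes, weight):
    """Move each prototype to a weighted mean; weights depend on its distance rank."""
    dim = len(prototypes[0])
    num = [[0.0] * dim for _ in prototypes]
    den = [0.0] * len(prototypes)
    for x in data:
        # Rank 0 is the closest prototype to x, rank 1 the second closest, and so on.
        order = sorted(range(len(prototypes)), key=lambda k: sq_dist(x, prototypes[k]))
        for rank, k in enumerate(order):
            h = weight(rank)
            den[k] += h
            for d in range(dim):
                num[k][d] += h * x[d]
    return [[n / den[k] for n in num[k]] if den[k] > 0 else list(prototypes[k])
            for k in range(len(prototypes))]


def kmeans_weight(rank):
    """Winner takes all: only the closest prototype is updated."""
    return 1.0 if rank == 0 else 0.0


def neural_gas_weight(lam):
    """Soft ranking: weights decay exponentially with the rank."""
    return lambda rank: math.exp(-rank / lam)


data = [[0.0], [0.1], [0.2], [10.0], [10.1], [10.2]]
start = [[1.0], [9.0]]
km = batch_update(data, start, kmeans_weight)
ng = batch_update(data, start, neural_gas_weight(0.05))
```

For a small neighbourhood range `lam`, the Neural Gas weights of all but the closest prototype vanish, so the update approaches the k-means step; larger values let every data point pull on every prototype.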
The endogenous steroid hormone 17β-estradiol (E2) is a central player in a wide range of physiological and behavioral processes and diseases in vertebrates. As a consequence, it is a main target for molecular design and drug discovery efforts in medicine and environmental sciences, which require in-depth knowledge of protein-ligand binding processes. This work develops a bioinformatic framework based on local and global structure similarity for the characterization of E2-protein interactions in all 35 publicly available three-dimensional structures of estradiol-protein complexes. Subsequently, it uses the data gained to identify four geometrically conserved estradiol-binding residue motifs, against which the Protein Data Bank is queried. As a result of this database query, 15 hits present in seven protein structures are found. Five of these structures do not contain E2 as a ligand and had thus not been included in this work’s initial data set. One of these newly detected structures is structurally and functionally dissimilar to, as well as evolutionarily distant from, all other proteins analyzed in this work. Nevertheless, the ability of this protein to actually bind estradiol must be further analyzed. Finally, geometrically conserved E2-protein interactions are identified, and a new research direction using these conserved interaction ensembles for the detection of novel estradiol targets is proposed.
Going green, environmental protection, eco-friendliness, sustainability and sustainable development have become frequent terms in everyone’s life. The negative impact of human activities, causing increased environmental pollution and decline, is a matter of dire concern nowadays. In the last few decades, greater attention has been paid to these issues. Understanding society’s new concerns, more and more companies have begun to modify their behaviour toward a more eco-friendly and responsible one. Green marketing is an emerging area of interest and a tool of modern marketing used by companies in various industries. It is a full-service marketing strategy that includes green marketing plan development, sustainable auditing and planning, branding, design, and communication. An effective, authentic and transparent green presentation of a company provides a chance to successfully assert itself on the market, communicate core company values and build long-term customer relations. The young and innovative company SWOX Surf Protection, which entered the market with a long-lasting waterproof sunscreen designed particularly for surfers and snowboarders, wants to foster growth by expanding its existing target group to a broader segment comprising all outdoor activists. Moreover, the brand strives to become the leading sunscreen manufacturer for outdoor sports and wants to position itself as a lifestyle brand. In 2016 the company started to produce “greener” sunscreen tubes, with a launch imminent. Because surfers, snowboarders and outdoor activists in particular are in close contact with nature and spend a lot of time in the sun, it is assumed that they have a particular health-related interest in using sunscreen while at the same time showing increased commitment towards environmental protection. In this context, it is assumed that a holistically green and organic sunscreen could provide added value.
This paper intends to examine whether green marketing could be a relevant strategy for SWOX Surf Protection to differentiate itself from its competitors, attract potential customers, build long-term customer relations - and as a result position itself as a successful sunscreen lifestyle brand in the market. This will be verified through a comprehensive literature review and detailed market research.
Brassica oleracea, like all cruciferous plants, has a defense mechanism against natural enemies based on chemical compounds formed from the enzymatic degradation of glucosinolates. In the presence of epithiospecifier proteins (ESP), the hydrolysis of glucosinolates forms epithionitriles or nitriles, depending on the glucosinolate structure. This research proved that three predicted ESP sequences taken from the NCBI database have a role in the enzymatic hydrolysis of glucosinolates in Brassica oleracea.
Massive multiple-input multiple-output (MIMO), a technique in which the base station of a mobile radio cell is equipped with a large number of antennas, is currently regarded as a promising key technology for meeting the requirements of future fifth-generation wireless communication networks. However, the optimistic claims about the performance of such systems rest on a theoretical assumption that has hardly been verified in practice: that the wireless transmission channels of different users are mutually independent because of the large number of antennas. In other words, so-called favorable propagation conditions are assumed to hold. This master's thesis investigates these novel systems from two different perspectives.
The first part of this thesis evaluates the influence of realistic propagation conditions on the performance of massive MIMO systems. To this end, numerical system simulations are carried out and compared with the results of practical massive MIMO measurement campaigns.
The investigations show that the so-called favorable propagation conditions can be observed only to a limited extent in realistic environments. Traditional channel models therefore lead to an inaccurate estimate of the performance of practical massive MIMO systems. To address this problem, a novel parametrization of the traditional Kronecker model is proposed so that relevant characteristics of realistic channels are reflected precisely by this model.
This is followed by an investigation, by means of numerical simulations, of different channel estimation methods for massive MIMO systems under the various channel models. The experiments show that estimation methods derived specifically for massive MIMO under the assumption of favorable propagation conditions suffer a significant performance loss under realistic channel models.
The second part of this thesis focuses on the application of massive MIMO systems in so-called Internet of Things (IoT) networks. The typically large number of active IoT devices makes efficient scheduling algorithms necessary. A downlink scheduling algorithm is therefore presented that exploits the properties of massive MIMO systems and the typical data-rate requirements of IoT devices. Specifically, it is proposed to divide the IoT users into groups and to serve the groups one after another. The group size is derived using asymptotic properties of massive MIMO systems.
To select the group members, a modified version of the popular Semi-Orthogonal User Selection (SUS) algorithm is proposed. The subsequent numerical simulations confirm that the modified version of SUS eliminates the drawbacks of the original algorithm, which in turn leads to improved data rates in the system considered.
This master thesis was developed based on public information about Linde AG. It analyzed and evaluated macroeconomic factors influencing the performance of the company. Microeconomic and macroeconomic indicators play the central role for the financial management of each global company. Thus, performance measurement is important for understanding the value and extent of the environment. The study of the thesis aims at estimating the extent to which a company may operate on the global market and what factors contribute to its performance the most.
Firstly, the thesis examines the theoretical background based on previous research. It defines the specific macroeconomic and microeconomic factors and their role in the company’s performance. Afterwards, the thesis analyses Linde AG's activities on domestic and foreign markets. The present structure, the current position in the markets and financial indicators are analyzed. Correlation and regression analyses were carried out with the aim of finding links between the company’s performance and the macroeconomic environment. It is believed that inflation, exchange and interest rates as well as the stock market index have a significant influence on Linde’s performance.
The results showed that the inflation rate and the stock market index play a significant role in Linde’s performance. For exchange rates, however, more data needs to be evaluated in order to derive concrete conclusions.
Obesity is a major public health issue in many countries, and its development leads to many severe conditions. Adipose tissue (AT) is simply called fat; in males, visceral adipose tissue (VAT) is dominant. Estrogens play an important role in many pathological processes.
In this study, one subtype of the estrogen receptor, ER-beta, is activated in VAT by treatment with KB, a specific ligand.
I investigated the metabolic effect of KB treatment on VAT using bioinformatics methods.
I applied several bioinformatics methods, such as differential gene expression analysis, pathway analysis, RNA splicing analysis and SNP calling, to predict the effect of KB treatment on VAT. A list of candidate genes, pathways and SNPs was identified, which could provide clues to the genetic mechanism underlying the KB treatment effect. The results of my study show that KB treatment has a significant effect on VAT.
This thesis focuses on the introduction of a process for the fracture toughness testing of epoxy resin systems in the light of the linear elastic fracture mechanics approach. Based on the requirements of ISO 13586, SENB specimens were designed; in particular, the precracking process was analysed, and the tapping process was optimized by designing and testing a drop-weight device. After successful validation of the test process using specimens made of Araldite LY556, the influence of GNP loading on the fracture toughness was analysed. The pure epoxy showed a KIc of 0.73 MPa√m, which is in line with the manufacturer's datasheet. A peak in fracture toughness of 0.83 MPa√m was achieved at 1 wt% and a loading rate of 10 mm/min, with a decreasing trend as the GNP loading is increased further. As the loading rate is increased, the fracture toughness reduces slightly for 0.5 wt% and 2 wt% GNP, but drops significantly for 1 wt% GNP, obliterating the peak. The load vs. displacement curves showed quasi-brittle material behaviour. The fracture surfaces were analysed using SEM: while the neat resin did not show any features, the reinforced samples showed patterns of crack pinning in connection with bridging and pull-out. The resulting improvement is less significant than that observed by other researchers for larger GNPs. This is in line with the general idea that small particles are not able to yield improvements as high, but the significant decrease for higher loading rates has not been observed or described so far. It is suspected that tests at lower loading rates (e.g. 1 or 0.5 mm/min) would show an even higher fracture toughness.
This study analyses the coverage by the newspapers El País (Spain), Folha de S. Paulo (Brazil) and Süddeutsche Zeitung (Germany) of the protests in Brazil against the 2013 Confederations Cup and the 2014 FIFA World Cup, in order to compare them and to see which topics the newspapers emphasized and which tone they used in their reporting. Based on the research questions, four categories were developed for the analysis: article structure; topic of the article; actors/groups of persons; and tone of the reporting, each composed of several subcategories. It was concluded that the themes highlighted by the European newspapers differed from those stressed by the Brazilian daily. Nonetheless, all the reviewed newspapers covered the protests neutrally.
No abstract available
Path decompositions of graphs have received considerable interest over the past decades because of their applications in algorithmic graph theory and in real-life problems. For the computation of a path decomposition of small width, we use different heuristic approaches. One of the most useful methods is due to Bodlaender and Kloks. In this thesis, we focus on the computation, applications, transformation and approximation of a path decomposition of small width.
It is easy to convert a path decomposition into a nice path decomposition of the same width, which is more convenient for computing graph parameters such as independent sets and chromatic polynomials. Inspired by [28], we give an algorithm to compute the chromatic polynomial of a graph via a nice path decomposition of small width.
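As a point of reference for what such an algorithm must output, the sketch below evaluates the chromatic polynomial P(G, k) with the textbook deletion-contraction recurrence P(G, k) = P(G - e, k) - P(G / e, k). This is deliberately not the path-decomposition algorithm of the thesis, which the abstract does not spell out; it runs in exponential time and is only useful for checking small graphs. The function name and the graph encoding are illustrative assumptions.

```python
def chromatic_value(vertices, edges, k):
    """Number of proper k-colourings of a simple graph (deletion-contraction).

    vertices : iterable of hashable vertex names
    edges    : iterable of 2-element collections (no loops)
    k        : number of available colours
    """
    vertices = frozenset(vertices)
    edges = frozenset(frozenset(e) for e in edges)
    if not edges:
        # An edgeless graph on n vertices has k**n colourings.
        return k ** len(vertices)
    e = next(iter(edges))
    u, v = tuple(e)
    deleted = edges - {e}
    # Contract e by merging v into u; parallel edges collapse in the set.
    contracted = set()
    for f in deleted:
        g = frozenset(u if w == v else w for w in f)
        if len(g) == 2:
            contracted.add(g)
    return (chromatic_value(vertices, deleted, k)
            - chromatic_value(vertices - {v}, contracted, k))

# Triangle: P(K3, k) = k (k - 1) (k - 2)
print(chromatic_value("abc", [("a", "b"), ("b", "c"), ("a", "c")], 3))
```

Evaluating the function at n + 1 distinct values of k determines the polynomial of an n-vertex graph by interpolation, which makes it a convenient cross-check for a decomposition-based implementation.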
In this master thesis, we define a new bivariate polynomial which we call the defensive alliance polynomial and denote by da(G; x; y). It is a generalization of the alliance polynomial and the strong alliance polynomial. We show the relation between da(G; x; y) and the alliance, the strong alliance and the induced connected subgraph polynomials as well as the cut vertex sets polynomial. We investigate the information about G encoded in da(G; x; y). We discuss the defensive alliance polynomial for the path graphs, the cycle graphs, the star graphs, the double star graphs, the complete graphs, the complete bipartite graphs, the regular graphs, the wheel graphs, the open wheel graphs, the friendship graphs, the triangular book graphs and the quadrilateral book graphs. We also prove that the above classes of graphs are characterized by their defensive alliance polynomials. We present the defensive alliance polynomial of the graph formed by attaching a vertex to a complete graph. We show two pairs of graphs which are not characterized by the alliance polynomial but are characterized by the defensive alliance polynomial.
We also present three notes on results in the literature. The first improves a bound, and the other two are counterexamples.
In this study we evaluated how a simple autoencoder can be used to train a Generalized Learning Vector Quantization (GLVQ) classifier. Specifically, we showed that the bottleneck of an autoencoder serves as an "information filter" which tries to best represent the desired output in that particular layer in the statistical sense of mutual information.
The autoencoder model was trained on a purely unsupervised task and exploited the advantages of learned feature representations. As a result, the model achieved a considerable accuracy. Implementation and tuning of the model were carried out using TensorFlow [1].
An additional study was dedicated to improving the traditional GLVQ algorithm taken from sklearn-lvq [2] using the bottleneck of an autoencoder.
The study revealed the potential of autoencoder bottlenecks as a pre-processing tool for improving the accuracy of GLVQ. Specifically, the accuracy of GLVQ rose to 75%, compared with about 62% for the original algorithm. Consequently, the research exposed the need for further improvement of the model in the present problem case.
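To make the GLVQ part concrete, the sketch below computes the standard GLVQ classifier function mu(x) = (d+ - d-) / (d+ + d-), where d+ is the squared distance to the closest prototype of the correct class and d- the squared distance to the closest prototype of any other class. It assumes that `x` is already the bottleneck code produced by an autoencoder; the prototype values and function names are illustrative and not taken from the thesis or from sklearn-lvq.

```python
def squared_distance(x, w):
    """Squared Euclidean distance between a sample and a prototype."""
    return sum((a - b) ** 2 for a, b in zip(x, w))

def glvq_mu(x, label, prototypes):
    """GLVQ classifier function in [-1, 1]; negative means correctly classified.

    prototypes : list of (vector, class_label) pairs containing at least one
                 prototype of the sample's class and one of another class.
    """
    d_plus = min(squared_distance(x, w) for w, c in prototypes if c == label)
    d_minus = min(squared_distance(x, w) for w, c in prototypes if c != label)
    return (d_plus - d_minus) / (d_plus + d_minus)

def predict(x, prototypes):
    """Nearest-prototype decision rule."""
    return min(prototypes, key=lambda p: squared_distance(x, p[0]))[1]

# Hypothetical 2-D bottleneck codes with one prototype per class.
protos = [([0.0, 0.0], "a"), ([2.0, 0.0], "b")]
code = [0.5, 0.0]
print(predict(code, protos), glvq_mu(code, "a", protos))
```

Training minimizes a monotone function of mu summed over the training samples, so lower-dimensional, less noisy inputs such as bottleneck codes change the distances on which the whole cost is built.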
Community-acquired pneumonia (CAP) is a very common, infectious and sometimes lethal disease. It is therefore associated with high costs of diagnosis and treatment. To reduce health care costs, diagnosis and treatment must become cheaper without loss of predictive accuracy. One effective way of doing so would be the identification of easily detectable and highly specific transcriptomic markers, which would reduce the amount of laboratory work required while possibly enhancing diagnostic capability.
Transcriptomic whole-blood data derived from the PROGRESS study were combined with several documented features such as age, smoking status and the SOFA score. The analysis pipeline included processing by self-organizing maps for dimensionality and noise reduction, as well as diffusion pseudotime (DPT). Pseudotime enabled modelling of a disease course of CAP, in which each sample represents a state/time in the modelled course. The two methods combined resulted in a proposed disease course of CAP, described by 1476 marker genes. An additional gene set analysis provided information about the immune-related functions of these marker genes.
Soft Learning Vector Quantisation (SLVQ) and Robust Soft Learning Vector Quantisation (RSLVQ) are supervised data classification methods that have been applied successfully to real-world classification problems. The performance of SLVQ and RSLVQ, however, decreases when they are applied to more complicated classification problems. In this thesis, we introduce modifications to SLVQ and RSLVQ in order to obtain more capable versions of them. Several possibilities for modifying SLVQ and RSLVQ are considered; some of them are not successful enough and are included for the sake of completeness. The results of the thesis include Tangent Soft Learning Vector Quantisation-Strong (TSLVQ-S), together with its more stable version Tangent Robust Soft Learning Vector Quantisation-Strong (TRSLVQ-S), Attraction Soft Learning Vector Quantisation (ASLVQ) and Grassmannian Soft Learning Vector Quantisation (GSLVQ).
Internationalization and business expansion appear to be among the most challenging processes in business today. Every step of foreign market entry and of establishing overseas operations is full of obvious risks and hidden pitfalls. Theoretical background combined with practical experience plays the key role in such a complicated business process; this information can be used as a guideline by future market entrants and players. At present, Germany, with its well-developed engineering industry, offers broad scope for research into the internationalization process in its different forms and provides examples of both successful and unsuccessful foreign market entries.
FUSO is one of the leading Japanese manufacturers of trucks and buses in the world and an integral part of Daimler AG. As a large manufacturer of trucks and buses, FUSO faces marketing problems caused by corrosion. Corrosion is one of the major causes of breakdowns and of degraded vehicle performance. To counter this issue, FUSO initiated a new project called the “Anti-Corrosion Project”. The main mission of this project is to improve the corrosion resistance of the metal parts. Currently, almost 70 percent of FUSO’s parts fall under Grade III, i.e. a corrosion resistance of less than one year.
In this project, corrosion issues are collected through different types of audits, for example from customers and from two-year-old vehicles operated in the worst conditions. The listed corrosion issues are then investigated against the current specification, and a new proposal is requested from the supplier. The cost of the proposed solution is estimated internally and negotiated with the supplier. It is then forwarded to a meeting with top management for approval. Where a higher corrosion specification is involved, parts are taken from the production line and tested in FUSO’s materials laboratory. Finally, a drawing change is requested for the approved proposal, and the new proposal is implemented. The entire project has to be coordinated across all the departments involved, and working with these teams provides deeper knowledge of the causes of the issues.
In parallel with this project, the work focused on shop floor developments in the return parts management (RPM) area. FUSO is also responsible for after-sales services. In other words, FUSO provides a warranty for parts that break down within three years. Broken-down parts are delivered by customers through dealers for warranty claims; these parts are called Warranty Part Investigation (WPI) parts. Sometimes a customer wants to know the cause of a breakdown even though the warranty has expired; in this case the company investigates the cause but does not provide warranty cover. Parts of this kind are known as Product Quality Report (PQR) parts.
The company has a separate shop floor for return parts, and these parts are received directly by the company. RPM has four processes: inwarding, pre-analysis, investigation, and dispatch or scrap.
The company used to receive 30-50 parts per day; recently it decided to receive all broken-down parts. This increased the delay in inwarding and in the other processes. To solve this, a standard layout and process were established. One of the main reasons for the inwarding delay is excessive documentation, much of which is not actually required. This documentation was automated or digitalized. The improvements were carried out using lean manufacturing project methodology, which resulted in a higher intake of failed parts and less inventory.
Many companies use machine learning techniques to support decision-making and automate business processes by learning from the data that they have. In this thesis we investigate the theory behind the machine learning algorithms most widely used in practice for solving classification and regression problems.
In particular, the following algorithms were chosen for the classification problem: Logistic Regression, Decision Trees, Random Forest, Support Vector Machine (SVM), Learning Vector Quantization (LVQ). As for the regression problem, Decision Trees, Random Forest and Gradient Boosted Tree were used. We then apply those algorithms to real company data and compare their performances and results.
The application described in this thesis has been designed and built to help nurses and other medical personnel all around the world access a real-time database that stores patient records such as patient name, patient ID, patient age and date of birth, and the symptoms that the patient is experiencing. A real-time database is a live database in which all changes are reflected across all devices accessing it. This application will be especially beneficial in countries where access to a computer or medical equipment is not always possible. Because a phone is always ready to use and within reach, users of this application will be able to access the data at any given time and place. Users can add a new patient or search for existing patients. In addition, the application allows RAW medical images to be taken, which can be used to identify anomalies in a blood sample. RAW images are important for this application because they are uncompressed, which means that they do not lose any quality or detail. The users of this application are the medical personnel who take care of the patients. These users have to create a profile in the database in order to use the application, since their data, such as the user ID, are used to control how data are retrieved and stored. We also discuss the current and future features of this application, as well as its benefits for medical personnel and patients. Finally, we go over the implementation of the application from both a hardware and a software perspective.
Implementation of a customised business model for innovative engineering consultancy services
(2019)
Business development is vital for every organisation that intends to grow. It involves expansion through organic and inorganic means. There are also many innovative business styles which help organisations to expand. This thesis shows how an engineering services organisation chose its form of business expansion.
The thesis explains how an engineering services company uses its expertise to expand its business into the consultancy market, and demonstrates this with a business model executed in practice.
The thesis provides a solution for the following issues:
1) What is the best in-house strategy to be developed for business expansion in the service industry?
2) How did niche market experience help business expansion?
Prototype-based classification methods like Generalized Matrix Learning Vector Quantization (GMLVQ) are simple and easy to implement. An appropriate choice of the activation function plays an important role in the performance of (deep) multilayer perceptrons (MLP), which rely on a non-linearity for classification and regression learning. In this thesis, non-linear activation functions that are known to be successful for MLPs are investigated for application in GMLVQ to realize a non-linear mapping. The influence of the non-linear activation functions on the performance of the model with respect to accuracy and convergence rate is analyzed, and experimental results are documented.
This master thesis covers the formation of customer relationships in the IT outsourcing market, using the example of the “ABC” company. Most works on IT outsourcing cover the problems of implementing IT services and the process of providing them to customers, and almost all of the issues are covered from the perspective of consumers. Thus, the problems and results of outsourcing providers of IT services remain almost uncovered. This master thesis aims to reveal the specific features of the IT outsourcing business in Belarus and to develop an approach to forming and constructing a system of relationships between the company and its clients as a source of increased competitiveness.
Cryptorchidism is a condition in which one or both testes do not descend properly into the scrotum. With a prevalence of up to 10%, cryptorchidism is one of the most common birth defects of the male genital tract. Despite its associated health risks and the accompanying economic damage resulting from surgery and losses in breeding, studies on canine cryptorchidism and its causes are relatively rare. In this study a relational database of genetic causes of cryptorchidism was established and used as a basis for the identification of candidate genes. Associated regions were analysed by nanopore sequencing with the goal of identifying genetic variants correlated with cryptorchidism in the German Sheep Poodle.
In today’s market, dealing with textual data for internal and external processes has become increasingly important and more complex for certain companies. In this context, the thesis aims to support the analysis of similarities among textual documents by analyzing the relationships among them. The proposed analysis process includes discovering similarities among financial documents as well as possible patterns. The proposal is based on the exploitation and extension of existing approaches and on their combination with well-known clustering techniques. Moreover, a software tool has been implemented for the evaluation of the proposed approach and tested on the EDGAR filings on the basis of qualitative criteria.
This master thesis combines two main topics, the sharing economy and risk management, in order to provide a methodology, with Uber as the example, for applying a risk management process to a sharing economy business, and to identify which types of risk are of special relevance for such businesses.
A relatively new research field of neuroscience, called connectomics, aims to achieve a full understanding and mapping of neural circuits and fine neuronal structures of the nervous system in a variety of organisms. This detailed information will provide insight into how our brain is influenced by different genetic and psychiatric diseases, how memory traces are stored and how ageing influences our brain structure. It is beyond question that new methods for data acquisition will produce large amounts of neuronal image data. These data will exceed the zettabyte range and are impossible to annotate manually for visualization and analysis. Nowadays, machine learning algorithms, and especially deep convolutional neural networks, are heavily used in medical imaging and computer vision, which brings the opportunity of designing fully automated pipelines for image analysis. This work presents a new automated workflow based on three major parts: image processing using consecutive deep convolutional networks, a pixel-grouping step called connected components, and 3D visualization via Neuroglancer, to achieve a dense three-dimensional reconstruction of neurons from EM image data.
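The pixel-grouping step can be illustrated with a minimal connected-components labelling on a single 2-D binary mask, using breadth-first search and 4-connectivity. This is only a sketch: the thesis works on 3-D EM volumes and its connectivity rule is not stated in the abstract, so the 2-D setting, the neighbourhood and the function name are assumptions.

```python
from collections import deque

def label_components(mask):
    """Label 4-connected foreground regions of a 2-D binary mask.

    mask : list of equally long rows containing truthy (foreground)
           and falsy (background) values.
    Returns (labels, count); labels holds 0 for background and
    1..count for the individual components.
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or labels[r][c]:
                continue
            # Unlabelled foreground pixel: start a new component here.
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not labels[ny][nx]):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count

segmentation = [[1, 1, 0],
                [0, 0, 0],
                [0, 1, 1]]
labels, count = label_components(segmentation)
print(count, labels)
```

Extending the neighbourhood tuple to six (or 26) offsets and adding a third index gives the 3-D variant, where each label corresponds to one candidate neuron segment.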
Digital innovation in the quality management system from supply chain to final product conformity
(2019)
A new revolution is under way with Industry 4.0, and digitalization is the new trend in innovation. We therefore want to digitalize the process from the supply chain to the final product conformity of the aircraft.
Every document from the supplier (e.g. CoC, inspection report, concession) is received digitally. When a part arrives at the OEM's warehouse, the warehouse personnel have a system that confirms, with the help of a QR code, that part A with serial no. X is the correct fit for part no. By, and books the part into the ERP.
The biggest challenge is to reduce the in-production inspection that has to be done by a human. We want to go one step further by adding automation together with IoT to the process, in order to improve data processing for the automation process and to reduce the overall inspection time. What is needed is a proper visual automation control system that, with the help of gauge R&R, makes the process more accurate and certifies the traceability of the process. Finally, the large amount of data requires data security: a proper data source and data storage must be created for supplier data as well as for internal data.
In the practice of software engineering, project managers often face the problem of software project management. It is related to the resource-constrained project scheduling problem. In software project scheduling, the main resources are the employees, each with a skill set and a required salary. The main purpose of software project scheduling is to assign the tasks of a project to the available employees such that the total cost and duration of the project are minimized while the constraints of software project scheduling are fulfilled. The software project scheduling problem (SPSP) is a complex combinatorial optimization problem whose search space grows exponentially with the number of tasks and employees, which makes SPSP an NP-hard problem. Because the goal is to minimize both the total cost and the duration of the project, it is a multi-objective problem. Many algorithms proposed so far claim to give near-optimal results for NP-hard problems, but only a few give a feasible set of solutions for SPSP, so a more efficient algorithm that yields feasible and efficient results is still desirable.
Nowadays, many problems are solved using nature-inspired algorithms because these algorithms combine exploration and exploitation. For solving SPSP, some of these nature-inspired algorithms have been used, e.g. genetic algorithms, the Ant Colony Optimization algorithm (ACO) and the Firefly algorithm. Nature-inspired algorithms like particle swarm optimization, genetic algorithms and ACO provide more promising results than naive and greedy algorithms. However, there is always room for improvement. The main purpose of this research is to use the bat algorithm to obtain efficient results and solutions for SPSP. In this work a modified bat algorithm is implemented in which a different random walk approach is used. The contributions of this thesis are: (1) to adapt and apply a modified multi-objective bat algorithm for solving SPSP efficiently, (2) to adapt and apply other nature-inspired algorithms such as genetic algorithms for solving SPSP, and (3) to compare and analyze the results obtained by the applied nature-inspired algorithms and draw conclusions.
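Whatever metaheuristic is used, each candidate schedule has to be scored on the two objectives. The sketch below evaluates cost and duration under a commonly used SPSP formulation that is assumed here rather than taken from the abstract: each task has an effort in person-months, each employee a monthly salary, and a dedication matrix states which fraction of an employee's time goes to each task. Skill constraints and overwork limits are omitted, and all names and numbers are hypothetical.

```python
def evaluate_schedule(efforts, salaries, dedication, predecessors):
    """Return (total_cost, project_duration), or None if a task is unstaffed.

    efforts      : effort per task, in person-months
    salaries     : monthly salary per employee
    dedication   : dedication[i][j] in [0, 1], share of employee i on task j
    predecessors : predecessors[j] lists the tasks that must finish before
                   task j starts; tasks are assumed to be topologically ordered
    """
    n_emp, n_tasks = len(salaries), len(efforts)
    durations = []
    for j in range(n_tasks):
        staff = sum(dedication[i][j] for i in range(n_emp))
        if staff <= 0:
            return None  # nobody works on task j: infeasible schedule
        durations.append(efforts[j] / staff)
    # Each employee is paid for the dedicated share of every task's duration.
    cost = sum(salaries[i] * dedication[i][j] * durations[j]
               for i in range(n_emp) for j in range(n_tasks))
    # Earliest finish times along the precedence graph (critical path).
    finish = [0.0] * n_tasks
    for j in range(n_tasks):
        start = max((finish[p] for p in predecessors[j]), default=0.0)
        finish[j] = start + durations[j]
    return cost, max(finish)

# Two tasks in sequence, each handled full time by one employee.
print(evaluate_schedule(
    efforts=[2.0, 4.0],
    salaries=[1000.0, 2000.0],
    dedication=[[1.0, 0.0], [0.0, 1.0]],
    predecessors=[[], [0]],
))
```

A multi-objective optimizer such as the bat algorithm would treat the flattened dedication matrix as the position of a bat and use the returned pair to compare solutions by Pareto dominance.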
The theoretical foundations of enterprise management using information technology were reviewed, the effectiveness of the use of information systems in the enterprise was analysed, and ways of improving the enterprise management mechanism using information systems were developed, using Mars Wrigley Confectionery Belarus as an example.
Neural networks have become one of the most powerful tools for learning from big data sets and are used extensively for classification. However, the deeper the network models, the lower their interpretability. Although many methods exist to explain the output of such networks, the lack of interpretability makes them black boxes. On the other hand, prototype-based machine learning algorithms are known to be interpretable and robust.
Therefore, the aim of this thesis is to find a way to interpret the functioning of the neural networks by introducing a prototype layer to the neural network architecture. This prototype layer will train alongside the neural network and help us interpret the model. We present architectures of neural networks consisting of autoencoders and prototypes that perform activity recognition from heart rates extracted from ECG signals. These prototypes represent the different activity groups that the heart rates belong to and thereby aid in interpretability.
Vicia faba leaves and calli were transformed using CRISPR Cas RNP. Two kinds of CPP-fused SpyCas9 were used with sgRNA7, sgRNA5 or sgRNA13, targeting PDS exon 1, PDS exon 2 or MgCh exon 3 respectively. RNPs were applied using high-pressure spraying, biolistic delivery, incubation in RNP solution and infiltration of leaf tissue. A PCR- and restriction-enzyme-based approach was used for the detection of mutations. Screening of 679 E. coli colonies containing the cloned fragments resulted in the detection of 14 mutations. Most of the 14 mutations were deletions of 150, 500 or 730 bp. Five of the 14 mutations were point mutations located two to three bp upstream of the PAM.
In bioinformatics one important task is to distinguish between native and mirror protein models based on the structural information. This information can be obtained from the atomic coordinates of the protein backbone. This thesis tackles the problem of distinction of these conformations, looking at the statistics of the dihedral angles’ distribution regarding the protein backbone. This distribution is visualized in Ramachandran plots. By means of an interpretable machine learning classification method – Generalized Matrix Learning Vector Quantization – we are able to distinguish between native and mirror protein models with high accuracy. Further, the classifier model supplies supplementary information on the important distributional regions for distinction, like α-helices and β-strands.
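The dihedral angles shown in a Ramachandran plot can be computed directly from four consecutive backbone atom positions: phi from C(i-1), N(i), CA(i), C(i) and psi from N(i), CA(i), C(i), N(i+1). The sketch below uses the usual projection-and-atan2 formula; the function name is illustrative, and the coordinates in the example are made-up points rather than real protein data.

```python
import math

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four 3-D points."""
    def sub(a, b):
        return [a[i] - b[i] for i in range(3)]

    def dot(a, b):
        return sum(a[i] * b[i] for i in range(3))

    def cross(a, b):
        return [a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0]]

    b0 = sub(p0, p1)          # points from the second atom back to the first
    b1 = sub(p2, p1)          # central bond, the rotation axis
    b2 = sub(p3, p2)
    length = math.sqrt(dot(b1, b1))
    b1 = [c / length for c in b1]
    # Project the outer bonds onto the plane perpendicular to the axis.
    v = sub(b0, [dot(b0, b1) * c for c in b1])
    w = sub(b2, [dot(b2, b1) * c for c in b1])
    x = dot(v, w)
    y = dot(cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

# Four points forming a right-angle twist around the z axis.
print(dihedral([1, 0, 0], [0, 0, 0], [0, 0, 1], [0, 1, 1]))
```

Mirroring a structure (for example negating all x coordinates) flips the sign of every dihedral angle, which is why the angle distribution carries the information needed to separate native from mirror models.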
A protein is a large molecule that consists of a vast number of atoms; one can only imagine the complexity of such a molecule. A protein is a series of amino acids that bind to each other to form specific sequences known as peptide chains. Proteins fold into three-dimensional conformations (the so-called native structure) to perform their functions. However, not every protein folds into the correct structure, as a result of mutations occurring in its amino acid sequence. Such mutations cause many protein misfolding diseases. Protein folding is a major open problem in biology. Predicting changes in protein stability free energy in relation to an amino acid mutation (ΔΔG) helps to better understand the driving forces underlying how proteins fold to their native structures. Measuring the difference in Gibbs free energy therefore provides more insight into how protein folding occurs. This knowledge might prove beneficial in designing new drugs to treat diseases related to protein misfolding. The protein energy profile aids in understanding the relationship between sequence, structure and function by assigning an energy profile to a protein structure. Additionally, measuring the changes in the protein energy profile resulting from a mutation (ΔΔE), using an approach derived from statistical physics, will help us to understand the protein structure thoroughly. In this work, we attempt to show that ΔΔE values approximate ΔΔG values, which may lead future studies to consider the energy profile, like Gibbs free energy, as a good predictor of protein binding affinity for solving the protein folding problem.
The automatic comparison of RNA/DNA, or rather nucleotide, sequences is a complex task requiring careful design due to its computational complexity. While alignment-based models suffer from computational costs in time, alignment-free models have to deal with appropriate data preprocessing and consistently designed mathematical data comparison. This work deals with the latter strategy. In particular, a systematic categorization is proposed, which emphasizes two key concepts that have to be combined for a successful comparison analysis: 1) the data transformation, comprising adequate mathematical sequence coding and feature extraction, and 2) the subsequent (dis-)similarity evaluation of the transformed data by means of problem-specific but mathematically consistent proximity measures. Respective approaches from different categories of the introduced scheme are examined with regard to their suitability for distinguishing natural RNA virus sequences from artificially generated ones encompassing varying degrees of biological feature preservation. The challenge in this application is the limited additional biological information available, such that the decision has to be made solely on the basis of the sequences and their inherent structural characteristics. To address this, the present work focuses on interpretable, dissimilarity-based classification models of machine learning, namely variants of Learning Vector Quantizers. These methods are known to be robust and highly interpretable and therefore allow the applied data transformations to be evaluated together with the chosen proximity measure with respect to the given discrimination task. First analysis results are provided and discussed, serving as a starting point for more in-depth analysis of this problem in the future.
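The two concepts named above, data transformation and (dis-)similarity evaluation, can be illustrated with one simple alignment-free pairing: k-mer frequency vectors as the sequence coding and the Euclidean distance as the proximity measure. This particular pairing is an assumption chosen for brevity; the abstract does not state which codings and measures the thesis actually examines, and the sequences below are made up.

```python
import math
from collections import Counter
from itertools import product

def kmer_profile(seq, k=3, alphabet="ACGU"):
    """Relative k-mer frequencies of a nucleotide sequence.

    DNA input is mapped to the RNA alphabet (T -> U); k-mers containing
    other symbols (e.g. N) are ignored. The result has len(alphabet)**k
    entries in lexicographic k-mer order.
    """
    seq = seq.upper().replace("T", "U")
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]

def euclidean(p, q):
    """Euclidean distance between two equally long profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Two short, made-up sequences compared without any alignment.
first = kmer_profile("AUGGCUAUGGCUAAG", k=2)
second = kmer_profile("AUGGCUAUGCCUAAG", k=2)
print(euclidean(first, second))
```

Because both steps are explicit, either one can be swapped (for instance a different coding or a divergence instead of the Euclidean distance) and fed into a prototype-based classifier, which is the kind of combination the categorization is meant to organize.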
Convolutional neural networks (CNNs) have been among the most powerful and popular preprocessing techniques employed for image classification problems. Here, we use other signal processing techniques, such as the Fourier transform and the wavelet transform, to preprocess the images in conjunction with different classifiers (MLP, LVQ, GLVQ and GMLVQ) and compare their performance with that of a CNN.
Anomaly detection is a pressing technical problem for many business enterprises. In this thesis a combination of the Growing Neural Gas and the Generalized Matrix Learning Vector Quantization is presented as a solution based on collected theoretical and practical knowledge. The whole network is described and implemented along with references and experimental results. The proposed model is carefully documented, and the remaining open research questions are stated for future investigation.
Genetic sequence variations at the level of gene promoters influence the binding of transcription factors. In plants, this often leads to differential gene expression across natural accessions and crop cultivars. Some of these differences are propagated through molecular networks and lead to macroscopic phenotypes. However, the link between promoter sequence variation and the variation of its activity is not yet well understood. In this project, we use the power of deep learning in 728 genotypes of Arabidopsis thaliana to shed light on some aspects of that link. Convolutional neural networks were successfully implemented to predict the likelihood of a gene being expressed from its promoter sequence. These networks were also capable of highlighting known and putative new sequence motifs causal for the expression of genes. We tested our algorithms in various scenarios, including single and multiple point mutations, as well as indels on synthetic and real promoter sequences and the respective performance characteristics of the algorithm have been estimated. Finally, we showed that the decision boundary to classify genes as expressed and non-expressed depends on the sensitivity of the transcriptome profiling assay and changing it has an impact on the algorithm’s performance.
Data streams change their statistical behaviour over time. These changes can occur gradually or abruptly for unforeseen reasons and may affect the expected outcome. Thus, it is important to detect concept drift as soon as it occurs. In this thesis, we chose a distance-based methodology to detect the presence of concept drift in data streams. We used generalized learning vector quantization (GLVQ) and generalized matrix learning vector quantization (GMLVQ) classifiers to calculate distances between prototypes and data points. Chi-square and Kolmogorov–Smirnov tests are used to compare the distance distributions of the test and training data sets to indicate the presence of drift.
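The comparison of distance distributions described above can be illustrated with a minimal sketch. It assumes the prototype-to-sample distances have already been computed by a trained classifier; the function names `ks_statistic` and `drift_detected` and the 5% significance level are illustrative assumptions, not details taken from the thesis.

```python
import bisect
import math


def ks_statistic(train_dists, test_dists):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the empirical CDFs."""
    a, b = sorted(train_dists), sorted(test_dists)
    gap = 0.0
    # The gap between two step functions can only change at a sample point.
    for x in a + b:
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def drift_detected(train_dists, test_dists, c_alpha=1.358):
    """Flag drift when the statistic exceeds the large-sample critical value.

    c_alpha = 1.358 corresponds to a significance level of about 0.05
    (an assumed setting for this sketch).
    """
    n, m = len(train_dists), len(test_dists)
    threshold = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(train_dists, test_dists) > threshold
```

If the distances in the incoming window follow the same distribution as the training distances, the statistic stays small; a shifted distribution pushes it towards 1 and the check reports drift.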
In response to prevailing environmental conditions, Arabidopsis thaliana plants must increase their photosynthetic capacity to acclimate to potentially harmful high light stress. In order to measure these changes in acclimation capacity, different high-throughput imaging-based methods can be used. In this master thesis, we studied the capacity of different Arabidopsis thaliana knockout mutants and accessions to acclimate to potentially harmful high light and cold temperature conditions using a high-throughput phenotyping system with an integrated chlorophyll fluorescence measurement system. In order to determine the acclimation capacity, Arabidopsis thaliana knockout mutants of genes not previously associated with high light, as well as accessions of two different haplotype groups with a reference and an alternative allele from different countries of origin, were grown under switching high light and temperature conditions. Photosynthetic analysis showed that knockout mutant plants differed from the wildtype in their Photosystem II operating efficiency during a switch to increased light irradiance, but did not differ significantly a week later under the same circumstances. High-throughput phenotyping of haplotype accessions revealed a significantly better acclimation capacity in non-photochemical quenching and steady-state photosynthetic efficiency in accessions of Russian origin with an altered SPPA gene during high light and cold stress.
Financial fraud can cause huge monetary losses for banks. Studies have shown that, if not mitigated, financial fraud can lead to bankruptcy for big financial institutions and even insolvency for individuals. Credit card fraud is a type of financial fraud that is ever growing. In the future, fraud cases are expected to increase exponentially, which is why many researchers are focusing on machine learning techniques for detecting fraud. This task, however, is not simple. There are two main reasons:
• varying behaviour in committing fraud
• high level of imbalance in the dataset (the majority of normal or genuine cases largely outnumbers the number of fraudulent cases)
A predictive model trained on an unbalanced dataset usually tends to be biased towards the majority class.
In this thesis, this problem is tackled by implementing a data-level approach in which different resampling methods, such as undersampling, oversampling, and hybrid strategies, along with bagging and boosting algorithmic approaches, have been applied to a highly skewed dataset with 492 identified frauds out of 284,807 transactions.
Predictive modelling algorithms such as Logistic Regression, Random Forest, and XGBoost have been implemented along with different resampling techniques to predict fraudulent transactions.
The performance of the predictive models was evaluated based on the Area Under the Receiver Operating Characteristic Curve (AUC-ROC), the Area Under the Precision-Recall Curve (AUC-PR), Precision, Recall, and F1 score.
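To illustrate why precision, recall and F1 matter more than accuracy on such a skewed dataset, the following minimal sketch computes these metrics from binary labels. The function names and the "always predict genuine" baseline are illustrative assumptions; only the class counts (492 frauds out of 284,807 transactions) come from the abstract.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for the positive (fraud = 1) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical baseline: a model that labels every transaction as genuine.
y_true = [1] * 492 + [0] * (284807 - 492)
y_pred = [0] * 284807
baseline_accuracy = accuracy(y_true, y_pred)
baseline_scores = precision_recall_f1(y_true, y_pred)
```

The baseline reaches an accuracy above 99.8% while its recall and F1 score are zero, which is why the evaluation relies on class-sensitive metrics.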
Drought is one of the most common and dangerous threats plants have to face, costing the global agricultural sector billions of dollars every year and leading to the loss of tons of harvest. Until people drastically reduce their consumption of animal products or cellular agriculture comes of age, more and more crops will need to be produced to sustain the ever growing human population. Even then, as more areas on earth are becoming prone to drought due to climate change, we may still have to find or breed plant varieties more suitable to grow and prosper in these changing environments.
Plants respond to drought stress with a complex interplay of hormones, transcription factors, and many other functional or regulatory proteins, and mapping out this web of agents is no trivial task. In the last two to three decades, machine learning has become immensely popular and is increasingly used to find patterns in situations that are too complex for the human mind to grasp. Even though much of the hype is focused on the latest developments in deep learning, relatively simple methods often yield superior results, especially when data is limited and expensive to gather.
This Master Thesis, conducted at the IPK in Gatersleben, develops an approach for shedding light on the phenotypic and transcriptomic processes that occur when a plant is subjected to stress. It centers around a random forest feature selection algorithm and although it is used here to illuminate drought stress response in Arabidopsis thaliana, it can be applied to all kinds of stresses in all kinds of plants.
We present dimensionality reduction methods like autoencoders and t-SNE for visualization of high-dimensional data into a two-dimensional map. In this thesis, we initially implement basic and deep autoencoders using breast cancer and mushroom datasets. Next, we build another dimensionality reduction method t-SNE using the same datasets. The obtained visualization results of the datasets using the dimensionality reduction methods are documented in the experiments section of the thesis. The evaluation of classification and clustering for the dimensionality reduction techniques is also performed. The visualization and evaluation results of t-SNE are significantly better than the other dimensionality reduction techniques.
Mathematics Behind the Zcash
(2020)
Among all the cryptocurrencies developed since Bitcoin, Zcash stands out as the strongest cryptocurrency, providing both transparency and anonymity to transactions and their users by deploying the strong mathematics of zk-SNARKs.
We discuss zero-knowledge proofs, which are a basic building block of zk-SNARKs. They include Schnorr and sigma protocols in interactive and non-interactive versions. Non-interactive proofs are further used in Zcash transactions, where the validity of a sent transaction is established by a cryptographic proof.
Further, we deploy zk-SNARK proofs that use a common reference string as a public parameter when a transaction is made. The proof allows the sender to prove that she knows a secret for an instance such that the proof is succinct, can be verified very efficiently and does not leak the secret. Non-malleability, small proofs and very efficient verification make zk-SNARKs a classic tool in Zcash. Since we deal with NP problems, we have considered elliptic curve cryptography, which provides the same security as RSA but with smaller parameter sizes.
Lastly, we explain the Zcash transaction process after minting a coin. The corresponding transaction completely hides the sender, the receiver and the amount of the transaction using zero-knowledge proofs.
As future considerations, we discuss possible improvements in terms of decentralization and efficiency by comparing Zcash with the top-ranked cryptocurrencies Ethereum and Monero, privacy preservation against the threat of quantum computers, and enhancements in shielded transactions.
Glycans play an important role in the intracellular interactions of pathogenic bacteria. Pathogenic bacteria possess binding proteins capable of recognizing certain sugar motifs on other cells, which are found in glycan structures. Artificial carbohydrate synthesis allows scientists to recreate those sugar motifs in a rational, precise, and pure form. However, due to the high specificity of sugar-binding proteins, known as lectins, to glycan structures, methods for identifying suitable binding agents need to be developed. To tackle this hurdle, the Fraunhofer Institute for Cell Therapy and Immunology (Fraunhofer IZI) and the Max-Planck Institute of Colloids and Interfaces (MPIKG) developed a binding assay for the high throughput testing of sugar motifs that are presented on modular scaffolds formed by the assembly of four DNA strands into simple, branched DNA nanostructures. The first generation of this assay was used in combination with bacteria that express a fluorescent protein as a proof-of-concept. Here, the assay was optimized to be used with bacteria not possessing a marker gene for a fluorescent protein by staining their genomic DNA with SYBR® Green. For the binding assay, DNA nanostructures were combined with artificially synthesized mannose polymers, typical targets for many lectins on the surface of bacteria, presenting them in a defined constellation to bind bacteria strongly due to multivalent cooperativity. The testing of multiple mannose polymers identified monomeric mannose with a 5’-carbon linker and 1,2-linked dimeric mannose with linker as the best binding candidates for E. coli, presumably due to binding with the FimH protein on the surface. Despite similarities between the FimH proteins of E. coli and K. pneumoniae, binding was only observed between E. coli and the different sugar molecules on DNA structures. 
Furthermore, the degree of free movement seemed to affect the binding of mannose polymers to targeted proteins, since when utilizing a more flexible DNA nanostructure, an increase in binding could be observed. An alternative to the simple DNA nanostructures described above is the use of larger, more complex DNA origami structures consisting of several hundred strands. DNA origami structures are capable of carrying dozens of modifications at the same time. The results for the DNA origami structure showed a successful functionalization with up to 71 1,2-linked dimeric mannose with linker molecules. These results point towards a solution for the high-throughput analysis of potential binding agents for pathogenic bacteria e.g. as an alternative treatment for antibiotic-resistant.
The emerging Internet of Things (IoT) technology interconnects billions of embedded devices with each other. These embedded devices are internet-enabled and collect, share, and analyze data without any human intervention. The integration of IoT technology into the human environment, such as industry, agriculture, and the health sector, is expected to improve the way of life and businesses. The emerging technology poses challenges and numerous security threats. On these grounds, it is a must to strengthen the security of IoT technology to avoid any compromise that affects human life. In contrast to implementing traditional cryptosystems on IoT devices, an elliptic curve cryptosystem (ECC) is used to meet the limited resources of the devices. ECC is an elliptic curve-based public-key cryptosystem which provides equivalent security with a shorter key size compared to other cryptosystems such as Rivest–Shamir–Adleman (RSA). The security of an ECC hinges on the hardness of solving the elliptic curve discrete logarithm problem (ECDLP). ECC is faster and easier to implement and also consumes less power and bandwidth. Due to these benefits, ECC is incorporated in internationally recognized standards for lightweight applications.
Robust soft learning vector quantization (RSLVQ) is a probabilistic variant of the learning vector quantization (LVQ) algorithm. The RSLVQ approach is formulated in terms of a Gaussian mixture model, and its cost function is defined as a likelihood ratio. Our thesis work modifies standard RSLVQ with non-Gaussian density functions such as the logistic, lognormal, and Cauchy densities (referred to as PLVQ). In this approach, we derive new update rules for the prototypes using the gradient of the cost function with respect to the non-Gaussian density functions. We also derive new learning rules for the model parameters like s and s, by differentiating the cost function with respect to these parameters. The main goal of the thesis is to compare the performance of the PLVQ model with that of the Gaussian RSLVQ model. Therefore, the performance of these classification models has been tested on the Iris and Seeds datasets. To visualize the results of the classification models in an adequate way, principal component analysis (PCA) has been used.
A classical topic in the theory of random graphs is the probability of at least one isolated vertex in a given random graph. An isolated node has a huge impact on social networks, which can be modelled by a random graph. We present a distribution of the number of isolated vertices using the probability generating function. We discuss the relationship between isolated edges, extended cut polynomials and extended matching polynomials using the principle of inclusion-exclusion. We introduce an algorithm based on colored graphs for general graphs. We apply this to the components of a graph as well. Finally, we implement the idea on special classes of graphs such as cycles, bipartite graphs, paths, and others. We discuss a recursive procedure based on analogous coloring rules for ladder and fan graphs.
Introducing natural adversarial observations to a Deep Reinforcement Learning agent for Atari Games
(2021)
Deep Learning methods are known to be vulnerable to adversarial attacks. Since Deep Reinforcement Learning agents are based on these methods, they are prone to tiny input data changes. Three methods for adversarial example generation will be introduced and applied to agents trained to play Atari games. The attacks target either single inputs or can be applied universally to all possible inputs of the agents. They were able to successfully shift the predictions towards a single action or to lower the agent’s confidence in certain actions, respectively. All proposed methods had a severe impact on the agent’s performance while producing invisible adversarial perturbations. Since natural-looking adversarial observations should be completely hidden from a human evaluator, the negative impact on the performance of the agents should additionally be undetectable. Several variants of the proposed methods were tested to fulfil all posed criteria. Overall, seven generated observations for two of three Atari games are classified as natural-looking adversarial observations.
We investigate the folding and thermodynamic stability of a tertiary contact of baker's yeast ribosomal ribonucleic acid (rRNA), which is supposed to be essential for the maturation process of ribosomes in eukaryotes at lower temperatures [1]. Ribosomes are cellular machines essential for all living organisms. RNA is at the center of these machines and responsible for the translation of genetic information into proteins [2,3]. Only recently, the rRNA tertiary contact of interest was discovered in Zurich by the research group of Vikram Govind Panse. Gerhardy et al. [1] showed in vitro that, within the 60S preribosome and under defined metal ion concentrations, the tertiary contact becomes visible between a GAAA tetraloop and a kissing loop motif. Our aim is now to understand this RNA structure, especially the formation of the rRNA tertiary contact, in terms of thermodynamics and kinetics under various experimental conditions, such as temperature and the metal ion concentrations of K(I), Na(I) and Mg(II). To this end, we use optical spectroscopy such as UV/VIS spectroscopy and ensemble Förster (or fluorescence) resonance energy transfer (FRET) folding studies. Our findings will help to further characterize this newly discovered ribosomal RNA contact and to elucidate its function within the ribosomal maturation process.
Several algorithms have been proposed for the testing of series-parallel graphs in linear time. We give alternative algorithms for testing series-parallel graphs, computing their tree decompositions, and determining the independence number when the input is an undirected biconnected series-parallel graph, which run (approximately) linearly in polynomial time.
VQ-VAE is a successful generative model which can perform lossy compression. It combines deep learning with vector quantization to achieve a discrete compressed representation of the data. We explore using different vector quantization techniques with VQ-VAE, mainly neural gas and fuzzy c-means. Moreover, VQ-VAE consists of a non-differentiable discrete mapping which we will explore and propose changes to the original VQ-VAE loss to fit the alternative vector quantization techniques.
There are multiple ways to gain information about an individual and its health status, but an increasingly popular field in medicine has become the analysis of human breath, which carries a lot of information about metabolic processes within the individual's body. The information in exhaled breath consists of volatile (organic) compounds (VOCs). These VOCs are products of metabolic processes within the individual's body and thus might be indicators of diseases disturbing those processes. The compounds are detected by mass-spectrometric (MS) or ion-mobility spectrometric (IMS) techniques, so the analysis of these compounds is not limited to exhaled breath. The resulting data is spectral data, capturing concentrations of the VOCs indirectly through intensities. About 3000 VOCs [1] have already been determined in human exhaled breath. The number of research papers about VOC analysis and detection has risen nearly constantly over the last decade 1. Furthermore, the technique to identify VOCs could also be used to capture biomarkers from foreign species within the individual's body. Extracting VOCs from an individual can be done by non-invasive or minimally invasive techniques. However, the manual identification of VOCs and biomarkers related to a certain disease or infection is not feasible due to the complexity of the sample and often unknown metabolic products, thus automated techniques are needed. [1–4] To establish breath analysis as a diagnosis tool, machine learning methods could be used. Machine learning has become a popular and common technique when dealing with medical data, due to the rapid analysis it allows. Taking this advantage, breath analysis using machine learning could become the method of choice for diagnosis, keeping in mind that conventional methods are laboratory-based and thus sometimes need several days to identify the organism when trying to detect a bacterial infection. [5]
In this work, a protocol for portable nanopore sequencing of DNA from pollen collected from honey bees, bumble bees, and wild bees was developed. DNA metabarcoding is applied to identify genera within the mixed DNA samples. The DNA extraction and ITS and ITS2 PCR parameters tested for this purpose were applied to the collected pollen sample and the amplicons were then decoded using the Flongle sequencer adapter from Oxford Nanopore Technologies. It is shown that the main pollinator resources at the different sites can be identified in percentage proportions. The protocol generated in this study can be used for further ecological questions.
Over the past few years, wind and solar power plants have increasingly contributed to energy production. However, due to fluctuating energy sources, the energy production data contain disruptions. Such disrupted data degrade prediction performance, and the affected values need to be estimated. In this thesis, we provide a comparative study on estimating disrupted online data based on the data of similar groups of power plants. We apply three estimation techniques, namely mean, interpolation, and k-nearest neighbor (KNN), to estimate the disruptions in the training data. We then apply four clustering algorithms, namely k-means, neural gas, hierarchical agglomerative clustering, and affinity propagation, with two similarity measures, namely Euclidean distance and dynamic time warping (DTW), to form groups of power plants and compare the results. Experimental results show that when KNN estimation is applied to the data, and neural gas and agglomerative clustering with DTW are used to cluster the data, the cluster quality scores and execution times are better than those of the other combinations. Therefore, we choose KNN estimation to reconstruct the disrupted online data within each group of similar power plants.
Purpose: The study aims to determine the incentives for German SMEs to offshore their business activities to India and China.
Design: This study is based on a quantitative approach. Primary and secondary data are used in the study. The data was collected from individuals working in different SMEs in Germany who have relevant offshoring experience. Theories from articles and peer-reviewed journals, along with relevant books, were consulted throughout the study.
Findings: The findings suggest that the benefits and advantages of an offshoring strategy in India and China are cost efficiency and technology. Moreover, the challenges faced by firms while executing an offshoring strategy are the cultural mix, especially language and cultural barriers, security issues and loss of market performance.
Originality and Value: The study on the incentives of German SMEs to offshore business activities to India and China enables me to understand why companies are interested in an offshoring strategy in low-cost countries for expanding their business, while evaluating the challenges, merits and demerits of offshoring.
In the past few years, social media has become the most popular communication software, replacing phone calls, text messages, television and even advertisements. Social media has become the most important channel for spreading opinions. As a result of this trend, many politicians have also started to operate social media (Wang, Tsai, & Chen 2019). This study was conducted in order to understand whether there was an intercandidate agenda-setting effect between the Facebook posts of legislative candidates and presidential candidates during the election period, and whether the legislative candidates' Facebook posts were influenced by the presidential candidates' Facebook posts. The target population of this study was the three presidential candidates in Taiwan's 2020 presidential election — Dr. Tsai Ing-Wen, Mr. Han Kuo-Yu, and Mr. James Soong — as well as the 36 legislative candidates in Taipei, Taichung, and Kaohsiung.
The study focused on Facebook posts from 1st November 2019 to 10th January 2020, the 10 weeks before the voting day. Text mining and cosine similarity were used to organize the posts and compare the similarity between posts. Finally, the similarity between posts was presented as a line graph.
The study revealed that there was an inter-candidate agenda-setting effect between legislative candidate posts and presidential candidate posts, and that Dr. Tsai Ing-Wen, who was also the incumbent president during the campaign, was the most influential Facebook poster during the entire election.
Future research is proposed on the inter-candidate agenda-setting effect only analyzing the similarity of posts among the candidates to discuss the influence of the candidates' Facebook agenda-setting during a specific election period.
This is the first study in which the Facebook posts of Taiwanese politicians are analyzed and the relationships are systematically compared across multiple degrees, which opens up a whole new subject for future elections in Taiwan.
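The cosine similarity used in this study to compare posts can be sketched as follows. This is a minimal illustration on term-frequency vectors; the whitespace tokenization and the function name `cosine_similarity` are assumptions made for the example, since the abstract does not describe the preprocessing (Chinese-language posts would additionally require a word segmenter).

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the term-frequency vectors of two texts."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    # Only words that occur in both texts contribute to the dot product.
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty post is treated as similar to nothing
    return dot / (norm_a * norm_b)
```

A value close to 1 indicates that two posts use largely the same vocabulary, while 0 means they share no words. Computing this value week by week for pairs of candidates yields the kind of series that can be plotted as a line graph.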
In machine learning, Learning Vector Quantization (LVQ) is well known as a supervised vector quantization method. LVQ has been studied to generate optimal reference vectors because of its simple and fast learning algorithm [2]. In many classification tasks, different variants are considered while training a model, and considering large margin variants of LVQ helps to obtain significant results [20]. Large margin LVQ (LMLVQ) aims to maximize the distance between the decision hyperplane and the data points. In this thesis, a comparison of different variants of Generalized Learning Vector Quantization (GLVQ) and large margin LVQ is presented along with visualization, implementation and experimental results.
How Covid-19 impacts the workplace of knowledge workers in a pandemic and post pandemic world
(2021)
This master thesis covers the topic of the workplace. The focus lies on the corona pandemic and how it has affected and will continue to affect the workplaces of knowledge workers. To this end, the workplace as a research area is described holistically, followed by the presentation of gathered secondary data and of the in-depth interviews conducted by the author. The secondary and primary data agree that the workplace as people know it will change after the pandemic. The most likely outcome is the hybrid workplace concept, which mixes the home office, the office and, alternatively, third places. Companies have to be equipped and prepared for these changes. The importance of the office will increase, and the office has to be redesigned in order to meet the needs of the knowledge workers who will eventually come back to it.
Cryptorchidism is the most common disorder of sex development in dogs. It describes a failure of one or both testes to descend into the scrotum in due time. It is a heritable multifactorial disease. In this work, selected dogs of a German sheep poodle breed were sequenced with nanopore sequencing and subsequently examined for genetic variations correlating with cryptorchidism. The relationships of the studied dogs were also analyzed and visualized.
The occurrence of prostate cancer (PCa) has been rising consistently for three decades, and PCa remains the third leading cause of cancer-related deaths in Germany after lung and bowel cancer. Despite new methods of early detection, such as prostate-specific antigen (PSA) testing, it remains the most common cancer in German men, with over 63,400 new diagnoses in Germany every year, and exhibits high prevalence in other countries of Northern and Western Europe as well [64]. Men over the age of 70 are most commonly affected by the lethal disease, whereas onset before the age of 50 is rare. The malignant prostate tumor can be cured by surgery or irradiation as long as the cancer has not reached the metastatic stage, in which other therapeutic methods have to be employed [14] [15]. In the metastatic phase, the patient usually exhibits symptoms when the tumor's size affects the urethra or the cancer spreads to other tissue, often the bones [16].
The high prevalence of this disease underlines the importance of further research into prognosis and diagnosis methods, in which the identification of further biomarkers in PCa is a major topic of scientific analysis. For this task, high-throughput RNA sequencing of the transcriptome (the RNA molecules of an organism or specific cell type) is frequently exploited [66]. RNA sequencing, or RNA-Seq for short, offers the possibility of transcriptome assessment, enabling the identification of transcriptional aberrations in diseases as well as of uncharacterized RNA species such as non-coding RNAs (ncRNAs), which remain undetected by conventional methods [49]. To facilitate interpretation of the sequenced reads, they are assembled to reconstruct the transcriptome as close to the original state as possible, thus enabling rapid detection of relevant biomolecules in the data [49]. Transcriptomic studies often require highly accurate and complete gene annotations on the reference genome of the examined organism. However, most gene annotations and reference genomes are far from complete, lacking a multitude of unidentified protein-coding and non-coding genes and transcripts. Therefore, refinement of reference genomes and annotations by inclusion of novel sequences discovered in high-quality transcriptome assemblies is necessary [24].
Our current research aims to establish a complete ribonucleic acid (RNA) production line from plasmid design to purification of in vitro transcribed RNA and labeling of RNA. RNA is the central molecule within the central dogma of molecular biology and is involved in most essential processes within a cell[1]. In many cases, only compact three-dimensional structures of the respective RNA are able to fulfill their function. In this context, RNA tertiary contacts such as kissing loops and pseudoknots are essential to stabilize three-dimensional folding[2]. We will produce a tertiary contact consisting of a kissing loop and a GAAA tetraloop that occurs in eukaryotic ribosomal RNA[3,4]. The RNA sequence is integrated into a vector plasmid. Subsequently, the plasmid is amplified in E. coli. After subsequent plasmid purification steps, the RNA sequence will be transcribed in vitro[5,6]. In order for the RNA to be used for Förster resonance energy transfer (FRET) experiments at the single-molecule level, fluorescent dyes must be coupled to the RNA molecule[7].
RNA tertiary contact interactions between RNA tetraloops and their receptors stabilize the folding of ribosomal RNA and support the maturation of the ribosome. Here we use FRET assisted structure prediction to develop structural models of two ribosomal tertiary contacts, one consisting of a kissing loop and a GAAA tetraloop and one consisting of the tetraloop receptor (TLR) and a GAAA tetraloop. We build bound and unbound states of the ribosomal contacts de novo, label the RNA in silico and compute FRET histograms based on MD simulations and accessible contact volume (ACV) calculations. The predicted mean FRET efficiency from molecular dynamics (MD) simulations and ACV determination show agreement for the KL-TLGAAA construct. The KL construct revealed too high FRET efficiency and artificial dye behavior, which requires further investigation of the model. In the case of the TLR, the importance of the correct dye and construct parameters in the modeling was shown, which also leads to a renewed modeling. This hybrid approach of experiment and simulation will promote the elucidation of dynamic RNA tertiary contacts and accelerate the discovery of novel RNA interactions as potential future drug targets.
Computationally solving eigenvalue problems is a central problem in numerical analysis and as such has been the subject of extensive study. In this thesis we present four different methods to compute eigenvalues, each with its own characteristics, strengths and weaknesses. After formally introducing the methods we use them in various numerical experiments to test speed of convergence, stability as well as performance when used to compute eigenfaces, denoise images and compute the eigenvector centrality measure of a graph.
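As an illustration of one of the applications mentioned above, the following minimal sketch estimates the dominant eigenpair of a matrix by power iteration and uses it to compute eigenvector centrality. The abstract does not name its four methods, so the choice of power iteration, the function name `power_iteration` and the example graph are assumptions made for this sketch.

```python
import math


def power_iteration(matrix, iterations=200, tol=1e-12):
    """Estimate the dominant eigenvalue and eigenvector of a square matrix.

    Assumes a unique eigenvalue of largest magnitude; the eigenvalue estimate
    is the Rayleigh quotient, which is most accurate for symmetric matrices.
    """
    n = len(matrix)
    v = [1.0 / math.sqrt(n)] * n
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(row[j] * v[j] for j in range(n)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # the start vector lies in the null space
        w = [x / norm for x in w]
        aw = [sum(row[j] * w[j] for j in range(n)) for row in matrix]
        estimate = sum(w[i] * aw[i] for i in range(n))
        converged = abs(estimate - eigenvalue) < tol
        v, eigenvalue = w, estimate
        if converged:
            break
    return eigenvalue, v


# Hypothetical graph: a triangle (nodes 0, 1, 2) with node 3 attached to node 0.
adjacency = [
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]
dominant_value, centrality = power_iteration(adjacency)
```

The entries of `centrality` rank the nodes: node 0, which has the most neighbours, receives the highest score, and the pendant node 3 the lowest. Power iteration fails to converge when two eigenvalues share the largest magnitude, as for bipartite graphs, which is one of the weaknesses such a comparison of methods can expose.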
In machine learning, Learning Vector Quantization (LVQ) is well known as a supervised learning method. LVQ has been studied to generate optimal reference vectors because of its simple and fast learning algorithm [12]. In many classification tasks, different variants of LVQ are considered while training a model. In this thesis, two variants of LVQ, Generalized Matrix Learning Vector Quantization (GMLVQ) and Generalized Tangent Learning Vector Quantization (GTLVQ), are discussed. Subsequently, a transfer learning technique for different variants of LVQ is implemented and visualized, and the results are compared using different datasets.
This master thesis studies customer behavior using the example of the skin care brand Nivea. It first presents the theoretical basis for the subsequent research on marketing, customer behavior and the proper conduct of marketing research. This is followed by an analysis of the German market. Since Nivea is a brand of the Beiersdorf company, Beiersdorf's activities and operations are described. The main idea of the thesis is to analyze the behavior of Nivea's customers. Therefore, the work contains extensive research on the brand along with its micro- and macroenvironment. An in-depth interview and a survey were also conducted to understand customers' current needs. Based on all the results, the author proposes some ideas for the Nivea brand.
Pollinating insects are of vital importance for the ecosystem, and their drastic decline has severe consequences for the environment and humankind. Understanding their interaction networks is the first step towards preserving these highly complex systems. For that purpose, the following study describes a protocol for the investigation of honey bee pollen samples from different agro-environmental areas by DNA extraction, PCR amplification and nanopore sequencing of the barcode regions rbcL and ITS. It was shown that the most abundant species were classified consistently by both DNA barcodes, while species richness was enhanced by single-barcode detection of less abundant species. The analysis of the different landscape variables showed a decline in species richness, Shannon diversity index, and species evenness with increasing organic crop area. However, sampling was only carried out in August, and further investigations are suggested to give a more complete picture of honey bee foraging throughout the seasons.
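The three diversity measures named above can be computed from per-species read counts as sketched below. This is an illustrative sketch, not the study's analysis code: the counts are made up, the function names are hypothetical, and evenness is assumed to mean Pielou's evenness (Shannon index divided by the natural logarithm of the richness), which the abstract does not specify.

```python
import math


def species_richness(counts):
    """Number of species with at least one observation."""
    return sum(1 for c in counts if c > 0)


def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over observed species."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one observation")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def pielou_evenness(counts):
    """Pielou's J = H' / ln(S); defined here as 0.0 when fewer than two species occur."""
    richness = species_richness(counts)
    if richness < 2:
        return 0.0
    return shannon_index(counts) / math.log(richness)


# Hypothetical read counts per plant species in one pollen sample.
sample_counts = [120, 45, 30, 5]
print(species_richness(sample_counts))
print(shannon_index(sample_counts))
print(pielou_evenness(sample_counts))
```

A perfectly even sample reaches an evenness of 1, while a sample dominated by one species moves towards 0.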
This thesis deals with current opportunities in business development. Its purpose is to study and analyze an organization's development strategy. The subject of the study is the mechanism by which an organization's development strategy is formed, together with an understanding of business development and its core methodologies and branches. The thesis is based on the operations of a real engineering company, and the main part of the research could be applied in practice. The main goal of the thesis is to derive recommendations for implementing strategic changes to the organization's development strategy.
Noise in the oceans is constantly increasing. Growing industrialisation, through shipping, offshore wind parks, seismic studies and other anthropogenic noise sources, is putting the ecosystem under immense stress. The focus of this thesis is on the assessment of continuous underwater noise from ships. Based on existing strategies in air and underwater, and a comparison of both, an alternative strategy for the assessment of continuous noise from ships is given. The concept developed is based on published, scientifically observed responses of animals to ship passes with an indication of an effect range. A model is created to describe the strategy using publicly available data for cargo ships as an example. The results are summarized in maps depicting the affected area for an MRU of the OSPAR II region and the MPA “Borkum Riffgrund”. The strategy is discussed and evaluated on the basis of these results. From this, further improvements and the need for additional information in publicly available data on vessel traffic are derived.
Applications and Potential Impacts of Blockchain Technology in Logistics and Supply Chain Areas
(2022)
The aim of the present thesis is to analyze the applications and potential impacts of blockchain technology in the logistics and supply chain areas. For this purpose, literature from different sources has been used to gain an overview of the current status and role of blockchain technology within these areas. Different use cases, as well as pilot projects from organizations all over the world, including Germany, have been included. Suggestions for further applications and implementations of blockchain technology, along with their potential impacts, have been made. Additionally, the cost of implementing blockchain-based solutions and applications has been estimated, and recommendations are given on the key points to consider before deciding to implement blockchain-based solutions in an organization.
Influenza A viruses are responsible for outbreaks of epidemics as well as pandemics worldwide. The surface protein neuraminidase of this virus is responsible, among other things, for the release of virions from the cell and is thus of interest in pharmacological research. The aim of this work is to gain knowledge about evolutionary changes in sequences of influenza A neuraminidase using different methods. First, EVcouplings is used with the goal of identifying evolutionary couplings within the protein sequences, but this analysis was unsuccessful, probably because of the great sequence length of neuraminidase. Second, the natural vector method is used for sequence embedding, in the hope of visualizing the sequential progression of the virus protein over time. Last, interpretable machine learning methods are applied to examine whether the data can be classified by year and whether the extracted information conforms to the results of the EVcouplings analysis. In addition to the class label year, other labels such as groups or subtypes are used in classification, with varying results. For balanced classes the machine learning models performed adequately, but this was not the case for imbalanced data. Groups and subtypes can be classified with high accuracy, which was not the case for years, continents or hosts. To identify the minimal number of features necessary for linear separation of neuraminidase group 1 subtypes, a logistic regression was finally performed, resulting in the identification of 15 combinations of nine amino acid frequencies. Since neither the sequence embedding nor the machine learning methods showed neuraminidase evolution over time, further research is necessary, for example with a focus on one subtype with balanced data.
The amount of digital data is rising day by day, and so is the need for intelligent, automated data processing in daily life. In machine learning, a secure and accurate way to classify data is therefore important, particularly in fields such as medical data analysis. To avoid severe consequences, the reliability of the classification matters as much as its accuracy. If a classification is not reliable, it is better to reject the data point than to accept a wrong label. This can be done with rejection strategies that are either applied on top of a trained model or included directly in the objective function of the model being trained. We discuss such strategies and analyze the results on data sets in this thesis.
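One simple way to apply a reject option on top of an already trained model is to threshold the highest class probability. The sketch below illustrates that idea only; the abstract does not say which strategies the thesis uses, and the function names, the threshold of 0.8 and the example probabilities are hypothetical.

```python
def classify_with_reject(class_probabilities, threshold=0.8):
    """Return the index of the most probable class, or None to reject.

    A prediction is rejected when the highest class probability falls
    below `threshold` (an assumed, user-chosen confidence level).
    """
    if not class_probabilities:
        raise ValueError("class_probabilities must not be empty")
    best = max(range(len(class_probabilities)), key=class_probabilities.__getitem__)
    if class_probabilities[best] < threshold:
        return None
    return best


def rejection_rate(decisions):
    """Fraction of data points that were rejected."""
    return sum(1 for d in decisions if d is None) / len(decisions)


# Hypothetical model outputs for three data points and two classes.
outputs = [[0.95, 0.05], [0.55, 0.45], [0.10, 0.90]]
decisions = [classify_with_reject(p) for p in outputs]
print(decisions)
print(rejection_rate(decisions))
```

Raising the threshold rejects more points and typically raises the accuracy on the accepted ones, so the threshold expresses the trade-off between coverage and reliability.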
In the past few years, generative models have become an interesting topic in the field of Machine Learning (ML). The Variational Autoencoder (VAE) is one of the popular frameworks of generative models, based on the work of D. P. Kingma and M. Welling [6] [7]. As an alternative to the VAE, the authors in [12] proposed and implemented an Information Theoretic Learning (ITL) based Autoencoder. VAE and ITL Autoencoder are a combination of neural networks and probabilistic graphical models (PGM) [7]. In modern statistics it is difficult to compute approximations of probability densities. In this work we make use of the Variational Inference (VI) technique from machine learning, which approximates distributions through optimization. The closeness between distributions is measured by information theoretic divergence measures such as the Kullback-Leibler, Euclidean and Cauchy-Schwarz divergences. In this thesis, we study theoretical and experimental results of two different frameworks of generative models which generate images of MNIST handwritten characters [8] and the Yale face database B [3]. The results obtained show that the proposed VAE and ITL Autoencoder are capable of generating the underlying structure of the example datasets.
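To make the three divergence measures concrete, the sketch below evaluates them for two discrete probability distributions. This is a simplification under stated assumptions: the autoencoders in the thesis work with continuous densities, and the function names and example distributions here are hypothetical.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence sum(p_i * ln(p_i / q_i)); inf if q_i = 0 < p_i."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return math.inf
            total += pi * math.log(pi / qi)
    return total


def euclidean_divergence(p, q):
    """Euclidean divergence: sum of squared differences between p and q."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))


def cauchy_schwarz_divergence(p, q):
    """Cauchy-Schwarz divergence -ln(<p, q> / (||p|| * ||q||)); inf if <p, q> = 0."""
    inner = sum(pi * qi for pi, qi in zip(p, q))
    if inner == 0:
        return math.inf
    norm_p = math.sqrt(sum(pi * pi for pi in p))
    norm_q = math.sqrt(sum(qi * qi for qi in q))
    return -math.log(inner / (norm_p * norm_q))


# Hypothetical discrete distributions over three outcomes.
p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]
print(kl_divergence(p, q))
print(euclidean_divergence(p, q))
print(cauchy_schwarz_divergence(p, q))
```

All three measures are zero when the two distributions coincide. The Euclidean and Cauchy-Schwarz divergences are symmetric in their arguments, whereas the Kullback-Leibler divergence in general is not.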
Embeddings for Product Data
(2022)
The e-commerce industry has grown exponentially in the last decade, with giants like Amazon, eBay, Aliexpress, and Walmart selling billions of products. Machine learning techniques can be used within the e-commerce domain to improve the overall customer journey on a platform and increase sales. Product data, in particular, can be used for various applications, such as product similarity, clustering, recommendation, and price estimation. For data from these products to be used for such applications, we have to perform feature engineering. The idea is to transform these products into feature vectors before training a machine learning model on them. In this thesis, we propose an approach to create representations for heterogeneous product data from Unite’s platform in the form of structured tabular records. These tables consist of attributes carrying different information, ranging from product IDs to long descriptions. Our model combines popular deep learning approaches used in natural language processing to create numerical representations for all products; because the resulting arrays or matrices contain mostly non-zero elements, they are called dense representations. To evaluate the quality of these feature vectors, we validate how well the similarities between products are captured by these dense representations. The evaluations are divided into two categories. The first category directly compares the similarities between individual products. The second category uses these dense vectors as inputs to any of the above-mentioned applications and evaluates the quality of the dense representation vectors based on the accuracy or performance of that application. As a result, we explain the impact of different steps within our model on the quality of these learned representations.
Differentiation is ubiquitous in mathematics and especially in machine learning, where gradient-based models depend on it. Calculating gradients can be complex and may require handling multiple variables. Supervised Learning Vector Quantization models, which are used for classification tasks, also use the Stochastic Gradient Descent method for optimizing their cost functions. There are various methods to calculate these gradients or derivatives, namely Manual Differentiation, Numeric Differentiation, Symbolic Differentiation, and Automatic Differentiation. In this thesis, we evaluate each of these methods for calculating derivatives and compare their use for the variants of Generalized Learning Vector Quantization algorithms.
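Three of the four approaches can be contrasted on a toy function, as in the sketch below: a hand-derived formula (manual), a central finite difference (numeric), and forward-mode automatic differentiation with dual numbers. This is only an illustration under stated assumptions; the function f, the step size and all names are hypothetical, symbolic differentiation is left out, and nothing here reflects the thesis implementation.

```python
class Dual:
    """Dual number value + deriv * eps used for forward-mode automatic differentiation."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    @staticmethod
    def _lift(other):
        # Treat plain numbers as constants with zero derivative.
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        other = Dual._lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = Dual._lift(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__


def f(x):
    """Toy function x^3 + 2x; works for floats and Dual numbers alike."""
    return x * x * x + 2 * x


def manual_derivative(x):
    """Derivative of f worked out by hand: 3x^2 + 2."""
    return 3 * x * x + 2


def numeric_derivative(func, x, step=1e-5):
    """Central finite difference; approximate, with error depending on `step`."""
    return (func(x + step) - func(x - step)) / (2 * step)


def automatic_derivative(func, x):
    """Seed the derivative part with 1.0 and read it off the result."""
    return func(Dual(x, 1.0)).deriv


print(manual_derivative(2.0))
print(numeric_derivative(f, 2.0))
print(automatic_derivative(f, 2.0))
```

The manual and automatic results agree up to floating-point rounding, while the numeric result carries a small truncation and rounding error that depends on the chosen step size.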