@phdthesis{Marquardt2015, type = {Master Thesis}, author = {Andre Marquardt}, title = {Identification and characterization of mammalian signatures of viral adaptation : a computational approach}, url = {https://nbn-resolving.org/urn:nbn:de:bsz:mit1-opus4-67359}, year = {2015}, abstract = {Influenza viruses are single-stranded RNA (ssRNA) viruses, which are divided into three distinct genera: A, B and C. Their genome is divided into eight segments. Whilst influenza types B and C evolve slowly, type A viruses evolve very fast, cause mild to severe infections, and are a constant threat to humans. Besides the usual mutations that alter the genetic information, the genome's content can be altered randomly by reassortment events. In this case, a host cell has to be co-infected by two (or more) influenza viruses, from which a new virus emerges that contains segments of both (or all co-infecting) ancestors. Besides this special case of reassortment, the influenza A virus already has a very high mutation rate. The reason for the fast genetic alteration, and the resulting evasion of the host immune system, is a polymerase that lacks proofreading. Genetic alterations in one of the two major surface glycoproteins, haemagglutinin (HA), in particular can have a massive influence on the ability of the virus to infect people. This protein shows preferred amino acids that are under extreme selective pressure. Additionally, HA is of substantial importance for infecting host cells. Genetic alterations in this protein are one reason influenza A viruses are constantly able to evade the host immune system, because the protein is targeted by antibodies. The host immune system recognizes exogenous material specifically, and this recognition is very specific to certain surface amino acids or their properties. 
Even small changes at sites known to be important for evading host responses can allow the virus to escape, because antibody binding, and therefore inactivation by antibodies, is affected. The high mutation rate of the influenza A virus, especially in the HA protein, creates the need for almost annual vaccinations. Changes in these preferred amino acids are involved in adaptation to an increasingly immune population and are of major interest, because they enable the virus to reinfect the population. We want to establish a fully automated framework for downloading data and determining protein sites under selection, driven by user-friendly, user-specific input. Combining existing tools into a user-friendly pipeline will make it easier for biologists to find sites under selection. In this work we introduce such a pipeline, called IPoSuS (Identify Patches of Sites under Selection), and use it to analyze the influenza A virus protein HA. Furthermore, IPoSuS can be applied to any dataset and protein for which sequences and the corresponding background data are available. Using existing evaluation datasets, we additionally tested new statistical approaches for finding sites under positive selection. These make it possible to use not only the gold standard but also w-values, which include more information than counts of synonymous and non-synonymous mutations alone, to make the results more convincing and better founded. Applying IPoSuS to the HA protein of different influenza A virus subtypes yields some new findings regarding host and subtype specificities. The obtained results favor one of the five approaches tested, namely the already established AdaPatch approach using a newly introduced counting scheme. However, the results also confirm that the approaches using w-ratios and w-values and the newly introduced statistical test are usable. 
The only downside of these new approaches is the smaller number of results compared to the established and favored one.}, language = {de} }