Job offer | Computational Biology and Data Mining

We have an opening for a PhD student.

PhD project proposal

The evolution of protein complexity

Proteins emerge and change in evolution in many ways [1]. One of the driving forces for the evolution of proteins is the increase in organismic complexity, for example due to the emergence of compartments [2] or multicellularity. The protein repertoire evolves in response to such evolutionary challenges.

The availability of many complete proteomes now gives us an unprecedented chance to use the protein database to infer mechanisms of evolution by analysing proteins arranged in families [3-4]. A question that we want to address in relation to protein evolution is whether and how proteins gain complexity as organisms gain complexity.

Finding answers to this question can give insights into the function and evolution of globular protein domains, but also into other parts of protein sequences that are non-globular and more difficult to investigate. We have already found that linkers between domains [5] and homorepeats [6-8] evolve to provide protein interactions and sites of post-translational modification. In this project, we will try to quantify in further detail which functional requirements lead to which types of increase in protein complexity.

The project could expand to include and combine other levels of complexity, such as gene duplication, alternative splicing, enhancers, and protein-protein interactions.

Requirements:

Programming skills are necessary; experience with bioinformatics and biological knowledge will be valued.

References:

[1] Chothia C, Gough J, Vogel C, Teichmann SA. Evolution of the protein repertoire. Science. 2003 Jun 13;300(5626):1701-3.

[2] Mier P, Pérez-Pulido AJ, Reynaud EG, Andrade-Navarro MA. Reading the Evolution of Compartmentalization in the Ribosome Assembly Toolbox: The YRG Protein Family. PLoS One. 2017 Jan 10;12(1):e0169750. doi: 10.1371/journal.pone.0169750. eCollection 2017.

[3] Mier P, Andrade-Navarro MA. Toward completion of the Earth's proteome: an update a decade later. Brief Bioinform. 2017 Oct 12. doi: 10.1093/bib/bbx127. [Epub ahead of print]

[4] Mier P, Andrade-Navarro MA. FastaHerder2: Four Ways to Research Protein Function and Evolution with Clustering and Clustered Databases. J Comput Biol. 2016 Apr;23(4):270-8. doi: 10.1089/cmb.2015.0191. Epub 2016 Feb 1.

[5] Brüne D, Andrade-Navarro MA, Mier P. Proteome-wide comparison between the amino acid composition of domains and linkers. BMC Res Notes. 2018 Feb 9;11(1):117. doi: 10.1186/s13104-018-3221-0.

[6] Mier P, Alanis-Lobato G, Andrade-Navarro MA. Context characterization of amino acid homorepeats using evolution, position, and order. Proteins. 2017 Apr;85(4):709-719. doi: 10.1002/prot.25250. Epub 2017 Feb 6.

[7] Mier P, Andrade-Navarro MA. dAPE: a web server to detect homorepeats and follow their evolution. Bioinformatics. 2017 Apr 15;33(8):1221-1223. doi: 10.1093/bioinformatics/btw790.

[8] Mier P, Andrade-Navarro MA. Glutamine Codon Usage and polyQ Evolution in Primates Depend on the Q Stretch Length. Genome Biol Evol. 2018 Mar 1;10(3):816-825. doi: 10.1093/gbe/evy046.

Start date: June 1st 2018

Contact: andrade@uni-mainz.de

https://cbdm.uni-mainz.de/