Protein motif and composition analysis

Conservation of tyrosine phosphorylation sites

To evaluate human protein kinases in a high-throughput fashion, we used yeast, which lacks a system for protein tyrosine phosphorylation, as an in vivo model system [1]. Expressing individual non-receptor tyrosine kinases (NRTKs) in yeast leads to tyrosine phosphorylation of yeast proteins, which reproduces NRTK activity in human cells on the corresponding conserved orthologs. We used conservation analysis of putative tyrosine phosphorylation sites (contrasting orthologs in organisms with NRTKs versus those without) to evaluate and select candidate sites. Network analysis shows that individual NRTKs phosphorylate proteins that interact with each other, suggesting that recognition motifs at phosphorylation sites play a lesser role than previously thought. We predict relations between NRTKs and more than 3500 human target proteins.
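The contrast between orthologs from organisms with and without NRTKs can be illustrated with a minimal Python sketch. It is not the scoring used in [1]. It assumes the orthologs are already aligned to equal length, and the function name and the simple "fraction of sequences with Tyr" measure are hypothetical choices made for illustration.

```python
def tyrosine_conservation(column, aligned_with_nrtk, aligned_without_nrtk):
    """Fraction of aligned orthologs carrying Tyr at one alignment column.

    Hypothetical helper: returns a pair (fraction in organisms with NRTKs,
    fraction in organisms without NRTKs). A group with no sequences
    yields None instead of a fraction.
    """
    def fraction_tyr(sequences):
        if not sequences:
            return None
        hits = sum(1 for seq in sequences if seq[column].upper() == "Y")
        return hits / len(sequences)

    return fraction_tyr(aligned_with_nrtk), fraction_tyr(aligned_without_nrtk)


# Toy alignment columns (invented data, for illustration only).
with_nrtk = ["AYK", "AYR", "AYK", "AFK"]
without_nrtk = ["AFK", "AYK"]
conserved_with, conserved_without = tyrosine_conservation(1, with_nrtk, without_nrtk)
```

Under this toy measure, a site conserved mostly in organisms with NRTKs shows a higher first value than second value.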

Comparison of amino acid composition between protein domains and linkers

While it is well known that amino acid composition in proteins varies across taxa, how these differences apply to protein domains and linkers has been less studied. We examined this using 38 proteomes [2]. As expected, we observed preferential usage of polar residues in linkers and of hydrophobic residues in globular domains. Focusing on particular types of domains can be more insightful. For example, while Arg usage in DNA-binding domains is high, their surrounding linkers are enriched in Ser, which is often a target of phosphorylation in disordered regions. We created an R script to facilitate and visualize these analyses (RACCOON).
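The kind of comparison described above can be sketched in a few lines of Python. This is an illustration only and not the RACCOON R script: the function names are hypothetical, domain coordinates are assumed to be 0-based, half-open and non-overlapping, and every residue outside an annotated domain is treated as linker, which may differ from the definition used in [2].

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def split_domains_linkers(sequence, domains):
    """Split a sequence into domain and linker segments.

    domains: list of (start, end) pairs, 0-based, half-open, non-overlapping.
    Everything outside a domain is treated as linker (an assumption).
    """
    domain_segments, linker_segments = [], []
    position = 0
    for start, end in sorted(domains):
        if start > position:
            linker_segments.append(sequence[position:start])
        domain_segments.append(sequence[start:end])
        position = max(position, end)
    if position < len(sequence):
        linker_segments.append(sequence[position:])
    return domain_segments, linker_segments


def composition(segments):
    """Relative frequency of each standard amino acid over all segments."""
    counts = Counter()
    for segment in segments:
        counts.update(c for c in segment.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return {aa: 0.0 for aa in AMINO_ACIDS}
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


# Toy sequence with one annotated domain (invented data).
domain_parts, linker_parts = split_domains_linkers("SSSSRKRKRKSSGG", [(4, 10)])
domain_freq = composition(domain_parts)
linker_freq = composition(linker_parts)
```

In this toy case the domain is rich in Arg and the flanking linkers are rich in Ser; on real data the two frequency tables would be computed per proteome and per domain type before being compared.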

Properties of low complexity regions

We have reviewed the definition of low complexity regions (LCRs) in sequences [3]. We focused particularly on the methods to measure LCRs, and on the relation of LCRs to compositional bias, repeats, disorder and structure. For example, while compositional bias tends to be associated with LCRs and disorder, short repeats, which are compositionally biased, can induce structure. We use a series of examples to illustrate these overlapping aspects. In this respect, we developed a method and an associated web tool to visualize the "repeatability" of a protein sequence (RES, [4]). We defined this for a window as the fraction of residues that do not need to be changed for the sequence to be composed of perfect repeats. Application of the method to complete proteomes suggests intriguing differences between species regarding the repeatability of their sequences, e.g. a depletion in repeats of odd lengths in Saccharomyces cerevisiae and a few other species, and a large number of repeats of length 2 and 7 in Danio rerio and Arabidopsis thaliana, respectively.
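One way to read the repeatability definition above is sketched below in Python. This is an interpretation, not the RES implementation: it assumes a fixed repeat length (here called period), groups the window positions by their offset within the repeat unit, and keeps the most frequent residue at each offset. The function names and parameters are hypothetical.

```python
from collections import Counter


def repeatability(window, period):
    """Fraction of residues in a window that can stay unchanged when the
    window is turned into perfect repeats of the given period.

    For each offset within the repeat unit, the most frequent residue is
    kept and all other residues at that offset would need to be changed.
    """
    if not window or period < 1 or period > len(window):
        raise ValueError("need a non-empty window and 1 <= period <= len(window)")
    unchanged = 0
    for offset in range(period):
        column = window[offset::period]  # residues sharing this offset
        unchanged += Counter(column).most_common(1)[0][1]
    return unchanged / len(window)


def repeatability_profile(sequence, period, window_size):
    """Repeatability of every window of window_size along the sequence."""
    return [
        repeatability(sequence[i:i + window_size], period)
        for i in range(len(sequence) - window_size + 1)
    ]


perfect = repeatability("QAQAQAQA", 2)     # perfect repeat of length 2
one_change = repeatability("QAQAQTQA", 2)  # one residue breaks the repeat
profile = repeatability_profile("QAQAQA", 2, 4)
```

Note that under this reading a period close to the window length gives trivially high values, so the period should be small relative to the window for the score to be informative.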

We contributed to the creation of the first meta web-server for the analysis of low complexity regions in protein sequences (PlaToLoCo, [5]).



[1] Corwin, T., J. Woodsmith, F. Apelt, J.F. Fontaine, D. Meierhofer, J. Helmuth, A. Grossmann, M.A. Andrade-Navarro, B.A. Ballif and U. Stelzl. 2017. Defining human tyrosine kinase phosphorylation networks using yeast as an in vivo model substrate. Cell Systems. 5, 128-139.

[2] Brüne, D., M.A. Andrade-Navarro and P. Mier. 2018. Proteome-wide comparison between the amino acid composition of protein domains and linkers. BMC Research Notes. 11, 117.

[3] Mier, P., L. Paladin, S. Tamana, S. Petrosian, B. Hajdu-Soltész, A. Urbanek, A. Gruca, D. Plewczynski, M. Grynberg, P. Bernadó, Z. Gáspári, C. Ouzounis, V.J. Promponas, A.V. Kajava, J.M. Hancock, S.C.E. Tosatto, Z. Dosztányi and M.A. Andrade-Navarro. 2020. Disentangling the complexity of low complexity proteins. Briefings in Bioinformatics. 21, 458-472.

[4] Kamel, M., P. Mier, A. Tari and M.A. Andrade-Navarro. 2019. Repeatability in protein sequences. Journal of Structural Biology. 208, 86-91. [RES]

[5] Jarnot, P., J. Ziemska-Legiecka, L. Dobson, M. Merski, P. Mier, M.A. Andrade-Navarro, J.M. Hancock, Z. Dosztányi, L. Paladin, M. Necci, D. Piovesan, S.C.E. Tosatto, V.J. Promponas, M. Grynberg and A. Gruca. 2020. PlaToLoCo: the first web meta-server for visualization and annotation of low complexity regions in proteins. Nucleic Acids Research. In press. [PlaToLoCo]