Protein interaction networks | Computational Biology and Data Mining

Two main topics in the study of protein-protein interaction (PPI) networks are their annotation and their involvement in disease: these topics are inter-related since better annotations allow better understanding of how the disruption of the PPI network results in disease [1].

Methods and databases

The generation of high-throughput data for gene expression, PPIs, chromatin immunoprecipitation, etc. often produces large lists of results for which alternative experimental scoring schemes can be proposed, but which can be evaluated against only a few positives, for example, already known PPIs. To facilitate the evaluation of alternative scoring schemes in such a situation we implemented QiSampler, a web tool that takes as input a list of items (e.g. genes or PPIs), multiple scores, and a definition of positives [2]. The method uses a repetitive sampling strategy and outputs ROC graphs to make it easy to select the most discriminant scoring scheme and a cut-off; it is available both as a web tool and as an R script.
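The repetitive sampling idea can be illustrated with a small Python sketch. It is not the QiSampler implementation: the function names, the balanced sampling of positives against non-positives, and the use of the area under the ROC curve (AUC) as a summary are assumptions made for illustration only.

```python
import random

def auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

def sampled_auc(scores, positives, n_rounds=100, sample_size=20, seed=0):
    """Mean AUC of one scoring scheme over repeated balanced samples.

    scores:    dict mapping item -> score under the scheme being evaluated
    positives: set of items accepted as true positives (the gold standard)
    """
    rng = random.Random(seed)
    pos = [item for item in scores if item in positives]
    neg = [item for item in scores if item not in positives]
    k = min(sample_size, len(pos), len(neg))
    total = 0.0
    for _ in range(n_rounds):
        pos_sample = rng.sample(pos, k)
        neg_sample = rng.sample(neg, k)
        total += auc([scores[i] for i in pos_sample],
                     [scores[i] for i in neg_sample])
    return total / n_rounds
```

Calling `sampled_auc` once per scoring scheme on the same list of items and the same positives gives one number per scheme, so that the scheme with the highest mean AUC can be preferred.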

We have generated a PPI network focused on cellular signalling using yeast two-hybrid (Y2H) to find the protein partners of 450 signalling-related proteins [3]. From these data we built a directed interaction network connecting receptors to transcription factors. We used this network to study the dynamics of protein phosphorylation upon EGF/Erk activation, and to predict modulators of EGF/Erk signalling.

PPI databases are resources that can be used for the study of cellular pathways and networks. However, PPI data is experimentally obtained by methods of variable reliability. To allow the study and filtering of the human PPI network, we created HIPPIE, a database that integrates PPI data from several databases and datasets and uses a score that accounts for the amount and quality of experimental information attached to each PPI [4].

In a previous study [5] we compared HIPPIE and five other databases of human PPI data (HPRD, INTACT, MINT, BIOGRID, and STRING) with emphasis on the coverage and network topological properties of their datasets. The addition of tissue information to the PPI data, currently not considered in these databases, improved the biological relevance of the data. We pointed to proteins that are subject to intense study but for which high-confidence PPI data hardly exist. In a separate study we noted that these different frequencies of study generate biases in the PPI network that have to be accounted for; for example, they make proteins involved in cancer appear to have more interacting partners [6].

We show that measured interactions where the interacting proteins are expressed in the same tissue, or cellular location, and involved in similar processes are more reliable. We also show that interactions connecting paths between particular types of proteins (i.e. receptors and transcription factors) reflect biological pathways. These properties are useful to focus on relevant PPI subnetworks; we implemented mechanisms to do this in HIPPIE and illustrated how to apply them to biological cases (human proteins interacting with influenza virus and phosphorylation-dependent pathways of aggregation in Alzheimer's) [7]. The HIPPIE scores were also used for the development of the SynSys database, which specializes in the interactions, structures, drugs and pathways of synaptic proteins [8].
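As a rough illustration of how score and tissue context can be combined to focus on a subnetwork, the following Python sketch keeps only interactions above a score cut-off whose two proteins share a tissue. The function name, the data layout and the default cut-off of 0.7 are hypothetical and do not describe the HIPPIE interface.

```python
def filter_interactions(interactions, tissues, min_score=0.7, tissue=None):
    """Keep scored interactions whose two proteins are expressed together.

    interactions: iterable of (protein_a, protein_b, score) tuples
    tissues:      dict mapping protein -> set of tissues where it is expressed
    min_score:    hypothetical confidence cut-off
    tissue:       if given, both proteins must be expressed in this tissue;
                  otherwise any shared tissue is enough
    """
    kept = []
    for a, b, score in interactions:
        if score < min_score:
            continue
        shared = tissues.get(a, set()) & tissues.get(b, set())
        if tissue is None:
            co_expressed = bool(shared)
        else:
            co_expressed = tissue in shared
        if co_expressed:
            kept.append((a, b, score))
    return kept
```

With placeholder identifiers, `filter_interactions([("P1", "P2", 0.9)], {"P1": {"brain"}, "P2": {"brain", "liver"}}, tissue="brain")` keeps the single interaction, whereas asking for `tissue="liver"` discards it.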

We have developed a new method to evaluate the results of Y2H screens with cDNA microarrays, which provides more quantitative results and better control for autoactivation than classic Y2H [9]. Basically, the Y2H procedure uses yeast strains that carry a bait protein, which is tested against a library of prey proteins. Bait and prey are introduced as DNA plasmids. Protein interaction between bait and prey results in the activation of a HIS3 reporter gene, which allows the growth of the strain and the detection of the interaction. Instead of using strain growth as readout, our method detects the PCR-amplified plasmid DNA used for the expression of the prey protein by hybridization to a cDNA microarray. Application of this procedure to samples where the bait is lacking gives a readout of nonspecific autoactivation, which is used to reduce Y2H false positives. We tested the system with Huntingtin and with wild-type and mutant forms of ataxin-1, found novel interactors of both proteins, and were able to describe how the mutation in ataxin-1 modifies its network of interactions.

We developed a method to predict the activating or inhibiting effect of PPIs in a high-throughput fashion by comparing the phenotypes produced in a siRNA knockdown screen [10]. Basically, similarity of phenotypes upon knockdown of interacting partners is taken as evidence of an activating interaction, whereas different phenotypic effects are taken as evidence of an inhibiting effect. We applied this approach to cellular images from a genome-wide siRNA screen in HeLa cells to predict effects for 1,954 PPIs.
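The reasoning can be sketched in Python as a comparison of two phenotype vectors, one per knocked-down partner. The use of Pearson correlation, the threshold of 0.5 and the reading of "different" phenotypes as anti-correlated ones are assumptions of this sketch, not details of the published method.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant vector carries no information on similarity
    return cov / (sd_x * sd_y)

def predict_effect(phenotype_a, phenotype_b, threshold=0.5):
    """Classify a PPI from the knockdown phenotypes of its two partners."""
    r = pearson(phenotype_a, phenotype_b)
    if r >= threshold:
        return "activating"    # similar phenotypes
    if r <= -threshold:
        return "inhibiting"    # opposite phenotypes (assumed reading)
    return "undetermined"
```

Each phenotype vector would hold image-derived features measured after knockdown of one partner; pairs with weak correlation in either direction are left unclassified rather than forced into a class.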

The algorithm explained above, together with others for the evaluation of PPI networks, was included in HIPPIE v2.0 [11], which contains more than 270,000 scored interactions between human proteins. Given the importance of mouse as a model for human disease, we chose it as the second species to which to apply the structure developed for the HIPPIE database (MIPPIE, [12]).

The topologies of many networks, including the protein interaction network, can be modeled using hyperbolic geometry, a space whose mathematical properties naturally lead to the emergence of network scale invariance and strong clustering. Such networks can be generated by adding new nodes and connecting them according to two variables: node popularity and node similarity. To create such a network in hyperbolic space, nodes can appear sequentially at random positions in a disk with a radius that grows with the number of nodes already generated and then they can be connected if they are close (modeling similarity). Older nodes in this model are closer to the center of the disk and to the rest of the nodes in the network, thus attracting more connections (modeling popularity). We calculated the analytical expression of the probability distribution of distances between nodes in this model of network formation [13]. This expression can be used to guide the generation of scale-free and strongly clustered topologies.
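The growth process described above can be sketched in a few lines of Python. The sketch assumes a radial coordinate of 2 ln t for the t-th node and a fixed number m of links per new node, which are common choices in the literature on this kind of model rather than details taken from the text above; the function names are likewise illustrative.

```python
import math
import random

def hyperbolic_distance(r1, theta1, r2, theta2):
    """Distance between two points of the hyperbolic plane in polar coordinates."""
    dtheta = math.pi - abs(math.pi - abs(theta1 - theta2))  # angle in [0, pi]
    arg = (math.cosh(r1) * math.cosh(r2)
           - math.sinh(r1) * math.sinh(r2) * math.cos(dtheta))
    return math.acosh(max(1.0, arg))  # guard against rounding below 1

def grow_network(n_nodes, m=2, seed=0):
    """Add nodes one by one and link each to its m hyperbolically closest nodes."""
    rng = random.Random(seed)
    coords = []   # coords[i] = (r, theta) of node i
    edges = set()
    for t in range(1, n_nodes + 1):
        r = 2.0 * math.log(t)                 # assumed radial coordinate
        theta = rng.uniform(0.0, 2.0 * math.pi)
        older = sorted(range(len(coords)),
                       key=lambda j: hyperbolic_distance(r, theta, *coords[j]))
        for j in older[:m]:
            edges.add((j, t - 1))
        coords.append((r, theta))
    return coords, edges
```

Because early nodes sit near the centre of the disk, they are close to almost every later node and collect many links, while nodes with similar angles tend to be linked to each other.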

Mapping an existing network to a hyperbolic space is useful to facilitate the study of some networks. We proposed a method to do this using the spectral decomposition of the network's Laplacian, which is accurate and computationally more efficient than other methods (Laplacian-based Network Embedding, LaBNE; [14]). We show how the method can be used for the prediction of links and to study network evolution. We combined this approach with maximum likelihood estimation (using HyperMap, a method for hyperbolic network embedding that is slower but more accurate) [15]. Evaluation of this hybrid approach (LaBNE+HM) indicates good results, especially in heterogeneous, highly clustered networks. We applied this method to map the human protein interaction network to a hyperbolic space and observed that many geometric properties of the map correspond to biological properties: distance corresponds to likelihood of interaction, radial distance to protein age, and angular coordinates to function [16]. This suggests that the map has predictive power. This map can be explored using an associated web server (GAPI).

It has been suggested that the co-evolution of proteins that interact should make it possible to detect their interaction. We found support for a simplified hypothesis based on this idea: if two proteins interact and in some organisms they have coiled coils, there will be a tendency for those coiled coils to emerge during evolution in a coordinated manner [17]. This would be due to these coiled coils facilitating the PPI. We observe this effect as a bias and expect that it might give a predictive capability for PPIs that will increase as more proteomes become available.

Function of polyQ in protein interaction

Expanded runs of consecutive trinucleotide CAG repeats encoding polyglutamine (polyQ) stretches are observed in the genes of patients with different genetic diseases such as Huntington's disease (HD) and several ataxias. These repeats are thought to trigger disease. However, polyQ tracts are a normal feature of many human proteins, suggesting that they have an important cellular function. We integrated genomic, phylogenetic, protein interaction network and functional information to add evidence that polyQ tracts in proteins modulate protein interactions, most likely through structural changes whereby the polyQ sequence extends a neighbouring coiled-coil region to facilitate its interaction with a coiled-coil region in another protein [18]. Alteration of this important biological function due to polyQ expansion results in a gain of abnormal interactions, leading to pathological effects like protein aggregation.

We characterized proteins that interact with ataxin-1, which has a polyQ tract whose abnormal expansion results in disease. Interactors that enhance the toxicity of mutant ataxin-1 are enriched in coiled-coil regions. We studied how the coiled-coil region of the ataxin-1 interactor MED15 affects polyQ-dependent ataxin-1 aggregation [19]. In a review we put these results and our analyses of polyQ in protein interaction networks in the context of genetic screens of enhancers of mutant polyQ protein toxicity to propose that while interactions of coiled-coil proteins with polyQ regions might increase aggregation, blocking this interaction, or interacting with other regions, might reduce aggregates and neurodegeneration [20]. The over-expression of a gene coding for an ataxin-1 protein with an expanded tract of 82 glutamines (ATXN1(Q82)) in human mesenchymal stem cells (MSCs) that are resistant to the early cytotoxic effects of the protein allows the characterization of insoluble aggregates and of the resulting expression changes [21]. The perturbed protein interaction network affects the assembly of the ribosome, resulting in dysregulation of cerebellum protein synthesis. In a follow-up study, we compared the expression changes caused by ATXN1(Q82) in MSCs (in vitro model) with those occurring in the cerebellum of SCA1 B05 mice (expressing human ATXN1(Q82)), to find a link between the mechanisms altered in cells and the systemic changes observed in an in vivo model [22]. We identified common upregulation of extracellular matrix organization genes and identified temporal effects, for example, constant dysregulation of complexes involved in ion transport, dysregulation at a middle time-point of protein synthesis and oxidative phosphorylation processes, and a late dysregulation of several signaling pathways that could be due to loss of neuronal subtypes.

Overexpression of mutant ATXN1(Q82) in human neuroblastoma SH-SY5Y cells produces intranuclear inclusion bodies (IIBs), which are a good model for the pathogenic effects of polyQ expansion in SCA1. We isolated them and characterized their molecular content. They were enriched in gene transcripts that may dysregulate the ribosome, producing pathogenic changes in the proteome [23]. Sequestration of these transcripts on polyQ IIBs could contribute to SCA1 pathogenesis.

We selected CRMP1 in an approach that combined gene expression and PPI data to discover brain-specific proteins, downregulated in the brain of HD patients, that could interact with Huntingtin and reduce its aggregation in disease [24]. CRMP1 was found to interact with aggregation-prone mutant Huntingtin fragments and to reduce the formation of aggregates of these fragments and of the ATXN1 and TARDBP proteins, which spontaneously aggregate. A mutant of CRMP1, D408V, had a much reduced anti-aggregation effect, possibly due to the removal of a salt bridge, which affects its capacity to oligomerize.

Evaluation of protein interaction networks

We evaluated interactions that were obtained using a methodology designed to increase the accuracy in the detection of interactions for RNA-binding ubiquitin ligases (RBULs). As a particular type of RNA-binding protein, RBULs have a tendency to form many non-specific interactions when cells are broken in the process of conventional SILAC-based affinity purification experiments (SILAC-AP). In this adapted approach, light and heavy isotope bait-marked cells are first combined and then lysed (as opposed to being first lysed and then combined) [25]. The idea is that interactions forming before lysis will have a high SILAC ratio versus background, whereas interactions that form after lysis will not. The interactions detected for five RBULs were computationally evaluated and compared to those that would be obtained by classical SILAC-AP: the new technique identified interactors with functions more similar to those of the baits, which suggested better quality results. The set of interactors contains a significant number of components of the ubiquitin conjugation machinery and of proteins with RNA-metabolism related functions.

More recently, we evaluated the entire human protein interaction network (hPIN) to analyse the properties of disease modules (DMs), sets of proteins associated with a single disease that agglomerate in the network because they tend to interact with common components (explaining why they are associated with the same disease) [26]. For this we used the mapping of the hPIN in the hyperbolic plane described above. We observe that DMs indeed form ensembles with significantly large connectivity and short distances (both in the network and in hyperbolic space) compared to groups of proteins of similar size and connectivity but not related to the same disease. The hyperbolic mapping offers alternative ways to evaluate the functions of proteins in DMs and provides a metric to measure distance between two DMs. The latter replicates the manual classification of the diseases, so that, for example, mental disorders and neoplasms each cluster together, at opposite ends of a map of diseases. This mapping also allows the discovery of faulty proteins, those that often drop a signal when a path jumping from a protein to the neighbour that is closest to the target (in hyperbolic distance) is searched to connect a receptor to a transcription factor in the hPIN. We observed that targets of FDA-approved drugs are significantly enriched in faulty proteins. These results support the use of the hyperbolic mapping of the hPIN to gain biological insight.
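The search for faulty proteins can be pictured as greedy routing over the mapped network. The Python sketch below is an illustration under assumptions: the signal is forwarded to the neighbour closest to the target, and it is counted as dropped at a node when no neighbour is closer to the target than the node itself. The function names and this exact dropping rule are not taken from the study.

```python
import math
from collections import Counter

def hyperbolic_distance(p, q):
    """Hyperbolic distance between polar points p = (r, theta) and q = (r, theta)."""
    (r1, t1), (r2, t2) = p, q
    arg = (math.cosh(r1) * math.cosh(r2)
           - math.sinh(r1) * math.sinh(r2) * math.cos(t1 - t2))
    return math.acosh(max(1.0, arg))  # guard against rounding below 1

def greedy_route(adj, coords, source, target):
    """Forward a signal greedily; return (path, delivered)."""
    path = [source]
    current = source
    while current != target:
        here = hyperbolic_distance(coords[current], coords[target])
        neighbours = adj.get(current, [])
        if not neighbours:
            return path, False
        best = min(neighbours,
                   key=lambda v: hyperbolic_distance(coords[v], coords[target]))
        if hyperbolic_distance(coords[best], coords[target]) >= here:
            return path, False  # no neighbour is closer: the signal is dropped
        current = best
        path.append(current)
    return path, True

def count_drops(adj, coords, pairs):
    """Count, per node, how often a signal is dropped there over many routes."""
    drops = Counter()
    for source, target in pairs:
        path, delivered = greedy_route(adj, coords, source, target)
        if not delivered:
            drops[path[-1]] += 1
    return drops
```

Running `count_drops` over many receptor and transcription factor pairs would rank nodes by how often routes fail at them; nodes with high counts play the role of the faulty proteins in this toy setting.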

To study pathways involved in protein aggregation common to multiple neurodegenerative diseases (NDs) we obtained putative protein interactions for about 500 human proteins involved in ND using yeast two-hybrid (Y2H). These proteins were selected according to known ND-causing mutations and to their relation with studies of protein aggregates in the brain according to the literature [27]. The resulting network linked more than 5000 human proteins and was used to identify novel proteins that influenced the effect of mutations in TDP-43, HTT and ARF-GEP100. The network could be analysed to produce disease-specific modules. Proteins such as ataxin-1 and MLK1, predicted to be relevant to Alzheimer's disease (AD), were for the first time identified as aggregated in brains of persons with AD.

To find potential modifiers of HD, we examined the network of HTT interactors within the human protein interaction network mapped in hyperbolic space. To look for strong modifiers with functions relevant to HD pathology, as opposed to interactors that modify polyQ aggregates by their mere interaction with HTT, we selected paralog pairs among HTT interactors (expected to interact similarly with HTT) and then focused on those 49 pairs where the paralogs were located in different regions of the mapped network (expected to form part of different pathways and processes) [28]. Three of these pairs comprised proteins with opposite effects on HD models. The three negative effectors interact with PPP2CA and TUBB, known negative factors in HD, as well as with HSP90AA1 and RPS3, while the three positive effectors interact with HSPA9. We discuss these potential HD modifiers and use these examples to illustrate how to study the dynamic aspects of paralog evolution using the hyperbolic mapping of the protein interaction network.

We generated the hyperbolic mapping of the protein interaction network of a resistant strain of Mycobacterium tuberculosis (MTB XDR1219), using an interolog approach from MTB. We analysed known drug targets (DTs) to identify close pairs in the map. We used the map to explain mechanisms of drug action, drug synergy and drug resistance, and to identify possible effective drug combinations and novel DTs [29].

References

[1] Petrakis, S. and M.A. Andrade-Navarro. 2016. Editorial: Protein interaction networks in health and disease. Frontiers in Genetics. 7, 111. [Frontiers eBook]

[2] Fontaine, J.F., B. Suter, and M.A. Andrade-Navarro. 2011. QiSampler: evaluation of scoring schemes for high-throughput datasets using a repetitive sampling strategy on gold standards. BMC Research Notes. 4, 57. [QiSampler]

[3] Vinayagam, A., U. Stelzl, R. Foulle, S. Plassmann, M. Zenkner, J. Timm, H.E. Assmus, M.A. Andrade-Navarro, E.E. Wanker. 2011. A directed protein interaction network for investigating intracellular signal transduction. Science Signaling. 4, rs8.

[4] Schaefer, M.H., J.F. Fontaine, A. Vinayagam, P. Porras, E.E. Wanker and M.A. Andrade-Navarro. 2012. HIPPIE: integrating protein interaction networks with experiment based quality scores. PLoS One. 7, e31826. [HIPPIE]

[5] Lopes, T.J., M. Schaefer, J. Shoemaker, Y. Matsuoka, J.F. Fontaine, G. Neumann, M.A. Andrade-Navarro, Y. Kawaoka and H. Kitano. 2011. Tissue-specific subnetworks and characteristics of publicly available human protein interaction databases. Bioinformatics. 27, 2414-2421.

[6] Schaefer, M.H., L. Serrano and M.A. Andrade-Navarro. 2015. Correcting for the study bias associated with protein-protein interaction measurements reveals differences between protein degree distributions in different cancer types. Frontiers in Genetics. 6, 260.

[7] Schaefer, M.H., T.J.S. Lopes, N. Mah, J.E. Shoemaker, Y. Matsuoka, J.F. Fontaine, C. Louis-Jeune, A.J. Eisfeld, G. Neumann, C. Perez-Iratxeta, Y. Kawaoka, H. Kitano, M.A. Andrade-Navarro. 2013. Adding protein context to the human protein-protein interaction network to reveal meaningful interactions. PLoS Comp. Biol. 9, e1002860. [HIPPIE]

[8] von Eichborn, J., M. Dunkel, B. Gohlke, S. Preissner, M. Hoffmann, J. Bauer, J. Armstrong, M.H. Schaefer, M.A. Andrade-Navarro, N. LeNovere, M. Croning, S. Grant, P. van Nierop, A. Smit, R. Preissner. 2013. SynSysNet: integration of experimental data on synaptic protein-protein interactions with drug-target relations. Nucleic Acids Research. 41, D834-D840. [synsys]

[9] Suter, B., J.F. Fontaine, R. Yildrimann, T. Raskó, M.H. Schaefer, A. Rasche, P. Porras, B.M. Vázquez-Álvarez, J. Russ, K. Rau, R. Foulle, M. Zenkner, K. Saar, R. Herwig, M.A. Andrade-Navarro, E.E. Wanker. 2013. Development and application of a DNA microarray-based yeast two-hybrid system. Nucleic Acids Research. 41, 1496-1507.

[10] Suratanee, A., M.H. Schaefer, M.J. Betts, Z. Soons, H. Mannsperger, N. Harder, M. Oswald, M. Gipp, E. Ramminger, G. Marcus, R. Männer, K. Rohr, E.E. Wanker, R.B. Russell, M.A. Andrade-Navarro, R. Eils, R. König. 2014. Characterizing protein interactions employing a genome-wide siRNA cellular phenotyping screen. PLoS Comp. Biol. 10, e1003814.

[11] Alanis-Lobato, G., M.A. Andrade-Navarro, M. Schaefer. 2017. HIPPIE v2.0: enhancing meaningfulness and reliability of protein-protein interaction networks. Nucleic Acids Research. 45, D408-D414. [HIPPIE]

[12] Alanis-Lobato, G., J.S. Möllmann, M.H. Schaefer and M.A. Andrade-Navarro. 2020. MIPPIE: the mouse integrated protein-protein interaction reference. Database. 2020, baaa035. [MIPPIE]

[13] Alanis-Lobato, G. and M.A. Andrade-Navarro. 2016. Distance distribution between complex network nodes in hyperbolic space. Complex Systems. 25, 223-236.

[14] Alanis-Lobato, G.A., P. Mier and M.A. Andrade-Navarro. 2016. Efficient embedding of complex networks to hyperbolic space via the spectral decomposition of their Laplacian. Scientific Reports. 6, 30108.

[15] Alanis-Lobato, G., P. Mier, M.A. Andrade-Navarro. 2016. Manifold learning and maximum likelihood estimation for hyperbolic network embedding. Applied Network Science. 1, 10.

[16] Alanis-Lobato, G., P. Mier and M.A. Andrade-Navarro. 2018. The latent geometry of the human protein interaction network. Bioinformatics. 34, 2826-2834. [GAPI]

[17] Mier, P., G. Alanis-Lobato and M.A. Andrade-Navarro. 2017. Protein-protein interactions can be predicted using coiled coil co-evolution patterns. Journal of Theoretical Biology. 412, 198-203.

[18] Schaefer, M.H., E.E. Wanker and M.A. Andrade-Navarro. 2012. Evolution and function of CAG/polyglutamine repeats in protein-protein interaction networks. Nucleic Acids Research. 40, 4273-4287.

[19] Petrakis, S., T. Rasko, J. Russ, R.P. Friedrich, M. Stroedicke, S.P. Riechers, K. Muehlenberg, A. Moeller, A. Reinhardt, A. Vinayagam, M.H. Schaefer, M. Boutros, H. Tricoire, M.A. Andrade-Navarro and E.E. Wanker. 2012. Identification of human proteins that modify misfolding and proteotoxicity of pathogenic ataxin-1. PLoS Genetics. 8, e1002897.

[20] Petrakis, S., M.H. Schaefer, E.E. Wanker, M.A. Andrade-Navarro. 2013. Aggregation of polyQ-extended proteins is promoted by interaction with their natural coiled coil partners. Bioessays. 35, 503-507.

[21] Laidou, S., G. Alanis-Lobato, J. Pribyl, T. Raskó, B. Tichy, K. Mikulasek, M. Tsagiopoulou, J. Oppelt, G. Kastrinaki, M. Lefaki, M. Singh, A. Zink, N. Chondrogianni, F. Psomopoulos, A. Prigione, Z. Ivics, S. Pospisilova, P. Skladal, Z. Izsvak, M.A. Andrade-Navarro and S. Petrakis. 2020. Nuclear inclusions of pathogenic ataxin-1 induce oxidative stress and perturb the protein synthesis machinery. Redox Biology. 32, 101458.

[22] Vagiona, A.C., S. Sgardelis, M.A. Andrade-Navarro, F. Psomopoulos and S. Petrakis. 2020. Dynamics of a protein interaction network associated to the aggregation of polyQ-expanded ataxin-1. Genes. 11, E1129.

[23] Gkekas, I., A.C. Vagiona, N. Pechlivanis, G. Kastrinaki, K. Pliatsika, S. Iben, K. Xanthopoulos, F.E. Psomopoulos, M.A. Andrade-Navarro and S. Petrakis. 2023. Intranuclear inclusions of polyQ-expanded ATXN1 sequester RNA molecules. Frontiers in Mol. Neurosci. 16, 1280546.

[24] Stroedicke, M., Y. Bounab, N. Strempel, S. Yigit, R.P. Friedrich, G. Chaurasia, S. Li, F. Hesse, S.P. Riechers, J. Russ, C. Nicoletti, C. Haenig, S. Schnoegl, D. Fournier, R.K. Graham, M.R. Hayden, S. Sigrist, G.P. Bates, J. Priller, M.A. Andrade-Navarro, M.E. Futschik and E.E. Wanker. 2015. Systematic interaction network filtering identifies CRMP1 as a novel suppressor of huntingtin misfolding and neurotoxicity. Genome Research. 25, 701-713.

[25] Hildebrandt, A., G. Alanis-Lobato, K. Zarnack, M.A. Andrade-Navarro, P. Beli and J. König. 2017. Interaction profiling of RNA-binding ubiquitin ligases reveals a link between posttranscriptional regulation and the ubiquitin system. Scientific Reports. 7, 16582.

[26] Härtner, F., M.A. Andrade-Navarro and G. Alanis-Lobato. 2018. Geometric characterisation of disease modules. Applied Network Science. 3, 10.

[27] Haenig, C., N. Atias, A.K. Taylor, A. Mazza, M.H. Schaefer, J. Russ, S.P. Riechers, S. Jain, M. Coughlin, J.F. Fontaine, B.D. Freibaum, L. Brusendorf, M. Zenkner, M. Stroedicke, S. Schnoegl, K. Arnsburg, A. Boeddrich, P. Heutink, J.P. Taylor, J. Kirstein, M.A. Andrade-Navarro, R. Sharan, E.E. Wanker. 2020. Interactome mapping provides a network of neurodegenerative disease proteins and uncovers widespread protein aggregation in affected brains. Cell Reports. 32, 108050.

[28] Vagiona, A.C., P. Mier, S. Petrakis and M.A. Andrade-Navarro. 2022. Analysis of Huntington’s disease modifiers using the hyperbolic mapping of the protein interaction network. Int. J. Mol. Sci. 23, 5853.

[29] Zahra, N., A.C. Vagiona, R. Uddin and M.A. Andrade-Navarro. 2023. Selection of multi-drug targets against drug-resistant Mycobacterium tuberculosis XDR1219 using the hyperbolic mapping of the protein interaction network. Int. J. Mol. Sci. 24, 14050.