Representation of protein properties | Computational Biology and Data Mining

Profile hidden Markov models (profile HMMs) are widely used for sensitive database searches based on a statistical description of a sequence alignment. The HMMER package implements this approach and is freely available to biologists.

We developed a web server (NAIL - Network analysis interface for linking the HMMER results) [1] that processed HMMER results. The results were presented in a web-linked format, so that the user could easily get an overview of the organisms the sequences came from and select interesting sequences for further analysis, such as building multiple alignments of the selected sequences or running sequence similarity searches on local EMBL servers. The NAIL server was available at EMBL from 2000 to 2014.

We developed PhyloView [2], a program designed to provide taxonomic colouring of phylogenetic trees derived from protein sequences. Interpreting a protein-sequence phylogenetic tree requires examining the taxonomic properties of the species associated with the sequences used to build the tree. PhyloView facilitates this by attaching taxonomic information (taken from sequence databases) to the nodes of a tree, and by interactively colouring the branches of the tree according to any combination of taxonomic divisions. Users can submit phylogenetic trees in Newick (New Hampshire) format, which is used by most phylogenetic packages (e.g., ClustalW). The tree should contain SwissProt, SpTrembl, or GenBank GI protein identifiers in the leaf node names.
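As a rough illustration of the kind of input described above, the following Python sketch pulls leaf names out of a Newick string and assigns each leaf a colour through a taxonomy lookup. It is not PhyloView's code: the identifiers, the `taxon_of` and `colour_of` tables, and the function names are all hypothetical, and the regular expression assumes simple, unquoted leaf labels.

```python
import re

# Assumption: leaf labels are unquoted and contain no parentheses, commas,
# colons, semicolons or whitespace. A leaf label follows "(" or ",".
LEAF_PATTERN = re.compile(r"[(,]\s*([^(),:;\s]+)")


def leaf_names(newick):
    """Return the leaf labels of a Newick tree, in order of appearance."""
    return LEAF_PATTERN.findall(newick)


def colour_leaves(newick, taxon_of, colour_of, default="grey"):
    """Map each leaf to a colour via its taxonomic division.

    taxon_of:  hypothetical lookup from protein identifier to taxonomic division
    colour_of: hypothetical lookup from taxonomic division to colour name
    Leaves without a known division receive the default colour.
    """
    return {
        leaf: colour_of.get(taxon_of.get(leaf), default)
        for leaf in leaf_names(newick)
    }


# Made-up identifiers and taxonomy, for illustration only.
tree = "((P12345:0.1,Q67890:0.2):0.05,O11111:0.3);"
taxon_of = {"P12345": "Metazoa", "Q67890": "Fungi"}
colour_of = {"Metazoa": "red", "Fungi": "blue"}

colours = colour_leaves(tree, taxon_of, colour_of)
print(colours)
```

In practice the taxonomy lookup would come from a sequence database rather than a hand-written dictionary, and a full Newick parser would be needed for quoted labels or comments.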

To facilitate the study of regions of biased amino acid composition in the context of a protein alignment, we developed BiasViz [3], a tool that takes a protein alignment as input and displays the local frequency of a chosen group of amino acids along the aligned sequences. The user can modify the analysis window and the set of amino acids, which makes it easy to explore different parameters.
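The idea of a local frequency along an aligned sequence can be sketched in a few lines of Python. This is a minimal illustration, not BiasViz's implementation: the function name, the centred window, and the choice to ignore gap characters in the denominator are all assumptions made for the example.

```python
def window_bias(aligned_seq, residues, window=5):
    """Local frequency of a set of amino acids along one aligned sequence.

    For each alignment column, look at a window centred on that column
    (truncated at the sequence ends) and return the fraction of non-gap
    residues that belong to `residues`. Assumption: gaps ('-') are skipped;
    a window made only of gaps yields None.
    """
    wanted = set(residues.upper())
    seq = aligned_seq.upper()
    half = window // 2
    profile = []
    for i in range(len(seq)):
        chunk = [c for c in seq[max(0, i - half): i + half + 1] if c != "-"]
        if not chunk:
            profile.append(None)
        else:
            profile.append(sum(c in wanted for c in chunk) / len(chunk))
    return profile


# Made-up alignment, for illustration only.
alignment = {"seq1": "MQQQQ-QAKL", "seq2": "MAQ--QQQKL"}
profile = {name: window_bias(seq, "Q", window=3) for name, seq in alignment.items()}
for name, values in profile.items():
    print(name, values)
```

Changing `window` or `residues` (for example `"DE"` for acidic residues) corresponds to the two parameters the user can vary in the tool.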

Most protein features are related to sequence positions. However, protein function is carried out by folded proteins, so visualizing protein features on structures is usually necessary to understand protein function. We developed PDBpaint [4], a tool that takes protein structures in PDB format as input and allows parts of the structure to be coloured according to annotations taken from web servers or provided by the user. A particularly interesting use of our tool is in evaluating structural models: the proper location of sequence motifs (e.g. predicted HEAT repeats) might validate a model.

We collaborated in the development of the tool CellMap, which displays a network of protein-protein interactions over the image of a cell to highlight the different subcellular localizations of the interacting proteins [5]. A more recent publication describes step-by-step examples of how to use CellMap to visualize interactions between a set of proteins or all the interactors of a protein [6].

While sequence conservation in protein families mostly reflects the phylogenetic tree of the family, there are protein families with functions that evolve independently of the phylogeny. One such example is the evolution of fish anti-freeze proteins from C-type 4 lectins, which has happened in multiple species following independent events of gene duplication and mutation. In these cases, it is possible to find particular amino acids whose conservation reflects those functional aspects of the family independently of the phylogeny. To analyse multiple sequence alignments for this type of "non-phylogenetic" sequence conservation we developed MAGA [7]. This tool takes a multiple sequence alignment as input, for which the user can define groups of sequences according to non-phylogenetic criteria, in order to observe amino acid conservation within each group.
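Per-group conservation of this kind can be illustrated with a short Python sketch. It is a simplified stand-in, not the MAGA method: the function names, the use of the most common residue as a conservation measure, and the threshold are assumptions made for the example.

```python
from collections import Counter


def group_conservation(alignment, groups):
    """For each user-defined group, report (most common residue, frequency)
    per alignment column. Frequency is counted over all sequences in the
    group, so gaps lower it; an all-gap column yields ('-', 0.0).
    """
    result = {}
    for group, names in groups.items():
        seqs = [alignment[name] for name in names]
        columns = []
        for column in zip(*seqs):
            residues = [c for c in column if c != "-"]
            if not residues:
                columns.append(("-", 0.0))
                continue
            residue, count = Counter(residues).most_common(1)[0]
            columns.append((residue, count / len(column)))
        result[group] = columns
    return result


def group_specific_columns(conservation, threshold=1.0):
    """Columns conserved within every group but with a different residue in each."""
    n_columns = len(next(iter(conservation.values())))
    selected = []
    for i in range(n_columns):
        tops = [columns[i] for columns in conservation.values()]
        conserved = all(freq >= threshold for _, freq in tops)
        distinct = len({res for res, _ in tops}) == len(tops)
        if conserved and distinct:
            selected.append(i)
    return selected


# Made-up alignment and groups, for illustration only.
alignment = {"a1": "ACDT", "a2": "ACET", "b1": "GCDS", "b2": "GCES"}
groups = {"A": ["a1", "a2"], "B": ["b1", "b2"]}

conservation = group_conservation(alignment, groups)
print(group_specific_columns(conservation))
```

Here the groups stand for any non-phylogenetic annotation, such as presence or absence of a function; columns 0 and 3 are conserved within each group but differ between them, while column 1 is conserved across the whole alignment and is therefore not group-specific.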

[1] Sánchez-Pulido, L., Y.P. Yuan, M.A. Andrade and P. Bork. 2000. NAIL - Network analysis interface for linking the HMMER results. Bioinformatics. 16, 656-657.

[2] Palidwor, G., E.G. Reynaud and M.A. Andrade-Navarro. 2006. Taxonomic colouring of phylogenetic trees of protein sequences. BMC Bioinformatics. 7, 79. [PhyloView]

[3] Huska, M.R., H. Buschmann and M.A. Andrade-Navarro. 2007. BiasViz: Visualization of amino acid biased regions in protein alignments. Bioinformatics. 23, 3093-3094. [BiasViz]

[4] Fournier, D. and M.A. Andrade-Navarro. 2011. PDBpaint, a visualization webservice to tag protein structures with sequence annotations. Bioinformatics. 27, 2605-2606. [PDBpaint]

[5] Dallago, C., T. Goldberg, M.A. Andrade-Navarro, G. Alanis-Lobato and B. Rost. 2018. CellMap visualizes protein-protein interactions and subcellular localization. F1000Research. 6, 1824. [CellMap]

[6] Dallago, C., T. Goldberg, M.A. Andrade-Navarro, G. Alanis-Lobato and B. Rost. 2020. Visualizing human protein-protein interactions and subcellular localizations on cell images through CellMap. Current Protocols in Bioinformatics. 69, e97. [CellMap]

[7] Mier, P. and M.A. Andrade-Navarro. 2020. MAGA: a supervised method to detect motifs from annotated groups in alignments. Evolutionary Bioinformatics. 20, 59. [MAGA]