Prediction of protein subcellular location

In general, proteins perform their functions in specific subcellular locations (e.g., cytoplasmic, nuclear, extracellular). Each of these locations has different physico-chemical properties. We proposed that proteins adapt to their environment through mutations of residues that are exposed to the medium and not closely implicated in function [1]. To test this hypothesis, we analysed the correlation between the properties of the exposed amino acids of proteins and the subcellular location where each protein occurs. We suggested using this information to predict protein subcellular location in the absence of known targeting signals or sequence similarity.

More recently, we studied the predictive power of amino acid composition at variable ranges of exposure, from buried to exposed, for protein subcellular location, and found that buried and exposed amino acids carry complementary information about location [2]. An optimized two-step predictor (NYCE), trained with vectors of amino acid composition in different ranges of exposure and using a Support Vector Machine followed by a Neural Network, reaches an accuracy of 62% when predicting nuclear, nucleocytoplasmic, cytoplasmic or extracellular location of eukaryotic proteins.
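As an illustration of what "vectors of amino acid composition in different ranges of exposure" could look like, here is a minimal Python sketch. It is not the published NYCE code: the function name, the use of per-residue relative solvent accessibility values between 0 and 1, and the three exposure ranges are all assumptions made for the example.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, fixed order

def composition_by_exposure(sequence, exposure,
                            ranges=((0.0, 0.25), (0.25, 0.5), (0.5, 1.0))):
    """Return one 20-value composition block per exposure range, concatenated.

    `exposure` holds one relative accessibility value per residue
    (assumed to lie in [0, 1]); `ranges` are hypothetical cut-offs.
    """
    if len(sequence) != len(exposure):
        raise ValueError("sequence and exposure must have the same length")
    vector = []
    for index, (low, high) in enumerate(ranges):
        is_last = index == len(ranges) - 1
        # Half-open intervals [low, high); the last range also includes `high`.
        selected = [aa for aa, acc in zip(sequence, exposure)
                    if low <= acc < high or (is_last and acc == high)]
        counts = Counter(selected)
        total = sum(counts[aa] for aa in AMINO_ACIDS)
        # Fraction of each amino acid among the residues in this range;
        # an empty range contributes 20 zeros.
        vector.extend(counts[aa] / total if total else 0.0
                      for aa in AMINO_ACIDS)
    return vector
```

With the default ranges the result has 60 values (3 ranges x 20 amino acids). Vectors of this kind could then be given to a classifier; the two-step SVM and neural network arrangement itself is not reproduced here.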

Many outer membrane proteins (OMPs) are known for Escherichia coli (more than 60 in 2008), but by 2008 only two had been identified in mycobacteria. Assuming a series of properties for these proteins (for example, being beta-barrels or having a signal peptide for export) allowed us to select 144 Mycobacterium tuberculosis proteins as possible OMPs [3]. Two of them (Rv1698 and Rv1973) were experimentally verified, suggesting that our assumptions are reasonable and paving the way for the identification of other mycobacterial OMPs. This method was then optimized by changing several parameter thresholds and adding a sliding-window sequence analysis, taking into account seven completely sequenced bacterial genomes, including M. tuberculosis [4]. The results were optimized for accuracy in detecting known OMPs (a very small number at the moment) and for producing coherent results within families of similar mycobacterial sequences.
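The text does not say which property the sliding-window analysis scores or which window size was used. Purely to illustrate the general technique, the Python sketch below averages Kyte-Doolittle hydropathy values over a window moving along the sequence. The choice of scale, the window length of 9 and the function name are assumptions for the example, not the parameters of the published method.

```python
# Kyte-Doolittle hydropathy values (an assumed choice of per-residue score).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def sliding_window_means(sequence, window=9):
    """Return the mean score of every full window along the sequence.

    Raises KeyError for residue letters missing from the scale.
    Sequences shorter than the window give an empty list.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    scores = [KYTE_DOOLITTLE[aa] for aa in sequence]
    if len(scores) < window:
        return []
    # Keep a running sum so each step costs O(1) instead of O(window).
    total = sum(scores[:window])
    means = [total / window]
    for i in range(window, len(scores)):
        total += scores[i] - scores[i - window]
        means.append(total / window)
    return means
```

A sequence of length n yields n - window + 1 values. In a screening pipeline such a profile would typically be compared against a threshold, which is the kind of parameter the paragraph above describes as being tuned.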


[1] Andrade, M.A., S.I. O'Donoghue and B. Rost. 1998. Adaptation of protein surfaces to subcellular location. J. Mol. Biol. 276, 517-525. [html version + web appendix (B. Rost)]

[2] Mer, A.S. and M.A. Andrade-Navarro. 2013. A novel approach for protein subcellular location prediction using amino acid exposure. BMC Bioinformatics. 14, 342. [NYCE]

[3] Song, H., R. Sandie, Y. Wang, S. Sukumaran, M.A. Andrade-Navarro and M. Niederweis. 2008. Identification of outer membrane proteins of Mycobacterium tuberculosis. Tuberculosis. 88, 526-544.

[4] Mah, N., C. Perez-Iratxeta and M.A. Andrade-Navarro. 2010. Outer membrane pore protein prediction in mycobacteria using genomic comparison. Microbiology. 156, 2506-2515.