Transcript regulation

miRNA target prediction

We hypothesized that some groups of genes need to be repressed together, and that this repression can occur in different cell types or states at either the transcriptional level (by a transcriptional repressor) or the post-transcriptional level (by a miRNA). In support of this hypothesis, we observed significant enrichment of targets of particular miRNAs among the genes targeted by the neural repressor REST, as derived from multiple ChIP-seq experiments [1]. The miRNA targets overlapping REST targets were enriched in experimentally verified targets, and the corresponding miRNAs were significantly associated with the suppression of glioblastoma, a neural-related tumour. We have made the method available as a web tool (mBISON, [2]).
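
The enrichment idea above can be illustrated with a generic over-representation test. The following is a minimal sketch, assuming a one-sided hypergeometric test over a shared gene background; the text does not state which statistic mBISON uses, so the function names, the gene identifiers and the choice of test are all illustrative assumptions.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, overlap):
    """P(X >= overlap) when `sample` genes are drawn without replacement
    from `population` genes, of which `annotated` are miRNA targets."""
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


def mirna_target_enrichment(background, mirna_targets, repressor_targets):
    """Return (overlap size, p-value) for one miRNA against one repressor.

    All three arguments are iterables of gene identifiers (hypothetical
    inputs); genes outside the background are ignored.
    """
    background = set(background)
    targets = set(mirna_targets) & background
    bound = set(repressor_targets) & background
    shared = targets & bound
    p_value = hypergeom_enrichment_p(
        len(background), len(targets), len(bound), len(shared)
    )
    return len(shared), p_value


# Toy example: 10 background genes, 4 miRNA targets, 3 repressor-bound genes.
genes = [f"g{i}" for i in range(10)]
n_shared, p = mirna_target_enrichment(genes, genes[:4], ["g0", "g1", "g9"])
```

In practice such a test would be repeated for every miRNA, so the resulting p-values would need a multiple-testing correction before any miRNA is called enriched.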

Function of upstream open reading frames in transcript regulation

Upstream open reading frames (uORFs) are generally short open reading frames that appear in the 5'UTRs of many transcripts. uORFs may interfere with the protein expression of a transcript, but activating effects have also been described.
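
To make the definition concrete, the sketch below scans a 5'UTR sequence for ATG codons that are followed by an in-frame stop codon. It is a simplified illustration, not the annotation procedure used in the cited work: it assumes canonical ATG starts only and reports only uORFs that terminate within the supplied sequence (uORFs extending into the main coding sequence are ignored). The function name and coordinate convention are assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_uorfs(five_prime_utr):
    """Return a list of (start, end) pairs, 0-based and end-exclusive,
    one for each ATG that has an in-frame stop codon inside the sequence."""
    # Accept DNA or RNA input, in either case.
    seq = five_prime_utr.upper().replace("U", "T")
    uorfs = []
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        # Walk codon by codon in the frame set by this ATG.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                uorfs.append((start, pos + 3))
                break
    return uorfs


# Toy 5'UTR with a single three-codon uORF (ATG AAA TAG).
toy_utr = "GGATGAAATAGCC"
toy_uorfs = find_uorfs(toy_utr)
```

Each returned pair can be used to slice the sequence, so `toy_utr[start:end]` gives the uORF including its start and stop codons.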

We developed uORFdb, a database on uORFs in eukaryotic organisms that allows users to query a collection of selected literature on uORF biology, curated and annotated by experts [3].

Tyrosine kinases (TKs) are often upregulated or constitutively active in cancer. We investigated whether mutations that cause loss of uORFs in human TKs could result in abnormally high levels of TK translation, leading to cancer. For this, we characterized uORFs in all human genes and, in particular, in a set of 140 human TKs. The start codons of these uORFs (uAUGs) are significantly more conserved than the surrounding sequence, both in the TK set and in the total set of human genes. Removal of the uORF start codon resulted in increased translation of the transcript in all TKs studied, as observed using a luciferase reporter plasmid. Removal of the uORF stop codon decreased translation. This indicates that mutations affecting uORFs could cause abnormal expression and cancer [4].

To investigate whether uORF mutations occur in human cancer, we used a multiplexed approach to sequence more than 400 uORF translation initiation sites in 132 potential oncogenes across 308 human cancer samples. We found mutations that affected uORFs of EPHB1 in two samples, derived from breast and colon cancer, and of MAP2K6 in a sample of colon adenocarcinoma; these mutations resulted in higher translation of the corresponding mRNAs [5]. Computational analysis of whole-exome sequencing datasets from 464 colon adenocarcinomas revealed another 53 somatic mutations, deleting 22 uORF initiation codons and 31 uORF termination codons. These results support the existence of uORF mutations that alter transcript translation rates and may contribute to cancer.
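
The two mutation classes counted above (loss of a uORF initiation codon and loss of a uORF termination codon) can be distinguished with a simple check on a single-nucleotide substitution. This is a hedged sketch rather than the pipeline used in [5]: it assumes a DNA sequence, a canonical ATG start, 0-based end-exclusive uORF coordinates, and a hypothetical function name.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def classify_uorf_variant(seq, uorf_start, uorf_end, pos, alt):
    """Classify a substitution at 0-based `pos` to base `alt`.

    Returns "start_loss" if the uORF start codon is no longer ATG,
    "stop_loss" if the uORF stop codon is no longer a stop codon,
    and "other" for every remaining case (including silent changes
    such as TAG -> TAA).
    """
    seq = seq.upper()
    mutated = seq[:pos] + alt.upper() + seq[pos + 1:]
    in_start = uorf_start <= pos < uorf_start + 3
    in_stop = uorf_end - 3 <= pos < uorf_end
    if in_start and mutated[uorf_start:uorf_start + 3] != "ATG":
        return "start_loss"
    if in_stop and mutated[uorf_end - 3:uorf_end] not in STOP_CODONS:
        return "stop_loss"
    return "other"


# Toy 5'UTR containing the uORF ATG AAA TAG at positions 2..10.
utr = "GGATGAAATAGCC"
start_hit = classify_uorf_variant(utr, 2, 11, 4, "A")   # ATG -> ATA
stop_hit = classify_uorf_variant(utr, 2, 11, 9, "G")    # TAG -> TGG
silent = classify_uorf_variant(utr, 2, 11, 10, "A")     # TAG -> TAA
```

Insertions, deletions and mutations that create new uORFs are outside the scope of this sketch.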

Analysis of Chromatin Immunoprecipitation (ChIP)-seq data

Chromatin immunoprecipitation followed by sequencing is used to detect DNA regions bound to a protein target by cross-linking proteins to bound DNA, breaking the unbound DNA (e.g. by sonication), extracting the protein target using an antibody, and then separating and sequencing the bound DNA fragments.

A modification of ChIP-seq was developed (Tagmentation-Assisted Fragmentation ChIP) to allow its application to small cell populations [6]. Tagmentation, the random fragmentation of DNA using the transposase Tn5, is used instead of sonication, resulting in a protocol with fewer steps, which prevents loss of material and reduces technical variability. The results of this new technique (e.g. on samples of 100 cells) compare well with those obtained with the classical ChIP-seq or with other variations specific for low cell input.

ChIP-seq results are often obtained with replicates. It is therefore possible to define variable occupancy target regions (VOTs) in genomes by looking for regions whose binding is not fully replicated for multiple targets. We developed a protocol to detect cell-specific VOTs using ChIP-seq data with replicates for the same cell type and for a few different transcription factors [7]. Application of the method to the human cell lines K562, GM12878, HepG2 and MCF-7, and to mouse embryonic stem cells (mESCs), found VOTs that are rich in CG dinucleotides and enriched at promoters and R-loops. Conservation of mESC VOTs in placental mammals and their enrichment near DNA-binding genes suggest that VOTs reflect functional regions with highly dynamic interactions, possibly feedback loops of the gene regulatory network involved in development.
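
The notion of "not fully replicated binding for multiple targets" can be sketched as a set operation over replicate peak calls. The example below is only an illustration of that idea under strong assumptions, not the protocol published in [7]: peaks are assumed to be already mapped to fixed genomic bins, a bin counts as variable for a factor when it is bound in at least one but not all replicates, and the `min_factors` threshold and all names are hypothetical.

```python
from collections import Counter


def variable_bins(replicates):
    """Bins bound in at least one replicate but not in all of them.

    `replicates` is a non-empty list of sets of bin identifiers.
    With a single replicate the result is always empty.
    """
    replicates = [set(r) for r in replicates]
    union = set().union(*replicates)
    intersection = set.intersection(*replicates)
    return union - intersection


def candidate_vots(peaks_by_factor, min_factors=2):
    """Bins that show variable occupancy for at least `min_factors` factors."""
    counts = Counter()
    for replicates in peaks_by_factor.values():
        counts.update(variable_bins(replicates))
    return sorted(b for b, n in counts.items() if n >= min_factors)


# Toy data: bins are (chromosome, bin index); two replicates per factor.
toy_peaks = {
    "TF_A": [{("chr1", 1), ("chr1", 2)}, {("chr1", 1)}],
    "TF_B": [{("chr1", 2), ("chr1", 3)}, {("chr1", 3)}],
    "TF_C": [{("chr1", 4)}, set()],
}
toy_vots = candidate_vots(toy_peaks)
```

A real analysis would also need to account for peak strength and sequencing depth, which a presence/absence comparison like this one ignores.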

References

[1] Gebhardt, M.L., S. Reuter, R. Mrowka and M.A. Andrade-Navarro. 2014. Similarity in targets with REST points to neural and glioblastoma related miRNAs. Nucleic Acids Research. 42, 5436-5446.

[2] Gebhardt, M.L., A.S. Mer and M.A. Andrade-Navarro. 2015. mBISON: Finding miRNA target over-representation in gene lists from ChIP-sequencing data. BMC Research Notes. 8, 157. [mBISON]

[3] Wethmar, K., A. Barbosa da Silva, M.A. Andrade-Navarro and A. Leutz. 2013. uORFdb – a comprehensive literature database on eukaryotic uORF biology. Nucleic Acids Research. 42, D60-D67.

[4] Wethmar, K., J. Schulz, E.M. Muro, S. Talyan, M.A. Andrade-Navarro and A. Leutz. 2016. Comprehensive translational control of tyrosine kinase expression by upstream open reading frames (uORFs). Oncogene. 35, 1736-1742.

[5] Schulz, J., N. Mah, M. Neuenschwander, T. Kischka, R. Ratei, P.M. Schlag, E. Castaños-Vélez, I. Fichtner, P.U. Tunn, C. Denkert, O. Klaas, W.E. Berdel, J.P. von Kries, W. Makalowski, M.A. Andrade-Navarro, A. Leutz and K. Wethmar. 2018. Loss-of-function uORF mutations in human malignancies. Scientific Reports. 8, 2395.

[6] Akhtar, J., P. More, S. Albrecht, F. Marini, W. Kaiser, A. Kulkarni, L. Wojnowski, J.F. Fontaine, M.A. Andrade-Navarro, M. Silies and C. Berger. 2019. TAF-ChIP: An ultra-low input approach for genome wide chromatin immunoprecipitation assay. Life Science Alliance. 2, e201900318.

[7] Andreani, T., S. Albrecht, J.F. Fontaine and M.A. Andrade-Navarro. 2020. Computational identification of cell-specific variable regions in ChIP-seq data. Nucleic Acids Research. 48, e53.