Genome 3D structure

Topologically associating domains (TADs) are self-interacting genomic regions that contain multiple genes. To investigate their function and evolution, we studied the position of pairs of paralog genes with respect to TADs [1]. We observed significantly more pairs within the same TAD than expected. Since most paralog gene pairs are formed by tandem duplication, we propose that there is selective pressure to keep paralogs in the same TAD. Paralogs can have related functions and might require common regulatory mechanisms. Our results support the idea that TADs may provide such mechanisms. We also found that paralog pairs within TADs tend to have fewer contacts than comparable pairs of genes; the proteins they encode also interact less than expected. Our interpretation of these results is that there is a population of paralog pairs within TADs that code for subunits that replace each other in complexes and thus need to be expressed in a mutually exclusive manner.
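The comparison described above comes down to counting how often both genes of a pair fall inside the same TAD, for paralog pairs and for a control set. The following is only a minimal sketch of that counting step, not the published analysis: the TAD coordinates, the gene positions, and the names `tad_index` and `fraction_in_same_tad` are hypothetical, and a single chromosome with sorted, non-overlapping TADs is assumed.

```python
from bisect import bisect_right

# Hypothetical TADs on one chromosome: sorted, non-overlapping,
# half-open (start, end) intervals. Values are illustrative only.
TADS = [(0, 1000), (1000, 2500), (2500, 4000)]


def tad_index(pos, tads):
    """Return the index of the TAD containing pos, or None if there is none."""
    starts = [start for start, _ in tads]
    i = bisect_right(starts, pos) - 1
    if i >= 0 and tads[i][0] <= pos < tads[i][1]:
        return i
    return None


def fraction_in_same_tad(pairs, tads):
    """Fraction of gene pairs (two positions each) whose members share a TAD."""
    if not pairs:
        return 0.0
    same = 0
    for pos_a, pos_b in pairs:
        idx_a = tad_index(pos_a, tads)
        idx_b = tad_index(pos_b, tads)
        if idx_a is not None and idx_a == idx_b:
            same += 1
    return same / len(pairs)


# Toy gene positions (assumptions, not real data).
paralog_pairs = [(100, 900), (1200, 2400), (900, 1100), (2600, 3900)]
control_pairs = [(100, 1100), (2400, 2600), (500, 600), (900, 3000)]

paralog_fraction = fraction_in_same_tad(paralog_pairs, TADS)
control_fraction = fraction_in_same_tad(control_pairs, TADS)
```

In a real analysis the control pairs would need to be matched to the paralog pairs (for example in genomic distance), and the difference between the two fractions would be assessed with a statistical test.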

We provided further evidence of the functional importance of TADs by interpreting the pathological effects of chromosomal abnormalities in non-coding regions of 17 subjects in terms of the 3D structure of the genome [2]. The individuals were selected because they carry balanced chromosomal abnormalities (translocations and inversions) that apparently do not affect coding genes, yet they present abnormal developmental and cognitive phenotypes. Many of the rearrangement breakpoints disrupt TADs. We used known chromatin contact information to predict the genes whose expression could be disrupted by the rearrangements, and computed the similarity between the phenotypes of the affected individuals and the annotated phenotypes of genes close to the rearrangement breakpoints. This resulted in novel associations of genes with developmental diseases and provided computational evidence of a pathological mechanism by which structural variants disrupt 3D genome architecture and thus gene regulation.

Yet another way to study the importance of TADs is to analyse their resilience to genomic rearrangements during evolution. We compared the human genome to other genomes and observed that the borders of regions that can be aligned coincide significantly with those of TADs [3]. TADs are sometimes rearranged differently in different organisms, but this then leads to modifications of the expression patterns of the genes concerned. We deduced this from the observation that the pattern of expression of a gene across tissues is more similar between mouse and human if the gene is in a conserved TAD.
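One simple way to quantify how well the borders of aligned regions coincide with TAD borders is to measure, for each rearrangement breakpoint, the distance to the nearest TAD boundary and to count how many fall within a given window. The sketch below illustrates only this idea under stated assumptions: the boundary positions, the breakpoints, the window size, and the function names are hypothetical and are not taken from the published analysis.

```python
from bisect import bisect_left


def distance_to_nearest_boundary(pos, boundaries):
    """Distance from pos to the closest value in a sorted, non-empty list."""
    i = bisect_left(boundaries, pos)
    candidates = []
    if i < len(boundaries):
        candidates.append(boundaries[i] - pos)   # nearest boundary at or after pos
    if i > 0:
        candidates.append(pos - boundaries[i - 1])  # nearest boundary before pos
    return min(candidates)


def fraction_near_boundary(breakpoints, boundaries, window):
    """Fraction of breakpoints lying within `window` bases of a boundary."""
    near = sum(
        1 for pos in breakpoints
        if distance_to_nearest_boundary(pos, boundaries) <= window
    )
    return near / len(breakpoints)


# Toy coordinates on a single chromosome (assumptions, not real data).
tad_boundaries = [0, 1000, 2500, 4000]
rearrangement_breakpoints = [990, 1020, 1700, 2510, 3200]

near_fraction = fraction_near_boundary(
    rearrangement_breakpoints, tad_boundaries, window=50
)
```

To judge significance, the observed fraction would be compared with the fraction obtained for randomly placed breakpoints.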

Several sequencing techniques are available to measure chromatin accessibility. The computational interpretation of their results with peak calling algorithms is currently very sensitive to parameter settings. We have developed a method to predict chromatin accessibility from transcriptomics data, which can be used to complement chromatin accessibility assays [4]. The method was trained using public transcriptomics and DNase-seq datasets, and can be used to predict chromatin accessibility or to optimize peak calling algorithms.

Regarding the function of genes within TADs, we observed that genes in TADs with fewer genes are more often associated with disease [5]. Together with other observations, including that TADs with higher ratios of enhancers to genes also have more disease-associated genes, this suggests that larger TADs accommodating complex regulatory networks (more genes and more shared enhancers) increase the robustness of the gene regulatory network, supporting the role of TADs in gene regulation.

We developed a method (7C = Computational Chromosome Conformation Capture by Correlation of ChIP-seq at CTCF motifs) to predict chromosomal contacts by repurposing ChIP-seq data. ChIP-seq reports genomic regions that interact with proteins. In human and other species, the CTCF protein binds genomic DNA and creates a loop by dimerization. These contacts bring proteins close to two DNA positions that are far apart in sequence, which can be detected as two separate, symmetrical peaks in the ChIP-seq of the corresponding protein [6]. In combination with the detection of CTCF binding motifs, we have used these signals to predict the formation of such loops. The observation that several proteins allow this type of prediction suggests their involvement in complexes at CTCF-regulated contacts.
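The core signal described above is the similarity of ChIP-seq coverage profiles around two CTCF motifs. The following sketch illustrates only that correlation step with a plain Pearson coefficient. It is not the 7C implementation: the coverage vectors are invented, the function name `pearson` is an assumption of this example, and details such as window size and motif orientation are left out.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences.

    Returns 0.0 if either sequence is constant (correlation undefined).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


# Hypothetical ChIP-seq coverage in fixed-size windows centred on three
# CTCF motifs (illustrative numbers only).
coverage_motif_a = [1, 2, 8, 3, 1, 0, 1]
coverage_motif_b = [2, 4, 16, 6, 2, 0, 2]   # same profile shape as motif A
coverage_motif_c = [5, 1, 0, 2, 6, 3, 0]    # different profile shape

score_ab = pearson(coverage_motif_a, coverage_motif_b)
score_ac = pearson(coverage_motif_a, coverage_motif_c)
```

In this toy example the pair A/B would be the stronger loop candidate, since its profiles are more similar than those of A/C. A correlation score of this kind would be one feature among others in a contact prediction model.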

We participated in a benchmark of methods for analysing data from ATAC-Seq applied to single-cell samples [7]. Assay for Transposase Accessible Chromatin using sequencing (ATAC-Seq) is a sequencing technology that reports accessible chromatin regions in a genome. Its application to single cells is challenging due to low peak detection. Out of ten methods evaluated on real and synthetic datasets, SnapATAC, Cusanovich2018 and cisTopic performed best. The fact that SnapATAC was the only method that could analyse a dataset of more than 80K cells indicates that memory requirements are an important issue posed by single-cell datasets.



[1] Ibn-Salem, J., E.M. Muro and M.A. Andrade-Navarro. 2017. Co-regulation of paralog genes in the three-dimensional chromatin architecture. Nucleic Acids Research. 45, 81-91.

[2] Zepeda-Mendoza, C.J., J. Ibn-Salem, T. Kammin, D.J. Harris, C. Redin, H. Brand, D. Rita, K.W. Gripp, J.J. Mackenzie, A. Gropman, B. Graham, R. Shaheen, F.S. Alkuraya, C.K. Brasington, E.J. Spence, D. Masser-Frye, L.M. Bird, E. Spiegel, R.L. Sparkes, Z. Ordulu, M.E. Talkowski, M.A. Andrade-Navarro, P.N. Robinson, C.C. Morton. 2017. Computational prediction of position effects of apparently balanced human chromosome rearrangements. Am. J. Hum. Genetics. 101, 206-217.

[3] Krefting, J., M.A. Andrade-Navarro and J. Ibn-Salem. 2018. Evolutionary stability of topologically associating domains is associated with conserved gene regulation. BMC Biology. 16, 87.

[4] Jung, S., V. Espinosa Angarica, L. Dutan Polit, M.A. Andrade-Navarro, N.J. Buckley, A. del Sol. 2017. Prediction of chromatin accessibility in gene-regulatory regions from transcriptomics data. Scientific Reports. 7, 4660.

[5] Muro, E.M., J. Ibn-Salem and M.A. Andrade-Navarro. 2019. The distributions of protein coding genes within chromatin domains in relation to human disease. Epigenetics and Chromatin. 12, 72.

[6] Ibn-Salem, J.I. and M.A. Andrade-Navarro. 2019. Computational Chromosome Conformation Capture by Correlation of ChIP-seq at CTCF motifs. BMC Genomics. 20, 777.

[7] Chen, H., C. Lareau, T. Andreani, M.E. Vinyard, S.P. Garcia, K. Clement, M.A. Andrade-Navarro, J.D. Buenrostro and L. Pinello. 2019. Assessment of computational methods for the analysis of single-cell ATAC-seq data. Genome Biology. 20, 241.