Three-Dimensional Chromosome Organization in Eukaryotes: Novel Computational Approaches

2017-03-07T00:00:00Z (GMT) by Gamze Gursoy
DNA is the carrier of genetic information and is passed down from one generation to the next. With the completion of the Human Genome Project, this genetic information is more accessible than ever before. Cells from different tissues carry the same genetic alphabet, but they exhibit different properties and perform different functions depending on the regulation of gene expression. Recent experimental techniques based on fluorescence in situ hybridization (FISH) and chromosome conformation capture (3C) have shown that the spatial organization of the genome plays an important role in the regulation of gene expression as well as in the repair, recombination and replication of DNA. However, understanding the detailed mechanisms important for cellular activities from existing experimental data requires computational modeling of the 3D structures of chromatin. Here we describe a computational pipeline for constructing ensembles of chromatin chains from diverse experimental data. We have developed novel sampling tools to build spatial structures of chromosomes in an effort to understand the fundamental gene regulation mechanisms involving the effect of nuclear space in maintaining the epigenetic state of the cell, the physical factors and mechanisms that determine genome organization, and the long-distance chromatin interactions that promote cell-specific gene expression.

We first studied the effect of nuclear confinement on the folding landscape of human chromosomes. We developed a geometrical algorithm based on the importance sampling technique to generate ensembles of three-dimensional chromosome chains under the severe confinement of the cell nucleus.
Our model, named Constrained Self-Avoiding Chromatin (C-SAC), showed how experimentally observed scaling properties of human chromosome folding, the formation of higher-order structural units such as Topologically Associated Domains (TADs), and the intrinsic propensity to form long-range chromatin loops emerge from the confinement of the cell nucleus. Our findings further highlight the importance of nuclear size as a potential regulator of the epigenetic programming of cells.

A detailed understanding of the different cellular states in mammalian cell differentiation requires comprehensive analysis of the interactions across the whole genome. Thus, we next studied the origin of the chromatin interaction patterns of the budding yeast genome observed in genome-wide 3C studies. With a further improved, multi-chromosome Constrained Self-Avoiding Chromatin (mC-SAC) model, we generated ensembles of model budding yeast genomes using nuclear landmarks observed with imaging techniques as constraints. Comparison of the ensembles of folded chromosomes from the mC-SAC model with data from genome-wide 3C studies shows that the majority of measured interactions are well captured (with an accuracy of 90%). We showed that nuclear confinement dictates the formation of intra-chromosomal interactions, while centromere tethering dictates the formation of inter-chromosomal interactions. Analysis of the model genomes showed a high propensity of double-stranded DNA breaks to cluster in three-dimensional space. We further predicted novel chromatin interactions that occur between tRNA genes and are enriched with transcription factors.

Identifying the differences in chromatin interactions between cell types is key to understanding the phenotypic differences arising from cell-specific gene expression.
Constructing 3D structures of a gene locus can help to obtain a detailed structural understanding of promoter-enhancer interactions and of how they may affect the transcriptional machinery and regulate cellular epigenetic states. However, experimental data from 3C and related techniques are often sparse and incomplete due to systematic biases and challenges of the techniques. It is also challenging to distinguish biologically relevant interactions from non-specific collisions of genomic elements in the nucleus. We further improved our sampling method to remove non-specific spatial interactions from the experimental measurements and to incorporate the remaining specific interactions in a polymer model to study the differential expression levels of genes in different cell states. Our computational modeling, combined with the analysis of epigenetic profiling data, indicates that the differential expression of important genes is highly influenced by the folding landscape of chromatin. It also identifies novel chromatin interactions that were not captured by 3C data but were shown to have biological importance by other independent studies.

Lastly, we exploited the structural hotspots of a gene locus that are responsible for important promoter-enhancer interactions to identify the minimum functional units of the locus. After successfully identifying enhancers of genes using ensembles of chromatin chains generated from Hi-C data, we were able to characterize the structure of transcriptional units at the single-chain level. Our results showed that when genes share a common enhancer, the genes may not be expressed simultaneously. We further used our sampling technique to create virtual mutations that perturb the native chromatin interactions of a locus. This identified structural hotspots that are highly conserved and enriched with CTCF/cohesin binding, and that might be responsible for the mechanism of bringing distant enhancers and promoters together.
Our method provides a powerful tool for deciphering the structural units of chromatin that are mapped to important sequence-specific properties such as conservation or CTCF binding, and for exploring their impact on gene regulation.
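
As a rough illustration of the kind of importance-sampling chain growth mentioned in this abstract, the following minimal Python sketch grows self-avoiding chains on a cubic lattice inside a sphere using classic Rosenbluth weighting. It is not the C-SAC algorithm itself, whose details are not given here. The lattice, the spherical boundary, and the names grow_chain and weighted_mean_end_to_end_sq are assumptions made purely for illustration.

```python
import random

# The six unit steps on a cubic lattice (an assumed, simplified geometry).
STEPS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def grow_chain(n_beads, radius, rng):
    """Grow one self-avoiding lattice chain inside a sphere (hypothetical helper).

    Returns (chain, weight). The weight is the Rosenbluth importance weight:
    the product of the number of allowed moves at each growth step. It is 0.0
    if the chain becomes trapped before reaching n_beads.
    """
    chain = [(0, 0, 0)]
    occupied = {(0, 0, 0)}
    weight = 1.0
    r2 = radius * radius
    while len(chain) < n_beads:
        x, y, z = chain[-1]
        # Allowed moves: unoccupied sites that stay inside the confining sphere.
        candidates = []
        for dx, dy, dz in STEPS:
            p = (x + dx, y + dy, z + dz)
            if p not in occupied and p[0] ** 2 + p[1] ** 2 + p[2] ** 2 <= r2:
                candidates.append(p)
        if not candidates:
            return chain, 0.0  # dead end: this sample carries zero weight
        weight *= len(candidates)
        nxt = rng.choice(candidates)
        chain.append(nxt)
        occupied.add(nxt)
    return chain, weight


def weighted_mean_end_to_end_sq(n_chains, n_beads, radius, seed=0):
    """Weighted ensemble average of the squared end-to-end distance."""
    rng = random.Random(seed)
    num = 0.0
    den = 0.0
    for _ in range(n_chains):
        chain, w = grow_chain(n_beads, radius, rng)
        if w == 0.0:
            continue  # trapped chains contribute nothing
        x, y, z = chain[-1]
        num += w * (x * x + y * y + z * z)
        den += w
    return num / den if den > 0 else float("nan")


if __name__ == "__main__":
    print(weighted_mean_end_to_end_sq(n_chains=2000, n_beads=40, radius=4))
```

The weights correct for the bias introduced by choosing only among allowed moves, so weighted averages approximate averages over all confined self-avoiding chains of that length. Shrinking radius in this toy model makes dead ends more frequent, which hints at why sampling under severe confinement calls for specialized techniques.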