A multi-institutional team of investigators has developed a catalog of transcription factor binding sites that regulate gene expression across the genome, according to a recent study published in Genome Research.
The comprehensive dataset can improve the understanding of the underlying causes of different cancers and developmental disorders, according to Bridget Lear, PhD, research associate professor of Biochemistry and Molecular Genetics and a co-author of the study.
In the current study, the scientists aimed to define parts of the genome that are involved in gene regulation and to understand how interactions between DNA and proteins influence this regulation in relation to disease.
The current study is the culmination of the modENCODE (model organism Encyclopedia of DNA Elements) and modERN (model organism Encyclopedia of Regulatory Networks) projects, in which the investigators used both fruit fly and C. elegans roundworm model organisms to identify regions of the genome that are involved in gene regulation and to understand how interactions between DNA and proteins influence this regulation.
“Researchers at Northwestern had the opportunity to contribute to the later stages of the modERN project beginning in 2020. Our team led the efforts to complete the catalog of transcription factor binding sites in the fruit fly Drosophila,” Lear said.
First, Lear’s team helped to create tagged transgenic strains for more than 90 percent of known transcription factors in the fruit fly genome; collaborators at Yale generated tagged transgenic strains for more than 60 percent of known transcription factors in C. elegans.
This allowed the scientists to apply a uniform ChIP-sequencing approach to identify binding sites for each transcription factor, according to Lear.
“For each transgenic strain, we used an antibody that recognized the protein tag in order to isolate the transcription factor protein itself as well as all of the chromatin regions associated with that protein,” Lear said.
Next, using next-generation sequencing methods, the investigators sequenced all the DNA bound by each transcription factor, creating a comprehensive dataset comprising 605 transcription factors identifying 3.6 million genome sites in the fruit fly models and 356 transcription factors identifying 0.9 million genome sites in the roundworm models.
Furthermore, the dataset serves as a resource that can guide future studies of transcription factor function and help identify new transcription factor binding sites and relationships, Lear said.
“This study indicates that most genes are likely regulated by multiple transcription factors and suggests that complex interactions between these factors influence the patterns and levels of gene expression. The establishment of this comprehensive dataset represents an important step towards understanding how genes are regulated,” Lear said.
This work was supported by National Institutes of Health grants U41HG007355 and R01GM76655.