I am not a virgin anymore! Don’t worry, I am not talking about my romantic life. I have recently been told that the article I worked on during my master’s thesis (around 2018) has been officially published in RNA Biology under the title ‘Xrn1 influence on gene transcription results from the combination of general effects on elongating RNA pol II and gene-specific chromatin configuration’, making it the first journal paper in which I appear in the author list. However, I am not going to explain the article itself. My part was the MNase-seq data analysis and a few other tasks, so… what is MNase-seq?
What does MNase-seq mean?
MNase-seq is a technique used to study nucleosome positioning across the genome. The name is short for Micrococcal Nuclease digestion followed by high-throughput sequencing. At this point one might say ‘this is even more confusing than just MNase-seq!’, but don’t worry, we are going to decipher it part by part.
The word MNase-seq has two parts:
- MNase, which stands for Micrococcal nuclease
- seq, which stands for sequencing
What is an MNase?
As mentioned above, MNase is a nuclease: an enzyme that cleaves the phosphodiester bonds between the nucleotides of a DNA (or RNA) sequence. More specifically, MNase digests the ‘accessible’ DNA, while DNA bound by nucleosomes (the protein complexes around which DNA wraps, packaging it into chromatin) or by other DNA-binding proteins is protected and remains intact. This makes MNase perfect for studying nucleosome localization.
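To make the idea of ‘protected’ versus ‘accessible’ DNA more concrete, here is a toy sketch in Python. It is only an illustration under strong assumptions: the sequence, the protected intervals and the `digest` function are all made up, and a real digestion is neither this clean nor this complete.

```python
def digest(sequence, protected):
    """Return the pieces of `sequence` that survive a toy MNase digestion.

    `protected` is a list of (start, end) half-open intervals covered by
    nucleosomes or other bound proteins. Everything outside them is degraded.
    Overlapping intervals are merged into a single protected fragment.
    """
    merged = []
    for start, end in sorted(protected):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [sequence[start:end] for start, end in merged]


# Hypothetical 30-base sequence with two protected stretches.
dna = "AAAAACCCCCCCCCCTTTTTGGGGGGGGGG"
print(digest(dna, [(5, 15), (20, 30)]))
```

Only the two protected stretches (the runs of C and G) come out of the function, which is the same logic that lets MNase reveal where the nucleosomes were sitting.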
A typical MNase-seq experiment consists of the following steps (some steps may be omitted and others may change between protocols):
- Cross-linking: the contacts between DNA and nucleosomes are fixed using formaldehyde.
- MNase digestion: the sample is incubated with MNase, which degrades the DNA except for the DNA bound to nucleosomes.
- Protein degradation
- DNA extraction: this can be done with a phenol-chloroform-isoamyl alcohol extraction, which separates nucleic acids from proteins and lipids, followed by ethanol precipitation, which purifies and concentrates the DNA.
- Mononucleosome purification: the extracted DNA is separated by size by agarose gel electrophoresis, and the bands corresponding to mononucleosome size are cut out and purified to obtain the DNA that was wrapped around nucleosomes.
However, MNase has some bias: it prefers certain sequences over others. To obtain correct nucleosome positions, this bias must be corrected using so-called naked DNA, that is, DNA that has undergone the same steps as the studied chromatin (DNA + nucleosomes) except for the cross-linking step, resulting in DNA without nucleosomes.
Then, after MNase digestion of both chromatin and naked DNA, further steps are needed to map the nucleosomes. One of these extra steps is sequencing.
What is sequencing?
Although it can have other meanings, and technically we can sequence several types of molecules, in biology the word sequencing usually refers to determining the order of the nucleotides (‘the bricks’) of a fragment of DNA (‘the wall’), which can be a single fragment, a gene, a chromosome or even the whole genome.
In the old days, these procedures were done manually or in a poorly automated way, which only allowed short fragments of DNA to be sequenced. With a lot of effort to put these fragments in order, chromosomes and even genomes could be sequenced. These short fragments of DNA are called reads, because the sequencing machine kind of reads them.
Nowadays, we still sequence short reads that range from a few bases to kilobases, but we do it in a highly automated and fast way, which lets us sequence millions of these reads in a short period of time. This is why we talk about high-throughput sequencing.
This is where bioinformatics comes into play, since individual reads are usually too short to cover a functional piece of DNA (e.g. a gene, the whole genome…). They have to be mapped to an already known genome or assembled to build a genome de novo, that is, from scratch. Fortunately, computational resources have improved along with sequencing technologies, allowing us to easily handle these enormous amounts of data.
Moreover, exciting new technologies are emerging that will allow us to sequence longer reads of up to megabases (millions of bases).
MNase-seq data processing
After sequencing the DNA that comes from the MNase digestion, the data are processed and analyzed as follows:
- Quality control: the reads undergo a quality control to ensure that they are of good quality.
- Adaptor trimming: to sequence the DNA, short known DNA sequences (adaptors) are ligated to the fragments, and they must be removed from the reads. After this step, an extra quality control is performed.
- Genome indexing: in MNase-seq experiments, you should have a reference genome (an already assembled genome of the species you are working with). This genome must be indexed (like a book) to allow fast mapping of the reads to the genome.
- Read mapping: after indexing the genome, the reads are mapped. This is a kind of more sophisticated Where’s Wally? played with the reads and their positions: you have the sequence of the read and the mapping software looks for the matching sequence in the genome.
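The trimming, indexing and mapping steps above can be sketched in a few lines of Python. This is a toy version, not what real trimmers and aligners do: it assumes a made-up reference, an adaptor sequence chosen only for illustration, exact matches only, and hypothetical helper names (`trim_adapter`, `build_index`, `map_read`). Quality control is left out.

```python
from collections import defaultdict

ADAPTER = "AGATCGGAAGAGC"  # assumed adaptor sequence, for illustration only


def trim_adapter(read, adapter=ADAPTER):
    """Cut the read at the first full adaptor occurrence, if there is one."""
    pos = read.find(adapter)
    return read if pos == -1 else read[:pos]


def build_index(genome, k):
    """Map every k-mer in the genome to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(genome) - k + 1):
        index[genome[i:i + k]].append(i)
    return index


def map_read(read, genome, index, k):
    """Return the genome positions where the whole read matches exactly."""
    if len(read) < k:
        return []
    hits = []
    # Look up the first k bases in the index, then check the rest of the read.
    for pos in index.get(read[:k], []):
        if genome[pos:pos + len(read)] == read:
            hits.append(pos)
    return hits


genome = "ACGTTGCAAGGCTTACGATCGGATCCTAGG"  # hypothetical reference genome
k = 4
index = build_index(genome, k)

raw_read = "GGCTTACGAT" + ADAPTER  # a read that still carries its adaptor
read = trim_adapter(raw_read)
print(read, map_read(read, genome, index, k))
```

The index plays the role of the book index from the list: instead of scanning the whole genome for every read, the mapper jumps straight to the few places where the first `k` bases of the read occur and only checks those.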
The steps above are common to other sequencing experiments (e.g. ChIP-seq, …), but after this point the protocols may differ. In my case, I analysed the mapped reads using a software called DANPOS, a toolkit specifically designed for the study of nucleosomes, which also allows us to subtract the naked DNA signal from the chromatin signal (the ‘normal DNA’) and correct the MNase bias.
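To give an intuition of what subtracting the naked DNA means, here is a minimal sketch. It is not how DANPOS implements it. I am assuming two per-base coverage tracks of the same length, a simple scaling so that both have the same total signal, and a hypothetical function name, `subtract_background`.

```python
def subtract_background(chromatin, naked):
    """Subtract scaled naked-DNA coverage from chromatin coverage, per base."""
    if len(chromatin) != len(naked):
        raise ValueError("coverage tracks must have the same length")
    total_naked = sum(naked)
    # Scale the control so that both tracks have the same total signal.
    scale = sum(chromatin) / total_naked if total_naked else 0.0
    # Negative values are clipped to zero: there is no negative coverage.
    return [max(c - n * scale, 0.0) for c, n in zip(chromatin, naked)]


# Made-up coverage values: a real peak on the left and, on the right,
# a region that MNase also favours in naked DNA.
chromatin = [2, 8, 10, 8, 2, 6, 6, 6]
naked = [1, 1, 1, 1, 1, 3, 3, 5]
print(subtract_background(chromatin, naked))
```

In this toy example the peak on the left survives the correction, while the signal on the right disappears because the naked DNA shows that MNase prefers that region even without nucleosomes.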
DANPOS has several functions that allow us to map nucleosome positions in the genome and calculate the nucleosome occupancy (how often a nucleosome is found at a given location), as well as compute nucleosome profiles (how nucleosomes are distributed) over genomic features like transcription start sites (TSS), transcription end sites (TES), gene bodies or enhancers. Along with nucleosome localization, DANPOS can calculate fuzziness, which arises when the ‘same’ nucleosome maps to slightly different positions in different cells.
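Occupancy and fuzziness are easier to grasp with numbers. The sketch below is my own simplification and not DANPOS code: occupancy is taken as the number of fragments covering each base, and fuzziness as the standard deviation of the fragment midpoints around one nucleosome. The fragments are hypothetical and shortened to 10 bases to keep the example small.

```python
from statistics import pstdev


def occupancy(fragments, length):
    """Count, for every base, how many fragments (start, end) cover it."""
    coverage = [0] * length
    for start, end in fragments:
        for pos in range(max(start, 0), min(end, length)):
            coverage[pos] += 1
    return coverage


def fuzziness(fragments):
    """Spread (population standard deviation) of the fragment midpoints."""
    midpoints = [(start + end) / 2 for start, end in fragments]
    return pstdev(midpoints)


# The 'same' nucleosome seen in four cells, as half-open (start, end) intervals.
well_positioned = [(10, 20), (10, 20), (11, 21), (9, 19)]
fuzzy = [(4, 14), (10, 20), (16, 26), (22, 32)]

print(occupancy(well_positioned, 30))
print(fuzziness(well_positioned), fuzziness(fuzzy))
```

The well-positioned nucleosome gives a tall, narrow occupancy peak and a small fuzziness value, whereas the fuzzy one spreads the same number of fragments over a wider region and gets a much larger value.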
After running DANPOS, some downstream analyses remain. Fortunately, DANPOS does most of the work for us: if we give it lists of genes (or other features) with their coordinates, it calculates the nucleosome occupancy and fuzziness across the features and their upstream and downstream regions at single-nucleotide resolution. This allows us to build beautiful profiles over the desired genomic features.
In my case, I made the profiles using Excel and then performed some statistical analyses using PAST, but now I would use R instead, since DANPOS generates R files to analyze the data.
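As an illustration of what such a profile is, here is a small sketch that averages a per-base occupancy track around a list of features. It is a simplified stand-in, not the DANPOS implementation: the `average_profile` function, the occupancy values and the TSS coordinates are all made up, and I assume a single chromosome stored as a Python list.

```python
def average_profile(occupancy, features, flank):
    """Average occupancy in a window of +/- flank bases around each feature.

    `features` is a list of (position, strand) pairs. Minus-strand windows
    are reversed so that upstream is always on the left.
    """
    width = 2 * flank + 1
    totals = [0.0] * width
    used = 0
    for position, strand in features:
        start, end = position - flank, position + flank + 1
        if start < 0 or end > len(occupancy):
            continue  # skip windows that run off the sequence
        window = occupancy[start:end]
        if strand == "-":
            window = window[::-1]
        for i, value in enumerate(window):
            totals[i] += value
        used += 1
    return [total / used for total in totals] if used else totals


occupancy = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]  # made-up per-base occupancy
tss = [(2, "+"), (7, "-")]                  # hypothetical TSS positions
print(average_profile(occupancy, tss, flank=2))
```

Reversing the windows of genes on the minus strand matters: without it, upstream and downstream regions would be mixed and the profile around the TSS would be blurred.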
References
Begley V et al. (2020). Xrn1 influence on gene transcription results from the combination of general effects on elongating RNA pol II and gene-specific chromatin configuration. RNA Biology. doi:10.1080/15476286.2020.1845504
Chen K et al. (2012). DANPOS: Dynamic Analysis of Nucleosome Position and Occupancy by Sequencing. Genome Research. doi:10.1101/gr.142067.112
Any questions?
I hope you have enjoyed this post. If you have any questions, don’t hesitate to contact me or leave a comment.