REVIEW
Molecular biology applications of the red king crab duplex-specific nuclease
1 Center for Precision Genome Editing and Genetic Technologies for Biomedicine, Pirogov Russian National Research Medical University, Moscow, Russia
2 Kulakov National Medical Research Center for Obstetrics, Gynecology and Perinatology, Moscow, Russia
Correspondence should be addressed: Denis V. Rebrikov
Oparina, 4, Moscow, 117997, Russia; ncagip4@gmail.com
Funding: the study was supported by the grant no. 075-15-2019-1789, Center for High Precision Genomic Editing and Genetic Technologies for Biomedicine.
Author contribution: DA Shagin — preparation of the manuscript; DV Rebrikov — editing of the manuscript.
Crab duplex-specific nuclease (DSN) from the hepatopancreas of the red king crab was first characterized in 2002 [1]. The enzyme has a molecular weight of 41.5 kDa, consists of 407 amino acid residues (GenBank AAN86143), and exhibits a unique set of functional properties [2, 3]:
- DSN exhibits maximum activity at pH 6.6 at 60–65 °C;
- DSN remains active after heating at 90 °C or incubation at pH within the range of 4–12;
- DSN is Mn2+, Co2+ and Mg2+ dependent;
- DSN is resistant to proteases (including proteinase K and papain);
- DSN cleaves only double-stranded DNA, leaving single strands intact;
- DSN shows negligible activity towards RNA of any secondary structure, while effectively cleaving the DNA chain in DNA-RNA hybrids.
The unique properties of DSN, which is also an important object of fundamental research in the field of nuclease evolution [4], have inspired the creation of molecular protocols based on them.
This review considers diverse applications of DSN in modern molecular biology.
Single nucleotide polymorphism (SNP) genotyping
SNP genotyping is used in the diagnosis of genetic predispositions, pharmacogenetics, forensic science, molecular genealogy, population genetics, and other research areas [5–9]. An SNP genotyping protocol known as the duplex-specific nuclease preference (DSNP) approach is based on the unique property of DSN to cleave perfect (i.e. fully matched) short double-stranded DNA substrates with much higher efficiency than their imperfect analogs [10].
SNP genotyping by DSNP analysis requires two specific 10-mer oligonucleotides, each carrying a fluorophore at the 5'-end and a quencher at the 3'-end (so-called FRET-labeled probes; FRET, fluorescence resonance energy transfer). One probe corresponds to the wild-type allele, the other to the variant allele. When a perfect duplex forms between the probe and the target DNA, DSN hydrolyzes the probe, which separates the fluorophore from the quencher and results in fluorescence emission at a specific wavelength. In the absence of probe hydrolysis, no fluorescence is emitted.
Before DSNP analysis, the studied polymorphic region must be amplified to a high concentration by polymerase chain reaction (PCR) with specific primers. The crude (unpurified) PCR product is mixed with the probes and incubated with DSN at 60 °C for 5–10 min. During the incubation, the DNA substrate is amplified through the combined activity of DSN and thermostable DNA polymerase; the latter enters the reaction mixture as a component of the crude PCR product used as a template. DSN cleaves double-stranded DNA, producing fragments that can serve as primers for DNA polymerase. At the same time, hydrolysis of the amplicons yields short DNA fragments capable of efficient hybridization with the signaling probes. At the final step of the analysis, the reaction mixture is incubated at 30–35 °C, which ensures hybridization of the probes with the target DNA and emission of fluorescence due to DSN activity.
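The probe logic described above can be illustrated with a minimal Python sketch. It is an idealization, not the published design procedure: the function names, the example sequence, and the choice to place the SNP near the middle of the 10-mer are assumptions made for illustration, and cleavage is modeled as an all-or-nothing rule (perfect duplex cleaved, any mismatch not cleaved), whereas the real enzyme shows a strong preference rather than an absolute one. Probes are written in the sense of the reference strand and would pair with its complement.

```python
def design_probes(reference, snp_pos, variant_base, probe_len=10):
    """Return (start, wild_type_probe, variant_probe) for one SNP.

    Hypothetical helper: the SNP is placed near the middle of the probe.
    """
    start = snp_pos - probe_len // 2
    if start < 0 or start + probe_len > len(reference):
        raise ValueError("SNP is too close to the end of the reference")
    wild_type = reference[start:start + probe_len]
    offset = snp_pos - start
    variant = wild_type[:offset] + variant_base + wild_type[offset + 1:]
    return start, wild_type, variant


def probe_is_cleaved(probe, sample, start):
    """Idealized DSN rule: only a perfectly matched duplex is hydrolyzed."""
    return sample[start:start + len(probe)] == probe


# Hypothetical 20-nt amplicon with a G>A substitution at index 10.
reference = "ACGTTGCACCGTAGGCTAAC"
start, wt_probe, var_probe = design_probes(reference, 10, "A")
mutant = reference[:10] + "A" + reference[11:]

print(wt_probe, var_probe)                         # GCACCGTAGG GCACCATAGG
print(probe_is_cleaved(wt_probe, reference, start))   # True
print(probe_is_cleaved(var_probe, reference, start))  # False
print(probe_is_cleaved(var_probe, mutant, start))     # True
```

In this model, a fluorescence signal from a given probe corresponds to `probe_is_cleaved` returning `True` for that probe and sample.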
The optimal length of PCR products for DSNP-analysis was tested empirically: fragments of various lengths containing C- or T-variants of the human mitochondrial COX1 C7028T were hybridized with a T-specific probe. Clear and unambiguous results were obtained for all tested products, proving the possibility of using DSNP analysis for a wide range of amplicon lengths.
Diverse approaches for SNP genotyping have been proposed, based on the difference in physicochemical properties of the variants [11, 12]. With regard to other published protocols, DSNP has several advantages, starting from its overall convenience (the use of crude PCR product, no cleanups/centrifugations, 5 min hands-on and results within 1 hour). Secondly, the protocol allows analysis of both alleles simultaneously in one tube. Thirdly, the specific fluorescence can be recorded using standard laboratory equipment. Finally, the protocol is applicable for virtually any length of the PCR product harboring the polymorphic position.
The study of multiple allelic variants in one tube implies the use of probes with fluorophores that emit at different wavelengths. Effects of different fluorophores on the efficiency of hydrolysis were negligible, unless the mismatches were positioned at the termini. The efficiency of hydrolysis for imperfect duplexes containing an unpaired nucleotide in the midportion did not depend on the type of fluorophores. Importantly, when using probes for different alleles labeled with identical fluorophores, the analysis must be carried out in separate tubes.
DSNP was successfully applied for genotyping of variants involved in a number of diseases or predispositions, including TP53 C309T; F2 G20210A, MTHFR C677T; KRAS G34A, G35T, G35A, and G38A; NRAS G34A, G35C, and G35A; HRAS G35T; APOE C388T; F5 G1698A; and BRCA1 5382insC. Allelic status of the studied samples was confirmed by Sanger sequencing.
The results of model experiments showed that DSNP allows reliable differentiation between mutant and wild-type alleles in both homozygous and heterozygous samples. In addition, the example of BRCA1 5382insC demonstrates that, apart from point substitutions, the method is also applicable to single-nucleotide indels. The suitability of the same standard reaction conditions for different genomic positions indicates the universality of the approach.
Clustered occurrence of point mutations in certain genomic regions is well described [13]. Most available PCR systems cannot accurately resolve such closely located mutations by routine genotyping. Using KRAS as an example, with its mutagenesis hotspot at positions 34 and 35 (G34A, G35A, and G35T [14]), it has been demonstrated that DSNP analysis is suitable for genotyping closely spaced point mutations even in multiplex. With up to four FRET probes used simultaneously, a specific signal was produced only when the probe was fully complementary to the target.
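How end-point signals from the two probes could map to a genotype can be sketched as follows. This is an illustrative decision rule only: the function name, the background-relative threshold `min_fold`, and the signal values are hypothetical and are not taken from the cited protocol, which does not specify a calling algorithm here.

```python
def call_genotype(wt_signal, var_signal, background, min_fold=3.0):
    """Classify a sample from end-point fluorescence of two allele probes.

    A probe is scored positive when its signal exceeds the no-template
    background by at least `min_fold` (hypothetical threshold).
    """
    wt_positive = wt_signal >= min_fold * background
    var_positive = var_signal >= min_fold * background
    if wt_positive and var_positive:
        return "heterozygous"
    if wt_positive:
        return "homozygous wild-type"
    if var_positive:
        return "homozygous variant"
    return "no call"


# Hypothetical readings in arbitrary fluorescence units.
print(call_genotype(900, 110, background=100))  # homozygous wild-type
print(call_genotype(850, 800, background=100))  # heterozygous
print(call_genotype(120, 950, background=100))  # homozygous variant
```

In practice the threshold would need to be calibrated against samples of known genotype, for example those confirmed by Sanger sequencing.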
Occasionally, it may be important not only to detect a variant, but also to measure the allelic ratio for a sample. Such tasks may be relevant for tumor tissue samples or pooled genomic DNA samples from multiple donors [12, 15]. Experiments with KRAS G35A as a model proved the possibility of using DSNP for semi-quantitative determination of mutant alleles in complex samples.
Thus, the natural properties of DSN have qualified this enzyme as the basis for a fairly simple and automatable genotyping protocol termed DSNP. The most prominent advantages of this protocol include:
- the use of crude PCR products with arbitrary fragment lengths;
- no purification/separation required;
- rapidity (1 hour, starting from PCR products);
- the allelic ratio assessment option;
- universal applicability: the method is suitable for determination of single-nucleotide substitutions and indels, at clustered positions or not, regardless of the context.
The protocol requires no special equipment apart from that ubiquitously found in the labs. Even fluorescence signals can be assessed with the use of ordinary instruments: for instance, with fluorescein, the signal can be recorded with a conventional UV lamp used in gel-doc systems.
DSNP disadvantages compared with real-time PCR include
- preparative PCR amplification of the target fragment;
- endpoint detection;
- risk of contamination associated with the need to open tubes with amplicons.
cDNA normalization
Heterogeneous gene expression levels in a cell complicate the full-scale analysis of transcriptomes and gene hunting. Normalization of cDNA libraries prior to analysis increases the sensitivity towards rare transcripts.
The classical principle of cDNA normalization relies on hybridization kinetics. Because the hybridization rate is proportional to the squared concentration of molecules in a sample, high-copy fragments renature faster than low-copy fragments [16]. Separating out the reassociated double-stranded fragments after denaturation and renaturation of a complex cDNA sample yields a library with equalized concentrations of abundant and rare transcripts [17–19].
The existing normalization protocols differ in the means of separating the normalized single-stranded (ss) and double-stranded (ds) fractions. The possibilities include physical separation of fractions using hydroxyapatite chromatography [17, 20] or paramagnetic beads [19, 21], dsDNA digestion with restriction endonucleases [18], and selective amplification of ssDNA using the PCR suppression effect [22]. Unfortunately, these possibilities are hardly adaptable for normalization of cDNA samples enriched with full-length sequences.
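The equalizing effect of second-order reassociation can be illustrated numerically. The sketch below assumes ideal kinetics in which each transcript species reassociates only with its own complement, so that dC/dt = -k·C² and the single-stranded concentration at time t is C0 / (1 + k·C0·t). The rate constant, time, and concentrations are arbitrary illustrative values, not measured parameters.

```python
def single_stranded_remaining(c0, k, t):
    """Single-stranded concentration after time t under dC/dt = -k * C**2."""
    return c0 / (1.0 + k * c0 * t)


# Hypothetical starting concentrations (arbitrary units): a 1000-fold
# difference between an abundant and a rare transcript.
k, t = 1.0, 1.0
abundant_0, rare_0 = 1000.0, 1.0

abundant_t = single_stranded_remaining(abundant_0, k, t)
rare_t = single_stranded_remaining(rare_0, k, t)

print(f"ratio before: {abundant_0 / rare_0:.0f} : 1")
print(f"ratio after:  {abundant_t / rare_t:.2f} : 1")
```

For large C0 the remaining single-stranded concentration approaches 1 / (k·t) regardless of the starting amount, which is why the single-stranded fraction ends up with far more even representation than the input.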
The unique properties of DSN enabled a highly efficient and easy-to-perform method known as DSN normalization, which is now routine in many laboratories in Russia and across the world and is universally applicable to both fragmented and full-length cDNA. Like most of its predecessors and counterparts, this method is based on the kinetics of cDNA reassociation, but it differs in the way the normalized ssDNA fraction is separated [23].
After hybridization, the reaction mixture is treated with DSN to remove the non-target fraction of dsDNA. Since DSN is a thermostable enzyme active at 70 °C, the hydrolysis occurs at the same temperature as hybridization. The high temperature affords minimization of non-specific binding and thereby prevents the loss of transcripts prone to formation of secondary structures. The normalized ss cDNA fraction is amplified by PCR.
The method is also applicable to non-amplified first strand cDNA. Abundant transcripts ('majors') are sponged through renaturation of the first strand cDNA with the poly(A)+ RNA that served as the template for its synthesis. This protocol is applicable when enough biomaterial is available to isolate the poly(A)+ RNA fraction from total RNA. It should be noted that in the normalized libraries, the content of clones corresponding to certain highly represented transcripts is sometimes lower than that of clones corresponding to rare transcripts. Such “supernormalization” can be explained by the continued dominance of major RNA species, which serve as guides for elimination of complementary DNA molecules. Once released by DSN-mediated hydrolysis of the complementary DNA strand, major RNAs can form new hybrids with single-stranded DNA molecules, thus promoting their hydrolysis, and so on.
For the subsequent use in a variety of applications, the normalized first strand cDNA must be amplified. For this reason, preparation of first strand cDNA for DSN normalization necessarily involves ligation of adapter sequences, which will provide annealing sites for oligonucleotide PCR primers. As is well-known, in PCR, short fragments are amplified more efficiently than longer fragments. Accordingly, amplification of complex normalized cDNA is fraught with the loss of long transcripts and decreased average length of the library. To preserve the fraction of long cDNA molecules during library preparation, short inverted repeats are included in the design of the adapters. PCR amplification with a primer matching the inverted repeat (albeit shorter) favors amplification of long cDNA molecules against the background of suppressed amplification of short molecules [24]. According to experimental data, normalization of amplified cDNA, although generally less efficient than the first strand cDNA normalization, also provides significant leveling of over-represented transcripts to enable the search for rare mRNA species under circumstances when only total RNA is available.
To date, DSN normalization provides both the simplest and most effective means for cDNA normalization. By contrast with many other protocols, it involves no physical separation of DNA fractions. Furthermore, it can be used to normalize both amplified cDNA and the non-amplified first strand of cDNA enriched with full-length molecules. In addition, DSN-normalization preserves the average lengths as well as the length distributions of cloned cDNA libraries.
Specific enrichment and normalization of genomic DNA libraries
Song et al. (2016) developed a protocol for enrichment of minor alleles (including those with mutations of clinical or biological significance) through selective elimination of wild-type alleles in mixed (pooled) clinical samples, termed Nuclease-Assisted Minor-Allele enrichment using Overlapping Probes, NaME-PrO [25]. The simultaneous removal of excess wild-type DNA for a virtually unlimited number of target genomic sequences is performed before amplification. The unique properties of DSN ensure preferential cleavage of wild-type DNA regardless of genomic context.
For each target sequence, a pair of oligonucleotide probes is designed to bind the target region on opposite strands, with a 10–15 bp overlap between them. The probes are added in excess to the fragmented genomic DNA denatured at 98 °C. When the temperature is lowered to 67 °C, the DNA remains single-stranded due to its low concentration and slow reassociation kinetics. The probes anneal to their target sites; on mutated DNA they form duplexes with a mismatch at the mutation site on each strand, i.e. imperfect duplexes. Upon exposure to DSN, which preferentially cleaves perfect duplexes, the wild-type DNA is cleaved whereas mutant DNA remains substantially intact. Because the two probes match the target sequence on opposite strands, both strands of wild-type DNA undergo preferential cleavage within the region covered by the probes. Thus, if at least one of the DNA strands containing the mutation is preserved after DSN cleavage, subsequent amplification proceeds exponentially from it, providing multiplex enrichment for all mutated sites simultaneously. This approach abolishes the need for deep sequencing to detect rare mutations.
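The geometry of the overlapping probe pair can be sketched in Python. This is an illustration under stated assumptions rather than the design procedure of the cited authors: the function names are hypothetical, the probe length of 25 nt is an arbitrary choice (the text specifies only the 10–15 bp overlap), and centring the overlap on the mutation site is an assumption made so that both probes cover the site.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def design_overlapping_probes(top_strand, mut_pos, probe_len=25, overlap=12):
    """Return (probe_for_bottom, probe_for_top, (ov_start, ov_end)).

    Both probes match the wild-type sequence. Their footprints on the
    target overlap by `overlap` bases around `mut_pos` (0-based index on
    the top strand). probe_len is a hypothetical value.
    """
    if not 0 < overlap <= probe_len:
        raise ValueError("overlap must be between 1 and probe_len")
    ov_start = mut_pos - overlap // 2
    ov_end = ov_start + overlap
    left_start = ov_end - probe_len      # first footprint extends leftwards
    right_end = ov_start + probe_len     # second footprint extends rightwards
    if left_start < 0 or right_end > len(top_strand):
        raise ValueError("target region is too close to the fragment end")
    # A probe that binds the bottom strand has the top-strand sequence.
    probe_for_bottom = top_strand[left_start:ov_end]
    # A probe that binds the top strand is the reverse complement of its footprint.
    probe_for_top = reverse_complement(top_strand[ov_start:right_end])
    return probe_for_bottom, probe_for_top, (ov_start, ov_end)
```

Calling `design_overlapping_probes(wild_type_sequence, mut_pos)` on a wild-type reference yields two probes that form perfect duplexes with wild-type DNA and mismatched duplexes with DNA mutated at `mut_pos`.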
As demonstrated by the authors, NaME-PrO affords 50 to 200-fold enrichment for a variety of target mutations found in clinical samples (exemplified by KRAS mutations).
In connection with the rapid technological progress, next generation sequencing (NGS), in particular whole-genome sequencing, is becoming an increasingly common approach in basic science and clinical laboratory diagnostics. In eukaryotes, a significant proportion of genomic DNA consists of highly homologous repetitive elements. Their presence not only increases the cost of genome sequencing, but also makes the bioinformatics processing and interpretation of the data extremely difficult.
The problem can be solved by several approaches. In particular, for higher plant genomes, it is possible to employ the pronounced tendency of repetitive sequences to hypermethylation. Yuan et al. (2002) used selective cleavage of hypermethylated regions with restriction endonucleases sensitive to cytosine 5'-methylation [26, 27]. Similarly, Palmer et al. (2003) used the methylation-dependent endonuclease McrBC from the E. coli K-12 strain in the construction of maize genomic libraries, thereby limiting the cloning of heavily methylated DNA [28]. Such technical solutions, however, are not applicable to organisms with other methylation patterns that are not selective for repetitive elements.
An alternative solution to the problem, the so-called C0t filtration, is based on the kinetics of DNA renaturation. Genomic DNA is fragmented, heat denatured and cooled. Since low-copy DNA fragments rehybridize slower than repetitive elements, over a certain time the single-stranded fraction becomes enriched with low-copy sequences [29, 30]. Next, the double-stranded fraction containing repetitive elements is separated from the single-stranded fraction (in the classic version, by hydroxyapatite chromatography). Although C0t filtration may work with any complex mixture of heterogeneously represented DNA sequences, its application requires precise knowledge of the reassociation kinetics for a particular genome.
Shagina et al. (2010) investigated the possibility of using DSN normalization to eliminate the highly homologous repetitive elements from genomic sequencing libraries. DNA is subjected to fragmentation, supplemented with adapter sequences through ligation, and denatured by heating. During the renaturation process, the sample is treated with DSN. The preserved single-stranded fraction of genomic DNA enriched in low-copy sequences is amplified with primers corresponding to the adapter sequences [31].
The method was tested in a model experiment on normalization of human genomic DNA before sequencing in a 454 GS FLX system (Roche). To enhance the sponging of non-target sequences, hybridization was carried out in an excess of the Cot-1 fraction of human genomic DNA. For the normalized and control samples, 29,240 and 31,789 reads were obtained with a total coverage of 6,269,460 and 6,643,277 nucleotides, respectively. Representation of diverse repetitive elements in the sequencing data was determined using the RepeatMasker software available at repeatmasker.org/cgi-bin/WEBRepeatMasker. According to the results, normalization reduced the content of repetitive elements from 40% to 25%.
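The figures reported above can be put into perspective with a short calculation. The inputs are taken directly from the text; the derived quantities (mean read length, relative reduction in repeat content, relative gain in non-repetitive content) are simple arithmetic on those inputs and are not results reported in the cited study.

```python
# Values as reported in the text.
reads = {"normalized": 29_240, "control": 31_789}
bases = {"normalized": 6_269_460, "control": 6_643_277}
repeat_before, repeat_after = 0.40, 0.25

# Mean read length per sample (nucleotides per read).
mean_len = {name: bases[name] / reads[name] for name in reads}

# Relative drop in repeat content and relative gain in non-repeat content.
relative_reduction = (repeat_before - repeat_after) / repeat_before
non_repeat_gain = (1 - repeat_after) / (1 - repeat_before)

for name, value in mean_len.items():
    print(f"{name}: mean read length {value:.1f} nt")
print(f"repeat content reduced by {relative_reduction:.1%} (relative)")
print(f"non-repetitive share increased {non_repeat_gain:.2f}-fold")
```

The mean read lengths come out at roughly 214 nt and 209 nt, so the two data sets are comparable in size, and the drop from 40% to 25% corresponds to a 37.5% relative reduction in repeat content.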
The table shows representation of different families of repetitive elements in non-normalized (control) and normalized samples of human genomic DNA. Significant effects can be observed for Alu, LINE L1P, ERV-K, and ERV1 repeats, as well as satellite sequences. At the same time, certain families of repetitive elements show resistance to normalization by this method.
Reciprocally, control samples (no normalization) contained about 10% of sequences sharing 100–91% similarity and about 20% of sequences sharing 90–71% similarity, whereas the remaining 70% of sequences had identical nucleotides in fewer than 71% of positions. DSN normalization reproducibly reduced the content of low-divergent repetitive elements (100–91% identical) 15-fold and medium-divergent repetitive elements (81–90% identical) 2-fold. Concentrations of other sequences in the samples did not decrease with DSN normalization. Preservation of single-copy genomic sequences during DSN normalization was demonstrated by real-time PCR assay on a panel of 11 unique genes.
These findings indicate that DSN normalization can effectively reduce the content of evolutionarily young, low-divergent repetitive sequences in genomic DNA samples. The cut-off threshold can be lowered by using milder reassociation conditions (e.g. by lowering the temperature and/or increasing the cation concentration), albeit with the risk of partial loss of unique sequences due to increased non-specific interactions.
MicroRNA studies
MicroRNA molecules are increasingly considered as promising biomarkers for diagnosis and monitoring of various pathologies, including cancers and autoimmune disorders. They are found in the blood plasma both in a freely circulating form and as part of the exosomal fraction. MicroRNAs are easy to isolate, resistant to degradation, and show reproducible and characteristic expression patterns.
The unique properties of DSN, particularly its indifference to RNA substrates, can be used to create specific chemiluminescent and fluorescent sensors for miRNA [32, 33]. A system developed by Shen et al. (2015) contains biotinylated DNA molecules (probes) labeled with fluorescein and immobilized on magnetic beads. Apart from the target microRNA and the beads, the medium contains DSN, which recognizes and cleaves the DNA strand of the duplexes formed upon binding of microRNA to the probe. After the cleavage, the labeled outer fragment of the probe drifts into the medium, while its microRNA partner finds and binds the next immobilized DNA molecule, promotes its cleavage, and so on. The fluorescently labeled cleavage products thus accumulate in the medium. In the end, the beads with immobilized unreacted probes are separated from the reaction medium with a magnet, and the labeled cleavage products remaining in the medium are used for endpoint detection of fluorescence. The system is sensitive enough to detect femtomolar microRNA concentrations. Notably, in contrast to protocols that use quantitative PCR, amplification of the signal is carried out isothermally at 40 °C, which makes the proposed method even more attractive [33].
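The target-recycling principle behind this signal amplification can be illustrated with a toy model. It is a deliberately simplified sketch, not a kinetic description of the published assay: the function name is hypothetical, time is divided into abstract cycles, and each microRNA molecule is assumed to direct cleavage of exactly one probe per cycle and to be released intact.

```python
def simulate_target_recycling(n_mirna, n_probes, cycles):
    """Count labeled fragments released after a number of abstract cycles."""
    intact_probes = n_probes
    released = 0
    for _ in range(cycles):
        # Each miRNA pairs with at most one intact immobilized probe.
        hybrids = min(n_mirna, intact_probes)
        # DSN cleaves the DNA strand of every hybrid; the labeled
        # fragment leaves the bead and stays in the medium.
        intact_probes -= hybrids
        released += hybrids
        # The miRNA strand is not cleaved, so n_mirna is unchanged.
    return released


# Hypothetical numbers: 10 miRNA molecules, 1000 probes.
print(simulate_target_recycling(10, 1000, cycles=1))   # 10
print(simulate_target_recycling(10, 1000, cycles=50))  # 500
```

Without recycling, the signal would be capped at one fragment per microRNA molecule; with recycling it grows with the number of cycles until the probes are exhausted, which is the basis of the sensitivity gain.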
CONCLUSION
Duplex-specific nuclease from the hepatopancreas of the red king crab exhibits a unique combination of properties, including exquisite substrate specificity (it selectively digests double-stranded DNA without affecting single-stranded DNA or RNA), a high optimal temperature of catalysis (60–65 °C), and thermal stability (it retains activity at 90 °C). Since its original characterization in 2002, the crab nuclease has been featured in diverse molecular protocols, which still evolve and are continually updated. The enzyme has been successfully used in a wide range of applications, including genotyping of single nucleotide polymorphisms (in both experimental and clinical samples), normalization of cDNA and genomic DNA libraries, selective elimination of non-target sequences, and miRNA studies.