DNA

Deoxyribonucleic acid (DNA) is a polymer consisting of two polynucleotide chains that coil around each other, forming a double helix. This polymer carries the genetic instructions essential for the development, functioning, growth, and reproduction of all known organisms and many viruses. DNA and ribonucleic acid (RNA) are nucleic acids. Alongside proteins, lipids, and complex carbohydrates (polysaccharides), nucleic acids are one of the four principal types of macromolecules essential for all known forms of life.

The two strands of DNA are designated as polynucleotides because they are constructed from simpler monomeric units termed nucleotides. Each nucleotide comprises one of four nitrogenous nucleobases (cytosine [C], guanine [G], adenine [A], or thymine [T]), a deoxyribose sugar, and a phosphate group. Nucleotides are covalently linked within a chain via phosphodiester bonds, which form between the sugar of one nucleotide and the phosphate of the subsequent one, thereby creating an alternating sugar-phosphate backbone. The nitrogenous bases from the two distinct polynucleotide strands are interconnected by hydrogen bonds, adhering to specific base pairing rules: adenine (A) pairs with thymine (T), and cytosine (C) pairs with guanine (G), thus forming double-stranded DNA. These complementary nitrogenous bases are categorized into two groups: the single-ringed pyrimidines and the double-ringed purines. Within DNA, thymine and cytosine are the pyrimidines, while adenine and guanine are the purines.

Both strands of double-stranded DNA contain identical biological information, which is duplicated during replication when the two strands dissociate. The two DNA strands exhibit an antiparallel orientation, running in opposing directions. Each sugar moiety is covalently bonded to one of four types of nucleobases (or bases). The specific sequence of these four nucleobases along the backbone is responsible for encoding genetic information. RNA strands are synthesized using DNA strands as a template through a process known as transcription, where DNA bases are replaced by their corresponding RNA bases, with uracil (U) substituting for thymine (T). Subsequently, according to the genetic code, these RNA strands dictate the amino acid sequence within proteins during a process termed translation.
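
The pairing rules and the antiparallel orientation described above can be illustrated with a minimal Python sketch. The function names and the six-base example sequence are hypothetical and chosen only for illustration; the sketch assumes sequences are written 5′ to 3′ and contain only the letters A, C, G, and T.

```python
# Watson-Crick pairing rules stated in the text: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(strand: str) -> str:
    """Return the opposing strand read 5'->3'.

    The strands are antiparallel, so the complement is read in reverse.
    """
    return "".join(COMPLEMENT[base] for base in reversed(strand))


def transcribe(template: str) -> str:
    """Return the RNA built against a DNA template strand.

    The RNA is complementary to the template, with uracil (U) in place of
    thymine (T).
    """
    return reverse_complement(template).replace("T", "U")


strand = "ATGGCC"                      # hypothetical sequence, 5'->3'
template = reverse_complement(strand)  # the opposing strand
print(template)
print(transcribe(template))
```

Transcribing the opposing strand yields an RNA with the same sequence as the original strand, except that U replaces T.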

In eukaryotic cells, DNA is systematically arranged into elongated structures known as chromosomes. Prior to typical cell division, these chromosomes undergo duplication via DNA replication, ensuring that each daughter cell receives a complete chromosomal set. Eukaryotic organisms, encompassing animals, plants, fungi, and protists, primarily house their DNA within the cell nucleus as nuclear DNA, with smaller quantities found in mitochondria as mitochondrial DNA or in chloroplasts as chloroplast DNA. Conversely, prokaryotes, including bacteria and archaea, store their DNA exclusively in the cytoplasm, typically in circular chromosomes. Within eukaryotic chromosomes, chromatin proteins, such as histones, are responsible for compacting and organizing the DNA. These compacted structures regulate the interactions between DNA and other proteins, thereby assisting in the control of DNA transcription.

Properties

DNA is a long polymer made of repeating nucleotide units. Its structure is dynamic along its length, allowing it to coil into tight loops and other shapes. In all species, DNA consists of two helical chains held together by hydrogen bonds. Both chains are coiled around the same axis and have the same pitch of 34 ångströms (3.4 nm). The pair of chains has a radius of 10 Å (1.0 nm). According to another study, when measured in a different solution, the DNA chain was 22–26 Å (2.2–2.6 nm) wide, and one nucleotide unit was 3.3 Å (0.33 nm) long. The buoyant density of most DNA is 1.7 g/cm³.

DNA typically exists not as a single strand, but rather as a tightly associated pair of strands. These two elongated strands intertwine to form a double helix. Each nucleotide encompasses both a segment of the molecular backbone, which maintains the chain's integrity, and a nucleobase, which engages in interactions with the opposing DNA strand within the helix. A nucleobase covalently bonded to a sugar is termed a nucleoside, whereas a base attached to a sugar and one or more phosphate groups is defined as a nucleotide. A biopolymer composed of multiple linked nucleotides, such as DNA, is referred to as a polynucleotide.

The DNA strand's structural backbone comprises an alternating sequence of phosphate and sugar moieties. The specific sugar component in DNA is 2-deoxyribose, a pentose (five-carbon) sugar. These sugar units are interconnected by phosphate groups, which establish phosphodiester bonds between the third and fifth carbon atoms of adjacent sugar rings. These carbons are conventionally designated as the 3′-end (three prime end) and 5′-end (five prime end) carbons, with the prime symbol serving to differentiate them from the carbon atoms within the nitrogenous base to which deoxyribose forms a glycosidic bond.

Consequently, a typical DNA strand has two distinct termini: one bearing a phosphate group attached to the 5′ carbon of a deoxyribose unit (the 5′ phosphoryl), and the other bearing a free hydroxyl group attached to the 3′ carbon of a deoxyribose unit (the 3′ hydroxyl). The orientation of the 3′ and 5′ carbons along the sugar-phosphate backbone gives each DNA strand a directionality, sometimes called polarity. In a nucleic acid double helix, the direction of the nucleotides in one strand is opposite to their direction in the other strand, so the strands are antiparallel. The asymmetric ends of a strand are referred to as the 5′ (five prime) end, which carries a terminal phosphate group, and the 3′ (three prime) end, which carries a terminal hydroxyl group. A fundamental distinction between DNA and RNA lies in the sugar: DNA contains 2-deoxyribose, whereas RNA contains the related pentose sugar ribose.

The structural integrity of the DNA double helix is predominantly maintained by two principal forces: the formation of hydrogen bonds between complementary nucleotides and the stabilizing base-stacking interactions among the aromatic nucleobases. The four canonical nitrogenous bases present in DNA are adenine (A), cytosine (C), guanine (G), and thymine (T). These four bases are covalently linked to the sugar-phosphate backbone to constitute a complete nucleotide, as exemplified by adenosine monophosphate. Specific complementary pairing occurs between adenine and thymine, and between guanine and cytosine, thereby forming A-T and G-C base pairs, respectively.

Nucleobase Classification

Nucleobases are systematically categorized into two primary classes: purines, specifically A and G, which are characterized by fused five- and six-membered heterocyclic ring systems; and pyrimidines, comprising the six-membered rings C and T. A fifth pyrimidine nucleobase, uracil (U), typically replaces thymine in RNA and is structurally differentiated from thymine by the absence of a methyl group on its ring. Beyond their natural occurrence in RNA and DNA, numerous synthetic nucleic acid analogues have been engineered for the investigation of nucleic acid properties and for diverse biotechnological applications.

Non-Canonical Bases

Modified bases are observed within DNA. The initial identification of such a base was 5-methylcytosine, discovered in the genome of Mycobacterium tuberculosis in 1925. The presence of these non-canonical bases in bacterial viruses (bacteriophages) serves to circumvent the restriction enzymes inherent in bacteria. This enzymatic system functions, at least partially, as a molecular immune defense mechanism, safeguarding bacteria from viral infections. Furthermore, modifications to cytosine and adenine, which represent some of the most frequently altered DNA bases, are critically important for the epigenetic regulation of gene expression in both plant and animal organisms.

A variety of non-canonical bases are known to occur in DNA. Most of these are modifications of the canonical bases or of uracil.

Modified Adenine: N6-carbamoyl-methyladenine, N6-methyladenine

Modified Guanine: 7-Deazaguanine, 7-Methylguanine

Modified Cytosine: N4-Methylcytosine, 5-Carboxylcytosine, 5-Formylcytosine, 5-Glycosylhydroxymethylcytosine, 5-Hydroxycytosine, 5-Methylcytosine

Modified Thymidine: α-Glutamylthymidine, α-Putrescinylthymine

Uracil and Modifications: Base J, Uracil

The DNA backbone is composed of two helical strands. Between these strands run spaces, or grooves, which also follow a double helical path. These voids lie next to the base pairs and can serve as binding sites. Because the strands are not symmetrically located with respect to each other, the grooves are unequally sized. The major groove is 22 ångströms (2.2 nm) wide, whereas the minor groove is 12 Å (1.2 nm) wide. The greater width of the major groove makes the edges of the bases more accessible there than in the minor groove. Consequently, proteins such as transcription factors, which bind specific sequences in double-stranded DNA, typically make contact with the sides of the bases exposed in the major groove. This situation varies in unusual DNA conformations within the cell, but the major and minor grooves are always named to reflect the differences in width that would be seen if the DNA were twisted back into its standard B-form.

Base Pairing

Within a DNA double helix, each type of nucleobase on one strand bonds with just one type of nucleobase on the opposing strand. This is called complementary base pairing. Purines form hydrogen bonds with pyrimidines: adenine pairs only with thymine, through two hydrogen bonds, and cytosine pairs only with guanine, through three hydrogen bonds. This arrangement of two nucleotides bound together across the double helix (from six-membered ring to six-membered ring) is called a Watson-Crick base pair. DNA with high guanine-cytosine (GC) content is more stable than DNA with low GC content. A Hoogsteen base pair, in which a six-membered ring hydrogen-bonds to a five-membered ring, is a rare alternative mode of base pairing. Because hydrogen bonds are not covalent, they can be broken and re-formed relatively easily. The two strands of a double helix can therefore be pulled apart like a zipper, either by mechanical force or by high temperature. As a result of this complementarity, all the information in the double-stranded sequence of a DNA helix is duplicated on each strand, which is vital for DNA replication. The reversible and specific interaction between complementary base pairs is critical for all functions of DNA in living organisms.
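
As a rough illustration of the hydrogen-bond counts given above (two per A-T pair, three per G-C pair), the following Python sketch tallies the Watson-Crick hydrogen bonds in a fully paired duplex from the sequence of one strand. The function name and the example sequences are hypothetical, and the count ignores stacking interactions and any non-canonical pairing.

```python
# Hydrogen bonds contributed by the base pair that each base forms:
# A and T sit in A-T pairs (2 bonds); C and G sit in G-C pairs (3 bonds).
H_BONDS_PER_PAIR = {"A": 2, "T": 2, "C": 3, "G": 3}


def hydrogen_bonds(strand: str) -> int:
    """Count Watson-Crick hydrogen bonds in a fully paired duplex.

    Only one strand is needed, because the partner base of each
    position is fixed by complementarity.
    """
    return sum(H_BONDS_PER_PAIR[base] for base in strand.upper())


print(hydrogen_bonds("TATAAT"))  # six A-T pairs
print(hydrogen_bonds("GCGCGC"))  # six G-C pairs
```

For two duplexes of equal length, the GC-rich one has more hydrogen bonds, in line with the greater stability of GC-rich DNA described in the text.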

Single-Stranded DNA (ssDNA) Versus Double-Stranded DNA (dsDNA)

The majority of DNA molecules consist of two polymer strands helically intertwined and held together by noncovalent bonds; this double-stranded (dsDNA) configuration is primarily sustained by intrastrand base stacking interactions, which are most robust for G,C stacks. These two strands can dissociate, a process termed melting, resulting in the formation of two single-stranded DNA (ssDNA) molecules. Melting is induced by high temperatures, low salt concentrations, and elevated pH levels (although low pH can also denature DNA, it is seldom employed due to DNA's instability from acid depurination).

The stability of the dsDNA configuration is contingent upon not only its GC content (the percentage of G,C base pairs) but also its specific sequence, given that stacking interactions are sequence-dependent, and its length, as longer molecules exhibit greater stability. Stability can be quantified through several methods; a prevalent approach involves determining the melting temperature (also referred to as the T_m value), which represents the temperature at which half of the double-stranded molecules transition into single-stranded molecules. This melting temperature is influenced by both ionic strength and DNA concentration. Consequently, the strength of the association between the two DNA strands within a double helix is dictated by both the percentage of GC base pairs and the overall length of the helix. Extended DNA helices possessing a high GC content exhibit more robust strand interactions, whereas shorter helices rich in AT content display weaker interactions. Biologically, regions of the DNA double helix requiring facile separation, such as the TATAAT Pribnow box found in certain promoters, typically feature a high AT content, thereby facilitating strand dissociation.
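
GC content, as defined above, is simple to compute. The sketch below is a minimal Python illustration; the function name is hypothetical, and the result is returned as a fraction rather than a percentage. It does not attempt to predict a melting temperature, which also depends on sequence, length, ionic strength, and DNA concentration.

```python
def gc_content(sequence: str) -> float:
    """Return the fraction of bases in the sequence that are G or C."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("sequence must not be empty")
    gc = sequence.count("G") + sequence.count("C")
    return gc / len(sequence)


# The Pribnow box mentioned in the text contains no G or C at all.
print(gc_content("TATAAT"))
# A hypothetical GC-rich stretch for comparison.
print(gc_content("GGCGCC"))
```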

Experimentally, the interaction strength can be quantified by ascertaining the melting temperature T_m required to disrupt half of the hydrogen bonds. Upon complete melting of all base pairs within a DNA double helix, the strands dissociate and persist in solution as two distinct, independent molecules. While these single-stranded DNA molecules lack a singular, universal conformation, certain configurations exhibit greater stability than others.

Quantity

In humans, the total female diploid nuclear genome per cell comprises 6.37 Gigabase pairs (Gbp), measures 208.23 cm in length, and weighs 6.51 picograms (pg). For males, the corresponding figures are 6.27 Gbp, 205.00 cm, and 6.41 pg. Individual DNA polymers, such as those found in chromosome 1, can encompass hundreds of millions of nucleotides. As the largest human chromosome, chromosome 1 contains approximately 220 million base pairs and would extend to approximately 85 mm in length if fully straightened.
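
The genome sizes and lengths quoted above can be cross-checked with a short calculation. The Python sketch below derives the length per base pair implied by the female and male figures; the function and variable names are hypothetical, and the only inputs are the numbers stated in the text.

```python
# (base pairs, length in cm) for the diploid nuclear genome, from the text.
GENOME_FIGURES = {
    "female": (6.37e9, 208.23),
    "male": (6.27e9, 205.00),
}

NM_PER_CM = 1e7  # 1 cm = 10,000,000 nm


def length_per_base_pair_nm(base_pairs: float, length_cm: float) -> float:
    """Return the average length per base pair, in nanometres."""
    return length_cm * NM_PER_CM / base_pairs


for label, (bp, cm) in GENOME_FIGURES.items():
    print(label, round(length_per_base_pair_nm(bp, cm), 3), "nm per base pair")
```

Both sets of figures imply roughly 0.327 nm per base pair, close to the 3.3 Å length of a single nucleotide unit quoted in the Properties section.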

In eukaryotes, in addition to nuclear DNA, cells contain mitochondrial DNA (mtDNA), which encodes certain proteins used by the mitochondria. The mtDNA is typically much smaller than nuclear DNA. For instance, human mitochondrial DNA forms closed circular molecules, each containing 16,569 base pairs and normally a full set of the mitochondrial genes. Each human mitochondrion contains, on average, approximately five such mtDNA molecules. Each human cell contains roughly 100 mitochondria, giving about 500 mtDNA molecules per cell. However, the number of mitochondria per cell varies by cell type: an egg cell can contain 100,000 mitochondria, corresponding to up to 1,500,000 copies of the mitochondrial genome, which can constitute up to 90% of the cell's DNA.
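
The mitochondrial figures above follow from simple multiplication, which the Python sketch below spells out. All inputs are the approximate values given in the text; the variable names are hypothetical.

```python
MTDNA_LENGTH_BP = 16_569          # base pairs per human mtDNA molecule
MTDNA_PER_MITOCHONDRION = 5       # approximate average
MITOCHONDRIA_PER_CELL = 100       # approximate figure for a human cell

# mtDNA molecules in a typical cell.
copies_per_cell = MTDNA_PER_MITOCHONDRION * MITOCHONDRIA_PER_CELL

# Total mitochondrial base pairs in that cell.
mtdna_bp_per_cell = copies_per_cell * MTDNA_LENGTH_BP

# Egg cell: the quoted upper figures imply this many copies per mitochondrion.
EGG_MITOCHONDRIA = 100_000
EGG_MTDNA_COPIES = 1_500_000
egg_copies_per_mitochondrion = EGG_MTDNA_COPIES // EGG_MITOCHONDRIA

print(copies_per_cell)
print(mtdna_bp_per_cell)
print(egg_copies_per_mitochondrion)
```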

Sense and Antisense Strands

A DNA sequence is designated as "sense" if its sequence corresponds to that of the messenger RNA (mRNA) transcript destined for protein translation. Conversely, the complementary sequence on the opposing strand is termed "antisense." It is possible for both sense and antisense sequences to be present on different segments of the same DNA strand, implying that both strands can harbor regions of both types. While antisense RNA sequences are generated in both prokaryotic and eukaryotic organisms, their precise functions remain incompletely elucidated. A prominent hypothesis suggests that antisense RNAs participate in gene expression regulation via RNA-RNA base pairing interactions.

The distinction between sense and antisense strands becomes less defined in certain DNA sequences found in prokaryotes and eukaryotes, and more frequently in plasmids and viruses, due to the presence of overlapping genes. Such instances involve DNA sequences performing a dual function, encoding one protein when transcribed from one strand and a distinct protein when transcribed in the reverse direction from the complementary strand. In bacteria, this genomic overlap might contribute to the regulation of gene transcription, whereas in viruses, overlapping genes serve to maximize the informational content within their compact genomes.

DNA Supercoiling

DNA can undergo a process known as supercoiling, where it becomes twisted akin to a rope. In its relaxed conformation, a DNA strand typically completes one turn around the double helix axis approximately every 10.4 base pairs. However, when DNA is subjected to twisting, the strands become either more tightly or more loosely wound. Twisting the DNA in the same direction as the helical turn results in positive supercoiling, which enhances the stability of base pairing. Conversely, twisting in the opposite direction leads to negative supercoiling, facilitating the separation of base pairs. Naturally occurring DNA predominantly exhibits a slight negative supercoiling, a state induced by enzymes known as topoisomerases. These enzymes are also crucial for alleviating the torsional stresses that arise in DNA strands during essential cellular processes like transcription and DNA replication.
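
The figure of about 10.4 base pairs per turn makes it straightforward to estimate how many helical turns a relaxed molecule contains. The Python sketch below is a minimal illustration; the function name and the 5,200-base-pair example length are hypothetical.

```python
BASE_PAIRS_PER_TURN = 10.4  # relaxed DNA, as stated in the text


def helical_turns(base_pairs: int) -> float:
    """Return the approximate number of helical turns in relaxed DNA."""
    if base_pairs < 0:
        raise ValueError("base_pairs must be non-negative")
    return base_pairs / BASE_PAIRS_PER_TURN


# A hypothetical 5,200 bp molecule makes roughly 500 turns when relaxed.
print(round(helical_turns(5200)))
```

Supercoiling changes how tightly the strands are wound relative to this relaxed value, becoming tighter for positive and looser for negative supercoiling.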

Alternative DNA Conformations

DNA can adopt numerous conformations, including the A-DNA, B-DNA, and Z-DNA forms. However, only B-DNA and Z-DNA have been directly observed within living organisms. The specific conformation assumed by DNA is influenced by several factors, including hydration levels, the DNA sequence itself, the degree and orientation of supercoiling, chemical modifications to the bases, the species and concentration of metal ions, and the presence of polyamines in the surrounding solution.

The first published reports of A-DNA and B-DNA X-ray diffraction patterns used analyses based on Patterson functions, which provided only limited structural information for oriented DNA fibers. In 1953, Wilkins et al. proposed an alternative analysis of the in vivo B-DNA X-ray diffraction-scattering patterns of highly hydrated DNA fibers in terms of squares of Bessel functions. In the same journal, James Watson and Francis Crick presented their molecular modeling analysis of the DNA X-ray diffraction patterns, which led them to propose the double helix structure.

Although the B-DNA form is predominant under physiological cellular conditions, it is not a singular, precisely defined conformation, but rather a family of analogous DNA structures that emerge in highly hydrated cellular environments. X-ray diffraction and scattering analyses reveal patterns indicative of molecular paracrystals exhibiting substantial structural disorder.

In contrast to B-DNA, the A-DNA conformation presents as a broader right-handed helix, characterized by a shallow, wide minor groove and a more constricted, deeper major groove. While the A-form typically manifests under non-physiological conditions within partially dehydrated DNA samples, its presence within cells can arise in hybrid DNA-RNA pairings and within enzyme-DNA complexes. DNA segments featuring chemically modified bases, such as through methylation, can undergo a more pronounced conformational shift, leading to the adoption of the Z-form. In this configuration, the strands coil around the helical axis in a left-handed spiral, which is contrary to the right-handed twist observed in the prevalent B-form. Such atypical structures are identifiable by specialized Z-DNA binding proteins and are potentially implicated in transcriptional regulatory processes.

Non-Canonical DNA Chemistries

Exobiologists have long hypothesized the existence of a "shadow biosphere," positing a terrestrial microbial ecosystem employing biochemical and molecular mechanisms fundamentally distinct from those of known life forms. Among these propositions was the concept of organisms utilizing arsenic as a substitute for phosphorus within their DNA. In 2010, a report suggested this possibility in the bacterium GFAJ-1; however, subsequent research disputed these findings, indicating that the bacterium actively inhibits arsenic incorporation into its DNA backbone and other biomolecules.

Quadruplex Structures

Specialized DNA regions, termed telomeres, are situated at the termini of linear chromosomes. Their primary role involves enabling cellular replication of chromosome ends via the enzyme telomerase, given that conventional DNA replication enzymes are unable to duplicate the extreme 3′ termini of chromosomes. Furthermore, these distinctive chromosomal caps safeguard DNA ends and prevent cellular DNA repair mechanisms from misinterpreting them as damage requiring correction. Human telomeres typically consist of single-stranded DNA segments comprising several thousand repetitions of the conserved TTAGGG sequence.

These guanine-rich sequences can contribute to chromosome end stabilization by assembling into stacked arrays of four-base units, diverging from the conventional base pairing observed in other DNA molecules. Specifically, four guanine bases coalesce to form a planar structure known as a guanine tetrad. These planar four-base units subsequently stack coaxially, generating a stable G-quadruplex architecture. Stabilization of these structures is achieved through hydrogen bonding among the base edges and the chelation of a metal ion centrally positioned within each four-base unit. Alternative configurations are also possible, wherein the central quartet of bases originates from either a single strand folded upon itself or from multiple distinct parallel strands, each contributing one base to the core structure.

Beyond these stacked configurations, telomeres also generate substantial looped architectures, referred to as telomere loops or T-loops. Within these loops, the single-stranded DNA coils into an extended circular formation, maintained by telomere-binding proteins. At the distal end of the T-loop, the single-stranded telomeric DNA associates with a double-stranded DNA region, achieved by the telomere strand invading and disrupting the double helix, subsequently base-pairing with one of the two existing strands. This resulting triple-stranded entity is designated a displacement loop, or D-loop.

Branched DNA

DNA fraying manifests when non-complementary segments are present at the terminus of an otherwise complementary double-stranded DNA molecule. Conversely, branched DNA can form upon the introduction of a third DNA strand possessing contiguous regions capable of hybridizing with the frayed segments of the pre-existing double helix. While the most straightforward instance of branched DNA comprises merely three DNA strands, more intricate complexes incorporating additional strands and multiple branching points are also feasible. Branched DNA finds application in nanotechnology for the fabrication of geometric structures.

Artificial Bases

Several artificial nucleobases have been synthesized and successfully incorporated into an eight-base DNA analogue named Hachimoji DNA. Designated S, B, P, and Z, these artificial bases pair with each other in a predictable way (S–B and P–Z), maintain the double helix structure of DNA, and can be transcribed into RNA. Their existence suggests that there may be nothing unique about the four natural nucleobases that evolved on Earth. On the other hand, DNA is closely related to RNA, which acts not only as a transcript of DNA but also as a molecular machine performing many cellular tasks. To do so, RNA must fold into specific structures. Research indicates that four bases are the minimum needed for RNA to form all possible structures; a greater number is feasible but would contradict the biological principle of parsimony.

Acidity

The phosphate groups within DNA confer acidic properties akin to phosphoric acid, classifying DNA as a strong acid. At typical cellular pH, DNA undergoes complete ionization, releasing protons and consequently imparting negative charges to its phosphate groups. These negative charges safeguard DNA from hydrolytic degradation by repelling nucleophiles that could otherwise initiate hydrolysis.

Macroscopic Appearance

When extracted from cells, pure DNA manifests as white, fibrous aggregates.

Chemical Modifications and DNA Packaging Alterations

Base Modifications and DNA Packaging

Gene expression is modulated by the organizational structure of DNA within chromosomes, known as chromatin. Base modifications contribute to this packaging, with areas exhibiting minimal or absent gene expression typically characterized by elevated levels of cytosine methylation. The packaging of DNA and its impact on gene expression can also arise from covalent modifications to the histone protein core, around which DNA is coiled within chromatin, or through remodeling processes executed by chromatin remodeling complexes. Furthermore, a reciprocal interaction exists between DNA methylation and histone modification, enabling their coordinated influence on chromatin structure and gene expression.

For instance, cytosine methylation yields 5-methylcytosine, a crucial factor in the X-inactivation of chromosomes. The mean methylation level differs across organisms; for example, the nematode Caenorhabditis elegans exhibits no cytosine methylation, whereas vertebrates possess higher levels, with up to 1% of their DNA comprising 5-methylcytosine. Notwithstanding its significance, 5-methylcytosine can undergo deamination, resulting in a thymine base, which renders methylated cytosines especially susceptible to mutations. Additional base modifications encompass adenine methylation in bacteria, the occurrence of 5-hydroxymethylcytosine in neural tissue, and the glycosylation of uracil to form the "J-base" in kinetoplastids.

DNA Damage

DNA can be damaged by many kinds of mutagens, which change the DNA sequence. Mutagens include oxidizing agents, alkylating agents, and high-energy electromagnetic radiation such as ultraviolet light and X-rays. The type of DNA damage produced depends on the type of mutagen. For example, ultraviolet light can damage DNA by producing thymine dimers, which are covalent cross-links between adjacent pyrimidine bases. Oxidants such as free radicals or hydrogen peroxide, by contrast, produce several forms of damage, including base modifications, particularly of guanosine, and double-strand breaks. A typical human cell contains about 150,000 bases that have suffered oxidative damage. Of these oxidative lesions, the most dangerous are double-strand breaks, because they are difficult to repair and can produce point mutations, insertions, and deletions in the DNA sequence, as well as chromosomal translocations. Such mutations are implicated in carcinogenesis. Because of inherent limits in DNA repair mechanisms, all humans would eventually develop cancer if they lived long enough.

Naturally occurring DNA damage, caused by normal cellular processes that produce reactive oxygen species and by the hydrolytic activity of cellular water, also occurs frequently. Although most of this damage is repaired, some may remain in any cell despite the action of repair processes. The remaining damage accumulates with age in mammalian postmitotic tissues, and this accumulation is considered a significant contributing factor to aging.

Numerous mutagenic agents are capable of inserting themselves into the interstitial space between adjacent base pairs, a process termed intercalation. The majority of intercalating compounds are characterized by their aromatic and planar molecular structures, with notable examples encompassing ethidium bromide, acridines, daunomycin, and doxorubicin. Successful intercalation necessitates the separation of base pairs, which subsequently distorts the DNA strands through the unwinding of the double helix. This structural alteration impedes both transcriptional and DNA replicative processes, leading to cellular toxicity and the induction of mutations. Consequently, DNA intercalators can function as carcinogens, and in specific instances, such as with thalidomide, as teratogens. Conversely, certain substances like benzo[a]pyrene diol epoxide and aflatoxin generate DNA adducts, which are responsible for inducing replication errors. Despite their detrimental effects, the capacity of similar toxins to inhibit DNA transcription and replication is therapeutically exploited in chemotherapy to suppress the proliferation of rapidly dividing cancer cells.

Biological Functions

Deoxyribonucleic acid (DNA) typically manifests as linear chromosomes within eukaryotic organisms and as circular chromosomes in prokaryotic cells. The complete complement of chromosomes within a cell constitutes its genome; for instance, the human genome comprises approximately 3 billion base pairs of DNA organized into 46 distinct chromosomes. The genetic information encoded within DNA resides in the specific sequence of segments known as genes. The accurate transmission of this genetic information, contained within genes, is facilitated by the principle of complementary base pairing. During transcription, for example, when a cell accesses the information within a gene, the DNA sequence is transcribed into a complementary RNA sequence, a process driven by the specific attraction between DNA bases and their corresponding RNA nucleotides. Subsequently, this RNA transcript is typically utilized to synthesize a corresponding protein sequence through a process termed translation, which similarly relies on specific interactions between RNA nucleotides. Alternatively, a cell can duplicate its entire genetic information through a mechanism known as DNA replication. While the intricate specifics of these functions are elaborated elsewhere, the present discussion emphasizes the molecular interactions between DNA and other cellular components that orchestrate genomic activity.

Genomes

Genomic DNA undergoes a highly organized and compact packaging process, known as DNA condensation, enabling it to occupy the limited cellular volume. Within eukaryotic cells, DNA is primarily localized within the cell nucleus, with minor quantities also present in mitochondria and chloroplasts. Conversely, in prokaryotic organisms, the DNA is contained within an irregularly shaped cytoplasmic region referred to as the nucleoid.

The functional genetic information within a genome is distributed across genes, regulatory sequences, origins of replication, centromeres, telomeres, and specific segments crucial for establishing the three-dimensional architecture of chromatin. In numerous complex eukaryotic organisms, merely a minor proportion of the entire genomic sequence is dedicated to these diverse functional elements. For instance, in humans, less than 10% of the genome possesses a clearly defined functional role, with the remaining 90% often characterized as non-coding or "junk" DNA.

Transcription and Translation

A gene is defined as a specific DNA sequence that harbors genetic information capable of influencing an organism's phenotype. Within the confines of a gene, the precise arrangement of bases along a DNA strand dictates the sequence of a messenger RNA (mRNA) molecule, which subsequently specifies one or more protein sequences. The intricate correlation between the nucleotide sequences of genes and the amino acid sequences of proteins is governed by the principles of translation, collectively referred to as the genetic code. This genetic code is composed of three-letter "words," designated as codons, each formed by a specific sequence of three nucleotides (e.g., ACT, CAG, TTT).

During transcription, the codons present within a gene are accurately transcribed into messenger RNA by the enzyme RNA polymerase. Subsequently, this RNA transcript is translated by a ribosome, which interprets the RNA sequence through complementary base-pairing between the messenger RNA and transfer RNA molecules, each carrying a specific amino acid. Given the four distinct bases and their arrangement in three-letter combinations, a total of 64 unique codons are possible (4³ combinations). These codons collectively specify the twenty standard amino acids, resulting in degeneracy where most amino acids are encoded by more than one codon. Furthermore, three specific "stop" or "nonsense" codons (TAG, TAA, and TGA in DNA; UAG, UAA, and UGA in mRNA) signal the termination of the coding region.
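The codon arithmetic and the reading of a coding sequence can be illustrated with a short Python sketch. The codon table here is deliberately partial (the full standard code has 64 entries), and the function names and the example sequence are hypothetical, chosen only for illustration.

```python
from itertools import product

BASES = "ACGT"

# All three-letter words over four bases: 4**3 = 64 codons.
ALL_CODONS = ["".join(p) for p in product(BASES, repeat=3)]

# Stop codons as written on the DNA coding strand.
STOP_CODONS = {"TAA", "TAG", "TGA"}

# Partial codon table (coding-strand DNA -> one-letter amino acid code).
# Only a few entries of the standard genetic code are listed.
PARTIAL_CODE = {
    "ATG": "M",                                       # methionine
    "TTT": "F", "TTC": "F",                           # phenylalanine
    "CAG": "Q",                                       # glutamine
    "ACT": "T",                                       # threonine
    "GGA": "G", "GGC": "G", "GGG": "G", "GGT": "G",   # glycine (degenerate)
}


def transcribe(coding_strand: str) -> str:
    """Return the mRNA sequence: same as the coding strand, with U for T."""
    return coding_strand.upper().replace("T", "U")


def translate(coding_strand: str) -> str:
    """Read codons left to right until a stop codon or the end of the input."""
    seq = coding_strand.upper()
    protein = []
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in STOP_CODONS:
            break
        # '?' marks codons that are missing from the partial table above.
        protein.append(PARTIAL_CODE.get(codon, "?"))
    return "".join(protein)


gene = "ATGTTTCAGACTGGATAA"   # hypothetical example sequence
print(len(ALL_CODONS))        # 64
print(transcribe(gene))       # AUGUUUCAGACUGGAUAA
print(translate(gene))        # MFQTG
```

The four glycine entries show degeneracy directly: several codons map to the same amino acid.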

Replication

Cellular division is fundamental for organismal growth; however, during cell division, the genome's DNA must be accurately replicated to ensure that daughter cells inherit identical genetic information from the parent. The double-helical configuration of DNA facilitates a straightforward mechanism for its replication. This process involves the separation of the two strands, followed by the enzymatic synthesis of a complementary DNA sequence for each strand, catalyzed by DNA polymerase. DNA polymerase constructs the new strand by identifying and covalently linking the appropriate complementary base to the original template strand. Given that DNA polymerases can only elongate a DNA strand in the 5′ to 3′ direction, distinct mechanisms are employed to replicate the antiparallel strands of the double helix. Consequently, the nucleotide sequence of the template strand dictates the sequence of the newly synthesized strand, resulting in a precise duplication of the cellular DNA.
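As a minimal sketch of how the template dictates the new strand, the following Python models a duplex as a pair of strands, each written 5′ to 3′. The representation and the function names are assumptions made for illustration; the sketch ignores the enzymatic details (primers, leading and lagging strands, proofreading).

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def new_strand(template: str) -> str:
    """Return the strand synthesized on `template`, written 5'->3'.

    Each template base is paired with its complement. The result is
    reversed because the two strands of a duplex are antiparallel.
    """
    return "".join(COMPLEMENT[base] for base in reversed(template.upper()))


def replicate(duplex: tuple[str, str]) -> tuple[tuple[str, str], tuple[str, str]]:
    """Separate the strands and build a complementary partner for each."""
    top, bottom = duplex
    daughter_1 = (top, new_strand(top))        # old top strand + new partner
    daughter_2 = (new_strand(bottom), bottom)  # new partner + old bottom strand
    return daughter_1, daughter_2


parent = ("ATGGCA", "TGCCAT")   # hypothetical duplex, both strands 5'->3'
d1, d2 = replicate(parent)
print(d1 == parent and d2 == parent)   # True
```

Each daughter duplex keeps one strand from the parent and gains one newly built strand, yet both carry the same sequence as the parent.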

Extracellular Nucleic Acids

Extracellular DNA (eDNA), predominantly released through cellular demise, is widely distributed throughout various environments. Concentrations can reach up to 2 μg/L in soil and 88 μg/L in natural aquatic systems. Several potential functions have been attributed to eDNA, including its involvement in horizontal gene transfer, its role as a nutrient source, and its capacity to serve as a buffer for the recruitment or titration of ions or antibiotics. Furthermore, eDNA functions as a critical component of the extracellular matrix within the biofilms of numerous bacterial species. Its roles in biofilms encompass acting as a recognition factor to modulate the attachment and dispersal of specific cell types, contributing to biofilm structural integrity, and enhancing resistance to biological stressors.

Cell-free fetal DNA, present in maternal blood, can be sequenced to yield substantial information regarding the developing fetus.

Designated as environmental DNA, eDNA has gained prominence in the natural sciences as a valuable survey tool for ecological studies, facilitating the monitoring of species movement and presence across aquatic, aerial, and terrestrial environments, as well as the assessment of regional biodiversity.

Interactions with Proteins

The diverse functions of DNA are fundamentally dependent on its interactions with proteins. These protein-DNA interactions can be either non-specific or sequence-specific. Among the enzymes that bind to DNA, polymerases, which are responsible for replicating the DNA base sequence during transcription and DNA replication, are of paramount importance.

DNA-Binding Proteins

Structural proteins that associate with DNA represent prominent examples of non-specific DNA-protein interactions. Within chromosomes, DNA is organized into complexes with various structural proteins. These proteins facilitate the compaction of DNA into a structure known as chromatin. In eukaryotic cells, this organization entails DNA binding to a complex of small, basic proteins termed histones, whereas prokaryotic cells utilize multiple protein types for similar structural roles. Histones assemble into a disk-shaped complex, the nucleosome, around which two complete turns of double-stranded DNA are wrapped. These non-specific interactions arise from the basic residues within histones forming ionic bonds with the acidic sugar-phosphate backbone of DNA, thereby exhibiting minimal dependence on the specific base sequence. Post-translational modifications of these basic amino acid residues, such as methylation, phosphorylation, and acetylation, are common. Such modifications modulate the affinity between DNA and histones, consequently influencing DNA accessibility for transcription factors and altering transcriptional rates. Additional non-specific DNA-binding proteins found in chromatin include high-mobility group (HMG) proteins, which preferentially bind to bent or distorted DNA conformations. HMG proteins play a crucial role in inducing bends in nucleosome arrays and facilitating their organization into higher-order chromosomal structures.

A distinct category of DNA-binding proteins comprises those that specifically interact with single-stranded DNA. In humans, replication protein A (RPA) is the most extensively characterized member of this family, participating in processes involving double helix unwinding, such as DNA replication, recombination, and repair. These proteins appear to stabilize single-stranded DNA, preventing the formation of secondary structures like stem-loops and protecting it from nuclease-mediated degradation.

In contrast, other proteins have evolved to bind to particular DNA sequences. The most intensively studied of these are the transcription factors, proteins that regulate gene transcription. Each transcription factor binds to one particular set of DNA sequences and activates or represses the transcription of genes that carry these sequences near their promoters. Transcription factors do this in two ways. First, they can bind RNA polymerase, the enzyme responsible for transcription, either directly or through intermediary proteins; this positions the polymerase at the promoter and allows it to begin transcription. Second, they can bind enzymes that modify the histones at the promoter, which changes the accessibility of the DNA template to the polymerase.

Given that these DNA target sequences are distributed across an organism's entire genome, alterations in the activity of a single transcription factor type can exert widespread effects on numerous genes. Consequently, these proteins frequently serve as focal points for signal transduction pathways, which orchestrate cellular responses to environmental stimuli, differentiation, and developmental processes. The precise specificity of transcription factor-DNA interactions arises from the proteins forming multiple contacts with the exposed edges of DNA bases, thereby enabling them to discern the underlying DNA sequence. A majority of these base-specific interactions occur within the major groove, a region characterized by enhanced accessibility of the DNA bases.

DNA-Modifying Enzymes

Nucleases and Ligases

Nucleases are a class of enzymes that cleave DNA strands through the catalytic hydrolysis of phosphodiester bonds. Exonucleases specifically hydrolyze nucleotides from the termini of DNA strands, whereas endonucleases perform internal cleavages within the strands. In molecular biology, restriction endonucleases, which precisely cleave DNA at defined recognition sequences, are among the most commonly employed nucleases. For example, the EcoRV enzyme identifies the 6-base sequence 5′-GATATC-3′ and executes a cleavage at a specific point within this sequence. Naturally, these enzymes function as a defense mechanism in bacteria against bacteriophage infection, degrading phage DNA upon its entry into the bacterial cell as components of the restriction-modification system. In biotechnological applications, these sequence-specific nucleases are indispensable tools for molecular cloning and DNA fingerprinting.
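A sequence-specific digest can be sketched in a few lines of Python. The sketch assumes a linear molecule and models only one strand; it uses the EcoRV recognition sequence given above and places the cut in the middle of the site (GAT|ATC), which yields blunt ends. The function name and the example sequence are hypothetical.

```python
SITE = "GATATC"   # EcoRV recognition sequence, written 5'->3'
CUT_OFFSET = 3    # cut position within the site: GAT|ATC (blunt ends)


def digest(dna: str, site: str = SITE, cut_offset: int = CUT_OFFSET) -> list[str]:
    """Cut a linear DNA sequence at every occurrence of `site`."""
    dna = dna.upper()
    fragments = []
    start = 0
    pos = dna.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(dna[start:cut])   # fragment up to the cut point
        start = cut
        pos = dna.find(site, pos + 1)      # look for the next site
    fragments.append(dna[start:])          # remainder after the last cut
    return fragments


print(digest("AAGATATCCCGGGATATCTT"))
# ['AAGAT', 'ATCCCGGGAT', 'ATCTT']
```

Comparing the fragment lengths produced from different DNA samples is the basic idea behind restriction-based fingerprinting.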

DNA ligases are enzymes capable of rejoining severed or fragmented DNA strands. These ligases play a crucial role in lagging strand DNA replication, where they covalently link the nascent, short DNA segments generated at the replication fork to form a continuous DNA template copy. Furthermore, they are integral to DNA repair mechanisms and genetic recombination processes.

Topoisomerases and Helicases

Topoisomerases are a class of enzymes characterized by both nuclease and ligase enzymatic activities. These proteins modulate the degree of supercoiling within DNA. Certain topoisomerases function by transiently cleaving the DNA helix, permitting a segment to rotate and consequently alleviating supercoiling, after which the enzyme reseals the DNA break. Conversely, other topoisomerase subtypes can cleave one DNA helix, facilitate the passage of a second DNA strand through the resulting gap, and subsequently rejoin the helix. Topoisomerases are indispensable for numerous DNA-dependent cellular processes, including DNA replication and transcription.

Helicases are proteins classified as molecular motors. These enzymes harness the chemical energy derived from nucleoside triphosphates, primarily adenosine triphosphate (ATP), to disrupt hydrogen bonds between complementary bases, thereby unwinding the DNA double helix into its constituent single strands. Helicases are critical for the majority of cellular processes necessitating enzymatic access to the DNA bases.

Polymerases

Polymerases are enzymes responsible for synthesizing polynucleotide chains from nucleoside triphosphates. The resulting product sequence is determined by an existing polynucleotide chain, referred to as a template. Their enzymatic mechanism involves the sequential addition of nucleotides to the 3′ hydroxyl group at the terminus of the elongating polynucleotide chain. Consequently, all polymerases operate exclusively in a 5′ to 3′ synthesis direction. Within the enzyme's active site, the incoming nucleoside triphosphate forms complementary base pairs with the template, enabling polymerases to accurately synthesize a strand complementary to the template. Polymerases are categorized based on the specific type of template they utilize.

During DNA replication, DNA-dependent DNA polymerases synthesize copies of DNA polynucleotide chains. To preserve biological information, precise complementarity of base sequences between each newly synthesized copy and the template strand is crucial. Many DNA polymerases exhibit proofreading capabilities. This mechanism enables the polymerase to identify infrequent errors in the synthesis reaction through the absence of proper base pairing between misincorporated nucleotides. Upon mismatch detection, a 3′ to 5′ exonuclease activity is initiated, excising the erroneous base. Within most organisms, DNA polymerases operate as constituents of a substantial complex termed the replisome, which incorporates numerous accessory subunits like DNA clamps and helicases.

RNA-dependent DNA polymerases constitute a specialized class of polymerases that transcribe RNA sequences into DNA. Notable examples include reverse transcriptase, a viral enzyme crucial for retroviral cellular infection, and telomerase, which is indispensable for telomere replication. Specifically, HIV reverse transcriptase facilitates the replication of the AIDS virus. Telomerase is distinctive among polymerases due to the integration of its own RNA template within its structural composition. This enzyme catalyzes the synthesis of telomeres at chromosomal termini. Telomeres serve to inhibit the fusion of adjacent chromosomal ends and safeguard these termini from degradation.

Transcription, a process mediated by DNA-dependent RNA polymerase, involves the synthesis of an RNA strand from a DNA template. Initiation of gene transcription occurs when RNA polymerase associates with a specific DNA sequence known as a promoter, subsequently unwinding the DNA duplex. The enzyme then synthesizes a messenger RNA transcript complementary to the gene sequence until it encounters a DNA region designated as the terminator, at which point it ceases transcription and dissociates from the DNA. Similar to human DNA-dependent DNA polymerases, RNA polymerase II—the enzyme responsible for transcribing the majority of genes within the human genome—functions within a substantial protein complex comprising numerous regulatory and accessory subunits.

Genetic Recombination

Typically, a DNA helix maintains independence from other DNA segments. In human cells, distinct chromosomes are spatially segregated within the nucleus into designated "chromosome territories." This spatial partitioning of chromosomes is critical for DNA's role as a stable information repository. One of the infrequent instances of chromosomal interaction is during chromosomal crossover, a process integral to sexual reproduction and genetic recombination. Chromosomal crossover involves the breakage of two DNA helices, followed by the reciprocal exchange of segments and subsequent rejoining.

Genetic recombination facilitates the exchange of genetic information between chromosomes, thereby generating novel gene combinations. This process enhances the efficacy of natural selection and contributes significantly to the accelerated evolution of novel proteins. Furthermore, genetic recombination plays a role in DNA repair mechanisms, especially in cellular responses to double-strand breaks.

Homologous recombination represents the predominant form of chromosomal crossover, characterized by the involvement of two chromosomes possessing highly similar sequences. Conversely, non-homologous recombination poses a risk to cellular integrity, potentially leading to chromosomal translocations and various genetic abnormalities. The enzymatic catalysis of recombination is performed by recombinases, exemplified by RAD51. The initial stage of recombination involves the induction of a double-stranded break, which can result from endonuclease activity or DNA damage. Subsequently, a sequence of steps, partially catalyzed by the recombinase, culminates in the joining of the two helices via at least one Holliday junction, wherein a single-strand segment from each helix anneals to the complementary strand of the opposing helix. This Holliday junction, a tetrahedral structure, is capable of migrating along the chromosomal pair, facilitating strand exchange. The recombination process concludes with the cleavage of the junction and subsequent re-ligation of the liberated DNA. During recombination, only DNA strands exhibiting identical polarity undergo exchange. Cleavage occurs via two distinct mechanisms: east-west and north-south. North-south cleavage involves nicks in both DNA strands, whereas east-west cleavage preserves one DNA strand intact. The establishment of a Holliday junction during recombination is instrumental in fostering genetic diversity, enabling chromosomal gene exchange, and facilitating the expression of wild-type viral genomes.

Evolution

Deoxyribonucleic acid (DNA) stores the genetic information essential for the functioning, growth, and reproduction of all life forms. Nevertheless, the duration of DNA's involvement in this capacity throughout life's 4-billion-year history remains uncertain, given the hypothesis that primordial life forms might have utilized RNA as their genetic material. RNA could have served as a pivotal component of early cellular metabolism, possessing the dual capacity to transmit genetic information and facilitate catalysis through ribozymes. This hypothetical ancient RNA world, where nucleic acids fulfilled both catalytic and genetic roles, may have influenced the development of the contemporary genetic code, which is founded on four nucleotide bases. Such an evolutionary trajectory would arise from a balance between a limited number of bases, which enhances replication accuracy, and a greater number of bases, which improves the catalytic efficiency of ribozymes. Despite these theoretical constructs, direct evidence for ancient genetic systems is absent, primarily because DNA recovery from most fossils is unfeasible; DNA persists in the environment for under one million years and progressively degrades into short fragments when in solution. While assertions of older DNA have been presented, notably a report detailing the isolation of a viable bacterium from a 250-million-year-old salt crystal, these claims are subject to considerable debate.

The building blocks of DNA, including adenine, guanine, and related organic molecules, may have formed extraterrestrially in outer space. Complex organic compounds of DNA and RNA, including uracil, cytosine, and thymine, have also been synthesized in the laboratory under conditions mimicking those found in outer space, starting from precursor chemicals such as pyrimidine, which is found in meteorites. Pyrimidine, like polycyclic aromatic hydrocarbons (PAHs), the most carbon-rich chemicals found in the universe, may have formed in red giant stars or in interstellar dust and gas clouds.

Ancient DNA has been successfully extracted from prehistoric organisms, enabling direct observation of genomic evolution over significant timescales. This includes DNA from extinct species, such as the woolly mammoth, dating back millions of years.

Technological Applications

Genetic Engineering

Methods have been developed to purify DNA from organisms, such as phenol-chloroform extraction, and to manipulate it in the laboratory, such as restriction digests and the polymerase chain reaction. Modern biology and biochemistry make intensive use of these techniques in recombinant DNA technology. Recombinant DNA is an artificially constructed DNA sequence assembled from segments of other DNA sequences. Such constructs can be introduced into organisms as plasmids or, in an appropriate format, by means of a viral vector. The resulting genetically modified organisms can be used to produce products such as recombinant proteins for medical research, or can be grown in agriculture.

DNA Profiling

Forensic scientists leverage DNA obtained from biological samples—such as blood, semen, skin, saliva, or hair—recovered from crime scenes to establish a genetic match to an individual, potentially identifying a perpetrator. This procedure is formally designated as DNA profiling, also known as DNA fingerprinting. The technique involves comparing the lengths of polymorphic regions within repetitive DNA sequences, specifically short tandem repeats and minisatellites, among different individuals. Generally, this methodology offers exceptional reliability for identifying corresponding DNA. Nevertheless, the identification process can become intricate if the crime scene exhibits contamination with DNA from multiple individuals. DNA profiling was pioneered in 1984 by the British geneticist Sir Alec Jeffreys and was first applied in forensic science to secure the conviction of Colin Pitchfork in the 1988 Enderby murders case.
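The comparison of repeat lengths can be illustrated with a simplified Python sketch. It assumes that the sequence of each repeat region is already known and that a profile is a mapping from a locus name to the two allele lengths measured in repeat units; the locus names, motif, and sample data are hypothetical, and real profiling measures fragment lengths rather than reading sequences like this.

```python
import re


def repeat_count(sequence: str, motif: str) -> int:
    """Return the longest uninterrupted run of `motif`, in repeat units."""
    runs = re.findall(f"(?:{re.escape(motif)})+", sequence.upper())
    return max((len(run) // len(motif) for run in runs), default=0)


def profiles_match(p: dict, q: dict) -> bool:
    """Two profiles match if every locus has the same pair of allele lengths."""
    return p.keys() == q.keys() and all(
        sorted(p[locus]) == sorted(q[locus]) for locus in p
    )


# One allele with seven copies of the tetranucleotide GATA.
allele = "CCTA" + "GATA" * 7 + "TTGC"
print(repeat_count(allele, "GATA"))   # 7

crime_scene = {"locus_1": (7, 9), "locus_2": (12, 12)}
suspect_a = {"locus_1": (9, 7), "locus_2": (12, 12)}
suspect_b = {"locus_1": (7, 8), "locus_2": (12, 12)}
print(profiles_match(crime_scene, suspect_a))   # True
print(profiles_match(crime_scene, suspect_b))   # False
```

The order of the two alleles at a locus is ignored, since a measurement does not reveal which parent each allele came from.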

Advances in forensic science, particularly the ability to obtain genetic matches from minute samples of blood, skin, saliva, or hair, have led to the re-examination of many historical cases. Evidence that could not be recovered scientifically at the time of the original investigation can now be uncovered. Combined with the abolition of double jeopardy laws in some jurisdictions, this allows cases to be reopened where earlier trials failed to produce enough evidence to convince a jury. People charged with serious crimes may be required to provide a DNA sample for matching purposes. The most obvious defense against forensically obtained DNA matches is to claim that cross-contamination of evidence has occurred, which has led to strict handling procedures in new cases of serious crime.

DNA profiling serves as a successful method for the positive identification of victims in mass casualty incidents, human remains from severe accidents, and individuals within mass war graves, primarily through familial matching.

DNA profiling is also used in paternity testing to determine whether someone is the biological parent or grandparent of a child, typically giving a probability of parentage of 99.99% when the alleged parent is biologically related to the child. Testing is normally carried out after birth, but newer methods also allow paternity testing during pregnancy.

DNA Enzymes and Catalytic DNA

Deoxyribozymes, alternatively known as DNAzymes or catalytic DNA, were initially identified in 1994. These are predominantly single-stranded DNA sequences obtained from extensive libraries of random DNA sequences via a combinatorial strategy termed in vitro selection or systematic evolution of ligands by exponential enrichment (SELEX). DNAzymes facilitate a diverse array of chemical transformations, including RNA-DNA cleavage, RNA-DNA ligation, amino acid phosphorylation-dephosphorylation, and carbon-carbon bond formation. Their catalytic efficiency can accelerate reaction rates by up to 100 billion-fold compared to uncatalyzed reactions. The most thoroughly investigated category of DNAzymes comprises RNA-cleaving variants, which have applications in detecting various metal ions and in the development of therapeutic agents. Numerous metal-specific DNAzymes have been documented, such as the GR-5 DNAzyme (lead-specific), the CA1-3 DNAzymes (copper-specific), the 39E DNAzyme (uranyl-specific), and the NaA43 DNAzyme (sodium-specific). Notably, the NaA43 DNAzyme, exhibiting over 10,000-fold selectivity for sodium compared to other metal ions, has been utilized to construct a real-time intracellular sodium sensor.

Bioinformatics

Bioinformatics encompasses the creation of methodologies for the storage, data mining, searching, and manipulation of biological information, specifically including DNA nucleic acid sequence data. This field has spurred significant advancements in computer science, particularly in string searching algorithms, machine learning, and database theory. String searching or matching algorithms, designed to locate instances of a specific sequence of characters within a larger sequence, were developed to identify particular nucleotide sequences. DNA sequences can be aligned with other DNA sequences to ascertain homologous regions and pinpoint distinguishing mutations. These methodologies, especially multiple sequence alignment, are instrumental in investigating phylogenetic relationships and protein function. Large datasets comprising entire genomic DNA sequences, such as those generated by the Human Genome Project, are challenging to interpret without annotations that delineate gene locations and regulatory elements on each chromosome. Gene finding algorithms enable the identification of DNA sequence regions exhibiting characteristic patterns associated with protein- or RNA-coding genes, thereby allowing researchers to forecast the existence of specific gene products and their potential functions within an organism prior to experimental isolation. Furthermore, comparative genomics, involving the comparison of entire genomes, can elucidate the evolutionary history of a particular organism and facilitate the analysis of intricate evolutionary events.
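Two of the ideas above, string searching and sequence alignment, can be sketched briefly in Python. The search below is the naive method, and the alignment score is computed with the Needleman-Wunsch dynamic-programming recurrence under a simple scoring scheme (match +1, mismatch -1, gap -1) chosen here as an assumption; production tools use faster search algorithms and more elaborate scoring.

```python
def find_all(text: str, pattern: str) -> list[int]:
    """Naive string search: every start index of `pattern` (overlaps allowed)."""
    width = len(pattern)
    return [i for i in range(len(text) - width + 1) if text[i:i + width] == pattern]


def alignment_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Global alignment score (Needleman-Wunsch) with a linear gap penalty."""
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diagonal, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


print(find_all("GATATATGCATATACTT", "ATAT"))   # [1, 3, 9]
print(alignment_score("GATTACA", "GATTACA"))   # 7
print(alignment_score("GATTACA", "GATCACA"))   # 5
```

A single substitution lowers the score from 7 to 5, which is how an alignment highlights a point mutation between two otherwise homologous sequences.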

DNA Nanotechnology

DNA nanotechnology leverages the distinctive molecular recognition capabilities of DNA and other nucleic acids to construct self-assembling branched DNA complexes possessing advantageous characteristics. In this context, DNA functions as a structural component rather than solely as a repository of biological information. This application has facilitated the development of two-dimensional periodic lattices (both tile-based and utilizing the DNA origami method) and three-dimensional polyhedral structures. Demonstrations have also included nanomechanical devices and algorithmic self-assembly, with these DNA constructs serving as templates for organizing other molecules, such as gold nanoparticles and streptavidin proteins. Moreover, DNA and other nucleic acids form the foundation of aptamers, which are synthetic oligonucleotide ligands designed for specific target molecules and employed across various biotechnological and biomedical fields.

History and Anthropology

The accumulation and subsequent inheritance of mutations within DNA provide a historical record, enabling geneticists to deduce the evolutionary trajectories and phylogenetic relationships of organisms through comparative analysis of DNA sequences. Phylogenetics, as a discipline, constitutes a formidable instrument in evolutionary biology. Furthermore, intra-species DNA sequence comparisons allow population geneticists to reconstruct the historical dynamics of specific populations. Such methodologies are applicable across diverse research domains, from ecological genetics to anthropology.

Information Storage Potential

DNA exhibits substantial potential as an information storage medium due to its significantly greater storage density compared to conventional electronic devices. Nevertheless, its widespread practical application is currently hindered by considerable expenses, protracted read and write latencies, and inadequate reliability.
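One way to see where the density comes from is that each of the four bases can carry two bits. The Python sketch below uses a direct two-bits-per-base mapping that is purely illustrative; the mapping and the function names are assumptions, and practical schemes add error correction and avoid problematic sequences such as long runs of a single base.

```python
# Hypothetical mapping: two bits per nucleotide.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a DNA string, four bases per byte."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(dna: str) -> bytes:
    """Invert `encode`: four bases back into one byte."""
    bits = "".join(BASE_TO_BITS[base] for base in dna)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode(b"DNA")
print(strand)           # CACACATGCAAC
print(decode(strand))   # b'DNA'
```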

Historical Context

The initial isolation of DNA occurred in 1869, when Swiss physician Friedrich Miescher identified a microscopic substance within the pus of discarded surgical bandages. Given its cellular nuclear localization, he designated this substance 'nuclein'. Subsequently, in 1878, Albrecht Kossel successfully isolated the non-protein constituent of nuclein, identifying it as nucleic acid, and subsequently characterized its five primary nucleobases.

In 1909, Phoebus Levene identified the nucleotide unit of RNA, then called 'yeast nucleic acid', as a base, a sugar, and a phosphate. In 1929, Levene identified deoxyribose sugar in 'thymus nucleic acid', now known as DNA. Levene suggested that DNA consisted of a string of four nucleotide units linked together through the phosphate groups, a concept termed the 'tetranucleotide hypothesis'. He thought the chain was short and that the bases repeated in a fixed order. In 1927, Nikolai Koltsov proposed that inherited traits would be passed on through a 'giant hereditary molecule' made up of 'two mirror strands' that would replicate in a semi-conservative fashion, with each strand serving as a template. In 1928, Frederick Griffith discovered in his experiment that traits of the 'smooth' form of Pneumococcus could be transferred to the 'rough' form of the same bacterium by mixing heat-killed 'smooth' bacteria with live 'rough' bacteria. This system provided the first clear suggestion that DNA carries genetic information.

In 1933, Jean Brachet, through his investigations of virgin sea urchin eggs, proposed that DNA was localized within the cell nucleus, whereas RNA was exclusively situated in the cytoplasm. During this period, 'yeast nucleic acid' (RNA) was believed to be exclusive to plants, while 'thymus nucleic acid' (DNA) was considered specific to animals. The latter, DNA, was also hypothesized to be a tetramer primarily involved in buffering cellular pH. By 1937, William Astbury had generated the inaugural X-ray diffraction patterns, which provided evidence for a regular, ordered structure within DNA.

In 1943, Oswald Avery, collaborating with Colin MacLeod and Maclyn McCarty, conclusively identified DNA as the transforming principle, thereby substantiating Griffith's earlier hypothesis through what became known as the Avery–MacLeod–McCarty experiment. Concurrently, Erwin Chargaff formulated and disseminated a set of observations, subsequently termed Chargaff's rules, which stipulated that within the DNA of any organismal species, the molar quantity of guanine invariably approximates that of cytosine, and the molar quantity of adenine invariably approximates that of thymine.

By 1951, Alexander Todd and his collaborators at the University of Cambridge had determined by biochemical methods how the backbone of DNA is structured, through successive phosphodiester linkages between the 3' and 5' carbon atoms of the deoxyribose sugars. This work later helped to corroborate the structural model proposed by Watson and Crick. Todd was awarded the 1957 Nobel Prize in Chemistry for this and other work.

In late 1951, Francis Crick commenced his collaboration with James Watson at the Cavendish Laboratory, situated within the University of Cambridge. The definitive role of DNA in heredity was unequivocally established in 1952, when Alfred Hershey and Martha Chase, through their seminal Hershey–Chase experiment, demonstrated that DNA constitutes the genetic material of the enterobacteria phage T2.

In May 1952, Raymond Gosling, a graduate student under Rosalind Franklin's supervision, captured an X-ray diffraction image, designated "Photo 51," depicting DNA at high hydration levels. This pivotal photograph, provided to Watson and Crick by Maurice Wilkins, proved instrumental in their elucidation of the correct DNA structure. Franklin had previously informed Crick and Watson that the DNA backbones must reside on the molecule's exterior. Prior to this insight, both Linus Pauling and the Watson-Crick team had developed erroneous models featuring internal chains and outwardly projecting bases. Franklin's precise identification of the space group for DNA crystals subsequently validated her assertion. In February 1953, Linus Pauling and Robert Corey proposed a nucleic acid model comprising three intertwined chains, with phosphates positioned near the axis and bases on the exterior. Watson and Crick subsequently finalized their model, which is now universally recognized as the first accurate representation of the DNA double helix. On February 28, 1953, Crick famously interrupted patrons during lunchtime at The Eagle pub in Cambridge, England, to declare that he and Watson had "discovered the secret of life."

The April 25, 1953, issue of the journal Nature featured a series of five articles presenting the Watson and Crick double-helix structure of DNA and supporting empirical evidence. The structural details were initially reported in a letter titled "MOLECULAR STRUCTURE OF NUCLEIC ACIDS A Structure for Deoxyribose Nucleic Acid", which notably stated, "It has not escaped our notice that the specific pairing we have postulated immediately suggests a possible copying mechanism for the genetic material." This foundational letter was succeeded by a contribution from Franklin and Gosling, marking the inaugural publication of their X-ray diffraction data and their original analytical methodology. Following this, a letter from Wilkins and two colleagues presented an analysis of in vivo B-DNA X-ray patterns, thereby corroborating the presence of the Watson and Crick structure in vivo.

In April 2023, new evidence led scientists to conclude that Rosalind Franklin was a significant contributor and an "equal player" in the DNA discovery process, contrary to some subsequent historical portrayals. In 1962, following Franklin's death, Watson, Crick, and Wilkins were jointly awarded the Nobel Prize in Physiology or Medicine. Nobel Prizes are exclusively conferred upon living recipients. The debate regarding appropriate credit for the discovery persists.

In an influential presentation in 1957, Crick set out the central dogma of molecular biology, which described the relationship between DNA, RNA, and proteins, and introduced the "adaptor hypothesis." Final confirmation of the replication mechanism implied by the double-helical structure followed in 1958 with the Meselson–Stahl experiment. Further work by Crick and his collaborators showed that the genetic code is based on non-overlapping triplets of bases, called codons. This result allowed Har Gobind Khorana, Robert W. Holley, and Marshall Warren Nirenberg to decipher the genetic code. Together, these findings are regarded as the birth of molecular biology.
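The idea of non-overlapping triplets can be illustrated with a short sketch. The Python below is only an illustration, not a reconstruction of any historical method: the function names split_codons and overlapping_triplets and the example sequence are hypothetical, and the reading frame is assumed to start at the first base.

```python
def split_codons(seq: str) -> list[str]:
    """Read a sequence as non-overlapping triplets, starting at the first base.

    Assumption for this sketch: the reading frame begins at index 0, and any
    trailing one or two bases that do not fill a triplet are ignored.
    """
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3  # length of the part that fills whole triplets
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def overlapping_triplets(seq: str) -> list[str]:
    """For contrast: an overlapping reading, where each base starts a new triplet."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(len(seq) - 2)]


if __name__ == "__main__":
    example = "ATGGCCTAA"  # hypothetical nine-base sequence
    print(split_codons(example))
    print(overlapping_triplets(example))
```

In the non-overlapping reading each base belongs to exactly one codon, so nine bases yield three codons. In the overlapping reading the same nine bases yield seven triplets, and changing a single base would alter up to three of them.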

In 1986, DNA analysis was first used in a criminal investigation, when police in the UK asked Alec Jeffreys of the University of Leicester to test a suspect's claims. The suspect had confessed to a recent rape-murder but denied any involvement in a similar crime committed three years earlier. Because the two cases were so alike, police believed that the same person had committed both. Jeffreys' DNA testing instead cleared the suspect of both the earlier murder and the one to which he had confessed, and all charges against him were dropped. Further DNA profiling led to the identification of another suspect, Colin Pitchfork, who was convicted in 1988 of both rape-murders.
