Early Insights from the Human DNA Sequence
Genomics and Its Impact on Science and Society: The Human Genome Project and Beyond
What We've Learned Thus Far
The first panoramic views of the
human genetic landscape have revealed a wealth of information and
some early surprises. Much remains to be deciphered in this vast
trove of data; as the consortium of Human Genome Project (HGP) scientists concluded
in their seminal paper, “. . .the more we learn about the
human genome, the more there is to explore.” A few highlights
from the first publications analyzing the sequence
follow.
- The human genome contains 3.2
billion chemical nucleotide bases (A, C, T, and G).
- The average gene consists of 3000
bases, but sizes vary greatly, with the largest known human gene
being dystrophin at 2.4 million bases.
- The functions are unknown for more
than 50% of discovered genes.
- The human genome sequence is
almost exactly (99.9%) the same in all people.
- About 2% of the genome encodes
instructions for the synthesis of proteins.
- Repeat sequences that do not code
for proteins make up at least 50% of the human genome.
- Repeat sequences are thought to
have no direct functions, but they shed light on chromosome
structure and dynamics. Over time, these repeats reshape the genome
by rearranging it, thereby creating entirely new genes or modifying
and reshuffling existing genes.
- The human genome has a much
greater portion (50%) of repeat sequences than the mustard weed
(11%), the worm (7%), and the fly (3%).
- Over 40% of the predicted human
proteins share similarity with fruit-fly or worm
proteins.
- Genes appear to be concentrated in
seemingly random areas along the genome, with vast expanses of noncoding DNA
between them.
- Chromosome 1 (the largest human
chromosome) has the most genes (2968), and the Y chromosome has the
fewest (231).
- Genes have been pinpointed and
particular sequences in those genes associated with numerous
diseases and disorders including breast cancer, muscle disease,
deafness, and blindness.
- Scientists have identified about 3
million locations where single-base DNA differences occur in
humans. This information promises to revolutionize the processes of
finding DNA sequences associated with such common diseases as
cardiovascular disease, diabetes, arthritis, and
cancers.
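The percentages in the list above become more concrete when applied to the 3.2-billion-base total. The short Python sketch below does that arithmetic; it simply multiplies the rounded figures quoted in the list, so its results are rough estimates, not measured values. The 0.1% figure, for instance, works out to about 3.2 million differing bases between two people, the same order of magnitude as the roughly 3 million single-base difference sites mentioned above, although the two numbers measure different things.

```python
# Back-of-envelope arithmetic on the approximate figures quoted in the list above.
GENOME_BASES = 3_200_000_000  # about 3.2 billion nucleotide bases


def bases_for_fraction(fraction: float, total: int = GENOME_BASES) -> int:
    """Number of bases corresponding to a fraction of the genome, rounded."""
    return round(fraction * total)


protein_coding = bases_for_fraction(0.02)  # about 2% encodes proteins
repeat_bases = bases_for_fraction(0.50)    # repeats make up at least 50%
# 99.9% identity between two people leaves about 0.1% of positions free to differ.
differing_bases = bases_for_fraction(1 - 0.999)
# Dystrophin (2.4 million bases) relative to an average 3000-base gene.
dystrophin_ratio = 2_400_000 / 3_000

print(f"protein-coding:   ~{protein_coding:,} bases")
print(f"repeats:          >={repeat_bases:,} bases")
print(f"person-to-person: ~{differing_bases:,} differing bases")
print(f"dystrophin is about {dystrophin_ratio:.0f}x the average gene")
```

Roughly 64 million protein-coding bases out of 3.2 billion makes the "about 2%" figure easier to picture, and dystrophin comes out at about 800 times the length of an average gene.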
| Organism | Genome Size (Bases) | Estimated Genes |
|---|---|---|
| Human (Homo sapiens) | 3.2 billion | 30,000 to 40,000 |
| Laboratory mouse (M. musculus) | 2.6 billion | 30,000 |
| Mustard weed (A. thaliana) | 100 million | 25,000 |
| Roundworm (C. elegans) | 97 million | 19,000 |
| Fruit fly (D. melanogaster) | 137 million | 13,000 |
| Yeast (S. cerevisiae) | 12.1 million | 6,000 |
| Bacterium (E. coli) | 4.6 million | 3,200 |
| Human immunodeficiency virus (HIV) | 9,700 | 9 |
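The table invites a simple comparison: dividing genome size by estimated gene count gives the average stretch of DNA per gene. This is a crude ratio that lumps genes, introns, repeats, and all other noncoding DNA together, and the human value depends on which end of the 30,000 to 40,000 estimate is used. Applied to the table's figures in the Python sketch below, it gives tens of thousands of bases per gene for human and mouse versus roughly 10,000 or fewer for the other organisms, which fits the large share of noncoding and repeat DNA described in the list.

```python
# Genome sizes (bases) and estimated gene counts taken from the table.
# The human entry is a range, so both ends of the estimate are listed.
GENOMES = {
    "Human, 30,000 genes": (3_200_000_000, 30_000),
    "Human, 40,000 genes": (3_200_000_000, 40_000),
    "Laboratory mouse": (2_600_000_000, 30_000),
    "Mustard weed": (100_000_000, 25_000),
    "Roundworm": (97_000_000, 19_000),
    "Fruit fly": (137_000_000, 13_000),
    "Yeast": (12_100_000, 6_000),
    "Bacterium (E. coli)": (4_600_000, 3_200),
    "HIV": (9_700, 9),
}


def bases_per_gene(genome_bases: int, genes: int) -> float:
    """Average genome length per estimated gene, counting all noncoding DNA too."""
    return genome_bases / genes


# Sort from the least to the most gene-dense genome.
ranked = sorted(GENOMES.items(), key=lambda item: bases_per_gene(*item[1]), reverse=True)
for organism, (size, genes) in ranked:
    print(f"{organism:<22}{bases_per_gene(size, genes):>12,.0f} bases per gene")
```

Keeping both human estimates as separate rows avoids picking an arbitrary midpoint for a figure the text itself says may be revised.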
The estimated number of human genes is only one-third as great
as previously thought, although the numbers may be revised as more
computational and experimental analyses are performed.
Scientists suggest that the genetic key to human complexity lies
not in gene number but in how gene parts are used to build
different products in a process called alternative splicing. Other
underlying reasons for greater complexity are the thousands of
chemical modifications made to proteins and the repertoire of
regulatory mechanisms controlling these processes.
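Alternative splicing can be illustrated with a toy model. The Python sketch below is an illustrative simplification, not a model of any real gene: it covers only one kind of alternative splicing, exon skipping, by marking each exon as either always included or optional and enumerating every include/skip combination of the optional exons while keeping exon order fixed. The gene and its exon names (E1 to E5) are hypothetical.

```python
from itertools import product


def splice_variants(exons: list[tuple[str, bool]]) -> list[list[str]]:
    """Enumerate transcripts for a gene whose exons are either always kept
    (required=True) or optionally skipped (required=False).

    Exon order is preserved; only exon skipping is modeled.
    """
    optional = [name for name, required in exons if not required]
    variants = []
    # Each combination decides, for every optional exon, whether it is kept.
    for choice in product([True, False], repeat=len(optional)):
        keep = dict(zip(optional, choice))
        variants.append([name for name, required in exons if required or keep[name]])
    return variants


# Hypothetical gene: five exons, of which E2 and E4 are optional.
gene = [("E1", True), ("E2", False), ("E3", True), ("E4", False), ("E5", True)]
for transcript in splice_variants(gene):
    print("-".join(transcript))
```

With k optional exons this sketch yields up to 2^k transcripts (four here), showing how a single gene can in principle encode several distinct products. In real cells splicing is regulated, and not every combination is actually produced.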
