The sequence identity of 75–76% is well above the intronic level of 69%. Nucleic Acids Res. In one case, the data supported the previous genetic map assignment and contradicted the assembly. 11, 1677–1685 (2001), Hardies, S. C. et al. Yet this remains a time-consuming process. Cell 109, 283–284 (2002), Kapranov, P. et al. Such regions comprised only a tiny fraction (<0.0001) of the total assembly, of which only half had been anchored to a chromosome. The DNA sequence of human chromosome 22. Gaining audience insights can be costly with the wrong tool. Following its introduction, ATAC-seq quickly became one of the leading methods for identifying open chromatin, largely because of the simplicity of the technique and its low input requirements, which made it possible to study chromatin structure in rare samples. To do this, we estimated the proportion of the genome that is better conserved than would be expected given the underlying neutral rate of substitution. In the roughly 75 million years since the divergence of the human and mouse lineages, the process of evolution has altered their genome sequences and caused them to diverge by nearly one substitution for every two nucleotides (see below) as well as by deletion and insertion. In principle, de novo gene prediction can be improved by analysing aligned sequences from two related genomes to increase the signal-to-noise ratio135. Natl Acad. Lab. You need to indicate the reasoning behind your choice. The correlation of local lineage-specific SINE density is extremely strong (Fig. Many windows in the coding region get L-scores greater than 3, indicating less than a 1/1,000 chance of occurring under neutral evolution (Pselected(S) > 0.94; see Fig. Such preferences were studied in detail in the initial analysis of the human genome1, and essentially equivalent preferences are seen in the mouse genome (Fig. 
The genome also encodes many RNAs that do not encode proteins, including abundant RNAs involved in mRNA processing and translation (such as ribosomal RNAs and tRNAs), and more recently discovered RNAs involved in the regulation of gene expression and other functions (such as microRNAs)165,166. 8, 731–737 (2002), Clausen, B. E. et al. Immunol. USA 95, 10774–10778 (1998), Santibanez-Koref, M. F., Gangeswaran, R. & Hancock, J. M. A relationship between lengths of microsatellites and nearby substitution rates in mammalian genomes. Furthermore, recent studies report that divergence at fourfold degenerate sites and SNP frequency are both correlated with the local rate of meiotic recombination258,266,267,268. We searched for contigs that were >20 kb in size and contained >10 kb of sequence in which the read coverage was at least twofold higher than the average. The correlations above are not explained by co-variation with local (G+C) content. Comparative genomic sequence analysis of the human chromosome 21 Down syndrome critical region. 1401, 177–186 (1998), Lin, J., Toft, D. J., Bengtson, N. W. & Linzer, D. I. Placental prolactins and the physiology of pregnancy. Our gene catalogue contains 656 of these gene predictions, indicating extensive agreement between these two independent analyses. It is no grand structure, it is in ruin! The walls are weak and are often strewn by the wind. Don't take off your shoes! A conspicuous feature of the repeat distribution is that LINE elements in both human and mouse show a preference for accumulating on sex chromosomes (Figs 12 and 15). But if orthologous sequences should be readily alignable, the question becomes: why isn't the alignable portion much higher than 40%? Pennsylvania is constantly coming up with bills and eventually, these bills will be successful. 
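The contig screen described above (contigs >20 kb containing >10 kb of sequence with at least twofold the average read coverage) can be sketched as a simple filter. This is only an illustrative sketch: the `Contig` record, the per-window depth representation and the 1-kb window size are assumptions made for the example, not details taken from the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Contig:
    """Hypothetical record: mean read depth for each fixed-size window."""
    name: str
    depth: List[float]
    window_bp: int = 1000  # assumed window size, not from the text


def high_coverage_bp(contig: Contig, mean_depth: float, fold: float = 2.0) -> int:
    """Bases lying in windows whose depth is at least `fold` times the mean."""
    threshold = fold * mean_depth
    return sum(contig.window_bp for d in contig.depth if d >= threshold)


def flag_collapsed(contigs: List[Contig], mean_depth: float,
                   min_len_bp: int = 20_000,
                   min_high_bp: int = 10_000) -> List[str]:
    """Names of contigs longer than min_len_bp with more than min_high_bp
    of sequence at twofold (or higher) coverage."""
    flagged = []
    for c in contigs:
        length = len(c.depth) * c.window_bp
        if length > min_len_bp and high_coverage_bp(c, mean_depth) > min_high_bp:
            flagged.append(c.name)
    return flagged


contigs = [
    Contig("A", [16.0] * 12 + [7.0] * 18),  # 30 kb, 12 kb at high depth
    Contig("B", [20.0] * 15),               # only 15 kb long
    Contig("C", [16.0] * 5 + [7.0] * 25),   # 30 kb, only 5 kb at high depth
]
candidates = flag_collapsed(contigs, mean_depth=7.4)
```

With these toy inputs only contig "A" passes both thresholds. A real screen would work from read alignments rather than pre-computed window depths.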
Chromosome X shows lower rates of substitution in both types of sites, consistent with the observation that the male mutation rate is approximately twice the female rate1 (see text). The analysis above allows us to infer the proportion of the genome under selection by decomposing the curve Sgenome into curves Sneutral and Sselected. 7, 502–507 (2001), Paigen, K. A miracle enough: the power of mice. Nature 407, 900–903 (2000), Chen, F. C., Vallender, E. J., Wang, H., Tzeng, C. S. & Li, W. H. Genomic divergence between human and chimpanzee estimated from large-scale alignments of genomic sequences. When we consider all exons rather than just coding exons, we find that 941 pairs (62%) have the same number of exons. Of 11,452 cDNA sequences from the curated RefSeq collection, 99.3% of the cDNAs could be aligned to the genome sequence (see Supplementary Information). At the nucleotide level, approximately 40% of the human genome can be aligned to the mouse genome. Gene 100, 181–187 (1991), Zoubak, S., Clay, O. Human chromosome 21 gene expression atlas in the mouse. Nature Genet. These methods tended to have significant overlap with the above-generated gene catalogues, but each tended to introduce significant numbers of predictions that were unsupported by other methods and that appeared to be false positives. Genome-wide detection of allelic imbalance using human SNPs and high-density DNA arrays. In general, the landmarks in the mouse genome are more closely spaced, reflecting the 14% smaller overall genome size. 22). Science 287, 2204–2215 (2000), Altschul, S. F. et al. Genomics 13, 1095–1107 (1992), Gardiner-Garden, M. & Frommer, M. CpG islands in vertebrate genomes. Anal. How does the title of the novel relate to "A Mouse"? Opin. Eur. Genome Res. The frequency of the various ratios is plotted on a logarithmic scale for both the autosomes (blue line) and the X chromosome (red line). 
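The decomposition of Sgenome into Sneutral and Sselected mentioned above can be written as a two-component mixture. The following is a sketch under stated assumptions: the symbol p_0 (the fraction of windows evolving neutrally) is introduced here for illustration, and the second line is simply Bayes' rule applied to that mixture, not a formula quoted from the text.

```latex
% Assumption: p_0 denotes the neutral fraction of windows (symbol introduced here).
\begin{align}
  S_{\mathrm{genome}}(S) &= p_0\, S_{\mathrm{neutral}}(S) + (1 - p_0)\, S_{\mathrm{selected}}(S), \\
  P_{\mathrm{selected}}(S) &= \frac{(1 - p_0)\, S_{\mathrm{selected}}(S)}{S_{\mathrm{genome}}(S)}
                            = 1 - \frac{p_0\, S_{\mathrm{neutral}}(S)}{S_{\mathrm{genome}}(S)}.
\end{align}
```

Under this reading, the proportion of the genome under selection is 1 − p_0, and Pselected(S) approaches 1 for conservation scores that are rare under the neutral curve.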
10, 758–775 (2000), & Bernard, G. Genes, isochores and bands in human chromosomes 21 and 22. Loss-of-heterozygosity analysis of small-cell lung carcinomas using single-nucleotide polymorphism arrays. 30, 38–41 (2002), Kulp, D., Haussler, D., Reese, M. G. & Eeckman, F. H. Integrating database homology in a probabilistic gene structure model. It is still active in mouse (represented by MERVL and the MT and ORR1 MaLRs), but died out some 50 Myr ago in human122. We acknowledge A. Holden for coordinating the Mouse Sequencing Consortium. Bldg. 18, 2119–2123 (2001), Dunham, I. et al. Accordingly, orthology need not be a 1:1 relationship and can sometimes be difficult to discern from paralogy (see protein section below concerning lineage-specific gene family expansion). Genet. The bars show per cent identity of the 15 bases to either side of translation start. The single most prevalent feature of mammalian genomes is their repetitive sequences, most of which are interspersed repeats representing fossils of transposable elements. The red line indicates median values with standard deviation and 5% (green) and 95% (blue) confidence intervals. O'Brien, S.) 4.110–4.142 (1992), Dietrich, W. F. et al. Lennie thinks she's pretty. A YAC-based physical map of the mouse genome. Rev. Genome Res. Comparative genomic sequence analysis of the human and mouse cystic fibrosis transmembrane conductance regulator genes. It can also identify some additional genes not detected in the evidence-based analysis. The BioCluster is housed in Hewlett-Packard's IQ Solutions Center, and was accessed remotely. Science. Starting from a common ancestral genome approximately 75 Myr ago, the mouse and human genomes have each been shuffled by chromosomal rearrangements. Genome Res. Mamm. The mouse intron marked with an asterisk was verified by RT–PCR from primers complementary to the flanking exons followed by direct product sequencing327. 27). 284). 
Such artefactual collapse could be detected as regions with unusually high read coverage, compared with the average depth of 7.4-fold in long assembled contigs. There are 9,785 predicted transcripts that do not correspond to known cDNAs, but these are built on the basis of similarity to known proteins. Examination of the corresponding interval in the human genome showed a rate of loss of these elements, broadly consistent with the 24% deletion rate in the human lineage assumed above (see Supplementary Information). Thus, domains are under greater purifying selection than are regions not containing domains. Trends Ecol. The strategy has four components: (1) production of a BAC-based physical map of the mouse genome by fingerprinting and sequencing the ends of clones of a BAC library44; (2) WGS sequencing to approximately sevenfold coverage and assembly to generate an initial draft genome sequence; (3) hierarchical shotgun sequencing of BAC clones covering the mouse genome combined with the WGS data to create a hybrid WGS-BAC assembly; and (4) production of a finished sequence by using the BAC clones as a template for directed finishing. We suggested a range of 30,000–40,000 to allow for additional genes. The fact that so many of the 25 clusters are related to reproduction is unlikely to be coincidental. 29, 1352–1365 (2001), Hardison, R. C. Conserved noncoding sequences are reliable guides to regulatory elements. Nature 337, 283–285 (1989), Sueoka, N. Directional mutation pressure and neutral molecular evolution. The N50 supercontig size of 16.9 Mb far exceeds that achieved by any previous WGS assembly, and the agreement with genome-wide maps is excellent. Alignment gaps are tenfold less common than in non-coding regions. 
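For readers unfamiliar with the N50 statistic quoted above for supercontigs, the following minimal sketch shows the usual definition: the length L such that pieces of length L or longer together contain at least half of the total assembled bases. The function name and the toy lengths are illustrative assumptions, not data from the assembly.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig or supercontig lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("n50 is undefined for an empty collection")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running total reaches half is the N50.
        if running >= half_total:
            return length
    return ordered[-1]  # unreachable for positive lengths; kept for safety


toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]  # total 32, so half is 16
toy_n50 = n50(toy_lengths)
```

For the toy lengths, the two longest pieces (8 + 8) already reach half of the total, so the N50 is 8. A large N50 therefore indicates that most of the sequence sits in long pieces.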
Here, we will focus primarily on comparisons between the repeat content of the mouse and human genomes. The "you" to whom the speaker refers is humankind, non-human animals, and all living things on the planet. First, known protein-coding cDNAs are mapped onto the genome. The current catalogue (Ensembl build 29) contains 27,049 predicted transcripts aggregated into 22,808 predicted genes containing about 199,000 distinct exons (Table 10). Overall, 96% of nucleotides in the assembly have Arachne quality scores ≥40, corresponding to a predicted error rate of 1 per 10,000 bases. b, The probability, Pselected(S), that a 50-bp window is under selection as a function of its conservation score S = S(R). The reason for the greater density of SSRs in mouse is unknown. Reprod. When one steals one "daimen-icker" from a "thrave", or bundle of twenty-four, it is only a "sma'", or small, thing. 23 for the 50-bp windows in ancestral repeats, representing neutrally evolving DNA. EMBO Rep. 2, 388–393 (2001), Kozak, M. Do the 5′ untranslated domains of human cDNAs challenge the rules for initiation of translation (or is it vice versa)? When applied to the 342 syntenic segments above, the most parsimonious path has 295 rearrangements. The mammalian genome is evolving in a non-uniform manner, with various measures of divergence showing substantial variation across the genome. In general, SSRs in which one strand is a polypurine tract and the other a polypyrimidine tract are much more common and extended in mouse than human. Thus, some small syntenic segments have probably been omitted; this issue will be addressed best when finished sequences of the two genomes are completed. Again, the outliers show a clear tendency to be repeat-poor in human (see Supplementary Information). Lennie talks. Comparative analysis is a method that is widely used in social science. 
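The statement that a quality score of 40 corresponds to a predicted error rate of 1 per 10,000 bases is consistent with a Phred-style logarithmic scale, Q = −10·log10(p). The sketch below assumes that convention (the text does not spell out the formula), and the function names are illustrative.

```python
import math


def error_probability(quality: float) -> float:
    """Per-base error probability for a Phred-style score (assumed convention)."""
    return 10 ** (-quality / 10)


def quality_score(error_prob: float) -> float:
    """Inverse mapping: Phred-style score for a given per-base error probability."""
    if not 0 < error_prob <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(error_prob)


p40 = error_probability(40)          # about 1e-4, i.e. 1 error per 10,000 bases
bases_per_error = round(1 / p40)     # expected bases per error at Q40
```

On this scale every additional 10 points reduces the predicted error rate tenfold, so Q30 would correspond to roughly 1 error per 1,000 bases.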
In some regions of the genome that have been implicated in gene regulation, CpG dinucleotides are not methylated and thus are not subject to deamination and mutation. It is not the right time of year to find the green it needs. Proc. The Dual Axis Chart (one of the comparative analysis charts) comes with two y-axes and a single x-axis. Curr. We annotated the current sets of mouse and human proteins with respect to the InterPro classification of domains, motifs and proteins using the InterProScan computer resource179. It is used in many ways and fields to help people understand the similarities and differences between products better. Among the active class II elements in mouse are two abundant and active groups, the intracisternal-A particles (IAP) and the early-transposons (ETn). Don't read it before a birthday party or any other celebration. Biol. Of course, he states, the mouse should have an ill opinion of man. Other new gene predictions include homologues of aquaporin. Biol. Often one's plans go awry, and foresight may often be in vain or pointless when one never knows what's going to happen. The assembly contains 224,713 sequence contigs, which are connected by at least two read-pair links into supercontigs (or scaffolds). Beyond providing insight into evolutionary events that have moulded the chromosomes, this analysis facilitates further comparisons between the genomes. Curley's flirtatious wife shows up looking for Curley. It seems like Steinbeck is thinking of Lennie as the mouse, and George as the man who turns up its nest: life messes them both up, but at least Lennie doesn't have to remember any of it. 374, 53–56 (1995), Simon, A. M., Veyssiere, G. & Jean, C. 
Structure and sequence of a mouse gene encoding an androgen-regulated protein: a new member of the seminal vesicle secretory protein family. 2, 919–929 (2001), Storz, G. An expanding universe of noncoding RNAs. Lec. Figure 25 shows how conservation levels vary regionally within the features of a typical gene. With the complete sequence of the human genome nearly in hand1,2, the next challenge is to extract the extraordinary trove of information encoded within its roughly 3 billion nucleotides. Nature Biotechnol. Loots, G. G. et al. USA 98, 2497–2502 (2001), Kumar, S. & Hedges, S. B. Apart from the absolute number of SSRs, there are also some marked differences in the frequency of certain SSR classes (Table 9)136. This information includes the blueprints for all RNAs and proteins, the regulatory elements that ensure proper expression of all genes, the structural elements that govern chromosome function, and the records of our evolutionary history. Cells. Sci. Biol. 9, 815–824 (1999), Suzuki, Y. et al. 19, 302–309 (2002), Wu, C. I. We screened the entire assembly for similar instances, affecting regions of at least 20 kb. We examined the rate of deletion in the mouse genome, as measured by the fraction of non-aligning ancestral human DNA (NAanc). The assembly contains about 96% of the sequence of the euchromatic genome (excluding chromosome Y) in sequence contigs linked together into large units, usually larger than 50 megabases (Mb). In mammalian genomes, there is a positive correlation between gene density and (G+C) content81,86,87,88,89. Sci. Genet. Ribonuclease A genes appear to have been under strong positive selection, possibly due to their significant role in host-defence mechanisms224. 
The RFX5 case is interesting, because disruption of the known mouse homologue alone does not reproduce the human disease, but may do so in conjunction with disruption of the newly identified paralogue158. Nature Genet. (in the press), Bernardi, G. The human genome: organization and evolutionary history. USA 98, 10196–10201 (2001), Ashcroft, G. S. et al. If such regions are also common in the mouse genome, they might collapse into a single copy in the WGS assembly. Genetics 115, 535–543 (1987), Jia, H. P. et al. An important issue in annotating mammalian genomes is distinguishing real genes from pseudogenes, that is, inactive gene copies. Guts and gastrulation: Emergence and convergence of endoderm in the mouse embryo. 
Continuity near telomeres tends to be lower, and two chromosomes (5 and X) have unusually large numbers of ultracontigs. In the next section, we show that gene predictions that avoid many of the biases of evidence-based gene prediction result in only a modest increase in the predicted gene count (in the range of about 1,000 genes). 278, 167–181 (1998), Dermitzakis, E. & Clark, A. Evolution of transcription factor binding sites in mammalian gene regulatory regions: conservation and turnover. "Of Mice and Men" by John Steinbeck was named after Robert Burns' poem "To a Mouse." We used the genome-wide alignments to examine the extent of conservation in gene-related features, including coding regions, introns, untranslated regions, upstream regions and CpG islands. In a compare-and-contrast essay, you also need to make links between A and B in the body of your essay if you want your paper to hold together. Lennie enters the bunkhouse secretly carrying his new puppy. Methyl-CpG is mutated by deamination to TpG, leading to approximately fivefold under-representation of CpG across the human1,95 and mouse genomes. Cell 110, 327–338 (2002), Moran, J. et al. Comparative Analysis vs. In a sample of 101 predictions that failed to meet the criteria, the validation rate was 11% for genes with strong homology to human sequence and 3% for those without. These same four regions are exceptions in the mouse genome as well. As well as gene birth, the clusters bear witness to gene death: the Abp, P450 Cyp4a and Cyp4d cytochrome P450, and carboxylesterase families all contain one or more predicted pseudogenes. 
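The fivefold under-representation of CpG noted above is usually expressed as an observed/expected ratio. The sketch below uses a common form of that ratio, (number of CpG × sequence length) / (number of C × number of G); this specific formula is an assumption brought in for illustration rather than something stated in the text, and the function name is hypothetical.

```python
def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio of a DNA sequence.

    Expected CpG count assumes C and G occur independently at their
    observed frequencies. Returns 0.0 if the sequence lacks C or G.
    """
    seq = seq.upper()
    n = len(seq)
    c_count = seq.count("C")
    g_count = seq.count("G")
    cpg_count = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    if c_count == 0 or g_count == 0:
        return 0.0
    return cpg_count * n / (c_count * g_count)


depleted = cpg_obs_exp("CCCCGGGG")   # one CpG where two would be expected
enriched = cpg_obs_exp("CGCGCG")     # CpG-dense toy sequence
```

On this measure a fivefold under-representation corresponds to a ratio of roughly 0.2, whereas unmethylated regions that escape deamination would be expected to show ratios much closer to 1.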
On the basis of the fraction of mouse exons with human counterparts, the percentage of true exons among all predicted exons or the specificity of the initial mouse gene catalogue is estimated to be 93%.
Singer, Jade P. Vinson, Claire M. Wade & Michael C. Zody
European Bioinformatics Institute, Wellcome Trust Genome Campus, CB10 1SD, Cambridge, Hinxton, UK, Ewan Birney, Nick Goldman, Arkadiusz Kasprzyk, Emmanuel Mongin, Alistair G. Rust, Guy Slater, Arne Stabenau, Abel Ureta-Vidal & Simon Whelan
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, CB10 1SA, Cambridge, Hinxton, UK, Rachel Ainscough, John Attwood, Jonathon Bailey, Karen Barlow, Stephan Beck, John Burton, Michele Clamp, Christopher Clee, Alan Coulson, James Cuff, Val Curwen, Tim Cutts, Joy Davies, Eduardo Eyras, Darren Grafham, Simon Gregory, Tim Hubbard, Adrienne Hunt, Matthew Jones, Ann Joy, Steven Leonard, Christine Lloyd, Lucy Matthews, Stuart McLaren, Kirsten McLay, Beverley Meredith, James C. Mullikin, Zemin Ning, Karen Oliver, Emma Overton-Larty, Robert Plumb, Simon Potter, Michael Quail, Jane Rogers, Carol Scott, Steve Searle, Ratna Shownkeen, Sarah Sims, Melanie Wall, Anthony P. West, David Willey & Sophie Williams
Research Group in Biomedical Informatics, Institut Municipal d'Investigació Mèdica/Universitat Pompeu Fabra, Centre de Regulació Genòmica, Barcelona, Catalonia, Spain, Josep F. Abril, Roderic Guigó & Genís Parra
Bioinformatics, GlaxoSmithKline, UW2230, 709 Swedeland Road, King of Prussia, Pennsylvania, 19406, USA
National Center for Biotechnology Information, National Institutes of Health, Bethesda, Maryland, 20892, USA, Richa Agarwala, Deanna M. Church, Wratko Hlavina, Donna R. Maglott & Victor Sapojnikov
Department of Mathematics, University of California at Berkeley, 970 Evans Hall, 94720, Berkeley, California, USA, Marina Alexandersson & Lior Pachter
Division of Medical Genetics, University of Geneva Medical School, 1 rue Michel-Servet, CH-1211, Geneva, Switzerland, Stylianos E. Antonarakis, Emmanouil T. Dermitzakis, Alexandre Reymond & Catherine Ucla
Center for Biomolecular Science and Engineering, University of California, 95064, Santa Cruz, California, USA, Robert Baertsch, Mark Diekhans, Terrence S. Furey, Angela Hinrichs, Fan Hsu, Donna Karolchik, W. James Kent, Krishna M. Roskin, Matthias S. Schwartz, Charles Sugnet & Ryan J. Weber
EMBL, Meyerhofstrasse 1, 69117, Heidelberg, Germany, Peer Bork, Ivica Letunic, Mikita Suyama, David Torrents & Evgeny M. Zdobnov
UK MRC Mouse Sequencing Consortium, MRC Mammalian Genetics Unit, Harwell, OX11 0RD, UK, Marc Botcherby, Stephen D. Brown, Robert D. Campbell & Ian Jackson
Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Mailstop 84-171, Berkeley, California, 94720, USA, Nicolas Bray, Olivier Couronne, Inna Dubchak, Alex Poliakov & Edward M. Rubin
Department of Computer Science, Washington University, Box 1045, St Louis, Missouri, 63130, USA, Michael R. Brent, Paul Flicek, Evan Keibler & Ian Korf
School of Computer Science, University of Waterloo, Waterloo, Ontario, N2L 3G1, Canada, Daniel G. Brown & S. Batalov
The Jackson Laboratory, 600 Main Street, Bar Harbor, Maine, 04609, USA, Carol Bult & Wayne N. Frankel
Laboratory for Genome Exploration, RIKEN Genomic Sciences Center, Yokohama Institute, 1-7-22 Suchiro-cho, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan, Piero Carninci, Yoshihide Hayashizaki, Jun Kawai & Yasushi Okazaki
Affymetrix Inc., Emeryville, California, 94608, USA, Simon Cawley, David Kulp & Raymond Wheeler
Departments of Statistics and Health Evaluation Sciences, The Pennsylvania State University, University Park, Pennsylvania, 16802, USA, Francesca Chiaromonte
National Human Genome Research Institute, National Institutes of Health, 31 Center Drive, Room 4B09, Bethesda, Maryland, 20892, USA, Francis S. Collins, Adam Felsenfeld, Mark Guyer, Jane Peterson & Kris Wetterstrand
Wellcome Trust Centre for Human Genetics, University of Oxford, Roosevelt Drive, OX3 7BN, Oxford, UK, Richard R. Copley & Richard Mott
Department of Electrical Engineering, University of California, Berkeley, 231 Cory Hall, Berkeley, California, 94720, USA
Department of Human Anatomy and Genetics, MRC Functional Genetics Unit, University of Oxford, South Parks Road, OX1 3QX, Oxford, UK, Nicholas J. Dickens, Richard D. Emes, Leo Goodstadt, Chris P. Ponting & Eitan Winter
Department of Human Genetics, University of Utah, Salt Lake City, Utah, 84112, USA, Diane M. Dunn, Andrew C. von Niederhausern & Robert B. Weiss
Howard Hughes Medical Institute and Department of Genetics, Washington University School of Medicine, St Louis, Missouri, 63110, USA, Sean R. Eddy, L. Steven Johnson & Thomas A. Jones
Departments of Biochemistry and Molecular Biology and Computer Science and Engineering, The Pennsylvania State University, University Park, Pennsylvania, 16802, USA, Laura Elnitski & Diana L. Kolbe
Department of Computer Science and Engineering, The Pennsylvania State University, University Park, Pennsylvania, 16802, USA, Pallavi Eswara, Webb Miller, Michael J. O'Connor & Scott Schwartz
Baylor College of Medicine, Human Genome Sequencing Center, One Baylor Plaza, MSC-226, Houston, Texas, 77030, USA
The Institute for Systems Biology, 1441 North 34th Street, Seattle, Washington, 98103, USA, Gustavo Glusman & Arian Smit
National Human Genome Research Institute, National Institutes of Health, 50 South Drive, Building 50, Room 5523, Bethesda, Maryland, 20892, USA
Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania, 16802, USA, Ross C. Hardison & Shan Yang
Howard Hughes Medical Institute, University of California, Santa Cruz, California, 95064, USA
Department of Chemistry and Biochemistry, University of Oklahoma Advanced Center for Genome Technology, University of Oklahoma, 620 Parrington Oval, Room 311, Oklahoma, Norman, 73019, USA
Departments of Genetics and Medicine and Harvard-Partners Center for Genetics and Genomics, Harvard Medical School, Boston, Massachusetts, 02115, USA, Raju S. Kucherlapati & Kate T. Montgomery
Department of Statistics, The Pennsylvania State University, University Park, Pennsylvania, 16802, USA
Department of Computer Science, University of California, Santa Barbara, California, 93106, USA
US DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, California, 94598, USA
Department of Computer Science, University of Western Ontario, London, Ontario, N6A 5B7, Canada
Cold Spring Harbor Laboratory, PO Box 100, 1 Bungtown Road, Cold Spring Harbor, New York, 11724, USA
Wellcome Trust, 183 Euston Road, NW1 2BE, London, UK
Department of Computer Science and Engineering, University of California, San Diego, 9500 Gilman Drive, La Jolla, California, 92093-0114, USA, Pavel Pevzner & Glenn Tesler
Max Planck Institute for Molecular Genetics, Ihnestrasse 73, 14195, Berlin, Germany
Genome Therapeutics Corporation, 100 Beaver Street, Waltham, Massachusetts, 02453, USA
Bioinformatics Solutions Inc., 145 Columbia Street W, Waterloo, Ontario, N2L 3L2, Canada
Department of Molecular and Human Genetics, Baylor College of Medicine, Mailstop BCM226, Room 1419.01, One Baylor Plaza, Texas, Houston, 77030, USA
Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts, 02138, USA, Eric S. Lander