Info - Thalassiosira pseudonana CCMP 1335

PLEASE NOTE: The Thalassiosira pseudonana genome sequence is composed of "finished chromosomes" (Thaps3) and "unmapped sequence" (Thaps3_bd), which were annotated separately. Please use both portals for a complete analysis of the genome.

Status

v3.0 finished chromosomes and unmapped sequence (May 2007): This Thalassiosira pseudonana genome sequence was initially assembled from whole-genome shotgun paired-end sequencing reads using the JGI assembler, Jazz. That draft assembly was finished by the JGI/Stanford Human Genome Center, which produced the Thalassiosira_pseudonana_v3.031306 genome assembly reported here.

There are two parts to the T. pseudonana genome sequence assembly and annotation reported here: the Thaps3 "finished chromosomes" and the Thaps3_bd "unmapped sequence". The finished chromosomes consist of the finished genome sequence that could be reliably assembled into chromosomes based on an optical map. The unmapped sequence consists of assembled scaffolds that could neither be mapped to finished chromosomes nor assigned to organelles, but that could be aligned to T. pseudonana ESTs not represented in the finished chromosomes. Because these ESTs are represented only on the unmapped sequence scaffolds, the scaffolds may represent regions of the T. pseudonana genome that are absent from the finished chromosomes, rather than alternate haplotypes of sequence already contained in them.

The finished chromosomes of the nuclear genome sequence from that assembly were annotated using the JGI Genome Annotation Pipeline and custom analyses, and are reported here as the Thaps3 annotation. The unmapped sequence was annotated in the same manner and is reported here as the Thaps3_bd annotation.

Summary statistics for the v3.0 release (Thaps3 and Thaps3_bd), including a comparison with v1.0 (thaps1, the previous public release), are below.

Nuclear Genome Assembly	thaps1	Thaps3	Thaps3_bd
Nuclear genome size (Mbp)	32.0	31	1.1
Sequencing read coverage depth	~8.78x	~12.8x	-
Total # of fasta sequences (nuclear)	1,146	28	37
Total # of fasta sequences (>2 Kbp)	595	28	37
Three largest Scaffolds (Mbp)	1.0 0.77 0.76	3.0 2.7 2.4	0.14 0.10 0.083

Gene Models	thaps1¹	Thaps3²	Thaps3_bd³	Thaps3_bd / Thaps3	thaps1 / Thaps3
length (bp) of:	average	average	average	ratio	ratio
gene	993.3	1,745.5	1,602.6	92%	57%
transcript	793.1	1,556	1,390.3	89%	51%
exon	334.4	613	529.2	86%	55%
intron	147.9	125.2	132.5	106%	118%
description:
protein length (aa)	261.4	498.7	429.6	86%	52%
exons per gene	2.37	2.54	2.63	104%	93%
# of gene models in track	11,397	11,390	386	3%	100%

¹ New Models Version 2.0

² Filtered gene models v3.0

³ FilteredModels1
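
The two ratio columns in the gene model table can be reproduced from the three columns of averages. The sketch below is only an illustration: it assumes each ratio is the Thaps3_bd or thaps1 value divided by the Thaps3 value and rounded to a whole percent, which matches the printed figures but is not stated on this page. The helper name percent_of_thaps3 is hypothetical.

```python
# Averages copied from the gene model table.
# Tuple order: (thaps1, Thaps3, Thaps3_bd).
averages = {
    "gene": (993.3, 1745.5, 1602.6),
    "transcript": (793.1, 1556.0, 1390.3),
    "exon": (334.4, 613.0, 529.2),
    "intron": (147.9, 125.2, 132.5),
    "protein length (aa)": (261.4, 498.7, 429.6),
    "exons per gene": (2.37, 2.54, 2.63),
    "# of gene models in track": (11397, 11390, 386),
}


def percent_of_thaps3(value, thaps3_value):
    """Express value as a whole-number percentage of the Thaps3 value.

    Assumption: the table rounds to the nearest whole percent.
    """
    return round(100 * value / thaps3_value)


# For each row: (Thaps3_bd / Thaps3, thaps1 / Thaps3), in percent.
ratios = {
    name: (percent_of_thaps3(bd, t3), percent_of_thaps3(t1, t3))
    for name, (t1, t3, bd) in averages.items()
}

for name, (bd_ratio, t1_ratio) in ratios.items():
    print(f"{name}\t{bd_ratio}%\t{t1_ratio}%")
```

Under that assumption, every computed pair agrees with the table, for example 92% and 57% for average gene length.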

Collaborators

US Department of Energy Joint Genome Institute (JGI)
Stanford Human Genome Center
Virginia Armbrust at the University of Washington School of Oceanography
Lab of Chris Bowler at Ecole Normale Superieure

Funding

This work was performed under the auspices of the US Department of Energy's Office of Science, Biological and Environmental Research Program, and by the University of California, Lawrence Berkeley National Laboratory under Contract No. DE-AC02-05CH11231, Lawrence Livermore National Laboratory under Contract No. DE-AC52-07NA27344, and Los Alamos National Laboratory under Contract No. DE-AC02-06NA25396.
