Info • Thalassiosira pseudonana CCMP 1335

PLEASE NOTE: The Thalassiosira pseudonana genome sequence is composed of "finished chromosomes" (Thaps3) and "unmapped sequence" (Thaps3_bd), which were annotated separately. Please use both portals for a complete analysis of the genome.

Status

v3.0 finished chromosomes and unmapped sequence (May 2007): The Thalassiosira pseudonana genome was initially assembled with the JGI assembler, Jazz, from whole-genome shotgun paired-end sequencing reads. That draft assembly was finished by the JGI/Stanford Human Genome Center, which produced the Thalassiosira_pseudonana_v3.031306 genome assembly reported here.

There are two parts to the T. pseudonana genome sequence assembly and annotation reported here: the Thaps3 "finished chromosomes" and the Thaps3_bd "unmapped sequence". The finished chromosomes consist of the finished genome sequence that could be reliably assembled into chromosomes based on an optical map. The "unmapped sequence" consists of assembled scaffolds that could neither be mapped to finished chromosomes nor assigned to organelles, but that could be aligned to T. pseudonana ESTs not represented in the finished chromosomes. Because these scaffolds carry ESTs that are absent from the finished chromosomes, they are treated as possible regions of the T. pseudonana genome missing from the finished chromosomes, rather than as alternate haplotypes of sequence already contained in them.

The finished chromosomes of the nuclear genome sequence from that assembly were annotated using the JGI Genome Annotation Pipeline and custom analyses, and are reported here as the Thaps3 annotation. The unmapped sequence was annotated in the same manner and is reported here as the Thaps3_bd annotation.
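Because the sequence is split across two portals, a whole-genome analysis needs the records from both. The following is a minimal sketch, assuming each portal's sequences have been downloaded as a FASTA file; the function names and the `label|name` key scheme are illustrative choices, not part of any JGI tool.

```python
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (sequence_id, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first whitespace-delimited token of the header.
            parts = line[1:].split()
            name, chunks = (parts[0] if parts else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def combine_assemblies(sources: Dict[str, Iterable[str]]) -> Dict[str, str]:
    """Merge records from several FASTA sources into one dictionary.

    Keys are prefixed with the source label (an assumed convention) so that
    identically named sequences from different portals cannot collide.
    """
    combined = {}
    for label, lines in sources.items():
        for name, seq in read_fasta(lines):
            combined[f"{label}|{name}"] = seq
    return combined
```

Any iterable of lines works as a source, including an open file handle, for example `combine_assemblies({"Thaps3": open(chrom_path), "Thaps3_bd": open(bd_path)})`, where `chrom_path` and `bd_path` are placeholders for the locally downloaded files.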

Summary statistics for the v3.0 release (Thaps3 and Thaps3_bd), compared with the previous public release (v1.0, thaps1), are below.

Nuclear Genome Assembly                 thaps1    Thaps3    Thaps3_bd
Nuclear genome size (Mbp)               32.0      31        1.1
Sequencing read coverage depth          ~8.78x    ~12.8x    -
Total # of fasta sequences (nuclear)    1,146     28        37
Total # of fasta sequences (>2 Kbp)     595       28        37
Three largest scaffolds (Mbp)           1.0       3.0       0.14
                                        0.77      2.7       0.10
                                        0.76      2.4       0.083
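Summary figures of this kind can be recomputed from the sequence lengths of an assembly. The sketch below assumes 1 Mbp = 1,000,000 bp and treats ">2 Kbp" as strictly more than 2,000 bp; the function name and the returned keys are illustrative, and the exact conventions used for the table are not stated in the source.

```python
from typing import Dict, List


def assembly_stats(lengths: List[int], min_len: int = 2000, top: int = 3) -> Dict[str, object]:
    """Summarize an assembly from the lengths (in bp) of its sequences."""
    ordered = sorted(lengths, reverse=True)
    return {
        # Total assembled size in Mbp (assumes 1 Mbp = 1e6 bp).
        "size_mbp": sum(ordered) / 1e6,
        # Number of FASTA sequences in the assembly.
        "n_sequences": len(ordered),
        # Number of sequences strictly longer than min_len bp.
        "n_long": sum(1 for n in ordered if n > min_len),
        # The `top` largest sequences, in Mbp, largest first.
        "largest_mbp": [n / 1e6 for n in ordered[:top]],
    }
```

The input is just a list of lengths, so it can come from any FASTA parser, for example `[len(seq) for seq in sequences.values()]` for a dictionary of sequences.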

Gene Models                    thaps1 [1]   Thaps3 [2]   Thaps3_bd [3]   Thaps3_bd / Thaps3   thaps1 / Thaps3
                               average      average      average         ratio                ratio
length (bp) of:
  gene                         993.3        1,745.5      1,602.6         92%                  57%
  transcript                   793.1        1,556        1,390.3         89%                  51%
  exon                         334.4        613          529.2           86%                  55%
  intron                       147.9        125.2        132.5           106%                 118%
description:
  protein length (aa)          261.4        498.7        429.6           86%                  52%
  exons per gene               2.37         2.54         2.63            104%                 93%
  # of gene models in track    11,397       11,390       386             3%                   100%

[1] New Models Version 2.0

[2] Filtered gene models v3.0

[3] FilteredModels1
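The two ratio columns in the gene model table are each release's value divided by the Thaps3 value, expressed as a whole-number percentage. A minimal sketch of that arithmetic, using values copied from the table (the helper name `percent_ratio` and the dictionary layout are illustrative):

```python
def percent_ratio(value: float, reference: float) -> int:
    """Return value / reference as a percentage rounded to the nearest integer."""
    return round(100 * value / reference)


# (thaps1, Thaps3, Thaps3_bd) values copied from the gene model table.
gene_models = {
    "gene length (bp)": (993.3, 1745.5, 1602.6),
    "transcript length (bp)": (793.1, 1556.0, 1390.3),
    "exon length (bp)": (334.4, 613.0, 529.2),
    "intron length (bp)": (147.9, 125.2, 132.5),
    "protein length (aa)": (261.4, 498.7, 429.6),
    "exons per gene": (2.37, 2.54, 2.63),
    "# of gene models": (11397, 11390, 386),
}

for label, (thaps1, thaps3, thaps3_bd) in gene_models.items():
    # Thaps3 is the reference for both ratio columns.
    print(f"{label}: Thaps3_bd/Thaps3 = {percent_ratio(thaps3_bd, thaps3)}%, "
          f"thaps1/Thaps3 = {percent_ratio(thaps1, thaps3)}%")
```

Read this way, the Thaps3 gene models are on average much longer than the thaps1 models (thaps1 genes are 57% of the Thaps3 length), while the Thaps3_bd models are close to Thaps3 in most measures.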

Funding

This work was performed under the auspices of the US Department of Energy's Office of Science, Biological and Environmental Research Program, and by the University of California, Lawrence Berkeley National Laboratory under contract No. DE-AC02-05CH11231, Lawrence Livermore National Laboratory under contract No. DE-AC52-07NA27344, and Los Alamos National Laboratory under contract No. DE-AC02-06NA25396.