PLEASE NOTE: The Thalassiosira pseudonana genome sequence is composed of "finished chromosomes" (Thaps3) and "unmapped sequence" (Thaps3_bd), which were annotated separately. Please use both portals for a complete analysis of the genome.
v3.0 finished chromosomes and unmapped sequence (May 2007): This Thalassiosira pseudonana genome sequence was initially assembled with the JGI assembler, Jazz, from whole genome shotgun paired-end sequencing reads. That draft assembly was finished by the JGI/Stanford Human Genome Center, which produced the Thalassiosira_pseudonana_v3.031306 genome assembly reported here.
There are two parts to the T. pseudonana genome sequence assembly and annotation reported here: the Thaps3 "finished chromosomes" and the Thaps3_bd "unmapped sequence". The finished chromosomes consist of the finished genome sequence that could be reliably assembled into chromosomes based on an optical map. The unmapped sequence consists of assembled scaffolds that could neither be mapped to finished chromosomes nor assigned to organelles, but that align to T. pseudonana ESTs not represented in the finished chromosomes. Because of this EST support, these scaffolds are treated as possible regions of the T. pseudonana genome that are absent from the finished chromosomes, rather than as alternate haplotypes of sequence already contained in them.
The finished chromosomes of the nuclear genome sequence from that assembly were annotated using the JGI Genome Annotation Pipeline and custom analyses, and are reported here as the Thaps3 annotation. The unmapped sequence was annotated in the same manner and is reported here as the Thaps3_bd annotation.
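Because the genome is split across two sequence sets, a whole-genome tally has to combine both. The following is a minimal Python sketch of that bookkeeping, not part of the JGI pipeline: the function names `fasta_lengths` and `combined_summary` and the toy records are hypothetical, and with real data one would pass open file handles for the FASTA files downloaded from each portal.

```python
from typing import Dict, Iterable


def fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Return {sequence name: length in bp} for FASTA-formatted lines."""
    lengths: Dict[str, int] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # The sequence name is the first whitespace-delimited header field.
            fields = line[1:].split()
            name = fields[0] if fields else f"unnamed_{len(lengths) + 1}"
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def combined_summary(*parts: Dict[str, int]) -> Dict[str, float]:
    """Merge per-set length tables and report sequence count and total size."""
    merged: Dict[str, int] = {}
    for part in parts:
        clashes = merged.keys() & part.keys()
        if clashes:
            # Identical names in both sets would silently overwrite each other.
            raise ValueError(f"duplicate sequence names: {sorted(clashes)}")
        merged.update(part)
    total_bp = sum(merged.values())
    return {
        "sequences": len(merged),
        "total_bp": total_bp,
        "total_mbp": total_bp / 1e6,
    }


# Toy records standing in for the two downloads (assumed, not real data).
chromosomes = fasta_lengths([">chr_1 toy record", "ACGTAC", "GT", ">chr_2", "GGCC"])
unmapped = fasta_lengths([">bd_1", "AATT"])
summary = combined_summary(chromosomes, unmapped)
print(summary)
```

The duplicate-name check is a design choice for this sketch: it guards against double counting if the same scaffold were present in both inputs.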
Summary statistics for the v3.0 release (Thaps3 and Thaps3_bd), including a comparison to the previous public release, v1.0 (thaps1), are below.
Nuclear Genome Assembly | thaps1 | Thaps3 | Thaps3_bd |
---|---|---|---|
Nuclear genome size (Mbp) | 32.0 | 31 | 1.1 |
Sequencing read coverage depth | ~8.78x | ~12.8x | - |
Total # of fasta sequences (nuclear) | 1,146 | 28 | 37 |
Total # of fasta sequences (>2 Kbp) | 595 | 28 | 37 |
Three largest scaffolds (Mbp) | 1.0, 0.77, 0.76 | 3.0, 2.7, 2.4 | 0.14, 0.10, 0.083 |
Gene Models | thaps1 [1] | Thaps3 [2] | Thaps3_bd [3] | Thaps3_bd / Thaps3 | thaps1 / Thaps3 |
---|---|---|---|---|---|
length (bp) of: | average | average | average | ratio | ratio |
gene | 993.3 | 1,745.5 | 1,602.6 | 92% | 57% |
transcript | 793.1 | 1,556 | 1,390.3 | 89% | 51% |
exon | 334.4 | 613 | 529.2 | 86% | 55% |
intron | 147.9 | 125.2 | 132.5 | 106% | 118% |
description: | | | | | |
protein length (aa) | 261.4 | 498.7 | 429.6 | 86% | 52% |
exons per gene | 2.37 | 2.54 | 2.63 | 104% | 93% |
# of gene models in track | 11,397 | 11,390 | 386 | 3% | 100% |
[1] New Models Version 2.0
[2] Filtered gene models v3.0
[3] FilteredModels1
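The two ratio columns in the gene model table follow directly from the three average columns. As a worked check, the short Python sketch below recomputes them from the tabulated values. The names `GENE_MODEL_STATS`, `percent_ratio`, and `ratio_table` are illustrative only, and rounding to the nearest whole percent is an assumption about how the published figures were derived.

```python
# Averages copied from the gene model table, ordered (thaps1, Thaps3, Thaps3_bd).
GENE_MODEL_STATS = {
    "gene length (bp)": (993.3, 1745.5, 1602.6),
    "transcript length (bp)": (793.1, 1556.0, 1390.3),
    "exon length (bp)": (334.4, 613.0, 529.2),
    "intron length (bp)": (147.9, 125.2, 132.5),
    "protein length (aa)": (261.4, 498.7, 429.6),
    "exons per gene": (2.37, 2.54, 2.63),
    "# of gene models": (11397, 11390, 386),
}


def percent_ratio(numerator: float, denominator: float) -> int:
    """Ratio expressed as a percentage, rounded to the nearest whole percent."""
    return round(100 * numerator / denominator)


def ratio_table(stats):
    """Map each row label to (Thaps3_bd / Thaps3, thaps1 / Thaps3) in percent."""
    return {
        label: (percent_ratio(bd, v3), percent_ratio(v1, v3))
        for label, (v1, v3, bd) in stats.items()
    }


for label, (bd_vs_v3, v1_vs_v3) in ratio_table(GENE_MODEL_STATS).items():
    print(f"{label}: Thaps3_bd/Thaps3 = {bd_vs_v3}%, thaps1/Thaps3 = {v1_vs_v3}%")
```

Under that rounding assumption, the recomputed percentages agree with every ratio listed in the table.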
This work was performed under the auspices of the US Department of Energy's Office of Science, Biological and Environmental Research Program, and by the University of California, Lawrence Berkeley National Laboratory under contract No. DE-AC02-05CH11231, Lawrence Livermore National Laboratory under Contract No. DE-AC52-07NA27344, and Los Alamos National Laboratory under contract No. DE-AC02-06NA25396.