The genome sequence and gene models of Scenedesmus obliquus strain UTEX B 3031 (=DOE0152Z) were not determined by the Joint Genome Institute (JGI). The annotation was performed by Dr. Zaid McKie-Krisberg at Brooklyn College of the City University of New York. To allow comparative analyses with other genomes sequenced by the JGI, a copy of this genome has been incorporated into the JGI Genome Portal. JGI tools were used to automatically annotate predicted proteins. Please note that the release presented here includes gene annotation v1.0, and all gene models are included in the ExternalModels track. This annotation has not been published, and permission should be requested before use.

We applied filters to remove, where present: 1) transposable elements, 2) pseudogenes, 3) alternative transcripts and overlapping models, 4) alleles on secondary scaffolds, and 5) unsupported short models. This removed 14,261 models from S. obliquus UTEX B 3031 and produced the FilteredModels2 gene track. Please note that this copy of the genome is not maintained by the JGI and is therefore not automatically updated.

S. obliquus UTEX B 3031 is likely diploid, and this is reflected in an assembly and annotation with significant separation of alleles. Of the 2,812 scaffolds, 1,705 are very similar to larger scaffolds and are predicted to constitute an alternate, or secondary, haplotype. To represent the primary and secondary haplotypes in the Portal, we have created 'primary alleles' and 'secondary alleles' gene model tracks, comprising the models found on each haplotype. The goal of the GeneCatalog (GC) is to produce a non-redundant set of models that captures the full functional repertoire of the genome, so the few secondary alleles that are unique were included in the GC, while all others were excluded.
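
The GeneCatalog selection rule described above can be illustrated with a minimal sketch. It is not the JGI pipeline: the GeneModel record, its field names, and the has_primary_allele flag are hypothetical, and how allele similarity is actually determined is not specified here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneModel:
    model_id: str
    haplotype: str            # "primary" or "secondary" (hypothetical labels)
    has_primary_allele: bool  # hypothetical flag: True if a matching primary model exists


def build_gene_catalog(models):
    """Keep every primary model, plus secondary models with no primary counterpart."""
    catalog = []
    for model in models:
        if model.haplotype == "primary":
            catalog.append(model)
        elif not model.has_primary_allele:
            # A unique secondary allele: it adds to the functional repertoire.
            catalog.append(model)
    return catalog


# Toy input, not real S. obliquus data.
example_models = [
    GeneModel("p1", "primary", False),
    GeneModel("s1", "secondary", True),   # redundant with a primary allele, dropped
    GeneModel("s2", "secondary", False),  # unique secondary allele, kept
]
example_catalog = build_gene_catalog(example_models)
```

With the toy input, the catalog keeps p1 and s2 and drops s1, which mirrors the non-redundancy goal stated for the GC.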

Summary statistics for the Scenedesmus obliquus UTEX B 3031 v1.0 release are below.
Genome Assembly
Genome Assembly size (Mbp)          210.26
Sequencing read coverage depth      86x
# of contigs                        2812
# of scaffolds                      2812
# of scaffolds >= 2Kbp              2812
Scaffold N50                        348
Scaffold L50 (Mbp)                  0.15
# of gaps                           0
% of scaffold length in gaps        0.0%
Three largest Scaffolds (Mbp)       2.33, 1.49, 1.27
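
The two half-coverage scaffold statistics in the table above can be sketched as follows. In this table, "Scaffold N50" is a scaffold count and "Scaffold L50 (Mbp)" is a length. The function name and the toy lengths are hypothetical, and the exact method used to produce the reported values is not stated here.

```python
def half_coverage_stats(lengths):
    """Return (count, length) for the longest scaffolds that cover half the assembly.

    count  -> number of scaffolds needed (labeled "Scaffold N50" in the table)
    length -> length of the shortest scaffold in that set (labeled "Scaffold L50")
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return count, length
    raise ValueError("no scaffold lengths given")


# Toy scaffold lengths, not real assembly data.
toy_count, toy_length = half_coverage_stats([50, 40, 30, 20, 10])
```

For the toy lengths the total is 150, so the two longest scaffolds (50 + 40 = 90) are the first to reach half of the assembly, giving a count of 2 and a length of 40.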

Gene Models: FilteredModels2
length (bp) of:          average    median
gene                     4373       3494
transcript               1878       1566
exon                     259        156
intron                   401        312
protein length (aa)      456        363
exons per gene           7.25       6
# of gene models         22378


