The KMAP Biosphere Gene Catalogue study includes clusterings and annotations of the genes found in the assembled contig sequences of ~62,500 metagenome sequencing runs from ~40,800 samples in 924 public projects. The 90% identity clustering contains 789 million clusters, of which about 420 million are singletons.

Note: this supplementary tables page is a work in progress. For now, it includes the supplementary summary tables, the top-annotations tables, the sequencing-effort table, the most recent diagrams we have been working on, and basic statistics on the total number of clusters.

Cluster size histogram

Supplementary summary tables

Top annotations tables

#Clusters
Clusters:
- 90% identity clustering: 789,314,566
- 30% identity clustering: 290,216,023
Clusters with #members >= 2:
- 90% identity clustering: 369,611,220
- 30% identity clustering: 231,441,295
Singletons (clusters with one member only):
- 90% identity clustering: 419,703,346
- 30% identity clustering: 58,774,728
Annotated clusters:
- 90% identity clustering: 573,281,610
- 30% identity clustering: 140,517,476
Annotated clusters with #members >= 2:
- 90% identity clustering: 250,830,830
- 30% identity clustering: 81,905,646
Annotated singletons:
- 90% identity clustering: 322,450,780
- 30% identity clustering: 58,611,830
Unannotated clusters:
- 90% identity clustering: 216,032,956
- 30% identity clustering: 149,698,547
Unannotated clusters with #members >= 2:
- 90% identity clustering: 118,780,390
- 30% identity clustering: 149,535,649
Unannotated singletons:
- 90% identity clustering: 97,252,566
- 30% identity clustering: 162,898
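The counts above decompose in two ways: every cluster is either a singleton or has two or more members, and every cluster is either annotated or unannotated. The short Python sketch below uses only the numbers copied from the list to check that both decompositions add up for each clustering, and reports the share of singletons. It is an illustrative consistency check, not part of the catalogue's own pipeline.

```python
# Cluster counts copied from the "#Clusters" list above.
# Each tuple is (total clusters, clusters with #members >= 2, singletons).
COUNTS = {
    "90%": {
        "all": (789_314_566, 369_611_220, 419_703_346),
        "annotated": (573_281_610, 250_830_830, 322_450_780),
        "unannotated": (216_032_956, 118_780_390, 97_252_566),
    },
    "30%": {
        "all": (290_216_023, 231_441_295, 58_774_728),
        "annotated": (140_517_476, 81_905_646, 58_611_830),
        "unannotated": (149_698_547, 149_535_649, 162_898),
    },
}

COLUMNS = ("total", "#members >= 2", "singletons")


def check_breakdown(level: str, groups: dict[str, tuple[int, int, int]]) -> None:
    """Raise ValueError if one clustering's counts do not add up."""
    # Size split: every cluster is either a singleton or has >= 2 members.
    for name, (total, multi, singletons) in groups.items():
        if multi + singletons != total:
            raise ValueError(f"{level} {name}: {multi} + {singletons} != {total}")
    # Annotation split: annotated + unannotated must equal all, column by column.
    for i, column in enumerate(COLUMNS):
        ann, unann, all_ = (groups[g][i] for g in ("annotated", "unannotated", "all"))
        if ann + unann != all_:
            raise ValueError(f"{level} {column}: {ann} + {unann} != {all_}")


def singleton_share(groups: dict[str, tuple[int, int, int]]) -> float:
    """Fraction of all clusters that have exactly one member."""
    total, _, singletons = groups["all"]
    return singletons / total


for level, groups in COUNTS.items():
    check_breakdown(level, groups)
    print(f"{level} identity: {singleton_share(groups):.1%} of clusters are singletons")
```

Both clusterings pass the check. About 53% of the 90% identity clusters are singletons, compared with about 20% at 30% identity. At 30% identity almost every singleton is annotated: only 162,898 of 58,774,728 singletons remain unannotated.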


Sequencing effort
Sequencing effort summary table, with #clusters for the 90% and 30% identity clusterings:
(*) Singleton clusters with no annotation are not included in the per-biome cluster totals.
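The footnote's exclusion rule amounts to a filter applied before clusters are tallied per biome. Here is a minimal Python sketch. It assumes a hypothetical per-cluster record (`ClusterRecord`) that holds the set of biomes the cluster's members come from, a member count, and an annotation flag. The page does not describe the real input format or biome labels. Counting a cluster that spans several biomes once in each of them is also an assumption.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterRecord:
    """Hypothetical per-cluster record; the field names are illustrative only."""

    cluster_id: str
    biomes: frozenset[str]  # assumed: biomes of the samples contributing members
    n_members: int
    annotated: bool


def counted_in_biome_totals(rec: ClusterRecord) -> bool:
    """Footnote (*): singleton clusters with no annotation are left out."""
    return rec.n_members >= 2 or rec.annotated


def clusters_per_biome(records) -> Counter:
    """Count clusters per biome, skipping unannotated singletons."""
    totals: Counter = Counter()
    for rec in records:
        if counted_in_biome_totals(rec):
            # Assumption: a cluster spanning several biomes counts once in each.
            totals.update(rec.biomes)
    return totals


# Toy example with made-up biome labels.
records = [
    ClusterRecord("c1", frozenset({"marine"}), 5, True),
    ClusterRecord("c2", frozenset({"marine", "soil"}), 3, False),
    ClusterRecord("c3", frozenset({"soil"}), 1, False),  # unannotated singleton: skipped
    ClusterRecord("c4", frozenset({"soil"}), 1, True),   # annotated singleton: kept
]
print(clusters_per_biome(records))  # marine counts c1 and c2; soil counts c2 and c4
```

Keeping the rule in a single predicate (`counted_in_biome_totals`) makes it easy to compute alternative totals, for example with unannotated singletons included, by swapping in a different filter.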