The KMAP Biosphere Gene Catalogue study includes clusterings and annotations of genes found in the assembled contig sequences of ~62,500 metagenome sequencing runs from ~40,800 samples in 924 public projects. The 30% identity clustering comprises 290 million clusters, excluding the nonannotated singletons. The total number of singletons is 58 million.

Example queries: Sediment biome genes that were clustered with genes of at least three other biomes | Genes that have MecA annotations
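The first example query combines a biome filter with a cross-biome condition on the gene's cluster. A minimal sketch of that logic on in-memory data follows, assuming a hypothetical record layout of `(gene_id, cluster_id, biome)`. This is not the catalogue's actual schema or search syntax, and the function name is illustrative only.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical record layout: (gene_id, cluster_id, biome). Not the catalogue's actual schema.
Gene = Tuple[str, str, str]

def genes_shared_with_other_biomes(genes: List[Gene], target: str, min_other: int) -> List[str]:
    # Collect the set of biomes represented in each cluster.
    cluster_biomes: Dict[str, Set[str]] = defaultdict(set)
    for _, cluster_id, biome in genes:
        cluster_biomes[cluster_id].add(biome)
    # Keep target-biome genes whose cluster also holds genes from >= min_other other biomes.
    return sorted(
        gene_id
        for gene_id, cluster_id, biome in genes
        if biome == target and len(cluster_biomes[cluster_id] - {target}) >= min_other
    )

genes = [
    ("g1", "c1", "Sediment"), ("g2", "c1", "Marine"), ("g3", "c1", "Soil"), ("g4", "c1", "Freshwater"),
    ("g5", "c2", "Sediment"), ("g6", "c2", "Marine"),
    ("g7", "c3", "Soil"), ("g8", "c3", "Marine"),
]
print(genes_shared_with_other_biomes(genes, "Sediment", 3))  # ['g1']
```

Only `g1` qualifies: its cluster `c1` also contains genes from three other biomes, while `g5` shares its cluster with just one.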


Number of clusters in similar biomes, based on single counting of sub-biomes
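One plausible reading of "single counting of sub-biomes" is that the sub-biomes of a cluster's genes are first collapsed to their parent biome, so a cluster is counted at most once per biome pair. The sketch below illustrates that reading. The colon-separated lineage strings and the `depth` cutoff are hypothetical assumptions, and the sketch does not model the exclusion of nonannotated singletons.

```python
from collections import Counter
from itertools import combinations_with_replacement
from typing import Dict, List

def biome_of(sub_biome: str, depth: int = 2) -> str:
    # Hypothetical: a biome is the first `depth` components of a colon-separated lineage.
    return ":".join(sub_biome.split(":")[:depth])

def shared_cluster_counts(cluster_sub_biomes: Dict[str, List[str]], depth: int = 2) -> Counter:
    counts: Counter = Counter()
    for subs in cluster_sub_biomes.values():
        # Collapse sub-biomes so each biome is counted once per cluster.
        biomes = sorted({biome_of(s, depth) for s in subs})
        # One count per unordered biome pair, including each biome paired with itself.
        for a, b in combinations_with_replacement(biomes, 2):
            counts[(a, b)] += 1
    return counts

clusters = {
    "c1": ["Aquatic:Marine:Oceanic", "Aquatic:Marine:Coastal", "Terrestrial:Soil:Forest"],
    "c2": ["Aquatic:Marine:Coastal"],
}
for pair, n in sorted(shared_cluster_counts(clusters).items()):
    print(pair, n)
```

Although `c1` has genes in two marine sub-biomes, it adds only one to the marine/soil cell.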

The biome similarity-matrix data is also available as a CSV table for morpheus.js with this link, and as a row-and-column-normalized matrix with this link; both include the number of clusters in each biome as row annotations. Unlike the pivot view below, the similarity matrix normalizes the number of shared clusters by the total number of clusters (excluding nonannotated singletons) of the row biome, or of the row and column biomes.
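The two normalizations described above can be sketched as follows. The row version divides each shared-cluster count by the row biome's total. For the "rows+columns" version, the sketch assumes the denominator is the sum of the row and column biome totals; that interpretation, the function names, and the toy numbers are assumptions rather than the catalogue's documented method.

```python
from typing import List

def row_normalize(shared: List[List[int]], totals: List[int]) -> List[List[float]]:
    # Divide each shared-cluster count by the cluster total of the row biome.
    n = len(totals)
    return [[shared[i][j] / totals[i] for j in range(n)] for i in range(n)]

def row_col_normalize(shared: List[List[int]], totals: List[int]) -> List[List[float]]:
    # Assumption: "rows+columns" means dividing by the sum of the row and column biome totals.
    n = len(totals)
    return [[shared[i][j] / (totals[i] + totals[j]) for j in range(n)] for i in range(n)]

# Toy data: totals exclude nonannotated singletons, so they match the diagonal.
totals = [100, 50]
shared = [[100, 20], [20, 50]]
print(row_normalize(shared, totals))      # [[1.0, 0.2], [0.4, 1.0]]
print(row_col_normalize(shared, totals))
```

Row normalization yields an asymmetric matrix (0.2 vs 0.4 here), whereas the assumed row-plus-column denominator keeps it symmetric.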

Notes: (a) Because the table is symmetric, the grand total reported by the pivot table library is larger than the actual number. (b) The counts comparing each biome with itself exclude nonannotated single-member clusters, but include annotated single-member clusters and all clusters with more than one member gene, whether or not they are annotated.
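Note (a) can be illustrated with a small symmetric matrix: summing every cell counts each off-diagonal biome pair twice, while summing the diagonal and the cells above it counts each pair once. The function names and numbers here are hypothetical, chosen only for illustration.

```python
from typing import List

def full_total(m: List[List[int]]) -> int:
    # Sum of every cell, as a pivot table's grand total would report it.
    return sum(sum(row) for row in m)

def upper_triangle_total(m: List[List[int]]) -> int:
    # Count each unordered biome pair once: the diagonal plus the cells above it.
    n = len(m)
    return sum(m[i][j] for i in range(n) for j in range(i, n))

shared = [[100, 20], [20, 50]]
print(full_total(shared))            # 190
print(upper_triangle_total(shared))  # 170
```

Even the upper-triangle sum counts a cluster once for every biome pair it spans, so it is not a count of distinct clusters either.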
