sourmash: a library for MinHash sketching of DNA


sourmash is a toolbox for creating, comparing, and manipulating MinHash sketches of genomic data.

MinHash sketches provide a lightweight way to store "signatures" of large DNA or RNA sequence collections, and then compare or search them by estimating the Jaccard index between the underlying sets of k-mers. MinHash sketches can be used to identify samples, find similar samples, identify data sets with shared sequences, and build phylogenetic trees (Ondov et al. 2015).
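
To make the idea concrete, here is a minimal, illustrative sketch of bottom-k MinHash in plain Python. It is not sourmash's implementation or API: the function names (`kmers`, `hash_kmer`, `sketch`, `jaccard_estimate`), the use of SHA-1 as the hash, and the defaults `k=5` and `num=20` are all assumptions chosen for readability. It also ignores reverse complements, which tools for real DNA data generally need to handle.

```python
import hashlib


def kmers(seq, k):
    """Yield every length-k substring (k-mer) of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def hash_kmer(kmer):
    """Map a k-mer to a 64-bit integer.

    SHA-1 is only a convenient stand-in here (an assumption of this
    sketch); any well-mixed hash function would do.
    """
    digest = hashlib.sha1(kmer.encode("ascii")).digest()
    return int.from_bytes(digest[:8], "big")


def sketch(seq, k=5, num=20):
    """Return a bottom-k sketch: the `num` smallest distinct k-mer hashes."""
    hashes = {hash_kmer(km) for km in kmers(seq.upper(), k)}
    return sorted(hashes)[:num]


def jaccard_estimate(a, b, num=20):
    """Estimate the Jaccard index of two k-mer sets from their sketches.

    Take the `num` smallest hashes of the merged sketches and count
    the fraction that appear in both.
    """
    merged = sorted(set(a) | set(b))[:num]
    if not merged:
        return 0.0
    shared = set(a) & set(b)
    return sum(1 for h in merged if h in shared) / len(merged)


if __name__ == "__main__":
    s1 = "ATGGCGTACGTTAGCTAGCTAGGCTAACGT"
    s2 = "ATGGCGTACGTTAGCTAGCTAGGCTTTCGT"
    print(jaccard_estimate(sketch(s1), sketch(s2)))
```

The sketches stay small (at most `num` integers each) no matter how long the input sequences are, which is what makes storing and comparing many signatures cheap.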

sourmash provides a command line script, a Python library, and a CPython module for MinHash sketches.


