How good is Torus at identifying tunes? Suppose we pick tunes in the
database uniformly at random and give Torus the first `k`
characters of their shape strings, for some fixed `k`. Then we
can define two possible measures:

- the **uniqueness probability** = the probability that a tune is identified uniquely;
- the **resolution** = the reciprocal of the expected number of matches found.
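
As a rough illustration, the two measures could be computed along the following lines. This is only a sketch under stated assumptions, not a description of how Torus itself works. It assumes that a stored tune counts as a match when its shape string starts with the query, and that a tune whose shape string is shorter than `k` is queried with its whole string. The function name `measures` and the toy shape strings are hypothetical.

```python
def measures(shapes, k):
    """Return (uniqueness probability, resolution) for prefix length k.

    Assumed matching rule (not taken from Torus itself): a stored tune
    matches a query if its shape string starts with the query string.
    """
    # Each tune is equally likely to be picked; its query is the first
    # k characters of its shape string (or the whole string if shorter).
    queries = [s[:k] for s in shapes]
    # Number of stored tunes matched by each query.
    matches = [sum(1 for s in shapes if s.startswith(q)) for q in queries]
    n = len(shapes)
    # Fraction of tunes whose query matches exactly one stored tune.
    uniqueness = sum(1 for m in matches if m == 1) / n
    # Reciprocal of the mean number of matches per query.
    resolution = n / sum(matches)
    return uniqueness, resolution


# Hypothetical toy database; the last string is a prefix of two others.
shapes = ["UDUR", "UDDR", "URDU", "UD"]

for k in range(1, 5):
    u, r = measures(shapes, k)
    print(f"k={k}: uniqueness {u:.1%}, resolution {r:.1%}")
```

With this toy data neither measure reaches 100% however large `k` becomes, because the query for the short string `"UD"` always matches three stored tunes.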

We can express each of these measures as a percentage: the higher the
value, the better Torus is at identifying tunes by using `k`
shape characters. We can then plot these measures against `k`
to get an idea of the best length of shape string to use:

Ideally we'd expect both measures to reach 100% at some point, i.e. once we've used enough shape characters to distinguish any tune uniquely. However, there may be pairs of tunes that Torus can't distinguish, in which case the measures won't reach this value. This can happen if the shape string for one tune appears at the start of the shape string for the other (and I can't extend the shorter shape string because I've now forgotten the relevant tune). Currently the final uniqueness probability is 97.0% and the final resolution is 98.0%.

The graph above suggests some puzzles: What shape should we
expect the curves to be? Do they approach each other as `k`
becomes large?

This page is maintained by Thomas Bending,
and was last modified on 28 March 2016.

Comments, criticisms and suggestions are welcome.
Copyright © Thomas Bending 2017.