Image signatures and distances

Consider these two photographs of the Mona Lisa:


(credit: Wikipedia Public domain)


(credit: WikiImages Public domain)

Though it’s obvious to any human observer that these show the same painting, there are a number of subtle differences: the dimensions, palette, lighting and so on differ between the two images. image_match gives us a numerical comparison:

from image_match.goldberg import ImageSignature
gis = ImageSignature()
# Generate one signature per photograph
a = gis.generate_signature(',_by_Leonardo_da_Vinci,_from_C2RMF_retouched.jpg/687px-Mona_Lisa,_by_Leonardo_da_Vinci,_from_C2RMF_retouched.jpg')
b = gis.generate_signature('')
# Compare the two signatures; smaller values mean more similar images
gis.normalized_distance(a, b)

This returns 0.22095170140933634. Normalized distances below 0.40 are very likely matches. Let’s try this again with a dissimilar image, say Caravaggio’s Supper at Emmaus:


(credit: Wikipedia Public domain)

against one of the Mona Lisa photographs:

c = gis.generate_signature('')
gis.normalized_distance(a, c)

This returns 0.68446275381507249, almost certainly not a match. image_match doesn’t have to generate a signature from a URL; a file path or even an in-memory bytestream will do (be sure to pass bytestream=True in the latter case).
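As a rough illustration of the bytestream case, the sketch below reads an image file into memory and passes the raw bytes along with bytestream=True. The helper name signature_from_file_bytes is hypothetical and not part of image_match; it assumes gis is an ImageSignature instance like the one created above.

```python
def signature_from_file_bytes(gis, path):
    """Read an image file into memory and generate a signature from its bytes.

    Hypothetical helper, not part of image_match. `gis` is assumed to be an
    ImageSignature instance (or anything exposing generate_signature).
    """
    with open(path, 'rb') as f:
        data = f.read()
    # bytestream=True signals that `data` holds raw image bytes
    # rather than a URL or a file path.
    return gis.generate_signature(data, bytestream=True)
```

The same pattern should apply to bytes obtained any other way, such as an upload or a database blob; for a file already on disk, passing the path directly is simpler.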

Now consider this subtly-modified version of the Mona Lisa:


(credit: Michael Russell Attribution-ShareAlike 2.0 Generic)

How similar is it to our original Mona Lisa?

d = gis.generate_signature('')
gis.normalized_distance(a, d)

This gives us 0.42557196987336648. That is markedly farther from the original than the two Mona Lisa photographs are from each other, and just above the 0.40 threshold, but considerably closer than the Caravaggio.
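To make the comparison concrete, here is a minimal sketch that applies the 0.40 rule of thumb to the three distances reported above. The names MATCH_THRESHOLD and is_likely_match are illustrative only and not part of image_match; treating the cutoff as a strict "less than" follows the wording above.

```python
# Cutoff quoted in the text: distances below 0.40 are very likely matches.
MATCH_THRESHOLD = 0.40


def is_likely_match(distance, threshold=MATCH_THRESHOLD):
    """Return True when a normalized distance falls below the threshold."""
    return distance < threshold


# Distances reported above, each measured against the first Mona Lisa photograph.
distances = {
    'second Mona Lisa photograph': 0.22095170140933634,
    'modified Mona Lisa': 0.42557196987336648,
    'Supper at Emmaus': 0.68446275381507249,
}

# Print the comparisons from nearest to farthest.
for label, dist in sorted(distances.items(), key=lambda kv: kv[1]):
    verdict = 'likely match' if is_likely_match(dist) else 'not a clear match'
    print(f'{label}: {dist:.3f} -> {verdict}')
```

A single cutoff is a blunt instrument: the modified image lands just past it, so an application that wants to catch edited copies might choose a looser threshold and accept more false positives.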