Description
I'm getting different hash value outputs from python-cityhash than from other implementations. E.g.
fasthash::city::Hash64 - Rust
assert_eq!(Hash64::hash(b"hello"), 2578220239953316063);
vs
>>> cityhash.CityHash64WithSeeds(b"BU9[85WWp/ HASH!", 2239407493875278174, 1318041201923111131)
5218636442198533358
I assume that is because the Python implementation hashes the entire String data structure, not just the 5 bytes of "hello".
That of course makes sense for the goal of a fast hash algorithm for arbitrary data structures.
But it would be nice to also have a way to hash arbitrary byte arrays. Perhaps some sort of argument "raw=True" could be added to the functions?
As an aside, it is frustratingly hard to find any official examples of hash values or test vectors from CityHash official sources!
They are absent from the presentation at CityHash: Fast Hash Functions for Strings. The city-test.cc code is inscrutable. That Rust example is the best I've found. A sad state of affairs.
So I'll add one more, and at the same time more clearly demonstrate the collision issue that was revealed by djb, Jean-Philippe Aumasson, and Martin Boßlet at 29C3: Hash-flooding DoS reloaded: attacks and defenses. I modified their PoC code, citycollisions-20120730.tar.gz (gone from their original website, but preserved by the amazing archive.org!), to make it clearer how to call CityHash64WithSeeds to reproduce their collisions.
$ ./citycollisions_ascii 30000
128-bit hex key 24fbab96507d3be76326ad973ed6d702
== 2 64-bit base-10 keys k1, k2: 2664912266603609063, 7144588723975804674
CityHash64WithSeeds( 'BU9[85WWp/ HASH!', 16, k1, k2 ) = a2f4696e95a3dfbc
CityHash64WithSeeds( '8{YDLn;d.2 HASH!', 16, k1, k2 ) = a2f4696e95a3dfbc
CityHash64WithSeeds( 'd+nkK&t?yr HASH!', 16, k1, k2 ) = a2f4696e95a3dfbc
CityHash64WithSeeds( '{A.#v5i]V{ HASH!', 16, k1, k2 ) = a2f4696e95a3dfbc
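For anyone who wants to check these numbers from Python, here is a minimal sketch. It rests on two assumptions. First, that k1 and k2 are the big-endian upper and lower 64-bit halves of the printed 128-bit hex key, which is consistent with the values shown above. Second, that python-cityhash is installed and accepts the seeds positionally, as in the call earlier in this issue. The helper names `split_key` and `group_by_hash` are made up for illustration. Whether a given binding reproduces a2f4696e95a3dfbc depends on which CityHash version it wraps, which is not verified here.

```python
from collections import defaultdict

# 128-bit key printed by the modified PoC run above.
HEX_KEY = "24fbab96507d3be76326ad973ed6d702"

# The four 16-byte inputs reported as colliding.
INPUTS = [
    b"BU9[85WWp/ HASH!",
    b"8{YDLn;d.2 HASH!",
    b"d+nkK&t?yr HASH!",
    b"{A.#v5i]V{ HASH!",
]


def split_key(hex_key):
    """Split a 128-bit hex key into two unsigned 64-bit ints (big-endian halves)."""
    raw = bytes.fromhex(hex_key)
    if len(raw) != 16:
        raise ValueError("expected a 128-bit key (32 hex digits)")
    return int.from_bytes(raw[:8], "big"), int.from_bytes(raw[8:], "big")


def group_by_hash(hash_fn, inputs):
    """Map each hash value to the list of inputs that produce it."""
    groups = defaultdict(list)
    for data in inputs:
        groups[hash_fn(data)].append(data)
    return dict(groups)


k1, k2 = split_key(HEX_KEY)
print(k1, k2)  # 2664912266603609063 7144588723975804674

try:
    import cityhash  # python-cityhash; optional for this sketch
except ImportError:
    cityhash = None

if cityhash is not None:
    # Assumption: same call shape as the CityHash64WithSeeds example earlier in this issue.
    groups = group_by_hash(
        lambda data: cityhash.CityHash64WithSeeds(data, k1, k2), INPUTS
    )
    for value, members in groups.items():
        # Print in hex so it can be compared with the PoC output.
        print(f"{value:016x}", members)
```

If all four inputs land in a single group, the binding agrees with the PoC on the collision. If they land in separate groups, the binding is either hashing something other than the raw 16 bytes or wrapping a different CityHash revision.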