The Perfect Hash Function

In their 1984 article, Fredman, Komlós, and Szemerédi (1) describe a perfect hash function that can store a sparse table with O(1) worst-case access time. And since they mention that both the data structure and the query algorithm are easy to implement, I decided to give it a try.

You can always go fishing, if you cannot write the perfect hash function.

The magic is actually simple. The function they use for hashing is nothing fancy:

h(x) = ((k * x) mod p) mod n

Where:

  • k is a random number between 1 and p-1
  • p is a prime number larger than the largest number in the set being hashed
  • n is the number of positions on level 1 of the table, equal to the number of keys we are going to hash
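The level-1 function above can be sketched in a few lines of Python. This is my own illustration of the formula, not the code from the linked repository; the set {3, 17, 42, 95} and the prime 101 are arbitrary example values.

```python
import random

def make_hash(p, n):
    """Level-1 hash from the formula above: h(x) = ((k * x) mod p) mod n,
    with k drawn uniformly at random from 1..p-1."""
    k = random.randint(1, p - 1)
    return lambda x: ((k * x) % p) % n

# Example: hash the set {3, 17, 42, 95} into n = 4 slots,
# using the prime p = 101, which is larger than every key.
h = make_hash(101, 4)
slots = [h(x) for x in (3, 17, 42, 95)]
```

Note that nothing here prevents two keys from landing in the same slot; that is exactly the flaw discussed next.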

Ok, but that function has one flaw – as the birthday paradox suggests, collisions appear easily once n grows beyond a handful of keys, which is usually the case with these tables. So what do we do? Or rather, what did the authors do? They needed a good way to avoid collisions, and they actually found one – they insert "buckets" within "buckets", where each level-2 bucket gets a number of positions equal to the square of the number of items it holds. This guarantees that a collision-free level-2 function is found quickly – with the number of positions equal to the square of the number of items, the probability of a collision for a random k is less than 0.5, so a few random retries are enough in expectation.

If the last sentence was something that you did not completely understand, go to the YouTube video and take a look.

The two levels are visible here.

The full code is here:

GitHub code – https://github.com/Vitosh/Python_personal/tree/master/YouTube/051_Python-Hashing

1 – Michael L. Fredman, János Komlós, and Endre Szemerédi. 1984. Storing a Sparse Table with O(1) Worst Case Access Time. J. ACM 31, 3 (July 1984), 538–544. https://doi.org/10.1145/828.1884
