In the article from 1984, Fredman and Komlos (1) mention the perfect hash function, that actually can store a sparse table with 0(1) worst case access time. And as they mention that both the data structure and the query algorithm were easy to implement, I have decided to give it a try.

You can always go fishing, if you cannot write the perfect hash function.
The magic is actually simple. They are using a simple function for hashing, that is nothing fancy:

Simple hash function.
Where:
- k is a random number between 1 and p-1
- p is a prime number, bigger than the biggest number in the set, that is being hashed
- n is the number of positions on level 1 in our table, equal to the number of numbers that we are going to hash
Ok, but that function has one flaw – as you can imagine from the birthday paradox, there would easily be collisions with n bigger than 5. Which is usually the case with these tables. So what do we do? Or what did the authors do? They needed a good way to avoid collisions and they actually found one – they insert “buckets” withing “buckets”, and the level-2 buckets have positions, equal to the square of the items in them. Thus, they guarantee that a solution with no collisions with level 2 will be found easily – the chance to have a collision with positions equal to the square of items is actually 0.5 for any random k.
If the last sentence was something that you did not completely understand – go to the YouTube video and take a look.

The two levels are visible here.
Generally, the code is here:
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 |
class FKSPerfectHash: def __init__(self, numbers: List[int]): self.elements = list(set(numbers)) self.n = len(self.elements) if self.n == 0: self.p = 1 self.buckets = [] self.level2_tables = [] return max_val = max(self.elements) self.p = next_prime(max_val) self._build_level_1() self._build_level_2() def _build_level_1(self): while True: self.k_main = random.randint(1, self.p - 1) self.buckets = [[] for _ in range(self.n)] for x in self.elements: idx = ((self.k_main * x) % self.p ) % self.n self.buckets[idx].append(x) sum_squares = sum(len(b)**2 for b in self.buckets) if sum_squares < 3 * self.n: break def _build_level_2(self): self.level2_tables = [None] * self.n for i, bucket in enumerate(self.buckets): b_size = len(bucket) if b_size == 0: self.level2_tables[i] = (0, 0, None) continue m_j = b_size * b_size while True: k_prim = random.randint(1, self.p-1) temp_storage = [None] * m_j collision_found = False for x in bucket: idx = ((k_prim * x) % self.p) % m_j if temp_storage[idx] is not None: collision_found = True break temp_storage[idx] = x if not collision_found: self.level2_tables[i] = (k_prim, m_j, temp_storage) break def query(self, x:int) -> bool: if self.n == 0: return False idx_1 = ((self.k_main * x) % self.p) %self.n params = self.level2_tables[idx_1] k_prim, m_j, storage = params if storage is None: return False idx_2 = ((k_prim * x) % self.p ) % m_j return storage[idx_2] == x data = [2, 4, 5, 15, 18, 30] fks = FKSPerfectHash(data) fks.query(5) |
GitHub code – https://github.com/Vitosh/Python_personal/tree/master/YouTube/051_Python-Hashing
1 – Michael L. Fredman, János Komlós, and Endre Szemerédi. 1984. Storing a Sparse Table with 0(1) Worst Case Access Time. J. ACM 31, 3 (July 1984), 538–544. https://doi.org/10.1145/828.1884
