z-logo
open-access-imgOpen Access
Linear Hashing
Author(s) -
Noga Alon,
Martin Dietzfelbinger,
Peter Bro Miltersen,
Erez Petrank,
Gábor Tardos
Publication year - 1997
Publication title -
brics report series
Language(s) - English
Resource type - Journals
eISSN - 1601-5355
pISSN - 0909-0878
DOI - 10.7146/brics.v4i16.18808
Subject(s) - hash function , cardinality (data modeling) , mathematics , dynamic perfect hashing , k independent hashing , universal hashing , affine transformation , class (philosophy) , discrete mathematics , finite field , combinatorics , set (abstract data type) , function (biology) , rolling hash , perfect hash function , double hashing , hash table , pure mathematics , computer science , computer security , artificial intelligence , evolutionary biology , biology , data mining , programming language
Consider the set ${\cal H}$ of all linear (or affine) transf ormations between two vector spaces over a finite field $F$. We study how good $\cal H$ is as a class of hash functions, namely we consider hashing a set $S$ of size $n$ into a range having the same cardinali ty $n$ by a randomly chosen function from ${\cal H}$ and look at the expected size of the largest hash bucket. $\cal H$ is a universal class of hash functions for any fini te field, but with respect to our measure different fields behave differen tly. \par If the finite field $F$ has $n$ elements then there is a bad set $S\subset F^2$ of size $n$ with expected maximal bucket size $\Omega(n^{1/3})$. If $n$ is a perfect square then there is even a bad set with largest bucket size {\em always} at least $\sqrt n$. (This is worst possible, since with respect to a universal class of hash functions every set of size $n$ has expected largest bucket size below $\sqrt n+1/2$.) \par If, however, we consider the field of two elements then we get much better bounds. The best previously known upper bound on the expected size of the largest bucket for this class was $O( 2^{\sqrt{\log n}})$. We reduce this upper bound to $O(\log n\log\log n)$. Note that this is not far from the guarantee for a random function. There, the average largest bucket would be $\Theta(\log n/\log \log n)$. \par In the course of our proof we develop a tool which may be of independent interest. Suppose we have a subset $S$ of a vector space $D$ over ${\bf Z}_2$, and consider a random linear mapping of $D$ to a smaller vector space $R$. If the cardinality of $S$ is larger than $c_\e|R|\log|R|$ then with probability $1-\e$, the image of $S$ will cover all elements in the range.

The content you want is available to Zendy users.

Already have an account? Click here to sign in.
Having issues? You can contact us here
Accelerating Research

Address

John Eccles House
Robert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom