Introduction
如果所有的键都是小整数,我们可以用一个数组来实现无序的符号表,将键作为数组的索引而数组中的键 i
处储存的就是它对应的值。这样我们就可以快速访问任意键的值。
使用散列的查找算法分为两步:
第一步:使用散列函数将被查找的键转化为数组的一个索引
第二步:处理碰撞冲突(拉链法和线性探测法)
Issues.
- Computing the hash function.
- Equality test: Method for checking whether two keys are equal.
- Collision resolution: Algorithm and data structure to handle two keys that hash to the same array index.
散列表是算法在时间和空间上作出权衡的经典例子。
- 如果没有内存限制,我们可以直接将键作为(可能是一个超大的)数组的索引,那么所有的查找操作只需要访问内存一次即可完成。但这种理想情况不会经常出现,因为当键很多时需要的内存太大。
- 如果没有时间限制,我们可以使用无序数组并进行顺序查找,这样就只需要很少的内存。
Classic space-time tradeoff.
- No space limitation: trivial hash function with key as index.
- No time limitation: trivial collision resolution with sequential search.
- Space and time limitations: hashing (the real world).
hash function
Idealistic goal. Scramble the keys uniformly to produce a table index.
- Efficiently computable.
- Each table index equally likely for each key.
Java’s hash code conventions
All Java classes inherit a method hashCode()
, which returns a 32-bit int.
Requirement. If x.equals(y)
, then (x.hashCode() == y.hashCode())
.
Highly desirable. If !x.equals(y)
, then (x.hashCode() != y.hashCode())
.
Default implementation. Memory address of x
.
Legal (but poor) implementation. Always return 17
.
Customized implementations. Integer, Double, String, File, URL, Date, … User-defined types. Users are on their own.
Implementing hash code: integers, booleans, and doubles
Java library implementations
Implementing hash code: strings
Java library implementation
Performance optimization.
- Cache the hash value in an instance variable.
- Return cached value.
Hash code design
separate chaining
Collisions
Collisions. Two distinct keys hashing to same index.
- Birthday problem => can’t avoid collisions unless you have a ridiculous (quadratic) amount of memory.
- Coupon collector + load balancing => collisions are evenly distributed.
Separate chaining symbol table
Use an array of M < N linked lists.
- Hash: map key to integer
i
between0
andM - 1
. - Insert: put at front of
ith
chain (if not already there). - Search: need to search only
ith
chain.
Separate chaining ST: Java implementation
Analysis of separate chaining
Proposition. Under uniform hashing assumption, prob. that the number of keys in a list is within a constant factor of N / M is extremely close to 1.
ST implementations: summary
linear probing
Collision resolution: open addressing
Open addressing.
When a new key collides, find next empty slot, and put it there.
Linear probing hash table demo
Hash. Map key to integer i
between 0
and M-1
.
Search. Search table index i
; if occupied but no match, try i+1
, i+2
, etc.
Linear probing hash table summary
Hash. Map key to integer i
between 0
and M-1
.
Insert. Put at table index i
if free; if not try i+1
, i+2
, etc.
Search. Search table index i
; if occupied but no match, try i+1
, i+2
, etc.
Note. Array size M
must be greater than number of key-value pairs N
.