Unit V: Hashing
1. Introduction to Hashing
Hashing is a technique used to uniquely identify a specific object from a group of similar objects. In data structures, it provides a way to store and retrieve data in O(1) time on average.
- Hash Table: A data structure (usually an array) that stores records.
- Hash Key: A unique value used as input to a hash function to identify a record.
- Hash Function: A mathematical formula that maps a large key to a smaller integer, which serves as an index in the hash table.
2. Hash Functions & Characteristics
A good hash function is crucial for performance.
Characteristics of a Good Hash Function
- Uniform Distribution: It should distribute keys uniformly across the table to minimize collisions.
- Fast Computation: It should be easy and quick to compute.
- Determinism: For a given input, it must always produce the same output.
Types of Hash Functions
- Division Method: h(k) = k mod m (where m is the table size).
- Mid-Square Method: The key is squared, and the middle digits are used as the hash value.
- Folding Method: The key is divided into parts, which are added together to form the index.
3. Collision Resolution
A Collision occurs when two different keys produce the same hash value (index). Since a table slot can usually hold only one record, we must resolve this conflict.
4. Open Addressing (Closed Hashing)
In Open Addressing, all elements are stored within the hash table itself. If a collision occurs, we search for the next available empty slot.
Linear Probing
We search for the next slot linearly (index + 1, index + 2...).
- Problem: Primary Clustering (long sequences of occupied slots build up).
Quadratic Probing
We search using a quadratic function (index + 1², index + 2², index + 3²...).
- Benefit: Reduces primary clustering.
Double Hashing
Uses a second hash function to determine the step size for probing.
Formula: index = (hash1(key) + i * hash2(key)) mod m
5. Closed Addressing (Open Hashing / Chaining)
In Closed Addressing (commonly known as Chaining), each slot of the hash table contains a link to a linked list.
- Multiple keys mapping to the same index are simply added to the list at that index.
- Advantage: The table never gets "full" in the traditional sense; performance just degrades as chains get longer.
6. Exam Focus & FAQs
Exam Tips
- Selection: Use Chaining if you don't know the number of keys in advance.
- Performance: Double Hashing is often considered the best open addressing technique because it avoids clustering.
- Calculation: Be prepared to demonstrate Linear Probing with a simple modulo function (e.g., Table size 10, keys 12, 22, 32).
Frequently Asked Questions
Q: What is a Load Factor?
It is the ratio of the number of stored elements to the total table size (n/m). A higher load factor increases the chance of collisions.
Q: Why is a prime number often used for table size (m)?
Using a prime number as the divisor in the Division Method helps in distributing keys more uniformly.
Mnemonic: LCD
To remember Open Addressing techniques, think LCD: Linear, C(Quadratic starts with Q, but think 'Curve'), Double Hashing.