Wednesday, 22 April 2015

  • Concept of Hashing • Need of Hashing • Hash Collision • Dealing with Hash Collision • Resolving Hash Collisions by Open Addressing • Primary Clustering • Double Rehash
  • 2. Concept of Hashing • Hashing is a technique for performing insertion, deletion and find operations in almost constant time. • As a very simple example, an array whose index serves as the key is a table of this kind. • Each index (key) can then be used to access its value in constant time. • The mapping from key to index must be simple to compute and must help identify the associated record. • A function that generates such index values from keys is termed a hash function.
  • 3. Hashing • Let h(key) be a hash function that returns the hash code. For example, h(key) = key % 1000 can produce any value between 0 and 999, as shown in the figure.
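The function h(key) = key % 1000 can be tried out directly. The following is a minimal Python sketch; the name `TABLE_SIZE` and the sample keys are assumptions made for illustration, and the keys are taken to be non-negative integers.

```python
TABLE_SIZE = 1000  # number of slots, matching h(key) = key % 1000


def h(key: int) -> int:
    """Return the hash code of an integer key, a value in 0..999."""
    return key % TABLE_SIZE


# Sample keys chosen only for illustration.
for key in (7, 1000, 4618, 1234567):
    print(key, "->", h(key))
```

Whatever the size of the key, the result always falls inside the index range of a 1000-slot table.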
  • 4. Need of Hashing • Hashing maps large data sets of variable length to smaller data sets of a fixed length. For example, consider the inventory file of a company that stocks more than 100 items, where the key to each record is a seven-digit part number. Direct indexing on the entire seven-digit key would require an array of 10 million elements, which is clearly a waste of space, since the company is unlikely to stock more than a few thousand parts. • Hashing provides an alternative: it converts the seven-digit key into an integer within a limited range. The values returned by a hash function are called hash values or hash codes.
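The space argument can be put into numbers. The sketch below is only an illustration: the table size of 5000 slots, the function name `part_hash` and the part numbers are all made up, standing in for "a few thousand parts".

```python
DIRECT_SLOTS = 10 ** 7  # one slot for every possible seven-digit part number
TABLE_SLOTS = 5000      # assumed table size for a few thousand parts


def part_hash(part_number: int) -> int:
    """Map a seven-digit part number to an index in 0..TABLE_SLOTS-1."""
    return part_number % TABLE_SLOTS


# Made-up part numbers, for illustration only.
for part_number in (1234567, 7654321, 4005003):
    print(part_number, "->", part_hash(part_number))

# How many times smaller the hashed table is than the direct-indexed array.
print(DIRECT_SLOTS // TABLE_SLOTS)
```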
  • 5. Hash Collision • Suppose two distinct keys k1 and k2 hash such that h(k1) = h(k2). The two keys hash to the same value and would have to occupy the same slot in the hash table, which is unacceptable. • Such a situation is termed a hash collision.
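A collision is easy to reproduce with h(key) = key % 1000. In this sketch the keys 1234 and 5234 and the record strings are made up; it shows why storing both records naively in the same slot is unacceptable, since the second record silently replaces the first.

```python
def h(key: int) -> int:
    return key % 1000


k1, k2 = 1234, 5234      # two different keys, chosen for illustration
print(h(k1), h(k2))      # both keys hash to the same slot

table = [None] * 1000
table[h(k1)] = (k1, "record for k1")
table[h(k2)] = (k2, "record for k2")  # overwrites the record stored for k1

print(table[h(k1)])      # the slot now holds only the record for k2
```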
  • 6. Dealing with Hash Collision • Two methods of dealing with hash collisions are rehashing and chaining. • Rehashing: invokes a secondary hash function (say Rh(key)), which is applied successively until an empty slot is found where the record can be placed. • Chaining: builds a linked list of the items whose keys hash to the same value. During a search, this short linked list is traversed sequentially for the desired key. This technique requires an extra link field in each table position.
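Rehashing can be sketched as follows. The slides do not say which rehash function is used, so this example assumes the simplest one, rh(index) = (index + 1) % TABLE_SIZE, which steps to the next slot (linear probing). The table size of 10 and the function names are also assumptions, and deletion is left out.

```python
TABLE_SIZE = 10  # assumed small table so that collisions are easy to see


def h(key: int) -> int:
    return key % TABLE_SIZE


def rh(index: int) -> int:
    """Assumed rehash function: move on to the next slot, wrapping around."""
    return (index + 1) % TABLE_SIZE


def insert(table, key, value):
    """Place (key, value) in the first free slot on the probe path; return its index."""
    index = h(key)
    for _ in range(TABLE_SIZE):
        if table[index] is None or table[index][0] == key:
            table[index] = (key, value)
            return index
        index = rh(index)  # slot taken by another key: rehash and try again
    raise OverflowError("hash table is full")


def find(table, key):
    """Follow the same probe path as insert; return the value, or None if absent."""
    index = h(key)
    for _ in range(TABLE_SIZE):
        entry = table[index]
        if entry is None:      # an empty slot ends the probe path
            return None
        if entry[0] == key:
            return entry[1]
        index = rh(index)
    return None


table = [None] * TABLE_SIZE
for key, value in ((12, "a"), (22, "b"), (32, "c")):  # all three hash to slot 2
    print(key, "stored in slot", insert(table, key, value))
```

The three keys collide at slot 2, so the second and third end up in the following slots. A search has to retrace the same sequence of slots until it finds the key or reaches an empty slot.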
  • 7. Hashing Analysis (chaining) • The worst-case running time for insertion is O(1), since a new item can be placed at the head of its list. • Deletion of an element x can be accomplished in O(1) time if the lists are doubly linked and a reference to x is already at hand. • In the worst-case behaviour of chained hashing, all n keys hash to the same slot, creating a list of length n. The worst-case time for search is thus Θ(n) plus the time to compute the hash function.
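The chaining costs can be observed with a small experiment. This is a minimal sketch, not a full implementation: the class names, the table size of 10 and the keys are assumptions, the chains are singly linked, deletion is omitted, and `insert` assumes the key is not already in the table.

```python
class Node:
    """One item in a chain: a key, its value, and a link to the next node."""

    def __init__(self, key, value, next_node=None):
        self.key = key
        self.value = value
        self.next = next_node


class ChainedHashTable:
    def __init__(self, size=10):
        self.size = size
        self.slots = [None] * size  # each slot holds the head of a chain

    def _h(self, key):
        return key % self.size

    def insert(self, key, value):
        # O(1): link the new node in at the head of its chain.
        # Assumes the key is not already in the table.
        index = self._h(key)
        self.slots[index] = Node(key, value, self.slots[index])

    def search(self, key):
        # Walk the chain sequentially; return (value or None, nodes visited).
        node = self.slots[self._h(key)]
        visited = 0
        while node is not None:
            visited += 1
            if node.key == key:
                return node.value, visited
            node = node.next
        return None, visited


table = ChainedHashTable(size=10)
for key in range(0, 100, 10):  # 0, 10, ..., 90 all hash to slot 0
    table.insert(key, f"v{key}")

print(table.search(90))  # newest key sits at the head of the chain
print(table.search(0))   # oldest key sits at the tail, so every node is visited
```

Because every key lands in slot 0, the table degenerates into a single list of length n = 10, and a search for the oldest key or for a missing key in that slot visits all n nodes.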
  • 8. A good hash function is one that minimizes collisions and spreads the records uniformly throughout the table. That is why it is desirable to have an array larger than the actual number of records. More formally, suppose we want to store a set of n elements in a table of m slots. The ratio α = n/m is called the load factor, that is, the average number of elements stored per slot of the table.
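The load factor can be checked against an actual table. In this sketch the function name `load_factor`, the table size m = 10 and the 25 keys are made up for illustration; each slot holds a Python list standing in for a chain.

```python
def load_factor(num_records: int, num_slots: int) -> float:
    """Return alpha = n / m."""
    return num_records / num_slots


m = 10                        # assumed number of slots
keys = list(range(100, 125))  # 25 made-up keys, so n = 25

chains = [[] for _ in range(m)]
for key in keys:
    chains[key % m].append(key)

average_chain_length = sum(len(chain) for chain in chains) / m
print(load_factor(len(keys), m), average_chain_length)
```

Both printed values are the same, because the average number of elements per slot is exactly n/m however the keys are distributed. What a good hash function changes is how evenly the individual chains share that total.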
