These are my notes from reading "Taking Hash Tables Off The Shelf" and "Hashing Out Hash Functions", two posts from Vaidehi Joshi's baseCS series on Medium, a column that uses charming comics to vividly explain computer science fundamentals: data structures, algorithms, and theory as applied in practice.
Part 1. Intro
 It’s hard to understand breadth first search if you don’t already know Big O notation. And Big O notation doesn’t mean anything to you unless you’ve seen it in action, in the context of an algorithm.
 Metaphors can be a great way of learning through analogy, and they're useful in illustrating otherwise complicated concepts.
Books on books on books
 array or linked list: O(n) space, O(n) search time
 binary search (on a sorted array): O(n) space, O(log n) search time
 neither is the most efficient way to find a particular item
Mappings to make our lives easier
 Hash tables : an array + a hash function
 what makes them so efficient for our purposes is that they create a mapping, which is a relationship between two sets of data.
 Hash tables are made up of sets of pairs: a key and a value, often referred to as (k, v). The idea here is that if you have the key in one set of data, you can find the value that corresponds to it in another set of data.
 In the case of a hash table, it is the hash function that creates the mapping. Since we’re working with arrays, the key is the index of the array, and the value is the data that lives at that index.
 The hash function’s job is to take an item in as input, and calculate and return the key for where that item should live in the hash table.
Example 1
Hash buckets
Our hash table has a size of 12: every single book needs to be stored somewhere within these twelve slots.
Code in JS (only the function signature survived in these notes; the body below is a plausible completion that sums the title's character codes and takes the remainder by the table size):

```js
function bookHashing(bookTitle, hashTableSize) {
  // Sum the character codes of every character in the title.
  var sum = 0;
  for (var i = 0; i < bookTitle.length; i++) {
    sum += bookTitle.charCodeAt(i);
  }
  // The remainder by the table size is the hash bucket index,
  // so the result always lands within the table's bounds.
  return sum % hashTableSize;
}
```
Collision
A collision occurs when two elements are supposed to be inserted at the same place in an array.
 For starters, we can be sure that no hash function will always return a unique hash bucket value for every item that we give it.
 In fact, a hash function will always map multiple inputs to the same output (hash bucket), because the size of our dataset will usually be larger than the size of our hash table.
Super cool constant time
Practical context
Hash functions that lean on alphabetization (bucketing by first letter) always result in just 26 hash buckets.
 What we want instead is an equal distribution of data amongst all the possible hash buckets.
Efficiency
Hash tables leverage constant time search, which is what makes them so powerful.
 It also takes only constant time to insert an item into or delete an item from a hash table, because we always lean on our hash function to determine where to add or remove something!
 Libraries around the world use the Dewey Decimal Classification System, which is really just a hash table with 10 buckets (and, of course, a complex hash function that I still don't quite understand).
Part 2. Intro
 everything has its drawbacks.
 Hash tables are really great options for storing and retrieving specific data quickly. They're not always the best tool for the job, though; for example, they're not great for finding ordered data.
 hash tables do intrinsically have some issues of their own, and the usefulness of a hash table is tied directly to its hash function.
Hash table, are you any good?
Whether a hash table is any good hinges on how good a hash function it has.
Collisions occur whenever the hash function maps two different elements to the same hash bucket.
Collision resolution tactics
 There are a handful of ways to handle collisions in a hash table, and the important thing to remember is that none of them is necessarily the "right tactic" to use.
 It all depends on your dataset, the size of your hash table, and what operations you know you’ll want to perform on the table later on.
Linear Probing
 When a collision occurs, we look for the next empty hash bucket nearby.
 This is a kind of rehashing, and the technique is known as linear probing.
The interesting thing about linear probing is that if the next hash bucket is also filled by an element, the hash function will just continue to probe through the hash table until it finds an empty bucket, cycling back if necessary.
Clustering
 simply moving over to the next available hash bucket and inserting an element at the “next free space” leads to something called clustering.
 Clustering happens when either our hash function isn't using up the entire hash table range, or it isn't spreading data evenly across the hash buckets of our hash table.
 Well-spaced: data is spread across the table, across many different hash buckets, and spans the entire range of the table.
Clustered vs. well-balanced tables
 this again depends on our dataset.
 If we have a lot of elements that are ending up at one hash bucket, and we use linear probing to solve the problem of collision, we’ll end up with a clustered table!
 If we've designed a good hash function, and our table is big enough to accommodate our dataset, then linear probing can be a great solution for handling collisions.
Chaining
Chaining lets us store multiple things in one hash bucket, using a handy-dandy linked list!
Instead of storing a single item at each hash bucket, we can chain multiple elements together so that each key of the table has a pointer that references a linked list.
 O(n) time: searching through a linked list takes time proportional to the number of elements it holds.
Chaining still averages out to a constant search time
 With a good hash function, the linked lists at any hash buckets with collisions should end up approximately the same size.
 The great thing about this is that, with a good hash function, chaining still averages out to have a search time of O(1), or constant lookup time.
The power is all in the function
 Unix and Linux operating systems use an internal hash table to store the contents of all the directories referenced by the PATH environment variable. If you've ever used a version of the rehash command, what you were really doing when you ran it was recomputing and rehashing the system's internal hash table!
Spell checker
Unordered container support in C++
Associative containers are used for searching through large amounts of data. With a hash table, lookup takes O(1) time on average.
unordered_multiset
 When defining the set, you only declare the value type, because the key equals the value: unordered_multiset<string> c;
 Insert values into the set directly: c.insert(string(buf));
 Total bucket count: c.bucket_count(). The bucket count always exceeds the element count; once the element count catches up to the bucket count, the bucket count is roughly doubled and the elements are scattered and rehashed into the new buckets. Not every bucket holds an element, so it's perfectly normal for, say, the first twenty buckets you inspect to all be empty.
 Load factor: c.load_factor(), at most 1 by default. No bucket's linked list should grow too long, or lookup efficiency suffers.
 A container's own member find() is always faster than std's find(), as covered in the previous post.
 When printing set elements, just dereference the iterator: *pItem.
unordered_multimap
 When defining the map, mind both template types: unordered_multimap<long, string> c;.
 As before, note that a multimap cannot use [] for insertion, because keys may repeat; insert a pair specifying the key and value types instead: c.insert(pair<long, string>(i, buf));
 The target is a key; the item is found by key: auto pItem = c.find(target);
 To get the value, take the pair's second member: (*pItem).second.
unordered_set
unordered_map
 Note that std's find() does not apply to map; use the member version: auto pItem = c.find(target);
Resources
 Basics of Hash Tables, Prateek Garg
 Data Structure and Algorithms — Hash Table, Tutorialspoint
 Hash Tables, Professor John Morris
 CS210 Lab: Hash Table, University of Regina, Dept. of Computer Science
 Hash Tables, SparkNotes
 Introduction to Hash Tables, Professor Steve Cutchin
 Hash Function, Wolfram Mathworld
 Dictionaries and Hash Tables, Professor Sinisa Todorovic
 Linear Probing Hash Tables, Department of Computer Science, RMIT University
 Problem Solving with Algorithms and Data Structures: Hashing, Interactive Python
 Hash Tables, Professors Robert Sedgewick & Kevin Wayne
 Introduction to Hashing, Professor Ananda Gunawardena