一、哈希表介绍
在哈希表中进行添加,删除,查找等操作,性能十分高,在不考虑哈希冲突的情况下,仅需一次定位即可完成,时间复杂度为O(1),接下来说明这是怎么达到的。
数据结构的物理存储结构只有两种:顺序存储结构和链式存储结构(像栈,队列,树,图等是从逻辑结构去抽象的,映射到内存中,也是这两种物理组织形式),哈希表的主干就是数组+链表(当链表长度达到一定值时(默认是8),就会将链表转化为红黑树)。
如下图所示:
二、参数
2.1 Node
Node是HashMap的基本组成单元,每一个Node包含一个Key-Value键值对,它是HashMap中的一个静态内部类。
/**
* Basic hash bin node, used for most entries. (See below for
* TreeNode subclass, and in LinkedHashMap for its Entry subclass.)
*/
static class Node<K,V> implements Map.Entry<K,V> {
final int hash;
final K key;
V value;
Node<K,V> next;
Node(int hash, K key, V value, Node<K,V> next) {
this.hash = hash;
this.key = key;
this.value = value;
this.next = next;
}
public final K getKey() { return key; }
public final V getValue() { return value; }
public final String toString() { return key + "=" + value; }
public final int hashCode() {
return Objects.hashCode(key) ^ Objects.hashCode(value);
}
public final V setValue(V newValue) {
V oldValue = value;
value = newValue;
return oldValue;
}
public final boolean equals(Object o) {
if (o == this)
return true;
if (o instanceof Map.Entry) {
Map.Entry<?,?> e = (Map.Entry<?,?>)o;
if (Objects.equals(key, e.getKey()) &&
Objects.equals(value, e.getValue()))
return true;
}
return false;
}
}
2.2 table
这个数组就是哈希表的主干,在第一次使用的时候初始化,并且在需要重新分配大小的时候会重新分配容量。被分配时,数组的长度总是2的倍数(在某些操作中,为了支持bootstrapping机制,数组的长度同样可以为0)
transient Node<K,V>[] table;
从table+Node可以看出,HashMap是由数组+链表组成的,数组是HashMap的主体,链表则主要是为了解决哈希冲突而存在的。
2.3 参数列表
2.3.1 size
指明了HashMap中包含多少键值对
transient int size;
2.3.2 DEFAULT_INITIAL_CAPACITY
HashMap的默认大小是16.
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 1
2.3.3 threshold
当size到达threshold时数组会扩容,这个值等于capacity*loadFactor。(默认会初始化为DEFAULT_INITIAL_CAPACITY,16)
int threshold;
2.3.4 loadFactor
负载因子,代表了table的填充度有多少,默认是0.75,加载因子存在的原因就是为了减少哈希冲突,如果初始桶为16,等到满16个元素再扩容,某些桶里面可能有多于一个的元素了,所以加载因子小于1。
final float loadFactor;
2.3.5 modCount
这个变量记载了HashMap被改变(代表着改变键值对的数量、键值对本身或者使用rehash改变了内部结构)的次数。因为HashMap非线程安全,在对HashMap进行迭代时,如果期间其他线程的参与导致HashMap的结构发生改变了,需要抛出ConcurrentModificationException异常。
可以改变modCount的操作:
- put
transient int modCount;
三、方法
3.1 构造函数
注意在HashMap的构造方法中,发现没有执行new操作,猜测可能跟ArrayList一样是在第一次添加的时候开辟的内存(在put中实现)
传入初始容量和负载因子的构造器
public HashMap(int initialCapacity, float loadFactor) {
if (initialCapacity < 0)
throw new IllegalArgumentException("Illegal initial capacity: " +
initialCapacity);
if (initialCapacity > MAXIMUM_CAPACITY)
initialCapacity = MAXIMUM_CAPACITY;
if (loadFactor <= 0 || Float.isNaN(loadFactor))
throw new IllegalArgumentException("Illegal load factor: " +
loadFactor);
this.loadFactor = loadFactor;
this.threshold = tableSizeFor(initialCapacity);
}
仅传入初始容量的构造器,将负载因子初始化为0.75
public HashMap(int initialCapacity) {
this(initialCapacity, DEFAULT_LOAD_FACTOR);
}
默认构造器,将capacity初始化为16,loadFactor初始化为0.75.
public HashMap() {
this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}
根绝传入的Map构造出一个新的HashMap,负载因子为0.75,capacity设置为足够容量传入Map的元素。
public HashMap(Map<? extends K, ? extends V> m) {
this.loadFactor = DEFAULT_LOAD_FACTOR;
putMapEntries(m, false);
}
注意从上面的代码中我们可以看出,在常规构造器中,没有为数组分配内存空间(有一个入参为指定Map的构造器除外),而是在执行put操作的时候才真正构建table数组。
3.2 put
逻辑图:
将特定的Key-Value键值对存储进入哈希表中,如果已经存在了Key,那么更新Value的值。
返回值:
当插入前不存在Key的时候返回null(返回null还有可能是之前与Key对应的Value值为null),当插入前存在Key的时候返回之前与Key结合的Value值。
public V put(K key, V value) {
return putVal(hash(key), key, value, false, true);
}
3.2.1 putVal
参数:
hash:Key的哈希值
Key:Key
Value:Value
onlyIfAbsent:当这个值为真的时候,不会修改已经存在的Key-Value键值对
evict:if false, the table is in creation mode.
返回值:
如果插入前存在Key-Value键值对,返回旧值,否则返回null
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
boolean evict) {
Node<K,V>[] tab; Node<K,V> p; int n, i;
//当table数组为空数组时,进行resize()初始化
if ((tab = table) == null || (n = tab.length) == 0)
n = (tab = resize()).length;
//这里注意,当key值为null时,hash值为0,存储在table[0]处
//当table[i]为空时,直接存储
//这里i=(n-1)&hash和hash%n相等,但是按位相与运算更快
if ((p = tab[i = (n - 1) & hash]) == null)
tab[i] = newNode(hash, key, value, null);
//当table[i]已经存在了其他的值,这时候要利用链进行处理
else {
Node<K,V> e; K k;
//链上第一个键值对就是我们要替换的键值对
if (p.hash == hash &&
((k = p.key) == key || (key != null && key.equals(k))))
e = p;
//链上元素过多,转化为树处理
else if (p instanceof TreeNode)
e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
//遍历链,直到最后一个节点的next为null或者有重复的key值
else {
for (int binCount = 0; ; ++binCount) {
if ((e = p.next) == null) {
p.next = newNode(hash, key, value, null);
if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
treeifyBin(tab, hash);
break;
}
if (e.hash == hash &&
((k = e.key) == key || (key != null && key.equals(k))))
break;
p = e;
}
}
//原本存在键值对的情况,替换value
if (e != null) {
V oldValue = e.value;
if (!onlyIfAbsent || oldValue == null)
e.value = value;
afterNodeAccess(e);
return oldValue;
}
}
//保证并发访问时,如果HashMap内部结构发生变化,则快速响应失败
++modCount;
if (++size > threshold)
resize();
afterNodeInsertion(evict);
return null;
}
同时注意HashMap的线程不安全就体现在这里:
看上面源码的11行if ((p = tab[i = (n - 1) & hash]) == null)
,这句话用来判断是否出现hash碰撞。
假设两个线程A、B都在进行put操作,并且hash函数计算出的插入下标是相同的,当线程A执行完第六行代码后由于时间片耗尽导致被挂起,而线程B得到时间片后在该下标处插入了元素,完成了正常的插入,然后线程A获得时间片,由于之前已经进行了hash碰撞的判断,所以此时不会再进行判断,而是直接进行插入,这就导致了线程B插入的数据被线程A覆盖了,从而线程不安全。
除此之外,还有就是代码的50行有一个++size
,还是线程A、B,这两个线程同时进行put操作时,假设当前HashMap的zise大小为10,当线程A执行到第38行代码时,从主内存中获得size的值为10后准备进行+1操作,但是由于时间片耗尽只好让出CPU,线程B快乐的拿到CPU还是从主内存中拿到size的值10进行+1操作,完成了put操作并将size=11写回主内存,然后线程A再次拿到CPU并继续执行(此时size的值仍为10),当执行完put操作后,还是将size=11写回内存,此时,线程A、B都执行了一次put操作,但是size的值只增加了1,所有说还是由于数据覆盖又导致了线程不安全。
3.2.1.1 resize
初始化或者将table的大小翻倍。如果原HashMap不为空,则如果原数组上只有单个的元素,将这个元素映射到扩容后的数组的相同位置,如果原数组上存在链,则将链上的元素拆分,拆分依据是原先length位置的最高位是否为一。因为之前的映射是hash与length-1按位相与,此时扩容了一倍后多出了一个1,比如原来capacity为8,则与0111按位相与,扩容为16后则与1111按位相与,所以可以按照最高位是否为1,即与1000相与是否为0,来判断原先在一条链上的元素是继续存储在原来的位置,还是在原来的位置上加上一个old length,存放到新的位置。
final Node<K,V>[] resize() {
Node<K,V>[] oldTab = table;
int oldCap = (oldTab == null) ? 0 : oldTab.length;
int oldThr = threshold;
int newCap, newThr = 0;
//原HashMap已经被初始化过的情况
if (oldCap > 0) {
//超过最大值,不扩容,只通过碰撞处理
if (oldCap >= MAXIMUM_CAPACITY) {
threshold = Integer.MAX_VALUE;
return oldTab;
}
//没有超过最大值,那么扩充为原来的两倍
else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
oldCap >= DEFAULT_INITIAL_CAPACITY)
newThr = oldThr << 1; // double threshold
}
//因为初始时capacity被放置在threshold中,所以将capacity设置为threshold
else if (oldThr > 0)
newCap = oldThr;
//利用默认值初始化
else { // zero initial threshold signifies using defaults
newCap = DEFAULT_INITIAL_CAPACITY;
newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
}
//重新计算threshold的值,得到新的resize上限
if (newThr == 0) {
float ft = (float)newCap * loadFactor;
newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
(int)ft : Integer.MAX_VALUE);
}
threshold = newThr;
@SuppressWarnings({"rawtypes","unchecked"})
Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
table = newTab;
//扩容时原HashMap非空
if (oldTab != null) {
//把每个bucket都移动到新的buckets中去
for (int j = 0; j < oldCap; ++j) {
Node<K,V> e;
if ((e = oldTab[j]) != null) {
//清除掉原来table[i]中的值
oldTab[j] = null;
if (e.next == null)
newTab[e.hash & (newCap - 1)] = e;
else if (e instanceof TreeNode)
((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
else { // preserve order
Node<K,V> loHead = null, loTail = null;
Node<K,V> hiHead = null, hiTail = null;
Node<K,V> next;
do {
next = e.next;
//不存在高位,这些元素调整后放到原来oldCapacity的位置中
if ((e.hash & oldCap) == 0) {
if (loTail == null)
loHead = e;
else
loTail.next = e;
loTail = e;
}
//存在高位,将这些元素放到新扩容出的部分中
else {
if (hiTail == null)
hiHead = e;
else
hiTail.next = e;
hiTail = e;
}
} while ((e = next) != null);
if (loTail != null) {
loTail.next = null;
newTab[j] = loHead;
}
if (hiTail != null) {
hiTail.next = null;
newTab[j + oldCap] = hiHead;
}
}
}
}
}
return newTab;
}
3.2.1.2 newNode
创建一个常规(非树)的节点
Node<K,V> newNode(int hash, K key, V value, Node<K,V> next) {
return new Node<>(hash, key, value, next);
}
3.2.2 hash
当我们put的时候,首先计算Key的hash值,这里就调用了hash方法,hash方法实际是让key.hashCode()与key.hashCode>>>16进行异或运算,高16位补0,一个数和0异或结果不变,所以hash函数大概的作用就是:
高16bit不变,低16bit和高16bit做了一个异或,目的是减少碰撞。因为底层的table数组大小是2的幂,计算下标index=(table.length-1)&hash,如果不做hash处理,相当于散列生效的只有几个低bit位,为了减少散列的碰撞,设计者综合考虑了速度、作用、质量之后,使用高16bit和低16bit异或来简单处理减少碰撞。
/**
* Computes key.hashCode() and spreads (XORs) higher bits of hash
* to lower. Because the table uses power-of-two masking, sets of
* hashes that vary only in bits above the current mask will
* always collide. (Among known examples are sets of Float keys
* holding consecutive whole numbers in small tables.) So we
* apply a transform that spreads the impact of higher bits
* downward. There is a tradeoff between speed, utility, and
* quality of bit-spreading. Because many common sets of hashes
* are already reasonably distributed (so don't benefit from
* spreading), and because we use trees to handle large sets of
* collisions in bins, we just XOR some shifted bits in the
* cheapest possible way to reduce systematic lossage, as well as
* to incorporate impact of the highest bits that would otherwise
* never be used in index calculations because of table bounds.
*/
static final int hash(Object key) {
int h;
return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
3.3 get
给定Key,查找对应的Value值,如果不存在key值,返回null。(同时注意返回null并不能说明不存在键值对,还有可能是Key对应的值为value,可以使用containsKey进行区分)。
public V get(Object key) {
Node<K,V> e;
return (e = getNode(hash(key), key)) == null ? null : e.value;
}
3.3.1 getNode
当定位到的数组位置不含链表时,那么查找,添加等操作很快,仅需一次寻址即可;如果定位到的数组包含链表,那么会遍历这个链表(当一个链表上的元素过多时,会转化为树处理)
/**
* Implements Map.get and related methods.
*
* @param hash hash for key
* @param key the key
* @return the node, or null if none
*/
final Node<K,V> getNode(int hash, Object key) {
Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
if ((tab = table) != null && (n = tab.length) > 0 &&
(first = tab[(n - 1) & hash]) != null) {
if (first.hash == hash && // always check first node
((k = first.key) == key || (key != null && key.equals(k))))
return first;
if ((e = first.next) != null) {
if (first instanceof TreeNode)
return ((TreeNode<K,V>)first).getTreeNode(hash, key);
do {
if (e.hash == hash &&
((k = e.key) == key || (key != null && key.equals(k))))
return e;
} while ((e = e.next) != null);
}
}
return null;
}
3.3.2 hash
/**
* Computes key.hashCode() and spreads (XORs) higher bits of hash
* to lower. Because the table uses power-of-two masking, sets of
* hashes that vary only in bits above the current mask will
* always collide. (Among known examples are sets of Float keys
* holding consecutive whole numbers in small tables.) So we
* apply a transform that spreads the impact of higher bits
* downward. There is a tradeoff between speed, utility, and
* quality of bit-spreading. Because many common sets of hashes
* are already reasonably distributed (so don't benefit from
* spreading), and because we use trees to handle large sets of
* collisions in bins, we just XOR some shifted bits in the
* cheapest possible way to reduce systematic lossage, as well as
* to incorporate impact of the highest bits that would otherwise
* never be used in index calculations because of table bounds.
*/
static final int hash(Object key) {
int h;
return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
3.3.2.1. hashCode
/**
* Returns a hash code value for the object. This method is
* supported for the benefit of hash tables such as those provided by
* {@link java.util.HashMap}.
* <p>
* The general contract of {@code hashCode} is:
* <ul>
* <li>Whenever it is invoked on the same object more than once during
* an execution of a Java application, the {@code hashCode} method
* must consistently return the same integer, provided no information
* used in {@code equals} comparisons on the object is modified.
* This integer need not remain consistent from one execution of an
* application to another execution of the same application.
* <li>If two objects are equal according to the {@code equals(Object)}
* method, then calling the {@code hashCode} method on each of
* the two objects must produce the same integer result.
* <li>It is <em>not</em> required that if two objects are unequal
* according to the {@link java.lang.Object#equals(java.lang.Object)}
* method, then calling the {@code hashCode} method on each of the
* two objects must produce distinct integer results. However, the
* programmer should be aware that producing distinct integer results
* for unequal objects may improve the performance of hash tables.
* </ul>
* <p>
* As much as is reasonably practical, the hashCode method defined by
* class {@code Object} does return distinct integers for distinct
* objects. (This is typically implemented by converting the internal
* address of the object into an integer, but this implementation
* technique is not required by the
* Java™ programming language.)
*
* @return a hash code value for this object.
* @see java.lang.Object#equals(java.lang.Object)
* @see java.lang.System#identityHashCode
*/
public native int hashCode();
3.4 containsKey
public boolean containsKey(Object key) {
return getNode(hash(key), key) != null;
}
3.5 getOrDefault
当不存在这样的键值对或者原键指向的值为null时,返回defaultValue
@Override
public V getOrDefault(Object key, V defaultValue) {
Node<K,V> e;
return (e = getNode(hash(key), key)) == null ? defaultValue : e.value;
}
四、注意事项
在各种资料上都会提到:“重写equals时也要同时覆盖hashcode”,我们举个例子来看,如果重写了equals而不重写hashcode会发生什么样的问题。
4.1 重写和没有重写hashcode的对比
import java.util.*;
public class test {
private static class Person{
int idCard;
String name;
public Person(int idCard, String name) {
this.idCard = idCard;
this.name = name;
}
@Override
public boolean equals(Object o) {
if (this == o) {
return true;
}
if (o == null || getClass() != o.getClass()){
return false;
}
Person person = (Person) o;
//两个对象是否等值,通过idCard来确定
return this.idCard == person.idCard;
}
}
public static void main(String []args){
HashMap<Person,String> map = new HashMap<Person, String>();
Person person = new Person(1234,"乔峰");
System.out.println(person.hashCode());
//put到hashmap中去
map.put(person,"天龙八部");
//get取出,从逻辑上讲应该能输出“天龙八部”
Person x=new Person(1234,"萧峰");
System.out.println(person.equals(x));
System.out.println(x.hashCode());
System.out.println("结果:"+map.get(x));
}
}
输出结果:
460141958
true
1163157884
结果:null
重写后:
import java.util.*;
public class test {
private static class Person{
int idCard;
String name;
public Person(int idCard, String name) {
this.idCard = idCard;
this.name = name;
}
@Override
public boolean equals(Object o) {
if (this == o) {
return true;
}
if (o == null || getClass() != o.getClass()){
return false;
}
Person person = (Person) o;
//两个对象是否等值,通过idCard来确定
return this.name.equals(person.name);
}
@Override
public int hashCode(){
return name.hashCode();
}
}
public static void main(String []args){
HashMap<Person,String> map = new HashMap<Person, String>();
Person person = new Person(12345678,"乔峰");
System.out.println(person.hashCode());
//put到hashmap中去
map.put(person,"天龙八部");
//get取出,从逻辑上讲应该能输出“天龙八部”
Person x=new Person(1234,"乔峰");
System.out.println(person.equals(x));
System.out.println(x.hashCode());
System.out.println("结果:"+map.get(x));
}
}
输出结果:
645404
true
645404
结果:天龙八部
对比两个输出结果可以看出,如果没有重写hashcode,尽管我们在进行get和put操作的时候,使用的key从逻辑上是等值的(通过equals比较是相等的),但是由于没有重写hashcode方法,put操作时的hash值和get时候的hash值不相等就会定位到数组的不同地方,无法取到我们想要的值。
4.2 源码
首先看下equals方法和hashcode的源码
4.2.1 hashcode
/**
* Returns a hash code value for the object. This method is
* supported for the benefit of hash tables such as those provided by
* {@link java.util.HashMap}.
* <p>
* The general contract of {@code hashCode} is:
* <ul>
* <li>Whenever it is invoked on the same object more than once during
* an execution of a Java application, the {@code hashCode} method
* must consistently return the same integer, provided no information
* used in {@code equals} comparisons on the object is modified.
* This integer need not remain consistent from one execution of an
* application to another execution of the same application.
* <li>If two objects are equal according to the {@code equals(Object)}
* method, then calling the {@code hashCode} method on each of
* the two objects must produce the same integer result.
* <li>It is <em>not</em> required that if two objects are unequal
* according to the {@link java.lang.Object#equals(java.lang.Object)}
* method, then calling the {@code hashCode} method on each of the
* two objects must produce distinct integer results. However, the
* programmer should be aware that producing distinct integer results
* for unequal objects may improve the performance of hash tables.
* </ul>
* <p>
* As much as is reasonably practical, the hashCode method defined by
* class {@code Object} does return distinct integers for distinct
* objects. (This is typically implemented by converting the internal
* address of the object into an integer, but this implementation
* technique is not required by the
* Java™ programming language.)
*
* @return a hash code value for this object.
* @see java.lang.Object#equals(java.lang.Object)
* @see java.lang.System#identityHashCode
*/
public native int hashCode();
4.2.2 equals
/**
* Indicates whether some other object is "equal to" this one.
* <p>
* The {@code equals} method implements an equivalence relation
* on non-null object references:
* <ul>
* <li>It is <i>reflexive</i>: for any non-null reference value
* {@code x}, {@code x.equals(x)} should return
* {@code true}.
* <li>It is <i>symmetric</i>: for any non-null reference values
* {@code x} and {@code y}, {@code x.equals(y)}
* should return {@code true} if and only if
* {@code y.equals(x)} returns {@code true}.
* <li>It is <i>transitive</i>: for any non-null reference values
* {@code x}, {@code y}, and {@code z}, if
* {@code x.equals(y)} returns {@code true} and
* {@code y.equals(z)} returns {@code true}, then
* {@code x.equals(z)} should return {@code true}.
* <li>It is <i>consistent</i>: for any non-null reference values
* {@code x} and {@code y}, multiple invocations of
* {@code x.equals(y)} consistently return {@code true}
* or consistently return {@code false}, provided no
* information used in {@code equals} comparisons on the
* objects is modified.
* <li>For any non-null reference value {@code x},
* {@code x.equals(null)} should return {@code false}.
* </ul>
* <p>
* The {@code equals} method for class {@code Object} implements
* the most discriminating possible equivalence relation on objects;
* that is, for any non-null reference values {@code x} and
* {@code y}, this method returns {@code true} if and only
* if {@code x} and {@code y} refer to the same object
* ({@code x == y} has the value {@code true}).
* <p>
* Note that it is generally necessary to override the {@code hashCode}
* method whenever this method is overridden, so as to maintain the
* general contract for the {@code hashCode} method, which states
* that equal objects must have equal hash codes.
*
* @param obj the reference object with which to compare.
* @return {@code true} if this object is the same as the obj
* argument; {@code false} otherwise.
* @see #hashCode()
* @see java.util.HashMap
*/
public boolean equals(Object obj) {
return (this == obj);
}
由上面的源码可知,equals在其内部是调用了“==”,所以在不重写equals方法的情况下,equals方法是比较两个对象是否具有相同的引用,即是否指向了同一块内存地址。
而hashcode是一个本地方法,它返回的是这个对象的内存地址。
4.2.3 hashcode的规定
在应用程序的执行期间,只要对象的equals方法的比较操作所用到的信息没有被修改,那么对同一个对象的多次调用,hashCode方法都必须始终返回同一个值。在一个应用程序与另一个应用程序的执行过程中,执行hashCode方法所返回的值可以不一致。
如果两个对象根据equals(Object)方法比较是相等的,那么调用这两个对象中的hashCode方法都必须产生同样的整数结果
如果两个对象根据equals(Object)方法比较是不相等的,那么调用这两个对象中的hashCode方法,则不一定要求hashCode方法必须产生不用的结果。但是程序员应该知道,给不相等的对象产生截然不同的整数结果,有可能提高散列表的性能。
由上面三条规定可知,如果重写了equals方法而没有重写hashcode方法的话,就违反了第二条规定。相等的对象必须拥有相同的hashcode