Reading the HashSet Source Code

Background

In 【1】 we already analyzed the source code of HashMap. Continuing our tour of the Java collections, we now turn to HashSet. If you understand HashMap, the HashSet source is actually quite simple.

Analysis

Let's start with the class declaration:

public class HashSet<E>
    extends AbstractSet<E>
    implements Set<E>, Cloneable, java.io.Serializable

Source 1

HashSet implements the Set interface, with AbstractSet supplying some of the basic implementation. It also implements Cloneable and Serializable, marking it as cloneable and serializable. Like HashMap, however, it is not a thread-safe collection class. If you need thread safety, you can wrap it with the synchronizedXXX methods of the Collections utility class (synchronizedMap, synchronizedSet) to obtain a thread-safe collection.
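As a quick illustration of the synchronizedSet wrapper, here is a minimal sketch (the class and method names are mine, just for demonstration): two threads add disjoint ranges of integers into a wrapped set, and because every wrapper method is synchronized, no update is lost.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class SyncSetDemo {
    // Two threads insert disjoint ranges; with the synchronized wrapper
    // the final size is exactly 2000, with no lost updates.
    static int concurrentAdd() throws InterruptedException {
        Set<Integer> set = Collections.synchronizedSet(new HashSet<>());
        Thread t1 = new Thread(() -> { for (int i = 0; i < 1000; i++) set.add(i); });
        Thread t2 = new Thread(() -> { for (int i = 1000; i < 2000; i++) set.add(i); });
        t1.start(); t2.start();
        t1.join(); t2.join();
        return set.size();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(concurrentAdd()); // 2000
    }
}
```

With a plain HashSet the same race can lose elements during a concurrent resize; the wrapper serializes every call on a single lock.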

Then we come across a telling field:

private transient HashMap<E,Object> map;

Source 2

The flavor is unmistakable: HashSet appears to be implemented on top of HashMap.

Next, look at the default constructor:

/**
 * Constructs a new, empty set; the backing <tt>HashMap</tt> instance has
 * default initial capacity (16) and load factor (0.75).
 */
public HashSet() {
    map = new HashMap<>();
}

Source 3

All of the other HashSet constructors likewise do nothing but initialize this map field.
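A short sketch of what this delegation means in practice (the class and method names are mine, for illustration only): each constructor merely picks the parameters of the backing HashMap.

```java
import java.util.Arrays;
import java.util.HashSet;

public class CtorDemo {
    // Build a set from an existing collection; duplicates collapse, as in any set
    static int fromCollectionSize() {
        return new HashSet<>(Arrays.asList("x", "y", "x")).size();
    }

    public static void main(String[] args) {
        // Each constructor simply forwards its arguments to the backing HashMap
        HashSet<String> a = new HashSet<>();         // default capacity 16, load factor 0.75
        HashSet<String> b = new HashSet<>(32);       // explicit initial capacity
        HashSet<String> c = new HashSet<>(32, 0.5f); // explicit capacity and load factor
        System.out.println(fromCollectionSize());    // 2
    }
}
```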

The other HashSet methods are likewise plain delegations to this map field (so they need little explanation; see 【1】):

public Iterator<E> iterator() {
    return map.keySet().iterator();
}

public int size() {
    return map.size();
}

public boolean isEmpty() {
    return map.isEmpty();
}

public boolean contains(Object o) {
    return map.containsKey(o);
}

public boolean add(E e) {
    return map.put(e, PRESENT)==null;
}

public boolean remove(Object o) {
    return map.remove(o)==PRESENT;
}

public void clear() {
    map.clear();
}

Source 4

Here we meet a special object, PRESENT:

// Dummy value to associate with an Object in the backing Map
private static final Object PRESENT = new Object();

Source 5

As with HashMap, null is permitted: HashMap allows a null key, so HashSet allows a null element.

But every value stored in HashSet's backing map field is the same PRESENT object (hence the "Dummy" in the comment of Source 5).
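The PRESENT trick explains the boolean returned by add and remove: put returns null for a brand-new key and the previous value (PRESENT) for an existing one. A small sketch (class and method names are mine, for illustration):

```java
import java.util.HashSet;

public class AddSemanticsDemo {
    // Returns the results of adding "a" twice, adding null, and checking for null.
    static boolean[] probe() {
        HashSet<String> set = new HashSet<>();
        boolean first  = set.add("a");  // put returns null for a new key  -> true
        boolean second = set.add("a");  // put returns the old PRESENT     -> false
        boolean third  = set.add(null); // null is legal, like HashMap's null key
        return new boolean[] { first, second, third, set.contains(null) };
    }

    public static void main(String[] args) {
        for (boolean b : probe()) System.out.println(b); // true, false, true, true
    }
}
```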

Is there nothing else worth reading in HashSet, then? Not quite: there is still the clone method.

java.util.HashSet.clone()

public Object clone() {
    try {
        HashSet<E> newSet = (HashSet<E>) super.clone();
        newSet.map = (HashMap<E, Object>) map.clone();
        return newSet;
    } catch (CloneNotSupportedException e) {
        throw new InternalError(e);
    }
}

Source 6

As explained in 【2】, the usual clone recipe is to first call super.clone() and then perform whatever degree of deep copying your class needs (performing no deep copying at all is also legal). The super.clone() call here is Object's clone method, already covered in 【2】. The next step clones the map field so that newSet gets its own backing map. HashMap overrides Object's clone method:

java.util.HashMap.clone()

@SuppressWarnings("unchecked")
@Override
public Object clone() {
    HashMap<K,V> result;
    try {
        result = (HashMap<K,V>)super.clone();
    } catch (CloneNotSupportedException e) {
        // this shouldn't happen, since we are Cloneable
        throw new InternalError(e);
    }
    result.reinitialize();
    result.putMapEntries(this, false);
    return result;
}

The source of java.util.HashMap.reinitialize() is as follows:

/**
 * Reset to initial default state. Called by clone and readObject.
 */
void reinitialize() {
    table = null;
    entrySet = null;
    keySet = null;
    values = null;
    modCount = 0;
    threshold = 0;
    size = 0;
}

Source 7

The recipe is the same: super.clone() here invokes the clone method of the parent class AbstractMap (AbstractMap also overrides clone):

java.util.AbstractMap.clone()

protected Object clone() throws CloneNotSupportedException {
    AbstractMap<?,?> result = (AbstractMap<?,?>)super.clone();
    result.keySet = null;
    result.values = null;
    return result;
}

Source 8

The super.clone() inside AbstractMap.clone is Object's clone method. AbstractMap then nulls out keySet and values (package-private fields of AbstractMap) and returns the result. Back in Source 7, putMapEntries re-inserts every key-value pair of the original HashMap into the freshly cloned result, which is then returned. So it is not hard to see that HashMap's clone is still a shallow copy: the entries are re-inserted, but the key and value objects themselves are not copied.

How can we verify this? Consider the following example:

import java.util.HashMap;
import java.util.Map;

public class Test {
    public static void main(String[] args) {
        HashMap<String, Person> map = new HashMap<>();
        map.put("yfs", new Person("影法师", 12));
        map.put("cfs", new Person("彩法师", 13));
        HashMap<String, Person> clone_map = (HashMap<String, Person>) map.clone();
        for (Map.Entry<String, Person> entry : map.entrySet()) {
            System.out.println(entry.getValue());
        }

        for (Map.Entry<String, Person> entry : clone_map.entrySet()) {
            System.out.println(entry.getValue());
            entry.getValue().setAge(29);
        }
        for (Map.Entry<String, Person> entry : map.entrySet()) {
            System.out.println(entry.getValue());
        }
    }
}

// A minimal Person class matching what the example needs (not shown in the original)
class Person {
    private String name;
    private int age;

    Person(String name, int age) {
        this.name = name;
        this.age = age;
    }

    void setAge(int age) {
        this.age = age;
    }

    @Override
    public String toString() {
        return "Person [name=" + name + ", age=" + age + "]";
    }
}

/* Output:
Person [name=彩法师, age=13]
Person [name=影法师, age=12]
Person [name=彩法师, age=13]
Person [name=影法师, age=12]
Person [name=彩法师, age=29]
Person [name=影法师, age=29] --------- which proves that HashMap's clone is a shallow copy
*/
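Since HashSet.clone delegates to HashMap.clone, the same shallowness carries over to HashSet. A minimal sketch (the class names Box and CloneShallowDemo are mine, for illustration only):

```java
import java.util.HashSet;

public class CloneShallowDemo {
    static class Box {
        int v;
        Box(int v) { this.v = v; }
    }

    // Clone a one-element set and check whether both sets see a mutation of the element.
    @SuppressWarnings("unchecked")
    static boolean sharesElements() {
        HashSet<Box> set = new HashSet<>();
        Box box = new Box(1);
        set.add(box);
        HashSet<Box> copy = (HashSet<Box>) set.clone();
        box.v = 42; // mutate through the original reference
        // If the clone were deep, the copy would still hold v == 1
        return copy != set && copy.iterator().next().v == 42;
    }

    public static void main(String[] args) {
        System.out.println(sharesElements()); // true -- HashSet's clone is shallow too
    }
}
```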

For the same reason as in HashMap, HashSet's map field is transient, and HashSet customizes its serialization.

Unsurprisingly, HashSet's serialization and deserialization revolve around this transient HashMap field map:

/* Serialization
Save the state of this <tt>HashSet</tt> instance to a stream (that is, serialize it).
*/
private void writeObject(java.io.ObjectOutputStream s)
    throws java.io.IOException {
    // Write out any hidden serialization magic
    s.defaultWriteObject();

    // Write out HashMap capacity and load factor
    s.writeInt(map.capacity());
    s.writeFloat(map.loadFactor());

    // Write out size
    s.writeInt(map.size());

    // Write out all elements in the proper order.
    for (E e : map.keySet())
        s.writeObject(e);
}

/** Deserialization
 * Reconstitute the <tt>HashSet</tt> instance from a stream (that is,
 * deserialize it).
 */
private void readObject(java.io.ObjectInputStream s)
    throws java.io.IOException, ClassNotFoundException {
    // Read in any hidden serialization magic
    s.defaultReadObject();

    // Read capacity and verify non-negative.
    int capacity = s.readInt();
    if (capacity < 0) {
        throw new InvalidObjectException("Illegal capacity: " +
                                         capacity);
    }

    // Read load factor and verify positive and non NaN.
    float loadFactor = s.readFloat();
    if (loadFactor <= 0 || Float.isNaN(loadFactor)) {
        throw new InvalidObjectException("Illegal load factor: " +
                                         loadFactor);
    }

    // Read size and verify non-negative.
    int size = s.readInt();
    if (size < 0) {
        throw new InvalidObjectException("Illegal size: " +
                                         size);
    }

    // Set the capacity according to the size and load factor ensuring that
    // the HashMap is at least 25% full but clamping to maximum capacity.
    capacity = (int) Math.min(size * Math.min(1 / loadFactor, 4.0f),
                              HashMap.MAXIMUM_CAPACITY);

    // Constructing the backing map will lazily create an array when the first element is
    // added, so check it before construction. Call HashMap.tableSizeFor to compute the
    // actual allocation size. Check Map.Entry[].class since it's the nearest public type to
    // what is actually created.
    SharedSecrets.getJavaOISAccess()
                 .checkArray(s, Map.Entry[].class, HashMap.tableSizeFor(capacity));

    // Create backing HashMap
    map = (((HashSet<?>)this) instanceof LinkedHashSet ?
           new LinkedHashMap<E,Object>(capacity, loadFactor) :
           new HashMap<E,Object>(capacity, loadFactor));

    // Read in all elements in the proper order.
    for (int i=0; i<size; i++) {
        @SuppressWarnings("unchecked")
        E e = (E) s.readObject();
        map.put(e, PRESENT);
    }
}

Source 9

The API documentation for defaultWriteObject and defaultReadObject, the first calls made during serialization and deserialization respectively, reads:

defaultWriteObject:
Write the non-static and non-transient fields of the current class to this stream. This may only be called from the writeObject method of the class being serialized. It will throw the NotActiveException if it is called otherwise.

defaultReadObject:
Read the non-static and non-transient fields of the current class from this stream. This may only be called from the readObject method of the class being deserialized. It will throw the NotActiveException if it is called otherwise.

In other words, every non-static, non-transient field of the class is (de)serialized. Note that these two methods may only be called from within writeObject/readObject; calling them anywhere else throws a NotActiveException.

Looking back at Source 9, it simply writes out a few properties of the map and then writes out all of its key-value pairs (for HashSet, since every value is identical, only the keys are written).
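The effect of this custom scheme can be checked with a simple round trip through an in-memory stream; a minimal sketch (class and method names are mine, for illustration):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.HashSet;

public class SerDemo {
    // Serialize a set to a byte array and read it back.
    @SuppressWarnings("unchecked")
    static HashSet<String> roundTrip(HashSet<String> set) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(set); // writeObject emits capacity, load factor, size, then each element
        }
        try (ObjectInputStream ois = new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray()))) {
            return (HashSet<String>) ois.readObject(); // readObject rebuilds the backing map
        }
    }

    // Round-trip a two-element set and compare it with the original.
    static boolean roundTripEquals() throws Exception {
        HashSet<String> set = new HashSet<>();
        set.add("a");
        set.add("b");
        return roundTrip(set).equals(set);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTripEquals()); // true
    }
}
```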

References

【1】https://yfsyfs.github.io/2019/06/07/java-util-HashMap-%E6%BA%90%E7%A0%81%E5%88%86%E6%9E%90/

【2】https://yfsyfs.github.io/2019/07/03/clone-%E7%A0%94%E7%A9%B6/