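Dictionary and String are value-type data structures in the Swift standard library. As far as everyday usage goes, I trust you folks can wield them without a second thought. But if a runtime crash shows up under certain usage conditions, how would you deal with it?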

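If you have read the title, you will naturally guess that this is a data consistency problem under multithreading. How to protect the data in a critical section is outside the scope of this article, but we might as well dig into the Swift source code and lift the veil on the cause of these crashes~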

Dictionary

Counting down from five: three, two, one, here comes the example:

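import Foundation

// Intentionally racy: one iteration reads `dict` while the other writes it.
func crashOfRaceConditionInDictionary() {
    var dict: [String: Double] = [:]
    let key = "ABCD"
    DispatchQueue.concurrentPerform(iterations: 2) { index in
        if index % 2 == 0 {
            print(dict[key] as Any)      // unsynchronized read
        } else {
            dict[key] = Double(index)    // unsynchronized write
        }
    }
}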

The sample is currently known to crash in the following two ways:

  1. EXC_BAD_ACCESS
  2. [XXXX objectForKey:] unrecognized selector sent to instance XXXXXX

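Without a doubt, the crash happens when a dictionary element is accessed through the subscript. To learn more about the implementation, we can search the Swift source code for the relevant keywords (💡 if you have symbolicated the crash stack, you can pinpoint the exact line of the relevant code). The code lives in the subscript(key: Key) -> Value? method of Dictionary._Variant. The key pseudocode is shown below: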

extension Dictionary._Variant {
  subscript(key: Key) -> Value? {
    get { return lookup(key) }

    // set
    _modify {
      let cocoa = asCocoa
      var native = _NativeDictionary<Key, Value>(
        cocoa, capacity: cocoa.count + 1)
      self = .init(native: native)
      yield &native[key, isUnique: true]
      return
    }
  }

  func lookup(_ key: Key) -> Value? {
    let cocoaKey: AnyObject = _bridgeAnythingToObjectiveC(key)
    guard let cocoaValue = asCocoa.lookup(cocoaKey) else { return nil }
    return _forceBridgeFromObjectiveC(cocoaValue, Value.self)
  }
}

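In fact, Dictionary stores its key-value pairs through a _BridgeStorage<__RawDictionaryStorage> held in its _Variant. When a value is read through the subscript in an iOS runtime environment, the storage object in _Variant is passed by value (__owned) and converted to a __CocoaDictionary. That object is then force-cast to NSDictionary via unsafeBitCast, and finally the result of object(forKey:) is returned. When a value is set through the subscript (in the case where Key and Value are not classes, which is checked internally with canBeClass), the _Variant is re-initialized.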

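Once the memory behind the old address has been freed, a racing access produces crash 1. If the old address has already been reallocated to something that cannot respond to the object(forKey:) message, crash 2 results. (And if whatever now lives there can respond to that message, you get a dirty read instead.)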
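The "old address is freed after a write" step can be observed in a toy model. The sketch below is only an illustration under stated assumptions: `ToyStorage`, `ToyDictionary` and `oldStorageIsFreedAfterWrite` are hypothetical names, not standard library types, and the model replaces its backing object on every write to mimic the re-initialization described above. A weak reference stands in for the racing reader's stale pointer.

```swift
// Toy model only: these types are hypothetical and are not how the
// standard library lays out Dictionary.
final class ToyStorage {
    let items: [String: Double]
    init(_ items: [String: Double]) { self.items = items }
}

struct ToyDictionary {
    private(set) var storage = ToyStorage([:])

    subscript(key: String) -> Double? {
        get { storage.items[key] }
        set {
            // Build a new storage object and swap it in, mimicking the
            // re-initialization of the variant on write.
            var copy = storage.items
            copy[key] = newValue
            storage = ToyStorage(copy)
        }
    }
}

/// Returns true if the storage object that existed before a write has been
/// deallocated by the time the write completes.
func oldStorageIsFreedAfterWrite() -> Bool {
    var dict = ToyDictionary()
    weak var oldStorage = dict.storage   // weak: does not keep it alive
    dict["ABCD"] = 1.0                   // swaps in a new storage object
    return oldStorage == nil
}

print(oldStorageIsFreedAfterWrite())
```

In this single-threaded model the weak reference safely becomes nil. A reader on another thread that fetched the old storage pointer just before the swap has no such protection, which is the window in which crashes 1 and 2 occur.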

String

The String problem occurs during concatenation. To make it easier to reproduce, the example adds a loop:

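import Foundation

// Intentionally racy: four iterations append to `string` with no synchronization.
func crashOfRaceConditionInString() {
    var string = ""

    for _ in 0..<50 {
        DispatchQueue.concurrentPerform(iterations: 4) { index in
            if index % 2 == 0 {
                string += "ABC"
            } else {
                string += "DEF"
            }
        }
    }
}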

This time Xcode prints a very friendly message to the console:

Object 0xXXX of class __StringStorage deallocated with non-zero retain count 2. This object’s deinit, or something called from it, may have created a strong reference to self which outlived deinit, resulting in a dangling reference.

Roughly speaking: when String's internal member __StringStorage was being deallocated, its reference count turned out to be 2.

The stack trace looks like this:

#0	0x0000000105aa8a4c in __pthread_kill ()
#1 0x00000001060f71d0 in pthread_kill ()
#2 0x00000001801375ec in abort ()
#3 0x000000018cc27e08 in swift::fatalErrorv(unsigned int, char const*, char*) ()
#4 0x000000018cc27e24 in swift::fatalError(unsigned int, char const*, ...) ()
#5 0x000000018cc2ba2c in swift_deallocClassInstance ()
#6 0x000000018cc2b94c in _swift_release_dealloc ()
#7 0x000000018cc2c4b0 in bool swift::RefCounts<swift::RefCountBitsT<(swift::RefCountInlinedness)1>>::doDecrementSlow<(swift::PerformDeinit)1>(swift::RefCountBitsT<(swift::RefCountInlinedness)1>, unsigned int) ()
#8 0x000000018cae4510 in _StringGuts.prepareForAppendInPlace(totalCount:otherUTF8Count:) ()
#9 0x000000018cae4618 in _StringGuts.append(_:) ()
#10 0x000000018c97bdac in static String.+= infix(_:_:) ()
#11 0x000000010491361c in closure #1 in ViewController.crashOfRaceConditionInString() at /Users/liwenkang/Documents/Swift/CrashOfRaceCondition/CrashOfRaceCondition/ViewController.swift:23

The source of the key stack frames, trimmed down, is as follows:

// Ensure unique native storage with sufficient capacity for the following
// append.
private mutating func prepareForAppendInPlace(
  totalCount: Int,
  otherUTF8Count otherCount: Int
) {
  // See if we can accommodate without growing or copying. If we have
  // sufficient capacity, we do not need to grow, and we can skip the copy if
  // unique. Otherwise, growth is required.

  //......

  // If we have to resize anyway, and we fit in smol, we should have made one
  //......

  // Non-unique storage: just make a copy of the appropriate size, otherwise
  // grow like an array.
  let growthTarget: Int
  if sufficientCapacity {
    growthTarget = totalCount
  } else {
    growthTarget = Swift.max(
      totalCount, _growArrayCapacity(nativeCapacity ?? 0))
  }
  self.grow(growthTarget) // NOTE: this already has exponential growth...
}

@usableFromInline
internal mutating func grow(_ n: Int) {
  // ......

  // TODO: Don't do this! Growth should only happen for append...
  let growthTarget = Swift.max(n, (self.uniqueNativeCapacity ?? 0) * 2)

  if _fastPath(isFastUTF8) {
    let isASCII = self.isASCII
    let storage = self.withFastUTF8 {
      __StringStorage.create(
        initializingFrom: $0,
        codeUnitCapacity: growthTarget,
        isASCII: isASCII)
    }

    self = _StringGuts(storage)
    return
  }

  _foreignGrow(growthTarget)
}

@inline(never) // slow-path
private mutating func _foreignGrow(_ n: Int) {
  let newString = String(_uninitializedCapacity: n) { buffer in
    guard let count = _foreignCopyUTF8(into: buffer) else {
      fatalError("String capacity was smaller than required")
    }
    return count
  }
  self = newString._guts
}

// Swift runtime (C++): swift::RefCounts<...>::doDecrementSlow, frame #7 above.
bool doDecrementSlow(RefCountBits oldbits, uint32_t dec) {
  RefCountBits newbits;

  // constant propagation will remove this in swift_release, it should only
  // be present in swift_release_n
  if (dec != 1 && oldbits.isImmortal(true)) {
    return false;
  }

  bool deinitNow;
  do {
    newbits = oldbits;

    bool fast =
      newbits.decrementStrongExtraRefCount(dec);
    if (fast) {
      // Decrement completed normally. New refcount is not zero.
      deinitNow = false;
    }
    else if (oldbits.isImmortal(false)) {
      return false;
    } else if (oldbits.hasSideTable()) {
      // Decrement failed because we're on some other slow path.
      return doDecrementSideTable<performDeinit>(oldbits, dec);
    }
    else {
      // Decrement underflowed. Begin deinit.
      // LIVE -> DEINITING
      deinitNow = true;
      assert(!oldbits.getIsDeiniting());  // FIXME: make this an error?
      newbits = oldbits;  // Undo failed decrement of newbits.
      newbits.setStrongExtraRefCount(0);
      newbits.setIsDeiniting(true);
    }
  } while (!refCounts.compare_exchange_weak(oldbits, newbits,
                                            std::memory_order_release,
                                            std::memory_order_relaxed));
  if (performDeinit && deinitNow) {
    std::atomic_thread_fence(std::memory_order_acquire);
    _swift_release_dealloc(getHeapObject());
  }

  return deinitNow;
}
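To get a feel for how often the storage is replaced in the String example, the growth expression in `grow(_:)` above can be replayed in a simplified model. This is only a sketch under stated assumptions: `growthTarget(needed:currentCapacity:)` and `simulateAppends(pieceSize:times:)` are hypothetical helpers, the model starts from zero capacity, and it ignores small-string inline storage and any rounding the real allocator applies.

```swift
// Simplified model, not the standard library implementation.

/// Mirrors the expression used in `grow(_:)`: at least `n`, and at least
/// double the current capacity.
func growthTarget(needed n: Int, currentCapacity: Int) -> Int {
    return max(n, currentCapacity * 2)
}

/// Simulates `times` appends of `pieceSize` bytes each and reports how many
/// of them had to replace the storage with a larger one.
func simulateAppends(pieceSize: Int, times: Int)
    -> (reallocations: Int, finalCapacity: Int) {
    var count = 0          // bytes currently stored
    var capacity = 0       // bytes the current storage can hold
    var reallocations = 0
    for _ in 0..<times {
        let needed = count + pieceSize
        if needed > capacity {
            // Not enough room: a new storage object replaces the old one.
            capacity = growthTarget(needed: needed, currentCapacity: capacity)
            reallocations += 1
        }
        count = needed
    }
    return (reallocations, capacity)
}

// 50 outer iterations x 4 concurrent appends of 3 bytes, as in the example.
let result = simulateAppends(pieceSize: 3, times: 200)
print("storage replaced \(result.reallocations) times, final capacity \(result.finalCapacity)")
```

Every replacement counted by this model corresponds to a moment at which the old storage is released, so each one is an opportunity for another thread to be caught holding the old object.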

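The deallocation path in the runtime is long, so the subsequent runtime code is not reproduced here. It is easy to see that once a String grows past a certain length, its __StringStorage is expanded. By analogy with how arrays behave, the old memory must be reclaimed and new memory allocated to hold the expanded string. Because of the racing multithreaded access, at some moment the __StringStorage needs to be freed while its reference count does not satisfy the condition below, which ultimately leads to the crash: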

bool canBeFreedNow() const {
  auto bits = refCounts.load(SWIFT_MEMORY_ORDER_CONSUME);
  return (!bits.hasSideTable() &&
          bits.getIsDeiniting() &&
          bits.getStrongExtraRefCount() == 0 &&
          bits.getUnownedRefCount() == 1);
}
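The four conditions in `canBeFreedNow()` can be restated as a small Swift model to see which one a racing retain breaks. This is a sketch, not the runtime's bit layout: `RefCountSnapshot` is a hypothetical type whose fields simply mirror the accessors used above, and the value chosen for the raced case is illustrative.

```swift
/// Hypothetical snapshot of the reference-count state. The field names mirror
/// the accessors used by `canBeFreedNow()`; this is not the real bit layout.
struct RefCountSnapshot {
    var hasSideTable = false
    var isDeiniting = false
    var strongExtraRefCount = 0
    var unownedRefCount = 1

    /// Same boolean condition as the C++ `canBeFreedNow()`.
    var canBeFreedNow: Bool {
        return !hasSideTable
            && isDeiniting
            && strongExtraRefCount == 0
            && unownedRefCount == 1
    }
}

// Normal teardown: the object is deiniting and nobody else holds it.
let healthy = RefCountSnapshot(isDeiniting: true)

// Raced teardown: another thread retained the object after deinit began
// (illustrative value).
var raced = healthy
raced.strongExtraRefCount = 1

print(healthy.canBeFreedNow, raced.canBeFreedNow)
```

In the raced case the check fails, which matches the path in the stack trace where swift_deallocClassInstance ends in a fatal error.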

Summary

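In day-to-day development we usually pay close attention to data safety under multithreading. But when using certain third-party SDKs we may let our guard down (assuming by default that callbacks arrive on the main thread, which is not always the case). The two examples above are lessons I learned the hard way at work. We have probably all read plenty of framework or system-level source code, but in my personal opinion, learning any piece of technical knowledge pays off far more when it is applied to analyzing and solving a real problem than when it is reading for reading's sake. Knowledge is like an ocean, and a huge codebase is enough to get lost in; only a clear sense of purpose lets us roam freely in it~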