<p>I'm currently playing around with the <code>sync/atomic</code> package and I wanted to create a lock-free <code>struct</code> array which is safe for concurrent use. </p>
<p>I would like to know if anyone has ever done this before and, if so, whether it is even possible using <code>sync/atomic</code>. Note that using a lock is probably way easier, but that is not my intention. I would love to know if someone has a suggestion on how to implement this.</p>
<p>Edit: for some reason I can only see 3 comments when reddit says there are 7. Excuse me if I don't respond! </p>
<hr/>**Comments:**<br/><br/>NichtMitCommander: <pre><p>Maybe my lock-free hashmap has some inspiration for you: <a href="https://github.com/cornelk/hashmap" rel="nofollow">https://github.com/cornelk/hashmap</a></p></pre>epiris: <pre><p>In general these types of libraries should be avoided, and this one also requires unsafe usage, which makes it pretty much a deal breaker. They are usually slower in real applications, and often mistakes (or purposely disingenuous choices) exist in the benchmarks. For example, in your <a href="https://github.com/cornelk/hashmap/blob/master/benchmark_test.go#L109" rel="nofollow">BenchmarkRead*WithWritesUint</a> benchmarks you are not actually measuring write contention. What you are measuring is 1x your count of writes, which probably finishes before the reads even begin, since RunParallel has to go through its setup. This is repeated at, I imagine, three scales: 1 * count, 100 * count, 10K * count. Each time the only contention is during the initial iterations, which almost certainly don't run for the duration of the RunParallel calls.</p>
<p>A correct test would look more like:</p>
<pre><code>// l is assumed to be a sync.RWMutex guarding the map m (map[uintptr]uintptr),
// pre-populated with m[i] = i for every i in [0, benchmarkItemCount).
b.RunParallel(func(pb *testing.PB) {
    for pb.Next() {
        for i := uintptr(0); i < benchmarkItemCount; i++ {
            // read under the read lock and verify the invariant m[i] == i
            l.RLock()
            j := m[i]
            l.RUnlock()
            if j != i {
                b.Fail()
            }
            // or w := i ; may yield slightly different access
            // patterns, i.e. clustered contention for slots
            w := benchmarkItemCount - i
            // write under the write lock so reads and writes actually contend
            l.Lock()
            m[w] = w
            l.Unlock()
        }
    }
})
</code></pre>
<p>Also, you aren't testing the same type: you are storing a *uintptr in ReadHash and a uintptr value in the sync.Map. This detail isn't a huge difference because you're just working with something word sized, which will hit all the same comparison operations. But when the comparison operations change there would be noticeable divergences. Actually going to fix them because I'm curious..</p>
<p>Okay, with the benchmarks fixed, for the use case that sync.Map isn't intended for (write contention), your implementation is actually under 2x faster, at the cost of an unsafe import (not sure why you require that? An interface{} would have identical performance characteristics but be safer to use).</p>
<pre><code>// BenchmarkCorrectReadGoSyncMapUint-24 300000 4038 ns/op 0 B/op 0 allocs/op
// BenchmarkCorrectReadGoSyncMapValUint-24 300000 3998 ns/op 0 B/op 0 allocs/op
// BenchmarkCorrectReadHashMapUint-24 500000 2528 ns/op 0 B/op 0 allocs/op
</code></pre>
<p>I figured it would be faster, for a contentious but consistent set of keys, to use a map of atomic.Values, and it was twice as fast.</p>
<pre><code>// BenchmarkCorrectReadGoAtomicMapUintMutex-24 1000000 1921 ns/op 0 B/op 0 allocs/op
// BenchmarkCorrectReadGoAtomicMapUintValMutex-24 1000000 1910 ns/op 0 B/op 0 allocs/op
</code></pre>
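<p>For reference, here is a minimal sketch of that "map of atomic values" idea as I read it. This is not the code from the gist; the AtomicUintMap name and its methods are made up for illustration. The trick is that the map itself is built once with a fixed, known key set and never grows, so it is safe for concurrent readers, while each value is loaded and replaced atomically per key.</p>
<pre><code>package atomicmap

import "sync/atomic"

// AtomicUintMap holds a fixed key set; the map itself is never mutated after
// construction, so concurrent access needs no lock. Each value lives in its
// own atomic.Value and is swapped atomically.
type AtomicUintMap struct {
    m map[uintptr]*atomic.Value
}

func New(keys []uintptr) *AtomicUintMap {
    am := &AtomicUintMap{m: make(map[uintptr]*atomic.Value, len(keys))}
    for _, k := range keys {
        v := new(atomic.Value)
        v.Store(uintptr(0)) // seed so Load never sees a nil interface
        am.m[k] = v
    }
    return am
}

func (am *AtomicUintMap) Load(k uintptr) (uintptr, bool) {
    v, ok := am.m[k]
    if !ok {
        return 0, false
    }
    return v.Load().(uintptr), true
}

// Store only works for keys that existed at construction time; unknown keys
// cannot be added concurrently, which is exactly why this "cheats".
func (am *AtomicUintMap) Store(k, val uintptr) bool {
    v, ok := am.m[k]
    if !ok {
        return false
    }
    v.Store(val)
    return true
}
</code></pre>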
<p>Not picking on you or anything, but I just wanted to share how benchmarks can be twisted to appear favorable when in reality they wouldn't serve well for real-world applications. My atomic map wouldn't work in a real-world application with an unknown set of keys, but as we see that isn't what we are benchmarking, so I cheated :-). Sparse keys will likely have completely different footprints, and creating computational pressure on your system, such as what may be experienced in real-world applications, will also start to show how user-space spin locking begins to break apart once the intelligence of the runtime's scheduler is bypassed. Meaning you lose all scheduling fairness, since there is a very real potential for a goroutine running on an M (assigned a P) that is being strained by an unrelated system process to be unlucky and block for a prolonged period of time. The kernel may preempt your busy loop, but the Go runtime scheduler will not.</p>
<p>I'm not saying these types of things never have advantages for some use cases, but those cases would have to be extremely specific. In a general sense I just feel they are always the wrong solution.</p>
<p>gist of fixed <a href="https://gist.github.com/cstockton/e22b990bfc915a3cc4856220287ddbeb" rel="nofollow">benchmarks and my atomic map</a></p></pre>forfunc: <pre><p>Interesting project, some very cool code to play with, thanks for sharing!</p></pre>seankhliao: <pre><p>You could try <a href="https://golang.org/pkg/sync/#Map" rel="nofollow"><code>sync.Map</code></a>, which looks like it fits your use case (stable keys, updates limited to a single goroutine).</p></pre>forfunc: <pre><p>Good suggestion, I will take a look at that! </p></pre>jamra06: <pre><p>I've never heard of lock-free arrays, but I read a book on concurrent algorithms a while back that creates log(n) locks to lock the array operations while still leaving other parts of the array free. I thought it was a cool trick. </p>
<p>What are you trying to accomplish? Maybe there is a way of using atomic operations differently. </p></pre>forfunc: <pre><p>That sounds interesting. What I'm currently doing is keeping an order book for a crypto-trading exchange in memory. This order book receives updates and also reads for the trading algorithms to run. It has up to 30 entries for both bid and ask, which are 2 separate arrays. Currently I'm copying the array and sending it through on every update with a callback function. The callback is implemented in the main function and will send the copy to other parts of the program. I was wondering if I could speed this up.</p></pre>epiris: <pre><p>Is that really a bottleneck though? Crypto trading sounds like a place where strong proof of correctness far outweighs the cost of copy operations. That said, this is a better description: what is the slowdown? Do you have a profile you can share?</p>
<p>There is a good chance you're causing slowdowns elsewhere; if you are more specific or post the code showing how you transport the array, I can probably post some safe performance tips. Without seeing it:</p>
<ul>
<li>Callbacks are often a source of a lot of unexpected allocations; it's hard for the compiler to prove that variables captured in a closure don't escape the scope. Sometimes you can avoid this by passing them as parameters in a context like a defer or go statement.</li>
<li>Do you copy the entire array or only the changes? If only some of the indices have changed, then depending on the size of the array you could instead push a changeset slice, that is, a set of struct values with the change and the index at which it occurred (see the sketch after this list).</li>
<li>If the two things above don't help, a sync.Pool for your changesets can completely amortize the GC pressure, if it actually exists. </li>
</ul>
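<p>A rough sketch of the last two points, assuming the book is a fixed-size array of entries (the Entry, Change and changePool names are made up for illustration): push only the slots that changed as a changeset, and recycle the changeset slices through a sync.Pool so steady-state updates stop allocating.</p>
<pre><code>package orderbook

import "sync"

// Entry is a placeholder for whatever a book slot holds.
type Entry struct {
    Price, Size float64
}

// Change records a single modified slot in the book array.
type Change struct {
    Index int
    Entry Entry
}

// changePool recycles changeset slices; the book has at most 30 entries.
var changePool = sync.Pool{
    New: func() interface{} { return make([]Change, 0, 30) },
}

// publish sends only the changed slots instead of a copy of the whole array.
func publish(book []Entry, dirty []int, out chan<- []Change) {
    cs := changePool.Get().([]Change)[:0]
    for _, i := range dirty {
        cs = append(cs, Change{Index: i, Entry: book[i]})
    }
    out <- cs
}

// The consumer side would hand the slice back when done, e.g.:
//
//    for cs := range updates {
//        apply(cs)                // hypothetical handler
//        changePool.Put(cs[:0])   // recycle the backing array
//    }
</code></pre>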
<p>All that is left is the copy operations; you can't do anything about those even with unsafe atomic primitives. If the callers are blocked until the frame that completes the work returns, you may be able to borrow their changeset memory, but you need strong guarantees that it's safe, of course. Probably not worth it considering just how cheap copying memory is next to crypto operations. Can't say much more without specifics, but that may help.</p></pre>forfunc: <pre><p>Thank you very much for your reply. This helped a lot. I tend to use a lot of closures for flexibility, but maybe this is something I need to change (either an interface method or something). About the copying part: it only sends a few items from the array as a copy to the rest of the program, not the whole array itself. </p></pre>epiris: <pre><p>XY problem; describe your access patterns for the array. Otherwise, create any array and you technically have one safe for concurrent reads and writes, as long as they're not at the same index. FYI, different types of lockless structures designed for high-concurrency reads still use a Mutex for slow paths too; pure atomic sync primitives are more or less pretty awful in Go. This is because you have to resort to spin locking, which is extremely inefficient because you have no means to cooperate with the scheduler from outside the runtime.</p></pre>forfunc: <pre><p>Thanks for the thorough explanation. Is it possible to manually poke the scheduler from the runtime package? I think I will have to measure it first with a lock! Thanks </p></pre>epiris: <pre><p>Depends what you mean by poke; all the sync primitives Go provides are tightly integrated with the runtime: sync.Cond, mutexes, channels, etc. The only other way through the stdlib to interact with the runtime in a similar fashion is runtime.Gosched(), but its usefulness is pretty minimal: it yields a G's current P but does not put the G to sleep. Think of it as essentially queueing the goroutine to the back of a queue of ready-to-execute goroutines. So while you wait for an atomic condition to be met, you have superfluous dequeues that waste time other goroutines could spend doing work. The runtime is essentially polling your goroutine: "are you ready? are you ready?...".</p>
<p>Compare this to sync primitives like a cond: waiters are all parked (put to sleep), wasting no resources until an event (a goroutine calling Broadcast or Signal) occurs, at which point the runtime can use the event to unpark all the related goroutines onto the run queue (don't hold me to exact semantics).</p>
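<p>A small stdlib-only illustration of that difference (the function names are just for the example): the spin loop keeps the goroutine runnable and gets re-queued over and over until the atomic flag flips, while the cond-based waiter is parked and costs nothing until another goroutine signals it.</p>
<pre><code>package main

import (
    "runtime"
    "sync"
    "sync/atomic"
)

// spinWait stays runnable: the scheduler keeps re-queueing it
// ("are you ready? are you ready?") until the flag flips.
func spinWait(flag *int32) {
    for atomic.LoadInt32(flag) == 0 {
        runtime.Gosched() // yield the P, but the goroutine is immediately runnable again
    }
}

// condWait parks inside cond.Wait and wastes no cycles until Broadcast/Signal.
func condWait(cond *sync.Cond, ready *bool) {
    cond.L.Lock()
    for !*ready {
        cond.Wait()
    }
    cond.L.Unlock()
}

func main() {
    var flag int32
    var ready bool
    cond := sync.NewCond(&sync.Mutex{})

    go func() {
        atomic.StoreInt32(&flag, 1)
        cond.L.Lock()
        ready = true
        cond.L.Unlock()
        cond.Broadcast()
    }()

    spinWait(&flag)
    condWait(cond, &ready)
}
</code></pre>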
<p>This is why these types of libraries often post such web-scale benchmarks: it's pinning your CPU as hard as it can to do its one task, meeting an atomic condition. But they often completely degrade when integrated into tight spots of real-world applications, because there is real-world work to be done elsewhere, and the pressure put on those components can emerge as completely different bottlenecks elsewhere in your application.</p>
<p>The point here being: for the most part, lock contention is usually a non-issue, and when it does become an issue, re-evaluating those points very carefully usually allows you to partition your data. Often, even though you are using the word Mutex on that line, there is no true mutual exclusion requirement among all the callers who hold it, or the exclusion requirement can be lifted by refactoring the fields it protects to be transient or copied upon creation.</p></pre>tscs37: <pre><p>I did something vaguely in this direction using RCU (Read-Copy-Update).</p>
<p>The short overview would be: you have a pointer to the array, and whenever you write to the array, you copy it entirely, make the modification, and update the pointer.</p>
<p>You can guard the pointer in various ways, but the main useful part is that you can have several readers on old versions of the array while a single writer updates it, and new readers see the fresh version.</p>
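<p>For what it's worth, a minimal sketch of that RCU-style pointer swap using atomic.Value, assuming a fixed-size book array and exactly one writer (the Entry type and the size of 30 are placeholders):</p>
<pre><code>package rcu

import "sync/atomic"

// Entry is a placeholder for whatever the array holds.
type Entry struct {
    Price, Size float64
}

// Book publishes immutable snapshots of a fixed-size array.
type Book struct {
    v atomic.Value // always holds a *[30]Entry
}

func New() *Book {
    b := &Book{}
    b.v.Store(&[30]Entry{})
    return b
}

// Snapshot returns the current version; callers must treat it as read-only.
// Readers holding an old snapshot keep seeing a consistent array.
func (b *Book) Snapshot() *[30]Entry {
    return b.v.Load().(*[30]Entry)
}

// Update copies the current array, applies the change, and swaps the pointer.
// Only safe with a single writer; concurrent writers would lose updates.
func (b *Book) Update(i int, e Entry) {
    cur := b.Snapshot()
    next := *cur // copy the whole array (the "copy" in read-copy-update)
    next[i] = e
    b.v.Store(&next) // publish the new version atomically
}
</code></pre>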
<p>If you need full ACID semantics (which I guess you do, judging from other comments), you'll have to use an RWLock and fully serialize access with a single-writer-many-reader approach.</p>
<p>I would not recommend using sync/atomic, as a lock is probably what you want.</p></pre>daveddev: <pre><p>A word of caution: be sure to benchmark expected/realistic use cases of a read/write lock to ensure that its accounting mechanisms aren't slower than using a simpler lock.</p>
<p><a href="https://youtu.be/KJS3ikoiLso" rel="nofollow">https://youtu.be/KJS3ikoiLso</a></p></pre>tgulacsi: <pre><p>You can access elements of a preallocated slice concurrently if you sync on the element. For example you can allocate a slice for the results, start as many goroutines as the length of the slice, having each goroutine write its result into the ith element of the slice.
Later you can walk the slice and reduce on it.</p>
<p>(This implicitly has two synchronization points: slice allocation, and signaling completion of the goroutines).</p></pre>
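<p>A small sketch of this pattern (the squaring work is just a placeholder): each goroutine owns exactly one index of a preallocated slice, so no locking is needed, and a WaitGroup provides the completion signal before the reduce step.</p>
<pre><code>package main

import (
    "fmt"
    "sync"
)

func main() {
    const n = 8
    results := make([]int, n) // first sync point: allocate before starting goroutines

    var wg sync.WaitGroup
    for i := 0; i < n; i++ {
        wg.Add(1)
        go func(i int) {
            defer wg.Done()
            results[i] = i * i // each goroutine writes only its own element
        }(i)
    }
    wg.Wait() // second sync point: all writes are visible after Wait returns

    sum := 0
    for _, r := range results {
        sum += r // reduce over the slice
    }
    fmt.Println(sum) // 140
}
</code></pre>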