returning []*T vs []T

polaris · 462 views
<p>When should we return []*T and when []T?</p> <p>Edit: Removed the article to reduce confusion.</p>
<hr/>**Comments:**<br/><br/>
lobster_johnson: <pre><p>There are different use cases based on the pros/cons, and they&#39;re almost exactly the same as for the non-slice versions (i.e. <code>T</code> vs. <code>*T</code>).</p> <p><code>[]T</code>:</p> <ul> <li>Pro: Contiguous in memory with respect to T (the runtime will allocate <code>sizeof(T) * cap</code>), which increases cache locality. For example, a loop over <code>[]int</code> can be very efficient and potentially even be vectorized.</li> <li>Con: To access a slice element, it has to be copied, which is more expensive than passing a pointer around. Similarly, modifying an element requires copying the element, modifying it and then copying it back.</li> </ul> <p><code>[]*T</code>:</p> <ul> <li>Pro: No copying needed in order to read/write elements.</li> <li>Con: Requires an indirection to dereference the stored pointer, which can point anywhere in RAM and is unlikely to benefit from cache locality.</li> </ul> <p>Cache locality also includes <a href="http://www.futurechips.org/chip-design-for-all/prefetching.html">RAM prefetching</a>; modern CPU architectures are complicated, but sequential access is generally faster than random access.</p> <hr/> <p>I would recommend using <code>[]T</code> unless you have a specific reason to want to minimize copying. For example, let&#39;s say we have this:</p> <pre><code>type Document struct {
    // Lots of fields here, making Document large
}

func ClassifyDocuments(docs []Document) map[string][]Document
</code></pre> <p>Imagine we want to &#34;classify&#34; the documents based on some heuristic, such as dividing them into topics like &#34;business&#34;, &#34;sports&#34;, and so on. There&#39;s one input slice, and a map of topics to output slices.
Some documents may be in multiple topics, though, and by using <code>[]Document</code>, we&#39;re potentially duplicating each document multiple times, which is wasteful. So we should probably do this instead:</p> <pre><code>func ClassifyDocuments(docs []*Document) map[string][]*Document
</code></pre> <p>This allows the result to simply point to the same documents as the input. Except for allocating the <code>map</code> and slices in the result, it&#39;s possible that this function doesn&#39;t need to allocate anything at all on the heap.</p></pre>
connor4312: <pre><p>Also note that you can take a pointer to a slice element, which avoids the copying and lets you call pointer methods on the type, while still maintaining that lovely chunk of contiguous memory. Example: <a href="https://play.golang.org/p/pz0JVHj2dQ">https://play.golang.org/p/pz0JVHj2dQ</a></p> <p>The small downside is that (at least the last time I checked) in cases where the slice could otherwise be allocated on the stack, taking pointers to elements will cause Go to allocate it on the heap.</p></pre>
lobster_johnson: <pre><p>Good point, and the same works in a loop to avoid the copying done by <code>range</code>:</p> <pre><code>for i := range things {
    thing := &amp;things[i]
    // work with thing here; no element is copied
}
</code></pre></pre>
zemo: <pre><blockquote> <p>modifying an element requires copying the element, modifying it and then copying it back.</p> </blockquote> <p>Would this perform a copy?</p> <p><code>items[8].x = 10</code></p></pre>
xiegeo: <pre><p>Use []*T when you need to use []*T. Otherwise just use []T. </p> <p>The rules are the same as using *T vs T.</p></pre>
nhooyr: <pre><p>I&#39;ve always been using <code>*T</code> as the default for my methods; I thought it was the opposite.
Use <code>*T</code> unless you need to use <code>T</code>.</p></pre>
xiegeo: <pre><p>*T is a must for methods that modify T, or if you want *T to implement an interface, so methods tend to get called on pointers.</p></pre>
nhooyr: <pre><p>But if I&#39;m not modifying it, I should use <code>T</code> by default unless I profile and find the receiver is large enough that it is causing issues?</p></pre>
xiegeo: <pre><p>For method receivers, this sums it up nicely: <a href="https://golang.org/doc/faq#methods_on_values_or_pointers" rel="nofollow">https://golang.org/doc/faq#methods_on_values_or_pointers</a></p> <p>For me, if T is a named type based on a simple type such as int, then I use T, but if T is a struct then I use *T, in case I later want to add a modifying method; the remaining non-modifying methods should stay consistent with it.</p></pre>
sh41: <pre><p>Relevant discussion in the <code>go-github</code> library, started by Russ Cox:</p> <p><a href="https://github.com/google/go-github/issues/180">https://github.com/google/go-github/issues/180</a></p></pre>
nesigma: <pre><p>Nice find! That settles it.</p> <p>So apparently the correct answer is that it depends on the size of T. When dealing with large structs, it is better to use []*T, like Russ Cox recommends, especially because of the code that will iterate over that slice.</p> <p>It&#39;s finally crystal clear in my head. Thank you.</p></pre>
uncle_bad_touches: <pre><p>Less memory fragmentation and fewer pointers to GC?</p></pre>
kl0nos: <pre><p>When you return a slice from a function, you are not copying any of the data in the slice; you are copying a pointer to that data (plus a length and a capacity).
Just use []T for slices.</p></pre>
nesigma: <pre><blockquote> <p>Just use []T for slices.</p> </blockquote> <p>When is it more appropriate to return []*T?</p></pre>
Deltigre: <pre><p>I used it temporarily for an object pool (to avoid excessive GC) but quickly changed to a linked list + free object stack implementation.</p> <p>Edit: another related use I can think of is keeping a struct small by storing an array of pointers in it. Or you might want a list of objects you wish to mutate. Obviously these are all specialized use cases.</p></pre>
materialdesigner: <pre><p>Isn&#39;t sync.Pool for this purpose?</p></pre>
Deltigre: <pre><p>From the docs: </p> <blockquote> <p>On the other hand, a free list maintained as part of a short-lived object is not a suitable use for a Pool, since the overhead does not amortize well in that scenario. It is more efficient to have such objects implement their own free list.</p> </blockquote> <p>I use it to manage the per-routine pools because it includes synchronization, but the in-routine pools are singly-linked structs, placed in a stack when free.</p></pre>
kl0nos: <pre><p>Don&#39;t use a slice of pointers unless you really have to. It&#39;s another level of indirection, and it will hurt your cache and prefetcher. Whenever you can, just use []T instead of []*T.</p></pre>
: <pre><p>[deleted]</p></pre>
DocMerlin: <pre><p>No, an array is NOT a collection of pointers. It is a bunch of objects in memory. A slice is a pointer to an array, a length, and a capacity.</p></pre>
Remi1115: <pre><p>I can&#39;t find any source that supports your statement.</p> <p>Could it be that you&#39;re thinking of slices, which contain a pointer to the underlying array?</p></pre>
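
zemo's question about `items[8].x = 10` and connor4312's note about taking a pointer to a slice element can be checked with a small program. The following is a minimal sketch, not code from the thread: the `item` type and the helper names `setByIndex`, `modifiedCopy`, and `setAllByPointer` are hypothetical and exist only for illustration. It suggests that assigning through an index expression or through `&items[i]` changes the element in place, while a copy is only made when the element is first read into its own variable.

```go
package main

import "fmt"

// item is a hypothetical element type used only for this illustration.
type item struct {
	x int
}

// setByIndex assigns through the index expression, which addresses the
// element inside the slice's backing array directly.
func setByIndex(items []item, i, v int) {
	items[i].x = v
}

// modifiedCopy reads element i into a local variable and changes that
// variable; the element stored in the slice is left untouched.
func modifiedCopy(items []item, i, v int) item {
	c := items[i] // copies the element
	c.x = v
	return c
}

// setAllByPointer takes the address of each element, as suggested in the
// thread, so every write goes to the slice's own storage.
func setAllByPointer(items []item, v int) {
	for i := range items {
		it := &items[i]
		it.x = v
	}
}

func main() {
	items := make([]item, 10)

	setByIndex(items, 8, 10)
	fmt.Println("after index assignment:", items[8].x)

	c := modifiedCopy(items, 8, 99)
	fmt.Println("copy:", c.x, "slice element:", items[8].x)

	setAllByPointer(items, 7)
	fmt.Println("after pointer loop:", items[0].x, items[9].x)
}
```

Whether the copy in `modifiedCopy` matters in practice depends on the size of the element type, which is the same trade-off the thread describes for `T` versus `*T`.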

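lobster_johnson's `ClassifyDocuments` example only gives the signature. A minimal sketch of the `[]*Document` variant might look like the following; the `Title` and `Topics` fields are assumptions standing in for both the "lots of fields" and the unspecified classification heuristic, which the thread does not describe.

```go
package main

import "fmt"

// Document stands in for a large struct. The Title and Topics fields are
// hypothetical and replace the unspecified heuristic from the thread.
type Document struct {
	Title  string
	Topics []string
}

// ClassifyDocuments groups documents by topic. Each result slice stores the
// same pointers as the input, so no Document value is copied, even when a
// document appears under several topics.
func ClassifyDocuments(docs []*Document) map[string][]*Document {
	byTopic := make(map[string][]*Document)
	for _, d := range docs {
		for _, t := range d.Topics {
			byTopic[t] = append(byTopic[t], d)
		}
	}
	return byTopic
}

func main() {
	docs := []*Document{
		{Title: "Merger announced", Topics: []string{"business"}},
		{Title: "Club sold", Topics: []string{"business", "sports"}},
	}
	byTopic := ClassifyDocuments(docs)

	// The document listed under both topics is the same value in memory.
	fmt.Println(byTopic["business"][1] == byTopic["sports"][0])
}
```

With `[]Document` instead, every `append` would store a full copy of the struct, which is the duplication the original comment warns about.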