Unsafe conversion between strings and byte slices

polaris · 447 views
<p>I've been experimenting with unsafe conversions of a string to a byte slice, and I was wondering: if I'm willing to accept that my code will not be portable and may break in future implementations, what is currently the best way to do such a conversion?</p>
<p>The <a href="https://golang.org/pkg/unsafe/">unsafe package</a> explicitly states that "Conversion of a reflect.SliceHeader or reflect.StringHeader Data field to or from Pointer" is a valid pattern, so I think the following should be valid as well:</p>
<pre><code>func unsafeStrToByte(s string) []byte {
	strHeader := (*reflect.StringHeader)(unsafe.Pointer(&amp;s))

	var b []byte
	byteHeader := (*reflect.SliceHeader)(unsafe.Pointer(&amp;b))
	byteHeader.Data = strHeader.Data

	// need to take the length of s here to ensure s is live until after we update b's Data
	// field since the garbage collector can collect a variable once it is no longer used,
	// not when it goes out of scope, for more details see https://github.com/golang/go/issues/9046
	l := len(s)
	byteHeader.Len = l
	byteHeader.Cap = l
	return b
}

func TestUnsafeStrToByte(t *testing.T) {
	s := "fizzbuzz"
	expected := []byte(s)
	assert.Equal(t, expected, unsafeStrToByte(s))
}
</code></pre>
<p>I've also seen that the <a href="https://github.com/OneOfOne/xxhash">xxhash package</a> does the following conversion:</p>
<pre><code>func writeString(w io.Writer, s string) (int, error) {
	if len(s) == 0 {
		return w.Write(nil)
	}
	ss := (*reflect.StringHeader)(unsafe.Pointer(&amp;s))
	return w.Write((*[0x7fffffff]byte)(unsafe.Pointer(ss.Data))[:len(s):len(s)])
}
</code></pre>
<p>I've tested this and it also works, but it seems a little more hacky since it relies on taking a slice of a large array which doesn't actually exist.</p>
<p>Does anyone else do something similar?</p>
<hr/>**Comments:**<br/><br/>
skelterjohn: <pre><p>I find it easier to stick with one type, usually []byte, and do string conversion at the last mile when needed. That's usually for terminal output and there won't be much of that.</p></pre>
jeromefroe: <pre><p>That's similar to the use case I have in mind: passing a string to an io.Writer as the last step in a pipeline without having to do a copy to convert the string to a byte slice.</p></pre>
kemitche: <pre><p>Perhaps you should try to find and use a Writer with a WriteString method?</p></pre>
jeromefroe: <pre><p>That's certainly an option. I've used a bytes.Buffer in front of an io.Writer to buffer writes before, and it does offer a WriteString method. However, io.Writer is a very common interface and it requires a byte slice, so to be as flexible as possible I'd like to be able to support it.</p></pre>
zeiko_is_back: <pre><p><a href="https://play.golang.org/p/J6I7vkh8O3" rel="nofollow">https://play.golang.org/p/J6I7vkh8O3</a></p></pre>
jeromefroe: <pre><p>I also experimented with constructing the StringHeader directly, but I felt it might be preferable to allocate a normal string and then take a StringHeader from it, because the unsafe package offers the following warning:</p> <p>"In general, reflect.SliceHeader and reflect.StringHeader should be used only as *reflect.SliceHeader and *reflect.StringHeader pointing at actual slices or strings, never as plain structs. A program should not declare or allocate variables of these struct types."</p></pre>
driusan: <pre><p>I'm a little confused about why you want to use unsafe for this when you can safely cast between string and []byte?</p></pre>
jeromefroe: <pre><p>Doing so requires an allocation and a subsequent copy of the bytes in the old string into the byte slice. However, if I can guarantee, through knowledge of my application, that I own the only reference to the string and will not be using it again, I'd like to be able to cast the string into a byte slice without the overhead of the allocation and the copy. Admittedly it's an optimization that is perhaps unnecessary, but I was curious.</p></pre>
kl0nos: <pre><p>Probably because converting from string to []byte makes a copy, which means an additional allocation.</p></pre>
hgjkghjkhgfhf: <pre><p>I would just use the conversion. It is very likely that future Go implementations could optimize away the allocations, just using the backing array of the string as the backing array of the slice.</p> <p>Also, this conversion is already highly optimized within the runtime. Unless your strings are colossal, or string conversions make up the majority of your workload, letting the runtime handle this conversion is your best option.</p></pre>
v0idl0gic: <pre><p>Take care when you add things to maps and use strings as keys: make a real string for the key. For example, in some of my code:</p> <pre><code>func (this *cacheKey) String() string {
	if this.strBuf.Len() &lt; 1 {
		this.buildKey()
	}
	return this.strBuf.String()
}

func (this *cacheKey) StringView() string {
	if this.strBuf.Len() &lt; 1 {
		this.buildKey()
	}
	return unsafeCastToString(this.strBuf.Bytes())
}

func (this *cacheKey) BytesView() []byte {
	if this.strBuf.Len() &lt; 1 {
		this.buildKey()
	}
	return this.strBuf.Bytes()
}

//** Here be dragons **
//Evil code that creates a string from a byte slice, without a copy.
//Use with care :) (aka probably only to do something read-only with a temp string)
func unsafeCastToString(rawStr []byte) string {
	if len(rawStr) == 0 {
		return ""
	}
	hdr := &amp;reflect.StringHeader{
		Data: uintptr(unsafe.Pointer(&amp;rawStr[0])),
		Len:  len(rawStr),
	}
	return *(*string)(unsafe.Pointer(hdr))
}
</code></pre> <p>StringView() can be used for comparisons and map lookups.</p> <p>String() is used for putting things in maps.</p></pre>
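<p>For readers on a newer toolchain, here is a minimal sketch of the same ideas, assuming Go 1.20 or later, where the unsafe package provides unsafe.StringData, unsafe.Slice, unsafe.SliceData, and unsafe.String. The helper names stringToBytes and bytesToString are hypothetical and do not come from the thread. The sketch also shows the safe route suggested in the comments (io.WriteString, which calls the writer's WriteString method when it has one) and the map caveat: insert with a real copied string, look up with a view.</p>

```go
package main

import (
	"fmt"
	"io"
	"os"
	"unsafe"
)

// stringToBytes is a hypothetical helper (not from the thread). It returns a
// []byte that shares the string's backing memory, with no allocation or copy.
// The result must never be written to: strings are immutable and their bytes
// may live in read-only memory. Assumes Go 1.20+.
func stringToBytes(s string) []byte {
	if len(s) == 0 {
		return nil
	}
	return unsafe.Slice(unsafe.StringData(s), len(s))
}

// bytesToString is a hypothetical helper (not from the thread). It returns a
// string that shares b's backing array. The caller must not modify b for as
// long as the returned string is in use. Assumes Go 1.20+.
func bytesToString(b []byte) string {
	if len(b) == 0 {
		return ""
	}
	return unsafe.String(unsafe.SliceData(b), len(b))
}

func main() {
	s := "fizzbuzz"

	// Zero-copy view for an API that only accepts []byte.
	b := stringToBytes(s)
	fmt.Println(len(b), cap(b), string(b))

	// Safe alternative: io.WriteString uses the writer's WriteString method
	// when it has one (as *os.File does) and otherwise falls back to
	// w.Write([]byte(s)).
	if _, err := io.WriteString(os.Stdout, s+"\n"); err != nil {
		panic(err)
	}

	// Map caveat: insert with a real (copied) string, look up with a view.
	buf := []byte("key")
	m := map[string]int{string(buf): 1} // string(buf) copies the bytes
	fmt.Println(m[bytesToString(buf)])  // read-only lookup, no copy
}
```

<p>The views are only safe under the condition stated earlier in the thread: the program must know that nothing writes to the shared memory while the view is alive. When the destination may implement WriteString, io.WriteString avoids unsafe entirely.</p>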
