Why does Sum() from crypto hashes return an array?

xuanbao · 2016-11-15 08:00:10 · 688 views

This resource was shared on 2016-11-15 08:00:10; the information in it may have since evolved or changed.

The Sum method of hash.Hash returns a byte slice.
Packages in subdirectories of crypto that provide hashes offer New functions that return implementations of hash.Hash. Create a hash with New, Write to it, call Sum(nil) and voilà, you've got your hash in a []byte.
They also directly provide a "convenient" Sum function which directly hashes the data you give it and returns the result to you. In an array...
Why an array? In the case of crc32, I get that a uint32 can be more useful than a []byte, but even then the function returning this uint32 and the method returning a []byte are clearly differentiated (hash.Hash.Sum vs crc32.Checksum).
But in the case of the hashes from crypto, you have a function and a method that look pretty similar, yet take really different arguments that also look pretty similar.
I was taught that the Go way is to use slices most of the time and arrays only when they are needed. And I need some help to see why they are needed here.

Comments:

bradfitz:

hash.Hash.Sum needs to return a slice because each implementation of Hash might have a different result size.

But crc32 always returns 32 bits, so uint32 makes sense.

And md5.Sum always returns 16 bytes (128 bit hash), so it returns an array to avoid allocating. Likewise with sha1.Sum, etc.

feeddageek:

/u/TheMerovius's reply made me consider that arrays may live on the stack while slices will probably live on the heap. Was that what you meant by allocation? The place of allocation, not the amount allocated?
I think it's starting to make sense to me.

feeddageek:


"And md5.Sum always returns 16 bytes (128 bit hash), so it returns an array to avoid allocating."

Using an array to save 2 ints' worth of memory feels more like C than like Go.

Edit: Especially since you would have to slice that array to use it with just about any function from the standard library.

F41LUR3:

Note the part about "avoid allocating". It's a performance concern, not so much a memory space concern, especially when you start using hashes in great numbers. The allocation overhead can add up really fast.

You're always welcome to wrap it and return what you want it to return for your use case.

feeddageek:

Then a slice should be more efficient, since the array ends up being copied...

Or is there something I don't understand correctly?

https://play.golang.org/p/4tUHa8movG

Edit: I mean it, I'm not being sarcastic. I feel there must be something I don't understand for this function to be like that, and I wish to understand it.

TheMerovius:

If the compiler sees that a [16]byte is returned, it can reserve 16 bytes' worth of space on the stack and have the function return into that. Whereas if a []byte is returned, it doesn't know the size, so the bytes themselves need to land on the heap.

Now, with inlining it doesn't need to matter, though a) I don't know if the compiler is clever enough and b) you can't always inline.

The [16]byte thing doesn't even need to be a compiler optimization; it can in theory be built into the calling convention.

All of that being said: I'm guessing. No idea about the actual implementation here.

[edit] Your example isn't very illustrative, because the bytes probably escape (as you do use a slice). That being said, a simplification shows that my guess also doesn't happen :) So, no clue.

feeddageek:

You are right: my taking a slice of the array and returning it forces the compiler to allocate that array on the heap. But even if this array lives on the heap after get returns, the array that main obtains is a fresh copy of it (on the stack this time, I suppose), whereas the array referenced by the returned slice is the same one that was allocated inside get; the content of that array did not need to be copied.
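
The copy-versus-shared-storage distinction described here can be sketched as follows. This is not the playground code from the thread; buf, copyOut and viewOut are hypothetical names, and a package-level array stands in for the array that ended up on the heap.

```go
package main

import "fmt"

// buf stands in for an array that outlives the function that fills it.
var buf [4]byte

// copyOut returns the array by value: the caller gets an independent copy.
func copyOut() [4]byte { return buf }

// viewOut returns a slice: the caller shares buf's storage, nothing is copied.
func viewOut() []byte { return buf[:] }

func main() {
	a := copyOut()
	s := viewOut()
	buf[0] = 42
	// a is unaffected by the write; s observes it.
	fmt.Println(a[0], s[0]) // prints "0 42"
}
```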

JakeMolnar:

I wrote up some benchmarks for you that might help illustrate the difference. https://github.com/TheDorkKnight/arraybench

I tried to target only the difference between returning arrays and slices in my benchmarks, though I can't guarantee for sure that my methods are sound.

The upshot of my results: There is a noticeable improvement in performance when we return an array on the stack (BenchmarkArrayCopy), instead of allocating a slice on the heap (BenchmarkSlice).

I've also included benchmarks that help us compare performance when zero copying occurs (BenchmarkSliceWrappingArray and BenchmarkArray).
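
Independently of the linked repository, the allocation difference can be observed with testing.AllocsPerRun. The sketch below is not the benchmark code referred to above; asArray, asSlice and the two measuring helpers are hypothetical, inlining is disabled on purpose, and the exact counts may depend on the compiler version.

```go
package main

import (
	"fmt"
	"testing"
)

// asArray returns its result by value, so the caller can hold it
// in its own stack frame.
//
//go:noinline
func asArray(b byte) [16]byte {
	var out [16]byte
	out[0] = b
	return out
}

// asSlice returns a slice whose backing array outlives the call,
// so escape analysis has to place that array on the heap.
//
//go:noinline
func asSlice(b byte) []byte {
	out := make([]byte, 16)
	out[0] = b
	return out
}

var sink byte // keeps the results observably used

// arrayAllocs measures average heap allocations per asArray call.
func arrayAllocs() float64 {
	return testing.AllocsPerRun(1000, func() {
		a := asArray(1)
		sink = a[0]
	})
}

// sliceAllocs measures average heap allocations per asSlice call.
func sliceAllocs() float64 {
	return testing.AllocsPerRun(1000, func() {
		s := asSlice(1)
		sink = s[0]
	})
}

func main() {
	fmt.Println("array allocs/op:", arrayAllocs())
	fmt.Println("slice allocs/op:", sliceAllocs())
}
```

The array version does copy 16 bytes on return, but a small copy of that size is typically far cheaper than a heap allocation plus the garbage collection work that follows it.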

__ah:

You would usually just store the sum, not needing to "use it with about any function from the standard library." The sum will probably be stored in a database or in some in-memory structure. You can do equality comparisons with the byte arrays, which is generally all you want out of a sum.

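import "crypto/sha256"

// sumTable maps an ID to its expected SHA-256 digest.
var sumTable map[string][sha256.Size]byte

// check reports whether the digest of in matches the one stored for id.
func check(id string, in []byte) bool {
    expectedSum := sumTable[id]
    newSum := sha256.Sum256(in) // crypto/sha256 exposes Sum256, not Sum
    return newSum == expectedSum // arrays of bytes are comparable with ==
}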

TheMerovius:

You will likely, with cryptographic hashes, not use ==, but use subtle.ConstantTimeCompare.

Redundancy_:

I actually find it a little confusing that md5.Sum does something different from hash.Hash.Sum
