Beware of Scanf when trying to build optimized solutions.

xuanbao · · 514 次点击

这是一个分享于的资源，其中的信息可能已经有所发展或是发生改变。

So I'm aware there is a bug report out for this, or something of that nature when I dug into it (not that deep, honestly) but I thought I'd take here and just post a general warning about when you're writing solutions for problems reading in input. I realize my solutions were rather contrived as I work through a HankerRank problem (so beware, the code snippets I'm posting solve a solution on HackerRank - view at your own risk). The particular problem was looking to see if there was an index in a given list of values where the right and left side totals were equivalent. The input looked something like: <pre><code>2 3 1 2 3 4 1 2 3 3 </code></pre> and for that input, the expected output was: <pre><code>NO YES </code></pre> And to explain slightly, the first list, when starting at index 0, the right total is 0, the left total (excluding the current index) is 5. Moving forward you have 1 for the right total and 3 for the left total, and finally you end on 3 for the right total and 0 for left. So no, at no point is there a match. For the second one (and for brevity I'll use this notation {right total} <- {current value} -> {left total}) <pre><code>0 <- 1 -> 8 1 <- 2 -> 6 3 <- 3 -> 3 </code></pre> At this point, the left and right totals match and so the result is <code>"YES"</code>. So that's the problem - and here was my first solution to the problem (in Go). <a href="https://gist.github.com/bbuck/8cb4a69e6e9b9deb1fa9#file-scanf_solution-go">https://gist.github.com/bbuck/8cb4a69e6e9b9deb1fa9#file-scanf_solution-go</a> Which worked, and upon submission got a green check form all but two test cases. Those it timed out on (taking more than 4 seconds). So I spent some of my "Hackos" as they refer to it and bought the test case (and since it has to be purchased with a currency via the site I won't be making the test case available myself) and looked over the input. I ran it locally and compared to the expected output - got the right answers but it was running (I ran it with <code>time</code>) ~4.4 seconds. The test case itself was just the maximal number of values given so they topped with 10 lists, each 100,000 numbers long and that's where things were going south. So I hurriedly copied my solution into C and tried that. I didn't even change much - just using C's <code>scanf</code>. <a href="https://gist.github.com/bbuck/8cb4a69e6e9b9deb1fa9#file-scanf_solution-c">https://gist.github.com/bbuck/8cb4a69e6e9b9deb1fa9#file-scanf_solution-c</a> First, before I go forward let me be clear here - I'm not comparing C to Go directly. I never expected nor do I expect Go to match C 100% but the difference in speed between these two solutions was astounding to me. Go averaged 4.4 seconds, the C solution average 0.1 second. I submitted that, got all my green checkmarks and moved on with my life. Today it just kept eating at me, surely I could do something in Go that would get the runtime down to something that wouldn't time out. So I opted to give my first shot with the <code>bufio.Scanner</code> tool and use <code>strconv</code> to convert the numbers, and this is what I came up with up (please excuse lack of error handling, this was just a test for my personal use and not something I condone or encourage for real work): <a href="https://gist.github.com/bbuck/8cb4a69e6e9b9deb1fa9#file-bufio_conv_solution-go">https://gist.github.com/bbuck/8cb4a69e6e9b9deb1fa9#file-bufio_conv_solution-go</a> When running this under <code>time</code> it average about ~0.4 seconds. Not quite as fast, but 10x faster than when using <code>fmt.Scanf</code>. tl;dr Heads up, <code>fmt.Scanf</code> is pretty slow in Go. It's probably best to avoid it unless you're not reading an extensive amount of data. If you are, it's best to fallback to something like <code>bufio.Scanner</code> and convert what you need as necessary from there. I'd like to see, at some point, if the Go implementation of scanf can become more competitive to C's. <hr/>**评论：** Fwippy: <pre>I think this was posted a couple days ago on here, but Rob Pike doesn't hold <code>ScanF</code> in high opinion either: <a href="https://github.com/golang/go/issues/12275#issuecomment-133796990">https://github.com/golang/go/issues/12275#issuecomment-133796990</a></pre>jerf: <pre>Bear in mind huge swathes of Go's underlying code are pretty easy to look at, understand, and C&P for modification. You can start about <a href="https://golang.org/src/fmt/scan.go#L1139">here</a> if you're interested. But, as Rob Pike mentions in the post Fwippy links, really the whole Scanf interface (by which I mean not just Go, but the entire idea) is deeply and irretrievably flawed. It's a window back to almost the dawn of the text-based era, when programmers still labored under the delusion that handling text would be easy. (As successful as UNIX has been, if it had somehow not been invented, and someone just today came up with the idea of making a big ol' collection of tiny programs that emitted textual streams at each other that you could pipe together and then building huge swathes of infrastructure on this primitive, they'd be laughed out of the building.)</pre>izuriel: <pre>Well it's definitely not something I would use in an extensive way - it's very simple to get off the ground running with it here for these types of problems and all I was really trying to get at was a kind of "public service announcement" so to speak. Before I started working these challenges I'd never used it. I'd just gone with <code>strconv</code> and whatever process I needed to read the data in. But it's good to learn that Rob Pike cares little for it.</pre>fubo: <pre><blockquote> (As successful as UNIX has been, if it had somehow not been invented, and someone just today came up with the idea of making a big ol' collection of tiny programs that emitted textual streams at each other that you could pipe together and then building huge swathes of infrastructure on this primitive, they'd be laughed out of the building.) </blockquote> Eh. The preceding idea was to do that with binary records, often fixed-length, and with programming languages that make bad Perl examples look readable. Today, folks might use semi-human-readable but formally-defined formats like XML or JSON; or binary formats that can be unambiguously parsed into human-readable form for debugging, such as protobuf. But I doubt anyone wants to go back to the days of virtual card punches.</pre>jerf: <pre>You sort of reiterate my point... compared to XML/JSON/Protobuf, plain text is virtual card punching.</pre>fubo: <pre>Well, yeah. Except for the human-readability part. Classic Unix has its atrocities in that regard though. The mail system, particularly sendmail.cf and procmail, kind of stand out ... The other thing that jumps out at me as crappy about plain-text systems is that they always end up involving encoding for stuff that doesn't fit into their ideas of "text". Quoted-printable, anyone? (Oh look, again it's the mail system at fault.)</pre>jerf: <pre>Again... as you end up pointing out... the "human readability" aspect ends up oversold. Besides, general principle of programming, forcing an abstraction on top of something that can't actually conform to that abstraction produces very sad results. You seem to be looking at a rabbit with me... or, perhaps with just a bit of a perspective change on the same set of facts, you'll find <a href="https://upload.wikimedia.org/wikipedia/commons/thumb/a/ab/Kaninchen_und_Ente.png/1024px-Kaninchen_und_Ente.png" rel="nofollow">you also see a duck</a>, without any of the facts changing.... Ignore the sales pitch for plain-text human-readable protocols... we've been living with them for decades now, we have the experience to simply look at them. They aren't useless, but they've got some really, really, really bad issues that are basically unfixable. (And remember "plain-text" in the UNIX world really means text... even "HTTP" is more structure than UNIX has and is really no longer "plain text".) Look at them for what they are, both good and bad, not what they're "supposed" to be.</pre>enneff: <pre>You will get much better performance out of fmt.scan if you read from a buffered input stream. In your first example, wrap <code>os.Stdin</code> with a <code>bufio.Reader</code> and then pass that to <code>fmt.Fscanf</code>. I'm sure your results will be much better. </pre>izuriel: <pre>I'll give that a go and see what happens there.</pre>izuriel: <pre>Okay, my guess was that <code>Fscanf</code> functions like <code>Scanf</code>but when I switch things around to use it, the program no longer prints the correct results. To be clear, I added <code>reader := bufio.NewReader(os.Stdin)</code> and then changed all references to <code>Scanf</code> to <code>Fscanf(reader,</code>. I will continue to look into it, but that's unfortunate. In other news, it finished in 0.1 seconds - although without producing valid answers I'm not counting the time yet.</pre>enneff: <pre>If you made those changes as described then the code should function no differently to before. Want to share your updated snippet?</pre>izuriel: <pre>As soon as I get back in front of a computer. </pre>izuriel: <pre>As promised (better late than never). <a href="https://gist.github.com/bbuck/8cb4a69e6e9b9deb1fa9#file-bufio_scanf_solution-go" rel="nofollow">https://gist.github.com/bbuck/8cb4a69e6e9b9deb1fa9#file-bufio_scanf_solution-go</a> edit When I do some basic print debugging it seems to read some weird numbers. Given this input file: <pre><code>2 3 1 2 3 4 1 2 3 3 </code></pre> The first value (T, number of test cases) is read properly, <code>2</code>. The second value (n, number of items in test case) seems to be read properly as well <code>3</code> which would be the number of values in the array for the first test case. Then for the list it does read three values but we get [0 1 2] instead of [1 2 3] and then we get a <code>3</code> for the second test case value of <code>n</code> (should be <code>4</code>) and for that list we get [0 4 4] which is funny since 4 only appears in the list once. When swapped back to just <code>Scanf</code> everything works normally. There is definitely something firing differently with <code>bufio.Reader</code> and <code>Fscanf</code>.</pre>

入群交流（和以上内容无关）：加入Go大咖交流群，或添加微信：liuxiaoyan-s 备注：入群；或加QQ群：692541889

514 次点击

加入收藏微博

github

runtime

0 回复

添加一条新回复（您需要登录后才能回复没有账号？）

请尽量让自己的回复能够对别人有帮助
支持 Markdown 格式, **粗体**、~~删除线~~、`单行代码`
支持 @ 本站用户；支持表情（输入 : 提示），见 Emoji cheat sheet
图片支持拖拽、截图粘贴等方式上传

Beware of Scanf when trying to build optimized solutions.

用户登录

今日阅读排行

一周阅读排行

最新主题