Beware of Scanf when trying to build optimized solutions.

xuanbao · · 514 次点击    
这是一个分享于 的资源,其中的信息可能已经有所发展或是发生改变。
<p>So I&#39;m aware there is a bug report out for this, or something of that nature when I dug into it (not that deep, honestly) but I thought I&#39;d take here and just post a general warning about when you&#39;re writing solutions for problems reading in input.</p> <p>I realize my solutions were rather contrived as I work through a HankerRank problem (so beware, the code snippets I&#39;m posting solve a solution on HackerRank - view at your own risk). </p> <p>The particular problem was looking to see if there was an index in a given list of values where the right and left side totals were equivalent. The input looked something like:</p> <pre><code>2 3 1 2 3 4 1 2 3 3 </code></pre> <p>and for that input, the expected output was:</p> <pre><code>NO YES </code></pre> <p>And to explain slightly, the first list, when starting at index 0, the right total is 0, the left total (excluding the current index) is 5. Moving forward you have 1 for the right total and 3 for the left total, and finally you end on 3 for the right total and 0 for left. So no, at no point is there a match.</p> <p>For the second one (and for brevity I&#39;ll use this notation {right total} &lt;- {current value} -&gt; {left total})</p> <pre><code>0 &lt;- 1 -&gt; 8 1 &lt;- 2 -&gt; 6 3 &lt;- 3 -&gt; 3 </code></pre> <p>At this point, the left and right totals match and so the result is <code>&#34;YES&#34;</code>. So that&#39;s the problem - and here was my first solution to the problem (in Go).</p> <p><a href="https://gist.github.com/bbuck/8cb4a69e6e9b9deb1fa9#file-scanf_solution-go">https://gist.github.com/bbuck/8cb4a69e6e9b9deb1fa9#file-scanf_solution-go</a></p> <p>Which worked, and upon submission got a green check form all but two test cases. Those it timed out on (taking more than 4 seconds). So I spent some of my &#34;Hackos&#34; as they refer to it and bought the test case (and since it has to be purchased with a currency via the site I won&#39;t be making the test case available myself) and looked over the input. I ran it locally and compared to the expected output - got the right answers but it was running (I ran it with <code>time</code>) ~4.4 seconds.</p> <p>The test case itself was just the maximal number of values given so they topped with 10 lists, each 100,000 numbers long and that&#39;s where things were going south. So I hurriedly copied my solution into C and tried that. I didn&#39;t even change much - just using C&#39;s <code>scanf</code>.</p> <p><a href="https://gist.github.com/bbuck/8cb4a69e6e9b9deb1fa9#file-scanf_solution-c">https://gist.github.com/bbuck/8cb4a69e6e9b9deb1fa9#file-scanf_solution-c</a></p> <p>First, before I go forward let me be clear here - I&#39;m not comparing C to Go directly. I never expected nor do I expect Go to match C 100% but the difference in speed between these two solutions was astounding to me. Go averaged 4.4 seconds, the C solution average 0.1 second. I submitted that, got all my green checkmarks and moved on with my life.</p> <p>Today it just kept eating at me, surely I could do something in Go that would get the runtime down to something that wouldn&#39;t time out. So I opted to give my first shot with the <code>bufio.Scanner</code> tool and use <code>strconv</code> to convert the numbers, and this is what I came up with up (please excuse lack of error handling, this was just a test for my personal use and not something I condone or encourage for real work):</p> <p><a href="https://gist.github.com/bbuck/8cb4a69e6e9b9deb1fa9#file-bufio_conv_solution-go">https://gist.github.com/bbuck/8cb4a69e6e9b9deb1fa9#file-bufio_conv_solution-go</a></p> <p>When running this under <code>time</code> it average about ~0.4 seconds. Not quite as fast, but 10x faster than when using <code>fmt.Scanf</code>. </p> <p><strong>tl;dr</strong></p> <p>Heads up, <code>fmt.Scanf</code> is pretty slow in Go. It&#39;s probably best to avoid it unless you&#39;re not reading an extensive amount of data. If you are, it&#39;s best to fallback to something like <code>bufio.Scanner</code> and convert what you need as necessary from there.</p> <p>I&#39;d like to see, at some point, if the Go implementation of scanf can become more competitive to C&#39;s.</p> <hr/>**评论:**<br/><br/>Fwippy: <pre><p>I think this was posted a couple days ago on here, but Rob Pike doesn&#39;t hold <code>ScanF</code> in high opinion either: <a href="https://github.com/golang/go/issues/12275#issuecomment-133796990">https://github.com/golang/go/issues/12275#issuecomment-133796990</a></p></pre>jerf: <pre><p>Bear in mind huge swathes of Go&#39;s underlying code are pretty easy to look at, understand, and C&amp;P for modification. You can start about <a href="https://golang.org/src/fmt/scan.go#L1139">here</a> if you&#39;re interested.</p> <p>But, as Rob Pike mentions in the post Fwippy links, really the whole Scanf interface (by which I mean not just Go, but the entire <em>idea</em>) is deeply and irretrievably flawed. It&#39;s a window back to almost the dawn of the text-based era, when programmers still labored under the delusion that handling text would be easy.</p> <p>(As successful as UNIX has been, if it had somehow not been invented, and someone just today came up with the idea of making a big ol&#39; collection of tiny programs that emitted textual streams at each other that you could pipe together and then building huge swathes of infrastructure on this primitive, they&#39;d be laughed out of the building.)</p></pre>izuriel: <pre><p>Well it&#39;s definitely not something I would use in an extensive way - it&#39;s very simple to get off the ground running with it here for these types of problems and all I was really trying to get at was a kind of &#34;public service announcement&#34; so to speak. Before I started working these challenges I&#39;d never used it. I&#39;d just gone with <code>strconv</code> and whatever process I needed to read the data in.</p> <p>But it&#39;s good to learn that Rob Pike cares little for it.</p></pre>fubo: <pre><blockquote> <p>(As successful as UNIX has been, if it had somehow not been invented, and someone just today came up with the idea of making a big ol&#39; collection of tiny programs that emitted textual streams at each other that you could pipe together and then building huge swathes of infrastructure on this primitive, they&#39;d be laughed out of the building.)</p> </blockquote> <p>Eh. The preceding idea was to do that with binary records, often fixed-length, and with programming languages that make bad Perl examples look readable.</p> <p>Today, folks might use semi-human-readable but formally-defined formats like XML or JSON; or binary formats that can be unambiguously parsed into human-readable form for debugging, such as protobuf. But I doubt anyone wants to go back to the days of virtual card punches.</p></pre>jerf: <pre><p>You sort of reiterate my point... compared to XML/JSON/Protobuf, plain text <em>is</em> virtual card punching.</p></pre>fubo: <pre><p>Well, yeah. Except for the human-readability part.</p> <p>Classic Unix has its atrocities in that regard though. The mail system, particularly sendmail.cf and procmail, kind of stand out ...</p> <p>The other thing that jumps out at me as crappy about plain-text systems is that they always end up involving <em>encoding</em> for stuff that doesn&#39;t fit into their ideas of &#34;text&#34;. Quoted-printable, anyone? (Oh look, again it&#39;s the mail system at fault.)</p></pre>jerf: <pre><p>Again... as you end up pointing out... the &#34;human readability&#34; aspect ends up oversold. Besides, general principle of programming, forcing an abstraction on top of something that can&#39;t actually conform to that abstraction produces very sad results.</p> <p>You seem to be looking at a rabbit with me... or, perhaps with just a bit of a perspective change on the same set of facts, you&#39;ll find <a href="https://upload.wikimedia.org/wikipedia/commons/thumb/a/ab/Kaninchen_und_Ente.png/1024px-Kaninchen_und_Ente.png" rel="nofollow">you also see a duck</a>, without any of the facts changing....</p> <p>Ignore the sales pitch for plain-text human-readable protocols... we&#39;ve been living with them for decades now, we have the experience to simply look at them. They aren&#39;t useless, but they&#39;ve got some really, really, <em>really</em> bad issues that are basically unfixable. (And remember &#34;plain-text&#34; in the UNIX world really means <em>text</em>... even &#34;HTTP&#34; is more structure than UNIX has and is really no longer &#34;plain text&#34;.) Look at them for what they are, both good and bad, not what they&#39;re &#34;supposed&#34; to be.</p></pre>enneff: <pre><p>You will get much better performance out of fmt.scan if you read from a buffered input stream. In your first example, wrap <code>os.Stdin</code> with a <code>bufio.Reader</code> and then pass that to <code>fmt.Fscanf</code>. I&#39;m sure your results will be much better. </p></pre>izuriel: <pre><p>I&#39;ll give that a go and see what happens there.</p></pre>izuriel: <pre><p>Okay, my guess was that <code>Fscanf</code> functions like <code>Scanf</code>but when I switch things around to use it, the program no longer prints the correct results.</p> <p>To be clear, I added <code>reader := bufio.NewReader(os.Stdin)</code> and then changed all references to <code>Scanf</code> to <code>Fscanf(reader,</code>.</p> <p>I will continue to look into it, but that&#39;s unfortunate. In other news, it finished in 0.1 seconds - although without producing valid answers I&#39;m not counting the time yet.</p></pre>enneff: <pre><p>If you made those changes as described then the code should function no differently to before. Want to share your updated snippet?</p></pre>izuriel: <pre><p>As soon as I get back in front of a computer. </p></pre>izuriel: <pre><p>As promised (better late than never).</p> <p><a href="https://gist.github.com/bbuck/8cb4a69e6e9b9deb1fa9#file-bufio_scanf_solution-go" rel="nofollow">https://gist.github.com/bbuck/8cb4a69e6e9b9deb1fa9#file-bufio_scanf_solution-go</a></p> <p><strong>edit</strong></p> <p>When I do some basic print debugging it seems to read some weird numbers. Given this input file:</p> <pre><code>2 3 1 2 3 4 1 2 3 3 </code></pre> <p>The first value (T, number of test cases) is read properly, <code>2</code>. The second value (n, number of items in test case) seems to be read properly as well <code>3</code> which would be the number of values in the array for the first test case. Then for the list it does read three values but we get [0 1 2] instead of [1 2 3] and then we get a <code>3</code> for the second test case value of <code>n</code> (should be <code>4</code>) and for that list we get [0 4 4] which is funny since 4 only appears in the list once.</p> <p>When swapped back to just <code>Scanf</code> everything works normally. There is definitely something firing differently with <code>bufio.Reader</code> and <code>Fscanf</code>.</p></pre>

入群交流(和以上内容无关):加入Go大咖交流群,或添加微信:liuxiaoyan-s 备注:入群;或加QQ群:692541889

514 次点击  
加入收藏 微博
暂无回复
添加一条新回复 (您需要 登录 后才能回复 没有账号 ?)
  • 请尽量让自己的回复能够对别人有帮助
  • 支持 Markdown 格式, **粗体**、~~删除线~~、`单行代码`
  • 支持 @ 本站用户;支持表情(输入 : 提示),见 Emoji cheat sheet
  • 图片支持拖拽、截图粘贴等方式上传