Does Go over do it sometimes?

polaris · 411 views
This is a shared resource; the information in it may have since developed or changed.
<p>Caveat emptor: I am new to Go.</p> <p>Recently I have been fooling around in Go and decided I wanted to parse a large text file containing pretty much every movie ever made, in tab-delimited format. I obtained it from <a href="http://omdbapi.com/" rel="nofollow">OMDB</a>; he does great work and I happily donated.</p> <p>I thought this would be a simple program. In PHP it was simple: I just split each line on \t and put it into an array.</p> <p>So looking at Go I am thinking, no problem. A little bit of research later, I find encoding/csv, great! I can just set reader.Comma to \t and then call ReadAll! Go is great! Alas, I found the catch. Go is so bloody picky about quotes that the file is effectively useless. I am entirely unable to use this file, since quotes appear in the text of some movie descriptions (like all of them), that throws an error, and all is lost.</p> <p>I understand why Go does this, but I don&#39;t see why it is the standard behavior. As far as I can tell, it is to allow multi-line text blocks by using quotes to mark field boundaries. However, this has nothing to do with CSV or TSV. It should just accept each token as a string and spit it out. All this extra parsing just makes it so I will have to reformat all TSV/CSV I get to be Go-friendly; any quotes that appear in text now need to be double quoted. It&#39;s too much extra; in my opinion it is over-engineered.</p> <p>This needed to be a simple class, and I will write my own version of it, but I am disappointed by the stdlib.</p> <p>/rant</p> <p>Edit: Lots of people have been giving constructive options. LazyQuotes + FieldsPerRecord makes me happy. Sorry if this ruffled some feathers; I truly do enjoy Go, I was just tired and frustrated.</p> <hr/>**Comments:**<br/><br/>[deleted]: <pre><p><a href="https://play.golang.org/p/2xxs8pFCOs" rel="nofollow">https://play.golang.org/p/2xxs8pFCOs</a> Seems to work perfectly fine to me. 
What am I missing?</p> <blockquote> <p>All this extra parsing just makes it so I will have to reformat all TSV/CSV I get to be Go-friendly; any quotes that appear in text now need to be double quoted. It&#39;s too much extra; in my opinion it is over-engineered.</p> </blockquote> <p>In my opinion, you don&#39;t seem to understand what <code>csv</code> is.</p></pre>shovelpost: <pre><p>The title should have been: &#34;Do I over do it sometimes?&#34;</p></pre>Fireynis: <pre><p>How is using the stdlib for something overdoing it?</p></pre>neoasterisk: <pre><p>So this is basically wanting to use a screwdriver to chop a piece of meat and complaining that the screwdriver cannot be used for that purpose.</p> <p>Actually, I can relate to this situation. I faced a similar problem some months ago when I wanted to do something strange with my csv file and expected <code>encoding/csv</code> to be configurable enough to let me do it. But the catch is, my change made my file invalid csv.</p> <p>The <code>encoding/csv</code> package is designed for valid csv and nothing else. If you want to decode something that <em>looks</em> or <em>is similar</em> to csv but <em>is not</em> csv, then do not use <code>encoding/csv</code>. You are using the wrong tool. Simple as that.</p> <blockquote> <p>This needed to be a simple class, and I will write my own version of it, but I am disappointed by the stdlib.</p> </blockquote> <p>The Go standard library provides the basic tools to read any file. All you need is a for loop and some programming. So it&#39;s not the stdlib you should be disappointed with.</p></pre>Fireynis: <pre><p>The modification I made was a change to the delimiter, that is all. That hardly would be a massive customization.</p> <p>I disagree with the analogy you gave; rather, it is like asking for a hammer and being given a jackhammer. I just need to hit one nail, not plow through concrete. 
</p> <p>Why does the CSV class need to deal with quotable strings? I expect the stdlib to provide simple tools to build on. I have been able to parse my file; I just found it silly that CSV made my life more complicated.</p></pre>dasacc22: <pre><p>Have an upvote; you&#39;re not the only one a little frustrated with the csv package. Since you&#39;re new to Go, I&#39;d suggest not taking your experience with package csv too hard; the std lib is generally high quality and a great read for learning Go as well.</p> <p>I&#39;d also recommend getting familiar with accessing the docs if you haven&#39;t already, for example: <a href="https://golang.org/pkg/encoding/csv/#Reader" rel="nofollow">https://golang.org/pkg/encoding/csv/#Reader</a></p> <p>&#34;If LazyQuotes is true, a quote may appear in an unquoted field and a non-doubled quote may appear in a quoted field.&#34;</p> <p>As someone else mentioned, setting LazyQuotes to true should take care of your case without you having to alter your data.</p> <p>Also, for accessing third-party documentation: <a href="https://godoc.org/" rel="nofollow">https://godoc.org/</a></p></pre>Fireynis: <pre><p>This seems like it could fix my problem. Thank you, I did not see that before.</p></pre>neoasterisk: <pre><blockquote> <p>The modification I made was a change to the delimiter, that is all. That hardly would be a massive customization.</p> </blockquote> <p>So this is no longer a CSV file, yet you want to use <code>encoding/csv</code>?</p> <blockquote> <p>Why does the CSV class need to deal with quotable strings?</p> </blockquote> <p>Because of the CSV spec.</p></pre>Fireynis: <pre><p>Yes, it&#39;s not CSV, it&#39;s TSV; not drastically different. But the CSV spec varies wildly; this is Go&#39;s selection of the CSV spec.</p></pre>tmornini: <pre><p>Despite their similar names, TSV is not CSV-with-tabs.</p></pre>_ak: <pre><p>There is no such thing as a single specification. 
There exist various conflicting specifications for CSV from various companies and vendors, but that&#39;s about it. Given that mess, <code>encoding/csv</code> does a pretty good job. Sorry it doesn&#39;t quite match your requirements.</p></pre>upboatact: <pre><p>You must not realize how CSV files are encoded. And if we are talking about PHP, it has fgetcsv, which is the correct tool to use to parse CSV files. Splitting on \t is the naive and very incorrect way, but if you want to do it that way you have strings.Split and friends.</p></pre>Fireynis: <pre><p>I realize CSV files are fields delimited by commas, and TSV files by tabs. Outside of that it&#39;s a wild card. It&#39;s not that this is hard; it&#39;s more that this is something the stdlib could have handled, and everything else is above and beyond. It is over-engineered: let me decide how to parse quotables, but make it simple to tokenize my data.</p></pre>danieldk: <pre><p>There is functionality for simple tokenization as well:</p> <p><a href="https://golang.org/pkg/bufio/#Scanner" rel="nofollow">https://golang.org/pkg/bufio/#Scanner</a></p> <p>You can use any of the provided splitting functions or define your own. Scanners work great for \t and \n delimited text files.</p></pre>Fireynis: <pre><p>That is what I ended up using. Nice tool.</p></pre>jussij: <pre><p>The fact that CSV fields are delimited by commas is just the first of several requirements for a valid CSV file. You are missing about 6 other such requirements.</p> <p><a href="http://edoceo.com/utilitas/csv-file-format" rel="nofollow">http://edoceo.com/utilitas/csv-file-format</a></p></pre>ratatask: <pre><p>Well, sadly csv is one of those things that seems simple but isn&#39;t, because of the myriad of incompatible variants.</p> <p>It&#39;s not clear what actual error you get, but did you try setting LazyQuotes to true?</p></pre>Fireynis: <pre><p>Someone else mentioned this LazyQuotes flag. Sounds like it may resolve my issue. 
Thanks for pointing it out.</p></pre>MatthiasLuft: <pre><p>The other day someone posted a csv package here that reads concurrently, which was faster in some use cases. Does anyone have a pointer to that package?</p></pre>Fireynis: <pre><p>I would love that; the files I am parsing are huge.</p></pre>Davmuz: <pre><p>So You Want To Write Your Own CSV code? <a href="http://tburette.github.io/blog/2014/05/25/so-you-want-to-write-your-own-CSV-code/" rel="nofollow">http://tburette.github.io/blog/2014/05/25/so-you-want-to-write-your-own-CSV-code/</a></p> <p>Even for PHP, don&#39;t write it yourself; use this <a href="http://php.net/manual/en/function.fgetcsv.php" rel="nofollow">http://php.net/manual/en/function.fgetcsv.php</a> and this <a href="http://php.net/manual/en/function.fputcsv.php" rel="nofollow">http://php.net/manual/en/function.fputcsv.php</a></p></pre>HadronHubbub: <pre><p>Can you give a sample of the TSV file (including a line that <code>encoding/csv</code> doesn&#39;t parse correctly)?</p></pre>metamatic: <pre><p>Go&#39;s CSV package isn&#39;t a general-purpose delimited data reader; it&#39;s specifically a CSV reader.</p> <p>If you have a tab-delimited file with no tab characters in the data, you can just use <code>strings.Split</code> to get the individual fields. If the values have quotes, you might find that <code>strconv.Unquote</code> solves that problem.</p></pre>cweimann: <pre><p>Your input data is tab-delimited, not csv. So don&#39;t use encoding/csv. All you need is strings.Split. <a href="https://golang.org/pkg/strings/#Split" rel="nofollow">https://golang.org/pkg/strings/#Split</a></p></pre>[deleted]: <pre><blockquote> <p>Your input data is tab-delimited, not csv.</p> </blockquote> <p>The <code>delimiter</code> is irrelevant.</p> <blockquote> <p>All you need is strings.Split. 
<a href="https://golang.org/pkg/strings/#Split" rel="nofollow">https://golang.org/pkg/strings/#Split</a></p> </blockquote> <p>If you actually have a csv file, this is absolutely the wrong thing to do.</p></pre>cweimann: <pre><p>Right, IF you have a csv file, and he DOES NOT have a csv file. He has a tab-delimited file. I know that because he said he has a tab-delimited file. I&#39;m not really sure why you think he has a csv file. With csv files, and code to deal with them, quotes typically have meaning, so if you try to use csv code on non-csv data you wind up with EXACTLY the problem he describes.</p></pre>[deleted]: <pre><p>Like I said: &#34;The <code>delimiter</code> is irrelevant.&#34; In this case none of the values are quoted, because they don&#39;t need to be; the quotes are part of the value.</p></pre>cweimann: <pre><p>Right, none of the values are quoted, because it is a tab-delimited file and not a tsv file. Therefore strings.Split is NOT absolutely the wrong thing to do. That is why I suggested it.</p></pre>
