CSV vs. JSON: which is better to read data from?

blov · 546 views
<p>Here in the office we read data from CSV files, but someone mentioned that we could use JSON files because they would be faster. I just want to know whether switching is worth the time.</p> <p>Or is there any other file format you think would fit better?</p> <p>Thanks!</p> <hr/>**Comments:**<br/><br/>metamatic: <pre><p>It&#39;s worth pointing out that CSV is a lot less standardized than JSON. While there&#39;s <a href="https://tools.ietf.org/html/rfc4180" rel="nofollow">RFC 4180</a>, in the real world you&#39;ll find a lot of variations from it.</p> <p>I&#39;ve dealt with reading and writing files containing millions of records as CSV, and I very much doubt the speed of CSV vs. JSON parsing will be a bottleneck for anything you&#39;re doing.</p></pre>
gopherdiesthrowaway: <pre><p>This. The whole question sounds like premature optimization.</p></pre>
Kipzy: <pre><p>We get a primary CSV file which we need to work on. I&#39;ve been doing just fine with the CSVs, but someone on the team (two guys working in Python) said we could read the CSV and write the same thing to a new JSON file so they could work &#34;faster&#34;.</p></pre>
barsonme: <pre><p>This entirely depends on what your data looks like and what you&#39;re using it for.</p></pre>
xrstf: <pre><p>Depends on your data.</p> <p>CSV can usually be parsed and processed incrementally, so your memory footprint stays relatively constant. For a JSON file, you usually parse it as a whole (like you&#39;d do with a DOM parser for XML). 
I had a case where I wanted to load 100MB JSON files, and the memory usage exploded in my face (in PHP and Node.js).</p> <p>You <em>can</em> parse JSON incrementally (like with a SAX parser for XML), but then you have to keep track of your state and such.</p> <p>What you gain from using JSON is stricter string handling, so if your data contains characters that are &#34;bad&#34; for CSV (line breaks, commas, quotes), JSON can help you handle that.</p> <p>Some people went with the best of both worlds: a file with one JSON document per line, so you can still walk through the file incrementally but also have the niceness of JSON parsing. There&#39;s a dedicated name for these JSON-line files, but I forgot it.</p> <pre><code>{&#34;id&#34;:1,&#34;foo&#34;:2}
{&#34;id&#34;:1,&#34;foo&#34;:2}
{&#34;id&#34;:1,&#34;foo&#34;:2}
</code></pre> <p>You can also be a bit stricter and create a valid JSON <em>file</em> like this:</p> <pre><code>[
{&#34;id&#34;:1,&#34;foo&#34;:2},
{&#34;id&#34;:1,&#34;foo&#34;:2},
{&#34;id&#34;:1,&#34;foo&#34;:2}
]
</code></pre> <p>Just skip the first and last lines and trim the trailing commas. Or read the file as a whole. 
If you are the only producer and consumer of the JSON files, this might be an option; otherwise I&#39;d stay away from it, as it leads to weird &#34;we output JSON, but we promise to format it in a special way so you can read it line by line and do this magic&#34; statements in the documentation.</p></pre>
cathalgarvey: <pre><p>Instead of serialising a list of objects, write one JSON object per line: it gives you the best of both worlds, incremental encoding/decoding and the cleanliness of JSON.</p></pre>
sbinet: <pre><p>Incrementally parsing JSON is relatively easy, though: <a href="https://godoc.org/encoding/json#example-Decoder-Token" rel="nofollow">https://godoc.org/encoding/json#example-Decoder-Token</a></p> <p>See, for example, my solution to Advent of Code day 12 (which involved counting numbers inside deeply nested JSON objects): <a href="https://github.com/sbinet/advent-of-code/blob/master/day-12/main.go" rel="nofollow">https://github.com/sbinet/advent-of-code/blob/master/day-12/main.go</a></p> <p>Modifying it to hand users a stream of JSON objects, one per line, should be relatively straightforward...</p></pre>
schoenobates: <pre><p>Could this be the JSON-per-line format? <a href="http://jsonlines.org" rel="nofollow">http://jsonlines.org</a></p></pre>
xrstf: <pre><p>Yep, that was it!</p></pre>
Kipzy: <pre><p>I won&#39;t be the only one working with this; we are a team of four people, two in Go and two in Python.</p> <p>We get a primary CSV file which we need to work on. I&#39;ve been doing just fine with the CSVs, but someone on the team (the two guys using Python) said we could read the CSV and write the same thing to a new JSON file so they could work &#34;faster&#34;.</p></pre>
mc_hammerd: <pre><p>CSV for tables of data.</p> <p>JSON for key/value stores; it also supports arrays/objects as values.</p></pre>
cathalgarvey: <pre><p>CSV has interchange value: you can dump from and import to spreadsheets. 
If you don&#39;t need this, use JSON.</p> <p>If you&#39;re dumping lots of data, dump each object on its own line instead of dumping a slice of objects: you get all the benefits of CSV&#39;s line-wise parsing, with all the type hinting and structure of JSON.</p></pre>
Justinsaccount: <pre><p>Loading a JSON file may be faster development-wise, but it is definitely not faster from a file size and performance standpoint.</p> <p>This is not likely to matter unless you are working with multi-gigabyte data files. I think the last time I benchmarked it, extracting some columns from a CSV file was a bit faster than doing the same thing for the same data written using the &#34;JSON lines&#34; method. Over 20GB of log files, this adds up.</p> <p>If all you are doing is sending around a few megs of data, use whatever is easiest to work with.</p></pre>
RobLoach: <pre><p>JSON... More control.</p></pre>
Fwippy: <pre><p>It seems unlikely that reading CSV files is meaningfully slower than reading JSON in your application.</p> <p>Have you profiled your code and seen that a lot of time is spent in the CSV reader? If so, create a benchmark with CSV and JSON readers and compare. If JSON comes out a ways ahead, then you can consider switching to it for speed reasons.</p> <p>But if you haven&#39;t done that, this really smells like premature optimization.</p></pre>
Kipzy: <pre><p>I haven&#39;t done that, but I&#39;m doing fine with the CSV file.</p> <p>We get a primary CSV file which we need to work on, but someone on the team (two guys working in Python) said we could read the CSV and write the same thing to a new JSON file so they could work &#34;faster&#34;.</p></pre>
Fwippy: <pre><p>Oh, like faster in coding time, easier for them to work with? 
That&#39;s definitely a valid reason; I thought you meant the execution time of the program.</p></pre>
Kipzy: <pre><p>It is for the execution time of the program. Plus, we report with CSV files, so I really want to know whether doing that csv2json step is a waste of time.</p></pre>
earthboundkid: <pre><p>Gob.</p></pre>
Kipzy: <pre><p>It won&#39;t be Go-exclusive.</p></pre>
lethalman: <pre><p>There&#39;s no reason you should use CSV. Write one JSON object per line and you will save your life. It&#39;s not about being fast, it&#39;s about being robust.</p> <p>If you ever really need a CSV because some tool doesn&#39;t support JSON, convert the JSON to CSV at that point.</p> <p>Two main reasons, compared to CSV:</p> <p><strong>You can store one JSON object in one text line</strong></p> <p>That means reading and parsing JSON is a lot easier to parallelize (as long as your final algorithm is parallelizable, of course). With CSV, if strings are enclosed in quotes, a single record may span multiple lines, which makes it harder to parallelize.</p> <p><strong>JSON has data types: null, bool, number, strings</strong></p> <p>CSV has only strings and no null, unless you have some kind of protocol (like the first row describing the types). That means you lose information when exporting to CSV: when importing, you may need to interpret a &#34;true&#34; string as the boolean true, a &#34;number&#34; string as a number, and an empty &#34;&#34; string as null, which may or may not be correct unless you have a clear schema for your CSV.</p></pre>
Justinsaccount: <pre><blockquote> <p>JSON has data types: null, bool, number, strings</p> </blockquote> <p>You&#39;re missing the bigger ones: JSON natively supports nested arrays and hashes (objects) in a standard way.</p></pre>


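Fwippy's suggestion, benchmarking both readers before switching, can be done outside `go test` with `testing.Benchmark`. This sketch also includes a hypothetical `csvToJSONLines` function standing in for the team's csv2json step: it turns the header row into object keys and writes one object per line. Every value stays a string, because the conversion cannot recover types the CSV never recorded. `makeCSV` generates made-up data, so the printed numbers say nothing about your files; run it against representative input before drawing conclusions.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"encoding/json"
	"fmt"
	"io"
	"strconv"
	"testing"
)

// makeCSV builds synthetic input (a header plus n rows) standing in for real files.
func makeCSV(n int) []byte {
	var buf bytes.Buffer
	w := csv.NewWriter(&buf) // writes to a bytes.Buffer cannot fail
	w.Write([]string{"id", "name", "score"})
	for i := 0; i < n; i++ {
		w.Write([]string{strconv.Itoa(i), "user " + strconv.Itoa(i), strconv.Itoa(i % 100)})
	}
	w.Flush()
	return buf.Bytes()
}

// csvToJSONLines is a hypothetical csv2json step: header row as keys, one object per line.
func csvToJSONLines(in []byte) ([]byte, error) {
	r := csv.NewReader(bytes.NewReader(in))
	header, err := r.Read()
	if err != nil {
		return nil, err
	}
	var out bytes.Buffer
	enc := json.NewEncoder(&out) // Encode appends a newline after each value
	for {
		rec, err := r.Read()
		if err == io.EOF {
			return out.Bytes(), nil
		} else if err != nil {
			return nil, err
		}
		obj := make(map[string]string, len(header))
		for i, name := range header {
			obj[name] = rec[i]
		}
		if err := enc.Encode(obj); err != nil {
			return nil, err
		}
	}
}

// countCSV and countJSONLines do the same minimal work: decode every data row.
func countCSV(in []byte) (int, error) {
	r := csv.NewReader(bytes.NewReader(in))
	if _, err := r.Read(); err != nil { // skip the header row
		return 0, err
	}
	for n := 0; ; n++ {
		if _, err := r.Read(); err == io.EOF {
			return n, nil
		} else if err != nil {
			return 0, err
		}
	}
}

func countJSONLines(in []byte) (int, error) {
	dec := json.NewDecoder(bytes.NewReader(in))
	for n := 0; ; n++ {
		var row map[string]string
		if err := dec.Decode(&row); err == io.EOF {
			return n, nil
		} else if err != nil {
			return 0, err
		}
	}
}

func bench(name string, data []byte, count func([]byte) (int, error)) {
	res := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			if _, err := count(data); err != nil {
				panic(err)
			}
		}
	})
	fmt.Printf("%-10s %8d bytes %s\n", name, len(data), res)
}

func main() {
	testing.Init() // registers the testing flags; needed outside "go test"
	csvData := makeCSV(10000)
	jsonData, err := csvToJSONLines(csvData)
	if err != nil {
		panic(err)
	}
	bench("csv", csvData, countCSV)
	bench("jsonlines", jsonData, countJSONLines)
}
```

Comparing the byte counts it prints also shows the size overhead Justinsaccount mentioned, since JSON Lines repeats every key on every row.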