Convert a Ruby script into a Go script

xuanbao · 36 views
Hi guys, I have a Ruby script that splits a big XML file (~20 GB) into several smaller XML files. The script runs in 15 minutes, and I would like to try Go to improve this running time.

But I'm pretty new to this language, and I'm struggling.

```ruby
require 'xml'

XML_FILE = './exported_0.xml'

# A special file class that writes its output to successive files,
# each with the same number of items
# (except for the last one, whose size may vary)
class SplitFile
  def initialize(template_name, items_per_file)
    @template_name = template_name
    @items_per_file = items_per_file
    @item_count = 0
    @file_index = 0
    @fh = open()
  end

  def write(content)
    @fh.write(content)
    @fh.write("\n")
    @item_count += 1
    if @item_count == @items_per_file
      self.close()
      self.open()
    end
  end

  def open
    # Exits the script once three batch files have been written
    exit if @file_index == 3
    @file_index += 1
    @fh = File.open(self.batch_name(), 'w')
  end

  def close
    @fh.close
    @item_count = 0
  end

  def batch_name
    @template_name % @file_index
  end
end

# XML processor that splits the Product records into
# smaller XML chunks
class XMLSplitProcessor
  def initialize(xml_sourcefile, split_filename_template)
    @split_filename_template = split_filename_template
    @file_handle = SplitFile.new(split_filename_template, 1000)
    @reader = XML::Reader.file(xml_sourcefile)
  end

  def run
    while @reader.read do
      self.process_node()
    end
    # Close the last, possibly partial, batch file
    @file_handle.close
  end

  def process_node
    # Must be @reader: a bare `reader` is undefined and raises NameError
    if @reader.name == 'Product' && @reader.node_type == XML::Reader::TYPE_ELEMENT
      @file_handle.write(@reader.read_outer_xml)
      @reader.next
    end
  end
end

# Main
XMLSplitProcessor.new(XML_FILE, "batch_%s.xml").run
```

Do you have any tips for doing the same thing in Go, especially for the XML part? Thanks.

---

**Comments:**

patrickdappollonio: Parse it with `encoding/xml` and handle the resulting Go structures.

AlpDelNeo: Do you mean something like this? [https://www.goinggo.net/2013/06/reading-xml-documents-in-go.html](https://www.goinggo.net/2013/06/reading-xml-documents-in-go.html)

Am I forced to define all the nodes and their children, or can I just define a struct for the node I need?

```xml
<Products>
  <Product>
    ...
  </Product>
  <Product>
    ...
  </Product>
</Products>
```

Could I define only Products and Product even if I have tons of children under Product?

My final goal is to write the Products into separate files, 1000 per file.

slaveriq: You only need to define the fields in your struct that you actually need (plus their parent elements).

AlpDelNeo: I used the code from the link above, but I can only display the attributes of one product, or of all products.

I don't understand how to display a complete Product node...

slaveriq: You probably want `[]Product`, not just `Product`.

masterwujiang: What's your purpose: better maintainability or performance? I don't think Go will perform much better if your Ruby code uses an XML library optimized with a C extension.

AlpDelNeo: My purpose is performance. I handle a 20 GB XML file today, but this file could be 150 GB in the future.

I need to find a faster solution than the Ruby script.

metamatic: I'm not sure which XML library you're using, as it's not Ruby's standard library XML handling (REXML).

Anyway, use Nokogiri and you'll probably [get a good speedup](http://www.rubyinside.com/ruby-xml-performance-benchmarks-1641.html). Nokogiri uses compiled C libraries for the heavy lifting, so it should be close in speed to Go, if not faster.

If you're using a wrapper over libxml, which it [looks like you might be](https://xml4r.github.io/libxml-ruby/rdoc/classes/LibXML/XML/Reader.html), then you're already using native C code for the bulk of the work, so you probably won't get a big speedup from Go.