<p>Hi guys, I have a Ruby script that splits a big XML file (~20 GB) into several smaller XML files. The script runs in about 15 minutes, and I would like to try Go to improve this running time.</p>
<p>But I'm pretty new to this language, and it's hard going.</p>
<pre><code>require 'xml' # libxml-ruby, which provides XML::Reader (a streaming pull parser)

XML_FILE = './exported_0.xml'

# A special file class that writes its output into
# successive files with the same number of items in each
# (except for the last one, whose size may vary)
class SplitFile
  def initialize(template_name, items_per_file)
    @template_name = template_name
    @items_per_file = items_per_file
    @item_count = 0
    @file_index = 0
    open
  end

  # Writes one item per line and rolls over to a new file
  # every @items_per_file items
  def write(content)
    @fh.write(content)
    @fh.write("\n")
    @item_count += 1
    if @item_count == @items_per_file
      close
      open
    end
  end

  def open
    # exit if @file_index == 3 # debugging limit: uncomment to stop after 3 batches
    @file_index += 1
    @fh = File.open(batch_name, 'w')
  end

  def close
    @fh.close
    @item_count = 0
  end

  # e.g. "batch_%s.xml" % 1 => "batch_1.xml"
  def batch_name
    @template_name % @file_index
  end
end

# XML processor that splits the Product records into
# smaller XML chunks
class XMLSplitProcessor
  def initialize(xml_sourcefile, split_filename_template)
    @split_filename_template = split_filename_template
    @file_handle = SplitFile.new(split_filename_template, 1000)
    @reader = XML::Reader.file(xml_sourcefile)
  end

  def run
    while @reader.read
      process_node
    end
    @file_handle.close # close the last (possibly partial) batch
  end

  def process_node
    if @reader.name == 'Product' && @reader.node_type == XML::Reader::TYPE_ELEMENT
      # Copy the whole element, including its children, to the current batch
      @file_handle.write(@reader.read_outer_xml)
      # Skip past the subtree that was just copied
      @reader.next
    end
  end
end

# Main
XMLSplitProcessor.new(XML_FILE, "batch_%s.xml").run
</code></pre>
<p>Do you have any tips for doing the same thing in Go, especially the XML part?
Thanks</p>
<hr/>**Comments:**<br/><br/>patrickdappollonio: <pre><p>Parse it with <code>encoding/xml</code> and handle the resulting Go structures.</p></pre>AlpDelNeo: <pre><p>Do you mean something like this? <a href="https://www.goinggo.net/2013/06/reading-xml-documents-in-go.html" rel="nofollow">https://www.goinggo.net/2013/06/reading-xml-documents-in-go.html</a></p>
<p>Am I forced to define every node and child, or can I define a struct for only the nodes I need?</p>
<pre><code><Products>
  <Product>
    ...
    ...
  </Product>
  <Product>
    ...
    ...
  </Product>
</Products>
</code></pre>
<p>Could I define only Products and Product, even if Product has tons of children?</p>
<p>My final goal is to write each batch of 1000 Products to a separate file.</p></pre>slaveriq: <pre><p>You only need to define the fields in your struct that you need (and their parent elements).</p></pre>AlpDelNeo: <pre><p>I used the code from the link above, but I can only display the attributes of one product or of all products.</p>
<p>I don't understand how to display a complete product node...</p></pre>slaveriq: <pre><p>You probably want <code>[]Product</code>, not just <code>Product</code>.</p></pre>masterwujiang: <pre><p>What's your goal: better maintainability or performance? I don't think Go will perform much better if your Ruby code uses an XML library optimized with a C extension.</p></pre>AlpDelNeo: <pre><p>My goal is performance. I handle a 20 GB XML file today, but this file could reach 150 GB in the future.</p>
<p>I need to find a faster solution than the Ruby script.</p></pre>metamatic: <pre><p>I'm not sure which XML library you're using, as it's not Ruby's standard library XML handling (REXML).</p>
<p>Anyway, use Nokogiri and you'll probably <a href="http://www.rubyinside.com/ruby-xml-performance-benchmarks-1641.html" rel="nofollow">get a good speedup</a>. Nokogiri uses compiled C libraries for the heavy lifting, so it should be close in speed to Go, if not faster.</p>
<p>If you're using a wrapper over libxml, which it <a href="https://xml4r.github.io/libxml-ruby/rdoc/classes/LibXML/XML/Reader.html" rel="nofollow">looks like you might be</a>, then you're already using native C code for the bulk of the work, so you probably won't get a big speedup from Go.</p></pre>