Calling into Go from Python?

blov · 205 views
<p>I&#39;m wondering if anyone has much experience interoperating between Go and Python 3. I have a large Python 3 code base with a hot path that I&#39;d like to try porting to Go; however, the cost of IPC/marshaling seems to prohibit a multiprocess approach, so I&#39;m considering rewriting the hot path in Go and having Python hand off execution to the Go code. I&#39;m looking for any experiences from people who have tried this, including blog posts, etc. Thanks!</p> <hr/>**Comments:**<br/><br/>wybiral: <pre><p>You can build a shared library in Go [1] and then call into it with ctypes from Python [2]. But if marshaling really is the bottleneck stopping you from a multiprocess approach, my bet is that you may still have performance problems and overhead interacting between Python and Go.</p> <p>[1] <a href=""></a></p> <p>[2] <a href=""></a></p></pre>weberc2: <pre><p>Thanks for the links. I&#39;m somewhat concerned about the overhead of calling between Python and Go as well, but the problem is that there is a large Python data structure that needs to be operated on. If I can avoid marshaling that data structure and instead reference it from Go, then I&#39;ll save the marshaling cost, and if I can make only a few calls between Go and Python, then hopefully I can mitigate the call overhead as well.</p></pre>jerf: <pre><p>Alas, the data structures in Python and Go are so dissimilar that the code for making Go use Python data structures directly would make using the Go <code>reflect</code> library look simple by comparison. That&#39;s even before we consider what the Go GC thinks of interacting with Python in Python&#39;s space.</p> <p>Plus, a non-trivial amount of Python&#39;s slowness is precisely the inefficient way it stores everything. If you wrote Go that tried to operate directly on Python&#39;s data structures, I wouldn&#39;t care to guarantee it would run much more quickly! 
You basically have a marshaling problem no matter what; the cost of shipping it across a network may actually not be that much by comparison.</p> <p>(The whole line from the dynamic scripting languages about how, if they are too slow, you can always just accelerate them by dropping down into C (or Go) is not something I&#39;d call a <em>lie</em> necessarily, but it is definitely something that is sold in a deceptive way. It&#39;s really useful for things like NumPy, where you have a C component with an efficiently packed data structure that you need to lightly coordinate in Python. But when you have a very complicated data structure in Python and you&#39;re running a complicated algorithm over it, the &#34;drop to a lower language&#34; option is hardly any easier than reimplementing from scratch in that language. You can end up with situations where it&#39;s still faster to do it in Python, because simply marshaling into an efficient format and then unmarshaling back out of it <em>already</em> blows out your time budget, even before you actually do the computation. It is sold as a magic bullet for all speed problems, and therefore you don&#39;t need to evar worry about speedz, dude! In reality it only works for a restricted subset of issues that may arise.)</p></pre>wybiral: <pre><p>Just tossing out ideas, but could the structure live in a Go process and then be operated on via an API from Python? Or possibly live in something like Redis/SQL/Mongo where both processes would have access?</p> <p>I realize that there may be too much Python logic mixed into the data structure, but generally you&#39;d expect the data to live close to where the performance-critical code is executing; otherwise moving it around will be an issue.</p></pre>weberc2: <pre><p>Unfortunately the structure is convoluted and largely undocumented. I don&#39;t know what permutations it would have, so I can&#39;t very well rewrite it in Go (not in its entirety anyway). 
I was hoping that there might be a way to rewrite known chunks and wrap those chunks in Python interfaces where necessary, but I&#39;d have to take care not to let it point into Go-managed memory. I&#39;m open to suggestions.</p></pre>sbinet: <pre><p>Maybe gopy could help you? It can now generate Python modules from Go packages (generating a .so file plus the ctypes file to expose a somewhat Pythonic interface).</p> <p><a href="" rel="nofollow"></a></p></pre>Virtual-Go: <pre><p>You could also have your Python code call a Go binary that runs in a new process. The Go process could be given all the data it needs to run to completion without further communication, and the calling process will be notified when the binary has finished running.</p> <p>I&#39;m not sure about the performance overhead of this. It probably depends on how much information you need to pass back and forth. If it&#39;s suitable for the Go binary to perform its work as side effects, then you may be able to just wait for the binary to finish successfully and then continue via a callback in your Python code without rebinding the data.</p> <p>I do something similar, but reversed, on one of my machines for library access. I need to use a Ruby library but my server is written in Go. So I have a Ruby script that I can pass a few command-line arguments to, and I simply call the script from my Go process. If I need to pass some data back, I send it on standard out and read it in Go once the process finishes. For you the same approach in reverse may work out if you can minimize the necessary communication.</p></pre>tipsqueal: <pre><p>I think you should consider using something like <a href="" rel="nofollow">Cython</a> instead. If you&#39;re not familiar with it, it lets you write a hybrid of Python and C code; the file is then compiled to a C extension, which can help improve performance.</p></pre>weberc2: <pre><p>Thanks for the suggestion, but I&#39;m wary of investing in Cython. 
I expect whatever language we choose to gradually eat away the Python (our performance requirements are projected to grow more severe), and I would prefer to have a large Go code base to a large Cython one. It may still prove to be the best choice, but I want to evaluate other options first.</p></pre>tipsqueal: <pre><p>If that&#39;s the case, your best bet is to just rewrite your Python code in Go (or whatever other language will work for your use case). That said, there are several companies that write highly performant code in Python at scales that most applications will never reach. I would recommend watching <a href="" rel="nofollow">this video</a> from a developer at Instagram talking about Cython and how it worked for them. IIRC he goes over the good and the bad.</p></pre>hell_0n_wheel: <pre><blockquote> <p>the cost of IPC/marshaling seems to prohibit a multiprocess approach</p> </blockquote> <p>The Linux kernel manages to pass data between processes at gigabit rates. Exactly how much data are you passing?</p></pre>weberc2: <pre><p>The cost is (de)serialization, not moving bits. We attempted to parallelize with multiprocessing, but the time spent pickling neutralized our gains. Unless there is a faster serialization format, I&#39;m not sure what to do.</p></pre>dgryski: <pre><p>Pickle is a terrible serialization format. There are many others (all with different tradeoffs, of course):</p> <p>A sample: <a href="" rel="nofollow"></a></p></pre>weberc2: <pre><p>What are your criteria for &#34;terrible&#34;? The reason I mentioned it was because I would expect it to be at least <em>performant</em>, and if we can&#39;t get good marshaling performance with pickle between Python processes, I wouldn&#39;t expect to get significantly better performance from a different format. 
I wasn&#39;t proposing using it as a serialization format between Python and Go.</p> <p>My original assumption may still be wrong, but hopefully this clarifies my rationale.</p></pre>hell_0n_wheel: <pre><blockquote> <p>I would expect it to be at least performant</p> </blockquote> <p>For specific applications, perhaps, but obviously not yours.</p> <p>Sounds like you need to make a CPython extension to do your serialization on the Python end, then pass the context to Go.</p></pre>weberc2: <pre><p>Presumably the popular serialization libs are already in C? Especially pickle.</p></pre>Shammyhealz: <pre><p>Not necessarily. In Python 2, there are pickle and cPickle. In Python 3, importing pickle tries to load the C accelerator (<code>_pickle</code>, the successor to cPickle) and falls back to the pure-Python implementation if it is unavailable. cPickle claims to be &#34;up to 1000 times faster&#34;. It might be worth checking whether you&#39;re actually getting the C implementation.</p></pre>
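<p>For anyone who wants to run the check suggested in the last comment, here is a minimal sketch, not taken from the thread. It assumes CPython 3, where the C implementation lives in the <code>_pickle</code> module. The helper names <code>has_c_pickle</code> and <code>time_round_trip</code> and the sample dictionary are made up for illustration; substitute the real data structure to get a meaningful number.</p>

```python
import pickle
import time


def has_c_pickle() -> bool:
    """Report whether the C accelerator module `_pickle` backs `pickle`."""
    try:
        import _pickle
    except ImportError:
        # No accelerator available: pickle is running as pure Python.
        return False
    # When the accelerator loads, pickle re-exports its Pickler class.
    return pickle.Pickler is _pickle.Pickler


def time_round_trip(obj, repeats: int = 5) -> float:
    """Return the best wall-clock time (seconds) for one dumps + loads of obj."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        data = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
        pickle.loads(data)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    # Hypothetical stand-in for the large structure discussed in the thread.
    sample = {i: [str(i), float(i), (i, i + 1)] for i in range(100_000)}
    print("C accelerator in use:", has_c_pickle())
    print(f"best round trip: {time_round_trip(sample):.4f} s")
```

<p>If the accelerator is already in use and the round trip is still too slow, that would suggest the cost comes from the shape of the data rather than from the pickle implementation, which is the point several replies above are making.</p>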
