go vs erlang for numerical computing

I'm trying to decide which language is better suited for numerical computing. By this I mean applying numerical methods for optimization, especially evolutionary algorithms, in a concurrent manner. These are some of my needs: math operations with vectors and matrices (such as dot product, sum by columns, vector addition, etc.), like a very basic Matlab.

Currently my optimization toolbox is written completely in Python, which is great because it has numpy, scipy, and matplotlib, and it is easier than Matlab in a few things. However, almost all of my algorithms can be executed concurrently and Python is not so great for this, so I'm looking for a better language for this task.

As far as I know, Erlang and Go are excellent choices, and I like both of them. However, Erlang is not the best option when vectors are involved (especially when you are working with matrices). On the other hand, Go could be a better option because it could be easier to implement some matrix operations. However, it has no REPL nor pattern matching like Erlang. So, which language would be better?

---

**Comments:**

**bunnyslopist:** The kind of concurrency found in Go and Erlang is not suited for numerical computation. For high-performance computing you need access to at least vectorization (SIMD) and fine-grained threading, neither of which you have direct access to in those languages. Of course, you have indirect access via bindings to C/C++/Fortran libraries, which is what Python/numpy and Matlab are doing.

**sbinet:** What do you mean by "fine-grained threading"? Go certainly has a very lightweight threading model (channels + goroutines) with access to low-level threading primitives (mutexes, locks).

No NUMA awareness, yet. (See: https://docs.google.com/document/d/1d3iI2QWURgDIsSR6G2275vMeQ_X7w-qxM2Vp7iGwwuM/pub)

**srbufi:** This might help you: https://github.com/gonum

**SportingSnow21:** The mature numerical library is gonum, as /u/srbufi suggests. Go doesn't have a cousin to the scipy stack.

I'm working on an n-dimensional library to do some numpy-style work, [numgo](https://github.com/Kunde21/numgo), which will be backed by tuned assembly code (SSE, AVX, and FMA). It's got a long way to go before it's production-ready, though. You're welcome to contribute if you have the desire and time.

**howeman:** As /u/srbufi says, you should take a look at gonum. Specifically, there is a pull request in the works for concurrent global optimization, which paves the way for evolutionary algorithms.

SIMD is useful, as /u/bunnyslopist says, though the things you mention (e.g. dot product) are implemented in gonum with SIMD.
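Both recommendations point at the same place: the basic Matlab-style operations the question asks for (dot product, vector addition, sum by columns) are already covered by gonum. A minimal sketch; note the import paths below are the present-day `gonum.org/v1` ones, which postdate this thread, and the data is arbitrary:

```go
package main

import (
	"fmt"

	"gonum.org/v1/gonum/floats"
	"gonum.org/v1/gonum/mat"
)

func main() {
	// Vector operations: dot product and element-wise addition.
	u := mat.NewVecDense(3, []float64{1, 2, 3})
	v := mat.NewVecDense(3, []float64{4, 5, 6})

	fmt.Println("dot(u, v):", mat.Dot(u, v)) // 32

	var w mat.VecDense
	w.AddVec(u, v) // w = u + v
	fmt.Println("u + v:", mat.Formatted(&w))

	// Matrix operations: element-wise sum of two 2x3 matrices.
	a := mat.NewDense(2, 3, []float64{1, 2, 3, 4, 5, 6})
	b := mat.NewDense(2, 3, []float64{6, 5, 4, 3, 2, 1})

	var c mat.Dense
	c.Add(a, b)
	fmt.Println("a + b:\n", mat.Formatted(&c))

	// Sum by columns, roughly numpy's a.sum(axis=0).
	colSums := make([]float64, 3)
	col := make([]float64, 2)
	for j := 0; j < 3; j++ {
		mat.Col(col, j, a)
		colSums[j] = floats.Sum(col)
	}
	fmt.Println("column sums of a:", colSums) // [5 7 9]
}
```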
**sbinet:** Go actually has many different REPLs, at various stages of usability:

- https://github.com/sbinet/igo
- https://github.com/motemen/gore
- https://github.com/go-interpreter/ssainterp

See this document for an overview/proposal for an interpreter: https://docs.google.com/document/d/1Hvxf6NMPaCUd-1iqm_968SuHN1Vf8dLZQyHjvPyVE0Q/edit?usp=sharing

Also, Go recently gained a Jupyter/IPython kernel: https://github.com/gophergala2016/gophernotes

(And, as others have said, there are gonum/matrix, gonum/lapack, gonum/optimize, and gonum/plot for the rest.)

**Toenex:** I know your question is about Go & Erlang, but have you considered [Julia](http://julialang.org)? From what you have written I would have thought it merited consideration.

**robertmeta:** As I said on the other thread, start a Python process per core, communicate over ZMQ, and you will easily max out your CPUs.

Go and Erlang are uniquely NOT suited for numeric computing; I would think you should be looking at R and Julia. Maybe Roger (http://www.senseye.io/announcing-roger-a-library-providing-simple-access-to-r-from-go/).

**CSI_Tech_Dept:** Erlang was not designed for heavy computation. Its strength is primarily in writing resilient programs. So almost any other language might be better.

Since you already have code written in Python, I would recommend exploring the multiprocessing module. It is not affected by the GIL, and it is great when algorithms don't work on shared memory regions (which is what you generally strive for with parallelism anyway). With multiprocessing you can even make it work across multiple computers. If you need the processes to communicate with each other, you might need to implement serialization.

**sbinet:** In my experience, multiprocessing is a nice hack and generally serviceable, but it's not very reliable, even when working on the same machine: workers get stuck, workers don't report back to the master when stuck, signals are ignored, and so on. It's great for a quick one-off parallelism job; not, again in my experience, for a reliable one.

**auraham:** I know there are a lot of libraries for concurrency/parallelism in Python. In your opinion, which ones would be more reliable than the others?

**jjolla888:** Try F# (fsharp.org).

**sybrandy:** Just to add to this a little, in case you're interested: the D programming language has built-in SIMD capabilities and a parallelism package in the standard library. Also, any C library that can do what you want should be usable from both Go and Erlang, so if you need to use either you can go down that route.

**auraham:** I started to read about dlang and I have to say it seems great.

**itsmontoya:** Erlang is fantastic, but I would never use it for number crunching. Although I'm nearly certain that Go would be the faster of the two, there might be better language choices for your use case.
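For the evolutionary-algorithm use case itself, the usual Go answer to "execute concurrently" is a fixed pool of goroutines evaluating population members in parallel, one worker per CPU. A minimal sketch, with a hypothetical `sphere` function standing in for the real objective:

```go
package main

import (
	"fmt"
	"math/rand"
	"runtime"
	"sync"
)

// sphere is a hypothetical stand-in objective: f(x) = sum of x_i^2.
func sphere(x []float64) float64 {
	s := 0.0
	for _, v := range x {
		s += v * v
	}
	return s
}

// evaluate scores every individual in the population using one worker per CPU.
func evaluate(pop [][]float64) []float64 {
	fitness := make([]float64, len(pop))
	jobs := make(chan int)

	var wg sync.WaitGroup
	for w := 0; w < runtime.NumCPU(); w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				// Each index is written by exactly one worker, so no locking is needed.
				fitness[i] = sphere(pop[i])
			}
		}()
	}

	for i := range pop {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return fitness
}

func main() {
	// Random population of 1000 individuals with 50 decision variables each.
	pop := make([][]float64, 1000)
	for i := range pop {
		pop[i] = make([]float64, 50)
		for j := range pop[i] {
			pop[i][j] = rand.Float64()
		}
	}
	fmt.Println("fitness of first individual:", evaluate(pop)[0])
}
```

This gives parallelism across cores without any SIMD; the heavy per-individual math would still lean on gonum or a C library, as the comments above note.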
**bboozzoo:** For starters, you're interested in parallelism, not concurrency. Second, Erlang is really bad at raw computation. There's HiPE, but it's really a huge joke. Unless you want to spend time profiling BEAM, stay away from Erlang for raw computation. Erlang does great in soft real-time and is really efficient in terms of the overhead of a single process and its scheduling. Believe it or not, I'm actually slowly working on an article comparing soft real-time Erlang vs. Go. Unfortunately Go fares really badly in soft real-time, the two biggest hurdles being GC (`GOGC=off` was a must; otherwise you tweak the GC memory step only to get killed by the enforced collection every 120 s) and scheduling. With Go's current execution model it is just impossible to achieve the same efficiency as Erlang, mainly because scheduling is enforced by BEAM (Erlang's VM) every couple of hundred instructions. For comparison, in Go the scheduling points are normally places where you'd wait for I/O, or, more recently, the additional point at the function stack check (eventually you need to fall back to `runtime.Gosched()`). If that is not enough, GC is per process in Erlang, so there's very little disruption from GC while you run. This is in stark contrast to Go with its stop-the-world model. Finally, the Erlang process vs. goroutine size comparison leans toward Erlang: in Go the goroutine stack is at least 2 kB (since 1.4?), while in Erlang it's a couple of hundred bytes (IIRC 128-256 B). To translate that into numbers, a Raspberry Pi could handle ~136,000 goroutines at once, while with Erlang we could get up to ~280,000 processes.

But wait, the question was about HPC. Summing up: Erlang sucks (executed by a VM, with HiPE being a joke). Go sucks (the compiler is not as good as gcc or clang, and I'm not sure it's aware of SIMD intrinsics yet; it certainly was not two years back). Your only hope in Go is interfacing with C libraries. Just stick to Python + NumPy. Try to exploit the available parallelism in NumPy by making sure you're always operating on matrices/vectors. If that fails, take a look at Numba or Cython; from experience, Numba is easier to use. If you want more, look into Julia, or C++ with OpenMP.

**auraham:**

> you're interested in parallelism, not concurrency

That's totally right.

> Believe it or not, I'm actually slowly working on an article comparing soft real-time Erlang vs. Go

I would really like to read it!

> Just stick to Python + NumPy. Try to exploit the available parallelism in NumPy by making sure you're always operating on matrices/vectors.

I will take this advice into account and keep my legacy code.
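The `runtime.Gosched()` fallback mentioned above is worth spelling out: on the Go runtimes of that era, a tight numeric loop with no function calls offered the scheduler no preemption point, so long-running compute goroutines had to yield explicitly. A minimal sketch with an arbitrary loop (since Go 1.14 the runtime preempts such loops asynchronously, so this is mostly of historical interest):

```go
package main

import (
	"fmt"
	"runtime"
)

// sumSquares is a CPU-bound loop with no I/O and no function calls in its body.
// On pre-1.14 runtimes such a loop could starve other goroutines on the same
// thread, so we yield explicitly every so often.
func sumSquares(n int) float64 {
	s := 0.0
	for i := 0; i < n; i++ {
		x := float64(i)
		s += x * x
		if i%1000000 == 0 {
			runtime.Gosched() // cooperative yield so other goroutines can be scheduled
		}
	}
	return s
}

func main() {
	fmt.Println(sumSquares(10000000))
}
```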
**bboozzoo:** There was a nice comparison of C, Python, and Julia for scientific computation: https://www.ibm.com/developerworks/community/blogs/jfp/entry/A_Comparison_Of_C_Julia_Python_Numba_Cython_Scipy_and_BLAS_on_LU_Factorization?lang=en

**SportingSnow21:**

> I'm not sure it's aware of SIMD intrinsics yet

Just an FYI: Go will vectorize up to the SSE2 level. Beyond that, the binaries become machine-specific. So it's a trade-off in the compiler between cross-compilation capability and machine-specific efficiency.

**bboozzoo:** Does it do auto-vectorization like GCC does? Besides, SSE2 is pretty modest, nowhere near what I would expect. There's always the option of using `gccgo`, which is almost the best of both worlds (though usually a release behind).

**SportingSnow21:** The auto-vectorization is similar to GCC's (obviously they're different algorithms, so it won't be identical). I agree that SSE2 is pretty modest, but that's as much as can be guaranteed on the amd64 architecture. The compiler is designed around building on a dev machine for deployment to a server, so I can understand why they didn't get too machine-specific with it. Once the generated code goes beyond the guaranteed instruction set, deployment becomes a roulette game.

The assembler lets you roll your own vectorization with higher-level instructions, if you're willing to hand-tune some assembly code. Concurrency/parallelism gains can be applied to more situations, so those are usually the better goal for server-side code. Higher-tier vectorization in the compiler is probably a couple of years off and will probably require a compiler flag.

**Fwippy:** 120 seconds for garbage collection?

**bboozzoo:** GC is enforced every 120 seconds, regardless of the GOGC step (unless GOGC is disabled by setting it to `off`, obviously). There are even some issues reported for it on GitHub: https://github.com/golang/go/issues/12478

Depending on the speed of the memory and the performance of the memory controller, the collection takes more or less time. For instance, for a ~380 MB process, on a Raspberry Pi it took ~18 ± 2 s (note the RPi has SDRAM at 250-500 MHz), while on a BeagleBone Black it took roughly 8 ± 1 s (DDR3 at 800 MHz). Both CPUs, the BCM2836 and the OMAP335x, are single core. On an A20-OLinuXino-LIME (dual-core Cortex-A7) the timings were pretty similar to the BBB, 7 ± 1 s. IMO the root of the problem is the amount of memory to scan in a GC run: Go looks at the whole heap. In comparison, the amount of memory scanned by GC in Erlang is much more confined, and most of the time only a process's private heap is collected (think of it as a separate, GCed heap per goroutine). Detailed writeup here: https://hamidreza-s.github.io/erlang%20garbage%20collection%20memory%20layout%20soft%20realtime/2015/08/24/erlang-garbage-collection-details-and-why-it-matters.html

The question is why we did these measurements. It's not that I wanted to show that Go is bad and Erlang is good. On the contrary, I want to find out in which applications Go fares well. Now I know that soft real-time is, at the moment, a no-go for Go. To put that into some context, we have a couple of soft real-time servers written for industrial and home automation applications, all of which are currently in Erlang. The idea was to check whether there are viable alternatives, as we all know Erlang is a niche solution and attracting new talent is a problem, but I guess that's a longer story for another post.
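None of this reproduces the Raspberry Pi numbers above, but for anyone wanting to check GC behaviour on their own hardware, the standard library exposes the relevant knobs directly. A minimal sketch (the allocation size is arbitrary; `debug.SetGCPercent(-1)` is the in-code equivalent of `GOGC=off`):

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
	"time"
)

func main() {
	// Equivalent of GOGC=off: disable automatic collection entirely.
	// The trade-off is that the heap only grows until you collect manually.
	old := debug.SetGCPercent(-1)
	fmt.Println("previous GOGC percent:", old)

	// Allocate something so there is a heap worth scanning (~256 MiB here).
	junk := make([][]byte, 0, 256)
	for i := 0; i < 256; i++ {
		junk = append(junk, make([]byte, 1<<20))
	}

	// Force one collection and time it (runtime.GC blocks until the cycle completes).
	start := time.Now()
	runtime.GC()
	fmt.Println("manual GC took:", time.Since(start))

	// Report the most recent stop-the-world pause from the runtime's stats.
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	last := m.PauseNs[(m.NumGC+255)%256] // circular buffer of recent pause times
	fmt.Printf("collections: %d, last pause: %v, heap in use: %d MiB\n",
		m.NumGC, time.Duration(last), m.HeapInuse>>20)
	_ = junk
}
```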
**Fwippy:** Ah, thanks for the writeup! I'd misinterpreted you and thought you meant you had some pathological case in which a single GC pause lasted two minutes. Definitely going to read your link.

**seufert:**

1. Do you actually have a performance problem? I did not read anything about one in your post. It seems you want to prematurely optimize a solution that already works well.
2. Erlang and Go don't have good abstractions for matrix computation.
3. Erlang and Go don't have good abstractions for parallelism, only for concurrency. You want parallelism.
4. Languages that are better suited: C/C++/Fortran, because of the strong tool and library support. Newer languages that might fit your needs better are Rust or Julia.

In general: don't switch away from Python "just because". The tooling and libraries in Python are excellent. As long as you don't have a big problem with the performance of your Python solution, don't switch away.
