Can Go be a viable option for one-time-use scripts of the sort you might use Python for?

polaris · · 426 views
<p>I use Python basically every day for various tasks. In a lot of cases, it&#39;s for utility scripts that might process data files or modify a dataset so that it&#39;s easier to deal with, etc.</p> <p>For my master&#39;s thesis (in psycholinguistics), I&#39;ve written about 15,000 lines of Python so far across about 20 different scripts as part of an assembly line to get data from various sources, modify it in some manner, and pass it off to the next step in the assembly line.</p> <p>Python has been really, really great for this, because a lot of what I&#39;m doing is exploratory. I&#39;ll often have a vague idea of what I want my end result to be, but I basically have to figure out what it is and how to get there along the way.</p> <p>Unfortunately, for some things, Python is <em>soooooo</em> slow. As part of my project, I had many data points, each broken down into a tree structure. I also had an algorithm to quantify the similarity between any two data points. Then I had to compare each of ~20,000 data points with every other point. I would generously estimate that it took approximately forever for that computation to finish.</p> <p>I&#39;m a fairly experienced programmer; I actually completed most of a computer science degree before a systems programming course nearly killed me with boredom and I decided to focus on applying my CS skills to linguistics. This question is mostly about how the language itself can be used. Is it feasible to use Go for small but complex projects that don&#39;t have a clear plan set out at the beginning? I&#39;ve been reading the documentation and it seems like a nice language, but I&#39;m concerned about any initial overhead that might be required at the start of every new project.</p> <p>tl;dr: <strong>I like that with Python I can open up a blank file, start coding, and figure out my end goal in the process. 
Can you reasonably do that with Go?</strong></p> <hr/>**Comments:**<br/><br/>howeman: <pre><p>I do that with Go; I do scientific algorithmic development research. While it is a great solution for me, your mileage may vary.</p> <p>There are clear ways in which Python is better than Go as an ecosystem. There are tons of libraries that are already built and exist. Tools like ipython are great for interactively dealing with data. We&#39;ve been developing gonum and are closing the gap, but we&#39;re not anywhere close to what Python has today.</p> <p>I do find Go to be great in terms of &#34;expanding&#34; from originally throwaway scripts. A major advantage is how great Go is with packages and composability. Typically, it&#39;ll go something like this: </p> <ul> <li><p>I write a main function that&#39;s a big block of imperative code to test out an idea. </p></li> <li><p>I find a result that is interesting, and want to test variations on the theme, so I break the core functionality into functions and try with a couple of inputs.</p></li> <li><p>I find that the function is actually a specific instance of a more general idea. I state this general idea as an interface, and define &#39;type mything struct{}&#39; with the original function as a method, and then add &#39;type thingtwo struct{}&#39; with a different instance of that method.</p></li> <li><p>Then I realize I need to do more than just what the one main function is doing. For example, I want to make runs on a cluster, or I want a script that just plots, or whatever. I now rename the main package to &#39;projectname&#39;, and move the rest of the main function into &#39;$GOPATH/src/projectname/firstpass&#39;. The other main function(s) I need get different names (e.g. $GOPATH/src/projectname/clusterruns)</p></li> <li><p>Typically, at this point the project is getting somewhat complex, and I realize I want to test it out in a different regime. 
This new regime is generally somewhat more specific to a particular application, while the original exploratory code was more general. In linguistics, maybe you have a general idea about synonyms, but there are some particulars that apply to those used in England in 1800 that require special treatment. I don&#39;t want to corrupt the clean inner code, so now I make a new package &#39;$GOPATH/src/projectname/particular&#39;. The base code in &#39;projectname&#39; is already defined with interfaces, so &#39;particular&#39; can just implement new types that satisfy those interfaces. It allows for the next level of &#34;throwaway&#34; script without interrupting the original code at all. </p></li> <li><p>I may eventually have an &#39;analyze&#39; subpackage that helps generate plots (or whatever) for all the different flavors.</p></li> <li><p>As always happens, some deadline comes up and you have to do ugly things in order to meet it. Here, embedding is really useful. I make a &#39;$GOPATH/src/projectname/ahhhhh&#39; (or whatever) that uses embedding to wrap the original algorithms and hack them in whatever way is necessary. I&#39;m able to do this without corrupting the original code, so I don&#39;t have to come back to a complete mess. (The exception is that, to enable the hack, I often have to export more things than I would like.)</p></li> <li><p>At some point enough is stable that I try to make it a stable package so my future scripts can run off of it. </p></li> </ul> <p>I hope that was helpful in seeing how it works for me. Typically somewhere in that process I find something that gonum should have but doesn&#39;t, and I add the functionality there. That list shrinks with every project, both due to my own contributions and the contributions of others. </p></pre>PaluMacil: <pre><p>This is a very useful post on the organic growth of a Go application. 
As a C# developer with some Python experience, I feel like this gives me the confidence to jump into a first Go project with just a main, with this type of growth path in mind.</p></pre>aaaqqq: <pre><blockquote> <p>Unfortunately, for some things, Python is soooooo slow</p> </blockquote> <p>For these things, first check whether Go speeds things up sufficiently to warrant changing your &#39;go to&#39; language.</p> <p>Go doesn&#39;t lack in capability, but for the stuff that you&#39;re working on, Python has so many helpful libraries that I&#39;m not sure moving everything to Go would be a good option (this is just a guess).</p> <p>What you could do is shift the time-consuming parts to Go (after checking that Go does offer an improvement of at least an order of magnitude) while continuing to work with the language that you&#39;re comfortable and experienced in.</p> <blockquote> <p>I like that with Python I can open up a blank file, start coding, and figure out my end goal in the process. Can you reasonably do that with Go?</p> </blockquote> <p>Go compiles are very quick. So the experience of running <code>go run file.go</code> is quite similar to running <code>python file.py</code></p></pre>ionrock: <pre><p>I think this type of thinking is valid when you have existing code and can&#39;t migrate away easily, but it sounds like the needs of the author are amenable to rewriting it. While you can certainly profile the code, find the slow parts and see if you can speed up those aspects in some lower-level language, that complicates your workflow quite a bit. If you make a change you have to rebuild your package, which might mean redeploying it (even in a virtualenv) in order to pick up changes. Also, debugging things becomes much more complicated. </p> <p>I do think it is more difficult to use Go to replace a scripting language like Python in some ways, but as others have mentioned, you often have a result that is more than fast enough. 
It also has the benefit of producing a single executable, which can save a huge amount of time over deploying virtualenvs or installing via PyPI on servers. </p></pre>IWugYouWugHeSheMeWug: <pre><blockquote> <p>For these things, first check whether Go speeds up things sufficiently to warrant changing your &#39;go to&#39; language.</p> </blockquote> <p>For all of my slowest pieces, I think it would. A lot of what I&#39;m doing is basic data structure manipulation. But whereas Python is incredibly slow at concurrent tree traversal, that&#39;s one of the examples on Go&#39;s homepage.</p> <blockquote> <p>Go doesn&#39;t lack in capability but for the stuff that you&#39;re working on, Python has so many helpful libraries that I&#39;m not sure moving everything to Go would be good option (this is just a guess).</p> </blockquote> <p>The script that I&#39;ve been working on the past few days has about 15 packages being imported at the top, and pretty much all of them are low-level utilities like pathlib, csv, itertools, operator, decimal, etc. The high-level things have been requests and gzip because I was streaming all 300GB of Google&#39;s Chinese ngrams data and making the format not be shit in nice clean files totalling 2GB. But that could be a Python script feeding data to a Go script!</p> <p>The similarity between compiling Go and running a Python script is also great!</p> <p>I&#39;ll definitely have to look into this more closely!</p></pre>aaaqqq: <pre><blockquote> <p>The script that I&#39;ve been working on the past few days has about 15 packages being imported at the top, and pretty much all of them are low-level utilities like pathlib, csv, itertools, operator, decimal, etc. The high level things have been requests and gzip</p> </blockquote> <p>You should be able to do this stuff using the standard Go libraries. 
Btw, when I wrote &#39;helpful libraries&#39; I was referring to stuff like scikit-learn, numpy, matplotlib, etc.</p></pre>egonelbre: <pre><p>Yes, I don&#39;t have much to add to other comments, other than to use <a href="https://github.com/loov/watchrun/releases" rel="nofollow">watchrun</a> to make the exploring easier. Usually I do this by running <code>watchrun go build -i . ;; myproject input.csv</code> and then it will rebuild/rerun on every save. Or a more minimal approach is to do <code>watchrun go run main.go input.csv</code>, although I dislike creating lots of temporary executables in the TEMP folder on Windows.</p></pre>Fwippy: <pre><p>I&#39;ve got my editor set up with a keybinding to save, build &amp; test (neovim with vim-go), but it&#39;s the same basic idea.</p></pre>egonelbre: <pre><p>Whatever works :)</p> <p>I&#39;ve found it terribly annoying to set up an editor, move to another computer with a different editor and a different OS, and then need to figure out a new way of doing things. Or, once your command-line programs work, you find out that GUI programs started from the editor don&#39;t work as intended, because the &#34;binding&#34; exceeds some timeout etc...</p> <p><code>watchrun</code> was intended as the &#34;as long as you know how to run things from command-line, it just works&#34; approach. I&#39;ve used <code>watchrun</code> for Delphi, C, Java and Python, without much trouble. Alternatively, I split the terminal and run vet, test, build, and run at the same time.</p> <p>Of course, there are still cases where I set up keybindings myself.</p></pre>Justinsaccount: <pre><p>If you are naively comparing 20,000 data points to each other then that is 400,000,000 comparisons. The programming language you are using may help, but sometimes a better algorithm is the real solution.</p> <p>Depending on what you are doing, Python can easily be faster than Go. For example, the Python regexp library is written in C and is quite a bit faster than the one in Go. 
Similar story for Python strings vs. Go strings. If you benchmark strings.Split in Go vs. str.split in Python, you&#39;ll see about the same performance. Behind the scenes they are doing basically the same thing. Then there are things like pandas, which is full of highly optimized code written in languages other than Python. Python libraries are often bindings to C code while Go libraries are often native (but for now, slower) Go code. If pure Python code is too slow, often just running your project with pypy will make it fast enough.</p> <p>For me the nice thing about Go is the static typing, lower memory usage, and self-contained binaries. Mostly what I have been doing lately is porting some &#39;quick and dirty&#39; Python projects to Go, making them more robust in the process. I find it easier to prototype in Python, where I&#39;ll want to do something like</p> <pre><code># dedup list foo
foo = list(set(foo))
</code></pre> <p>which takes a bit more code in Go.</p></pre>IWugYouWugHeSheMeWug: <pre><p>Oh, it was actually 50,000, not 20,000; I confused my numbers.</p> <p>The data points themselves were fairly complex trees. I actually significantly optimized it by storing subtree comparisons so that if a 6-level tree only differed by one leaf, the whole thing wouldn&#39;t need to be recomputed. But the types of all my variables were pretty much static. A language like Go would allow me to cut out a lot of the extra baggage in the Python stack, plus take advantage of built-in concurrency.</p></pre>mwholt: <pre><blockquote> <p>I like that with Python I can open up a blank file, start coding, and figure out my end goal in the process. Can you reasonably do that with Go?</p> </blockquote> <p>Yes.</p> <p>I went through the same process in Go for machine learning and information retrieval programs at university. Start with what you know, and build out from there. 
I found that Go&#39;s strong type system and data structures (all 2 of them: slices and maps) forced me to clearly formulate my problem as I went along... no monkey business.</p> <p>In fact, Go&#39;s package system is also immensely helpful when you need guidance in how to formulate a project. It&#39;s a pretty loose system, but significantly it doesn&#39;t allow circular imports, and those errors always help me realize where I need a new, smaller package, or one larger package. Since packages correspond to &#34;concepts&#34; they are really a great way to </p> <blockquote> <p>I&#39;m concerned about any initial overhead that might be required at the start of every new project.</p> </blockquote> <p>Initial overhead at the start of a new project is:</p> <pre><code>package main

func main() {
}
</code></pre> <p>Maybe a couple <code>go get</code>s in there, then a <code>go build</code> or, in your case, <code>go run *.go</code> would do.</p> <p>As for speed, my Go code using nearly the same algorithms would sometimes run twice as fast as the code of my classmates who used Python... and mine wouldn&#39;t run out of memory as easily.</p></pre>windjammer13: <pre><p>For the specific case where you have to compare 20,000 values to every other value, how long is the compute time? What types of data structures are you using? I had a similar problem, and after I changed the data structure to a faster type and improved my algorithm, my comparison went from hours to seconds.</p></pre>UnknownTed: <pre><p>nim would be a better fit <strong>IMO</strong>; it can be run as a script and it is really fast</p></pre>Redundancy_: <pre><p>You could also consider Julia, which is supposed to have good Python integration.</p></pre>tiberiousr: <pre><p>I often write scratch code in a single file in Go to play with ideas before implementing them in a larger program. It&#39;s totally viable.</p></pre>lasizoillo: <pre><p>Go is a good option.</p> <p>Cython, numpy, numba, ... are other options. 
For example, <a href="https://spacy.io/" rel="nofollow">spaCy</a> uses Cython to make a speedy NLP toolkit usable from Python.</p> <p>Maybe you should try both options.</p></pre>awaitsV: <pre><p>I used to use Python for the same purpose, but then one day I was writing something (deciphering a string that used a separate Caesar cipher for each word) and the Python program took too long, while the same program in Go took much less time (the Python code might have been shitty). From then on I use Go for most stuff: small experiments that can be done in the REPL are still in Python, network/computation in Go. I even created a local Go package with a ton of helper stuff for commonly used functions.</p> <p>So, yes, it can reasonably be done in Go.</p></pre>IndianAlien: <pre><p>Anything that I need static types for, I use Go. Dynamic typing can be a pain. </p></pre>

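<p>For comparison with Justinsaccount&#39;s Python one-liner above, here is a minimal sketch of what &#34;a bit more code&#34; could look like in Go. The dedup helper is a hypothetical name, and it assumes a slice of strings. Unlike Python&#39;s <code>list(set(foo))</code>, this version keeps the first-seen order of the elements.</p>

```go
package main

import "fmt"

// dedup returns the distinct values of in, keeping first-seen order.
// It is a hypothetical helper, not part of the standard library.
func dedup(in []string) []string {
	seen := make(map[string]struct{}, len(in))
	out := make([]string, 0, len(in))
	for _, v := range in {
		if _, ok := seen[v]; ok {
			continue // already emitted this value
		}
		seen[v] = struct{}{}
		out = append(out, v)
	}
	return out
}

func main() {
	foo := []string{"b", "a", "b", "c", "a"}
	foo = dedup(foo)
	fmt.Println(foo) // prints [b a c]
}
```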