<p>What is a golang library that allows you to locally express a series of steps? Each step should be self-contained, with inputs and outputs that are consumed/produced via the workflow instance context. It should ideally support baked-in retries and multiple instances running at the same time. </p>
<p>At this point I don’t need it to run distributed across machines and also don’t need it to persist its state. </p>
<p>I don’t want to hardcode the logic, as it’s likely to change, and I’d also want the flexibility to rewire the logic at will. </p>
<p>Bonus points if it supports some sort of DSL that allows you to express the logic in a yaml/json config file.</p>
<p>Looked around, but what I’ve found is either super heavy (Cadence from Uber) or super early and poorly tested (GoFlow).</p>
<p>Any pointers to a somewhat better library that does all/some of the above?</p>
<hr/>**Comments:**<br/><br/>jerf: <pre><p>One of the reasons you might not be able to find much out there for Go is that, if you're looking at this purely from a programming point of view, it's really easy to end up with an <a href="https://en.wikipedia.org/wiki/Inner-platform_effect" rel="nofollow">inner-platform-effect</a>-laden code base. It is not entirely incorrect to say that Go is already a language that allows you to do this, in plain native Go itself. I'd want to see a serious, concrete example of something you can't do in Go before I get too excited about something that adds another layer of abstraction but still needs Go to be written.</p>
<p>I say this as someone who has written a library for my own use that does this, as a sort of research project. My excuses were a desire to resource-bound the pipeline by controlling how many instances of each node were running and to get debugging visibility into it, and I'm <em>still</em> seriously considering just ripping it out, because it adds all this paperwork around each "node" vs. simply just <em>doing</em> the work in Go, and I'm not sure whether this is a net gain or not yet. And this is true even as I'm aware of the inner-platform effect and am generally working to avoid it.</p>
<p>I'd say if you are going to go this route, the touchstone I'd look for is to look for an "if" statement in your outer layer of logic code. The moment you're writing "if the type of the message is this, send it here, otherwise send it there" in your JSON/YAML file, you know you've gone too far. That should be in Go code itself. Or if you write "if the message is successful, do this, otherwise do that", you've taken a general purpose programming language with general purpose capabilities and reduced it down to whatever tiny little thing someone half-heartedly implemented into a template file... and it's probably <em>more verbose</em> than just writing it in Go in the first place! You're not winning if you replace ten lines of Go with pages and pages of YAML or JSON specification. I speak as one who has at times made this error.</p>
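That argument can be made concrete with a tiny sketch: the routing decision below is an ordinary Go `switch`, a couple of testable lines, rather than a rule buried in a YAML/JSON template (all names here are illustrative):

```go
package main

import "fmt"

// classify is a "step" that inspects a message; it is just a Go
// function, so it can be unit-tested like any other function.
func classify(msg string) string {
	if len(msg) > 5 {
		return "long"
	}
	return "short"
}

func main() {
	for _, msg := range []string{"hi", "hello there"} {
		// The routing logic stays in Go itself, where the
		// compiler and the test suite can see it, instead of
		// being re-expressed in a config-file mini-language.
		switch classify(msg) {
		case "long":
			fmt.Println("routing to long-message handler:", msg)
		default:
			fmt.Println("routing to short-message handler:", msg)
		}
	}
}
```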
<p>Also, just as a general comment: if you really want to grok just how hard code reuse is, this sort of workflow component system will really rub your nose in it, if you pay attention to what it is telling you. I'm not sure I've ever seen a component get seriously <em>reused</em> in such a system. The level at which I have seen things reused (little things like map functions and some of the monad niceties) is one Go is basically incapable of, because it doesn't do that sort of micro-scale code reuse the way something like Haskell can.</p>
<p>Also, this is just about programming. If you're using workflows that involve humans, that's a whole different domain.</p></pre>samuellampa: <pre><p>I definitely second the point about those DSLs re-inventing programming languages over and over. Perhaps it makes sense in certain cases, but one has to consider that to make developing in a new DSL productive, one needs new syntax-highlighting schemes, autocompletion support, code linters and verifiers, and what have you, all of which are already available for existing languages. This is one reason why we went with a pure-Go API as the default in SciPipe. The Go tooling is just too good not to use. The YAML plans we've had have been to generate scipipe Go code, so that you could continue from there if you had any more complex use cases, like starting to include "ifs" here and there, as you refer to :)</p>
<p>I also second how hard it is to make reusable components in Go. The lack of generics really shows its consequences here. What we've found helps is to use a lot of struct embedding to create poor-man's-generic base components that work with string-to-struct maps to keep track of things, and then embed those in outer structs when certain accessor methods need to be made more explicit (which naturally helps with things like autocompletion, testing, and documentation). We have documented this to some extent here: <a href="http://scipipe.org/howtos/reusable_components/" rel="nofollow">http://scipipe.org/howtos/reusable_components/</a>, with an example here: <a href="https://github.com/scipipe/scipipe/blob/master/examples/wrapper_procs/wrap.go" rel="nofollow">https://github.com/scipipe/scipipe/blob/master/examples/wrapper_procs/wrap.go</a>, but we also figured we could use the same strategy for the inner components in scipipe. So now the "BaseProcess" is such a poor-man's-generic base component (<a href="https://github.com/scipipe/scipipe/blob/master/baseprocess.go" rel="nofollow">https://github.com/scipipe/scipipe/blob/master/baseprocess.go</a>), which is embedded into the "Process" component, which adds a few more specific, concrete methods (<a href="https://github.com/scipipe/scipipe/blob/master/process.go" rel="nofollow">https://github.com/scipipe/scipipe/blob/master/process.go</a>). This lets us put generic, reusable code in BaseProcess as much as possible, and reuse it in other specialized process types pretty easily.</p>
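The embedding pattern described here can be sketched roughly as follows. This is an illustrative miniature, not scipipe's actual implementation (its real BaseProcess is linked above):

```go
package main

import "fmt"

// BaseProcess is a "poor man's generic" base component: it tracks
// its ports in a string-keyed map, so it needs no type parameters.
type BaseProcess struct {
	name  string
	ports map[string]chan string
}

// InitPort lazily creates the port map and registers a named port.
func (p *BaseProcess) InitPort(name string) {
	if p.ports == nil {
		p.ports = map[string]chan string{}
	}
	p.ports[name] = make(chan string, 1)
}

// Port looks a port up by name; generic, but stringly-typed.
func (p *BaseProcess) Port(name string) chan string { return p.ports[name] }

// Process embeds BaseProcess and adds explicit, well-named accessors
// on top of the map-based plumbing, which is what gives back
// autocompletion, easier testing, and self-documenting call sites.
type Process struct {
	BaseProcess
}

func (p *Process) Out() chan string { return p.Port("out") }

func main() {
	p := &Process{}
	p.InitPort("out")
	p.Out() <- "hello"
	fmt.Println(<-p.Out()) // prints "hello"
}
```

The generic bookkeeping lives once in BaseProcess; each concrete component only adds the thin, explicit surface it wants to expose.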
<p>This seems to work great, and I'm only now scratching my head about how to do the same with the IP ("data packet") components in <a href="https://github.com/scipipe/scipipe/blob/master/ip.go" rel="nofollow">https://github.com/scipipe/scipipe/blob/master/ip.go</a>. That has turned out to be slightly harder, because channels seem somewhat extra non-generic, tending to pollute everything that touches them into being locked to the channel's specific type. I have some ideas (collected in <a href="https://github.com/scipipe/scipipe/issues/45" rel="nofollow">https://github.com/scipipe/scipipe/issues/45</a>), but any feedback and ideas around this are highly appreciated.</p></pre>jerf: <pre><blockquote>
<p>That has turned out to be slightly harder because channels seem somewhat extra non-generic, tending to pollute everything that touches them into being locked to the channel's specific type.</p>
</blockquote>
<p>In the thing I wrote, I gave up and used reflect to wire up the graph with the specific types, but I also noticed how hard it is to have one thing emitting, say, <code>[]byte</code> and something else that wants to receive an <code>io.Reader</code>. You'd love to be able to match those up with the obvious <code>bytes.Buffer</code> wrapper, but the only way I've come up with to do it is to spin up an entire freaking goroutine for the conversion, which at scale is quite expensive, or lose the syntactic select.</p>
<p>I stopped mucking around before I got to this point, but I did find myself wondering what the performance penalty of <a href="https://golang.org/pkg/reflect/#Select" rel="nofollow">reflect.Select</a> would be. If you could use something like that without too much penalty, then I can imagine putting the conversion into library code that would use a smidge of reflect behind the scenes. It's possible you could get the general case down to a decent API (<em>most</em> of the time you're going to send and receive defined things along the defined connections, so the common case might not look too ugly), while still leaving people the ability to add their own select clauses in a pinch. It's grotty-but-straightforward code on the other side. It isn't compile-time type safe, but it is "will either fail the first time or succeed always", which is still better than nothing. (It's really easy to catch those in unit tests, which is not <em>quite</em> the same thing as a compiler, but anyone serious really ought to have a pre-commit hook that runs all their unit tests, which makes it almost as good.)</p></pre>samuellampa: <pre><p>Thanks a lot for sharing!</p></pre>chuhnk: <pre><p>Actually looking to build this out from scratch for <a href="https://github.com/micro/go-micro" rel="nofollow">https://github.com/micro/go-micro</a>. Feel free to join the Slack and discuss: <a href="http://slack.micro.mu/" rel="nofollow">http://slack.micro.mu/</a>.</p></pre>bjwschaap: <pre><p>Did you check out Conductor (<a href="https://netflix.github.io/conductor/" rel="nofollow">https://netflix.github.io/conductor/</a>)? It's not a Golang lib, but it might be interesting to use stand-alone.</p></pre>TextileWasp: <pre><p>Conductor seems to be a bit on the heavy side for what I am looking for. 
Will play with it and see if I can make it work.</p></pre>samuellampa: <pre><p>You might be interested to have a look at <a href="http://scipipe.org" rel="nofollow">http://scipipe.org</a> </p>
<p>It supports separate, named ports, concurrently running processes, and can easily be rewired at will by a high level connection syntax.</p>
<p>It has taken some inspiration from GoFlow, but does not depend on reflection; instead it uses channels directly between ports underneath, and it also has a somewhat terser syntax.</p>
<p>(A YAML-based syntax has been sketched out ... we think we have a good idea of how to make it so that it can support 90% of workflow constructs, but we haven't gotten to implementing it yet.)</p>
<p>It also helps workflow design by allowing connecting one out-port to many in-ports (thus doing automatic duplication of data packets ... which is fine, since they are mostly immutable in scipipe anyway).</p>
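The one-out-port-to-many-in-ports duplication can be sketched as a simple fan-out over channels. This is an illustrative miniature of the idea, not scipipe's actual wiring code, and it is safe precisely because the packets are treated as immutable:

```go
package main

import "fmt"

// fanOut copies every value from one "out-port" channel onto each
// connected "in-port" channel, then closes the downstream ports
// once the source is exhausted.
func fanOut(in <-chan string, outs ...chan<- string) {
	for v := range in {
		for _, out := range outs {
			out <- v // each in-port receives its own copy
		}
	}
	for _, out := range outs {
		close(out)
	}
}

func main() {
	src := make(chan string, 1)
	a := make(chan string, 1)
	b := make(chan string, 1)
	src <- "packet-1"
	close(src)
	fanOut(src, a, b)
	fmt.Println(<-a, <-b) // prints "packet-1 packet-1"
}
```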
<p>It is right now pretty focused on components (processes) running shell commands, but custom components can be written in Go as well, by plugging an anonymous func into the proc.CustomExecute field, or by writing completely custom Go components (then you'll have to do a bit more work on reading from and sending to in- and out-ports in the correct way). This is all documented on the website.</p>
<p>Another caveat is that scipipe is right now heavily based on using the local file system as the persistence layer and coordination point. This is planned to be made more flexible in the future.</p>
<p>1.0 is not (yet) reached, but we use it daily to run workflows locally and on HPC clusters (using the proc.Prepend field to prepend the HPC resource manager command to shell commands), and plan to reach 1.0 before the summer.</p>
<p>Would love to hear your feedback on it, and what use cases you have.</p>
<p>A few more heavy alternatives are:</p>
<ul>
<li>Pachyderm (<a href="http://pachyderm.io" rel="nofollow">http://pachyderm.io</a>)</li>
<li>ReFlow (<a href="https://github.com/grailbio/reflow" rel="nofollow">https://github.com/grailbio/reflow</a>) </li>
<li>AWE (<a href="https://github.com/MG-RAST/AWE" rel="nofollow">https://github.com/MG-RAST/AWE</a>)</li>
</ul>
<p>We made a comparison of the four tools mentioned here at:
<a href="http://gopherdata.io/post/more_go_based_workflow_tools_in_bioinformatics/" rel="nofollow">http://gopherdata.io/post/more_go_based_workflow_tools_in_bioinformatics/</a></p></pre>TextileWasp: <pre><p>Thanks
Actually found scipipe while researching this before asking the question.<br/>
I still need to play with it, but what was a bit discouraging when I first saw it was the file-oriented nature of the processing (it makes sense for the types of problems it was built to handle).
Will take a closer look again.</p></pre>samuellampa: <pre><p>Indeed. I'm actually now regretting a bit making the file-based nature so inherent to the design so early. For a few months now I've been looking at ways to refactor it out to be more generic. It takes a lot of thinking, though, because of Go's characteristics. I recently managed to build other core components of the system (processes and tasks) on a really generic core, using a nice pattern that has crystallized, so I have high hopes of solving it in the near future.</p></pre>fridder: <pre><p>Would your Flowbase project be more applicable? <a href="https://github.com/flowbase/flowbase" rel="nofollow">https://github.com/flowbase/flowbase</a></p></pre>samuellampa: <pre><p>It would indeed (thanks for pointing that out). The main caveat now is that it is more of a pattern than a full library (though the pattern works great indeed), and also that the core of scipipe has in fact improved to become superior in many aspects to flowbase. I'm planning to break out these improvements into an improved version of flowbase, which scipipe can then hopefully depend on, but I haven't gotten to it yet (so little time and so many distractions while pursuing a PhD).</p></pre>fridder: <pre><p>I'd be open to helping with that, as I am diving headlong into Go to potentially replace an ETL-ish system</p></pre>samuellampa: <pre><p>That'd be awesome! Maybe a GitHub issue on the flowbase repo could be a great place to start the thinking. Otherwise, it is the "BaseProcess" and possibly "BaseTask" structs that I've specifically thought to break out.</p>
<p>Now, there's a question of how much of the "automagic" included in scipipe to port to flowbase (such as using port structs instead of using plain channels as fields on the process structs). Perhaps experimenting a little and trying things out on real use cases will be the only way to get a good answer to that...</p></pre>jeffail: <pre><p>Hard to say without knowing specifically what your steps consist of but <a href="https://github.com/Jeffail/benthos" rel="nofollow">https://github.com/Jeffail/benthos</a> is a service for multiplexing different sources and sinks and performing common message streaming tasks. It exposes metrics and logging and performs retries internally.</p>
<p>It's configured in either json or yaml and lets you write a list of processing steps for each input/output such as content based multiplexing, filtering messages (as text or json), mutating messages (as text or json), batching, splitting, (de)compressing, (un)archiving, etc.</p>
<p>We use it at work as a swiss army knife for building platforms.</p></pre>jbendotnet: <pre><p>I have no experience of these beyond reading source/posts here...</p>
<p><a href="https://github.com/contribsys/faktory" rel="nofollow">https://github.com/contribsys/faktory</a>
<a href="https://github.com/fireworq/fireworq" rel="nofollow">https://github.com/fireworq/fireworq</a></p></pre>TextileWasp: <pre><p>Much appreciated</p></pre>fridder: <pre><p>You could, kinda, re-work Mage to do this: <a href="https://magefile.org" rel="nofollow">https://magefile.org</a>
I have been thinking about this as well, as I have some ETL-ish tasks that I need to run to populate a heavily normalized db. </p></pre>ui7_uy8: <pre><p>You didn't give an example of the product you are looking for that exists in other languages. That would be easier than letting people guess which exact features you are looking for in a workflow library.</p></pre>letle: <pre><p>Looks like you want the actor model</p></pre>samuellampa: <pre><p>Thinking now ... could either of glow or gleam be something? (They support distributed mode, but seem simple enough to use that they might be worth it, even without using the distributed features):</p>
<p><a href="https://github.com/chrislusf/glow" rel="nofollow">https://github.com/chrislusf/glow</a></p>
<p><a href="https://github.com/chrislusf/gleam" rel="nofollow">https://github.com/chrislusf/gleam</a></p>
<p>Haven't tried them, but they seem popular. I know them from Chris' GopherAcademy posts:</p>
<p><a href="https://blog.gopheracademy.com/advent-2015/glow-map-reduce-for-golang/" rel="nofollow">https://blog.gopheracademy.com/advent-2015/glow-map-reduce-for-golang/</a></p>
<p><a href="https://blog.gopheracademy.com/advent-2016/gleam-distributed-map-reduce-for-golang/" rel="nofollow">https://blog.gopheracademy.com/advent-2016/gleam-distributed-map-reduce-for-golang/</a></p></pre>