A workflow system for Go (Golang), inspired by Flow-based Programming.
# SciPipe
SciPipe is an experimental library for writing [scientific workflows](https://en.wikipedia.org/wiki/Scientific_workflow_system) in vanilla [Go(lang)](http://golang.org/). The architecture of SciPipe is based on a [flow-based programming](https://en.wikipedia.org/wiki/Flow-based_programming)-like pattern in pure Go, as presented in [this](http://blog.gopheracademy.com/composable-pipelines-pattern) and [this](https://blog.gopheracademy.com/advent-2015/composable-pipelines-improvements/) Gopher Academy blog post.
UPDATE June 23, 2016:
See also [slides from a recent presentation of SciPipe for use in a Bioinformatics setting](http://www.slideshare.net/SamuelLampa/scipipe-a-lightweight-workflow-library-inspired-by-flowbased-programming).
### An example workflow
Before going into details, let's look at a toy-example workflow, to get a feel for what writing workflows with SciPipe looks like:
<pre>
package main

import (
	sp "github.com/scipipe/scipipe"
)

func main() {
	// Initialize processes
	foo := sp.NewFromShell("foowriter", "echo 'foo' > {o:foo}")
	f2b := sp.NewFromShell("foo2bar", "sed 's/foo/bar/g' {i:foo} > {o:bar}")
	snk := sp.NewSink() // Will just receive file targets, doing nothing

	// Add output file path formatters for the components created above
	foo.SetPathStatic("foo", "foo.txt")
	f2b.SetPathExtend("foo", "bar", ".bar")

	// Connect network
	f2b.In["foo"].Connect(foo.Out["foo"])
	snk.Connect(f2b.Out["bar"])

	// Add to a pipeline runner and run
	pl := sp.NewPipelineRunner()
	pl.AddProcesses(foo, f2b, snk)
	pl.Run()
}
</pre>
To see how this code runs, assume it is saved in a file named `myfirstworkflow.go`. Running it can then look like this:
<pre>
[samuel test]$ go run myfirstworkflow.go
AUDIT 2016/06/09 17:17:41 Task:foowriter Executing command: echo 'foo' > foo.txt.tmp
AUDIT 2016/06/09 17:17:41 Task:foo2bar Executing command: sed 's/foo/bar/g' foo.txt > foo.txt.bar.tmp
</pre>
As you can see, SciPipe displays all the shell commands it has executed based on the defined workflow.
### Benefits
Some benefits of SciPipe that are not always available in other scientific workflow systems:
* **Easy-to-grasp behaviour:** Data flowing through a network.
* **Parallel:** Apart from the inherent pipeline parallelism, SciPipe processes also spawn multiple parallel tasks when the same process has multiple inputs.
* **Concurrent:** Each process runs in its own lightweight thread (goroutine), and is not blocked by operations in other processes, except when waiting for inputs from upstream processes.
* **Inherently simple:** Uses Go's concurrency primitives (goroutines and channels) to create an "implicit" scheduler, which means very little additional infrastructure code. As a result, the code is easy to modify and extend.
* **Resource efficient:** You can choose to stream selected outputs via Unix FIFO files, to avoid temporary storage.
* **Flexible:** Processes that wrap command-line programs and scripts can be combined with processes coded directly in Go.
* **Custom file naming:** SciPipe gives you full control over how file names are produced, making it easy to understand and find your way among the output files of your computations.
* **Highly debuggable(!):** Since everything in SciPipe is plain Go(lang), you can easily use the [gdb debugger](http://golang.org/doc/gdb) (preferably with the [cgdb interface](https://www.youtube.com/watch?v=OKLR6rrsBmI) for easier use) to step through your program at any level of detail, as well as all the other excellent debugging tooling for Go (see e.g. [delve](https://github.com/derekparker/delve) and [godebug](https://github.com/mailgun/godebug)), or just use `println()` statements anywhere in your code. In addition, you can turn on very detailed debug output from SciPipe's own execution by enabling debug-level logging with `scipipe.InitLogDebug()` in your `main()` function.
* **Efficient:** Workflows are compiled into static binaries that run fast.
* **Portable:** Workflows can be distributed as Go code to be run with the `go run` command, or compiled into stand-alone binaries for basically any Unix-like operating system.
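
The "implicit scheduler" idea from the list above can be illustrated with a minimal sketch in plain Go. It does not use SciPipe itself; the function names `produce`, `transform` and `runPipeline` are hypothetical and chosen only for this illustration. Each stage runs in its own goroutine, the stages are connected by channels, and the second stage starts one parallel task per incoming item. No explicit scheduler code is needed, because blocking channel sends and receives decide when each stage can proceed.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// produce emits each name on a channel from its own goroutine and
// closes the channel when all names have been sent.
func produce(names ...string) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		for _, n := range names {
			out <- n
		}
	}()
	return out
}

// transform starts one goroutine (a "task") per incoming item, so that
// several inputs are processed in parallel. The output channel is closed
// once the input is exhausted and all tasks have finished.
func transform(in <-chan string, fn func(string) string) <-chan string {
	out := make(chan string)
	go func() {
		var wg sync.WaitGroup
		for item := range in {
			wg.Add(1)
			go func(s string) {
				defer wg.Done()
				out <- fn(s)
			}(item)
		}
		wg.Wait()
		close(out)
	}()
	return out
}

// runPipeline wires the two stages together and collects the results.
// The results are sorted because parallel tasks finish in no fixed order.
func runPipeline(names ...string) []string {
	var results []string
	for r := range transform(produce(names...), strings.ToUpper) {
		results = append(results, r)
	}
	sort.Strings(results)
	return results
}

func main() {
	fmt.Println(runPipeline("foo", "bar", "baz")) // prints [BAR BAZ FOO]
}
```

In this sketch a stage is blocked only while it waits for input from upstream or for a downstream reader, which mirrors the behaviour described in the "Concurrent" item.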
### Known limitations
* There is not yet any really comprehensive audit log generation. This is currently being worked on.
* There is not yet support for the [Common Workflow Language](http://common-workflow-language.github.io/), but this is something we plan to support in the future.
### Connection to flow-based programming
From flow-based programming, SciPipe borrows the ideas of a separate network (workflow dependency graph) definition, named in- and out-ports, sub-networks/sub-workflows, and bounded buffers (already available in Go's channels), to make writing workflows as easy as possible.
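
As a rough illustration of these ideas, the following sketch in plain Go models processes with named ports and connects them through buffered channels. It is not SciPipe's actual implementation: the `Process` type, the `Connect` and `newProcess` helpers, the port name `in` and the buffer size of 2 are all assumptions made for this example. Note how the network is wired up separately from the definition of what each process does.

```go
package main

import (
	"fmt"
	"strings"
)

// Process is a hypothetical component with named in- and out-ports.
type Process struct {
	Name string
	In   map[string]chan string
	Out  map[string]chan string
	Run  func(p *Process)
}

// newProcess creates a process with empty port maps.
func newProcess(name string, run func(p *Process)) *Process {
	return &Process{
		Name: name,
		In:   map[string]chan string{},
		Out:  map[string]chan string{},
		Run:  run,
	}
}

// Connect wires a named out-port to a named in-port through a buffered
// channel, which acts as the bounded buffer between the two processes.
func Connect(from *Process, outPort string, to *Process, inPort string, bufSize int) {
	ch := make(chan string, bufSize)
	from.Out[outPort] = ch
	to.In[inPort] = ch
}

// runNetwork defines the processes, wires the network, runs it, and
// returns everything that arrives at the sink.
func runNetwork() []string {
	// Process definitions: what each component does with its ports.
	writer := newProcess("foowriter", func(p *Process) {
		defer close(p.Out["foo"])
		for _, s := range []string{"foo", "foofoo"} {
			p.Out["foo"] <- s
		}
	})
	replacer := newProcess("foo2bar", func(p *Process) {
		defer close(p.Out["bar"])
		for s := range p.In["foo"] {
			p.Out["bar"] <- strings.ReplaceAll(s, "foo", "bar")
		}
	})
	sink := newProcess("sink", nil) // only receives, has no Run function

	// Network definition: kept separate from the process definitions.
	Connect(writer, "foo", replacer, "foo", 2)
	Connect(replacer, "bar", sink, "in", 2)

	// Each process runs in its own goroutine; the sink is drained here.
	go writer.Run(writer)
	go replacer.Run(replacer)
	var received []string
	for s := range sink.In["in"] {
		received = append(received, s)
	}
	return received
}

func main() {
	fmt.Println(runNetwork()) // prints [bar barbar]
}
```

Because the channels are buffered, an upstream process can run ahead of its consumer by at most `bufSize` items before it blocks, which is the bounded-buffer behaviour that Go provides out of the box.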
Beyond these flow-based programming ideas, SciPipe adds convenience factory methods such as `scipipe.NewFromShell()`, which creates ad hoc processes on the fly from a shell command pattern. Inputs, outputs and parameters are defined in-line in the shell command, using the syntax `{i:INPORT_NAME}` for in-ports, `{o:OUTPORT_NAME}` for out-ports and `{p:PARAM_NAME}` for parameters.
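
To make the placeholder syntax concrete, here is a small sketch of how such a command pattern could be expanded into a runnable shell command. This is not how SciPipe does it internally; `expandCommand` is a hypothetical helper written for this example, and the `{p:replacement}` parameter and the file names are made up. It uses only the Go standard library.

```go
package main

import (
	"fmt"
	"regexp"
)

// placeholder matches {i:NAME}, {o:NAME} and {p:NAME}.
var placeholder = regexp.MustCompile(`\{(i|o|p):([^{}:]+)\}`)

// expandCommand is a hypothetical helper that substitutes in-port paths,
// out-port paths and parameter values into a shell command pattern.
// It returns an error for the first placeholder that has no value.
func expandCommand(pattern string, in, out, params map[string]string) (string, error) {
	var firstErr error
	result := placeholder.ReplaceAllStringFunc(pattern, func(m string) string {
		parts := placeholder.FindStringSubmatch(m)
		kind, name := parts[1], parts[2]

		// Pick the lookup table that belongs to the placeholder kind.
		var table map[string]string
		switch kind {
		case "i":
			table = in
		case "o":
			table = out
		default:
			table = params
		}

		val, ok := table[name]
		if !ok {
			if firstErr == nil {
				firstErr = fmt.Errorf("no value for placeholder %s", m)
			}
			return m // leave unknown placeholders untouched
		}
		return val
	})
	return result, firstErr
}

func main() {
	cmd, err := expandCommand(
		"sed 's/foo/{p:replacement}/g' {i:foo} > {o:bar}",
		map[string]string{"foo": "foo.txt"},
		map[string]string{"bar": "foo.txt.bar"},
		map[string]string{"replacement": "bar"},
	)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(cmd) // prints sed 's/foo/bar/g' foo.txt > foo.txt.bar
}
```

The expanded command matches the shape of the commands in the audit output of the example workflow, apart from the `.tmp` suffix that SciPipe appends to output paths while a task is running.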