TL,DR: looking for repos/bloggers/projects that focus (usually in "interpreter" contexts I guess) on highest-perf approaches to runtime encoding (then execution) of general-purpose turing-complete computations — whether via stack-based VMs with own minimalist opcode/bytecode 'languages' or other approaches — ie. regardless of source language, just focusing mostly on what internal representation and 'compilation' schemes allow for fastest ultimate evaluation/execution. Specifically in a Go context (because failing that, I can still proceed to easily locate stuff in other-langs/lang-independent landscapes if need be).
I'm currently implementing from papers some early-90s FP VMs but it occurs to me that the FP aspect doesn't matter too much for the part where you want to (ponder how to most optimally) encode (to then perform fast-ish) 'computation in general' at runtime.
What I'm talking about is 'compiling' into an in-memory 'machine' state that can then (by very simple state transition rules) run fairly fast until termination (if any), eg. separating one-time transformations/validations/etc tasks from the actual computations, aka compilation (just to memory, not to an executable binary file).
Do any of you know of some 'advanced' projects in Go that (by their real-world nature, user base size, or project maturity) go beyond "naive eval traversal of some AST" / interpreterbook / the ubiquitous stuff that advertises itself as "demo to showcase basics focusing on readability/user-friendliness not performance"? (All useful stuff, just not what I'm looking for right here, right now.)
To better gauge the area where I'm looking to learn new tricks, let's say this is my "byte-code IL":
Speed of compilation-to-it (gm-compile.go) is not the most pressing issue, vs. execution starting in (and everything below) line #34 https://github.com/metaleap/go-corelang/blob/8a5ae65b1393140d41aa956aa1c2711bb558ac1c/impl-02-gmachine-mark6/gm-basics.go#L34
I tried a jump-table L100-106 based on anecdotes I read about big-switch-blocks, but that slowed down execution a fair bit (approx. doubling the time cost) — even now all the instructions in separate methods was a slowdown vs. an earlier all-instructions-inlined-in-the-one-big-switch-block in step()
.
In the end I'll probably go for a more minimalist IL and one more general-purpose / less graph-reduction-focused/inspired, and am sure I'll keep running into opportunities-galore to reduce compile-time generation of redundant instructions. But on a slightly higher level, I'm wondering if there have been efforts (ideally by doers tinkerers & hackers vs. PhD for comprehensibility reasons & metrics priorization) to evaluate whether a stacks+heap-based VM is really the most optimal design for "runtime-generated executable code generation", blogging-or-FOSSing enthusiasts probing various ideas from the design space and coming to conclusions or discovering tricks that would be relevant to me here. Heard of any? (I've only run into stuff that looks no less messy, perf-insensitive and "first throw" than my own stuff during my initial look-arounds.)
Last note, I'm sure lots of relevant stuff abounds in languages that wouldn't be very applicable/transferable to the Go backdrop here, especially in even lower-level languages — I reckon I can later on as the last resort utilize unsafe
for the last-mile perf squeeze (I'm thinking omitting bounds checks the various stacks and instruction-stream slices etc) — but for now I'm interested in slightly higher("just above pointer-arith or C/asm-level shenanigans")-level performance considerations / investigations / benches / ponderings — cheers guys, thanks in advance for sharing what you know of that I should, too!
I should add that my benchmark is simply factorial 15
essentially. This is about 1000-10000x slower (ever-varying) via the VM than compiled Go (ensuring of course that the say 15 is never constant to the compiler) — ie. the difference between 150-200ns and "anywhere between 200 and 2000 µs". I'd rather hope & aim for an at-worst 10x slow-down, rather than 1000-10000x of course am doing too many allocations still, I'm sure — the
instruction
shouldn't be such a big struct either — will also manually inline step
into eval
obviously — just looking as a side-track for other prior work out there playing in very similar sandboxes..
As a general rule I'll never want to emit actual physical-machine code, assembly or LLVM byte-code or anything-like-that, so really-not-married to the notion of "VMs / state-transition machines emulating stack-based machines" at all. But other there any other?
评论:
sbinet:
a few pointers:
- https://github.com/glycerine/zygomys
- https://github.com/mattn/anko (used by buffalo)
- https://github.com/go-interpreter/wagon (not used much (yet?))
Most promising, thx! Let's not forget your own that I just happened upon: https://github.com/sbinet/go-eval ;)
