golang - compile static files in app?

blov · 308 views
This is a shared resource; the information in it may have since developed or changed.
<p>Is anyone compiling static files like jpg and css to prevent having to run an http.FileServer(). The benefits would be portability and lower disk I/O latency.</p> <p>If so, examples? This would be extremely useful while running single golang binaries in docker.</p> <p>Cheers! </p> <hr/>**Comments:**<br/><br/>sh41: <pre><p>I do. Primarily to have convenient binaries that are fully self-contained, easy to ship and run.</p> <p>I&#39;ve created and use <code>vfsgen</code> for this:</p> <p><a href="https://github.com/shurcooL/vfsgen">https://github.com/shurcooL/vfsgen</a></p> <p>It creates an implementation of <code>http.FileSystem</code> for you, which is compatible with <code>http.FileServer</code> and many other things. It&#39;s very general.</p> <p>Also see the <a href="https://github.com/shurcooL/vfsgen#alternatives">alternatives section in the README</a>; I&#39;ve tried to list all known similar projects.</p></pre>PsyWolf: <pre><p><a href="https://github.com/jteeuwen/go-bindata">https://github.com/jteeuwen/go-bindata</a> is simple and popular<br/> <a href="https://github.com/GeertJohan/go.rice">https://github.com/GeertJohan/go.rice</a> has fancier features<br/> There are several other options </p></pre>nicogogo: <pre><p>I would not use go-bindata; it&#39;s been abandoned for some time now.</p></pre>HarveyKandola: <pre><p>We do exactly that using go-bindata.</p> <p>You can poke around our code and build scripts to see what we do:</p> <p><a href="https://github.com/documize/community">https://github.com/documize/community</a></p></pre>UnknownTed: <pre><p><a href="https://github.com/UnnoTed/fileb0x" rel="nofollow">https://github.com/UnnoTed/fileb0x</a></p></pre>touristtam: <pre><p><a href="https://github.com/jteeuwen/go-bindata" rel="nofollow">go-bindata</a> seems to be the most popular lib out there.</p></pre>readonly123: <pre><p>The downside is memory pressure for assets which aren&#39;t necessarily used often.
And you totally bypass the ability of webservers (which are extremely good at this) to cache them, let alone using a CDN.</p> <p>There is basically zero advantage to doing this in a docker container, either, since you can ship the entire application as one container. That&#39;s the point. Or have microservices (or a single service, whatever) in go to handle dynamic elements. Linux&#39;s disk caching is very good about keeping frequently used files in buffer.</p> <p>Consider breaking your static assets out to another container entirely (or a volume mapped to an nginx container). Use kubernetes or docker-swarm to orchestrate.</p></pre>epiris: <pre><p>In general I think most of your points are fair, but you should also take into consideration the number of assets, their byte size, and how often they change. When those numbers are small, it may make sense. </p></pre>readonly123: <pre><p>It makes even less sense then, since hosting static assets with a webserver will be even more memory efficient, and your (mostly irrelevant) concerns about I/O go away.</p> <p>Plus, you then don&#39;t need to redeploy an entire container at all. Just update the assets on a volume mapped to a pod or to the webserver itself.</p> <p>This is a solved problem. http.FileServer() is not the right way; nginx is.</p></pre>epiris: <pre><p>It makes less sense to measure the advantages for an engineering problem than to take on additional infrastructure by default? How could requiring an Nginx server be better than serving a few static files that rarely change straight from the binary? More network flows and file-system boundaries: you are adding more failure points to monitor and maintain.</p> <p>Point is, measure, do what makes sense. Patterns that work against you surface early, and it&#39;s much easier to add new technology than it is to turn it off. </p></pre>readonly123: <pre><p>Your original question was opaque. But consider that you&#39;re already talking about requiring docker, which is extra infrastructure and a failure point.
One which requires a complete redeployment of your container if you want to tweak CSS. Which one is worse? Adding a second container for nginx (and architecting your application to scale horizontally from the start by turning the golang bits into a microservice), which lets you tweak disk buffer sizes, pipelining, max connections, standardized logging which can be trivially consumed by other tooling, etc. Or &#34;I want to move this div, let&#39;s rebuild the binary, rebuild the container, and redeploy&#34;?</p> <p>For standardized logging/tooling alone, I&#39;d pick caching nginx.</p> <p>Your whole question is basically &#34;should I pursue this incredibly premature optimization which doesn&#39;t follow any patterns/antipatterns and provides no introspection&#34;?</p></pre>epiris: <pre><p>I had no original question. I merely said your points were fair and added that for OP it may make sense if the work required to pivot would be minimal. Experiencing these things that pry at your curiosity is what allows you to form strong, educated opinions like you have here. Happy coding.</p></pre>readonly123: <pre><p>Sorry, I thought you were OP for some reason...</p></pre>tscs37: <pre><p>The (usually) tiny bit of added memory usage from a few static assets is far outweighed by the vastly faster access and lower latency.
You don&#39;t ever have to hit the disk, it&#39;s already loaded and ready to rock.</p> <p>Can&#39;t beat not having to hit the disk at all.</p></pre>lluad: <pre><p>If you&#39;re reading from your own binary then the page that the data is in is either in memory already (in your address space), or needs to be loaded from disk.</p> <p>If you&#39;re reading from the filesystem then the page that the data is in is either in memory already (in the filesystem cache), or needs to be loaded from disk.</p> <p>There are reasons to embed files in some cases, but they&#39;re not magically &#34;in RAM&#34; or &#34;faster to access&#34;.</p></pre>readonly123: <pre><p>To be fair, the OP didn&#39;t give any indication at all of how many assets he was talking about. </p> <p>But &#34;vastly faster access&#34; and &#34;lower latency&#34; is frankly stupid when you&#39;re talking about access over the network. Round-trip is already high enough, plus page render time. Unless you&#39;re trying to game the number of requests per second you can hit, the loss of metrics isn&#39;t worth it (because your application probably isn&#39;t logging successful/failed requests, and if it is, it&#39;s definitely not in a format that can be consumed with zero effort by graphite; plus then you&#39;re touching the disk anyway).</p> <p>Plus, you are then relying on go stdlib to do the right thing with a complete clusterfuck (web browsers). It wasn&#39;t long ago that there were 3 major issues with net/http&#39;s implementation of HTTP pipelining, a problem web servers don&#39;t have.</p></pre>tscs37: <pre><blockquote> <p>But &#34;vastly faster access&#34; and &#34;lower latency&#34; is frankly stupid when you&#39;re talking about access over the network.
Round-trip is already high enough, plus page render time.</p> </blockquote> <p>Network latency can be as low as 0.100ms for a local intranet site; a hard disk may achieve average latency of around 2ms when you&#39;re not hitting its caches.</p> <blockquote> <p>Round-trip is already high enough</p> </blockquote> <p>Depends on the network.</p> <blockquote> <p>plus page render time</p> </blockquote> <p>The page render time is of no concern here: if the server can finish the request sooner rather than waiting for the disk, we can close the network stream sooner and handle another connection.</p> <blockquote> <p>Plus, you are then relying on go stdlib to do the right thing with a complete clusterfuck</p> </blockquote> <p>You&#39;re relying on stdlib to pipe a bytestream over the network with minimal headers if you&#39;re using ServeContent.</p> <blockquote> <p>A problem web servers don&#39;t have</p> </blockquote> <p>Nginx or Apache will introduce an array of dependencies, memory usage and disk IO usage that you simply don&#39;t have by hitting only files in memory.</p> <p>Unless your website exceeds 20MB in assets, which it honestly shouldn&#39;t for any reason IMO, you will probably not see any reduction in memory pressure compared to baking the assets into the binary.</p> <blockquote> <p>It wasn&#39;t long ago that there were 3 major issues with net/http&#39;s implementation of HTTP pipelining.</p> </blockquote> <p>That is relevant how?</p></pre>readonly123: <pre><blockquote> <p>Network latency can be as low as 0.100ms for a local intranet site; a hard disk may achieve average latency of around 2ms when you&#39;re not hitting its caches.</p> </blockquote> <p>Again, caching in webservers is a solved problem. The first user may hit 2ms per file (though it&#39;s extremely unlikely). Subsequent users will be served straight from cache.</p> <blockquote> <p>Depends on the network.</p> </blockquote> <p>It does, but that&#39;s not relevant.
&#39;Network access is the slowest part of the stack&#39; has been accepted wisdom for over a decade.</p> <blockquote> <p>The page render time is of no concern here: if the server can finish the request sooner rather than waiting for the disk, we can close the network stream sooner and handle another connection.</p> </blockquote> <p>If we&#39;re talking about turnaround and visibility to users, it is of concern. Page render time is gonna be longer than 2ms. Is going from 2ms to 0.1ms worth giving up the ability to configure asset caching without modifying/redeploying a binary, standardized logs, etc., when that 1.9ms is less than a blip to the end user?</p> <blockquote> <p>You&#39;re relying on stdlib to pipe a bytestream over the network with minimal headers if you&#39;re using ServeContent.</p> </blockquote> <p>I meant stdlib handling pipelining properly with a mess of browsers. <a href="https://github.com/golang/go/issues/10876" rel="nofollow">For example</a>. &#39;Is your old, corporate-mandated browser buggy and slow? Let&#39;s show a message, and pipelining won&#39;t work&#39;. </p> <blockquote> <p>Nginx or Apache will introduce an array of dependencies, memory usage and disk IO usage that you simply don&#39;t have by hitting only files in memory. Unless your website exceeds 20MB in assets, which it honestly shouldn&#39;t for any reason IMO, you will probably not see any reduction in memory pressure compared to baking the assets into the binary.</p> </blockquote> <p>He&#39;s talking about packaging JPGs. Adding nginx as a separate docker container basically adds nothing to dependencies given that he&#39;s already talking about docker.
The memory usage is trivial, dependencies don&#39;t matter (docker), and the disk i/o is primarily logging, which you&#39;d want.</p> <p>It also makes it easier to scale it out later.</p> <blockquote> <p>That is relevant how?</p> </blockquote> <p>Because pipelining is substantially faster than 1 request per file.</p></pre>tscs37: <pre><blockquote> <p>Again, caching in webservers is a solved problem. The first user may hit 2ms per file (though it&#39;s extremely unlikely). Subsequent users will be served straight from cache.</p> </blockquote> <p>That is only true if the entirety of the assets fit in cache and you&#39;re the only user with a single application on the entire server, which for any real-world application does not hold true.</p> <p>Once the file is evicted from cache, you&#39;re back to disk latency.</p> <blockquote> <p>It does, but that&#39;s not relevant. &#39;Network access is the slowest part of the stack&#39; has been accepted wisdom for over a decade.</p> </blockquote> <p>And it&#39;s barely true anymore: on a modern gigabit-or-above network, the slowest part of the stack is more likely to be the disk.</p> <blockquote> <p>If we&#39;re talking about turnaround and visibility to users, it is of concern</p> </blockquote> <p>Page render time is irrelevant. It happens in the browser, not the server.</p> <p>The server doesn&#39;t care once the connection is closed or idle. It&#39;s not about shaving off time on the user side either.</p> <blockquote> <p>I meant stdlib handling pipelining properly with a mess of browsers. For example. &#39;Is your old, corporate-mandated browser buggy and slow? Let&#39;s show a message, and pipelining won&#39;t work&#39;.
</p> </blockquote> <p>Your example seems to only include a POST request, which will rarely be issued for a static-asset website; those are mostly GET requests.</p> <p>(edit) The issue you mention has been fixed, so it&#39;s irrelevant anyway.</p> <blockquote> <p>Adding nginx as a separate docker container basically adds nothing to dependencies given that he&#39;s already talking about docker. </p> </blockquote> <p>Adding nginx as a dependency adds nginx as a dependency, period.</p> <p>And nginx has its own set of dependencies.</p> <p>And it uses CPU time.</p> <blockquote> <p>Because pipelining is substantially faster than 1 request per file.</p> </blockquote> <p>That is not what I was asking; I was asking about its relevance.</p> <p>If pipelining is broken, that is not relevant to whether it&#39;s better to use in-memory bytestreams or nginx to serve static assets.</p> <hr/> <p>I simply don&#39;t understand why you would want to introduce a whole application stack, including usage of the entire filesystem stack, when you could use a simple in-memory byte stream that simply doesn&#39;t use any of that.</p> <p>Sending an in-memory bytestream is just inherently faster, even considering caches, than reading a file.
Reading a file involves making the appropriate syscalls and waiting for the kernel to decide whether the file is in cache or to read it from disk, getting the memory-mapped file, potentially allocating a buffer in memory and then piping it out to the network.</p> <p>Reading the in-memory bytestream means wrapping it in bytes.NewReader and io.Copy if you&#39;re going for the simplest solution, something that can be done without entering kernelspace at all and using no additional memory.</p> <p>It is simply objectively faster than reading from disk.</p> <p>The question you should actually consider is whether the binary gets too big for memory, at which point you&#39;ll be hitting swap, which will be slower than reading files.</p></pre>readonly123: <pre><blockquote> <p>That is only true if the entirety of the assets fit in cache and you&#39;re the only user with a single application on the entire server, which for any real-world application does not hold true.</p> </blockquote> <p>OP&#39;s original question is about &#39;some JPG and CSS files&#39;, so this is true.</p> <p>It&#39;s also true if you have a caching nginx server as part of your application deployment with kubernetes or docker-swarm, since it won&#39;t be shared.</p> <p>Disk latency is not a problem until your cache hits more than 50MB (for nginx, assuming the disk isn&#39;t otherwise busy enough that the kernel won&#39;t hold onto it, which it probably will). But hey, that&#39;s trivially configurable by anyone who can google it.</p> <blockquote> <p>And it&#39;s barely true anymore: on a modern gigabit-or-above network, the slowest part of the stack is more likely to be the disk.</p> </blockquote> <p>Gigabit doesn&#39;t affect latency. But your argument here seems to be predicated on intranet access, not geographic latency and making hops through various ISPs. We don&#39;t know that&#39;s true. At all.</p> <blockquote> <p>Page render time is irrelevant.
It happens in the browser, not the server.</p> </blockquote> <p>Page render time is a critical component of responsiveness metrics. It&#39;s never irrelevant.</p> <blockquote> <p>The server doesn&#39;t care once the connection is closed or idle. It&#39;s not about shaving off time on the user side either.</p> </blockquote> <p>You&#39;re assuming something OP didn&#39;t say.</p> <blockquote> <p>Your example seems to only include a POST request, which will rarely be issued for a static-asset website; those are mostly GET requests.</p> <p>(edit) The issue you mention has been fixed, so it&#39;s irrelevant anyway.</p> </blockquote> <p>It was the first thing I hit on a trivial Google search. But &#39;a purpose-built application which does nothing but handle edge cases doesn&#39;t matter if I can cross my fingers for stdlib&#39; is shortsighted.</p> <blockquote> <p>Adding nginx as a dependency adds nginx as a dependency, period. And nginx has its own set of dependencies. And it uses CPU time.</p> </blockquote> <p>None of which matter. At all. In a docker container, the base OS for his golang app also has its own set of dependencies. And the docker daemon takes CPU time. The horror!</p> <blockquote> <p>That is not what I was asking; I was asking about its relevance. If pipelining is broken, that is not relevant to whether it&#39;s better to use in-memory bytestreams or nginx to serve static assets.</p> </blockquote> <p>It is still relevant, because pipelining is <em>much</em> less likely to be broken on nginx.</p> <blockquote> <p>I simply don&#39;t understand why you would want to introduce a whole application stack, including usage of the entire filesystem stack, when you could use a simple in-memory byte stream that simply doesn&#39;t use any of that.</p> </blockquote> <p>I&#39;m not talking about that.
OP is, when talking about &#39;running single application binaries in Docker&#39;.</p> <p>It&#39;s also because I run a large-scale project for Red Hat, and stuff like &#39;horizontal scaling&#39;, &#39;configurability&#39;, &#39;ease of redeployment&#39;, and &#39;log introspection&#39; are actually relevant things, as they should be for you too.</p> <blockquote> <p>Sending an in-memory bytestream is just inherently faster, even considering caches, than reading a file. Reading a file involves making the appropriate syscalls and waiting for the kernel to decide whether the file is in cache or to read it from disk, getting the memory-mapped file, potentially allocating a buffer in memory and then piping it out to the network.</p> </blockquote> <p>A couple hundredths of a millisecond, but the tradeoffs are worthwhile.</p> <blockquote> <p>Reading the in-memory bytestream means wrapping it in bytes.NewReader and io.Copy if you&#39;re going for the simplest solution, something that can be done without entering kernelspace at all and using no additional memory.</p> </blockquote> <p>Assuming that variable instantiation takes no additional memory. And assuming you even need to worry about what the kernel is doing.</p> <p>You seem to think that trusting stdlib is inherently better than trusting nginx and the kernel (which have teams dedicated <em>just</em> to caching performance). I don&#39;t see why.</p> <blockquote> <p>It is simply objectively faster than reading from disk.</p> </blockquote> <p>You&#39;re throwing away the experience and wisdom of the largest sites on the internet based on spitballing arguments.</p> <p>Adding an additional layer is a tradeoff which is marginally slower. But, as in my original assertion, that tradeoff carries with it ease of management, deployment, compatibility, and scaling. For an almost immeasurably small difference in response time, you get (essentially) everything.
That&#39;s the point.</p> <blockquote> <p>The question you should actually consider is whether the binary gets too big for memory, at which point you&#39;ll be hitting swap, which will be slower than reading files.</p> </blockquote> <p>There&#39;s basically no point at which a golang binary should ever be large enough to be swapping, so that doesn&#39;t matter.</p></pre>RenThraysk: <pre><p>But since static CSS/JS should be compressed, you end up with three versions (plain, gzip, brotli) of the same file? </p></pre>readonly123: <pre><p>I&#39;m not even sure what you mean here, or why you wouldn&#39;t be running it through uglify first and letting nginx handle gzip if the client requests it.</p></pre>RenThraysk: <pre><p>So you run static assets through uglify, but then don&#39;t run them through gzip or brotli? And then use something like the ngx_brotli static module to save nginx from having to compress on demand. </p></pre>readonly123: <pre><p>Honestly, I run everything through webpack with plugins to handle compression, then let nginx serve it and figure it out (either gzipped files which are deflated on the fly if needed, or straight gzip).</p> <p>Since brotli isn&#39;t in the packaged version of nginx, I don&#39;t use it, but the product I work on is much more invested in horizontal scalability across kubernetes than strictly &#34;maximum throughput from a single nginx instance&#34;.</p></pre>earthboundkid: <pre><p>Plug for <a href="https://github.com/carlmjohnson/monterey-jack" rel="nofollow">https://github.com/carlmjohnson/monterey-jack</a> here. </p></pre>readonly123: <pre><p>Why not just use webpack or gulp? No reason to use a weird stack.</p></pre>earthboundkid: <pre><p>Because I want to use Go. :-)</p></pre>carsncode: <pre><p>Ew, NodeJS. Why introduce a dependency on that garbage to a perfectly good Go project?</p></pre>readonly123: <pre><p>Because it&#39;s the web. You&#39;re probably writing JS anyway.
And there&#39;s not a good reason to reinvent the wheel when actively developed, industry-standard tooling exists.</p></pre>earthboundkid: <pre><p>Depends on the scenario. I agree that serious web dev nowadays will require webpack, babel, React/whatever, Sass, etc., so you may as well just handle stuff there, but for a simple project you might not need all that stuff or want to figure out how to configure it.</p></pre>
