Golang graceful restart: best approach?

polaris · 2015-03-28 10:10:11 · 2259 views

This resource was shared on 2015-03-28 10:10:11; the information in it may have since evolved or changed.

Hello Reddit,

I stumbled upon 2 different approaches to the graceful restart problem: http://blog.scalingo.com/post/105609534953/graceful-server-restart-with-go and http://grisha.org/blog/2014/06/03/graceful-restart-in-golang/

According to Grisha's comment here: https://news.ycombinator.com/item?id=8773681 scalingo's method has flaws.

Are there any downsides to Grisha's approach?

I'm developing locally but the binary will be deployed on AWS's ELB Golang Docker instance. I need to be able to gracefully update the binary without having previous requests ignored.

Edit: Having EC2 route requests away from one instance and send them to another is the easiest approach and considerably safer than the others, but I'm trying to get the most out of vertical scalability (1 instance with zero-downtime restart) before going forward with horizontal scaling.

Comments:

davecheney:

There are two parts to this question.

1. How can I stop accepting new requests while finishing the existing requests in flight?

2. How can I restart my application without being temporarily unavailable?

My answer to both is the same: if you care about availability to that degree then you must use some sort of load balancer to proxy requests and balance them across multiple application servers.

With that infrastructure in place, the rest of the pieces become:

1. Use some sort of middleware to keep a counter of requests in flight.

2. Close the listening socket; this will cause your load balancer to stop sending requests to your process.

3. Wait for the counter to drop to zero or for some timeout to fire.

4. Call os.Exit(0) and let your process manager (upstart, daemontools, systemd -- you are using one of those, right?) restart you.

pcstyle:

Thanks! Yeah, I'm using the load balancer to handle SSL and supervisord to run several processes. One of those processes is github.com/codegangsta/gin, which reloads updated code, but I'm dropping the gin/code-reload approach for simplicity's sake and just updating the binary.

It doesn't answer all my questions though: Let's assume there's only 1 instance running and say I close the socket; the load balancer stops sending requests to this specific and only instance. Process is waiting to respond to all requests before exiting, process manager kicks in and restarts the process. But where does the binary update occur? There's a small window during which the server will be unable to handle new requests before exiting. I want the old binary to respond to requests made before the update, and the new binary to respond to all requests made after the update, ergo zero-downtime restarts.

davecheney:

"It doesn't answer all my questions though: Let's assume there's only 1 instance running and say I close the socket;"

This is a bad approach; I explained why above. You cannot achieve this without having multiple application servers.

"the load balancer stops sending requests to this specific and only instance. Process is waiting to respond to all requests before exiting, process manager kicks in and restarts the process."

The process manager should not get involved until your process exits, and your process should only exit once all the outstanding requests have been handled -- or you decide they have had long enough.

"But where does the binary update occur? There's a small window during which the server will be unable to handle new requests before exiting. I want the old binary to respond to requests made before the update, and the new binary to respond to all requests made after the update, ergo zero-downtime restarts."

You can either replace the binary on the server, or upload a new one and change the symlink that points to the current version of your binary. Either way, when your process manager goes to restart the program it runs the new version, not the old one.

How do you replace the binary? sftp, rsync, etc. This should happen before you start to drain traffic off the application server instance you want to upgrade.

pcstyle:

"the load balancer stops sending requests to this specific and only instance. Process is waiting to respond to all requests before exiting, process manager kicks in and restarts the process."

"The process manager should not get involved until your process exits, and your process should only exit once all the outstanding requests have been handled -- or you decide they have had long enough."

Sorry about that; I meant the process is killed and the process manager reloads it.

Thanks for answering! You seem to be against forking a new process, passing previous requests made to it and exiting. Why? Why is that a bad approach?

Scalingo explains it as: "The server stops accepting new connections, but the socket is kept opened. The new version of the process is started. The socket is ‘given’ to the new process which will start accepting new connections. Once the old process has finished serving its client, the process has to stop."

davecheney:


"Thanks for answering! You seem to be against forking a new process, passing previous requests made to it and exiting. Why? Why is that a bad approach?"

My suggested approach is simple, it's used everywhere (not just in Go applications) and, most importantly, it does not let you cheat by trying to convince yourself that one application server is sufficient.

daaku:

We (Parse) wanted similar things and wrote some libraries for this:

@davecheney is right in that ideally you want load balancer managed deploys, and the server should only need to be able to gracefully shutdown without dropping requests. If you can get here, https://github.com/facebookgo/httpdown will help you build this.

If you really want full graceful restart, and are okay with being limited to posix-y environments, https://github.com/facebookgo/grace will give you full graceful restarts (it builds on top of httpdown).

R2A2:

Here's a solution which interacts nicely with Negroni. https://github.com/tylerb/graceful

As soon as it receives a SIGINT or a SIGTERM, it stops listening, and you can start another process in its place while it tidies itself up.

I hope your load balancer would re-route requests automatically, but test it. Some incoming requests might get 503'd during the switchover. To minimize the gap, you could have your new process initialize before you stop the old process, and then await some signal (SIGCONT?) before listening. Fire that SIGCONT immediately after the SIGINT to the old process, and the gap should become tiny.

guneycan:

You can use Elastic Beanstalk (AWS) for deploying and scaling Go applications.

With EB, you can clone an environment with their CLI tool ("eb clone"), deploy your application to the new environment and then swap URLs, so you don't experience any downtime.

hamax:

This is a very interesting question for any language but the answer really depends on what you're comfortable with and what your infrastructure looks like.

Here are a couple of approaches:

1. Start new EC2 instances, deploy the new app to them, add them to the ELB (via the API) and remove/stop the old ones. This will work for any language but depends on your being able to set up new instances automatically.

2. Install nginx/haproxy/... on each instance. Start new instances of the app on all hosts on new ports and tell your reverse proxy to switch to them. Haproxy has a nice API, but for nginx I'd recommend updating the config and reloading it. This also has a couple of drawbacks: if your nginx serves on port 80, your deploy script will need sudo access, and you need to write a fairly complicated deploy script to manage the reverse proxy (similar to what you'll need for ELB).

3. Use Facebook's grace or any similar package to do the app restart. This looks fairly simple in theory but it also has a couple of problems. First, it's architecture specific, which means you'll have to find another way to do it if you decide to also deploy something not written in Go. But the bigger drawback is that it assumes you can simply switch the application's binary, send an arbitrary signal (in the case of grace it's USR2) to the process, and that your process manager won't lose track of the process if it changes its process ID. So in your case, where you run the application in Docker, you won't be able to switch the Docker container; instead you'll have to switch the binary inside of it, which is kind of awkward. If you (or anyone else reading this) run your app under supervisor you're also out of luck, since supervisor can't handle the change of PID and only the newest version supports arbitrary signals.

So in conclusion I'd recommend option 1, even though it's AWS specific and requires a fairly complicated deploy script.

Now for some advertisement: I started working on a project which replaces the process control system and also includes the reverse proxy. It starts a new instance of your app, switches the traffic to it when the optional health check passes and shuts down the old app when the number of open connections is 0. It's very simple to use and solves all the major problems with approaches 1 and 2. It's also completely untested and lacks documentation, and I only work on it when I have some downtime from my day job, so I won't recommend using it yet. But if you want to try it, give me some feedback or even contribute code, I would be very grateful. Here is the link: https://github.com/hamaxx/gracevisor

So yeah, sorry for not giving you a silver bullet (yet) but I hope this helps.

PS: I'm giving two talks on this subject this week, so I'll attach the slides to this post when I have them.

pcstyle:

Thanks hamax! The nginx reverse proxy is already set up to serve static files and to reverse-proxy to port 3000, on which the Go app runs. I didn't want to be limited to Golang, so I kept my options open, to better maintain the app and make it easier to split it into components/micro-services.
Supervisord runs nginx and the Go app, so using nginx isn't too far-fetched given my setup.
What I'm really aiming for is to just deploy the binary, which takes the least time. Although hacking into Docker to replace the binary is a backward approach, it might also be the fastest.
My biggest concern with option 1 is that the Dockerfile needs about 20 minutes (on Elastic Beanstalk) to get all the dependencies, including the Go dependencies, and run the app.

EpistemicFaithCrisis:

SO_REUSEPORT
