write smart proxy step by step 2 (single-node forwarding)

董泽润 · 2017-02-10 14:43:08




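Preface

Building on the theory in part one, this article implements the most minimal single-node forwarding version on top of the Redis client protocol. It does not include connection pooling, network timeouts, command validation, clustering, performance statistics, or service registration.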

Archer

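This version of the proxy is named Archer. Veteran War3 players will know that tier-three archers are very strong. All later development builds on this version; if you are interested, the code can be downloaded here: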

https://github.com/dongzerun/archer

Data structures

For Simple Strings the first byte of the reply is "+"

For Errors the first byte of the reply is "-"

For Integers the first byte of the reply is ":"

For Bulk Strings the first byte of the reply is "$"

For Arrays the first byte of the reply is "*"

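The Redis protocol is fairly simple. Given the five types above, we define one common interface:

type Resp interface {
    Encode() []byte // produce binary data that conforms to the client protocol
    String() string // return a string form for debugging
    Type() string   // type tag: SimpleResp, ErrorResp, IntResp, BulkResp, ArrayResp
}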

All five types, SimpleResp, ErrorResp, IntResp, BulkResp, and ArrayResp, are built by composing BaseResp.

type BaseResp struct {
    Rtype string   // Resp type
    Args  [][]byte // for a command, Args[0] is usually the command name and Args[1] the key
}
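To make the composition concrete, here is a minimal sketch of how an ArrayResp might embed BaseResp and encode a command. The embedding layout, the NewCommand helper, and the method bodies are assumptions for illustration and are not taken from the Archer source.

```go
package main

import (
	"fmt"
	"strconv"
)

// Resp mirrors the interface shown in the article.
type Resp interface {
	Encode() []byte
	String() string
	Type() string
}

// BaseResp mirrors the shared struct shown in the article.
type BaseResp struct {
	Rtype string
	Args  [][]byte
}

// ArrayResp embeds BaseResp. This layout is an assumption, not Archer's code.
type ArrayResp struct {
	BaseResp
}

// NewCommand is a hypothetical helper that builds a command such as GET foo.
func NewCommand(args ...string) *ArrayResp {
	r := &ArrayResp{BaseResp: BaseResp{Rtype: "ArrayResp"}}
	for _, a := range args {
		r.Args = append(r.Args, []byte(a))
	}
	return r
}

// Encode writes "*<n>\r\n" followed by one bulk string per argument.
func (r *ArrayResp) Encode() []byte {
	buf := []byte("*" + strconv.Itoa(len(r.Args)) + "\r\n")
	for _, a := range r.Args {
		head := "$" + strconv.Itoa(len(a)) + "\r\n"
		buf = append(buf, head...)
		buf = append(buf, a...)
		buf = append(buf, "\r\n"...)
	}
	return buf
}

// String returns a quoted view of the arguments for debugging.
func (r *ArrayResp) String() string { return fmt.Sprintf("%q", r.Args) }

// Type returns the type tag stored in BaseResp.
func (r *ArrayResp) Type() string { return r.Rtype }

func main() {
	var r Resp = NewCommand("GET", "foo")
	fmt.Printf("%s %q\n", r.Type(), r.Encode())
}
```

The other four types could follow the same pattern, each with its own first byte in Encode.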

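Sending and receiving protocol messages

For a service, stable and efficient sending and receiving of protocol messages is especially important. Fortunately Redis is simple enough: read data from the socket, determine the type from the first byte, then read or write fixed-length data. Two points need attention:

1. Redis also accepts commands sent directly as PING\r\n or QUIT\r\n, so besides +, -, $, :, and *, the first byte may also be p or q.

2. Binary safety. Because a payload may itself contain \r\n, you cannot receive it with ReadBytes('\n'); read the length first, then read exactly that many bytes of fixed-length data.

The function (parser.go: ReadProtocol) is fairly simple: under 100 lines, counting comments and blank lines.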

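Pipeline design

What is a pipeline? It is asynchronous sending and receiving. The client sends a batch of commands and then reads the replies in a batch, or uses two threads, one sending commands and one reading replies. Latency-sensitive programs can consider using a pipeline.

With a standalone Redis this is uncontroversial. In cluster mode, however, different backend nodes handle different commands, yet the order of pipelined commands and replies must not be scrambled, which places high demands on the design. The simplest approach I can think of is to add a sequence ID and reorder the replies in the proxy before returning them.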
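A sequence-ID reorder buffer along these lines might look like the sketch below. The reply and Reorderer types are hypothetical and are not part of Archer; strings stand in for real responses.

```go
package main

import "fmt"

// reply pairs a backend response with the sequence ID of its request.
type reply struct {
	seq  uint64
	data string
}

// Reorderer releases replies strictly in sequence order, starting at 0.
type Reorderer struct {
	next    uint64
	pending map[uint64]string
}

// NewReorderer returns an empty buffer that expects sequence ID 0 first.
func NewReorderer() *Reorderer {
	return &Reorderer{pending: make(map[uint64]string)}
}

// Push stores one reply and returns every reply that can now be delivered in order.
func (r *Reorderer) Push(rep reply) []string {
	r.pending[rep.seq] = rep.data
	var out []string
	for {
		data, ok := r.pending[r.next]
		if !ok {
			return out
		}
		delete(r.pending, r.next)
		out = append(out, data)
		r.next++
	}
}

func main() {
	r := NewReorderer()
	// Replies arrive out of order, as they might from different backend nodes.
	for _, rep := range []reply{{1, "B"}, {2, "C"}, {0, "A"}} {
		fmt.Println(rep.seq, "->", r.Push(rep))
	}
}
```

Nothing is released until the reply for the oldest outstanding request arrives, so one slow node delays everything queued behind it; that is the price of this simple approach.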

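type Session struct { // session.go, unrelated struct members omitted
    resps chan Resp       // Response Buffer Channel
    cmds  chan *ArrayResp // Command Buffer Channel
}

Each client connection is assigned a Session, which starts three goroutines: WriteLoop, Dispatch, and ReadLoop, responsible for writing responses, dispatching commands, and reading commands respectively. This is a pipeline implemented at the proxy layer. The buffer size defaults to 4096 and can be set at startup; because it is per session, it should not be set too large.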

Benchmark

The command is the same for every run: 100 concurrent clients and 1,000,000 requests per run.

redis-benchmark -h localhost -n 1000000 -c 100 -r 20 -q -p 6379

Standalone Redis, benchmarked directly

PING_INLINE: 80153.90 requests per second

PING_BULK: 81327.27 requests per second

SET: 42105.26 requests per second

GET: 42147.86 requests per second

INCR: 73163.59 requests per second

LPUSH: 81752.77 requests per second

LPOP: 82196.28 requests per second

SADD: 80925.79 requests per second

SPOP: 81866.55 requests per second

LPUSH (needed to benchmark LRANGE): 44499.82 requests per second

LRANGE_100 (first 100 elements): 27935.30 requests per second

LRANGE_300 (first 300 elements): 17784.42 requests per second

LRANGE_500 (first 450 elements): 11870.28 requests per second

LRANGE_600 (first 600 elements): 10386.80 requests per second

MSET (10 keys): 33320.01 requests per second

Proxy tcp_nodelay=true

PING_INLINE: 83542.19 requests per second

PING_BULK: 82973.78 requests per second

SET: 37010.99 requests per second

GET: 41614.65 requests per second

INCR: 64412.24 requests per second

LPUSH: 55081.24 requests per second

LPOP: 67272.12 requests per second

SADD: 56821.41 requests per second

SPOP: 40950.04 requests per second

LPUSH (needed to benchmark LRANGE): 40146.13 requests per second

LRANGE_100 (first 100 elements): 23648.49 requests per second

LRANGE_300 (first 300 elements): 9001.06 requests per second

LRANGE_500 (first 450 elements): 6696.13 requests per second

LRANGE_600 (first 600 elements): 5235.33 requests per second

MSET (10 keys): 31629.55 requests per second

Proxy tcp_nodelay=false

PING_INLINE: 83187.76 requests per second

PING_BULK: 80749.35 requests per second

SET: 40062.50 requests per second

GET: 49862.88 requests per second

INCR: 73099.41 requests per second

LPUSH: 71942.45 requests per second

LPOP: 69309.67 requests per second

SADD: 60150.38 requests per second

SPOP: 39624.36 requests per second

LPUSH (needed to benchmark LRANGE): 42016.81 requests per second

LRANGE_100 (first 100 elements): 26281.21 requests per second

LRANGE_300 (first 300 elements): 9603.38 requests per second

LRANGE_500 (first 450 elements): 6552.95 requests per second

LRANGE_600 (first 600 elements): 4945.60 requests per second

MSET (10 keys): 29850.75 requests per second

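I am fairly satisfied with this performance. Go network connections default to tcp_nodelay = true. After turning it off, throughput improved for small packets; for large packets the difference was not noticeable, and was sometimes slightly lower.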

Pprof

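With pprof enabled, the cost ultimately lands in the read and write system calls on net.Conn. People in the chat group suggested merging requests, the principle being to reduce the number of system calls, but that may not suit the current scenario. It is an option for offline workloads or workloads that are not sensitive to latency.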
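As a rough illustration of the merging idea, the sketch below counts Write calls with and without a bufio.Writer in front of the connection. The countingWriter type stands in for net.Conn and writeAll is a hypothetical helper; neither comes from Archer.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
)

// countingWriter stands in for net.Conn and counts Write calls.
// On a real socket each Write call is roughly one system call.
type countingWriter struct{ calls int }

func (w *countingWriter) Write(p []byte) (int, error) {
	w.calls++
	return len(p), nil
}

// writeAll sends every reply. With batch set, replies are merged in a
// 4096-byte buffer and flushed at the end instead of written one by one.
func writeAll(dst io.Writer, replies [][]byte, batch bool) error {
	if !batch {
		for _, r := range replies {
			if _, err := dst.Write(r); err != nil {
				return err
			}
		}
		return nil
	}
	bw := bufio.NewWriterSize(dst, 4096)
	for _, r := range replies {
		if _, err := bw.Write(r); err != nil {
			return err
		}
	}
	return bw.Flush()
}

func main() {
	replies := make([][]byte, 100)
	for i := range replies {
		replies[i] = []byte("+OK\r\n")
	}
	direct, merged := &countingWriter{}, &countingWriter{}
	_ = writeAll(direct, replies, false)
	_ = writeAll(merged, replies, true)
	fmt.Println("direct writes:", direct.calls, "merged writes:", merged.calls)
}
```

The trade-off is the one noted above: a reply sits in the buffer until the flush, so fewer system calls are bought with extra latency.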

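About tcp_nodelay

Nagle's algorithm addresses the overhead of small packets on the network. Suppose the sender wants to send many packets that each carry only a few characters. (Below, a packet shorter than the MSS is called a small packet and a packet whose length equals the MSS a large packet; for some comparisons there is also a medium packet, longer than a small packet but still less than one MSS.) The sender transmits the first small packet immediately and buffers the small amounts of data that arrive afterwards instead of sending them right away. The buffered data is combined into one larger packet and sent only when one of several conditions holds: the ACK for the previous segment arrives from the receiver, the current data is urgent, or enough data has accumulated (for example, the buffered data reaches the maximum segment length).

The default MSS is 536 bytes. For a Redis service the smallest response is only 5 bytes ($-1\r\n), and a MySQL OK_HEADER packet is under 11 bytes. Databases are OLTP workloads that require very low latency, so from this angle turning tcp_nodelay off is not recommended; you will almost certainly run into unexpected bugs.

For offline workloads and long-lived connection push services, where latency requirements are loose, my personal feeling is that it can be turned off.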

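The following articles are worth reading:

1. The mysterious 40 ms delay and TCP_NODELAY

2. nginx settings for TCP_NODELAY

3. Nagle's network congestion control algorithm

4. kingshard performance optimization: networking

5. MSS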

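Conclusion

This is the most minimal version. With the skeleton in place, the next step is to work on Dispatch: handling routing, ASK and MOVED redirections, and dynamic awareness of topology changes after a failover.

A while ago my feed was flooded with Nirvana in Fire (琅琊榜), so let me recommend an old song by Hu Ge (胡歌), 《逍遥叹》. Back then he had not yet had his car accident; back then he was still Li Xiaoyao, in love with his Zhao Ling'er.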


Source: Jianshu (简书)

Author: 董泽润
