runtime library -- I/O model 1 -- netpoll

buptbill220 · 2019-06-10 14:05:49



This article covers epoll initialization, how netpoll is scheduled, and the internal poll data structures.

Core source files:

src/runtime/proc.go

src/runtime/netpoll.go

src/runtime/netpollxxx.go (netpoll_epoll.go, netpoll_kqueue.go, etc.; this article only looks at the netpoll_epoll.go implementation)

src/internal/poll/fd_poll_runtime.go

src/internal/poll/fd_unix.go

src/runtime/defs_linux_arch.go (defs_linux_amd64.go, etc.)

netpoll initialization:

Initialization is done by the netpollinit function in netpoll_epoll.go. Its source is:

func netpollinit() {
	epfd = epollcreate1(_EPOLL_CLOEXEC)
	if epfd >= 0 {
		return
	}
	epfd = epollcreate(1024)
	if epfd >= 0 {
		closeonexec(epfd)
		return
	}
	println("runtime: epollcreate failed with", -epfd)
	throw("runtime: netpollinit failed")
}

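The code above is straightforward: it creates a single global epoll handle. The _EPOLL_CLOEXEC argument is not a connection limit. It is the close-on-exec flag (defined as 0x80000), which makes the kernel close epfd automatically when the process calls exec. If epollcreate1 fails, for example on an old kernel that lacks it, the runtime falls back to epollcreate(1024) and then sets close-on-exec explicitly with closeonexec. The 1024 is only a size hint, which Linux has ignored since 2.6.8 as long as it is greater than zero, so it does not cap the number of connections either. The real limit comes from system resources such as memory and the file descriptor limit.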
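The same two-step strategy can be tried from user code with the standard syscall package. This is only a sketch that runs on Linux; createEpoll and isCloexec are names made up for this illustration and are not part of the runtime.

```go
package main

import (
	"fmt"
	"syscall"
)

// createEpoll (hypothetical helper) mirrors the strategy of netpollinit:
// prefer epoll_create1 with the close-on-exec flag, and fall back to the
// older epoll_create followed by an explicit close-on-exec call.
func createEpoll() (int, error) {
	fd, err := syscall.EpollCreate1(syscall.EPOLL_CLOEXEC)
	if err == nil {
		return fd, nil
	}
	// The size argument is only a hint; it just has to be greater than zero.
	fd, err = syscall.EpollCreate(1024)
	if err != nil {
		return -1, err
	}
	syscall.CloseOnExec(fd)
	return fd, nil
}

// isCloexec (hypothetical helper) reports whether FD_CLOEXEC is set on fd.
func isCloexec(fd int) bool {
	flags, _, errno := syscall.Syscall(syscall.SYS_FCNTL, uintptr(fd), syscall.F_GETFD, 0)
	return errno == 0 && flags&syscall.FD_CLOEXEC != 0
}

func main() {
	fd, err := createEpoll()
	if err != nil {
		fmt.Println("epoll creation failed:", err)
		return
	}
	defer syscall.Close(fd)
	fmt.Printf("EPOLL_CLOEXEC = %#x, close-on-exec set: %v\n",
		syscall.EPOLL_CLOEXEC, isCloexec(fd))
}
```

Either path ends with a descriptor that has close-on-exec set, which is all the flag is for.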

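This raises a question: when is netpollinit called? The obvious answer is during runtime startup, which would certainly be simple. But for an application (service) that does no network I/O, that would be wasteful in two ways: 1) wasted memory (the kernel-side epoll structures such as the red-black tree), and 2) an epoll thread spinning idle.

Is there a better way, one that creates no epoll structures for non-network applications and uses epoll only when the network is actually needed? A lazily initialized singleton comes to mind, and that is indeed what the Go designers did. The implementation is as follows: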

1:
func (fd *FD) Init(net string, pollable bool) error {
	// We don't actually care about the various network types.
	if net == "file" {
		fd.isFile = true
	}
	if !pollable {
		return nil
	}
	return fd.pd.init(fd)
}

2:
func (pd *pollDesc) init(fd *FD) error {
	serverInit.Do(runtime_pollServerInit)
	ctx, errno := runtime_pollOpen(uintptr(fd.Sysfd))
	if errno != 0 {
		if ctx != 0 {
			runtime_pollUnblock(ctx)
			runtime_pollClose(ctx)
		}
		return syscall.Errno(errno)
	}
	pd.runtimeCtx = ctx
	return nil
}

3:
func poll_runtime_pollServerInit() {
	netpollinit()
	atomic.Store(&netpollInited, 1)
}

4:
func netpollopen(fd uintptr, pd *pollDesc) int32 {
	var ev epollevent
	ev.events = _EPOLLIN | _EPOLLOUT | _EPOLLRDHUP | _EPOLLET
	*(**pollDesc)(unsafe.Pointer(&ev.data)) = pd
	return -epollctl(epfd, _EPOLL_CTL_ADD, int32(fd), &ev)
}

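The entry path is: the Init method of FD in fd_unix.go --> the init method of pollDesc in fd_poll_runtime.go --> runtime_pollServerInit, which is implemented in netpoll.go as poll_runtime_pollServerInit. The runtime_pollOpen call in the same init method is what eventually reaches netpollopen and registers the fd.

serverInit.Do(runtime_pollServerInit) is a singleton implemented with sync.Once.

In short: the application creates a new FD --> a pollDesc is created --> epoll is initialized (first time only) --> the pollDesc is bound to epfd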
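The lazy-initialization pattern can be shown in isolation. In this sketch, pollServerInit and openFD are hypothetical stand-ins for runtime_pollServerInit and pollDesc.init; only sync.Once is the real mechanism.

```go
package main

import (
	"fmt"
	"sync"
)

var (
	serverInit sync.Once
	initCount  int // how many times the poller was actually set up
)

// pollServerInit stands in for runtime_pollServerInit (hypothetical body).
func pollServerInit() { initCount++ }

// openFD stands in for pollDesc.init: only the first caller pays for setup.
func openFD(fd int) int {
	serverInit.Do(pollServerInit)
	return fd
}

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func(fd int) {
			defer wg.Done()
			openFD(fd)
		}(i)
	}
	wg.Wait()
	fmt.Println("poller initialized", initCount, "time(s)") // prints 1 time(s)
}
```

A program that never calls openFD never runs pollServerInit, which is exactly the saving described above for non-network applications.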

Note: netpollopen registers the fd in edge-triggered mode (_EPOLLET), which Go chooses for efficiency.
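What edge-triggered means in practice can be observed with a pipe: once an event has been reported, it is not reported again until new data arrives, even if the old data is still unread. The following is a Linux-only sketch using the standard syscall package. The names epollET, waitNow and edgeTriggeredCounts are made up here, and epollET is declared locally because syscall.EPOLLET is a negative constant on some architectures and cannot be stored in the uint32 Events field directly.

```go
package main

import (
	"fmt"
	"syscall"
)

// epollET is the edge-triggered flag bit (assumption: declared locally, see text).
const epollET = 1 << 31

// waitNow polls epfd without blocking and returns the number of ready events.
func waitNow(epfd int) (int, error) {
	events := make([]syscall.EpollEvent, 8)
	for {
		n, err := syscall.EpollWait(epfd, events, 0)
		if err == syscall.EINTR {
			continue
		}
		return n, err
	}
}

// edgeTriggeredCounts writes one byte to a pipe and polls twice without
// reading it, returning the event count seen by each poll.
func edgeTriggeredCounts() (first, second int, err error) {
	epfd, err := syscall.EpollCreate1(syscall.EPOLL_CLOEXEC)
	if err != nil {
		return 0, 0, err
	}
	defer syscall.Close(epfd)

	var p [2]int
	if err = syscall.Pipe(p[:]); err != nil {
		return 0, 0, err
	}
	defer syscall.Close(p[0])
	defer syscall.Close(p[1])

	ev := syscall.EpollEvent{Events: syscall.EPOLLIN | epollET, Fd: int32(p[0])}
	if err = syscall.EpollCtl(epfd, syscall.EPOLL_CTL_ADD, p[0], &ev); err != nil {
		return 0, 0, err
	}
	if _, err = syscall.Write(p[1], []byte("x")); err != nil {
		return 0, 0, err
	}
	if first, err = waitNow(epfd); err != nil {
		return 0, 0, err
	}
	second, err = waitNow(epfd)
	return first, second, err
}

func main() {
	first, second, err := edgeTriggeredCounts()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("first poll:", first, "second poll:", second)
}
```

With a level-triggered registration the second poll would report the unread byte again. With edge triggering each readiness change is delivered once, so the consumer has to remember that the fd is ready.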

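netpoll scheduling:

The main entry point of a Go process is the main function in proc.go. The scheduling flow is as follows (the scheduler core will be covered in later chapters).

At startup the runtime creates several kinds of threads: sysmon (the system monitor), and the workers, which are the threads that run application code.

netpoll is invoked from two main places (the first is primary, the second a backstop):

1: every worker thread calls netpoll in between its scheduling work

2: the sysmon thread calls it every 10ms (only to guard against workers being so busy that netpoll has not been called for a long time)

In the diagram, the left side shows the sequential code the application writes and the right side shows what the Go runtime does. After the application waits for an event, the current goroutine is put on the queue of goroutines waiting for I/O events, and its context is associated with the fd. When netpoll runs, it collects the fds that have pending events (read or write) and moves the corresponding goroutines to the runnable queue; the application then carries on with its read/write system call. The result is code that is written sequentially but executed in parallel.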
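The park-and-wake handshake can be imitated in ordinary Go. This is a toy model under loud assumptions: toyPoller, waitRead, ready and serve are invented names, a buffered channel plays the role of the ready/wait state kept in pollDesc, and no real file descriptor or epoll call is involved.

```go
package main

import (
	"fmt"
	"sync"
)

// toyPoller maps a descriptor to the channel its goroutine is parked on.
// It is a hypothetical stand-in for the rg/wg fields plus netpollready.
type toyPoller struct {
	mu      sync.Mutex
	waiters map[int]chan struct{}
}

func newToyPoller() *toyPoller {
	return &toyPoller{waiters: make(map[int]chan struct{})}
}

// slot returns the wake-up channel for fd, creating it on first use.
func (p *toyPoller) slot(fd int) chan struct{} {
	p.mu.Lock()
	defer p.mu.Unlock()
	ch, ok := p.waiters[fd]
	if !ok {
		ch = make(chan struct{}, 1) // capacity 1 remembers an early "ready"
		p.waiters[fd] = ch
	}
	return ch
}

// waitRead parks the calling goroutine until fd is reported ready.
func (p *toyPoller) waitRead(fd int) { <-p.slot(fd) }

// ready plays the role of netpoll: it marks fd ready and wakes its waiter.
func (p *toyPoller) ready(fd int) {
	select {
	case p.slot(fd) <- struct{}{}:
	default: // already marked ready, nothing to do
	}
}

// serve is plain sequential code: wait for readiness, then "read".
func serve(p *toyPoller, fd int, out chan<- string) {
	p.waitRead(fd)
	out <- fmt.Sprintf("fd %d readable", fd)
}

func main() {
	p := newToyPoller()
	out := make(chan string)
	go serve(p, 7, out)
	p.ready(7) // the event-loop side
	fmt.Println(<-out)
}
```

serve contains no callbacks or state machine, yet it only runs when the poller side says the descriptor is ready, which is the "sequential code, parallel execution" effect described above.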

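A question arises here: why not dedicate a single thread to I/O polling (looping with something like nanosleep(10us)) instead of polling from the workers?

In my opinion, this comes down to Go's design.

First consider the usual epoll models: 1) multi-process, single-threaded, where each process handles every connection itself after epoll_wait returns; 2) single-process, multi-threaded, where events are handed off through a queue.

If a dedicated thread did the I/O polling, we could end up in a situation like model 1, where that thread is too heavy (it handles both I/O and business logic). The alternative is for the dedicated thread to do nothing but move the relevant goroutine to the runnable queue, which is a workable approach.

But consider Go's design philosophy: execute in parallel, program sequentially. Any asynchronous I/O or channel operation looks like a sequential call at the application level. From the runtime's point of view, the caller of an asynchronous I/O operation is just an ordinary goroutine, no different from one blocked on a channel; all that is needed is to move that goroutine to the runnable queue. An ordinary worker can therefore do the netpoll work. (TcpConn will be covered later.)

Internal poll structures: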

//go:notinheap
type pollDesc struct {
	link *pollDesc // in pollcache, protected by pollcache.lock

	// The lock protects pollOpen, pollSetDeadline, pollUnblock and deadlineimpl operations.
	// This fully covers seq, rt and wt variables. fd is constant throughout the PollDesc lifetime.
	// pollReset, pollWait, pollWaitCanceled and runtime·netpollready (IO readiness notification)
	// proceed w/o taking the lock. So closing, rg, rd, wg and wd are manipulated
	// in a lock-free way by all operations.
	// NOTE(dvyukov): the following code uses uintptr to store *g (rg/wg),
	// that will blow up when GC starts moving objects.
	lock    mutex // protects the following fields
	fd      uintptr
	closing bool
	seq     uintptr // protects from stale timers and ready notifications
	rg      uintptr // pdReady, pdWait, G waiting for read or nil
	rt      timer   // read deadline timer (set if rt.f != nil)
	rd      int64   // read deadline
	wg      uintptr // pdReady, pdWait, G waiting for write or nil
	wt      timer   // write deadline timer
	wd      int64   // write deadline
	user    uint32  // user settable cookie
}

type pollCache struct {
	lock  mutex
	first *pollDesc
	// PollDesc objects must be type-stable,
	// because we can get ready notification from epoll/kqueue
	// after the descriptor is closed/reused.
	// Stale notifications are detected using seq variable,
	// seq is incremented when deadlines are changed or descriptor is reused.
}

Every fd is bound to a pollDesc, which is what gets registered with epfd. pollDesc objects are allocated from, and returned to, pollCache.
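The allocate-and-recycle cycle, together with the seq check mentioned in the pollCache comment, can be sketched as a simple free list. Everything here is a simplified assumption: desc, cache and stale are invented names, there is no locking, and the sketch uses ordinary heap objects that the free list keeps reachable, whereas the runtime takes this memory from outside the GC heap.

```go
package main

import "fmt"

// desc is a simplified, hypothetical stand-in for pollDesc.
type desc struct {
	link *desc  // next free descriptor in the cache
	fd   int    // descriptor currently bound to this slot
	seq  uint64 // bumped on every (re)use to expose stale notifications
}

// cache is a free list: descriptors are recycled instead of being released.
type cache struct{ first *desc }

// alloc hands out a recycled descriptor if one exists, else a fresh one.
func (c *cache) alloc(fd int) *desc {
	d := c.first
	if d == nil {
		d = new(desc)
	} else {
		c.first = d.link
	}
	d.link = nil
	d.fd = fd
	d.seq++
	return d
}

// free puts d back at the head of the free list.
func (c *cache) free(d *desc) {
	d.link = c.first
	c.first = d
}

// stale reports whether a notification captured at seq is out of date.
func stale(d *desc, seq uint64) bool { return d.seq != seq }

func main() {
	var c cache
	a := c.alloc(10)
	seen := a.seq // what a pending notification would have recorded
	c.free(a)     // fd 10 is closed
	b := c.alloc(11)
	fmt.Println(a == b, stale(b, seen)) // prints: true true
}
```

The same memory is handed out again for fd 11, so a late notification for fd 10 still lands on a valid desc, and the seq mismatch tells the receiver to ignore it.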

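The structures themselves are simple and need little analysis.

The point worth analyzing is that pollDesc uses unmanaged memory.

Unmanaged memory: in some cases the runtime must allocate objects in unmanaged memory, outside the garbage-collected heap. This is necessary if the objects are part of the memory manager itself, or if they must be allocated in situations where the caller may not have a P.

Why must pollDesc live in unmanaged memory? The source comment already says it: PollDesc objects must be type-stable, that is, the memory must always hold a pollDesc. The reason lies in epoll itself. The relevant code is: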

1:
type epollevent struct {
	events uint32
	data   [8]byte // unaligned uintptr
}

2:
	n := epollwait(epfd, &events[0], int32(len(events)), waitms)
	if n < 0 {
		if n != -_EINTR {
			println("runtime: epollwait on fd", epfd, "failed with", -n)
			throw("runtime: netpoll failed")
		}
		goto retry
	}
	var gp guintptr
	for i := int32(0); i < n; i++ {
		ev := &events[i]
		if ev.events == 0 {
			continue
		}
		var mode int32
		if ev.events&(_EPOLLIN|_EPOLLRDHUP|_EPOLLHUP|_EPOLLERR) != 0 {
			mode += 'r'
		}
		if ev.events&(_EPOLLOUT|_EPOLLHUP|_EPOLLERR) != 0 {
			mode += 'w'
		}
		if mode != 0 {
			pd := *(**pollDesc)(unsafe.Pointer(&ev.data))

			netpollready(&gp, pd, mode)
		}
	}
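The mode computation in the loop above can be isolated to see which event bits wake readers, writers, or both. The constants below carry the values Linux uses for these epoll bits; modeFor is a name invented for this sketch.

```go
package main

import "fmt"

// Event bits with the values Linux uses for epoll.
const (
	epollIn    = 0x1
	epollOut   = 0x4
	epollErr   = 0x8
	epollHup   = 0x10
	epollRdHup = 0x2000
)

// modeFor mirrors the loop body: the result is 'r', 'w', 'r'+'w' for both,
// or 0 when no relevant bit is set.
func modeFor(events uint32) int32 {
	var mode int32
	if events&(epollIn|epollRdHup|epollHup|epollErr) != 0 {
		mode += 'r'
	}
	if events&(epollOut|epollHup|epollErr) != 0 {
		mode += 'w'
	}
	return mode
}

func main() {
	fmt.Println(
		modeFor(epollIn) == 'r',
		modeFor(epollOut) == 'w',
		modeFor(epollErr) == 'r'+'w',
		modeFor(0),
	) // prints: true true true 0
}
```

Error and hang-up conditions set both modes, so a goroutine blocked in either direction is woken and gets to see the failure from its own read or write call.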

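In the epoll red-black tree, every node carries a pointer to the user data for its event: the data member of epollevent, which here is the pollDesc pointer. That pointer must remain valid until the event is unregistered. If the memory were allocated on the GC heap, the following failure could occur:

Assume pollDesc is allocated on the heap (it cannot be on the stack).

Imagine a TCP server that, because of a timeout set in the application layer, closes an fd (and its pollDesc) early, so the pollDesc is garbage collected. An epoll event then arrives from the kernel whose data still holds the old pollDesc pointer. That pointer is now invalid, or the memory has been reused for something else, and the program crashes.

Another scenario: the pollDesc pointer is converted to a uintptr (via unsafe), so the GC no longer tracks the reference and frees the object.

So pollDesc must be allocated in unmanaged memory (memory that always exists), outside the GC's control.

Note: every connection that is opened must be closed, otherwise unmanaged memory is leaked.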

--------------------------------------------------------------------------------

If anything here is wrong, please point it out and bear with me. Best wishes~~~


Source: Zhihu column

Author: buptbill220
