An In-Depth Look at How Docker Combines with Cgroup Resource Limits - Docker in Production Practice

Author: 数据云技术社区

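This series focuses on demystifying core big data and container cloud technologies, and the author can provide full-stack big data plus cloud-native platform consulting. Please keep following this blog series. For any technical discussion, feel free to get in touch. For more content, follow the 《数据云技术社区》 WeChat official account.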

1 Linux Cgroups (Old Wine in a New Bottle)

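  • The main job of Linux cgroups is to cap resource usage for a group of processes; the resources covered include CPU, memory, disk, and network. On Linux, cgroups are exposed to users as a filesystem, organized as files and directories under /sys/fs/cgroup.
  • Resource limits are controlled per directory. For example, creating a directory named hello under /sys/fs/cgroup/cpu automatically populates it with a set of default CPU control files.
  • Docker first creates a directory named docker under /sys/fs/cgroup/cpu.
  • It then creates a subdirectory under docker named after each container's ID.
  • CPU limits for a container are applied by writing to the files in that container-ID subdirectory.
  • Every process inside the container is subject to the container's resource settings.
  • Other resources such as memory and network follow the same directory structure as cpu.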
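
The directory layout described above can be sketched as a small shell helper. This is only an illustration: it assumes the cgroup v1 layout with Docker's cgroupfs driver that this article uses, and the function name container_cgroup_dir is made up here. Hosts running cgroup v2 or the systemd cgroup driver lay things out differently.

```shell
# Hypothetical helper: build the cgroup v1 directory path for one container.
# Assumes the /sys/fs/cgroup/<subsystem>/docker/<container-id> layout.
# Usage: container_cgroup_dir <subsystem> <container-id>
container_cgroup_dir() {
  printf '/sys/fs/cgroup/%s/docker/%s\n' "$1" "$2"
}

# Example with a container ID that appears later in this article.
container_cgroup_dir cpu d65aa14f8c929631f83c267b5575b07771b148e2f200ba1756236104169ce917
```

The same helper with memory as the first argument gives the directory that holds the container's memory limits.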

2 How Docker Containers Tie into Cgroups

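  • Initial state: no Docker container has been started.
  • Because no container is running, /sys/fs/cgroup/cpu/docker contains no per-container resource-limit directories.
  • Likewise, /sys/fs/cgroup/memory/docker contains no per-container resource-limit directories.
  • When a container is started, the directory d65aa14f8c929631f83c267b5575b07771b148e2f200ba1756236104169ce917 is created under /sys/fs/cgroup/cpu/docker:
docker run -itd --rm --cpu-shares 512 progrium/stress --cpu 1  --timeout 1000s
  • Once the container is running, the directory named after its ID is present under /sys/fs/cgroup/cpu/docker and holds the container's CPU control files.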
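
The before/after check described above can be scripted. A minimal sketch, assuming the cgroup v1 paths used in this article; the function name list_container_cgroups is hypothetical. It prints nothing while no container is running and one container ID per line afterwards.

```shell
# Hypothetical helper: print the names of the per-container directories
# under a Docker cgroup root such as /sys/fs/cgroup/cpu/docker.
list_container_cgroups() {
  root=$1
  [ -d "$root" ] || return 0          # nothing to list if the root is absent
  for dir in "$root"/*/; do
    [ -d "$dir" ] || continue         # the glob matched no directory
    basename "$dir"
  done
}

list_container_cgroups /sys/fs/cgroup/cpu/docker
```

Control files such as cpu.shares that sit directly in the root are skipped, because the trailing slash in the glob only matches directories.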

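3 Stress-Testing Docker Containers with Cgroups

3.1 CPU shares test

  • Run three containers with --cpu-shares set to 512, 512, and 1024 respectively, so that their CPU time is split 1:1:2. Check CPU utilization with ctop or top; ideally, when the containers compete for the same CPU, usage comes out close to 25%, 25%, and 50%.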
docker run -itd --rm --cpu-shares 512 progrium/stress --cpu 1  --timeout 1000s
docker run -itd --rm --cpu-shares 512 progrium/stress --cpu 1  --timeout 100s
docker run -itd --rm --cpu-shares 1024 progrium/stress --cpu 1  --timeout 100s
  • Start the three Docker containers.
  • Check that their CPU usage ratio is 1:1:2.
  • Under /sys/fs/cgroup/cpu/docker there are three corresponding directories: 4ba04effda39be626d3bd1945b90a43e4ff99471a3296e05616c58a1c11ba873, d65aa14f8c929631f83c267b5575b07771b148e2f200ba1756236104169ce917, and fe2f33bad9d15e42b7b2941528394e256ea7e119f3f5a563029c032a30519e5e.
  • The cpu.shares files in those directories hold the values 512, 512, and 1024.
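
The 25%/25%/50% expectation follows from simple arithmetic: under full contention each container gets its own cpu.shares value divided by the sum of all competing values. A minimal sketch of that calculation; the function name share_percent is made up for illustration, and the result only applies while the containers are actually competing for the same CPU.

```shell
# Expected CPU percentage under full contention.
# Usage: share_percent <own-shares> <shares-of-every-competing-container...>
# The list after the first argument must include the container's own value.
share_percent() {
  own=$1
  shift
  total=0
  for s in "$@"; do
    total=$(( total + s ))
  done
  echo $(( own * 100 / total ))     # integer percentage
}

share_percent 512 512 512 1024      # prints 25
share_percent 1024 512 512 1024     # prints 50
```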

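3.2 Memory usage test

  • Run two stress containers to test memory usage. Each container spawns 4 worker processes; each worker in the first container consumes 128 MB of memory, and each worker in the second consumes 256 MB: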
docker stop $(docker ps -q) && docker rm $(docker ps -aq)

docker run --rm -it progrium/stress --cpu 2 --io 1 --vm 2 --vm-bytes 128M --timeout 60s
docker run -itd --rm  progrium/stress --vm 4 --vm-bytes 128M  --timeout 100s
docker run -itd --rm  progrium/stress --vm 4 --vm-bytes 256M  --timeout 100s

[root@worker3 local]# docker run -itd --rm  progrium/stress --vm 4 --vm-bytes 128M  --timeout 100s
888b0a8b4afdd92e241d0446c63d940cb486559f86263a070577a3860e0f5356
[root@worker3 local]# docker run -itd --rm  progrium/stress --vm 4 --vm-bytes 256M  --timeout 100s
25cc7585895d2e5f9fee4a1723d5cc09464c426c9213854ce56e1d6bb3c1256d

top - 23:31:52 up  2:23,  3 users,  load average: 5.95, 3.93, 2.45
Tasks: 113 total,   9 running, 104 sleeping,   0 stopped,   0 zombie
%Cpu(s): 10.7 us, 89.3 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  1872956 total,   853736 free,   821600 used,   197620 buff/cache
KiB Swap:        0 total,        0 free,        0 used.   848116 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND                                                                                                 
30972 root      20   0  138380  75868    256 R 12.3  4.1   0:01.57 stress                                                                                                  
30975 root      20   0  138380  21092    256 R 12.3  1.1   0:01.57 stress                                                                                                  
31022 root      20   0  269452 158148    256 R 12.3  8.4   0:01.54 stress                                                                                                  
31023 root      20   0  269452 133648    256 R 12.3  7.1   0:01.54 stress                                                                                                  
31025 root      20   0  269452 135696    256 R 12.3  7.2   0:01.54 stress                                                                                                  
30973 root      20   0  138380  71772    256 R 12.0  3.8   0:01.57 stress                                                                                                  
30974 root      20   0  138380  28556    256 R 12.0  1.5   0:01.57 stress                                                                                                  
31024 root      20   0  269452  31076    256 R 12.0  1.7   0:01.54 stress  


[root@worker3 docker]# ls
16a19075bbc6f7525bbcef670fcf920223d6a54396bad4393110fdf9c6afd57c  memory.kmem.max_usage_in_bytes      memory.memsw.failcnt             memory.stat
cgroup.clone_children                                             memory.kmem.slabinfo                memory.memsw.limit_in_bytes      memory.swappiness
cgroup.event_control                                              memory.kmem.tcp.failcnt             memory.memsw.max_usage_in_bytes  memory.usage_in_bytes
cgroup.procs                                                      memory.kmem.tcp.limit_in_bytes      memory.memsw.usage_in_bytes      memory.use_hierarchy
ecb8d7dac939ff1c45713928a406721f078fdce87965c04f63291b8ef4172717  memory.kmem.tcp.max_usage_in_bytes  memory.move_charge_at_immigrate  notify_on_release
memory.failcnt                                                    memory.kmem.tcp.usage_in_bytes      memory.numa_stat                 tasks
memory.force_empty                                                memory.kmem.usage_in_bytes          memory.oom_control
memory.kmem.failcnt                                               memory.limit_in_bytes               memory.pressure_level
memory.kmem.limit_in_bytes                                        memory.max_usage_in_bytes           memory.soft_limit_in_bytes
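
As a rough sanity check on the numbers above, the memory each container asks for is the worker count times --vm-bytes. The sketch below uses a made-up helper name, vm_total_mib, and compares the combined request with the host total reported by top. The RES values in top can sit below the per-worker figure at any given instant because stress workers, by default, repeatedly allocate and free their buffers.

```shell
# Memory one stress container requests: <workers> * <MiB per worker>.
vm_total_mib() {
  echo $(( $1 * $2 ))
}

vm_total_mib 4 128            # first container: prints 512
vm_total_mib 4 256            # second container: prints 1024

# Host total from the top output above, converted from KiB to MiB.
echo $(( 1872956 / 1024 ))    # prints 1829
```

Together the two containers request 1536 MiB, which fits within the 1829 MiB host, so neither is expected to be killed in this test.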

4 Cgroup-Based Resource Control for Docker Containers

4.1 How Docker processes map to the tasks file

  • Run stress inside the container via docker exec, then inspect the tasks file:
[root@worker3 local]# docker run -tid --name stressbash --entrypoint bash progrium/stress
998683eae5b45f6b66fdf74806092ad06fdf7cef99067b6844d225a537081e33

[root@worker3 local]# docker exec -it stressbash stress --vm-bytes 128M --vm 4
stress: info: [28] dispatching hogs: 0 cpu, 0 io, 4 vm, 0 hdd

top - 23:51:43 up  2:43,  4 users,  load average: 4.16, 2.75, 2.90
Tasks: 110 total,   5 running, 105 sleeping,   0 stopped,   0 zombie
%Cpu(s):  5.6 us, 94.4 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  1872956 total,  1252252 free,   429780 used,   190924 buff/cache
KiB Swap:        0 total,        0 free,        0 used.  1244348 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND                                                                                                 
31933 root      20   0  138380  71760    188 R 25.3  3.8   0:30.33 stress                                                                                                  
31934 root      20   0  138380   4176    188 R 25.0  0.2   0:30.33 stress                                                                                                  
31935 root      20   0  138380  67664    188 R 25.0  3.6   0:30.33 stress                                                                                                  
31936 root      20   0  138380 122960    188 R 24.7  6.6   0:30.32 stress  

[root@worker3 ~]# docker top stressbash
UID                 PID                 PPID                C                   STIME               TTY                 TIME                CMD
root                31722               31711               0                   23:46               pts/3               00:00:00            bash
root                31922               31913               0                   23:49               pts/5               00:00:00            stress --vm-bytes 128M --vm 4
root                31933               31922               25                  23:49               pts/5               00:00:02            stress --vm-bytes 128M --vm 4
root                31934               31922               25                  23:49               pts/5               00:00:02            stress --vm-bytes 128M --vm 4
root                31935               31922               25                  23:49               pts/5               00:00:02            stress --vm-bytes 128M --vm 4
root                31936               31922               25                  23:49               pts/5               00:00:02            stress --vm-bytes 128M --vm 4

[root@worker3 998683eae5b45f6b66fdf74806092ad06fdf7cef99067b6844d225a537081e33]# pwd
/sys/fs/cgroup/memory/docker/998683eae5b45f6b66fdf74806092ad06fdf7cef99067b6844d225a537081e33

[root@worker3 998683eae5b45f6b66fdf74806092ad06fdf7cef99067b6844d225a537081e33]# cat tasks 
31722
31922
31933
31934
31935
31936
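
The comparison between the PIDs shown by docker top and the contents of the tasks file can be automated. A minimal sketch; the function name pids_in_tasks is hypothetical, and the tasks file is the cgroup v1 file shown in the transcript above.

```shell
# Return success only if every given PID is listed in the tasks file.
# Usage: pids_in_tasks <tasks-file> <pid...>
pids_in_tasks() {
  tasks_file=$1
  shift
  for pid in "$@"; do
    # -x matches whole lines, so 3172 does not match 31722
    grep -qx "$pid" "$tasks_file" || return 1
  done
  return 0
}

# Example (on the host, using PIDs reported by `docker top stressbash`):
# pids_in_tasks /sys/fs/cgroup/memory/docker/<container-id>/tasks 31722 31922 31933
```

Processes started later with docker exec land in the same tasks file, which is why the stress workers are bound by the container's limits even though they were not started by the container's entrypoint.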

4.2 CPU limits for Docker containers

[root@worker3 local]# docker run -itd --rm  progrium/stress --cpu 1 --vm-bytes 200M
e7baea0c5c535234404ce18452242d108bc5dd41a7d5fc3d1b12e3052bf8c027

[root@worker3 e7baea0c5c535234404ce18452242d108bc5dd41a7d5fc3d1b12e3052bf8c027]# cat cpu.cfs_period_us
100000
[root@worker3 e7baea0c5c535234404ce18452242d108bc5dd41a7d5fc3d1b12e3052bf8c027]# cat cpu.cfs_quota_us
-1
[root@worker3 e7baea0c5c535234404ce18452242d108bc5dd41a7d5fc3d1b12e3052bf8c027]# pwd
/sys/fs/cgroup/cpu/docker/e7baea0c5c535234404ce18452242d108bc5dd41a7d5fc3d1b12e3052bf8c027
cpu.cfs_quota_us is -1, so no CPU limit is applied to the stress container.

top - 23:57:59 up  2:50,  4 users,  load average: 2.04, 3.08, 3.08
Tasks: 104 total,   3 running, 101 sleeping,   0 stopped,   0 zombie
%Cpu(s): 99.7 us,  0.3 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  1872956 total,  1536080 free,   145972 used,   190904 buff/cache
KiB Swap:        0 total,        0 free,        0 used.  1528180 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND                                                                                                 
32032 root      20   0    7304    100      0 R 99.7  0.0   0:38.25 stress 
  • Limit the container's CPU with --cpu-period=100000 and --cpu-quota=60000, which allows at most 60% of one CPU:
docker run -itd  --cpu-period 100000 --cpu-quota 60000 --rm  progrium/stress --cpu 1 --vm-bytes 200M
db962ae56fbc087591dd96685ca94056c2e61bbc987ef638d35a94a290f00d33

[root@worker3 db962ae56fbc087591dd96685ca94056c2e61bbc987ef638d35a94a290f00d33]# ls
cgroup.clone_children  cgroup.procs  cpuacct.usage         cpu.cfs_period_us  cpu.rt_period_us   cpu.shares  notify_on_release
cgroup.event_control   cpuacct.stat  cpuacct.usage_percpu  cpu.cfs_quota_us   cpu.rt_runtime_us  cpu.stat    tasks
[root@worker3 db962ae56fbc087591dd96685ca94056c2e61bbc987ef638d35a94a290f00d33]# cat cpu.cfs_period_us 
100000
[root@worker3 db962ae56fbc087591dd96685ca94056c2e61bbc987ef638d35a94a290f00d33]# cat cpu.cfs_quota_us 
60000

Output of top:
top - 00:03:42 up  2:55,  4 users,  load average: 0.63, 1.43, 2.35
Tasks: 104 total,   2 running, 102 sleeping,   0 stopped,   0 zombie
%Cpu(s): 58.2 us,  0.0 sy,  0.0 ni, 41.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  1872956 total,  1533696 free,   148388 used,   190872 buff/cache
KiB Swap:        0 total,        0 free,        0 used.  1525792 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND                                                              
32164 root      20   0    7304     96      0 R 60.1  0.0   1:06.34 stress 
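
The relationship between the two files and the roughly 60% seen in top is quota divided by period. A minimal sketch of that conversion; the function name cfs_cpu_percent is made up here. A quota of -1 means no limit, as in the first run above.

```shell
# Convert cpu.cfs_quota_us and cpu.cfs_period_us into a CPU percentage.
# Usage: cfs_cpu_percent <quota-us> <period-us>
cfs_cpu_percent() {
  quota=$1
  period=$2
  if [ "$quota" -lt 0 ]; then
    echo unlimited                  # -1 means no quota is enforced
  else
    echo $(( quota * 100 / period ))
  fi
}

cfs_cpu_percent -1 100000           # prints unlimited
cfs_cpu_percent 60000 100000        # prints 60
```

On a real host the two arguments would come from reading the files directly, for example with cat cpu.cfs_quota_us and cat cpu.cfs_period_us in the container's cgroup directory.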

4.3 Memory limits for Docker containers

  • The three parameters to focus on are --memory, --memory-swap, and --memory-swappiness.
[root@worker3 ~]# free -h
              total        used        free      shared  buff/cache   available
Mem:           1.8G        145M        1.5G        9.6M        186M        1.5G
Swap:            0B          0B          0B

[root@worker3 local]# docker ps -a
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
[root@worker3 local]# docker run -itd --name stress --memory 1G --memory-swap 3G --memory-swappiness 20 --entrypoint bash progrium/stress
737f2b75ab731f649ee3e7194439d525051ec937a8532f64c524cf30c52e550f
[root@worker3 local]# free -h
              total        used        free      shared  buff/cache   available
Mem:           1.8G        145M        1.5G        9.6M        186M        1.5G
Swap:            0B          0B          0B
[root@worker3 local]# docker ps -a
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
737f2b75ab73        progrium/stress     "bash"              24 seconds ago      Up 24 seconds   


[root@worker3 docker]# pwd
/sys/fs/cgroup/memory/docker
[root@worker3 docker]# cd 737f2b75ab731f649ee3e7194439d525051ec937a8532f64c524cf30c52e550f/
[root@worker3 737f2b75ab731f649ee3e7194439d525051ec937a8532f64c524cf30c52e550f]# ls
cgroup.clone_children           memory.kmem.slabinfo                memory.memsw.failcnt             memory.soft_limit_in_bytes
cgroup.event_control            memory.kmem.tcp.failcnt             memory.memsw.limit_in_bytes      memory.stat
cgroup.procs                    memory.kmem.tcp.limit_in_bytes      memory.memsw.max_usage_in_bytes  memory.swappiness
memory.failcnt                  memory.kmem.tcp.max_usage_in_bytes  memory.memsw.usage_in_bytes      memory.usage_in_bytes
memory.force_empty              memory.kmem.tcp.usage_in_bytes      memory.move_charge_at_immigrate  memory.use_hierarchy
memory.kmem.failcnt             memory.kmem.usage_in_bytes          memory.numa_stat                 notify_on_release
memory.kmem.limit_in_bytes      memory.limit_in_bytes               memory.oom_control               tasks
memory.kmem.max_usage_in_bytes  memory.max_usage_in_bytes           memory.pressure_level
[root@worker3 737f2b75ab731f649ee3e7194439d525051ec937a8532f64c524cf30c52e550f]# cat memory.limit_in_bytes 
1073741824    # corresponds to --memory=1G
[root@worker3 737f2b75ab731f649ee3e7194439d525051ec937a8532f64c524cf30c52e550f]# cat memory.memsw.limit_in_bytes 
3221225472    # corresponds to --memory-swap=3G
[root@worker3 737f2b75ab731f649ee3e7194439d525051ec937a8532f64c524cf30c52e550f]# cat memory.swappiness 
20    # corresponds to --memory-swappiness 20
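
The byte values in the cgroup files are the flag values converted with binary units. A minimal sketch of that conversion, handling only whole numbers with an optional K, M, or G suffix; the function name to_bytes is made up for illustration. Because --memory-swap is the combined memory plus swap limit, the swap allowance is the difference between the two values.

```shell
# Convert a size such as 128M or 1G to bytes using binary units.
to_bytes() {
  size=$1
  num=${size%[KkMmGg]}       # numeric part
  unit=${size#"$num"}        # trailing unit letter, if any
  case $unit in
    [Kk]) echo $(( num * 1024 )) ;;
    [Mm]) echo $(( num * 1024 * 1024 )) ;;
    [Gg]) echo $(( num * 1024 * 1024 * 1024 )) ;;
    '')   echo "$num" ;;
  esac
}

to_bytes 1G                                    # prints 1073741824
to_bytes 3G                                    # prints 3221225472
echo $(( $(to_bytes 3G) - $(to_bytes 1G) ))    # swap allowance: prints 2147483648
```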

4.4 Container OOM kill analysis

[root@worker3 local]# free -h
              total        used        free      shared  buff/cache   available
Mem:           1.8G        142M        1.5G        9.5M        191M        1.5G
Swap:            0B          0B          0B
[root@worker3 local]# docker run --rm -it  progrium/stress --cpu 1 --vm 2 --vm-bytes 19.9999G
stress: info: [1] dispatching hogs: 1 cpu, 0 io, 2 vm, 0 hdd
stress: dbug: [1] using backoff sleep of 9000us
stress: dbug: [1] --> hogcpu worker 1 [5] forked
stress: dbug: [1] --> hogvm worker 2 [6] forked
stress: dbug: [1] using backoff sleep of 3000us
stress: dbug: [1] --> hogvm worker 1 [7] forked
stress: dbug: [7] allocating 20401094656 bytes ...
stress: FAIL: [7] (495) hogvm malloc failed: Cannot allocate memory
stress: FAIL: [1] (395) <-- worker 7 returned error 1
stress: WARN: [1] (397) now reaping child worker processes
stress: dbug: [1] <-- worker 5 reaped
stress: dbug: [1] <-- worker 6 reaped
stress: FAIL: [1] (452) failed run completed in 0s
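  • When a container uses up the memory available on the host and the kernel detects that it is out of memory, the container is OOM-killed. In the run above the request was so far beyond the host's 1.8G that malloc failed outright instead.
  • Set a memory limit on the container with no swap, plus --oom-kill-disable=true. After running for a while the container has still not been OOM-killed, and the MEM % column of docker stats stays at 100.00%: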
docker run --rm -it --memory 5G --memory-swap 5G --oom-kill-disable=true progrium/stress --cpu 1 --vm 2 --vm-bytes 1G

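With --oom-kill-disable=true removed, the same test quickly triggers an OOM killer event in the container.
This shows that --oom-kill-disable only takes effect once a memory limit has been set on the container.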
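
Whether the OOM killer is disabled for a container can be read back from the memory.oom_control file listed in the memory cgroup directory earlier. A minimal sketch, assuming the cgroup v1 file format in which the file contains lines such as oom_kill_disable 0 and under_oom 0; the function name oom_kill_disabled is hypothetical.

```shell
# Print the oom_kill_disable flag (0 or 1) from a memory.oom_control file.
# Usage: oom_kill_disabled <path-to-memory.oom_control>
oom_kill_disabled() {
  awk '$1 == "oom_kill_disable" { print $2 }' "$1"
}

# Example (on the host):
# oom_kill_disabled /sys/fs/cgroup/memory/docker/<container-id>/memory.oom_control
```

A value of 1 would be expected for the container started with --oom-kill-disable=true, and 0 for the rerun without it.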

5 Summary

This article is another in-depth summary of how Docker combines with cgroup resource limits; it took nearly five hours to put together.


Source: 掘金 (Juejin)
