This blog focuses on demystifying core big data and container cloud technologies, and can provide full-stack big data + cloud native platform consulting. Please keep following this blog series, and feel free to reach out for academic exchange. For more content, follow the 《数据云技术社区》 WeChat official account.
1 Linux Cgroup (Old Wine in a New Bottle)
- The primary job of Linux Cgroups is to set upper limits on the resources a group of processes may use, including CPU, memory, disk, and network. On Linux, the interface cgroups expose to users is a filesystem, organized as files and directories under /sys/fs/cgroup.
- Cgroup resource limits are controlled through directories. For example, creating a hello folder under /sys/fs/cgroup/cpu automatically generates a set of default CPU-limit files inside it (see the sketch at the end of this section).
- Docker first creates a directory named docker under /sys/fs/cgroup/cpu.
- It then creates a subdirectory named after the container ID under that docker directory.
- CPU limits for a container take effect by writing to the files in that container-ID subdirectory.
- Every process inside the container is bound by the container's resource settings.
- Other resources, such as memory and network, follow the same structure as CPU.
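The following minimal sketch shows this filesystem interface directly. It assumes a cgroup v1 host and root privileges; the group name hello is just an example:
# Create a cgroup; the kernel auto-populates the default CPU control files
mkdir /sys/fs/cgroup/cpu/hello
ls /sys/fs/cgroup/cpu/hello   # cpu.shares, cpu.cfs_period_us, cpu.cfs_quota_us, tasks, ...
# Cap the group at 20% of one CPU: 20000us of runtime per 100000us period
echo 20000 > /sys/fs/cgroup/cpu/hello/cpu.cfs_quota_us
# Attach a process (here, the current shell) to the group
echo $$ > /sys/fs/cgroup/cpu/hello/tasks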
2 Combining Docker Containers with Cgroups
- Initial scenario: no Docker container has been started.
- Because no container is running, there are no corresponding resource-limit directories under /sys/fs/cgroup/cpu/docker.
- Likewise, there are no corresponding resource-limit directories under /sys/fs/cgroup/memory/docker.
- When a container is started, the directory d65aa14f8c929631f83c267b5575b07771b148e2f200ba1756236104169ce917 is created under /sys/fs/cgroup/cpu/docker:
docker run -itd --rm --cpu-shares 512 progrium/stress --cpu 1 --timeout 1000s
- This is the layout that corresponds to a running container; a quick way to inspect it is shown below.
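A minimal inspection sketch, reusing the container ID printed above; cpu.shares should reflect the --cpu-shares 512 passed to docker run:
# List the per-container cgroup directory Docker created
ls /sys/fs/cgroup/cpu/docker/d65aa14f8c929631f83c267b5575b07771b148e2f200ba1756236104169ce917
# Read back the CPU weight set by --cpu-shares
cat /sys/fs/cgroup/cpu/docker/d65aa14f8c929631f83c267b5575b07771b148e2f200ba1756236104169ce917/cpu.shares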
3 Stress-Testing Docker Containers Under Cgroup Limits
3.1 CPU Shares Test
- Run three containers with --cpu-shares set to 512, 512, and 1024 respectively. They are entitled to CPU time in a 1:1:2 ratio, so when observed with ctop or top, CPU utilization should ideally be close to 25%, 25%, and 50%:
docker run -itd --rm --cpu-shares 512 progrium/stress --cpu 1 --timeout 1000s
docker run -itd --rm --cpu-shares 512 progrium/stress --cpu 1 --timeout 100s
docker run -itd --rm --cpu-shares 1024 progrium/stress --cpu 1 --timeout 100s
- Start the three stress containers.
- Observe that the CPU usage ratio is 1:1:2.
- Under /sys/fs/cgroup/cpu/docker there are three corresponding directories: 4ba04effda39be626d3bd1945b90a43e4ff99471a3296e05616c58a1c11ba873, d65aa14f8c929631f83c267b5575b07771b148e2f200ba1756236104169ce917, and fe2f33bad9d15e42b7b2941528394e256ea7e119f3f5a563029c032a30519e5e.
- The cpu.shares files in those directories hold the values 512, 512, and 1024, which can be read back in one pass as shown below.
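A quick verification sketch; the directory names are the container IDs listed above:
# Print each running container's CPU weight
for d in /sys/fs/cgroup/cpu/docker/*/; do
  echo "$d -> $(cat "$d/cpu.shares")"
done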
3.2 Memory Usage Test
- Run two stress containers to test memory usage. Each container spawns 4 VM workers: in the first container each worker consumes 128MB of memory, while each of the 4 workers in the second container consumes 256MB:
docker stop $(docker ps -q) && docker rm $(docker ps -aq)
docker run --rm -it progrium/stress --cpu 2 --io 1 --vm 2 --vm-bytes 128M --timeout 60s
docker run -itd --rm progrium/stress --vm 4 --vm-bytes 128M --timeout 100s
docker run -itd --rm progrium/stress --vm 4 --vm-bytes 256M --timeout 100s
[root@worker3 local]# docker run -itd --rm progrium/stress --vm 4 --vm-bytes 128M --timeout 100s
888b0a8b4afdd92e241d0446c63d940cb486559f86263a070577a3860e0f5356
[root@worker3 local]# docker run -itd --rm progrium/stress --vm 4 --vm-bytes 256M --timeout 100s
25cc7585895d2e5f9fee4a1723d5cc09464c426c9213854ce56e1d6bb3c1256d
top - 23:31:52 up 2:23, 3 users, load average: 5.95, 3.93, 2.45
Tasks: 113 total, 9 running, 104 sleeping, 0 stopped, 0 zombie
%Cpu(s): 10.7 us, 89.3 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 1872956 total, 853736 free, 821600 used, 197620 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 848116 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
30972 root 20 0 138380 75868 256 R 12.3 4.1 0:01.57 stress
30975 root 20 0 138380 21092 256 R 12.3 1.1 0:01.57 stress
31022 root 20 0 269452 158148 256 R 12.3 8.4 0:01.54 stress
31023 root 20 0 269452 133648 256 R 12.3 7.1 0:01.54 stress
31025 root 20 0 269452 135696 256 R 12.3 7.2 0:01.54 stress
30973 root 20 0 138380 71772 256 R 12.0 3.8 0:01.57 stress
30974 root 20 0 138380 28556 256 R 12.0 1.5 0:01.57 stress
31024 root 20 0 269452 31076 256 R 12.0 1.7 0:01.54 stress
[root@worker3 docker]# ls
16a19075bbc6f7525bbcef670fcf920223d6a54396bad4393110fdf9c6afd57c memory.kmem.max_usage_in_bytes memory.memsw.failcnt memory.stat
cgroup.clone_children memory.kmem.slabinfo memory.memsw.limit_in_bytes memory.swappiness
cgroup.event_control memory.kmem.tcp.failcnt memory.memsw.max_usage_in_bytes memory.usage_in_bytes
cgroup.procs memory.kmem.tcp.limit_in_bytes memory.memsw.usage_in_bytes memory.use_hierarchy
ecb8d7dac939ff1c45713928a406721f078fdce87965c04f63291b8ef4172717 memory.kmem.tcp.max_usage_in_bytes memory.move_charge_at_immigrate notify_on_release
memory.failcnt memory.kmem.tcp.usage_in_bytes memory.numa_stat tasks
memory.force_empty memory.kmem.usage_in_bytes memory.oom_control
memory.kmem.failcnt memory.limit_in_bytes memory.pressure_level
memory.kmem.limit_in_bytes memory.max_usage_in_bytes memory.soft_limit_in_bytes
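To see how much memory each container's cgroup has actually charged, the per-container memory.usage_in_bytes files can be read (a sketch; the two directory names are the container IDs shown above):
# Report current memory usage per container cgroup
for d in /sys/fs/cgroup/memory/docker/*/; do
  echo "$d -> $(cat "$d/memory.usage_in_bytes") bytes"
done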
4 Cgroup-Based Resource Control for Docker Containers
4.1 Mapping Docker Processes to the tasks File
- Run the stress workload inside the container via docker exec, then inspect the tasks file; a cross-check is sketched after the transcript.
[root@worker3 local]# docker run -tid --name stressbash --entrypoint bash progrium/stress
998683eae5b45f6b66fdf74806092ad06fdf7cef99067b6844d225a537081e33
[root@worker3 local]# docker exec -it stressbash stress --vm-bytes 128M --vm 4
stress: info: [28] dispatching hogs: 0 cpu, 0 io, 4 vm, 0 hdd
top - 23:51:43 up 2:43, 4 users, load average: 4.16, 2.75, 2.90
Tasks: 110 total, 5 running, 105 sleeping, 0 stopped, 0 zombie
%Cpu(s): 5.6 us, 94.4 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 1872956 total, 1252252 free, 429780 used, 190924 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 1244348 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
31933 root 20 0 138380 71760 188 R 25.3 3.8 0:30.33 stress
31934 root 20 0 138380 4176 188 R 25.0 0.2 0:30.33 stress
31935 root 20 0 138380 67664 188 R 25.0 3.6 0:30.33 stress
31936 root 20 0 138380 122960 188 R 24.7 6.6 0:30.32 stress
[root@worker3 ~]# docker top stressbash
UID PID PPID C STIME TTY TIME CMD
root 31722 31711 0 23:46 pts/3 00:00:00 bash
root 31922 31913 0 23:49 pts/5 00:00:00 stress --vm-bytes 128M --vm 4
root 31933 31922 25 23:49 pts/5 00:00:02 stress --vm-bytes 128M --vm 4
root 31934 31922 25 23:49 pts/5 00:00:02 stress --vm-bytes 128M --vm 4
root 31935 31922 25 23:49 pts/5 00:00:02 stress --vm-bytes 128M --vm 4
root 31936 31922 25 23:49 pts/5 00:00:02 stress --vm-bytes 128M --vm 4
[root@worker3 998683eae5b45f6b66fdf74806092ad06fdf7cef99067b6844d225a537081e33]# pwd
/sys/fs/cgroup/memory/docker/998683eae5b45f6b66fdf74806092ad06fdf7cef99067b6844d225a537081e33
[root@worker3 998683eae5b45f6b66fdf74806092ad06fdf7cef99067b6844d225a537081e33]# cat tasks
31722
31922
31933
31934
31935
31936
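The PIDs listed in tasks are exactly the host-side PIDs that docker top reports. A cross-check sketch, reusing the stressbash container from above:
# PIDs as seen by Docker (second column of docker top)
docker top stressbash | awk 'NR>1 {print $2}' | sort -n
# PIDs charged to the container's memory cgroup; the two lists should match
sort -n /sys/fs/cgroup/memory/docker/998683eae5b45f6b66fdf74806092ad06fdf7cef99067b6844d225a537081e33/tasks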
4.2 Docker Container CPU Limits
[root@worker3 local]# docker run -itd --rm progrium/stress --cpu 1 --vm-bytes 200M
e7baea0c5c535234404ce18452242d108bc5dd41a7d5fc3d1b12e3052bf8c027
[root@worker3 e7baea0c5c535234404ce18452242d108bc5dd41a7d5fc3d1b12e3052bf8c027]# cat cpu.cfs_period_us
100000
[root@worker3 e7baea0c5c535234404ce18452242d108bc5dd41a7d5fc3d1b12e3052bf8c027]# cat cpu.cfs_quota_us
-1
[root@worker3 e7baea0c5c535234404ce18452242d108bc5dd41a7d5fc3d1b12e3052bf8c027]# pwd
/sys/fs/cgroup/cpu/docker/e7baea0c5c535234404ce18452242d108bc5dd41a7d5fc3d1b12e3052bf8c027
No CPU limit has been applied to this stress container: cpu.cfs_quota_us is -1, which means unlimited.
top - 23:57:59 up 2:50, 4 users, load average: 2.04, 3.08, 3.08
Tasks: 104 total, 3 running, 101 sleeping, 0 stopped, 0 zombie
%Cpu(s): 99.7 us, 0.3 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 1872956 total, 1536080 free, 145972 used, 190904 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 1528180 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
32032 root 20 0 7304 100 0 R 99.7 0.0 0:38.25 stress
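Because the cgroup interface is just files, a quota could also be imposed on the already-running container directly, without restarting it (a sketch; writing 60000 here should be equivalent to --cpu-quota 60000, assuming a cgroup v1 host):
# Cap the running container at 60% of one CPU on the fly
echo 60000 > /sys/fs/cgroup/cpu/docker/e7baea0c5c535234404ce18452242d108bc5dd41a7d5fc3d1b12e3052bf8c027/cpu.cfs_quota_us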
- Now apply a CPU limit to the container with --cpu-period=100000 and --cpu-quota=60000, i.e. 60000us of CPU time per 100000us scheduling period, or 60% of one CPU:
docker run -itd --cpu-period 100000 --cpu-quota 60000 --rm progrium/stress --cpu 1 --vm-bytes 200M
db962ae56fbc087591dd96685ca94056c2e61bbc987ef638d35a94a290f00d33
[root@worker3 db962ae56fbc087591dd96685ca94056c2e61bbc987ef638d35a94a290f00d33]# ls
cgroup.clone_children cgroup.procs cpuacct.usage cpu.cfs_period_us cpu.rt_period_us cpu.shares notify_on_release
cgroup.event_control cpuacct.stat cpuacct.usage_percpu cpu.cfs_quota_us cpu.rt_runtime_us cpu.stat tasks
[root@worker3 db962ae56fbc087591dd96685ca94056c2e61bbc987ef638d35a94a290f00d33]# cat cpu.cfs_period_us
100000
[root@worker3 db962ae56fbc087591dd96685ca94056c2e61bbc987ef638d35a94a290f00d33]# cat cpu.cfs_quota_us
60000
Output of the top command:
top - 00:03:42 up 2:55, 4 users, load average: 0.63, 1.43, 2.35
Tasks: 104 total, 2 running, 102 sleeping, 0 stopped, 0 zombie
%Cpu(s): 58.2 us, 0.0 sy, 0.0 ni, 41.8 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 1872956 total, 1533696 free, 148388 used, 190872 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 1525792 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
32164 root 20 0 7304 96 0 R 60.1 0.0 1:06.34 stress
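The effective cap is simply quota divided by period, which matches the ~60% CPU that top reports. A quick arithmetic check against the container's own cgroup files (a sketch, run from inside the container's cgroup directory):
cd /sys/fs/cgroup/cpu/docker/db962ae56fbc087591dd96685ca94056c2e61bbc987ef638d35a94a290f00d33
# 60000 / 100000 = 0.60, i.e. 60% of one CPU
echo "$(cat cpu.cfs_quota_us) / $(cat cpu.cfs_period_us)" | bc -l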
4.3 Docker Container Memory Limits
- Focus on three flags: --memory, --memory-swap, and --memory-swappiness. Note that --memory-swap is the total of memory plus swap, so --memory 1G --memory-swap 3G allows 1G of RAM plus up to 2G of swap.
[root@worker3 ~]# free -h
total used free shared buff/cache available
Mem: 1.8G 145M 1.5G 9.6M 186M 1.5G
Swap: 0B 0B 0B
[root@worker3 local]# docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
[root@worker3 local]# docker run -itd --name stress --memory 1G --memory-swap 3G --memory-swappiness 20 --entrypoint bash progrium/stress
737f2b75ab731f649ee3e7194439d525051ec937a8532f64c524cf30c52e550f
[root@worker3 local]# free -h
total used free shared buff/cache available
Mem: 1.8G 145M 1.5G 9.6M 186M 1.5G
Swap: 0B 0B 0B
[root@worker3 local]# docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
737f2b75ab73 progrium/stress "bash" 24 seconds ago Up 24 seconds
[root@worker3 docker]# pwd
/sys/fs/cgroup/memory/docker
[root@worker3 docker]# cd 737f2b75ab731f649ee3e7194439d525051ec937a8532f64c524cf30c52e550f/
[root@worker3 737f2b75ab731f649ee3e7194439d525051ec937a8532f64c524cf30c52e550f]# ls
cgroup.clone_children memory.kmem.slabinfo memory.memsw.failcnt memory.soft_limit_in_bytes
cgroup.event_control memory.kmem.tcp.failcnt memory.memsw.limit_in_bytes memory.stat
cgroup.procs memory.kmem.tcp.limit_in_bytes memory.memsw.max_usage_in_bytes memory.swappiness
memory.failcnt memory.kmem.tcp.max_usage_in_bytes memory.memsw.usage_in_bytes memory.usage_in_bytes
memory.force_empty memory.kmem.tcp.usage_in_bytes memory.move_charge_at_immigrate memory.use_hierarchy
memory.kmem.failcnt memory.kmem.usage_in_bytes memory.numa_stat notify_on_release
memory.kmem.limit_in_bytes memory.limit_in_bytes memory.oom_control tasks
memory.kmem.max_usage_in_bytes memory.max_usage_in_bytes memory.pressure_level
[root@worker3 737f2b75ab731f649ee3e7194439d525051ec937a8532f64c524cf30c52e550f]# cat memory.limit_in_bytes
1073741824 (corresponds to --memory 1G)
[root@worker3 737f2b75ab731f649ee3e7194439d525051ec937a8532f64c524cf30c52e550f]# cat memory.memsw.limit_in_bytes
3221225472 (corresponds to --memory-swap 3G)
[root@worker3 737f2b75ab731f649ee3e7194439d525051ec937a8532f64c524cf30c52e550f]# cat memory.swappiness
20 (corresponds to --memory-swappiness 20)
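The same limits are also visible from the Docker side via an inspect template (a sketch; values are reported in bytes):
# Memory and MemorySwap from HostConfig should print 1073741824 and 3221225472
docker inspect -f '{{.HostConfig.Memory}} {{.HostConfig.MemorySwap}}' stress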
4.4 Container OOM Kill Analysis
[root@worker3 local]# free -h
total used free shared buff/cache available
Mem: 1.8G 142M 1.5G 9.5M 191M 1.5G
Swap: 0B 0B 0B
[root@worker3 local]# docker run --rm -it progrium/stress --cpu 1 --vm 2 --vm-bytes 19.9999G
stress: info: [1] dispatching hogs: 1 cpu, 0 io, 2 vm, 0 hdd
stress: dbug: [1] using backoff sleep of 9000us
stress: dbug: [1] --> hogcpu worker 1 [5] forked
stress: dbug: [1] --> hogvm worker 2 [6] forked
stress: dbug: [1] using backoff sleep of 3000us
stress: dbug: [1] --> hogvm worker 1 [7] forked
stress: dbug: [7] allocating 20401094656 bytes ...
stress: FAIL: [7] (495) hogvm malloc failed: Cannot allocate memory
stress: FAIL: [1] (395) <-- worker 7 returned error 1
stress: WARN: [1] (397) now reaping child worker processes
stress: dbug: [1] <-- worker 5 reaped
stress: dbug: [1] <-- worker 6 reaped
stress: FAIL: [1] (452) failed run completed in 0s
- When a container exhausts the host's memory, the kernel detects the shortage and the workload is killed (OOM); in the transcript above, the allocation itself already fails with "Cannot allocate memory".
- Set a memory limit on the container, keep it from using swap (--memory-swap equal to --memory), and add --oom-kill-disable=true. After running for a while, the container is never OOM-killed; the MEM % column of docker stats stays pinned at 100.00%:
docker run --rm -it --memory 5G --memory-swap 5G --oom-kill-disable=true progrium/stress --cpu 1 --vm 2 --vm-bytes 1G
Remove the --oom-kill-disable=true setting, and the container is OOM-killed almost immediately.
This shows that --oom-kill-disable only takes effect once a memory limit has been set on the container.
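The kernel exposes this OOM state in the container's memory cgroup (a sketch; <container-id> is a placeholder for the actual directory name):
cat /sys/fs/cgroup/memory/docker/<container-id>/memory.oom_control
# oom_kill_disable 1  -> set by --oom-kill-disable=true
# under_oom 1         -> the cgroup is at its limit and its tasks are paused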
5 Summary
This article took another deep pass over how Docker's resource limits are built on cgroups; it took nearly 5 hours to put together.