![Figure 1: Efficiently troubleshooting high disk I/O, a complete walkthrough from iotop to Docker](https://share.0f1.top/wwj/typora/2025/02/23/202502231132332.webp)
I. Background: Abnormal Disk I/O Alert
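At 15:31:28 on February 21, 2025, a server triggered a high disk I/O alert: its disk read rate had exceeded 40 MB/s. System monitoring showed that disk read bytes on instance mg-1c had risen abnormally, which suggested a possible performance bottleneck.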
Alert details:
Start time: 2025.02.21 15:31:28
End time: 2025.02.21 15:40:28
Event details:
description: Disk read bytes rate exceeds 40 MB/s
summary: instance: mg-1c disk read bytes rate too high
value: 5.152714752e+07
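The alert value is reported in bytes per second, so it helps to convert it before comparing it with the 40 MB/s threshold. The sketch below does that conversion, assuming the threshold uses decimal megabytes (1 MB = 10^6 bytes); the variable and function names are illustrative only.

```python
# Threshold and sample value are taken from the alert above.
# Treating "MB" as decimal megabytes is an assumption about the monitoring system.
THRESHOLD_MB_PER_S = 40.0
alert_value_bytes_per_s = 5.152714752e+07


def to_mb_per_s(bytes_per_s: float) -> float:
    """Convert bytes/s to decimal megabytes/s (1 MB = 10**6 bytes)."""
    return bytes_per_s / 1_000_000


rate = to_mb_per_s(alert_value_bytes_per_s)
exceeded = rate > THRESHOLD_MB_PER_S
print(f"{rate:.2f} MB/s, threshold exceeded: {exceeded}")
```

Under that assumption the alert value is roughly 51.5 MB/s. If the threshold were binary (MiB/s) instead, the value would be about 49 MiB/s, which still exceeds 40.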
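II. Troubleshooting: From System Tools to Container Analysis
1. Locate high-I/O processes with iotop
First, use iotop to watch disk I/O activity in real time. The commands below install it and take two batch-mode samples (-b -n 2) five seconds apart (-d 5), listing only the processes or threads that are actually doing I/O (-o):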
sudo dnf install iotop -y
sudo iotop -b -n 2 -o -d 5
Output:
Total DISK READ : 0.00 B/s | Total DISK WRITE : 0.00 B/s
Actual DISK READ: 0.00 B/s | Actual DISK WRITE: 0.00 B/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO COMMAND
Total DISK READ : 37.39 M/s | Total DISK WRITE : 26.28 K/s
Actual DISK READ: 43.38 M/s | Actual DISK WRITE: 397.36 K/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO COMMAND
b'3923110 be/4 root 5.43 M/s 0.00 B/s 0.00 % 0.43 % bin/xray-linux-amd64 -c bin/config.json'
b'3435738 be/4 1001 1532.89 K/s 0.00 B/s 0.00 % 0.20 % nginx-prometheus-exporter -nginx.scrape-uri http://op:8080/stub_status'
b'3917012 be/4 root 7.36 M/s 0.00 B/s 0.00 % 0.18 % x-ui'
b'2747955 be/4 root 8.92 M/s 0.00 B/s 0.18 % 0.10 % dockerd -H fd:// --containerd=/run/containerd/containerd.sock'
b'2944966 be/4 root 5.86 M/s 0.00 B/s 0.48 % 0.10 % containerd'
b'3917011 be/4 root 2.68 M/s 3.98 K/s 0.00 % 0.05 % x-ui'
b'3435893 be/4 nobody 633.06 K/s 0.00 B/s 0.00 % 0.04 % nginx: worker process'
b' 1 be/4 root 85.20 K/s 0.00 B/s 0.00 % 0.03 % systemd --system --deserialize 23'
b'3356573 be/4 earic 190.32 K/s 0.00 B/s 0.00 % 0.03 % systemd --user'
b'3849162 be/4 root 1341.77 K/s 0.00 B/s 0.04 % 0.03 % dockerd -H fd:// --containerd=/run/containerd/containerd.sock'
b'3435851 be/4 root 848.86 K/s 0.00 B/s 0.01 % 0.02 % containerd-shim-runc-v2 -namespace moby -id 3b66333a179ac3f90de35416eae66608abb38e514f7678c38cef9ad82933b4fe -address /run/containerd/containerd.sock'
b'3464093 be/4 root 222.17 K/s 0.00 B/s 0.00 % 0.02 % platform-python -s /sbin/iotop -b -n 2 -o -d 5'
b' 48 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.01 % [kswapd0]'
b'3433415 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.01 % [kworker/0:1-events]'
b'3847960 be/4 root 1004.14 K/s 0.00 B/s 0.22 % 0.01 % dockerd -H fd:// --containerd=/run/containerd/containerd.sock'
b' 439 be/3 root 0.00 B/s 19.11 K/s 0.00 % 0.01 % [jbd2/vda1-8]'
b'3791945 be/4 root 6.37 K/s 815.42 B/s 0.03 % 0.01 % dockerd -H fd:// --containerd=/run/containerd/containerd.sock'
b'3435521 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.01 % [kworker/u160:1-flush-253:0]'
b'3819152 be/4 root 25.48 K/s 0.00 B/s 0.00 % 0.00 % dockerd -H fd:// --containerd=/run/containerd/containerd.sock'
b'3435850 be/4 root 3.98 K/s 815.42 B/s 0.00 % 0.00 % containerd-shim-runc-v2 -namespace moby -id 3b66333a179ac3f90de35416eae66608abb38e514f7678c38cef9ad82933b4fe -address /run/containerd/containerd.sock'
b' 46848 be/4 root 652.17 K/s 0.00 B/s 0.24 % 0.00 % containerd-shim-runc-v2 -namespace moby -id 622986b1e28ba7eb1a56d2ebe0f3bf4bcabaf802eea14622de9bd9a499121eaa -address /run/containerd/containerd.sock'
b'3848008 be/4 root 1630.83 B/s 0.00 B/s 0.00 % 0.00 % dockerd -H fd:// --containerd=/run/containerd/containerd.sock'
b'3435650 be/4 root 35.83 K/s 0.00 B/s 0.01 % 0.00 % containerd-shim-runc-v2 -namespace moby -id 9b63ba347601219fb7100fe32fa9bda479f2035a1d57275034d97b8cedf84a62 -address /run/containerd/containerd.sock'
b'3435652 be/4 root 7.17 K/s 815.42 B/s 0.09 % 0.00 % containerd-shim-runc-v2 -namespace moby -id 9b63ba347601219fb7100fe32fa9bda479f2035a1d57275034d97b8cedf84a62 -address /run/containerd/containerd.sock'
b'2747984 be/4 root 11.94 K/s 0.00 B/s 0.07 % 0.00 % dockerd -H fd:// --containerd=/run/containerd/containerd.sock'
b'3435645 be/4 root 15.13 K/s 815.42 B/s 0.07 % 0.00 % containerd-shim-runc-v2 -namespace moby -id 20dc9ceb3a19eced06a6f9878ca58814920f1c1b0aae617a01e1ac389ab82f80 -address /run/containerd/containerd.sock'
b'3435648 be/4 root 70.87 K/s 0.00 B/s 0.00 % 0.00 % containerd-shim-runc-v2 -namespace moby -id 20dc9ceb3a19eced06a6f9878ca58814920f1c1b0aae617a01e1ac389ab82f80 -address /run/containerd/containerd.sock'
b'3435884 be/4 root 815.42 B/s 0.00 B/s 0.02 % 0.00 % containerd-shim-runc-v2 -namespace moby -id 20dc9ceb3a19eced06a6f9878ca58814920f1c1b0aae617a01e1ac389ab82f80 -address /run/containerd/containerd.sock'
b'3437790 be/4 root 3.98 K/s 0.00 B/s 0.02 % 0.00 % containerd-shim-runc-v2 -namespace moby -id 20dc9ceb3a19eced06a6f9878ca58814920f1c1b0aae617a01e1ac389ab82f80 -address /run/containerd/containerd.sock'
b'3435651 be/4 root 815.42 B/s 0.00 B/s 0.00 % 0.00 % containerd-shim-runc-v2 -namespace moby -id 9b63ba347601219fb7100fe32fa9bda479f2035a1d57275034d97b8cedf84a62 -address /run/containerd/containerd.sock'
b'3435722 be/4 root 11.15 K/s 0.00 B/s 0.05 % 0.00 % containerd-shim-runc-v2 -namespace moby -id 9b63ba347601219fb7100fe32fa9bda479f2035a1d57275034d97b8cedf84a62 -address /run/containerd/containerd.sock'
b'3436061 be/4 root 43.00 K/s 0.00 B/s 0.20 % 0.00 % v2ray run -c /etc/v2ray/config.json'
b'3435810 be/4 root 815.42 B/s 0.00 B/s 0.01 % 0.00 % containerd-shim-runc-v2 -namespace moby -id
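Preliminary conclusion: the dockerd process has the highest disk read rate of any single entry, 8.92 M/s, which makes it the prime suspect for the high I/O.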
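Without the -P option, iotop lists one row per thread (TID), so a process such as dockerd or x-ui appears several times. Summing the read rates per command gives a fairer ranking. The following is a minimal sketch of that aggregation over a few rows copied from the output above; the function name is hypothetical, and treating K/M/G as powers of 1024 is an assumption about iotop's unit formatting.

```python
from collections import defaultdict

# Assumption: iotop's K/M/G suffixes are powers of 1024.
UNIT = {"B/s": 1, "K/s": 1024, "M/s": 1024**2, "G/s": 1024**3}

# A few rows copied from the iotop output above (b'...' wrapper included).
SAMPLE = [
    "TID PRIO USER DISK READ DISK WRITE SWAPIN IO COMMAND",
    "b'3923110 be/4 root 5.43 M/s 0.00 B/s 0.00 % 0.43 % bin/xray-linux-amd64 -c bin/config.json'",
    "b'3917012 be/4 root 7.36 M/s 0.00 B/s 0.00 % 0.18 % x-ui'",
    "b'2747955 be/4 root 8.92 M/s 0.00 B/s 0.18 % 0.10 % dockerd -H fd:// --containerd=/run/containerd/containerd.sock'",
    "b'2944966 be/4 root 5.86 M/s 0.00 B/s 0.48 % 0.10 % containerd'",
    "b'3917011 be/4 root 2.68 M/s 3.98 K/s 0.00 % 0.05 % x-ui'",
    "b'3849162 be/4 root 1341.77 K/s 0.00 B/s 0.04 % 0.03 % dockerd -H fd:// --containerd=/run/containerd/containerd.sock'",
]


def read_rate_by_command(lines):
    """Sum DISK READ (bytes/s) per command name, highest first."""
    totals = defaultdict(float)
    for raw in lines:
        line = raw.strip()
        if line.startswith("b'"):
            line = line[2:].rstrip("'")  # drop the bytes-repr wrapper
        parts = line.split(None, 11)
        # Data rows start with a numeric TID; headers and summaries do not.
        if len(parts) < 12 or not parts[0].isdigit():
            continue
        value, unit, command = parts[3], parts[4], parts[11]
        if unit not in UNIT:
            continue
        totals[command.split()[0]] += float(value) * UNIT[unit]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


for name, rate in read_rate_by_command(SAMPLE):
    print(f"{name:<24} {rate / 1024**2:8.2f} M/s")
```

On this subset, dockerd still ranks first once its threads are added together, with x-ui close behind, which is consistent with looking at Docker and its containers next.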
2. Analyze the Docker service logs
Next, check the docker.service logs for anomalies:
sudo journalctl -u docker.service
Log excerpt:
13:38:20 mg-1c dockerd[2747778]: time="2025-02-20T13:38:20.318316369+08:00" level=error msg="stream copy e>
Feb 20 13:38:20 mg-1c dockerd[2747778]: time="2025-02-20T13:38:20.387341951+08:00" level=error msg="stream copy e>
Feb 20 13:38:20 mg-1c dockerd[2747778]: time="2025-02-20T13:38:20.387430426+08:00" level=error msg="stream copy e>
Feb 20 13:38:31 mg-1c dockerd[2747778]: time="2025-02-20T13:38:28.782679435+08:00" level=warning msg="[resolver] >
Feb 20 13:46:53 mg-1c dockerd[2747778]: time="2025-02-20T13:46:35.937347186+08:00" level=warning msg="Health chec>
Feb 20 13:46:53 mg-1c dockerd[2747778]: time="2025-02-20T13:46:37.403404522+08:00" level=warning msg="Health chec>
Feb 20 13:46:54 mg-1c dockerd[2747778]: time="2025-02-20T13:46:37.403414350+08:00" level=warning msg="Health chec>
Feb 20 13:46:57 mg-1c dockerd[2747778]: time="2025-02-20T13:46:57.142588124+08:00" level=error msg="stream copy e>
Feb 20 13:46:57 mg-1c dockerd[2747778]: time="2025-02-20T13:46:57.158526912+08:00" level=error msg="stream copy e>
Feb 20 13:46:57 mg-1c dockerd[2747778]: time="2025-02-20T13:46:57.241337471+08:00" level=error msg="stream copy e>
Feb 20 13:46:57 mg-1c dockerd[2747778]: time="2025-02-20T13:46:57.242566668+08:00" level=error msg="stream copy e>
Feb 20 14:00:19 mg-1c dockerd[2747778]: time="2025-02-20T14:00:19.072714004+08:00" level=error msg="stream copy e>
Feb 20 14:00:19 mg-1c dockerd[2747778]: time="2025-02-20T14:00:19.074344978+08:00" level=error msg="stream copy e>
Feb 20 14:00:27 mg-1c dockerd[2747778]: time="2025-02-20T14:00:27.858465605+08:00" level=info msg="ignoring event>
Feb 20 14:10:50 mg-1c dockerd[2747778]: time="2025-02-20T14:10:50.162083112+08:00" level=info msg="ignoring event>
Feb 20 14:10:50 mg-1c dockerd[2747778]: time="2025-02-20T14:10:50.369688649+08:00" level=warning msg="failed to c>
Feb 20 22:56:53 mg-1c dockerd[2747778]: 2025/02/20 22:56:53 http: superfluous response.WriteHeader call from go.o>
Feb 21 01:00:02 mg-1c dockerd[2747778]: time="2025-02-21T01:00:02.534369694+08:00" level=warning msg="failed to p>
Feb 21 13:50:05 mg-1c dockerd[2747778]: time="2025-02-21T13:50:04.838516674+08:00" level=warning msg="Health chec>
Feb 21 13:50:05 mg-1c dockerd[2747778]: time="2025-02-21T13:50:04.963386516+08:00" level=warning msg="Health chec>
Feb 21 13:50:05 mg-1c dockerd[2747778]: time="2025-02-21T13:50:04.513722115+08:00" level=warning msg="Health chec>
Feb 21 13:50:06 mg-1c dockerd[2747778]: time="2025-02-21T13:50:06.148703392+08:00" level=error msg="stream copy e>
Feb 21 13:50:06 mg-1c dockerd[2747778]: time="2025-02-21T13:50:06.479503883+08:00" level=error msg="stream copy e>
Feb 21 13:50:07 mg-1c dockerd[2747778]: time="2025-02-21T13:50:07.005172231+08:00" level=error msg="stream copy e>
Feb 21 13:50:07 mg-1c dockerd[2747778]: time="2025-02-21T13:50:07.391353173+08:00" level=error msg="stream copy e>
Feb 21 13:50:07 mg-1c dockerd[2747778]: time="2025-02-21T13:50:07.416658301+08:00" level=error msg="stream copy e>
Feb 21 13:50:07 mg-1c dockerd[2747778]: time="2025-02-21T13:50:07.416813781+08:00" level=error msg="stream copy e>
Feb 21 14:09:13 mg-1c dockerd[2747778]: time="2025-02-21T14:09:12.928474967+08:00" level=error msg="[resolver] fa>
Feb 21 14:12:46 mg-1c dockerd[2747778]: time="2025-02-21T14:12:45.732464857+08:00" level=warning msg="Health chec>
Feb 21 14:12:47 mg-1c dockerd[2747778]: time="2025-02-21T14:12:45.978509882+08:00" level=warning msg="Health chec>
Feb 21 14:12:48 mg-1c dockerd[2747778]: time="2025-02-21T14:12:46.149816150+08:00" level=warning msg="Health chec>
Feb 21 14:12:49 mg-1c dockerd[2747778]: time="2025-02-21T14:12:47.886928997+08:00" level=error msg="stream copy e>
Feb 21 14:12:49 mg-1c dockerd[2747778]: time="2025-02-21T14:12:49.384400178+08:00" level=error msg="stream copy e>
Feb 21 14:12:49 mg-1c dockerd[2747778]: time="2025-02-21T14:12:47.887093398+08:00" level=error msg="stream copy e>
Feb 21 14:12:49 mg-1c dockerd[2747778]: time="2025-02-21T14:12:49.384384422+08:00" level=error msg="stream copy e>
Feb 21 14:12:50 mg-1c dockerd[2747778]: time="2025-02-21T14:12:50.341426622+08:00" level=error msg="stream copy e>
Feb 21 14:12:50 mg-1c dockerd[2747778]: time="2025-02-21T14:12:50.341508793+08:00" level=error msg="stream copy e>
Feb 21 14:19:46 mg-1c dockerd[2747778]: time="2025-02-21T14:19:44.497933371+08:00" level=error msg="[resolver] fa>
Feb 21 15:07:13 mg-1c dockerd[2747778]: time="2025-02-21T15:06:52.787931444+08:00" level=warning msg="Health chec>
Feb 21 15:07:13 mg-1c dockerd[2747778]: time="2025-02-21T15:06:52.807545713+08:00" level=warning msg="Health chec>
Feb 21 15:07:13 mg-1c dockerd[2747778]: time="2025-02-21T15:06:52.552786141+08:00" level=warning msg="Health chec>
Feb 21 15:07:40 mg-1c dockerd[2747778]: time="2025-02-21T15:07:40.726333113+08:00" level=info msg="ignoring event>
Feb 21 15:10:12 mg-1c dockerd[2747778]: time="2025-02-21T15:10:03.400444454+08:00" level=warning msg="Health chec>
Feb 21 15:10:15 mg-1c dockerd[2747778]: time="2025-02-21T15:10:03.358478493+08:00" level=warning msg="Health chec>
Feb 21 15:10:18 mg-1c dockerd[2747778]: time="2025-02-21T15:10:11.368754157+08:00" level=error msg="stream copy e>
Feb 21 15:10:19 mg-1c dockerd[2747778]: time="2025-02-21T15:10:03.659393539+08:00" level=warning msg="Health chec>
Feb 21 15:10:20 mg-1c dockerd[2747778]: time="2025-02-21T15:10:11.368622930+08:00" level=error msg="stream copy e>
Feb 21 15:10:21 mg-1c dockerd[2747778]: time="2025-02-21T15:10:11.368758006+08:00" level=error msg="stream copy e>
Feb 21 15:10:21 mg-1c dockerd[2747778]: time="2025-02-21T15:10:11.684465150+08:00" level=error msg="stream copy e>
Feb 21 15:10:32 mg-1c dockerd[2747778]: time="2025-02-21T15:10:31.959187143+08:00" level=error msg="stream copy e>
Feb 21 15:10:32 mg-1c dockerd[2747778]: time="2025-02-21T15:10:32.205480578+08:00" level=error msg="stream copy e>
Feb 21 15:18:54 mg-1c dockerd[2747778]: time="2025-02-21T15:17:49.305525742+08:00" level=warning msg="Health chec>
Feb 21 15:18:54 mg-1c dockerd[2747778]: time="2025-02-21T15:17:47.612571505+08:00" level=warning msg="Health chec>
Feb 21 15:18:54 mg-1c dockerd[2747778]: time="2025-02-21T15:17:48.586391377+08:00" level=warning msg="Health chec>
Feb 21 15:21:31 mg-1c dockerd[2747778]: time="2025-02-21T15:21:31.635431516+08:00" level=info msg="ignoring event>
Feb 21 15:24:05 mg-1c dockerd[2747778]: time="2025-02-21T15:24:04.906699298+08:00" level=info msg="ignoring event>
Feb 21 15:40:19 mg-1c dockerd[2747778]: time="2025-02-21T15:40:19.578083505+08:00" level=info msg="ignoring event>
Feb 21 15:40:19 mg-1c dockerd[2747778]: time="2025-02-21T15:40:19.788735584+08:00" level=warning msg="failed to c>
Feb 21 15:41:39 mg-1c dockerd[2747778]: time="2025-02-21T15:41:39.691650341+08:00" level=info msg="ignoring event>
Feb 21 15:41:39 mg-1c dockerd[2747778]: time="2025-02-21T15:41:39.817460633+08:00" level=warning msg="ShouldResta>
Feb 21 15:41:41 mg-1c dockerd[2747778]: time="2025-02-21T15:41:41.701752737+08:00" level=info msg="ignoring event>
Feb 21 15:41:41 mg-1c dockerd[2747778]: time="2025-02-21T15:41:41.720489104+08:00" level=info msg="ignoring event>
Feb 21 15:41:41 mg-1c dockerd[2747778]: time="2025-02-21T15:41:41.758900634+08:00" level=warning msg="ShouldResta>
Feb 21 15:41:41 mg-1c dockerd[2747778]: time="2025-02-21T15:41:41.759159753+08:00" level=warning msg="ShouldResta>
Feb 21 15:42:21 mg-1c dockerd[2747778]: 2025/02/21 15:42:21 http: superfluous response.WriteHeader call from go.o>
Feb 21 15:42:36 mg-1c dockerd[2747778]: time="2025-02-21T15:42:36.492336623+08:00" level=error msg="[resolver] fa>
Feb 21 15:42:36 mg-1c dockerd[2747778]: time="2025-02-21T15:42:36.498797995+08:00" level=error msg="[resolver] fa>
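Finding: errors such as stream copy error appear frequently in the log (the pager cut the messages off at the terminal width), but none of them points to a specific container, so further investigation is needed.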
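To see which message types dominate instead of scanning the log by eye, the entries can be tallied by level and message prefix. This is a minimal sketch over four lines copied from the excerpt above; the regular expression and function name are illustrative, and because the sample messages are truncated by the pager, only their visible prefixes are counted.

```python
import re
from collections import Counter

# Capture the log level and the visible start of the message.
# The pager's ">" truncation marker is excluded from the message.
LEVEL_RE = re.compile(r'level=(\w+) msg="([^">]*)')

# Lines copied from the journalctl excerpt above.
SAMPLE = [
    'Feb 21 15:10:15 mg-1c dockerd[2747778]: time="2025-02-21T15:10:03.358478493+08:00" level=warning msg="Health chec>',
    'Feb 21 15:10:18 mg-1c dockerd[2747778]: time="2025-02-21T15:10:11.368754157+08:00" level=error msg="stream copy e>',
    'Feb 21 15:10:20 mg-1c dockerd[2747778]: time="2025-02-21T15:10:11.368622930+08:00" level=error msg="stream copy e>',
    'Feb 21 15:42:21 mg-1c dockerd[2747778]: 2025/02/21 15:42:21 http: superfluous response.WriteHeader call from go.o>',
]


def summarize(lines):
    """Count log entries by (level, message prefix); skip unstructured lines."""
    counts = Counter()
    for line in lines:
        match = LEVEL_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2).strip())] += 1
    return counts


for (level, message), count in summarize(SAMPLE).most_common():
    print(f"{count:3d}  {level:<8} {message}")
```

Running journalctl with --no-pager and a --since/--until window around the alert time should give untruncated messages for the relevant period, which would make the tally more informative.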
3. Monitor container I/O with docker stats
Use docker stats to check each container's resource usage and find the container with the heaviest I/O:
sudo docker stats
Output:
CONTAINER ID   NAME             CPU %   MEM USAGE / LIMIT     MEM %    NET I/O           BLOCK I/O         PIDS
3b66333a179a   op               0.79%   4.473MiB / 120MiB     3.73%    2.4MB / 2.52MB    587MB / 0B        8
9b63ba347601   v2ray            3.14%   23.75MiB / 100MiB     23.75%   2.02MB / 1.98MB   145MB / 0B        7
20dc9ceb3a19   nginx-exporter   2.25%   7.883MiB / 80MiB      9.85%    21.8kB / 41.8kB   847MB / 0B        12
622986b1e28b   x-ui             0.02%   28.46MiB / 450.3MiB   6.32%    861MB / 851MB     3.52TB / 53.2kB   21
6da09f29c722   node             0.00%   9.5MiB / 300MiB       3.17%    76.8MB / 1.26GB   359GB / 0B        7
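Key finding: the x-ui container has read 3.52 TB from disk, far more than any other container, which makes it the main cause of the high I/O. Note that BLOCK I/O is a cumulative total since the container started, not a current rate.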
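The BLOCK I/O column mixes units (MB, GB, TB), which makes the comparison easy to misread. The sketch below converts the read side of each value from the table above to bytes and ranks the containers; the helper names are hypothetical, and the decimal unit table reflects how docker stats formats these sizes.

```python
# docker stats prints BLOCK I/O with decimal units (kB, MB, GB, TB).
UNITS = {"B": 1, "kB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}

# "read / write" values copied from the BLOCK I/O column above.
BLOCK_IO = {
    "op": "587MB / 0B",
    "v2ray": "145MB / 0B",
    "nginx-exporter": "847MB / 0B",
    "x-ui": "3.52TB / 53.2kB",
    "node": "359GB / 0B",
}


def parse_size(text: str) -> float:
    """Convert a size such as '3.52TB' to bytes."""
    # Try two-letter suffixes first so that 'MB' is not mistaken for 'B'.
    for suffix in sorted(UNITS, key=len, reverse=True):
        if text.endswith(suffix):
            return float(text[: -len(suffix)]) * UNITS[suffix]
    raise ValueError(f"unrecognized size: {text!r}")


def rank_by_reads(block_io: dict) -> list:
    """Return (container, bytes read) pairs, heaviest reader first."""
    reads = {name: parse_size(value.split("/")[0].strip())
             for name, value in block_io.items()}
    return sorted(reads.items(), key=lambda kv: kv[1], reverse=True)


ranking = rank_by_reads(BLOCK_IO)
total = sum(read for _, read in ranking)
for name, read in ranking:
    print(f"{name:<16} {read / 1e9:10.2f} GB  {read / total:6.1%}")
```

With these figures x-ui accounts for roughly nine tenths of all bytes read by the listed containers, and node is a distant second. To feed live data into the same function, docker stats --no-stream --format "{{.Name}}\t{{.BlockIO}}" should produce one name and value pair per line.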
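III. Root Cause: The x-ui Container Drives the High I/O
The steps above identify the x-ui container as the root cause of the high disk I/O. Its abnormally large disk read volume (3.52 TB) severely degraded system performance.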
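Possible causes:
- The application inside the container reads from and writes to disk very frequently
- The container is misconfigured, leading to excessive resource consumption
- I/O scheduling problems between the container and the host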
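IV. Solution: Shut Down the Offending Container
To restore system performance quickly, we decided to shut down the x-ui container. Run the command from the directory that holds its Compose file: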
docker-compose down
Verification: after the container was shut down, disk I/O returned to normal and system performance stabilized.
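One way to confirm the recovery with numbers is to sample /proc/diskstats twice and compute the read rate from the difference in sectors read. The sketch below shows the calculation on two hypothetical snapshots; the device name vda is inferred from the jbd2/vda1-8 entry in the iotop output, and the counter values are invented for illustration.

```python
SECTOR_BYTES = 512  # /proc/diskstats always counts 512-byte sectors

# Hypothetical snapshots taken 5 seconds apart (values invented for illustration).
# Columns: major minor name reads_completed reads_merged sectors_read ...
BEFORE = " 253       0 vda 1000 0 2000000 500 300 0 40000 200 0 600 700"
AFTER = " 253       0 vda 1400 0 2500000 650 310 0 40800 210 0 750 860"


def sectors_read(diskstats_text: str, device: str) -> int:
    """Return the cumulative sectors-read counter for one device."""
    for line in diskstats_text.splitlines():
        fields = line.split()
        if len(fields) >= 6 and fields[2] == device:
            return int(fields[5])
    raise KeyError(device)


def read_rate_mb_s(before: str, after: str, device: str, interval_s: float) -> float:
    """Average read rate in decimal MB/s between two snapshots."""
    delta = sectors_read(after, device) - sectors_read(before, device)
    return delta * SECTOR_BYTES / interval_s / 1_000_000


print(f"{read_rate_mb_s(BEFORE, AFTER, 'vda', 5):.1f} MB/s")
```

On the real host, the two snapshots would come from reading /proc/diskstats before and after a time.sleep of the chosen interval; a rate well below the 40 MB/s threshold would support the conclusion above.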
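V. Summary and Recommendations
Summary: This investigation used iotop to locate the high-I/O processes, then combined the docker.service logs with docker stats monitoring to identify and resolve the high disk I/O caused by the x-ui container.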
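Recommendations:
- Monitor container resource usage regularly: use tools such as docker stats to check container performance periodically and catch anomalies early.
- Optimize container configuration: set sensible resource limits so that no single container consumes excessive system resources.
- Log analysis and alerting: improve log collection and alerting so that problems are detected and handled quickly.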
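Because docker stats reports cumulative block I/O, regular monitoring has to compare two samples to obtain a rate. The following is a rough sketch of such a check, assuming per-container read totals in bytes have already been collected at two points in time; the function name, sample values, and the reuse of the 40 MB/s limit are all illustrative.

```python
def flag_heavy_readers(prev: dict, curr: dict, interval_s: float,
                       limit_bytes_per_s: float) -> dict:
    """Return {container: read rate in bytes/s} for containers above the limit."""
    flagged = {}
    for name, total in curr.items():
        if name not in prev:
            continue  # container appeared between samples; no baseline yet
        delta = total - prev[name]
        if delta < 0:
            continue  # counter went backwards, e.g. the container restarted
        rate = delta / interval_s
        if rate > limit_bytes_per_s:
            flagged[name] = rate
    return flagged


# Hypothetical cumulative read totals (bytes) sampled 60 seconds apart.
previous = {"x-ui": 3_520_000_000_000, "node": 359_000_000_000}
current = {"x-ui": 3_522_500_000_000, "node": 359_006_000_000}

flagged = flag_heavy_readers(previous, current, interval_s=60,
                             limit_bytes_per_s=40_000_000)
for name, rate in flagged.items():
    print(f"{name}: {rate / 1e6:.1f} MB/s exceeds the limit")
```

A check like this could run on a schedule and feed the existing alerting channel, so that the offending container is named in the alert itself.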