Efficient Troubleshooting of High Disk IO: A Complete Walkthrough from iotop to Docker


I. Background: An Abnormal Disk IO Alert

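On February 21, 2025 at 15:31:28, a server raised a high disk IO alert because its disk read rate exceeded 40 MB/s. Monitoring showed the disk read bytes on instance mg-1c climbing abnormally, which pointed to a possible performance bottleneck.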

Alert details

Triggered: 2025.02.21 15:31:28
Ended: 2025.02.21 15:40:28
Event details:
description: Disk read bytes rate exceeds 40 MB/s
summary: instance: mg-1c disk read bytes rate too high
value: 5.152714752e+07
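The alert's value field appears to be a raw bytes-per-second reading. This is an assumption, because the alert text does not state the unit. A quick conversion shows how far the reading sits above the 40 MB/s threshold. It is given in both decimal megabytes and binary mebibytes, because monitoring tools disagree on which of the two "MB" means. The function name is illustrative.

```shell
#!/usr/bin/env bash
# Convert a raw bytes/s reading into MB/s (10^6 bytes) and MiB/s (2^20 bytes).
# Assumes the alert value is expressed in bytes per second.
bytes_to_mb() {
  awk -v v="$1" 'BEGIN { printf "%.2f MB/s (%.2f MiB/s)\n", v / 1000000, v / 1048576 }'
}

bytes_to_mb 5.152714752e+07   # prints: 51.53 MB/s (49.14 MiB/s)
```

Both figures are above 40, so the alert fires whichever definition of MB the rule uses.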

II. Troubleshooting Steps: From System Tools to Container Analysis

1. Locating high-IO processes with iotop

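First, use iotop to watch disk IO activity in real time. Install it and run it in batch mode (-b) for two iterations (-n 2) at a 5-second interval (-d 5). The -o flag limits the listing to threads that are actually doing IO: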

sudo dnf install iotop -y
sudo iotop -b -n 2 -o -d 5

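Output (the first iteration has no earlier sample to compare against, so it reports zeros; the real figures are in the second):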

Total DISK READ :       0.00 B/s | Total DISK WRITE :       0.00 B/s
Actual DISK READ:       0.00 B/s | Actual DISK WRITE:       0.00 B/s
    TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN      IO    COMMAND
Total DISK READ :      37.39 M/s | Total DISK WRITE :      26.28 K/s
Actual DISK READ:      43.38 M/s | Actual DISK WRITE:     397.36 K/s
    TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN      IO    COMMAND
b'3923110 be/4 root        5.43 M/s    0.00 B/s  0.00 %  0.43 % bin/xray-linux-amd64 -c bin/config.json'
b'3435738 be/4 1001     1532.89 K/s    0.00 B/s  0.00 %  0.20 % nginx-prometheus-exporter -nginx.scrape-uri http://op:8080/stub_status'
b'3917012 be/4 root        7.36 M/s    0.00 B/s  0.00 %  0.18 % x-ui'
b'2747955 be/4 root        8.92 M/s    0.00 B/s  0.18 %  0.10 % dockerd -H fd:// --containerd=/run/containerd/containerd.sock'
b'2944966 be/4 root        5.86 M/s    0.00 B/s  0.48 %  0.10 % containerd'
b'3917011 be/4 root        2.68 M/s    3.98 K/s  0.00 %  0.05 % x-ui'
b'3435893 be/4 nobody    633.06 K/s    0.00 B/s  0.00 %  0.04 % nginx: worker process'
b'      1 be/4 root       85.20 K/s    0.00 B/s  0.00 %  0.03 % systemd --system --deserialize 23'
b'3356573 be/4 earic     190.32 K/s    0.00 B/s  0.00 %  0.03 % systemd --user'
b'3849162 be/4 root     1341.77 K/s    0.00 B/s  0.04 %  0.03 % dockerd -H fd:// --containerd=/run/containerd/containerd.sock'
b'3435851 be/4 root      848.86 K/s    0.00 B/s  0.01 %  0.02 % containerd-shim-runc-v2 -namespace moby -id 3b66333a179ac3f90de35416eae66608abb38e514f7678c38cef9ad82933b4fe -address /run/containerd/containerd.sock'
b'3464093 be/4 root      222.17 K/s    0.00 B/s  0.00 %  0.02 % platform-python -s /sbin/iotop -b -n 2 -o -d 5'
b'     48 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.01 % [kswapd0]'
b'3433415 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.01 % [kworker/0:1-events]'
b'3847960 be/4 root     1004.14 K/s    0.00 B/s  0.22 %  0.01 % dockerd -H fd:// --containerd=/run/containerd/containerd.sock'
b'    439 be/3 root        0.00 B/s   19.11 K/s  0.00 %  0.01 % [jbd2/vda1-8]'
b'3791945 be/4 root        6.37 K/s  815.42 B/s  0.03 %  0.01 % dockerd -H fd:// --containerd=/run/containerd/containerd.sock'
b'3435521 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.01 % [kworker/u160:1-flush-253:0]'
b'3819152 be/4 root       25.48 K/s    0.00 B/s  0.00 %  0.00 % dockerd -H fd:// --containerd=/run/containerd/containerd.sock'
b'3435850 be/4 root        3.98 K/s  815.42 B/s  0.00 %  0.00 % containerd-shim-runc-v2 -namespace moby -id 3b66333a179ac3f90de35416eae66608abb38e514f7678c38cef9ad82933b4fe -address /run/containerd/containerd.sock'
b'  46848 be/4 root      652.17 K/s    0.00 B/s  0.24 %  0.00 % containerd-shim-runc-v2 -namespace moby -id 622986b1e28ba7eb1a56d2ebe0f3bf4bcabaf802eea14622de9bd9a499121eaa -address /run/containerd/containerd.sock'
b'3848008 be/4 root     1630.83 B/s    0.00 B/s  0.00 %  0.00 % dockerd -H fd:// --containerd=/run/containerd/containerd.sock'
b'3435650 be/4 root       35.83 K/s    0.00 B/s  0.01 %  0.00 % containerd-shim-runc-v2 -namespace moby -id 9b63ba347601219fb7100fe32fa9bda479f2035a1d57275034d97b8cedf84a62 -address /run/containerd/containerd.sock'
b'3435652 be/4 root        7.17 K/s  815.42 B/s  0.09 %  0.00 % containerd-shim-runc-v2 -namespace moby -id 9b63ba347601219fb7100fe32fa9bda479f2035a1d57275034d97b8cedf84a62 -address /run/containerd/containerd.sock'
b'2747984 be/4 root       11.94 K/s    0.00 B/s  0.07 %  0.00 % dockerd -H fd:// --containerd=/run/containerd/containerd.sock'
b'3435645 be/4 root       15.13 K/s  815.42 B/s  0.07 %  0.00 % containerd-shim-runc-v2 -namespace moby -id 20dc9ceb3a19eced06a6f9878ca58814920f1c1b0aae617a01e1ac389ab82f80 -address /run/containerd/containerd.sock'
b'3435648 be/4 root       70.87 K/s    0.00 B/s  0.00 %  0.00 % containerd-shim-runc-v2 -namespace moby -id 20dc9ceb3a19eced06a6f9878ca58814920f1c1b0aae617a01e1ac389ab82f80 -address /run/containerd/containerd.sock'
b'3435884 be/4 root      815.42 B/s    0.00 B/s  0.02 %  0.00 % containerd-shim-runc-v2 -namespace moby -id 20dc9ceb3a19eced06a6f9878ca58814920f1c1b0aae617a01e1ac389ab82f80 -address /run/containerd/containerd.sock'
b'3437790 be/4 root        3.98 K/s    0.00 B/s  0.02 %  0.00 % containerd-shim-runc-v2 -namespace moby -id 20dc9ceb3a19eced06a6f9878ca58814920f1c1b0aae617a01e1ac389ab82f80 -address /run/containerd/containerd.sock'
b'3435651 be/4 root      815.42 B/s    0.00 B/s  0.00 %  0.00 % containerd-shim-runc-v2 -namespace moby -id 9b63ba347601219fb7100fe32fa9bda479f2035a1d57275034d97b8cedf84a62 -address /run/containerd/containerd.sock'
b'3435722 be/4 root       11.15 K/s    0.00 B/s  0.05 %  0.00 % containerd-shim-runc-v2 -namespace moby -id 9b63ba347601219fb7100fe32fa9bda479f2035a1d57275034d97b8cedf84a62 -address /run/containerd/containerd.sock'
b'3436061 be/4 root       43.00 K/s    0.00 B/s  0.20 %  0.00 % v2ray run -c /etc/v2ray/config.json'
b'3435810 be/4 root      815.42 B/s    0.00 B/s  0.01 %  0.00 % containerd-shim-runc-v2 -namespace moby -id 

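Preliminary conclusion: the dockerd thread with TID 2747955 has the highest read rate in the list, 8.92 M/s, which makes dockerd the prime suspect. By default iotop lists threads rather than processes, which is why the column is TID. As a result, dockerd and x-ui each appear on several rows. Adding -P shows one row per process.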
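Because iotop lists threads, one quick way to see which program dominates is to add up the DISK READ column per command. The sketch below assumes the column layout of the batch output above. That includes the b'...' wrapper this iotop build prints around each row, which the sketch strips first. It also assumes that K/s and M/s are powers of 1024. The function name sum_read_by_command is illustrative.

```shell
#!/usr/bin/env bash
# Sum iotop's per-thread DISK READ column by command name (first word of COMMAND).
# Assumes the `iotop -b` column layout shown above; K/M/G are treated as 1024-based.
sum_read_by_command() {
  # Strip the b'...' wrapper that some Python 3 builds of iotop print.
  sed -e "s/^b'//" -e "s/'\$//" |
    awk '
      # Data rows start with a numeric TID; skip headers and Total/Actual lines.
      $1 ~ /^[0-9]+$/ && NF >= 12 {
        rate = $4; unit = $5
        if (unit == "K/s") rate *= 1024
        else if (unit == "M/s") rate *= 1024 * 1024
        else if (unit == "G/s") rate *= 1024 * 1024 * 1024
        sum[$12] += rate          # $12 is the first word of COMMAND
      }
      END {
        for (cmd in sum) printf "%.2f MiB/s\t%s\n", sum[cmd] / 1048576, cmd
      }' |
    sort -rn                      # busiest command first
}
```

Usage: sudo iotop -b -n 2 -o -d 5 | sum_read_by_command | head. With -n 2, only the second iteration contributes rows, so nothing is counted twice. Running more iterations would add their rows together.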


2. Analyzing the Docker service logs

Next, check the docker.service logs for anything unusual:

sudo journalctl -u docker.service

Log excerpt

Feb 20 13:38:20 mg-1c dockerd[2747778]: time="2025-02-20T13:38:20.318316369+08:00" level=error msg="stream copy e>
Feb 20 13:38:20 mg-1c dockerd[2747778]: time="2025-02-20T13:38:20.387341951+08:00" level=error msg="stream copy e>
Feb 20 13:38:20 mg-1c dockerd[2747778]: time="2025-02-20T13:38:20.387430426+08:00" level=error msg="stream copy e>
Feb 20 13:38:31 mg-1c dockerd[2747778]: time="2025-02-20T13:38:28.782679435+08:00" level=warning msg="[resolver] >
Feb 20 13:46:53 mg-1c dockerd[2747778]: time="2025-02-20T13:46:35.937347186+08:00" level=warning msg="Health chec>
Feb 20 13:46:53 mg-1c dockerd[2747778]: time="2025-02-20T13:46:37.403404522+08:00" level=warning msg="Health chec>
Feb 20 13:46:54 mg-1c dockerd[2747778]: time="2025-02-20T13:46:37.403414350+08:00" level=warning msg="Health chec>
Feb 20 13:46:57 mg-1c dockerd[2747778]: time="2025-02-20T13:46:57.142588124+08:00" level=error msg="stream copy e>
Feb 20 13:46:57 mg-1c dockerd[2747778]: time="2025-02-20T13:46:57.158526912+08:00" level=error msg="stream copy e>
Feb 20 13:46:57 mg-1c dockerd[2747778]: time="2025-02-20T13:46:57.241337471+08:00" level=error msg="stream copy e>
Feb 20 13:46:57 mg-1c dockerd[2747778]: time="2025-02-20T13:46:57.242566668+08:00" level=error msg="stream copy e>
Feb 20 14:00:19 mg-1c dockerd[2747778]: time="2025-02-20T14:00:19.072714004+08:00" level=error msg="stream copy e>
Feb 20 14:00:19 mg-1c dockerd[2747778]: time="2025-02-20T14:00:19.074344978+08:00" level=error msg="stream copy e>
Feb 20 14:00:27 mg-1c dockerd[2747778]: time="2025-02-20T14:00:27.858465605+08:00" level=info msg="ignoring event>
Feb 20 14:10:50 mg-1c dockerd[2747778]: time="2025-02-20T14:10:50.162083112+08:00" level=info msg="ignoring event>
Feb 20 14:10:50 mg-1c dockerd[2747778]: time="2025-02-20T14:10:50.369688649+08:00" level=warning msg="failed to c>
Feb 20 22:56:53 mg-1c dockerd[2747778]: 2025/02/20 22:56:53 http: superfluous response.WriteHeader call from go.o>
Feb 21 01:00:02 mg-1c dockerd[2747778]: time="2025-02-21T01:00:02.534369694+08:00" level=warning msg="failed to p>
Feb 21 13:50:05 mg-1c dockerd[2747778]: time="2025-02-21T13:50:04.838516674+08:00" level=warning msg="Health chec>
Feb 21 13:50:05 mg-1c dockerd[2747778]: time="2025-02-21T13:50:04.963386516+08:00" level=warning msg="Health chec>
Feb 21 13:50:05 mg-1c dockerd[2747778]: time="2025-02-21T13:50:04.513722115+08:00" level=warning msg="Health chec>
Feb 21 13:50:06 mg-1c dockerd[2747778]: time="2025-02-21T13:50:06.148703392+08:00" level=error msg="stream copy e>
Feb 21 13:50:06 mg-1c dockerd[2747778]: time="2025-02-21T13:50:06.479503883+08:00" level=error msg="stream copy e>
Feb 21 13:50:07 mg-1c dockerd[2747778]: time="2025-02-21T13:50:07.005172231+08:00" level=error msg="stream copy e>
Feb 21 13:50:07 mg-1c dockerd[2747778]: time="2025-02-21T13:50:07.391353173+08:00" level=error msg="stream copy e>
Feb 21 13:50:07 mg-1c dockerd[2747778]: time="2025-02-21T13:50:07.416658301+08:00" level=error msg="stream copy e>
Feb 21 13:50:07 mg-1c dockerd[2747778]: time="2025-02-21T13:50:07.416813781+08:00" level=error msg="stream copy e>
Feb 21 14:09:13 mg-1c dockerd[2747778]: time="2025-02-21T14:09:12.928474967+08:00" level=error msg="[resolver] fa>
Feb 21 14:12:46 mg-1c dockerd[2747778]: time="2025-02-21T14:12:45.732464857+08:00" level=warning msg="Health chec>
Feb 21 14:12:47 mg-1c dockerd[2747778]: time="2025-02-21T14:12:45.978509882+08:00" level=warning msg="Health chec>
Feb 21 14:12:48 mg-1c dockerd[2747778]: time="2025-02-21T14:12:46.149816150+08:00" level=warning msg="Health chec>
Feb 21 14:12:49 mg-1c dockerd[2747778]: time="2025-02-21T14:12:47.886928997+08:00" level=error msg="stream copy e>
Feb 21 14:12:49 mg-1c dockerd[2747778]: time="2025-02-21T14:12:49.384400178+08:00" level=error msg="stream copy e>
Feb 21 14:12:49 mg-1c dockerd[2747778]: time="2025-02-21T14:12:47.887093398+08:00" level=error msg="stream copy e>
Feb 21 14:12:49 mg-1c dockerd[2747778]: time="2025-02-21T14:12:49.384384422+08:00" level=error msg="stream copy e>
Feb 21 14:12:50 mg-1c dockerd[2747778]: time="2025-02-21T14:12:50.341426622+08:00" level=error msg="stream copy e>
Feb 21 14:12:50 mg-1c dockerd[2747778]: time="2025-02-21T14:12:50.341508793+08:00" level=error msg="stream copy e>
Feb 21 14:19:46 mg-1c dockerd[2747778]: time="2025-02-21T14:19:44.497933371+08:00" level=error msg="[resolver] fa>
Feb 21 15:07:13 mg-1c dockerd[2747778]: time="2025-02-21T15:06:52.787931444+08:00" level=warning msg="Health chec>
Feb 21 15:07:13 mg-1c dockerd[2747778]: time="2025-02-21T15:06:52.807545713+08:00" level=warning msg="Health chec>
Feb 21 15:07:13 mg-1c dockerd[2747778]: time="2025-02-21T15:06:52.552786141+08:00" level=warning msg="Health chec>
Feb 21 15:07:40 mg-1c dockerd[2747778]: time="2025-02-21T15:07:40.726333113+08:00" level=info msg="ignoring event>
Feb 21 15:10:12 mg-1c dockerd[2747778]: time="2025-02-21T15:10:03.400444454+08:00" level=warning msg="Health chec>
Feb 21 15:10:15 mg-1c dockerd[2747778]: time="2025-02-21T15:10:03.358478493+08:00" level=warning msg="Health chec>
Feb 21 15:10:18 mg-1c dockerd[2747778]: time="2025-02-21T15:10:11.368754157+08:00" level=error msg="stream copy e>
Feb 21 15:10:19 mg-1c dockerd[2747778]: time="2025-02-21T15:10:03.659393539+08:00" level=warning msg="Health chec>
Feb 21 15:10:20 mg-1c dockerd[2747778]: time="2025-02-21T15:10:11.368622930+08:00" level=error msg="stream copy e>
Feb 21 15:10:21 mg-1c dockerd[2747778]: time="2025-02-21T15:10:11.368758006+08:00" level=error msg="stream copy e>
Feb 21 15:10:21 mg-1c dockerd[2747778]: time="2025-02-21T15:10:11.684465150+08:00" level=error msg="stream copy e>
Feb 21 15:10:32 mg-1c dockerd[2747778]: time="2025-02-21T15:10:31.959187143+08:00" level=error msg="stream copy e>
Feb 21 15:10:32 mg-1c dockerd[2747778]: time="2025-02-21T15:10:32.205480578+08:00" level=error msg="stream copy e>
Feb 21 15:18:54 mg-1c dockerd[2747778]: time="2025-02-21T15:17:49.305525742+08:00" level=warning msg="Health chec>
Feb 21 15:18:54 mg-1c dockerd[2747778]: time="2025-02-21T15:17:47.612571505+08:00" level=warning msg="Health chec>
Feb 21 15:18:54 mg-1c dockerd[2747778]: time="2025-02-21T15:17:48.586391377+08:00" level=warning msg="Health chec>
Feb 21 15:21:31 mg-1c dockerd[2747778]: time="2025-02-21T15:21:31.635431516+08:00" level=info msg="ignoring event>
Feb 21 15:24:05 mg-1c dockerd[2747778]: time="2025-02-21T15:24:04.906699298+08:00" level=info msg="ignoring event>
Feb 21 15:40:19 mg-1c dockerd[2747778]: time="2025-02-21T15:40:19.578083505+08:00" level=info msg="ignoring event>
Feb 21 15:40:19 mg-1c dockerd[2747778]: time="2025-02-21T15:40:19.788735584+08:00" level=warning msg="failed to c>
Feb 21 15:41:39 mg-1c dockerd[2747778]: time="2025-02-21T15:41:39.691650341+08:00" level=info msg="ignoring event>
Feb 21 15:41:39 mg-1c dockerd[2747778]: time="2025-02-21T15:41:39.817460633+08:00" level=warning msg="ShouldResta>
Feb 21 15:41:41 mg-1c dockerd[2747778]: time="2025-02-21T15:41:41.701752737+08:00" level=info msg="ignoring event>
Feb 21 15:41:41 mg-1c dockerd[2747778]: time="2025-02-21T15:41:41.720489104+08:00" level=info msg="ignoring event>
Feb 21 15:41:41 mg-1c dockerd[2747778]: time="2025-02-21T15:41:41.758900634+08:00" level=warning msg="ShouldResta>
Feb 21 15:41:41 mg-1c dockerd[2747778]: time="2025-02-21T15:41:41.759159753+08:00" level=warning msg="ShouldResta>
Feb 21 15:42:21 mg-1c dockerd[2747778]: 2025/02/21 15:42:21 http: superfluous response.WriteHeader call from go.o>
Feb 21 15:42:36 mg-1c dockerd[2747778]: time="2025-02-21T15:42:36.492336623+08:00" level=error msg="[resolver] fa>
Feb 21 15:42:36 mg-1c dockerd[2747778]: time="2025-02-21T15:42:36.498797995+08:00" level=error msg="[resolver] fa>

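Findings: the log repeatedly shows stream copy error entries and Health check warnings. None of them points to a specific container, so further investigation was needed. The pager cuts each line off at the terminal width, which is what the trailing > marks. Adding --no-pager prints the full messages.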
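A per-level tally helps show whether errors cluster around the alert window. The sketch assumes dockerd's key=value log format shown above (level=error, level=warning, ...). Lines without a level= field, such as the "http: superfluous response.WriteHeader" entries, are skipped. The function name is illustrative.

```shell
#!/usr/bin/env bash
# Count dockerd journal lines by log level (error, warning, info, ...).
# Lines without a level= field are skipped.
count_by_level() {
  awk 'match($0, /level=[a-z]+/) {
         # RSTART/RLENGTH cover "level=xxx"; drop the 6-character "level=" prefix
         n[substr($0, RSTART + 6, RLENGTH - 6)]++
       }
       END { for (lvl in n) print lvl, n[lvl] }' | sort
}
```

Usage, with an example window around the alert: sudo journalctl -u docker.service --since "2025-02-21 15:00" --until "2025-02-21 15:45" --no-pager | count_by_level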


3. Monitoring container IO with docker stats

Use docker stats to check each container's resource usage and find the high-IO container:

sudo docker stats

Output

CONTAINER ID   NAME             CPU %     MEM USAGE / LIMIT     MEM %     NET I/O           BLOCK I/O         PIDS
3b66333a179a   op               0.79%     4.473MiB / 120MiB     3.73%     2.4MB / 2.52MB    587MB / 0B        8
9b63ba347601   v2ray            3.14%     23.75MiB / 100MiB     23.75%    2.02MB / 1.98MB   145MB / 0B        7
20dc9ceb3a19   nginx-exporter   2.25%     7.883MiB / 80MiB      9.85%     21.8kB / 41.8kB   847MB / 0B        12
622986b1e28b   x-ui             0.02%     28.46MiB / 450.3MiB   6.32%     861MB / 851MB     3.52TB / 53.2kB   21
6da09f29c722   node             0.00%     9.5MiB / 300MiB       3.17%     76.8MB / 1.26GB   359GB / 0B        7

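Key finding: the x-ui container has read 3.52 TB from disk. That is about ten times the next highest container (node, 359 GB), which makes x-ui the main source of the high IO. BLOCK I/O in docker stats is a cumulative total since the container started, not a rate. The iotop output is consistent with this finding. The containerd-shim-runc-v2 row with -id 622986b1e28b... belongs to x-ui. The two rows whose command is x-ui were reading 7.36 M/s and 2.68 M/s.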
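With many containers, sorting by the read half of BLOCK I/O makes the outlier easier to spot. This sketch assumes input from docker stats --no-stream --format '{{.Name}}\t{{.BlockIO}}'. It also assumes docker stats prints these sizes in decimal units, so kB means 1000 bytes. The function name rank_block_reads is illustrative.

```shell
#!/usr/bin/env bash
# Rank containers by cumulative block-device reads (left half of BLOCK I/O).
# Input lines: "NAME<TAB>READ / WRITE", e.g. "x-ui<TAB>3.52TB / 53.2kB".
# Sizes are assumed to use decimal units (kB = 1000 bytes).
rank_block_reads() {
  awk -F'\t' '
    function to_bytes(s,    num, unit) {
      num = s + 0                        # numeric prefix, e.g. 3.52
      unit = s
      sub(/^[0-9.]+/, "", unit)          # remaining suffix, e.g. "TB"
      if (unit == "kB") return num * 1000
      if (unit == "MB") return num * 1000000
      if (unit == "GB") return num * 1000000000
      if (unit == "TB") return num * 1000000000000
      return num                         # plain bytes ("B")
    }
    {
      split($2, io, " / ")               # io[1] = read, io[2] = write
      printf "%.0f\t%s\t%s\n", to_bytes(io[1]), $1, io[1]
    }' | sort -k1,1nr | cut -f2,3        # largest first, drop the byte count
}
```

Usage: docker stats --no-stream --format '{{.Name}}\t{{.BlockIO}}' | rank_block_reads. The --no-stream flag prints a single snapshot instead of refreshing continuously.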


III. Diagnosis: High IO from the x-ui Container

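The steps above identify the x-ui container as the source of the high disk IO. Its cumulative disk reads (3.52 TB) far exceed those of every other container, and this read load severely degraded system performance.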

Possible causes

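  • The application inside the container reads from and writes to disk frequently
  • The container is misconfigured, leading to excessive resource consumption
  • IO scheduling problems between the container and the host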
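One way to test the first hypothesis is to watch the container's main process directly. The kernel exposes per-process IO counters in /proc/<pid>/io. Its read_bytes field counts bytes actually fetched from the storage layer, so sampling it twice gives a read rate. This sketch rests on three assumptions: root access, a whole-second interval, and that the main process is the one doing the reads. Other processes in the container keep their own counters. The PID can be looked up with docker inspect -f '{{.State.Pid}}' x-ui. Both function names are illustrative.

```shell
#!/usr/bin/env bash
# Print the read_bytes counter from a /proc/<pid>/io-style file; fail if absent.
read_bytes_of() {
  awk '$1 == "read_bytes:" { print $2; found = 1 } END { exit !found }' "$1"
}

# Average bytes/s fetched from storage by PID over INTERVAL whole seconds (default 5).
io_read_rate() {
  local pid=$1 interval=${2:-5} before after
  before=$(read_bytes_of "/proc/$pid/io") || return 1
  sleep "$interval"
  after=$(read_bytes_of "/proc/$pid/io") || return 1
  echo $(( (after - before) / interval ))
}
```

Usage: sudo bash -c 'source ./io.sh; io_read_rate "$(docker inspect -f "{{.State.Pid}}" x-ui)" 5'. This assumes the functions above are saved as io.sh. A result that stays in the MB/s range while the alert is active would support the first hypothesis.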

IV. Fix: Shutting Down the Offending Container

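To restore system performance quickly, the x-ui container was shut down by running the following command in the Compose project directory of x-ui. The down command stops and removes every container and network in that Compose project. If other services share the same Compose file, docker-compose stop x-ui stops only that service, assuming the service is named x-ui: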

docker-compose down

Verification: after the container was shut down, disk IO returned to normal and the system remained stable.


V. Summary and Recommendations

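Summary: this investigation used iotop to find the high-IO processes, then used the docker.service logs and docker stats to narrow the search. That path led to identifying and resolving the high disk IO caused by the x-ui container.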

Recommendations

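  1. Monitor container resource usage regularly: check container performance periodically with tools such as docker stats so that anomalies are caught early.
  2. Tune container configuration: set sensible resource limits so that no single container can consume an excessive share of system resources.
  3. Log analysis and alerting: improve log collection and alerting so that problems are detected and handled quickly.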
