Verify Whether an Image Supports RDMA
Last updated: 2025-08-24
Overview
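This document describes how to verify whether an image supports RDMA. Follow the steps below to determine whether a given image meets the requirements for using RDMA.
In Baige distributed training, the preset PyTorch images support RDMA by default. We recommend building your custom images on top of a Baige preset PyTorch image.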
Install RDMA Packages in a Custom Image
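Most mainstream training container images are built on Ubuntu, so this document describes how to perform the verification in an Ubuntu environment.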
- Run the following command to install the test package. The infiniband-diags package provides the ibstatus tool.
Plain Text
apt update && apt install -y infiniband-diags
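- Use the ibstatus command to check the NIC rate. This example uses an A800 instance. The output shows that NIC mlx5_1 is in the ACTIVE state with a rate of 100 Gb/sec, which is as expected.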
Plain Text
# ibstatus
Infiniband device 'mlx5_0' port 1 status:
	default gid: 0000:0000:0000:0000:0000:0000:0000:0000
	base lid: 0x0
	sm lid: 0x0
	state: 4: ACTIVE
	phys state: 5: LinkUp
	rate: 100 Gb/sec (4X EDR)
	link_layer: Ethernet

Infiniband device 'mlx5_1' port 1 status:
	default gid: 0000:0000:0000:0000:0000:0000:0000:0000
	base lid: 0x0
	sm lid: 0x0
	state: 4: ACTIVE
	phys state: 5: LinkUp
	rate: 100 Gb/sec (4X EDR)
	link_layer: Ethernet

Infiniband device 'mlx5_2' port 1 status:
	default gid: 0000:0000:0000:0000:0000:0000:0000:0000
	base lid: 0x0
	sm lid: 0x0
	state: 4: ACTIVE
	phys state: 5: LinkUp
	rate: 100 Gb/sec (4X EDR)
	link_layer: Ethernet
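When an instance has many NICs, checking every ibstatus block by eye is error-prone. The following is a minimal bash sketch, assuming the ibstatus field layout shown above; check_ib_ports is a hypothetical helper, not part of infiniband-diags. It reads ibstatus output on stdin, prints one line per port, and returns 1 if any port is not ACTIVE or reports a rate below the value you pass in.

```shell
#!/bin/bash
# check_ib_ports (hypothetical helper): summarizes `ibstatus` output read from stdin.
# Usage: ibstatus | check_ib_ports <expected_rate_gbps>
# Prints one line per device port and returns 1 if any port is not ACTIVE
# or reports a rate below the expected value.
check_ib_ports() {
    if [ -z "$1" ]; then
        echo "usage: check_ib_ports <expected_rate_gbps>" >&2
        return 2
    fi
    awk -v want="$1" -v q="'" '
        # Device header: Infiniband device <quoted name> port <n> status:
        /^Infiniband device/ { dev = $3; gsub(q, "", dev); port = $5; state = "" }
        # "state: 4: ACTIVE" -> the third field is the state name
        $1 == "state:" { state = $3 }
        # "rate: 100 Gb/sec (4X EDR)" -> the second field is the rate in Gb/sec
        $1 == "rate:" {
            rate = $2 + 0
            verdict = (state == "ACTIVE" && rate >= want + 0) ? "OK" : "CHECK"
            if (verdict != "OK") bad = 1
            printf "%s\tport %s\t%s\t%s Gb/sec\t%s\n", dev, port, state, rate, verdict
        }
        END { exit bad }
    '
}
```

For the A800 example above, run ibstatus | check_ib_ports 100; for other instance types, take the expected rate from the instance specification.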
- Run the following command to check whether the RDMA-related libraries are installed.
Plain Text
dpkg -l perftest ibverbs-providers libibumad3 libibverbs1 libnl-3-200 libnl-route-3-200 librdmacm1
Sample output:
Plain Text
# dpkg -l perftest ibverbs-providers libibumad3 libibverbs1 libnl-3-200 libnl-route-3-200 librdmacm1
dpkg-query: no packages found matching perftest
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                    Version      Architecture Description
+++-=======================-============-============-===========================================================
ii  ibverbs-providers:amd64 39.0-1       amd64        User space provider drivers for libibverbs
ii  libibumad3:amd64        39.0-1       amd64        InfiniBand Userspace Management Datagram (uMAD) library
ii  libibverbs1:amd64       39.0-1       amd64        Library for direct userspace use of RDMA (InfiniBand/iWARP)
ii  libnl-3-200:amd64       3.5.0-0.1    amd64        library for dealing with netlink sockets
ii  libnl-route-3-200:amd64 3.5.0-0.1    amd64        library for dealing with netlink sockets - route interface
ii  librdmacm1:amd64        39.0-1       amd64        Library for managing RDMA connections
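The output above lists both installed packages (such as ibverbs-providers:amd64 and libibumad3:amd64, whose lines start with ii) and packages that are not installed (perftest, reported as "no packages found matching perftest").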
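If any package is missing, continue with the next step to install it. If all packages are already installed, you can proceed directly to verifying RDMA support.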
- Run the following command to install the RDMA packages.
Plain Text
apt update && apt install -y perftest ibverbs-providers libibumad3 libibverbs1 libnl-3-200 libnl-route-3-200 librdmacm1
- Run the following command again to check the package installation status.
Plain Text
dpkg -l perftest ibverbs-providers libibumad3 libibverbs1 libnl-3-200 libnl-route-3-200 librdmacm1
Sample output:
Plain Text
# dpkg -l perftest ibverbs-providers libibumad3 libibverbs1 libnl-3-200 libnl-route-3-200 librdmacm1
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                    Version      Architecture Description
+++-=======================-============-============-===========================================================
ii  ibverbs-providers:amd64 39.0-1       amd64        User space provider drivers for libibverbs
ii  libibumad3:amd64        39.0-1       amd64        InfiniBand Userspace Management Datagram (uMAD) library
ii  libibverbs1:amd64       39.0-1       amd64        Library for direct userspace use of RDMA (InfiniBand/iWARP)
ii  libnl-3-200:amd64       3.5.0-0.1    amd64        library for dealing with netlink sockets
ii  libnl-route-3-200:amd64 3.5.0-0.1    amd64        library for dealing with netlink sockets - route interface
ii  librdmacm1:amd64        39.0-1       amd64        Library for managing RDMA connections
ii  perftest                4.4+0.37-1   amd64        Infiniband verbs performance tests
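The check-then-install steps above can be combined into a small script, for example when building a custom image. This is a minimal bash sketch under these assumptions: missing_pkgs and install_missing_rdma_pkgs are hypothetical helper names; a package counts as installed only if its dpkg -l line starts with ii; and installation uses apt-get, the script-oriented counterpart of apt.

```shell
#!/bin/bash
# RDMA-related packages checked in this document.
RDMA_PKGS=(perftest ibverbs-providers libibumad3 libibverbs1 libnl-3-200 libnl-route-3-200 librdmacm1)

# missing_pkgs (hypothetical helper): reads `dpkg -l` output on stdin and prints
# each package named in the arguments that has no fully installed ("ii") entry.
missing_pkgs() {
    local installed p
    # Keep the Name column of "ii" rows and drop the ":amd64"-style suffix.
    installed=$(awk '$1 == "ii" { name = $2; sub(/:.*/, "", name); print name }')
    for p in "$@"; do
        grep -qxF -- "$p" <<<"$installed" || echo "$p"
    done
}

# install_missing_rdma_pkgs (hypothetical helper): installs only the missing
# packages. Run it as root inside the container; it needs access to an apt mirror.
install_missing_rdma_pkgs() {
    local -a missing
    mapfile -t missing < <(dpkg -l "${RDMA_PKGS[@]}" 2>/dev/null | missing_pkgs "${RDMA_PKGS[@]}")
    if [ "${#missing[@]}" -eq 0 ]; then
        echo "All RDMA packages are already installed."
        return 0
    fi
    echo "Installing: ${missing[*]}"
    apt-get update && apt-get install -y "${missing[@]}"
}
```

Call install_missing_rdma_pkgs as root inside the container. Afterwards, dpkg -l should list all seven packages with the ii status, as in the sample output above.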
Verify RDMA Support
In a distributed training job, enter the container in debug mode.
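- In the container, save the following script provided by Mellanox as cmd.sh and run it to get the GID information. Run it with bash rather than sh: it uses bash-specific syntax such as the function keyword and ${var:offset:length}, which fails under dash, the default /bin/sh on Ubuntu. Optionally, pass a device name (for example, mlx5_1) as the first argument to list only that device.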
Plain Text
#!/bin/bash
#
# Copyright (c) 2016 Mellanox Technologies. All rights reserved.
#
# This Software is licensed under one of the following licenses:
#
# 1) under the terms of the "Common Public License 1.0" a copy of which is
# available from the Open Source Initiative, see
# http://www.opensource.org/licenses/cpl.php.
#
# 2) under the terms of the "The BSD License" a copy of which is
# available from the Open Source Initiative, see
# http://www.opensource.org/licenses/bsd-license.php.
#
# 3) under the terms of the "GNU General Public License (GPL) Version 2" a
# copy of which is available from the Open Source Initiative, see
# http://www.opensource.org/licenses/gpl-license.php.
#
# Licensee has the right to choose one of the above licenses.
#
# Redistributions of source code must retain the above copyright
# notice and one of the license notices.
#
# Redistributions in binary form must reproduce both the above copyright
# notice, one of the license notices in the documentation
# and/or other materials provided with the distribution.
#
# Author: Moni Shoua <monis@mellanox.com>
#

black='\E[30;50m'
red='\E[31;50m'
green='\E[32;50m'
yellow='\E[33;50m'
blue='\E[34;50m'
magenta='\E[35;50m'
cyan='\E[36;50m'
white='\E[37;50m'

bold='\033[1m'

gid_count=0

# cecho (color echo) prints text in color.
# first parameter should be the desired color followed by text
function cecho ()
{
	echo -en $1
	shift
	echo -n $*
	tput sgr0
}

# becho (color echo) prints text in bold.
becho ()
{
	echo -en $bold
	echo -n $*
	tput sgr0
}

function print_gids()
{
	dev=$1
	port=$2
	for gf in /sys/class/infiniband/$dev/ports/$port/gids/* ; do
		gid=$(cat $gf);
		if [ $gid = 0000:0000:0000:0000:0000:0000:0000:0000 ] ; then
			continue
		fi
		echo -e $(basename $gf) "\t" $gid
	done
}

echo -e "DEV\tPORT\tINDEX\tGID\t\t\t\t\tIPv4 \t\tVER\tDEV"
echo -e "---\t----\t-----\t---\t\t\t\t\t------------ \t---\t---"
DEVS=$1
if [ -z "$DEVS" ] ; then
	DEVS=$(ls /sys/class/infiniband/)
fi
for d in $DEVS ; do
	for p in $(ls /sys/class/infiniband/$d/ports/) ; do
		for g in $(ls /sys/class/infiniband/$d/ports/$p/gids/) ; do
			gid=$(cat /sys/class/infiniband/$d/ports/$p/gids/$g);
			if [ $gid = 0000:0000:0000:0000:0000:0000:0000:0000 ] ; then
				continue
			fi
			if [ $gid = fe80:0000:0000:0000:0000:0000:0000:0000 ] ; then
				continue
			fi
			_ndev=$(cat /sys/class/infiniband/$d/ports/$p/gid_attrs/ndevs/$g 2>/dev/null)
			__type=$(cat /sys/class/infiniband/$d/ports/$p/gid_attrs/types/$g 2>/dev/null)
			_type=$(echo $__type| grep -o "[Vv].*")
			if [ $(echo $gid | cut -d ":" -f -1) = "0000" ] ; then
				ipv4=$(printf "%d.%d.%d.%d" 0x${gid:30:2} 0x${gid:32:2} 0x${gid:35:2} 0x${gid:37:2})
				echo -e "$d\t$p\t$g\t$gid\t$ipv4 \t$_type\t$_ndev"
			else
				echo -e "$d\t$p\t$g\t$gid\t\t\t$_type\t$_ndev"
			fi
			gid_count=$(expr 1 + $gid_count)
		done #g (gid)
	done #p (port)
done #d (dev)

echo n_gids_found=$gid_count
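The script computes the IPv4 column from the GID. For RoCE with IPv4 addressing, the GID is an IPv4-mapped IPv6 address (::ffff:a.b.c.d), so the last two 16-bit groups hold the four IPv4 bytes; the ${gid:30:2}, ${gid:32:2}, ${gid:35:2}, and ${gid:37:2} substrings pick those bytes out by fixed offset. For example, 0000:0000:0000:0000:0000:ffff:ac10:0942 maps to 172.16.9.66. A minimal standalone sketch of the same conversion follows; gid_to_ipv4 is a hypothetical name, and it splits on colons instead of using offsets and applies a slightly stricter check than the script (first five groups zero, sixth group ffff).

```shell
#!/bin/bash
# gid_to_ipv4 (hypothetical helper): converts an IPv4-mapped RoCE GID such as
# 0000:0000:0000:0000:0000:ffff:ac10:0942 into dotted-quad form.
# Prints nothing and returns 1 if the GID is not IPv4-mapped.
gid_to_ipv4() {
    local g1 g2 g3 g4 g5 g6 g7 g8 g
    IFS=: read -r g1 g2 g3 g4 g5 g6 g7 g8 <<<"$1"
    # IPv4-mapped form: the first five groups are zero ...
    for g in "$g1" "$g2" "$g3" "$g4" "$g5"; do
        [ "$g" = "0000" ] || return 1
    done
    # ... and the sixth group is ffff.
    [ "$g6" = "ffff" ] || return 1
    # Each of the last two 16-bit groups carries two IPv4 bytes (hex).
    printf '%d.%d.%d.%d\n' "0x${g7:0:2}" "0x${g7:2:2}" "0x${g8:0:2}" "0x${g8:2:2}"
}
```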
Sample output:
Plain Text
# bash cmd.sh
DEV	PORT	INDEX	GID					IPv4 		VER	DEV
---	----	-----	---					------------ 	---	---
mlx5_0	1	44	0000:0000:0000:0000:0000:ffff:ac10:0942	172.16.9.66	v1	eth0
mlx5_0	1	45	0000:0000:0000:0000:0000:ffff:ac10:0942	172.16.9.66	v2	eth0
mlx5_1	1	4	0000:0000:0000:0000:0000:ffff:1912:0106	25.18.1.6	v1	roce1
mlx5_1	1	5	0000:0000:0000:0000:0000:ffff:1912:0106	25.18.1.6	v2	roce1
mlx5_2	1	4	0000:0000:0000:0000:0000:ffff:1912:0116	25.18.1.22	v1	roce2
mlx5_2	1	5	0000:0000:0000:0000:0000:ffff:1912:0116	25.18.1.22	v2	roce2
mlx5_3	1	4	0000:0000:0000:0000:0000:ffff:1912:0126	25.18.1.38	v1	roce3
mlx5_3	1	5	0000:0000:0000:0000:0000:ffff:1912:0126	25.18.1.38	v2	roce3
mlx5_4	1	4	0000:0000:0000:0000:0000:ffff:1912:0136	25.18.1.54	v1	roce4
mlx5_4	1	5	0000:0000:0000:0000:0000:ffff:1912:0136	25.18.1.54	v2	roce4
mlx5_5	1	4	0000:0000:0000:0000:0000:ffff:1912:0166	25.18.1.102	v1	roce5
mlx5_5	1	5	0000:0000:0000:0000:0000:ffff:1912:0166	25.18.1.102	v2	roce5
mlx5_6	1	4	0000:0000:0000:0000:0000:ffff:1912:01c6	25.18.1.198	v1	roce6
mlx5_6	1	5	0000:0000:0000:0000:0000:ffff:1912:01c6	25.18.1.198	v2	roce6
mlx5_7	1	4	0000:0000:0000:0000:0000:ffff:1912:0216	25.18.2.22	v1	roce7
mlx5_7	1	5	0000:0000:0000:0000:0000:ffff:1912:0216	25.18.2.22	v2	roce7
mlx5_8	1	4	0000:0000:0000:0000:0000:ffff:1912:0246	25.18.2.70	v1	roce8
mlx5_8	1	5	0000:0000:0000:0000:0000:ffff:1912:0246	25.18.2.70	v2	roce8
n_gids_found=18
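Instead of reading the GID index off the table by eye, you can select it by device and RoCE version. A minimal sketch, assuming the script above is saved as cmd.sh and its output looks like the sample; gid_index_for is a hypothetical helper. Rows without an IPv4 address have fewer columns, so it scans for the VER value rather than relying on a fixed column number.

```shell
#!/bin/bash
# gid_index_for (hypothetical helper): reads the GID table printed by cmd.sh on
# stdin and prints the first INDEX whose DEV column equals $1 and whose VER
# column equals $2 (default v2). Returns 1 if no row matches.
gid_index_for() {
    local dev=$1 ver=${2:-v2}
    awk -v dev="$dev" -v ver="$ver" '
        $1 == dev {
            # VER follows the GID and the optional IPv4 column, so search for it.
            for (i = 5; i <= NF; i++) if ($i == ver) { print $3; found = 1; exit }
        }
        END { exit !found }
    '
}
```

With the sample output above, bash cmd.sh | gid_index_for mlx5_1 v2 prints 5, the GID index used in the following steps.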
- In the container, run the following command to start the ib_write_bw server.
Plain Text
# ib_write_bw -d mlx5_1 -x <gid> -p 18516
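Replace <gid> with a GID index from the script output in the first step: the -x option of ib_write_bw takes the value in the INDEX column. This example tests the mlx5_1 NIC and uses its RoCE v2 entry, so the GID index is 5. The -p option sets the port used to exchange connection information, and the client must use the same port.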
Sample output:
Plain Text
# ib_write_bw -d mlx5_1 -x 5 -p 18516
************************************
* Waiting for client to connect... *
************************************
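- In the same container, run the following command to start the client. The server from the previous step keeps waiting for a connection, so run this from a second session in the container (for example, another debug terminal).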
Plain Text
ib_write_bw -d mlx5_1 127.0.0.1 -x <gid> -p 18516 --report_gbits  # <gid>: the client must use the same GID index as the server
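Because the server blocks its session until a client connects, the two commands above can also be run from a single shell by backgrounding the server. This is a minimal sketch, not a perftest feature: run_loopback_bw is a hypothetical helper, the IB_WRITE_BW variable is an assumed override used only to substitute a stub when testing, and the 1-second wait for the server to start listening is an assumption.

```shell
#!/bin/bash
# run_loopback_bw (hypothetical helper): runs the ib_write_bw server in the
# background and the client in the foreground against 127.0.0.1 on one NIC.
# Usage: run_loopback_bw <device> <gid_index> [port]
run_loopback_bw() {
    if [ $# -lt 2 ]; then
        echo "usage: run_loopback_bw <device> <gid_index> [port]" >&2
        return 2
    fi
    local dev=$1 idx=$2 port=${3:-18516}
    # IB_WRITE_BW is an assumed override, used only to substitute a stub in tests.
    local bw=${IB_WRITE_BW:-ib_write_bw}
    local server_pid rc

    # Server: waits for one client on the given port and exits after the test.
    "$bw" -d "$dev" -x "$idx" -p "$port" &
    server_pid=$!
    sleep 1   # assumed to be long enough for the server to start listening

    # Client: same device, GID index, and port; report bandwidth in Gb/s.
    "$bw" -d "$dev" 127.0.0.1 -x "$idx" -p "$port" --report_gbits
    rc=$?
    wait "$server_pid" || rc=1
    return "$rc"
}
```

For the example in this document, run run_loopback_bw mlx5_1 5 inside the container.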
Sample output:
Plain Text
# ib_write_bw -d mlx5_1 127.0.0.1 -x 5 -p 18516 --report_gbits
---------------------------------------------------------------------------------------
                    RDMA_Write BW Test
 Dual-port       : OFF          Device         : mlx5_1
 Number of qps   : 1            Transport type : IB
 Connection type : RC           Using SRQ      : OFF
 PCIe relax order: ON
 ibv_wr* API     : ON
 TX depth        : 128
 CQ Moderation   : 1
 Mtu             : 4096[B]
 Link type       : Ethernet
 GID index       : 5
 Max inline data : 0[B]
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0000 QPN 0xb1e5 PSN 0x893dde RKey 0x016006 VAddr 0x007efdfae11000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:25:18:01:06
 remote address: LID 0000 QPN 0xb1e4 PSN 0x344ea9 RKey 0x007905 VAddr 0x007f9f69f3d000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:25:18:01:06
---------------------------------------------------------------------------------------
 #bytes     #iterations    BW peak[Gb/sec]    BW average[Gb/sec]   MsgRate[Mpps]
 65536      5000             100.77             100.77             0.192207
---------------------------------------------------------------------------------------
The bandwidth values (BW peak and BW average) are around 100 Gb/s, which is as expected.
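RDMA bandwidth specifications differ between GPU instance types; you can find the expected value in the specifications of your instance package.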
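To compare the result with the bandwidth from your instance specification in a script, the BW average value can be read from the data row that follows the #bytes header. A minimal sketch, assuming the --report_gbits output layout shown above; check_bw is a hypothetical helper, and the default 0.9 ratio is an arbitrary illustrative threshold, not a documented acceptance criterion.

```shell
#!/bin/bash
# check_bw (hypothetical helper): reads ib_write_bw --report_gbits output on
# stdin, takes BW average[Gb/sec] from the first data row after the "#bytes"
# header, and succeeds if it reaches min_ratio of the expected bandwidth.
# Usage: check_bw <expected_gbps> [min_ratio]
check_bw() {
    if [ -z "$1" ]; then
        echo "usage: check_bw <expected_gbps> [min_ratio]" >&2
        return 2
    fi
    awk -v exp_bw="$1" -v ratio="${2:-0.9}" '
        $1 == "#bytes" { in_table = 1; next }
        # Data row columns: #bytes #iterations BW_peak BW_average MsgRate
        in_table && $1 ~ /^[0-9]+$/ && NF >= 5 {
            avg = $4 + 0
            printf "BW average: %.2f Gb/sec (expected about %s)\n", avg, exp_bw
            found = 1
            ok = (avg >= exp_bw * ratio)
            exit !ok
        }
        END {
            if (!found) { print "no bandwidth row found" > "/dev/stderr"; exit 2 }
        }
    '
}
```

For example, piping the client output above into check_bw 100 prints the measured average and returns 0.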
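If there is no output or an error is reported, reinstall the relevant packages and check whether any configuration items have been missed.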