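How to Check Whether an Image Supports RDMA
Last updated: 2025-12-23
Overview
This article describes how to verify whether an image supports RDMA. Follow the steps below to determine whether a given image meets the requirements for using RDMA.
In Baige (百舸) distributed training, the preset PyTorch images support RDMA by default. We recommend building your custom image on top of a Baige preset PyTorch image.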
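Verifying Whether an Image Supports RDMA
In a distributed training job, enter the container in debug mode (webshell) and follow the steps below. If any step reports an error or the result is not as expected, the image does not support RDMA. In that case, switch to an official image, or follow the official documentation to install the RDMA-related packages in the image.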
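Before walking through the steps, a quick pre-check can rule out the most obvious failures. The following is a minimal sketch and not part of the original procedure: precheck_rdma is a hypothetical helper name, and it assumes that RDMA devices appear under /sys/class/infiniband (the path read by the GID script in the first step) and that the ib_write_bw tool used in the later steps is on PATH.

```shell
# precheck_rdma [SYSFS_DIR] [TOOL]
# Hypothetical helper: returns 0 only if SYSFS_DIR contains at least one
# device entry and TOOL can be found on PATH.
precheck_rdma() {
    pc_dir="${1:-/sys/class/infiniband}"
    pc_tool="${2:-ib_write_bw}"
    pc_status=0

    # One sub-directory per RDMA device (for example mlx5_0) is expected here.
    if [ -d "$pc_dir" ] && [ -n "$(ls -A "$pc_dir" 2>/dev/null)" ]; then
        echo "OK: RDMA devices found under $pc_dir"
    else
        echo "FAIL: no RDMA devices under $pc_dir"
        pc_status=1
    fi

    # The bandwidth test in the later steps needs this binary.
    if command -v "$pc_tool" >/dev/null 2>&1; then
        echo "OK: $pc_tool is installed"
    else
        echo "FAIL: $pc_tool not found on PATH"
        pc_status=1
    fi

    return "$pc_status"
}
```

Calling precheck_rdma with no arguments checks the default path and tool. A FAIL line suggests the image (or the container's device configuration) needs attention before the remaining steps are worth running.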
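- Run the following script in the container to get the GID information. The script uses bash-specific syntax, so save it to a file (for example cmd.sh) and run it with bash.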
#!/bin/bash
#
# Copyright (c) 2016 Mellanox Technologies. All rights reserved.
#
# This Software is licensed under one of the following licenses:
#
# 1) under the terms of the "Common Public License 1.0" a copy of which is
#    available from the Open Source Initiative, see
#    http://www.opensource.org/licenses/cpl.php.
#
# 2) under the terms of the "The BSD License" a copy of which is
#    available from the Open Source Initiative, see
#    http://www.opensource.org/licenses/bsd-license.php.
#
# 3) under the terms of the "GNU General Public License (GPL) Version 2" a
#    copy of which is available from the Open Source Initiative, see
#    http://www.opensource.org/licenses/gpl-license.php.
#
# Licensee has the right to choose one of the above licenses.
#
# Redistributions of source code must retain the above copyright
# notice and one of the license notices.
#
# Redistributions in binary form must reproduce both the above copyright
# notice, one of the license notices in the documentation
# and/or other materials provided with the distribution.
#
# Author: Moni Shoua <monis@mellanox.com>
#

black='\E[30;50m'
red='\E[31;50m'
green='\E[32;50m'
yellow='\E[33;50m'
blue='\E[34;50m'
magenta='\E[35;50m'
cyan='\E[36;50m'
white='\E[37;50m'

bold='\033[1m'

gid_count=0

# cecho (color echo) prints text in color.
# first parameter should be the desired color followed by text
function cecho ()
{
    echo -en $1
    shift
    echo -n $*
    tput sgr0
}

# becho (color echo) prints text in bold.
becho ()
{
    echo -en $bold
    echo -n $*
    tput sgr0
}

function print_gids()
{
    dev=$1
    port=$2
    for gf in /sys/class/infiniband/$dev/ports/$port/gids/* ; do
        gid=$(cat $gf);
        if [ $gid = 0000:0000:0000:0000:0000:0000:0000:0000 ] ; then
            continue
        fi
        echo -e $(basename $gf) "\t" $gid
    done
}

echo -e "DEV\tPORT\tINDEX\tGID\t\t\t\t\tIPv4 \t\tVER\tDEV"
echo -e "---\t----\t-----\t---\t\t\t\t\t------------ \t---\t---"
DEVS=$1
if [ -z "$DEVS" ] ; then
    DEVS=$(ls /sys/class/infiniband/)
fi
for d in $DEVS ; do
    for p in $(ls /sys/class/infiniband/$d/ports/) ; do
        for g in $(ls /sys/class/infiniband/$d/ports/$p/gids/) ; do
            gid=$(cat /sys/class/infiniband/$d/ports/$p/gids/$g);
            if [ $gid = 0000:0000:0000:0000:0000:0000:0000:0000 ] ; then
                continue
            fi
            if [ $gid = fe80:0000:0000:0000:0000:0000:0000:0000 ] ; then
                continue
            fi
            _ndev=$(cat /sys/class/infiniband/$d/ports/$p/gid_attrs/ndevs/$g 2>/dev/null)
            __type=$(cat /sys/class/infiniband/$d/ports/$p/gid_attrs/types/$g 2>/dev/null)
            _type=$(echo $__type| grep -o "[Vv].*")
            if [ $(echo $gid | cut -d ":" -f -1) = "0000" ] ; then
                ipv4=$(printf "%d.%d.%d.%d" 0x${gid:30:2} 0x${gid:32:2} 0x${gid:35:2} 0x${gid:37:2})
                echo -e "$d\t$p\t$g\t$gid\t$ipv4 \t$_type\t$_ndev"
            else
                echo -e "$d\t$p\t$g\t$gid\t\t\t$_type\t$_ndev"
            fi
            gid_count=$(expr 1 + $gid_count)
        done #g (gid)
    done #p (port)
done #d (dev)

echo n_gids_found=$gid_count
Sample output:
# bash cmd.sh
DEV     PORT  INDEX  GID                                      IPv4         VER  DEV
---     ----  -----  ---                                      ------------ ---  ---
mlx5_0  1     44     0000:0000:0000:0000:0000:ffff:ac10:0942  172.16.9.66  v1   eth0
mlx5_0  1     45     0000:0000:0000:0000:0000:ffff:ac10:0942  172.16.9.66  v2   eth0
mlx5_1  1     4      0000:0000:0000:0000:0000:ffff:1912:0106  25.18.1.6    v1   roce1
mlx5_1  1     5      0000:0000:0000:0000:0000:ffff:1912:0106  25.18.1.6    v2   roce1
mlx5_2  1     4      0000:0000:0000:0000:0000:ffff:1912:0116  25.18.1.22   v1   roce2
mlx5_2  1     5      0000:0000:0000:0000:0000:ffff:1912:0116  25.18.1.22   v2   roce2
mlx5_3  1     4      0000:0000:0000:0000:0000:ffff:1912:0126  25.18.1.38   v1   roce3
mlx5_3  1     5      0000:0000:0000:0000:0000:ffff:1912:0126  25.18.1.38   v2   roce3
mlx5_4  1     4      0000:0000:0000:0000:0000:ffff:1912:0136  25.18.1.54   v1   roce4
mlx5_4  1     5      0000:0000:0000:0000:0000:ffff:1912:0136  25.18.1.54   v2   roce4
mlx5_5  1     4      0000:0000:0000:0000:0000:ffff:1912:0166  25.18.1.102  v1   roce5
mlx5_5  1     5      0000:0000:0000:0000:0000:ffff:1912:0166  25.18.1.102  v2   roce5
mlx5_6  1     4      0000:0000:0000:0000:0000:ffff:1912:01c6  25.18.1.198  v1   roce6
mlx5_6  1     5      0000:0000:0000:0000:0000:ffff:1912:01c6  25.18.1.198  v2   roce6
mlx5_7  1     4      0000:0000:0000:0000:0000:ffff:1912:0216  25.18.2.22   v1   roce7
mlx5_7  1     5      0000:0000:0000:0000:0000:ffff:1912:0216  25.18.2.22   v2   roce7
mlx5_8  1     4      0000:0000:0000:0000:0000:ffff:1912:0246  25.18.2.70   v1   roce8
mlx5_8  1     5      0000:0000:0000:0000:0000:ffff:1912:0246  25.18.2.70   v2   roce8
n_gids_found=18
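The IPv4 column is derived from the last four bytes of an IPv4-mapped GID. To illustrate that arithmetic, here is a small sketch using only POSIX utilities. The name gid_to_ipv4 is hypothetical, and the helper assumes a GID in the eight-group form shown in the output above.

```shell
# gid_to_ipv4 GID
# Hypothetical helper: converts the last two 16-bit groups of an IPv4-mapped
# GID (eight colon-separated hex groups) into dotted-decimal IPv4 notation.
gid_to_ipv4() {
    # Groups 7 and 8 carry the four IPv4 bytes, two bytes per group.
    g2i_hi=$(printf '%s\n' "$1" | cut -d: -f7)
    g2i_lo=$(printf '%s\n' "$1" | cut -d: -f8)

    # Split each group into its two hex bytes.
    g2i_b1=$(printf '%s\n' "$g2i_hi" | cut -c1-2)
    g2i_b2=$(printf '%s\n' "$g2i_hi" | cut -c3-4)
    g2i_b3=$(printf '%s\n' "$g2i_lo" | cut -c1-2)
    g2i_b4=$(printf '%s\n' "$g2i_lo" | cut -c3-4)

    # printf interprets the 0x prefix as hexadecimal and prints decimal.
    printf '%d.%d.%d.%d\n' "0x$g2i_b1" "0x$g2i_b2" "0x$g2i_b3" "0x$g2i_b4"
}
```

For the mlx5_1 rows above, gid_to_ipv4 0000:0000:0000:0000:0000:ffff:1912:0106 prints 25.18.1.6, because 0x19 = 25, 0x12 = 18, 0x01 = 1, and 0x06 = 6.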
- Run the following test command in the container. It starts ib_write_bw as the server side of the test.
# ib_write_bw -d mlx5_1 -x <gid> -p 18516
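- Take the <gid> value from the INDEX column of the output returned in the first step. Here we test the mlx5_1 NIC and choose its v2 entry, so <gid> is 5.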
Sample output:
# ib_write_bw -d mlx5_1 -x 5 -p 18516
************************************
* Waiting for client to connect... *
************************************
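Picking the GID index by eye is easy to get wrong when many devices are listed. The sketch below shows one way to extract it from the GID table produced in the first step. The name pick_v2_gid_index is hypothetical, and the helper assumes the whitespace-separated column layout shown above (device in column 1, index in column 3, version somewhere from column 5 onward).

```shell
# pick_v2_gid_index DEV
# Hypothetical helper: reads the GID table on stdin and prints the INDEX of
# the first v2 entry belonging to device DEV. Prints nothing if none exists.
pick_v2_gid_index() {
    awk -v dev="$1" '
        $1 == dev {
            # The VER column follows the GID (and the IPv4 column, if present).
            for (i = 5; i <= NF; i++) {
                if ($i == "v2") { print $3; exit }
            }
        }'
}
```

For example, piping the script output into pick_v2_gid_index mlx5_1 would print 5 for the table shown in the first step.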
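- While the server is still waiting, run the following command in the same container (for example, from a second webshell session).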
ib_write_bw -d mlx5_1 127.0.0.1 -x <gid> -p 18516 --report_gbits   # <gid>: the client's GID index must match the server's
Example:
# ib_write_bw -d mlx5_1 127.0.0.1 -x 5 -p 18516 --report_gbits
---------------------------------------------------------------------------------------
                    RDMA_Write BW Test
 Dual-port       : OFF          Device         : mlx5_1
 Number of qps   : 1            Transport type : IB
 Connection type : RC           Using SRQ      : OFF
 PCIe relax order: ON
 ibv_wr* API     : ON
 TX depth        : 128
 CQ Moderation   : 1
 Mtu             : 4096[B]
 Link type       : Ethernet
 GID index       : 5
 Max inline data : 0[B]
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0000 QPN 0xb1e5 PSN 0x893dde RKey 0x016006 VAddr 0x007efdfae11000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:25:18:01:06
 remote address: LID 0000 QPN 0xb1e4 PSN 0x344ea9 RKey 0x007905 VAddr 0x007f9f69f3d000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:25:18:01:06
---------------------------------------------------------------------------------------
 #bytes     #iterations    BW peak[Gb/sec]    BW average[Gb/sec]   MsgRate[Mpps]
 65536      5000           100.77             100.77               0.192207
---------------------------------------------------------------------------------------
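The server command blocks until a client connects, so the two commands above cannot be typed one after the other in a single shell. As a sketch only, the loopback test could be driven from one shell by running the server in the background. The name run_loopback, the PERF_CMD override (useful for a dry run), and the two-second startup delay are assumptions; the ib_write_bw flags are exactly the ones used in the steps above.

```shell
# run_loopback DEV GID_INDEX [PORT] [WAIT_SECONDS]
# Hypothetical helper: starts the server side in the background, then runs
# the client side against 127.0.0.1 and returns the client's exit status.
run_loopback() {
    lb_dev="$1"
    lb_gid="$2"
    lb_port="${3:-18516}"
    lb_wait="${4:-2}"
    lb_cmd="${PERF_CMD:-ib_write_bw}"   # override only for dry runs
    lb_status=0

    # Server side: waits for a client on the given port.
    "$lb_cmd" -d "$lb_dev" -x "$lb_gid" -p "$lb_port" >/dev/null 2>&1 &
    lb_server_pid=$!

    # Give the server a moment to start listening (assumed delay).
    sleep "$lb_wait"

    # Client side: same device, GID index, and port as the server.
    "$lb_cmd" -d "$lb_dev" 127.0.0.1 -x "$lb_gid" -p "$lb_port" --report_gbits || lb_status=$?

    # Clean up the server if the client failed and left it waiting.
    kill "$lb_server_pid" 2>/dev/null || true
    wait "$lb_server_pid" 2>/dev/null || true
    return "$lb_status"
}
```

With the values from this article, the call would be run_loopback mlx5_1 5 18516. The client's report is printed to the terminal, while the server's own output is discarded.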
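The bandwidth values (BW peak and BW average) are around 100 Gb/s, which is as expected.
The RDMA bandwidth specification differs between GPU instance types. You can find the value for your machine in its package specification.
If there is no output, or an error is reported, the image does not support RDMA. Reinstall the relevant packages and check whether any configuration item was missed.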
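The comparison against the expected bandwidth can also be scripted. The following sketch parses the result row of an ib_write_bw report read from stdin and compares BW average with the bandwidth from your machine specification. The name check_bw and the default 90% tolerance are assumptions, and the helper assumes the report was produced with --report_gbits so that the values are in Gb/s.

```shell
# check_bw EXPECTED_GBPS [MIN_RATIO]
# Hypothetical helper: reads an ib_write_bw report on stdin and succeeds when
# BW average is at least EXPECTED_GBPS * MIN_RATIO (MIN_RATIO defaults to 0.9).
check_bw() {
    awk -v want="$1" -v ratio="${2:-0.9}" '
        /#bytes/ { header = 1; next }
        # The result row follows the header: bytes, iterations, peak, average, rate.
        header && NF >= 5 && $1 ~ /^[0-9]+$/ { bw = $4; found = 1 }
        END {
            if (!found) {
                print "FAIL: no result row found"
                exit 2
            }
            if (bw + 0 >= want * ratio) {
                printf "OK: BW average %.2f Gb/s\n", bw
                exit 0
            }
            printf "FAIL: BW average %.2f Gb/s is below %.2f Gb/s\n", bw, want * ratio
            exit 1
        }'
}
```

For example, piping the client command's output into check_bw 100 would accept the 100.77 Gb/s result shown above. Replace 100 with the bandwidth listed for your GPU instance type.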
