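A 4-node cluster is performing below expectations, so I ran tests and tuned it.
The cluster consists of 4 networked nodes. First, check the CPU information with lscpu: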
$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 96
On-line CPU(s) list: 0-95
Thread(s) per core: 1
Core(s) per socket: 48
Socket(s): 2
NUMA node(s): 2
Vendor ID: AuthenticAMD
CPU family: 23
Model: 49
Model name: AMD EPYC 7K62 48-Core Processor
Stepping: 0
CPU MHz: 1500.000
CPU max MHz: 2600.0000
CPU min MHz: 1500.0000
BogoMIPS: 5200.00
Virtualization: AMD-V
L1d cache: 32K
L1i cache: 32K
L2 cache: 512K
L3 cache: 16384K
NUMA node0 CPU(s): 0-47
NUMA node1 CPU(s): 48-95
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc art rep_good nopl nonstop_tsc extd_apicid aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 cpb cat_l3 cdp_l3 hw_pstate sme retpoline_amd ssbd ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif umip overflow_recov succor smca
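As a quick sanity check on the lscpu output above, the topology fields can be multiplied out and compared with the reported CPU count. This is a minimal sketch; the helper names `parse_lscpu` and `logical_cpus` are made up for illustration, and the sample text is a subset of the fields shown above.

```python
def parse_lscpu(text):
    """Turn `lscpu` output into a dict of field name -> value string."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a "key: value" shape
            info[key.strip()] = value.strip()
    return info


def logical_cpus(info):
    """Logical CPUs implied by sockets x cores per socket x threads per core."""
    return (int(info["Socket(s)"])
            * int(info["Core(s) per socket"])
            * int(info["Thread(s) per core"]))


# Subset of the lscpu output from node01 shown above.
sample = """\
CPU(s): 96
Thread(s) per core: 1
Core(s) per socket: 48
Socket(s): 2
NUMA node(s): 2
"""

info = parse_lscpu(sample)
expected = logical_cpus(info)
reported = int(info["CPU(s)"])
print(f"topology implies {expected} CPUs, lscpu reports {reported}")
print(f"CPUs per NUMA node: {reported // int(info['NUMA node(s)'])}")
```

With one thread per core, the 96 logical CPUs are all physical cores, 48 per NUMA node, which matches the `NUMA node0 CPU(s)` and `NUMA node1 CPU(s)` ranges.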
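Test the IP network and the IB network separately.
For the IP network, I tested ping and ssh. The average ping time to node03 exceeds 2 ms, which is on the high side.
I also configured frp port forwarding.
ibutil is installed correctly, but ibping fails: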
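The ping check on the IP network can be automated along these lines. This is only a sketch: it assumes the summary line format printed by Linux iputils `ping`, and the per-node summary strings below are illustrative placeholders, not measurements from this cluster. The 2 ms threshold is the value mentioned above.

```python
import re

# Summary line of Linux iputils ping, e.g.
# "rtt min/avg/max/mdev = 0.110/0.150/0.200/0.030 ms"
RTT_RE = re.compile(r"=\s*([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+)\s*ms")


def avg_rtt_ms(ping_output):
    """Return the average RTT in ms, or None if no summary line is found."""
    match = RTT_RE.search(ping_output)
    return float(match.group(2)) if match else None


def slow_nodes(outputs, threshold_ms=2.0):
    """Names of nodes whose average RTT exceeds the threshold."""
    slow = []
    for node, text in outputs.items():
        avg = avg_rtt_ms(text)
        if avg is not None and avg > threshold_ms:
            slow.append(node)
    return slow


# Hypothetical ping summaries, for illustration only.
outputs = {
    "node02": "rtt min/avg/max/mdev = 0.101/0.152/0.230/0.031 ms",
    "node03": "rtt min/avg/max/mdev = 1.802/2.314/3.050/0.402 ms",
}
print(slow_nodes(outputs))
```

In practice the strings would come from running `ping -c N <node>` against each node and capturing its output.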
$ ibping node01
ibwarn: [56768] mad_rpc_open_port: can't open UMAD port ((null):0)
ibping: iberror: failed: Failed to open '(null)' port '0'
ifconfig also reports a problem on the IB ports:
$ ifconfig
eno1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 222.199.132.59 netmask 255.255.255.0 broadcast 222.199.132.255
inet6 2001:da8:20c:a133::efd4 prefixlen 128 scopeid 0x0<global>
inet6 fe80::e092:20cf:1f15:65b0 prefixlen 64 scopeid 0x20<link>
ether 3c:ec:ef:71:97:88 txqueuelen 1000 (Ethernet)
RX packets 187011950 bytes 33023832426 (30.7 GiB)
RX errors 0 dropped 2 overruns 0 frame 0
TX packets 43565873 bytes 31451560266 (29.2 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
eno2: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
ether 3c:ec:ef:71:97:89 txqueuelen 1000 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
ib0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 2044
inet 1.0.0.1 netmask 255.255.0.0 broadcast 1.0.255.255
inet6 fe80::ac0b:e92c:2446:5d13 prefixlen 64 scopeid 0x20<link>
Infiniband hardware address can be incorrect! Please read BUGS section in ifconfig(8).
infiniband A0:00:02:20:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00 txqueuelen 256 (InfiniBand)
RX packets 403252956 bytes 746504437315 (695.2 GiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 49504976 bytes 900446939075 (838.6 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
ib1: flags=4099<UP,BROADCAST,MULTICAST> mtu 4092
Infiniband hardware address can be incorrect! Please read BUGS section in ifconfig(8).
infiniband A0:00:03:00:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00 txqueuelen 256 (InfiniBand)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 1000 (Local Loopback)
RX packets 6045989368 bytes 334618451075 (311.6 GiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 6045989368 bytes 334618451075 (311.6 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
Check with the ip a command instead:
$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 3c:ec:ef:71:97:88 brd ff:ff:ff:ff:ff:ff
inet 222.199.132.59/24 brd 222.199.132.255 scope global noprefixroute dynamic eno1
valid_lft 6993sec preferred_lft 6993sec
inet6 2001:da8:20c:a133::efd4/128 scope global noprefixroute dynamic
valid_lft 6496sec preferred_lft 6196sec
inet6 fe80::e092:20cf:1f15:65b0/64 scope link noprefixroute
valid_lft forever preferred_lft forever
3: eno2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
link/ether 3c:ec:ef:71:97:89 brd ff:ff:ff:ff:ff:ff
4: ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc mq state UP group default qlen 256
link/infiniband a0:00:02:20:fe:80:00:00:00:00:00:00:00:02:c9:03:00:a0:93:41 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
inet 1.0.0.1/16 brd 1.0.255.255 scope global noprefixroute ib0
valid_lft forever preferred_lft forever
inet6 fe80::ac0b:e92c:2446:5d13/64 scope link noprefixroute
valid_lft forever preferred_lft forever
5: ib1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 4092 qdisc mq state DOWN group default qlen 256
link/infiniband a0:00:03:00:fe:80:00:00:00:00:00:00:00:02:c9:03:00:a0:93:42 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
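The warning is gone. The likely cause is that the IB hardware address is too long (20 bytes) for ifconfig to handle, so ip a should be used to view it. Next, inspect the two IB ports, ib0 and ib1, with ethtool: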
$ ./ethtool ib0
Settings for ib0:
Supported ports: [ ]
Supported link modes: Not reported
Supported pause frame use: No
Supports auto-negotiation: No
Supported FEC modes: Not reported
Advertised link modes: Not reported
Advertised pause frame use: No
Advertised auto-negotiation: No
Advertised FEC modes: Not reported
Speed: 56000Mb/s
Duplex: Full
Port: Other
PHYAD: 255
Transceiver: internal
Auto-negotiation: on
Cannot get wake-on-lan settings: Operation not permitted
Link detected: yes
$ ./ethtool ib1
Settings for ib1:
Supported ports: [ ]
Supported link modes: Not reported
Supported pause frame use: No
Supports auto-negotiation: No
Supported FEC modes: Not reported
Advertised link modes: Not reported
Advertised pause frame use: No
Advertised auto-negotiation: No
Advertised FEC modes: Not reported
Speed: Unknown!
Duplex: Unknown! (255)
Port: Twisted Pair
PHYAD: 0
Transceiver: internal
Auto-negotiation: off
MDI-X: Unknown
Cannot get wake-on-lan settings: Operation not permitted
Link detected: no
Check ibstat as well:
$ ibstat
CA 'mlx4_0'
CA type: MT4099
Number of ports: 2
Firmware version: 2.42.5000
Hardware version: 1
Node GUID: 0x0002c90300a09340
System image GUID: 0x0002c90300a09343
Port 1:
State: Active
Physical state: LinkUp
Rate: 56
Base lid: 1
LMC: 0
SM lid: 1
Capability mask: 0x0251486a
Port GUID: 0x0002c90300a09341
Link layer: InfiniBand
Port 2:
State: Down
Physical state: Polling
Rate: 10
Base lid: 0
LMC: 0
SM lid: 0
Capability mask: 0x02514868
Port GUID: 0x0002c90300a09342
Link layer: InfiniBand
Check ibnodes as well:
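The Port GUID that ibstat reports for port 1 can be cross-checked against the `link/infiniband` address printed by ip a for ib0. An IPoIB hardware address is 20 bytes: 1 byte of flags, a 3-byte queue pair number, and a 16-byte GID made of an 8-byte subnet prefix followed by the 8-byte port GUID. The sketch below (the function name `parse_ipoib_address` is made up for illustration) decodes both the ip a value and the ifconfig value shown earlier.

```python
def parse_ipoib_address(hwaddr):
    """Split a 20-byte IPoIB hardware address into its fields."""
    octets = bytes(int(part, 16) for part in hwaddr.split(":"))
    if len(octets) != 20:
        raise ValueError(f"expected 20 octets, got {len(octets)}")
    return {
        "flags": octets[0],
        "qpn": int.from_bytes(octets[1:4], "big"),
        "subnet_prefix": octets[4:12].hex(),
        "port_guid": "0x" + octets[12:20].hex(),
    }


# ib0 address as printed by `ip a` and by `ifconfig` on node01.
from_ip = "a0:00:02:20:fe:80:00:00:00:00:00:00:00:02:c9:03:00:a0:93:41"
from_ifconfig = "A0:00:02:20:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00"

ip_fields = parse_ipoib_address(from_ip)
ifconfig_fields = parse_ipoib_address(from_ifconfig)

print(ip_fields["port_guid"])        # 0x0002c90300a09341
print(ifconfig_fields["port_guid"])  # 0x0000000000000000
```

The GUID decoded from ip a matches the `Port GUID` line of ibstat port 1, while the ifconfig value agrees only in its first 8 bytes and is zero after that, which is consistent with the warning ifconfig prints.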
# ibnodes
Ca : 0xf4521403006cba20 ports 2 "node03 HCA-1"
Ca : 0x0002c90300a4a920 ports 2 "node02 HCA-1"
Ca : 0x0002c90300a09340 ports 2 "node01 HCA-1"
Ca : 0xe41d2d0300231ac0 ports 2 "node04 HCA-1"
Switch : 0x0002c90300721600 ports 36 "SwitchX - Mellanox Technologies" base port 0 lid 2 lmc 0
And iblinkinfo:
# iblinkinfo
CA: node04 HCA-1:
0xe41d2d0300231ac1 6 1[ ] ==( 4X 14.0625 Gbps Active/ LinkUp)==> 2 4[ ] "SwitchX - Mellanox Technologies" ( )
CA: node02 HCA-1:
0x0002c90300a4a921 5 1[ ] ==( 4X 14.0625 Gbps Active/ LinkUp)==> 2 2[ ] "SwitchX - Mellanox Technologies" ( )
CA: node03 HCA-1:
0xf4521403006cba21 3 1[ ] ==( 4X 14.0625 Gbps Active/ LinkUp)==> 2 3[ ] "SwitchX - Mellanox Technologies" ( )
Switch: 0x0002c90300721600 SwitchX - Mellanox Technologies:
2 1[ ] ==( 4X 14.0625 Gbps Active/ LinkUp)==> 1 1[ ] "node01 HCA-1" ( )
2 2[ ] ==( 4X 14.0625 Gbps Active/ LinkUp)==> 5 1[ ] "node02 HCA-1" ( )
2 3[ ] ==( 4X 14.0625 Gbps Active/ LinkUp)==> 3 1[ ] "node03 HCA-1" ( )
2 4[ ] ==( 4X 14.0625 Gbps Active/ LinkUp)==> 6 1[ ] "node04 HCA-1" ( )
2 5[ ] ==( Down/ Polling)==> [ ] "" ( )
2 6[ ] ==( Down/ Polling)==> [ ] "" ( )
2 7[ ] ==( Down/ Polling)==> [ ] "" ( )
2 8[ ] ==( Down/ Polling)==> [ ] "" ( )
2 9[ ] ==( Down/ Polling)==> [ ] "" ( )
2 10[ ] ==( Down/ Polling)==> [ ] "" ( )
2 11[ ] ==( Down/ Polling)==> [ ] "" ( )
2 12[ ] ==( Down/ Polling)==> [ ] "" ( )
2 13[ ] ==( Down/ Polling)==> [ ] "" ( )
2 14[ ] ==( Down/ Polling)==> [ ] "" ( )
2 15[ ] ==( Down/ Polling)==> [ ] "" ( )
2 16[ ] ==( Down/ Polling)==> [ ] "" ( )
2 17[ ] ==( Down/ Polling)==> [ ] "" ( )
2 18[ ] ==( Down/ Polling)==> [ ] "" ( )
2 19[ ] ==( Down/ Polling)==> [ ] "" ( )
2 20[ ] ==( Down/ Polling)==> [ ] "" ( )
2 21[ ] ==( Down/ Polling)==> [ ] "" ( )
2 22[ ] ==( Down/ Polling)==> [ ] "" ( )
2 23[ ] ==( Down/ Polling)==> [ ] "" ( )
2 24[ ] ==( Down/ Polling)==> [ ] "" ( )
2 25[ ] ==( Down/ Polling)==> [ ] "" ( )
2 26[ ] ==( Down/ Polling)==> [ ] "" ( )
2 27[ ] ==( Down/ Polling)==> [ ] "" ( )
2 28[ ] ==( Down/ Polling)==> [ ] "" ( )
2 29[ ] ==( Down/ Polling)==> [ ] "" ( )
2 30[ ] ==( Down/ Polling)==> [ ] "" ( )
2 31[ ] ==( Down/ Polling)==> [ ] "" ( )
2 32[ ] ==( Down/ Polling)==> [ ] "" ( )
2 33[ ] ==( Down/ Polling)==> [ ] "" ( )
2 34[ ] ==( Down/ Polling)==> [ ] "" ( )
2 35[ ] ==( Down/ Polling)==> [ ] "" ( )
2 36[ ] ==( Down/ Polling)==> [ ] "" ( )
CA: node01 HCA-1:
0x0002c90300a09341 1 1[ ] ==( 4X 14.0625 Gbps Active/ LinkUp)==> 2 1[ ] "SwitchX - Mellanox Technologies" ( )
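Pinging the IPoIB addresses of the IB cards, 1.0.0.1 through 1.0.0.4, succeeds for all of them.
One possible cause is that opensm is running on every node and the instances conflict. The evidence is as follows: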
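The `4X 14.0625 Gbps` figure in the iblinkinfo output is the lane count and per-lane signalling rate of an FDR link. Multiplying the two gives the raw link rate, and because FDR uses 64b/66b encoding, the usable data rate is slightly lower. The following sketch works that out from one of the lines above; the regular expression and the function name `link_rates` are assumptions made for this example.

```python
import re

# Matches the "<lanes>X <rate> Gbps" part of an iblinkinfo line.
LINK_RE = re.compile(r"(\d+)X\s+([\d.]+)\s+Gbps")


def link_rates(line):
    """Return (signalling Gb/s, data Gb/s) for an active link, else None."""
    match = LINK_RE.search(line)
    if match is None:
        return None  # e.g. a "Down/ Polling" port has no rate
    lanes = int(match.group(1))
    per_lane = float(match.group(2))
    signalling = lanes * per_lane
    data = signalling * 64 / 66  # FDR uses 64b/66b encoding
    return signalling, data


active = ('0x0002c90300a09341 1 1[ ] ==( 4X 14.0625 Gbps Active/ LinkUp)==> '
          '2 1[ ] "SwitchX - Mellanox Technologies" ( )')
down = '2 5[ ] ==( Down/ Polling)==> [ ] "" ( )'

signalling, data = link_rates(active)
print(f"signalling {signalling} Gb/s, data {data:.2f} Gb/s")
print(link_rates(down))
```

The 56.25 Gb/s signalling rate corresponds to the `Rate: 56` shown by ibstat and the `Speed: 56000Mb/s` shown by ethtool, so all four node links appear to have negotiated full FDR speed.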
[root@node01 ~]# opensm -v
-------------------------------------------------
OpenSM 5.7.2.MLNX20201014.9378048
Command Line Arguments:
Verbose option -v (log flags = 0x7)
Log File: /var/log/opensm.log
-------------------------------------------------
OpenSM 5.7.2.MLNX20201014.9378048
Using default GUID 0x2c90300a09341
Entering DISCOVERING state
Error from osm_opensm_bind (0x2A)
Perhaps another instance of OpenSM is already running
Exiting SM
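One way to narrow this down is to look at where the active subnet manager already lives. In the ibstat output from node01, port 1 has `Base lid: 1` and `SM lid: 1`, which suggests the master SM is bound to node01's own port. If so, the bind error above may simply mean that an opensm instance was already running on node01 when the second one was started by hand. The sketch below (the function name `ports_hosting_sm` is made up for illustration) extracts that comparison from ibstat text.

```python
def ports_hosting_sm(ibstat_text):
    """Ports whose own LID equals the SM LID, i.e. the SM sits on that port."""
    fields = {}
    port = None
    for raw in ibstat_text.splitlines():
        line = raw.strip()
        if line.startswith("Port ") and line.endswith(":"):
            port = line[:-1]          # "Port 1:" -> "Port 1"
            fields[port] = {}
        elif port and ":" in line:
            key, _, value = line.partition(":")
            fields[port][key.strip()] = value.strip()
    return [
        name for name, f in fields.items()
        if f.get("Base lid") not in (None, "0")   # LID 0 means unassigned
        and f.get("Base lid") == f.get("SM lid")
    ]


# Relevant lines of the ibstat output from node01.
ibstat_node01 = """\
Port 1:
State: Active
Base lid: 1
SM lid: 1
Port GUID: 0x0002c90300a09341
Port 2:
State: Down
Base lid: 0
SM lid: 0
Port GUID: 0x0002c90300a09342
"""

print(ports_hosting_sm(ibstat_node01))
```

Running the same check against ibstat output from the other nodes would show whether any of them also believes it hosts the SM; this is a heuristic, not a replacement for checking the opensm process and its log on each node.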