Hi, has anyone come across the following warning/error:
[ 82.824842][ T1013] [drm] Not enough PCI address space for a large BAR.
[ 82.824846][ T1013] amdgpu 0000:30:00.0: BAR 0: assigned [mem 0x7f40000000-0x7f4fffffff 64bit pref]
[ 82.824944][ T1013] amdgpu 0000:30:00.0: BAR 2: assigned [mem 0x7f50000000-0x7f501fffff 64bit pref]
[ 82.825411][ T1013] amdgpu 0000:30:00.0: amdgpu: VRAM: 16368M 0x0000008000000000 - 0x00000083FEFFFFFF (16368M used)
[ 82.825413][ T1013] amdgpu 0000:30:00.0: amdgpu: GART: 512M 0x0000000000000000 - 0x000000001FFFFFFF
[ 82.825416][ T1013] amdgpu 0000:30:00.0: amdgpu: AGP: 267894784M 0x0000008400000000 - 0x0000FFFFFFFFFFFF
[ 82.825466][ T1013] [drm] Detected VRAM RAM=16368M, BAR=256M
[ 82.825467][ T1013] [drm] RAM width 256bits GDDR6
[ 82.825509][ T1013] [drm] amdgpu: 16368M of VRAM memory ready
[ 82.825513][ T1013] [drm] amdgpu: 16368M of GTT memory ready.
[ 82.825518][ T1013] [drm] GART: num cpu pages 131072, num gpu pages 131072
[ 82.827887][ T1013] [drm] PCIE GART of 512M enabled (table at 0x0000008000000000).
[ 82.828764][ T1013] amdgpu 0000:30:00.0: amdgpu: PSP runtime database doesn't exist
networkd[668]: wlan0: DHCPv4 address 192.168.1.33/24 via 192.168.1.1
networkd[668]: wlan0: Configured
timesyncd[468]: Network configuration changed, trying to establish connection.
1]: Starting resolvconf-pull-resolved.service...
1]: Started resolvconf-pull-resolved.service.
[ 108.216918][ C2] watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [modprobe:1013]
[ 108.216923][ C2] Modules linked in: ccm amdgpu(OE+) iommu_v2 amdttm(OE) amdkcl(OE) amd_sched(OE) drm_kms_helper cec drm drm_panel_orientation_quirks cfbfillrect cfbimgblt cfbcopyarea i2c_algo_bit fb_sys_fops syscopyarea sysfillrect sysimgblt fb fbdev 8192eu(OE) edac_mce_amd kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rtl8xxxu aesni_intel crypto_simd cryptd glue_helper mac80211 rapl wmi_bmof efi_pstore k10temp(OE) sp5100_tco ccp cfg80211 input_leds joydev libarc4 mac_hid sch_fq_codel sunrpc droptcpsock(OE) ip_tables x_tables autofs4 hid_generic usbhid hid uas usb_storage i2c_piix4 r8169 realtek ahci libahci wmi gpio_amdpt gpio_generic
[ 108.216958][ C2] CPU: 2 PID: 1013 Comm: modprobe Tainted: G OE 5.10.0-hiveos #83.hiveos.211201
[ 108.216959][ C2] Hardware name: Micro-Star International Co., Ltd MS-7B86/B450-A PRO MAX (MS-7B86), BIOS M.D0 05/17/2021
The effect is that the rig takes a long time to shut down: the cards stop their fans one at a time, each roughly 20 s after the previous one, which I believe is related to the soft lockup bug shown above.
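For what it's worth, the BAR ranges in the log can be checked by hand. Here is a minimal Python sketch that computes the window sizes from the inclusive address ranges printed for BAR 0 and BAR 2; the helper name `bar_size_mib` is just something I made up for illustration, not part of any tool.

```python
def bar_size_mib(start_hex: str, end_hex: str) -> int:
    """Return the size in MiB of an inclusive [start, end] address range."""
    start = int(start_hex, 16)
    end = int(end_hex, 16)
    # The range is inclusive, so add 1 before converting bytes to MiB.
    return (end - start + 1) // (1024 * 1024)


# Address ranges copied from the "BAR 0" and "BAR 2" lines in the log above.
bar0 = bar_size_mib("0x7f40000000", "0x7f4fffffff")
bar2 = bar_size_mib("0x7f50000000", "0x7f501fffff")
print(f"BAR 0: {bar0} MiB, BAR 2: {bar2} MiB")
```

BAR 0 comes out at 256 MiB, which matches the "BAR=256M" line and is far smaller than the 16368M of VRAM, so the "Not enough PCI address space for a large BAR" message at least looks consistent with what was assigned. Whether that has anything to do with the soft lockup is only my guess.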
Cheers!