Watchdog: BUG: soft lockup - CPU#2 stuck for 22s! after amdgpu: PSP runtime database doesn't exist

Hi, has anyone come across the following warning/error?

[   82.824842][ T1013] [drm] Not enough PCI address space for a large BAR.
[   82.824846][ T1013] amdgpu 0000:30:00.0: BAR 0: assigned [mem 0x7f40000000-0x7f4fffffff 64bit pref]
[   82.824944][ T1013] amdgpu 0000:30:00.0: BAR 2: assigned [mem 0x7f50000000-0x7f501fffff 64bit pref]
[   82.825411][ T1013] amdgpu 0000:30:00.0: amdgpu: VRAM: 16368M 0x0000008000000000 - 0x00000083FEFFFFFF (16368M used)
[   82.825413][ T1013] amdgpu 0000:30:00.0: amdgpu: GART: 512M 0x0000000000000000 - 0x000000001FFFFFFF
[   82.825416][ T1013] amdgpu 0000:30:00.0: amdgpu: AGP: 267894784M 0x0000008400000000 - 0x0000FFFFFFFFFFFF
[   82.825466][ T1013] [drm] Detected VRAM RAM=16368M, BAR=256M
[   82.825467][ T1013] [drm] RAM width 256bits GDDR6
[   82.825509][ T1013] [drm] amdgpu: 16368M of VRAM memory ready
[   82.825513][ T1013] [drm] amdgpu: 16368M of GTT memory ready.
[   82.825518][ T1013] [drm] GART: num cpu pages 131072, num gpu pages 131072
[   82.827887][ T1013] [drm] PCIE GART of 512M enabled (table at 0x0000008000000000).
[   82.828764][ T1013] amdgpu 0000:30:00.0: amdgpu: PSP runtime database doesn't exist
networkd[668]: wlan0: DHCPv4 address 192.168.1.33/24 via 192.168.1.1
networkd[668]: wlan0: Configured
timesyncd[468]: Network configuration changed, trying to establish connection.
1]: Starting resolvconf-pull-resolved.service...
1]: Started resolvconf-pull-resolved.service.
[  108.216918][    C2] watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [modprobe:1013]
[  108.216923][    C2] Modules linked in: ccm amdgpu(OE+) iommu_v2 amdttm(OE) amdkcl(OE) amd_sched(OE) drm_kms_helper cec drm drm_panel_orientation_quirks cfbfillrect cfbimgblt cfbcopyarea i2c_algo_bit fb_sys_fops syscopyarea sysfillrect sysimgblt fb fbdev 8192eu(OE) edac_mce_amd kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rtl8xxxu aesni_intel crypto_simd cryptd glue_helper mac80211 rapl wmi_bmof efi_pstore k10temp(OE) sp5100_tco ccp cfg80211 input_leds joydev libarc4 mac_hid sch_fq_codel sunrpc droptcpsock(OE) ip_tables x_tables autofs4 hid_generic usbhid hid uas usb_storage i2c_piix4 r8169 realtek ahci libahci wmi gpio_amdpt gpio_generic
[  108.216958][    C2] CPU: 2 PID: 1013 Comm: modprobe Tainted: G           OE     5.10.0-hiveos #83.hiveos.211201
[  108.216959][    C2] Hardware name: Micro-Star International Co., Ltd MS-7B86/B450-A PRO MAX (MS-7B86), BIOS M.D0 05/17/2021

The effect is that the rig takes a long time to shut down: the cards stop their fans one at a time, each roughly 20 s after the previous one, which I believe is related to the soft lockup shown above.
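
To check whether each lockup lines up with a single card's init or teardown, it may help to pull the soft lockup reports out of a saved kernel log and see what the stuck task logged last. Below is a minimal Python sketch, assuming the log uses the same format as above, with the printk caller column (`[ T1013]` for a task, `[ C2]` for a CPU). The helper name `find_soft_lockups` is my own and not part of any tool. For the excerpt above it should pair the lockup with the "PSP runtime database doesn't exist" line, about 25.4 s earlier.

```python
import re

# "[  108.216918][    C2] message": timestamp, caller id, message text.
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\[\s*(\w+)\]\s+(.*)$")
# "soft lockup - CPU#2 stuck for 22s! [modprobe:1013]"
LOCKUP_RE = re.compile(
    r"soft lockup - CPU#(\d+) stuck for (\d+)s! \[([^:\]]+):(\d+)\]"
)


def find_soft_lockups(log_text):
    """Return one dict per soft lockup report found in log_text.

    Each dict records the CPU, the stuck task, and the last message that
    task logged before the report (None if nothing was seen from it).
    """
    last_seen = {}  # caller id such as "T1013" -> (timestamp, message)
    results = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # journal lines without a kernel timestamp are skipped
        ts, caller, msg = float(m.group(1)), m.group(2), m.group(3)
        hit = LOCKUP_RE.search(msg)
        if hit is None:
            last_seen[caller] = (ts, msg)
            continue
        pid = hit.group(4)
        prev = last_seen.get("T" + pid)  # assumes task callers are "T<pid>"
        results.append({
            "time": ts,
            "cpu": int(hit.group(1)),
            "stuck_for_s": int(hit.group(2)),
            "comm": hit.group(3),
            "pid": int(pid),
            "last_message": prev[1] if prev else None,
            "gap_s": ts - prev[0] if prev else None,
        })
    return results
```

Usage would be something like `find_soft_lockups(open("dmesg.txt").read())`, where `dmesg.txt` is a hypothetical file holding the saved log. A `gap_s` well above the watchdog threshold points at whatever the driver does right after `last_message`.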

Cheers!
