Bad Riser example

The error looks like this:

[drm:amdgpu_ib_ring_tests] *ERROR* amdgpu: failed testing IB on ring 1 (-110).
[drm:amdgpu_vce_ring_test_ib] *ERROR* amdgpu: IB test timed out.
[drm:amdgpu_ib_ring_tests] *ERROR* amdgpu: failed testing IB on ring 2 (-110).
[drm:amdgpu_vce_ring_test_ib] *ERROR* amdgpu: IB test timed out.
[drm:amdgpu_ib_ring_tests] *ERROR* amdgpu: failed testing IB on ring 3 (-110).
[drm:amdgpu_vce_ring_test_ib] *ERROR* amdgpu: IB test timed out.
[drm:amdgpu_ib_ring_tests] *ERROR* amdgpu: failed testing IB on ring 4 (-110).
[drm:amdgpu_vce_ring_test_ib] *ERROR* amdgpu: IB test timed out.

The cause is bad soldering or even a broken trace on the riser (third from the right in the image).

And another example


I’m also getting this error on two different cards, even when they are plugged directly into my motherboard (no riser).

There is also this: https://bugs.archlinux.org/task/53042, https://bbs.archlinux.org/viewtopic.php?id=225597

Not only that: I can swap risers with working cards and those cards continue to work, while the cards that fail the IB test on ring 12 (it is only ever ring 12) mine just fine in a different system.

Both are XFX RX 480s without modded BIOS (didn’t get to mod them yet).

I’ve proven that the problem isn’t the risers, the cards, or the PCIe slot they are plugged into.

Help?

I can confirm what Rootless said. I’m facing the same problem on my 12-GPU rig running RX 570 / RX 580 cards. I swapped risers and changed motherboards (H110/TB250), and I still get the same random IB ring errors on boot (with different ring numbers). This affects Ethos, Smos and Hive as well, though I had more luck running Ethos (different kernel and firmware, I suppose, but this did not solve the problem completely). It’s so freaking annoying, because it requires a physical restart of the PSU.
Help, please…?

I can confirm it too. But I flashed the stock ROM to the cards, did 20+ reboots, and didn’t get this error. I think the kernel in HiveOS can’t work with a modded BIOS. Maybe it’s ROCm that works badly. RX 580 4 GB Elpida cards have this problem, but Radeon RX 580 8 GB cards work well. Dima, we need help: what can we do?

My other RX 580 4 GB Elpida cards work well. All rigs run Hive ver. 0-5.32.

Same problem here. I changed more than 20 risers and tried different cards and motherboards. No stability… help.

Same problem with an RX 580 Pulse Elpida 4 GB. No solution found yet.

Same error. I have a rig of Sapphire RX 570 4 GB ITX cards.
I found the card with this error and flashed it back to the default BIOS, and the problem was solved.

Hello everyone, I think this is the same problem, and according to the comments it happens because one uses an original BIOS on the cards. Did anyone find a solution?

The farm was not stable: it hung every couple of days, and sometimes a restart (until all the video cards started normally) took up to half an hour. The motherboard was a Biostar TB250-BTC. I replaced it with an Asrock H110 Pro BTC+, and “amdgpu: ring” errors started pouring in. The cause turned out to be several faulty risers at once; replacing them solved the problem.

I am getting the following errors. Could these also be related to risers? I ordered some new risers, but they won’t arrive for a month or so.

Jan 10 12:25:23 hive5700XT kernel: [58483.988705] amdgpu: Failed to export SMU metrics table!
Jan 10 12:25:28 hive5700XT kernel: [58488.988954] amdgpu: Msg issuing pre-check failed and SMU may be not in the right state!

Hello!

Did you solve this problem? Is it the riser? When you change any setting, the system gives this error and you need to shut down, right?

Thanks,
Özgür

I have the same problem.
So frustrating.
I tried many things:
connecting the card directly into the motherboard - same error
reinstalled BIOS for the mobo
changed to working risers
added RAM
even had a chat with the hiveos assistant (he thought it is the BIOS)
If anyone can share more experience on how to get the GPU’s UVD working again, that would be great!
Thanks!