HiveOS error


Jul 20 04:10:42 Bankai kernel: NVRM: Xid (PCI:0000:04:00): 62, pid=935, 1ee5(803c) 00000000 00000000

Hello,

I am getting the error above, followed by "GPU driver error, no temps". The GPUs are then lost and the rig restarts.

Does anyone know which GPU is causing this, and why?
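On the "which GPU" part: the `PCI:0000:04:00` field in the Xid line is the bus address of the device that reported the error, and the number after it (62 here) is the Xid code. That address can be compared with the bus IDs printed by `nvidia-smi --query-gpu=index,pci.bus_id,name --format=csv,noheader`. Below is a minimal Python sketch of that lookup. It assumes nvidia-smi prints bus IDs in the form `00000000:04:00.0`; the function names and the two-row sample listing (with placeholder GPU names) are made up for illustration and are not from the rig in this thread.

```python
import re

# Matches the address and Xid code in lines like:
#   NVRM: Xid (PCI:0000:04:00): 62, pid=935, ...
XID_RE = re.compile(
    r"NVRM: Xid \(PCI:([0-9a-fA-F]{4}):([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\): (\d+)"
)


def parse_xid_line(line):
    """Return ((domain, bus, device), xid_code), or None if the line has no Xid."""
    m = XID_RE.search(line)
    if m is None:
        return None
    domain, bus, dev, code = m.groups()
    return (int(domain, 16), int(bus, 16), int(dev, 16)), int(code)


def smi_bus_key(bus_id):
    """Convert an nvidia-smi style bus ID ('00000000:04:00.0') to (domain, bus, device)."""
    addr = bus_id.strip().split(".")[0]  # drop the '.0' function suffix
    domain, bus, dev = addr.split(":")
    return int(domain, 16), int(bus, 16), int(dev, 16)


def find_gpu(xid_line, smi_csv):
    """Return (gpu_index, gpu_name, xid_code) for the GPU named in the Xid line."""
    parsed = parse_xid_line(xid_line)
    if parsed is None:
        return None
    key, code = parsed
    for row in smi_csv.strip().splitlines():
        index, bus_id, name = [field.strip() for field in row.split(",", 2)]
        if smi_bus_key(bus_id) == key:
            return int(index), name, code
    return None


# Hypothetical nvidia-smi listing; the names are placeholders.
sample_listing = """
0, 00000000:01:00.0, GPU A
1, 00000000:04:00.0, GPU B
"""
log_line = ("Jul 20 04:10:42 Bankai kernel: NVRM: Xid (PCI:0000:04:00): 62, "
            "pid=935, 1ee5(803c) 00000000 00000000")
match = find_gpu(log_line, sample_listing)
```

With the sample listing, `match` points at the second row. This only tells you which slot reported the error; as later replies show, the cause can still be the riser or the overclock rather than the card itself.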

Thanks


Piggybacking on your post (hope that’s ok).

I am getting similar errors and same symptoms (GPU suddenly stops showing watts, then temps disappear, then GPUs go offline). It happens about 2-4 hours after restarting. I thought it was the 2070S, but this morning, the 3080Ti (mining ERG) started acting up.

Any help would be awesome.

Things I have tried:

  • restarting
  • updating nvidia drivers
  • updating BIOS
  • setting all PCIe to Gen2
  • changing miners
  • taking out GPUs to reduce load
  • switching around risers
  • no OC

Jul 30 11:47:35 rig6 kernel: [ 141.317947][T10582] NVRM: Xid (PCI:0000:0d:00): 43, pid=10537, Ch 00000010
Jul 30 11:48:18 rig6 kernel: [ 184.310406][ T1060] NVRM: Xid (PCI:0000:0d:00): 13, pid=16136, Graphics SM Warp Exception on (GPC 4, TPC 3, SM 1): Out Of Range Address
Jul 30 11:48:18 rig6 kernel: [ 184.310442][ T1060] NVRM: Xid (PCI:0000:0d:00): 13, pid=16136, Graphics Exception: ESR 0x525fb0=0xc03000e 0x525fb4=0x20 0x525fa8=0x4c1eb72 0x525fac=0x174
Jul 30 11:48:18 rig6 kernel: [ 184.324545][T16180] NVRM: Xid (PCI:0000:0d:00): 43, pid=16136, Ch 00000010
Jul 30 11:48:59 rig6 kernel: [ 225.380030][ T4498] NVRM: Xid (PCI:0000:0d:00): 31, pid=21000, Ch 00000013, intr 00000000. MMU Fault: ENGINE HOST0 HUBCLIENT_HOST faulted @ 0x6_222c8000. Fault is of type FAULT_PDE ACCESS_TYPE_VIRT_READ
Jul 30 11:49:05 rig6 kernel: [ 231.191075][ T1060] NVRM: Xid (PCI:0000:0d:00): 31, pid=953, Ch 00000000, intr 00000000. MMU Fault: ENGINE HOST9 HUBCLIENT_HOST faulted @ 0x1_00060000. Fault is of type FAULT_PDE ACCESS_TYPE_VIRT_READ
Jul 30 11:49:12 rig6 kernel: [ 238.577707][ C0] NVRM: Xid (PCI:0000:0d:00): 8, pid=2356, Channel 00000008
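Every Xid in the log above comes from the same bus address (`0000:0d:00`), with several different codes. A quick way to see that pattern in a longer log is to tally events per address and code. This is a minimal Python sketch; the function name `tally_xids` is made up for illustration, and the sample text is four lines copied from the log above.

```python
import re
from collections import Counter

# Captures the PCI address and the Xid code from an NVRM kernel log line.
XID_LINE = re.compile(r"NVRM: Xid \(PCI:([0-9a-fA-F:]+)\): (\d+),")


def tally_xids(log_text):
    """Count Xid events per (pci_address, xid_code) pair."""
    counts = Counter()
    for line in log_text.splitlines():
        m = XID_LINE.search(line)
        if m:
            counts[(m.group(1), int(m.group(2)))] += 1
    return counts


sample_log = """\
Jul 30 11:47:35 rig6 kernel: [ 141.317947][T10582] NVRM: Xid (PCI:0000:0d:00): 43, pid=10537, Ch 00000010
Jul 30 11:48:18 rig6 kernel: [ 184.310406][ T1060] NVRM: Xid (PCI:0000:0d:00): 13, pid=16136, Graphics SM Warp Exception on (GPC 4, TPC 3, SM 1): Out Of Range Address
Jul 30 11:48:18 rig6 kernel: [ 184.324545][T16180] NVRM: Xid (PCI:0000:0d:00): 43, pid=16136, Ch 00000010
Jul 30 11:49:12 rig6 kernel: [ 238.577707][ C0] NVRM: Xid (PCI:0000:0d:00): 8, pid=2356, Channel 00000008
"""

counts = tally_xids(sample_log)
for (address, code), n in sorted(counts.items()):
    print(f"{address}  Xid {code}: {n}")
```

If all the errors cluster on one address, that slot's card, riser, and cabling are the first things to check. It does not prove the card itself is faulty.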

Update:
I fixed the issue on my rig, so I am sharing what worked for others who may have the same issue. Turns out it was a riser issue after all - but one that I caused (and not a hardware error).

Two of the small riser adapter cards were incorrectly paired via USB with the wrong riser.

My riser cards, which are “VER010-X,” come with riser adapters that state, “VER008S.” See screenshots, below. When assembling the rig, I saw the mismatched version labels and opted to replace the small riser adapter cards with others I had of VER10. Big mistake.

After pairing the little riser adapters to the correct risers they came packaged with, the rig mined for 12 hours without throwing any errors. Also, the wattages and temps display properly on all cards. Feeling bold, I added a couple cards and another PSU to be safe. See screenshot below. System has been stable for approx. 12 hrs. Hope this helps someone. Cheers.

image


I'm getting "GPU driver error, no temps" and am going to check this ASAP.

Damn!! I might have the same problem! I was lowering the OC, trying updates, downgrading and upgrading, and nothing worked. I was thinking it had something to do with the risers but couldn't put my finger on it. Will check this ASAP. Thank you!!!

This didn't work for me.
I used a new riser package. Same thing.

Did it work for you?

Reboot the rig with a 70-second delay on overclocks. That worked for me. It's something to do with HiveOS and the DAG files loading. It's also worth checking the risers and the PCIe slots on the motherboard in case they are toast!

I have a 300-second delay.
It was my overclock settings: they were too high. I rebooted with no overclocks and gradually went up. I sacrificed about 1.5 MH/s, but it's been stable.

Thank you! This helped!


Nice. Almost a year later and fixed my issue. It would have been difficult for me to see that I mismatched one of my riser packages. I would have never looked or thought about looking had I not read this post. Nice detail. People helping people.

