GPU Error after upgrade to new version

Hi,

I’ve just upgraded to 0.6-217@220515 and 2 of my 3060s are showing GPU driver errors and keep failing.

I’ve tried turning off all overclocks but I still can’t get them to stay up.

The error message isn’t particularly informative… any suggestions?
GPU Health Data:
01:00.0 Temp: 65C Fan: 65% Power: 153W
05:00.0 Temp: 64C Fan: 63% Power: 148W
06:00.0 Temp: 55C Fan: 59% Power: 142W
Latest GPU driver errors list:

Are you on the latest kernel/drivers? Are there any GPU errors that list the PCIe location?

I am having this issue on 3 rigs now. The 8th GPU fails with an NVIDIA GPU driver error and then won’t get recognized again unless I rebuild the OS. Something is wrong with either the NVIDIA drivers or HiveOS.

Are you on the latest kernel/HiveOS/drivers? Can you post a screenshot of your worker overview and of any error messages?


I have the same issue…
GPU Health Data:
00:02.0 Temp: 0C Fan: 0% Power: 0W
01:00.0 Temp: 52C Fan: 33% Power: 130W
03:00.0 Temp: 52C Fan: 42% Power: 130W
04:00.0 Temp: 51C Fan: 41% Power: 130W
05:00.0 Temp: 52C Fan: 38% Power: 130W
06:00.0 Temp: 57C Fan: 37% Power: 235W
07:00.0 Temp: 53C Fan: 47% Power: 130W
08:00.0 Temp: 58C Fan: 0% Power: 235W
09:00.0 Temp: 55C Fan: 42% Power: 130W
Latest GPU driver errors list:
May 15 17:51:12 Hagen_5 kernel: NVRM: Xid (PCI:0000:08:00): 62, pid=927, 0000(0000) 00000000 00000000
May 15 17:51:12 Hagen_5 kernel: NVRM: Xid (PCI:0000:08:00): 45, pid=3123, Ch 00000010
May 15 17:51:17 Hagen_5 kernel: NVRM: Xid (PCI:0000:08:00): 45, pid=3123, Ch 00000011
May 15 17:51:17 Hagen_5 kernel: NVRM: Xid (PCI:0000:08:00): 45, pid=3123, Ch 00000012
May 15 17:51:17 Hagen_5 kernel: NVRM: Xid (PCI:0000:08:00): 45, pid=3123, Ch 00000013
May 15 17:51:17 Hagen_5 kernel: NVRM: Xid (PCI:0000:08:00): 45, pid=3123, Ch 00000014
May 15 17:51:17 Hagen_5 kernel: NVRM: Xid (PCI:0000:08:00): 45, pid=3123, Ch 00000015
May 15 17:51:17 Hagen_5 kernel: NVRM: Xid (PCI:0000:08:00): 45, pid=3123, Ch 00000016
May 15 17:51:17 Hagen_5 kernel: NVRM: Xid (PCI:0000:08:00): 45, pid=3123, Ch 00000017

Can you help us?

This has been happening since the LHR unlock update…
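For anyone reading a log like the one above, here is a minimal Python sketch that counts Xid codes per PCI address, which makes it easier to see which card the errors point at. The function name `summarize_xids` and the regex are only illustrative, and the pattern assumes the `NVRM: Xid (PCI:...)` line format shown in this thread; other driver versions may format the line differently.

```python
import re
from collections import Counter

# Matches kernel log lines such as:
#   "... kernel: NVRM: Xid (PCI:0000:08:00): 62, pid=927, ..."
# Group 1 is the PCI address, group 2 is the Xid code.
XID_RE = re.compile(r"NVRM: Xid \(PCI:([0-9a-fA-F:.]+)\): (\d+)")


def summarize_xids(log_text):
    """Count how often each (PCI address, Xid code) pair appears."""
    counts = Counter()
    for line in log_text.splitlines():
        match = XID_RE.search(line)
        if match:
            pci_address = match.group(1)
            xid_code = int(match.group(2))
            counts[(pci_address, xid_code)] += 1
    return counts


# Three lines copied from the log posted above.
sample = """\
May 15 17:51:12 Hagen_5 kernel: NVRM: Xid (PCI:0000:08:00): 62, pid=927, 0000(0000) 00000000 00000000
May 15 17:51:12 Hagen_5 kernel: NVRM: Xid (PCI:0000:08:00): 45, pid=3123, Ch 00000010
May 15 17:51:17 Hagen_5 kernel: NVRM: Xid (PCI:0000:08:00): 45, pid=3123, Ch 00000011
"""

for (pci_address, xid_code), n in sorted(summarize_xids(sample).items()):
    print(f"{pci_address}  Xid {xid_code}: {n}x")
```

In this log every Xid line carries the same address (08:00), so the card on that bus is the one to look at first. NVIDIA's Xid documentation lists what each code means.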

Bad OCs on all cards. Don’t use core offsets/power limits on modern cards.

To tune the OC per card, start at 0 across the board and find the highest stable mem clock.

Then find the lowest locked core clock that maintains full hashrate. This will also use the least amount of power the card needs to run at that core clock.

Your error says GPU 6 is causing the crash, but you should fix all the OCs and go from there. Reduce the OC on cards that crash after doing the above steps.
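The tuning order above could be sketched roughly like this in Python. It is only an outline of the search logic, not a HiveOS tool: `is_stable` and `hashrate_at` are hypothetical callbacks you would have to back with real testing (apply the setting, let the miner run for a while, watch for crashes or a hashrate drop), and the step size, limit and tolerance are arbitrary assumptions.

```python
def highest_stable_mem(is_stable, start=0, step=100, limit=3000):
    """Raise the memory clock setting step by step from `start`.

    Returns the last value for which is_stable(value) was True.
    Assumes `start` itself (0, i.e. no overclock) is stable.
    """
    best = start
    mem = start
    while mem <= limit and is_stable(mem):
        best = mem
        mem += step
    return best


def lowest_full_rate_core(hashrate_at, candidates, tolerance=0.01):
    """Return the lowest locked core clock that keeps full hashrate.

    "Full" means within `tolerance` (as a fraction) of the best
    hashrate measured across all candidate clocks.
    """
    rates = {clock: hashrate_at(clock) for clock in candidates}
    full = max(rates.values())
    return min(c for c, r in rates.items() if r >= full * (1 - tolerance))


# Stand-in measurements so the sketch runs; the numbers are made up.
def fake_is_stable(mem):
    return mem <= 2300


def fake_hashrate(core_mhz):
    # Hashrate climbs with core clock, then flattens out.
    return min(60.0, core_mhz / 1060 * 60.0)


mem_setting = highest_stable_mem(fake_is_stable)
core_clock = lowest_full_rate_core(fake_hashrate, range(900, 1501, 20))
print(f"mem: {mem_setting}, locked core: {core_clock} MHz")
```

Memory is tuned first and the core clock second, same as in the steps above. If a card still crashes afterwards, back its mem clock off and test again.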

I will try, thank you!!!

And the kernel drivers are OK?
Thanks a lot!!!

You’re on an older beta image. It wouldn’t hurt to install the latest stable image, but that won’t impact much. Your drivers are fine, but you’re still using core offsets (-500) and power limits. Your core clock should be around 1060 MHz on the 3070s and 1080 MHz on the 3080s.

I have the same problem! And I’ve had this OC for 6 months! The problem started maybe a week ago! So I think the problem is not the OC!

The same OCs may not work with the full unlock the way they did with the 75% unlock. Reduce the mem clock and reboot. Repeat until solved.
