GPU driver errors and GPUs lost, forcing reboots

OK, so I ended up here with the same issues you are having. I had been troubleshooting this for days. I checked everything, including power cables and risers, and changed them all out. I even changed my motherboard and CPU and still had the same problem. It turned out that a USB extension on one of my cables was causing the issue. When I cut out the extension, all of my problems went away. So remove extensions, change USB cables, or at least make sure they are fully seated. Hopefully this will fix your problem.

1 Like

I had the same problem.
I accessed the disk drive through Windows and deleted the amd-oc.conf (nvidia-oc.conf if it is Nvidia) and autofan.conf files.
In the rig.conf file, I deleted the lines referring to the watchdog.
After that I started HiveOS again and the rig started working again. :fireworks:
Or simply format your disk, reinstall the system, and redo the settings.
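
For anyone who would rather do the rig.conf step from a shell than by hand, here is a minimal sketch. It assumes the watchdog settings are the lines that begin with `WD_`, which is an assumption on my part, so check your own rig.conf first. The function name `strip_watchdog_lines` is made up for this example. It keeps a `.bak` copy next to the file so the change can be undone.

```shell
#!/bin/sh
# Remove watchdog lines from a rig.conf-style file, keeping a backup copy.
# ASSUMPTION: watchdog settings are the lines starting with "WD_".
# $1: path to the config file.
strip_watchdog_lines() {
    conf="$1"
    [ -f "$conf" ] || { echo "no such file: $conf" >&2; return 1; }

    # Keep the original next to the file so it can be restored later.
    cp "$conf" "$conf.bak" || return 1

    # Write back every line that does not start with WD_.
    # grep exits 1 when nothing is left to print; that is not an error here.
    grep -v '^WD_' "$conf.bak" > "$conf" || [ $? -eq 1 ]
}

# Possible use on the rig: strip_watchdog_lines /hive-config/rig.conf
```

To undo it, copy the `.bak` file back over the original.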

3 Likes

Thanks for this, dude. This issue had been hounding me for a week. I had a few unexpected power interruptions, and EVERY time after that I got all these weird Nvidia driver issues and my OCs wouldn't apply.
The only way I could get it working was to reflash the flash drive, but this saved me a lot of time compared with reflashing the drives every time. So +1 from me.
PS: I used Shell In A Box and only removed my nvidia-oc.conf and autofan.conf, then restarted.

cd /hive-config
rm nvidia-oc.conf autofan.conf

Please use the above commands at your own risk, and make sure you understand what you are doing.
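
A slightly more cautious variant of the commands above, as a rough sketch: it moves the files into a timestamped backup folder instead of deleting them, so the old OC and autofan settings can be restored. The function name `reset_oc_configs` and the `oc-backup-*` folder name are made up for this example, and keeping the backup inside the config directory is an assumption, not something HiveOS prescribes.

```shell
#!/bin/sh
# Move the OC and autofan configs into a timestamped backup folder
# instead of deleting them outright.
# $1: config directory (defaults to /hive-config, as in the post above).
reset_oc_configs() {
    conf_dir="${1:-/hive-config}"
    backup_dir="$conf_dir/oc-backup-$(date +%Y%m%d-%H%M%S)"
    mkdir -p "$backup_dir" || return 1

    # amd-oc.conf is included for AMD rigs; files that do not exist are skipped.
    for f in nvidia-oc.conf amd-oc.conf autofan.conf; do
        if [ -f "$conf_dir/$f" ]; then
            mv "$conf_dir/$f" "$backup_dir/" || return 1
            echo "moved $f to $backup_dir"
        fi
    done
}

# Possible use on the rig: reset_oc_configs /hive-config   (then restart)
```

After the restart the OC has to be set again, same as with the plain `rm`.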

4 Likes

I had the same problem (ERGO, 2miners, T-Rex; rig: 3060 Ti, 3070 Ti, 3080).
I solved it by:

  1. Downgrading HiveOS.
  2. Downgrading the Nvidia drivers to a stable version.
  3. Trying many OC settings until I found a good one.

Now it's working fine.

1 Like

Same issue, four 3080. So frustrating. If this works I will praise NEoKhajitt and evandrop to my grandchildren. Will post again. So far stable.

@KryptoMc can you share what stable versions you are using please?

It's Nvidia 460.91.03.

1 Like

What version of HiveOS? Thanks.

I updated HiveOS to the latest version and it's still stable :slight_smile:

The latest version of HiveOS, 0.6-210@210920, with Nvidia drivers 460.91.03, right?

yes bruh
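
If you want to confirm which driver your rig is actually running, here is a small sketch. `nvidia-smi` and its `--query-gpu` / `--format` options come with the Nvidia driver; the two function names are made up for this example.

```shell
#!/bin/sh
# Print "index, name, driver_version" for each GPU.
show_driver_versions() {
    nvidia-smi --query-gpu=index,name,driver_version --format=csv,noheader
}

# Read those lines on stdin and succeed only if every GPU reports
# the wanted driver version.
# $1: wanted version, e.g. 460.91.03
all_on_version() {
    wanted="$1"
    awk -F', ' -v w="$wanted" '$3 != w { bad = 1 } END { exit bad }'
}

# Possible use on the rig:
#   show_driver_versions | all_on_version 460.91.03 && echo "driver OK"
```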

1 Like

Thanks for this, works fine !!! :ok_hand:

1 Like

Yes, the 90.03 is good; I have had zero errors for 2 weeks.

Yeah, works for me, too! Big double thumb up :+1: :+1:

1 Like

Hi

I had the same problem with four 3070s. What worked for me, as one comment suggests, was removing the OC, then checking that the USB cables were not the problem (as another comment suggests) by powering on one card at a time, then two (I have two cards per PSU), and finally all four. I still had one card that kept showing fan 0 and an OC error, so I moved that riser's connector (the one that goes to the motherboard) to a different position until the OC was applied, and now it is mining again as before.

I hope this helps

2 Likes

Hi @eduardogt21, thanks for your comment. Could you tell or show me how to remove the OC? Do you have the HiveOS and Nvidia drivers updated to the current versions? Thanks for your help. I have had this problem for months, and last night alone my rig rebooted 7 times…

Hey @NEoKhajitt, can you explain how you used those commands? Thanks

This issue is related to the overclock settings. It happened to me on an RTX 3060 with LHR. After isolating each GPU, I found that this one was not stable with overclock settings of 1600 on core and 2600 on memory. After I changed the settings to 1450 and 2400, the GPU ran for two days without issues. Then I saw the issue again, but it recovered after a reboot. I may need to change the OC again to find the right settings. Before this issue started, the settings were 0 on core and 1600 on memory, with power around 120 W and temperatures around 60 degrees, and it ran for one month without any issues. So don't worry: you only need to find the correct OC for the GPU, and you may have to sacrifice some hashrate to get the GPU stable.
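
One way to narrow down which GPU is the unstable one is to count the driver's Xid error lines per PCI address in the kernel log. This is a rough sketch: it relies on the `NVRM: Xid (PCI:...)` lines that the Nvidia driver writes to the kernel log, and the function name `count_xid_by_gpu` is made up for this example.

```shell
#!/bin/sh
# Read kernel log text on stdin and count NVIDIA Xid error lines per
# PCI address, most frequent first.
count_xid_by_gpu() {
    grep -o 'Xid (PCI:[0-9a-fA-F:.]*)' | sort | uniq -c | sort -rn
}

# Possible use on the rig (may need root):
#   dmesg | count_xid_by_gpu
```

The card with the most hits is a reasonable first candidate for a lower OC or a riser check, though a high count alone does not prove which part is at fault.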




1 Like

Sir, would you mind explaining step by step how you used those commands?

Okay, let me go through these one by one, starting with fixes that don't cost any money.

First, it says GPUs are lost because you activated the watchdog (reboot if a GPU is offline). Try disabling that.

Second, try reflashing the USB drive.

Lastly, check all GPU connections and risers. It can also happen if the GPU is old or used.

That's all from me :slight_smile: