One of my rigs began producing an “Autofan: GPU driver error, rebooting” message yesterday. What could be the cause? I upgraded to version 5.63 yesterday. Could it be an issue with that version?
GPUs are MSI GeForce GTX 1060 and 1070 Gaming X.
Please advise. Thank you.
Update: Same message now appearing on another rig.
On the first rig, the issue seems to have been related to a defective riser card or GPU. I disconnected both, and the system has been mining without issue for 16 hours now.
On the second rig, I upgraded it to 5.64, and the rig produced a new error message of “Autofan: GPU driver error, no temps.”
Identical and similar messages on two different rigs with different brand and model GPUs… the root problem seems to be with HiveOS. It’s disappointing that I haven’t gotten any replies, especially from the HiveOS team.
autofan.conf didn’t exist, so I created it with the following data:
#URL
#https://forum.hiveos.farm/t/how-to-use-autofan-autofan/4551/4?u=77164
#https://hiveos.farm/changelog/
REBOOT_ON_ERROR=1
# Target GPU temperature
TARGET_TEMP=60
# Minimal fan speed
MIN_FAN=30
# Stop mining at critical temp
CRITICAL_TEMP=85
# Set to 1 to disable AMD fan control
NO_AMD=1
Please note that I disabled auto fan control for AMD GPUs because I don’t have any.
For those wondering how to create the autofan.conf file:
First, SSH into your rig. Then type the following:
nano /hive-config/autofan.conf
Then add the following to the file and save it:
#https://hiveos.farm/changelog/
REBOOT_ON_ERROR=1
# Target GPU temperature
TARGET_TEMP=60
# Minimal fan speed
MIN_FAN=30
# Stop mining at critical temp
CRITICAL_TEMP=85
# Set to 1 to disable AMD fan control
NO_AMD=1
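If you would rather not paste into nano, the same settings can be written in one step from the shell. This is only a sketch: `write_autofan_conf` is a made-up helper name, not something that ships with HiveOS, and the values are simply the ones from the config above. It keeps a `.bak` copy of any existing file first.

```shell
# Hypothetical helper (not part of HiveOS): writes the autofan settings
# shown above to the path given as the first argument.
write_autofan_conf() {
    conf_path="$1"

    # Keep a backup if a config already exists at that path
    if [ -f "$conf_path" ]; then
        cp "$conf_path" "$conf_path.bak"
    fi

    cat > "$conf_path" <<'EOF'
REBOOT_ON_ERROR=1
# Target GPU temperature
TARGET_TEMP=60
# Minimal fan speed
MIN_FAN=30
# Stop mining at critical temp
CRITICAL_TEMP=85
# Set to 1 to disable AMD fan control
NO_AMD=1
EOF
}

# On the rig, you would call it with the path used above:
#   write_autofan_conf /hive-config/autofan.conf
```

Leave out `NO_AMD=1` if you do have AMD cards in the rig.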
So after extensive testing, the riser and GPU in question were not defective at all. What actually resolved the issue was reducing the GPU quantity from 12 to 10 in that particular rig. I came to that conclusion yesterday: when I re-installed the riser and GPU, the error messages re-appeared, and when I removed a riser and GPU again, the error messages stopped.
This particular rig used to function properly with 12 GPUs. However, that may have been on Windows, before I switched the rigs over to Hive OS.
For clarification, my other rig that also produced these error messages has only 6 GPUs. So the 11+ GPU issue doesn’t apply in every situation.
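Before pulling risers, it may help to check which card (if any) has stopped reporting a temperature. The sketch below assumes NVIDIA cards with `nvidia-smi` available on the rig; `check_gpu_temps` is a hypothetical helper name, and this is not necessarily how HiveOS autofan itself detects the error. It reads `index, name, temperature` lines and flags any GPU whose temperature field is not a plain number.

```shell
# Hypothetical helper: reads "index, name, temperature" lines on stdin
# (the CSV layout nvidia-smi prints with noheader,nounits) and reports
# every GPU whose temperature field is not a plain number.
# Returns non-zero if at least one such GPU is found.
check_gpu_temps() {
    awk -F', ' '
        $3 !~ /^[0-9]+$/ {
            printf "GPU %s (%s): no temperature reading (%s)\n", $1, $2, $3
            bad = 1
        }
        END { exit bad + 0 }
    '
}

# On the rig, feed it the driver output:
#   nvidia-smi --query-gpu=index,name,temperature.gpu \
#       --format=csv,noheader,nounits | check_gpu_temps
```

If a card has dropped off the bus entirely, `nvidia-smi` may fail outright instead of printing a row for it, so an error from `nvidia-smi` itself is also worth noting.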
Getting same error. I have 5 1060s and 5 1080ti connected to my rig with Asus B250 Mining MoBo. Rig’s working fine for 24hrs and then gets this error. Will try to fix with manual autofan config…
I’ve just started using HiveOS (7x 1060 3GB MSI on 270P mobo), but this issue started right away for me. I’ve replaced risers & graphics cards with no change to the error. The error is random in timing, ranging from about 45 minutes to 26 hours.
Using the “Tuning” option & enabling the hashrate watchdog has mitigated the issue for me without requiring manual intervention. I didn’t try the manual autofan config because my experience is that manual fixes are undone by upgrades.
I had the same problem with one of my freshly upgraded rigs. So far I’ve downgraded to 0.5-60, which it had been running before. We’ll see how it goes, but I think it’s the right version to use, since it is from before autofan was implemented. So far it has worked great for a couple of hours. Just a solution for you to try.
If it helped, tips are welcome:
3LMaJKvM5UgWqhJ6dgmLMGryuBafUo9gdT