Tried that and so far it's working. I didn't set the fans to 100%, but I reduced the temperature target by 5°C and reduced the memory overclock. I could only test it for 12h because I had WiFi problems and had to stop, but that's more than I was able to get before. I bought some thermal pads and will be replacing them to see if that helps. Hopefully when I wake up tomorrow I will see a nice dashboard with no alerts.
If I try to set the fan manually on my 3080, it does not work while the other cards are running on Autofan. The fan still follows the Autofan settings. With manual fan settings it also reports that it is rebooting because of the "Autofan: Unable to set fan speed" error.
If I set a fan percentage on my 3080, it is never used unless I switch all my cards to manual settings and turn off Autofan. Not a great fix.
Pep, what are your OC and Autofan settings for your 3080? I tried what was suggested by support. No more reboots, but the rig just locks up and goes offline. I do not see the suggestion from support as a fix.
I reduced my 3080 memory OC from 1550 to 1350 and lowered my temperature target from 70 to 65. I also changed some other things, like reducing the memory overclock on other cards and the power draw for some of them. I changed my driver from 460.* to 455.* as well, but I don't think that has anything to do with it, as I did that a couple of days ago and the reboots kept happening. So far 18h uptime with no errors; I hadn't seen that in a while. I think I had the same problem you are describing, with the rig going offline, but I can't remember what I did to fix it. One way to go about it is to set all of your cards to a very safe overclock, change Nvidia drivers and try your luck, and from there keep changing settings until it breaks.
Quick update. I changed the thermal pads. Apart from the horrible job I did of it (wrong size, and I got a bit of PCB flex, but I can't be bothered to fix it right now), I set all my OC settings back to normal, as well as the original Autofan 70°C target, and so far 7h uptime with no reboots, so it does look like hot memory temperatures were the problem. If you are able to find good documentation on your GPU's thermal pad sizes, I would 100% recommend changing them, especially on the 30 series cards. If not, do it at your own risk.
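For anyone who wants to watch temperatures and fan speeds while stepping the settings up and down like this, here is a rough Python sketch. It assumes `nvidia-smi` is on the PATH and uses its `--query-gpu` CSV output; the function names and the 65 default are just illustrative (65 is the target mentioned above), and none of this is part of Autofan itself.

```python
import subprocess

# Fields requested from nvidia-smi; fan.speed may come back as [N/A] on some cards.
QUERY_FIELDS = "index,temperature.gpu,fan.speed"


def parse_gpu_csv(text):
    """Parse `nvidia-smi --format=csv,noheader,nounits` output into dicts."""
    readings = []
    for line in text.strip().splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 3:
            continue  # skip lines that do not match the three requested fields
        index, temp, fan = parts
        readings.append({
            "index": int(index),
            "temp_c": int(temp) if temp.isdigit() else None,
            "fan_pct": int(fan) if fan.isdigit() else None,
        })
    return readings


def over_target(readings, target_c=65):
    """Return the indices of GPUs whose core temperature exceeds target_c."""
    return [r["index"] for r in readings
            if r["temp_c"] is not None and r["temp_c"] > target_c]


def read_gpus():
    """Query the local GPUs (assumes nvidia-smi is installed)."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_csv(result.stdout)


if __name__ == "__main__":
    # Made-up sample output, so the sketch runs without a GPU present.
    sample = "0, 63, 70\n1, 68, 85\n2, 61, [N/A]\n"
    rows = parse_gpu_csv(sample)
    print("GPUs over target:", over_target(rows))
```

Note that `temperature.gpu` is the core temperature, not the memory temperature, so this would not directly show the hot memory some 30 series cards suffer from.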
Edit: One day and still no problems
I am replacing all the thermal pads on my 3080s tomorrow. I am still unable to get the fans to follow Autofan or the fan rate that I have in the OC settings.
This issue was driving me nuts until I came across this post: 3080 – GPU Driver Error, no temps - #31 by highflyer
Also, I don’t know when the Nvidia overclocking was changed, but now you can set the Absolute Core Clock instead of the Core Clock Offset.
With absolute core clock and static fan settings, my system has been completely stable and the efficiency of my cards has improved.
I set the Autofan option to 50°C and for now that has solved it; the rig has run for more than 24 hours without showing the Autofan error.
I’m currently testing if static fan settings will stop this annoying error.
Autofan: unable to set fan speed
If this is the case, what an inconvenience. Hasn’t even been fixed in months.
After troubleshooting, I moved the cards to different PCIe slots, which fixed this problem. Setting static fan speeds did nothing for me.
It helped me. Thanks!
gpu 1 hung detected! [21:20:03] error - if this error happened several times, or failed to detect some gpus, please check your hardware.
I have this problem, guys. Can someone help me?
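To see whether it is always the same card that hangs, it can help to count these errors per GPU in the miner log. A minimal Python sketch, assuming the log lines look like the one quoted above; the regex and the function name are my own guesses, not something from the miner:

```python
import re
from collections import Counter

# Matches lines such as "gpu 1 hung detected! [21:20:03] error - ..."
HUNG_RE = re.compile(r"gpu\s+(\d+)\s+hung detected", re.IGNORECASE)


def count_hung_gpus(log_text):
    """Return a Counter mapping GPU index to number of 'hung detected' lines."""
    return Counter(int(m.group(1)) for m in HUNG_RE.finditer(log_text))


if __name__ == "__main__":
    # Made-up log excerpt built around the error quoted in this thread.
    sample_log = (
        "gpu 1 hung detected! [21:20:03] error - if this error happened "
        "several times, or failed to detect some gpus, please check your hardware.\n"
        "gpu 1 hung detected! [21:45:10] error - if this error happened "
        "several times, or failed to detect some gpus, please check your hardware.\n"
        "gpu 3 hung detected! [22:01:47] error - if this error happened "
        "several times, or failed to detect some gpus, please check your hardware.\n"
    )
    for gpu, hits in sorted(count_hung_gpus(sample_log).items()):
        print(f"gpu {gpu}: {hits} hang(s)")
```

If one index keeps showing up, that card, its riser, or its PCIe slot would be the first thing to check, in line with the slot change reported earlier in the thread.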