HiveOS freezes multiple times a day: power cycle needed

Hi everybody,

I’m using HiveOS and it’s been giving me a lot of problems. Sometimes a specific GPU stops mining and shows “err”. I lowered the memory OC from 2400 to 2100 and that still hasn’t helped. My temps are pretty low as well. My main issue currently is that a few times a day the rig and the OS completely freeze and I need to power cycle it, and then it happens again a few hours later. I have set up Watchdog but it doesn’t seem to help with this. New RAM didn’t help either.

Specs:

Asus TUF Z590

8 GB DDR4 2666 MHz

120 GB Patriot Burst

2 x 1300 watt Seagate (each powering 2 GPUs, and one of them is also powering the motherboard and CPU)

4 x RTX 3090 EVGA FTW3 (OC: 1200 core, 2400 mem)

HiveOS 0.6-212@211130

Ethermine TREX miner

My risers are powered by PCIe as well, and it says I’m pulling 1.1 - 1.3 kW.

All I want is for it to run 24/7 with no issues. Any help is appreciated.

Can you post a screenshot of your worker overview screen, showing everything?

Here you are https://imgur.com/a/WgqKdMs

In the future you can just paste the screenshot into the reply box instead of using a third-party image hosting site. Your core clocks are a bit high; I would lower them to 1140 MHz and increase fan speed to 100%. That should improve stability a lot.

Ok done. I’ll let you know if it helps

Hi, it’s still happening. It randomly freezes, and I have to force a shutdown and start it back up again.

Do you have a display hooked to the rig? If so what does it freeze on?

Yes, I have a display hooked up to it, and the entire screen stays frozen until I power cycle. The HiveOS website just shows the rig as offline, and the Ethermine website says I have stopped mining too.

Right, but what’s on the display when it freezes? Was it just mining like normal, did it show an error, was it building the DAG file, etc.?

So this is what the log said

20220301 21:08:29 GPU #3: using kernel #2
20220301 21:08:30 GPU #2: using kernel #5
20220301 21:08:31 GPU #0: using kernel #4
20220301 21:08:33 GPU #1: using kernel #2
20220301 21:08:35 [ OK ] 1/1 - 493.96 MH/s, 45ms ... GPU #3 | 4.31 G
20220301 21:08:41 TREX: Can't find nonce with device [ID=2, GPU #2], cuda exception: CUDA_ERROR_LAUNCH_FAILED, try to reduce overclock to stabilize GPU state
20220301 21:08:41 WARN: Miner is going to shutdown...
20220301 21:08:41 Main loop finished. Cleaning up resources...
20220301 21:08:41 ApiServer: stopped listening on 127.0.0.1:4059
20220301 21:08:43 T-Rex finished.

Reduce the mem clock on GPU 2.
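
If the log is long, something like this rough Python sketch can pull out which GPU threw the CUDA exception. The function name `failing_gpus` is made up for illustration, and the regex is based only on the single T-Rex line pasted above, so other miner versions may format the message differently.

```python
import re

# Pattern derived from the one T-Rex error line in this thread (an assumption,
# not a documented log format).
CUDA_FAIL = re.compile(r"\[ID=(\d+), GPU #(\d+)\], cuda exception: (\w+)")

def failing_gpus(log_text):
    """Return (gpu_number, error_name) for each CUDA exception line found."""
    hits = []
    for line in log_text.splitlines():
        match = CUDA_FAIL.search(line)
        if match:
            hits.append((int(match.group(2)), match.group(3)))
    return hits

sample = """20220301 21:08:35 [ OK ] 1/1 - 493.96 MH/s, 45ms ... GPU #3 | 4.31 G
20220301 21:08:41 TREX: Can't find nonce with device [ID=2, GPU #2], cuda exception: CUDA_ERROR_LAUNCH_FAILED, try to reduce overclock to stabilize GPU state
20220301 21:08:41 WARN: Miner is going to shutdown..."""

print(failing_gpus(sample))  # [(2, 'CUDA_ERROR_LAUNCH_FAILED')]
```

Only lines carrying a CUDA exception are reported, so normal share and kernel lines are ignored.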

I just reset the OC on GPU 2 and I’ll let you know what happens. I will try to increase it from there. I had already lowered it from 2400 to 2100 and still had the same issue. What do you think would be an acceptable clock speed?

Leave the core setting, and just reduce memory each time it crashes.

Leave at 0 or 1140?

Leave it at 1140; 0 (default) is way more core than needed for Ethash.
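
The "hold core, drop memory after each crash" approach could be sketched roughly like this in Python. The step size of 100 and the floor of 2000 are assumptions picked for illustration, not values anyone in this thread recommended, and the function names are hypothetical. Nothing here talks to HiveOS; it only lists the memory values you would try in order.

```python
CORE_CLOCK_MHZ = 1140  # core stays fixed, per the advice above

def next_mem_offset(current, step=100, floor=0):
    """Memory value to try after a crash: one step lower, never below floor."""
    if step <= 0:
        raise ValueError("step must be positive")
    return max(floor, current - step)

def step_down_plan(start, step=100, floor=0):
    """List the memory values to try, from start down to floor."""
    offsets = [start]
    while offsets[-1] > floor:
        offsets.append(next_mem_offset(offsets[-1], step, floor))
    return offsets

# Hypothetical example: start at 2400, drop by 100 per crash, stop at 2000.
print(step_down_plan(2400, step=100, floor=2000))
# [2400, 2300, 2200, 2100, 2000]
```

If it still crashes at the floor, memory clock is probably not the cause and it is worth looking at hardware instead.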

So I solved the issue by replacing the riser and trying a new USB port. I have been testing stability for about a week while gradually increasing the OC, and it can now reach 2400 stable. Thanks for all the help.
