Rig total freeze, requires power reset

I have been mining between March and April successfully. No issues. At the end of April, my rig started to stop responding every ~6 hours. I have to reboot it with a power reset.

I started my rig again yesterday. A freeze happened this morning. I have looked at all the logs:

/var/log/miner/phoenixminer/phoenixminer.log
/var/log/kern.log
dmesg
/var/log/hive-agent.log
/var/log/syslog

I have found nothing. I have no idea what caused these freezes of the rig. It worked well for ~2 months in March-April.

I have already tried lowering the memory OC; it did not change anything. My OC was stable for 2 months, so I don’t think it is linked to the OC.

I have no monitor attached to the rig ATM.

Hi, have you solved the problem somehow?

I am currently experiencing the same thing with one of my rigs… it freezes completely at random times, and if I connect a monitor I just see a frozen screen with a non-blinking cursor. The logs don’t seem to show anything special. It doesn’t respond to the keyboard or anything. The HiveOS app shows the rig as offline.

I have plugged in a USB watchdog, but even that is not working (the watchdog works in other scenarios). I am currently running from a USB stick, but I have swapped several of them and that does not seem to be the issue.

Thanks.

After some updates to Hive OS, it stopped doing that. I don’t know what was causing it. I don’t know if it was losing its internet connection, or if the miner was freezing or the OS was freezing, since I found nothing in the logs.

I also reinstalled the OS on the SSD from scratch, in case the OS partition was corrupted.

Ok, thanks anyway for the comment. I will try switching from USB to SSD and see.

For now, I’ve created a workaround by getting a smart plug (TP-Link) and setting up an IFTTT applet that switches the plug off and on (so it hard resets the rig) when I receive a Telegram message from Hive saying that that specific rig is offline… but that should not be the final solution.

Hi all,

I have the same problem, one rig freezes at random times, and the only way to get it up again is to switch both of the PSUs off and on again.

@dncz:
I would also create that workaround, like you did. I have some of those smart plugs from TP-Link, too.

How exactly did you connect the Hive Bot with IFTTT, so IFTTT could trigger the TP-Link plugs?
I have tried for some time now, but I cannot connect the 2 bots (Hive Bot and IFTTT Bot), because bots in Telegram Messenger can never see each other’s messages. I have made groups and channels where the bots are admins and connected the groups/channels with IFTTT.

But when I try to create a trigger in IFTTT, it simply gets no messages from the Hive Bot. Only when I write directly to the IFTTT Bot (eg with “/IFTTT”), it triggers. So it does not work automatically, if a rig is offline and the Hive Bot sends a notification.

Do you have any advice?

Everything could be much easier if you could create a rule in the HiveOS dashboard so that, e.g., if a rig goes offline, a certain URL is called. Or better, if HiveOS had a connector to IFTTT, one could do fun things like turning the lamp in the living room red if the temperature of the rigs is too high, and so on.

Thanks.

Hi,
do you have TP-Link TAPO or KASA plugs? For some reason, only Kasa can be triggered by IFTTT… so I also bought a couple of Sonoff’s Pow R2 (the one that also measures the consumption) for next rigs.

I also re-did the trigger to a “better/safer” solution… I programmed a PHP script that is run as a cron job every 2 minutes and calls HiveOS API and gets the status of each worker. If that specific worker is offline for at least 5 mins I call IFTTT API as an incoming webhook. That is much easier than fighting with the API of Telegram. So I recommend it that way. The API of HiveOS is very simple to use and get the data.

Hope that helps.

My man, I am dealing with the same issue with my Nvidia rigs… every single one of them freezes at least once in 24 hours… most of the time while I am asleep… Is there any chance you can show us how to solve it with the PHP script? I found that in IFTTT I can’t link the IFTTT bot to the Hive bot… I need the Hive bot to talk to the IFTTT bot instead. Or maybe there is a way I can automatically forward the message from the Hive bot to the IFTTT bot to trigger the action. (I am really desperate.)

I have the KASA plugs (HS110), which should work with IFTTT. For the next rigs I ordered some Shelly 2.5 switches, which support MQTT and also have power measuring. There is probably a way to make the switches turn off and on again if the power consumption drops below a certain value for some time. So there would be no need to have a server running a script (or to learn how to code in PHP).
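To make the power-drop idea concrete, here is a minimal Python sketch of just the decision logic. It does not talk to a Shelly or to MQTT; the function name, the reading format and the example threshold are all hypothetical, and the real numbers would depend on what the rig draws when mining versus when frozen.

```python
def should_power_cycle(readings, threshold_watts, hold_seconds):
    """Return True if power has stayed below threshold_watts for hold_seconds.

    readings: list of (unix_time, watts) tuples, oldest first.
    The threshold and hold time are assumptions to tune per rig.
    """
    low_since = None
    for timestamp, watts in readings:
        if watts < threshold_watts:
            # Remember when the current low-power stretch began.
            if low_since is None:
                low_since = timestamp
        else:
            # Power recovered, so the stretch is over.
            low_since = None
    if low_since is None:
        return False
    return readings[-1][0] - low_since >= hold_seconds
```

For example, with readings taken once a minute, `should_power_cycle(readings, 300, 120)` would only fire after the draw has been under 300 W for two minutes in a row, so a short dip during a miner restart would not trigger a hard reset.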

Thanks for your help!

Hi, I am not a professional developer, so you’d probably laugh at my code :-), but I can guide you through it. It was not difficult at all to set up. I run the cron script every 2 minutes.

First, generate an API key for Hive API - in your Account Profile Settings in Hive.

Then write a script (PHP or any other language) that calls this API endpoint: “/farms/{$farmId}/workers”. The response contains an array with all the details about each worker (including online/offline status). What matters to you is [“stats”][“online”], which should be “true”. See more in the Hive API docs.

Then comes IFTTT. Inside it, I have set up a trigger on an incoming webhook for each of the smart plugs/rigs (Create - Webhooks - Receive a web request - give it a name, e.g. rig1reset). Then get your IFTTT API key.
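For anyone who wants a starting point, the status check could look roughly like this in Python (the original script is PHP and was not posted, so this is only a sketch). The endpoint path and the stats/online field come from the post; the base URL, the Bearer header and the top-level "data" key are assumptions to verify against the Hive API docs.

```python
import json
import urllib.request

# Assumption: base URL of the Hive OS API v2; check the Hive API docs.
API_BASE = "https://api2.hiveos.farm/api/v2"


def fetch_workers(farm_id, api_token):
    """Call /farms/{farm_id}/workers and return the decoded JSON body."""
    request = urllib.request.Request(
        f"{API_BASE}/farms/{farm_id}/workers",
        # Assumption: the API key is sent as a Bearer token.
        headers={"Authorization": f"Bearer {api_token}"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def offline_workers(payload):
    """Return the names of workers whose stats.online flag is not true."""
    # Assumption: the worker list sits under a top-level "data" key.
    workers = payload.get("data", [])
    return [
        worker.get("name", str(worker.get("id")))
        for worker in workers
        if not worker.get("stats", {}).get("online", False)
    ]
```

A worker with no stats at all is treated as offline here, which seemed the safer default for a reset script.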

The script should then go through the Hive API response with any iterator (for/foreach/while) and check whether all the workers are online. If one is not, it checks how long that worker has been offline, and if it has been offline for more than 3 minutes (I wanted to give it a little time to reboot automatically if possible), it calls an IFTTT webhook like this one “https://maker.ifttt.com/trigger/[trigger_name]/with/key/[your_ifttt_maker_key]” and sends me an email about the initiation.

And that’s it. Don’t forget to set your motherboard BIOS to “auto start when power on” (usually named something similar in all BIOSes). Otherwise, the rig will not restart after the smart plug is switched off and on again.

I have an IFTTT Pro account (paid) because I had to set up these restart triggers individually for each smart plug/rig, and you can have just 2 triggers for free. I don’t know if there is a workaround for that.
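A rough Python sketch of this step, under a few assumptions: since a cron job keeps no memory between runs, the time each rig was first seen offline has to be stored somewhere (a small file, for instance), and the function names and the dictionary format below are hypothetical. Only the webhook URL format and the 3-minute wait come from the post; the email notification is left out.

```python
import urllib.parse
import urllib.request

OFFLINE_GRACE_SECONDS = 3 * 60  # from the post: wait 3 minutes before resetting


def workers_to_reset(online_by_name, offline_since, now):
    """Return (names to power-cycle, updated offline_since dict).

    online_by_name: {"rig1": True, "rig2": False, ...} built from the Hive API.
    offline_since:  {"rig2": unix_time_first_seen_offline}, kept between runs.
    """
    updated = {}
    to_reset = []
    for name, online in online_by_name.items():
        if online:
            continue  # back online: its stored timestamp is dropped
        first_seen = offline_since.get(name, now)
        updated[name] = first_seen
        if now - first_seen > OFFLINE_GRACE_SECONDS:
            to_reset.append(name)
            # Restart the clock so the rig gets time to boot before another reset.
            updated[name] = now
    return to_reset, updated


def trigger_ifttt(event_name, maker_key):
    """Fire the IFTTT webhook for one rig (URL format as given in the post)."""
    url = "https://maker.ifttt.com/trigger/{}/with/key/{}".format(
        urllib.parse.quote(event_name), urllib.parse.quote(maker_key)
    )
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.status
```

Each cron run would load the stored dictionary, call `workers_to_reset` with the current time, call `trigger_ifttt` for every returned name (for example with an event named rig1reset), and save the updated dictionary again.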

I have tried the same approach with TP-Link Kasa HS110, with Sonoff Pow R2 and Woox Smart Plug, but generally, anything that has an open API or is compatible with IFTTT should work properly.

I have the same problem here. Did someone find a solution?

I started having the same issues with my AMD rig after installing 0.6-199@210216. I’m going to downgrade to the 0.6-199@210210 version now to see if that resolves the issue.

I’m a bit leery of using the hard reset method, as it could damage your mainboard. Given the current state of affairs, with price gouging on everything mining, a new mainboard could cost upwards of $300 today.

I have the exact same problem… sometimes it mines well for 12 hours, sometimes it drops every hour. I never get more than 24 hours of continuous mining…
Maybe I’m wrong, but I think it started after swapping from the outdated Claymore to PhoenixMiner; since then the rig has been unstable.

So far, so good with stability on the 0.6-199@21021 version. I’ve run it for 24hrs with no random outages. I am running lolminer, so the 0.6-199@210216 version would have an impact on my miner. I’ll give it another 24hrs and then attempt to upgrade again.

Updated to the latest version 0.6-199@210218 and have had two restarts. GPU temperature 511 is unreal, driver error, rebooting. Not sure of the next steps at this point.

I had the same issue on 0.6-199, freezing at the most inconvenient hours, and only a hard reset would get it going again.
I had to downgrade back to 0.6-195 to fix it for me. It’s running stable again all day and night.

The issue with the GPU temperature was caused by a loose riser card. Reseating the card resolved that issue and running the latest version has been stable for me 24hrs in a row.

yes, gpu temp 511 is usually due to riser or cable issues, that’s a known issue
but the freezing didn’t even give any errors, the OS was just stuck, so those seem different problems

Can you share your program? I have no idea how to do this. Best wishes

Did you solve this issue? I’m having same thing. Can’t seem to find the problem

Hello guys,
I finally got the solution.
I read it in a comment on Facebook: the motherboard gets too hot, so HiveOS freezes. I put two PWM fans on the motherboard to try it out, because I am not a pro. It has now been 3 days without a freeze; before, I had a freeze every day.
