Bash script to reboot rig that is offline or has a dead gpu

afargo · September 7, 2021, 11:49pm

I’ve had intermittent issues with a couple of my rigs where they stop mining or shut down. I’m not actually sure what happens to them, but they don’t reboot or start mining again, and it’s usually in the middle of the night, so the rig is down for hours before I find out it stopped. I have to literally unplug the machine and plug it back in. I’ve also started dual mining Eth and Erg on one of my rigs, and I’ve had issues where the rig keeps mining even though one of the gpus is dead.

The wc.sh script creates a script at /home/user/scripts/worker_check.sh. There is a timer/service at /etc/systemd/system/worker_check.timer and /etc/systemd/system/worker_check.service that first runs 5 minutes after boot and every 7 minutes thereafter. It checks the Teamredminer or Lolminer log files (or both in a dual mining setup) to make sure the number of gpus you specify are all present and are getting hashrates greater than 0 mh/s. It also checks the hiveos api to make sure the rig and all gpus are online, and that there are no problems such as missed_unit, missed_hashrate, or no_hashrate. If any of these problems are present and persist for more than 3-5 minutes (depending on the problem), the service will reboot the rig.

I can confirm this works when there is a dead gpu. I’m not sure yet about when the rig is offline, and the checks on the log files are newly implemented, so I can’t confirm those work either. Every time there’s an issue it’s logged to /home/user/worker_check.log. An api call or log file check could also be used to reboot based on hashrate, gpu temperatures, etc.
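For anyone curious what the schedule above looks like in systemd terms, here is a minimal sketch. The unit contents are my assumption of a reasonable setup, not copied from wc.sh, and `write_worker_check_units` is a hypothetical helper name. Only the file names, the script path, and the 5 minute / 7 minute timing come from the description above.

```shell
#!/usr/bin/env bash
# Sketch only: writes a timer/service pair matching the schedule described
# in the post. The unit contents are assumptions, not taken from wc.sh.

write_worker_check_units() {
    local dir="$1"   # target directory, e.g. /etc/systemd/system (needs root)
    mkdir -p "$dir" || return 1

    # One-shot service that runs the generated check script once per trigger.
    cat > "$dir/worker_check.service" <<'EOF'
[Unit]
Description=Check that the rig and all gpus are mining

[Service]
Type=oneshot
ExecStart=/bin/bash /home/user/scripts/worker_check.sh
EOF

    # Timer: first run 5 minutes after boot, then 7 minutes after each run.
    cat > "$dir/worker_check.timer" <<'EOF'
[Unit]
Description=Run worker_check 5 minutes after boot, then every 7 minutes

[Timer]
OnBootSec=5min
OnUnitActiveSec=7min
Unit=worker_check.service

[Install]
WantedBy=timers.target
EOF
}
```

If you wrote the units by hand like this, you would call `write_worker_check_units /etc/systemd/system` as root, then run `systemctl daemon-reload` and `systemctl enable --now worker_check.timer`. `systemctl list-timers` shows when the next run is due.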

Log in to hiveos.farm (in a browser), click on your username in the top right corner, scroll down, and click on “Generate new Personal API-token.” Once it’s generated, click on “Show”, then copy the 220 character API key and save it somewhere.
Go to hiveos.farm and click on your farm, then copy and save your farm id from the url. Select each worker from the drop down list, then copy and save its worker id from the url. The format is:
https://the.hiveos.farm/farms/{farmid}/workers/{workerid}/
You can test whether the api is working with one or both of the following commands in Terminal (replace {apikey}, {farmid}, and {workerid} with your own api key, farm id, and worker id):

curl -s -w '\n%{http_code}' -H 'Content-Type: application/json' -H 'Authorization: Bearer {apikey}' https://api2.hiveos.farm/api/v2/farms/{farmid}

curl -s -w '\n%{http_code}' -H 'Content-Type: application/json' -H 'Authorization: Bearer {apikey}' https://api2.hiveos.farm/api/v2/farms/{farmid}/workers/{workerid}
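The `-w` option in the commands above appends the HTTP status code after the response body. Here is a rough sketch of how a script could split the two apart and look for the problems mentioned earlier, assuming the status code arrives on its own last line. The function names are mine, and the problem check is a plain text match that assumes the problem names appear verbatim somewhere in the response. It does not rely on the exact JSON layout, which I haven't documented here.

```shell
#!/usr/bin/env bash
# Sketch: split "body + newline + status code" output from curl into parts.
# Function names are hypothetical; nothing here is taken from wc.sh.

# The last line of the response is the HTTP status code.
response_status() {
    printf '%s\n' "$1" | tail -n 1
}

# Everything except the last line is the response body.
response_body() {
    printf '%s\n' "$1" | sed '$d'
}

# Succeeds if the body mentions one of the problems named in the post.
# Assumption: the names appear verbatim somewhere in the response text.
has_problem() {
    grep -qE 'missed_unit|missed_hashrate|no_hashrate' <<< "$1"
}
```

In a script you would capture the curl output with `resp="$(curl ...)"`, treat anything other than a `200` from `response_status "$resp"` as a failed check, and pass `"$(response_body "$resp")"` to `has_problem`.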

This step is optional. You can set up an IFTTT webhook to alert you if the rig is unable to check its online status (assuming the reason isn’t that the internet is down). Follow this guide to set up a webhook and get a key: https://betterprogramming.pub/how-to-send-push-notifications-to-your-phone-from-any-script-6b70e34748f6
If the worker is unable to check its online status, it will wait 5 minutes, check again, and send you a push notification.
SSH into your mining rig, download the following bash script, make it executable, then run it and follow the prompts.

wget https://raw.githubusercontent.com/billkenney/worker_check/main/wc.sh
chmod +x wc.sh
bash wc.sh
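For the optional IFTTT step, this is roughly what the "wait, check again, notify" behavior could look like if you wanted to write it yourself. It's a sketch, not what wc.sh actually does. The function names and the `rig_offline` event name are hypothetical, and notifying only when the second check also fails is my assumption. The URL is the standard IFTTT Webhooks trigger format, with the event name and key you got from the guide.

```shell
#!/usr/bin/env bash
# Sketch of a retry-then-notify helper. Names are hypothetical, not from wc.sh.

# Build the IFTTT Webhooks trigger url for an event name and key.
ifttt_url() {
    local event="$1" key="$2"
    printf 'https://maker.ifttt.com/trigger/%s/with/key/%s' "$event" "$key"
}

# Send a push notification; the message is passed to the applet as value1.
# Note: the message must not contain double quotes, since it is pasted
# directly into the JSON payload.
notify() {
    local event="$1" key="$2" message="$3"
    curl -s -X POST -H 'Content-Type: application/json' \
        -d "{\"value1\":\"$message\"}" "$(ifttt_url "$event" "$key")"
}

# Run a check command; if it fails, wait $1 seconds and try once more.
# Returns 0 if either attempt succeeds, 1 if both fail.
check_with_retry() {
    local delay="$1"; shift
    "$@" && return 0
    sleep "$delay"
    "$@" && return 0
    return 1
}
```

Usage would be something like `check_with_retry 300 my_online_check || notify rig_offline "$IFTTT_KEY" "rig1 could not check its status"`, where `my_online_check` is whatever command you use to query the api and `IFTTT_KEY` holds your webhook key.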
