My Vega 64 rig was working perfectly fine until I decided to clean one of the cards.
Suddenly the other 5 cards started acting weird. My OCs were different and the cards were pulling 220W.
Later I put the cleaned GPU back, and now the rig always throws an unexpected error.
First it was giving the error "Waiting for cards to cool down".
Later HiveOS couldn’t be downgraded, because I saw there was a newer version.
Now all the GPUs start for a bit and then stop working. Even restarting from HiveOS doesn’t solve the issue.
That cool down has nothing to do with temps; it’s just the phrase for the wait time between miner restarts. Looks like your GPU 0 is hanging. I would try reducing the OCs first; if that doesn’t help, try that card alone and make sure it works as intended. Could be a faulty riser or something too, you just need to troubleshoot.
I have removed the OCs. I tried checking the cards by changing their places and switched the GPU 3 and GPU 2 risers, but it’s still taking a long time to start.