Several times a day, without any intervention on my part, the whole rig switches to a strange state in which the power consumption of all GPUs rises above the limit and the hashrate drops. Any change in the config, e.g. fans from 100 to 99, restores the normal state. Screenshots below. I updated the drivers to 525.85.05 with nvidia-driver-update, but the problem persists. Then I did a hive-replace, i.e. reinstalled the system, which brought me to the latest version 0.6-220@230130 with drivers 525.85.05; unfortunately the problem still persists. I don’t see anything strange in the logs, and the activity tab shows only my own changes (fan 99, fan 100), which I make to restore the normal state. The problem occurs regardless of the miner and always affects the entire rig (all GPUs). When the screenshots were taken, 3 different miners were running, each with 2 GPUs. The problem started about 10 days ago by itself, without any action on my part.
The same problem occurred on another rig of mine, but only once.
I have no more ideas, please help.
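One way to pin down exactly when the rig enters this state is to poll the driver for power draw versus the configured limit. The following is a minimal sketch, not a Hive tool: it assumes `nvidia-smi` is on the PATH, and the function names and the 5% tolerance are made up for illustration.

```python
import subprocess
import time

# Standard nvidia-smi query fields; values are printed in watts without units.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,power.draw,power.limit",
    "--format=csv,noheader,nounits",
]

def parse_power_csv(text):
    """Parse 'index, draw, limit' lines into (index, draw_w, limit_w) tuples."""
    rows = []
    for line in text.strip().splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 3:
            continue
        try:
            rows.append((int(parts[0]), float(parts[1]), float(parts[2])))
        except ValueError:
            # Skip readings that are not numeric, e.g. "[N/A]".
            continue
    return rows

def gpus_over_limit(rows, tolerance=0.05):
    """Return indices of GPUs drawing more than limit * (1 + tolerance)."""
    return [idx for idx, draw, limit in rows if draw > limit * (1 + tolerance)]

def read_power():
    """Run nvidia-smi once and return the parsed rows."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_power_csv(out)

def watch(interval_s=30):
    """Print a timestamped line whenever any GPU exceeds its limit (runs forever)."""
    while True:
        over = gpus_over_limit(read_power())
        if over:
            print(time.strftime("%Y-%m-%d %H:%M:%S"), "over limit:", over)
        time.sleep(interval_s)
```

Calling `watch()` in a terminal on the rig and comparing the timestamps with the miner logs may show which miner event coincides with the power jump.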
are you controlling OC with your miner? can you post your flight sheet?
Switch to locked core clocks instead of the core offsets. Hive treats values above 500 as locked core clocks.
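The rule above can be written down as a tiny sketch. This is only an illustration of the stated threshold, not Hive's actual code; the function name is hypothetical, and treating exactly 500 as an offset simply follows the wording "above 500".

```python
LOCK_THRESHOLD_MHZ = 500  # per the rule quoted above

def interpret_core_value(value):
    """Classify a core clock setting as a locked clock or an offset."""
    if value > LOCK_THRESHOLD_MHZ:
        return ("locked", value)   # e.g. 1500 means lock the core at 1500 MHz
    return ("offset", value)       # e.g. 150 or -200 means shift the core clock

print(interpret_core_value(1500))
print(interpret_core_value(150))
```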
Yes, I know, but why should I do that? For some algorithms locked core works better, for others core offsets work better. I’ve been testing it for some time and I have it set optimally for me. The problem appeared 10 days ago out of nowhere; I never had problems with it before. I think the problem lies elsewhere.
I can try to do that, but even if it helps, it won’t be a satisfactory solution.
All the more so because applying a new config with some insignificant change is enough to return everything to normal. It seems like for some reason the configuration suddenly gets messed up.
Try setting oc in the miner then. Offsets alone will let the card use as much power as it wants.
I tested switching to locked core clocks and it didn’t help. It worked fine for about 3 hours, and just a moment ago this strange state turned on again, despite locked core clocks being set on all GPUs in this rig.
I will test setting oc in the miner then, but generally the problem occurs in hiveos. Isn’t it weird that I’m the only one having this problem?
See if you can capture the logs from when you have this issue
Which exact logs would you like to see?
Miner logs during the change
I think I have a lead. The logs in most miners look normal, only the power on the GPUs starts to increase at this point. However, on bzminer when this state occurs, it starts a new miner launch and a new log and does not store the old ones, strange. I think bzminer is to blame, I need to set something up so that it doesn’t overwrite old logs with new ones and then maybe I’ll see what happens before bzminer restarts.
It seems that bzminer restarts for some reason and forces some weird OC on restart. Thanks, I’ll let you know as soon as I find out what’s going on.
you can use these config options to have bz keep the previous logs. A verbosity of 2 is normal, 4 is debug
"clear_log_file": false
"log_file_verbosity": 4
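As a rough sketch of what these two options amount to once they are part of a config, the snippet below merges them into a dictionary and prints the resulting JSON. The option names come from the lines above; the helper name is hypothetical, and how Hive actually passes extra options to bzminer is not shown here.

```python
import json

def with_log_retention(config, verbosity=4):
    """Return a copy of config that keeps old logs and sets log verbosity."""
    merged = dict(config)
    merged["clear_log_file"] = False          # do not wipe the log on restart
    merged["log_file_verbosity"] = verbosity  # 2 = normal, 4 = debug (per the post)
    return merged

print(json.dumps(with_log_retention({}), indent=2))
```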
Indeed, bz has an issue with power exceeding the limits set in hive’s oc. Think it started with 13.0.2. You can manually set it to 13.0.1 and it should work better.
Thanks. That’s exactly what I did, but I didn’t find anything interesting in the logs at all. All I know is that the miner restarts when it has a problem connecting to the server during the dev fee thread.
However, I found an interesting file in the miner folder “*reset_oc.sh” in which there is a line: “./bzminer --no_watchdog --oc_reset_all” and this is probably the trap, because by default --oc_reset_all should be false.
I don’t know when this file is run, but logic tells me it may be when there are problems with the dev fee thread. I changed it and we’ll see if it helps.
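To check whether anything else in the miner folder mentions the reset flag, a small search over the files can help. This is a sketch under assumptions: the function name is hypothetical, the folder to scan is whatever path the miner is installed in, and files above 1 MB are skipped so the miner binary itself is not read.

```python
from pathlib import Path

def find_mentions(root, needle="oc_reset", max_bytes=1_000_000):
    """Return (path, line_number, line) for each line under root containing needle."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        try:
            if not path.is_file() or path.stat().st_size > max_bytes:
                continue
            text = path.read_text(errors="ignore")
        except OSError:
            # Unreadable file: skip it rather than abort the scan.
            continue
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((str(path), number, line.strip()))
    return hits
```

Calling `find_mentions()` on the miner folder lists every script or config line that contains `oc_reset`, which shows where the flag is set but not when the script is actually executed.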
think reset_oc.sh is hive related. Even modifying it doesn’t change bz behavior
You might want to go to bzminers discord and let them know
@keaton_hiveon does hive run the reset_oc script before you start bz? Several of the cards exceed the power they would normally use with the oc set by their miners. The script doesn’t specify which cards it should reset, and thus resets the oc for the whole rig.
Seems this is a new item after 13.0.1
That’s a bzminer specific script, not a hive one.
In fact, changing this file or even deleting it does nothing. Additionally, I added the commands:
"oc_reset_all": 0
"oc_reset_on_exit": 0
But that didn’t change the behavior of bz either.
I’m giving up on this and have manually set version 13.0.1. Thanks for the hint.