No, it’s a power delivery issue
Could the lack of headroom be the issue? I have eliminated most of the other variables. If so, would a good check be to run just the AMD cards, for example (330 W less drawn from each PSU when the Nvidia cards are offline)?
If you're overloading the PSU and causing it to trip or fail, that could make it stop delivering power.
It doesn't exactly trip, because the rig is able to restart; however, it does take at least one other card on the same PSU offline when it does. The PSU has been tripped a couple of times before, so that could factor into the problem. Still waiting for a higher-rated PSU…
Is that the same psu running the motherboard?
No, it's not the motherboard PSU. However, on other occasions where the power limit obviously tripped the overcurrent protection, the secondary PSU went offline on reboot along with all GPUs connected to it. Now when it reboots it still gives power to at least 2 GPUs (2 are offline after reboot).
So I tried disabling all the Nvidia GPUs, leaving a software-reported draw of under 800 W total, which gives a lot of headroom. Still, the rig keeps crashing even though nothing should make it crash; basically everything has been checked. When Nvidia cards are put in place of the AMD cards there is no problem at all…
Also, I don't get how it can run 24-48h and then suddenly crash. If there is a power delivery issue, shouldn't it happen much quicker?
So swap psus and see if the issue persists.
The error seems to be OC related: I lowered the OC and now there is no issue (so far, 24h+).
It's definitely a power delivery issue. My guess is your OC is overloading something and causing it to stop getting power. But the only way to figure out exactly what's wrong is to test and rule out each component.
Unfortunately I'm still waiting for the new PSU to replace the 750 W. As of right now, all I can test is the OC. That will in turn most likely rule out issues with the PCIe cables and the risers themselves. At this point I think it is my OC drawing too much, or the volts being wrong relative to the other OC variables and the PSU headroom.
you have 2 psus, correct?
swap them and see if the issue follows
Not possible, as that would definitely cause an issue: the 850 W PSU has about 20% headroom, and putting a 750 W in its place would most likely trip the overcurrent protection.
Yes, I have 2 PSUs: one 850 W (motherboard/CPU + 2x Vega 56, 1x 1080 & 1x 1080 Ti) and one 750 W (2x Vega 56, 1x 1080 & 1x 1080 Ti)… so headroom is not that big, but the cards are limited on their OC and the CPU is at stock clock speeds.
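To put a number on the headroom argument, here is a rough Python sketch. The 680 W load is an assumption worked back from "about 20% headroom" on the 850 W unit; it is not a measured value, and software-reported draw can be lower than what the PSU actually delivers.

```python
def headroom_fraction(rating_w: float, load_w: float) -> float:
    """Fraction of the PSU rating left unused at the given load."""
    return (rating_w - load_w) / rating_w


# Assumption: "about 20% headroom" on the 850 W PSU means roughly 680 W of load.
LOAD_W = 680.0

for rating_w in (850.0, 750.0):
    pct = round(headroom_fraction(rating_w, LOAD_W) * 100, 1)
    print(f"{rating_w:.0f} W PSU at {LOAD_W:.0f} W load: {pct}% headroom")
# 850 W PSU at 680 W load: 20.0% headroom
# 750 W PSU at 680 W load: 9.3% headroom
```

Under that assumption the 750 W unit would sit at about 9% headroom carrying the same load, which is why a straight swap looks risky without first lowering power limits.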
I'm saying test with it.
I think I have narrowed the issue down to one GPU. The error seems to follow only one GPU: I tried swapping slots, which also changed which PSU it was powered by, and the error followed. Currently I have disabled the GPU and am testing whether OC can give the same error on the other cards. Faulty card?
Same GPU, different miner, and this is the error I get with a mild OC. I know it can handle the memory offset, as I used an even higher one when I mined ETH.
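The "error follows the card" reasoning can be written down as a small check. This is only an illustrative Python sketch: the card names, slot numbers and records below are made up, not logs from the rig. It flags any factor (GPU, slot or PSU) that had the same value in every failing observation.

```python
def common_factors(records, factors=("gpu", "slot", "psu")):
    """Return {factor: value} for factors that were identical in every failure."""
    failed = [r for r in records if r["failed"]]
    suspects = {}
    for factor in factors:
        values = {r[factor] for r in failed}
        if len(values) == 1:  # same value in all failing observations
            suspects[factor] = values.pop()
    return suspects


# Hypothetical observations: one card tested in two slots on two PSUs.
records = [
    {"gpu": "vega_A", "slot": 3, "psu": "750W", "failed": True},
    {"gpu": "vega_A", "slot": 6, "psu": "850W", "failed": True},
    {"gpu": "vega_B", "slot": 3, "psu": "750W", "failed": False},
    {"gpu": "vega_C", "slot": 6, "psu": "850W", "failed": False},
]

print(common_factors(records))  # {'gpu': 'vega_A'}
```

If the slot or PSU came back as the only common factor instead, that would point at the riser, cable or supply rather than the card.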
Reduce oc until it’s stable. Just because it ran some more aggressive clocks on a different algo and miner months ago doesn’t mean it will now with this algo/miner.
Do you have any input for really mild settings for Vega 56?
If it's crashing, slowly lower the mem and core clocks, and/or raise voltages. It's probably best to start from scratch with the clocks, find the best performance, and then find the lowest voltage that maintains stability.
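That two-step approach (clocks first, then voltage) can be sketched in Python as below. Everything here is hypothetical: `is_stable` stands in for a long soak test on the real card, `toy_is_stable` is a made-up model, and the numbers are arbitrary, not recommended Vega 56 settings. A single clock value is used for simplicity; in practice core and memory would be tuned separately.

```python
def tune(is_stable, clocks_mhz, voltages_mv):
    """Find the highest stable clock, then the lowest voltage that holds it.

    clocks_mhz: candidate clocks in ascending order.
    voltages_mv: candidate voltages in descending order (first = starting voltage).
    is_stable: hypothetical callback, is_stable(clock_mhz, mv) -> bool.
    """
    start_mv = voltages_mv[0]
    best_clock = None
    for clock in clocks_mhz:  # step 1: raise the clock until it fails
        if not is_stable(clock, start_mv):
            break
        best_clock = clock
    if best_clock is None:
        return None  # not even the lowest clock was stable

    best_mv = start_mv
    for mv in voltages_mv[1:]:  # step 2: lower the voltage until it fails
        if not is_stable(best_clock, mv):
            break
        best_mv = mv
    return best_clock, best_mv


def toy_is_stable(clock_mhz, mv):
    """Made-up stability model for the demo only."""
    return mv >= clock_mhz - 25


result = tune(toy_is_stable, [800, 850, 900, 950, 1000], [950, 925, 900, 875, 850])
print(result)  # (950, 925)
```

Since the crashes here only showed up after 24-48h, each real stability check would need to run long enough to catch that, which makes the search slow but still systematic.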