Error causing TRex miner to restart

RRH96 · April 15, 2021, 12:44pm

Has anybody else experienced this problem? If so, how can it be solved?
TRex miner restarts because of 0 hashrate and this is the error:
“TREX: Can’t find nonce with device [ID=3, GPU #3], GPU #3: not enough free memory to mine ethash at epoch 408”
The GPU is a 3080, so it has enough memory to mine ETH. It doesn’t always report the same GPU; sometimes the error names other GPUs, which are 3070s or 3080s.
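As a sanity check on the memory claim, the DAG size for a given epoch can be estimated from the constants in the public Ethash specification. The sketch below is only an illustration: the function names are made up for this example, and the 8 GB and 10 GB figures are assumed nominal VRAM sizes for a 3070 and a 3080. It gives roughly 4.2 GiB for epoch 408, which suggests the message is not simply about the card being too small.

```python
# Ethash constants from the public Ethash specification.
DATASET_BYTES_INIT = 2**30    # nominal DAG size at epoch 0
DATASET_BYTES_GROWTH = 2**23  # nominal growth per epoch
MIX_BYTES = 128


def is_prime(n: int) -> bool:
    """Trial division; fast enough for the numbers used here."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    i = 3
    while i * i <= n:
        if n % i == 0:
            return False
        i += 2
    return True


def dag_size_bytes(epoch: int) -> int:
    """DAG size in bytes: nominal size, shrunk until size / MIX_BYTES is prime."""
    size = DATASET_BYTES_INIT + DATASET_BYTES_GROWTH * epoch - MIX_BYTES
    while not is_prime(size // MIX_BYTES):
        size -= 2 * MIX_BYTES
    return size


if __name__ == "__main__":
    size = dag_size_bytes(408)
    print(f"epoch 408 DAG: {size / 2**30:.2f} GiB")
    # Assumed nominal VRAM sizes, not values read from the rig.
    for card, vram_gb in (("3070", 8), ("3080", 10)):
        verdict = "fits" if size < vram_gb * 10**9 else "does not fit"
        print(f"{card} ({vram_gb} GB): {verdict}")
```

This only compares the DAG against total VRAM. It says nothing about how much memory is actually free on the card when the miner starts.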

PreEdit · April 21, 2021, 10:44am

I’m having the exact same issue. I have 3080s, 3070s and a 3060 on the rig. All have enough memory but for some reason the error comes up. I’m going to try an older T-Rex version to see if it’s a bug in the new version.

RRH96 · April 21, 2021, 12:50pm

Please, let me know if you get good results. I’m still experiencing the same issue.

PreEdit · April 21, 2021, 12:54pm

I’ve been trying for most of the day… Some forums suggested the error was due to not enough virtual memory, so I found a way to allocate SSD space as virtual memory, but it didn’t work. I only have 8GB of DDR4 RAM, so I’m thinking of ordering some more. The rig seems to work with 3 GPUs going, but as soon as I add a 4th or 5th it crashes. No idea why though, as it was running flawlessly for 3 weeks a month ago…

RRH96 · April 21, 2021, 1:08pm

I have four 3080s and four 3070s, 16GB of RAM and a 120GB SSD. I extended the disk partition where the OS is to the whole SSD just in case, and it still restarts 3 to 5 times a day.

kyivleaf · April 21, 2021, 5:43pm

Hi there! I have a problem with the latest T-Rex 0.20.1. It restarts without any error notifications. You can see my rig config in the attachment. Also, the fan speed readings disappear (they show zero). Does anybody have an idea what the problem is?

RRH96 · April 22, 2021, 1:47am

The same happens to me sometimes. The miner can’t read the fan speed of one or two GPUs, they start to get too hot, and that causes the system to reboot. I don’t know why this happens yet.

PreEdit · April 22, 2021, 8:55am

I removed all overclocks and it seemed to make it through the night for the first time in a week, maybe I was just being too aggressive with them?

kyivleaf · April 22, 2021, 11:45am

I rolled back to the previous version, T-Rex 0.20.0. It seems that hashrate and stability are better than in the new version.

RRH96 · April 22, 2021, 1:25pm

I’ll try that soon.

RRH96 · April 22, 2021, 1:26pm

Newer versions are supposed to be better, I don’t know why this happens.

Lateam · April 23, 2021, 4:05am

13x 3070 and the same issue with critical fan temp. One or two GPUs begin to heat up after a random GPU OC error. This is on T-Rex 0.20.0.
I updated to 0.20.1 today. I will let you guys know after 24-28h of running…

Lateam · April 23, 2021, 4:07am

For now I work around this issue with the reboot function in the autofan configuration.
I would appreciate having a real solution ^^

RRH96 · April 23, 2021, 12:09pm

It seems like the miner can’t read the fan speed so it stays low and the gpu overheats. Sometimes I see it happen before the gpu reaches my predefined critical temperature which I have set to 75. Restarting the miner before the system reboots by itself solves the issue. This tells me the problem is with the miner and not the OS. This is random and days can go by before it happens again.
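The rule described above (restart the miner when the fan reading is gone and the card is getting close to the critical temperature) could be sketched roughly as follows. This is only an illustration: `needs_miner_restart` and its parameters are hypothetical names, the 5 degree margin is an arbitrary assumption, and how the fan and temperature readings are obtained is left out entirely.

```python
from typing import Optional

CRITICAL_TEMP_C = 75.0  # critical temperature mentioned in the post
MARGIN_C = 5.0          # assumed safety margin, not a value from the thread


def needs_miner_restart(
    fan_percent: Optional[int],
    temp_c: float,
    critical_c: float = CRITICAL_TEMP_C,
    margin_c: float = MARGIN_C,
) -> bool:
    """Return True when the fan reading is missing or zero and the GPU
    temperature is within `margin_c` degrees of the critical temperature."""
    fan_unreadable = fan_percent is None or fan_percent == 0
    return fan_unreadable and temp_c >= critical_c - margin_c


if __name__ == "__main__":
    # Hypothetical readings: (fan %, temperature in C)
    for fan, temp in ((60, 74.0), (0, 55.0), (0, 72.0), (None, 71.0)):
        print(fan, temp, needs_miner_restart(fan, temp))
```

The point of the margin is to act before the system-level reboot at the critical temperature kicks in, so only the miner is restarted.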

kyivleaf · May 7, 2021, 12:57pm

With the new version of T-Rex (0.20.3) I get this error:
TREX: Can’t find nonce with device [ID=0, GPU #0], cuda exception in [synchronize, 52], CUDA_ERROR_ILLEGAL_ADDRESS, try to reduce overclock to stabilize GPU state

I know this means that I need to decrease the overclocking, but with these settings everything was fine in the old version of the miner.

Belou48 · May 10, 2021, 10:00pm

Hello. Same problem with a 3070.

RRH96 · May 10, 2021, 11:48pm

In my case it was caused by the overclocking. I found out only a few days ago. I decided to reduce the memory OC by 100 MHz on all the cards and it’s been running well for two days. Let me know your OC settings and I’ll help you.

aqrx · May 14, 2021, 2:18am

I was just getting the same behaviour right here.
The fan value of one of my 1660 Super cards could not be read, and T-Rex got restarted frequently, about every 2-3 hours of running.

I haven’t tried the auto fan setting yet; I will give it a try if there are no other choices left.
Below are the OC settings for the card.

aqrx · May 14, 2021, 5:33am

Update: even with the auto fan setting it still restarted frequently.
Below are the logs I fetched from /var/log/miner/t-rex.log

PS: after the miner restarted I saw a CRC with a (!) mark on that GPU during DAG generation. Not sure whether it’s something with the process on this card itself or not.

20210514 11:46:42 GPU #4: DAG generated [crc: 1096dfed, time: 27595 ms], memory left: 1.49 GB
20210514 11:46:49 GPU #1: DAG generated [crc: 91ed30ce(!), time: 34313 ms], memory left: 1.49 GB

20210514 11:44:04 [ OK ] 66/66 - 256.05 MH/s, 236ms ... GPU #1
20210514 11:44:05 TREX: Can't find nonce with device [ID=1, GPU #1], cuda exception in [synchronize, 52], CUDA_ERROR_LAUNCH_FAILED, try to reduce overclock to stabilize GPU state
20210514 11:44:05 WARN: Miner is going to shutdown...
20210514 11:44:06 Main loop finished. Cleaning up resources...
20210514 11:44:06 ApiServer: stopped listening on 127.0.0.1:4059
20210514 11:44:06 ApiServer: stopped listening on 127.0.0.1:4058
20210514 11:44:06 [ OK ] 67/67 - 0.00 H/s, 445ms ... GPU #4
20210514 11:44:07 T-Rex finished.
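To see which card keeps failing over a longer log, lines like the ones above can be tallied with a small script. This is a minimal sketch that assumes the log format shown in this post; `summarize` and the two patterns are made up for the example, and treating the (!) after the CRC as a sign of a problem on that card is an assumption, not documented behaviour.

```python
import re
from collections import Counter

# Patterns based only on the log lines quoted in this thread.
ERROR_RE = re.compile(r"Can['’]t find nonce with device \[ID=(\d+), GPU #\d+\]")
DAG_RE = re.compile(r"GPU #(\d+): DAG generated \[crc: ([0-9a-f]+)(\(!\))?")


def summarize(lines):
    """Count 'Can't find nonce' errors per GPU ID and collect the GPUs
    whose DAG CRC was printed with a (!) mark."""
    errors = Counter()
    flagged_crc = set()
    for line in lines:
        match = ERROR_RE.search(line)
        if match:
            errors[int(match.group(1))] += 1
        match = DAG_RE.search(line)
        if match and match.group(3):
            flagged_crc.add(int(match.group(1)))
    return errors, flagged_crc


SAMPLE = [
    "20210514 11:46:42 GPU #4: DAG generated [crc: 1096dfed, time: 27595 ms], memory left: 1.49 GB",
    "20210514 11:46:49 GPU #1: DAG generated [crc: 91ed30ce(!), time: 34313 ms], memory left: 1.49 GB",
    "20210514 11:44:05 TREX: Can't find nonce with device [ID=1, GPU #1], cuda exception in "
    "[synchronize, 52], CUDA_ERROR_LAUNCH_FAILED, try to reduce overclock to stabilize GPU state",
]

if __name__ == "__main__":
    errors, flagged_crc = summarize(SAMPLE)
    print("errors per GPU:", dict(errors))
    print("GPUs with flagged CRC:", sorted(flagged_crc))
```

On a real rig, the same function could be fed the lines of /var/log/miner/t-rex.log instead of `SAMPLE`. In the excerpt above, both the error and the flagged CRC point at GPU #1.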

RRH96 · May 14, 2021, 11:56am

I had the same problem and it was caused by the overclocking. Not all GPUs will OC the same way. Reduce the OC or start from the default settings and see if that solves the problem. Then increase the memory OC slowly until the problem reappears. This way you’ll know what setting to use. Don’t hurry the process. Another way to do it is to lower the OC by 50 or 100 until the issue is gone; this is probably the fastest way to do it. Let me know about your results. Good luck!
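The "lower the OC by 50 or 100 until the issue is gone" approach can be written down as a simple loop. This is only a sketch of the idea: `find_stable_offset` and `is_stable` are hypothetical names, and in practice "stable" means running the miner at that memory offset for a day or two without the error, which no script can shortcut.

```python
from typing import Callable


def find_stable_offset(
    start_mhz: int,
    is_stable: Callable[[int], bool],
    step_mhz: int = 50,
    floor_mhz: int = 0,
) -> int:
    """Step the memory OC offset down by `step_mhz` until `is_stable`
    accepts it. If nothing above `floor_mhz` is stable, `floor_mhz`
    (default settings) is returned without being tested."""
    offset = start_mhz
    while offset > floor_mhz and not is_stable(offset):
        offset = max(floor_mhz, offset - step_mhz)
    return offset


if __name__ == "__main__":
    # Simulated card that only runs cleanly at +850 MHz or below.
    def simulated_card(offset_mhz: int) -> bool:
        return offset_mhz <= 850

    print(find_stable_offset(1000, simulated_card, step_mhz=50))
    print(find_stable_offset(1000, simulated_card, step_mhz=100))
```

A larger step gets to a stable setting in fewer test runs but may land lower than necessary. Each card would need its own run, since not all of them overclock the same way.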