Hi, I recently started having this issue after upgrading to 0.6-211@211102.
Unfortunately, I do not remember what version I was on prior to updating.
I get a notification that my rig has gone offline, but it never comes back online.
Each time this happens, I go check on the rig and everything still appears to be running.
I connected a monitor to see what was happening, and HiveOS had completely frozen.
What I have done:
- Reset the BIOS and re-applied my settings.
- Updated the BIOS.
- Swapped to a different motherboard and CPU (X470 Taichi and 5800X).
- Swapped to a brand new SSD.
- Swapped to a brand new set of memory (2x8GB kit).
- Flashed a fresh HiveOS image multiple times.
- Updated to the newest HiveOS version.
- Swapped the risers with known-working risers.
- Reduced and then removed all overclocks.
My system:
- Motherboard (B450 Gaming K4 ASRock P4.80)
- CPU (AMD Ryzen 7 2700X)
- Memory (Crucial Ballistix Sport 2x8GB)
- SSD (PNY CS900 120GB)
- Kernel (5.10.0-hiveos #72)
Cards:
- 3080 (Evga)
- 3080 (Zotac)
- 3070 (Evga)
- 3070 (MSI)
- 3070 (Zotac)
- 3070 (Zotac)
- 6700 (AMD)
- 6800 (AMD)
I am running T-Rex and PhoenixMiner and have tried NBMiner as well.
When I check the syslog for errors, there are none.
However, I was able to see what happened just before the rig froze by running ‘motd watchdog’.
kernel: [72580.057791][ T1171] NVRM: GPU at PCI:0000:04:00: GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
kernel: [72580.057793][ T1171] NVRM: Xid (PCI:0000:04:00): 62, pid=1171, 0000(0000) 00000000 00000000
kernel: [72580.093233][ T1171] NVRM: Xid (PCI:0000:04:00): 45, pid=4560, Ch 00000010
kernel: [72581.503222][ C10] sched: RT throttling activated
kernel: [72585.094636][ T1171] NVRM: Xid (PCI:0000:04:00): 45, pid=4560, Ch 00000011
kernel: [72585.098915][ T1171] NVRM: Xid (PCI:0000:04:00): 45, pid=4560, Ch 00000012
kernel: [72585.103196][ T1171] NVRM: Xid (PCI:0000:04:00): 45, pid=4560, Ch 00000013
kernel: [72585.107478][ T1171] NVRM: Xid (PCI:0000:04:00): 45, pid=4560, Ch 00000014
kernel: [72585.111761][ T1171] NVRM: Xid (PCI:0000:04:00): 45, pid=4560, Ch 00000015
kernel: [72585.116045][ T1171] NVRM: Xid (PCI:0000:04:00): 45, pid=4560, Ch 00000016
kernel: [72585.120329][ T1171] NVRM: Xid (PCI:0000:04:00): 45, pid=4560, Ch 00000017
realloc(): invalid pointer
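Since every Xid line above points at the same bus, a quick tally per GPU and Xid code can show whether one card is always responsible across several freezes. This is a minimal sketch, assuming the kernel-log line format shown above; the function names and sample are illustrative only and not part of HiveOS.

```python
import re
from collections import Counter

# Matches NVIDIA driver lines such as:
#   NVRM: Xid (PCI:0000:04:00): 45, pid=4560, Ch 00000010
# Group 1 is the PCI address, group 2 is the Xid code.
XID_RE = re.compile(r"NVRM: Xid \((PCI:[0-9A-Fa-f:.]+)\): (\d+),")


def count_xids(lines):
    """Count how often each (PCI address, Xid code) pair appears."""
    counts = Counter()
    for line in lines:
        match = XID_RE.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts


def summarize(lines):
    """Return one 'address Xid code: count' string per pair, sorted."""
    return [
        f"{bus} Xid {xid}: {n}"
        for (bus, xid), n in sorted(count_xids(lines).items())
    ]


# A few lines copied from the log above.
SAMPLE = [
    "kernel: [72580.057791][ T1171] NVRM: GPU at PCI:0000:04:00: GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
    "kernel: [72580.057793][ T1171] NVRM: Xid (PCI:0000:04:00): 62, pid=1171, 0000(0000) 00000000 00000000",
    "kernel: [72580.093233][ T1171] NVRM: Xid (PCI:0000:04:00): 45, pid=4560, Ch 00000010",
]

for row in summarize(SAMPLE):
    print(row)
```

Because the functions accept any iterable of lines, a saved copy of the kernel log opened as a file object can be passed in directly.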
I decided to look up the error “sched: RT throttling activated” because I have no idea what it means.
All I could find was the following:
The errors all involve my 3080 (Zotac) on bus PCI:0000:04:00, but I’m not sure what the underlying problem is.
I suspected the card was thermal throttling, but I have monitored the temperatures and they are normal (under 100 °C).
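For reference, despite the name, “sched: RT throttling activated” is a CPU-scheduler message rather than a thermal one: the kernel prints it when real-time tasks use up the CPU time budget set by sched_rt_runtime_us within each sched_rt_period_us window (by default 950000 us out of every 1000000 us). A minimal sketch for checking those limits on the rig, assuming a standard Linux /proc layout; the function names are hypothetical.

```python
from pathlib import Path

# Standard Linux sysctls for real-time (RT) scheduler bandwidth.
# The helper functions below are illustrative, not part of HiveOS.
PROC_DIR = "/proc/sys/kernel"


def read_rt_limits(proc_dir=PROC_DIR):
    """Return (runtime_us, period_us), or None if the files can't be read."""
    base = Path(proc_dir)
    try:
        runtime = int((base / "sched_rt_runtime_us").read_text().strip())
        period = int((base / "sched_rt_period_us").read_text().strip())
    except (OSError, ValueError):
        return None
    return runtime, period


def describe_rt_limits(limits):
    """Turn the raw values into a one-line summary."""
    if limits is None:
        return "RT bandwidth limits not available"
    runtime, period = limits
    if runtime < 0:
        # A runtime of -1 means RT tasks are never throttled.
        return "RT throttling disabled (sched_rt_runtime_us = -1)"
    share = 100 * runtime / period
    return f"RT tasks may run {runtime} us out of every {period} us ({share:.1f}%)"


print(describe_rt_limits(read_rt_limits()))
```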
The system freezes right after the “realloc(): invalid pointer” error appears.
If you have any idea what could be causing this, please let me know.
I can provide all my logs if you would like to take a look at them.