Hi guys!
I have a problem with my setup. One of the cards keeps falling off the bus when mining Ethereum on Hiveon. No overclock applied, only a power limit of 250 W and a static fan at 75%.
After a reboot it may work for a few hours, but then it falls off again.
Error in gminer logs:
~ cat gminer.1.log | grep Error
07:43:20 Error on GPU5: Device not responding, check overclocking settings
08:23:06 Error on GPU5: Device not responding, check overclocking settings
09:38:40 Error on GPU5: Device not responding, check overclocking settings
09:42:15 Error on GPU5: unspecified launch failure
09:55:27 Error on GPU5: Device not responding, check overclocking settings
10:09:36 Error on GPU5: Device not responding, check overclocking settings
10:32:20 Error on GPU5: Device not responding, check overclocking settings
Example of errors in syslog:
Jul 02 09:42:15 Zion kernel: NVRM: Xid (PCI:0000:02:00): 31, pid=25185, Ch 00000012, intr 00000000. MMU Fault: ENGINE GRAPHICS GPCCLIENT_T1_1 faulted @ 0x7f98_4b406000. Fault is of type FAULT_PDE ACCESS_TYPE_VIRT_READ
Jul 02 10:32:12 Zion kernel: NVRM: Xid (PCI:0000:0a:00): 79, pid=0, GPU has fallen off the bus.
Jul 02 10:48:20 Zion kernel: NVRM: Xid (PCI:0000:09:00): 70, pid=3096, CCMDs 0000001e 0000c7b5
Jul 02 10:48:20 Zion kernel: NVRM: Xid (PCI:0000:01:00): 31, pid=3096, Ch 00000012, intr 00000000. MMU Fault: ENGINE GRAPHICS GPCCLIENT_T1_1 faulted @ 0x7f0c_eb407000. Fault is of type FAULT_PDE ACCESS_TYPE_VIRT_READ
Jul 02 10:48:20 Zion kernel: NVRM: Xid (PCI:0000:05:00): 31, pid=3096, Ch 00000012, intr 00000000. MMU Fault: ENGINE GRAPHICS GPCCLIENT_T1_1 faulted @ 0x7f0c_e2807000. Fault is of type FAULT_PDE ACCESS_TYPE_VIRT_READ
Jul 02 10:50:59 Zion kernel: NVRM: Xid (PCI:0000:05:00): 79, pid=4294937038, GPU has fallen off the bus.
Jul 02 10:50:59 Zion kernel: NVRM: Xid (PCI:0000:06:00): 79, pid=0, GPU has fallen off the bus.
Jul 02 10:50:59 Zion kernel: NVRM: Xid (PCI:0000:07:00): 79, pid=0, GPU has fallen off the bus.
Jul 02 10:50:59 Zion kernel: NVRM: Xid (PCI:0000:08:00): 79, pid=0, GPU has fallen off the bus.
Jul 02 10:50:59 Zion kernel: NVRM: Xid (PCI:0000:09:00): 79, pid=4294937037, GPU has fallen off the bus.
Jul 02 10:50:59 Zion kernel: NVRM: Xid (PCI:0000:0a:00): 79, pid=0, GPU has fallen off the bus.
Jul 02 10:50:59 Zion kernel: NVRM: Xid (PCI:0000:0b:00): 79, pid=0, GPU has fallen off the bus.
Jul 02 12:14:04 Zion kernel: NVRM: Xid (PCI:0000:09:00): 79, pid=3121, GPU has fallen off the bus.
Jul 02 12:23:41 Zion kernel: NVRM: Xid (PCI:0000:09:00): 79, pid=3123, GPU has fallen off the bus.
Jul 02 13:18:52 Zion kernel: NVRM: Xid (PCI:0000:09:00): 32, pid=2877, Channel ID 0000001e intr0 00040000
Jul 02 13:18:52 Zion kernel: NVRM: Xid (PCI:0000:09:00): 32, pid=2877, Channel ID 0000001e intr0 00040000
Jul 02 13:18:52 Zion kernel: NVRM: Xid (PCI:0000:01:00): 31, pid=2877, Ch 00000012, intr 00000000. MMU Fault: ENGINE GRAPHICS GPCCLIENT_T1_0 faulted @ 0x7f94_63400000. Fault is of type FAULT_PDE ACCESS_TYPE_VIRT_READ
Jul 02 13:18:52 Zion kernel: NVRM: Xid (PCI:0000:05:00): 31, pid=2877, Ch 00000012, intr 00000000. MMU Fault: ENGINE GRAPHICS GPCCLIENT_T1_0 faulted @ 0x7f94_5a800000. Fault is of type FAULT_PDE ACCESS_TYPE_VIRT_READ
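Since the excerpt mixes several PCI addresses and Xid codes, a per-device tally may make it easier to see which card faults first and which ones only follow. Here is a minimal Python sketch, assuming the kernel lines keep the `NVRM: Xid (PCI:...): <code>,` shape shown above; the names `tally_xids` and `format_tally` are just made up for illustration.

```python
import re
from collections import Counter

# Matches the "NVRM: Xid (PCI:0000:02:00): 31," part of a kernel log line.
# Assumption: the driver keeps this exact prefix format.
XID_RE = re.compile(r"NVRM: Xid \(PCI:([0-9a-fA-F:.]+)\): (\d+),")


def tally_xids(lines):
    """Count (pci_address, xid_code) pairs in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = XID_RE.search(line)
        if match:  # wrapped continuation lines simply do not match
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts


def format_tally(counts):
    """Return one readable line per (device, Xid) pair, sorted by address."""
    return [f"{pci}  Xid {xid}  x{n}" for (pci, xid), n in sorted(counts.items())]
```

It could be fed the syslog file object directly, for example `print("\n".join(format_tally(tally_xids(open("/var/log/syslog", errors="replace")))))`, where the log path is an assumption and may differ on a given system.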
The rig is 8x RTX 3090s from different vendors, but I have another Aorus working fine on the same settings. Kernel 5.4.0-hiveos #108, NVIDIA driver N460.84. I tried T-Rex and had the same issue with it.
Does this ring a bell to anyone?
Thanks!
PS: Don’t mind the 0 fan speeds in the screenshot; I fixed that by disabling autofan and enabling static fan speeds.