Gigabyte Aorus RTX 3090 Falls off the bus

Hi guys!
I have a problem with my setup. One of the cards keeps falling off the bus when mining Ethereum on Hiveon. No overclock is applied, only a power limit of 250 W and a static fan speed of 75%.

After reboot it may work for a few hours, but then falls off again.

Error in gminer logs:

~ cat gminer.1.log | grep Error
07:43:20 Error on GPU5: Device not responding, check overclocking settings
08:23:06 Error on GPU5: Device not responding, check overclocking settings
09:38:40 Error on GPU5: Device not responding, check overclocking settings
09:42:15 Error on GPU5: unspecified launch failure
09:55:27 Error on GPU5: Device not responding, check overclocking settings
10:09:36 Error on GPU5: Device not responding, check overclocking settings
10:32:20 Error on GPU5: Device not responding, check overclocking settings

Example of errors in syslog:

Jul 02 09:42:15 Zion kernel: NVRM: Xid (PCI:0000:02:00): 31, pid=25185, Ch 00000012, intr 00000000. MMU Fault: ENGINE GRAPHICS GPCCLIENT_T1_1 faulted @ 0x7f98_4b406000. Fault is of type FAULT_PDE ACCESS_TYPE_VIRT_READ
Jul 02 10:32:12 Zion kernel: NVRM: Xid (PCI:0000:0a:00): 79, pid=0, GPU has fallen off the bus.                                       
Jul 02 10:48:20 Zion kernel: NVRM: Xid (PCI:0000:09:00): 70, pid=3096, CCMDs 0000001e 0000c7b5                                        
Jul 02 10:48:20 Zion kernel: NVRM: Xid (PCI:0000:01:00): 31, pid=3096, Ch 00000012, intr 00000000. MMU Fault: ENGINE GRAPHICS GPCCLIENT_T1_1 faulted @ 0x7f0c_eb407000. Fault is of type FAULT_PDE ACCESS_TYPE_VIRT_READ
Jul 02 10:48:20 Zion kernel: NVRM: Xid (PCI:0000:05:00): 31, pid=3096, Ch 00000012, intr 00000000. MMU Fault: ENGINE GRAPHICS GPCCLIENT_T1_1 faulted @ 0x7f0c_e2807000. Fault is of type FAULT_PDE ACCESS_TYPE_VIRT_READ
Jul 02 10:50:59 Zion kernel: NVRM: Xid (PCI:0000:05:00): 79, pid=4294937038, GPU has fallen off the bus.                              
Jul 02 10:50:59 Zion kernel: NVRM: Xid (PCI:0000:06:00): 79, pid=0, GPU has fallen off the bus.                                       
Jul 02 10:50:59 Zion kernel: NVRM: Xid (PCI:0000:07:00): 79, pid=0, GPU has fallen off the bus.                                       
Jul 02 10:50:59 Zion kernel: NVRM: Xid (PCI:0000:08:00): 79, pid=0, GPU has fallen off the bus.                                       
Jul 02 10:50:59 Zion kernel: NVRM: Xid (PCI:0000:09:00): 79, pid=4294937037, GPU has fallen off the bus.                              
Jul 02 10:50:59 Zion kernel: NVRM: Xid (PCI:0000:0a:00): 79, pid=0, GPU has fallen off the bus.                                       
Jul 02 10:50:59 Zion kernel: NVRM: Xid (PCI:0000:0b:00): 79, pid=0, GPU has fallen off the bus.                                       
Jul 02 12:14:04 Zion kernel: NVRM: Xid (PCI:0000:09:00): 79, pid=3121, GPU has fallen off the bus.                                    
Jul 02 12:23:41 Zion kernel: NVRM: Xid (PCI:0000:09:00): 79, pid=3123, GPU has fallen off the bus.                                    
Jul 02 13:18:52 Zion kernel: NVRM: Xid (PCI:0000:09:00): 32, pid=2877, Channel ID 0000001e intr0 00040000                             
Jul 02 13:18:52 Zion kernel: NVRM: Xid (PCI:0000:09:00): 32, pid=2877, Channel ID 0000001e intr0 00040000                             
Jul 02 13:18:52 Zion kernel: NVRM: Xid (PCI:0000:01:00): 31, pid=2877, Ch 00000012, intr 00000000. MMU Fault: ENGINE GRAPHICS GPCCLIENT_T1_0 faulted @ 0x7f94_63400000. Fault is of type FAULT_PDE ACCESS_TYPE_VIRT_READ
Jul 02 13:18:52 Zion kernel: NVRM: Xid (PCI:0000:05:00): 31, pid=2877, Ch 00000012, intr 00000000. MMU Fault: ENGINE GRAPHICS GPCCLIENT_T1_0 faulted @ 0x7f94_5a800000. Fault is of type FAULT_PDE ACCESS_TYPE_VIRT_READ
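
With several PCI addresses and Xid codes mixed together, it can help to tally which device reports which code before swapping hardware. The following is a minimal Python sketch, not part of the original setup: the function name `tally_xids` and the regular expression are assumptions based only on the line format shown in the excerpt above, and the sample lines are copied from it.

```python
import re
from collections import Counter

# Assumed pattern, derived from the syslog excerpt above, e.g.
# "NVRM: Xid (PCI:0000:09:00): 79, pid=3121, GPU has fallen off the bus."
XID_RE = re.compile(r"NVRM: Xid \(PCI:([0-9a-fA-F:]+)\): (\d+),")


def tally_xids(lines):
    """Count (pci_address, xid_code) pairs in an iterable of syslog lines.

    Lines that do not contain an NVRM Xid record are ignored.
    """
    counts = Counter()
    for line in lines:
        match = XID_RE.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts


# A few lines taken from the excerpt above, used only as sample input.
sample = [
    "Jul 02 10:32:12 Zion kernel: NVRM: Xid (PCI:0000:0a:00): 79, pid=0, GPU has fallen off the bus.",
    "Jul 02 12:14:04 Zion kernel: NVRM: Xid (PCI:0000:09:00): 79, pid=3121, GPU has fallen off the bus.",
    "Jul 02 12:23:41 Zion kernel: NVRM: Xid (PCI:0000:09:00): 79, pid=3123, GPU has fallen off the bus.",
    "Jul 02 13:18:52 Zion kernel: NVRM: Xid (PCI:0000:09:00): 32, pid=2877, Channel ID 0000001e intr0 00040000",
]

# Print one row per (address, code) pair, sorted by address and then code.
for (pci, xid), n in sorted(tally_xids(sample).items()):
    print(f"{pci}  Xid {xid:>3}  x{n}")
```

To run it on a full log, pass the lines of the file to `tally_xids` instead of `sample`. An address that keeps reporting Xid 79 on its own, outside of bursts where every card drops at once, is a reasonable first candidate for a riser or slot swap.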

The rig has 8x RTX 3090s from different vendors, but I have another Aorus working fine on the same settings. Kernel 5.4.0-hiveos #108, NVIDIA driver N460.84. I also tried T-Rex and had the same issue with it.

Does this ring a bell to anyone?
Thanks!

PS: Don’t mind the 0 fan speeds in the screenshot; I cured that by disabling autofan and enabling static fan speeds.


Running it in testing mode just for now, looks like this:

Hi @tmcgray, any updates about this CUDA behaviour? :cry:

Hi! I'm having the same problem with an RTX 3060 Ti LHR. We are about to check whether it is the riser. The problem seems to be in either the riser or the GPU, because it always happens with the same riser/GPU combination, even in different PCIe slots.

Do you have any update? Thank you.
