GPUs no longer detected, onboard graphics only

mattberryio · August 12, 2021, 3:36am

This is my first rig. I have four RX5700XTs on an ASRock H110 Pro board. I was mining just fine for about 3 days using Hive OS, but there was some error which forced a reboot.

Upon reboot, it’s like the onboard graphics take over. I can’t get the GPUs to be recognized by Hive OS. I have the onboard disabled in the BIOS but that doesn’t seem to work for me.

Anyone else had this issue?

FYI - Before I used Hive OS I was in Windows 10. I noticed a weird behavior where if the onboard NIC took over and booted to Windows 10, I wouldn’t see my GPUs. But if I rebooted, they’d show up.

Lalading · August 12, 2021, 12:27pm

Remove all graphics cards except the one in the main PCIe slot and try again.
Start your PC with the minimal configuration: 1 graphics card, no unnecessary peripherals.

mattberryio · August 12, 2021, 6:38pm

I already tried that. It’s like it no longer recognizes the cards.

Easy55 · December 15, 2021, 6:52pm

Hi Matt

I have the exact same problem right now. I had one AMD 6600XT working, then I added a second, which apparently was DOA. I removed the bad GPU and rebooted from HiveOS. Now HiveOS does not recognize the original GPU.

I likely need to clean up the AMD driver configs, but don’t see how to do that from HiveOS.

If you found a solution, please let me know.

Grea · December 15, 2021, 7:10pm

First, run the latest stable kernel, which includes the required drivers. Changing AMD drivers in Linux distributions is not fun and, with Hive, unnecessary.

Second, H110 in particular has a tendency to default BIOS settings. Check them each time you have an issue like this. Yes, I have and run an ASRock H110 Pro.

Easy55 · December 15, 2021, 7:29pm

Thanks Grea

I just flashed my SSD with the latest HiveOS yesterday. BUT, I do see a new release today with updates for AMD drivers. I will give the upgrade a try right now and see if the known good GPU can be found following the upgrade and reboot.

Easy55 · December 15, 2021, 7:35pm

@Grea

Just ran the upgrade to the latest and rebooted. HiveOS says the upgrade worked. But the one single AMD known good 6600XT card still is not found.

I run an amdmeminfo command:

AMDMemInfo for Hive OS v2.1.16
original code by Zuikkis and Yann St.Arnaud

CL_DEVICE_TYPE_GPU Failed: Unable to get the number of OpenCL devices.
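
One way to narrow this kind of failure down is to check whether the card is visible on the PCIe bus at all, separately from OpenCL. The sketch below is only an illustration: it assumes the standard `lspci` utility is installed on the rig, and the helper names `find_amd_gpus` and `read_lspci` are made up for this example. If a card shows up here while amdmeminfo still fails, the driver or OpenCL layer is the more likely suspect; if nothing shows up, look at the riser, slot, power, or BIOS first.

```python
import shutil
import subprocess

# Device classes that lspci prints for graphics adapters.
GPU_CLASSES = ("VGA compatible controller", "Display controller", "3D controller")


def find_amd_gpus(lspci_text):
    """Return the lspci lines that describe an AMD graphics device."""
    hits = []
    for line in lspci_text.splitlines():
        is_gpu = any(cls in line for cls in GPU_CLASSES)
        is_amd = "Advanced Micro Devices" in line or "AMD" in line
        if is_gpu and is_amd:
            hits.append(line.strip())
    return hits


def read_lspci():
    """Run lspci if it is available; return an empty string otherwise."""
    if shutil.which("lspci") is None:
        return ""
    result = subprocess.run(["lspci"], capture_output=True, text=True, check=False)
    return result.stdout
```

Calling `find_amd_gpus(read_lspci())` on the rig returns one line per AMD card the bus can see; an empty list means the card is not being enumerated at all.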

Grea · December 15, 2021, 9:15pm

The 6600XT is well supported by the #83 kernel and the drivers included. I only have my test rig on the latest OS right now.

What are we working with here:

Motherboard
Risers
Power supplies
GPU mix
Which GPUs are found and not found

Easy55 · December 15, 2021, 9:50pm

Here are all the details. There is only one GPU as of now.

MB: ASUS Prime Z590-P (HiveOS identifies)
CPU: i5-10400 (HiveOS identifies and provides running metrics)
just 1 GPU: AMD RX 6600 XT
PSU: Corsair RM750 80+ Gold 750W
Type 4 PCIe Pwr Cable (swapped out and verified on another GPU on a diff rig)
UBit Ver010S Plus Riser (swapped out and verified on another GPU on a diff rig)
Updated to latest BIOS
BIOS key settings:
VT-d = Disabled
IOMMU Pre-Boot = Disabled
Primary Display = CPU Graphics
PCIEx16_1 Link Speed = Gen1
PCIe Speed = Gen1
Above 4G Decoding = Enabled
Launch CSM = Disabled

Grea · December 15, 2021, 10:07pm

Try:

Primary PCIe = CPU Graphics Disabled
PCIEx16_1 Link Speed = Gen2
PCIe Speed = Gen2

GPU BIOS Mods?
GPU Fans spin up? LEDs light up?
Have another GPU just in case?

Easy55 · December 16, 2021, 12:13am

@Grea

I will try the Gen2 settings instead of Gen1. The GPU fans come on and the LEDs light up.

Easy55 · December 16, 2021, 12:58am

@Grea

No joy. The only other solution I can find is to re-image the internal SSD drive with HiveOS and start over. People seem to think that this is the only good solution when AMD drivers are involved. I need to get this resolved before I am willing to convert my rigs to HiveOS. I will give the re-imaging approach a try tomorrow. Thanks for your help today.

Grea · December 16, 2021, 2:47am

As shared previously: It’s the kernel loads which include the drivers vs. incremental OS updates.

Assuming you get Hive OS running, with or without GPUs, you may find it easiest to run the following command from the Hive Shell:
hive-replace -y --stable

Note: 5.10.0 #83 is the kernel and 0.6-212@211211 is the OS version

Details on Discord Hive Announcements: Discord

fwiw: every one of my rigs has AMD GPUs ranging from 2016 vintage to recent and I have never run anything other than the drivers in the stable kernel release. There does tend to be a delay with a brand new GPU release, and that happened with the 6xxx series each time a new GPU arrived. Thankfully 6600/XT is supported in the latest stable release.

Easy55 · December 17, 2021, 6:31pm

@Grea
Thanks for all your input. I could not get the rig to respond, so I removed it, re-flashed the SSD, and started over.

Question: where do you run “hive-replace -y --stable” from? There is no command line available at the physical rig? I tried it from HiveOS web app “Run Command” icon and nothing happened.

Anyway, the rig now finds the GPU, but my new error is the following, which I am following up on via a different thread.

amdgpu: Msg issuing pre-check failed(0xffffffc2) and SMU may be not in the right state!

Thanks!
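
For anyone chasing the same amdgpu message, a small filter over the kernel log can show how often it occurs and with which status code. This is a rough sketch, not a Hive OS tool: the function name `find_smu_precheck_errors` is invented here, and the pattern is based only on the single line quoted above, so other driver versions may word it differently.

```python
import re

# Matches the message quoted above and captures the hex status code.
SMU_PRECHECK = re.compile(
    r"amdgpu: Msg issuing pre-check failed\((0x[0-9a-fA-F]+)\)"
)


def find_smu_precheck_errors(log_text):
    """Return the status code of every SMU pre-check failure found in a log."""
    return SMU_PRECHECK.findall(log_text)
```

The input would typically be the saved output of `dmesg`; a list that keeps growing after each reboot suggests the failure happens during driver start-up rather than later under load.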

Grea · December 17, 2021, 8:07pm

Most people run commands from Hive Shell. On the Hive web dashboard for the rig, click “Hive Shell Start”:
[Image: 12-17-21 at 3.03 PM]

It will then show this little box to click. It should launch a new window with a command line and lots of good data, access to the miner, etc.:
[Image: 12-17-21 at 3.04 PM]

That error has shown up for me due to bad risers and hot overclocking.

Easy55 · December 18, 2021, 3:07am

@Grea

Thanks for the info on Hive Shell Start. At this point HiveOS does find the GPU on startup, but the pre-check error above happens immediately on startup and the rig goes offline in less than a minute, so I can’t run any commands.

I have replaced the riser with known good risers more than once now, and the power cables. I only have the one GPU on a 750W PSU. I have been dropping the overclocking steadily from the original working settings. Slow going because of how fast the OS fails on restart. I will keep at it though and let you know what happens.

I have tried many other suggestions (BIOS settings, resetting the BIOS memory, etc.). I am pretty sure at this point it is the GPU. Might need to try re-flashing the GPU BIOS (ugly). FYI – I already have 3 other rigs running 18 total 6600XT GPUs on Windows, and reasonably stable. However, those rigs have all the standard Windows issues and I do want to migrate them to HiveOS.

Grea · December 18, 2021, 3:36am

Post a picture of the rig dashboard with GPU as it is identified.

You can put the rig in maintenance mode with drivers, reset the clocks the way you want, then start the mining software/flight sheet.

First, let’s see how the GPU is identified.

Easy55 · December 20, 2021, 5:50pm

@Grea

I followed the steps you suggested. Here is the screenshot of the rig.

Here are the error messages from the “No Temps” error:

Grea · December 20, 2021, 6:38pm

Classic riser, 1x adapter, usb cable, or power delivery error.

Have you plugged it directly into the motherboard?
Have you removed all the overclocks and let it run default?

Easy55 · December 20, 2021, 11:31pm

@Grea

Great! Your second suggestion in your last reply worked!

Here is what I did: I removed all the overclocks, rebooted, and then started mining. The GPU came up and ran on the AMD 6600XT defaults. Then while it was running I applied my overclocks again, but a little less aggressive: core 956, core mV 762, mem cntlr mV 680, mem mV 1300, mem MHz 1100. At 3 hours up time, all is looking good and very stable.

Next I added a second rig, this one with six 6600XTs. I got the same “No Temps” error on this rig. But when I used the same sequence of starting with no OC’s set, and then adding the OC’s once all was working, again GPUs are running. At one hour looking great! I did need to set the fan speed on this rig because it started to overheat. But at 75% fan each GPU is drawing only 52W to 53W. And max temp is 56C.

Thanks for your help! Letting the GPUs run in default settings was the trick that worked for me!
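
The sequence that worked here (boot with no overclocks, confirm the GPU mines at defaults, then apply overclocks while it is running) can be written down as a short sketch. Everything in it is hypothetical: `bring_up`, `apply_settings` and `is_stable` are placeholder names supplied by the caller, nothing here talks to Hive OS, and the rollback at the end is an added assumption, not something described in the thread.

```python
def bring_up(apply_settings, is_stable, target, defaults=None):
    """Start on defaults, then apply the target overclocks once stable.

    apply_settings(settings) and is_stable() are caller-supplied callbacks
    (hypothetical). Returns the settings that were left in place.
    """
    defaults = defaults if defaults is not None else {}

    # Step 1: run with no overclocks so the card starts at stock settings.
    apply_settings(defaults)
    if not is_stable():
        # Not stable even at stock: look at risers, cables and power first.
        return defaults

    # Step 2: apply the overclocks while the card is already running.
    apply_settings(target)
    if is_stable():
        return target

    # Assumed extra step: fall back to defaults if the overclock fails.
    apply_settings(defaults)
    return defaults


# The less aggressive values from the post; the key names are illustrative.
milder_oc = {
    "core_mhz": 956,
    "core_mv": 762,
    "mem_ctrl_mv": 680,
    "mem_mv": 1300,
    "mem_mhz": 1100,
}
```

In practice `is_stable` would be a human watching the dashboard for a while instead of a function, but the order of operations is the point: defaults first, overclocks second.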