HiveOS crashes when web console updates are applied

Running HiveOS from an SSD. Any time I make a change (anything that updates the rig) via the web console for my rig, HiveOS bombs out and I have to manually reboot the mining rig.

I also run HiveOS on a PC with 2 cards in it, and when I make a change to that rig via the web, everything updates smoothly with no errors.

Any ideas? Thanks!

Are you on the latest stable version of HiveOS?

Yes, I am.

Do you have logs enabled for the rig? If so, does it say what the errors are? If not, you can run logs-on from the rig itself and then reboot.

=== GPU 0, 0d:00.0 Radeon RX 5700 XT 8176 MB #1 === 22:35:13
Traceback (most recent call last):
  File "/hive/opt/upp2/upp.py", line 358, in <module>
    main()
  File "/hive/opt/upp2/upp.py", line 354, in main
    cli(obj={})()
  File "/hive/opt/upp2/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/hive/opt/upp2/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/hive/opt/upp2/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/hive/opt/upp2/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/hive/opt/upp2/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/hive/opt/upp2/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/hive/opt/upp2/upp.py", line 336, in set
    decode._write_pp_tables_file(pp_file, pp_bytes)
  File "/hive/opt/upp2/upp/decode.py", line 52, in _write_pp_tables_file
    f.close()
OSError: [Errno 62] Timer expired
/hive/sbin/amd-oc.navi.sh: line 81: echo: write error: Invalid argument
/hive/sbin/amd-oc.navi.sh: line 82: echo: write error: Invalid argument
/hive/sbin/amd-oc.navi.sh: line 72: echo: write error: Invalid argument
/hive/sbin/amd-oc.navi.sh: line 75: echo: write error: Invalid argument
/hive/sbin/amd-oc.navi.sh: line 76: echo: write error: Invalid argument
/hive/sbin/amd-oc.navi.sh: line 77: echo: write error: Invalid argument
/hive/sbin/amd-oc.navi.sh: line 99: echo: write error: Invalid argument
/hive/sbin/amd-oc.navi.sh: line 100: echo: write error: Invalid argument
Applying all changes to Power Play table
cat: /sys/class/drm/card1/device/pp_od_clk_voltage: Unknown error 380

Here is more from the logs right after an OC change:

Jan 17 14:06:08 rig29542C kernel: [ 535.081070][T28158] amdgpu 0000:0d:00.0: amdgpu: smu driver if version = 0x00000036, smu fw if version = 0x00000037, smu fw version = 0x002a3d00 (42.61.0)
Jan 17 14:06:08 rig29542C kernel: [ 535.081072][T28158] amdgpu 0000:0d:00.0: amdgpu: SMU driver if version not matched
Jan 17 14:06:08 rig29542C kernel: [ 535.081126][T28158] amdgpu 0000:0d:00.0: amdgpu: use vbios provided pptable
Jan 17 14:06:08 rig29542C kernel: [ 535.081127][T28158] amdgpu 0000:0d:00.0: amdgpu: smc_dpm_info table revision(format.content): 4.5
Jan 17 14:06:08 rig29542C kernel: [ 535.082343][T28158] amdgpu 0000:0d:00.0: amdgpu: SMU is initialized successfully!
Jan 17 14:06:12 rig29542C kernel: [ 535.235423][ C0] amdgpu 0000:0d:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:2 pasid:32769, for process PhoenixMiner pid 2009 thread PhoenixMiner pid 2261)
Jan 17 14:06:12 rig29542C kernel: [ 535.237327][ C0] amdgpu 0000:0d:00.0: amdgpu: in page starting at address 0x0000e2ed1764c000 from client 27
Jan 17 14:06:12 rig29542C kernel: [ 535.238302][ C0] amdgpu 0000:0d:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0xFFFFFFFF
Jan 17 14:06:12 rig29542C kernel: [ 535.239290][ C0] amdgpu 0000:0d:00.0: amdgpu: Faulty UTCL2 client ID: 0x1ff
Jan 17 14:06:12 rig29542C kernel: [ 535.240285][ C0] amdgpu 0000:0d:00.0: amdgpu: MORE_FAULTS: 0x1
Jan 17 14:06:12 rig29542C kernel: [ 535.241248][ C0] amdgpu 0000:0d:00.0: amdgpu: WALKER_ERROR: 0x7
Jan 17 14:06:12 rig29542C kernel: [ 535.242222][ C0] amdgpu 0000:0d:00.0: amdgpu: PERMISSION_FAULTS: 0xf
Jan 17 14:06:12 rig29542C kernel: [ 535.243178][ C0] amdgpu 0000:0d:00.0: amdgpu: MAPPING_ERROR: 0x1
Jan 17 14:06:12 rig29542C kernel: [ 535.244130][ C0] amdgpu 0000:0d:00.0: amdgpu: RW: 0x1
Jan 17 14:06:12 rig29542C kernel: [ 535.385772][ C0] amdgpu 0000:0d:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:2 pasid:32769, for process PhoenixMiner pid 2009 thread PhoenixMiner pid 2261)
Jan 17 14:06:12 rig29542C kernel: [ 535.386913][ C0] amdgpu 0000:0d:00.0: amdgpu: in page starting at address 0x0000e331b80d7000 from client 27
Jan 17 14:06:12 rig29542C kernel: [ 535.388086][ C0] amdgpu 0000:0d:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0xFFFFFFFF
Jan 17 14:06:12 rig29542C kernel: [ 535.389868][ C0] amdgpu 0000:0d:00.0: amdgpu: Faulty UTCL2 client ID: 0x1ff
Jan 17 14:06:12 rig29542C kernel: [ 535.391662][ C0] amdgpu 0000:0d:00.0: amdgpu: MORE_FAULTS: 0x1
Jan 17 14:06:12 rig29542C kernel: [ 535.393428][ C0] amdgpu 0000:0d:00.0: amdgpu: WALKER_ERROR: 0x7
Jan 17 14:06:12 rig29542C kernel: [ 535.395192][ C0] amdgpu 0000:0d:00.0: amdgpu: PERMISSION_FAULTS: 0xf
Jan 17 14:06:12 rig29542C kernel: [ 535.396966][ C0] amdgpu 0000:0d:00.0: amdgpu: MAPPING_ERROR: 0x1
Jan 17 14:06:12 rig29542C kernel: [ 535.398741][ C0] amdgpu 0000:0d:00.0: amdgpu: RW: 0x1
Jan 17 14:06:12 rig29542C kernel: [ 535.546144][ C0] amdgpu 0000:0d:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:2 pasid:32769, for process PhoenixMiner pid 2009 thread PhoenixMiner pid 2261)
Jan 17 14:06:12 rig29542C kernel: [ 535.549733][ C0] amdgpu 0000:0d:00.0: amdgpu: in page starting at address 0x00000b2c2f0c8000 from client 27
Jan 17 14:06:12 rig29542C kernel: [ 535.551621][ C0] amdgpu 0000:0d:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0xFFFFFFFF
Jan 17 14:06:12 rig29542C kernel: [ 535.553535][ C0] amdgpu 0000:0d:00.0: amdgpu: Faulty UTCL2 client ID: 0x1ff
Jan 17 14:06:12 rig29542C kernel: [ 535.555432][ C0] amdgpu 0000:0d:00.0: amdgpu: MORE_FAULTS: 0x1
Jan 17 14:06:12 rig29542C kernel: [ 535.557298][ C0] amdgpu 0000:0d:00.0: amdgpu: WALKER_ERROR: 0x7
Jan 17 14:06:12 rig29542C kernel: [ 535.559185][ C0] amdgpu 0000:0d:00.0: amdgpu: PERMISSION_FAULTS: 0xf
Jan 17 14:06:12 rig29542C kernel: [ 535.561048][ C0] amdgpu 0000:0d:00.0: amdgpu: MAPPING_ERROR: 0x1
Jan 17 14:06:12 rig29542C kernel: [ 535.562877][ C0] amdgpu 0000:0d:00.0: amdgpu: RW: 0x1
Jan 17 14:06:12 rig29542C kernel: [ 535.706517][ C0] amdgpu 0000:0d:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:2 pasid:32769, for process PhoenixMiner pid 2009 thread PhoenixMiner pid 2261)
Jan 17 14:06:12 rig29542C kernel: [ 535.710192][ C0] amdgpu 0000:0d:00.0: amdgpu: in page starting at address 0x000066ddff8c0000 from client 27
Jan 17 14:06:12 rig29542C kernel: [ 535.712097][ C0] amdgpu 0000:0d:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0xFFFFFFFF
Jan 17 14:06:12 rig29542C kernel: [ 535.714012][ C0] amdgpu 0000:0d:00.0: amdgpu: Faulty UTCL2 client ID: 0x1ff
Jan 17 14:06:12 rig29542C kernel: [ 535.715897][ C0] amdgpu 0000:0d:00.0: amdgpu: MORE_FAULTS: 0x1
Jan 17 14:06:12 rig29542C kernel: [ 535.717758][ C0] amdgpu 0000:0d:00.0: amdgpu: WALKER_ERROR: 0x7
Jan 17 14:06:12 rig29542C kernel: [ 535.719595][ C0] amdgpu 0000:0d:00.0: amdgpu: PERMISSION_FAULTS: 0xf
Jan 17 14:06:12 rig29542C kernel: [ 535.721433][ C0] amdgpu 0000:0d:00.0: amdgpu: MAPPING_ERROR: 0x1
Jan 17 14:06:12 rig29542C kernel: [ 535.723258][ C0] amdgpu 0000:0d:00.0: amdgpu: RW: 0x1
Jan 17 14:06:12 rig29542C kernel: [ 535.866889][ C0] amdgpu 0000:0d:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:2 pasid:32769, for process PhoenixMiner pid 2009 thread PhoenixMiner pid 2261)
Jan 17 14:06:12 rig29542C kernel: [ 535.868323][ C0] amdgpu 0000:0d:00.0: amdgpu: in page starting at address 0x0000031fcaf7a000 from client 27
Jan 17 14:06:12 rig29542C kernel: [ 535.868937][ C0] amdgpu 0000:0d:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0xFFFFFFFF
Jan 17 14:06:12 rig29542C kernel: [ 535.869531][ C0] amdgpu 0000:0d:00.0: amdgpu: Faulty UTCL2 client ID: 0x1ff
Jan 17 14:06:12 rig29542C kernel: [ 535.870120][ C0] amdgpu 0000:0d:00.0: amdgpu: MORE_FAULTS: 0x1
Jan 17 14:06:12 rig29542C kernel: [ 535.870715][ C0] amdgpu 0000:0d:00.0: amdgpu: WALKER_ERROR: 0x7
Jan 17 14:06:12 rig29542C kernel: [ 535.871270][ C0] amdgpu 0000:0d:00.0: amdgpu: PERMISSION_FAULTS: 0xf
Jan 17 14:06:12 rig29542C kernel: [ 535.871808][ C0] amdgpu 0000:0d:00.0: amdgpu: MAPPING_ERROR: 0x1
Jan 17 14:06:12 rig29542C kernel: [ 535.872328][ C0] amdgpu 0000:0d:00.0: amdgpu: RW: 0x1
Jan 17 14:06:12 rig29542C kernel: [ 536.017239][ C0] amdgpu 0000:0d:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:2 pasid:32769, for process PhoenixMiner pid 2009 thread PhoenixMiner pid 2261)
Jan 17 14:06:12 rig29542C kernel: [ 536.018449][ C0] amdgpu 0000:0d:00.0: amdgpu: in page starting at address 0x000003944fa64000 from client 27
Jan 17 14:06:12 rig29542C kernel: [ 536.019042][ C0] amdgpu 0000:0d:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0xFFFFFFFF
Jan 17 14:06:12 rig29542C kernel: [ 536.019638][ C0] amdgpu 0000:0d:00.0: amdgpu: Faulty UTCL2 client ID: 0x1ff
Jan 17 14:06:12 rig29542C kernel: [ 536.020241][ C0] amdgpu 0000:0d:00.0: amdgpu: MORE_FAULTS: 0x1
Jan 17 14:06:12 rig29542C kernel: [ 536.020853][ C0] amdgpu 0000:0d:00.0: amdgpu: WALKER_ERROR: 0x7
Jan 17 14:06:12 rig29542C kernel: [ 536.021449][ C0] amdgpu 0000:0d:00.0: amdgpu: PERMISSION_FAULTS: 0xf
Jan 17 14:06:12 rig29542C kernel: [ 536.022035][ C0] amdgpu 0000:0d:00.0: amdgpu: MAPPING_ERROR: 0x1
Jan 17 14:06:12 rig29542C kernel: [ 536.022600][ C0] amdgpu 0000:0d:00.0: amdgpu: RW: 0x1
Jan 17 14:06:12 rig29542C kernel: [ 536.167586][ C0] amdgpu 0000:0d:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:2 pasid:32769, for process PhoenixMiner pid 2009 thread PhoenixMiner pid 2261)
Jan 17 14:06:12 rig29542C kernel: [ 536.168658][ C0] amdgpu 0000:0d:00.0: amdgpu: in page starting at address 0x000051ea2049f000 from client 27
Jan 17 14:06:12 rig29542C kernel: [ 536.169196][ C0] amdgpu 0000:0d:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0xFFFFFFFF
Jan 17 14:06:12 rig29542C kernel: [ 536.169739][ C0] amdgpu 0000:0d:00.0: amdgpu: Faulty UTCL2 client ID: 0x1ff
Jan 17 14:06:12 rig29542C kernel: [ 536.170284][ C0] amdgpu 0000:0d:00.0: amdgpu: MORE_FAULTS: 0x1
Jan 17 14:06:12 rig29542C kernel: [ 536.170844][ C0] amdgpu 0000:0d:00.0: amdgpu: WALKER_ERROR: 0x7
Jan 17 14:06:12 rig29542C kernel: [ 536.171388][ C0] amdgpu 0000:0d:00.0: amdgpu: PERMISSION_FAULTS: 0xf
Jan 17 14:06:12 rig29542C kernel: [ 536.171924][ C0] amdgpu 0000:0d:00.0: amdgpu: MAPPING_ERROR: 0x1
Jan 17 14:06:12 rig29542C kernel: [ 536.172447][ C0] amdgpu 0000:0d:00.0: amdgpu: RW: 0x1
Jan 17 14:06:12 rig29542C kernel: [ 536.317938][ C0] amdgpu 0000:0d:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:2 pasid:32769, for process PhoenixMiner pid 2009 thread PhoenixMiner pid 2261)
Jan 17 14:06:12 rig29542C kernel: [ 536.319211][ C0] amdgpu 0000:0d:00.0: amdgpu: in page starting at address 0x00009b5d3ed59000 from client 27
Jan 17 14:06:12 rig29542C kernel: [ 536.319839][ C0] amdgpu 0000:0d:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0xFFFFFFFF
Jan 17 14:06:12 rig29542C kernel: [ 536.320471][ C0] amdgpu 0000:0d:00.0: amdgpu: Faulty UTCL2 client ID: 0x1ff
Jan 17 14:06:12 rig29542C kernel: [ 536.321119][ C0] amdgpu 0000:0d:00.0: amdgpu: MORE_FAULTS: 0x1
Jan 17 14:06:12 rig29542C kernel: [ 536.321752][ C0] amdgpu 0000:0d:00.0: amdgpu: WALKER_ERROR: 0x7
Jan 17 14:06:12 rig29542C kernel: [ 536.322384][ C0] amdgpu 0000:0d:00.0: amdgpu: PERMISSION_FAULTS: 0xf
Jan 17 14:06:12 rig29542C kernel: [ 536.323026][ C0] amdgpu 0000:0d:00.0: amdgpu: MAPPING_ERROR: 0x1
Jan 17 14:06:12 rig29542C kernel: [ 536.323627][ C0] amdgpu 0000:0d:00.0: amdgpu: RW: 0x1
Jan 17 14:06:12 rig29542C kernel: [ 536.468288][ C0] amdgpu 0000:0d:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:2 pasid:32769, for process PhoenixMiner pid 2009 thread PhoenixMiner pid 2261)
Jan 17 14:06:12 rig29542C kernel: [ 536.471563][ C0] amdgpu 0000:0d:00.0: amdgpu: in page starting at address 0x000048fcfed6b000 from client 27
Jan 17 14:06:12 rig29542C kernel: [ 536.473286][ C0] amdgpu 0000:0d:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0xFFFFFFFF
Jan 17 14:06:12 rig29542C kernel: [ 536.475022][ C0] amdgpu 0000:0d:00.0: amdgpu: Faulty UTCL2 client ID: 0x1ff
Jan 17 14:06:12 rig29542C kernel: [ 536.476766][ C0] amdgpu 0000:0d:00.0: amdgpu: MORE_FAULTS: 0x1
Jan 17 14:06:12 rig29542C kernel: [ 536.478504][ C0] amdgpu 0000:0d:00.0: amdgpu: WALKER_ERROR: 0x7
Jan 17 14:06:12 rig29542C kernel: [ 536.480251][ C0] amdgpu 0000:0d:00.0: amdgpu: PERMISSION_FAULTS: 0xf
Jan 17 14:06:12 rig29542C kernel: [ 536.482005][ C0] amdgpu 0000:0d:00.0: amdgpu: MAPPING_ERROR: 0x1
Jan 17 14:06:12 rig29542C kernel: [ 536.483656][ C0] amdgpu 0000:0d:00.0: amdgpu: RW: 0x1
Jan 17 14:06:12 rig29542C kernel: [ 536.628661][ C0] amdgpu 0000:0d:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:2 pasid:32769, for process PhoenixMiner pid 2009 thread PhoenixMiner pid 2261)
Jan 17 14:06:12 rig29542C kernel: [ 536.631944][ C0] amdgpu 0000:0d:00.0: amdgpu: in page starting at address 0x0000490ccaeb9000 from client 27
Jan 17 14:06:12 rig29542C kernel: [ 536.633665][ C0] amdgpu 0000:0d:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0xFFFFFFFF
Jan 17 14:06:12 rig29542C kernel: [ 536.635403][ C0] amdgpu 0000:0d:00.0: amdgpu: Faulty UTCL2 client ID: 0x1ff
Jan 17 14:06:12 rig29542C kernel: [ 536.637146][ C0] amdgpu 0000:0d:00.0: amdgpu: MORE_FAULTS: 0x1
Jan 17 14:06:12 rig29542C kernel: [ 536.638896][ C0] amdgpu 0000:0d:00.0: amdgpu: WALKER_ERROR: 0x7
Jan 17 14:06:12 rig29542C kernel: [ 536.640634][ C0] amdgpu 0000:0d:00.0: amdgpu: PERMISSION_FAULTS: 0xf
Jan 17 14:06:12 rig29542C kernel: [ 536.642359][ C0] amdgpu 0000:0d:00.0: amdgpu: MAPPING_ERROR: 0x1
Jan 17 14:06:12 rig29542C kernel: [ 536.644020][ C0] amdgpu 0000:0d:00.0: amdgpu: RW: 0x1
Jan 17 14:06:12 rig29542C kernel: [ 538.793702][T28105] amdgpu 0000:0d:00.0: amdgpu: failed send message: TransferTableDram2Smu (19) param: 0x00000009 response 0xfffffffb
Jan 17 14:06:12 rig29542C kernel: [ 538.794904][T28105] amdgpu 0000:0d:00.0: amdgpu: Failed to import overdrive table!
Jan 17 14:06:12 rig29542C kernel: [ 538.843815][T28105] amdgpu 0000:0d:00.0: amdgpu: Msg issuing pre-check failed and SMU may be not in the right state!
Jan 17 14:06:12 rig29542C kernel: [ 538.844368][T28105] amdgpu 0000:0d:00.0: amdgpu: Failed to import overdrive table!
Jan 17 14:06:12 rig29542C kernel: [ 538.893945][T28105] amdgpu 0000:0d:00.0: amdgpu: Msg issuing pre-check failed and SMU may be not in the right state!
Jan 17 14:06:12 rig29542C kernel: [ 538.894512][T28105] amdgpu 0000:0d:00.0: amdgpu: [smu_v11_0_auto_fan_control]Stop smc FAN CONTROL feature failed!
Jan 17 14:06:12 rig29542C kernel: [ 538.895079][T28105] amdgpu 0000:0d:00.0: amdgpu: [smu_v11_0_set_fan_control_mode]Set fan control mode failed!
Jan 17 14:06:12 rig29542C kernel: [ 538.944048][T28105] amdgpu 0000:0d:00.0: amdgpu: Msg issuing pre-check failed and SMU may be not in the right state!
Jan 17 14:06:12 rig29542C kernel: [ 538.994179][T28105] amdgpu 0000:0d:00.0: amdgpu: Msg issuing pre-check failed and SMU may be not in the right state!
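
For anyone comparing their own logs against the one above: the first kernel line reports the SMU driver interface version, the firmware interface version, and the firmware version, and the next line complains that the first two do not match. Below is a minimal Python sketch that pulls those three values out of such a line. The function name `parse_smu_versions` and the returned keys are made up for this example, and the byte-wise decoding is inferred from the log itself (it prints 0x002a3d00 as 42.61.0). Nothing in this thread establishes that the mismatch is what causes the crash, so treat it as one data point only.

```python
import re

# Matches the "smu driver if version = ..." kernel line quoted above.
SMU_RE = re.compile(
    r"smu driver if version = (0x[0-9a-fA-F]+), "
    r"smu fw if version = (0x[0-9a-fA-F]+), "
    r"smu fw version = (0x[0-9a-fA-F]+)"
)


def parse_smu_versions(line):
    """Return the SMU interface/firmware versions found in a log line, or None."""
    match = SMU_RE.search(line)
    if match is None:
        return None
    driver_if, fw_if, fw_raw = (int(group, 16) for group in match.groups())
    # Inferred from the log: 0x002a3d00 is printed as "42.61.0",
    # i.e. major in the upper bits, then one byte each for minor and patch.
    major = (fw_raw >> 16) & 0xFFFF
    minor = (fw_raw >> 8) & 0xFF
    patch = fw_raw & 0xFF
    return {
        "driver_if": driver_if,
        "fw_if": fw_if,
        "fw_version": f"{major}.{minor}.{patch}",
        "if_matches": driver_if == fw_if,
    }
```

Feeding it the first log line above gives a driver interface of 0x36 against a firmware interface of 0x37, which is the mismatch the kernel reports.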

I’ve been having this issue as well with the latest versions of everything: HiveOS, drivers, BIOS updates, etc., and it’s been nothing but trouble. No wonder so many people have left the Hiveon pool. I wish I had never tried swapping over to HiveOS. Even if it was free for 1000 rigs, it just doesn’t make sense when nothing works. Moving back to Windows this weekend.

RIP HiveOS 2017-2020

Having the same issue. I’ve watched a friend apply overclock settings while the rig is actively mining, and it accepts all the changes being made. I come home and do literally the same thing, and the miner freaks out and crashes, and I have to perform a hard reset. If I pause mining and wait for it to stop completely, I can then make the changes to individual card OCs, upload them to the miner, and restart. Only then does it work. It is very time-consuming, as the miner does not like to be paused during an auto-tune.
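
The pause-first workaround described in this post (stop mining, wait, apply the OC, start mining again) can be written down as a small script. This is only a sketch under assumptions: the three command lists below are placeholders that I have not verified against any particular HiveOS build, and the 10-second settle time is a guess. Check the commands on your own rig before running anything.

```python
import subprocess
import time

# ASSUMPTION: placeholder commands. Replace them with whatever stops the
# miner, applies overclocks, and starts the miner on your install.
STOP_CMD = ["miner", "stop"]
APPLY_OC_CMD = ["amd-oc"]
START_CMD = ["miner", "start"]


def apply_oc_safely(run=subprocess.run, settle_seconds=10, sleep=time.sleep):
    """Stop mining, wait for the GPUs to go idle, apply OC, restart mining.

    If any step fails, subprocess raises and the later steps are skipped,
    so a failed OC write leaves the miner stopped for inspection.
    """
    run(STOP_CMD, check=True)
    sleep(settle_seconds)  # assumed delay; tune it for your rig
    run(APPLY_OC_CMD, check=True)
    run(START_CMD, check=True)
```

The `run` and `sleep` parameters exist only so the order of steps can be checked without touching a real rig. Calling `apply_oc_safely()` with no arguments uses `subprocess.run` and `time.sleep`.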

Same problem.

I had upgraded to get TRM’s A and B modes but started to get continuous crashes of TRM (known TRM bug).

Decided to upgrade the build, and since then it’s been horrible.

Trying to find what version I was on when I had stability. I think it was a beta version.

Sorry for delayed response, I’m not super active on the forum tbh and spend most of my time troubleshooting on the Discord channel.

This error appears to be an issue with PhoenixMiner. I highly recommend swapping to a supported miner, such as TeamRedMiner for AMD & T-Rex for Nvidia. There is a lot of information on the internet about how PhoenixMiner is known to inflate hashrates and lower the amount of valid shares you get.

However, if you are insistent on staying on PhoenixMiner, try downgrading to 5.4c.

Not all cards are made equally, and some have special needs. This is something you get used to. Yet again, I’ll repeat myself: TeamRedMiner & T-Rex Miner are generally far more stable than Phoenix, and don’t lie to your face while taking valid shares out of your pocket.

What kernel version are you on currently? In most situations, I see beta versions being stable most of the time, but there’s really no reason to be on a beta version unless you have a Big Navi card (RX 6xxx series card).

I am on the latest stable build, the one with the new TRM. 0.6-200@210302

Same problem using TRM (TeamRedMiner version 0.8.1.1). PhoenixMiner 5.5c also has the same problem. I changed the riser board, but this did not fix the issue. Here is what syslog says:

Mar 31 19:21:14 rig06B127 hive-watchdog[931]: OK LA(5m): 0.10 < 22.0, LA(1m): 0.13 < 44.0
Mar 31 19:21:17 rig06B127 kernel: [ 1455.888380][T13718] amdgpu 0000:04:00.0: amdgpu: smu driver if version = 0x00000036, smu fw if version = 0x00000037, smu fw version = 0x002a3d00 (42.61.0)
Mar 31 19:21:17 rig06B127 kernel: [ 1455.888382][T13718] amdgpu 0000:04:00.0: amdgpu: SMU driver if version not matched
Mar 31 19:21:17 rig06B127 kernel: [ 1455.888434][T13718] amdgpu 0000:04:00.0: amdgpu: use vbios provided pptable
Mar 31 19:21:17 rig06B127 kernel: [ 1455.888435][T13718] amdgpu 0000:04:00.0: amdgpu: smc_dpm_info table revision(format.content): 4.5
Mar 31 19:21:17 rig06B127 kernel: [ 1455.889610][T13718] amdgpu 0000:04:00.0: amdgpu: SMU is initialized successfully!
Mar 31 19:21:17 rig06B127 kernel: [ 1456.054013][T13665] amdgpu 0000:04:00.0: amdgpu: failed send message: TransferTableDram2Smu (19) param: 0x00000009 response 0xfffffffb
Mar 31 19:21:17 rig06B127 kernel: [ 1456.103994][ C1] amdgpu 0000:04:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:3 pasid:32769, for process teamredminer pid 29024 thread teamredminer pid 29024)
Mar 31 19:21:17 rig06B127 kernel: [ 1456.103996][T13665] amdgpu 0000:04:00.0: amdgpu: Failed to import overdrive table!
Mar 31 19:21:17 rig06B127 kernel: [ 1456.104001][ C1] amdgpu 0000:04:00.0: amdgpu: in page starting at address 0x00007f6f03988000 from client 27
Mar 31 19:21:17 rig06B127 kernel: [ 1456.104008][ C1] amdgpu 0000:04:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0xFFFFFFFF
Mar 31 19:21:17 rig06B127 kernel: [ 1456.104011][ C1] amdgpu 0000:04:00.0: amdgpu: Faulty UTCL2 client ID: 0x1ff
Mar 31 19:21:17 rig06B127 kernel: [ 1456.104013][ C1] amdgpu 0000:04:00.0: amdgpu: MORE_FAULTS: 0x1
Mar 31 19:21:17 rig06B127 kernel: [ 1456.104015][ C1] amdgpu 0000:04:00.0: amdgpu: WALKER_ERROR: 0x7
Mar 31 19:21:17 rig06B127 kernel: [ 1456.104016][ C1] amdgpu 0000:04:00.0: amdgpu: PERMISSION_FAULTS: 0xf
Mar 31 19:21:17 rig06B127 kernel: [ 1456.104018][ C1] amdgpu 0000:04:00.0: amdgpu: MAPPING_ERROR: 0x1
Mar 31 19:21:17 rig06B127 kernel: [ 1456.104020][ C1] amdgpu 0000:04:00.0: amdgpu: RW: 0x1
Mar 31 19:21:17 rig06B127 kernel: [ 1456.154030][T13665] amdgpu 0000:04:00.0: amdgpu: Msg issuing pre-check failed and SMU may be not in the right state!
Mar 31 19:21:17 rig06B127 kernel: [ 1456.154035][T13665] amdgpu 0000:04:00.0: amdgpu: Failed to import overdrive table!
Mar 31 19:21:17 rig06B127 kernel: [ 1456.204037][T13665] amdgpu 0000:04:00.0: amdgpu: Msg issuing pre-check failed and SMU may be not in the right state!
Mar 31 19:21:17 rig06B127 kernel: [ 1456.204043][T13665] amdgpu 0000:04:00.0: amdgpu: [smu_v11_0_auto_fan_control]Stop smc FAN CONTROL feature failed!
Mar 31 19:21:17 rig06B127 kernel: [ 1456.204046][T13665] amdgpu 0000:04:00.0: amdgpu: [smu_v11_0_set_fan_control_mode]Set fan control mode failed!
Mar 31 19:21:17 rig06B127 kernel: [ 1456.254049][T13665] amdgpu 0000:04:00.0: amdgpu: Msg issuing pre-check failed and SMU may be not in the right state!
Mar 31 19:21:17 rig06B127 kernel: [ 1456.304069][T13665] amdgpu 0000:04:00.0: amdgpu: Msg issuing pre-check failed and SMU may be not in the right state!
Mar 31 19:21:17 rig06B127 kernel: [ 1456.304076][T13665] amdgpu 0000:04:00.0: amdgpu: [smu_v11_0_auto_fan_control]Stop smc FAN CONTROL feature failed!
Mar 31 19:21:17 rig06B127 kernel: [ 1456.304078][T13665] amdgpu 0000:04:00.0: amdgpu: [smu_v11_0_set_fan_control_mode]Set fan control mode failed!
Mar 31 19:21:17 rig06B127 kernel: [ 1456.306213][T13665] amdgpu: manual fan speed control should be enabled first
Mar 31 19:21:24 rig06B127 avg_khs[1693]: {"params":{"avg_khs":{"ethash":[51346.3,35342.5]}}}
Mar 31 19:21:26 rig06B127 kernel: [ 1464.736511][T13913] amdgpu 0000:04:00.0: amdgpu: Msg issuing pre-check failed and SMU may be not in the right state!
Mar 31 19:21:26 rig06B127 kernel: [ 1464.736519][T13913] amdgpu 0000:04:00.0: amdgpu: Failed to export SMU metrics table!
Mar 31 19:21:26 rig06B127 kernel: [ 1464.786531][T13896] amdgpu 0000:04:00.0: amdgpu: Msg issuing pre-check failed and SMU may be not in the right state!
Mar 31 19:21:26 rig06B127 kernel: [ 1464.786537][T13896] amdgpu 0000:04:00.0: amdgpu: Failed to export SMU metrics table!
Mar 31 19:21:26 rig06B127 kernel: [ 1464.836548][T13896] amdgpu 0000:04:00.0: amdgpu: Msg issuing pre-check failed and SMU may be not in the right state!
Mar 31 19:21:26 rig06B127 kernel: [ 1464.836554][T13896] amdgpu 0000:04:00.0: amdgpu: Failed to export SMU metrics table!
Mar 31 19:21:26 rig06B127 kernel: [ 1464.936565][T13896] amdgpu 0000:04:00.0: amdgpu: Msg issuing pre-check failed and SMU may be not in the right state!
Mar 31 19:21:26 rig06B127 kernel: [ 1464.936570][T13896] amdgpu 0000:04:00.0: amdgpu: Failed to export SMU metrics table!
Mar 31 19:21:26 rig06B127 kernel: [ 1464.986572][T13896] amdgpu 0000:04:00.0: amdgpu: Msg issuing pre-check failed and SMU may be not in the right state!
Mar 31 19:21:26 rig06B127 kernel: [ 1464.986575][T13896] amdgpu 0000:04:00.0: amdgpu: Failed to export SMU metrics table!
Mar 31 19:21:26 rig06B127 kernel: [ 1465.036590][T13896] amdgpu 0000:04:00.0: amdgpu: Msg issuing pre-check failed and SMU may be not in the right state!
Mar 31 19:21:26 rig06B127 kernel: [ 1465.036593][T13896] amdgpu 0000:04:00.0: amdgpu: Failed to export SMU metrics table!
Mar 31 19:21:26 rig06B127 kernel: [ 1465.086603][T13896] amdgpu 0000:04:00.0: amdgpu: Msg issuing pre-check failed and SMU may be not in the right state!
Mar 31 19:21:26 rig06B127 kernel: [ 1465.086606][T13896] amdgpu 0000:04:00.0: amdgpu: Failed to export SMU metrics table!
Mar 31 19:21:27 rig06B127 kernel: [ 1466.106968][ T3382] [drm:amdgpu_job_timedout [amdgpu]] ERROR ring sdma0 timeout, signaled seq=4961, emitted seq=4962
Mar 31 19:21:27 rig06B127 kernel: [ 1466.107121][ T3382] [drm:amdgpu_job_timedout [amdgpu]] ERROR Process information: process teamredminer pid 29024 thread teamredminer pid 29024
Mar 31 19:21:27 rig06B127 kernel: [ 1466.107128][ T3382] [drm] GPU recovery disabled.
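
Logs like the two posted in this thread are long and repetitive, so it can help to reduce them to a count of each error type per card. Here is a minimal Python sketch that does that. The category labels are invented for this example, and the substrings are copied from the kernel messages quoted in this thread. It only looks at lines that carry a PCI address in the `amdgpu 0000:04:00.0: amdgpu: ...` form, so lines such as the ring timeout at the end are ignored.

```python
import re
from collections import Counter

# Label (invented here) -> substring copied from the messages in this thread.
PATTERNS = {
    "page_fault": "[gfxhub] page fault",
    "smu_send_failed": "failed send message",
    "smu_bad_state": "SMU may be not in the right state",
    "od_table_failed": "Failed to import overdrive table",
    "fan_control_failed": "Set fan control mode failed",
    "metrics_export_failed": "Failed to export SMU metrics table",
}

# Matches e.g. "amdgpu 0000:04:00.0: amdgpu: <message>"
LINE_RE = re.compile(r"amdgpu (\w{4}:\w{2}:\w{2}\.\w): amdgpu: (.*)")


def summarize(lines):
    """Count each message category per PCI address."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # not an amdgpu line with a PCI address
        pci_addr, message = match.groups()
        for label, needle in PATTERNS.items():
            if needle in message:
                counts[(pci_addr, label)] += 1
    return counts
```

`summarize` accepts any iterable of lines, so an open log file works as input. The result is a `Counter` keyed by `(pci_address, label)`, which makes it easy to see whether a single card accounts for most of the errors.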

Remove all OC settings, reboot, then apply the OC settings again. If this doesn’t work, try writing the image from scratch, using the stable version.

I tried to fix this for 2 days and nothing worked. Removing all OC settings made it work again. Thanks.

Hi,

I have been facing this issue, “Msg issuing pre-check failed and SMU may be not in the right state!”, for the past few weeks on a 4x 6700 XT rig. It happens randomly with any one of the cards.
Could you let me know the solution you found?
Any help would be appreciated!

Things I have tried: Different PCIE Gens, different mobo, reflashed hive, changed risers.

Any luck? I’m running into this issue too now. It seems to be my 5700 XT cards, but HiveOS reports the following for every one of my AMD cards (6800 XT, 6800, 6700 XT, 6600 XT):
amdgpu: Failed to export SMU metrics table!

Nope bro, no concrete solution as of yet. I get these errors very randomly.
All I can say is try lowering your MEM freq a little.
Might help.
Will update if I have anything.

Can you let me know if you’re using a pen drive or an SSD as the boot device?

Thanks for the reply! Hopefully we find out the issue. I’m using an SSD for my boot device.

I tried cleaning up the rig as it had some dust built up, and swapped out the pci-e multiplier I was using for 4 of the 5700xt gpus that I believe were causing the failures. Also ran a couple commands:
apt-get -f install
apt update
apt upgrade
selfupgrade -f

After a couple reboots, the rig has been steady for the past couple of hours. Before, I was getting the smu error and it would crash every 5-7 minutes, if not earlier. I’ll report back if it doesn’t stay stable and gives me the same smu error again.

Any updates?

That seems to have fixed it for me. The rig is back to normal and not throwing those SMU errors at me. So I think it was mainly related to the PCIe multiplier. The weird thing is that I just swapped it with another one I was using on the same rig.