Hi guys,
Is there any way to see GPU memory errors in HiveOS?
“dmesg | grep amd” will display memory errors in Linux. There are ways to go deeper than this, but this should be sufficient.
Thanks for your reply,
But I can’t understand what this is: “dmesg | grep amd”. Is it a Linux command?
Yes, it’s a Linux command. You can put it in the “Run command” section on the rig page, or log in to the rig itself and run it there.
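For anyone following along, here is a minimal sketch of the command above with a second filter added, assuming it is run in a shell on the rig. The function name `filter_amdgpu_errors` and the keyword list are my own assumptions for illustration. This is not a HiveOS tool, and the keywords are not a complete list of what the driver can print.

```shell
# Hypothetical helper: read kernel log lines on stdin and keep only the
# amdgpu lines that also contain an error-like keyword.
# The keyword list is an assumption, adjust it to taste.
filter_amdgpu_errors() {
    grep -i 'amdgpu' | grep -iE 'error|fail|fault|timeout'
}

# Usage on the rig (may need root):
#   dmesg | filter_amdgpu_errors
```

If nothing is printed, no amdgpu line matched those keywords. That only says the kernel log has no such lines, not that the memory is error-free.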
Thanks a lot,
I will check and tell you.
Hi guys, how do I tell whether there are errors? This is my output:
[ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-5.10.0-hiveos root=UUID=b4b60f60-cd34-49c7-859b-53f802e8659c ro text consoleblank=0 intel_pstate=disable net.ifnames=0 ipv6.disable=1 pci=noaer iommu=soft usbcore.autosuspend=-1 radeon.si_support=0 radeon.cik_support=0 amdgpu.vm_fragment_size=9 amdgpu.si_support=1 amdgpu.cik_support=1 amdgpu.ppfeaturemask=0xffff7fff amdgpu.runpm=0 amdgpu.gpu_recovery=0 noibrs noibpb nopti nospectre_v2 nospectre_v1 l1tf=off nospec_store_bypass_disable no_stf_barrier mds=off mitigations=off e1000e.EEE=0 fsck.mode=force fsck.repair=yes
[ 0.025286] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-5.10.0-hiveos root=UUID=b4b60f60-cd34-49c7-859b-53f802e8659c ro text consoleblank=0 intel_pstate=disable net.ifnames=0 ipv6.disable=1 pci=noaer iommu=soft usbcore.autosuspend=-1 radeon.si_support=0 radeon.cik_support=0 amdgpu.vm_fragment_size=9 amdgpu.si_support=1 amdgpu.cik_support=1 amdgpu.ppfeaturemask=0xffff7fff amdgpu.runpm=0 amdgpu.gpu_recovery=0 noibrs noibpb nopti nospectre_v2 nospectre_v1 l1tf=off nospec_store_bypass_disable no_stf_barrier mds=off mitigations=off e1000e.EEE=0 fsck.mode=force fsck.repair=yes
[ 10.978448] [drm] amdgpu kernel modesetting enabled.
[ 10.978450] [drm] amdgpu version: 5.11.1001
[ 10.978612] amdgpu: CRAT table not found
[ 10.978615] amdgpu: Virtual CRAT table created for CPU
[ 10.978627] amdgpu: Topology: Add CPU node
[ 10.981251] amdgpu 0000:03:00.0: enabling device (0000 -> 0003)
[ 10.981334] amdgpu 0000:03:00.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported
[ 10.981367] kfd kfd: amdgpu: skipped device 1002:67df, PCI rejects atomics
[ 11.223528] amdgpu 0000:03:00.0: amdgpu: Fetched VBIOS from ROM BAR
[ 11.223530] amdgpu: ATOM BIOS: 113-4E353BU-O4E
[ 11.355790] amdgpu 0000:03:00.0: BAR 2: releasing [mem 0xf0000000-0xf01fffff 64bit pref]
[ 11.355793] amdgpu 0000:03:00.0: BAR 0: releasing [mem 0xe0000000-0xefffffff 64bit pref]
[ 11.355839] amdgpu 0000:03:00.0: BAR 0: no space for [mem size 0x200000000 64bit pref]
[ 11.355841] amdgpu 0000:03:00.0: BAR 0: failed to assign [mem size 0x200000000 64bit pref]
[ 11.355843] amdgpu 0000:03:00.0: BAR 2: no space for [mem size 0x00200000 64bit pref]
[ 11.355845] amdgpu 0000:03:00.0: BAR 2: failed to assign [mem size 0x00200000 64bit pref]
[ 11.355893] amdgpu 0000:03:00.0: BAR 0: assigned [mem 0xe0000000-0xefffffff 64bit pref]
[ 11.355906] amdgpu 0000:03:00.0: BAR 2: assigned [mem 0xf0000000-0xf01fffff 64bit pref]
[ 11.355938] amdgpu 0000:03:00.0: amdgpu: VRAM: 8192M 0x000000F400000000 - 0x000000F5FFFFFFFF (8192M used)
[ 11.355940] amdgpu 0000:03:00.0: amdgpu: GART: 256M 0x000000FF00000000 - 0x000000FF0FFFFFFF
[ 11.356011] [drm] amdgpu: 8192M of VRAM memory ready
[ 11.356015] [drm] amdgpu: 8192M of GTT memory ready.
[ 11.366946] amdgpu: [powerplay] hwmgr_sw_init smu backed is polaris10_smu
[ 11.824294] amdgpu 0000:03:00.0: amdgpu: SE 4, SH per SE 1, CU per SH 9, active_cu_number 36
[ 11.824545] amdgpu 0000:03:00.0: [drm] Cannot find any crtc or sizes
[ 11.829570] [drm] Initialized amdgpu 3.42.0 20150101 for 0000:03:00.0 on minor 1
[ 11.829655] amdgpu 0000:04:00.0: enabling device (0000 -> 0003)
[ 11.829743] amdgpu 0000:04:00.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported
[ 11.829777] kfd kfd: amdgpu: skipped device 1002:67df, PCI rejects atomics
[ 12.071764] amdgpu 0000:04:00.0: amdgpu: Fetched VBIOS from ROM BAR
[ 12.071766] amdgpu: ATOM BIOS: 113-4E353BU-O4E
[ 12.202692] amdgpu 0000:04:00.0: BAR 2: releasing [mem 0xd0000000-0xd01fffff 64bit pref]
[ 12.202694] amdgpu 0000:04:00.0: BAR 0: releasing [mem 0xc0000000-0xcfffffff 64bit pref]
[ 12.202738] amdgpu 0000:04:00.0: BAR 0: no space for [mem size 0x200000000 64bit pref]
[ 12.202739] amdgpu 0000:04:00.0: BAR 0: failed to assign [mem size 0x200000000 64bit pref]
[ 12.202742] amdgpu 0000:04:00.0: BAR 2: no space for [mem size 0x00200000 64bit pref]
[ 12.202743] amdgpu 0000:04:00.0: BAR 2: failed to assign [mem size 0x00200000 64bit pref]
[ 12.202788] amdgpu 0000:04:00.0: BAR 0: assigned [mem 0xc0000000-0xcfffffff 64bit pref]
[ 12.202801] amdgpu 0000:04:00.0: BAR 2: assigned [mem 0xd0000000-0xd01fffff 64bit pref]
[ 12.202835] amdgpu 0000:04:00.0: amdgpu: VRAM: 8192M 0x000000F400000000 - 0x000000F5FFFFFFFF (8192M used)
[ 12.202836] amdgpu 0000:04:00.0: amdgpu: GART: 256M 0x000000FF00000000 - 0x000000FF0FFFFFFF
[ 12.202885] [drm] amdgpu: 8192M of VRAM memory ready
[ 12.202889] [drm] amdgpu: 8192M of GTT memory ready.
[ 12.205404] amdgpu: [powerplay] hwmgr_sw_init smu backed is polaris10_smu
[ 12.656405] amdgpu 0000:04:00.0: amdgpu: SE 4, SH per SE 1, CU per SH 9, active_cu_number 36
[ 12.656640] amdgpu 0000:04:00.0: [drm] Cannot find any crtc or sizes
[ 12.660910] [drm] Initialized amdgpu 3.42.0 20150101 for 0000:04:00.0 on minor 2
[ 12.661256] amdgpu 0000:06:00.0: enabling device (0000 -> 0003)
[ 12.661346] amdgpu 0000:06:00.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported
[ 12.661381] kfd kfd: amdgpu: skipped device 1002:67df, PCI rejects atomics
[ 12.906543] amdgpu 0000:06:00.0: amdgpu: Fetched VBIOS from ROM BAR
[ 12.906546] amdgpu: ATOM BIOS: 113-4E353BU-O4E
[ 13.038678] amdgpu 0000:06:00.0: BAR 2: releasing [mem 0xb0000000-0xb01fffff 64bit pref]
[ 13.038680] amdgpu 0000:06:00.0: BAR 0: releasing [mem 0xa0000000-0xafffffff 64bit pref]
[ 13.038727] amdgpu 0000:06:00.0: BAR 0: no space for [mem size 0x200000000 64bit pref]
[ 13.038728] amdgpu 0000:06:00.0: BAR 0: failed to assign [mem size 0x200000000 64bit pref]
[ 13.038731] amdgpu 0000:06:00.0: BAR 2: no space for [mem size 0x00200000 64bit pref]
[ 13.038732] amdgpu 0000:06:00.0: BAR 2: failed to assign [mem size 0x00200000 64bit pref]
[ 13.038778] amdgpu 0000:06:00.0: BAR 0: assigned [mem 0xa0000000-0xafffffff 64bit pref]
[ 13.038793] amdgpu 0000:06:00.0: BAR 2: assigned [mem 0xb0000000-0xb01fffff 64bit pref]
[ 13.038826] amdgpu 0000:06:00.0: amdgpu: VRAM: 8192M 0x000000F400000000 - 0x000000F5FFFFFFFF (8192M used)
[ 13.038828] amdgpu 0000:06:00.0: amdgpu: GART: 256M 0x000000FF00000000 - 0x000000FF0FFFFFFF
[ 13.038867] [drm] amdgpu: 8192M of VRAM memory ready
[ 13.038871] [drm] amdgpu: 8192M of GTT memory ready.
[ 13.041473] amdgpu: [powerplay] hwmgr_sw_init smu backed is polaris10_smu
[ 13.493291] amdgpu 0000:06:00.0: amdgpu: SE 4, SH per SE 1, CU per SH 9, active_cu_number 36
[ 13.493595] amdgpu 0000:06:00.0: [drm] Cannot find any crtc or sizes
[ 13.501741] [drm] Initialized amdgpu 3.42.0 20150101 for 0000:06:00.0 on minor 3
[ 13.501837] amdgpu 0000:0a:00.0: enabling device (0000 -> 0003)
[ 13.501934] amdgpu 0000:0a:00.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported
[ 13.501975] kfd kfd: amdgpu: skipped device 1002:67df, PCI rejects atomics
[ 13.757035] amdgpu 0000:0a:00.0: amdgpu: Fetched VBIOS from ROM BAR
[ 13.757050] amdgpu: ATOM BIOS: 113-4E353BU-O4E
[ 13.890659] amdgpu 0000:0a:00.0: BAR 2: releasing [mem 0x2fd0000000-0x2fd01fffff 64bit pref]
[ 13.890660] amdgpu 0000:0a:00.0: BAR 0: releasing [mem 0x2fc0000000-0x2fcfffffff 64bit pref]
[ 13.890701] amdgpu 0000:0a:00.0: BAR 0: no space for [mem size 0x200000000 64bit pref]
[ 13.890702] amdgpu 0000:0a:00.0: BAR 0: failed to assign [mem size 0x200000000 64bit pref]
[ 13.890704] amdgpu 0000:0a:00.0: BAR 2: no space for [mem size 0x00200000 64bit pref]
[ 13.890704] amdgpu 0000:0a:00.0: BAR 2: failed to assign [mem size 0x00200000 64bit pref]
[ 13.890747] amdgpu 0000:0a:00.0: BAR 0: assigned [mem 0x2fc0000000-0x2fcfffffff 64bit pref]
[ 13.890759] amdgpu 0000:0a:00.0: BAR 2: assigned [mem 0x2fd0000000-0x2fd01fffff 64bit pref]
[ 13.890785] amdgpu 0000:0a:00.0: amdgpu: VRAM: 8192M 0x000000F400000000 - 0x000000F5FFFFFFFF (8192M used)
[ 13.890786] amdgpu 0000:0a:00.0: amdgpu: GART: 256M 0x000000FF00000000 - 0x000000FF0FFFFFFF
[ 13.890810] [drm] amdgpu: 8192M of VRAM memory ready
[ 13.890812] [drm] amdgpu: 8192M of GTT memory ready.
[ 13.892116] amdgpu: [powerplay] hwmgr_sw_init smu backed is polaris10_smu
[ 14.349900] amdgpu 0000:0a:00.0: amdgpu: SE 4, SH per SE 1, CU per SH 9, active_cu_number 36
[ 14.350109] amdgpu 0000:0a:00.0: [drm] Cannot find any crtc or sizes
[ 14.354996] [drm] Initialized amdgpu 3.42.0 20150101 for 0000:0a:00.0 on minor 4
[ 14.355082] amdgpu 0000:0b:00.0: enabling device (0000 -> 0003)
[ 14.355171] amdgpu 0000:0b:00.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported
[ 14.355208] kfd kfd: amdgpu: skipped device 1002:67df, PCI rejects atomics
[ 14.610545] amdgpu 0000:0b:00.0: amdgpu: Fetched VBIOS from ROM BAR
[ 14.610547] amdgpu: ATOM BIOS: 113-4E353BU-O4E
[ 14.742640] amdgpu 0000:0b:00.0: BAR 2: releasing [mem 0x2fb0000000-0x2fb01fffff 64bit pref]
[ 14.742641] amdgpu 0000:0b:00.0: BAR 0: releasing [mem 0x2fa0000000-0x2fafffffff 64bit pref]
[ 14.742685] amdgpu 0000:0b:00.0: BAR 0: no space for [mem size 0x200000000 64bit pref]
[ 14.742686] amdgpu 0000:0b:00.0: BAR 0: failed to assign [mem size 0x200000000 64bit pref]
[ 14.742687] amdgpu 0000:0b:00.0: BAR 2: no space for [mem size 0x00200000 64bit pref]
[ 14.742688] amdgpu 0000:0b:00.0: BAR 2: failed to assign [mem size 0x00200000 64bit pref]
[ 14.742733] amdgpu 0000:0b:00.0: BAR 0: assigned [mem 0x2fa0000000-0x2fafffffff 64bit pref]
[ 14.742746] amdgpu 0000:0b:00.0: BAR 2: assigned [mem 0x2fb0000000-0x2fb01fffff 64bit pref]
[ 14.742773] amdgpu 0000:0b:00.0: amdgpu: VRAM: 8192M 0x000000F400000000 - 0x000000F5FFFFFFFF (8192M used)
[ 14.742774] amdgpu 0000:0b:00.0: amdgpu: GART: 256M 0x000000FF00000000 - 0x000000FF0FFFFFFF
[ 14.742802] [drm] amdgpu: 8192M of VRAM memory ready
[ 14.742804] [drm] amdgpu: 8192M of GTT memory ready.
[ 14.744204] amdgpu: [powerplay] hwmgr_sw_init smu backed is polaris10_smu
[ 15.199214] amdgpu 0000:0b:00.0: amdgpu: SE 4, SH per SE 1, CU per SH 9, active_cu_number 36
[ 15.199447] amdgpu 0000:0b:00.0: [drm] Cannot find any crtc or sizes
[ 15.203807] [drm] Initialized amdgpu 3.42.0 20150101 for 0000:0b:00.0 on minor 5
[ 15.204154] amdgpu 0000:0c:00.0: enabling device (0000 -> 0003)
[ 15.204245] amdgpu 0000:0c:00.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported
[ 15.204282] kfd kfd: amdgpu: skipped device 1002:67df, PCI rejects atomics
[ 15.458542] amdgpu 0000:0c:00.0: amdgpu: Fetched VBIOS from ROM BAR
[ 15.458545] amdgpu: ATOM BIOS: 113-4E353BU-O4E
[ 15.590700] amdgpu 0000:0c:00.0: BAR 2: releasing [mem 0x2f90000000-0x2f901fffff 64bit pref]
[ 15.590701] amdgpu 0000:0c:00.0: BAR 0: releasing [mem 0x2f80000000-0x2f8fffffff 64bit pref]
[ 15.590744] amdgpu 0000:0c:00.0: BAR 0: no space for [mem size 0x200000000 64bit pref]
[ 15.590744] amdgpu 0000:0c:00.0: BAR 0: failed to assign [mem size 0x200000000 64bit pref]
[ 15.590746] amdgpu 0000:0c:00.0: BAR 2: no space for [mem size 0x00200000 64bit pref]
[ 15.590746] amdgpu 0000:0c:00.0: BAR 2: failed to assign [mem size 0x00200000 64bit pref]
[ 15.590788] amdgpu 0000:0c:00.0: BAR 0: assigned [mem 0x2f80000000-0x2f8fffffff 64bit pref]
[ 15.590801] amdgpu 0000:0c:00.0: BAR 2: assigned [mem 0x2f90000000-0x2f901fffff 64bit pref]
[ 15.590828] amdgpu 0000:0c:00.0: amdgpu: VRAM: 8192M 0x000000F400000000 - 0x000000F5FFFFFFFF (8192M used)
[ 15.590829] amdgpu 0000:0c:00.0: amdgpu: GART: 256M 0x000000FF00000000 - 0x000000FF0FFFFFFF
[ 15.590853] [drm] amdgpu: 8192M of VRAM memory ready
[ 15.590856] [drm] amdgpu: 8192M of GTT memory ready.
[ 15.592192] amdgpu: [powerplay] hwmgr_sw_init smu backed is polaris10_smu
[ 16.047170] amdgpu 0000:0c:00.0: amdgpu: SE 4, SH per SE 1, CU per SH 9, active_cu_number 36
[ 16.047399] amdgpu 0000:0c:00.0: [drm] Cannot find any crtc or sizes
[ 16.050997] [drm] Initialized amdgpu 3.42.0 20150101 for 0000:0c:00.0 on minor 6
[ 16.051088] amdgpu 0000:0d:00.0: enabling device (0000 -> 0003)
[ 16.051179] amdgpu 0000:0d:00.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported
[ 16.051217] kfd kfd: amdgpu: skipped device 1002:67df, PCI rejects atomics
[ 16.303845] amdgpu 0000:0d:00.0: amdgpu: Fetched VBIOS from ROM BAR
[ 16.303848] amdgpu: ATOM BIOS: 113-4E353BU-O4E
[ 16.434711] amdgpu 0000:0d:00.0: BAR 2: releasing [mem 0x2f70000000-0x2f701fffff 64bit pref]
[ 16.434714] amdgpu 0000:0d:00.0: BAR 0: releasing [mem 0x2f60000000-0x2f6fffffff 64bit pref]
[ 16.434768] amdgpu 0000:0d:00.0: BAR 0: no space for [mem size 0x200000000 64bit pref]
[ 16.434769] amdgpu 0000:0d:00.0: BAR 0: failed to assign [mem size 0x200000000 64bit pref]
[ 16.434772] amdgpu 0000:0d:00.0: BAR 2: no space for [mem size 0x00200000 64bit pref]
[ 16.434773] amdgpu 0000:0d:00.0: BAR 2: failed to assign [mem size 0x00200000 64bit pref]
[ 16.434827] amdgpu 0000:0d:00.0: BAR 0: assigned [mem 0x2f60000000-0x2f6fffffff 64bit pref]
[ 16.434843] amdgpu 0000:0d:00.0: BAR 2: assigned [mem 0x2f70000000-0x2f701fffff 64bit pref]
[ 16.434876] amdgpu 0000:0d:00.0: amdgpu: VRAM: 8192M 0x000000F400000000 - 0x000000F5FFFFFFFF (8192M used)
[ 16.434878] amdgpu 0000:0d:00.0: amdgpu: GART: 256M 0x000000FF00000000 - 0x000000FF0FFFFFFF
[ 16.434918] [drm] amdgpu: 8192M of VRAM memory ready
[ 16.434921] [drm] amdgpu: 8192M of GTT memory ready.
[ 16.436920] amdgpu: [powerplay] hwmgr_sw_init smu backed is polaris10_smu
[ 16.892605] amdgpu 0000:0d:00.0: amdgpu: SE 4, SH per SE 1, CU per SH 9, active_cu_number 36
[ 16.892850] amdgpu 0000:0d:00.0: [drm] Cannot find any crtc or sizes
[ 16.897757] [drm] Initialized amdgpu 3.42.0 20150101 for 0000:0d:00.0 on minor 7
[ 16.897922] amdgpu 0000:0f:00.0: enabling device (0000 -> 0003)
[ 16.898009] amdgpu 0000:0f:00.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported
[ 16.898048] kfd kfd: amdgpu: skipped device 1002:67df, PCI rejects atomics
[ 17.161059] amdgpu 0000:0f:00.0: amdgpu: Fetched VBIOS from ROM BAR
[ 17.161061] amdgpu: ATOM BIOS: 113-4E353BU-O4E
[ 17.295147] amdgpu 0000:0f:00.0: BAR 2: releasing [mem 0x2ff0000000-0x2ff01fffff 64bit pref]
[ 17.295151] amdgpu 0000:0f:00.0: BAR 0: releasing [mem 0x2fe0000000-0x2fefffffff 64bit pref]
[ 17.295215] amdgpu 0000:0f:00.0: BAR 0: assigned [mem 0x2200000000-0x23ffffffff 64bit pref]
[ 17.295231] amdgpu 0000:0f:00.0: BAR 2: assigned [mem 0x2100000000-0x21001fffff 64bit pref]
[ 17.295295] amdgpu 0000:0f:00.0: amdgpu: VRAM: 8192M 0x000000F400000000 - 0x000000F5FFFFFFFF (8192M used)
[ 17.295299] amdgpu 0000:0f:00.0: amdgpu: GART: 256M 0x000000FF00000000 - 0x000000FF0FFFFFFF
[ 17.295345] [drm] amdgpu: 8192M of VRAM memory ready
[ 17.295351] [drm] amdgpu: 8192M of GTT memory ready.
[ 17.299138] amdgpu: [powerplay] hwmgr_sw_init smu backed is polaris10_smu
[ 17.750594] amdgpu 0000:0f:00.0: amdgpu: SE 4, SH per SE 1, CU per SH 9, active_cu_number 36
[ 17.750820] amdgpu 0000:0f:00.0: [drm] Cannot find any crtc or sizes
[ 17.756215] [drm] Initialized amdgpu 3.42.0 20150101 for 0000:0f:00.0 on minor 8
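A long listing like the one above is easier to read when summarized per card. The sketch below counts, for each PCI address, the amdgpu lines that contain the word "failed". The function name `count_failed_per_gpu` is hypothetical and the single keyword is an assumption. Note that in the listing above the only lines containing "failed" are the "BAR ... failed to assign" messages, which are about PCI address space at boot and are each followed by an "assigned" line. As far as I can tell, they are not a VRAM error counter like the one HWiNFO shows.

```shell
# Hypothetical helper: read kernel log lines on stdin and print
# "<pci-address> <count>" for amdgpu lines containing "failed".
count_failed_per_gpu() {
    awk '/amdgpu [0-9a-f]+:[0-9a-f]+:[0-9a-f]+\.[0-9]/ && /failed/ {
        for (i = 1; i <= NF; i++)
            if ($i == "amdgpu") {
                addr = $(i + 1)        # field after "amdgpu" is the PCI address
                sub(/:$/, "", addr)    # drop the trailing colon
                n[addr]++
                break
            }
    }
    END { for (a in n) print a, n[a] }' | sort
}

# Usage on the rig (may need root):
#   dmesg | count_failed_per_gpu
```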
What have you tried to troubleshoot? Reflashed Hive? Tried to boot with no GPU?
No, I’m trying to figure out how to view GPU memory errors, the equivalent of what HWiNFO shows in Windows.
Try reflashing and start with one known good card and go from there.
You misunderstood me. How can I tell from the output of dmesg | grep amd that I have no memory errors? Similar to how it is done in Windows via HWiNFO.
There likely isn’t something wrong with all cards installed, and instead something else.
Reflash the latest stable image and make sure it boots correctly with no GPU installed.
Then add a single gpu and make sure it boots correctly with no errors and go from there.
Everything is fine with the video cards; they are mining stably. The question is, how can I track GPU memory errors on HiveOS? Or is it not possible?
It’s unlikely 8 cards are all experiencing the same error at the same time. I think your issue is something else, like the driver not loading correctly or something along those lines which caused all of that. Not anything actually wrong with the memory on your cards.
You didn’t understand. I’m looking for how and where to see GPU memory errors in Hive.
This is needed to check the stability of overclocked AMD cards.