I just wanted to give a quick feature suggestion.
HiveOS already monitors a lot of parameters, but coming from other Linux distributions I always go back to nvidia-smi for diagnosing some issues. For example, a bad riser can be found rather quickly using
nvidia-smi dmon -c 1 -s e
which outputs a short list of the cards detected by nvidia-smi along with a "pci errs" column.
If that column isn’t all zeros, you either have a faulty riser or … like me once, forgot to switch PCIe from 3.0 to 2.0.
It might be worth adding this counter to the card overview, similar to the invalid share counter: shown only when it is not zero, maybe even with a mouse-over hint pointing at the riser, or at the riser plus a wrong PCIe speed.
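
To make the "only show it if it is not zero" idea concrete, here is a rough sketch of how the counter could be pulled out of the nvidia-smi output. It is not how HiveOS does anything internally; the function name pci_errs_nonzero is made up for this post, and it assumes the header line looks like "# gpu sbecc dbecc pci" (a leading "#" followed by column names), which may differ between driver versions.

```sh
#!/bin/sh
# Hypothetical helper (name invented for illustration): reads the output of
# "nvidia-smi dmon -c 1 -s e" on stdin and prints one line per GPU whose
# pci errs counter is greater than zero. Prints nothing if all are zero.
pci_errs_nonzero() {
    awk '
        # First header line, assumed to look like "# gpu sbecc dbecc pci".
        # Data rows have no leading "#", so the data column is i - 1.
        /^#/ && !col {
            for (i = 2; i <= NF; i++) if ($i == "pci") col = i - 1
            next
        }
        # Skip any remaining header lines (e.g. "# Idx errs errs errs").
        /^#/ { next }
        # Report rows where the pci column is a number above zero;
        # non-numeric values such as "-" are ignored.
        col && $col ~ /^[0-9]+$/ && $col + 0 > 0 {
            printf "GPU %s: %s PCIe errors\n", $1, $col
        }
    '
}

# Only query the driver if nvidia-smi is actually installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi dmon -c 1 -s e | pci_errs_nonzero
fi
```

An empty result would mean there is nothing to show in the overview; any line printed would be a candidate for the mouse-over hint.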