Foreman reporting hardware errors for Luxor Firmware

thomas.barnhart

Problem
Foreman reports a hardware error when just one chip is bad on any board. This marks more miners as Failed than is useful.

From Foreman: for hardware errors on Luxor firmware, we look at chain_acs from the stats response and look for any "x"s.
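
To make the check Foreman describes concrete, here is a minimal Python sketch. It assumes the stats response has already been parsed into a dict and that each board reports a chain_acs field (the keys chain_acs1, chain_acs2, ... and the sample strings are assumptions for illustration, not taken from Foreman or Luxor documentation), where "o" marks a good chip and "x" a bad one. The function name count_bad_chips is hypothetical.

```python
def count_bad_chips(stats: dict) -> dict:
    """Return {field_name: number_of_bad_chips} for every chain_acs* field."""
    counts = {}
    for key, value in stats.items():
        # Assumed layout: one chain_acs* string per board, "x" = bad chip.
        if key.startswith("chain_acs") and isinstance(value, str):
            counts[key] = value.count("x")
    return counts


# Hypothetical stats fragment, for illustration only.
sample_stats = {
    "chain_acs1": "oooooooo oooooooo oooxoooo",
    "chain_acs2": "oooooooo oooooooo oooooooo",
    "chain_acs3": "ooxooooo oxoooooo oooooooo",
}

bad_chips = count_bad_chips(sample_stats)

# The rule described above: any "x" at all counts as a hardware error.
any_x_rule = any(count > 0 for count in bad_chips.values())

print(bad_chips)   # {'chain_acs1': 1, 'chain_acs2': 0, 'chain_acs3': 2}
print(any_x_rule)  # True
```

With this rule, a single bad chip on one board is enough to flag the whole miner, which is the behavior described in the Problem section.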

The issue is that, according to Luxor, a bad chip does not always amount to a hardware failure.

From Luxor
They (Foreman) would have to change what they classify as a hardware failure. This could be done in multiple ways:

- They can set a threshold for how many bad chips a board may have before it is categorized as an error.
- Use expected vs. actual hashrate to determine whether a board is struggling.
- Only indicate a hardware failure if a board shuts down.
 

I request that Foreman work with Luxor to determine when bad chips rise to the level of a hardware error, or perhaps classify them as hardware warnings as long as the miner is hashing.
We can also set ‘Ignore Hardware Errors’ in Foreman, but I am not certain what other hardware errors this would filter out.
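
As a rough illustration of how the options Luxor lists could be combined with a warning level, here is a minimal Python sketch. Everything in it is an assumption for discussion: the names Severity, BoardStatus and classify_board, the default thresholds (3 bad chips, 90% of expected hashrate), and the choice to treat zero hashrate as a board shutdown. It is not how Foreman or Luxor actually classify errors.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    OK = "ok"
    WARNING = "warning"
    ERROR = "error"


@dataclass
class BoardStatus:
    bad_chips: int            # number of "x" entries reported for the board
    expected_hashrate: float  # same unit as actual_hashrate (e.g. TH/s)
    actual_hashrate: float


def classify_board(board: BoardStatus,
                   max_bad_chips: int = 3,
                   min_hashrate_ratio: float = 0.9) -> Severity:
    """Classify one board; the thresholds are placeholders, not recommendations."""
    # Option 3: a board that has stopped hashing is always an error.
    if board.actual_hashrate <= 0:
        return Severity.ERROR
    # Option 1: more bad chips than the threshold allows.
    if board.bad_chips > max_bad_chips:
        return Severity.ERROR
    # Option 2: the board hashes well below its expected rate.
    if (board.expected_hashrate > 0
            and board.actual_hashrate / board.expected_hashrate < min_hashrate_ratio):
        return Severity.ERROR
    # Some bad chips, but the board is still hashing acceptably.
    if board.bad_chips > 0:
        return Severity.WARNING
    return Severity.OK


print(classify_board(BoardStatus(bad_chips=1, expected_hashrate=30.0, actual_hashrate=29.5)))
# Severity.WARNING
```

Under these assumed thresholds, a board with one bad chip that still hashes near its expected rate produces a warning rather than a Failed miner.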
 

Optionally, if the number is available, store the number of bad chips for use in a custom issue.


Comments


  • Comment author
    wayne

    There is an ‘Ignore Hardware Errors’ slider on the settings page, and it can also be mass applied.

    It is not a long-term solution, but I do this for older units that still hash despite multiple failing chips.

  • Comment author
    rchambers

    Thomas, you may be able to leverage the new Custom Issues to get this working the way you want. Since we don't know the specifics, feel free to drop us a note (product@obm.io) and we can find a time to hop on a call and talk through how your use case could be accomplished.
