Summit KnL (sknl) nodes not accepting jobs
Incident Report for CU Boulder RC
Resolved
This incident has been resolved.
Posted Feb 07, 2020 - 14:21 MST
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Feb 07, 2020 - 10:08 MST
Investigating
We are investigating an issue that is preventing Summit KnL (sknl) nodes from accepting jobs. The nodes appear not to have retained their configured operating mode settings, which changes the amount of memory visible on each node. Slurm has detected that the available memory differs from what it expects and is preventing jobs from starting on these nodes.

More information is available at https://software.intel.com/en-us/articles/intel-xeon-phi-x200-processor-memory-modes-and-cluster-modes-configuration-and-use-cases
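
As a rough illustration of the check described above, the following Python sketch flags nodes whose reported memory is lower than the configured amount. It is a simplified model, not Slurm's actual implementation. The node names, the memory sizes, and the `NodeMemory` and `nodes_with_low_memory` names are all hypothetical, and the sketch assumes that the mismatch takes the form of a node reporting less memory than the scheduler was configured to expect.

```python
from dataclasses import dataclass


@dataclass
class NodeMemory:
    name: str
    configured_mb: int  # memory the scheduler was told to expect (assumed value)
    reported_mb: int    # memory the node actually reports (assumed value)


def nodes_with_low_memory(nodes):
    """Return the names of nodes that report less memory than configured."""
    return [n.name for n in nodes if n.reported_mb < n.configured_mb]


# Hypothetical node names and sizes, for illustration only.
nodes = [
    NodeMemory("sknl0701", configured_mb=114688, reported_mb=114688),
    NodeMemory("sknl0702", configured_mb=114688, reported_mb=98304),
]

# Only the second node reports less memory than expected.
print(nodes_with_low_memory(nodes))
```

On a real cluster, the configured and reported values would come from the scheduler and the nodes themselves rather than from hard-coded numbers.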
Posted Feb 06, 2020 - 16:06 MST
This incident affected: RMACC Summit.