Uncontrolled node reboots in core storage
Incident Report for CU Boulder RC
Resolved
We applied a patch to the operating system that underlies core storage. This patch appears to have addressed the issues that led to uncontrolled node reboots. We have not experienced any further node reboots since the patch was applied, so we believe this issue is resolved.
Posted Aug 04, 2020 - 14:16 MDT
Investigating
We are investigating an issue that is causing uncontrolled node reboots in the core storage infrastructure that serves /home, /projects, and /curc. We have experienced two reboots so far. In both cases the nodes recovered on their own and rejoined the cluster after a few minutes, but access to these file systems blocked during each reboot.

We have a case open with upstream support regarding this issue.
Posted Jul 13, 2020 - 15:13 MDT
This incident affected: Research Computing Core.