Network services outage in RC Core affecting all RC services
Incident Report for CU Boulder RC
Resolved
A few compute nodes still need to be brought up, but most nodes have been restored to service at this point.
Posted 4 months ago. Aug 21, 2018 - 20:04 MDT
Update
We are continuing to monitor for any further issues.
Posted 4 months ago. Aug 21, 2018 - 11:55 MDT
Monitoring
A fix has been implemented and we are monitoring the results.
Posted 4 months ago. Aug 21, 2018 - 11:03 MDT
Identified
We have corrected the issue on our internal dhcpd server, and dependent clients are recovering. We are also checking hosts individually to see which ones need manual intervention to recover.
Posted 4 months ago. Aug 21, 2018 - 09:43 MDT
Investigating
We are investigating a network services outage in the Research Computing Core that may be impacting all RC services.

This is not a network outage per se; it appears to be an outage of a core network service that all RC systems require to function.
Posted 4 months ago. Aug 21, 2018 - 09:17 MDT