HPCF outage affecting Summit
Incident Report for CU Boulder RC
Resolved
The HPCF has remained stable with the UPS in its default operating mode. The manufacturer is continuing to investigate the root cause of this issue and, as part of that effort, has suggested replacing a component of the control system.
Posted Dec 11, 2018 - 13:29 MST
Monitoring
Summit has been returned to service. We will continue to monitor the status of the system, and we expect to receive a root cause analysis of the behavior we observed in the UPS. In the meantime, we expect that running the UPS in its default mode will reduce the likelihood that the problem recurs.
Posted Dec 02, 2018 - 00:30 MST
Update
We have discovered an apparent anomaly in the operation of the UPS “ECOnversion” mode, a mode that allows the UPS to operate with greater power efficiency than in its default mode. This anomaly does not appear to affect the default mode, so we are gathering diagnostic reports from the UPS in each mode for further analysis and plan to bring Summit back into production with the UPS in the default mode.
Posted Dec 01, 2018 - 22:28 MST
Investigating
The technician has arrived and is inspecting the UPS.
Posted Dec 01, 2018 - 20:55 MST
Identified
The UPS that supports the HPCF has experienced a fault that is preventing it from supplying power to the environment. A service technician has been dispatched and is en route.
Posted Dec 01, 2018 - 19:55 MST
Update
We have confirmed onsite that the HPCF has experienced a major power outage. All HPCF systems, including Summit and Summit storage (both scratch and interim PetaLibrary allocations), are offline.

We are investigating the cause and are working to restore service as soon as possible.
Posted Dec 01, 2018 - 17:18 MST
Investigating
We are aware of an HPCF outage affecting Summit and Summit scratch. We are investigating and will update here as more information becomes available.
Posted Dec 01, 2018 - 15:58 MST
This incident affected: RMACC Summit.