Power outage in primary CURC data center affecting Alpine, Blanca
Incident Report for CU Boulder RC
Resolved
This incident has been resolved. Alpine and OnDemand are restored. Most Blanca nodes have been restored to their pre-incident state. Owners of a small number of Blanca nodes with anticipated additional impacts will be notified, and their nodes will continue to receive service. We will continue to monitor the systems in the coming days.
Posted Jul 24, 2024 - 16:01 MDT
Monitoring
Power has been restored to HPCF and nearly all Alpine nodes are back in service. Numerous Blanca nodes are still offline. The Core Desktop nodes are partially restored (K80 nodes are back online, RTX8000 nodes are still offline).

Remaining issues will be addressed first thing tomorrow (Wednesday) morning.
Posted Jul 23, 2024 - 20:08 MDT
Investigating
CU Research Computing experienced a power outage in the High Performance Computing Facility (HPCF) beginning at approximately 4:22 p.m. MDT today. This outage affects the following services:

* all Alpine nodes
* the Blanca "bhpc" nodes
* the Alpine scratch filesystem
* some network access to services in other locations on campus

Staff is onsite to address the issue. We will provide updates as we have more information.
Posted Jul 23, 2024 - 16:50 MDT
This incident affected: Alpine, Blanca, and Open OnDemand.