Alpine and Blanca outage

Incident Report for CU Boulder RC

Resolved

We believe this incident has been resolved. We will continue to monitor and will respond to any issues that surface.
Posted Apr 17, 2025 - 10:20 MDT

Update

We believe services have been restored. We will keep this incident message open overnight, verify continued functionality in the morning, and, if possible, send an all-clear at that time. Thank you for your patience.
Posted Apr 16, 2025 - 17:25 MDT

Monitoring

Both clusters are back in service and initial tests are positive. We will continue to monitor through the day.
Posted Apr 16, 2025 - 13:59 MDT

Update

Water has been restored and we are beginning to power systems back on.
Posted Apr 16, 2025 - 12:20 MDT

Update

We are in communication with CU's Data Center Operations team. Compute nodes will be powered off.
Posted Apr 16, 2025 - 09:41 MDT

Update

We are continuing to investigate this issue.
Posted Apr 16, 2025 - 08:43 MDT

Investigating

The data center that houses much of RC's infrastructure has experienced a cooling outage. To prevent overheating, jobs have been suspended while the issue is investigated.
Posted Apr 16, 2025 - 08:36 MDT

This incident affected: Research Computing Core, Alpine, and Blanca.