Scheduled CURC service outage: April 22

Scheduled Maintenance Report for CU Boulder RC

Completed

The Blanca cluster is back in production. Additional issues identified through the day have been resolved. We are closing this maintenance. We appreciate your patience.
Posted Apr 23, 2025 - 16:44 MDT

Update

Most of the Blanca cluster has now been restored to service. Several nodes remain offline, and we expect to restore them to production by close of business today (Wednesday). We will provide another update this afternoon.
Posted Apr 23, 2025 - 10:42 MDT

Update

The issue affecting Blanca's scratch mounts has been identified and fixed. We are continuing to work to restore service.
Posted Apr 23, 2025 - 07:08 MDT

Update

The Blanca cluster remains out of service and will require additional work from the CURC team tomorrow to address issues mounting the Alpine scratch system. We will provide our next update in the morning.

Initial tests on the Alpine and Viz clusters have been successful. We are marking them as Operational at this time.
Posted Apr 22, 2025 - 21:05 MDT

Update

The Alpine and Viz clusters are back in production and we will continue to monitor them.

Some Blanca nodes are not successfully mounting the scratch filesystem. Resolving this may require additional RC work tomorrow (Wednesday) before we can conclude this maintenance event.

We will provide one more update before concluding work for the day.
Posted Apr 22, 2025 - 20:37 MDT

Update

The City's work on the water system is complete. RC will now begin restoring power to systems, followed by testing and a resumption of service.
Posted Apr 22, 2025 - 18:48 MDT

In progress

Scheduled maintenance is currently in progress. We will provide updates as necessary.
Posted Apr 22, 2025 - 16:30 MDT

Update

In preparation for the City of Boulder water outage at 5:30 pm today, CURC will begin taking systems offline at 4:30 pm to reduce the risk of failed jobs.
Posted Apr 22, 2025 - 14:18 MDT

Scheduled

The City of Boulder has announced an unplanned water outage beginning at 5:30 pm on Tuesday, April 22. CURC infrastructure must be taken offline during the outage to prevent overheating. The outage is expected to last several hours, and all CURC services will be unavailable during this time.

Thank you for your patience while the City conducts critical water infrastructure maintenance in the area adjacent to the primary CURC data center.

-CU Research Computing
Posted Apr 21, 2025 - 14:03 MDT
This scheduled maintenance affected: Research Computing Core, Alpine, Blanca, PetaLibrary, and Open OnDemand.