All Systems Operational

Research Computing Core Operational
Alpine Operational
98.81% uptime over the past 90 days
Blanca Operational
PetaLibrary Operational
Open OnDemand Operational
100.0% uptime over the past 90 days
CUmulus OpenStack Platform Operational
100.0% uptime over the past 90 days
AWS ec2-us-west-2 Operational
AWS rds-us-west-2 Operational
AWS s3-us-west-2 Operational
Science Network Operational
Apr 25, 2025
Resolved - Following a reduction in storage usage overnight, the scratch system's performance is back to normal, so we are closing the incident. In the coming days we will hold further discussions with the vendor, adjust our monitoring and alerting thresholds, and complete a previously planned expansion of the scratch system's maximum storage capacity.
Apr 25, 15:45 MDT
Update - We have made substantial progress this afternoon freeing space on scratch, the proximate cause of this degradation. A process is now running that will clear additional space in the coming hours, based on adjustment of scratch policy settings. CURC will continue evaluating and remain in communication with the vendor. We expect our next update here to be tomorrow.
Apr 24, 17:05 MDT
Investigating - The scratch filesystem (/scratch/alpine) is currently experiencing performance degradation. CURC has performed initial troubleshooting and opened written communication with the vendor.
Apr 24, 16:03 MDT
Apr 24, 2025
Apr 23, 2025
Completed - The Blanca cluster is back in production. Additional issues identified through the day have been resolved. We are closing this maintenance. We appreciate your patience.
Apr 23, 16:44 MDT
Update - Most of the Blanca cluster has now been restored to service. Several nodes remain offline, and we expect to restore them to production by close of business today (Wednesday). We will provide another update this afternoon.
Apr 23, 10:42 MDT
Update - The issue affecting Blanca's scratch mounts has been identified and fixed. We are continuing to work to restore service.
Apr 23, 07:08 MDT
Update - The Blanca cluster remains out of service and will require additional work from the CURC team tomorrow to address issues mounting the Alpine scratch system. We will provide our next update in the morning.

Initial tests on the Alpine and Viz clusters have been successful. We are marking them as Operational at this time.

Apr 22, 21:05 MDT
Update - The Alpine and Viz clusters are back in production and we will continue to monitor them.

Some Blanca nodes are not successfully mounting the scratch filesystem. This issue may require additional RC effort tomorrow (Wednesday) before concluding this maintenance event.

We will provide one more update before concluding work for the day.

Apr 22, 20:37 MDT
Update - The city's work on the water system has completed. RC will begin restoring power to systems shortly, followed by testing and a resumption of service.
Apr 22, 18:48 MDT
In progress - Scheduled maintenance is currently in progress. We will provide updates as necessary.
Apr 22, 16:30 MDT
Update - In preparation for the City of Boulder water outage at 5:30 pm today, CURC will begin taking systems offline at 4:30 pm in order to mitigate the potential for failed jobs.
Apr 22, 14:18 MDT
Scheduled - The City of Boulder announced an impromptu water outage beginning at 5:30 pm on Tuesday, April 22, which requires that CURC infrastructure be taken offline in order to prevent overheating. The outage is anticipated to last for several hours. All CURC services will be unavailable during this time.

Thank you for your patience while the City conducts critical water infrastructure maintenance in the area adjacent to the primary CURC data center.

-CU Research Computing

Apr 21, 14:03 MDT
Apr 22, 2025
Apr 21, 2025

No incidents reported.

Apr 20, 2025

No incidents reported.

Apr 19, 2025

No incidents reported.

Apr 18, 2025

No incidents reported.

Apr 17, 2025
Resolved - We believe this incident has been resolved. We will continue to monitor and respond to issues if they surface.
Apr 17, 10:20 MDT
Update - We believe services have been restored. We will keep this incident message open overnight, verify continued functionality in the morning, and if possible send an all-clear at that time. Thank you for your patience.
Apr 16, 17:25 MDT
Monitoring - Both clusters are back in service and initial tests are positive. We will continue to monitor through the day.
Apr 16, 13:59 MDT
Update - Water has been restored and we are beginning to power systems back on.
Apr 16, 12:20 MDT
Update - We are in communication with CU's Data Center Operations team. Compute nodes will be powered off.
Apr 16, 09:41 MDT
Update - We are continuing to investigate this issue.
Apr 16, 08:43 MDT
Investigating - The data center that houses much of RC's infrastructure has experienced a cooling outage. To prevent overheating, jobs have been suspended while the issue is investigated.
Apr 16, 08:36 MDT
Apr 16, 2025
Apr 15, 2025

No incidents reported.

Apr 14, 2025

No incidents reported.

Apr 13, 2025

No incidents reported.

Apr 12, 2025

No incidents reported.

Apr 11, 2025

No incidents reported.