Investigating - rc_scratch currently has degraded performance due to a hardware issue. Research Computing is working to identify a solution.
Jul 18, 2024 - 10:25 MDT
Research Computing Core Degraded Performance
Alpine Operational
98.86% uptime over the past 90 days
Blanca Operational
PetaLibrary Operational
Open OnDemand Operational
99.71% uptime over the past 90 days
CUmulus OpenStack Platform Operational
99.84% uptime over the past 90 days
AWS ec2-us-west-2 Operational
AWS rds-us-west-2 Operational
AWS s3-us-west-2 Operational
RMACC Summit Operational
Science Network Operational
Past Incidents
Jul 27, 2024

No incidents reported today.

Jul 26, 2024

No incidents reported.

Jul 25, 2024

No incidents reported.

Jul 24, 2024
Resolved - This incident has been resolved. Alpine and OnDemand are restored. Most Blanca nodes have been restored to their pre-incident state. Owners of the small number of Blanca nodes expected to have additional impacts will be notified, and those nodes will continue to receive service. We will continue to monitor in the coming days.
Jul 24, 16:01 MDT
Monitoring - Power has been restored to HPCF and nearly all Alpine nodes are back in service. Numerous Blanca nodes are still offline. The Core Desktop nodes are partially restored (K80 nodes are back online, RTX8000 nodes are still offline).

Remaining issues will be addressed first thing tomorrow (Wednesday) morning.

Jul 23, 20:08 MDT
Investigating - CU Research Computing experienced a power outage in the High Performance Computing Facility (HPCF) beginning at approximately 4:22 PM today. This outage affects the following services:

* all Alpine nodes
* the Blanca "bhpc" nodes
* the Alpine scratch filesystem
* some network access to services in other locations on campus.

Staff is onsite to address the issue. We will provide updates as we have more information.

Jul 23, 16:50 MDT
Jul 23, 2024
Jul 22, 2024

No incidents reported.

Jul 21, 2024

No incidents reported.

Jul 20, 2024

No incidents reported.

Jul 19, 2024

No incidents reported.

Jul 18, 2024

Unresolved incident: Performance degradation of rc_scratch.

Jul 17, 2024

No incidents reported.

Jul 16, 2024

No incidents reported.

Jul 15, 2024
Resolved - SSH key login via login-ci.rc.colorado.edu is now available.
Jul 15, 13:53 MDT
Monitoring - CILogon has implemented a fix and we have verified that SSH key logins are working. We will monitor throughout the day today.
Jul 15, 11:07 MDT
Identified - Our vendor, CILogon, is currently experiencing an outage that prevents our login servers from retrieving user SSH keys. SSH key authentication on login-ci.rc.colorado.edu will not be possible until this outage is resolved.
Jul 15, 10:00 MDT
Jul 14, 2024

No incidents reported.

Jul 13, 2024

No incidents reported.