All Systems Operational
Research Computing Core - Operational
Science Network - Operational
RMACC Summit - Operational
Blanca - Operational
PetaLibrary - Operational
EnginFrame - Operational
JupyterHub - Operational
Past Incidents
Jun 12, 2021

No incidents reported today.

Jun 11, 2021

No incidents reported.

Jun 10, 2021
Resolved - Hardware replacements in the RC core virtual infrastructure appear to have successfully addressed this issue. No further disruption is anticipated.
Jun 10, 09:22 MDT
Monitoring - Last night RC experienced a failure in its "Core Virtual Infrastructure," which hosts, among many other things, the login nodes and Slurm services. This is the second such failure recently, though the first passed without notable disruption. This time, neither the login nodes nor the Blanca Slurm service was automatically returned to service correctly, apparently because the network was involved in the disruption.

We have notified our upstream OIT support team, which administers the Core Virtual Infrastructure, about this failure and are awaiting their feedback. In the meantime, we have restored the login nodes and the Blanca Slurm service.

We will continue to monitor the situation and follow up with upstream support staff on Monday.
Apr 24, 10:57 MDT
Jun 9, 2021
Completed - The Summit SFA upgrade has concluded. Firmware on all drives, controllers, and enclosures was updated to the latest version.

Because we had trouble booting one of the controllers during the upgrade (resolved with support assistance), time ran short and we did not upgrade the GPFS servers this time. However, that upgrade can be done in the future without downtime.

Reservations have been removed, and Summit scratch and PetaLibrary (PL) are available again for production.
Jun 9, 17:38 MDT
In progress - Scheduled maintenance is currently in progress. We will provide updates as necessary.
Jun 9, 07:00 MDT
Scheduled - Research Computing will perform a Summit storage upgrade on Wednesday, 9 June 2021. Maintenance activities include:
- Summit SFA controller firmware upgrade
- Enclosure and drive firmware upgrade
- GPFS servers upgrade
- Storage configuration updates based on vendor recommendations

We expect this upgrade to prevent a recurrence of the SFA storage failure that impacted Summit a month ago.

Maintenance is scheduled to take place between 07:00 and 19:00, though service will be restored as soon as all activities have concluded. During the maintenance period, no jobs will run on Summit resources, and PetaLibrary allocations hosted on Summit will also be unavailable.

If you have any questions or concerns, please contact rc-help@colorado.edu.
Jun 1, 11:25 MDT
Jun 8, 2021

No incidents reported.

Jun 7, 2021

No incidents reported.

Jun 6, 2021

No incidents reported.

Jun 5, 2021

No incidents reported.

Jun 4, 2021

No incidents reported.

Jun 3, 2021

No incidents reported.

Jun 2, 2021

No incidents reported.

Jun 1, 2021

No incidents reported.

May 31, 2021

No incidents reported.

May 30, 2021

No incidents reported.

May 29, 2021

No incidents reported.