All Systems Operational
Research Computing Core - Operational
Science Network - Operational
RMACC Summit - Operational
Blanca - Operational
PetaLibrary - Operational
EnginFrame - Operational
JupyterHub - Operational
Past Incidents
Jan 16, 2021

No incidents reported today.

Jan 15, 2021

No incidents reported.

Jan 14, 2021

No incidents reported.

Jan 13, 2021

No incidents reported.

Jan 12, 2021
Completed - The upgrade is complete. Globus authentication services are functioning.
Jan 12, 13:55 MST
In progress - Scheduled maintenance is currently in progress. We will provide updates as necessary.
Jan 12, 13:30 MST
Scheduled - myproxy1 will be upgraded to RHEL 7.9 to update sssd to the latest package. Red Hat technical support requested this upgrade to continue troubleshooting the sssd issues. There may be a brief interruption in Globus authentication.
Jan 12, 13:20 MST
Completed - The scheduled maintenance has been completed.
Jan 12, 09:00 MST
In progress - Scheduled maintenance is currently in progress. We will provide updates as necessary.
Jan 12, 08:00 MST
Scheduled - The PetaLibrary cluster configuration will be updated to allow for seamless disk replacements. This change may cause a brief service interruption for BeeGFS PetaLibrary/active allocations.
Jan 8, 11:32 MST
Jan 11, 2021

No incidents reported.

Jan 10, 2021

No incidents reported.

Jan 9, 2021

No incidents reported.

Jan 8, 2021
Resolved - A disk in PetaLibrary was taking a long time to respond to requests, and the associated storage pool was configured to reboot the system should those wait times get too high. A new disk has been ordered and will be installed next week, and we are investigating changing the behavior of storage pools in this scenario. PetaLibrary/active BeeGFS filesystems have been stable since taking the disk in question offline.
Jan 8, 16:11 MST
Investigating - We are investigating an apparent issue in the PetaLibrary/active BeeGFS subsystem. BeeGFS appears to be inaccessible. ZFS allocations appear unaffected.
Jan 8, 12:35 MST
Jan 7, 2021

No incidents reported.

Jan 6, 2021
Completed - All work has completed, and Summit has been returned to service.
Jan 6, 17:42 MST
Update - The previously announced electrical fault at the HPCF was fixed around 4:00 PM, and we have been bringing up the affected Blanca and Summit nodes. Blanca has been returned to full operation and its PM reservations have been released. Core components of Summit have been powered on and are working, and compute nodes are being powered on now.
Jan 6, 17:09 MST
Update - Today's planned maintenance activities have completed. However, an electrical fault occurred in the HPCF power distribution system during the Summit bringup. We have halted the Summit bringup process until a facilities electrician is able to assess and respond to the fault.

In the meantime, we intend to bring up the Blanca nodes that were affected by today's HPCF outage, as they should be unaffected by this electrical fault.
Jan 6, 15:16 MST
Update - Summit, including Summit storage, and Blanca nodes in the HPCF, have been shut down as scheduled. This supports powering off the HPCF cooling system for regular maintenance.

We will be monitoring the facilities work throughout the day, and will restore Summit and Blanca to full production as soon as possible once the work has been completed.
Jan 6, 07:19 MST
In progress - Scheduled maintenance is currently in progress. We will provide updates as necessary.
Jan 6, 07:00 MST
Scheduled - Research Computing will perform regularly scheduled planned maintenance on Wednesday, 6 January 2021. January's activities include:

- HPCF tower cleaning and maintenance

Maintenance is scheduled to take place between 07:00 and 19:00, though service will be restored as soon as all activities have concluded. During the maintenance period, no jobs will run on Summit resources or on Blanca resources that reside in the HPCF. This includes all "bhpc" nodes and a single Blanca GPU node. Summit storage will also be unavailable.

If you have any questions or concerns, please contact rc-help@colorado.edu.
Dec 18, 18:31 MST
Jan 5, 2021

No incidents reported.

Jan 4, 2021

No incidents reported.

Jan 3, 2021

No incidents reported.

Jan 2, 2021

No incidents reported.