All Systems Operational
Research Computing Core: Operational
Science Network: Operational
RMACC Summit: Operational
Blanca: Operational
PetaLibrary: Operational
EnginFrame: Operational
JupyterHub: Operational
Past Incidents
Apr 8, 2020

No incidents reported today.

Apr 7, 2020

No incidents reported.

Apr 6, 2020

No incidents reported.

Apr 5, 2020

No incidents reported.

Apr 4, 2020

No incidents reported.

Apr 3, 2020

No incidents reported.

Apr 2, 2020

No incidents reported.

Apr 1, 2020
Completed - Today's maintenance operations have completed, and all affected systems have been returned to normal operation.
Apr 1, 14:54 MDT
Update - The electrical work that impacts Summit has completed, so we are returning Summit to operation.

Some electrical work impacting Blanca nodes that are installed in the HPCF ("bhpc") is still in progress.
Apr 1, 13:49 MDT
Update - Today's maintenance is in progress, but due to reduced staff in response to COVID-19, the planned activities have changed.

The tower cleaning will not be completed today. We will have to reschedule this activity for a later date, when sufficient facilities management staff are available to complete the work.

Co-scheduled electrical work, including the deployment of new power "whips" in the HPCF and general UPS maintenance, will be completed.

We apologize that this information was not updated prior to today.

For more information on University of Colorado Boulder's COVID-19 response, please visit https://covid19.colorado.gov
Apr 1, 09:15 MDT
In progress - Scheduled maintenance is currently in progress. We will provide updates as necessary.
Apr 1, 07:00 MDT
Scheduled - Research Computing will perform regularly scheduled planned maintenance on Wednesday, 1 April 2020. April's activities include:

- HPCF tower cleaning

Maintenance is scheduled to take place between 07:00 and 19:00, though service will be restored as soon as all activities have concluded. During the maintenance period, no jobs will run on HPCF resources, including Summit and some parts of Blanca.

If you have any questions or concerns, please contact rc-help@colorado.edu.
Mar 11, 16:17 MDT
Mar 31, 2020

No incidents reported.

Mar 30, 2020

No incidents reported.

Mar 29, 2020

No incidents reported.

Mar 28, 2020

No incidents reported.

Mar 27, 2020

No incidents reported.

Mar 26, 2020

No incidents reported.

Mar 25, 2020
Resolved - At 2:06pm, the metadata service for BeeGFS suffered an outage and momentarily paused I/O requests to PetaLibrary/active. This followed the 1pm installation of a vendor patch that was expected to resolve ongoing metadata problems. The metadata service has been restarted, and all PetaLibrary allocations were available by 2:13pm. We are continuing to work with the vendor to resolve this issue.
Mar 25, 16:39 MDT
Resolved - The BeeGFS software upgrade is complete, and we are monitoring the service. Currently there are no known issues with PL/active.
Mar 25, 13:20 MDT
Identified - The BeeGFS developers have released a software patch that may resolve a failure mode that has been causing PL/active unresponsiveness. This patch will be applied at 1pm today and will briefly interrupt access to the filesystem. Jobs should continue without errors after the filesystem comes back up.
Mar 25, 10:44 MDT