Planned maintenance Wednesday, 3 Mar 2021
Scheduled Maintenance Report for CU Boulder RC
Completed
Today's planned maintenance activities have concluded, and Summit, Blanca, and PetaLibrary have been returned to production.

- Legacy PetaLibrary/archive is still offline following a planned outage last week, and we will resume our attempts to restore service there.

- We replaced a defective, damaged backplane in one of the Blanca HPC chassis. The backplane was impeding correct functioning of the cooling system and of part of the InfiniBand interconnect. We also repaired the compute nodes that the backplane had damaged.

- We upgraded InfiniBand interconnect firmware in Blanca HPC to bring all chassis to the latest version.

- We improved the clustering fail-over configuration in PetaLibrary/active to prevent some erroneous failure conditions we have previously experienced.

- We migrated data within PetaLibrary/active BeeGFS to free up infrastructure for conversion to ZFS storage. This will likely cause a reduction in performance for allocations that remain in BeeGFS until they are migrated to ZFS.

- We performed further tests for SMB support (particularly fail-over) in PetaLibrary/active.
Posted Mar 03, 2021 - 16:50 MST
Update
Due to an oversight, not all Blanca compute resources were correctly reserved for today's planned maintenance activities. As a result, some jobs were running when the planned maintenance (PM) period started. We have requeued these jobs where possible, but jobs that could not be requeued will have been cancelled.
Posted Mar 03, 2021 - 10:09 MST
In progress
Scheduled maintenance is currently in progress. We will provide updates as necessary.
Posted Mar 03, 2021 - 07:00 MST
Scheduled
Research Computing will perform regularly scheduled planned maintenance on Wednesday, 3 Mar 2021. March's activities include:

- routine maintenance of the UPS for HPCF (Summit compute/storage). The HPCF UPS will be in "maintenance bypass" mode, meaning the HPCF will be dependent on utility power for 30-60 minutes.
- data relocation on PetaLibrary/active to support ongoing migrations to a new filesystem (BeeGFS to ZFS).

Maintenance is scheduled to take place between 07:00 and 19:00, though service will be restored as soon as all activities have concluded. During the maintenance period, no jobs will run on Summit resources, and PetaLibrary/active allocations will not be accessible via /pl/active or /work.

If you have any questions or concerns, please contact rc-help@colorado.edu.
Posted Feb 26, 2021 - 11:02 MST
This scheduled maintenance affected: Blanca, PetaLibrary, and RMACC Summit.