Planned maintenance Wednesday, 3 July 2019
Scheduled Maintenance Report for CU Boulder RC
Completed
Today's planned maintenance activities have concluded, and Summit is once again in production.

Today we completed the following:

- Summit GPFS update
- Summit OPA update
- Summit kernel update
- Transitioned remaining Summit compute to stateless provisioning
- slurmdbd and slurmctld update (major feature release)
- Summit slurmd update
- blanca-nso slurmd update
- Summit performance validation
Posted Jul 03, 2019 - 18:06 MDT
Update
The Blanca scheduling issue was traced to a change in how the topology configuration is handled. We have adjusted the configuration, and Blanca appears to be fully operational again.
Posted Jul 03, 2019 - 16:28 MDT
Update
Much of today's upgrade work has been successful. We had some trouble with the GRIDScaler upgrade (part of the Summit storage system), but we have engaged upstream support and appear to be making progress again.

The slurmctld upgrade has caused an apparent compatibility issue with the not-yet-upgraded slurmd running on Blanca compute nodes. We are investigating the cause and have also opened a case with Slurm support. Affected job submissions fail with the error "Unable to allocate resources: Requested node configuration is not available."
Posted Jul 03, 2019 - 15:34 MDT
In progress
Scheduled maintenance is currently in progress. We will provide updates as necessary.
Posted Jul 03, 2019 - 07:00 MDT
Scheduled
Research Computing will perform regularly scheduled planned maintenance on Wednesday, 3 July 2019. July's activities include:

- Summit scratch firmware updates
- Summit scratch filesystem (GPFS) updates
- Summit scratch disk repositioning
- Summit interconnect (OPA) software and firmware updates
- Summit kernel updates
- Slurm database server (slurmdbd) update (in preparation for later Summit, Blanca, and core Slurm updates)

Maintenance is scheduled to take place between 07:00 and 19:00, though service will be restored as soon as all activities have concluded. During the maintenance period, no jobs will run on Summit resources, and Summit scratch will likely be unavailable during the firmware and filesystem updates.

If you have any questions or concerns, please contact rc-help@colorado.edu.
Posted Jun 25, 2019 - 13:46 MDT
This scheduled maintenance affected: RMACC Summit and Blanca.