Special PetaLibrary planned maintenance Wednesday, 10 July 2019
Scheduled Maintenance Report for CU Boulder RC
Completed
We have completed today's special planned maintenance period for the PetaLibrary (BeeGFS). All tasks were completed successfully, and reservations have been lifted on Blanca.

Thank you again.
Posted Jul 10, 2019 - 15:48 MDT
Verifying
We have completed all of our PetaLibrary/active (BeeGFS) changes and resiliency tests. Notably, all of our changes were completed without any problems, and all of our resiliency / fail-over tests performed precisely as expected. We even managed to resolve the issue with our prototype ZFS-native file system, which had caused us trouble in the past, including one of our significant outages.

Also notably, all of today's activities were completed without any IO failures. One fail-over test did fail on its first attempt, causing a 5-minute pause to IO (2 minutes longer than the 3-minute pause expected during a fail-over), but even this did not, according to our monitoring, cause any actual IO errors.

Our last activity for today is to re-run metadata benchmarks to see if they have been affected by the performance tuning we performed today. Benchmarks are more meaningful in a quiesced environment, so we still have Blanca partitions reserved for now; but if you need to resume work immediately, please let us know and we'll happily release your reservations. Otherwise, we should be done soon.

Thank you everyone for your patience while we work out the eccentricities of this system. With each maintenance session we have improved our understanding of the environment and fixed configuration problems that were causing us trouble; and this has been our most successful maintenance period for BeeGFS yet.
Posted Jul 10, 2019 - 14:13 MDT
Update
We are commencing maintenance activities for PetaLibrary/active (BeeGFS) today, including some performance tuning and further resiliency (e.g., fail-over) testing. We hope to minimize or prevent any actual BeeGFS outages (some momentary interruptions or pauses are to be expected), but there is a potential for outages to /pl/active/ allocations.

We will do our best to communicate our current status and events here throughout the activities.
Posted Jul 10, 2019 - 08:58 MDT
In progress
Scheduled maintenance is currently in progress. We will provide updates as necessary.
Posted Jul 10, 2019 - 07:01 MDT
Scheduled
Research Computing will perform special PetaLibrary planned maintenance on Wednesday, 10 July 2019. Activities include:

- Performance optimizations for BeeGFS
- Fail-over testing for beegfs-storage
- ZFS mount testing and root cause investigation
- beegfs-mgmtd update

Maintenance is scheduled to take place between 07:00 and 19:00, though service will be restored as soon as all activities have concluded. We will endeavor to perform these tasks while minimizing or preventing any PetaLibrary outages; but outages are possible. During the maintenance period we will have reservations in place on Blanca, but Blanca contributors may request that their reservation be released if they prefer.

If you have any questions or concerns, please contact rc-help@colorado.edu.
Posted Jun 25, 2019 - 13:51 MDT
This scheduled maintenance affected: PetaLibrary.