Planned Maintenance Wednesday, 5 June 2019
Scheduled Maintenance Report for CU Boulder RC
Completed
Today's maintenance activities have concluded, and PetaLibrary, Blanca, and Summit are back in production.

We were unable to provoke a failure in beegfs-mgmtd, so we will likely allow it to fail naturally at least once in the near future so that we can capture a gdb backtrace of the error state.

We were also unable to determine why our prototype PetaLibrary zfs-direct allocation failed to mount. We are working with the ZFS development community to determine the root cause.
Posted Jun 05, 2019 - 17:54 MDT
Update
Today's planned maintenance activities are largely complete. We are running standard performance validation of the Summit environment and an additional metadata performance test of PetaLibrary/active from Blanca. The metadata performance test is primarily an attempt to see whether beegfs-mgmtd fails under load.

We should be able to return to service soon.
Posted Jun 05, 2019 - 16:24 MDT
Update
We are commencing PetaLibrary maintenance activities, which will include interruptions to I/O for PetaLibrary/active (/pl/active/).
Posted Jun 05, 2019 - 09:01 MDT
In progress
Scheduled maintenance is currently in progress. We will provide updates as necessary.
Posted Jun 05, 2019 - 07:00 MDT
Update
As a reminder, planned maintenance activities are scheduled for tomorrow, Wednesday, 5 June 2019. Updates will be posted here as they become available.
Posted Jun 04, 2019 - 15:22 MDT
Scheduled
Research Computing will perform regularly scheduled planned maintenance on Wednesday, 5 June 2019. June's activities include:

- Summit OPA switch firmware update
- Decommission Summit "debug" QoS
- Security updates on Internet-facing servers (including login nodes)
- Internal changes to RC DNS to better conform to public DNS
- PetaLibrary BeeGFS storage configuration testing
- PetaLibrary BeeGFS failure testing
- PetaLibrary BeeGFS OS and software updates
- PetaLibrary BeeGFS xattr and ACL support configuration
- PetaLibrary ZFS allocation incident investigation
- Summit performance validation

Maintenance is scheduled to take place between 07:00 and 19:00, though service will be restored as soon as all activities have concluded. During the maintenance period no jobs will run on Summit or Blanca resources, and PetaLibrary/active (BeeGFS) will be intermittently offline due to testing and configuration changes.

Blanca partitions may be individually returned to service on request, particularly if you are unaffected by the scheduled PetaLibrary/active outage.

If you have any questions or concerns, please contact rc-help@colorado.edu.
Posted May 30, 2019 - 11:10 MDT
This scheduled maintenance affected: RMACC Summit, Blanca, and PetaLibrary.