We are marking this first half of the activity as complete, and will schedule a new maintenance activity for the second half.
Jan 17, 09:58 MST
Before our failover operation, we discovered a ZFS build problem, which has since been resolved. We then moved beegfs-storage back to boss2 successfully, and PetaLibrary remains available.
Since this operation took longer than we scheduled, we will conclude our activities for today and re-schedule the second half of this activity.
Jan 15, 17:21 MST
We are continuing our beegfs-storage update simultaneously with the beegfs-meta outage reported elsewhere. Again, these two issues appear to be completely independent and coincidental.
The upgrade of boss2 is complete. Our next step is to fail beegfs-storage for boss2 back from boss1. We will be proceeding with this operation now, which will cause a momentary pause in PetaLibrary IO; however, this is not expected to cause any outage, nor did it previously (during our initial failover from boss2 to boss1).
Jan 15, 15:59 MST
The upgrade on boss2 is still in progress, but we have become aware that there may be disruption to beegfs from at least some access points (notably login nodes). We are investigating.
Jan 15, 14:35 MST
Our first failover operation has completed successfully and without error. All PetaLibrary beegfs-storage load is currently being carried by the "boss1" server. We will proceed with upgrades on boss2, during which time there will likely be performance degradation but no loss of access to data.
We will continue to provide updates here as we make progress.
Jan 15, 13:48 MST
In progress - Scheduled maintenance is currently in progress. We will provide updates as necessary.
Jan 15, 13:01 MST
We intend to begin maintenance activities on this cluster at approximately 13:00 today, though this will include some transit time for us to relocate to the datacenter. We will continue to provide updates here as we progress.
Jan 15, 11:01 MST
Research Computing will be conducting off-cycle planned maintenance this Wednesday to address a known issue with the ZFS component of the PetaLibrary. During the maintenance period, access to PetaLibrary and compute on RMACC Summit and Blanca should continue. There will be momentary pauses in IO as services are moved from one storage server to another, and performance will likely decrease while one server carries the entire load, but we will do everything we can to ensure that the service remains up and available throughout the maintenance.
This activity addresses the previously reported incident at https://curc.statuspage.io/incidents/5m7bvjcz7ktm
Jan 13, 12:26 MST