Extended planned maintenance, 5 August 2020
Scheduled Maintenance Report for CU Boulder RC
Completed
Summit has been returned to service, and today's planned maintenance activities have all completed successfully.
Posted Aug 07, 2020 - 00:19 MDT
Update
Scheduled maintenance is still in progress. We will provide updates as necessary.
Posted Aug 06, 2020 - 23:16 MDT
Update
We have restored network access to the HPCF and are slowly bringing compute resources back online. We hope to have Summit and Blanca HPC resources back in production in the next few hours.
Posted Aug 06, 2020 - 21:50 MDT
Update
We are still waiting for network access to be restored at the HPCF. Once connectivity is restored, we should be able to restore service.
Posted Aug 06, 2020 - 19:01 MDT
Update
Power has been restored at the HPCF, and we are starting to bring systems back up. A simultaneous network change at the HPCF "gateway" has presented some configuration challenges, and we are working through those now.
Posted Aug 06, 2020 - 16:25 MDT
Update
Today's planned maintenance activities in the HPCF are in progress and reportedly on schedule. Power is scheduled to return at 2:30 PM, and we will do our best to restore service as soon as possible after that.
Posted Aug 06, 2020 - 09:05 MDT
Update
Today's maintenance is in progress. However, it has come to our attention that we neglected to announce that a subset of PetaLibrary allocations, still temporarily located on Summit storage, are also unavailable for the duration of this maintenance period. We regret omitting this detail in our prior announcement.
Posted Aug 05, 2020 - 12:40 MDT
In progress
Scheduled maintenance is currently in progress. We will provide updates as necessary.
Posted Aug 05, 2020 - 07:00 MDT
Update
Because of the impact and breadth of the HPCF maintenance scheduled to start 5 August (Wednesday), I have so far emphasized it in our announcements, but I neglected to mention that we also intend to add storage to the PetaLibrary/active cluster to support native ZFS allocations and an eventual migration away from BeeGFS. This operation should be non-disruptive, but we have experienced disruption in the past, so we are performing it during the maintenance period. We do not, however, intend to proactively halt jobs on the portion of Blanca that is otherwise unaffected by the HPCF work.
Posted Aug 04, 2020 - 23:17 MDT
Update
Be reminded that we have an extended planned maintenance outage for the HPCF, including Summit and a portion of Blanca, scheduled to start tomorrow, 5 August 2020 (Wednesday). This outage addresses an outstanding electrical health and safety issue at the datacenter.
Posted Aug 04, 2020 - 14:15 MDT
Update
Be reminded that we have an extended planned maintenance outage for the HPCF, including Summit and a portion of Blanca, scheduled 5 August 2020 (Wednesday). This outage addresses an outstanding electrical health and safety issue at the datacenter.
Posted Jul 23, 2020 - 10:25 MDT
Scheduled
The datacenter operations team that supports the RC environment has requested an extended 48-hour outage window to correct an electrical health and safety issue at the High Performance Computing Facility (HPCF). This outage is being scheduled to coincide with our regular August maintenance.

During this maintenance, Summit compute, Summit scratch, and Blanca HPC will be entirely offline. This includes the following Blanca partitions:

- blanca-curc
- blanca-nso
- blanca-topopt
- blanca-ngpdl

While the maintenance is scheduled for 48 hours, we will endeavor to finish as quickly as possible without compromising the necessary work.

If you have any questions or concerns, please contact rc-help@colorado.edu.
Posted Jul 07, 2020 - 12:01 MDT
This scheduled maintenance affected: Blanca, PetaLibrary, and RMACC Summit.