Summit has been returned to service, and today's planned maintenance activities have all completed successfully.
Aug 7, 00:19 MDT
Scheduled maintenance is still in progress. We will provide updates as necessary.
Aug 6, 23:16 MDT
We have restored network access to the HPCF and are slowly bringing compute resources back online. We hope to have Summit and Blanca HPC resources back in production in the next few hours.
Aug 6, 21:50 MDT
We are still waiting for network access to be restored at the HPCF. Once connectivity has been restored, we should be in a position to restore service.
Aug 6, 19:01 MDT
Power has been restored at the HPCF, and we are starting to bring systems back up. A simultaneous network change at the HPCF "gateway" has presented some configuration challenges, and we are working through those now.
Aug 6, 16:25 MDT
Today's planned maintenance activities in the HPCF are in progress and reportedly on schedule. We are scheduled to have power again at 2:30 PM, and will do our best to restore service as soon as possible after that.
Aug 6, 09:05 MDT
Today's maintenance is in progress. However, it has come to our attention that we neglected to announce that a subset of PetaLibrary allocations, still temporarily located on Summit storage, is also unavailable for the duration of this maintenance period. We regret omitting this detail from our prior announcement.
Aug 5, 12:40 MDT
Scheduled maintenance is currently in progress. We will provide updates as necessary.
Aug 5, 07:00 MDT
Because of the impact and breadth of the HPCF maintenance scheduled to start 5 August (Wednesday), I have so far emphasized it in our announcements, but I have neglected to mention that we also intend to add storage to the PetaLibrary/active cluster, in support of native ZFS allocations and an eventual migration away from BeeGFS. This should be non-disruptive, but similar operations have caused disruption in the past. For this reason we are performing this operation during the maintenance period, though we do not intend to proactively halt jobs on the portion of Blanca that is otherwise unaffected by the HPCF work.
Aug 4, 23:17 MDT
Be reminded that we have an extended planned maintenance outage for the HPCF, including Summit and a portion of Blanca, scheduled to start tomorrow, 5 August 2020 (Wednesday). This outage addresses an outstanding electrical health and safety issue at the datacenter.
Aug 4, 14:15 MDT
Be reminded that we have an extended planned maintenance outage for the HPCF, including Summit and a portion of Blanca, scheduled 5 August 2020 (Wednesday). This outage addresses an outstanding electrical health and safety issue at the datacenter.
Jul 23, 10:25 MDT
The datacenter operations team that supports the RC environment has requested an extended 48-hour outage window to correct an electrical health and safety issue at the High Performance Computing Facility (HPCF). This outage is being scheduled to coincide with our regular August maintenance period.
During this maintenance, Summit compute, Summit scratch, and Blanca HPC will be entirely offline. This includes the following Blanca partitions:
While the maintenance is scheduled for 48 hours, we will endeavor to finish as quickly as possible without compromising the necessary work.
If you have any questions or concerns, please contact firstname.lastname@example.org
Jul 7, 12:01 MDT