HPCF Cooling incident
Incident Report for CU Boulder RC
Resolved
This incident has been resolved.
Posted Sep 07, 2020 - 19:48 MDT
Monitoring
Datacenter operations and facilities management both report that the cooling systems inside the HPCF are operating normally again. We have re-enabled all the queues on Summit and will be monitoring the systems into the evening.
Posted Sep 07, 2020 - 13:48 MDT
Investigating
Around 11:50 today we started receiving alerts about issues with the cooling systems inside the HPCF, which houses the Summit and Blanca HPC nodes. We have set all queues on Summit to a down state for the time being to reduce the load on Summit while datacenter operations and facilities management work on the cooling issues. We have not stopped the queues for Blanca HPC because the temperatures of those nodes have not yet reached critical limits.
Posted Sep 07, 2020 - 12:38 MDT
This incident affected: RMACC Summit.