Last night there was a momentary power disruption in the network gateway at HPCF. For some jobs this may have caused only a temporary hold or block on I/O, but for others it may have caused job failure.
This unexpected failure ultimately stems from our not properly monitoring the status of the redundant power supply for this network equipment. We will investigate and intend to add monitoring for this state in the future.
Posted Nov 14, 2019 - 09:28 MST
This incident affected: Science Network, RMACC Summit, and Blanca.