PetaLibrary enclosure 1 hardware failure, 12 allocations offline
Incident Report for CU Boulder RC
Resolved
The direct cause of the failure that took the enclosure offline is still unknown, but we are working with the vendor to investigate a possible conflict between a software package and the hardware platform. If that conflict turns out to be related to the failure, we will update the software during our next maintenance day. In the 24 hours since the failure we have seen no further signs of problems, so we are closing the incident.
Posted Sep 01, 2023 - 16:11 MDT
Monitoring
The issue has been identified and a fix is in place. We'll continue to monitor and follow up with the hardware vendor tomorrow.
Posted Aug 31, 2023 - 19:25 MDT
Investigating
A single PetaLibrary enclosure is offline, affecting the following allocations:

CCPM
JKL_IDEAL
JKL_OBS
JKL_REAL
JKL_TEACHING
Randolph_Lab
Stanislabski
Toney-group
bbsrinformatics
bbsrinformaticsdata
caruthers-research
phet

We are working to determine the cause of the failure. The vendor has been notified, and we are heading to the data center to troubleshoot further. Most other allocations will be unaffected. However, we may need to move services between hosts to isolate a hardware issue, which could cause brief outages for some allocations not listed above.
Posted Aug 31, 2023 - 13:23 MDT
This incident affected: PetaLibrary.