PetaLibrary Active Targets Offline
Incident Report for CU Boulder RC
Resolved
This incident has been resolved.
Posted Oct 02, 2019 - 09:58 MDT
Monitoring
All storage targets are now online.
While new storage targets were being created for new customers today, a restart of the storage daemon on boss1 failed: the daemon was stuck in a continuous loop of stopping and starting. This daemon has been restarted many times before without causing an outage. This time, however, we understand that it disrupted filesystem operation because some targets were full or nearly full. Quota was slightly increased for those targets. Once new targets were again allowed to join the system, both the old targets and the new ones that had been offline came back online. We will follow up with the PIs for the 2 storage pools whose quota was increased.

New monitoring will be added to PL/Active to alert us to nearly full targets, so that this problem can be avoided in the future.
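The following minimal Python sketch shows one way a "nearly full target" check could work. It assumes that per-target used and total capacity figures are already collected elsewhere; how they are gathered from BeeGFS is not covered here. The `Target` type, the `nearly_full` function and the 95% threshold are all hypothetical and are not part of the actual PL/Active monitoring.

```python
from dataclasses import dataclass


@dataclass
class Target:
    # Hypothetical record for one storage target's capacity figures.
    target_id: str
    used_bytes: int
    total_bytes: int


def nearly_full(targets, threshold=0.95):
    """Return IDs of targets whose used/total fraction meets or exceeds threshold.

    The 0.95 default is an illustrative assumption, not a documented limit.
    Targets reporting zero total capacity are skipped to avoid division by zero.
    """
    flagged = []
    for t in targets:
        if t.total_bytes > 0 and t.used_bytes / t.total_bytes >= threshold:
            flagged.append(t.target_id)
    return flagged


if __name__ == "__main__":
    sample = [Target("101", 96, 100), Target("102", 50, 100)]
    # Only target 101 (96% used) crosses the 95% threshold.
    print(nearly_full(sample))
```

A real check would run on a schedule and send a notification for any flagged target. Alerting before a target fills up is the point, so the threshold would need to leave headroom for maintenance operations such as daemon restarts.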
Posted Oct 01, 2019 - 19:52 MDT
Investigating
We have verified that the targets attached to the boss1 storage server went offline. It is not yet clear why those targets went offline and remain so, but we are investigating. This is affecting PetaLibrary Active spaces hosted in BeeGFS. More updates soon.
Posted Oct 01, 2019 - 18:47 MDT
This incident affected: PetaLibrary.