PetaLibrary race condition leading to "Bad address" write error
Incident Report for CU Boulder RC
ZFS has been updated on both beegfs-storage servers, which is expected to resolve this issue.
Posted Feb 05, 2020 - 16:35 MST
The fix for this issue has been deployed to one of our two storage servers, but coincident problems prevented us from updating the second server on the scheduled day. We will reschedule the remaining work, possibly for next week.
Posted Jan 17, 2020 - 09:59 MST
A component of the PetaLibrary/active service is experiencing a load-induced race condition. The affected component is ZFS, which provides the underlying storage for beegfs-storage, part of the BeeGFS parallel file system. When the race condition is triggered, the affected write fails with an error message such as "Bad address" (EFAULT).

This issue has previously been reported and resolved upstream.

The fix is available in the 0.8 branch of ZFS. We are planning to upgrade from our currently deployed ZFS 0.7.13 to resolve this issue. We will provide updates here as more information becomes available.
Posted Jan 06, 2020 - 13:58 MST
This incident affected: PetaLibrary.