Core storage "Permission Denied" errors
Incident Report for CU Boulder RC
Resolved
Dell believes it has identified the root cause of this error: a bug in the OneFS file system that is triggered by our environment. We are still waiting for the bug to be fixed, but we have halted the replication activity that was regularly triggering it until the fix can be deployed.
Posted Sep 30, 2020 - 10:32 MDT
Update
We have identified a backup process whose timing correlates with most of these errors. Two days ago we suspended this backup process, and the error has not recurred since. We are still working with Dell to understand why the failure was occurring, and we plan to restore the backup process once a fix has been deployed.
Posted Aug 13, 2020 - 12:36 MDT
Identified
We have been following up with Dell on this problem, which is still present on core storage. We were able to reproduce it by running MATLAB jobs and confirmed "Permission Denied" errors on some of those jobs.
According to Dell, the root cause has been identified and they are working on a solution, which may arrive over the weekend.

In the meantime, we've asked Dell whether there are any conditions that can be monitored to prevent the problem from happening. The problem can also affect NFS shares, as it did today for PetaLibrary Active spaces that use BeeGFS.
Posted Aug 06, 2020 - 11:59 MDT
Update
Dell/EMC has escalated the issue internally to their engineering department.
Posted Jul 09, 2020 - 16:28 MDT
Investigating
An issue has developed on core storage (/home, /projects, /curc/*). Occasional "Permission Denied" errors are occurring when accessing files on core storage. We have not yet been able to reproduce the error condition. It appears to be related to an update made to the operating system of the Isilon cluster.

We have a support ticket open with Dell/EMC and are working with them to determine the cause of this issue and develop a resolution.
Posted Jul 09, 2020 - 10:18 MDT
This incident affected: Research Computing Core, Blanca, and RMACC Summit.