Summit filesystem errors
Incident Report for CU Boulder RC
Resolved
We have observed no further problems on Summit after monitoring the system and running tests earlier today.
If you do encounter any problems, please write to rc-help@colorado.edu.
Posted Nov 01, 2018 - 14:44 MDT
Monitoring
We believe the problem is fixed, but we are still monitoring. A compute node in a bad state may have created a deadlock in the filesystem. After we removed this node, the deadlock cleared.

We are still working with our vendor to confirm the root cause. However, tests indicate the problem is resolved: since the node was removed, we can access both scratch and the new PL spaces.

If you still encounter problems, please contact us at rc-help@colorado.edu.
Posted Oct 31, 2018 - 16:37 MDT
Investigating
We have received reports of, and confirmed, a problem affecting the Summit filesystem (scratch and the new PL spaces). Commands such as "cd" or "ls" in either scratch or the new PL spaces may respond slowly and sometimes hang. We are opening a ticket with our vendor and hope to have an update soon.
Posted Oct 31, 2018 - 14:31 MDT
This incident affected: PetaLibrary and RMACC Summit.