(QOSGrpNodeLimit) Reason on Summit

Incident Report for CU Boulder RC

Resolved

This incident has been resolved.

Posted Jun 07, 2018 - 17:25 MDT

Monitoring

We have implemented a fix and will be monitoring the queues to ensure that it has the intended effect.

Posted Jun 07, 2018 - 17:05 MDT

Investigating

Some users' jobs are being held in the queue with the reason (QOSGrpNodeLimit) even though nodes are available in the partition. We are currently investigating this issue and are working with the vendor to resolve it.

Posted Jun 07, 2018 - 13:07 MDT

This incident affected: RMACC Summit.