The BeeGFS management problem occurred again today. However, this time it manifested differently. We have no indication that the number of file descriptors used by the management daemon was growing, as it did on previous occurrences. Also, the daemon recovered from the error on its own. Before, we had to restart it manually to allow the clients to connect to it again.
We didn't realize the problem had occurred until we got a ticket from a user reporting that his jobs failed today.
We did capture debug data with GDB, though, and sent it to the vendor along with the log messages observed this time.
Management daemon restarts will be enabled again, at least until we hear back from the vendor. We will let you know the next steps once we have their feedback.
If you have any questions, please write to firstname.lastname@example.org