Over the last 48 hours, our BeeGFS metadata storage utilization has jumped from 93% to 96%. This is a significant increase that we must respond to immediately, despite our strong preference to perform no maintenance outside of a planned outage.
We now understand why our metadata utilization is higher than expected; details are below. Our first action, however, will be to run a BeeGFS maintenance command to identify and remove orphaned files that are no longer reachable in the file system but are nonetheless taking up space on the backend metadata storage. This may allow the primary metadata server to return to a utilization ratio equivalent to what we see on the secondary, 85%.
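To be clear about what that first step looks like, here is a minimal, illustrative sketch in Python. It assumes beegfs-fsck is the maintenance tool in question and that a read-only check can be run before any repair; the exact flags would be verified against our installed BeeGFS version before we touch anything.

    # Illustrative sketch only: run the BeeGFS consistency check in read-only
    # mode first, so we can review what it reports before removing anything.
    # Assumption: beegfs-fsck with --checkfs and --readOnly is the relevant
    # invocation; flags must be verified against our installed BeeGFS version.
    import subprocess

    check = subprocess.run(
        ["beegfs-fsck", "--checkfs", "--readOnly"],
        capture_output=True,
        text=True,
    )
    print(check.stdout)

Only after reviewing that output would we proceed with an actual repair run against the orphaned entries.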
Should we need to take additional action once that's complete, we will advise here in a further message.
We are currently considering only actions that should be completable with zero downtime, and only with careful planning and consideration; but should we inadvertently provoke an outage, we will advise here.
--
We believe we are seeing higher metadata storage consumption because, while we are using 512-byte inodes, the BeeGFS extended attributes on each inode require the allocation of an additional 4 KiB data block. As a result, the vast majority of our inodes are consuming 4,608 bytes rather than the expected 512 bytes. This is because our files have a stripe width of 16 (i.e., each file writes data in parallel to 16 storage targets), whereas a 512-byte inode can only internally accommodate a stripe width of 4.
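Spelled out, the per-file arithmetic looks like this (all sizes in bytes; these are just the figures above restated, not new measurements):

    # Per-file metadata cost today vs. what we expected (bytes).
    inode_size = 512          # inode size our metadata targets were formatted with
    extra_xattr_block = 4096  # additional 4 KiB block allocated because the
                              # width-16 striping xattr does not fit in the inode
    per_file_actual = inode_size + extra_xattr_block   # 4,608 bytes per file
    per_file_expected = inode_size                     # 512 bytes per file
    print(per_file_actual, per_file_expected)          # 4608 512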
We have been advised that a 1,024-byte (1 KiB) inode would be able to accommodate a stripe width of 16 internally; at roughly 22% of our current effective per-file metadata consumption, that would be a substantial saving. This will require reformatting our metadata storage file systems, which, given our redundant "buddy-mirror" configuration, should be possible to perform live (first on our secondary, then on our primary). Given our experiences trying to perform live maintenance on BeeGFS so far, we had hoped to do this during our upcoming planned outage; but if we can't bring utilization down otherwise, we may have to proceed immediately.
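The 22% figure falls out of the same per-file arithmetic, assuming (as we have been advised) that a 1 KiB inode holds the width-16 striping attributes inline with no extra block:

    # Projected per-file metadata cost after reformatting with 1,024-byte inodes.
    per_file_current = 512 + 4096   # 4,608 bytes per file today
    per_file_after = 1024           # 1,024 bytes per file if the xattr fits inline
    print(f"{per_file_after / per_file_current:.1%}")  # ~22.2% of current consumption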
We also intend to move our file system from a stripe width of 16 to a stripe width of 4; but this restripe is a heavy, long-term operation, and unlikely to resolve our issue in the short term.
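For reference, setting the new pattern itself is a small step; this is a hedged sketch with a hypothetical path, and the exact beegfs-ctl invocation would be checked before use. The heavy part is that a new stripe pattern only applies to files created afterwards, so existing files would have to be rewritten to benefit:

    # Hedged sketch: set a width-4 stripe pattern on a directory tree.
    # Assumptions: beegfs-ctl --setpattern with --numtargets is the mechanism,
    # and /mnt/beegfs/projects is a hypothetical stand-in for our real trees.
    import subprocess

    subprocess.run(
        ["beegfs-ctl", "--setpattern", "--numtargets=4", "/mnt/beegfs/projects"],
        check=True,
    )
    # Existing files keep their old width-16 pattern; they would need to be
    # copied/rewritten under the new pattern, which is the long-running part.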
Finally, we have already ordered additional metadata storage space; but we do not know how long the order will take to clear CU, or how long the disks will take to arrive. Probably not long, but given our current situation, we should not simply wait for them.
Posted May 24, 2019 - 11:00 MDT