Upgrade requiring outage for Sneffels ("viz") cluster
Scheduled Maintenance Report for CU Boulder RC
Completed
Sneffels (viz) appears to be working properly. Tim Dunn has disabled most sessions while he evaluates the remaining porting effort for each, but the general-purpose "Remote Desktop" session is published and works as expected. Be advised that the modules environment now reflects the same production software environment as on Summit; how well this software works on the viz architecture and operating system remains to be seen.

A little clean-up work remains: software porting, as noted above, and a network configuration issue on viz1, which has been disabled for now. In general, though, the service is operational again.

Thank you for your understanding and patience while we've performed this upgrade. We've gained much experience in this process, and future upgrades should be much more straightforward.
Posted 14 days ago. Oct 04, 2018 - 10:12 MDT
Verifying
Sneffels (viz) is back up, and I expect it to be working; however, I have not yet validated that all existing configuration is working.

Feel free to attempt to use it (at viz.rc.colorado.edu) and let us know if anything isn't working; but I'm going to leave this maintenance marked as "verifying" until I've proactively reviewed everything for at least a baseline of functionality.
Posted 14 days ago. Oct 03, 2018 - 23:35 MDT
Update
Problems with EnginFrame and DCV appear to have been resolved; however, we have been unable to start a session on RHEL7, seemingly due to an incompatibility between GNOME 3 and the particular NVIDIA GPU installed in the viz servers. We have already demonstrated that this stack works on RHEL6 in a test environment, so we are reprovisioning the hosts to RHEL6 now.
Posted 15 days ago. Oct 02, 2018 - 23:00 MDT
Update
The problem we encountered with DCV has been resolved. We are continuing work to bring viz back into production. Currently we are waiting on a regenerated license file; the original was built improperly by the software vendor.
Posted 16 days ago. Oct 02, 2018 - 11:46 MDT
Update
Much of the upgrade of the Sneffels visualization system is complete. However, we have encountered a problem starting DCV (the remote visualization server that runs on the viz nodes) on the updated operating system, RHEL7. A support request has been filed, and we will proceed with returning the system to operation when we have been advised by support.

Should we be unable to proceed with RHEL7, we will revert to RHEL6 while retaining the updated versions of EnginFrame and DCV. This configuration has already been demonstrated to work in a test environment.
Posted 16 days ago. Oct 01, 2018 - 21:43 MDT
In progress
Scheduled maintenance is currently in progress. We will provide updates as necessary.
Posted 17 days ago. Oct 01, 2018 - 18:50 MDT
Update
We will be undergoing scheduled maintenance during this time.
Posted 17 days ago. Oct 01, 2018 - 18:49 MDT
Scheduled
Work upgrading the Sneffels (viz) cluster is ongoing. An automated process previously announced this maintenance as completed; that announcement was incorrect.

The Sneffels ("viz") cluster will be unavailable on Monday, 1 October, to support the upgrade of the EnginFrame environment and DCV software. This maintenance will also return all five nodes to production (only two are currently in service).

We will make every effort to return the system to production as soon as possible, ideally before the end of the scheduled 8-hour maintenance period. Should any issues arise during the course of the upgrade, we will provide updates here.
Posted 17 days ago. Oct 01, 2018 - 18:49 MDT
This scheduled maintenance affected: EnginFrame.