All Systems Operational
Research Computing Core   Operational
Science Network   Operational
RMACC Summit   Operational
Blanca   Operational
PetaLibrary   Operational
EnginFrame   Operational
JupyterHub   Operational
Past Incidents
Oct 18, 2018

No incidents reported today.

Oct 17, 2018

No incidents reported.

Oct 16, 2018

No incidents reported.

Oct 15, 2018

No incidents reported.

Oct 14, 2018

No incidents reported.

Oct 13, 2018

No incidents reported.

Oct 12, 2018

No incidents reported.

Oct 11, 2018

No incidents reported.

Oct 10, 2018

No incidents reported.

Oct 9, 2018

No incidents reported.

Oct 8, 2018

No incidents reported.

Oct 7, 2018

No incidents reported.

Oct 6, 2018

No incidents reported.

Oct 5, 2018

No incidents reported.

Oct 4, 2018
Completed - It appears that Sneffels (viz) is working properly. Most sessions have been disabled by Tim Dunn while he evaluates them for remaining porting effort; but the general-purpose "Remote Desktop" session is published and works as expected. Be advised that the modules environment now reflects the same production software environment as is seen on Summit; but the viability of this software on the viz architecture and operating system remains to be seen.

There's a little bit of clean-up work to do (software porting, as noted above; viz1 still has a network configuration issue to be resolved, so it's been disabled for now), but in general the service is again operational.

Thank you for your understanding and patience while we've performed this upgrade. We've gained much experience in this process, and future upgrades should be much more straightforward.
Oct 4, 10:12 MDT
Verifying - Sneffels (viz) is back up, and I expect it to be working; however, I have not yet gone through and validated that all existing configuration works.

Feel free to attempt to use it (at viz.rc.colorado.edu) and let us know if anything isn't working; but I'm going to leave this maintenance marked as "verifying" until I've proactively reviewed everything for at least a baseline of functionality.
Oct 3, 23:35 MDT
Update - Problems with EnginFrame and DCV appear to have been resolved; however, we have been unable to successfully start a session on RHEL7, seemingly due to an incompatibility between GNOME 3 and the particular NVIDIA GPU installed in the viz servers. We have already demonstrated that this stack works on RHEL6 in a test environment, so we are reprovisioning the hosts to RHEL6 now.
Oct 2, 23:00 MDT
Update - The problem we encountered with DCV has been resolved. We are continuing work to bring viz back into production. Currently we are waiting on the software vendor to regenerate a license file that was built improperly.
Oct 2, 11:46 MDT
Update - Much of the upgrade of the Sneffels visualization system is complete. However, we have encountered a problem starting DCV (the remote visualization server that runs on the viz nodes) on the updated operating system, RHEL7. A support request has been filed, and we will proceed with returning the system to operation when we have been advised by support.

Should we be unable to proceed with RHEL7, we will revert to RHEL6 (but retain the updated versions of EnginFrame and DCV). This configuration has already been demonstrated to work in a test environment.
Oct 1, 21:43 MDT
In progress - Scheduled maintenance is currently in progress. We will provide updates as necessary.
Oct 1, 18:50 MDT
Update - We will be undergoing scheduled maintenance during this time.
Oct 1, 18:49 MDT
Scheduled - Work upgrading the Sneffels (viz) cluster is ongoing. An automated process previously announced this maintenance as completed.

The Sneffels ("viz") cluster will be unavailable on Monday, 1 October, during the upgrade of the EnginFrame environment and DCV software. This maintenance will also return all five nodes to production (the cluster is currently down to two nodes).

Effort will be made to return the system to production as soon as possible, including earlier than the scheduled 8-hour maintenance period. Should any issues arise during the course of the upgrade, we will provide updates here.
Oct 1, 18:49 MDT