Data interface problem on the older DTNs
Incident Report for CU Boulder RC
Resolved
All production use has been moved to the new DTN nodes, including the host names dtn.rc.colorado.edu, dtn.rc.int.colorado.edu, dtn-data.rc.int.colorado.edu, and dtn-new-data.rc.int.colorado.edu. As such, we are closing this incident. Some Globus shared endpoints are still using the legacy DTN nodes, because migrating shared endpoints to the new nodes requires coordination with the endpoint owners. We will schedule a date for shutting down the legacy DTN nodes, after which any shared endpoints that have not been migrated will stop working; we will still be able to migrate them after the fact.
Posted Mar 03, 2021 - 12:23 MST
Monitoring
We may have found the reason why the data interfaces of the older DTNs aren't working as expected.

We previously advised that a workaround (until the new nodes have been tested) is to use dtn02.rc.colorado.edu, which currently relies on other network interfaces. However, for reasons not yet determined, the connection between dtn02.rc.colorado.edu and other RC nodes is unstable, as observed today.

We are close to completing the transition of users to the new nodes, after testing FTP and sshfs services. A message with the current status of this transition will be sent very shortly via rc-news@colorado.edu.
Posted Feb 10, 2021 - 11:13 MST
Investigating
There is a problem affecting the data interfaces of our older data transfer nodes.
We can't reach other hosts on the RC data network through those interfaces. As a result, BeeGFS PL/active, Summit scratch, and other file systems fail to mount on those nodes.

Users who use sshfs or FTP via dtn.rc.colorado.edu or dtn-data.rc.int.colorado.edu are affected by this issue as well.

The affected nodes are dtn01.rc.colorado.edu and dtn02.rc.colorado.edu.
Some services are still running on dtn02 because its data interface was brought down and its management interface is now serving the traffic for those services. You may therefore want to connect to dtn02.rc.colorado.edu specifically when running rsync or using FTP or sshfs.
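
As an illustration of the workaround above, here is a minimal Python sketch that builds an rsync command addressed to dtn02.rc.colorado.edu directly rather than to the dtn.rc.colorado.edu alias. Only the host name comes from this notice; the user name, paths, and the helper function name are hypothetical placeholders, and the rsync options shown (-a, -v) are just a common choice, not an RC recommendation.

```python
import shlex

# Host name taken from the notice above; everything else is a placeholder.
WORKAROUND_HOST = "dtn02.rc.colorado.edu"


def rsync_command(user, local_path, remote_path, host=WORKAROUND_HOST):
    """Build an rsync argument list that targets one specific DTN host."""
    # -a preserves permissions and timestamps, -v prints transferred files.
    return ["rsync", "-av", local_path, f"{user}@{host}:{remote_path}"]


if __name__ == "__main__":
    # Hypothetical user and paths; substitute your own before running.
    cmd = rsync_command("jdoe", "results/", "/path/on/rc/storage/")
    # Print the command for review instead of executing it.
    print(shlex.join(cmd))
```

The sketch only prints the command so it can be checked before use; the same host name can be substituted in an sshfs or FTP client in the same way.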

We are prioritizing the work that still needs to be completed on the new data transfer nodes so that we can decommission dtn01 and dtn02. Completing that work may take more than a week, but we should be able to move users to the new nodes within a week while we continue to finalize the remaining tasks we had identified for the new nodes. Users will be contacted by email.

If you have any questions, please write to us at rc-help@colorado.edu.
Posted Feb 05, 2021 - 10:34 MST
This incident affected: Research Computing Core.