Roslin Expansion throughout 2016 and 2017 - involving provisioning of the Roslin cluster filesystem in two tranches, provision of large-memory compute nodes and GPU nodes, and memory upgrades across some existing nodes.
Key Learning Points:
As a result of unexpectedly high operational overheads on the core cluster service throughout 2016, the effort available to this project was lower than required. As a result of this, and of some additions to the funding available for Roslin hardware, the filesystem expansion was significantly delayed. Over the course of the project we identified that better risk and resource management in its early stages would have been of great benefit. We have now adopted a more formal approach to managing project resources and risks.
The Roslin community now has access to the increased storage and compute that this project has delivered.
Critical issues with the Eddie 3 main cluster filesystem were resolved with the aid of the Roslin community and the filesystem hardware, which gave the team much-needed flexibility in the recovery/rebuild process. The service and its wider community have directly benefitted from the IS partnership with Roslin.
Roslin filesystem allocations are not automatically tagged as such in the Storage Manager application. A manual process has been created to cover the period until the introduction of a new allocation management platform.
Overall usage tracking on the Roslin filesystem is not implemented in Storage Manager. A manual process is being developed to cover the period until the introduction of a new allocation management platform.