Closure Report
Project Summary
The purpose of the project was to refresh the storage infrastructure used by Datastore, which was coming to end of life. This was done in two tranches over two financial years.
Delivery involved:
- Manhandling dozens of storage arrays logistically and physically inside live racks.
- Accounting for the thousands of spinning disks inside those arrays.
- Migrating petabytes of data from a->b.
- Migrating terabytes of metadata from a->b.
- Doing so across 8 separate live file-systems whilst they were in use 24/7 with significant I/O load.
- Re-configuring 23 servers live.
- Fettling VMware in addition to GPFS and migrating storage for 150 virtual machines that provide a multitude of external and internal services.
This has been achieved with *no impact to service* at all.
Doing this with no observable impact on services or end users is no mean feat from a technical and organizational perspective, given that between Datastore and VMware the work touched *all* our services.
All redundant kit has been or will be disposed of as per agreed secure standards.
Status Of Project Benefits
The storage was approaching five years old and going out of support, so the benefit of the refresh is essentially that it enables Datastore to keep going. This was a refresh rather than an improvement project.
Explanation for variance
Pretty much a complete success.
Explanation for variance
There wasn’t really much variance in this project. There were a couple of minor delays here and there, but nothing that detracted from the final outcome.
Key Learning Points
We (I?) could do with better communication with Estates. There were several instances where we had little or no idea what was going on with building work and roads being dug up at the back of JCMB, to the point where I was calling building site managers directly and asking them not to dig up roads until we had a delivery in (they were very accommodating). This information should be centrally available. There seems to be an attitude in some areas of just saying no, rather than trying to accommodate.
The only other learning point I can think of is that while the Research Systems Team wanted to look at other potential technologies and solutions once we had the budget, it was really too late to do that at that point if we wanted to complete by the end of the financial year. Should we possibly have a parallel workstream looking at these options? This applies not just to storage: we had a similar discussion when we were buying nodes for Eddie and the Cloud.
Outstanding Issues
We have one array that still needs to come out. It was missed when the rest were disconnected. It will be removed in the next few weeks. It’s not an issue.
Follow On Tasks
Aside from the above recommendation that we create a separate workstream to look at potential future technologies, there is nothing else I can think of.