Future Storage Models

Researchers have a strong requirement to run analysis across any data they hold, regardless of storage system or location, and to operate on that data as a single large pool. This is often referred to as a Data Lake.

A distributed storage architecture presents data, which may be held at any location within the system, to the user or to compute resources as the equivalent of local storage.
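As a minimal sketch of this idea, the Python below shows a single namespace that hides which site holds a given file. The names StorageSite and UnifiedNamespace, the site names and the paths are all hypothetical and chosen for illustration only; they do not describe any of the products under evaluation.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class StorageSite:
    """A hypothetical storage location holding files keyed by logical path."""
    name: str
    files: Dict[str, bytes] = field(default_factory=dict)


class UnifiedNamespace:
    """Presents files held at several sites as one pool, hiding their location."""

    def __init__(self, sites: Iterable[StorageSite]):
        self.sites = list(sites)

    def read(self, path: str) -> bytes:
        # The caller supplies only a logical path; the site lookup is hidden.
        for site in self.sites:
            if path in site.files:
                return site.files[path]
        raise FileNotFoundError(path)

    def listing(self) -> List[str]:
        # One merged, sorted view across every site.
        return sorted(p for site in self.sites for p in site.files)


# Illustrative data only: two sites, each holding one file.
pool = UnifiedNamespace([
    StorageSite("site_a", {"/projects/genome/run1.dat": b"ACGT"}),
    StorageSite("site_b", {"/projects/climate/2020.nc": b"\x00\x01"}),
])
data = pool.read("/projects/genome/run1.dat")  # no site name needed
```

A real system would also have to deal with caching, replication and consistency between sites, which this sketch deliberately leaves out.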

The original EDDIE 3/DataStore upgrade had this as an ambition for the new systems, but at that point the technology was not yet mature or reliable. In the subsequent three to four years the technology has developed rapidly, to the point where a number of major providers now offer such technologies suitable for large-scale operations.

In 2017/18 IT-I RSS ran a feasibility study looking at the available extensions to the IBM technology currently used for DataStore that provide this capability, together with a limited scan of competing options. In summary, the result is that the technology is potentially suitable, but the IBM offering may currently be too costly.

Proof of Concept systems for the IBM Spectrum Cloud technology and the competing open-source Ceph system have been examined. We would like to extend this to a small number of other potential systems (e.g. Western Digital ActiveScale and Cloudian) and move to providing a limited-scale Development system which can be made available to researchers interested in testing the capabilities of this technology, with the aim of identifying the correct direction for the eventual roll-out of a comprehensive system.

In addition, many modern distributed storage solutions have some form of Object Storage at their core. This type of storage differs from traditional file-based storage in that the data is stored, in effect, as a binary blob, with associated metadata providing the data location information.

Such an approach means that data is located by metadata search rather than by traversing file trees. This has a number of benefits for the user and the system: richer, search-based data discovery, and improved system speed because operations do not need to span file trees.
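The difference from a file tree can be sketched in a few lines of Python. This is a toy, in-memory illustration under stated assumptions: the ObjectStore class, its put/get/find methods and the metadata keys (project, year) are hypothetical and are not the interface of any system named in this document.

```python
import uuid
from collections import defaultdict


class ObjectStore:
    """Toy object store: opaque blobs plus searchable metadata."""

    def __init__(self):
        self._blobs = {}                 # object id -> raw bytes
        self._metadata = {}              # object id -> metadata dict
        self._index = defaultdict(set)   # (key, value) -> set of object ids

    def put(self, data: bytes, **metadata) -> str:
        # Objects get a flat identifier; there is no directory hierarchy.
        # Metadata values must be hashable so they can be indexed.
        object_id = str(uuid.uuid4())
        self._blobs[object_id] = data
        self._metadata[object_id] = dict(metadata)
        for item in metadata.items():
            self._index[item].add(object_id)
        return object_id

    def get(self, object_id: str) -> bytes:
        return self._blobs[object_id]

    def find(self, **criteria) -> set:
        # Locate objects by index lookup instead of walking a file tree.
        if not criteria:
            return set(self._blobs)
        matches = [self._index.get(item, set()) for item in criteria.items()]
        return set.intersection(*matches)


# Illustrative data only.
store = ObjectStore()
genome_id = store.put(b"ACGT", project="genome", year=2020)
climate_id = store.put(b"\x00\x01", project="climate", year=2020)
from_2020 = store.find(year=2020)                    # both objects
genome_2020 = store.find(project="genome", year=2020)  # only the genome object
```

Each search is a set lookup per criterion, so its cost does not depend on how deeply data would otherwise be nested in directories.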


This project complements and extends the Proof of Concept (POC) for Future Storage Models already carried out under RSS039.

This project will investigate and compare the new options using the same POC parameters as before.

The options being considered are:

1. Arcastream

2. Cloudian

3. Western Digital


Discussions with the third parties have taken place; all companies have been sent the POC details and are eager to progress. However, due to limited RSS resources, this project has been placed on hold until Q1/2, when resources should be available to concentrate on the POC.


Current project status

Report Date   RAG    Budget      Effort Completed   Effort to Complete
July 2021     BLUE   50.0 days   4.0 days           46.0 days

Project Info

Future Storage Models
ITI - Research Services (RSS)
Management Office
Project Manager: Maurice Franceschi
Project Sponsor: David Fergusson
Current Stage
Project Classification
Start Date
Planning Date
Delivery Date
Close Date
Overall Priority

