Future Storage Models

Researchers have a strong requirement to run analysis across any data they hold, regardless of storage system or location, and to operate on that data as a single large pool. This is often referred to as a Data Lake.

A distributed storage architecture presents data held at any location within the system to the user or to compute resources as the equivalent of local storage.

The original EDDIE 3/DataStore upgrade had this as an ambition for the new systems, but at that point the technology was not yet mature and reliable. In the subsequent three to four years the technology has developed rapidly, to the point where a number of major providers now offer such technologies in a form suitable for large-scale operations.

In 2017/18, ITI RSS ran a feasibility study looking at the available extensions to the IBM technology currently used for DataStore that provide this capability, together with a limited scan of competing options. In summary, the result is that the technology is potentially suitable, but the IBM offering may currently be too costly.

Proof of Concept systems for the IBM Spectrum Cloud technology and the competing open-source Ceph system have been examined. We would like to extend this to a small number of other potential systems (e.g. Western Digital ActiveScale and Cloudian) and move to providing a limited-scale development system that can be made available to researchers interested in testing the capabilities of this technology, with the aim of identifying the correct direction for the eventual roll-out of a comprehensive system.

In addition, many modern distributed storage solutions have some form of Object Storage at their core. This type of storage differs from traditional file-based storage in that the data is stored as, effectively, a binary blob, with associated metadata providing the data location information.

Such an approach means that data can be located by searching metadata rather than by traversing file trees. This has a number of benefits for the user and the system: rich, search-based data discovery, and improved system speed because operations do not need to span file trees.
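As an illustration only, the idea of locating data by metadata rather than by path can be sketched in a few lines of Python. This is a toy in-memory model, not the interface of any of the products under evaluation: the class `ToyObjectStore`, its `put`, `get` and `search` methods, and the metadata keys `project` and `instrument` are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List
import uuid


@dataclass
class StoredObject:
    """A binary blob plus the metadata that describes it."""
    data: bytes
    metadata: Dict[str, str] = field(default_factory=dict)


class ToyObjectStore:
    """Hypothetical in-memory store with a flat namespace keyed by object ID."""

    def __init__(self) -> None:
        self._objects: Dict[str, StoredObject] = {}

    def put(self, data: bytes, **metadata: str) -> str:
        """Store a blob with its metadata and return a generated object ID."""
        object_id = str(uuid.uuid4())
        self._objects[object_id] = StoredObject(data, dict(metadata))
        return object_id

    def get(self, object_id: str) -> bytes:
        """Fetch a blob directly by ID; no directory tree is involved."""
        return self._objects[object_id].data

    def search(self, **criteria: str) -> List[str]:
        """Return IDs of objects whose metadata matches every criterion.

        This toy version scans all objects; a real system would be
        expected to maintain a metadata index instead.
        """
        return [
            oid for oid, obj in self._objects.items()
            if all(obj.metadata.get(k) == v for k, v in criteria.items())
        ]


store = ToyObjectStore()
id_a = store.put(b"ACGT", project="genomics", instrument="sequencer-1")
id_b = store.put(b"\x00\x01", project="imaging", instrument="microscope-2")
id_c = store.put(b"TTGA", project="genomics", instrument="sequencer-2")

# Discover data by what it is, not by where it sits in a file tree.
genomics_ids = store.search(project="genomics")
```

Here `genomics_ids` holds the IDs of the two genomics objects, found without knowing any file path. The linear scan in `search` is a simplification for readability and says nothing about how the candidate systems implement metadata queries.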

 

This project complements and extends the Proof of Concept (POC) for Future Storage Models already carried out under RSS039.

This project will investigate and compare the new options using the same POC parameters as the earlier study.

The options being considered are:

1. Arcastream

2. Cloudian

3. Western Digital

 

Discussions with the third-party suppliers have taken place, and all companies have been sent the POC details and are eager to progress. However, because of limited RSS resources, this project has been placed on hold until Q1/Q2, when resources should be available to concentrate on the POC.

 

Current project status

Report Date   RAG     Budget      Effort Completed   Effort to Complete
May 2021      GREEN   50.0 days   4.0 days           46.0 days

Project Info

Project: Future Storage Models
Code: RSS213
Programme: ITI - Research Services (RSS)
Management Office: ISG PMO
Project Manager: Maurice Franceschi
Project Sponsor: David Fergusson
Current Stage: Deliver
Status: In Progress
Project Classification: Grow
Start Date: 04-May-2020
Planning Date: 18-Dec-2020
Delivery Date: 15-Jan-2021
Close Date: 30-Jun-2021
Overall Priority: Normal
Category: Discretionary
