The aim of this project is to deliver an online workshop, to be held in March 2021, that will advance the development of sophisticated tools and methodologies for efficiently processing a large corpus of uncatalogued and undescribed printed text, making it discoverable and available as data. Previously, the only way that researchers could access the content of an uncatalogued or unindexed collection such as the Scottish Court of Session Papers was with some knowledge of a particular case, the year(s) in which it was argued or decided, and the library in which the documents might be located. Researchers often have to request all volumes relating to a given year before wading through them to identify relevant printed material.

We aim to demonstrate how, by applying OCR, machine learning, and IIIF technology to the Session Papers, it is now possible to search for individuals’ names within specific documents or across such collections, and to explore thematically linked cases. We intend to demonstrate how we have integrated OCR technology with the International Image Interoperability Framework (IIIF), not only to create an accurate text transcription to underlay the digital facsimile, but also to train the machines to automatically harvest and generate metadata from the digitised documents, making them more discoverable and usable for scholarly and public research. We will also show how the use of Transkribus, a handwritten text recognition software package, can open up collections of uncatalogued handwritten material. This will draw on the University of Edinburgh’s experience of working with the handwritten notebooks of Sir Charles Lyell, the leading 19th-century geologist.

The event will offer opportunities for informal conversation and networking so that researchers and stakeholders can exchange experiences, ideas and arguments. The project will employ a Project Coordinator to coordinate the promotion, marketing and delivery of the workshop. This role will be supported by two part-time Project Interns.

Sessions will be spread over a three-day period, not only to reduce online fatigue, but also to let us structure the programme so that audiences in different time zones can participate. The majority of sessions will take place from mid-afternoon to mid-evening, UK time, in order to draw European and Eastern U.S. audiences but, subject to demand, additional sessions could be arranged for other time zones, either through recordings or facilitated by partners based outside the UK. For example, our project partners at the University of Virginia Law Library may re-run a UK daytime session later in their day to enable people on the west coast of the US and further afield to take part. All sessions will be held using Zoom and will require advance registration. With sufficient lead-in time, we can programme an event that includes specially scheduled times to allow attendees to form regional community connections in the Americas, Europe, Asia, and more. Participants may sign up for all events during the programme, or just the ones that interest them.

The project is funded through the Scottish Library and Information Council's Resource Discovery Fund, with an award of £9,652.

Current project status

Report Date | RAG | Budget | Effort Completed | Effort to complete
May 2021 | BLUE | 0.0 days | 0.0 days | 0.0

Project Info

Project
EDITOR (Enhanced Discovery with Integrated Toolkits and Optical Recognition)
Code
LUC040
Programme
Library & Collections - Centre for Research Collections (LUCCRC)
Management Office
ISG PMO
Project Manager
Norman Rodger
Project Sponsor
Daryl Green
Current Stage
Close
Status
In Progress
Start Date
10-Dec-2020
Planning Date
31-Mar-2021
Delivery Date
n/a
Close Date
31-Mar-2021
Overall Priority
Normal

Documentation

Not available.

Project Dashboard

Project journal

No entries found.

Change dashboard

Nothing to report.