The BBC Archive contains material dating back to the 1880s, preceding the formation of the Corporation itself. The BBC Rewind project was set up to enable journalists and researchers to find and use those “golden nuggets” of archive footage, which can provide both context and illumination for a story.
The existing process for retrieving assets from the archive was time-consuming and labour-intensive. Users could search legacy metadata systems for a programme using known information (e.g. title or synopsis) to locate its tape or reel, then order the tape to be physically shipped to their location for playback. This could take a few days, and until they were able to view the tape they could not be sure it was what they were looking for. For time-sensitive stories, this turnaround was very constraining.
Another issue was the large number of newly digitised archive assets which had minimal metadata associated with them (e.g. just a tape with a vague title), and so it was difficult to surface such assets in search results. The content of the video itself needed to be searchable.
I was engaged to architect and implement a system to address these problems, working closely with in-house Media Managers and Researchers (the core users and subject matter experts for the archive) and engineering staff.
I delivered a web portal which provides users with a streamlined workflow for searching for, viewing, exporting and creating curated collections of archive content.
Behind the scenes, we used Elasticsearch as the asset search engine. This allowed us to deliver lightning-fast results, tune relevance by weighting certain fields more heavily than others, and run complex faceted queries to narrow the scope of a search. Indexing jobs were developed to pull in asset data from several existing systems. We integrated a speech-to-text tool to generate transcripts, enabling users to search on words spoken within a video or audio clip, vastly increasing the discoverability of many older archive clips.
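To make the field weighting and faceting concrete, here is a minimal sketch of how such a request body could be built. It is an illustration only, not the Rewind Portal's actual code: the field names (title, synopsis, transcript, media_type, decade), the boost values and the build_search_body helper are all assumptions. Only the Elasticsearch Query DSL constructs themselves (multi_match with "^" boosts, bool/filter, terms aggregations) are standard.

```python
import json
from typing import Any, Optional

# Hypothetical fields and boosts: a match in the title counts three times as
# much as a match in the speech-to-text transcript.
WEIGHTED_FIELDS = ["title^3", "synopsis^2", "transcript"]

# Hypothetical keyword fields used to build facet counts.
FACET_FIELDS = ["media_type", "decade"]


def build_search_body(
    text: str,
    filters: Optional[dict[str, list[str]]] = None,
    size: int = 10,
) -> dict[str, Any]:
    """Build an Elasticsearch request body with weighted fields and facets."""
    # Each selected facet value becomes a non-scoring filter clause.
    filter_clauses = [
        {"terms": {field: values}} for field, values in (filters or {}).items()
    ]
    return {
        "size": size,
        "query": {
            "bool": {
                # Scored full-text match across the weighted fields.
                "must": [
                    {"multi_match": {"query": text, "fields": WEIGHTED_FIELDS}}
                ],
                # Filters narrow the result set without affecting relevance.
                "filter": filter_clauses,
            }
        },
        # One terms aggregation per facet returns value counts for the UI.
        "aggs": {
            f"by_{field}": {"terms": {"field": field}} for field in FACET_FIELDS
        },
    }


if __name__ == "__main__":
    body = build_search_body("moon landing", {"decade": ["1960s"]})
    print(json.dumps(body, indent=2))
```

The resulting dictionary could be sent as the JSON body of a search request against an index. Keeping facet selections in the filter context means that narrowing a search does not change how the remaining results are ranked.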
The system was originally deployed to on-premises infrastructure but was subsequently ported to AWS EC2.
Other significant technologies used include: Node.js, AngularJS, Docker, MongoDB.
“Paul made a massive impact on the BBC Rewind project and the results of his work can be seen in archive content now being published across many of the BBC’s broadcast services. He was the architect of the Rewind Portal, a new archive search tool which gives BBC journalists and programme-makers faster and easier access to the Corporation’s largely untapped digitised archive.
His great ability for design and problem solving combined with his natural understanding of good UX and an eye for aesthetics allowed him to produce excellent results when faced with this challenging and complex project. Paul’s approachable style meant he was able to engage effectively with users of the Rewind Portal, dealing with feedback in a constructive and intelligent way.
Always a pleasure to work with and highly talented, I would strongly recommend Paul's services to any project team.”
- First usable version of the Rewind Portal delivered within 1 month of start of engagement
- Over 2 million archive assets now available for searching and still growing
- Manual ordering of physical tapes becoming a thing of the past as more content is digitised and indexed
- Users extremely happy with the UX and the fact that their feedback is regularly built into releases
- Knowledge transfer to in-house engineering team on search technologies and good UX practices
- Several expensive legacy systems have been or are scheduled to be retired, having been made obsolete by the Rewind Portal
- Early discussions around opening up the Rewind Portal to the general public
Read more here about the BBC Rewind project and its use of open source search technologies.