by Bruce Devlin Issue 104 - August 2015
It's not uncommon to see the ability to insert/extract captions/subtitles as part of a tender requirement or product specification. But in a file-based world, what does this mean, and where are we inserting/extracting to/from?
For this article, we'll define captions and subtitles as text overlays that the user turns on or off, where captions refers primarily to US-style closed captions and subtitles to the European/Australasian equivalents.
Let's start by looking at the very end of the signal chain: the consumer display. In the analogue SD world this was fairly simple, because captions came in data packets (as defined by EIA-608) and subtitles were sent as WST (World System Teletext). The receiver in the TV set would decode the words and render them on to the screen.
This data took advantage of space in the VBI (Vertical Blanking Interval), a part of the signal used for timing in old CRT displays. The advent of digital television, digital video broadcasting (DVB) and HD broadened the number of options significantly, with more space in the signal for caption/subtitle data. This brought about Free TV Australia's OP47 and EIA-708, in addition to DVB bitmaps and other methods of transmitting caption and subtitle data through a facility to the consumer's display.
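For readers who like to see the bits, here is a minimal, purely illustrative sketch of what decoding a single EIA-608 byte pair involves: strip the odd-parity bit, skip null padding, and map the basic characters, which are close to ASCII. A real decoder also handles control-code pairs, the special-character substitutions and both caption channels, none of which are shown here.

```python
# Illustrative sketch only: decode one EIA-608 byte pair by stripping
# the odd-parity bit and mapping basic (near-ASCII) characters.
def has_odd_parity(byte: int) -> bool:
    return bin(byte).count("1") % 2 == 1

def decode_608_pair(b1: int, b2: int) -> str:
    text = ""
    for b in (b1, b2):
        if not has_odd_parity(b):
            continue              # parity error: drop the byte
        b &= 0x7F                 # remove the parity bit
        if b == 0x00:
            continue              # null padding between captions
        if b < 0x20:
            return ""             # control-code pair, not handled in this sketch
        text += chr(b)            # basic characters are near-ASCII
    return text

print(decode_608_pair(0xC8, 0x49))  # prints "HI"
```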
Until fairly recently, the caption/subtitle chain was commonly kept separate from the audio/video up to the broadcast head-end, where the data was injected into the appropriate place by a caption server during broadcast. However, the growing demand for multi-platform delivery, catch-up services and international versioning has brought captions and subtitles back up the audio and video workflow chain and into the file domain.
In the caption server, captions and subtitles were stored in authoring formats, with SCC and STL respectively becoming the de facto interchange standards. There are many other older, proprietary and less well-known formats and, more recently, variations of TTML (Timed Text Markup Language) as defined by the W3C, SMPTE and the EBU are becoming common to ease standardised interchange. SCC, STL and TTML therefore typically represent the bulk of upstream caption insert/extract operations.
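To give a flavour of what TTML interchange looks like, here is a stripped-down example of a W3C TTML document and how its cues might be read. The element and attribute names come from the TTML specification; real SMPTE-TT or EBU-TT files add styling, regions and profile metadata that this sketch omits.

```python
# A minimal TTML subtitle document and a simple read of its cues.
import xml.etree.ElementTree as ET

TTML_NS = "{http://www.w3.org/ns/ttml}"

sample = """<?xml version="1.0" encoding="UTF-8"?>
<tt xmlns="http://www.w3.org/ns/ttml" xml:lang="en">
  <body>
    <div>
      <p begin="00:00:01.000" end="00:00:03.500">Hello, world.</p>
      <p begin="00:00:04.000" end="00:00:06.000">A second subtitle.</p>
    </div>
  </body>
</tt>"""

root = ET.fromstring(sample)
for p in root.iter(TTML_NS + "p"):
    print(p.get("begin"), p.get("end"), p.text)
```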
The downstream side is less constrained. In HD, most carriage will be in the ancillary data packets. For MXF, the storage of ANC data is now well defined by the SMPTE ST436 specification, providing a solid framework for caption/subtitle workflows. Beyond MXF, formats like ProRes have proprietary methods to store (US) captions, while legacy files may use other specifications.
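In practice, once the ANC packets have been lifted out of an ST436 track, they are usually identified by their DID/SDID pairs. The sketch below shows the idea; the AncPacket structure and the notion of a ready-made packet list are hypothetical placeholders for whatever your MXF toolkit actually provides, and the DID/SDID values shown are those commonly registered for 608/708 and OP47 data.

```python
# Hedged sketch: classify ANC packets from an MXF ST436 track by DID/SDID.
from typing import NamedTuple, List

class AncPacket(NamedTuple):       # hypothetical stand-in for a real parser's type
    did: int
    sdid: int
    payload: bytes

CAPTION_TYPES = {
    (0x61, 0x01): "CEA-708 caption distribution packet",
    (0x61, 0x02): "CEA-608 caption data",
    (0x43, 0x02): "OP-47 subtitling distribution packet (teletext)",
}

def classify(packets: List[AncPacket]) -> None:
    for pkt in packets:
        label = CAPTION_TYPES.get((pkt.did, pkt.sdid))
        if label:
            print(f"DID 0x{pkt.did:02X}/SDID 0x{pkt.sdid:02X}: {label}")
        # other DID/SDID pairs (timecode, AFD, etc.) are ignored here
```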
A significant challenge today is understanding archived legacy SD files. For IMX files, the VBI is encoded in the video stream, so caption/subtitle data is preserved. For other formats it may be less obvious. ST436 and the VAUX element of DV files are valid homes for timed text but are only sporadically used, whilst proprietary VBI files, proprietary MPEG headers and A/53 user data are also known to have been used for the storage of subtitles and captions.
In short, if you're a) writing a tender that includes subtitle preservation or processing, or b) writing a product specification that includes captions, make sure you know where to stick them!