Ask the Experts on Storage and Archive

Author: Bob Pank

Published 1st October 2011


What is the most important thing to consider when choosing storage for content creation applications?
The single most important aspect to consider when choosing a storage device for content creation is workflow. It is critical to decide on the capture device, the video format to be used and how the finished project will be delivered. These decisions will in turn help you choose the host interface for the storage system, the required storage capacity, whether you need a RAID-protected device, whether you need to hold content for an extended period of time, and the best medium on which to archive the content.
How do I calculate the capacity and performance I need from my storage?
There is a wide range of capture devices capable of acquiring data at HD resolutions, such as consumer camcorders, DSLR cameras, HD video cameras and 35mm digital cinema cameras. All of these devices use a codec (compression/decompression), which is simply an algorithm that shrinks large movie files to a more manageable size and also makes them playable on a computer. For example, an uncompressed 1080i 10-bit file has a data rate of 165MB/sec, which is difficult to work with. By comparison, Apple ProRes 422 HQ is only 32MB/sec, REDCODE 42 is 12MB/sec and AVCHD is 3MB/sec, reducing both performance and storage requirements. It is also important to consider the number of layers of video being worked with simultaneously. Two layers of ProRes 422 HQ require 64MB/sec of bandwidth and place a higher load on the host and hard disks.
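As a rough illustration of the layer arithmetic, the short Python sketch below multiplies a codec's data rate by the number of simultaneous layers. The data rates are the figures quoted above; the function name and the assumption that bandwidth scales linearly with layer count are illustrative simplifications, not a measurement of any particular editing system.

```python
# Sustained data rates in MB/sec, taken from the figures quoted in the article.
CODEC_MB_PER_SEC = {
    "Uncompressed 1080i 10-bit": 165,
    "Apple ProRes 422 HQ": 32,
    "REDCODE 42": 12,
    "AVCHD": 3,
}


def required_bandwidth(codec: str, layers: int = 1) -> int:
    """Return the sustained MB/sec needed to play `layers` simultaneous streams.

    Assumes bandwidth scales linearly with the number of layers, which
    ignores any overhead added by the editing application itself.
    """
    if layers < 1:
        raise ValueError("layers must be at least 1")
    return CODEC_MB_PER_SEC[codec] * layers


if __name__ == "__main__":
    for name in CODEC_MB_PER_SEC:
        print(f"{name}: 1 layer = {required_bandwidth(name)} MB/sec, "
              f"2 layers = {required_bandwidth(name, 2)} MB/sec")
```

Treat the result as a minimum: the storage system needs to sustain at least this rate, with margin to spare for other disk activity.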
Once you have established the capture device and the codec in which the data will be acquired, it is easy to extrapolate from the MB/sec figure to the total storage required. For example, ten hours of ProRes 422 HQ would require a storage capacity of approximately 1.15TB (32MB/sec x 36,000 seconds). Always keep in mind that a hard disk storage system, no matter who the manufacturer is, will suffer from performance degradation if the storage volume is filled to near capacity. It is also important to make sure that the storage device you choose is capable of sustaining the data rate and I/O required for your chosen codec, as failure to do so will lead to a more time-consuming editing process or, at worst, the inability to edit the material. There are three key factors that determine the performance of the overall system: the power of the edit workstation, namely the CPU and RAM; the interface to the storage system; and the processing power of the storage system itself, including the number of disks in the system.
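The extrapolation from data rate to capacity can be sketched in a few lines of Python. This is a minimal example that assumes decimal units (1TB = 1,000,000MB); the `headroom` parameter is a hypothetical addition that models leaving part of the volume free, since a nearly full volume performs poorly. Using decimal units, ten hours at 32MB/sec works out to roughly 1.15TB; figures computed with binary units come out slightly lower.

```python
def storage_required_tb(mb_per_sec: float, hours: float, headroom: float = 0.0) -> float:
    """Return the capacity in decimal TB needed for `hours` of footage.

    `headroom` is the fraction of the volume to keep free (hypothetical
    parameter for illustration; 0.2 means keep 20% of the volume empty).
    """
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in the range [0, 1)")
    total_mb = mb_per_sec * hours * 3600      # MB/sec x seconds of footage
    total_tb = total_mb / 1_000_000           # decimal MB -> TB
    return total_tb / (1 - headroom)          # grow the volume to keep space free


if __name__ == "__main__":
    # Ten hours of ProRes 422 HQ at the 32MB/sec quoted in the article.
    print(f"Footage only: {storage_required_tb(32, 10):.2f} TB")
    print(f"With 20% kept free: {storage_required_tb(32, 10, headroom=0.2):.2f} TB")
```

The 20% figure is only an example; how much free space a given storage system needs is something to confirm with its manufacturer.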
What host interface do I need on the storage?
There is a wide range of interfaces on the market, from the mass-market FireWire 800 (800Mb/sec), USB 2.0 (480Mb/sec), eSATA (3Gb/sec) and the new Thunderbolt (10Gb/sec) connections that provide ‘plug and play’ functionality, to enterprise connectivity such as 8Gb Fibre Channel (8Gb/sec) and 6Gb SAS, or Serial Attached SCSI (24Gb/sec over a four-lane connection), which require a host interface adapter in the computer. In general the plug and play interfaces offer lower performance but are easier to use. The exception is the new Thunderbolt interface, whose staggering 10Gb/sec host connection makes possible the 500-800MB/sec desktop RAIDs that are currently available. As a general rule of thumb, the plug and play connectivity types are great for codecs with low data rate requirements and for direct-attached environments: one storage device, one workstation (or laptop) and a single editor. Connectivity such as Fibre Channel (FC) is generally deployed in larger collaborative workflows in which multiple workstations access a large amount of centralised storage, an arrangement known as a Storage Area Network (SAN).
A note of caution with regard to the interface: just because the interface is capable of making a high-performance connection does not mean that the storage device will be capable of delivering that performance to the host. The performance of a solution is highly dependent on the number of hard disks in the system. Anyone expecting a single HDD with a Thunderbolt interface to be the answer to everything should take note!
NB: do not confuse your megabytes (MB) and megabits (Mb). A megabit is 1/8 of a megabyte, and the same goes for gigabytes (GB) and gigabits (Gb)!
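To make the bits-versus-bytes point concrete, the following Python sketch converts the nominal interface speeds quoted above from megabits to megabytes per second and lists which interfaces could, on paper, carry a given codec data rate. The function names are illustrative. The results are theoretical ceilings only: real-world throughput is lower because of protocol overhead, and, as noted above, the disks behind the interface may not be able to deliver it.

```python
from typing import List

# Nominal line rates in megabits per second (Mb/sec), as quoted in the article.
INTERFACE_MBIT_PER_SEC = {
    "USB 2.0": 480,
    "FireWire 800": 800,
    "eSATA": 3_000,
    "8Gb Fibre Channel": 8_000,
    "Thunderbolt": 10_000,
}


def megabits_to_megabytes(mbit: float) -> float:
    """Convert megabits to megabytes: there are 8 bits in a byte."""
    return mbit / 8


def interfaces_that_fit(required_mb_per_sec: float) -> List[str]:
    """List interfaces whose nominal rate meets the required MB/sec."""
    return [
        name
        for name, mbit in INTERFACE_MBIT_PER_SEC.items()
        if megabits_to_megabytes(mbit) >= required_mb_per_sec
    ]


if __name__ == "__main__":
    for name, mbit in INTERFACE_MBIT_PER_SEC.items():
        print(f"{name}: {mbit} Mb/sec = {megabits_to_megabytes(mbit):.0f} MB/sec")
    # Two layers of ProRes 422 HQ (64MB/sec) versus uncompressed HD (165MB/sec).
    print(interfaces_that_fit(64))
    print(interfaces_that_fit(165))
```

For example, FireWire 800 has a nominal ceiling of 100MB/sec, which is enough on paper for two layers of ProRes 422 HQ but not for uncompressed 1080i.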
What is RAID and do I need it?
RAID stands for “Redundant Array of Independent (or Inexpensive) Disks”. If you have been using hard disks for any length of time, you have almost certainly experienced a hard drive failure and may even have lost data as a result. One of the benefits of RAID is that it can be used to protect data.
Common RAID levels are RAID 0, RAID 1, RAID 5 and RAID 6, and each has different properties that give it relative advantages and disadvantages. RAID 0, sometimes referred to as a ‘stripe’, writes data across two or more disks. The more disks the data is written over, the more performance is gained. However, if one disk fails then all the data on the RAID is lost.
RAID 1 is sometimes referred to as a ‘mirror’: the same data is written to two disks simultaneously, so if one of the disks fails the data is still intact and readable. RAID 1 provides security but with a performance penalty, and only half of the raw capacity is usable.
RAID 5 and RAID 6 bring security and performance together by writing data across multiple disks simultaneously in a similar manner to RAID 0 (three or more disks for RAID 5, four or more for RAID 6), while also employing a clever feature called RAID parity. Parity information is data generated by the RAID engine about the data being written to the RAID set. If a disk fails within the RAID set, the RAID engine can re-create the missing data from the surviving data and the parity. RAID 5 incorporates a single layer of parity and can therefore tolerate a single disk failure, while RAID 6 features dual parity information so it can tolerate two disk failures within the RAID set. RAID 5 is now the de facto standard for multi-drive RAID solutions. For example, 8 x 1TB disks provide 7TB of usable capacity in RAID 5 and 6TB in RAID 6 (excluding formatting overhead).
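The Python sketch below illustrates both ideas under simplifying assumptions. The first function reproduces the usable-capacity arithmetic for the RAID levels described here. The second part demonstrates single parity using XOR, the operation commonly used for RAID 5 parity: XOR-ing the surviving blocks with the parity block yields the missing block. It is a toy model only. Real controllers work on fixed-size stripes and distribute parity across all the disks, and the second parity in RAID 6 uses a different calculation that is not shown.

```python
from functools import reduce
from typing import Sequence


def usable_capacity_tb(level: int, disks: int, disk_tb: float) -> float:
    """Usable capacity (before formatting overhead) for RAID 0, 1, 5 or 6."""
    minimum_disks = {0: 2, 1: 2, 5: 3, 6: 4}
    if level not in minimum_disks:
        raise ValueError("only RAID 0, 1, 5 and 6 are modelled")
    if disks < minimum_disks[level]:
        raise ValueError(f"RAID {level} needs at least {minimum_disks[level]} disks")
    if level == 0:
        return disks * disk_tb          # stripe: all capacity, no protection
    if level == 1:
        return disk_tb                  # mirror: every disk holds the same data
    parity_disks = 1 if level == 5 else 2
    return (disks - parity_disks) * disk_tb


def xor_blocks(blocks: Sequence[bytes]) -> bytes:
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks, one per disk, plus a parity block on a fourth disk.
data = [b"VIDEO", b"AUDIO", b"TITLE"]
parity = xor_blocks(data)

# Simulate losing the second disk and rebuilding it from the survivors.
rebuilt = xor_blocks([data[0], data[2], parity])

if __name__ == "__main__":
    print(usable_capacity_tb(5, 8, 1.0), "TB usable in RAID 5")
    print(usable_capacity_tb(6, 8, 1.0), "TB usable in RAID 6")
    print("Rebuilt block matches original:", rebuilt == data[1])
```

The rebuild works because XOR is its own inverse: combining the parity with every surviving block cancels them out and leaves only the lost one.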
What about shared storage?
Shared storage can be immensely helpful in increasing productivity, allowing multiple users to collaborate on the same projects. It can also make media easier to manage, because the media is stored centrally. There are typically two methods for providing facility-wide shared storage. The first is via a SAN and the second is via Network Attached Storage (NAS). SAN solutions typically use FC as the connectivity type, as this provides the highest-performance access to the centralised storage at a block level. In other words, it tricks the server or workstation (the host) into thinking it has a single high-capacity local disk attached. You are probably familiar with SAN solutions such as Apple Xsan, Tiger Technology metaSAN and Active Storage ActiveSAN. NAS uses Ethernet to connect to a centralised storage device at a file level: files copied to and from the NAS are broken up, wrapped in TCP/IP packets and sent over the network. In general NAS solutions provide lower-cost, lower-performance storage that is easier to install and manage than a SAN, whereas a SAN additionally requires optical cabling, host adapters and SAN management software. The advantages of a SAN are high performance, scalability and Quality of Service (QoS), or the ability to guarantee certain levels of performance to individual hosts. QoS is more difficult to achieve with a NAS. Other connectivity types for a SAN include iSCSI, FCoE and InfiniBand.
I have a tapeless workflow, so how do I keep my media long term?
Creative professionals commonly use FireWire-connected desktop storage devices such as those from Glyph Production Technologies and G-Technology. These are inexpensive, easy to deploy and able to handle ProRes 422 HQ editing, and as a result it is becoming common practice for these drives to be used to store media long term as well.
Using hard disks to store data long term is not a good idea: like any mechanical device, a hard disk will eventually break. When hard disks are powered on they constantly check the data on the disk and perform error correction. During long periods of inactivity the integrity of the data is not being checked or corrected, which can lead to degradation of the data on the disk. Inactivity can also lead to mechanical problems that cause the disk to fail. RAID systems can be used to keep data long term, but there is an associated cost in power, cooling and maintenance, and the disks will still fail at some point. As for using SSDs for long-term storage, they are still expensive, and likewise if a memory chip fails your data is gone.
So what is the alternative? Tape. To be precise, digital data tape in the form of Linear Tape-Open (LTO). LTO is now in its fifth generation, with a native (uncompressed) capacity of 1.5TB per tape, a read/write speed of 140MB/sec and a footprint of approximately 10 x 10 x 2 cm. LTO-5 media is very cost effective at under £50 per tape, and other advantages include the robustness and the lifespan of the media (approximately 30 years). The future of LTO is assured, with a road map stretching to LTO-8 at a massive 12.8TB per tape and a read/write speed of 472MB/sec.
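For planning purposes, the LTO-5 figures above can be turned into a quick estimate of how many tapes an archive needs and how long it takes to write. The Python sketch below is a back-of-envelope calculation only: it assumes decimal units, native (uncompressed) capacity, data that packs perfectly onto tapes, and a drive that sustains its full rated speed throughout. The function names and the `copies` parameter are illustrative.

```python
import math

LTO5_NATIVE_TB = 1.5       # native capacity per tape, from the article
LTO5_MB_PER_SEC = 140      # native read/write speed, from the article


def tapes_needed(archive_tb: float, copies: int = 1) -> int:
    """Number of LTO-5 tapes for `archive_tb` of data, times `copies` sets."""
    if archive_tb <= 0 or copies < 1:
        raise ValueError("archive_tb must be positive and copies at least 1")
    return math.ceil(archive_tb / LTO5_NATIVE_TB) * copies


def hours_to_write(archive_tb: float) -> float:
    """Hours to write one copy at the full native speed (a best case)."""
    seconds = archive_tb * 1_000_000 / LTO5_MB_PER_SEC   # decimal TB -> MB
    return seconds / 3600


if __name__ == "__main__":
    # A 7TB archive written twice, so that one set can be stored off site.
    print(tapes_needed(7, copies=2), "tapes")
    print(f"{hours_to_write(7):.1f} hours per copy")
    print(f"{hours_to_write(LTO5_NATIVE_TB):.1f} hours to fill one tape")
```

Writing a second set of tapes doubles the media count, but it means the archive is not held as a single copy.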
Cache-A manufactures cost-effective LTO-5 based archive appliances that make writing to and retrieving from tape easy for photography, videography, post-production and broadcast applications. Cache-A delivers small-footprint solutions connected via USB, FireWire and Ethernet, with a built-in searchable database to locate which tape the files are on and the location of the tape.
It is possible to scale the idea behind Cache-A by combining tape libraries from industry-leading manufacturers such as Spectra Logic with archive software products from Atempo and XenData. Such combinations of software and hardware provide highly scalable and efficient archive solutions that require a fraction of the electricity to run and cool, making them a greener and more cost-effective long-term archive than disk-based systems. It is common to combine high-performance disk systems with archive solutions to provide a balanced storage architecture.
In summary, keep your assets safe - if you only have one copy you risk everything. No data, no business.
Global Distribution is a value-added, specialist distributor with a wealth of knowledge and experience in providing storage and infrastructure solutions for data-intensive computing environments within the audio/video, broadcast, CCTV and high-performance computing (HPC) markets, across the UK and EMEA.
www.globaldistribution.com

© KitPlus (tv-bay limited). All trademarks recognised. Reproduction of this content is strictly prohibited without written consent.