Brightness and bit depth can be complex areas of the editing workflow to master, especially for those without formal training. This guide will give you all the information you need (and some of the more advanced information too) to help create perfectly balanced, pro-level media content.
Since the dawn of television, we've used an increasing signal level to represent higher brightness. Whether that's been a bigger voltage on a wire, a bigger radio signal over the air, or a bigger number in a computer file, more signal has meant more light. Even so, the way those values compare to the amount of light that went into the lens, or the amount of light that comes out of the monitor, is often not simple. That's where phrases like gamma correction, log, linear, and all the related terminology of modern video come in.
The term brightness refers to the apparent intensity of light as viewed by the human visual system. Since both the light on a film set and the light coming out of a monitor will be viewed by human eyes, or by a camera designed to simulate the human eye, what the systems between them are encoding will eventually be perceived as brightness.
For simplicity, we use the term brightness, although in many of the situations we'll discuss here, luminance is also correct.
Bit depth relates to colour, rather than brightness, representing the number of bits used to indicate the colour of a single pixel, or the number of bits used for each colour component of a single pixel.
The term luminance refers to a scientific measurement of a quantity of light, in candela per square metre, which is generally somewhat related to how bright something looks to humans. So, luminance is approximately related to brightness in a lot of situations, although the complexities of artistic and technical processing in the camera, and the variability of the human eye and brain, make things complicated.
Another confusingly-similar term, luma, refers specifically to the separated monochromatic portion of a component video image and is related to the encoding of brightness only as part of a compression codec.
Since the earliest computer imaging systems, we have used eight bits to represent the brightness of one red, green or blue component of a pixel. Most of us are familiar with the idea that 8 bits can represent 256 unique numbers, often called code values. In some situations, even with gamma encoding, that isn't enough. Where things go wrong, we see hard edges where there should be an even graduation of tone. The effect is sometimes called banding or posterisation, but more properly quantisation noise. The simple solution is to use more numbers, by adding more bits. While 8-bit images can give reasonable results with care, passing a client's quality control checks is easier with ten, which can describe up to 1024 brightness levels.
Images using more than ten bits are generally used either in a post production workstation to provide lots of working space in grading, or in proprietary in-camera recording of high-quality camera original material.
Adding one more bit doubles the range of values, since we can have all the old values with the new bit on, and all the old values again with the new bit off. Ten-bit images add two more bits to the 8-bit image, so we double the range of available values to 512, then again to 1024.
The generalised formula is $2^n$, where $n$ is the bit depth; $2^{10} = 1024$. Floating point images, often in 32-bit precision, use different techniques to deal with enormous extremes of brightness. They're usually found only in the memory of a post production workstation, where they provide enormous headroom for grading and effects work.
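As a quick illustration of that formula, the short Python sketch below lists the number of code values available at a few common integer bit depths. The function name `code_values` is purely illustrative and not part of any imaging library.

```python
def code_values(bit_depth: int) -> int:
    """Number of distinct code values an integer of this bit depth can hold."""
    return 2 ** bit_depth


for bits in (8, 10, 12, 16):
    count = code_values(bits)
    # The highest usable code value is always one less than the count,
    # because counting starts at zero.
    print(f"{bits}-bit: {count} code values, numbered 0 to {count - 1}")
```

Note that the largest code value is one less than the count, because the numbering starts at zero.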
Not all applications use the entire code value range for image data. In computer imaging, the whole range is generally used, with 0 representing black, and 255 representing the maximum signal level (or 0 and 1023 in 10-bit). This is sometimes called full swing, and contrasts with studio swing pictures, which treat code value 16 as black and 235 as white (64 and 940 in 10-bit).
The reasons behind this are historic and complex, dating back to the days of analogue video, and there isn't always an automatic way of determining which of these is in use in a particular situation. Problems with black levels being lifted, creating a greyish picture, or shadow or highlight details being clipped or crushed hint at problems with studio swing material being misinterpreted as full swing, or vice versa.
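The relationship between the two ranges can be sketched in a few lines of Python. This is a minimal illustration rather than production code: the function names are hypothetical, it handles only brightness (luma or RGB) values, and it assumes the studio swing black and white points scale with bit depth in the way the 8-bit and 10-bit figures above suggest.

```python
def swing_limits(bit_depth: int = 8) -> tuple[int, int, int]:
    """Return (studio black, studio white, full swing maximum) for a bit depth."""
    scale = 2 ** (bit_depth - 8)       # 1 for 8-bit, 4 for 10-bit
    return 16 * scale, 235 * scale, 2 ** bit_depth - 1


def studio_to_full(code: int, bit_depth: int = 8) -> int:
    """Stretch a studio swing code value out to the full code value range."""
    black, white, full_max = swing_limits(bit_depth)
    value = round((code - black) / (white - black) * full_max)
    # Values below studio black or above studio white are clipped.
    return min(max(value, 0), full_max)


def full_to_studio(code: int, bit_depth: int = 8) -> int:
    """Squeeze a full swing code value into the studio swing range."""
    black, white, full_max = swing_limits(bit_depth)
    return round(black + code / full_max * (white - black))


print(studio_to_full(16), studio_to_full(235))          # 8-bit black and white
print(studio_to_full(64, 10), studio_to_full(940, 10))  # 10-bit black and white
print(full_to_studio(0), full_to_studio(255))
```

If studio swing material is displayed without this stretch, its black sits at code value 16 rather than 0, which is the lifted, greyish look described above. Applying the stretch to material that is already full swing clips everything below 16 and above 235, which crushes shadows and highlights.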
The way we record brightness was not simple from the earliest days of television, mainly because our eyes do not react to light as we might expect. Shine a hundred lux of light at a subject, and it looks well-lit. Add another 100 lux, for a total of 200 lux, and the scene looks a lot brighter. However, add another hundred, for a total of 300, and the increase in brightness doesn't look so big, even though it's actually the same 100-lux increase.
This demonstrates that our eyes don't have a one-to-one response to light; they're nonlinear. For film and TV productions to look good, that's usually taken into account in a process called gamma encoding, so we don't waste lots of signal range encoding tiny changes in light level in very bright areas of the picture – tiny changes that our eyes can't see.
An interesting coincidence allowed early engineers to avoid some complexity in home television receivers. When TV pictures were first sent by radio, noise caused by radio and other electronic interference was most visible in the darkest parts of the image, a situation made worse by the human eye's greater contrast sensitivity in darker areas of a scene. So, the signal was processed before transmission to have brighter shadows, and made normal again in the receiving TV, reducing the noise significantly.
One of the great coincidences of broadcast television engineering is that the behaviour of the cathode ray tube in classical television receivers happened to almost exactly reverse the brightening of shadows in the camera, so no extra electronics were required in TVs to implement gamma processing. Modern TFT-LCD and OLED displays behave very differently, but the processing electronics to simulate the old approach are now trivial.
For most of the history of TV, there were only one or two ways of handling this situation - one or two ways to encode brightness. Modern cameras, though, have introduced manufacturer-specific brightness encodings designed to make better use of the particular characteristics of that camera's sensor. At the same time, updated ways to send TV to viewers in the home, particularly including HDR, have driven the development of even more ways to describe just how bright a pixel on the screen should be. There are now often several per camera manufacturer, and equipment including displays and grading software must often be told what to expect, or the picture will look too bright, too dark, or will show incorrect contrast.
The term gamma comes from the lowercase Greek letter, γ, used in the mathematical equation representing the in-camera picture processing of gamma correction. In this equation, the recorded brightness value L is raised to the power gamma, $L_{out} = L_{in}^{\gamma}$, creating a power-law curve. Few "gamma" processing devices actually work in such a literal manner, although the mathematics they use generally approximates the shape of a power-law curve.
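The equation can be tried out directly. The sketch below is a literal reading of it in Python, with brightness normalised so that 0.0 is black and 1.0 is peak white. The gamma values of 1/2.2 for the camera side and 2.2 for the display side are assumptions chosen for illustration; as noted above, real devices only approximate this shape.

```python
def apply_gamma(l_in: float, gamma: float) -> float:
    """Literal gamma equation: L_out = L_in raised to the power gamma.

    l_in is normalised brightness, 0.0 (black) to 1.0 (peak white).
    """
    return l_in ** gamma


CAMERA_GAMMA = 1 / 2.2    # assumed value: brightens shadows before recording
DISPLAY_GAMMA = 2.2       # assumed value: undoes that brightening on display

scene = 0.18                                    # a mid-grey, in linear light
recorded = apply_gamma(scene, CAMERA_GAMMA)     # value stored or transmitted
shown = apply_gamma(recorded, DISPLAY_GAMMA)    # value the display emits

print(f"scene {scene:.3f} -> recorded {recorded:.3f} -> shown {shown:.3f}")
```

With these assumed values, a mid-grey of 0.18 is recorded at a little under half of the signal range, and the display step returns it to the original level.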
Not all imaging systems use gamma correction; those that don't are sometimes termed linear. If we use 16 bits, we can choose from 65,536 brightness levels. That's sufficient that we can avoid quantisation noise even without gamma correction, although the storage space required, whether that's in a file or an image held in computer memory, is double what it would be for an 8-bit equivalent.
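One way to see why bit depth and gamma encoding trade off against each other is to count how many code values are available for the darkest part of the picture. The Python sketch below does that for everything darker than 1% of peak brightness. The function name, the 1% threshold and the gamma figure of 2.2 are all assumptions made for illustration.

```python
import math


def codes_below(fraction: float, bit_depth: int, gamma: float = 1.0) -> int:
    """Count the code values that describe light levels up to `fraction` of peak.

    gamma=1.0 means a linear encoding; gamma=2.2 means the light level is
    raised to the power 1/2.2 before being stored (an assumed, simplified curve).
    """
    max_code = 2 ** bit_depth - 1
    signal = fraction ** (1.0 / gamma)          # encoded signal, 0.0 to 1.0
    return math.floor(signal * max_code) + 1    # +1 because code 0 counts too


print("8-bit linear: ", codes_below(0.01, 8))
print("8-bit gamma:  ", codes_below(0.01, 8, gamma=2.2))
print("16-bit linear:", codes_below(0.01, 16))
```

Under these assumptions, 8-bit linear leaves only 3 code values for the deepest shadows, 8-bit with gamma encoding gives 32, and 16-bit linear gives 656, which is why linear data is normally stored at high bit depth.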
Specific implementation details of individual cameras affect how strictly linear a recording really is. Imaging sensors are generally more linear than the human eye, though other engineering details of cameras can affect how literally linear an image can be. In most cases, though, there will be camera-specific workflow requirements, with plugins supplied by the manufacturer to handle the camera's material. Common examples include the raw formats from many cameras, which may use linear encoding or something designed to work in the same way. Where a manufacturer wishes to express a fixed relationship between the amount of light which hits the sensor and the numbers in the file, the data may be referred to as linear light.
The term "dynamic range" can cause confusion in that it commonly applies to several distinct concepts. All of them concern the difference between maximum and minimum brightness values in different situations.
Dynamic range is an important figure of merit in cameras, where it describes the ratio of brightness between a very dark shadow that the camera might see as black and a bright highlight it would see as the brightest possible white. More dynamic range is generally considered better, a preference based largely on cultural memories of photochemical film, which handled a large range and failed gracefully, particularly at the top end of that range, minimising any unnatural appearance of overexposed highlights.
Camera dynamic range is invariably expressed in f-stops, each of which represents a successive doubling of light intensity. If a scene illuminated under 800 lux of light is well exposed with the lens set to f/8, it should be equally well exposed at f/5.6 under 400 lux of light. Halving the light requires reducing the f-stop by one step. In the same way, that same scene might be well exposed at f/4 under 200 lux. F-stops closely match what looks to the human eye like evenly-spaced steps in exposure.
That demonstrates the nonlinearity of the human eye (see Gamma encoding) and explains f-stop numbering, which can seem arbitrary - the numbers are related to the diameter of the aperture as its area doubles.
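The arithmetic behind stops can be sketched in Python. Each stop is a doubling of light, so the number of stops between two light levels is the base-2 logarithm of their ratio, and each stop multiplies the f-number by the square root of two because the aperture area depends on the square of its diameter. The function names here are illustrative only.

```python
import math


def stops_between(brighter_lux: float, darker_lux: float) -> float:
    """Number of stops (doublings of light) separating two light levels."""
    return math.log2(brighter_lux / darker_lux)


def f_number(stops_from_f1: int) -> float:
    """f-number reached after closing the aperture this many stops from f/1."""
    return math.sqrt(2) ** stops_from_f1


print(stops_between(800, 400))    # one stop
print(stops_between(800, 200))    # two stops

for stops in range(4, 7):
    # Lenses are marked with rounded values of these figures.
    print(f"{stops} stops from f/1: f/{f_number(stops):.2f}")
```

The unrounded values for four, five and six stops from f/1 are roughly 4, 5.66 and 8, which lenses mark as f/4, f/5.6 and f/8.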
The dynamic range of displays - the difference between the darkest and brightest pixels they can create - is invariably a lot less than the real world. Displays can also generally show much less than most cameras can record, which is why the image from many cameras, recorded in their high dynamic range or log modes (see Log Style Camera Standards), appears flat and dull when viewed on a conventional monitor.
The bright highlights the camera can perceive are reduced in brightness to the maximum the display can represent, which is not nearly as bright as they were in reality. Part of the job of the colourist, then, is to manipulate the brightness of the high dynamic range recorded image so that it looks subjectively reasonable on the much lower dynamic range display.
The absolute brightest light emitted by a monitor was not standardised until the adoption of ITU-R standard BT.1886 in March 2011. Until then, the actual brightness of a monitor was defined largely by common practice and the behaviour of the available technology (the falloff in brightness from that arbitrary brightest point to black was, however, well standardised).
Now, the standard for most home TVs recommends a brightness a little over 100 candela per square metre, which is designed for much darker viewing environments than most people actually use, especially in the home. Many TVs are three times that bright, and computer monitors, designed for the brightness of an office, may be even more powerful, and yet still not thought of as HDR displays.
Digital images themselves have a dynamic range, which sometimes causes confusion. In an 8-bit file, a value of 1 represents the minimum grey level which is distinguishable from black, while a value of 255 represents the brightest white. It might seem that this means that the brightest part of the image can only be 255 times brighter than the darkest part, although the use of gamma encoding means that isn't always quite true when the image hits a monitor - the extremes of contrast are likely to be stretched out, although if we do that too much, we risk quantisation noise (see bit depth).
A physicist might prefer to describe dynamic range in decibels, which makes things easier to compare. An 8-bit file theoretically has a dynamic range of about 48dB, though gamma encoding can change how that looks when it's displayed.
Comparatively, a common cinema camera might have 13 photographic f-stops of dynamic range, equivalent to perhaps 78dB. Notice that f-stops on lenses (see Cameras) and bits in a digital file (see Bit Depth) both refer to successively doubling quantities. For a rough rule of thumb, multiply bit depth or f-stop range by 6 to estimate dynamic range in decibels.
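The rule of thumb can be checked with a short Python sketch. It assumes the usual convention for signal levels, in which a ratio is converted to decibels as 20 times its base-10 logarithm; the function name is illustrative.

```python
import math


def doublings_to_db(doublings: float) -> float:
    """Dynamic range in decibels for a given number of doublings.

    `doublings` can be a bit depth or a range in f-stops. Assumes the
    20 * log10(ratio) convention used for signal amplitudes.
    """
    ratio = 2 ** doublings
    return 20 * math.log10(ratio)


print(f"8-bit file:     {doublings_to_db(8):.1f} dB")
print(f"13-stop camera: {doublings_to_db(13):.1f} dB")
print(f"one doubling:   {doublings_to_db(1):.2f} dB")
```

Each doubling is worth about 6.02 dB, which is where the multiply-by-6 shortcut comes from.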
While a camera or display manufacturer could choose any reasonable method of relating the numbers in a digital image to the brightness of the picture, some standards have become particularly well known.
The proliferation of standards in this area is a phenomenon of the last decade or two, when new technologies have emerged alongside the long-established standards of the mid to late twentieth century.
By far the most common target for modern production is the oft-mentioned "Rec. 709." The International Telecommunication Union's Recommendation BT.1886 is almost a supplement to 709, and the two standards together describe how brightness is handled in modern HD television broadcast.
Slightly reduced brightness of shadow areas was specified to reduce the visibility of noise in historic cameras and because the anticipated viewing environment of common televisions was intended to be reasonably dark. Targeting only a 709 finish keeps things simple, although clients increasingly demand more, particularly HDR.
Displays on computers, and associated devices such as tablets and cellphones, often use the sRGB standard. sRGB is numerically similar to ITU Recommendation BT.709, but it was developed with noise-free computer displays and brighter office environments in mind, and lacks the slightly crushed shadows of the 709 gamma curve.
Shadow handling differs in some common television standards (see sRGB), so material finished on a display built to implement Rec. 709 (see ITU Recommendation BT.1886) may seem to have lifted, greyish shadows on an sRGB display. The mismatch is sometimes ignored without catastrophic problems, but the result is not strictly accurate.
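The difference in shadow handling can be illustrated by decoding the same signal values with the two display responses. The Python sketch below is a simplification under stated assumptions: the BT.1886 response is reduced to a pure power law with an exponent of 2.4, which holds only for an idealised display with perfect black, and both functions are normalised so that 1.0 is peak white rather than an absolute brightness. The function names are illustrative.

```python
def bt1886_display(signal: float, gamma: float = 2.4) -> float:
    """Simplified BT.1886 display response.

    Assumes an ideal display with zero black level, in which case the
    standard's formula reduces to a pure power law.
    """
    return signal ** gamma


def srgb_display(signal: float) -> float:
    """sRGB display response: a short linear segment near black,
    then an offset power curve."""
    if signal <= 0.04045:
        return signal / 12.92
    return ((signal + 0.055) / 1.055) ** 2.4


print("signal  BT.1886   sRGB")
for signal in (0.05, 0.1, 0.2, 0.5, 1.0):
    print(f"{signal:5.2f}  {bt1886_display(signal):8.5f}  {srgb_display(signal):8.5f}")
```

For every signal value below white in this list, the sRGB response emits more light than the simplified BT.1886 response, with the largest relative difference in the shadows. That is consistent with shadows looking lifted when material graded for a Rec. 709 display is viewed on an sRGB one.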