Philips Semiconductors' MELZONIC Chip - Technology Backgrounder
Many of the latest high-performance TV sets, particularly large screen and wide screen (16:9 format) versions, use a 100-Hz screen refresh rate instead of the normal 50-Hz refresh. This is primarily to avoid screen flicker, which becomes more perceptible and annoying to the viewer as the screen size increases. However, because standard TV broadcast transmissions only contain 50 fields per second, 100-Hz TVs must somehow double the number of fields available. They must also be able to do this in real time.
With the arrival of digital TVs and the availability of relatively cheap field memories (memories with the capacity to store a digitized version of a complete TV field), a very simple method of 50 Hz to 100 Hz field rate conversion became possible. Each TV field could be written once to a field memory at 50 fields per second, and read out from it twice at a field rate of 100 Hz. There are already TV sets on the market that use this technique. (It is interesting to note that the 60 Hz refresh rate used in the USA and some other parts of the world is sufficiently high to eliminate screen flicker. As a result, field-rate doubled TVs for these 60 Hz markets are far less common.)
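The write-once, read-twice scheme can be sketched in a few lines of Python. This is only an illustration: `FieldMemory` and `upconvert` are hypothetical names, fields are represented by labels rather than pixel data, and interlace correction is ignored.

```python
class FieldMemory:
    """Toy model of a field memory: holds one complete digitized field."""

    def __init__(self):
        self._field = None

    def write(self, field):
        self._field = field

    def read(self):
        return self._field


def upconvert(fields_50hz):
    """Yield each incoming 50 Hz field twice, giving a 100 Hz sequence."""
    memory = FieldMemory()
    for field in fields_50hz:
        memory.write(field)   # one write per 20 ms input field period
        yield memory.read()   # first read-out (10 ms output field period)
        yield memory.read()   # second read-out of the same stored field


print(list(upconvert(["A", "B", "C"])))  # ['A', 'A', 'B', 'B', 'C', 'C']
```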
While this simple field repetition technique is very effective at removing screen flicker, it does not remove the 'motion judder' that occurs with TV programmes derived from cine film (telecine transmissions). Motion judder is a direct consequence of the relatively low frame rate used by cine film. It occurs when the film is viewed directly at the cinema, or when it is viewed as a telecine transmission on any existing 50-Hz or 100-Hz TV. It is virtually absent from TV programmes recorded with a video camera.
Video cameras are an integral part of the TV broadcast system, and therefore scan an optical image at exactly the same rate at which standard TV sets display the image. For European TV systems this is 50 fields per second. If the image contains moving elements, each one of these fields represents a different movement phase - i.e. a field in which the spatial position of moving objects is different. Up-converting the TV scan from 50 fields per second to 100 fields per second using a simple field repetition technique results in each movement phase being displayed twice. Although this means that moving objects appear slightly displaced from their true spatio-temporal (time-space) positioning in the repeated movement phases, the effect is almost unnoticeable to the human eye. This is especially so if adjustments are made to maintain the correct odd-even-odd interlacing of sequential TV fields.
Although doubling the field rate by repeating each TV field results in displacement of moving objects from their true spatial positions, this is relatively unnoticeable to the viewer if the image was captured using a video camera.
However, the situation is entirely different for telecine transmissions. For historical (and largely electro-mechanical) reasons, standard cine cameras use only 24 movement phases per second (frames per second) to record moving pictures. It is this relatively low refresh rate that results in the effect known as 'motion judder'. (To avoid excessive motion judder, skilled film directors deliberately limit the speed of movements within a scene, although this is not always possible with modern fast-action movies.)
Cine film is normally converted for TV use by running it at 25 frames per second and then scanning each frame twice to achieve the 50-Hz field rate required for TV transmissions. As a result, telecine transmissions already contain adjacent pairs of identical fields. When up-converted to a field rate of 100 Hz using a simple field repetition algorithm, the result is sequential groups of four identical TV fields. Although screen flicker is eliminated, a significant amount of motion judder and picture blurring remains.
Field rate doubling of telecine material by field repetition results in three out of four fields displaying objects in the wrong positions, leaving motion judder and blurring uncorrected.
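The difference between camera and film material under field repetition can be made concrete with a small calculation. The sketch below assumes an object moving at a constant speed and measures time in 100 Hz output field periods; `position_errors` and its parameters are illustrative names only.

```python
def position_errors(num_fields, repeats, speed=1):
    """Displacement of a moving object from its true position in each 100 Hz field.

    repeats: number of consecutive output fields that show the same movement
             phase (2 for camera material, 4 for 25 frame/s telecine material).
    speed:   object speed in pixels per 100 Hz field period (assumed constant).
    """
    errors = []
    for n in range(num_fields):
        shown_phase = (n // repeats) * repeats  # instant at which the displayed phase was captured
        errors.append(speed * (n - shown_phase))
    return errors


camera = position_errors(8, repeats=2)
film = position_errors(8, repeats=4)
print(camera)  # [0, 1, 0, 1, 0, 1, 0, 1]
print(film)    # [0, 1, 2, 3, 0, 1, 2, 3]
```

Under these assumptions, camera material is misplaced in every second field by one field period of motion, whereas film material is misplaced in three fields out of four, by up to three field periods of motion.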
In practice, motion judder is the only major artefact that occurs in pictures displayed on a 100-Hz TV set. Its elimination promises pictures that are perceptibly better than the original cine material.
Over the past four years, Philips Research Laboratories in Eindhoven, The Netherlands - a recognised world leader in vision processing systems - has developed an innovative 'motion compensated' field-rate conversion technique that overcomes this inherent problem with telecine material. By creating additional movement phases between those contained in the TV transmission, in which moving objects appear with their correct spatio-temporal positioning, this technique brings telecine material much closer to the quality of video camera material.
By replacing the TV transmission's repeated telecine fields with motion compensated fields, telecine material is brought much closer to the quality of video camera material.
This new technology displays cine films with considerably more natural motion than is achieved by standard 50-Hz or 100-Hz TV sets. It even produces better pictures than can be seen at the cinema.
From the outset, the need for ultimate realisation on VLSI silicon was a prime design requirement in this development work. Because of this requirement, the algorithm used to interpolate the new 'intermediate' movement phases from the movement phases contained in the TV transmission had to meet the following criteria:
- it must use the minimum number of field and line store memories, as these memories take up a considerable area of silicon.
- it must be able to achieve real-time interpolation of intermediate movement phases using a realistic level of computing power (number of operations per second).
- it must avoid the use of arithmetic operations which require complex hardware implementations.
After exhaustive investigation and computer simulation, researchers at Philips developed a totally new technique for motion estimation which they have called '3-D Recursive Search Block-Matching'. By analysing two successive TV fields to locate blocks of pixels in the second field that match blocks in the first, 3-D Recursive Search Block-Matching is able to assign a velocity vector to each block of pixels in the first field. These velocity vectors can then be used to interpolate the correct spatial position of each pixel block in a new field that is positioned temporally between the two original fields - i.e. to create new movement phases. Because 3-D Recursive Search Block-Matching requires several iterations before it converges to a correct result, methods were needed to speed up the convergence so that the necessary velocity vectors could be generated within the real time constraints imposed by the field rate. In the event that convergence is not reached within these time constraints, resulting in a seriously impaired vector field, the system must also be able to 'gracefully' degrade to a simple field repetition algorithm before artefacts that are more disturbing than motion judder appear in the TV picture.
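To make the idea of assigning a vector to a pixel block concrete, the following sketch matches one block of the current field against the previous field. The text does not name the match criterion; the sum of absolute differences (SAD) used here is a common choice and is an assumption. The exhaustive search shown is a plain brute-force baseline, not the 3-D Recursive Search itself, and all function names and the synthetic test pattern are hypothetical.

```python
def make_field(width, height, shift=0):
    """Synthetic test field: a deterministic pattern moved right by `shift` pixels."""
    return [[((x - shift) * 7 + y * 13) % 31 for x in range(width)]
            for y in range(height)]


def sad(prev, curr, x0, y0, size, vec):
    """SAD between the block at (x0, y0) in curr and the block it would have
    come from in prev if it had moved by vec = (dx, dy)."""
    dx, dy = vec
    h, w = len(prev), len(prev[0])
    total = 0
    for y in range(y0, y0 + size):
        for x in range(x0, x0 + size):
            py = min(max(y - dy, 0), h - 1)  # clamp at the field edges
            px = min(max(x - dx, 0), w - 1)
            total += abs(curr[y][x] - prev[py][px])
    return total


def full_search(prev, curr, x0, y0, size, radius):
    """Brute force: test every vector within +/- radius pixels."""
    vectors = [(dx, dy)
               for dy in range(-radius, radius + 1)
               for dx in range(-radius, radius + 1)]
    return min(vectors, key=lambda v: sad(prev, curr, x0, y0, size, v))


prev_field = make_field(16, 8)
curr_field = make_field(16, 8, shift=2)   # whole picture moved 2 pixels right
print(full_search(prev_field, curr_field, x0=6, y0=2, size=4, radius=2))  # (2, 0)
```

Even with a radius of only 2 pixels the brute-force search evaluates 25 vectors per block, which illustrates why a method that tests only a handful of well-chosen vectors is attractive for real-time hardware.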
One way of improving the convergence is to 'seed' the 3-D Recursive Search Block-Matching algorithm with prediction vectors derived from spatially adjacent pixel blocks. Essentially, this assumes that areas of the picture that are large compared to the pixel block size will be moving in approximately the same direction. An obvious exception is the boundary between moving and stationary parts of the picture, or between parts of the picture that are moving in opposite directions. Failure to converge on the correct motion vectors on either side of these boundaries is overcome by simultaneously seeding the algorithm with vectors previously derived for pixel blocks on either side of the pixel block being evaluated (i.e. by attempting convergence in opposite directions). The correct direction is the one that converges first. The prediction vectors are taken from spatially related pixel blocks, which are specifically chosen because the TV raster scans from the top to the bottom of the screen: vectors for these blocks will already have been evaluated before they are required as prediction vectors for the pixel block currently being evaluated.
Pixel blocks from which 'candidate' vectors are taken to speed up convergence in the 3-D Recursive Search Block-Matcher.
In addition to these spatial prediction vectors, which are derived from the same field, the 3-D Recursive Search Block-Matching algorithm is also seeded with temporal prediction vectors derived from the previous field. The pixel blocks from which these temporal prediction vectors are taken are chosen so that the vectors act as a 'look ahead' in the convergence direction of the spatial prediction vectors. In practice, this scheme is extended with two further 'candidate' vectors - the zero vector (to take account of the fact that a pixel block may not have moved) and a vector obtained by adding a random vector to the spatial prediction vector. The 3-D Recursive Search Block-Matching algorithm determines which of these candidate vectors provides the best match between the pixel block being evaluated and a pixel block in the previous field. The chosen vector is then associated with the pixel block by being stored at an appropriate location in a 'vector field' memory.
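The candidate selection described above can be sketched as follows. This is a minimal illustration under stated assumptions: the match error is a sum of absolute differences, the random update is drawn from a small hypothetical set of one-pixel steps, and the spatial and temporal predictions are simply passed in as lists rather than read from particular neighbouring blocks. None of the names come from the Philips design.

```python
import random


def make_field(width, height, shift=0):
    """Synthetic test field: a deterministic pattern moved right by `shift` pixels."""
    return [[((x - shift) * 7 + y * 13) % 31 for x in range(width)]
            for y in range(height)]


def match_error(prev, curr, x0, y0, size, vec):
    """Sum of absolute differences for one candidate vector (assumed criterion)."""
    dx, dy = vec
    h, w = len(prev), len(prev[0])
    total = 0
    for y in range(y0, y0 + size):
        for x in range(x0, x0 + size):
            py = min(max(y - dy, 0), h - 1)
            px = min(max(x - dx, 0), w - 1)
            total += abs(curr[y][x] - prev[py][px])
    return total


def candidate_set(spatial, temporal, rng,
                  updates=((1, 0), (-1, 0), (0, 1), (0, -1))):
    """Spatial and temporal predictions, the zero vector, and the first
    spatial prediction plus a random update vector."""
    ux, uy = rng.choice(updates)
    sx, sy = spatial[0]
    return list(spatial) + list(temporal) + [(0, 0), (sx + ux, sy + uy)]


def best_candidate(prev, curr, x0, y0, size, candidates):
    """Pick the candidate with the lowest match error for this block."""
    return min(candidates,
               key=lambda v: match_error(prev, curr, x0, y0, size, v))


rng = random.Random(0)
prev_field = make_field(16, 8)
curr_field = make_field(16, 8, shift=2)   # true motion is (2, 0)
cands = candidate_set(spatial=[(1, 0)], temporal=[(2, 0)], rng=rng)
print(best_candidate(prev_field, curr_field, 6, 2, 4, cands))  # (2, 0)
```

Only four vectors are tested for the block in this example. In a full estimator the winning vector would be written to the vector field memory and would in turn serve as a prediction for later blocks and for the next field.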
In order to create the 'motion compensated' intermediate fields required to improve telecine transmissions, the vectors in the vector field memory are used to shift each pixel block in the previous TV field to a new position half way between its position in the previous field and its position in the next field.
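A simplified view of this interpolation step is sketched below. It assumes whole-pixel shifts with even-valued vectors, one vector per square block, and a 'fetch' formulation in which each output pixel is read from the previous field at a position displaced by half the block's vector. The function name and data layout are hypothetical.

```python
def interpolate_midfield(prev, vectors, block):
    """Build a field half way in time between prev and the next field.

    prev:    previous field as a list of rows of pixel values
    vectors: vectors[by][bx] = (dx, dy) motion of each block between the
             previous and the next field (assumed even, whole pixels)
    block:   block size in pixels
    """
    h, w = len(prev), len(prev[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = vectors[y // block][x // block]
            sx = min(max(x - dx // 2, 0), w - 1)  # half the displacement, clamped
            sy = min(max(y - dy // 2, 0), h - 1)
            out[y][x] = prev[sy][sx]
    return out


# A bright pixel at x = 1 that moves 4 pixels to the right between fields.
prev_field = [[0, 9, 0, 0, 0, 0, 0, 0],
              [0, 9, 0, 0, 0, 0, 0, 0]]
mid = interpolate_midfield(prev_field, vectors=[[(4, 0)]], block=8)
print(mid[0])  # [0, 0, 0, 9, 0, 0, 0, 0]
```

The object appears at x = 3, half way between its position in the previous field (x = 1) and its position in the next field (x = 5).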
For telecine transmissions, these motion compensated fields replace the repeated fields that are produced as a result of the cine film scanning technique used prior to broadcasting, and the field rate is then up-converted from 50 Hz to 100 Hz by an interlace corrected field repetition process. For video camera material, however, the motion compensated fields are inserted between those in the original TV transmission - providing motion compensated field rate conversion right up to 100 Hz.
The motion compensator which creates and inserts the new fields must therefore be able to determine whether telecine or video camera material is being received by the TV set. For telecine transmissions it must also be able to detect which fields are repeated fields and which fields are true movement phases (i.e. original film frames). Fortunately these conditions can easily be detected by examining the output of the 3-D Recursive Search Block-Matcher. For video camera material the block matcher continuously produces a vector field containing zero vectors for static pictures (such as test cards) or non-zero vectors for moving pictures. For telecine material the vector field alternates between zero-vectors for repeated fields and non-zero vectors for fields that represent true movement phases. The contents of the vector field can therefore be used to switch the motion compensator into the correct mode of operation and to detect the true movement phases in telecine material.
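The mode decision described above might be sketched like this. It is a deliberately naive illustration: a real detector would need thresholds and hysteresis to cope with noisy vectors, and the function names, the four-field minimum and the mode labels are all assumptions.

```python
def has_motion(vector_field):
    """True if any block in the vector field has a non-zero vector."""
    return any(v != (0, 0) for row in vector_field for v in row)


def classify(motion_history):
    """Classify the source from per-field motion flags (oldest first)."""
    if not any(motion_history):
        return "static"
    alternating = all(a != b
                      for a, b in zip(motion_history, motion_history[1:]))
    if alternating and len(motion_history) >= 4:
        return "film"    # zero and non-zero vector fields alternate
    return "video"       # motion detected without the alternating pattern


def true_phase_fields(motion_history):
    """For film, indices of fields that carry a new movement phase."""
    return [i for i, moved in enumerate(motion_history) if moved]


film_flags = [has_motion(vf) for vf in (
    [[(3, 0)]], [[(0, 0)]], [[(3, 0)]], [[(0, 0)]])]
print(classify(film_flags))              # film
print(true_phase_fields(film_flags))     # [0, 2]
print(classify([True, True, True, True]))  # video
```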
Examination of the vector field for temporal and spatial smoothness is also a good measure of the quality of the motion vectors. This smoothness, or lack of it, can therefore be used to decide if it is necessary to switch off the motion compensation and revert to a simple field-repetition algorithm. Although rare in normal programme material, situations will arise where the vector field is degraded to the point where visually annoying artefacts would otherwise appear in the picture.
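One possible form of such a smoothness check is sketched below. It measures spatial smoothness only, as the mean absolute difference between horizontally and vertically neighbouring vectors; the metric, the threshold value and the function names are assumptions chosen for illustration, and a temporal check against the previous vector field could be added in the same way.

```python
def spatial_roughness(vector_field):
    """Mean absolute difference between neighbouring block vectors."""
    rows, cols = len(vector_field), len(vector_field[0])
    total = 0
    count = 0
    for r in range(rows):
        for c in range(cols):
            for rr, cc in ((r, c + 1), (r + 1, c)):  # right and lower neighbour
                if rr < rows and cc < cols:
                    ax, ay = vector_field[r][c]
                    bx, by = vector_field[rr][cc]
                    total += abs(ax - bx) + abs(ay - by)
                    count += 1
    return total / count if count else 0.0


def use_motion_compensation(vector_field, threshold=2.0):
    """Fall back to field repetition when the vector field is too rough."""
    return spatial_roughness(vector_field) <= threshold


smooth = [[(2, 0), (2, 0)], [(2, 0), (2, 0)]]
noisy = [[(8, -8), (-8, 8)], [(-8, 8), (8, -8)]]
print(spatial_roughness(smooth), use_motion_compensation(smooth))  # 0.0 True
print(spatial_roughness(noisy), use_motion_compensation(noisy))    # 32.0 False
```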
The silicon realisation of Philips' natural motion technology takes the form of the SAA4991WP MELZONIC Video Signal Processor. Its internal architecture, illustrated in Figure 5, includes sub-processors for motion estimation (the 3-D Recursive Search Block-Matcher), vector field storage and processing, and motion compensation (interpolation of intermediate fields). A fourth 'top-level' processor synchronises the data flow between the sub-processors and peripheral functions such as input and output data formatting and I/O operations. The SAA4991WP also includes noise reduction and vertical zoom functions, as these can be implemented relatively easily using the field and line memory architecture required for motion estimation and compensation.
Meeting the real-time constraints of the motion estimation and compensation process leads to some impressive performance figures for this IC. Total on-chip processing, for example, amounts to a computing power of 10 giga-operations per second (GOPS) and the overall memory bandwidth achieved is 25 Gbit/s. To achieve acceptable silicon area and power dissipation, four different types of SRAM and DRAM memory are implemented on-chip. The SAA4991WP contains nearly one million transistors, yet it dissipates only 1.8 W of power at a clock frequency of 33 MHz.
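These figures imply a high degree of on-chip parallelism, which a quick calculation makes visible. The sketch assumes that all of the quoted operations and memory traffic are clocked at the quoted 33 MHz, which the text does not state explicitly.

```python
ops_per_second = 10e9          # 10 GOPS
bandwidth_bits_per_s = 25e9    # 25 Gbit/s
clock_hz = 33e6                # 33 MHz

ops_per_cycle = ops_per_second / clock_hz
bits_per_cycle = bandwidth_bits_per_s / clock_hz

print(round(ops_per_cycle))    # 303 operations per clock cycle
print(round(bits_per_cycle))   # 758 bits transferred per clock cycle
```

On that assumption the chip performs roughly 300 operations and moves roughly 760 bits of data in every clock cycle.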
The architecture of the SAA4991WP Video Signal Processor
The SAA4991WP's development cycle was also impressive. The entire design process, starting with a behavioural description and ending up with working silicon, was completed in under a year. Philips' Phideo toolset was used to generate a register transfer level (RTL) description from the behavioural model, and the RTL description was then translated to a gate level netlist using standard logic synthesis tools. The behavioural synthesiser identified the clock cycles at which each arithmetic or logical operation had to be performed and from this information the memories required to store intermediate results were inferred. A top-level processor was then synthesised to control and synchronise the overall architecture.
