MPEG-2 and non-linear video editing. The complex made simple


Lately, video professionals talk more and more about MPEG-2 encoding. Interest in it is growing almost faster than the scope of its practical application. Whether we are looking for an effective compression algorithm for non-linear editing and video production, thinking about creating our own DVD films or digital video archives, analyzing the principles of Betacam SX recording or data storage standards on video servers, or discussing digital transmission of programs in cable and satellite television, everywhere we come across MPEG-2. This list alone shows how many-sided the algorithm is, so experts from different fields sometimes mean different things when they talk about MPEG. In fact, its basic principles are not hard to understand. So let's figure it out.

Basic concepts

Let me remind you that MPEG stands for Moving Picture Experts Group, the name of the committee of the international organization ISO/IEC (International Organization for Standardization / International Electrotechnical Commission) that standardizes methods for digital compression of video data streams. Initially, the committee's task was to develop a format for storing and playing audio/video data from CD-ROMs. The result was the MPEG-1 standard, aimed at low-speed (about 1 Mbit/s) transmission channels and limited to a frame resolution of 352 x 288 (for a PAL signal). Then, as video transmission tasks expanded, channel bandwidth increased and the requirements for visual quality grew, MPEG-2, MPEG-4 and even MPEG-7 appeared, each optimized for its own conditions.

Thus, MPEG-4 is intended mainly for digital transmission of video data over telephone lines (Internet, videoconferencing) under severely limited bandwidth (typically 28.8 Kbps). It therefore reduces the resolution by another factor of four, to 176 x 144, but uses the most advanced coding scheme, which separates the image into independent objects such as background, text, 2D/3D graphics, "talking" human faces, moving bodies, etc. Because of its obvious complexity, this standard has not yet been implemented in practice.

As for MPEG-2, it was originally aimed at solving the problem of transmitting television images. Each of us knows from experience that the quality of the picture seen on TV can vary widely. It is one thing to watch a movie played on a home VCR or broadcast on local cable TV, and quite another to enjoy video from a DVD or a satellite channel. MPEG-2, as defined in ISO/IEC 13818-2, is a family of mutually consistent, downward-compatible digital television compression standards. More precisely, it allows 4 levels of frame resolution and 5 basic profiles for coding the luminance and chrominance signals.

Levels: Low (LL, Low Level) with a frame resolution of 352 x 288 (compatible with MPEG-1), Main (ML, Main Level) with 720 x 576, High-1440 (HL-1440, High Level) with 1440 x 1152 and High-1920 (HL-1920) with 1920 x 1152. Note that the Main Level corresponds to the resolution of a standard television frame according to Recommendation ITU-R BT.601 (International Telecommunication Union, Radiocommunication Sector), while the high levels are aimed at high-definition television.

Profiles: Simple (SP, Simple Profile), Main (MP, Main Profile), two scalable profiles - one by signal-to-noise ratio (SNR Scalable Profile) and one by resolution (Spatially Scalable Profile) - and finally High (HP, High Profile). An important place is also occupied by the so-called professional profile, or MPEG 4:2:2 profile, which is not among these five but is actively used in practice. It is designated 422P. While the levels are quite simple, understanding the differences between the profiles requires some preliminary explanation.

Some theory

Effective compression of video information is based on two main ideas: suppressing small spatial details within individual frames that are not essential for visual perception, and eliminating temporal redundancy in the sequence of these frames. Hence the concepts of spatial and temporal compression.

The first of them relies on the experimentally established low sensitivity of human perception to distortions of small image details. The eye notices non-uniformity in a flat background more quickly than the curvature of a thin border or a change in the brightness and color of a small area. Mathematically, an image has two equivalent representations: the usual spatial distribution of brightness and color, and the so-called frequency distribution obtained with the spatial discrete cosine transform (DCT). In theory they are equivalent and reversible, but they hold information about the image structure in completely different ways: smooth changes in the background are carried by the low-frequency values of the frequency distribution, while the high-frequency coefficients are responsible for fine details of the spatial distribution.

This allows the following compression algorithm to be used. The frame is divided into macroblocks of 16 x 16 elements (a 720 x 576 frame corresponds to 45 x 36 macroblocks), which are in turn split into 8 x 8 blocks; the DCT translates each block into the frequency domain. The resulting frequency coefficients are then quantized (their values are rounded with a specified step). While the DCT itself does not lose data, quantization of the coefficients inevitably coarsens the image. Quantization is performed with a variable step: low-frequency information is transmitted most accurately, while many high-frequency coefficients become zero. This compresses the data stream significantly, but reduces the effective resolution and may produce minor false details (in particular, at block boundaries). Obviously, the coarser the quantization, the greater the compression ratio, but the lower the quality of the resulting signal.
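The transform-and-quantize step can be tried on a single 8 x 8 block of samples, the block size on which the MPEG-2 DCT operates. The sketch below is only an illustration: the quantization steps (`base_step`, `slope`) are hypothetical values chosen for the example, not the MPEG-2 default matrices, and the DCT is written in its direct, unoptimized form.

```python
import math

N = 8  # the DCT is applied to N x N blocks of samples


def dct_2d(block):
    """Forward 2-D DCT-II of an N x N block (direct, unoptimized form)."""
    def c(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for y in range(N):
                for x in range(N):
                    s += (block[y][x]
                          * math.cos((2 * y + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * x + 1) * v * math.pi / (2 * N)))
            out[u][v] = c(u) * c(v) * s
    return out


def quantize(coeffs, base_step=8, slope=4):
    """Round each coefficient with a step that grows with frequency (u + v).

    The step values are hypothetical, picked only for this illustration.
    """
    return [[round(coeffs[u][v] / (base_step + slope * (u + v)))
             for v in range(N)] for u in range(N)]


# A smooth block: brightness rises gently from left to right.
smooth = [[100 + 2 * x for x in range(N)] for _ in range(N)]
levels = quantize(dct_2d(smooth))
nonzero = sum(1 for row in levels for q in row if q != 0)
print(f"non-zero quantized coefficients: {nonzero} of {N * N}")
```

For a smooth block like this one, almost all quantized coefficients come out as zero, which is what makes the subsequent coding of the block compact.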

Let me remind you that this algorithm came from digital photography, where it was developed for efficient compression of individual frames under the name JPEG (after the Joint Photographic Experts Group, the committee that approved it). It was then successfully applied to video sequences, with each frame processed completely independently, and received the new name M-JPEG (Motion-JPEG). Note also that the DV encoding used by the DV/DVCAM/DVCPRO digital formats is fundamentally based on the same algorithm, but uses a more flexible scheme with adaptive selection of quantization tables. Unlike M-JPEG, the compression ratio varies from block to block according to the image: for blocks with little information (for example, at the edges of the image) it increases, and for blocks with many small details it decreases relative to the average level for the image. As a result, the amount of data is reduced by about 15% at the same quality (or, conversely, the output quality is higher at the same stream).

MPEG temporal compression exploits the high information redundancy of images separated by a short interval. Indeed, usually only a small part of the scene changes between adjacent images: for example, a small object moves smoothly against a fixed background. In this case, complete information about the scene needs to be saved only selectively, for reference images. For the rest, it is sufficient to transmit only differential information: the position of the object, the direction and magnitude of its displacement, and new background elements (which are uncovered behind the object as it moves). Moreover, these differences can be formed not only relative to previous images but also relative to subsequent ones (because it is in them that, as the object moves, a part of the background previously hidden behind it is revealed). The most mathematically complex element is the search for 16 x 16 blocks that shift but change little in structure, and the determination of their displacement vectors. This element is also the most significant, since it greatly reduces the amount of information required. It is the efficiency of real-time execution of this "intelligent" element that distinguishes one MPEG encoder from another.
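The search for a displaced block can be sketched as an exhaustive comparison of the current block with nearby positions in the previous frame. This is a minimal illustration, not how any particular encoder works: it uses 4 x 4 blocks instead of 16 x 16 to keep the frames small, the sum of absolute differences (SAD) as the matching cost, and a synthetic test frame; all function names are hypothetical.

```python
def sad(frame_a, frame_b, ax, ay, bx, by, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(abs(frame_a[ay + j][ax + i] - frame_b[by + j][bx + i])
               for j in range(size) for i in range(size))


def find_motion_vector(prev, cur, bx, by, size=4, search=3):
    """Exhaustive search: where in `prev` did the block at (bx, by) in `cur` come from?"""
    height, width = len(prev), len(prev[0])
    best = (0, 0)
    best_cost = sad(cur, prev, bx, by, bx, by, size)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            px, py = bx + dx, by + dy
            # Skip candidate blocks that would fall outside the frame.
            if 0 <= px <= width - size and 0 <= py <= height - size:
                cost = sad(cur, prev, bx, by, px, py, size)
                if cost < best_cost:
                    best, best_cost = (dx, dy), cost
    return best, best_cost


def make_frame(obj_x, obj_y, width=16, height=12, size=4):
    """Flat background (value 10) with a textured size x size object."""
    frame = [[10] * width for _ in range(height)]
    for j in range(size):
        for i in range(size):
            frame[obj_y + j][obj_x + i] = 100 + 10 * j + i
    return frame


prev_frame = make_frame(obj_x=4, obj_y=4)
cur_frame = make_frame(obj_x=6, obj_y=5)   # object moved 2 right, 1 down
vector, cost = find_motion_vector(prev_frame, cur_frame, bx=6, by=5)
print("motion vector:", vector, "residual cost:", cost)
```

Here the vector points from the current block back to its source in the previous frame, so an object that moved 2 elements right and 1 down yields (-2, -1) with zero residual. The cost of the exhaustive search grows with the square of the search range, which is why the efficiency of this step matters so much in real time.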

Thus, MPEG encoding forms three basic types of frames: I (Intra), which act as reference frames and preserve the full information about the image structure; P (Predictive), which carry information about changes in the image structure relative to the previous frame (of type I or P); and B (Bi-directional), which retain only the most significant part of the information about the differences from the previous and subsequent images (I or P only). The subsequent compression of I-frames, as well as of the differential P- and B-frames, follows the same concept as M-JPEG but, as in DV, with adaptive adjustment of the quantization tables. In particular, this makes it possible to describe a DV signal as a special case of an MPEG sequence of I-frames with a fixed stream (compression ratio). Sequences of I-, P- and B-frames are combined into groups of frames of fixed length and structure, called GOPs (Group of Pictures). Each GOP necessarily begins with an I-frame and contains P-frames at regular intervals. Its structure is described as M/N, where M is the total number of frames in the group and N is the interval between P-frames. Thus, a typical IPB group 15/3 for Video-CD and DVD looks like this: IBBPBBPBBPBBPBB. Here each B-frame is restored from the P-frames surrounding it (at the beginning and end of the group, from I and P), and each P-frame in turn from the previous P- (or I-) frame. I-frames are self-sufficient and can be restored independently of the others, but they serve as the reference for all P-frames, and through them for the B-frames, of the group. Therefore I- and P-frames are compressed least and B-frames most. It has been established that the size of a typical P-frame is 1/3, and of a B-frame 1/8, of an I-frame.
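The M/N notation can be expanded into a frame sequence mechanically. The short sketch below assumes display order and the convention described in the text (M frames in total, an I- or P-frame every N frames, B-frames in between); the function name is hypothetical.

```python
def gop_pattern(m, n):
    """Display-order frame types for a GOP written as M/N.

    m: total number of frames in the group
    n: spacing between anchor (I or P) frames; n = 1 means no B-frames
    """
    frames = []
    for i in range(m):
        if i == 0:
            frames.append("I")      # every GOP starts with an I-frame
        elif i % n == 0:
            frames.append("P")      # P-frames at regular intervals
        else:
            frames.append("B")      # everything in between is a B-frame
    return "".join(frames)


print(gop_pattern(15, 3))  # IBBPBBPBBPBBPBB
print(gop_pattern(4, 1))   # IPPP
```

For 15/3 this gives a 15-frame group with two B-frames between neighboring anchor frames; for 4/1 it gives a group without B-frames.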

As a result, an IPPP MPEG sequence (GOP 4/1) halves the required data stream (at the same quality) compared with a sequence of I-frames only, and GOP 15/3 achieves roughly fourfold compression.
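These gains follow directly from the typical frame sizes quoted above (P about 1/3 and B about 1/8 of an I-frame). A small check, assuming those ratios hold exactly; the names are illustrative only:

```python
from fractions import Fraction

# Typical frame sizes relative to an I-frame, as quoted in the text.
RELATIVE_SIZE = {"I": Fraction(1), "P": Fraction(1, 3), "B": Fraction(1, 8)}


def average_frame_size(pattern):
    """Mean frame size of a GOP, in units of one I-frame."""
    return sum(RELATIVE_SIZE[f] for f in pattern) / len(pattern)


for name, pattern in [("I-only", "I"), ("IPPP", "IPPP"),
                      ("15/3", "IBBPBBPBBPBBPBB")]:
    avg = average_frame_size(pattern)
    print(f"{name:7s} average = {float(avg):.3f} I-frames, "
          f"gain = {float(1 / avg):.2f}x")
```

With these ratios the IPPP group averages exactly half an I-frame per frame, and the 15/3 group averages 43/180 of an I-frame, a gain of about 4.2 times.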

MPEG-2 profiles

Now we can return to the description of the profiles. In the Simple Profile (SP), only motion compensation and prediction in one direction (P-frames) are performed. In the Main Profile (MP), prediction is performed in two directions, i.e. B-frames are allowed. In the scalable profiles, the original digital video data stream is divided into several parts according to various criteria. In the SNR Scalable Profile, the stream is divided into two parts. The first of them, the main signal, carries information with a reduced signal-to-noise ratio (coarser quantization). But this part is protected by an algorithm that is more resistant to transmission noise (and accordingly requires more bits), so it can be received in strong noise and, even under adverse conditions, restores a TV image (albeit with a reduced signal-to-noise ratio). The less protected second part, the so-called additional signal, is simply discarded when reception is unstable. With stable reception, it supplements the main signal and raises the signal-to-noise ratio to its original value.
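The idea of a coarse main signal refined by an additional signal can be illustrated with two quantization layers. This is only a toy model of the principle: it ignores error protection and the actual MPEG-2 bitstream, and the step sizes and function names are assumptions made for the example.

```python
def encode_layers(samples, coarse_step=16, fine_step=2):
    """Split samples into a coarse base layer and a refinement layer."""
    base = [round(s / coarse_step) for s in samples]
    residual = [s - b * coarse_step for s, b in zip(samples, base)]
    enhancement = [round(r / fine_step) for r in residual]
    return base, enhancement


def decode(base, enhancement=None, coarse_step=16, fine_step=2):
    """Rebuild samples from the base layer, refined if the second layer arrived."""
    out = [b * coarse_step for b in base]
    if enhancement is not None:
        out = [o + e * fine_step for o, e in zip(out, enhancement)]
    return out


def max_error(original, decoded):
    """Largest absolute reconstruction error."""
    return max(abs(o - d) for o, d in zip(original, decoded))


samples = [12, 57, 130, 201, 99, 64, 33, 250]
base, enh = encode_layers(samples)
print("base only :", max_error(samples, decode(base)))
print("both parts:", max_error(samples, decode(base, enh)))
```

A receiver that gets only the base layer still reconstructs every sample, just with a larger error; adding the second layer brings the error down to that of the fine quantizer.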

The Spatially Scalable Profile complicates the encoding scheme further. Here the stream is divided into three parts, this time by resolution. The first part, the main signal, provides noise-resistant information about a standard-resolution image (625 lines, 576 of them active). The second part supplements this information up to a high-definition image (1250 lines, 1152 active). Finally, decoding the third signal increases the signal-to-noise ratio.

The fifth profile, the High Profile (HP), includes all the functions of the previous ones but uses the 4:2:2 rather than the 4:2:0 YUV representation, i.e. it transmits the color difference signals twice as often (in every line rather than in every second line).

Here again an explanation is needed. A television signal is a combination of a luminance signal Y and two color difference signals U and V. Each of them takes 256 gradations (from 0 to 255 for Y and from -128 to 127 for U/V), which in binary terms corresponds to 8 bits, or 1 byte. In theory, each frame element has its own Y, U and V values, i.e. requires 3 bytes. This representation, in which luminance and chrominance have an equal number of independent values, is commonly referred to as 4:4:4. But the human visual system is less sensitive to spatial changes in color than in brightness, so the number of color samples in each line can be halved without a visible loss of quality. It is this representation, referred to as 4:2:2, that has been adopted in broadcast television. In this case 2 bytes per frame sample are sufficient to transmit the full television signal (independent values of U and V alternate from sample to sample). Moreover, for consumer video it is acceptable to halve the vertical color resolution as well, i.e. to move to a 4:2:0 representation. This reduces the average number of bytes per sample to 1.5. Note that this is the representation adopted in the DV format of digital cameras (in its 625-line PAL version), as well as in the DVD-Video format.
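The byte counts for the three representations follow from how many picture elements share one pair of color samples. A minimal sketch, assuming 8-bit samples; the function and table names are illustrative:

```python
def bytes_per_pixel(h_div, v_div, bytes_per_sample=1):
    """Average bytes per pixel: one Y sample plus shared U and V samples.

    h_div / v_div: how many pixels share one chroma sample
    horizontally / vertically.
    """
    luma = bytes_per_sample
    chroma = 2 * bytes_per_sample / (h_div * v_div)  # U and V together
    return luma + chroma


# (horizontal, vertical) chroma subsampling factors for each representation
FORMATS = {"4:4:4": (1, 1), "4:2:2": (2, 1), "4:2:0": (2, 2)}

for name, (h, v) in FORMATS.items():
    print(f"{name}: {bytes_per_pixel(h, v)} bytes per pixel")
```

This reproduces the 3, 2 and 1.5 bytes per sample mentioned in the text.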

However, in professional digital video editing, where footage fragments may be reused many times and in multiple layers and combined with computer graphics, the digital video must be of higher quality from the start to avoid the accumulation of errors. The 4:2:2 representation is therefore considered mandatory here. This is what distinguishes the 422P profile from the Main Profile. Table 1 summarizes the differences between all the profiles described.

Table 1 ("+" = supported, "-" = not supported)

Feature                    Simple  Main    422P    SNR       Spatially  High
                           (SP)    (MP)            Scalable  Scalable   (HP)
I-frames                   +       +       +       +         +          +
P-frames                   +       +       +       +         +          +
B-frames                   -       +       +       +         +          +
Separation by SNR          -       -       -       +         +          +
Separation by resolution   -       -       -       -         +          +
YUV representation         4:2:0   4:2:0   4:2:2   4:2:0     4:2:0      4:2:2

Audio compression

So far, we have only dealt with image compression. But a full-fledged video also includes a sound component. CD-quality audio is considered to require 44.1 kHz sampling at 16 bits per channel, which equates to 706 Kbps per channel (1.4 Mbps for stereo). DAT quality specifies a sampling frequency of 48 kHz (bandwidth 4 Hz to 24,000 Hz) and raises the bit rate to 768 Kbps per channel. The approach to compression is the same: discarding the part that matters little to the human ear. The MPEG standard allows three levels (Layers) of audio compression. Layer 1 uses the simplest algorithm with minimal compression, which assumes 192 Kbps per channel. The Layer 2 algorithm is more complex, but the compression ratio is higher: 128 Kbps per channel. Layer 3, a powerful compression algorithm for CD-quality digital audio (11 times without loss distinguishable by the human ear), provides the highest possible sound quality under severe stream restrictions: no more than 64 Kbps per channel. It is mainly intended for the Internet. Its importance is so great that it has received the special abbreviation MP3, which stands for MPEG Layer 3.
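The uncompressed audio rates are simply the sampling frequency multiplied by the sample size. The sketch below recomputes them and, taking 192, 128 and 64 Kbps per channel as the Layer 1, 2 and 3 rates, shows the resulting compression ratios relative to CD-quality audio; the names are illustrative.

```python
def pcm_kbps(sample_rate_hz, bits_per_sample, channels=1):
    """Bit rate of uncompressed PCM audio in kilobits per second."""
    return sample_rate_hz * bits_per_sample * channels / 1000


cd_mono = pcm_kbps(44_100, 16)
cd_stereo = pcm_kbps(44_100, 16, channels=2)
dat_mono = pcm_kbps(48_000, 16)
print(f"CD : {cd_mono} Kbps per channel, {cd_stereo} Kbps stereo")
print(f"DAT: {dat_mono} Kbps per channel")

# Per-channel rates assumed for the three MPEG audio layers.
LAYER_KBPS = {"Layer 1": 192, "Layer 2": 128, "Layer 3": 64}
for layer, kbps in LAYER_KBPS.items():
    print(f"{layer}: {kbps} Kbps, about {cd_mono / kbps:.1f} times "
          f"smaller than CD-quality PCM")
```

The Layer 3 ratio comes out at about 11, which matches the compression factor quoted for MP3.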

Numerous Internet sites have sprung up containing hundreds of thousands of MP3 files of popular music. With special playback programs (Real Audio), MP3 music can be listened to in real time over the Internet; it can also be copied indefinitely (note that a typical song takes from 2 to 8 MB) and distributed illegally. There are already portable MP3 players priced at around $200 (e.g. Diamond Rio). The music industry, suffering tangible losses, has begun to fight MP3 sites actively (the Recording Industry Association of America found most of them and forced their closure). But the genie is out of the bottle, and not every site can be closed. Adaptec predicts billions of songs downloaded over the Internet in the coming years and has announced MP3 support in the next version of EasyCD Creator. In digital editing, however, audio compression is not used, so when calculating the allowable streams, up to 1.5 Mbps must be reserved for the audio component.

MPEG-2 in non-linear editing

The term "non-linear editing" does not capture the essence of the process, but only reflects one of its characteristics. In fact, we are talking about editing video films in digital form on computers. The original video fragments must first be digitized and recorded to the hard drive as files. Unlike tape, where reaching a fragment requires tedious rewinding (a linear process), any of these files can be accessed directly, that is, all video frames are available in arbitrary order. This important property gave digital editing the name non-linear, although the possibilities of digital processing are obviously much wider and richer.

Recall that according to Recommendation ITU-R BT.601, a television frame is a 720 x 576 matrix. At the television frame rate of 25 Hz, one second of digital video in 4:2:2 representation requires 20,736,000 bytes (720 x 576 x 25 x 2), i.e. the data stream is about 20 MB/s. Recording such streams is technically feasible, but it is complex, expensive, and inefficient for post-processing. In practice, the streams have to be reduced significantly. There are many algorithms that compress without loss of information, but even the most efficient of them do not achieve more than twofold compression on typical images.
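The size of the uncompressed stream is easy to recompute. A minimal sketch, assuming the 720 x 576 frame at 25 frames per second and 2 bytes per element for 4:2:2; the names are illustrative:

```python
WIDTH, HEIGHT = 720, 576   # frame matrix per ITU-R BT.601 (625-line system)
FRAME_RATE = 25            # frames per second


def raw_stream_bytes_per_second(bytes_per_pixel):
    """Uncompressed video data rate in bytes per second."""
    return WIDTH * HEIGHT * FRAME_RATE * bytes_per_pixel


rate_422 = raw_stream_bytes_per_second(2)   # 4:2:2 uses 2 bytes per element
print(f"{rate_422:,} bytes/s")
print(f"{rate_422 / 1e6:.1f} MB/s, {rate_422 * 8 / 1e6:.1f} Mbit/s")
print(f"2:1 lossless compression would still need "
      f"{rate_422 / 2 / 1e6:.1f} MB/s")
```

Even the best case for lossless compression leaves a stream of roughly 10 MB/s, which is why lossy compression is unavoidable in practice.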

Until recently, M-JPEG reigned supreme in the world of non-linear video editing systems. Different solutions differed in the degree of compression, which corresponded to different quality levels of the resulting video. Four levels can be distinguished here, rather arbitrarily: standard video (VHS, VHS-C, Video8), super video (S-VHS, S-VHS-C, Hi8), digital video (Betacam SP, DV/DVCAM/DVCPRO, miniDV, Digital8) and studio video (Digital S, DVCPRO50). For simplicity, we will refer to them as Video, S-Video, DV and Studio-TV. Quantitatively, they are usually characterized by horizontal resolution (the number of elements distinguishable in a line, i.e. television lines). It is believed that Video provides a resolution of up to 280 lines and corresponds to an M-JPEG stream of about 2 MB/s, S-Video to 400 lines and 4 MB/s, DV to 500 lines and 3.1 MB/s, and Studio-TV to a resolution of at least 600 lines with streams of 7 MB/s. The compression ratios are 10:1, 5:1, 5:1 and 3:1, respectively (recall that the DV algorithm is more efficient than M-JPEG). But even such compression requires significant amounts of disk space for storing and processing video files. For example, one minute of M-JPEG video requires 120 MB at Video quality and more than 400 MB at Studio-TV quality. And you really want to work with videos lasting tens of minutes!

And this is where MPEG-2 enters the arena. Even switching to I-frames alone saves about 15% of the volume, and if P-frames are used, the stream can be halved again (for IPPP groups), which is already significant. True, there is an opinion that in the latter case one of the main advantages of non-linear editing, its frame-by-frame accuracy, is lost. This is a misconception. With differential P-frames, the original image structure is restored easily and quickly (for modern processors this task is not difficult and is performed in real time). As for recovery accuracy, it does drop noticeably in long groups and/or in the presence of B-frames. That is why, for example, DVD-Video (GOP 15/3) is not suitable for editing. For short groups consisting only of I- and P-frames, however, recovery occurs practically without error accumulation. Thus, with 422P@ML MPEG-2 encoding, a stream of 50 Mbps with I-frames only and 25 Mbps with an IPPP group is sufficient to ensure studio quality (see Table 2).

Table 2

Compression type          Video   S-Video   DV     Studio TV
M-JPEG, Mbps              16      32        38     56
I-frame 422P@ML, Mbps     14      28        33     49
I-frame MP@ML, Mbps       10      21        25     37
IPPP 422P@ML, Mbps        7       14        17     24
IPPP MP@ML, Mbps          5       10        12.5   18
IBP 15/3 MP@ML, Mbps      2.5     5         6      9
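The streams in Table 2 translate directly into disk space. The sketch below converts the Studio TV column into megabytes per minute of video; the audio stream is ignored, and the 9 GB disk size is a hypothetical figure used only to illustrate the difference.

```python
def megabytes_per_minute(mbps):
    """Disk space for one minute of video at the given stream (Mbps)."""
    return mbps / 8 * 60


# Studio TV column of Table 2, in Mbps.
STUDIO_TV_MBPS = {
    "M-JPEG": 56,
    "I-frame 422P@ML": 49,
    "I-frame MP@ML": 37,
    "IPPP 422P@ML": 24,
    "IPPP MP@ML": 18,
    "IBP 15/3 MP@ML": 9,
}

DISK_MB = 9000  # hypothetical disk size, for illustration only

for name, mbps in STUDIO_TV_MBPS.items():
    per_minute = megabytes_per_minute(mbps)
    print(f"{name:16s} {per_minute:6.1f} MB/min, "
          f"{DISK_MB / per_minute:5.1f} min on the disk")
```

Moving from M-JPEG to an IPPP 422P@ML stream cuts the space per minute from 420 MB to 180 MB, so the same disk holds more than twice as much studio-quality footage.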

It is in this direction that modern non-linear editing systems are developing. So far there are few examples: FAST 601 [six-o-one], Pinnacle miroVideo DC1000 and Matrox DigiSuite DTV. But the advantages of this approach are so obvious that other solutions are sure to appear in the near future.

Author: Andrey Ryakhin, based on digitalvideo.ru
