Telos Zephyr Xstream User Manual

Section 6: AUDIO CODING REFERENCE

The steps involved in the perceptual coding process are shown below:

The components work as follows:

• The analysis filter bank divides the audio into spectral components.
The frequency resolution must be at least as fine as the width of the ear's
critical bands, which is about 100 Hz below 500 Hz and about 20% of the
center frequency at higher frequencies.

• The estimation of masked threshold section is where the human ear/brain
system is modeled. This determines the masking curve, under which noise
must fall.

• The audio is reduced to a lower bit rate in the quantization and coding
section. On the one hand, the quantization must be sufficiently coarse in
order not to exceed the target bit rate. On the other hand, the error must
be shaped to be under the limits set by the masking curve.

• The quantized values are joined in the bit stream multiplex, along with any
side information.
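
The critical-band rule of thumb quoted for the analysis filter bank can be written as a small function. This is only a sketch of the approximation stated above (100 Hz below 500 Hz, 20% of the center frequency above that); the function name is made up for illustration, and real psychoacoustic models use more detailed curves.

```python
def critical_bandwidth_hz(center_hz: float) -> float:
    """Approximate critical bandwidth using the manual's rule of thumb.

    Hypothetical helper for illustration only, not part of any codec.
    """
    if center_hz <= 0:
        raise ValueError("center frequency must be positive")
    # About 100 Hz wide below 500 Hz, about 20% of the center frequency above.
    if center_hz < 500.0:
        return 100.0
    return 0.2 * center_hz


# Print the approximate bandwidth at a few example center frequencies.
for f in (100, 400, 1000, 4000, 10000):
    print(f"{f:6d} Hz center -> about {critical_bandwidth_hz(f):.0f} Hz wide")
```

Note that the two branches meet at 500 Hz, where 20% of the center frequency is exactly 100 Hz.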

Doing audio coding effectively means managing several tradeoffs.  Most important is the
number of samples coded together in one frame.  Long frames have high delay, but are more
efficient because the header and side information are transmitted less frequently.  Longer frames
also make it possible to use filter banks with better frequency resolution.  A fundamental
principle in signal processing is that spectral splitting filters may have either good time
resolution, or good frequency resolution, but not both.  This makes sense when you consider
that a longer time window means that the analyzer has more complete information, more full
audio cycles, to work with.
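
The time/frequency tradeoff can be put into numbers with a rough sketch. It assumes a 48 kHz sample rate and uses the bin spacing of a plain transform of the frame length as a stand-in for frequency resolution; actual codec filter banks differ in detail, and the function name is hypothetical.

```python
def frame_tradeoff(frame_len: int, sample_rate_hz: float = 48000.0):
    """Return (frame duration in ms, frequency bin spacing in Hz).

    Assumption: resolution is approximated by sample_rate / frame_len,
    as for a simple transform over one frame.
    """
    if frame_len <= 0 or sample_rate_hz <= 0:
        raise ValueError("frame_len and sample_rate_hz must be positive")
    duration_ms = 1000.0 * frame_len / sample_rate_hz
    bin_spacing_hz = sample_rate_hz / frame_len
    return duration_ms, bin_spacing_hz


# Doubling the frame length doubles the time span and halves the bin spacing.
for n in (128, 256, 512, 1024, 2048):
    duration_ms, spacing_hz = frame_tradeoff(n)
    print(f"{n:5d} samples: {duration_ms:6.2f} ms per frame, "
          f"{spacing_hz:7.2f} Hz bin spacing")
```

Under this approximation the product of frame duration (in seconds) and bin spacing (in Hz) is always 1, which is one way to see why both cannot be made small at once.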

In the case of rapidly changing input signals (transients), long frames are poorer than short ones
because the quantization error is spread in time across the whole frame, which leads to
so-called pre-echoes (noise heard before the transient itself). For such signals, the size of the frame
should correspond to the temporal resolution of the human ear. This can be achieved by using
short frames or by changing the frame length according to the immediate characteristics of the
signal.
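
Changing the frame length according to the signal could look roughly like the sketch below. It is an illustration only, not the method of any particular codec: the function name, the frame lengths (1024 and 128 samples) and the energy-ratio threshold are all assumed values chosen for the example.

```python
def choose_frame_length(samples, long_len=1024, short_len=128,
                        ratio_threshold=8.0):
    """Pick a short frame if the block contains a sudden energy jump.

    Hypothetical transient detector: compares the energy of consecutive
    short sub-blocks and flags a transient when one is much louder than
    the one before it.
    """
    block = samples[:long_len]
    n_sub = len(block) // short_len
    energies = [
        sum(x * x for x in block[i * short_len:(i + 1) * short_len])
        for i in range(n_sub)
    ]
    eps = 1e-12  # avoids treating pure digital silence as a transient
    for prev, cur in zip(energies, energies[1:]):
        if cur > ratio_threshold * (prev + eps):
            return short_len  # attack detected: favor time resolution
    return long_len  # stationary signal: favor frequency resolution


steady = [0.1] * 1024                 # constant level, no attack
attack = [0.0] * 512 + [1.0] * 512    # silence followed by a sudden onset
print(choose_frame_length(steady), choose_frame_length(attack))
```

A steady signal keeps the long frame for coding efficiency, while a sudden onset switches to the short frame so that any quantization noise stays close to the transient in time.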

Perhaps this is the DSP designer’s equivalent to the economist’s TANSTAAFL: There
ain’t no such thing as a free lunch.