Telos Zephyr Xstream User Manual

Section 6: AUDIO CODING REFERENCE

delta, between successive audio samples compared to using the individual values. Further
efficiency is had by adaptively varying the difference comparator according to the nature of the
program material. G.722 and APT‐X are examples of ADPCM schemes. They achieve around a
factor of 4:1 reduction in bitrate.
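The idea of coding an adaptively scaled difference can be sketched in a few lines. The following is a minimal illustration only, not the G.722 or APT‐X algorithm: the function names, the starting step size, and the step adaptation rule are all assumptions chosen for clarity. It codes 16‐bit samples as 4‐bit values, which corresponds to the 4:1 reduction mentioned above.

```python
import math

def adapt(step, code, levels):
    # Hypothetical adaptation rule: widen the step when the code is large,
    # narrow it when the code is small, and keep it within fixed limits.
    if abs(code) >= levels // 2:
        step *= 2
    elif abs(code) <= 1:
        step //= 2
    return max(1, min(step, 4096))

def adpcm_encode(samples, bits=4):
    """Toy adaptive delta coder: one small code per input sample."""
    levels = 1 << (bits - 1)          # 4 bits -> codes from -8 to 7
    step, predicted, codes = 16, 0, []
    for s in samples:
        diff = s - predicted          # the delta between sample and prediction
        code = max(-levels, min(levels - 1, round(diff / step)))
        codes.append(code)
        predicted += code * step      # mirror what the decoder will rebuild
        step = adapt(step, code, levels)
    return codes

def adpcm_decode(codes, bits=4):
    """Rebuild samples by accumulating the scaled deltas."""
    levels = 1 << (bits - 1)
    step, predicted, out = 16, 0, []
    for code in codes:
        predicted += code * step
        out.append(predicted)
        step = adapt(step, code, levels)
    return out

# Example: a 440 Hz tone sampled at 48 kHz, 16-bit amplitude range.
tone = [round(10000 * math.sin(2 * math.pi * 440 * n / 48000)) for n in range(480)]
codes = adpcm_encode(tone)
decoded = adpcm_decode(codes)
error = sum(abs(a - b) for a, b in zip(tone, decoded)) / len(tone)
print("bits per sample: 16 ->", 4, " mean absolute error:", round(error, 1))
```

Because the encoder and decoder apply the same adaptation rule to the same codes, the step size never needs to be transmitted.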

G.722 achieves additional efficiency by allocating its bits to match the patterns in the human
voice, and it’s considered adequate for news and talk programming over ISDN. But, for
high‐fidelity transmission, algorithms with more power are required. These are based on
psychoacoustics, where the coding process is adapted to the way we hear sounds. There are
several algorithms available, with varying complexity and performance levels.

Some years ago, the international standards group ISO/IEC established ISO/MPEG (the Moving
Picture Experts Group) to develop a universal standard for encoding moving pictures and sound
for digital storage and transmission media. The standard was finalized in November 1992 with
three related algorithms, called Layers, defined to take advantage of psychoacoustic effects
when coding audio. Layers 1 and 2 are intended for compression factors of about 4:1 and 6:1 or
8:1 respectively, and these algorithms have become popular in satellite and hard‐disk systems.
Layer‐3 achieves compression up to 12.5:1 (8% of the original size), making it well suited to ISDN.
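The compression factors above can be turned into bitrates with simple arithmetic. The sketch below assumes a 16‐bit, 48 kHz stereo source and a 128 kbps ISDN connection (two 64 kbps B channels); those figures, and the function names, are assumptions for illustration and are not taken from the paragraph above.

```python
def pcm_bitrate_kbps(sample_rate_hz, bits_per_sample, channels):
    """Bitrate of uncompressed linear PCM audio in kbps."""
    return sample_rate_hz * bits_per_sample * channels / 1000

def coded_bitrate_kbps(pcm_kbps, ratio):
    """Bitrate after applying a compression factor of ratio:1."""
    return pcm_kbps / ratio

ISDN_KBPS = 128                            # assumed: two 64 kbps B channels
pcm = pcm_bitrate_kbps(48_000, 16, 2)      # assumed source format

for name, ratio in [("Layer 1", 4), ("Layer 2", 8), ("Layer 3", 12.5)]:
    coded = coded_bitrate_kbps(pcm, ratio)
    percent = 100 / ratio                  # share of the original size
    verdict = "fits" if coded <= ISDN_KBPS else "does not fit"
    print(f"{name}: {ratio}:1 -> {coded:.2f} kbps ({percent:.1f}% of original), "
          f"{verdict} in {ISDN_KBPS} kbps")
```

Under these assumptions only the 12.5:1 case drops below 128 kbps, which is consistent with the remark that Layer‐3 suits ISDN.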

Basic Principles of Perceptual Coding

With perceptual coding, only information that can be perceived by the human auditory system is
retained.

Lossless – which, for audio, translates to noiseless – coding with perfect reconstruction would
be an optimum system, since no information would be lost or altered.  It might seem that
lossless, redundancy‐reducing methods (such as PKZIP, Stuffit, Stacker, and others used for
computer hard‐disk compression) would be applicable to audio.  Unfortunately, no constant
compression rate is possible due to signal‐dependent variations in redundancy.  There are highly
redundant signals like constant sine tones (where the only information necessary is the
frequency, phase, amplitude, and duration of the tone), while other signals, such as those which
approach broadband noise, may be completely unpredictable and contain no redundancy at all.
Furthermore, looking for redundancy can take time.  While a popular song might have three
choruses with identical audio data that would need to be coded only once, you’d have to store
and analyze the entire song in order to find them.  Any system intended for real‐time use over
telephone channels must have a consistent output rate and be able to accommodate the worst
case, so effective audio compression is impossible with redundancy reduction alone.
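The signal dependence of lossless compression is easy to observe. The sketch below uses zlib, the general‐purpose compressor in the Python standard library, as a stand‐in for the redundancy‐reducing methods named above; the test signals and helper names are assumptions for illustration. It compresses one second of a constant sine tone and one second of random noise, both as 16‐bit samples.

```python
import math
import random
import struct
import zlib

def pcm16_bytes(samples):
    """Pack integer samples as little-endian 16-bit PCM."""
    return struct.pack("<%dh" % len(samples), *samples)

def lossless_ratio(samples):
    """Original size divided by zlib-compressed size."""
    raw = pcm16_bytes(samples)
    return len(raw) / len(zlib.compress(raw, 9))

RATE = 48_000
# Highly redundant: a 1 kHz tone repeats exactly every 48 samples.
tone = [round(10000 * math.sin(2 * math.pi * 1000 * n / RATE)) for n in range(RATE)]
# No redundancy: uniformly random samples (fixed seed for repeatability).
rng = random.Random(0)
noise = [rng.randint(-32768, 32767) for _ in range(RATE)]

tone_ratio = lossless_ratio(tone)
noise_ratio = lossless_ratio(noise)
print(f"sine tone ratio:  {tone_ratio:.1f}:1")
print(f"noise ratio:      {noise_ratio:.3f}:1")
```

The tone shrinks dramatically while the noise does not shrink at all, so a fixed‐rate channel would have to be sized for the noise case.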

Fortunately, psychoacoustics permits a clever solution! Effects called “masking” have been
discovered in the human auditory system. These masking effects (which suggest that our
brain is itself doing something similar to bit‐rate reduction) have been found to occur in both
the frequency and time domains and can be exploited for audio data reduction.

Most important for audio coding are the effects in the frequency domain. Research into
perception has revealed that a tone or narrow‐band noise at a certain frequency inhibits the
audibility of other signals that fall below a threshold curve centered on the masking signal.
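The notion of a threshold curve centered on a masker can be illustrated with a toy model. Everything numeric in the sketch below is a hypothetical placeholder, not measured psychoacoustic data and not the model used by any MPEG Layer: the threshold is assumed to peak a fixed offset below the masker level and to fall off linearly per octave on either side.

```python
import math

def toy_masked_threshold_db(freq_hz, masker_hz, masker_db,
                            offset_db=10.0, lower_slope_db=27.0, upper_slope_db=12.0):
    """Toy threshold curve centered on the masker.

    offset_db and the two slopes (dB per octave) are made-up placeholder
    values chosen only to give the curve a plausible shape.
    """
    octaves = math.log2(freq_hz / masker_hz)
    slope = upper_slope_db if octaves >= 0 else lower_slope_db
    return masker_db - offset_db - slope * abs(octaves)

def is_masked(freq_hz, level_db, masker_hz, masker_db):
    """A signal is treated as inaudible if it falls below the toy threshold."""
    return level_db < toy_masked_threshold_db(freq_hz, masker_hz, masker_db)

# A quiet tone close to a loud 1 kHz masker, and a louder tone far from it.
near = is_masked(1100, 40, masker_hz=1000, masker_db=80)
far = is_masked(4000, 60, masker_hz=1000, masker_db=80)
print("1100 Hz at 40 dB masked:", near)
print("4000 Hz at 60 dB masked:", far)
```

A perceptual coder uses a far more detailed model, but the decision has the same form: components that fall below the threshold need not be coded accurately.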

The figure below shows two “threshold of audibility” curves.  The lower one is the typical
frequency sensitivity of the human ear when presented with a single swept tone.  When a single
constant tone is added, the threshold of audibility changes as shown in the upper curve.  The