
Vector Quantization
Contents

Introduction

Preliminaries

Design Problem

Optimality Criteria

LBG Design Algorithm

Two-Dimensional Animation

Performance

References
I. Introduction
Vector quantization (VQ) is a lossy data compression method based
on the principle of block coding.
It is a fixed-to-fixed length algorithm. In the early days,
the design of a vector quantizer (VQ) was considered a challenging
problem due to the need for multidimensional integration. In 1980,
Linde, Buzo, and Gray (LBG) proposed a VQ design algorithm based
on a training sequence. The use of a training sequence bypasses
the need for multidimensional integration. A VQ that is designed
using this algorithm is referred to in the literature as an LBG-VQ.
II. Preliminaries
A VQ is nothing more than an approximator. The idea is similar to that
of ``rounding off'' (say, to the nearest integer). An example of
a 1-dimensional VQ is shown below:
Here, every number less than -2 is approximated by -3.
Every number between -2 and 0 is approximated by -1.
Every number between 0 and 2 is approximated by +1.
Every number greater than 2 is approximated by +3.
Note that the approximate values are uniquely represented by 2 bits.
This is a 1-dimensional, 2-bit VQ. It has a rate of 2 bits/dimension.
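This 1-dimensional, 2-bit VQ can be sketched in a few lines of Python; the thresholds and codevectors below are exactly those of the example.

```python
# The 1-dimensional, 2-bit VQ of the example: values below -2 map to -3,
# [-2, 0) to -1, [0, 2) to +1, and values of 2 or more to +3.
codebook = [-3.0, -1.0, 1.0, 3.0]
thresholds = [-2.0, 0.0, 2.0]  # boundaries between the four encoding regions

def quantize_1d(x: float) -> int:
    """Return the 2-bit index of the codevector approximating x."""
    index = 0
    for t in thresholds:
        if x >= t:
            index += 1
    return index

samples = [-2.7, -0.4, 1.9, 5.0]
indices = [quantize_1d(x) for x in samples]
print(indices)                          # [0, 1, 2, 3]
print([codebook[i] for i in indices])   # [-3.0, -1.0, 1.0, 3.0]
```

Points falling exactly on a threshold go to the upper region here; as noted later, any tie-breaking rule would do.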
An example of a 2-dimensional VQ is shown below:
Here, every pair of numbers falling in a particular region
is approximated by the red star associated with that region.
Note that there are 16 regions and 16 red stars, each of which
can be uniquely represented by 4 bits. Thus, this is a 2-dimensional,
4-bit VQ. Its rate is also 2 bits/dimension.
In the above two examples, the red stars are called codevectors
and the regions defined by the blue borders are called encoding
regions. The set of all codevectors is called the codebook
and the set of all encoding regions is called the partition of
the space.
III. Design Problem
The VQ design problem can be stated as follows. Given a vector
source with its statistical properties known, given a
distortion measure, and given the
number of codevectors, find a codebook (the set of all red stars)
and a partition (the set of blue lines) which result in the
smallest average distortion.
We assume that there is a training sequence
consisting of M source vectors:
T = {x_1, x_2, ..., x_M}.
This training sequence can be obtained from some large database.
For example, if the source is a speech signal, then the training
sequence can be obtained by recording several long telephone conversations.
M is assumed to be sufficiently large so that all the
statistical properties of the source are captured by the training sequence.
We assume that the source vectors are k-dimensional, e.g.,
x_m = (x_{m,1}, x_{m,2}, ..., x_{m,k}),   m = 1, 2, ..., M.
Let N be the number of codevectors and let
C = {c_1, c_2, ..., c_N}
represent the codebook. Each codevector is k-dimensional, e.g.,
c_n = (c_{n,1}, c_{n,2}, ..., c_{n,k}),   n = 1, 2, ..., N.
Let S_n be the encoding region associated with codevector c_n
and let
P = {S_1, S_2, ..., S_N}
denote the partition of the space. If the source vector x_m
is in the encoding region S_n, then
its approximation (denoted by Q(x_m)) is c_n:
Q(x_m) = c_n,   if x_m is in S_n.
Assuming a squared-error distortion measure,
the average distortion is given by:
D_ave = (1 / (Mk)) * sum_{m=1..M} ||x_m - Q(x_m)||^2,
where ||e||^2 = e_1^2 + e_2^2 + ... + e_k^2.
The design problem can be succinctly stated as follows:
given T and N, find C and P such that
D_ave is minimized.
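As a small numerical sketch, the average distortion of a hand-picked codebook on a tiny invented training set (M = 4, k = 2, N = 2; all values here are hypothetical):

```python
import numpy as np

# Hypothetical tiny training sequence: M = 4 source vectors of dimension k = 2,
# and a hand-picked codebook with N = 2 codevectors.
T = np.array([[0.1, 0.2], [1.9, 2.1], [0.0, -0.1], [2.2, 1.8]])
C = np.array([[0.0, 0.0], [2.0, 2.0]])

# Nearest-neighbor encoding: Q(x_m) is the closest codevector to x_m.
d2 = ((T[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)  # M x N squared distances
Q = C[d2.argmin(axis=1)]                                 # approximations Q(x_m)

M, k = T.shape
D_ave = ((T - Q) ** 2).sum() / (M * k)  # average squared-error distortion
print(D_ave)  # about 0.02
```

Note the normalization by Mk: the distortion is measured per scalar component, not per vector.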
IV. Optimality Criteria
If C and P are a solution to the above
minimization problem, then they must satisfy the following two
criteria.
- Nearest Neighbor Condition:
S_n = { x : ||x - c_n||^2 <= ||x - c_{n'}||^2 for all n' = 1, 2, ..., N }.
This condition says that the encoding region S_n should consist
of all vectors that are closer to c_n than to any of the other
codevectors. For those vectors lying on the boundary (blue lines), any
tie-breaking procedure will do.
- Centroid Condition:
c_n = ( sum_{x_m in S_n} x_m ) / ( sum_{x_m in S_n} 1 ),   n = 1, 2, ..., N.
This condition says that the codevector c_n should be the
average of all those training vectors that are in encoding
region S_n. In implementation, one should ensure that
at least one training vector belongs to each encoding region
(so that the denominator in the above equation is never 0).
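One application of the two conditions can be sketched as follows: a nearest-neighbor assignment followed by the centroid recomputation. The training data and starting codebook are invented for illustration.

```python
import numpy as np

# Invented data: 100 training vectors in 2 dimensions, N = 2 codevectors.
rng = np.random.default_rng(0)
T = rng.normal(size=(100, 2))              # training vectors
C = np.array([[1.0, 1.0], [-1.0, -1.0]])   # current codebook

# Nearest Neighbor Condition: each x_m joins the region of its closest codevector.
assign = ((T[:, None, :] - C[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)

# Centroid Condition: each codevector becomes the average of its region.
# (Here every region is assumed non-empty, so no denominator is 0.)
C_new = np.array([T[assign == n].mean(axis=0) for n in range(len(C))])
print(C_new.shape)  # (2, 2)
```

Repeating these two steps is exactly the inner iteration of the LBG algorithm described next.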
V. LBG Design Algorithm
The LBG VQ design algorithm is an iterative algorithm which
alternately solves the above two optimality criteria.
The algorithm requires an initial codebook. This
initial codebook is obtained by the splitting method.
In this method, an initial codevector is set as the
average of the entire training sequence. This codevector
is then split into two. The iterative algorithm is run with these
two vectors as the initial codebook. The final two codevectors are
split into four, and the process is repeated until the desired number
of codevectors is obtained. The algorithm is summarized below.
LBG Design Algorithm
1. Given T. Fix epsilon > 0 to be a ``small'' number.
2. Let N = 1 and
c_1* = (1/M) * sum_{m=1..M} x_m.
Calculate
D_ave* = (1 / (Mk)) * sum_{m=1..M} ||x_m - c_1*||^2.
3. Splitting: For i = 1, 2, ..., N, set
c_i^(0) = (1 + epsilon) * c_i*,
c_{N+i}^(0) = (1 - epsilon) * c_i*.
Set N = 2N.
4. Iteration: Let D_ave^(0) = D_ave*. Set the iteration index
i = 0.
  i. For m = 1, 2, ..., M, find the minimum value of
||x_m - c_n^(i)||^2 over all n = 1, 2, ..., N. Let n* be the
index which achieves the minimum. Set Q(x_m) = c_{n*}^(i).
  ii. For n = 1, 2, ..., N, update the codevector
c_n^(i+1) = ( sum_{Q(x_m) = c_n^(i)} x_m ) / ( sum_{Q(x_m) = c_n^(i)} 1 ).
  iii. Set i = i + 1.
  iv. Calculate
D_ave^(i) = (1 / (Mk)) * sum_{m=1..M} ||x_m - Q(x_m)||^2.
  v. If (D_ave^(i-1) - D_ave^(i)) / D_ave^(i-1) > epsilon,
go back to Step (i).
  vi. Set D_ave* = D_ave^(i). For n = 1, 2, ..., N,
set c_n* = c_n^(i)
as the final codevectors.
5. Repeat Steps 3 and 4 until the desired number of codevectors is obtained.
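The steps above can be sketched in Python with NumPy. The training data, epsilon, and target codebook size are invented for illustration, and the splitting loop assumes the target is a power of 2.

```python
import numpy as np

def lbg(T, N_target, eps=0.01):
    """Design a codebook of N_target codevectors from the M x k training
    array T by the splitting method; a sketch of the steps above."""
    M, k = T.shape
    C = T.mean(axis=0, keepdims=True)        # Step 2: N = 1, the global average
    while C.shape[0] < N_target:
        # Step 3 (splitting): perturb each codevector into two.
        C = np.vstack([(1 + eps) * C, (1 - eps) * C])
        D_prev = np.inf
        while True:                          # Step 4 (iteration)
            d2 = ((T[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
            assign = d2.argmin(axis=1)                    # nearest-neighbor condition
            D = d2[np.arange(M), assign].sum() / (M * k)  # average distortion
            if (D_prev - D) / D <= eps:                   # relative-improvement test
                break
            D_prev = D
            for n in range(C.shape[0]):                   # centroid condition
                members = T[assign == n]
                if len(members):              # guard against an empty encoding region
                    C[n] = members.mean(axis=0)
    return C

rng = np.random.default_rng(1)
training = rng.normal(size=(4096, 2))  # zero-mean, unit-variance Gaussian source
codebook = lbg(training, N_target=16)
print(codebook.shape)  # (16, 2)
```

Keeping a codevector unchanged when its region is empty is one simple guard; the text's recommendation is stronger (ensure every region contains at least one training vector).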
VI. Two-Dimensional Animation
- The source for the above animation is a memoryless Gaussian source
with zero mean and unit variance.
- The tiny green dots are training vectors; there are 4096 of them.
- The LBG design algorithm is run with a ``small'' value of epsilon.
- The algorithm guarantees only a locally optimal solution.
- The size of the training sequence should be sufficiently large;
it is recommended that M be many times larger than N.
VII. Performance
The performance of VQ is typically given in terms of the
signal-to-distortion ratio (SDR):
SDR = 10 * log10( sigma^2 / D_ave )   (in dB),
where sigma^2 is the variance of the source and
D_ave is the average squared-error distortion.
The higher the SDR, the better the performance. The following
tables show the performance of the LBG-VQ for the
memoryless Gaussian source and the first-order Gauss-Markov
source with correlation coefficient 0.9. Comparisons
are made with the optimal performance theoretically
attainable, SDR_opt, which is obtained by
evaluating the rate-distortion function.
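As a numerical check of the SDR formula, a 1-bit scalar quantizer with the known optimal levels +/- sqrt(2/pi) on a unit-variance Gaussian source should land close to the 4.4 dB entry at rate 1, dimension 1 in the first table below:

```python
import numpy as np

# SDR = 10 * log10(sigma^2 / D_ave) for a 1-bit scalar quantizer on a
# simulated unit-variance memoryless Gaussian source.
rng = np.random.default_rng(2)
x = rng.normal(size=100_000)            # source samples, sigma^2 = 1
level = np.sqrt(2 / np.pi)              # optimal 1-bit levels are +/- level
q = np.where(x >= 0, level, -level)     # quantized signal
D_ave = np.mean((x - q) ** 2)           # average squared-error distortion
sdr = 10 * np.log10(x.var() / D_ave)
print(sdr)  # close to 4.4 dB
```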
Rate               SDR (in dB), by vector dimension k                SDR_opt
(bits/dimension)     1     2     3     4     5     6     7     8
1                   4.4   4.4   4.5   4.7   4.8   4.8   4.9   5.0      6.0
2                   9.3   9.6   9.9  10.2  10.3     -     -     -     12.0
3                  14.6  15.3  15.7     -     -     -     -     -     18.1
4                  20.2  21.1     -     -     -     -     -     -     24.1
5                  26.0  27.0     -     -     -     -     -     -     30.1
Memoryless Gaussian Source
Rate               SDR (in dB), by vector dimension k                SDR_opt
(bits/dimension)     1     2     3     4     5     6     7     8
1                   4.4   7.8   9.4  10.2  10.7  11.0  11.4  11.6     13.2
2                   9.3  13.6  15.0  15.8  16.2     -     -     -     19.3
3                  14.6  19.0  20.6     -     -     -     -     -     25.3
4                  20.2  24.8     -     -     -     -     -     -     31.3
5                  26.0  30.7     -     -     -     -     -     -     37.3
First-Order Gauss-Markov Source with Correlation Coefficient 0.9
VIII. References
1. A. Gersho and R. M. Gray,
Vector Quantization and Signal Compression.
2. H. Abut, ed.,
Vector Quantization.
3. R. M. Gray, ``Vector Quantization,'' IEEE ASSP Magazine, pp. 4-29,
April 1984.
4. Y. Linde, A. Buzo, and R. M. Gray, ``An Algorithm for Vector
Quantizer Design,'' IEEE Transactions on Communications, pp. 702-710,
January 1980.
