Learning Dynamics and Robustness of
Vector Quantization and Neural Gas

Aree Witoelar a,∗, Michael Biehl a, Anarta Ghosh b, Barbara Hammer c

a University of Groningen, Mathematics and Computing Science, P.O. Box 800, NL-9700 AV Groningen, The Netherlands
b WANPRC, University of Washington, Seattle, WA-98195, USA
c Clausthal University of Technology, Institute of Computer Science, D-98678 Clausthal-Zellerfeld, Germany
Various alternatives have been developed to improve the winner-takes-all (WTA) mechanism in vector quantization, including the neural gas (NG). However, the behavior of these algorithms, including their learning dynamics, robustness with respect to initialization, asymptotic results, etc., has only partially been studied in a rigorous mathematical analysis. The theory of on-line learning allows for an exact mathematical description of the training dynamics in model situations. We demonstrate, using a system of three competing prototypes trained from a mixture of Gaussian clusters, that the neural gas can improve convergence speed and achieves robustness to initial conditions. However, depending on the structure of the data, the neural gas does not always obtain the best asymptotic quantization error.

Key words: vector quantization, clustering, online learning, winner-takes-all algorithms, neural gas
1. Introduction

Vector quantization (VQ) is an important unsupervised learning algorithm, widely used in different areas such as data mining, medical analysis, image compression, and speech or handwriting recognition [1]. The main objective of VQ is to represent the data points by a small number of prototypes or codebook vectors. This can directly be used for compression, clustering, data mining, or (with post-labeling of the prototypes) classification [9,14].

The basic "winner-takes-all" (WTA) or batch algorithms like the popular k-means clustering directly optimize the quantization error underlying vector quantization. However, these methods can be subject to confinement in local minima of the quantization error and can produce suboptimal results.

A variety of alternatives to overcome this problem have been proposed, some of which are heuristically motivated while others are based on the minimization of a cost function related to the quantization error: the self-organizing map (SOM) [12], fuzzy-k-means [2], and stochastic optimization [7], to name just a few. These algorithms have in common that each pattern influences more than one prototype at a time through a "winner-takes-most" paradigm. The neural gas (NG), as proposed in [13], is a particularly robust variation of vector quantization which introduces neighborhood relations. Unlike the self-organizing map [12], the NG system takes into account the relative distances between prototypes in the input space rather than positions on a predefined lattice. In practice, NG algorithms yield better solutions than WTA; however, the effect of this strategy on convergence speed or asymptotic behavior has hardly been rigorously investigated so far.

Methods from statistical physics and the theory

∗ Corresponding author.
Email address: [email protected] (Aree Witoelar).
URL: http://www.cs.rug/~aree (Aree Witoelar).
Preprint submitted to Elsevier, 8 November 2007
of on-line learning [8] allow for an exact mathematical description of learning systems for high-dimensional data. In the limit of infinite dimensionality, such systems can be fully described in terms of a few characteristic quantities, the so-called order parameters. The evolution of these order parameters along the training procedure is characterized by a set of coupled ordinary differential equations (ODEs). By integrating these ODEs, it is possible to analyse the performance of VQ algorithms in term

The input data is presented sequentially during training and one or more prototypes are updated on-line. The algorithms studied here can be interpreted as stochastic gradient descent procedures with respect to a cost function H(W) related to E(W). The generalized form reads

H(W) = \frac{1}{2} \int d\xi \, P(\xi) \sum_{i=1}^{S} d(\xi, w_i) \, f(r_i)
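To make the generalized form concrete, a single stochastic gradient step on H(W) moves each prototype toward the example ξ with a learning rate weighted by the rank function f applied to that prototype's distance rank. The sketch below is illustrative and uses our own function names, assuming squared Euclidean distance d(ξ, w) = ||ξ − w||² and the common choices f(r) = 1 for the winner only (WTA) and f(r) = exp(−r/λ) (neural gas); it is not code from the paper.

```python
import numpy as np

def online_step(prototypes, xi, eta, f):
    """One on-line update for the generalized cost H(W):
    rank the prototypes by distance to the example xi, then move
    prototype i toward xi with step size eta * f(rank_i).
    Illustrative sketch; d(xi, w) = ||xi - w||^2."""
    dists = np.sum((prototypes - xi) ** 2, axis=1)
    ranks = np.argsort(np.argsort(dists))  # rank 0 = closest (winner)
    weights = f(ranks)
    prototypes += eta * weights[:, None] * (xi - prototypes)
    return prototypes

# Rank functions: WTA updates only the winner; NG updates every
# prototype with exponentially decaying strength (width lam).
wta = lambda r: (r == 0).astype(float)
ng = lambda r, lam=1.0: np.exp(-r / lam)
```

With f = wta only the closest prototype moves, recovering winner-takes-all; with f = ng all prototypes move, which is what makes the dynamics less sensitive to initialization, and annealing λ toward zero recovers WTA asymptotically.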
