Metrohm Vision – Theory User Manual

Once these parameters have been defined, the algorithm proceeds as follows:

1. A principal component model is calculated for the library using that number of primary PCs required to satisfy the cumulative variance value (parameter #1).
2. PC scores are calculated for all mean product spectra.
3. The standard deviation spectrum (in wavelength space) is calculated for each product.
4. The Euclidean norm is calculated for each product’s standard deviation spectrum.
5. The minimal spanning tree is determined using Euclidean distances on principal component scores of product mean spectra.
6. Spheres are drawn around each product, with radii equal to the Euclidean norm of the product standard deviation spectrum multiplied by the variance radius scale (parameter #2).
7. If there is overlap between spheres for any pair of products on the tree, the products end up in one cluster.
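The steps above can be sketched in a few lines of Python. This is an illustrative sketch only, not the Vision implementation. All function names are hypothetical, and several details are assumptions: the sample (n-1) standard deviation is used, two spheres are treated as overlapping when the distance between their centers is smaller than the sum of their radii, and the PC model itself (eigenvalues and scores) is taken as already computed.

```python
import math
from statistics import stdev


def n_primary_pcs(eigenvalues, cumulative_variance):
    """Step 1: smallest number of leading PCs reaching the variance target."""
    total = sum(eigenvalues)
    running = 0.0
    for k, ev in enumerate(eigenvalues, start=1):
        running += ev
        if running / total >= cumulative_variance:
            return k
    return len(eigenvalues)


def sphere_radius(replicate_spectra, radius_scale):
    """Steps 3, 4 and 6: scaled norm of the per-wavelength standard deviation."""
    # Assumption: sample standard deviation over a product's replicate spectra.
    sd_spectrum = [stdev(column) for column in zip(*replicate_spectra)]
    return radius_scale * math.sqrt(sum(s * s for s in sd_spectrum))


def minimal_spanning_tree(points):
    """Step 5: Prim's algorithm on Euclidean distances; returns edges (i, j)."""
    n = len(points)
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        _, i, j = min(
            (math.dist(points[i], points[j]), i, j)
            for i in in_tree
            for j in range(n)
            if j not in in_tree
        )
        edges.append((i, j))
        in_tree.add(j)
    return edges


def cluster_products(mean_scores, radii):
    """Step 7: merge products joined by a tree edge whose spheres overlap."""
    parent = list(range(len(mean_scores)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in minimal_spanning_tree(mean_scores):
        # Assumption: overlap means center distance < sum of the two radii.
        if math.dist(mean_scores[i], mean_scores[j]) < radii[i] + radii[j]:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(mean_scores)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```

Calling `cluster_products` with the PC scores of the product mean spectra and the radii from `sphere_radius` returns lists of product indices, one list per cluster. Only edges of the spanning tree are tested for overlap, as in step 7.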

The algorithm described above may separate a library into a number of clusters. Each of
those clusters in turn undergoes the same procedure at the next level of clustering (based on a PC
model local to the cluster) if the number of products in the cluster is equal to or larger than the
maximum leaf cluster size (parameter #3).
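The recursion can be illustrated with a short Python sketch. It is not the Vision implementation: `split` is a hypothetical callable standing in for one full clustering level (local PC model, spanning tree, sphere overlap) that returns a list of product lists. Stopping when a level produces no separation is an assumption added so that the recursion terminates; the manual does not describe that case.

```python
def build_cluster_tree(products, split, max_leaf_size, is_root=True):
    """Recursively cluster products; returns a nested dict of clusters."""
    node = {"products": list(products), "children": []}
    # Clusters smaller than the maximum leaf cluster size are not re-clustered.
    if not is_root and len(products) < max_leaf_size:
        return node
    parts = split(products)
    # Assumption: a level that yields a single cluster ends the recursion.
    if len(parts) <= 1:
        return node
    node["children"] = [
        build_cluster_tree(part, split, max_leaf_size, is_root=False)
        for part in parts
    ]
    return node


def leaf_clusters(node):
    """Collect the product lists of all leaf clusters, left to right."""
    if not node["children"]:
        return [node["products"]]
    leaves = []
    for child in node["children"]:
        leaves.extend(leaf_clusters(child))
    return leaves
```

The library itself is always clustered once (the `is_root` case); the size test applies only to the clusters that result.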

The last step in the clustering procedure is to define the cluster boundaries. All leaf clusters are
enclosed by rectangular boxes. Each box is large enough to enclose all the product spheres in a
cluster. (The radius of each sphere is equal to the Euclidean norm of the product standard deviation
spectrum multiplied by the variance radius scale parameter.) The coordinates of the center of each
cluster box and the box dimensions are saved with the method.
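As an illustration, the box for one leaf cluster could be computed as follows. The sketch assumes that the boxes are axis-aligned in PC score space and that the tightest enclosing box is used; the manual states only that the box is large enough to enclose the spheres. The function name `cluster_box` is hypothetical.

```python
def cluster_box(centers, radii):
    """Return (box_center, box_dimensions) enclosing every product sphere.

    centers: PC scores of the product mean spectra in the cluster
    radii:   sphere radius of each product, in the same order
    """
    n_dims = len(centers[0])
    # Extend each product's position by its radius along every PC axis.
    lows = [min(c[d] - r for c, r in zip(centers, radii)) for d in range(n_dims)]
    highs = [max(c[d] + r for c, r in zip(centers, radii)) for d in range(n_dims)]
    box_center = [(lo + hi) / 2 for lo, hi in zip(lows, highs)]
    box_dims = [hi - lo for lo, hi in zip(lows, highs)]
    return box_center, box_dims
```

For two products at (0, 0) and (4, 2) with radii 1 and 2, the box spans -1 to 6 on the first axis and -1 to 4 on the second, giving a center of (2.5, 1.5) and dimensions of (7, 5).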

6.4 Analysis of an Unknown

PC scores for the unknown spectrum are calculated from the first clustering level PC model, which
determines the location of the unknown spectrum in the PC space. The unknown spectrum may be
identified as one of the products if 1) its location lies within the boundaries of one of the
first-level clusters defined during clustering method development, and 2) that cluster is not
further subdivided into clusters.
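A minimal sketch of the two operations involved, projecting the unknown onto a PC model and testing whether it lies inside a cluster box, might look like this. Mean-centering the spectrum before projection is an assumption, as is the representation of a box by its center and dimensions along each PC axis; both function names are hypothetical.

```python
def pc_scores(spectrum, mean_spectrum, loadings):
    """Project a spectrum onto each loading vector of a PC model."""
    # Assumption: the model was built on mean-centered spectra.
    centered = [x - m for x, m in zip(spectrum, mean_spectrum)]
    return [sum(c * w for c, w in zip(centered, loading)) for loading in loadings]


def inside_box(scores, box_center, box_dims):
    """True if the scores lie within the box along every PC axis."""
    return all(
        abs(s - c) <= d / 2 for s, c, d in zip(scores, box_center, box_dims)
    )
```

With these helpers, condition 1) amounts to `inside_box` returning true for one of the first-level cluster boxes.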

If the location does not fall within any of the clusters, the unknown fails clustering (it is declared
not present in the library). If the cluster into which the unknown falls is further subdivided, the
second-level PC model is applied to the unknown spectrum to determine whether it falls into one of
the subclusters. This process continues until a leaf cluster level is reached.

If the unknown spectrum lies within the boundaries of one of the leaf clusters, an identification
method local to this cluster is applied and the leaf cluster is searched to identify the unknown
sample.
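The level-by-level search described in this section can be sketched as a walk down the cluster tree. The data layout is hypothetical: each node is a dict holding the PC model used to separate its children and the list of children, and each child carries the box saved with the method. `project` is a hypothetical callable that returns the PC scores of a spectrum under a given model. Taking the first matching box is an assumption; the manual does not say what happens if boxes overlap.

```python
def locate_leaf(spectrum, root, project):
    """Descend the cluster tree; return the leaf cluster or None on failure."""
    node = root
    while node["children"]:
        # Each level uses the PC model local to the current cluster.
        scores = project(spectrum, node["model"])
        for child in node["children"]:
            center, dims = child["box"]
            if all(abs(s - c) <= d / 2 for s, c, d in zip(scores, center, dims)):
                node = child
                break
        else:
            # No box at this level contains the unknown: clustering fails.
            return None
    return node
```

A return value of `None` corresponds to the unknown being declared not present in the library. Otherwise the identification method local to the returned leaf cluster would be applied to the unknown.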