Metrohm Vision – Theory User Manual
Another type of threshold is the match value. The Mahalanobis distance calculated directly from the
formula depends strongly on the number of samples in the training set. Therefore, if the match value
type of threshold is to be used, the distance has to be scaled: Vision divides the Mahalanobis distance
by the number of degrees of freedom. The default threshold values have been determined
experimentally and do not have a statistical interpretation.
5.3.2 Analysis of an Unknown
The Mahalanobis distance between an unknown spectrum and the product mean spectrum is
calculated using the primary eigenvectors. Depending on the choice of threshold, either the distance
is scaled or the probability is calculated. The unknown passes the analysis if the resulting value is
below the threshold value.
Note: If the probability threshold is used, Vision calculates the probability that a sample is not a
member of the distribution described by the training set of spectra. A low value for this quantity
indicates a high probability that the sample spectrum belongs to the training set.
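
As an illustration of the analysis step, the following is a minimal sketch, not Vision's actual implementation. It assumes the common formulation in which the distance in principal component space is the square root of the sum of squared scores, each divided by its eigenvalue. The function names, the toy numbers, and the way the number of degrees of freedom is supplied are all hypothetical.

```python
import math

def project(spectrum, mean, eigenvectors):
    """Scores of the mean-centered spectrum on each (assumed orthonormal) eigenvector."""
    centered = [x - m for x, m in zip(spectrum, mean)]
    return [sum(c * v for c, v in zip(centered, vec)) for vec in eigenvectors]

def mahalanobis_pc(scores, eigenvalues):
    """Assumed formulation: sqrt(sum(t_k^2 / lambda_k)) over the primary PCs."""
    return math.sqrt(sum(t * t / lam for t, lam in zip(scores, eigenvalues)))

def passes_match_value(distance, dof, threshold):
    """Scale the distance by the degrees of freedom, then compare with the threshold."""
    return distance / dof < threshold

# Toy data (made up): 3 wavelengths, 2 primary PCs.
mean = [1.0, 2.0, 3.0]
eigenvectors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
eigenvalues = [4.0, 1.0]
unknown = [3.0, 3.0, 3.5]

scores = project(unknown, mean, eigenvectors)    # centered spectrum is [2.0, 1.0, 0.5]
distance = mahalanobis_pc(scores, eigenvalues)   # sqrt(4/4 + 1/1)
accepted = passes_match_value(distance, dof=2, threshold=1.0)
```

Here `dof` and `threshold` are placeholders; the manual does not state how Vision counts the degrees of freedom, and its default thresholds are empirical.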
5.4 Residual Variance in Principal Component Space
The library identification method based on residual variance calculates a local Principal Components
model for each product in the library. A qualification method is developed for each product separately.
5.4.1 Model Development
Principal Component Analysis performed on the spectra in the training set of a given product yields a
set of eigenvectors with corresponding eigenvalues. The number of primary PCs in the model is
determined from the cumulative variance threshold defined for the model.
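
The selection of primary PCs from a cumulative variance threshold can be sketched as follows. This is only an illustration: the function name `n_primary_pcs`, the example eigenvalues, and the use of "greater than or equal to" at the threshold are assumptions, not details taken from Vision.

```python
def n_primary_pcs(eigenvalues, cumulative_threshold):
    """Smallest number of leading PCs whose cumulative variance fraction
    reaches the threshold (threshold given as a fraction, e.g. 0.85)."""
    ordered = sorted(eigenvalues, reverse=True)  # largest variance first
    total = sum(ordered)
    running = 0.0
    for k, lam in enumerate(ordered, start=1):
        running += lam
        if running / total >= cumulative_threshold:
            return k
    return len(ordered)

# Made-up eigenvalues; their sum is 10, so the cumulative fractions
# are 0.60, 0.90, 0.95 and 1.00.
eigenvalues = [6.0, 3.0, 0.5, 0.5]
n_primary = n_primary_pcs(eigenvalues, 0.85)  # first two PCs explain 90 %
```

The remaining PCs are then treated as residual, which is the part of the spectrum the residual variance method evaluates.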
Residual variance is distributed according to the F distribution (the distribution of the ratio of two
independent chi-square variables, each divided by its number of degrees of freedom). From the F
distribution one can calculate the probability that a given sample belongs to the distribution
represented by the training set.
The residual variance method offers a choice of two types of thresholds: probability or match value.
A threshold expressed as a probability is the recommended type.
Vision has a built-in probability function based on the F distribution. The sample's residual variance
and the number of degrees of freedom are passed to this function, which returns the probability that
the sample does not belong to the distribution represented by the training set of spectra (1-α).
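
The probability function itself is internal to Vision, but its general idea can be sketched with the cumulative F distribution. The code below is an assumption-laden illustration: it takes an F ratio (assumed here to be the sample's residual variance divided by that of the training set) and two hypothetical degree-of-freedom counts, and returns 1-α as the cumulative probability. The F distribution is evaluated through the regularized incomplete beta function, computed with a standard continued-fraction expansion.

```python
import math

def _beta_cf(a, b, x, max_iter=200, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the recurrence.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the recurrence.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b

def f_cdf(f, dof_num, dof_den):
    """Cumulative F distribution: P(F <= f)."""
    if f <= 0.0:
        return 0.0
    x = dof_num * f / (dof_num * f + dof_den)
    return reg_inc_beta(dof_num / 2.0, dof_den / 2.0, x)

def prob_not_member(f_ratio, dof_num, dof_den):
    """1 - alpha: probability that the sample is NOT from the training distribution."""
    return f_cdf(f_ratio, dof_num, dof_den)

# Hypothetical numbers: F ratio 3.0 with 2 and 10 degrees of freedom.
p = prob_not_member(3.0, 2, 10)   # about 0.905
accepted = p < 0.95               # 0.95 is an illustrative threshold, not a Vision default
```

A low returned value means the sample is consistent with the training set; the sample passes when the value stays below the chosen probability threshold.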
Another type of threshold is the match value. The residual variance calculated directly from the
formula is not intuitive to interpret. Therefore, if the match value type of threshold is to be used, the
variance has to be scaled: Vision scales the residual variance by the number of degrees of freedom.
The default threshold values have been determined experimentally and do not have a statistical
interpretation.
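
For the match value variant, a minimal sketch of the scaling is given below. It assumes that the residual is the part of the mean-centered spectrum not reproduced by the primary PCs and that the scaling is a division of the residual sum of squares by the number of degrees of freedom. The function names, the toy data, and the value of `dof` are hypothetical.

```python
def residual_sum_of_squares(spectrum, mean, eigenvectors):
    """Sum of squares of what remains after removing the primary-PC contribution.
    Eigenvectors are assumed to be orthonormal."""
    centered = [x - m for x, m in zip(spectrum, mean)]
    residual = centered[:]
    for vec in eigenvectors:
        score = sum(c * v for c, v in zip(centered, vec))
        residual = [r - score * v for r, v in zip(residual, vec)]
    return sum(r * r for r in residual)

def scaled_residual_variance(rss, dof):
    """Residual sum of squares divided by the degrees of freedom."""
    return rss / dof

# Toy data (made up): 3 wavelengths, 1 primary PC.
mean = [1.0, 2.0, 3.0]
eigenvectors = [[1.0, 0.0, 0.0]]
unknown = [3.0, 3.0, 3.5]

rss = residual_sum_of_squares(unknown, mean, eigenvectors)  # 1.0**2 + 0.5**2
value = scaled_residual_variance(rss, dof=2)
```

The resulting value would then be compared with the match value threshold, which in Vision is an empirically chosen default.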