Kofax Getting Started with Ascent Xtrata Pro User Manual

Chapter 3

Ascent Xtrata Pro User's Guide

Figure 3-20. Relationship Between Precision and Recall - Scheme

The yellow area depicts the set of all documents. The vertical reference line divides
this set into two groups: documents that belong to class A and documents that do not.
The classifier decides for each document whether it belongs to class A; this decision
is depicted by the diagonal line. If both the classifier and the reference set were
perfect, the vertical line and the diagonal line would coincide exactly. Because they
do not, the intersection of the two lines creates three subsets:

a is the subset of documents correctly classified as class A
b is the subset of documents incorrectly classified as class A (they belong to another class)
c is the subset of documents that belong to class A but were not classified as class A

Precision (P) and recall (R) are defined as:

$$P = \frac{a}{a + b} \qquad R = \frac{a}{a + c}$$
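As an illustration of the definitions P = a / (a + b) and R = a / (a + c), the minimal Python sketch below counts the three subsets for one class from two parallel lists of labels and then computes P and R. The function names, the label lists and the convention of returning 0.0 for an empty denominator are all hypothetical assumptions for this example; they are not part of Ascent Xtrata Pro.

```python
def subset_counts(reference, predicted, cls):
    """Count subsets a, b and c for one class from parallel label lists."""
    a = b = c = 0
    for ref, pred in zip(reference, predicted):
        if pred == cls and ref == cls:
            a += 1  # classified as cls and really cls
        elif pred == cls:
            b += 1  # classified as cls but really another class
        elif ref == cls:
            c += 1  # really cls but not classified as cls
    return a, b, c


def precision_recall(a, b, c):
    """P = a / (a + b) and R = a / (a + c); 0.0 if a denominator is zero (assumed)."""
    p = a / (a + b) if a + b else 0.0
    r = a / (a + c) if a + c else 0.0
    return p, r


# Hypothetical reference labels and classifier decisions for six documents.
reference = ["A", "A", "A", "B", "B", "C"]
predicted = ["A", "A", "B", "A", "B", "A"]

a, b, c = subset_counts(reference, predicted, "A")
p, r = precision_recall(a, b, c)
print(a, b, c, round(p, 3), round(r, 3))  # 2 2 1 0.5 0.667
```

For class A, two documents from other classes were put into A (b = 2) and one A document was missed (c = 1), so precision (0.5) is lower than recall (about 0.667).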

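For more than one class, the weighted values of P and R for each class are summed over
all classes to give an overall result. If no threshold is defined, the overall P and R
are equal, because every document that is incorrectly classified into one class (and so
counted in b for that class) is also missing from its true class (and so counted in c
for that class).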
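The sketch below illustrates why the overall values coincide without a threshold. It assumes, as a hypothetical choice, that the weighting amounts to pooling the counts: a, b and c are summed over all classes before P and R are computed (equivalent to weighting each class's P by the number of documents assigned to it and each class's R by the number of reference documents in it). The exact weighting used by Xtrata Pro is not stated here, and the function names are illustrative only.

```python
def class_counts(reference, predicted, cls):
    """Subsets a, b, c for one class, as in the single-class definitions."""
    pairs = list(zip(reference, predicted))
    a = sum(1 for ref, pred in pairs if pred == cls and ref == cls)
    b = sum(1 for ref, pred in pairs if pred == cls and ref != cls)
    c = sum(1 for ref, pred in pairs if pred != cls and ref == cls)
    return a, b, c


def overall_precision_recall(reference, predicted):
    """Sum a, b and c over all classes (assumed pooling), then apply P and R."""
    total_a = total_b = total_c = 0
    for cls in set(reference) | set(predicted):
        a, b, c = class_counts(reference, predicted, cls)
        total_a += a
        total_b += b
        total_c += c
    if total_a + total_b == 0 or total_a + total_c == 0:
        return 0.0, 0.0  # assumed convention for empty input
    return total_a / (total_a + total_b), total_a / (total_a + total_c)


# Hypothetical reference labels and classifier decisions, no threshold.
reference = ["A", "A", "A", "B", "B", "C"]
predicted = ["A", "A", "B", "A", "B", "A"]
print(overall_precision_recall(reference, predicted))  # (0.5, 0.5)
```

Each wrong decision adds 1 to b of the class the document was put into and 1 to c of its true class, so the pooled totals of b and c are always equal and the two quotients are the same.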

If a threshold is introduced, an additional set of rejected documents is created, which
is not shown in the figure. A threshold generally increases precision by suppressing
incorrectly classified documents, but it lowers recall because some correctly classified
documents are rejected as well.
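To show the effect of a threshold, the minimal sketch below assumes (hypothetically) that every classification decision carries a confidence score between 0 and 1, that a decision is rejected when its score is below the threshold, and that a rejected document is assigned to no class. A rejected document therefore still counts in c for its true class but no longer counts in b for any class. The names and data are illustrative, not Xtrata Pro's scoring or threshold semantics.

```python
def thresholded_precision_recall(reference, predicted, confidence, threshold):
    """Overall P and R when decisions with confidence < threshold are rejected."""
    a = b = c = 0
    for ref, pred, conf in zip(reference, predicted, confidence):
        if conf < threshold:
            c += 1  # rejected: assigned to no class, missing from its true class
        elif pred == ref:
            a += 1  # accepted and correct
        else:
            b += 1  # accepted but wrong: counted in b of the predicted class
            c += 1  # ...and missing from its true class
    p = a / (a + b) if a + b else 0.0
    r = a / (a + c) if a + c else 0.0
    return p, r


# Hypothetical labels and confidence scores for six documents.
reference = ["A", "A", "A", "B", "B", "C"]
predicted = ["A", "A", "B", "A", "B", "A"]
confidence = [0.95, 0.60, 0.40, 0.55, 0.90, 0.30]

for t in (0.0, 0.5, 0.7):
    p, r = thresholded_precision_recall(reference, predicted, confidence, t)
    print(t, round(p, 3), round(r, 3))
# 0.0 0.5 0.5
# 0.5 0.75 0.5
# 0.7 1.0 0.333
```

At 0.5 only incorrect decisions are rejected, so precision rises while recall stays at 0.5. At 0.7 a correct decision (confidence 0.60) is rejected as well, so precision reaches 1.0 but recall drops to about 0.333.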

You can determine P and R for your classification scheme using the Result Matrix tool
in Project Builder (see Result Matrix on page 93). Use the interactive threshold setting
tool to adjust the threshold until the system reaches the desired precision on the
reference set.
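The interactive threshold setting tool is part of the Project Builder user interface; the sketch below only illustrates the underlying idea and is not how Project Builder implements it. Under the same hypothetical assumptions as a confidence-score model (rejection when the score is below the threshold, rejected documents assigned to no class), it scans the observed confidence values from low to high and returns the lowest threshold whose overall precision meets a target. All names and data are hypothetical.

```python
def precision_recall_at(reference, predicted, confidence, threshold):
    """Overall P and R when decisions with confidence < threshold are rejected."""
    a = b = 0
    for ref, pred, conf in zip(reference, predicted, confidence):
        if conf >= threshold:
            if pred == ref:
                a += 1  # accepted and correct
            else:
                b += 1  # accepted but wrong
    total = len(reference)  # a + c: every document belongs to one reference class
    p = a / (a + b) if a + b else 0.0
    r = a / total if total else 0.0
    return p, r


def lowest_threshold_for_precision(reference, predicted, confidence, target):
    """Return (threshold, P, R) for the lowest threshold with P >= target, else None."""
    for t in sorted(set(confidence)):
        p, r = precision_recall_at(reference, predicted, confidence, t)
        if p >= target:
            return t, p, r
    return None


reference = ["A", "A", "A", "B", "B", "C"]
predicted = ["A", "A", "B", "A", "B", "A"]
confidence = [0.95, 0.60, 0.40, 0.55, 0.90, 0.30]
print(lowest_threshold_for_precision(reference, predicted, confidence, 0.9))
# (0.6, 1.0, 0.5)
```

Scanning from the lowest threshold upward is a deliberate choice: recall can only stay the same or fall as the threshold rises, so the first threshold that meets the precision target keeps as much recall as possible among the candidates tried.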