Step 7: select documents for testing – Kofax INDICIUS 6.0 User Manual

Chapter 4

Getting Started Guide (Classification and Separation)

9. Review the chart.

There should now be at least 100 documents for each document type (except
for Header) and each bar should be green.

Note

You can review the documents that have been automatically classified by using Browse Documents. For more information, refer to the INDICIUS Help.

Step 7: Select Documents for Testing

The Test Documents set stores a subset of the clean documents for use in
testing. These documents are not used during the training process, so they form an
unseen set for testing. Because the test documents have been cleaned up, a
comparison between the data in the project and the results of running the
configuration on the documents provides an accurate indication of performance.
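The comparison described above can be pictured as a per-type accuracy check. The following is a minimal sketch in Python, not part of INDICIUS: the function name `accuracy_by_type`, the document IDs, and the document types `Invoice` and `Letter` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def accuracy_by_type(expected, predicted):
    """Compare confirmed types with classification results.

    Both arguments map a document ID to a document type.
    `expected` stands for the cleaned-up data in the project,
    `predicted` for the results of running the configuration.
    Returns the fraction of correct results per expected type.
    """
    totals = Counter()
    correct = Counter()
    for doc_id, true_type in expected.items():
        totals[true_type] += 1
        # A missing prediction counts as an error.
        if predicted.get(doc_id) == true_type:
            correct[true_type] += 1
    return {doc_type: correct[doc_type] / totals[doc_type] for doc_type in totals}


# Hypothetical test set of four documents.
expected = {"d1": "Invoice", "d2": "Invoice", "d3": "Letter", "d4": "Letter"}
predicted = {"d1": "Invoice", "d2": "Letter", "d3": "Letter", "d4": "Letter"}
report = accuracy_by_type(expected, predicted)
```

Because the test documents were never seen during training, a figure computed this way reflects how the configuration handles new documents rather than how well it remembers its training data.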

The Test Documents set is populated by moving documents from the Sample
Documents set.

Guidelines for Selecting Test Documents

When selecting test documents you must specify the percentage of documents to
move from the Sample Documents set. You can also specify whether documents that
have had their type manually confirmed may be moved into the test set, or whether
they must remain in the Sample Documents set.
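The effect of these two settings can be sketched in Python. This is an illustration only, under stated assumptions: the guide does not say how INDICIUS picks the documents, so the sketch assumes a random selection, and it assumes the percentage applies to the documents that are eligible to move. The function `select_test_documents` and the `id` and `confirmed` fields are hypothetical.

```python
import random


def select_test_documents(sample_docs, percentage, keep_confirmed=True, seed=0):
    """Split sample_docs into (remaining_sample, test_set).

    sample_docs    -- list of dicts with "id" and "confirmed" keys (hypothetical)
    percentage     -- share of eligible documents to move, 0 to 100
    keep_confirmed -- if True, manually confirmed documents stay in the sample set
    seed           -- fixed seed so the selection is repeatable (assumption)
    """
    eligible = [
        doc for doc in sample_docs
        if not (keep_confirmed and doc["confirmed"])
    ]
    # Integer arithmetic avoids floating-point rounding surprises.
    count = len(eligible) * percentage // 100
    chosen = random.Random(seed).sample(eligible, count)
    chosen_ids = {doc["id"] for doc in chosen}
    remaining = [doc for doc in sample_docs if doc["id"] not in chosen_ids]
    return remaining, chosen


# 20 hypothetical documents, the first 5 manually confirmed.
docs = [{"id": i, "confirmed": i < 5} for i in range(20)]
remaining, test_set = select_test_documents(docs, 30, keep_confirmed=True)
remaining_all, test_set_all = select_test_documents(docs, 30, keep_confirmed=False)
```

With `keep_confirmed=True`, only the 15 unconfirmed documents are eligible, so 4 are moved and every confirmed document stays available for training. With `keep_confirmed=False`, all 20 are eligible and 6 are moved.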

The following table shows guidelines for selecting test documents.

Table 4-4. Test Document Selection Guidelines

Method                                          Number of Documents   Keep Confirmed Documents
                                                in Test Set           in Sample Set

Page text classification                        30%                   Yes
Multiple page level classification methods      30%                   Yes
Document text classification                    90%                   Yes
Page image classification                       90%                   Yes
Templated (including barcode) classification    90%                   Yes
Rules-based classification                      90%                   Yes