Step 7: select documents for testing – Kofax INDICIUS 6.0 User Manual
Chapter 4: Getting Started Guide (Classification and Separation)
9. Review the chart.
   There should now be at least 100 documents for each document type (except
   for Header), and each bar should be green.
Note
You can review the documents that have been automatically classified by using
Browse Documents. For more information, refer to the INDICIUS Help.
Step 7: Select Documents for Testing
The Test Documents set stores a subset of the clean documents for use in testing.
These documents are not used during the training process and therefore form an
unseen set. Because the test documents have been cleaned up, comparing the data in
the project with the results of running the configuration on those documents
provides an accurate indication of performance.
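
The comparison described above can be pictured as a per-type accuracy calculation. The following Python sketch is an illustration only and is not part of INDICIUS: the function name `accuracy_by_type` and the dictionary layout (document ID mapped to document type) are assumptions made for this example.

```python
from collections import Counter


def accuracy_by_type(expected, predicted):
    """Return the fraction of correctly classified documents per type.

    expected:  dict mapping document ID -> cleaned-up document type
    predicted: dict mapping document ID -> type assigned by the configuration
    Both layouts are hypothetical; INDICIUS does not expose this function.
    """
    totals = Counter()
    correct = Counter()
    for doc_id, true_type in expected.items():
        totals[true_type] += 1
        # A document missing from `predicted` counts as an error.
        if predicted.get(doc_id) == true_type:
            correct[true_type] += 1
    return {doc_type: correct[doc_type] / totals[doc_type] for doc_type in totals}


# Example with three test documents, one of them misclassified.
expected = {"d1": "Invoice", "d2": "Invoice", "d3": "Letter"}
predicted = {"d1": "Invoice", "d2": "Letter", "d3": "Letter"}
print(accuracy_by_type(expected, predicted))  # {'Invoice': 0.5, 'Letter': 1.0}
```

The figure is only meaningful when the expected types come from documents that the training process has not seen, which is the purpose of the Test Documents set.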
The Test Documents set is populated by moving documents from the Sample
Documents set.
Guidelines for Selecting Test Documents
When selecting test documents you must specify the percentage of documents to
move from the Sample Documents set. You can also specify whether documents that
have had their type manually confirmed may be moved into the test set, or whether
they must remain in the Sample Documents set.
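
The effect of these two settings can be sketched in Python as follows. This is a minimal illustration, not the INDICIUS implementation: the function `select_test_documents`, the `id` and `confirmed` keys, the random choice of documents, and the assumption that the percentage is measured against the whole Sample Documents set (capped by the number of eligible documents) are all hypothetical.

```python
import random


def select_test_documents(sample_docs, percentage, keep_confirmed=True, seed=0):
    """Split sample_docs into (remaining_sample_set, test_set).

    sample_docs:    list of dicts with hypothetical keys 'id' and 'confirmed'
    percentage:     share of the Sample Documents set to move (0 to 100)
    keep_confirmed: if True, manually confirmed documents are never moved
    """
    if not 0 <= percentage <= 100:
        raise ValueError("percentage must be between 0 and 100")

    # Documents that are allowed to move into the test set.
    eligible = [d for d in sample_docs
                if not (keep_confirmed and d["confirmed"])]

    # Assumption: the target is a share of the whole sample set,
    # limited by how many documents are eligible to move.
    target = int(len(sample_docs) * percentage / 100)
    count = min(target, len(eligible))

    # Assumption: documents are picked at random; a fixed seed keeps
    # the example reproducible.
    test_set = random.Random(seed).sample(eligible, count)
    moved_ids = {d["id"] for d in test_set}
    remaining = [d for d in sample_docs if d["id"] not in moved_ids]
    return remaining, test_set


# Ten documents, four of which have a manually confirmed type.
docs = [{"id": i, "confirmed": i < 4} for i in range(10)]
remaining, test_set = select_test_documents(docs, 30)
print(len(remaining), len(test_set))  # 7 3
```

With `keep_confirmed=True`, the confirmed documents always stay in the sample set, so a high percentage may move fewer documents than requested.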
The following table shows guidelines for selecting test documents.
Table 4-4. Test Document Selection Guidelines
Method                                        Number of Documents   Keep Confirmed Documents
                                              in Test Set           in Sample Set
--------------------------------------------  --------------------  ------------------------
Page text classification                      30%                   Yes
Multiple page level classification methods

Document text classification                  90%                   Yes
Page image classification
Templated (including barcode) classification
Rules-based classification