Kofax INDICIUS 6.0 User Manual
Chapter 4
Getting Started Guide (Classification and Separation)
Test Documents: Documents to use when testing the configuration (not used
for training).
Unused Documents: Documents that are not currently being used. These may
be additional documents that are not required for configuration or documents
that have not yet been classified.
Table 4-1 gives guidelines for the number of documents required for the different
classification methods (per document type). The figures take into account that some
documents may be misclassified or of poor quality and therefore may be discarded
before starting the configuration process. Although you can use more than the
suggested number of sample documents, this will slow down the configuration
process and may not improve accuracy. However, if your initial document set is of
poor quality, you should start with a higher number.
Table 4-1. Guideline Number of Documents per Document Type

Method                                                             Number of Documents
Text classification (or a combination of classification methods)  150
Image, templated or rules-based classification                     10

Note
For information on the suitability of documents/pages for a particular
classification method, refer to the INDICIUS Help.
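As a rough illustration of how the guideline figures in Table 4-1 might be applied before starting configuration, the following Python sketch reports which document types fall short of the suggested number of samples. It is not part of INDICIUS: the function name, the method labels and the sample counts are all assumptions made for this example.

```python
# Guideline minimums per document type, taken from Table 4-1.
# The dictionary keys are hypothetical labels, not INDICIUS identifiers.
GUIDELINE_MINIMUM = {
    "text": 150,        # text classification, or a combination of methods
    "image": 10,        # image classification
    "templated": 10,    # templated classification
    "rules": 10,        # rules-based classification
}


def find_shortfalls(sample_counts, method):
    """Return {document_type: missing_count} for types below the guideline."""
    minimum = GUIDELINE_MINIMUM[method]
    return {
        doc_type: minimum - count
        for doc_type, count in sample_counts.items()
        if count < minimum
    }


# Hypothetical sample counts per document type.
counts = {"Invoice": 180, "Credit Note": 95, "Remittance": 150}
shortfalls = find_shortfalls(counts, "text")
print(shortfalls)  # {'Credit Note': 55}
```

An empty result means every document type meets the guideline for the chosen method. Because the guideline already allows for some discarded documents, a small shortfall may still be workable.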
Documents in Multiple Document Sets
Documents in standard or custom document sets are shared with those in the overall
project, that is, a document in a standard or custom document set is the same as that
document in All Documents.
Note
The actual image files contained in a document are not duplicated into each
set; only the names of the files are duplicated.
When documents are added to a set, they are members of both the original set and
the set they have been added to.
When documents are moved to a set, they are members of the new set, but not the
original set.
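The difference between adding and moving can be pictured with a small Python model in which each document set holds only document names. This is a sketch for illustration, not the INDICIUS API: the class, its method names and the document names are hypothetical, and it assumes that a document remains in All Documents whichever set it is moved to.

```python
class Project:
    """Toy model of document-set membership (not the INDICIUS API)."""

    def __init__(self):
        # Each set stores document names only; the files are never copied.
        self.sets = {"All Documents": set()}

    def import_document(self, name, set_name):
        # Assumption: every document always belongs to "All Documents".
        self.sets["All Documents"].add(name)
        self.sets.setdefault(set_name, set()).add(name)

    def add_to_set(self, name, target):
        # Add: the document keeps its existing memberships and gains one.
        self.sets.setdefault(target, set()).add(name)

    def move_to_set(self, name, source, target):
        # Move: the document leaves the source set and joins the target set.
        self.sets[source].discard(name)
        self.sets.setdefault(target, set()).add(name)

    def memberships(self, name):
        return sorted(s for s, docs in self.sets.items() if name in docs)


project = Project()
project.import_document("doc_001", "Unused Documents")
project.import_document("doc_002", "Unused Documents")

project.add_to_set("doc_001", "Test Documents")
project.move_to_set("doc_002", "Unused Documents", "Test Documents")

print(project.memberships("doc_001"))
# ['All Documents', 'Test Documents', 'Unused Documents']
print(project.memberships("doc_002"))
# ['All Documents', 'Test Documents']
```

Because every set refers to the same document by name, there is only one underlying document no matter how many sets list it.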
Similarly, if pages or documents are modified in one document set, they will be
modified in all document sets to which they belong. These modifications may be