Kofax INDICIUS 6.0 User Manual

Chapter 4

Getting Started Guide (Classification and Separation)

•  Test Documents: Documents to use when testing the configuration (not used
   for training).

•  Unused Documents: Documents that are not currently being used. These may
   be additional documents that are not required for configuration or documents
   that have not yet been classified.

Table 4-1 gives guidelines for the number of documents required for the different
classification methods (per document type). The figures take into account that some
documents may be misclassified or of poor quality and therefore may be discarded
before starting the configuration process. You can use more than the
suggested number of sample documents, but doing so slows down the configuration
process and may not improve accuracy. If your initial document set is of poor
quality, however, you should start with a higher number.

Table 4-1. Guideline Number of Documents per Document Type

Method                                                             Number of Documents
-----------------------------------------------------------------  -------------------
Text classification (or a combination of classification methods)   150
Image, templated or rules-based classification                     10

Note: For information on the suitability of documents/pages for a particular
classification method, refer to the INDICIUS Help.
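
As an illustration only, the guideline figures in Table 4-1 can be expressed as a small check on how many sample documents you have collected per document type. The sketch below is not part of INDICIUS; the function name, the method keys and the sample counts are all hypothetical.

```python
# Guideline minimums per document type, taken from Table 4-1.
# The dictionary keys are hypothetical labels, not INDICIUS identifiers.
GUIDELINE_MINIMUM = {
    "text": 150,       # text classification, or a combination of methods
    "image": 10,       # image classification
    "templated": 10,   # templated classification
    "rules": 10,       # rules-based classification
}


def types_below_guideline(sample_counts, method):
    """Return {document_type: shortfall} for types with too few samples."""
    minimum = GUIDELINE_MINIMUM[method]
    return {
        doc_type: minimum - count
        for doc_type, count in sample_counts.items()
        if count < minimum
    }


# Hypothetical sample counts for three document types.
counts = {"Invoice": 180, "Purchase Order": 95, "Delivery Note": 150}
print(types_below_guideline(counts, "text"))  # {'Purchase Order': 55}
```

Because the guideline already allows for some documents being discarded, a shortfall reported here suggests collecting more samples before starting the configuration process.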

Documents in Multiple Document Sets

Documents in standard or custom document sets are shared with the overall
project; that is, a document in a standard or custom document set is the same
document as the one in All Documents.

Note: The actual image files contained in a document are not duplicated into
each set; only the names of the files are duplicated.

When documents are added to a set, they are members of the original set and the set
they have been added to.

When documents are moved to a set, they are members of the new set, but not the
original set.
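
The difference between adding and moving can be pictured with a minimal model in which each document set holds references to shared document objects. This is a sketch of the behavior described above, not of how INDICIUS is implemented; the class, the function names, the custom set name and the file names are hypothetical.

```python
class Document:
    """A document; sets hold references to it, never copies."""

    def __init__(self, name, pages):
        self.name = name
        self.pages = pages  # hypothetical image file names


def add_to_set(sets, doc, target):
    """Add: the document stays in its original set and joins the target."""
    if doc not in sets[target]:
        sets[target].append(doc)


def move_to_set(sets, doc, source, target):
    """Move: the document leaves the source set and joins the target."""
    sets[source].remove(doc)
    add_to_set(sets, doc, target)


invoice = Document("invoice_001", ["page1.tif", "page2.tif"])
sets = {
    "All Documents": [invoice],
    "Unused Documents": [invoice],
    "Test Documents": [],
    "My Custom Set": [],  # hypothetical custom document set
}

# Added: now a member of both Unused Documents and My Custom Set.
add_to_set(sets, invoice, "My Custom Set")

# Moved: now a member of Test Documents, no longer of Unused Documents.
move_to_set(sets, invoice, "Unused Documents", "Test Documents")

# Every set refers to the same object, so a change is visible everywhere.
invoice.pages.pop()
print(sets["My Custom Set"][0].pages)
```

In this model a set never stores a copy of a document, which mirrors the note above that image files are not duplicated into each set.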

Similarly, if pages or documents are modified in one document set they will be
modified in all document sets to which they belong. These modifications may be