Low volume, single station environment, Free-form processing overview, Application areas – Kofax INDICIUS 6.0 User Manual

Chapter 1

Getting Started Guide (Free-Form)

Low Volume, Single Station Environment

In lower volume environments it is possible to run batches through all the modules
on a single station, using Kofax Capture Batch Manager.

Free-form Processing Overview

Traditional data capture solutions rely upon the presence of a template to specify the
location of the data fields to be captured from each document. If there are multiple
document types, then a separate template must be defined for each type. This approach
does not work for capture from unstructured documents, because the number of templates
required to cover all possible locations of the required data would be excessive.

To solve this problem, INDICIUS uses knowledge of the format and context of the
data to dynamically locate it on each document without the need for a template.
Free-form searches take as input a full or partial page read of the document, then
extract the data using rules that are specific to the page content, not to the document
layout. The result is a maintenance-light solution which is robust to document
variation.
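
As an illustration of a content-based rule, the short Python sketch below finds an invoice number from its format and context alone. It is not INDICIUS script code: the pattern `INVOICE_NO`, the function `find_invoice_number`, and the sample page texts are all hypothetical, and the rule (5 to 10 digits following an "Invoice No" style label) is an assumption made for the example.

```python
import re
from typing import Optional

# Hypothetical rule: an invoice number is a run of 5-10 digits that follows
# a label such as "Invoice No", "Invoice Number" or "Invoice #", wherever
# that label appears in the page read.
INVOICE_NO = re.compile(
    r"invoice\s*(?:no\.?|number|#)\s*[:\-]?\s*(\d{5,10})",
    re.IGNORECASE,
)


def find_invoice_number(page_text: str) -> Optional[str]:
    """Return the first invoice number found in the page text, or None."""
    match = INVOICE_NO.search(page_text)
    return match.group(1) if match else None


# Two made-up page reads with the same field in different positions.
layout_a = "ACME Ltd\nInvoice No: 0012345\nDate 01/02/2006\nTotal 99.00"
layout_b = "Total 99.00\nThank you for your order.\nINVOICE NUMBER 778899"

for page in (layout_a, layout_b):
    print(find_invoice_number(page))
```

The same rule locates the field in both page reads even though it appears in a different position on each, which is the property that removes the need for a template per layout.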

INDICIUS uses a unique methodology for free-form data classification and
extraction. Features include:

• An entirely template-free approach.

• Totally flexible script-based configuration allows any data search criteria or
  validation to be defined, giving the widest application scope.

• Dual-mode set-up – uses a graphical interface for rapid generation of data
  search criteria combined with a script editor for definition of more complex
  data validations.

• Offline test-mode for rapid testing and debugging of free-form processing
  scripts.

Application Areas

Free-form processing can be applied in a variety of application areas:

Document Classification

Keywords, text patterns, or any form of rule based on these can be used as a basis for
classifying documents. For example, invoices could be automatically detected from a
stream of mixed document types by finding the word “invoice” in combination with