
Evaluators, Online Learning, OCR and Script Integration – Kofax Getting Started with Ascent Xtrata Pro User Manual



Ascent Xtrata Pro User's Guide

Evaluators

In addition to the locators, various evaluators are available. Evaluators work on the
results of locators and do not directly retrieve data from the document.
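The division of labor between locators and evaluators can be illustrated with a minimal Python sketch. It is not the product's API: `Candidate`, `amount_locator`, and `largest_amount_evaluator` are hypothetical names invented for this illustration, and the fixed confidence value is a placeholder. The point is only that the locator reads the document text, while the evaluator receives nothing but the locator's results.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    """One result produced by a locator (hypothetical structure)."""
    text: str
    confidence: float


def amount_locator(document_text: str) -> List[Candidate]:
    """Hypothetical locator: reads the document text and returns numeric tokens."""
    candidates = []
    for token in document_text.split():
        cleaned = token.replace(",", "")
        try:
            float(cleaned)
        except ValueError:
            continue  # not a number, so not a candidate
        # Placeholder confidence; a real locator would compute one.
        candidates.append(Candidate(text=cleaned, confidence=0.5))
    return candidates


def largest_amount_evaluator(candidates: List[Candidate]) -> Optional[Candidate]:
    """Hypothetical evaluator: sees only locator results, never the document."""
    if not candidates:
        return None
    return max(candidates, key=lambda c: float(c.text))


results = amount_locator("Subtotal 100.00 Tax 19.00 Total 119.00")
best = largest_amount_evaluator(results)
```

Because the evaluator takes only a list of candidates as input, it can be combined with any locator that produces the same kind of result.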

Online Learning

The New Samples working mode is available within Project Builder. This working
mode shows documents that have been returned from validation. These documents
can be added to either a classification or an extraction training set, where
they help to optimize extraction by the table and invoice header locators.

To make online learning available for a batch class, the Ascent Capture
Release module must be added to the list of queues for the batch class.

OCR and Script Integration

In addition to the classification and extraction methods provided with Ascent Xtrata
Pro, Project Builder also provides access to OCR settings and an editor for the built-
in script engine.

OCR Integration

To process unstructured documents and locate arbitrary content, the complete
document must be processed by the OCR engine before any of the extraction
methods can be applied. The OCR results are stored in a structured representation of
the document that is saved as an .xdc (XDoc) file. All subsequent algorithms operate
on the XDoc representation of the original file.

OCR is integrated transparently into Project Builder and Ascent Xtrata Pro Server.
At runtime, it is performed automatically and only on demand, that is, only when
the full-text results of a page are needed. For example, when extraction is
restricted to the first page of the document, and none of the classification
methods require more than one page, OCR is performed on the first page only.
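The on-demand behavior described above can be sketched in Python as lazy, per-page recognition with caching. This is an illustration under stated assumptions, not the product's implementation: `LazyOcrDocument`, `page_text`, `recognized_pages`, and `fake_ocr` are hypothetical names, and the stand-in OCR function simply decodes bytes instead of recognizing an image.

```python
from typing import Callable, Dict, List


class LazyOcrDocument:
    """Runs OCR on a page only when that page's text is first requested."""

    def __init__(self, page_images: List[bytes], ocr_page: Callable[[bytes], str]):
        self._page_images = page_images
        self._ocr_page = ocr_page          # hypothetical OCR callback
        self._text_cache: Dict[int, str] = {}

    @property
    def page_count(self) -> int:
        return len(self._page_images)

    def page_text(self, index: int) -> str:
        """Return the full text of one page, recognizing it on first access."""
        if index not in self._text_cache:
            self._text_cache[index] = self._ocr_page(self._page_images[index])
        return self._text_cache[index]

    def recognized_pages(self) -> List[int]:
        """Indexes of the pages that have actually been through OCR."""
        return sorted(self._text_cache)


def fake_ocr(image: bytes) -> str:
    """Stand-in for a real OCR engine (assumption for this sketch)."""
    return image.decode("ascii")


doc = LazyOcrDocument([b"invoice page 1", b"terms page 2"], fake_ocr)
first_page = doc.page_text(0)   # triggers OCR for page 0 only
print(doc.recognized_pages())   # [0]
```

An extraction step that is restricted to the first page would call `page_text(0)` only, so the second page is never passed to the OCR function.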

Ascent Xtrata Pro is delivered with the ABBYY® FineReader® 8.0 OCR engine. An
additional ABBYY® FineReader® language package for Asian languages and an
additional recognition engine, KADMOS 4.2®, developed by Recognition GmbH, are
available. The language package and additional recognition engines such as
KADMOS 4.2® must be licensed separately.