Google Search Appliance: External Metadata Indexing Guide
External Metadata Indexing Guide
This guide is for developers and administrators of the Google Search Appliance who have documents
with metadata that is not stored directly in the primary document. A primary document is a record, file,
or web page that the search appliance treats as a document to index or serve. The guide explains how
to use the external metadata indexing capabilities of the search appliance through either the Feeds
system or the Database Crawler. You should be familiar with the Feeds system and the Database
Crawler before you read this guide.
Introduction
The Google Search Appliance indexes metadata stored in documents and makes that data available for
retrieval at search time. Metadata is data that describes other data, and it can improve the quality of
your search results. For example, an HTML document can hold metadata in the <meta> tag to describe
the author or keywords for the document. Similarly, Microsoft Office files such as Word documents or
Excel spreadsheets often contain metadata fields, such as Title, Subject, Author, Date, and many others.
From the perspective of the search appliance, there are two primary types of metadata:
• Metadata that is stored directly in a primary document, as in the example of the HTML document
with a <meta> tag.
When the search appliance indexes a document, it automatically indexes the metadata that is
stored directly in that document.
• Metadata that is not stored directly in a primary document.
An example of this is metadata about a document that is stored in a column of a database table, or
metadata that is pushed in a separate feed.
You can configure the search appliance to index this external metadata and the primary document as a
single record.
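The notion of indexing external metadata and the primary document as a single record can be illustrated with a small Python sketch. It assumes a hypothetical database table, doc_metadata, whose rows are keyed by document URL, and pairs each row with the content of the matching document. The table, its columns, the sample URLs, and the record layout are all illustrative assumptions; on the search appliance, this association is made through the Feeds system or the Database Crawler, as this guide describes.

```python
import sqlite3

# Hypothetical primary documents, keyed by URL. In practice these would be
# acquired by a crawler or pushed in a feed.
PRIMARY_DOCUMENTS = {
    "http://www.example.com/reports/q1.pdf": "Q1 revenue grew in all regions.",
    "http://www.example.com/reports/q2.pdf": "Q2 revenue was flat.",
}


def load_external_metadata(conn):
    """Return {url: {column: value}} from the hypothetical doc_metadata table."""
    conn.row_factory = sqlite3.Row
    rows = conn.execute("SELECT url, department, owner FROM doc_metadata")
    return {
        row["url"]: {"department": row["department"], "owner": row["owner"]}
        for row in rows
    }


def merge_records(documents, external_metadata):
    """Combine each primary document with its external metadata into one record."""
    records = []
    for url, content in documents.items():
        records.append({
            "url": url,
            "content": content,
            # A document with no matching row gets empty metadata.
            "metadata": external_metadata.get(url, {}),
        })
    return records


# The external metadata lives in a database column, not in the documents.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE doc_metadata (url TEXT PRIMARY KEY, department TEXT, owner TEXT)"
)
conn.execute(
    "INSERT INTO doc_metadata VALUES (?, ?, ?)",
    ("http://www.example.com/reports/q1.pdf", "Finance", "A. Lee"),
)
conn.commit()

records = merge_records(PRIMARY_DOCUMENTS, load_external_metadata(conn))
for record in records:
    print(record["url"], record["metadata"])
```

Keying the join on the document URL is one simple design choice for this sketch; any identifier shared by the document and the metadata source would serve the same purpose.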
Because the search appliance automatically indexes metadata that is stored directly in a primary
document, this guide describes how to index metadata that is not stored in the primary document. In
this guide, external metadata refers to metadata that is not stored directly in the primary document.
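For metadata that is pushed in a separate feed, the following Python sketch builds a small XML document that attaches metadata to document URLs, using the standard library's xml.etree.ElementTree module. The element and attribute names (gsafeed, header, datasource, feedtype, group, record, metadata, meta) and the feed type value metadata-and-url follow the search appliance feed XML format as we understand it; treat them as assumptions and check them against the Feeds Protocol Developer's Guide. The function name build_metadata_feed and the sample data source and URL are hypothetical, and the sketch omits the XML declaration and any DOCTYPE line that a real feed file may require.

```python
import xml.etree.ElementTree as ET


def build_metadata_feed(datasource, url_metadata):
    """Build a feed that attaches external metadata to document URLs.

    Element names are assumed from the appliance feed XML format and should
    be verified against the Feeds Protocol Developer's Guide.
    """
    root = ET.Element("gsafeed")
    header = ET.SubElement(root, "header")
    ET.SubElement(header, "datasource").text = datasource
    # Assumed feed type for pushing URLs together with their metadata.
    ET.SubElement(header, "feedtype").text = "metadata-and-url"

    group = ET.SubElement(root, "group")
    for url, metadata in url_metadata.items():
        # One record per document; the document content itself is not included.
        record = ET.SubElement(group, "record", url=url, mimetype="text/html")
        meta_block = ET.SubElement(record, "metadata")
        for name, content in metadata.items():
            ET.SubElement(meta_block, "meta", name=name, content=content)
    return ET.tostring(root, encoding="unicode")


feed_xml = build_metadata_feed(
    "example_metadata",
    {"http://www.example.com/reports/q1.html": {"department": "Finance"}},
)
print(feed_xml)
```

The feed carries only the URL and its metadata, which is what makes this metadata external: the document content is acquired separately, and the two are associated by URL.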
The primary document is defined as a record, a web page, or a file in any of the more than 200 file types
that the search appliance can index, acquired through the web crawler, the database crawler, the file
system crawler, or the Feeds system.