Scenario 3 – Google Search Appliance External Metadata Indexing Guide User Manual

In this scenario, the search appliance queries the database for data, then submits a feed with the
resulting rows. The search appliance extracts and indexes the recordset that is defined by the crawl
query. The URLs constructed from each external metadata record (as defined in the Document ID Field and
the Base URL field) are added to the crawl queue and crawled by either the web or file system crawler,
following the normal crawl policy. When the primary document is crawled, the contents of the primary
document and the external metadata are merged into a single record identified by the URL of the
primary document in the search appliance index. After the primary document and the external
metadata are indexed, search users can query for terms or keywords in the metadata or the primary
document, and the primary document is returned as a search result.
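
The merge described above can be pictured with a small Python sketch. It is an illustration only, not the appliance's implementation: the function names, the sample Base URL, and the rule that the URL is formed by appending the Document ID value to the Base URL are all assumptions made for this example.

```python
# Illustrative sketch only; the names and the URL rule are assumptions,
# not search appliance internals.

def build_url(base_url, doc_id):
    """Assumed rule: append the Document ID value to the Base URL."""
    return base_url + str(doc_id)


def merge_records(metadata_rows, crawled_docs, base_url, id_field):
    """Merge each metadata row with the crawled document at the same URL."""
    index = {}
    for row in metadata_rows:
        url = build_url(base_url, row[id_field])
        if url not in crawled_docs:
            continue  # primary document not crawled yet, nothing to merge
        # Every column other than the ID column is treated as metadata.
        meta = {k: v for k, v in row.items() if k != id_field}
        index[url] = {"content": crawled_docs[url], "metadata": meta}
    return index


def search(index, term):
    """Return URLs whose content or metadata values contain the term."""
    term = term.lower()
    hits = []
    for url, record in index.items():
        texts = [record["content"]]
        texts += [str(v) for v in record["metadata"].values()]
        if any(term in text.lower() for text in texts):
            hits.append(url)
    return hits


# Hypothetical sample data.
rows = [{"doc_id": "report1.html", "author": "Kim", "dept": "Finance"}]
crawled = {
    "http://intranet.example.com/docs/report1.html": "Quarterly budget summary"
}
index = merge_records(rows, crawled, "http://intranet.example.com/docs/", "doc_id")
```

In this model, a search for a term that appears only in the metadata ("finance") and a search for a term that appears only in the document body ("budget") both return the URL of the same primary document.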

When you re-index your primary document and external metadata, the replacement behavior for this
scenario is the same as for a metadata-and-URL feed. For information about metadata-and-URL feeds,
see the Feeds Protocol Developer’s Guide.

Scenario 3

Metadata: Stored in a database.

Primary Document: The primary document is also stored in the database, as a BLOB.

If your external metadata is stored in a relational database and the primary document is also stored in
the database as a BLOB (Binary Large OBject), do the following:

On the Content Sources > Databases page, create a new database data source.

Enter the database properties (type, hostname, port, name, username, password) used to connect
to the database that contains the external metadata.

Construct the Crawl Query, a valid SQL statement accepted by the target database that returns all
rows of metadata to be indexed. One of the indexed fields must be the BLOB field that contains the
primary document.

Under Data Display / Usage, select the BLOB option.

In BLOB MIME Type Field, type the database column name that specifies the standard Internet
MIME type of the BLOB.

In BLOB Content Field, type the database column name that contains the BLOB data.

Construct the Serve Query, a valid SQL statement accepted by the target database that returns the
single row of metadata and content to display as the result. Use a question mark (?) as the
placeholder for each primary key value. For example, to select metadata by employee ID and
department in a serve query, enter the following statement:

SELECT employee_id, dept
FROM employee
WHERE employee_id = ? AND dept = ?

In Primary Key Fields, enter the names of the database columns that identify the single row of
metadata that you want to serve as the search result. For example:

employee_id, dept
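
How the crawl query, the serve query, and the primary key fields relate can be tried out against any SQL database. The sketch below uses Python's built-in sqlite3 module with an in-memory table. The table layout and the column names mime_type and doc_blob are hypothetical, and binding the key values in the order the fields are listed is an assumption of this example; the appliance connects through its own database configuration, not through code like this.

```python
import sqlite3

# Hypothetical schema: two primary key columns, a MIME type column,
# and a BLOB column that holds the primary document.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE employee (
           employee_id INTEGER,
           dept TEXT,
           mime_type TEXT,
           doc_blob BLOB,
           PRIMARY KEY (employee_id, dept)
       )"""
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [
        (1, "sales", "text/plain", b"Resume of employee 1"),
        (2, "eng", "text/html", b"<p>Profile of employee 2</p>"),
    ],
)

# Crawl query: returns every row to index, including the BLOB column.
CRAWL_QUERY = "SELECT employee_id, dept, mime_type, doc_blob FROM employee"

# Serve query: one '?' placeholder per primary key value.
SERVE_QUERY = (
    "SELECT employee_id, dept, mime_type, doc_blob "
    "FROM employee WHERE employee_id = ? AND dept = ?"
)
PRIMARY_KEY_FIELDS = ["employee_id", "dept"]

crawl_rows = conn.execute(CRAWL_QUERY).fetchall()


def serve(row_keys):
    """Run the serve query for one result, binding keys in the listed order."""
    params = [row_keys[name] for name in PRIMARY_KEY_FIELDS]
    return conn.execute(SERVE_QUERY, params).fetchall()


served = serve({"employee_id": 2, "dept": "eng"})
```

The crawl query returns both rows, while the serve query, given one value per primary key field, returns exactly the one row to display.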

In Scenario 3, the search appliance queries the database for data, then submits a feed with the resulting
rows. The search appliance extracts and indexes the set of records that is defined by the crawl query. The
specified BLOBs are pushed in a full content feed and are not crawled.
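
For orientation, a full content feed record in the XML format described in the Feeds Protocol Developer’s Guide carries the document body inline, which is why no crawl is needed. The sketch below builds one such record in Python. The data source name, the record URL, and the row values are hypothetical, and the feed that the appliance generates internally for a database source may differ in detail.

```python
import base64
import xml.etree.ElementTree as ET


def build_full_feed(datasource, rows):
    """Build a full content feed with one base64-encoded record per row."""
    feed = ET.Element("gsafeed")
    header = ET.SubElement(feed, "header")
    ET.SubElement(header, "datasource").text = datasource
    ET.SubElement(header, "feedtype").text = "full"
    group = ET.SubElement(feed, "group")
    for row in rows:
        record = ET.SubElement(
            group, "record", url=row["url"], mimetype=row["mime_type"]
        )
        # Non-BLOB columns travel as name/content metadata pairs.
        metadata = ET.SubElement(record, "metadata")
        for name, value in row["metadata"].items():
            ET.SubElement(metadata, "meta", name=name, content=str(value))
        # The BLOB is embedded in the feed itself, base64-encoded.
        content = ET.SubElement(record, "content", encoding="base64binary")
        content.text = base64.b64encode(row["blob"]).decode("ascii")
    return feed


# Hypothetical row, shaped like one result of a crawl query.
sample_rows = [
    {
        "url": "http://db.example.com/employee/1/sales",
        "mime_type": "text/plain",
        "metadata": {"dept": "sales"},
        "blob": b"Resume of employee 1",
    }
]
feed_xml = ET.tostring(build_full_feed("employee_db", sample_rows), encoding="unicode")
```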

When you re-index your primary document and external metadata, the replacement behavior for this
scenario is the same as for a full feed. For information about full feeds, see the Feeds Protocol
Developer’s Guide.
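
As the Feeds Protocol Developer’s Guide describes it, a full feed replaces the content that earlier feeds from the same data source put into the index. A minimal model of that behavior follows, with a plain dictionary standing in for the index; the function and data source names are hypothetical, and the real index is of course not a Python dictionary.

```python
def apply_full_feed(index, datasource, records):
    """Return a new index in which this feed replaces the data source's content."""
    # Keep only records that came from other data sources.
    updated = {
        url: rec for url, rec in index.items() if rec["datasource"] != datasource
    }
    # Add every record carried by the new full feed.
    for url, body in records.items():
        updated[url] = {"datasource": datasource, "body": body}
    return updated


# Hypothetical index state before re-indexing.
index = {
    "db://employee/1": {"datasource": "employee_db", "body": "old row 1"},
    "db://employee/2": {"datasource": "employee_db", "body": "old row 2"},
    "http://www.example.com/": {"datasource": "web", "body": "home page"},
}

# The new crawl query result no longer contains employee 2.
index = apply_full_feed(index, "employee_db", {"db://employee/1": "new row 1"})
```

In this model, a row that has disappeared from the crawl query result also disappears from the index on the next full feed, while content from other sources is left alone.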