Scenario 2 – Google Search Appliance External Metadata Indexing Guide User Manual

In this scenario, the search appliance queries the database for data, then submits a feed with the
resulting rows. The search appliance crawls and indexes the set of records that is defined by the crawl
query. The URLs extracted from each external metadata record (as defined by the URL field) are added
to the crawl queue and crawled by either the web or file system crawler, following the normal crawl
policy. When the primary document is crawled, the contents of the primary document and the external
metadata are merged into a single record, which is identified by the URL of the primary document in the
search appliance index. After the primary document and the external metadata are indexed, the
primary document is returned as a search result when search users query for terms in the external
metadata or the primary document.
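
As a conceptual illustration only (this is not the search appliance's implementation), the merge can be pictured as a single index record keyed by the URL of the primary document. The function names merge_record and matches, the example URL, and the field values below are all hypothetical.

```python
def merge_record(url, document_text, metadata):
    """Combine a crawled document and its external metadata into one record."""
    return {"url": url, "content": document_text, "metadata": dict(metadata)}


def matches(record, term):
    """Return True if the term occurs in the document text or in any metadata value."""
    term = term.lower()
    if term in record["content"].lower():
        return True
    return any(term in str(value).lower() for value in record["metadata"].values())


record = merge_record(
    "http://www.example.com/docs/report.html",  # hypothetical primary document URL
    "Quarterly sales figures for the northern region.",
    {"author": "J. Smith", "department": "Finance"},  # hypothetical metadata row
)

print(matches(record, "sales"))    # term appears in the primary document
print(matches(record, "finance"))  # term appears only in the external metadata
print(matches(record, "weather"))  # term appears in neither
```

In this sketch, a query term found in either source leads back to the same record, and therefore to the same primary document URL.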

When you re-index your primary document and external metadata, the replacement behavior for this
scenario is the same as for a metadata-and-URL feed. For information about metadata-and-URL feeds,
see the Feeds Protocol Developer’s Guide.

Scenario 2

Metadata: Stored in a database.

Primary Document: A pointer to the primary document needs to be constructed from a base URL
and a database value.

This scenario is similar to the first one, except that the URL of the primary document is not read
directly from a URL field. Instead, it is constructed from a base URL and a document ID.

If your external metadata is stored in a relational database and the URLs that reference primary
documents can be constructed by combining a base URL string and a database field, use the following
steps to enable external metadata indexing. The database field usually represents a unique document
ID number that, when inserted into a base URL string, references a specific document on a web server
or file system. For example, suppose that your primary documents are accessible from a URL of the
following form:

http://cmsystem.acme.corp.com:6502/getdoc?action=get&docid=4662118437

In this example, the number 4662118437 (the value of the docid parameter) is a unique document ID
stored as a field in the database. You can configure the search appliance to crawl the metadata and
construct URLs that reference primary documents by inserting values from one of the database fields
into the base URL.
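
To make the split between the fixed part of the URL and the document ID concrete, the following minimal sketch takes the example URL apart with Python's standard urllib.parse module. The helper name extract_doc_id is hypothetical. The parameter name docid is taken from the example URL above.

```python
from urllib.parse import parse_qs, urlsplit

EXAMPLE_URL = "http://cmsystem.acme.corp.com:6502/getdoc?action=get&docid=4662118437"


def extract_doc_id(url, param="docid"):
    """Return the value of the query parameter that carries the document ID."""
    query = parse_qs(urlsplit(url).query)
    return query[param][0]


doc_id = extract_doc_id(EXAMPLE_URL)

# In this example the ID is the last thing in the URL, so slicing it off
# leaves the fixed part that a base URL would be built from.
fixed_part = EXAMPLE_URL[: -len(doc_id)]

print(doc_id)
print(fixed_part)
```

Only the document ID changes from one document to the next. Everything else stays constant, which is why a single base URL plus one database field is enough to address every primary document.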

1. On the Content Sources > Databases page, create a new database data source.

2. Enter the database properties (type, hostname, port, name, username, password) used to connect
   to the database that contains the external metadata.

3. Construct the Crawl Query, a valid SQL statement accepted by the target database that returns all
   rows of metadata to be indexed. One of the fields returned by the query must contain the unique
   identifier of the primary document, which is inserted into the base URL string.

4. Under Data Display / Usage, select the Metadata option.

5. Select Document ID Field, and enter the database column that holds the unique value that is used
   to construct a primary document URL.

6. In the Base URL field, enter the base URL string that is used to construct URLs that reference
   primary documents. The value of the field that is specified in Document ID Field is inserted into the
   base URL string at the position of the {docid} tag. If the document ID in the preceding example
   (4662118437) were stored in a field called uniqueID, the Document ID Field would be uniqueID and
   the base URL string would be:

   http://cmsystem.acme.corp.com:6502/getdoc?action=get&docid={docid}
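
The configuration described in these steps can be approximated outside the appliance to check that a crawl query and a base URL produce the expected document URLs. The following sketch uses an in-memory SQLite database as a stand-in for the real database. The table name doc_metadata, the title and author columns, and the sample rows are hypothetical. The base URL here uses the docid parameter so that the result matches the example document URL shown earlier. The plain string replacement is an assumption about how the {docid} tag is expanded, not a description of the appliance's internals.

```python
import sqlite3

# Values that would be entered in the Base URL, Document ID Field and Crawl Query settings.
BASE_URL = "http://cmsystem.acme.corp.com:6502/getdoc?action=get&docid={docid}"
DOC_ID_FIELD = "uniqueID"
CRAWL_QUERY = "SELECT uniqueID, title, author FROM doc_metadata"  # hypothetical table


def build_records(connection, crawl_query, doc_id_field, base_url):
    """Run the crawl query and pair each metadata row with its constructed URL."""
    connection.row_factory = sqlite3.Row  # allows access to columns by name
    records = []
    for row in connection.execute(crawl_query):
        # Assumed expansion rule: the {docid} tag is replaced by the field value.
        url = base_url.replace("{docid}", str(row[doc_id_field]))
        records.append({"url": url, "metadata": dict(row)})
    return records


# Stand-in database with two hypothetical metadata rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE doc_metadata (uniqueID INTEGER PRIMARY KEY, title TEXT, author TEXT)"
)
conn.executemany(
    "INSERT INTO doc_metadata VALUES (?, ?, ?)",
    [(4662118437, "Design Review", "A. Author"), (4662118438, "Test Plan", "B. Writer")],
)

records = build_records(conn, CRAWL_QUERY, DOC_ID_FIELD, BASE_URL)
for record in records:
    print(record["url"], record["metadata"])
```

Running a check like this against a copy of the real crawl query can catch two common mistakes before the data source is saved: a Document ID Field that the query does not return, and a base URL whose parameter name does not match the URLs that the content server actually accepts.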