Scenario 5, External metadata sent in an http header – Google Search Appliance External Metadata Indexing Guide User Manual

Google Search Appliance: External Metadata Indexing Guide

Scenario 5

Metadata: Inserted into the feed XML file.

Primary Document: Referenced by the URL in the feed XML file (web feed).

This scenario is similar to the previous scenario, except that the primary document is referenced by URL
only (instead of the contents of the primary document being fed to the search appliance). The feed file
therefore contains the <header> information and, for each <record> element, the URL of the record
and the <meta> elements.

1. Create the feed XML file and define the header information, including the data source name, as
shown in the following example:



<header>
<datasource>sample2</datasource>
<feedtype>metadata-and-url</feedtype>
</header>

Note that the <feedtype> element is metadata-and-url. This tells the web or file system crawler
to pick up the URLs for the primary document and index them accordingly.

2. Create a <record> element for each primary document. In the <record> element, insert one or
more <meta> elements, as shown in the following example:



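<gsafeed>
<header>
<datasource>sample2</datasource>
<feedtype>metadata-and-url</feedtype>
</header>
<group>
<record url="..." mimetype="text/plain" last-modified="Tue, 17 Feb 2009 12:45:26 GMT">
<metadata>
<meta name="..." content="..."/>
</metadata>
</record>
</group>
</gsafeed>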
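
The two steps above can be sketched in Python. This is a minimal sketch under stated assumptions, not the appliance's own tooling: the function name build_metadata_and_url_feed, the example URL, and the department metadata are hypothetical, and the <group> and <metadata> wrappers and the name/content attributes are assumed from the feed layout described in this scenario. The sketch leaves out the XML declaration and DOCTYPE that a feed file may also need.

```python
import xml.etree.ElementTree as ET


def build_metadata_and_url_feed(datasource, records):
    """Build a metadata-and-url feed and return it as an XML string.

    records is an iterable of (url, mimetype, last_modified, meta) tuples,
    where meta maps metadata names to metadata values.
    """
    gsafeed = ET.Element("gsafeed")

    # Step 1: header with the data source name and the feed type.
    header = ET.SubElement(gsafeed, "header")
    ET.SubElement(header, "datasource").text = datasource
    ET.SubElement(header, "feedtype").text = "metadata-and-url"

    # Step 2: one <record> per primary document, referenced by URL only.
    group = ET.SubElement(gsafeed, "group")
    for url, mimetype, last_modified, meta in records:
        record = ET.SubElement(
            group,
            "record",
            {"url": url, "mimetype": mimetype, "last-modified": last_modified},
        )
        metadata = ET.SubElement(record, "metadata")
        for name, content in meta.items():
            ET.SubElement(metadata, "meta", {"name": name, "content": content})

    return ET.tostring(gsafeed, encoding="unicode")


# Hypothetical document URL and metadata, for illustration only.
feed_xml = build_metadata_and_url_feed(
    "sample2",
    [
        (
            "http://www.example.com/report.txt",
            "text/plain",
            "Tue, 17 Feb 2009 12:45:26 GMT",
            {"department": "Finance"},
        )
    ],
)
print(feed_xml)
```

Building the feed with an XML library rather than string concatenation keeps attribute values such as URLs correctly escaped.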



External Metadata Sent in an HTTP Header

At crawl time, the search appliance can accept external metadata, along with documents, through the
X-GSA-External-Metadata HTTP response header. This is useful for indexing metadata for non-HTML
documents, where it is not possible to include metadata. The metadata supplied at crawl time replaces
any and all metadata that may have been indexed earlier.

To use this method of indexing external metadata, the web service that stores the content needs to be
designed to generate the optional X-GSA-External-Metadata HTTP header. The header includes a
comma-separated list of encoded values, as specified in RFC 2616
(http://www.w3.org/Protocols/rfc2616/rfc2616.html), Section 4.2:

X-GSA-External-Metadata: value_1, value_2,...
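
A content server could assemble this header along the following lines. The sketch is hedged: this section says only that the values are encoded and comma-separated, so the name=value form and the percent-encoding used here are assumptions, and the function name external_metadata_header and the sample metadata are hypothetical.

```python
from urllib.parse import quote


def external_metadata_header(pairs):
    """Return (header_name, header_value) for a list of (name, value) pairs.

    Assumption: each list entry is "name=value", with name and value
    percent-encoded so that commas or equals signs inside them cannot be
    confused with the separators.
    """
    encoded = [
        f"{quote(name, safe='')}={quote(value, safe='')}"
        for name, value in pairs
    ]
    return "X-GSA-External-Metadata", ", ".join(encoded)


# Hypothetical metadata for a non-HTML document.
header_name, header_value = external_metadata_header(
    [("department", "Finance & Legal"), ("author", "J. Smith")]
)
print(f"{header_name}: {header_value}")
# X-GSA-External-Metadata: department=Finance%20%26%20Legal, author=J.%20Smith
```

The web service would then add the returned name and value to the HTTP response that carries the document itself.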