External metadata pushed in a feed, Scenario 4 – Google Search Appliance External Metadata Indexing Guide User Manual

Google Search Appliance: External Metadata Indexing Guide
External Metadata Pushed in a Feed
The remaining scenarios use feeds. Feeds work well when the external metadata is not stored in a
relational database, the primary document is not accessible by the search appliance’s crawlers, or the
reference between the external metadata and the primary document is not easily expressed. You can
use a feeds-based solution in any of these cases or any case where you prefer using feeds to
implementing the database scenarios.
Feeds are described in general in the Feeds Protocol Developer’s Guide. You should be familiar with those
concepts before you index external metadata by using feeds.
There are two types of feeds:
• Content feeds, which include a URL and the contents of the URL (the document itself).
• Web feeds, which contain a list of URLs without their contents. The crawler queues the URLs and
fetches the contents as normal.
The primary document can be pushed in the feed as a content feed or referenced as a web feed. The
following scenarios detail how to index external metadata for primary documents that are pushed as
content feeds or web feeds.
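The difference between the two feed types can be sketched in a few lines of Python. This is a minimal illustration, not the guide's own example: it assumes the element names (gsafeed, header, datasource, feedtype, group, record, metadata, meta, content) and the "full" and "metadata-and-url" feed types described in the Feeds Protocol Developer's Guide, and the data source name, URLs, and metadata values are hypothetical. It also omits the XML declaration and DOCTYPE that a complete feed file would normally begin with.

```python
import xml.etree.ElementTree as ET


def build_feed(datasource, feedtype, records):
    """Build a feed document and return it as a string.

    Each record is a dict with a 'url', an optional 'metadata' dict,
    and an optional 'content' string. A record with 'content' is a
    content-feed record; a record without it only references the URL.
    """
    root = ET.Element("gsafeed")
    header = ET.SubElement(root, "header")
    ET.SubElement(header, "datasource").text = datasource
    ET.SubElement(header, "feedtype").text = feedtype

    group = ET.SubElement(root, "group")
    for rec in records:
        record = ET.SubElement(
            group,
            "record",
            url=rec["url"],
            mimetype=rec.get("mimetype", "text/html"),
        )
        # External metadata travels with the record as name/content pairs.
        if rec.get("metadata"):
            metadata = ET.SubElement(record, "metadata")
            for name, value in rec["metadata"].items():
                ET.SubElement(metadata, "meta", name=name, content=value)
        # Only content feeds carry the document body itself.
        if "content" in rec:
            ET.SubElement(record, "content").text = rec["content"]
    return ET.tostring(root, encoding="unicode")


# Content feed: the URL, its external metadata, and the document itself.
content_feed = build_feed(
    "example_source",  # hypothetical data source name
    "full",
    [{
        "url": "http://www.example.com/doc1.html",
        "metadata": {"department": "Engineering"},
        "content": "<html><body>Primary document</body></html>",
    }],
)

# Web feed: the URL and its external metadata only; the crawler fetches the content.
web_feed = build_feed(
    "example_source",
    "metadata-and-url",
    [{
        "url": "http://www.example.com/doc2.html",
        "metadata": {"department": "Sales"},
    }],
)
```

The only structural difference between the two results is the content element: the content feed record carries the document body, while the web feed record leaves it for the crawler to fetch.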
Note: Web feeds with a data source name of “web” and a feed type of “incremental” cannot contain
external metadata. If external metadata is added to this type of web feed, the search appliance displays
an error message and does not crawl the URL.
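A feed can be checked against this restriction before it is pushed. The following Python sketch is one possible pre-flight check, not a feature of the search appliance: the function name is hypothetical, the element names are assumed to follow the Feeds Protocol Developer's Guide, and the comparison assumes the header values are written exactly as “web” and “incremental”.

```python
import xml.etree.ElementTree as ET


def violates_web_feed_rule(feed_xml):
    """Return True if the feed uses data source "web" with feed type
    "incremental" and at least one record carries a metadata element."""
    root = ET.fromstring(feed_xml)
    datasource = (root.findtext("header/datasource") or "").strip()
    feedtype = (root.findtext("header/feedtype") or "").strip()
    if datasource != "web" or feedtype != "incremental":
        return False  # the restriction applies only to this combination
    return root.find(".//record/metadata") is not None


# Hypothetical feed that breaks the rule: "web" + "incremental" + metadata.
bad_feed = """
<gsafeed>
  <header>
    <datasource>web</datasource>
    <feedtype>incremental</feedtype>
  </header>
  <group>
    <record url="http://www.example.com/doc.html" mimetype="text/html">
      <metadata>
        <meta name="department" content="Engineering"/>
      </metadata>
    </record>
  </group>
</gsafeed>
"""

# The same feed with the metadata element removed passes the check.
ok_feed = """
<gsafeed>
  <header>
    <datasource>web</datasource>
    <feedtype>incremental</feedtype>
  </header>
  <group>
    <record url="http://www.example.com/doc.html" mimetype="text/html"/>
  </group>
</gsafeed>
"""

print(violates_web_feed_rule(bad_feed))
print(violates_web_feed_rule(ok_feed))
```

Running the check in the script that generates the feed catches the problem before the push, rather than leaving it to the error message on the search appliance.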
Scenario 4
Metadata: Inserted into the feed XML file.
Primary Document: Inserted into the feed XML file (content feed).
In this scenario, you need to write a script or code that generates the feed XML file and then push the
feed XML file to the search appliance. Use the following steps:
1. Create the feed XML file and define the header information, including the data source name, as
shown in the following example: