Google Search Appliance: Feeds Protocol Developer’s Guide

Choosing a Name for the Feed Data Source

When you push a feed to the search appliance, the system associates the fed URLs with a data source
name, specified by the datasource element in the feed DTD.

If the data source name is “web”, the system treats the feed as a web feed. A search appliance can
have only one data source called “web”.

If the data source name is anything else, and the feed type is metadata-and-url, the system treats
the feed as a web feed.

If the data source name is anything else, and the feed type is not metadata-and-url, the system
treats the feed as a content feed.
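
The three rules above can be summarized as a small decision function. This is a minimal sketch for illustration only; the function name classify_feed and its string return values are hypothetical and are not part of the search appliance.

```python
def classify_feed(datasource: str, feedtype: str) -> str:
    """Return "web" or "content" according to the three naming rules."""
    # Rule 1: the data source named "web" is always a web feed.
    if datasource == "web":
        return "web"
    # Rule 2: any other name with the metadata-and-url type is a web feed.
    if feedtype == "metadata-and-url":
        return "web"
    # Rule 3: any other name with any other feed type is a content feed.
    return "content"


print(classify_feed("web", "incremental"))
print(classify_feed("intranet_docs", "metadata-and-url"))
print(classify_feed("intranet_docs", "full"))
```

The data source name intranet_docs is a made-up example value.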

To view all of the feeds for your search appliance, log in to the Admin Console and choose Content
Sources > Feeds. The list shows the date of the most recent push for each data source name, along with
whether the feed was successful and how many documents were pushed.

Note: Although you can specify the feed type and data source in the XML file, the values specified in the
XML file are currently unused. Instead, the search appliance uses the data source and feed type that are
specified during the feed upload step. However, we recommend that you include the data source name
and feed type in the XML file for compatibility with future versions.
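
One way to follow this recommendation is to generate both places from the same two values, so that the XML file and the upload step cannot disagree. The sketch below is an assumption-laden illustration: the gsafeed, header, and group element names and the upload field names (datasource, feedtype, data) are assumed here rather than stated in this section, and the helper names are hypothetical. Check them against the feed DTD and the upload instructions before relying on them.

```python
import xml.etree.ElementTree as ET


def build_feed_skeleton(datasource: str, feedtype: str) -> ET.Element:
    """Build an empty feed whose header names the data source and feed type.

    Element names other than datasource and feedtype are assumptions.
    """
    root = ET.Element("gsafeed")
    header = ET.SubElement(root, "header")
    ET.SubElement(header, "datasource").text = datasource
    ET.SubElement(header, "feedtype").text = feedtype
    ET.SubElement(root, "group")  # records would be added here
    return root


def build_upload_fields(datasource: str, feedtype: str) -> dict:
    """Return the values sent at upload time (field names are assumed)."""
    feed = build_feed_skeleton(datasource, feedtype)
    return {
        "datasource": datasource,  # value the appliance actually uses
        "feedtype": feedtype,      # value the appliance actually uses
        "data": ET.tostring(feed, encoding="unicode"),
    }


fields = build_upload_fields("example_source", "incremental")
print(fields["data"])
```

Because both the header and the upload fields come from the same arguments, the XML file stays consistent with what the search appliance uses today.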

Choosing the Feed Type

The feed type determines how the search appliance handles URLs when a new content feed is pushed
with an existing data source name.

Content feeds can be full or incremental; a web feed is always incremental. To support feeds that provide
only URLs and metadata, you can also set the feed type to metadata-and-url. This is a special feed type
that is treated as a web feed.

When the feedtype element is set to full for a content feed, the system deletes all the prior URLs
that were associated with the data source. The new feed contents completely replace the prior feed
contents. If the feed contains metadata, you must also provide content for each record; a full feed
cannot push metadata alone. You can delete all documents in a data source by pushing an empty
full feed.

When the feedtype element is set to incremental, the system modifies the URLs that exist in the
new feed as specified by the action attribute for the record. URLs from previous feeds remain
associated with the content data source. If the record contains metadata, you can incrementally
update either the content or the metadata.

When the feedtype element is set to metadata-and-url, the system modifies the URLs and
metadata that exist in the new feed as specified by the action attribute for the record. URLs and
metadata from previous feeds remain associated with the content data source. You can use this
feed type even if you do not define any metadata in the feed. The system treats any data source
with this feed type as a special kind of web feed and updates the feed incrementally. Unless the
metadata-and-url feed sets the crawl-immediately=true directive, the search appliance
schedules the URL for re-crawling instead of re-crawling it immediately.
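
The effect of each feed type on the set of URLs associated with a data source can be modeled in a few lines. This is a simplified sketch, not the search appliance's implementation: it tracks only which URLs remain associated with the data source, it ignores content, metadata, and crawl scheduling, and it assumes the action attribute takes the values "add" and "delete". The function name apply_feed is hypothetical.

```python
def apply_feed(previous: set, feedtype: str, records: list) -> set:
    """Return the URLs associated with a data source after a feed is pushed.

    previous: URLs from earlier feeds for the same data source.
    records:  (url, action) pairs; "add" and "delete" are assumed values.
    """
    if feedtype == "full":
        # A full feed discards every URL from prior feeds.
        urls = set()
    elif feedtype in ("incremental", "metadata-and-url"):
        # Both of these types keep the URLs from prior feeds.
        urls = set(previous)
    else:
        raise ValueError(f"unknown feed type: {feedtype}")

    for url, action in records:
        if action == "add":
            urls.add(url)
        elif action == "delete":
            urls.discard(url)
        else:
            raise ValueError(f"unknown action: {action}")
    return urls


before = {"http://example.com/a", "http://example.com/b"}
new_records = [("http://example.com/c", "add")]

after_full = apply_feed(before, "full", new_records)
after_incremental = apply_feed(before, "incremental", new_records)
after_empty_full = apply_feed(before, "full", [])
```

In this model, after_full contains only the new URL, after_incremental contains all three URLs, and after_empty_full is empty, which mirrors deleting all documents in a data source by pushing an empty full feed.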

It is not possible to modify a single field of a document’s metadata by submitting a feed that contains
only the modified field. To modify a single field, you must submit a feed that includes all the metadata
fields along with the modified field.
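
In practice, this means merging the change into the complete set of fields before building the feed. The following sketch assumes that metadata is expressed as meta elements with name and content attributes inside a metadata element; those element names, the helper build_metadata, and the sample field names are assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def build_metadata(existing: dict, changes: dict) -> ET.Element:
    """Return a metadata element that carries every field, not just the changed ones."""
    merged = {**existing, **changes}  # changed values override the old ones
    metadata = ET.Element("metadata")
    for name, content in merged.items():
        ET.SubElement(metadata, "meta", {"name": name, "content": content})
    return metadata


# Hypothetical metadata currently stored for one document.
current = {"author": "J. Doe", "department": "Sales", "status": "draft"}

# Only "status" changes, but all three fields are resubmitted.
element = build_metadata(current, {"status": "final"})
print(ET.tostring(element, encoding="unicode"))
```

The feeding application must keep its own copy of the existing metadata for this approach to work.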

Documents that have been fed by using content feeds are specially marked so that the crawler will not
attempt to crawl them unless the URL is also one of the Start URLs defined on the Content Sources >
Web Crawl > Start and Block URLs page. In this case, the URL is periodically accessed from the GSA as
part of the regular connectivity tests.