Google Search Appliance: Feeds Protocol Developer’s Guide

The search appliance does not support indexing compressed files sent in content feeds.

The search appliance follows links from a content-fed document, as long as the links match URL
patterns added under Follow and Crawl Only URLs with the Following Patterns on the Content
Sources > Web Crawl > Start and Block URLs page in the Admin Console.

Web feeds and content feeds behave differently when deleting content. See “Removing Feed Content
From the Index” on page 31 for a description of how content is deleted from each type of feed.

To see an example of a feed, follow the steps in the section “Quickstart” on page 7.

Why Use Feeds?

You should design a feed to ensure that your search appliance crawls any documents that require
special handling. Consider whether your site includes content that cannot be found through links on
crawled web pages, or content that is most useful when it is crawled at a specific time. For example, you
might use a feed to add external metadata from an Enterprise Content Management (ECM) system.

Examples of documents that are best pushed using feeds include:

Documents that cannot be fetched using the crawler. For example, records in a database or files on
a system that is not web-enabled.

Documents that can be crawled but are best recrawled at different times than those set by the
automatic crawl scheduler that runs on the search appliance.

Documents that can be crawled but that the crawler cannot discover during a new crawl, because no
links on your web site point to them.

Documents that can be crawled but that upload much more quickly through feeds, because web
server or network problems slow down crawling.

Impact of Feeds on Document Relevancy

For documents sent in a content feed, a flat, fixed PageRank value is assigned by default, which might
have a negative impact on the relevancy determination of the documents. However, you can specify
PageRank in a feed for either a single URL or a group of URLs by using the pagerank element. For more
details, see “Defining the XML Record for a Document” on page 9.
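
As a rough illustration of overriding the default PageRank, the following Python sketch builds a small content feed in which each record carries an explicit pagerank value. The tag and attribute names used here (gsafeed, header, datasource, feedtype, group, record, content, and pagerank written as an attribute of record) are assumptions, as are the data source name, the feed type, and the value 80. Check them against “Defining the XML Record for a Document” before relying on them.

```python
import xml.etree.ElementTree as ET


def build_content_feed(datasource, documents, pagerank=None):
    """Return feed XML as a string.

    documents is a list of (url, html_text) pairs. All tag and attribute
    names below are assumptions about the feed format, not confirmed here.
    """
    feed = ET.Element("gsafeed")
    header = ET.SubElement(feed, "header")
    ET.SubElement(header, "datasource").text = datasource
    # Assumed feed type for a content feed.
    ET.SubElement(header, "feedtype").text = "incremental"

    group = ET.SubElement(feed, "group")
    for url, html_text in documents:
        attrs = {"url": url, "mimetype": "text/html"}
        if pagerank is not None:
            # Assumption: PageRank is set per record with a pagerank attribute.
            attrs["pagerank"] = str(pagerank)
        record = ET.SubElement(group, "record", attrs)
        ET.SubElement(record, "content").text = html_text
    return ET.tostring(feed, encoding="unicode")


# Hypothetical data source name, URL, and PageRank value.
feed_xml = build_content_feed(
    "example_source",
    [("http://www.example.com/doc1.html", "<html><body>Hello</body></html>")],
    pagerank=80,
)
print(feed_xml)
```

Omitting the pagerank argument leaves the attribute out entirely, so the search appliance would fall back to its default value.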

Choosing a Feed Client

You push the XML to the search appliance using a feed client. You can use one of the feed clients
described in this document or write your own. For details, see “Pushing a Feed to the Google Search
Appliance” on page 27.
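
If you write your own feed client, it could look something like the Python sketch below, which packs the feed XML into a multipart/form-data POST request. The port (19900), the path (/xmlfeed), and the form field names (feedtype, datasource, data) are assumptions here, and the host name is hypothetical. Confirm the actual values in “Pushing a Feed to the Google Search Appliance” before using this.

```python
import uuid
import urllib.request


def build_feed_request(appliance_host, datasource, feedtype, feed_xml):
    """Build (but do not send) a multipart/form-data POST carrying the feed.

    The port, path, and form field names are assumptions, not confirmed here.
    """
    boundary = uuid.uuid4().hex
    fields = [("feedtype", feedtype), ("datasource", datasource), ("data", feed_xml)]

    parts = []
    for name, value in fields:
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        )
    parts.append(f"--{boundary}--\r\n")  # closing boundary
    body = "".join(parts).encode("utf-8")

    url = f"http://{appliance_host}:19900/xmlfeed"  # assumed feed endpoint
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


def push_feed(request, timeout=30):
    """Send the prepared request and return the HTTP status and response text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8", errors="replace")


# Hypothetical host and data source; the request is only built here, not sent.
request = build_feed_request(
    "search.example.com", "example_source", "incremental", "<gsafeed/>"
)
print(request.get_method(), request.full_url)
```

Building the request separately from sending it lets you inspect or log the payload first. Calling push_feed(request) performs the actual upload.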