Quickstart, Designing an xml feed – Google Search Appliance Feeds Protocol Developers Guide User Manual

Quickstart

Here are steps for pushing a content feed to the search appliance.

1. Download sample_feed.xml to your local computer. This is a content feed for a document entitled
“Fed Document”.

2. In the Admin Console, go to Content Sources > Web Crawl > Start and Block URLs and add this
pattern to “Follow and Crawl Only URLs with the Following Patterns”:

http://www.localhost.example.com/

This is the URL for the document defined in sample_feed.xml.

3. Download pushfeed_client.py to your local computer. This is a feed client script implemented in
Python 2.x. You must install Python 2.x to run this script. Google also provides a Python 3.x version,
pushfeed_client3.py.

4. Configure the search appliance to accept feeds from your computer. In the Admin Console, go to
Content Sources > Feeds, and scroll down to List of Trusted IP Addresses. Verify that the IP
address of your local computer is trusted.

5. Run the feed client script with the following arguments (you must change “APPLIANCE-HOSTNAME”
to the hostname or IP address of your search appliance):

% pushfeed_client.py --datasource="sample" --feedtype="full" --url="http://<APPLIANCE-HOSTNAME>:19900/xmlfeed" --xmlfilename="sample_feed.xml"

6. In the Admin Console, go to Content Sources > Feeds. A data source named “sample” should
appear within 5 minutes.

7. The URL http://www.localhost.example.com/ should appear under Crawl Diagnostics within
about 15 minutes.

8. Enter the following as your search query to see the URL in the results:

info:http://www.localhost.example.com/

If your system is not busy, the URL should appear in your search results within 30 minutes.
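
For readers who want to see roughly what a feed client does in step 5, the following is a minimal Python 3 sketch, not a reproduction of pushfeed_client.py. It assumes the appliance accepts an HTTP POST of multipart/form-data with form fields named datasource, feedtype, and data; those field names and the helpers build_multipart and push_feed are illustrative assumptions and should be checked against the script Google provides.

```python
import urllib.request
import uuid


def build_multipart(fields, boundary=None):
    """Encode a mapping of field name -> str or bytes as multipart/form-data.

    Returns a (content_type, body) pair suitable for an HTTP POST.
    """
    boundary = boundary or uuid.uuid4().hex
    delimiter = b"--" + boundary.encode("ascii")
    parts = []
    for name, value in fields.items():
        if isinstance(value, str):
            value = value.encode("utf-8")
        parts.append(
            delimiter + b"\r\n"
            + b'Content-Disposition: form-data; name="'
            + name.encode("ascii") + b'"\r\n'
            + b"\r\n"
            + value + b"\r\n"
        )
    body = b"".join(parts) + delimiter + b"--\r\n"
    return "multipart/form-data; boundary=" + boundary, body


def push_feed(url, datasource, feedtype, xml_path):
    """POST the feed file to the appliance (field names are assumptions)."""
    with open(xml_path, "rb") as f:
        xml_bytes = f.read()
    content_type, body = build_multipart(
        {"datasource": datasource, "feedtype": feedtype, "data": xml_bytes}
    )
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return response.status, response.read().decode("utf-8", "replace")


# Hypothetical usage, mirroring the command-line arguments in step 5:
# push_feed("http://APPLIANCE-HOSTNAME:19900/xmlfeed", "sample", "full",
#           "sample_feed.xml")
```

The data source name and feed type travel as separate form fields here, which matches the way step 5 passes them as command-line arguments rather than reading them from the XML file.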

Designing an XML Feed

The feed is an XML file that contains the URLs. It may also contain their contents, metadata, and
additional information such as the last-modified date. The XML must conform to the schema defined by
gsafeed.dtd. This file is available on your search appliance at
http://<APPLIANCE-HOSTNAME>:7800/gsafeed.dtd. Although the Document Type Definition (DTD) defines
elements for the data source name and the feed type, these elements are populated when you push the
feed to the search appliance. Any datasource or feedtype values that you specify within the XML
document are ignored.
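
As an illustration of the structure described above, here is a minimal Python sketch that assembles a small content feed with the standard library. The element and attribute names (gsafeed, header, group, record, content, url, mimetype, last-modified) and the DOCTYPE line are assumptions about gsafeed.dtd, so verify them against the copy of the DTD on your search appliance. The build_feed helper and its record dictionaries are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed prolog; confirm the DOCTYPE against gsafeed.dtd on the appliance.
PROLOG = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<!DOCTYPE gsafeed PUBLIC "-//Google//DTD GSA Feeds//EN" "">\n'
)


def build_feed(records, datasource="sample", feedtype="full"):
    """Return feed XML text for a list of record dictionaries.

    Each record needs "url" and "mimetype"; "last_modified" and
    "content" are optional.
    """
    root = ET.Element("gsafeed")

    # The header values are placeholders: the appliance populates them
    # from the values supplied when the feed is pushed.
    header = ET.SubElement(root, "header")
    ET.SubElement(header, "datasource").text = datasource
    ET.SubElement(header, "feedtype").text = feedtype

    group = ET.SubElement(root, "group")
    for rec in records:
        record = ET.SubElement(
            group, "record", url=rec["url"], mimetype=rec["mimetype"]
        )
        if "last_modified" in rec:
            record.set("last-modified", rec["last_modified"])
        if "content" in rec:
            # ElementTree escapes markup characters in the text for us.
            ET.SubElement(record, "content").text = rec["content"]

    return PROLOG + ET.tostring(root, encoding="unicode")


feed_xml = build_feed([
    {
        "url": "http://www.localhost.example.com/",
        "mimetype": "text/html",
        "content": "<html><title>Fed Document</title></html>",
    }
])
```

Building the tree with ElementTree rather than string concatenation keeps the output well-formed even when document content contains characters such as < or &.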

An XML feed must be less than 1 GB in size. If your feed is larger than 1 GB, consider breaking the feed
into smaller feeds that can be pushed more efficiently.
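
One way to break a large feed into smaller ones is to batch records by their serialized size. The sketch below is an illustration only: split_records is a hypothetical helper, the records are assumed to be already-serialized strings, and MAX_FEED_BYTES interprets 1 GB as 10**9 bytes, which is the more conservative reading since the guide does not say which definition applies.

```python
# Assumption: 1 GB taken as 10**9 bytes (the smaller of the usual readings).
MAX_FEED_BYTES = 10**9


def split_records(serialized_records, max_bytes=MAX_FEED_BYTES, overhead_bytes=0):
    """Group serialized records into batches strictly smaller than max_bytes.

    overhead_bytes reserves room in every batch for the parts of the feed
    that surround the records (prolog, header, enclosing tags).
    """
    batches = []
    current = []
    current_size = overhead_bytes
    for record in serialized_records:
        size = len(record.encode("utf-8"))
        if overhead_bytes + size >= max_bytes:
            raise ValueError("a single record does not fit within the limit")
        # Start a new batch when adding this record would reach the limit.
        if current and current_size + size >= max_bytes:
            batches.append(current)
            current = []
            current_size = overhead_bytes
        current.append(record)
        current_size += size
    if current:
        batches.append(current)
    return batches
```

Each resulting batch can then be wrapped in its own feed file and pushed separately.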