beautypg.com

Configuring crawl of public content – Google Search Appliance Getting the Most from Your Google Search Appliance User Manual

Page 17

background image

Google Search Appliance: Getting the Most from Your Google Search Appliance

Crawling and Indexing

17

Configuring Crawl of Public Content

To configure a search appliance to crawl a content source, you specify top-level URLs and directory
addresses and links that the search appliance should follow by using the Crawl and Index > Crawl
URLs
page in the Admin Console. In addition to specifying start URLs, you can also specify URLs that the
search appliance should not follow and crawl.

By default, the search appliance crawls in continuous crawl mode. This means that after the Google
Search Appliance creates the index, it always crawls content sources looking for new or modified
content and updates the index to ensure that it contains the freshest listings. The search appliance can
also crawl content according to a schedule.

Configure continuous crawl by performing the following steps with the Admin Console:

1.

Specifying where to start the crawl by listing top-level URLs and directory addresses in the Start
Crawling from the Following URLs
section on the Crawl and Index > Crawl URLs page, shown in
the following figure.

2.

Specifying links for the search appliance to follow and index by listing patterns in the Follow and
Crawl Only URLs with the Following Patterns
section.

3.

Listing any URLs that you don’t want the search appliance to crawl in the Do Not Crawl URLs with
the Following Patterns
section.

4.

Saving the URL patterns.

After you save the URL patterns, the search appliance begins crawling in continuous mode.