
Setting up and starting the crawl, Checking the crawl status – Google Search Appliance Installing the Google Search Appliance User Manual


Setting Up and Starting the Crawl

To set up and start the crawl:

1. In the left-hand menu, click Content Sources > Web Crawl > Start and Block URLs.

2. In the Start URLs field, type one or more start URLs.

For the initial setup and testing, it is best to enter a start URL that does not require a login or user
authentication. Start URLs must be fully qualified URLs, in the following format:

protocol://host[:port]/[path]/

The information in the square brackets is optional. For example: http://dracula:2346/content

3. Copy all of the start URLs from the Start URLs field into the Follow Patterns field.

If you enter the URL pattern for a directory, the URL must terminate in a forward slash (/). Use only
the server part of the URL. If a URL refers to a specific page, only that page is crawled. For more
information on URL patterns, click the Help link or see Administering Crawl.

4. In the Do Not Follow Patterns field, scroll through the list of patterns that can be blocked from
being crawled.

Many file formats are excluded from the crawl by default, including common graphic formats such
as .jpg. If you want a particular format crawled, remove its pattern from the list or comment the
pattern out using the comment symbol (#). If you do not want a particular document type to be
crawled, remove the comment symbol from the corresponding pattern. For example, to prevent any
Microsoft Word files (.doc) from being crawled, remove the # sign in front of “.doc$”. You can
also add specific URL patterns to this field to prevent URLs that match those patterns from being
crawled.

5. Click Save.

6. In the left-hand menu, click Content Sources > Diagnostics > Crawl Status.

7. Click Resume Crawl.

The search appliance starts to crawl the URLs according to the URL patterns you entered. While the
search appliance software is crawling content, the graphic on the page shows multicolored balls in
motion. You do not have to pause the crawl before making changes on the Start and Block URLs page.
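
Before pasting values into the Start URLs and Do Not Follow Patterns fields in steps 2 to 4, it can help to check them offline. The following Python sketch is only an illustration and is not part of the search appliance software. The function names are hypothetical, and the matching rule is a simplified assumption: a pattern ending in $ is treated as matching the end of a URL, and any other pattern as matching anywhere in the URL. See Administering Crawl for the pattern syntax the appliance actually supports. The file names logo.jpg and report.doc are made up for the example.

```python
from typing import List
from urllib.parse import urlsplit


def is_fully_qualified(url: str) -> bool:
    """Rough check that url has the form protocol://host[:port]/[path]/."""
    try:
        parts = urlsplit(url.strip())
        parts.port  # accessing .port raises ValueError if the port is malformed
    except ValueError:
        return False
    # A fully qualified URL needs at least a protocol and a host.
    return bool(parts.scheme) and bool(parts.hostname)


def active_patterns(field_text: str) -> List[str]:
    """Return the patterns in effect: skip blank lines and lines commented out with #."""
    patterns = []
    for line in field_text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(line)
    return patterns


def is_blocked(url: str, patterns: List[str]) -> bool:
    """Simplified matching (an assumption, not the appliance's full syntax)."""
    for pattern in patterns:
        if pattern.endswith("$"):
            # "suffix$" is treated as: the URL ends with "suffix".
            if url.endswith(pattern[:-1]):
                return True
        elif pattern in url:
            # Any other pattern is treated as a plain substring match.
            return True
    return False


if __name__ == "__main__":
    print(is_fully_qualified("http://dracula:2346/content"))  # True
    print(is_fully_qualified("dracula/content"))              # False

    # .jpg$ is active; .doc$ is commented out, so .doc files are not blocked.
    do_not_follow = ".jpg$\n#.doc$\n"
    patterns = active_patterns(do_not_follow)
    print(is_blocked("http://dracula:2346/content/logo.jpg", patterns))    # True
    print(is_blocked("http://dracula:2346/content/report.doc", patterns))  # False
```

Removing the # in front of .doc$ in the sample text makes the last call return True, which mirrors the behavior described in step 4.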

Checking the Crawl Status

You can check the progress of the crawl from the Home page.

To check the crawl status:

1. In the side menu, click Home.

The Home page is displayed, showing the Crawl Status graph. The graph automatically refreshes to
show crawling activity. If the page does not refresh automatically, click any link, and then return to
this page. You can also click the browser’s Refresh button.