About crawl patterns, About database crawling – Google Search Appliance Configuring GSA Unification User Manual

• File type filters
• Domain filters
• Metatag filters
About Crawl Patterns
Unified environments function more efficiently when the set of URLs crawled on one node has few or no
links to URLs crawled on other nodes. Google recommends that you set up the crawl patterns on each
node so that there is minimal interlinking among the nodes.
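As a rough way to gauge interlinking before assigning crawl patterns, the sketch below counts links whose source and target would be crawled by different nodes. It is only an illustration: the node names, the example URLs, and the helper functions are hypothetical, and crawl patterns are simplified here to plain URL prefixes, which is narrower than the pattern syntax the search appliance accepts.

```python
from typing import Optional

# Hypothetical crawl patterns per node, simplified to plain URL prefixes.
NODE_PATTERNS = {
    "node-a": ["http://intranet.example.com/hr/"],
    "node-b": ["http://intranet.example.com/eng/"],
}


def owning_node(url: str, patterns: dict[str, list[str]]) -> Optional[str]:
    """Return the first node whose prefix list matches the URL, or None."""
    for node, prefixes in patterns.items():
        if any(url.startswith(prefix) for prefix in prefixes):
            return node
    return None


def cross_node_links(links, patterns):
    """Return the (source, target) pairs whose endpoints belong to different nodes."""
    crossing = []
    for source, target in links:
        src_node = owning_node(source, patterns)
        dst_node = owning_node(target, patterns)
        # Only count links where both ends are crawled, and by different nodes.
        if src_node and dst_node and src_node != dst_node:
            crossing.append((source, target))
    return crossing


# Hypothetical link data, for example exported from a site map.
SAMPLE_LINKS = [
    ("http://intranet.example.com/hr/index.html",
     "http://intranet.example.com/hr/benefits.html"),
    ("http://intranet.example.com/hr/index.html",
     "http://intranet.example.com/eng/wiki.html"),
]

for source, target in cross_node_links(SAMPLE_LINKS, NODE_PATTERNS):
    print(f"cross-node link: {source} -> {target}")
```

A small number of crossing links relative to the total suggests that the chosen split keeps interlinking among nodes low.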
Depending on how results are authorized in your unified environment, you might need to copy crawl
patterns or crawler access information from the secondary search appliances to the primary search
appliances. For more information, see the tables in “About Authentication and Authorization within a
Unified Environment” on page 9.
If a secondary search appliance uses SMB crawl patterns, you must add those patterns to the Follow and
Crawl Only URLs field on the primary search appliance’s Crawl and Index > Crawl URLs page.
About Database Crawling
To use database crawling in a unified environment, you might need to perform some additional
configuration.
• If you configure the primary search appliance to crawl the database, no additional configuration is
  required.
• If you configure a secondary search appliance to crawl the database, search results from the
  database are correctly returned to the primary search appliance. However, the primary search
  appliance cannot retrieve the database when the user clicks a result from the database. Use these
  instructions to set up the primary search appliance so that it can retrieve the database.
To set up the primary search appliance:
1. Log in to the Admin Console of the secondary search appliance.
2. Navigate to Crawl and Index > Databases.
3. Note down the configuration information.
4. Log in to the Admin Console of the primary search appliance.
5. Navigate to Crawl and Index > Databases.
6. Set up a database crawl configuration that is identical to the configuration on the secondary search
   appliance.
7. Configure a dummy SELECT statement for the crawl query that does not return documents. This
   prevents the primary search appliance from crawling the database. The serve query on the primary
   search appliance must be identical to the serve query on the secondary search appliance.
8. Save the configuration.
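The dummy crawl query in step 7 can be illustrated with a minimal sketch. The table name, column names, and both queries below are hypothetical, and an in-memory SQLite database stands in for the real data source; the point is only that a SELECT with an always-false condition is valid SQL that returns no rows, while the serve query still retrieves a single record by key.

```python
import sqlite3

# Hypothetical queries for illustration; use your own table and columns.
# The always-false WHERE clause makes the crawl query return no documents.
DUMMY_CRAWL_QUERY = "SELECT doc_id, title FROM documents WHERE 1 = 0"
# The serve query must match the one configured on the secondary appliance.
SERVE_QUERY = "SELECT doc_id, title FROM documents WHERE doc_id = ?"


def build_sample_db() -> sqlite3.Connection:
    """Create an in-memory database with two sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE documents (doc_id INTEGER PRIMARY KEY, title TEXT)"
    )
    conn.executemany(
        "INSERT INTO documents VALUES (?, ?)",
        [(1, "Handbook"), (2, "Policy")],
    )
    conn.commit()
    return conn


conn = build_sample_db()
crawl_rows = conn.execute(DUMMY_CRAWL_QUERY).fetchall()
serve_rows = conn.execute(SERVE_QUERY, (2,)).fetchall()

print("rows from dummy crawl query:", len(crawl_rows))
print("row from serve query:", serve_rows)
```

Running a candidate dummy query against the database first, in any SQL client, is a quick way to confirm that it returns no rows before you save it as the crawl query.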
For more information on crawling databases with the Google Search Appliance, see “Database Crawling
and Serving” in Administering Crawl.