Google Search Appliance: Getting the Most from Your Google Search Appliance

Crawling and Indexing

3. Saving the crawl schedule.

To schedule crawl times for a specific host, change the host load and times on the Content
Sources > Web Crawl > Host Load Schedule page. Setting a host load of 0 prevents the crawler from
crawling that host during the configured time period.
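
The effect of a host load of 0 can be pictured with a small model. This is only an illustrative sketch, not the appliance's implementation: the HostLoadRule type, the function names, the hour-based time windows, and the default load value are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostLoadRule:
    """Hypothetical representation of one row of a host load schedule."""
    host: str
    start_hour: int  # inclusive, 0-23
    end_hour: int    # exclusive, 1-24
    load: float      # 0 means "do not crawl during this window"


def effective_load(rules, host, hour, default_load=1.0):
    """Return the load of the first rule matching host and hour.

    default_load is a placeholder for hosts or hours with no rule.
    """
    for rule in rules:
        if rule.host == host and rule.start_hour <= hour < rule.end_hour:
            return rule.load
    return default_load


def may_crawl(rules, host, hour):
    """A host is crawled only while its effective load is above 0."""
    return effective_load(rules, host, hour) > 0


# Example: block crawling of one host during working hours.
rules = [HostLoadRule("intranet.example.com", 9, 17, 0)]
print(may_crawl(rules, "intranet.example.com", 10))
print(may_crawl(rules, "intranet.example.com", 20))
```

In this model, a rule with load 0 suspends crawling for the matching host and window only; other hosts and other hours fall back to the default.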

To add a document to the crawl queue right away, enter its URL under Re-Crawl These URL Patterns
on the Content Sources > Web Crawl > Freshness Tuning page.

Learn More about Public Crawl

For in-depth information about public crawl, configuring a search appliance to crawl, and starting a
crawl, refer to the introduction in Administering Crawl.

For a complete list of file types that the search appliance can crawl, refer to Indexable File Formats.

Crawling and Serving Controlled-Access Content

Controlled-access content is secure content—it is restricted so that not all users have access to it. For
access to controlled-access content, users need authorization.

A search appliance discovers and indexes controlled-access content in the same way that it indexes all
other content: by performing a crawl through the content sources. However, the search appliance
requires access credentials to discover and index controlled-access content. Once you set up the search
appliance with access credentials, it maintains a copy of all crawled content in the index.
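
The relationship between credentials and indexing described above can be sketched as a toy model. It is not how the search appliance is implemented: the crawl function, the dictionary layout, and the string credentials are hypothetical, and real authentication mechanisms are more involved.

```python
def crawl(sources, credentials):
    """Return an index mapping each crawlable URL to a copy of its content.

    sources: URL -> {"content": str, "required_credential": str or None}
    credentials: URL -> credential the crawler was configured with (hypothetical)
    """
    index = {}
    for url, doc in sources.items():
        required = doc["required_credential"]
        if required is not None and credentials.get(url) != required:
            # Controlled-access document without a matching credential:
            # the crawler cannot discover or index it.
            continue
        # Public content, or controlled content with valid credentials:
        # a copy of the content is kept in the index.
        index[url] = doc["content"]
    return index


sources = {
    "http://docs.example.com/public": {
        "content": "public page",
        "required_credential": None,
    },
    "http://docs.example.com/secret": {
        "content": "restricted page",
        "required_credential": "crawler-token",
    },
}

# Without credentials, only the public document reaches the index.
print(sorted(crawl(sources, {})))

# With credentials, the controlled-access document is indexed as well.
print(sorted(crawl(sources, {"http://docs.example.com/secret": "crawler-token"})))
```

The same loop handles both kinds of content; the credential check is the only difference between public and controlled-access crawling in this model.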

The following figure provides an overview of crawling controlled-access content.