Google Search Appliance: Getting the Most from Your Google Search Appliance
Crawling and Indexing
3. Saving the crawl schedule.
To schedule crawl times for a specific host, change the host load and times on the Content
Sources > Web Crawl > Host Load Schedule page. When you set a host load of 0, the crawler does not
crawl that host during the configured time period.
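
The effect of a host load of 0 can be illustrated with a small sketch. This is not search appliance code or an appliance API; it is a hypothetical Python model of a host load schedule, written only to show the rule described above. The names HostLoadRule, effective_host_load, and may_crawl, the host intranet.example.com, and the default_load value are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class HostLoadRule:
    """Hypothetical model of one entry in a host load schedule."""
    host: str
    start: time       # start of the configured period (inclusive)
    end: time         # end of the configured period (exclusive)
    host_load: float  # 0 means "do not crawl this host during the period"


def in_period(rule: HostLoadRule, now: time) -> bool:
    """Return True if `now` falls inside the rule's period.

    Periods that wrap past midnight (for example 22:00 to 06:00) are handled.
    """
    if rule.start <= rule.end:
        return rule.start <= now < rule.end
    return now >= rule.start or now < rule.end


def effective_host_load(rules, host, now, default_load=4.0):
    """Return the host load that applies to `host` at `now`.

    `default_load` is an assumed placeholder used when no rule matches.
    """
    for rule in rules:
        if rule.host == host and in_period(rule, now):
            return rule.host_load
    return default_load


def may_crawl(rules, host, now):
    """A host is crawled only while its effective host load is above 0."""
    return effective_host_load(rules, host, now) > 0


if __name__ == "__main__":
    # Assumed example: pause crawling of one host during business hours.
    rules = [HostLoadRule("intranet.example.com", time(9, 0), time(17, 0), 0)]
    print(may_crawl(rules, "intranet.example.com", time(12, 0)))
    print(may_crawl(rules, "intranet.example.com", time(20, 0)))
```

In this model, the host is skipped at 12:00 because the matching rule sets its load to 0, and it is crawled again at 20:00 because no rule applies and the assumed default load is used.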
To add a document to the crawl queue right away, enter its URL in Re-Crawl These URL Patterns on
the Content Sources > Web Crawl > Freshness Tuning page.
Learn More about Public Crawl
For in-depth information about public crawl, configuring a search appliance to crawl, and starting a
crawl, refer to the introduction in Administering Crawl.
For a complete list of file types that the search appliance can crawl, refer to Indexable File Formats.
Crawling and Serving Controlled-Access Content
Controlled-access content is secure content: it is restricted so that not all users can access it. To
access controlled-access content, users need authorization.
A search appliance discovers and indexes controlled-access content in the same way that it indexes all
other content: by performing a crawl through the content sources. However, the search appliance
requires access credentials to discover and index controlled-access content. Once you set up the search
appliance with access credentials, it maintains a copy of all crawled content in the index.
The following figure provides an overview of crawling controlled-access content.