Google Search Appliance: Getting the Most from Your Google Search Appliance
Chapter 4
Crawling and Indexing
After the Google Search Appliance has been set up (see “Setting Up a Search Appliance” on page 11), you
can configure the search appliance to crawl the content sources that you identified during the planning
phase, as described in “Planning” on page 10.
Crawl is the process by which the Google Search Appliance discovers enterprise content and creates a
master index. The resulting index consists of all of the words, phrases, and metadata in the crawled
documents. When users search for information, their queries are executed against the index rather
than the actual documents themselves. Searching against content that is already indexed in the
appliance is not interrupted, even as new content continues to be indexed.
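The idea of answering queries from an index rather than from the documents themselves can be illustrated with a minimal sketch. This is not the search appliance's implementation; it assumes a simple in-memory inverted index, and the function names (build_index, search) and the sample document paths are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
import re


def build_index(documents):
    """Map each lowercase word to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the IDs of documents that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    # Start from the matches for the first word, then intersect with the rest.
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical crawled documents (illustrative only, not real appliance data).
documents = {
    "hr/benefits.html": "Benefits information for all employees",
    "hr/policies.html": "Employee policies and procedures",
}
index = build_index(documents)

# The query is answered from the index; the document text is not re-read.
print(sorted(search(index, "benefits information")))
```

In this sketch, the document text is read only once, when the index is built. Each later query is a lookup in the index, which is why a search does not need to touch the original documents.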
The Google Search Appliance can crawl:
• Public content (see “Crawling Public Content” on page 15)
• Controlled-access content (see “Crawling and Serving Controlled-Access Content” on page 18)
The Google Search Appliance is also capable of indexing:
• Content in non-web repositories, such as content management systems (see “Indexing Content in Non-Web Repositories” on page 21)
• Hard-to-find content, such as content that cannot be found through links on crawled web pages (see “Indexing Content in Non-Web Repositories” on page 21)
• Database content (see “Indexing Database Content” on page 26)
This section briefly describes how the Google Search Appliance indexes each type of content.
Crawling Public Content
Public content is not restricted in any way; users don’t need credentials to view it. Some of the most
common forms of public content include:
• Employee portals
• Frequently Asked Questions
• Employee policies
• Benefits information