Google Search Appliance: Getting the Most from Your Google Search Appliance

Crawling and Indexing

When connecting to a document repository through an enterprise connector, the Google Search
Appliance uses a process called “traversal.” During traversal, the connector queries the
repository and retrieves document data, which is fed to the Google Search Appliance for indexing. The
connector manager formats the content and any associated metadata into a feed for the Google Search
Appliance, which then creates an index of the documents.
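The traversal loop can be illustrated with a short Python sketch. This is not the connector manager's actual implementation. The `InMemoryRepository` class, its `query_after` method, the ID-based checkpoint scheme, and the batch size are all hypothetical. The XML produced by `build_feed` follows the general shape of the appliance's XML feed format (`gsafeed`, `header`, `group`, `record`, `metadata`, `content`), but it is a simplified record, not a validated feed.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Document:
    """A document as a (hypothetical) repository returns it."""
    doc_id: str
    url: str
    content: str
    metadata: dict = field(default_factory=dict)


class InMemoryRepository:
    """Hypothetical stand-in for a content management system."""

    def __init__(self, docs):
        # Sort by ID so "everything after a checkpoint" is well defined.
        self._docs = sorted(docs, key=lambda d: d.doc_id)

    def query_after(self, checkpoint, limit):
        """Return up to `limit` documents whose ID sorts after `checkpoint`."""
        newer = [d for d in self._docs if checkpoint is None or d.doc_id > checkpoint]
        return newer[:limit]


def traverse(repo, checkpoint=None, batch_size=2):
    """Yield (batch, checkpoint) pairs until the repository has no newer documents."""
    while True:
        batch = repo.query_after(checkpoint, batch_size)
        if not batch:
            return
        checkpoint = batch[-1].doc_id  # remember where this batch ended
        yield batch, checkpoint


def build_feed(batch, datasource="cms"):
    """Format one batch of documents and their metadata as a simplified feed XML string."""
    feed = ET.Element("gsafeed")
    header = ET.SubElement(feed, "header")
    ET.SubElement(header, "datasource").text = datasource
    ET.SubElement(header, "feedtype").text = "incremental"
    group = ET.SubElement(feed, "group")
    for doc in batch:
        record = ET.SubElement(group, "record", url=doc.url, action="add",
                               mimetype="text/plain")
        meta = ET.SubElement(record, "metadata")
        for name, value in doc.metadata.items():
            ET.SubElement(meta, "meta", name=name, content=str(value))
        ET.SubElement(record, "content").text = doc.content
    return ET.tostring(feed, encoding="unicode")
```

Because `traverse` returns a checkpoint with each batch, a caller that saves the last checkpoint can later call `traverse(repo, checkpoint=saved)` and receive only documents it has not yet seen. In this sketch, that resumable behavior is why a checkpoint is recorded at all.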

The following figure provides an overview of indexing content in non-web repositories.

You can also create a custom connector for the Google Search Appliance, as described in “Developing
Custom Connectors” on page 68.

Serving Results from a Content Management System

For public content in a repository, searches work the same way as they do with web and file-system
content. The Google Search Appliance searches its index and returns relevant result sets to the user
without any involvement by the connector.

To authorize access to private or protected content in a repository, the Google Search Appliance
creates a connector instance at query time. The connector instance forwards authentication credentials
to the repository for authorization checking. The connector manager recognizes identities passed from
basic authentication, SAML authentication (see “Authentication SPI” on page 67), and client certificates.
If a SAML authentication provider is set up to support single sign-on (SSO), the connector manager also
recognizes identities passed from the SSO provider.
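The serving behavior described in this section can be illustrated as a Python filter over search results. This is an illustration only. The `Identity`, `SearchResult`, and `ConnectorInstance` types, the `is_public` flag, and the ACL dictionary standing in for the repository's own permission check are all hypothetical. The `source` field simply labels the identity mechanisms listed above. The sketch shows the two paths: public results are served from the index without the connector, and a connector instance is created at query time only when a protected result needs an authorization check.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set


@dataclass(frozen=True)
class Identity:
    """Identity resolved for the searcher (hypothetical shape)."""
    username: str
    source: str  # label only, e.g. "basic", "saml", "client-cert" or "sso"


@dataclass(frozen=True)
class SearchResult:
    url: str
    is_public: bool


class ConnectorInstance:
    """Hypothetical per-query connector that asks the repository for an authorization decision."""

    def __init__(self, acl: Dict[str, Set[str]]):
        # Stands in for the repository: maps document URL -> usernames allowed to read it.
        self._acl = acl

    def is_authorized(self, identity: Identity, url: str) -> bool:
        return identity.username in self._acl.get(url, set())


def serve(results: List[SearchResult], identity: Optional[Identity],
          make_connector: Callable[[], ConnectorInstance]) -> List[str]:
    """Return the URLs the user may see, creating a connector only if one is needed."""
    connector = None
    visible = []
    for result in results:
        if result.is_public:
            visible.append(result.url)  # public: served from the index alone
            continue
        if identity is None:
            continue  # no credentials, so protected content is never shown
        if connector is None:
            connector = make_connector()  # created at query time, on the first protected hit
        if connector.is_authorized(identity, result.url):
            visible.append(result.url)
    return visible
```

Creating the connector lazily keeps queries that match only public content free of any repository round trip. That mirrors the distinction the text draws between public and protected content.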