Populating the index for controlled-access content – Google Search Appliance Managing Search for Controlled-Access Content User Manual

Populating the Index for Controlled-Access Content

During a crawl, the search appliance visits each configured content source and uses the
credentials specified under Crawler Access to retrieve the controlled-access content.
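
As a rough illustration of how a crawler might pick credentials for a URL, the following Python sketch matches a URL against a small table of prefixes. The table, the class name, the function name, and the placeholder password are all hypothetical. They stand in for the Crawler Access settings and do not show how the search appliance stores or matches them internally.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CrawlerCredential:
    url_prefix: str  # URLs that start with this prefix use the credential
    username: str
    password: str    # placeholder; the real value is entered in the Admin Console


# Hypothetical table standing in for the Crawler Access settings.
CRAWLER_ACCESS = [
    CrawlerCredential("https://events.abc.int/", "ABCsearch", "example-password"),
    CrawlerCredential("https://announce.abc.int/", "ABCsearch", "example-password"),
]


def credential_for(url: str) -> Optional[CrawlerCredential]:
    """Return the first entry whose prefix matches the URL, or None."""
    for entry in CRAWLER_ACCESS:
        if url.startswith(entry.url_prefix):
            return entry
    return None
```

A URL that matches no prefix returns None, which in this sketch means the content would be requested without credentials.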

The search appliance can use multiple protocols to crawl and index controlled-access content, as in
the following examples:

• The search appliance connects to events.abc.int over HTTPS. The web server asks for credentials
using HTTP Basic Authentication: the search appliance provides the username “ABCsearch” and the
password entered in the Admin Console. The web server verifies that ABCsearch has access to view
documents on events.abc.int. The search appliance crawls through all documents on
events.abc.int and adds them to the index.

• The search appliance connects to announce.abc.int over HTTPS. The Microsoft IIS server asks for
credentials using Windows Authentication: the search appliance provides an NTLM HTTP message
that contains the username “ABCsearch” and a response based on the password entered in the
Admin Console. The IIS server verifies that ABCsearch has access to view documents on
announce.abc.int. The search appliance crawls through all documents on announce.abc.int
and adds them to the index.

• The search appliance receives a web feed that directs it to directory.abc.int with
authmethod=ntlm. It connects to directory.abc.int over HTTPS. The Microsoft IIS server asks for
credentials using Windows Authentication: the search appliance provides an NTLM HTTP message
that contains the username “ABCfeeder” and a response based on the password entered in the
Admin Console. The IIS server verifies that ABCfeeder has access to view documents on
directory.abc.int. The search appliance crawls through all documents on directory.abc.int
and adds them to the index.
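
To make the HTTP Basic Authentication exchange in the first example more concrete, the following Python sketch builds the Authorization header value that a client sends in response to a Basic challenge, as defined by the HTTP Basic scheme. The function name is an assumption made for illustration, and the code is not taken from the search appliance. NTLM, used in the other two examples, is a multi-step challenge-response exchange and is not sketched here.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of the HTTP Authorization header for Basic authentication."""
    # Basic authentication joins the username and password with a colon
    # and Base64-encodes the result; it does not encrypt the password.
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return f"Basic {token}"
```

Because the encoded value is trivially reversible, Basic authentication relies on the HTTPS connection to protect the password in transit, which is consistent with the example connecting over HTTPS.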