Setting up crawl and index – Google Search Appliance Managing Search for Controlled-Access Content User Manual

Setting up Crawl and Index

First, the system administrator creates a user account for the search appliance, called ABCsearch, and
sets up access policies that authorize the ABCsearch account to view all files on events.abc.int and
announce.abc.int. The feed process on directory.abc.int has its own account, called ABCfeeder, with
similar permissions.

Next, the search appliance administrator, Sandra, logs in to the Admin Console and performs these actions:

1. To provide the search appliance with credentials for crawl and index, Sandra opens Crawl and
Index > Crawler Access and adds the rows shown in the table at the end of this section, using the
account names and passwords given to her by the system administrator.

Here, omitting the domain for events.abc.int instructs the search appliance to authenticate
using HTTP Basic. For all other servers in this example, the domain entry tells the search appliance
to authenticate against a Microsoft IIS Server using NTLM HTTP.

Because Basic Authentication sends credentials as Base64-encoded text, which anyone who intercepts
the request can decode, the patterns for events.abc.int all use HTTPS, which protects user names and
passwords in transit. Although HTTPS is recommended for Basic Authentication, the search appliance
can also authenticate over HTTP. Make Public is selected for all URL patterns.

2. Under Crawl and Index > Crawl URLs, Sandra clicks in the text box for Start Crawling from the
Following URLs and adds the URL patterns "https://events.abc.int/" and
"https://announce.abc.int/".

3. Sandra also adds the URL patterns "https://events.abc.int/", "https://announce.abc.int/",
and "https://directory.abc.int/" under Follow and Crawl Only URLs with the Following Patterns.

4. Finally, she clicks Save URLs to Crawl to save the changes.

5. She pushes a web feed to the appliance that includes the URLs from directory.abc.int, using the
following syntax:

Because the record has authmethod=ntlm, the search appliance attempts to authenticate using
NTLM HTTP when crawling this content.
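
As an illustration of step 5, the following Python sketch builds a feed document in which every record carries authmethod="ntlm". It is a minimal sketch, not the feed from the original manual: the document paths are hypothetical, and the header values (datasource "web", feedtype "metadata-and-url") and the mimetype attribute are assumptions based on the general layout of search appliance feeds. Check them against the feeds documentation for your appliance version before use.

```python
import xml.etree.ElementTree as ET

# Hypothetical document URLs; the real list would come from the
# feed process running on directory.abc.int.
DIRECTORY_URLS = [
    "https://directory.abc.int/press/release1.html",
    "https://directory.abc.int/press/release2.html",
]


def build_web_feed(urls, authmethod="ntlm"):
    """Return a feed document (as a string) with one record per URL."""
    gsafeed = ET.Element("gsafeed")

    # Header values are assumptions, not taken from this page.
    header = ET.SubElement(gsafeed, "header")
    ET.SubElement(header, "datasource").text = "web"
    ET.SubElement(header, "feedtype").text = "metadata-and-url"

    group = ET.SubElement(gsafeed, "group")
    for url in urls:
        # authmethod tells the appliance how to authenticate when it
        # crawls this URL; the text above uses "ntlm".
        ET.SubElement(
            group,
            "record",
            {"url": url, "mimetype": "text/html", "authmethod": authmethod},
        )
    return ET.tostring(gsafeed, encoding="unicode")


feed_xml = build_web_feed(DIRECTORY_URLS)
print(feed_xml)
```

The sketch only generates the XML body. A production feed would also need the XML declaration and document type expected by the appliance, and would be pushed to the appliance's feed interface by a feed client.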

Now that the search appliance has access to all of ABC Company’s press releases, the search appliance
administrator starts the crawl and waits for the controlled-access content to appear in the index.
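
Step 1 notes that Basic Authentication sends credentials as base-64 encoded clear text, which is why the events.abc.int patterns use HTTPS. The Python sketch below illustrates the point: the encoded header value can be reversed by anyone who sees it, with no key required. The password shown is a placeholder invented for this example.

```python
import base64


def basic_auth_header(username, password):
    """Build the value of an HTTP Basic Authorization header."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def recover_credentials(header_value):
    """Decode a Basic Authorization header back into (username, password)."""
    _scheme, token = header_value.split(" ", 1)
    decoded = base64.b64decode(token).decode("utf-8")
    username, _, password = decoded.partition(":")
    return username, password


# "example-password" is a placeholder, not a real credential.
header = basic_auth_header("ABCsearch", "example-password")
print(header)
print(recover_credentials(header))
```

Because decoding needs nothing beyond the header itself, the protection has to come from the transport, which is what HTTPS provides.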

Crawler Access rows added in step 1:

| For URLs Matching Pattern, Use: | Username: | In Domain: | Password: | Confirm Password: | Make Public: |
|---|---|---|---|---|---|
| https://events.abc.int/ | ABCsearch | (blank) | ****** | ****** | X |
| https://announce.abc.int/ | ABCsearch | abc_corp | ****** | ****** | X |
| https://directory.abc.int/ | ABCfeeder | abc_corp | ****** | ****** | X |
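
The way these rows map a URL to an account and an authentication method can be sketched in Python. This is an illustration of the rule described in step 1 (no domain means HTTP Basic, a domain means NTLM), not the appliance's actual matching logic: the class and function names are hypothetical, and simple prefix matching stands in for the appliance's URL pattern syntax.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CrawlerAccessRule:
    pattern: str           # URL pattern, treated here as a plain prefix
    username: str
    domain: Optional[str]  # None models a blank "In Domain" field


# The three rows from the table above (passwords omitted).
RULES = [
    CrawlerAccessRule("https://events.abc.int/", "ABCsearch", None),
    CrawlerAccessRule("https://announce.abc.int/", "ABCsearch", "abc_corp"),
    CrawlerAccessRule("https://directory.abc.int/", "ABCfeeder", "abc_corp"),
]


def auth_for(url, rules=RULES):
    """Return (username, method) for the first rule matching url, else None."""
    for rule in rules:
        if url.startswith(rule.pattern):
            # In this example, a blank domain selects HTTP Basic and a
            # filled-in domain selects NTLM.
            method = "NTLM" if rule.domain else "Basic"
            return rule.username, method
    return None


print(auth_for("https://events.abc.int/2010/launch.html"))
print(auth_for("https://directory.abc.int/press/release1.html"))
```

URLs that match no row return None in this sketch, which corresponds to content the appliance would request without credentials.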