
Google Search Appliance Managing Search for Controlled-Access Content User Manual


6. In the URL Pattern for this rule box, Tanya enters http://insidealpha.com/ and clicks Create a New Forms Authentication Rule.

The search appliance proxies the login form.

7. Tanya enters the credentials for the crawler user account and saves the forms authentication rule.

The search appliance stores the rule and uses it when crawling all content under
http://insidealpha.com/. When a cookie expires, the search appliance uses the stored crawler account
to request a new session cookie.

8. Next, Tanya uses the Crawl and Index > Forms Authentication page to add credentials for
crawling and indexing apacheserver.alphainside.com. In the Sample Forms Authentication
protected URL box, Tanya enters apacheserver.alphainside.com/alphainsider.html.

9. In the URL Pattern for this rule box, Tanya enters apacheserver.alphainside.com/ and clicks
Create a New Forms Authentication Rule.

10. The search appliance proxies the login form.

11. Tanya enters the credentials for the crawler user account and saves the forms authentication rule.

The search appliance stores the rule and uses it when crawling all content under
apacheserver.alphainside.com/. When a cookie expires, the search appliance uses the stored
crawler account to request a new session cookie.

12. Next, to get the controlled-access content crawled and indexed, Tanya opens Crawl and Index >
Crawl URLs.

13. Tanya clicks in the box for Start Crawling from the Following URLs and adds the following URL
patterns:

http://comp.alpha.int/

https://pers.def.int/

http://insidealpha.com/

https://apacheserver.alphainside.com/

14. Tanya also adds these URL patterns in the Follow and Crawl Only URLs with the Following
Patterns box and clicks Save URLs to Crawl.

15. To check that the crawling system is currently running, Tanya opens Status and Reports > Crawl
Status. The crawl status indicates that the crawl system is running.
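
The effect of steps 13 and 14 can be pictured with a small sketch. It assumes plain prefix matching, which is a simplification: the search appliance supports a richer URL pattern syntax than shown here. The names START_URLS, FOLLOW_PATTERNS, and in_crawl_scope are hypothetical and are not part of the Admin Console.

```python
# Illustrative sketch only: models the crawl scope from steps 13 and 14
# as simple URL prefixes. The appliance's real pattern syntax is richer.

# Entries from "Start Crawling from the Following URLs" (step 13).
START_URLS = [
    "http://comp.alpha.int/",
    "https://pers.def.int/",
    "http://insidealpha.com/",
    "https://apacheserver.alphainside.com/",
]

# Step 14 adds the same entries to
# "Follow and Crawl Only URLs with the Following Patterns".
FOLLOW_PATTERNS = list(START_URLS)


def in_crawl_scope(url, patterns=FOLLOW_PATTERNS):
    """Return True if the URL begins with any configured pattern (assumed prefix match)."""
    return any(url.startswith(pattern) for pattern in patterns)
```

Under this assumption, a link discovered at http://insidealpha.com/reports/q1.html would be followed, while a link to a host that is not listed would be skipped.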

Now that the search appliance has access to all this protected content, it can populate the index, as
described in the following section.

Populating the Index with Controlled-Access Content

During crawl, the search appliance goes through each of the content sources that have been configured
and obtains the controlled-access content by using the HTTP Basic Authentication credentials
configured on Crawl and Index > Crawler Access and the forms authentication credentials configured
on Crawl and Index > Forms Authentication.
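
As background for the HTTP Basic Authentication case, the following sketch shows how a client generally encodes a user name and password into the Authorization request header, as defined by the HTTP Basic scheme. It illustrates the protocol, not the search appliance's internal code, and the function name basic_auth_header is hypothetical.

```python
import base64


def basic_auth_header(username, password):
    """Build the Authorization header used by HTTP Basic Authentication.

    The scheme joins user name and password with a colon and
    Base64-encodes the result. This is a protocol illustration only.
    """
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}
```

Because Base64 is an encoding and not encryption, credentials sent this way over plain HTTP can be read by anyone who can observe the traffic.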

For content on comp.alpha.int, which is protected by HTTP Basic Authentication:

1. The search appliance connects to http://comp.alpha.int/.

2. The web server asks for credentials using HTTP Basic Authentication.