Excluding controlled-access content from the index – Google Search Appliance Managing Search for Controlled-Access Content User Manual
2.
Scroll down to Authorization SPI and enter the connection information for your Policy Decision
Point.
•
Under Authorization Service URL, enter the URL that the search appliance should use when
sending SAML Request messages to the Policy Decision Point. For example,
https://server.domain.com:8443/SAML/services/AuthZConnector. The search appliance
determines authorization by issuing AuthzDecisionQuery messages sent to the Artifact
Service URL.
•
To prevent the search appliance from displaying a prompt to users when they search for secure
content, select Disable prompt for Basic authentication or NTLM authentication. Because you
are passing responsibility for authorization verification to the Policy Decision Point, the
Policy Decision Point displays its own prompt instead. This checkbox is visible only when the
index contains content that uses HTTP Basic or NTLM HTTP authentication.
•
Set appropriate Authorization Parameters to specify timeout values for communication
between the search appliance and the Policy Decision Point.
3.
Click Save Settings.
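For orientation, the kind of SAML authorization request that a Policy Decision Point receives at the Authorization Service URL can be sketched as follows. This is a minimal illustration, assuming a SAML 2.0 AuthzDecisionQuery. The function name, the issuer value, and the sample user are hypothetical, and the exact message that the search appliance sends is not reproduced here.

```python
import uuid
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Standard SAML 2.0 namespaces.
SAMLP = "urn:oasis:names:tc:SAML:2.0:protocol"
SAML = "urn:oasis:names:tc:SAML:2.0:assertion"
# SAML-defined action namespace for HTTP GET/HEAD/PUT/POST operations.
GHPP = "urn:oasis:names:tc:SAML:1.0:action:ghpp"

ET.register_namespace("samlp", SAMLP)
ET.register_namespace("saml", SAML)


def build_authz_decision_query(user, resource_url,
                               issuer="search-appliance.example.com"):
    """Return an AuthzDecisionQuery asking whether `user` may GET `resource_url`.

    The issuer default is a placeholder, not a value taken from the appliance.
    """
    query = ET.Element(f"{{{SAMLP}}}AuthzDecisionQuery", {
        "ID": "_" + uuid.uuid4().hex,  # unique request identifier
        "Version": "2.0",
        "IssueInstant": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "Resource": resource_url,      # the URL being authorized
    })
    ET.SubElement(query, f"{{{SAML}}}Issuer").text = issuer
    subject = ET.SubElement(query, f"{{{SAML}}}Subject")
    ET.SubElement(subject, f"{{{SAML}}}NameID").text = user
    action = ET.SubElement(query, f"{{{SAML}}}Action", {"Namespace": GHPP})
    action.text = "GET"
    return ET.tostring(query, encoding="unicode")


if __name__ == "__main__":
    print(build_authz_decision_query(
        "jdoe", "https://server.domain.com/secure/report.html"))
```

A Policy Decision Point would answer such a query with a Permit, Deny, or Indeterminate decision for the named resource. In practice the query is wrapped in a SOAP envelope before it is posted, which this sketch omits.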
How to Exclude Controlled-Access Content Sources from Search
When you assign credentials that allow a search appliance to crawl and index controlled-access content,
it’s important to consider whether the content source includes content that you don’t want anyone to
see. The best way to ensure that private content is never shown in search results is to exclude all private
content sources from the index. Examples of controlled-access content that should be excluded from
crawl and indexing include:
•
Draft working directories that contain unreviewed content.
If the search appliance has access to all directories on a server, you may find that your index
contains unfinished documents that aren’t meant for review. To ensure that your site users are
comfortable placing content on servers that are indexed, consider creating “no crawl” directories for
their rough work, and configure the search appliance to exclude all such directories from the index.
•
Highly sensitive materials that should never be discovered during search.
Because the search appliance checks for authentication and authorization before serving results, it
will never show secure results to a user who does not have authorization to view the documents.
Despite this, you may have some materials that are so sensitive that they require additional care.
Excluding Controlled-Access Content from the Index
To exclude private content from the index, use one or both of these methods:
•
Configure your content server to define a user policy that prohibits the search appliance account
from accessing those directories.
•
In the Admin Console, go to Crawl and Index > Crawl URLs. Scroll down to Do Not Crawl URLs
with the Following Patterns and enter a pattern for each URL that corresponds to private
content. Any content that matches the patterns under Do Not Crawl URLs with the Following
Patterns is excluded from the index.
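Before entering patterns in the Admin Console, it can help to check a list of candidate URLs against the directories you intend to exclude. The following is a minimal sketch of that check. It assumes plain host-and-path prefix matching, which is a simplification and not the appliance's own pattern syntax. The prefixes, directory names, and function names are hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical exclusion prefixes (host + path); replace with your own.
DO_NOT_CRAWL_PREFIXES = [
    "server.domain.com/drafts/",
    "server.domain.com/hr/confidential/",
]


def is_excluded(url, prefixes=DO_NOT_CRAWL_PREFIXES):
    """Return True if the URL's host and path start with any exclusion prefix.

    Scheme, port, query string, and fragment are ignored in this sketch.
    """
    parts = urlsplit(url)
    target = f"{parts.hostname or ''}{parts.path}".lower()
    return any(target.startswith(prefix.lower()) for prefix in prefixes)


def split_by_exclusion(urls, prefixes=DO_NOT_CRAWL_PREFIXES):
    """Partition URLs into (excluded, crawlable) lists, preserving order."""
    excluded, crawlable = [], []
    for url in urls:
        (excluded if is_excluded(url, prefixes) else crawlable).append(url)
    return excluded, crawlable


if __name__ == "__main__":
    sample = [
        "https://server.domain.com/drafts/plan.doc",
        "https://server.domain.com/public/index.html",
    ]
    excluded, crawlable = split_by_exclusion(sample)
    print("Excluded:", excluded)
    print("Crawlable:", crawlable)
```

Running a check like this against a sample of known private URLs is a quick way to confirm that no sensitive directory has been left out of the exclusion list.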