Google Search Appliance: Creating the Search Experience
Best Practices
To keep the search index clean, you can use a whitelist and a blacklist of URLs together or simply use a whitelist. Generally,
using only a whitelist is more effective than using only a blacklist. The following sections describe the
advantages and disadvantages of both approaches. For information about setting up URLs for crawling
your content, refer to Preparing Data for a Crawl in Administering Crawl.
If you have a search appliance that you use for testing, test your crawl patterns first and then deploy
them if the quality of the search results has improved.
Using Both a Whitelist and a Blacklist of URLs
You can set up a crawl with a whitelist of URLs and, after the search index has been created, prune
unwanted URLs from the index using a blacklist. Starting from a comprehensive list of start URLs and
then pruning the index has the benefit of ensuring that the crawler finds every document in
your organization.
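The combined approach can be sketched in a few lines of Python. This is only an illustration of the include-then-exclude logic, not the search appliance's own pattern syntax or API: the function name is_indexed, the example.com URLs, and the use of simple prefix matching are all assumptions made for the example.

```python
def is_indexed(url, whitelist, blacklist):
    """Return True if url matches a whitelist prefix and no blacklist prefix.

    Hypothetical helper: plain prefix matching stands in for real crawl patterns.
    """
    allowed = any(url.startswith(prefix) for prefix in whitelist)
    blocked = any(url.startswith(prefix) for prefix in blacklist)
    return allowed and not blocked


# Broad whitelist: one comprehensive start URL (assumed for illustration).
whitelist = ["http://intranet.example.com/"]

# Blacklist added afterward to prune unwanted areas from the index.
blacklist = [
    "http://intranet.example.com/archive/",
    "http://intranet.example.com/tmp/",
]

urls = [
    "http://intranet.example.com/policies/travel.html",
    "http://intranet.example.com/archive/2009/old.html",
    "http://intranet.example.com/tmp/scratch.txt",
    "http://other.example.com/index.html",
]

kept = [u for u in urls if is_indexed(u, whitelist, blacklist)]
for u in kept:
    print(u)
```

In this sketch the blacklist takes precedence: a URL under a blacklisted prefix is dropped even though the broad whitelist entry also matches it.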
Using Only a Whitelist of URLs
An alternative approach is to set up a crawl with a whitelist that contains only URLs that you know to be
valuable. This doesn’t mean you should list individual low-level folders and
documents in the crawl list, but it does mean you should be cautious about what a large root node might
bring into the index.
The benefit of this approach is that the index is not bloated with documents that do not need to be
indexed. The disadvantage is that you need to be especially careful to include
every start URL that might be of value.
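The trade-off can be made concrete with a small Python sketch that compares a single large root node against a narrower whitelist over the same sample URLs. The helper split_by_whitelist, the example.com URLs, and the prefix matching are hypothetical and chosen only for illustration; they do not represent the search appliance's pattern syntax.

```python
def split_by_whitelist(urls, whitelist):
    """Split urls into (included, excluded) using prefix matching on whitelist.

    Hypothetical helper for illustration only.
    """
    included = [u for u in urls if any(u.startswith(p) for p in whitelist)]
    excluded = [u for u in urls if u not in included]
    return included, excluded


sample_urls = [
    "http://intranet.example.com/hr/policies.html",
    "http://intranet.example.com/eng/specs/design.html",
    "http://intranet.example.com/eng/build-logs/run-1.txt",
    "http://intranet.example.com/sales/forecast.html",
]

# One large root node: everything under it enters the index, including build logs.
broad = ["http://intranet.example.com/"]

# Targeted whitelist: only areas known to be valuable.
targeted = [
    "http://intranet.example.com/hr/",
    "http://intranet.example.com/eng/specs/",
]

broad_in, broad_out = split_by_whitelist(sample_urls, broad)
targeted_in, targeted_out = split_by_whitelist(sample_urls, targeted)

print(len(broad_in), "URLs indexed with the broad root")
print(len(targeted_in), "URLs indexed with the targeted whitelist")
print("Left out by the targeted whitelist:", targeted_out)
```

Here the targeted whitelist keeps the build logs out of the index, but it also leaves out the sales forecast because no start URL covers it, which is the kind of omission this approach requires you to watch for.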
Segmenting Data in the Search Index
User searches can be more efficient when they are restricted to subsets of the entire search index.
As an administrator, you can help users to search more efficiently by using collections. A collection is a
segment of the complete search index that you define by specifying URL patterns to include.
Using collections, you can show different results to different users. For example, suppose you want to
create different collections, such as:
• “Engineering,” for technical users and other users who need to search for engineering documents
• “Sales,” for sales staff to search for sales documents
• “Marketing,” for marketing staff to search for marketing documents
• “Corporate Policies,” for any staff to search for policy documents
• “Europe Offices,” for users who are geographically located in the European offices
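As a rough model of how collections segment an index, the following Python sketch maps each collection name to a list of URL patterns and reports which collections a given URL falls into. The collection names come from the list above; the example.com URL patterns, the function collections_for, and the prefix matching are assumptions for illustration and are not the search appliance's configuration format.

```python
# Each collection is defined by the URL patterns it includes (assumed prefixes).
collections = {
    "Engineering": ["http://intranet.example.com/eng/"],
    "Sales": ["http://intranet.example.com/sales/"],
    "Marketing": ["http://intranet.example.com/marketing/"],
    "Corporate Policies": [
        "http://intranet.example.com/policies/",
        "http://eu.example.com/policies/",
    ],
    "Europe Offices": ["http://eu.example.com/"],
}


def collections_for(url, collections):
    """Return the sorted names of all collections whose patterns match url."""
    return sorted(
        name
        for name, patterns in collections.items()
        if any(url.startswith(p) for p in patterns)
    )


print(collections_for("http://intranet.example.com/eng/specs/api.html", collections))
print(collections_for("http://eu.example.com/policies/leave.html", collections))
```

In this sketch a document can belong to more than one collection: the European leave policy matches both the “Corporate Policies” and the “Europe Offices” patterns.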
To search a collection, a user can select a collection from a pull-down menu on the search box, as
illustrated in the following figure.