Google Search Appliance: Feeds Protocol Developer’s Guide

For content feeds, the content is provided as part of the XML and does not need to be fetched by the crawler. The feeder passes the URLs to the server that maintains Crawl Diagnostics in the Admin Console; if your system is not busy, this happens within 15 minutes. The feeder also passes the URLs and their contents to the indexing process, and the URLs appear in your search results within 30 minutes if your system is not busy.

Removing Feed Content From the Index

There are several ways to remove content from your index by using a feed. The method to use depends on the type of feed that owns the content.

For content feeds, remove content by performing one of these actions:

Push the URL as part of an incremental feed, using the “delete” action to remove the content. This is
the fastest way to remove content. URLs will be deleted within about 30 minutes.
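
As an illustration of the incremental delete, the following minimal Python sketch builds a feed file in which every record carries action="delete". It assumes the gsafeed/header/group/record layout of the feed DTD; the helper name, data source name, and URLs are hypothetical, the mimetype value is included on the assumption that the DTD requires it on every record, and pushing the file to the appliance is not shown.

```python
import xml.etree.ElementTree as ET

# ElementTree does not emit a DOCTYPE, so it is prepended by hand.
DOCTYPE = '<!DOCTYPE gsafeed PUBLIC "-//Google//DTD GSA Feeds//EN" "">'


def build_delete_feed(datasource: str, urls: list[str]) -> str:
    """Hypothetical helper: an incremental feed that deletes each URL."""
    gsafeed = ET.Element("gsafeed")
    header = ET.SubElement(gsafeed, "header")
    ET.SubElement(header, "datasource").text = datasource
    ET.SubElement(header, "feedtype").text = "incremental"
    group = ET.SubElement(gsafeed, "group")
    for url in urls:
        # mimetype is set because the DTD is assumed to require it,
        # even on delete records.
        ET.SubElement(group, "record", url=url, action="delete",
                      mimetype="text/plain")
    body = ET.tostring(gsafeed, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + DOCTYPE + "\n" + body


print(build_delete_feed("sample", ["http://www.example.com/old.html"]))
```

The delete records must be sent under the same data source name as the feed that originally added the URLs.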

Remove the URL from the feed and perform a full feed. Because a full feed overwrites the earlier
feed contents, any URLs that are omitted from the new full feed will be removed from the index.
The content is deleted within about 30 minutes.
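
The full-feed rule amounts to a set difference: anything in the earlier feed but absent from the new one is removed. The Python sketch below makes that explicit, assuming both feeds are full feeds for the same data source; the function name and URLs are hypothetical.

```python
def diff_full_feed(previous: set[str], new: set[str]) -> dict[str, set[str]]:
    """Classify URLs when a new full feed replaces an earlier one."""
    return {
        "removed": previous - new,  # omitted from the new feed: dropped from the index
        "added": new - previous,    # appear only in the new feed
        "kept": previous & new,     # present in both feeds
    }


old_feed = {"http://example.com/a", "http://example.com/b", "http://example.com/c"}
new_feed = {"http://example.com/a", "http://example.com/c", "http://example.com/d"}
result = diff_full_feed(old_feed, new_feed)
print(sorted(result["removed"]))  # ['http://example.com/b']
```

Running such a check before sending a full feed is a cheap way to catch URLs that would be dropped by accident.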

Remove the data source and all of its contents. To remove a data source, log in to the Admin Console and open the Content Sources > Feeds page. Choose the data source that you want to remove and click Delete. The Delete option removes the fed documents from the search appliance index within about 30 minutes, and the feed is then marked Delete in the Admin Console.

After deleting a feed, you can remove the feed from the Admin Console Feed Status page by clicking
Destroy.

For web and metadata-and-URL feeds, remove content by performing one of these actions:

In the XML record for the document, set the action attribute of the record element to "delete". The action="delete" feature works for content, web, and metadata-and-URL feeds.
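
For a metadata-and-URL feed, the delete record looks the same as in other feed types. The Python sketch below pairs such a feed with a helper that wraps it in a multipart/form-data POST without sending it. The port 19900, the /xmlfeed path, and the form field names feedtype, datasource, and data are assumptions to verify against your appliance; the data source name, host, and URL are placeholders.

```python
import urllib.request
import uuid

# Hypothetical metadata-and-URL feed that deletes one URL.
FEED_XML = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE gsafeed PUBLIC "-//Google//DTD GSA Feeds//EN" "">
<gsafeed>
  <header>
    <datasource>sample_metadata</datasource>
    <feedtype>metadata-and-url</feedtype>
  </header>
  <group>
    <record url="http://www.example.com/old-page.html" action="delete" mimetype="text/html"/>
  </group>
</gsafeed>
"""


def build_push_request(host: str, datasource: str, feedtype: str,
                       feed_xml: str) -> urllib.request.Request:
    """Wrap a feed in a multipart/form-data POST (built here, not sent).

    Assumed: port 19900, path /xmlfeed, and form fields named
    feedtype, datasource and data.
    """
    boundary = uuid.uuid4().hex
    fields = {"feedtype": feedtype, "datasource": datasource, "data": feed_xml}
    parts = [
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
        f"{value}\r\n"
        for name, value in fields.items()
    ]
    parts.append(f"--{boundary}--\r\n")
    return urllib.request.Request(
        f"http://{host}:19900/xmlfeed",
        data="".join(parts).encode("utf-8"),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


req = build_push_request("gsa.example.com", "sample_metadata",
                         "metadata-and-url", FEED_XML)
# urllib.request.urlopen(req) would send it; not called here.
```

Keeping the request construction separate from sending makes it easy to inspect or log the exact payload before it reaches the appliance.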

Remove the URL from the web server. The next time that the URL is crawled, the system will
encounter a 404 status code and remove the content from the index.

Specify a pattern that removes the URL from the index. For example, add the URL to the Do Not
Follow Patterns list. The URL is removed the next time that the feeder delete process runs.
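
To preview which indexed URLs a Do Not Follow pattern list would catch, the Python sketch below uses a deliberately simplified matcher. It is not the appliance's pattern grammar: only the regexp: and contains: prefixes are modeled, a bare pattern is treated as a plain substring match, and blank lines and lines starting with "#" are skipped. The function names and URLs are hypothetical.

```python
import re


def matches_pattern(url: str, pattern: str) -> bool:
    """Simplified, hypothetical model of URL-pattern matching."""
    pattern = pattern.strip()
    if not pattern or pattern.startswith("#"):
        return False  # this sketch skips blank lines and comment lines
    if pattern.startswith("regexp:"):
        return re.search(pattern[len("regexp:"):], url) is not None
    if pattern.startswith("contains:"):
        return pattern[len("contains:"):] in url
    return pattern in url  # simplification: bare pattern as substring


def urls_to_remove(indexed: list[str], do_not_follow: list[str]) -> list[str]:
    """URLs that the pattern list would drop on the next delete pass."""
    return [u for u in indexed
            if any(matches_pattern(u, p) for p in do_not_follow)]


indexed = ["http://example.com/private/a.html", "http://example.com/tmp/c.pdf"]
print(urls_to_remove(indexed, ["example.com/private/"]))
```

Because the real grammar has more rules, treat this only as a rough preview and confirm actual behavior in the Admin Console.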

Note: If a URL is referenced by more than one feed, you will have to remove it from the feed that owns it. See the Troubleshooting entry "Fed Documents Aren't Updated or Removed as Specified in the Feed XML" on page 35 for more information.
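
One way to send each delete to the feed that owns the URL is to keep your own record of which data source fed each URL and group the deletes by that owner. The Python sketch below assumes such a locally maintained ownership map; it does not read the appliance's own ownership data, and the helper name, data source names, and URLs are hypothetical. Each resulting group could then be sent as a separate incremental delete feed under its data source.

```python
from collections import defaultdict


def group_deletes_by_owner(urls: list[str], owners: dict[str, str]
                           ) -> tuple[dict[str, list[str]], list[str]]:
    """Group URLs by owning data source; also return URLs with no recorded owner."""
    groups: dict[str, list[str]] = defaultdict(list)
    unknown: list[str] = []
    for url in urls:
        owner = owners.get(url)
        if owner is None:
            unknown.append(url)  # owner not recorded locally: resolve by hand
        else:
            groups[owner].append(url)
    return dict(groups), unknown


owners = {"http://e.com/a": "hr_docs", "http://e.com/b": "wiki"}
print(group_deletes_by_owner(["http://e.com/a", "http://e.com/b"], owners))
```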

Time Required to Process a Feed

The following factors can cause the feeder to be slow to add URLs to the index:

The feed is large.

The search appliance is currently using a lot of resources to crawl other documents and serve
results.

Other feeds are pending.

In general, the search appliance can process documents that are pushed as content feeds more quickly
than it can crawl and index the same set of documents as a web feed.