Google Search Appliance Feeds Protocol Developers Guide User Manual
•
displayurl—The URL that should be provided in search results for a document. This attribute is
useful for web-enabled content systems where a user expects to obtain a URL with full navigation
context and other application-specific data, but where a page does not give the search appliance
easy access to the indexable content.
•
action—Set action to add when you want the feed to overwrite and update the contents of a URL.
If you don’t specify an action, the system performs an add. Set action to delete to remove a URL
from the index. The action="delete" feature works for content, web, and metadata-and-URL
feeds.
•
lock—The lock attribute can be set to true or false (the default is false). When the search
appliance reaches its license limit, unlocked documents are deleted to make room for more
documents. Locked documents are deleted only after all other remedies have been tried and the
license is still at its limit. For more information, see “License Limits” on page 32.
•
mimetype (required)—This attribute tells the system what kind of content to expect from the
content element. All MIME types that can be indexed by the search appliance are supported.
Note: The feeds DTD (see “Google Search Appliance Feed DTD” on page 41) marks mimetype as
required, so you must always specify a value. However, the value is used only for content feeds.
For web and metadata-and-URL feeds, the search appliance ignores the specified MIME type
because it determines the MIME type itself when it crawls and indexes a URL.
•
last-modified—Content feeds only. Populate this attribute with the date time format specified in
RFC 822 (for example, Mon, 15 Nov 2004 04:58:08 GMT). If you do not specify a last-modified
date, then the implied value is blank. The system uses the rules specified in the Admin Console
under Index > Document Dates to choose which date from a document to use in the search results.
The document date extraction process runs periodically, so there may be a delay between the time
a document appears in the results and the time that its date appears.
•
authmethod—This attribute tells the system how to crawl URLs that are protected by NTLM, HTTP
Basic, or Single Sign-on. The authmethod attribute can be set to none, httpbasic, ntlm, or
httpsso. If a value for authmethod is not specified and a protected URL is defined on the search
appliance, the default value for authmethod is the previously specified value for that URL. If the URL
has not been previously specified on the search appliance, then the default value for authmethod is
set to none. If you want to enable crawling for protected documents, see “Including Protected
Documents in Search Results” on page 13.
•
pagerank—Content feeds only. This attribute specifies the PageRank of the URL or group of URLs.
The default value is 96. To alter the PageRank of the URL or group of URLs, set the value to an
integer between 68 and 100. Note that this PageRank value does not determine absolute
relevancy, and the scale is not linear. Set PageRank values with caution and test thoroughly.
The PageRank set for a URL overrides the PageRank set for a group.
•
crawl-immediately—For web and metadata-and-url feeds only. If this attribute is set to "true",
then the search appliance crawls the URL immediately. If a large number of URLs with
crawl-immediately="true" are fed, then other URLs to be crawled are deprioritized or halted until
these URLs are crawled. This attribute has no effect on content feeds.
•
crawl-once—For web feeds only. If this attribute is set to “true”, then the search appliance crawls
the URL once, but does not recrawl it after the initial crawl. URLs fed with crawl-once can be
crawled again if a subsequent feed explicitly requests it using crawl-immediately.
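
The attributes described above can also be assembled programmatically. The following is a minimal Python sketch, not part of the search appliance tooling. It assumes the attributes are set on a record element that also carries a url attribute (neither is described in this section), and it treats the PageRank range of 68 to 100 as inclusive. The helper name build_record and the example URL are hypothetical.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
import xml.etree.ElementTree as ET

# Allowed values taken from the attribute descriptions above.
AUTHMETHODS = {"none", "httpbasic", "ntlm", "httpsso"}
ACTIONS = {"add", "delete"}


def _flag(value):
    """Render a Python bool as the lowercase string used in feed attributes."""
    return "true" if value else "false"


def build_record(url, mimetype, *, displayurl=None, action=None, lock=None,
                 last_modified=None, authmethod=None, pagerank=None,
                 crawl_immediately=None, crawl_once=None):
    """Hypothetical helper: return a <record> element with the given attributes.

    Attributes left as None are omitted so that the documented defaults apply.
    """
    if action is not None and action not in ACTIONS:
        raise ValueError(f"action must be one of {sorted(ACTIONS)}")
    if authmethod is not None and authmethod not in AUTHMETHODS:
        raise ValueError(f"authmethod must be one of {sorted(AUTHMETHODS)}")
    # Assumption: the documented range of 68 to 100 is inclusive.
    if pagerank is not None and not (isinstance(pagerank, int)
                                     and 68 <= pagerank <= 100):
        raise ValueError("pagerank must be an integer between 68 and 100")

    # mimetype is always written because the DTD marks it as required.
    attrs = {"url": url, "mimetype": mimetype}
    if displayurl is not None:
        attrs["displayurl"] = displayurl
    if action is not None:
        attrs["action"] = action
    if lock is not None:
        attrs["lock"] = _flag(lock)
    if last_modified is not None:
        if last_modified.tzinfo is None:
            raise ValueError("last_modified must be timezone-aware")
        # RFC 822 style date in GMT, such as "Mon, 15 Nov 2004 04:58:08 GMT".
        attrs["last-modified"] = format_datetime(
            last_modified.astimezone(timezone.utc), usegmt=True)
    if authmethod is not None:
        attrs["authmethod"] = authmethod
    if pagerank is not None:
        attrs["pagerank"] = str(pagerank)
    if crawl_immediately is not None:
        attrs["crawl-immediately"] = _flag(crawl_immediately)
    if crawl_once is not None:
        attrs["crawl-once"] = _flag(crawl_once)
    return ET.Element("record", attrs)


if __name__ == "__main__":
    record = build_record(
        "http://www.example.com/doc1.html",  # placeholder URL
        "text/html",
        lock=True,
        last_modified=datetime(2004, 11, 15, 4, 58, 8, tzinfo=timezone.utc),
    )
    print(ET.tostring(record, encoding="unicode"))
```

The sketch does not check which attributes apply to which feed type. For example, it will accept pagerank together with crawl-once even though the first is for content feeds only and the second is for web feeds only. A caller that knows the feed type would need to add that check.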