Document feeds successfully but then fails – Google Search Appliance Feeds Protocol Developers Guide User Manual

Document Feeds Successfully But Then Fails

A content feed is accepted by the feedergate and reports success, but the search appliance later reports
the following document feed error:

Failed in error
documents included: 0
documents in error: 1
error details: Skipping the record, Line number: nn,
Error: Element record content does not follow the DTD, Misplaced metadata

This error occurs when a meta tag inside the metadata element has a content attribute whose value is
an empty string (content="").

If the content attribute value is an empty string, do one of the following:

Remove the meta tag from the metadata element.

Set the value of the content attribute to a placeholder that shows that no value is assigned. Choose a
value that is not otherwise used in the metadata element, for example, _noname_ (content="_noname_").

With a placeholder value in place, you can use the inmeta search keyword to find fed documents that
carry it, for example:

inmeta:tags~_noname_
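Both remedies can be applied to a feed before it is pushed. The following is a minimal sketch, not part of the appliance tooling, that uses Python's standard xml.etree.ElementTree module. The function name fix_empty_meta, the PLACEHOLDER constant, and the sample record in the usage note are assumptions made for illustration. Serializing with ElementTree drops the XML declaration and DOCTYPE, so they would need to be added back before the feed is sent.

```python
import xml.etree.ElementTree as ET

# Assumption: "_noname_" is not used as a real value anywhere in the metadata.
PLACEHOLDER = "_noname_"


def fix_empty_meta(feed_xml: str, remove: bool = False) -> str:
    """Return feed XML in which no <meta> tag has content="".

    With remove=False, empty content values are replaced by PLACEHOLDER.
    With remove=True, the offending <meta> tags are deleted instead.
    """
    root = ET.fromstring(feed_xml)
    for metadata in root.iter("metadata"):
        # Copy the child list so that removing tags does not disturb iteration.
        for meta in list(metadata.findall("meta")):
            if meta.get("content") == "":
                if remove:
                    metadata.remove(meta)
                else:
                    meta.set("content", PLACEHOLDER)
    # Note: the XML declaration and DOCTYPE are not preserved here.
    return ET.tostring(root, encoding="unicode")
```

For example, passing a record whose metadata contains a meta tag named tags with content="" returns the same record with content="_noname_", which the inmeta query above would then match. Passing remove=True drops that meta tag and leaves the others untouched.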

Fed Documents Aren’t Updated or Removed as Specified in the Feed XML

All feeds, including database feeds, share the same namespace and assume that URLs are unique. If a
fed document doesn’t behave as directed in your feed XML, check that its URL isn’t duplicated in your
other feeds.
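One way to run this check is to scan the feed XML files you push and list every URL that appears in more than one of them. The sketch below is an illustration only. It assumes that each document in a feed is a record element with a url attribute, and the function name find_duplicate_urls and the feed labels are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, List


def find_duplicate_urls(feeds: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each URL found in more than one feed to the feeds that reference it.

    `feeds` maps a label of your choosing (for example, a data source name)
    to the XML text of that feed.
    """
    seen = defaultdict(set)
    for label, xml_text in feeds.items():
        root = ET.fromstring(xml_text)
        # Assumption: every fed document is a <record url="..."> element.
        for record in root.iter("record"):
            url = record.get("url")
            if url:
                seen[url].add(label)
    return {url: sorted(labels) for url, labels in seen.items() if len(labels) > 1}
```

An empty result means that no URL is shared between the feeds you supplied. It does not rule out a clash with URLs that reach the appliance by other routes, such as the Crawl URLs list.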

When the same URL is fed into the system by more than one data source, the system uses the following
rules to determine how that content should be handled:

If the URL is referenced by a web feed and a content feed, the URL’s content is associated with the
data source that crawled the URL last.

If the URL is referenced by more than one content feed, the URL’s content is associated with the
data source that was responsible for the URL’s last update.

If the URL is referenced in the Admin Console’s list of Crawl URLs and a content feed, the URL’s
content is associated with the content feed. The search appliance will not recrawl the URL until the
content feed requests a change. To return the URL to its original status, delete the URL from the
feed that originally pushed the document to the index.

If the URL has already been crawled by the search appliance, and is then referenced in a web feed,
the search appliance immediately injects the URL into the queue to be recrawled as if it were a new,
uncrawled URL. The URL’s Enterprise PageRank is not affected. However, the change interval is
reset to the default until the crawl scheduler process next runs.
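The first three rules above can be summarized as a small decision function. This is a simplified model for reasoning about which data source ends up owning a URL, not a description of the appliance's internals. The Claim type, the kind labels, and the last_touch timestamps are hypothetical, and the recrawl behavior in the fourth rule is not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Claim:
    source: str        # hypothetical data source name
    kind: str          # "web_feed", "content_feed", or "crawl_urls"
    last_touch: float  # hypothetical time the source last crawled or updated the URL


def resolve_owner(claims: List[Claim]) -> Optional[str]:
    """Return the data source that a URL's content is associated with.

    Covers only combinations that include a content feed, as in the rules
    above; returns None for combinations the rules do not describe.
    """
    if not any(c.kind == "content_feed" for c in claims):
        return None
    # A content feed takes precedence over the Crawl URLs list, so only
    # feeds compete; among feeds, the most recent crawl or update wins.
    feeds = [c for c in claims if c.kind in ("web_feed", "content_feed")]
    return max(feeds, key=lambda c: c.last_touch).source
```

Under this model, a URL listed in Crawl URLs and pushed by a content feed always resolves to the content feed, regardless of the timestamps, while a web feed and a content feed compete on recency.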