Google Search Appliance: Feeds Protocol Developer’s Guide
Fed Documents Aren’t Appearing in Search Results
Some common reasons why the URLs in your feed might not be found in your search results include:
1. The crawler is still running. Wait a few hours and search again. For large document feeds containing
multiple non-text documents, the search appliance can take several minutes to process all of the
documents. You can check the status of a document feed by going to the Content Sources > Feeds
page. You can also verify that the documents have been indexed by going to Index > Diagnostics >
Index Diagnostics and browsing to the URL, or by entering the URL in “URLs starting with.”
Documents that are fed into the search appliance can show up in Index Diagnostics up to 15
minutes before they are searchable in the index.
2. The URLs were removed by an exclusion pattern specified under Content Sources > Web Crawl >
Start and Block URLs. See “URL Patterns” on page 30.
3. The URLs were removed by a full feed that did not include them. See “Removing Feed Content From
the Index.”
4. The URLs don’t match the pattern for the collection that you were searching. Check the patterns for
your collection under Index > Collections. Make sure that the collection specified in the upper
right-hand corner of the Index Diagnostics page contains the URL that you are looking for.
5. The URLs are listed in multiple feeds. Another feed that contains this URL requested a delete
action.
6. A metadata-and-URL feed was submitted with the feedtype element set to incremental or full.
These values can only be used on a content feed. In this case, the feed is treated as a content
feed and its URLs are not crawled. Once a URL is part of a content feed, it is not recrawled even if
you later send a web or metadata feed. If you run into this issue, remove the URL from the URL
patterns (or click the Delete link on the Feeds page). After the feed URLs have been deleted, put the
URL patterns back and send a proper metadata-and-URL feed.
7. The documents were removed because your index is full. See “License Limits” on page 32.
8. The feed that you pushed was not pointing to a valid host. Verify that the feed has an FQDN (fully
qualified domain name) in the host part of the URL.
9. More relevant documents are pushing the fed URL down in the results list. You can search for a
specific URL with the query info:[url], where [url] is the full URL to a document fed into the search
appliance. Alternatively, use inurl:[path], where [path] is part of the URL to documents fed into the
search appliance.
10. The fed document has failed. In this scenario, none of the external metadata fed by using a content
feed or metadata-and-URL feed would get indexed. In the case of metadata-and-URL feeds, just the
URL gets indexed without any other information. For additional details about the failure, click Index
> Diagnostics > Index Diagnostics.
11. The URLs are on a protected server and cannot be indexed. See “Including Protected Documents in
Search Results” on page 13.
12. The URLs are on a protected server and have been indexed, but you do not have the authorization
to view them. Make sure that &access=a is somewhere in the query URL that you are sending to the
search appliance. See “Including Protected Documents in Search Results” on page 13.
13. You did not complete the upgrade from a previous version and are still running in “Test mode” with
the old index. Review the Update Instructions for the current version of the software, and make sure
that you have accepted the upgrade and completed the update process.
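
Several of the causes above can be checked with a short script before or after pushing a feed. The following Python sketch is illustrative only and is not part of the search appliance software. It flags records whose host does not look like an FQDN (cause 8), flags records that carry no content in a feed whose feedtype is not metadata-and-url (cause 6), and builds an info: query URL, optionally with access=a (causes 9 and 12). The function names, the sample feed, the host names, and the /search path are assumptions made for this example. The check also assumes the usual gsafeed layout (a header with a feedtype element, and record elements with a url attribute), and it treats "contains a dot" as a rough stand-in for "fully qualified."

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode, urlsplit

# Hypothetical feed used only to exercise the checks below.
SAMPLE_FEED = """
<gsafeed>
  <header>
    <datasource>example_source</datasource>
    <feedtype>incremental</feedtype>
  </header>
  <group>
    <record url="http://intranet/doc1.html" mimetype="text/html"/>
    <record url="http://intranet.example.com/doc2.html" mimetype="text/html"
            action="delete"/>
  </group>
</gsafeed>
"""


def check_feed(feed_xml: str) -> list[str]:
    """Return warnings for causes 6 and 8 in the list above."""
    warnings = []
    root = ET.fromstring(feed_xml.strip())
    feedtype = (root.findtext("header/feedtype") or "").strip()
    for record in root.iter("record"):
        url = record.get("url", "")
        host = urlsplit(url).hostname or ""
        # Cause 8: heuristic (assumption), an FQDN contains at least one dot.
        if "." not in host:
            warnings.append(f"{url}: host {host!r} does not look like an FQDN")
        # Cause 6: a non-delete record without a <content> child has no
        # document body, so the feed is probably meant to be metadata-and-url.
        if (record.get("action") != "delete"
                and record.find("content") is None
                and feedtype != "metadata-and-url"):
            warnings.append(f"{url}: no content, but feedtype is {feedtype!r}")
    return warnings


def diagnostic_search_url(appliance: str, doc_url: str, secure: bool = False) -> str:
    """Build a search URL that looks up one fed document with info:[url]."""
    params = {"q": f"info:{doc_url}"}
    if secure:
        # Cause 12: request secure results as well as public ones.
        params["access"] = "a"
    # The /search path and the http scheme are assumptions for this sketch.
    return f"http://{appliance}/search?{urlencode(params)}"


if __name__ == "__main__":
    for warning in check_feed(SAMPLE_FEED):
        print(warning)
    print(diagnostic_search_url("gsa.example.com",
                                "http://intranet.example.com/doc2.html",
                                secure=True))
```

Running the script against the sample feed reports two warnings for the first record (an unqualified host, and a missing content element in an incremental feed) and none for the delete record. A real search request normally carries further parameters, such as the collection and front end, which are omitted here.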