Getting crawled document status – Google Search Appliance Administrative API Developers Guide: Protocol User Manual
Getting Crawled Document Status
Get the status for documents that have been crawled for a collection.
To retrieve detailed information for a document, send an authenticated GET request to a document
entry of the diagnostics feed. The entry is addressed by the percent-encoded URL of the document,
for example:
http://Search_Appliance:8000/feeds/diagnostics/http%3A%2F%2Fserver.com%2Fsecured%2Ftest1%2Fdoc_0_2.html
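As a rough illustration of how such a request URL might be assembled, the following Python sketch percent-encodes the document URL and appends it to the diagnostics feed path. The function name, the way collectionName is passed (as a query-string parameter), and the GoogleLogin-style Authorization header are assumptions made for this example and are not stated in this section. The request is only built, not sent.

```python
from typing import Optional
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_document_status_request(
    appliance_host: str,
    document_url: str,
    auth_token: str,
    collection_name: Optional[str] = None,
) -> Request:
    """Build (but do not send) a GET request for one document's diagnostics entry."""
    # Percent-encode every reserved character, including ":" and "/", so the
    # whole document URL travels as a single path segment.
    entry_id = quote(document_url, safe="")
    url = f"http://{appliance_host}:8000/feeds/diagnostics/{entry_id}"
    if collection_name is not None:
        # Assumption: collectionName is supplied as a query-string parameter.
        url += "?" + urlencode({"collectionName": collection_name})
    # Assumption: the token was obtained in an earlier login step and is sent
    # in a GoogleLogin-style Authorization header.
    headers = {"Authorization": f"GoogleLogin auth={auth_token}"}
    return Request(url, headers=headers, method="GET")


request = build_document_status_request(
    "Search_Appliance",
    "http://server.com/secured/test1/doc_0_2.html",
    auth_token="YOUR_TOKEN",  # placeholder value
)
print(request.full_url)
```

Passing safe="" to quote matters here: by default quote leaves "/" unencoded, which would split the document URL into several path segments instead of one entry identifier.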
The request accepts the following parameter.

Parameter          Description
collectionName     Name of the collection for which you want to list the document status.
                   The default value is the last used collection.

A detailed document status entry is returned with the following properties.

Property           Description
<Entry Name>       The URL of a document.
backwardLinks      The number of backward links for the document.
collectionList     The list of collections that contain the document.
contentSize        The size of the document content.
contentType        The type of the document.
crawlFrequency     The frequency at which the document is scheduled to be crawled.
                   Possible values are seldom, normal, and frequent.
crawlHistory       A multi-line history of the document crawl. Each line gives the
                   timestamp when the document was crawled, the document status code,
                   and the status description, in the following format:
                       timestamp status_code status_description
                       timestamp status_code status_description
                   For status code values, see “Document Status Values” on page 34.
currentlyInflight  Whether the document is currently being processed.
date               The date that the document was indexed.
forwardLinks       The number of forward links for the document.
isCached           Whether a cached page for the document is indexed.
lastModifiedDate   The last modified date of the document.
latestOnDisk       The timestamp of the version being served.
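The crawlHistory value described above could be split into records along these lines. This is a minimal sketch that assumes each history line holds the three fields separated by whitespace and that the timestamp itself contains no spaces; the exact separator is not specified here. The CrawlRecord type, the parse_crawl_history function, and the sample values are hypothetical.

```python
from typing import List, NamedTuple


class CrawlRecord(NamedTuple):
    timestamp: str
    status_code: str
    status_description: str


def parse_crawl_history(raw: str) -> List[CrawlRecord]:
    """Turn a multi-line crawlHistory string into a list of records."""
    records = []
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines between records
        # Split on the first two runs of whitespace only, so a description
        # that contains spaces stays in one piece.
        parts = line.split(None, 2)
        # Pad short lines with empty strings instead of dropping them.
        parts += [""] * (3 - len(parts))
        records.append(CrawlRecord(*parts))
    return records


# Made-up sample data, for illustration only.
sample = "1700000000 1 first example description\n1700003600 2 second example\n"
for record in parse_crawl_history(sample):
    print(record.timestamp, record.status_code, record.status_description)
```

Fields are kept as strings because the timestamp format is not specified in this section; convert them once the actual format returned by the appliance is known.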