Getting crawled document status – Google Search Appliance Administrative API Developers Guide: Protocol User Manual

Getting Crawled Document Status

Get the status for documents that have been crawled for a collection.

To retrieve detailed information for a document, send an authenticated GET request to a document
entry of the diagnostics feed.

http://Search_Appliance:8000/feeds/diagnostics/http%3A%2F%2Fserver.com%2Fsecured%2Ftest1%2Fdoc_0_2.html
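The entry name in the request is the document URL, percent-encoded so that it fits into a single path segment. A minimal sketch of building that request URL follows. The function name build_document_status_url is hypothetical, and passing collectionName as a query-string parameter is an assumption rather than something this page states. The sketch only builds the URL; the GET request itself must still be authenticated.

```python
from urllib.parse import quote, urlencode


def build_document_status_url(appliance, document_url, collection_name=None):
    """Return the diagnostics-feed entry URL for one crawled document."""
    # Percent-encode every reserved character, including ':' and '/',
    # so the whole document URL becomes a single path segment.
    entry_name = quote(document_url, safe="")
    url = f"http://{appliance}:8000/feeds/diagnostics/{entry_name}"
    if collection_name is not None:
        # Assumption: collectionName is sent as a query-string parameter.
        url += "?" + urlencode({"collectionName": collection_name})
    return url


status_url = build_document_status_url(
    "Search_Appliance",
    "http://server.com/secured/test1/doc_0_2.html",
)
print(status_url)
```

With the host and document URL from the example in this section, the function reproduces the encoded request URL shown there.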

Parameter           Description

collectionName      Name of the collection for which you want to list the
                    document status. The default value is the last used
                    collection.

A detailed document status entry is returned with the following properties.

Property            Description

<Entry Name>        The URL of a document.
backwardLinks       The number of backward links for the document.
collectionList      The list of collections that contain the document.
contentSize         The size of the document content.
contentType         The type of the document.
crawlFrequency      The frequency at which the document is scheduled to be
                    crawled, with possible values of seldom, normal, and
                    frequent.
crawlHistory        A multi-line history of the document crawl, including the
                    timestamp when the document was crawled, the document
                    status code, and the status description, in the following
                    format:

                    timestamp status_code status_description
                    timestamp status_code status_description

                    For status code values, see “Document Status Values” on
                    page 34.
currentlyInflight   Whether the document is currently being processed.
date                The date that the document was indexed.
forwardLinks        The number of forward links for the document.
isCached            Whether a cached page for the document is indexed.
lastModifiedDate    The last modified date of the document.
latestOnDisk        The timestamp of the version being served.
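The crawlHistory value packs one crawl event per line, so a client usually splits it into records before use. The following is a minimal sketch, assuming the three fields on each line are separated by whitespace (tabs or spaces); the exact delimiter is not stated here. The names CrawlEvent and parse_crawl_history are hypothetical, and the sample timestamps, codes, and descriptions are invented for illustration only.

```python
from typing import List, NamedTuple


class CrawlEvent(NamedTuple):
    timestamp: str
    status_code: str
    status_description: str


def parse_crawl_history(crawl_history: str) -> List[CrawlEvent]:
    """Split a crawlHistory value into one CrawlEvent per line."""
    events = []
    for line in crawl_history.splitlines():
        # Split on the first two whitespace runs only, because the
        # status description itself may contain spaces.
        fields = line.split(None, 2)
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        description = fields[2].strip() if len(fields) == 3 else ""
        events.append(CrawlEvent(fields[0], fields[1], description))
    return events


# Hypothetical sample value; real timestamps and status codes will differ.
sample = (
    "1258712345\t1\tCrawled: New Document\n"
    "1258798745\t2\tCrawled: Cached Version\n"
)
history = parse_crawl_history(sample)
for event in history:
    print(event.timestamp, event.status_code, event.status_description)
```

All fields are kept as strings so that the sketch makes no claim about the timestamp or status code formats; the meaning of each code comes from the “Document Status Values” list.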