Listing crawled documents – Google Search Appliance Administrative API Developers Guide: Protocol User Manual


Listing Crawled Documents

To list documents, send an authenticated GET request to the root entry of the diagnostics feed. For example:

http://Search_Appliance:8000/feeds/diagnostics?uriAt=http%3A%2F%2Fserver.com%2Fsecured%2Ftest1

The request accepts the query parameters listed in the Parameter table in this section. It returns a description entry, a set of document status entries, and a set of directory status entries. The description entry properties are listed in the Property table in this section.


Query parameters:

Parameter       Description

collectionName  Name of the collection that you want to list. The default value is
                the last used collection.

flatList        false: List the files and directories that directly belong to the
                indicated URI.
                true: List all files starting with the indicated URI as a flat list.
                The default value is false.

negativeState   false: Return only documents whose status is equal to view.
                true: Return only documents whose status is not equal to view.
                The default value is false.

pageNum         The page that you want to view. The files under a URI may be split
                across several pages. Page numbers start from 1. The default value
                is 1, the first page.

sort            The sort key.
                host: sort by host name.
                file: sort by file name.
                crawled: sort by number of crawled documents.
                errors: sort by number of errors.
                excluded: sort by number of excluded documents.
                The default value is "".

uriAt           The prefix of the URI of the documents that you want to list. If not
                blank, it must contain at least http://hostname.domain.com/. The
                default value is "".

view            A filter on the document status. The values of view are described in
                the section “Document Status Values” on page 34. The default value
                is all.
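The query parameters above can be combined into a request URL as in the following minimal Python sketch. The function name build_list_url and the host name passed to it are hypothetical; the sketch only builds the URL and does not handle authentication or send the request. It assumes that boolean parameters are written as lowercase true and false, as they appear in the parameter table.

```python
from urllib.parse import urlencode


def build_list_url(appliance, uri_at="", **params):
    """Build a diagnostics-feed URL for listing crawled documents.

    `appliance` is the search appliance host name (placeholder value here).
    `params` may hold any of the documented query parameters, such as
    collectionName, flatList, negativeState, pageNum, sort, or view.
    """
    query = {"uriAt": uri_at} if uri_at else {}
    for name, value in params.items():
        # Assumption: booleans are sent as lowercase true/false.
        if isinstance(value, bool):
            value = "true" if value else "false"
        query[name] = value
    base = f"http://{appliance}:8000/feeds/diagnostics"
    # urlencode percent-encodes ':' and '/' inside the uriAt value.
    return f"{base}?{urlencode(query)}" if query else base


if __name__ == "__main__":
    print(build_list_url("Search_Appliance",
                         "http://server.com/secured/test1",
                         flatList=True, pageNum=2))
```

Called with only the appliance name and the uriAt prefix, the function reproduces the example URL shown at the start of this section.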

Description entry properties:

Property        Description

<Entry Name>    description

numPages        The total number of pages to return.

uriAt           The prefix of the URL taken from the query parameters.
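Because numPages gives the total number of pages and pageNum starts from 1, a client can walk through every page of a listing. The following Python sketch generates one URL per page; the function name page_urls is hypothetical, and num_pages is assumed to have been read from the numPages property of an earlier response (reading the response is not shown).

```python
from urllib.parse import urlencode


def page_urls(appliance, uri_at, num_pages):
    """Yield one diagnostics-feed URL per result page, numbered from 1."""
    base = f"http://{appliance}:8000/feeds/diagnostics"
    for page in range(1, num_pages + 1):
        # pageNum is 1-based, so the last page number equals num_pages.
        yield f"{base}?{urlencode({'uriAt': uri_at, 'pageNum': page})}"


if __name__ == "__main__":
    for url in page_urls("Search_Appliance", "http://server.com/secured/", 3):
        print(url)
```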
