
Google Search Appliance Connectors Administration Guide




7 Troubleshoot Connectors


Connectors 4.0 provide several options for troubleshooting issues, including:

• Connector Dashboard, for checking the status of feeds and document retrieval
• Logs on the connector machine, for checking messages about thread processing
• Search appliance index diagnostics, for checking crawl status
• Search appliance real-time diagnostics, for checking the HTTP headers for a specific URL at any time, without having to wait for the crawler to ingest it
• Web browser, for debugging a connector by communicating directly with the connector host


Additionally, you can troubleshoot issues by examining URL-and-metadata feed files.
Because these feed files are relatively small, examining them takes little effort.
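A feed file's records can also be inspected with a short script. The following Python sketch lists the URL, action, and metadata of each record in a feed file. It assumes the standard GSA feed XML layout (a gsafeed root with a header/feedtype element and record elements carrying meta children); the sample feed, its datasource name, the host connector-host, port 5678, and the /doc/ paths are hypothetical placeholders, not values taken from a real connector.

```python
import xml.etree.ElementTree as ET

def summarize_feed(feed_xml):
    """Return (feedtype, records) for the XML text of a GSA feed file."""
    root = ET.fromstring(feed_xml)
    feedtype = root.findtext("header/feedtype")
    records = []
    for record in root.iter("record"):
        # Collect the <meta name="..." content="..."/> pairs under this record.
        metadata = {meta.get("name"): meta.get("content")
                    for meta in record.iter("meta")}
        records.append({
            "url": record.get("url"),
            # Assumption for this sketch: a missing action attribute means "add".
            "action": record.get("action", "add"),
            "metadata": metadata,
        })
    return feedtype, records

# Hypothetical feed: datasource, host, port and paths are placeholders.
SAMPLE_FEED = """<gsafeed>
  <header>
    <datasource>example-connector</datasource>
    <feedtype>metadata-and-url</feedtype>
  </header>
  <group>
    <record url="http://connector-host:5678/doc/report.pdf" mimetype="application/pdf" action="add">
      <metadata>
        <meta name="author" content="jdoe"/>
      </metadata>
    </record>
    <record url="http://connector-host:5678/doc/old.pdf" mimetype="application/pdf" action="delete"/>
  </group>
</gsafeed>"""

feedtype, records = summarize_feed(SAMPLE_FEED)
print("feed type:", feedtype)
for rec in records:
    print(rec["action"], rec["url"], rec["metadata"])
```

Listing each record's action next to its URL makes it quick to spot a document that was never fed, or that was fed with a delete action.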

Debug a connector by using a web browser

By default, a connector denies all document requests except those from the search appliance.
To debug and test a connector with a browser and no search appliance, add the hostname of
the computer running the browser to the server.fullAccessHosts configuration option. This
gives that computer full access to all connector content.
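As an illustration, the following Python sketch adds a host to server.fullAccessHosts in a connector's configuration text. It assumes that the option is stored in the connector's adaptor-config.properties file as a simple key=value line holding a comma-separated list of hostnames; verify the exact format for your connector. The hostnames used here are hypothetical.

```python
def add_full_access_host(properties_text, host):
    """Return properties_text with host added to server.fullAccessHosts.

    Assumes the option appears at most once, as a plain "key=value" line
    whose value is a comma-separated list of hostnames.
    """
    key = "server.fullAccessHosts"
    lines = properties_text.splitlines()
    for i, line in enumerate(lines):
        name, sep, value = line.partition("=")
        if sep and name.strip() == key:
            # Existing option: append the host unless it is already listed.
            hosts = [h.strip() for h in value.split(",") if h.strip()]
            if host not in hosts:
                hosts.append(host)
            lines[i] = key + "=" + ",".join(hosts)
            break
    else:
        # Option not present yet: add it as a new line.
        lines.append(key + "=" + host)
    return "\n".join(lines) + "\n"

# Hypothetical configuration and debugging host.
config = "gsa.hostname=gsa.example.com\n"
print(add_full_access_host(config, "dev-laptop.example.com"), end="")
# prints:
# gsa.hostname=gsa.example.com
# server.fullAccessHosts=dev-laptop.example.com
```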

This setting also lets that computer see metadata and other GSA-specific information as
HTTP headers. Combined with Firebug or your browser's Web Inspector, these headers make it
easy to observe a connector's behavior.
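To see the same headers without a browser tool, a small script can request a document from the connector and keep only the GSA-specific response headers. This Python sketch uses only the standard library. It assumes that the GSA-specific headers have names beginning with X-Gsa (for example X-Gsa-External-Metadata); that prefix, and any document URL you pass in, are assumptions for illustration. Run it from a host listed in server.fullAccessHosts, or the connector will deny the request.

```python
import urllib.request

def gsa_headers(headers):
    """Keep (name, value) pairs whose name starts with "X-Gsa", ignoring case.

    A list is returned instead of a dict so that repeated headers survive.
    """
    return [(name, value) for name, value in headers
            if name.lower().startswith("x-gsa")]

def fetch_gsa_headers(url, timeout=10):
    """GET url from the connector and return its X-Gsa-* response headers."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return gsa_headers(response.getheaders())
```

For example, fetch_gsa_headers("http://connector-host:5678/doc/report.pdf") would return the matching headers for that hypothetical document. If the connector is served over HTTPS with a self-signed certificate, urlopen fails certificate verification unless you pass a suitable ssl context.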

Troubleshooting scenario

In this scenario, users cannot find a specific document in search results, even though the
document is assumed to be in the search appliance index. To troubleshoot this issue, the
administrator can follow the path the document takes through the system into the search
appliance index.

The administrator might perform one or more of the following steps:

1. Make sure that the search appliance is set to follow and crawl the connector's URLs by
   checking the Content Sources > Web Crawl > Start and Block URLs page in the Admin Console.
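When many URLs are involved, the check in step 1 can be approximated in a script. This Python sketch is much simpler than the search appliance's real URL pattern language: it treats plain patterns as URL prefixes and regexp: patterns as Python regular expressions, and it lets any matching block pattern override the follow patterns. The patterns and URLs are hypothetical, and the Admin Console page remains the authoritative answer.

```python
import re

def matches_pattern(url, pattern):
    """Simplified pattern test: "regexp:" patterns use Python's re module,
    anything else is treated as a plain URL prefix."""
    if pattern.startswith("regexp:"):
        return re.search(pattern[len("regexp:"):], url) is not None
    return url.startswith(pattern)

def is_crawlable(url, follow_patterns, block_patterns):
    """Return True if url matches a follow pattern and no block pattern."""
    # In this sketch a matching block pattern always wins.
    if any(matches_pattern(url, p) for p in block_patterns):
        return False
    return any(matches_pattern(url, p) for p in follow_patterns)

# Hypothetical patterns and URLs, for illustration only.
follow = ["http://connector-host:5678/"]
block = [r"regexp:\.tmp$"]
for url in ["http://connector-host:5678/doc/report.pdf",
            "http://connector-host:5678/doc/draft.tmp",
            "http://other-host/doc/report.pdf"]:
    print(is_crawlable(url, follow, block), url)
```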