Overview of the GSA Connector for File Systems – Google Search Appliance Connectors Deploying the Connector for File Systems User Manual

Overview of the GSA Connector for File Systems
The Connector for File Systems enables the Google Search Appliance to crawl and index
content from Windows shares. Each connector instance supports a single Windows share,
specified as either a UNC path or a mapped drive. The connector fully supports DFS
links.
The Connector for File Systems submits URLs identifying files in the file system repository
to the GSA. These URLs point back to the connector, which services HTTP GET requests
from the GSA crawler.
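
As an illustration of how a URL can identify a file and still point back to the connector, the following minimal sketch percent-encodes a DocId onto a base URL and recovers it again. The base URL, the function names, and the encoding scheme are all assumptions made for this example; the manual does not specify the connector's actual URL format.

```python
from urllib.parse import quote, unquote

# Hypothetical connector address; the real host, port, and path prefix
# depend on the deployment and are not taken from the manual.
BASE_URL = "http://connector.example.com:5678/doc/"


def doc_id_to_url(doc_id: str) -> str:
    """Percent-encode a DocId and append it to the connector's base URL."""
    return BASE_URL + quote(doc_id, safe="/")


def url_to_doc_id(url: str) -> str:
    """Recover the DocId from a URL previously built by doc_id_to_url."""
    if not url.startswith(BASE_URL):
        raise ValueError("URL does not point at this connector")
    return unquote(url[len(BASE_URL):])
```

When a GET request arrives, a connector built this way would call `url_to_doc_id` on the requested URL to decide which file or directory to serve.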
The Connector for File Systems uses a graph traversal strategy. It submits a single URL
representing the root of the file system to the GSA in a metadata-and-URL feed, then
returns URLs for all descendants of the root in response to crawl requests from the GSA.
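
The traversal idea can be sketched with a small in-memory model: only the root is handed to the crawler, and every other item is discovered by following what the connector returns. The `SHARE` dictionary, the function names, and the breadth-first visiting order are assumptions for illustration only; the search appliance decides its own crawl order.

```python
from collections import deque

# Hypothetical stand-in for a file share: directories map to a list of
# child names, files map to None.
SHARE = {
    "/": ["docs/", "readme.txt"],
    "/docs/": ["a.txt", "b.txt"],
    "/readme.txt": None,
    "/docs/a.txt": None,
    "/docs/b.txt": None,
}


def connector_get(doc_id):
    """Stand-in for the connector: child DocIds for a directory, [] for a file."""
    children = SHARE[doc_id]
    if children is None:
        return []
    return [doc_id + name for name in children]


def crawl(root):
    """Stand-in for the crawler: start at the fed root and follow returned links."""
    seen = {root}
    queue = deque([root])
    order = []
    while queue:
        doc_id = queue.popleft()
        order.append(doc_id)
        for child in connector_get(doc_id):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return order
```

Calling `crawl("/")` visits all five entries even though only `"/"` was supplied up front, which is the property the feed-then-crawl design relies on.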
The following steps outline how the search appliance gets content from the repository
through the Connector for File Systems.
1. The Connector for File Systems generates a DocId identifying the root of the file
system to traverse.
2. The connector constructs a URL from the DocId and pushes the URL, together with the
Access Control List (ACL) of the file share, to the search appliance in a
metadata-and-URL feed. Note that this feed does not include the document contents.
3. The search appliance gets the URL to crawl from the feed.
4. The search appliance crawls the repository according to its own crawl schedule, as
specified in the GSA Admin Console. It crawls the content by sending GET requests
for content to the connector. If the content is in HTML format, the search appliance
follows links within the page.
5. The connector receives a crawl request from the GSA. If the requested DocId is a
regular file, the connector returns that file's contents to the GSA. It also includes the
file's ACL and some basic metadata in the response. If the requested DocId is for a
directory, the connector generates DocIds for each file and folder contained within
that directory. The connector then constructs an HTML document consisting of links
to URLs constructed from those DocIds. The connector returns the generated HTML
as the content and the directory's ACL as metadata.
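
The file-versus-directory branching in step 5 can be sketched as follows. This is a simplified illustration, not the connector's implementation: the function name, the `base_url` parameter, the content types, and the HTML layout are assumptions, and ACLs and metadata are omitted entirely.

```python
import html
import os
from urllib.parse import quote


def handle_crawl_request(path: str, base_url: str):
    """Return (content_type, body_bytes) for a requested file or directory."""
    if os.path.isfile(path):
        # Regular file: return its contents unchanged.
        with open(path, "rb") as f:
            return "application/octet-stream", f.read()
    if os.path.isdir(path):
        # Directory: build an HTML page with one link per child so the
        # crawler can discover files and subfolders by following links.
        links = []
        for name in sorted(os.listdir(path)):
            child = os.path.join(path, name)
            href = base_url + quote(child.replace(os.sep, "/"), safe="/")
            links.append(
                f'<li><a href="{html.escape(href)}">{html.escape(name)}</a></li>'
            )
        body = "<html><body><ul>" + "".join(links) + "</ul></body></html>"
        return "text/html", body.encode("utf-8")
    raise FileNotFoundError(path)
```

Because a directory is answered with ordinary HTML links, the crawler needs no special knowledge of the file system; it follows the links as it would on any web page.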