Google Search Appliance: Administrative API Developer’s Guide: .NET

“Document Status” on page 21

Crawl URLs

Retrieve and update crawl URL patterns on a search appliance using the crawlURLs entry of the config
feed.

Retrieving Crawl URLs

Retrieve information about the URL patterns that the search appliance is crawling as follows:

// Request the crawlURLs entry of the config feed
// (myService is the authenticated service object used throughout this guide)
GsaEntry myEntry = myService.GetEntry("config", "crawlURLs");

// Print the current URL patterns
Console.WriteLine("Start URLs: " + myEntry.GetGsaContent("startURLs"));
Console.WriteLine("Follow URLs: " + myEntry.GetGsaContent("followURLs"));
Console.WriteLine("Do Not Crawl URLs: " + myEntry.GetGsaContent("doNotCrawlURLs"));
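Each of these properties can hold several patterns separated by newlines (see the property table at the end of this section), so printing a raw value shows the patterns run together across lines. The following is a minimal, self-contained C# sketch, assuming GetGsaContent returns that newline-delimited string, that splits such a value into individual patterns. The class name CrawlUrlPatterns and the sample value are hypothetical; in real code the string would come from a call such as myEntry.GetGsaContent("followURLs").

```csharp
using System;
using System.Linq;

public static class CrawlUrlPatterns
{
    // Splits a newline-delimited pattern list into individual, non-empty
    // patterns. Accepts both "\r\n" and "\n" line endings and trims
    // surrounding whitespace from each pattern.
    public static string[] Split(string value)
    {
        if (string.IsNullOrEmpty(value))
        {
            return Array.Empty<string>();
        }
        return value
            .Split(new[] { "\r\n", "\n" }, StringSplitOptions.None)
            .Select(p => p.Trim())
            .Where(p => p.Length > 0)
            .ToArray();
    }

    public static void Main()
    {
        // Hypothetical value standing in for myEntry.GetGsaContent("followURLs").
        string followUrls = "http://www.example.com/\nhttp://intranet.example.com/\n";

        // Print one pattern per line, skipping the trailing empty line.
        foreach (string pattern in Split(followUrls))
        {
            Console.WriteLine("Follow URL pattern: " + pattern);
        }
    }
}
```

Dropping blank lines is a choice made in this sketch so that a trailing newline does not produce an empty pattern.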

Updating Crawl URLs

Update the crawl URL settings on a search appliance as follows. In this example, the search appliance is told to start crawling at http://www.example.com/ and to follow URLs under it, and not to crawl spreadsheets (URLs ending in .xls).

// Create an entry to hold the properties to update
GsaEntry updateEntry = new GsaEntry();

// Start crawling at example.com and follow only URLs under it
updateEntry.AddGsaContent("startURLs", "http://www.example.com/");
updateEntry.AddGsaContent("followURLs", "http://www.example.com/");

// Do not crawl URLs that end in .xls (spreadsheets)
updateEntry.AddGsaContent("doNotCrawlURLs", ".xls$");

// Send the update request for the crawlURLs entry of the config feed
myService.UpdateEntry("config", "crawlURLs", updateEntry);

Property          Description

doNotCrawlURLs    Do not crawl URLs that match the following patterns. Separate
                  multiple URL patterns with newline delimiters.

followURLs        Follow and crawl only URLs that match the following patterns.
                  Separate multiple URL patterns with newline delimiters.

startURLs         Start crawling from the following URLs. Separate multiple URLs
                  with newline delimiters.
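To set more than one pattern in a single property, the patterns have to be combined into one newline-delimited string before calling AddGsaContent. The following is a minimal, self-contained C# sketch of that step, assuming a line feed ("\n") is accepted as the delimiter. The class name CrawlUrlValues and the extra ".pdf$" pattern are hypothetical, and the AddGsaContent call is left as a comment because it needs the client library used elsewhere in this guide.

```csharp
using System;
using System.Collections.Generic;

public static class CrawlUrlValues
{
    // Joins URL patterns into one newline-delimited string, as the
    // startURLs, followURLs, and doNotCrawlURLs properties describe.
    // Blank and repeated patterns are skipped; first-seen order is kept.
    public static string Join(IEnumerable<string> patterns)
    {
        var seen = new HashSet<string>();
        var kept = new List<string>();
        foreach (string raw in patterns)
        {
            if (raw == null)
            {
                continue;
            }
            string p = raw.Trim();
            if (p.Length > 0 && seen.Add(p))
            {
                kept.Add(p);
            }
        }
        return string.Join("\n", kept);
    }

    public static void Main()
    {
        // Hypothetical patterns: skip spreadsheets and PDF files.
        string doNotCrawl = Join(new[] { ".xls$", ".pdf$" });
        Console.WriteLine(doNotCrawl);

        // With the .NET client library from this guide, the combined value
        // would then be added to the update entry (not compiled here):
        // updateEntry.AddGsaContent("doNotCrawlURLs", doNotCrawl);
    }
}
```

Trimming and removing duplicates are conveniences of this sketch, not requirements stated by the appliance documentation.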