Google Search Appliance: Feeds Protocol Developer’s Guide
If the metadata is part of a feed, it must have the following format:
...
Note: The content= attribute cannot be an empty string (""). For more information, see “Document
Feeds Successfully But Then Fails” on page 35.
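As a minimal sketch of the rule in the note above, the following Python helper builds meta elements and
rejects an empty content value before a feed is sent. The helper name build_metadata, the sample names
and values, and the enclosing metadata element are assumptions made for illustration; check the element
layout against the feed DTD for your appliance version.

```python
import xml.etree.ElementTree as ET


def build_metadata(pairs):
    """Return a <metadata> element holding one <meta> per (name, content) pair.

    The <metadata> wrapper is an assumption of this sketch; only the meta
    element with name and content attributes is taken from the text above.
    """
    metadata = ET.Element("metadata")
    for name, content in pairs:
        if content == "":
            # The guide states that content= cannot be an empty string.
            raise ValueError(f"empty content for meta name {name!r}")
        ET.SubElement(metadata, "meta", {"name": name, "content": content})
    return metadata


# Hypothetical sample values.
element = build_metadata([("department", "engineering"), ("owner", "a&b")])
xml_text = ET.tostring(element, encoding="unicode")
print(xml_text)
```

ElementTree escapes special characters in attribute values on output, so the & in the second sample
value is written as &amp;amp; without any manual escaping.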
In version 6.2 and later, content feeds support updating both content and metadata. A content feed
can update a document by sending only new metadata.
Generally, robots META tags with a value of noindex, nofollow, or noarchive can be embedded in the
head of an HTML document to prevent the search appliance from indexing the document or following the
links in it. However, robots META tags in a feed file are not honored; only the META tags in the
HTML documents themselves are.
See the External Metadata Indexing Guide for more information about indexing external metadata and
examples of metadata feeds.
Metadata Base64 Encoding
Starting in Google Search Appliance version 6.2, you can base64 encode metadata by adding the
encoding="base64binary" attribute to the meta element. You can also base64 encode the metadata
name attribute; however, if this option is used, both the name and content attributes must be
base64 encoded.
Note: Characters that would be invalid XML characters feed correctly when encoded in base64.
For example:
content="Y2lyY2xlZ19yb2Nrcw=="/>
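The encoding step can be sketched in Python as follows. This is an illustration under stated assumptions,
not the guide's own tooling: the helper names b64 and encoded_meta and the metadata name "project_name"
are hypothetical. The content value "circleg_rocks" is what the encoded content value in the example
above decodes to.

```python
import base64


def b64(text):
    """Base64 encode the UTF-8 bytes of text and return an ASCII string."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def encoded_meta(name, content):
    """Render a meta element with both name and content base64 encoded."""
    if content == "":
        raise ValueError("content must not be empty")
    # The base64 alphabet (A-Z, a-z, 0-9, +, /, =) needs no XML escaping.
    return (
        f'<meta encoding="base64binary" '
        f'name="{b64(name)}" content="{b64(content)}"/>'
    )


# "project_name" is a hypothetical metadata name chosen for this sketch.
meta_line = encoded_meta("project_name", "circleg_rocks")
print(meta_line)
```

Encoding both attributes in one helper keeps the rule that name and content must be encoded together
from being broken by accident.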
Using the UTF-8 Encoding
Unless you have content in legacy systems that must use a national character set encoding, such as
Shift_JIS, it is strongly recommended that all documents to be fed use the UTF-8 encoding. Do not
escape the & character when using numeric character references. For example, the & in &#x30E9; (the
reference for ラ) should not be XML encoded as &amp;#x30E9;.
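The effect of escaping the & in a numeric character reference can be checked with a standard XML parser.
The sketch below uses Python's xml.etree.ElementTree; the content element name is only a placeholder for
this demonstration, and ラ is U+30E9.

```python
import xml.etree.ElementTree as ET

# Correct: the numeric character reference is written with a bare ampersand,
# so the parser resolves it to the character U+30E9.
correct = ET.fromstring("<content>&#x30E9;</content>")

# Incorrect: escaping the ampersand turns the reference into literal text,
# so the parser returns the characters "&#x30E9;" instead of U+30E9.
double_escaped = ET.fromstring("<content>&amp;#x30E9;</content>")

print(correct.text == "\u30e9")
print(double_escaped.text == "&#x30E9;")

# Serializing with UTF-8 writes the character as raw UTF-8 bytes (E3 83 A9),
# which avoids the need for a numeric character reference at all.
feed_bytes = ET.tostring(correct, encoding="utf-8", xml_declaration=True)
print(feed_bytes)
```

Writing the feed directly in UTF-8 sidesteps the escaping question, which is one practical reason to
prefer it over a legacy encoding plus character references.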
Including Protected Documents in Search Results
Feeds can push protected content to the search appliance. If your feed contains URLs that are
protected by NTLM, Basic Authentication, or Forms Authentication (Single Sign-on), the URL record in
the feed must specify the correct type of authentication. You must also configure settings in the Admin
Console to allow the search appliance to crawl the secured pages.
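A rough sketch of tagging a URL record with its authentication type follows. The text above does not name
the attribute, so this sketch assumes that the record element carries an authmethod attribute with the
values httpbasic, ntlm, and httpsso; verify both the attribute name and the values against the feed DTD
for your appliance version. The helper name, the scheme keys, and the example URL are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed mapping from the schemes named above to authmethod values;
# confirm these against the feed DTD before relying on them.
AUTH_METHODS = {
    "basic": "httpbasic",
    "ntlm": "ntlm",
    "forms": "httpsso",
}


def protected_record(url, scheme, mimetype="text/html"):
    """Build a record element that declares how the URL is protected."""
    if scheme not in AUTH_METHODS:
        raise ValueError(f"unknown authentication scheme: {scheme!r}")
    attributes = {
        "url": url,
        "mimetype": mimetype,
        "authmethod": AUTH_METHODS[scheme],  # assumed attribute name
    }
    return ET.Element("record", attributes)


# Hypothetical URL for illustration.
record = protected_record("http://intranet.example.com/reports/q1.html", "ntlm")
print(ET.tostring(record, encoding="unicode"))
```

Rejecting unknown schemes at build time is a design choice of this sketch: a record that silently falls
back to no authentication would be fed as if it were public.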