Google Search Appliance: Search Protocol Reference
Results Format
XML Output Overview
For maximum flexibility, Google provides search results in XML format. With the Google XML results, you can use your own XML parser to customize how results are displayed to your search users. If you are using an XSL stylesheet to transform the XML results instead of developing your own XML parser, see “Custom HTML” on page 52.
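The following minimal Python sketch shows what using your own XML parser can look like with the standard library's xml.etree.ElementTree. The sample document and the element names GSP, RES, R, U and T are assumptions made for illustration. Confirm them against the XML tag definitions for your appliance version before relying on them.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample response. The element names (GSP, RES, R, U, T) are
# assumptions for illustration; check them against the XML tag definitions.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<GSP VER="3.2">
  <RES SN="1" EN="2">
    <R N="1"><U>http://example.com/a</U><T>First &lt;b&gt;result&lt;/b&gt;</T></R>
    <R N="2"><U>http://example.com/b?x=1&amp;y=2</U><T>Second result</T></R>
  </RES>
</GSP>"""


def parse_results(xml_bytes):
    """Return a list of (rank, url, title) tuples from a results document."""
    root = ET.fromstring(xml_bytes)
    results = []
    # iter("R") visits every R element anywhere below the root.
    for r in root.iter("R"):
        rank = int(r.get("N", "0"))
        url = r.findtext("U", default="")
        title = r.findtext("T", default="")
        results.append((rank, url, title))
    return results


for rank, url, title in parse_results(SAMPLE):
    print(rank, url, title)
```

Running it prints one line per result with its rank, URL and title. Your own display code would replace the print call.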
Notes:
• Element values are valid HTML and are suitable for display, unless otherwise noted in the XML tag definitions. Some values are URLs and must be HTML-encoded before they are displayed.
• To remain forward-compatible, your XML parser should ignore any attributes or tags that are not documented. By ignoring unknown tags, your custom parser can continue working without modification when Google adds features to the XML output in the future.
• For custom parameters that contain spaces, each space is replaced with “_”. You can still retrieve the unmodified value from the original_value attribute. For example:
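A minimal Python sketch that applies the three notes above. It reads only the child tags it knows and skips everything else. It HTML-encodes the URL before placing it in an href attribute, and it prefers original_value over value for custom parameters. The sample XML is hypothetical, including the PARAM, R, U and T element names. The placement of the “_” substitution in the value attribute is also an assumption. FUTURE_TAG and NEW_ATTR stand in for additions a later release might make.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical response. Element names, and the assumption that the "_"
# substitution appears in the value attribute, are for illustration only.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<GSP VER="3.2">
  <PARAM name="my_param" value="red_shoes" original_value="red shoes"/>
  <RES>
    <R N="1" NEW_ATTR="x">
      <U>http://example.com/?a=1&amp;b=2</U>
      <T>Shoes</T>
      <FUTURE_TAG>ignored</FUTURE_TAG>
    </R>
  </RES>
</GSP>"""

KNOWN_FIELDS = {"U": "url", "T": "title"}  # the only tags this parser understands


def custom_params(root):
    """Map each PARAM name to its unmodified value, falling back to value."""
    return {
        p.get("name"): p.get("original_value", p.get("value"))
        for p in root.iter("PARAM")
    }


def result_to_html(r):
    """Render one R element; reads only known child tags, so unknown tags
    and all attributes are skipped."""
    fields = {}
    for child in r:
        if child.tag in KNOWN_FIELDS:  # ignore anything undocumented
            fields[KNOWN_FIELDS[child.tag]] = child.text or ""
    # The URL is plain text, so HTML-encode it before putting it in markup.
    # The title is already valid HTML and is inserted as-is.
    href = html.escape(fields.get("url", ""), quote=True)
    return f'<a href="{href}">{fields.get("title", "")}</a>'


root = ET.fromstring(SAMPLE)
print(custom_params(root))
for r in root.iter("R"):
    print(result_to_html(r))
```

Because the parser looks up known tags instead of failing on unknown ones, a new element or attribute in a later release does not break it.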
Character Encoding Conventions
The first line of the XML results, the XML declaration, indicates which character encoding is used. See the XML Standard for information about character encoding.
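This sketch applies only if you decode the response text yourself rather than handing the raw bytes to an XML parser. Parsers such as Python's ElementTree read the declaration on their own. In that case, the declared encoding can be taken from the first line as shown. The function name declared_encoding is hypothetical.

```python
import re

# Matches the encoding pseudo-attribute of an XML declaration, for example
# <?xml version="1.0" encoding="ISO-8859-1"?>
_DECL = re.compile(rb'^<\?xml[^>]*\bencoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']')


def declared_encoding(raw: bytes, default: str = "UTF-8") -> str:
    """Return the encoding named in the first line, or the XML default.

    This sketch does not handle UTF-16 byte-order marks.
    """
    first_line = raw.split(b"\n", 1)[0]
    m = _DECL.match(first_line)
    return m.group(1).decode("ascii") if m else default


raw = b'<?xml version="1.0" encoding="ISO-8859-1"?>\n<GSP/>'
text = raw.decode(declared_encoding(raw))
```

UTF-8 is used as the fallback because it is the XML default when no declaration or byte-order mark names another encoding.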
Certain characters must be escaped when they are included as values in XML tags. These characters are documented in the XML Standard and are shown in the table that follows. All other characters in the XML results are presented without modification.
Character    Escaped Form
<            either &lt; or &#60;
&            either &amp; or &#38;
>            either &gt; or &#62;
'            either &apos; or &#39;
"            either &quot; or &#34;
Google XML Results DTD
Google XML results can be returned with or without a reference to the most recent DTD (Document Type Definition) describing Google’s XML format. The DTD is a guide to help search administrators and XML parsers understand the XML results output. Because Google’s XML grammar may change from time to time, do not configure your parser to use the DTD to validate the XML results.
XML parsers should not be configured to fetch the DTD every time a search request is performed. Because the DTD is updated infrequently, fetching it on every request adds unnecessary delay and bandwidth usage.
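This Python sketch uses the standard library's ElementTree to illustrate the escaping table and the DTD guidance. It shows that each pair of escaped forms in the table decodes to the same character. It also shows that a DOCTYPE reference is tolerated without being fetched or used for validation. Python's documentation lists ElementTree as not retrieving external DTDs, and it is a non-validating parser. If you use a different parser, check its settings for DTD loading and validation. The DTD URL and sample content are placeholders.

```python
import xml.etree.ElementTree as ET

# Hypothetical response: the DOCTYPE points at a placeholder DTD URL, and
# the T element uses both escaped forms from the table for each character.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE GSP SYSTEM "http://example.invalid/results.dtd">
<GSP><T>&lt; &#60; &amp; &#38; &gt; &#62; &apos; &#39; &quot; &#34;</T></GSP>"""


def parse_without_dtd(xml_bytes):
    """Parse results without fetching or validating against the DTD.

    ElementTree's expat-based parser does not download external DTDs and
    does not validate, so the DOCTYPE line is read but otherwise ignored.
    """
    return ET.fromstring(xml_bytes)


root = parse_without_dtd(SAMPLE)
# Both escaped forms of each character decode to the same character.
print(root.findtext("T"))
```

Because nothing is fetched, parsing cost does not depend on the DTD's location, and changes to Google's XML grammar do not cause validation failures.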