Appendix b: url encoding – Google Search Appliance Protocol Reference User Manual

Google Search Appliance: Search Protocol Reference
The underlined text in the message should be a hypertext link to submit the same search again with the
parameter filter=0. Google finds that this method of informing users about automatic document
filtering is effective. This method is used on the Google Internet search site.
If you are using OneBox modules to provide additional query results to your users, note that the results
served through a OneBox module are reported separately. The number of OneBox results is not
added to the number of standard results.
Appendix B: URL Encoding
Some characters are not safe to use in a URL without first being encoded. Because a Google Search
Appliance request is made by using an HTTP URL, the search request must follow URL conventions,
including character encoding, where necessary.
The HTTP URL syntax specifies that only alphanumeric characters, the special characters $-_.+!*’(),
and the reserved characters ;/?:@=& can be used as values within an HTTP URL request.
Since reserved characters are used by the search engine to decode the URL, and some special
characters are used to request search features, all non-alphanumeric characters used as a value to an
input parameter must be URL-encoded.
To URL-encode a string, replace each non-alphanumeric character with its hexadecimal ASCII value, in
the format of a percent sign (%) character followed by two hexadecimal digits. Such an ASCII value may
be referred to as an escape code. Spaces can be replaced by the plus sign (+) character for query
parameters except when requesting search results by meta name or values.
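The rule above can be sketched in Python as follows. This is only an illustration, not part of the search appliance: the function name gsa_url_encode and its plus_for_space flag are hypothetical, underscores are left unencoded as this appendix permits, and encoding non-ASCII text as UTF-8 bytes is an assumption that this appendix does not address.

```python
import string

# Characters left as-is: alphanumerics, plus the underscore, which this
# appendix says does not need to be URL-encoded.
_KEEP = set(string.ascii_letters + string.digits + "_")


def gsa_url_encode(value: str, plus_for_space: bool = True) -> str:
    """Replace each non-alphanumeric character with a %XX escape code.

    Hypothetical helper for illustration only. Non-ASCII input is
    encoded as UTF-8 bytes first, which is an assumption.
    """
    out = []
    for byte in value.encode("utf-8"):
        ch = chr(byte)
        if ch in _KEEP:
            out.append(ch)
        elif ch == " " and plus_for_space:
            # Spaces may be written as "+" in ordinary query parameters.
            out.append("+")
        else:
            # Percent sign followed by two hexadecimal digits.
            out.append(f"%{byte:02X}")
    return "".join(out)


print(gsa_url_encode("the *end"))                        # the+%2Aend
print(gsa_url_encode("the *end", plus_for_space=False))  # the%20%2Aend
```

Pass plus_for_space=False where the plus sign is not allowed, for example when requesting search results by meta name or values.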
If you are using the search box on the search appliance, you single-encode the special characters
$-.+!*’().
If you are using special characters in a search query, you double-encode the special characters
$-.+!*’().
Underscores (_) do not need to be URL-encoded in the search box or in a search query.
Some input parameters require that the values passed to Google search are double-URL-encoded. This
requirement means that you must apply the URL encoding to the string twice in succession to generate
the final value. See the input parameter descriptions (“Search Parameters” on page 10) for more
information.
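As a rough illustration of double URL encoding, the following Python sketch applies the same encoding step twice in succession. The helper names percent_encode and double_percent_encode and the sample value are hypothetical. Which parameters actually require double encoding is defined only by the input parameter descriptions.

```python
import string

# Alphanumerics and the underscore are left unencoded.
_KEEP = set(string.ascii_letters + string.digits + "_")


def percent_encode(value: str) -> str:
    """Encode every character outside _KEEP as %XX (UTF-8 bytes assumed)."""
    return "".join(
        chr(b) if chr(b) in _KEEP else f"%{b:02X}"
        for b in value.encode("utf-8")
    )


def double_percent_encode(value: str) -> str:
    """Apply the encoding twice; the second pass turns each % into %25."""
    return percent_encode(percent_encode(value))


print(percent_encode("dept:R&D"))         # dept%3AR%26D
print(double_percent_encode("dept:R&D"))  # dept%253AR%2526D
```

The second pass encodes only the percent signs produced by the first pass, because the hexadecimal digits that follow them are alphanumeric.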
Special characters in a query are the ones described as query term separators (see “Special Characters:
Query Term Separators” on page 22) and meta tag names and values. Special characters within the
document content are not indexed, so they are not searchable. For example, an indexed document
containing a paragraph ending with “the *end” is not searchable using the query “%2Aend” in the GSA
search box. Only ‘end’ is indexed.
For more information about URL encoding, see the W3C and IETF web sites.