
Google Search Appliance Deployment Governance and Operational Models User Manual



Consideration

Comments

Users and user types affected by the content integration

Aim for platforms that have the highest impact first.

Ease of integration

In order of increasing difficulty: direct crawl, an existing connector, crawl through a proxy, a feed that must be developed, a connector that must be developed.
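The prioritization guidance in the first two rows (highest impact first, easier integrations before harder ones) can be expressed as a sort key. This is a minimal Python sketch under stated assumptions: the `ContentSource` record, its `affected_users` count as the measure of impact, and the method labels are all hypothetical, and the manual does not prescribe any scoring formula.

```python
from dataclasses import dataclass

# Integration methods ranked from easiest (1) to hardest (5), following the
# order given in the "Ease of integration" row. The keys are illustrative labels.
INTEGRATION_DIFFICULTY = {
    "direct_crawl": 1,
    "existing_connector": 2,
    "crawl_through_proxy": 3,
    "custom_feed": 4,
    "custom_connector": 5,
}


@dataclass
class ContentSource:
    # Hypothetical record describing a candidate content source.
    name: str
    affected_users: int  # assumed impact measure: number of users who would benefit
    integration: str     # one of the INTEGRATION_DIFFICULTY keys


def prioritize(sources):
    """Order sources by highest impact first, breaking ties by easiest integration."""
    return sorted(
        sources,
        key=lambda s: (-s.affected_users, INTEGRATION_DIFFICULTY[s.integration]),
    )


if __name__ == "__main__":
    candidates = [
        ContentSource("Team wiki", 500, "direct_crawl"),
        ContentSource("CRM", 2000, "custom_connector"),
        ContentSource("File shares", 2000, "existing_connector"),
    ]
    for source in prioritize(candidates):
        print(source.name)
```

With these invented inputs the order is File shares, CRM, Team wiki: both high-impact sources come first, and the one with an existing connector wins the tie. The strict lexicographic ordering is a simplification; a real rollout plan might weigh impact against effort instead.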

Security authentication and authorization mechanism required

Systems that use security mechanisms supported directly by the GSA are the easiest to integrate.

Systems that require custom authentication or authorization integration are more difficult to integrate.

Search front-end customization required to get the most value from the platform's search experience

Getting the most value from a content integration with the GSA sometimes requires front-end modifications to expose items such as:

- filters
- categories
- metadata in search results
- custom navigation processes

In order of increasing difficulty: the default XSLT stylesheet, minor XSLT modifications, major XSLT modifications, and a custom application that parses and displays the XML results returned by the GSA.
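For the last option, a custom application receives the GSA's XML results and renders them itself. A minimal Python sketch using the standard library's `xml.etree.ElementTree`, assuming the GSA XML results layout: a `GSP` root, one `R` element per result with `U` (URL), `T` (title), and `S` (snippet) children, and `MT` elements carrying `N`/`V` name/value metadata pairs when metadata is requested. Verify these element names against the search protocol reference for your GSA version; the sample response, URLs, and `department` field are invented.

```python
import xml.etree.ElementTree as ET

# Invented sample response, assumed to follow the GSA XML results layout.
SAMPLE_RESPONSE = """<GSP VER="3.2">
<RES SN="1" EN="2">
<M>2</M>
<R N="1">
<U>http://intranet.example.com/finance/travel.html</U>
<T>Travel policy</T>
<S>Rules for booking business travel.</S>
<MT N="department" V="Finance"/>
</R>
<R N="2">
<U>http://intranet.example.com/hr/leave.html</U>
<T>Leave policy</T>
<S>How to request annual leave.</S>
<MT N="department" V="HR"/>
</R>
</RES>
</GSP>"""


def parse_results(xml_text):
    """Turn each R element into a dict with URL, title, snippet, and metadata."""
    root = ET.fromstring(xml_text)
    results = []
    for r in root.iter("R"):
        results.append({
            "url": r.findtext("U", default=""),
            "title": r.findtext("T", default=""),
            "snippet": r.findtext("S", default=""),
            # A repeated metadata name keeps only its last value in this sketch.
            "meta": {mt.get("N"): mt.get("V") for mt in r.findall("MT")},
        })
    return results


def facet_counts(results, field):
    """Count results per value of one metadata field, e.g. to render a filter."""
    counts = {}
    for result in results:
        value = result["meta"].get(field)
        if value is not None:
            counts[value] = counts.get(value, 0) + 1
    return counts


if __name__ == "__main__":
    results = parse_results(SAMPLE_RESPONSE)
    for result in results:
        print(result["title"], "-", result["url"])
    print(facet_counts(results, "department"))
```

The `facet_counts` helper shows how metadata returned with results could drive the filters and categories listed in this row. A production front end would also handle paging, escaping, and multi-valued metadata.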

Maturity of metadata available in the content source

Although metadata is not required for indexing, it enriches the content in the index and gives you more options for shaping the search experience.

Some content sources provide metadata “out-of-the-box,” while others rely on a metadata process being followed at publish time.

If desired, you can also add metadata to documents programmatically at index time through a custom process.
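One possible custom process is to push metadata to the GSA in a feed. A minimal Python sketch that only builds the feed XML (it does not submit it), assuming the GSA feed layout (`gsafeed` root, a `header` with `datasource` and `feedtype`, then `group`/`record`/`metadata`/`meta`) and the `metadata-and-url` feed type. The `derive_metadata` rule (tag each URL with its first path segment), the `section` field, and the data source name are hypothetical. Check element names and required attributes against the feeds protocol documentation for your GSA version.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Assumed DOCTYPE line for GSA feeds.
FEED_DOCTYPE = '<!DOCTYPE gsafeed PUBLIC "-//Google//DTD GSA Feeds//EN" "">'


def derive_metadata(url):
    """Hypothetical enrichment rule: tag a document with its top-level URL section."""
    segments = [part for part in urlparse(url).path.split("/") if part]
    return {"section": segments[0] if segments else "root"}


def build_feed(datasource, urls):
    """Build a metadata-and-url feed that attaches derived metadata to each URL."""
    feed = ET.Element("gsafeed")
    header = ET.SubElement(feed, "header")
    ET.SubElement(header, "datasource").text = datasource
    ET.SubElement(header, "feedtype").text = "metadata-and-url"
    group = ET.SubElement(feed, "group")
    for url in urls:
        record = ET.SubElement(group, "record", url=url, mimetype="text/html")
        metadata = ET.SubElement(record, "metadata")
        for name, value in derive_metadata(url).items():
            ET.SubElement(metadata, "meta", name=name, content=value)
    body = ET.tostring(feed, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + FEED_DOCTYPE + "\n" + body


if __name__ == "__main__":
    print(build_feed("intranet_sections", [
        "http://intranet.example.com/finance/travel.html",
        "http://intranet.example.com/hr/leave.html",
    ]))
```

Deriving metadata from URL structure is only one choice. The same feed shape works for metadata pulled from a database or a classification service.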

Augmenting document metadata through Entity Recognition

The GSA's Entity Recognition feature can enrich content with entities extracted by entity rules, which are defined using dictionaries or regular expressions.

When defined, these rules tag documents with metadata at index time. This feature can be used to assign metadata to documents that would otherwise have none.
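To illustrate what such rules do, the following Python sketch imitates dictionary and regular-expression entity rules outside the appliance. It is not the GSA's implementation or its configuration format: the entity names (`product`, `ticket_id`), the dictionary terms, and the `TCK-` ticket pattern are all hypothetical.

```python
import re

# Hypothetical entity rules. On the GSA these are configured in the appliance;
# this sketch only imitates their effect on a single document's text.
DICTIONARY_RULES = {
    "product": ["Widget Pro", "Gadget Max"],
}
REGEX_RULES = {
    "ticket_id": re.compile(r"\bTCK-\d{5}\b"),
}


def extract_entities(text):
    """Return {entity name: sorted matches} for every rule that matches the text."""
    entities = {}
    for entity, terms in DICTIONARY_RULES.items():
        # Dictionary rule: whole-word, case-insensitive match of each listed term.
        found = sorted(
            term for term in terms
            if re.search(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE)
        )
        if found:
            entities[entity] = found
    for entity, pattern in REGEX_RULES.items():
        # Regular-expression rule: collect each distinct match.
        found = sorted(set(pattern.findall(text)))
        if found:
            entities[entity] = found
    return entities


if __name__ == "__main__":
    doc = "Customer reported TCK-00042 and TCK-00107 after upgrading Widget Pro."
    print(extract_entities(doc))
```

Each entity name and its values play the role of a metadata name/value pair attached to the document. A document that matches no rule gets an empty result and stays untagged.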