Crunch CRiSP File Editor 6 User Manual

Page 76

Keyword Regular Expressions

Keyword regular expressions are a way of allowing generic styles of strings to be specified as keywords or
string constants, etc. In a language
such as C, it is obvious that something like "case" specifies a keyword, but it is more difficult to describe a
string literal because although the start and end of the literal are well defined, anything can occur inside the
quote marks delimiting the string.

CRiSP allows limited regular expressions to be used to express this vagueness in a keyword definition.
These are not the full-featured expressions that you can use in the normal search and replace.

The following describes the character sequences which can be used in a keyword.

^

If this character is used at the start of a keyword, then that keyword will only be recognised at the
start of a line. For example, Fortran comments (when using the C notation) must begin
at the start of a line.

*

The asterisk operator has special meaning when following the start of a keyword. It is used to
indicate that anything can follow as the next character. It is used to avoid ambiguities. For example,
consider the Fortran C comment again. A Fortran comment using this notation starts at the
beginning of a line and because it is a comment, then anything can follow, even another C. CRiSP
would normally treat the string C and CC as two distinct keywords, so the normal rules wouldn't
work.

By specifying:

comment="^C*"

you would achieve the desired effect. However, this example is not totally correct. (See below).
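
The difference between a plain keyword and the ^C* form can be illustrated outside CRiSP. The following is only a rough Python sketch: it assumes that a plain keyword behaves like a whole-token match, and it uses Python's re module rather than CRiSP's own matcher. The names plain_keyword and prefix_keyword are hypothetical.

```python
import re

# Assumption: a plain keyword must match a whole token, so "CC" is not "C".
plain_keyword = re.compile(r"^C\b")
# Rough analogue of "^C*": "C" at the start of a line, any next character allowed.
prefix_keyword = re.compile(r"^C")

lines = ["C this is a comment", "CC still a comment", "      X = 1"]

plain_hits = [bool(plain_keyword.match(s)) for s in lines]
prefix_hits = [bool(prefix_keyword.match(s)) for s in lines]
```

Under these assumptions the whole-token form misses the line starting with CC, while the prefix form accepts both comment lines and still rejects the ordinary statement.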

.*

This sequence means 'any string of characters'. It is used when defining string constants or
comments, where you specify the starting character(s) and terminating character(s), but allow any
arbitrary text between these characters.

You can only use one occurrence of this expression in a keyword definition. You cannot define a
context sensitive keyword such as: ABC.*DEF.*GHI as that would require looking ahead to validate
the keyword.
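
The one-occurrence rule is simple enough to check mechanically. The sketch below is a hypothetical helper written in Python, not part of CRiSP; it only counts how often the .* sequence appears in a definition string.

```python
def has_single_wildcard(definition):
    """Return True if the definition uses '.*' at most once.

    Hypothetical check for the rule described above; CRiSP itself
    may report or handle a violation differently.
    """
    return definition.count(".*") <= 1
```

For instance, has_single_wildcard("ABC.*DEF.*GHI") returns False because the sequence appears twice.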

$

This expression can be used at the end of a keyword definition. It is normally used in conjunction
with indefinite keywords, such as comments or string literals which can extend indefinitely. For
example the C++ // comment extends to the end of the line, so the definition is:

comment="//*.*$"
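
As a rough illustration of what such a definition selects, the following Python sketch finds the span of a // comment on a single line. It uses Python's re module as a stand-in and is not CRiSP's implementation; the name comment_span is hypothetical, and the sketch makes no attempt to skip a // that appears inside a string literal.

```python
import re

# Rough analogue of "start at //, allow any text, stop at end of line".
LINE_COMMENT = re.compile(r"//.*$")

def comment_span(line):
    """Return (start, end) offsets of a // comment in line, or None."""
    m = LINE_COMMENT.search(line)
    return m.span() if m else None
```

Calling comment_span("x = 1; // set x") returns (7, 15), the region that would be shown in the comment color, and a line without // returns None.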

[ \t]* or [\t ]*

These two sequences are special. Although they look and feel like character class wild cards, they
have a very limited scope. Either one of the two forms may be used after a caret (^). They were added
to support C preprocessor directives, which normally start at the beginning of a line. These
preprocessor directives can be preceded by an arbitrary amount of white space, so the leading
white space is taken into account when colorizing.

These sequences will only work in this restricted area.
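
The effect of allowing leading white space after the caret can be sketched in Python. This is an approximation using Python's re module, assuming that a directive is introduced by a # character; is_directive is a hypothetical name and the code is not how CRiSP performs the match.

```python
import re

# Start of line, optional spaces or tabs, then the '#' that opens a directive.
DIRECTIVE = re.compile(r"^[ \t]*#")

def is_directive(line):
    """Return True if line looks like a C preprocessor directive."""
    return DIRECTIVE.match(line) is not None
```

With this pattern, both "#include <stdio.h>" and an indented "#define X 1" are recognised, while a # appearing later in a line is not.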

Limitations of Colorization

Colorization first appeared in CRiSP version 3. It has undergone considerable changes in each subsequent
release. The one thing which has consistently changed over the releases is that more and more expressive
power has been added to handle the vague corners of various different languages.

For example, in the C programming language, a string constant consists of an opening quote (single or
double depending on whether it is a string constant or a character constant), and is followed by the text of
the string, with a matching single or double quote to terminate the string. If you want to include a quote