1. Regular expressions: Think of these as special characters that can be used to search for text strings in files or in editing sessions. They contrast with shell file name expansion, which only matches file names (in the current directory unless a path is given), not text inside a file.

Special Characters

Regular       Shell File Name
Expression    Expansion          Meaning
----------    ---------------    -----------------------------------
.             ?                  Any character in a single character position
*                                0 or more instances of the preceding character
.*            *                  0 or more instances of any character
[ ]           [ ]                Selected characters or ranges in a single character position
[^ ]          [! ]               Any character except the selected characters or ranges, in a single character position
^x                               ^ is an anchor to the beginning of the line. x must be the first character on the line.
x$                               $ is an anchor to the end of the line. x must be the last character on the line.
\             \                  Makes the next character ordinary (not special)
\( \) \1                         Capture a pattern and store it in, or retrieve it from, numbered buffer 1 (up to 9 numbered buffers)
\{m\}                            Exactly m instances of the previous character
\{m,\}                           At least m instances of the previous character
\{,n\}                           At most n instances of the previous character
\{m,n\}                          Between m and n instances (0 <= m <= n <= 256) of the previous character

2. Refer to Appendix A, on regular expressions, in the Sobell book. Create some text in vi and exercise the examples given there with commands that take regular expressions, e.g. / ? or :s

3. POSIX Bracket Expressions (sometimes known as Character Classes)

A character class is a special metasequence for use within a POSIX bracket expression [ ]. An example is [:lower:], which represents the class of lower case letters (relative to a locale) and is comparable to the character range a-z. However, these colon-delimited expressions are only valid inside the brackets, so we write [[:lower:]] to depict the [a-z] lower case letter range.
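The point about the doubled brackets can be tried from the command line. What follows is only a sketch: the sample lines are made up for illustration, grep is assumed to be available, and LC_ALL=C is set so that "lower case" means exactly a-z.

```shell
# Sample lines (made up for illustration).
sample='abc
ABC
123
a1B2'

# Lines containing at least one lower case letter.
# The outer [ ] is the bracket expression; [:lower:] is the class inside it.
with_lower=$(printf '%s\n' "$sample" | LC_ALL=C grep '[[:lower:]]')

# Lines made up entirely of characters that are NOT lower case letters.
# ^ right after the opening [ negates the set; ^ and $ outside it anchor the line.
without_lower=$(printf '%s\n' "$sample" | LC_ALL=C grep '^[^[:lower:]]*$')

echo "Lines with a lower case letter:"
printf '%s\n' "$with_lower"
echo "Lines without any lower case letter:"
printf '%s\n' "$without_lower"
```

The same patterns can be used after / or ? in vi, as suggested in item 2, although how completely an editor supports the POSIX classes may vary.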
The POSIX character classes that are usually supported (locale dependent) are listed below:

[:alnum:]     alphabetic characters and numeric characters
[:alpha:]     alphabetic characters
[:blank:]     space and tab
[:cntrl:]     control characters
[:digit:]     digits
[:graph:]     visible characters (excludes spaces, control characters, etc.)
[:lower:]     lower case alphabetics
[:upper:]     upper case alphabetics
[:print:]     like [:graph:] but includes the space character as well
[:punct:]     punctuation characters
[:space:]     all whitespace characters ([:blank:], newline, carriage return, etc.)
[:xdigit:]    digits allowed in a hexadecimal number: [0-9a-fA-F]

4. POSIX Bracket Expression Character Equivalents

Some locales define character equivalents to indicate that certain characters should be considered identical for sorting purposes (e.g. a and a with an accent mark above it). An equivalent is written with = instead of :. For example, all the kinds of 'a' in the locale's character equivalents would be depicted as [[=a=]], which represents all the kinds of 'a' in a single character position. In the absence of accented characters, [[=a=]] would default to [a].

5. POSIX Bracket Expression Collating Sequences

A locale can have a collating sequence that describes how certain characters or sets of characters should be treated for sorting purposes. A collating sequence that maps certain (sets of) characters to a single logical character is considered 'one character' for regular expression purposes. This means that both the special character . and the regular expression [^123] would match this single logical character.

A collating sequence element can be included within a bracket expression by using a '.' instead of a : or =. For example, the notation torti[[.span-ll.]]a matches the word tortilla, as does torti.a. The Spanish collating element span-ll matches the 'll' of tortilla as the word is written in English. In the Spanish collating sequence, 'll' also comes between the l and m of the English alphabet.
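For contrast, here is a rough sketch of what happens in a locale that does not define 'll' as a collating element. It assumes grep and the C locale (forced with LC_ALL=C); there the two l's are two separate characters, so the single . in torti.a is not expected to match.

```shell
# Assumption: the C locale, which defines no multi-character collating elements.
word='tortilla'

# One dot stands for one character, so it cannot cover both l's here.
# "|| true" only hides grep's non-zero exit status when nothing matches.
one_dot=$(printf '%s\n' "$word" | LC_ALL=C grep -c 'torti.a' || true)

# Two dots are needed when 'll' counts as two characters.
two_dots=$(printf '%s\n' "$word" | LC_ALL=C grep -c 'torti..a' || true)

echo "torti.a  matched $one_dot line(s)"
echo "torti..a matched $two_dots line(s)"
```

Whether torti[[.span-ll.]]a is accepted at all depends on the locale definitions installed on the system, so that form is left out of the sketch.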
A collating sequence lets you match characters that are made up of combinations of other characters. It also creates a situation where a bracket expression can match more than one physical character.

Another example is a German collating element, called [.eszet.], which places the German letter ß (it looks like a script upper case B) between S and T in the German alphabet.

Questions? Robert Katz: rkatz@ned.highline.edu