Many people don't understand how badword masks work. The UnrealIRCd documentation barely describes them, and since misusing badwords causes many problems, I've decided to write a short guide.
Basically, `badword::word` is the string matched against channel, private, or quit messages. For instance, with `badword message { word "stupid"; };`, if you send `two stupid dogs` to a +G user, he/she will receive `two <censored> dogs`.
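As a rough illustration of the example above, the following Python sketch censors one plain word. The function name `censor` is hypothetical. Case-insensitive, whole-word matching is an assumption about UnrealIRCd's behaviour and is not taken from its source code.

```python
import re

def censor(text: str, word: str, replacement: str = "<censored>") -> str:
    """Replace every whole-word occurrence of `word` in `text`.

    Hypothetical helper: assumes case-insensitive, whole-word matching.
    """
    # \b marks a word boundary, so "stupid" does not match inside "stupidity".
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.sub(pattern, replacement, text, flags=re.IGNORECASE)

print(censor("two stupid dogs", "stupid"))  # two <censored> dogs
```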
The Unreal coders made it convenient to specify either a simple badword expression or a very complex one. There are two types of expressions: fast badword replacement and regular expressions (regexes for short). For regexes, UnrealIRCd uses the POSIX-compliant TRE regex library. The rest of this guide explains how to choose the expression type that suits your needs.
Fast badword replacement accepts only English letters (a-z, A-Z), plus `*` at the beginning and/or end of the text. In other words, the `word`, `*word`, `word*` and `*word*` forms are accepted. If the asterisk (`*`) is in the middle of the expression (`wo*rd`), it is no longer a fast badword replacement expression! In addition, this limited character set is a serious problem for texts containing non-English letters or digits.
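Here is a hypothetical Python model that turns a fast mask into an equivalent regex, to make the four forms concrete. It rests on three assumptions. A leading or trailing `*` stands for any run of letters attached to the word. A match may not touch other letters on either side, so the whole word is censored. Matching ignores case. `fast_mask_to_regex` and `censor_fast` are illustrative names, not UnrealIRCd functions.

```python
import re

LETTER = "[A-Za-z]"  # the fast badword character set described above

def fast_mask_to_regex(mask: str) -> str:
    """Translate a valid fast mask (word, *word, word*, *word*) into a regex.

    Hypothetical model: a leading/trailing * means "any letters glued to the
    word here", and the match must not touch further letters on either side.
    """
    core = mask.strip("*")
    left = LETTER + "*" if mask.startswith("*") else ""
    right = LETTER + "*" if mask.endswith("*") else ""
    # Negative lookbehind/lookahead keep the match from starting or ending
    # in the middle of a longer word.
    return f"(?<!{LETTER}){left}{re.escape(core)}{right}(?!{LETTER})"

def censor_fast(text: str, mask: str, replacement: str = "<censored>") -> str:
    """Censor every word in `text` matched by the fast mask (case-insensitive)."""
    return re.sub(fast_mask_to_regex(mask), replacement, text, flags=re.IGNORECASE)

sample = "stupid superstupid stupidity"
for mask in ("stupid", "*stupid", "stupid*", "*stupid*"):
    print(f"{mask:9} -> {censor_fast(sample, mask)}")
```

Under these assumptions, with the sample text `stupid superstupid stupidity`, the mask `stupid` censors only the first word. `*stupid` also censors `superstupid`, `stupid*` also censors `stupidity`, and `*stupid*` censors all three.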
Regular expressions use a completely different syntax. This makes them harder to understand, but they have far fewer limitations than fast badword replacement. Since the goal of this guide is not to teach regular expressions, I'll only give some examples to show how they work.
| Expression | Example text | Description |
|---|---|---|
| `^who` | `who is who` | `^` = match at the beginning of the text |
| `who$` | `who is who` | `$` = match at the end of the text |
| `\$25$` | `$25` | `\` = escape a special character |
| `d.g` | `dog` | `.` = any single character |
| `O\.K\.` | `O.K.` | `\.` = a literal dot |
| `o[uw]` | `How do you do?` | `[]` = any one of the listed characters |
| `[C-K]` | `ABCDEFGHIJKLMN` | `-` = range of characters |
| `[^CGLM]` | `ABCDEFGHIJKLMN` | `[^]` = any character except the listed ones |
| `(on\|ues\|rida)` | `Monday Tuesday Friday` | `(\|)` = alternatives |
| `a*b` | `aabc abc bc` | `*` = 0 or more of the previous character |
| `a+b` | `aabc abc bc` | `+` = 1 or more of the previous character |
| `a?b` | `aabc abc bc` | `?` = 0 or 1 of the previous character |
| `.*` | `-@- *** -- "*" -- *** -@-` | `.*` = 0 or more of any character |
| `\?{3}` | `what?? when??? how?` | `{n}` = exactly n of the previous character |
| `\?{2,3}` | `what?? when??? how?` | `{x,y}` = at least x and at most y of the previous character |
| `[[:alnum:]]` | `!%+123/()"abc[]@` | `[[:class:]]` = character class (here: letters and digits) |
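The table rows can be tried out quickly in Python. This is only an approximation, because Python's `re` module is neither TRE nor a POSIX engine. For these simple patterns it should still give the same matches. Python has no POSIX bracket classes, so `[[:alnum:]]` is replaced by `[A-Za-z0-9]`, which is its meaning in the C locale. `all_matches` is a hypothetical helper.

```python
import re

# (expression, sample text) pairs taken from the table above.
# Assumption: Python's re stands in for TRE; it is not a POSIX engine,
# but it should agree with TRE on these simple patterns.
ROWS = [
    (r"^who", "who is who"),
    (r"who$", "who is who"),
    (r"\$25$", "$25"),
    (r"d.g", "dog"),
    (r"O\.K\.", "O.K."),
    (r"o[uw]", "How do you do?"),
    (r"[C-K]", "ABCDEFGHIJKLMN"),
    (r"[^CGLM]", "ABCDEFGHIJKLMN"),
    (r"(on|ues|rida)", "Monday Tuesday Friday"),
    (r"a*b", "aabc abc bc"),
    (r"a+b", "aabc abc bc"),
    (r"a?b", "aabc abc bc"),
    (r".*", '-@- *** -- "*" -- *** -@-'),
    (r"\?{3}", "what?? when??? how?"),
    (r"\?{2,3}", "what?? when??? how?"),
    # Python's re has no [[:alnum:]]; [A-Za-z0-9] is its C-locale meaning.
    (r"[A-Za-z0-9]", '!%+123/()"abc[]@'),
]

def all_matches(pattern: str, text: str) -> list:
    """Return all non-empty, non-overlapping matches from left to right."""
    return [m.group(0) for m in re.finditer(pattern, text) if m.group(0)]

for pattern, text in ROWS:
    print(f"{pattern:15} {text!r:30} -> {all_matches(pattern, text)}")
```

The `a*b`, `a+b` and `a?b` rows show the difference between the three repetition operators on the same text. The three rows match `aab ab b`, `aab ab` and `ab ab b` respectively.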
More information: http://kouli.iki.fi/~vlaurika/tre/syntax.html. You can also find other documents on POSIX regexes with a web search.
The expression type is chosen automatically. If every character of the given expression fits the fast badword replacement character set (with `*` only at the start or end), fast badword replacement is used; otherwise the expression is treated as a regex. So be careful what you type in a `badword::word` directive!
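The automatic choice can be modelled with a small classifier. The sketch below follows the rule stated in this guide: English letters only, with `*` allowed just as the first and/or last character. `expression_type` is a hypothetical name, and UnrealIRCd's actual check may differ in details.

```python
import re

# Fast mask per this guide: optional leading *, one or more English letters,
# optional trailing *. Anything else (digits, spaces, dots, a * in the middle)
# makes the expression a regex. Assumption: mirrors the documented rule only.
FAST_MASK = re.compile(r"\*?[A-Za-z]+\*?")

def expression_type(word: str) -> str:
    """Return 'fast' or 'regex' for a badword::word value."""
    return "fast" if FAST_MASK.fullmatch(word) else "regex"

for word in ("stupid", "*stupid*", "wo*rd", "l33t", "stupid dog", "d.g"):
    print(f"{word!r:12} -> {expression_type(word)}")
```

Under this rule, even an innocent-looking value such as `stupid dog` becomes a regex, because the space is outside the fast character set.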