Question

Regular expression for excluding special characters

I am having trouble coming up with a regular expression which would essentially black list certain special characters.

I need to use this to validate data in input fields (in a Java Web app). We want to allow users to enter any digit, letter (we need to include accented characters, ex. French or German) and some special characters such as '-. etc.

How do I blacklist characters such as <>%$ etc?

 45  328951  45
1 Jan 1970

Solution

 59

I would just white list the characters.

^[a-zA-Z0-9äöüÄÖÜ]*$

Building a black list is equally simple with regex but you might need to add much more characters - there are a lot of Chinese symbols in unicode ... ;)

^[^<>%$]*$

The expression [^(many characters here)] just matches any character that is not listed.

2009-04-16

Solution

 11

To exclude certain characters ( <, >, %, and $), you can make a regular expression like this:

[<>%\$]

This regular expression will match all inputs that have a blacklisted character in them. The brackets define a character class, and the \ is necessary before the dollar sign because dollar sign has a special meaning in regular expressions.

To add more characters to the black list, just insert them between the brackets; order does not matter.

According to some Java documentation for regular expressions, you could use the expression like this:

Pattern p = Pattern.compile("[<>%\$]");
Matcher m = p.matcher(unsafeInputString);
if (m.matches())
{
    // Invalid input: reject it, or remove/change the offending characters.
}
else
{
    // Valid input.
}
2009-04-16