Using Globs
Latest revision as of 19:27, 19 July 2015
Introduction
Surely you have used globs before (maybe even without knowing that they are called "globs"):
    *.txt
    dir/*.doc
    execut?r.txt
Globs are a widespread concept that is used throughout the UNIX, MICROSOFT and other "worlds".
All implementations support at least the following elements:
Construct | Matches |
---|---|
? | Any character except the file separator ("/" and/or "\") |
* | Zero or more characters except the file separator |
x | The character x |
Some implementations add more features:
Construct | Matches |
---|---|
[abc] | Any of the characters "a", "b" or "c" |
[^abc] or [!abc] | Any character except "a", "b" and "c" |
[A-Za-z] | Any (Latin) letter |
** | Zero or more characters (including the file separator) |
\X | "X", even if "X" is a meta character |
Since version 1.7 JAVA provides its own implementation of "glob matching", which adds another (quite uncommon) construct:
Construct | Matches |
---|---|
{alpha,beta,gamma} | Any of "alpha", "beta" or "gamma" |
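To make the constructs above concrete, here is a small sketch that tries a few of them with the glob matcher built into the JDK (`java.nio.file.PathMatcher`, available since Java 7). The class name `JdkGlobDemo` and the helper `matches()` are made up for this illustration, and the file names are arbitrary. The expected results in the comments assume a UNIX-like default file system.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

public class JdkGlobDemo {

    /** Returns whether the JDK's built-in glob matcher accepts the given relative path. */
    static boolean matches(String glob, String path) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return matcher.matches(Paths.get(path));
    }

    public static void main(String[] args) {
        System.out.println(matches("*.txt", "notes.txt"));           // true
        System.out.println(matches("*.txt", "dir/notes.txt"));       // false: "*" stops at the separator
        System.out.println(matches("**.txt", "dir/notes.txt"));      // true: "**" crosses the separator
        System.out.println(matches("execut?r.txt", "executor.txt")); // true
        System.out.println(matches("*.{txt,doc}", "letter.doc"));    // true: brace alternatives
    }
}
```

Note that the JDK matcher works on `Path` objects rather than on plain strings, which is one reason why a string-based implementation can still be useful.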
All in all, globs are very practical, though not very powerful. This is why de.unkrig.commons provides yet another implementation, as follows:
Regular expression features
First, de.unkrig.commons.text.pattern.Pattern2 adds the full power of JAVA regular expressions. It does so by modifying a few characters in the glob before feeding it to java.util.regex.Pattern.compile(), effectively turning that into a glob compiler:
Glob construct | Regex construct | Matches |
---|---|---|
? | [^/\\!] | Any character except the file separator and "!" |
* | [^/\\!]* | Zero or more characters except the file separator and "!" |
** | [^!]* | Zero or more characters except "!" |
*** | .* | Zero or more characters |
. | \. | The dot is a literal (not a character class as in a regular expression) |
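The rewriting in the table above can be pictured with a minimal sketch. This is not the code of Pattern2; the class `GlobToRegexSketch` and its methods are hypothetical, and the sketch deliberately ignores escapes ("\X") and character classes ("[...]"), inside which the wildcard characters would have to be left alone.

```java
import java.util.regex.Pattern;

public class GlobToRegexSketch {

    /**
     * Rewrites only the wildcard constructs from the table and passes every
     * other character through unchanged, so regex syntax such as "(...)" or
     * "{0,1}" stays available. Escapes and character classes are NOT handled.
     */
    static String toRegex(String glob) {
        StringBuilder regex = new StringBuilder();
        int i = 0;
        while (i < glob.length()) {
            if (glob.startsWith("***", i)) {
                regex.append(".*");            // zero or more characters
                i += 3;
            } else if (glob.startsWith("**", i)) {
                regex.append("[^!]*");         // zero or more characters except "!"
                i += 2;
            } else if (glob.charAt(i) == '*') {
                regex.append("[^/\\\\!]*");    // stays within one path element
                i++;
            } else if (glob.charAt(i) == '?') {
                regex.append("[^/\\\\!]");     // exactly one such character
                i++;
            } else if (glob.charAt(i) == '.') {
                regex.append("\\.");           // the dot is a literal
                i++;
            } else {
                regex.append(glob.charAt(i));  // passed through unchanged
                i++;
            }
        }
        return regex.toString();
    }

    /** Compiles the rewritten glob with java.util.regex and matches the whole subject. */
    static boolean matches(String glob, String subject) {
        return Pattern.matches(toRegex(glob), subject);
    }

    public static void main(String[] args) {
        System.out.println(toRegex("*.txt"));                   // [^/\\!]*\.txt
        System.out.println(matches("*.txt", "dir/a.txt"));      // false
        System.out.println(matches("**.txt", "dir/a.txt"));     // true
        System.out.println(matches("(*).docx{0,1}", "a.doc"));  // true
    }
}
```

The last call shows why the pass-through matters: the group "(...)" and the quantifier "{0,1}" reach the regex compiler untouched.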
If you have not yet worked with regular expressions: JAVA regular expressions are very powerful and introduce many constructs and meta characters. The complete reference documentation is the Javadoc of java.util.regex.Pattern.
Since "?" and "*" are not quantifiers as in regular expressions (where "?" means zero-or-one and "*" means zero-or-more), one has to use "{0,1}" and "{0,}" instead. The alternative notation for "." ("any character") is "[^]".
Find the reference documentation here.
Includes-Excludes
Second, it adds includes-excludes. This involves two new meta characters, "," and "~":
Construct | Matches |
---|---|
pattern1,pattern2 | Any string that matches pattern1 or pattern2 |
pattern1~pattern2 | Any string that matches pattern1, but not pattern2 |
~pattern1 | Any string that does not match pattern1 |
pattern1,pattern2~pattern3~pattern4,pattern5 | Any string that matches pattern1 or pattern2, but neither pattern3 nor pattern4, or that matches pattern5 |
This may sound complicated, but the rule is very simple: the patterns are applied right-to-left, and the first match determines the result.
This comes with de.unkrig.commons.text.pattern.Glob.compile() and the new INCLUDES_EXCLUDES compilation flag.
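The right-to-left rule can be sketched in a few lines. The following is an illustration only, not the implementation behind Glob.compile(): the class `IncludesExcludesSketch` and its members are hypothetical, the individual patterns are plain regular expressions instead of globs (and must not contain "," or "~" themselves), and the treatment of a leading "~" as "everything, except ..." is an assumption derived from the "~pattern1" row of the table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class IncludesExcludesSketch {

    /** One element of the chain: a pattern, plus whether it includes or excludes. */
    static final class Rule {
        final boolean include;
        final Pattern pattern;

        Rule(boolean include, String regex) {
            this.include = include;
            this.pattern = Pattern.compile(regex);
        }
    }

    /** Splits "p1,p2~p3" into rules; "," introduces an include, "~" an exclude. */
    static List<Rule> parse(String spec) {
        List<Rule> rules = new ArrayList<>();
        // Assumption: a spec starting with "~" behaves as if preceded by "match everything".
        if (spec.startsWith("~")) rules.add(new Rule(true, ".*"));
        boolean include = true; // the leftmost pattern is an include
        int start = 0;
        for (int i = 0; i <= spec.length(); i++) {
            boolean atEnd = i == spec.length();
            if (atEnd || spec.charAt(i) == ',' || spec.charAt(i) == '~') {
                if (i > start) rules.add(new Rule(include, spec.substring(start, i)));
                if (!atEnd) include = spec.charAt(i) == ',';
                start = i + 1;
            }
        }
        return rules;
    }

    /** Applies the rules right-to-left; the first rule that matches decides the result. */
    static boolean matches(String spec, String subject) {
        List<Rule> rules = parse(spec);
        for (int i = rules.size() - 1; i >= 0; i--) {
            Rule rule = rules.get(i);
            if (rule.pattern.matcher(subject).matches()) return rule.include;
        }
        return false; // no rule matched at all
    }

    public static void main(String[] args) {
        String spec = ".*\\.c,.*\\.h~main\\..*";
        System.out.println(matches(spec, "foo.c"));   // true
        System.out.println(matches(spec, "main.c"));  // false: excluded by "~main\..*"
        System.out.println(matches(spec, "foo.txt")); // false: no include matches
    }
}
```

Because the rightmost rule is checked first, a later include can "win back" strings that an earlier exclude removed, which is exactly what the pattern5 part of the last table row expresses.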
Replacements
Third, it adds replacements. This involves one new meta character, "=":
Construct | Matches | Replaces with |
---|---|---|
*.c=$0.bak | Any string ending with ".c" | The original string plus ".bak" ("$0" represents the "entire match") |
(*).(*)=$1.$2$2 | Any string containing a dot | The original string, with the file name extension doubled |
This comes with the Glob.replace() API.
Notice that "$" is a meta character only in the replacement.
Also notice that when replacements are combined with includes-excludes (see above), each include has its own replacement. Using replacements with excludes does not make any sense.
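How a "pattern=replacement" rule behaves can be approximated with plain java.util.regex, whose replacement strings also use "$0", "$1", and so on. The sketch below is not the Glob.replace() implementation; the class `ReplacementSketch` and its methods are hypothetical, the patterns are regular expressions rather than globs, and every rule is assumed to contain a "=".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplacementSketch {

    /**
     * Applies one "regex=replacement" rule. Returns the rewritten subject if
     * the regex matches the ENTIRE subject, and null otherwise.
     */
    static String replace(String rule, String subject) {
        int eq = rule.indexOf('='); // assumption: the rule contains a "="
        Matcher m = Pattern.compile(rule.substring(0, eq)).matcher(subject);
        if (!m.matches()) return null;
        StringBuffer result = new StringBuffer();
        m.appendReplacement(result, rule.substring(eq + 1)); // expands $0, $1, ...
        return result.toString();
    }

    /** Tries the rules right-to-left; the first matching include supplies its own replacement. */
    static String replaceAny(String[] rules, String subject) {
        for (int i = rules.length - 1; i >= 0; i--) {
            String result = replace(rules[i], subject);
            if (result != null) return result;
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(replace(".*\\.c=$0.bak", "foo.c"));          // foo.c.bak
        System.out.println(replace("(.*)\\.(.*)=$1.$2$2", "foo.txt"));  // foo.txttxt
        String[] rules = { "(.*)\\.c=$1.C", "(.*)\\.h=$1.H" };
        System.out.println(replaceAny(rules, "foo.h"));                 // foo.H
        System.out.println(replaceAny(rules, "foo.cpp"));               // null
    }
}
```

The "=" and "$" characters are only special because the sketch splits the rule itself; the regex to the left of "=" never sees them as replacement syntax.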
Conclusion
All these features can be combined mercilessly, e.g.:
    *=$0$0~*.bak
    (*).docx{0,1}=$1.txt~*blabla*
Using it is simple:
    import de.unkrig.commons.text.pattern.Glob;
    import de.unkrig.commons.text.pattern.Pattern2;

    // "(*)" captures the base name, so that "$1" can refer to it in the replacement.
    Glob glob = Glob.compile(
        "(*).c=$1.C,(*).h=$1.H",
        Pattern2.WILDCARD | Glob.INCLUDES_EXCLUDES | Glob.REPLACEMENT
    );

    glob.match("foo.c");     // returns true
    glob.replace("foo.h");   // returns "foo.H"
    glob.replace("foo.cpp"); // returns null
Remember: The Pattern2.WILDCARD flag modifies the regex compilation so that it understands the wildcard characters. Glob.INCLUDES_EXCLUDES activates the recognition of "," and "~". Finally, Glob.REPLACEMENT activates the recognition of "=".