Using Globs

From unkrig.de

Latest revision as of 19:27, 19 July 2015

Introduction

Surely you have used globs before (maybe even without knowing that they are called "globs"):

*.txt
dir/*.doc
execut?r.txt

Globs are a widespread concept, used throughout the UNIX, Microsoft and other "worlds".

All implementations support at least the following elements:

Construct    Matches
?            Any character except the file separator ("/" and/or "\")
*            Zero or more characters except the file separator
x            The character x

Some implementations add more features:

Construct           Matches
[abc]               Any of the characters "a", "b" or "c"
[^abc] or [!abc]    Any character except "a", "b" and "c"
[A-Za-z]            Any (Latin) letter
**                  Zero or more characters (including the file separator)
\X                  "X", even if "X" is a meta character

Since version 1.7, Java provides its own implementation of "glob matching", which adds another (quite uncommon) construct:

Construct             Matches
{alpha,beta,gamma}    Any of "alpha", "beta" or "gamma"
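
To try the JDK's built-in glob matching, a minimal sketch along the following lines may help. It uses the standard java.nio.file.PathMatcher API; the class name JdkGlobDemo and the helper matches() are made up for this illustration, and the path examples assume "/" is accepted as a file separator.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

class JdkGlobDemo {

    /** Hypothetical helper: matches a path string against a JDK glob. */
    static boolean matches(String glob, String path) {
        // The "glob:" prefix selects glob syntax (as opposed to "regex:").
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return matcher.matches(Paths.get(path));
    }

    public static void main(String[] args) {
        System.out.println(matches("*.{txt,doc}", "notes.txt"));  // alternatives in braces
        System.out.println(matches("*.txt", "dir/notes.txt"));    // "*" stops at the separator
        System.out.println(matches("**.txt", "dir/notes.txt"));   // "**" crosses the separator
    }
}
```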

All in all, globs are very practical, though not very powerful. This is why de.unkrig.commons provides yet another implementation, as follows:

Regular expression features

First, de.unkrig.commons.text.pattern.Pattern2 adds the full power of Java regular expressions. It does so by rewriting a few characters in the glob before feeding it to java.util.regex.Pattern.compile(), which effectively turns that method into a glob compiler:

Glob construct    Regex construct    Matches
?                 [^/\\!]            Any character except the file separator and "!"
*                 [^/\\!]*           Zero or more characters except the file separator and "!"
**                [^!]*              Zero or more characters except "!"
***               .*                 Zero or more characters
.                 \.                 A literal dot (not "any character" as in a regular expression)

If you have not yet worked with regular expressions: Java regular expressions are very powerful and introduce many constructs and meta characters. The complete reference is the Javadoc of java.util.regex.Pattern.

Since "?" and "*" are not quantifiers as in regular expressions (where "?" means zero-or-one and "*" means zero-or-more), one has to use "{0,1}" and "{0,}" instead. The alternative notation for "." ("any character") is "[^]".
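
The rewriting described in this section can be sketched with plain java.util.regex. This is not the Pattern2 source code, only an illustration of the idea under simplifying assumptions: the class GlobToRegexSketch and its methods are hypothetical, character classes such as "[*.]" are not treated specially, and the "[^]" notation is not handled.

```java
import java.util.regex.Pattern;

class GlobToRegexSketch {

    /**
     * Rewrites the wildcard constructs into regex syntax. All other
     * characters pass through unchanged, so they keep their regex meaning.
     */
    static String toRegex(String glob) {
        StringBuilder regex = new StringBuilder();
        int i = 0;
        while (i < glob.length()) {
            char c = glob.charAt(i);
            if (glob.startsWith("***", i)) {
                regex.append(".*");          // zero or more characters
                i += 3;
            } else if (glob.startsWith("**", i)) {
                regex.append("[^!]*");       // zero or more characters except "!"
                i += 2;
            } else if (c == '*') {
                regex.append("[^/\\\\!]*");  // ... except the file separator and "!"
                i++;
            } else if (c == '?') {
                regex.append("[^/\\\\!]");   // exactly one such character
                i++;
            } else if (c == '.') {
                regex.append("\\.");         // the dot is a literal
                i++;
            } else if (c == '\\' && i + 1 < glob.length()) {
                // Keep escape sequences such as "\*" intact.
                regex.append(c).append(glob.charAt(i + 1));
                i += 2;
            } else {
                regex.append(c);
                i++;
            }
        }
        return regex.toString();
    }

    /** Hypothetical helper: the subject must match the translated glob entirely. */
    static boolean matches(String glob, String subject) {
        return Pattern.matches(toRegex(glob), subject);
    }

    public static void main(String[] args) {
        System.out.println(toRegex("*.txt"));                        // prints [^/\\!]*\.txt
        System.out.println(matches("*.doc", "dir/letter.doc"));      // prints false
        System.out.println(matches("(*).docx{0,1}", "letter.docx")); // prints true
    }
}
```

Because everything except the wildcards is passed through, regex constructs such as groups and the "{0,1}" quantifier keep working inside the glob.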

For details, see the reference documentation of de.unkrig.commons.text.pattern.Pattern2.

Includes-Excludes

Second, it adds includes-excludes. This involves two new meta characters, "," and "~":

Construct                                       Matches
pattern1,pattern2                               Any string that matches pattern1 or pattern2
pattern1~pattern2                               Any string that matches pattern1, but not pattern2
~pattern1                                       Any string that does not match pattern1
pattern1,pattern2~pattern3~pattern4,pattern5    Any string that matches pattern1 or pattern2 but neither pattern3 nor pattern4, or that matches pattern5

This may sound complicated, but the very simple rule is: The patterns are applied right-to-left, and the first match determines the result.
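
The right-to-left rule can be sketched in a few lines of plain Java. This is an illustration, not the de.unkrig.commons implementation: the class IncludesExcludesSketch and its methods are hypothetical, the sub-patterns support only "*", the specification is split naively at every "," and "~" (so regex quantifiers like "{0,1}" are not supported here), and the handling of a leading "~" is an assumption derived from the "~pattern1" row of the table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class IncludesExcludesSketch {

    /** Hypothetical helper: only "*" is a wildcard, everything else is literal. */
    static Pattern wildcard(String glob) {
        StringBuilder regex = new StringBuilder();
        String[] literals = glob.split("\\*", -1);
        for (int i = 0; i < literals.length; i++) {
            if (i > 0) regex.append("[^/\\\\]*");
            regex.append(Pattern.quote(literals[i]));
        }
        return Pattern.compile(regex.toString());
    }

    static boolean matches(String spec, String subject) {
        // Split the spec into patterns; "," introduces an include, "~" an exclude.
        List<String> patterns = new ArrayList<>();
        List<Boolean> includes = new ArrayList<>();
        boolean include = true;
        int start = 0;
        for (int i = 0; i <= spec.length(); i++) {
            if (i == spec.length() || spec.charAt(i) == ',' || spec.charAt(i) == '~') {
                if (i > start) {
                    patterns.add(spec.substring(start, i));
                    includes.add(include);
                }
                if (i < spec.length()) include = spec.charAt(i) == ',';
                start = i + 1;
            }
        }

        // Apply the patterns right-to-left; the first match determines the result.
        for (int i = patterns.size() - 1; i >= 0; i--) {
            if (wildcard(patterns.get(i)).matcher(subject).matches()) {
                return includes.get(i);
            }
        }

        // Assumption: if nothing matched, the subject is accepted only when
        // the spec starts with an exclude ("~pattern1").
        return !includes.isEmpty() && !includes.get(0);
    }

    public static void main(String[] args) {
        System.out.println(matches("*.c,*.h~foo.*", "bar.c")); // prints true
        System.out.println(matches("*.c,*.h~foo.*", "foo.c")); // prints false
        System.out.println(matches("~*.bak", "a.txt"));        // prints true
    }
}
```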

This comes with de.unkrig.commons.text.pattern.Glob.compile() and the new INCLUDES_EXCLUDES compilation flag.

Replacements

Third, it adds replacements. This involves one new meta character, "=":

Construct          Matches                        Replaces with
*.c=$0.bak         Any string ending with ".c"    The original string plus ".bak" ("$0" represents the "entire match")
(*).(*)=$1.$2$2    Any string containing a dot    The original string, with the file name extension doubled

This comes with the Glob.replace() API.

Notice that "$" is a meta character only in the replacement.

Also notice that when replacements are combined with includes-excludes (see above), each include has its own replacement. Attaching a replacement to an exclude does not make sense.
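
The "$0", "$1", "$2" references behave like group references in java.util.regex replacement strings. The following sketch shows the two table rows at the regex level, with the globs already translated by hand into the regex constructs listed earlier. The class ReplacementSketch and its replace() method are hypothetical and are not the Glob.replace() API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplacementSketch {

    /**
     * Returns the expanded replacement if the regex matches the entire
     * subject, or null if it does not match.
     */
    static String replace(String regex, String replacement, String subject) {
        Matcher m = Pattern.compile(regex).matcher(subject);
        if (!m.matches()) return null;

        // The match spans the whole subject, so the expanded replacement
        // ("$0" = entire match, "$1", "$2" = groups) is the complete result.
        StringBuffer result = new StringBuffer();
        m.appendReplacement(result, replacement);
        return result.toString();
    }

    public static void main(String[] args) {
        // "*.c=$0.bak"
        System.out.println(replace("[^/\\\\!]*\\.c", "$0.bak", "foo.c"));   // prints foo.c.bak
        // "(*).(*)=$1.$2$2"
        System.out.println(replace("([^/\\\\!]*)\\.([^/\\\\!]*)", "$1.$2$2", "foo.c")); // prints foo.cc
        // No match
        System.out.println(replace("[^/\\\\!]*\\.c", "$0.bak", "foo.cpp")); // prints null
    }
}
```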

Conclusion

All these features can be combined mercilessly, e.g.:

*=$0$0~*.bak
(*).docx{0,1}=$1.txt~*blabla*

Using it is simple:

import de.unkrig.commons.text.pattern.Glob;
import de.unkrig.commons.text.pattern.Pattern2;

Glob glob = Glob.compile("*.c=$0.C,*.h=$0.H", Pattern2.WILDCARD | Glob.INCLUDES_EXCLUDES | Glob.REPLACEMENT);
glob.match("foo.c");     // returns true
glob.replace("foo.h");   // returns "foo.h.H" ("$0" is the entire match)
glob.replace("foo.cpp"); // returns null

Remember: Pattern2.WILDCARD changes the regex compilation so that it understands wildcard characters. Glob.INCLUDES_EXCLUDES activates the recognition of "," and "~". Finally, Glob.REPLACEMENT activates the recognition of "=".
