Javaranch rocks at the Software Development Conference

Kathy Sierra, April 2003

Imagine a huge ballroom filled to capacity with developers (including more than a handful of alpha geeks and techno-luminaries). Two projection screens, each as large as a two-story building, are at the front. A hush falls over the crowd as the finalists for the Jolt Cola Award in the Web Sites category are displayed. IBM, BEA, Microsoft, Javaranch, ... what? Javaranch? Up against the likes of the big corporate developer sites? Yes, there was the name Javaranch in a font that had to be at least 7000 points. There was a drum roll... (no, seriously, there really *was* a drum roll) and the winners of the Software Development Productivity Awards were announced. Javaranch is a winner! The crowd goes wild.

So we didn't win the coveted Jolt Cola award, but what's that about anyway? Javaranch is all about PRODUCTIVITY, not beverages with an illegal amount of caffeine. Most importantly, Javaranch was a finalist in a category of huge, big-name, big-budget developer sites. And here we are, 100% volunteer.

Paul couldn't make it, so I was there to accept the award for him, and if we'd won the Jolt award, I would have had to give a short acceptance speech. Here's what I would have said:

"Javaranch is an all-volunteer site. There's no company, no budget, no "Press three for worldwide marketing" on the phone system. There's no phone. The value of Javaranch is the hundreds of people who run the site, man the site, and moderate the forums. The value of Javaranch is in the hundreds of thousands who participate, asking and answering questions to help one another out. This award is for all those individuals, and to Java -- for being a language that inspires that much passion. You'll never see this much enthusiasm at a .netRanch."
(OK, I wouldn't have said that last thing, but I would have been thinking it strongly enough to telepathically convey it to at least the entire front row of the audience.)

I was intensely proud to be there representing Javaranch.
Win the hottest new Java teaching book on the block

Head First Java

Kathy Sierra and Bert Bates have kindly donated a signed, yes signed, copy of their latest book, Head First Java, to the first person who posts the correct solution to the JavaCross crossword from chapter 10 of the book. To be in the running, you must post your attempt in the JavaCross thread in the Java in General (beginner) forum here on the JavaRanch. I, Johannes de Jong, will start the thread on April the 3rd, as close to 07:00 am server time as possible. So set your alarm or try to stay awake, depending on where in the world you are, because the post time in the thread will determine the winner. If you edit your reply to change your attempt, the entry becomes invalid. If you want to change your attempt, post a new one. Good luck!
Small and Simple Web Applications - the Friki Way (Part 2)

Frank Carver, March 2003

Abstract

This article is the second of a series which laments the bloated and unmaintainable state of so many J2EE web applications and looks at ways to keep web applications small, simple and flexible. The series uses the author's Friki software as a case study, and discusses ways to design and build a powerful, flexible and usable web application in less space than a typical gif image. This article continues the design process begun last time, and starts the serious work of coding a solution. On the way it discusses some techniques to help make software development more trustworthy and less stressful.

Introduction

If you've read the first article, you should be aware that the aim of this project is to develop a small, simple and understandable "Wiki" (editable web site) application. We've considered and decided to defer the relatively heavy decision of how to store and retrieve pages by introducing a Java interface, and thought through some ways of generating the HTML for each displayed page. Before we really get started, there's one vital question which needs to be answered. It's such an important question that it's often forgotten until it's too late!

How will we know when we're done?
I recommend that this question be asked and answered at the start of every software development project. I'm not suggesting that anyone build complex and detailed plans (also known as "guesses" or "wishes") before starting work. I'm not suggesting that some all-powerful "architect" lay down a complete design before starting the real work. I'm not suggesting that everything should be described in UML or pseudo code. Almost the opposite.

"How will we know when we're done?" is a simple plea to whoever wants the work done. A development team needs to know when to stop developing. Everyone involved in designing and coding a solution needs to have an idea of the end goal if they are to make sensible decisions along the way.

The answers to this question can vary enormously, but beware of "answers" which don't actually answer the question: "stop when I tell you to stop" might as well be "don't bother doing any work, it won't make any difference"; "stop when the customer is happy" gives no guidance on what makes the customer happy - why not just buy him or her a beer and go home?

Useful answers to this question include things like "when it does this and this and this", "when average response time is less than 5 seconds under peak load" and so on. What these answers have in common is that they are measurable and testable. You could (theoretically, at least) make a test, and when it passes you can stop developing.

So let's ask our "virtual customer" this hard but very useful question: how will we know when we are done? For the purposes of this article, our customer says we will be "done" when we have a Java web application in which:
Just one final reminder. The above points are our complete "acceptance criteria". Any solution which meets these goals is a valid one. We must free our minds from imagining any "requirements" which have not been asked for.

What do we write first?

So we know what we have to do, but we don't know where or how to start. Strangely enough, I'm not going to start with design. I'm not even going to start with coding a solution. I'm going to start with a test! I always start with a test to make sure I can compile and run something. Without that, there's not much point putting in a lot of effort to write any code. In this case, I'll use the JUnit test framework, which I've found very handy over the years. If you don't already have a recent version of JUnit, please download it from the above link before proceeding.

AllTests.java

    package tests;

    import junit.framework.*;

    public class AllTests extends TestCase {
        public static TestSuite suite() {
            TestSuite ret = new TestSuite();
            ret.addTest(new TestSuite(EmptyTest.class));
            return ret;
        }
    }

If we try to compile this, it should fail. It should either complain about the absence of junit.framework in the import statement (which hints that the file "junit.jar" from the JUnit distribution needs to be in the classpath) or complain about the absence of the tests.EmptyTest class (which is fine: we haven't written it yet!). If it compiles without error, you have either tried to compile the wrong file, or you already have both JUnit and a class tests.EmptyTest in your classpath, and you'll need to sort that out before we progress any further.

The most important thing to take away from this is that a failing test has given us a lot of information and confidence about the state of our system. The EmptyTest class still doesn't exist, so we can proceed and write it.

EmptyTest.java

    package tests;

    import junit.framework.*;

    public class EmptyTest extends TestCase {
        public void testEmpty() {
        }
    }

Now try to compile these two classes.
They should compile OK. We've passed our first test! Of course our "system" doesn't do much. There is some test code, but no actual product, but we know that we can compile some real Java code, including all that stuff about jar files and classpaths. And we have the start of a test "scaffolding" to help us build the real code. The next tiny step is to run the test code. Type:
You should see a nice graphical dialog with a green bar indicating that 1 test has been run, with no errors and no failures. Excellent. We are now ready to write a real test for a real feature.

A first feature

Last session I promised that we'd get started on our template system, so let's begin with that. Just as above, the process is to start with a test. Even with a tiny addition to some existing software, the most important question is still "How will we know when we're done?". Simple is good, so let's start simple - as simple as we can. Let's test that a template with no characters in it "expands" to a template with no characters in it. How about something like:

TemplateTest.java

    package tests;

    import junit.framework.*;

    public class TemplateTest extends TestCase {
        public void testEmptyTemplate() {
            TemplateEngine engine = new TemplateEngine();
            assertEquals("", engine.expand(""));
        }
    }

Then add the line

    ret.addTest(new TestSuite(TemplateTest.class));

to AllTests.java, next to the similar line for "EmptyTest".

Trying to compile our system again should now tell us that there is no TemplateEngine. If you look carefully, though, it's actually complaining that there is no class "tests.TemplateEngine". To fix this we can make two changes. Add the line

    import friki.TemplateEngine;

to TemplateTest.java and create a new class - the first of our real code:

TemplateEngine.java

    package friki;

    public class TemplateEngine {
        public String expand(String input) {
            return input;
        }
    }

Hold on. Isn't that cheating? That method will never "expand" anything! Pah! This is very important. Look back to where I said "We must free our minds from imagining any "requirements" which have not been asked for". The code we have written passes all our tests. If you think it should do more, you are guessing. Worse than that, there is a much more important problem with that code which we'll see in a minute. But first, if we want more code, we have to know when to stop coding. So we need more tests.
So let's think about an actual substitution. Say we want to substitute all occurrences of "~name~" with "Frank":

TemplateTest.java

    package tests;

    import junit.framework.*;
    import friki.TemplateEngine;

    public class TemplateTest extends TestCase {
        public void testEmptyTemplate() {
            TemplateEngine engine = new TemplateEngine();
            assertEquals("", engine.expand(""));
        }

        public void testSingleToken() {
            TemplateEngine engine = new TemplateEngine();
            assertEquals("Frank", engine.expand("~name~"));
        }
    }

We run the tests. They fail. Our TemplateEngine gives back "~name~", which is obviously not the same as "Frank". So let's fix the code:

TemplateEngine.java

    package friki;

    import java.text.CharacterIterator;
    import java.text.StringCharacterIterator;

    public class TemplateEngine {
        public String expand(String input) {
            boolean inToken = false;
            StringBuffer token = new StringBuffer();
            StringBuffer ret = new StringBuffer();

            CharacterIterator it = new StringCharacterIterator(input);
            for (char c = it.first(); c != CharacterIterator.DONE; c = it.next()) {
                if (c == '~') {
                    if (inToken) {
                        ret.append("Frank");
                        token.setLength(0);
                    }
                    inToken = !inToken;
                } else {
                    if (inToken) {
                        token.append(c);
                    } else {
                        ret.append(c);
                    }
                }
            }

            return ret.toString();
        }
    }

There are a few things to note about this code:
It's getting better, but still not complete. Yes, we need more tests. So let's see what happens if we ask for something else. Add the following to TemplateTest, and run it again:

    public void testDifferentToken() {
        TemplateEngine engine = new TemplateEngine();
        assertEquals("Margaret", engine.expand("~wife~"));
    }

The test fails, of course. Apparently I'm married to myself. Much though I like the name Frank, I think it's unfair to return it as the value of every token. But that "Frank" is hard coded. If we want to return "Frank" for one token, and "Margaret" for another, we have to get the names and values from somewhere. Sounds like a Map to me:

TemplateEngine.java

    package friki;

    import java.text.CharacterIterator;
    import java.text.StringCharacterIterator;
    import java.util.Map;
    import java.util.HashMap;

    public class TemplateEngine {
        private Map values;

        public TemplateEngine() {
            values = new HashMap();
            values.put("name", "Frank");
            values.put("wife", "Margaret");
        }

        public String expand(String input) {
            boolean inToken = false;
            StringBuffer token = new StringBuffer();
            StringBuffer ret = new StringBuffer();

            CharacterIterator it = new StringCharacterIterator(input);
            for (char c = it.first(); c != CharacterIterator.DONE; c = it.next()) {
                if (c == '~') {
                    if (inToken) {
                        ret.append(values.get(token.toString()));
                        token.setLength(0);
                    }
                    inToken = !inToken;
                } else {
                    if (inToken) {
                        token.append(c);
                    } else {
                        ret.append(c);
                    }
                }
            }

            return ret.toString();
        }
    }

Great. The basic template expansion works, but it still feels a bit clumsy to me. I don't really like the way the values are built in to the class itself. Can we make it simpler and more flexible? I reckon so.
Let's change our tests a little:

TemplateTest.java

    package tests;

    import java.util.Map;
    import java.util.HashMap;
    import junit.framework.*;
    import friki.TemplateEngine;

    public class TemplateTest extends TestCase {
        Map values;

        public void setUp() {
            values = new HashMap();
            values.put("name", "Frank");
            values.put("wife", "Margaret");
        }

        public void testEmptyTemplate() {
            TemplateEngine engine = new TemplateEngine(values);
            assertEquals("", engine.expand(""));
        }

        public void testSingleToken() {
            TemplateEngine engine = new TemplateEngine(values);
            assertEquals("Frank", engine.expand("~name~"));
        }

        public void testDifferentToken() {
            TemplateEngine engine = new TemplateEngine(values);
            assertEquals("Margaret", engine.expand("~wife~"));
        }
    }

That's better. We now have much more control over how the template expander works. We can give it whatever names and values we want to use, and expect it to fill them in to a supplied template. The "setUp" method is run just before each of the test methods, and makes this test class into what is known as a "fixture".

By the way, have you noticed that there is a lot of duplication in this code? The code to create a new TemplateEngine is exactly the same in all the test cases.
In the spirit of keeping things as small and simple as possible, let's move it into the setUp method as well:

TemplateTest.java

    package tests;

    import java.util.Map;
    import java.util.HashMap;
    import junit.framework.*;
    import friki.TemplateEngine;

    public class TemplateTest extends TestCase {
        Map values;
        TemplateEngine engine;

        public void setUp() {
            values = new HashMap();
            values.put("name", "Frank");
            values.put("wife", "Margaret");
            engine = new TemplateEngine(values);
        }

        public void testEmptyTemplate() {
            assertEquals("", engine.expand(""));
        }

        public void testSingleToken() {
            assertEquals("Frank", engine.expand("~name~"));
        }

        public void testDifferentToken() {
            assertEquals("Margaret", engine.expand("~wife~"));
        }
    }

Of course, if we try to run this test it will fail (it won't even compile!). So the TemplateEngine code needs to be brought in line with our new design. Notice how at each stage what we need drives the tests, then the tests drive the implementation.

TemplateEngine.java

    package friki;

    import java.text.CharacterIterator;
    import java.text.StringCharacterIterator;
    import java.util.Map;

    public class TemplateEngine {
        private Map values;

        public TemplateEngine(Map values) {
            this.values = values;
        }

        public String expand(String input) {
            boolean inToken = false;
            StringBuffer token = new StringBuffer();
            StringBuffer ret = new StringBuffer();

            CharacterIterator it = new StringCharacterIterator(input);
            for (char c = it.first(); c != CharacterIterator.DONE; c = it.next()) {
                if (c == '~') {
                    if (inToken) {
                        ret.append(values.get(token.toString()));
                        token.setLength(0);
                    }
                    inToken = !inToken;
                } else {
                    if (inToken) {
                        token.append(c);
                    } else {
                        ret.append(c);
                    }
                }
            }

            return ret.toString();
        }
    }

And that is the "heart" of the templating system done.
To make double sure it's what we wanted at the start, let's test the example template from last session:

    public void testExamplePage() {
        values.put("title", "PageOne");
        values.put("content", "This is the first page in our new Wiki");
        assertEquals(
            " <html><head><title>Friki: PageOne</title><head>\n" +
            " <body>\n" +
            " <h2>PageOne</h2>\n" +
            " This is the first page in our new Wiki\n" +
            " <hr width='100%'>\n" +
            " <a href='edit?PageOne'>edit this page</a>\n" +
            " </body>\n" +
            " </html>\n",
            engine.expand(
                " <html><head><title>Friki: ~title~</title><head>\n" +
                " <body>\n" +
                " <h2>~title~</h2>\n" +
                " ~content~\n" +
                " <hr width='100%'>\n" +
                " <a href='edit?~title~'>edit this page</a>\n" +
                " </body>\n" +
                " </html>\n"));
    }

5 tests passed. Cool.

Although the implementation given here works as much as we need it to (as shown by that last test), you may want to think about what it can't do. What happens if we want to include a '~' character in our template? What happens if we ask it to expand a token it doesn't have a value for? If you are worried about these sorts of questions, ask them in the form of a test, then "fix" the code. But remember - every time you add a feature that's not needed by the application right now, you are making the final program bigger, you are making bugs harder to find, you are making the code harder to read. So think. Do you really need that feature yet?

How are We Doing?

We still haven't made a Wiki yet! We have written, compiled, and run some real code to make sure we have a workable build environment. We have a useful "template expander" in a few lines of code which we can use for this project, but might also be handy in others. We have built a complete regression test suite which automatically tests every class and method in our system, so if anything breaks we'll know straight away. We can go home confident that we are making real, measurable, repeatable progress, and come back fresh and ready next time.
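The two open questions raised above (an unknown token and a stray '~') can be answered by simply running the code. The following is a minimal sketch, not part of the Friki code base: the class name TemplateEdgeCases is made up for illustration, and the expand logic is copied from the article's TemplateEngine into a static method so that the example stands alone without JUnit.

```java
import java.text.CharacterIterator;
import java.text.StringCharacterIterator;
import java.util.HashMap;
import java.util.Map;

public class TemplateEdgeCases {

    // Same algorithm as the article's TemplateEngine.expand, copied here
    // as a static method so that the sketch is self-contained.
    public static String expand(String input, Map<String, String> values) {
        boolean inToken = false;
        StringBuffer token = new StringBuffer();
        StringBuffer ret = new StringBuffer();
        CharacterIterator it = new StringCharacterIterator(input);
        for (char c = it.first(); c != CharacterIterator.DONE; c = it.next()) {
            if (c == '~') {
                if (inToken) {
                    ret.append(values.get(token.toString()));
                    token.setLength(0);
                }
                inToken = !inToken;
            } else if (inToken) {
                token.append(c);
            } else {
                ret.append(c);
            }
        }
        return ret.toString();
    }

    public static void main(String[] args) {
        Map<String, String> values = new HashMap<String, String>();
        values.put("name", "Frank");

        // Unknown token: Map.get returns null, and StringBuffer.append
        // renders a null String as the four characters "null".
        System.out.println(expand("Hello ~nobody~!", values)); // prints "Hello null!"

        // A lone '~' opens a token that is never closed, so the rest of
        // the input is collected as a token name and silently dropped.
        System.out.println(expand("approx. ~5 items", values)); // prints "approx. "
    }
}
```

Whether either behaviour is a bug depends entirely on the acceptance criteria; turning one of these calls into a failing test is the way to "ask the question" before changing the engine.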
Next session we'll add more customer features to our Wiki "engine" and look into automating the process of compiling and testing the code even more. In the meantime, I recommend reading more about JUnit. If you want to get ahead of the game, you could also look at the HTTPUnit web testing toolkit and the Ant build tool. If you want to read more about keeping the enjoyment in software development, check out my golden rules of stress-free programming.
An Introduction to
    String input =
        "Fee! Fie! Foe! Fum! " +
        "I smell the blood of an Englishman. " +
        "Be he 'live, or be he dead, " +
        "I'll grind his bones to make my bread." ;

    Pattern pattern = Pattern.compile( "(F[a-z]{2}! ){4}" );
    // Matches four occurrences of a pattern that begins
    // with "F" followed by two lower case letters, a "!"
    // and a space.

    Matcher matcher = pattern.matcher( input );

    System.out.println( matcher.find() );   // Prints true.
    System.out.println( matcher.group() );  // Prints "Fee! Fie! Foe! Fum! ".
Capturing groups are numbered according to their appearance in the regular expression. The first opening parenthesis is the start of the first capturing group; the second opening parenthesis is the start of the second capturing group; and so on. Each capturing group ends at the matching closing parenthesis. It is possible to have one capturing group embedded in another. So, the pattern "I (am (Sam))" has two capturing groups. The first capturing group is the pattern "am Sam" and the second capturing group is the pattern "Sam".
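The numbering rule for nested groups can be checked with a short sketch. The class name NestedGroups, the helper method capture and the sample input are made up for illustration; the only library calls used are Pattern.compile, Matcher.find and Matcher.group( int ), the last of which is described in the next paragraph.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NestedGroups {

    // Returns the text captured by the given group number, or null
    // if the pattern is not found anywhere in the input.
    public static String capture(String regex, String input, int group) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.find() ? matcher.group(group) : null;
    }

    public static void main(String[] args) {
        String regex = "I (am (Sam))";
        String input = "I am Sam. Sam I am.";

        // Group 1 starts at the first opening parenthesis.
        System.out.println(capture(regex, input, 1)); // prints "am Sam"
        // Group 2 starts at the second opening parenthesis.
        System.out.println(capture(regex, input, 2)); // prints "Sam"
    }
}
```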
The capturing group count and corresponding matched subsequence data are maintained in the Matcher object. A Matcher object's String group( int ) method "returns the input subsequence captured by the given group during the previous match operation." A Matcher object's int groupCount() method "returns the number of capturing groups in this matcher's pattern" (note 5).

Group number zero refers to the entire pattern match, so matcher.group( 0 ) returns the entire previously matched subsequence and is equivalent to matcher.group(). Note that capturing group number zero is not included in the total group count returned by the groupCount() method (note 7).
Consider this mildly more involved example demonstrating capturing groups. Note that the group construct limits the scope of the OR operator.
    input =
        "Humpty Dumpty sat on a wall. " +
        "Humpty Dumpty had a great fall. " +
        "All the king's horses and all the king's men " +
        "Couldn't put Humpty together again! " ;

    pattern = Pattern.compile( "((H|D)(umpty) ){2}" );
    // Matches six characters ending in "umpty" and
    // beginning with "H" or "D". Three capturing
    // groups are defined and remembered by the Matcher.

    matcher = pattern.matcher( input );

    System.out.println( matcher.find() );        // Prints true.
    System.out.println( matcher.groupCount() );  // Prints 3.
    System.out.println( matcher.group( 1 ) );    // Prints "Dumpty ".
    System.out.println( matcher.group( 2 ) );    // Prints "D".
    System.out.println( matcher.group( 3 ) );    // Prints "umpty".
    System.out.println( matcher.group( 0 ) );    // Prints "Humpty Dumpty ".

    // If it was expected that matcher.group( 1 ) should contain
    // "Humpty", then remember that the group( int ) method
    // returns the input subsequence captured by the specified
    // group during the previous match operation. This match
    // operation was performed two times - the first time matching
    // "Humpty" and the second time matching "Dumpty".
Each matched group maintained in the Matcher object is called a "back reference". Referencing a matched group as demonstrated above is one style of back referencing in Java regular expressions. A later lesson will introduce another style and use of back referencing.

A slight performance cost is associated with maintaining back references (the group count and matched subsequence data) in the Matcher object. The non-capturing group construct provides the function of grouping pattern elements without the cost of remembering each matched group as a back reference. The syntax for a non-capturing group is simply "(?:X)". A non-capturing group functions much like a capturing group, with the distinction that no capturing-group-specific data is maintained in the Matcher.
    input =
        "Humpty Dumpty sat on a wall. " +
        "Humpty Dumpty had a great fall. " +
        "All the king's horses and all the king's men " +
        "Couldn't put Humpty together again! " ;

    pattern = Pattern.compile( "((?:H|D)(?:umpty) ){2}" );
    // Matches six characters ending in "umpty" and
    // beginning with "H" or "D". Three groups
    // are defined, one is a capturing group that
    // will be remembered by the Matcher.

    matcher = pattern.matcher( input );

    System.out.println( matcher.find() );        // Prints true.
    System.out.println( matcher.groupCount() );  // Prints 1.
    System.out.println( matcher.group( 1 ) );    // Prints "Dumpty ".
    System.out.println( matcher.group( 0 ) );    // Prints "Humpty Dumpty ".
According to J-Sprint's memory profiler, when the previous two code examples (the searches for Humpty Dumpty) were performed one hundred thousand times each, the non-capturing group strategy demonstrated a performance improvement of roughly 0.0003% - not much to write home about. In contrast, tests against a larger input character sequence (50KB), composed of one hundred groups per match, resulted in the non-capturing group test consuming approximately 40% as much memory as the capturing group test.
Java regular expressions provide two "look ahead" constructs. These constructs allow the description of a pattern where a specified pattern only matches if it is followed by the pattern described in the look ahead construct. The pattern described in the look ahead construct is not part of any matched subsequence described by the Matcher object - it is only a requirement that must be met in order for the specified pattern to match. Though the look ahead construct is contained within matching opening and closing parentheses, it is a non-capturing group construct.
    input = "Today's specials are apple chocolate pie and cherry banana pie." ;

    pattern = Pattern.compile( "(apple|cherry)(?= chocolate)" );
    // Matches "apple" or "cherry" where the following pattern
    // matches " chocolate". " chocolate" is not a part of the
    // resulting match, it follows it.

    matcher = pattern.matcher( input );

    System.out.println( matcher.find() );        // Prints true.
    System.out.println( matcher.groupCount() );  // Prints 1.
    System.out.println( matcher.group( 1 ) );    // Prints "apple".
    System.out.println( matcher.group() );       // Prints "apple".

    pattern = Pattern.compile( "(apple|cherry)(?! chocolate)" );
    // Matches "apple" or "cherry" where the following pattern
    // does not match " chocolate".

    matcher = pattern.matcher( input );

    System.out.println( matcher.find() );        // Prints true.
    System.out.println( matcher.groupCount() );  // Prints 1.
    System.out.println( matcher.group( 1 ) );    // Prints "cherry".
    System.out.println( matcher.group() );       // Prints "cherry".
Two "look behind" constructs provide a similar function as the look ahead constructs, the distinction being that the look behind constructs try to match whatever precedes a specified pattern. The pattern described in the look behind construct is not part of any matched subsequence described by the Matcher object - it is only a requirement that must be met in order for the specified pattern to match. The look behind construct is also non-capturing.
    input = "Tomorrow's special is fried bananas with baked clam." ;

    pattern = Pattern.compile( "(?<=fried )(bananas|clam)" );
    // Matches "bananas" or "clam" if preceded by "fried ".
    // "fried " is not part of the resulting match, it precedes it.

    matcher = pattern.matcher( input );

    System.out.println( matcher.find() );        // Prints true.
    System.out.println( matcher.groupCount() );  // Prints 1.
    System.out.println( matcher.group( 1 ) );    // Prints "bananas".
    System.out.println( matcher.group() );       // Prints "bananas".

    pattern = Pattern.compile( "(?<!fried )(bananas|clam)" );
    // Matches "bananas" or "clam" if not preceded by "fried ".

    matcher = pattern.matcher( input );

    System.out.println( matcher.find() );        // Prints true.
    System.out.println( matcher.groupCount() );  // Prints 1.
    System.out.println( matcher.group( 1 ) );    // Prints "clam".
    System.out.println( matcher.group() );       // Prints "clam".
During the critiquing process of this lesson, Jim Yingst pointed out an important issue concerning look ahead and look behind constructs:
Normally when we use find() repeatedly to match a pattern several times within a given input, each find() "consumes" characters in the input string up to the last character matched by the find. This means that a subsequent find() will not normally "see" any characters which were already matched by previous finds. Lookahead and lookbehind are workarounds for this limitation. We can look ahead to match characters without consuming them, or we can look back to match characters which were already consumed.
This non-consumptive behavior of the look ahead and look behind constructs is also known as zero-width matching.
Consider this related example, also suggested by Jim (and adapted to fit a nursery rhyme).
    input =
        "John Jacob Jingleheimer Schmidt " +
        "His name is my name, too! " +
        "Whenever we go out, " +
        "The people always shout " +
        "There goes John Jacob Jingleheimer Schmidt!" ;

    pattern = Pattern.compile( "(J\\w+)(?=.+Schmidt )" );
    // Matches all words starting with "J" that
    // precede "Schmidt " (note the space following the t).
    // The ".+Schmidt " part of the regular
    // expression is not consumed.

    matcher = pattern.matcher( input );

    while ( matcher.find() )
    {
        System.out.println( matcher.group() );
    }
    // Prints
    //   "John"
    //   "Jacob"
    //   "Jingleheimer"
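The zero-width behaviour Jim describes can also be observed through the offsets that Matcher.start() and Matcher.end() report. The sketch below is only an illustration: the class name ZeroWidthDemo, the helper method spans and the "aaa" input are made up, and the two patterns were chosen so that the only difference between them is whether the second "a" is consumed or merely looked at.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ZeroWidthDemo {

    // Collects "start-end" offsets for every match that find() reports.
    public static List<String> spans(String regex, String input) {
        List<String> result = new ArrayList<String>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            result.add(matcher.start() + "-" + matcher.end());
        }
        return result;
    }

    public static void main(String[] args) {
        // "aa" consumes two characters, so only one match fits in "aaa".
        System.out.println(spans("aa", "aaa"));     // prints [0-2]

        // "a(?=a)" consumes one character and only peeks at the next,
        // so the second "a" is still available to the following find().
        System.out.println(spans("a(?=a)", "aaa")); // prints [0-1, 1-2]
    }
}
```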
Pattern.compile(String, int)
As previously mentioned, the Pattern class contains two static factory methods that create and return references to a Pattern object. Pattern.compile( String ) creates a new Pattern object from the specified String. Pattern.compile( String , int ) creates a new Pattern object from the specified String with the specified flag. This integer flag changes the way a Pattern matches an input sequence, such as turning on or off case sensitivity or whether the "." meta character can match a line terminator.

Another way to specify flags that adjust the way a Pattern matches is with a flag construct embedded in the regular expression itself. This construct takes the form "(?flags)" and appears as part of the String used to construct a Pattern. The embedded flag construct is a non-capturing group construct (in other words, the opening parenthesis does not count towards the overall group count). The embedded flag construct can be used in combination with the flag passed to the Pattern.compile( String , int ) factory method.

Available embedded flag constructs and available flags to specify when constructing a new Pattern object include:
embedded flags | construction flags | meanings * |
---|---|---|
(?i) | Pattern.CASE_INSENSITIVE | Enables case-insensitive matching. |
(?d) | Pattern.UNIX_LINES | Enables Unix lines mode. |
(?m) | Pattern.MULTILINE | Enables multi line mode. |
(?s) | Pattern.DOTALL | Enables "." to match line terminators. |
(?u) | Pattern.UNICODE_CASE | Enables Unicode-aware case folding. |
(?x) | Pattern.COMMENTS | Permits white space and comments in the pattern. |
--- | Pattern.CANON_EQ | Enables canonical equivalence. |
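The one-line meanings in the table are terse, so here is a rough sketch of three of the flags in action. The class name FlagSamples, the helper method count and the "Jack\nJill" input are made up for this illustration; the flags themselves are the Pattern constants listed in the table.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FlagSamples {

    // Counts how many times find() succeeds for the pattern on the input.
    public static int count(Pattern pattern, String input) {
        Matcher matcher = pattern.matcher(input);
        int n = 0;
        while (matcher.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String input = "Jack\nJill";

        // Without MULTILINE, "^" matches only at the start of the input.
        System.out.println(count(Pattern.compile("^J"), input)); // prints 1
        // With MULTILINE, "^" also matches just after a line terminator.
        System.out.println(count(Pattern.compile("^J", Pattern.MULTILINE), input)); // prints 2

        // By default "." stops at "\n"; DOTALL lets it cross the line break.
        System.out.println(Pattern.compile("Jack.Jill").matcher(input).matches()); // prints false
        System.out.println(Pattern.compile("Jack.Jill", Pattern.DOTALL).matcher(input).matches()); // prints true

        // COMMENTS ignores white space and "#" comments inside the pattern.
        Pattern commented = Pattern.compile("J \\w+   # a word starting with J", Pattern.COMMENTS);
        System.out.println(count(commented, input)); // prints 2
    }
}
```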
Consider an example with case insensitivity turned on.
    input =
        "Hey, diddle, diddle, " +
        "The cat and the fiddle, " +
        "The cow jumped over the moon. " +
        "The little dog laughed " +
        "To see such sport, " +
        "And the dish ran away with the spoon." ;

    pattern = Pattern.compile( "the \\w+?(?=\\W)" , Pattern.CASE_INSENSITIVE );
    // Matches "the " followed by any word, regardless of case.

    matcher = pattern.matcher( input );

    while ( matcher.find() )
    {
        System.out.println( matcher.group() );
    }
    // Prints
    //   The cat
    //   the fiddle
    //   The cow
    //   the moon
    //   The little
    //   the dish
    //   the spoon
An equivalent Pattern could be constructed using the "(?i)" embedded flag. If embedded at the beginning of the regular expression, this embedded flag would affect the entire regular expression, as would any integer flag specified in Pattern.compile( String , int ). In the previous example, pattern = Pattern.compile( "(?i)the \\w+?(?=\\W)" ) would have resulted in the same matched subsequences.

Multiple flags can be specified as embedded flags or as the integer argument to Pattern.compile( String , int ). To specify multiple embedded flags, simply list them, one after the other. "(?is)" would specify that the Pattern is to match regardless of case and the "." meta character can match line terminators. To specify multiple flags as the integer argument to Pattern.compile( String , int ), OR together the integer constants of the Pattern class that represent the desired behavior.

Consider the following example that demonstrates equivalent use of multiple embedded flags and OR'ed together integer constants passed as the integer flag argument to Pattern.compile( String , int ).
    input =
        "Green cheese,\n" +
        "Yellow laces,\n" +
        "Up and down\n" +
        "The market places." ;

    pattern = Pattern.compile( "(?is)[a-z]*,.[a-z]*" );
    // Regardless of case, matches consecutive letters
    // followed by a comma, any character, then more
    // consecutive letters where the meta character "."
    // may match line terminators.

    matcher = pattern.matcher( input );

    while ( matcher.find() )
    {
        System.out.println( matcher.group() );
    }
    // Prints
    //   cheese,
    //   Yellow
    //   laces,
    //   Up

    int flags = Pattern.CASE_INSENSITIVE | Pattern.DOTALL ;
    pattern = Pattern.compile( "[a-z]*,.[a-z]*" , flags );
    // Regardless of case, matches consecutive letters
    // followed by a comma, any character, then more
    // consecutive letters where the meta character "."
    // may match line terminators.

    matcher = pattern.matcher( input );

    while ( matcher.find() )
    {
        System.out.println( matcher.group() );
    }
    // Prints
    //   cheese,
    //   Yellow
    //   laces,
    //   Up
The embedded flag construct affects only the parts of a regular expression that appear after the embedded flag. If an embedded flag appears in the middle of a regular expression, then only the portion of the expression that follows the flag is potentially affected.
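A minimal sketch of this positional behavior follows. The class name MidPatternFlag and the helper matchesAll are hypothetical and do not appear in the lesson itself. The "(?i)" flag sits in the middle of the expression, so only the text after it should be matched regardless of case.

```java
import java.util.regex.Pattern;

public class MidPatternFlag {

    // Hypothetical helper: true if the regex matches the entire input.
    static boolean matchesAll(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        // "the " precedes the flag and stays case sensitive;
        // "market" follows the flag and ignores case.
        String regex = "the (?i)market";

        System.out.println(matchesAll(regex, "the MARKET")); // true
        System.out.println(matchesAll(regex, "THE market")); // false
    }
}
```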
Using an embedded flag construct, it is possible to specify that only a section (a group) of a regular expression be affected by the flag. The syntax for such a construct is "(?flags:X)" , where "X" is the regular expression to be affected by the flags. This limits the scope of the flags to within the closing parentheses.
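To illustrate the scoped form, here is a small sketch. The class name ScopedFlag and the helper matchesAll are assumptions made for illustration. The case-insensitive flag is limited to the group "(?i:green)", so the rest of the expression stays case sensitive.

```java
import java.util.regex.Pattern;

public class ScopedFlag {

    // Hypothetical helper: true if the regex matches the entire input.
    static boolean matchesAll(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        // Only "green" is matched regardless of case;
        // " cheese" lies outside the group and must match exactly.
        String regex = "(?i:green) cheese";

        System.out.println(matchesAll(regex, "GREEN cheese")); // true
        System.out.println(matchesAll(regex, "green CHEESE")); // false
    }
}
```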
The embedded flag construct can also be used to turn off flags. The syntax to turn off a flag or flags is "(?-flags)". So, it is possible to specify that a flag be turned on for the entire regular expression by passing the appropriate integer to Pattern.compile( String , int ), and that the same flag be turned off for a specified group of the regular expression using "(?-flags:X)".
input = "HARK! HARK! The dogs do bark, " +
        "The beggars are coming to town. " +
        "Some in rags, " +
        "And some in tags, " +
        "And one in a velvet gown!" ;

pattern = Pattern.compile( "(?-i:[A-Z])[A-Z]*" ,
                           Pattern.CASE_INSENSITIVE );
    // Matches any word, regardless of case except
    // the first letter which must be capitalized.
matcher = pattern.matcher( input );
                                              // Prints
while ( matcher.find() )                      //   HARK
{                                             //   HARK
    System.out.println( matcher.group() );    //   The
}                                             //   The
                                              //   Some
                                              //   And
                                              //   And
The embedded flag constructs are non-capturing. So, the enclosing parentheses do not contribute to the Matcher's group count.
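A short sketch can confirm this with Matcher.groupCount(). The class name FlagGroupCount and the helper countGroups are hypothetical names introduced only for this illustration.

```java
import java.util.regex.Pattern;

public class FlagGroupCount {

    // Hypothetical helper: number of capturing groups in the regex.
    static int countGroups(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        // Two ordinary (capturing) groups.
        System.out.println(countGroups("(green) (cheese)"));    // 2
        // The scoped flag group does not capture, so only one group counts.
        System.out.println(countGroups("(?i:green) (cheese)")); // 1
        // A bare embedded flag is not a group at all.
        System.out.println(countGroups("(?i)(cheese)"));        // 1
    }
}
```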
Notes and Resources
1 | The Regular Expressions Tutorial at JavaRegex.com
2 | java.util.regex Package API Documentation
3 | java.lang.CharSequence is a new Interface in Java 1.4. A CharSequence is a readable sequence of characters. This interface provides uniform, read-only access to many different kinds of character sequences. String and StringBuffer both implement CharSequence. -- CharSequence API Documentation
4 | java.util.regex.Pattern API Documentation
5 | java.util.regex.Matcher API Documentation
6 | The text for the nursery rhymes in this lesson can be found at http://www.zelo.com/family/nursery/index.asp.
7 | Jim Yingst notes: The API documentation for the groupCount() method of Matcher is misleading - it should say "Any non-negative integer smaller than *or equal to* the value returned by this method is guaranteed to be a valid group index for this matcher." The emphasized section does not appear in the current API. (It's fixed in 1.4.1 beta source code, but that's not what you see when you browse the API online.)
8 | The Regular Expression Library
9 | For further reading on regular expressions, take a look at Mastering Regular Expressions by Jeffrey Friedl. A sample chapter from O'Reilly is available online. Jeffrey Friedl maintains a website associated with his book at http://regex.info, where he has posted a nice list of alternative Java regex packages.
The Cattle Drive forum is where the drivers get together to complain, uh rather, discuss their assignments and encourage each other. Thanks to the enthusiastic initiative of Johannes de Jong, you can keep track of your progress on the drive with the Assignment Log. If you're tough enough to get through the nitpicking, you'll start collecting moose heads at the Cattle Drive Hall of Fame.
Gettin' them doggies...
We got 'round about a good dozen ranchers drivin' along the trail, most of 'em rasslin' good ol' Java-4 Say and Servlets-4 assignments.
Fresh riders on the Drive...
Got a new Cattle Driver signed up on the assignment log, and this 'un a real bronco rider. A big welcome to our latest new rider: John Hembree. This rider's been purdy busy, we'll tell y'all why in just a bit...
Another moose on the wall for...
Yep, that's right, you make it through, you get yerself a nice little moose to hang on the wall. Well, OK, so it's a virtual moose on a virtual wall, but you can be right proud of 'em! Thanks to the efforts of Marilyn deQueiroz, you can now proudly admire your codin' accomplishments at the recently opened Cattle Drive Hall of Fame. Check it out, pardner, it's worth a wink.
Juliane Gross has been ridin' real steady and bagged her second moose on the OOP Trail of the drive. We know her secret to success: chocolate! Keep it wrapped up in yer saddle bag, cowgirl, that stuff is more valuable than a bag o' gold nuggets! Richard Hawkes is gettin' right comfy in the saddle, he bagged a moose on the Java Basics part of the Drive. Nice ridin', Richard!
Cattle Drivers gittin' all kinds of honors...
Wouldn't ya know it, a couple hard workin' Cattle Drivers went an' took a side trail and came back with a little extra dust and a purdey little cerrrrtificate to boot: Barry Gaunt and John Hembree got themselves a SCJP! Big congrats fellas.
Nitpicking is hard work too...
We know they're workin' reeeeeally hard, after all, we've seen what those assignments look like once the nitpickers have combed through 'em. Hats off to Marilyn deQueiroz, Pauline McNamara, and Jason Adam for their dedication and patience with the pesky vermin that always manage to make their way into those assignments.
Those gentlemen behind the bar...
Notice the fellas wipin' the bar and shinin' up the sarsaparilla glasses? Updating that assignment log is tough work too. We got us another batch of bartenders. Big thanks to Dirk Schreckmann and Matthew Phillips and welcome to Michael Matola and Barry Gaunt. Mosey up and place yer orders.
Tips for the Trail...
You know the feeling, right? Yer bustin' yer head over that ol' favorite, Java-4 Say, an' you jest asks yerself, "So why the tootin' am I doing this to myself?!" Well hang in there cowpokes, there's life outside the Cattle Drive, we got proof, and all that hair you been pulling outta yer head will pay off one day. Yep, check out this real life testimonial - ain't no slick talkin', it's the truth!
Mastering Regular Expressions, by Jeffrey E. F. Friedl
The book's nine chapters are categorized into three sections. The book first teaches the basics of regular expressions, crafting simple regexes, and the different features and flavors available in various regex packages. Next, the reader is given invaluable information about how the different types of regular expression engines work, as well as techniques for crafting practical and efficient expressions. The final section covers language specific issues in Perl, Java, and .NET.
The author does an outstanding job leading the reader from regex novice to master. The book is extremely easy to read and chock full of useful and relevant examples. The author offers up questions along the way designed to engage the reader in applying what he has learned. In-line references to other parts of the book containing information pertinent to the particular topic being discussed are also very helpful.
Regular expressions are valuable tools that every developer should have in their toolbox. "Mastering Regular Expressions" is the definitive guide to the subject, and an outstanding resource that belongs on every programmer's bookshelf. (Jason Menard - Bartender, March 2003)
April 1 | Swing 2nd Ed | Matthew Robinson, Pavel Vorobiev | Manning | Swing/AWT/JFC | confirmed |
April 8 | Java 2 Programmer Exam Cram 2 (Exam CX-310-035) | Bill Brogden, Marcus Green | Que | Programmer Certification Study | confirmed |
April 15 | Java NIO | Ron Hitchens | O'Reilly | I/O and Streams | confirmed |
April 22 | Core JSTL | David Geary | Addison-Wesley | JSP | confirmed |
April 29 | Java Performance Tuning | Jack Shirazi | O'Reilly | Performance | confirmed |