Javaranch rocks at the Software Development Conference
Kathy Sierra,
April 2003
Imagine a huge ballroom filled to capacity with developers (including more than a handful of alpha geeks and techno-luminaries). Two projection screens, each as large as a two-story building, are at the front. A hush falls over the crowd as the finalists for the Jolt Cola Award in the Web Sites category are displayed. IBM, BEA, Microsoft, Javaranch, ... what? Javaranch? Up against the likes of the big corporate developer sites? Yes, there was the name Javaranch in a font that had to be at least 7000 points.
There was a drum roll... (no, seriously, there really *was* a drum roll) and the winners of the Software Development Productivity Awards were announced. Javaranch is a winner! The crowd goes wild.
So we didn't win the coveted Jolt Cola award, but what's that about anyway? Javaranch is all about PRODUCTIVITY, not beverages with an illegal amount of caffeine.
Most importantly, Javaranch was a finalist in a category of huge, big-name, big-budget developer sites. And here we are, 100% volunteer. Paul couldn't make it, so I was there to accept the award for him, and if we'd won the Jolt award, I would have had to give a short acceptance speech. Here's what I would have said:
"Javaranch is an all-volunteer site. There's no company, no budget, no "Press three for worldwide marketing" on the phone system. There's no phone. The value of javaranch is the hundreds of people who run the site, man the site, and moderate the forums. The value of javaranch is in the hundreds of thousands who participate, asking and answering questions to help one another out. This award is for all those individuals, and to Java -- for being a language that inspires that much passion.
You'll never see this much enthusiasm at a .netRanch." (OK, I wouldn't have said that last thing, but I would have been thinking it strongly enough to telepathically convey it to at least the entire front row of the audience).
I was intensely proud to be there representing Javaranch.
Win the hottest new Java teaching book on the block
Head First Java
Kathy Sierra and Bert Bates have kindly donated a signed, yes signed, copy of their latest book, Head First Java, to the first person who posts the correct solution to the JavaCross crossword from chapter 10 of the book.
If you wish to win the book you must post your attempt in the thread JavaCross in the Java in General (beginner) forum here on the JavaRanch.
The thread will be started by myself, Johannes de Jong, on April the 3rd. I'll attempt to start the thread as close to 07:00 am server time as possible.
So set your alarm or try and stay awake, depending on where in the world you are, as the post time in the thread will determine the winner. If you edit your reply to change your attempt, the entry becomes invalid. If you want to change your attempt, post a new attempt.
Good luck
This puzzle is focused on Chapter 10 content (exceptions), but anything from the first nine chapters is fair game.
Across
1. To give value
4. Flew off the top
6. All this and more!
8. Start
10. The family tree
13. No ducking
15. Problem objects
18. One of Java's '49'
20. Class hierarchy
21. Too hot to handle
24. Common primitive
25. Code recipe
27. Unruly method action
28. Defined behavior
29. Start a chain reaction
Down
2. Currently usable
3. Template's creation
4. Don't show the kids
5. Mostly static API class
7. Not about behavior
9. The template
11. Roll another one off the line
12. javac saw it coming
14. Attempt risk
16. Automatic acquisition
17. Changing method
19. Announce a duck
22. Deal with it
23. Create bad news
26. One of my roles
Mutable and Immutable Objects
by David O'Meara
Mutable and Immutable objects are a simple idea with wide ranging
consequences. Often when immutability is mentioned it isn't defined, or
when it is defined there are no details of how to ensure immutability or
the consequences.
Before we start, the terminology follows the article Pass-by-Value,
Please in the JavaRanch Camp fire. If you haven't
learnt to say Java is pass-by-value, you might want to head there
first.
Crappy Definition to start off with:
Mutable Objects: When you have a reference to an instance of an object, the contents of that instance can be altered.
Immutable Objects: When you have a reference to an instance of an object, the contents of that instance cannot be altered.
Immutability and Instances
To demonstrate this behaviour, we'll use java.lang.String as the immutable class and java.awt.Point as the mutable class.
Point myPoint = new Point( 0, 0 );
System.out.println( myPoint );
myPoint.setLocation( 1.0, 0.0 );
System.out.println( myPoint );
String myString = new String( "old String" );
System.out.println( myString );
myString.replaceAll( "old", "new" );
System.out.println( myString );
In case you can't see what the output is, here it is:
java.awt.Point[x=0,y=0]
java.awt.Point[x=1,y=0]
old String
old String
We are only looking at a single instance of each object, and we can see that the contents of myPoint have changed while the contents of myString have not. To show what happens when we try to change the value of myString, we'll extend the previous example.
String myString = new String( "old String" );
System.out.println( myString );
myString = new String( "new String" );
System.out.println( myString );
The output from this is:
old String
new String
Now we find that the value displayed by the myString variable has changed. We have defined immutable objects as being unable to change in value, so what is happening? Let's extend the example again to watch the myString variable more closely.
String myString = new String( "old String" );
String myCache = myString;
System.out.println( "equal: " + myString.equals( myCache ) );
System.out.println( "same: " + ( myString == myCache ) );
myString = "not " + myString;
System.out.println( "equal: " + myString.equals( myCache ) );
System.out.println( "same: " + ( myString == myCache ) );
The result from executing this is:
equal: true
same: true
equal: false
same: false
What this shows is that the variable myString is now referencing a new instance of the String class. The contents of the original object didn't change; we discarded the instance and changed our reference to a new one with new contents.
Variable Values and Instance Contents
If you look at the example above, you can see the point I'm trying to sneak through. You can always change the value of a variable by getting your variable to reference a new object. Sometimes you can change the value of a variable by keeping a reference to the same instance but changing the contents of the instance.
After you have eliminated those possibilities, you have a variable that
retains its reference to an object, but the contents of this object cannot
change. Doesn't sound like a very interesting idea, and it sounds a bit
too simple to be useful.
It turns out that Immutable Objects, that is, objects whose contents
cannot be changed after they have been set, are a very handy tool when
used in the right place. They can promote thread safety in your code, you
can share them around without being afraid that they will change without
your knowledge, they are great for caching and constants. But we're not
going to cover any of that yet; we are going to concentrate on building
immutable objects.
Building an Immutable class
So what is it about the String class that makes it Immutable while a Point is mutable?
In this case, Strings have no
mutators while Points do. If we
removed all of the mutators from the Point class, would it be Immutable? No it wouldn't.
Removing mutators is a necessary first step, but immutability requires
more than that to ensure that the contents of an instance never changes.
Fields must be private
Obviously all of the fields must be private. There is little point removing the mutators if they aren't even required to change the instance contents.
public class ImmutablePoint
{
//note there are no mutators!
private double x;
private double y;
//and the rest...
This is almost enough, but there are two more steps to consider.
Make sure methods can't be overridden
If your class gets extended, it could add extra fields that are not immutable, or the methods could be overridden to return a different value each time. There are two ways to protect against this.
The preferred way is to make the class final. This is sometimes
referred to as "Strong Immutability". It prevents anyone from extending
your class and accidentally or deliberately making it mutable.
The second way, also called "Weak Immutability", is to make your methods
final. It allows others to extend your class to add more behaviour, but
protects the original contract specified by the class. If you want a more
verbose description, imagine a class A is weakly immutable. If you have an instance of class A itself, it is immutable. If someone creates class B that extends A, it is only the behaviour defined by the A class that is immutable. Any added behaviour from class B may not be immutable.
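To make the weak case concrete, here is a minimal sketch. The class names WeakPoint and LabelledPoint are made up for this illustration and are not part of any library. The parent's final getters can never change their answers, but the subclass is free to add state that does change.

```java
// Hypothetical classes for illustration only.
class WeakPoint {
    private final double x;
    private final double y;

    WeakPoint(double x, double y) {
        this.x = x;
        this.y = y;
    }

    // final methods: a subclass cannot change what these return
    public final double getX() { return x; }
    public final double getY() { return y; }
}

class LabelledPoint extends WeakPoint {
    private String label; // mutable state added by the subclass

    LabelledPoint(double x, double y, String label) {
        super(x, y);
        this.label = label;
    }

    public String getLabel() { return label; }
    public void setLabel(String label) { this.label = label; }
}

public class WeakImmutabilityDemo {
    public static void main(String[] args) {
        LabelledPoint p = new LabelledPoint(1.0, 2.0, "start");
        p.setLabel("end"); // the instance as a whole has changed...
        System.out.println(p.getX() + ", " + p.getY()); // ...but this still prints 1.0, 2.0
        System.out.println(p.getLabel());               // prints end
    }
}
```

Anyone holding the object as a WeakPoint sees values that never change, which is exactly the contract weak immutability promises and no more.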
Protect mutable fields
The last requirement, which many people fall victim to, is to build your immutable class from primitive types or immutable fields; otherwise you have to protect mutable fields from manipulation.
To highlight this problem, we'll use the example of a supposedly
immutable class representing a person. Our class has a first and last
name, as well as a date of birth.
import java.util.Date;
public final class BrokenPerson
{
private String firstName;
private String lastName;
private Date dob;
public BrokenPerson( String firstName,
String lastName, Date dob)
{
this.firstName = firstName;
this.lastName = lastName;
this.dob = dob;
}
public String getFirstName()
{
return this.firstName;
}
public String getLastName()
{
return this.lastName;
}
public Date getDOB()
{
return this.dob;
}
}
This all looks fine, until someone uses it like this:
Date myDate = new Date();
BrokenPerson myPerson =
new BrokenPerson( "David", "O'Meara", myDate );
System.out.println( myPerson.getDOB() );
myDate.setMonth( myDate.getMonth() + 1 );
System.out.println( myPerson.getDOB() );
Depending on the dates entered, the output could be something like this:
Mon Mar 24 21:34:16 GMT+08:00 2003
Thu Apr 24 21:34:16 GMT+08:00 2003
The Date object is mutable, and the myPerson instance holds a reference to the same Date instance as the myDate variable. When myDate changes the instance it is referencing, the myPerson instance changes too. It isn't immutable!
We can defend against this by taking a copy of the Date instance when it is passed in rather than trusting the reference to the instance we are given.
import java.util.Date;
public final class BetterPerson
{
private String firstName;
private String lastName;
private Date dob;
public BetterPerson( String firstName,
String lastName, Date dob)
{
this.firstName = firstName;
this.lastName = lastName;
this.dob = new Date( dob.getTime() );
}
//etc...
Now we're close, but we're still not quite there. Our class is still open to abuse.
BetterPerson myPerson =
new BetterPerson( "David", "O'Meara", new Date() );
System.out.println( myPerson.getDOB() );
Date myDate = myPerson.getDOB();
myDate.setMonth( myDate.getMonth() + 1 );
System.out.println( myPerson.getDOB() );
We see here that taking a copy on the way in wasn't enough; we also need to prevent anyone from getting a reference to our mutable Date field when we pass it out.
public Date getDOB()
{
return new Date( this.dob.getTime() );
}
Make deep copies of mutable data
The only point to add is that when you copy the instance on the way in and the way out, you need to make a deep copy. Otherwise you run the risk of leaving some mutable data in your immutable class!
If you are confused about the need to provide a deep copy, keep in mind
that a single piece of shared mutable data, no matter how deep it is
buried inside an object, makes your class mutable. When you create a copy
of an object to defend against the value changing, you need to make sure
your copy doesn't include this shared mutable class. You need to copy any
mutable objects all the way down to the last field, and copy any nested
fields until you have a completely new copy of your own. It's the only way
to be safe!
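As a small sketch of the difference, consider an array of Date objects. The class DeepCopyDemo and its two helper methods are invented for this example, and it assumes the array contains no null elements. Cloning the array gives you a new array that still points at the same Date instances; a deep copy replaces every Date as well.

```java
import java.util.Date;

public class DeepCopyDemo {
    // Shallow: a new array, but the same Date instances inside it.
    static Date[] shallowCopy(Date[] source) {
        return source.clone();
    }

    // Deep: a new array and a new Date for every element (assumes no nulls).
    static Date[] deepCopy(Date[] source) {
        Date[] copy = new Date[source.length];
        for (int i = 0; i < source.length; i++) {
            copy[i] = new Date(source[i].getTime());
        }
        return copy;
    }

    public static void main(String[] args) {
        Date[] original = { new Date(0L) };
        Date[] shallow = shallowCopy(original);
        Date[] deep = deepCopy(original);

        original[0].setTime(1000L); // mutate the Date the caller still holds

        System.out.println(shallow[0].getTime()); // prints 1000: the change leaked in
        System.out.println(deep[0].getTime());    // prints 0: the deep copy is unaffected
    }
}
```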
Our Template for Immutable Classes
Now we have a template for creating immutable objects.
- Make all fields private
- Don't provide mutators
- Ensure that methods can't be overridden by either making the class
final (Strong Immutability) or making your methods final (Weak
Immutability)
- If a field isn't primitive or immutable, make a deep clone on the
way in and the way out.
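Putting the template together, a person class might look something like the sketch below. The name ImmutablePerson is made up for this example. Declaring the fields final is an extra safeguard that is not in the list above; it simply lets the compiler stop the class from reassigning them by accident.

```java
import java.util.Date;

// Hypothetical class name; the shape follows the BetterPerson example.
public final class ImmutablePerson {          // final class: Strong Immutability
    private final String firstName;           // private fields, no mutators
    private final String lastName;
    private final Date dob;                   // mutable type, so it needs protecting

    public ImmutablePerson(String firstName, String lastName, Date dob) {
        this.firstName = firstName;
        this.lastName = lastName;
        this.dob = new Date(dob.getTime());   // copy on the way in
    }

    public String getFirstName() {
        return firstName;
    }

    public String getLastName() {
        return lastName;
    }

    public Date getDOB() {
        return new Date(dob.getTime());       // copy on the way out
    }
}
```

With this version, changing the Date passed to the constructor or the Date returned by getDOB() has no effect on the person.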
Which classes are Immutable?
To finish up, let's discuss the common Java classes that are immutable and those that aren't. Firstly, String and all of the java.lang wrapper classes are immutable: Boolean, Byte, Character, Double, Float, Integer, Long and Short.
As in the Person classes we discussed, java.util.Date objects are not immutable. The classes java.math.BigInteger and BigDecimal are documented as immutable, but neither class is final, so a subclass can override their methods; by the rules above they are not safely immutable either, although maybe they should have been.
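One reason to be wary of BigInteger is that the class is not final and its methods can be overridden. The sketch below uses a made-up subclass, SneakyBigInteger, to show a BigInteger reference whose reported value changes over time.

```java
import java.math.BigInteger;

// Hypothetical subclass, for illustration only.
class SneakyBigInteger extends BigInteger {
    private int offset = 0; // mutable state the parent class knows nothing about

    SneakyBigInteger(String value) {
        super(value);
    }

    void setOffset(int offset) {
        this.offset = offset;
    }

    @Override
    public int intValue() {
        return super.intValue() + offset;
    }
}

public class BigIntegerSubclassDemo {
    public static void main(String[] args) {
        SneakyBigInteger sneaky = new SneakyBigInteger("10");
        BigInteger n = sneaky;            // looks like an ordinary BigInteger
        System.out.println(n.intValue()); // prints 10
        sneaky.setOffset(5);
        System.out.println(n.intValue()); // prints 15
    }
}
```

If you accept a BigInteger from code you don't trust and need a guarantee, one option is to copy it into a genuine BigInteger first, for example with new BigInteger(n.toByteArray()).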
And we're done...
...for now. This concludes an introduction to
Mutable and Immutable Classes in Java. Hopefully there will be a second
part that will go into more detail on weak and strong immutability,
reasons why you'd make classes immutable and reasons to avoid them, and
some other miscellaneous topics on immutable types in Java.
Small and Simple Web Applications - the Friki Way (Part 2)
Frank Carver,
March 2003
Abstract
This article is the second of a series which laments the bloated and unmaintainable state of so many J2EE web applications and looks at ways to keep web applications small, simple and flexible. The series uses the author's Friki software as a case study, and discusses ways to design and build
a powerful, flexible and usable web application in less space than
a typical gif image.
This article continues the design process begun last time, and starts the serious work of coding a solution. On the way it discusses some techniques to help make software development more trustworthy and less stressful.
Introduction
If you've read the first article, you should be aware that the aim of this project is to develop a small, simple and understandable "Wiki" (editable web site) application. We've considered and decided to defer the relatively heavy decision of how to store and retrieve pages by introducing a Java interface, and thought through some ways of generating the HTML for each displayed page.
Before we really get started, there's one vital question which needs to be answered. It's such an important question, that it's often forgotten until it's too late!
How will we know when we're done?
I recommend that this question be asked and answered at the start of every software development project. I'm not suggesting that anyone build complex and detailed plans (also known as "guesses" or "wishes") before starting work. I'm not suggesting that some all-powerful "architect" lay down a complete design before starting the real work. I'm not suggesting that everything should be described in UML or pseudo code. Almost the opposite.
How will we know when we're done? is a simple plea to whoever wants the work done. A development team needs to know when to stop developing. Everyone involved in design and coding a solution needs to have an idea of the end goal, if they are to make sensible decisions along the way. The answers to this question can vary enormously, but beware of "answers" which don't actually answer the question: "stop when I tell you to stop" might as well be "don't bother doing any work, it won't make any difference"; "stop when the customer is happy" gives no guidance on what makes the customer happy - why not just buy him or her a beer and go home?
Useful answers to this question include things like "when it does this and this and this", "when average response time is less than 5 seconds under peak load" and so on. What these answers have in common is that they are measurable and testable. You could (theoretically, at least) make a test, and when it passes you can stop developing.
So let's ask our "virtual customer" this hard but very useful question. How will we know when we are done?
For the purposes of this article, our customer says we will be "done" when we have a Java web application in which:
- each page may be viewed using its own unique URL
- page content may contain links to other pages by name
- links to nonexistent pages will be marked with a "?"
- page content may be created and edited using just a browser
- at least 100 different pages can be stored
That should be enough to get started. Let's see how simply and quickly we can code a solution to this.
Just one final reminder. The above points are our complete "acceptance criteria". Any solution which meets these goals is a valid one. We must free our minds from imagining any "requirements" which have not been asked for.
What do we write first?
So we know what we have to do, but we don't know where or how to start. Strangely enough, I'm not going to start with design. I'm not even going to start with coding a solution. I'm going to start with a test! I always start with a test to make sure I can compile and run something. Without that, there's not much point putting in a lot of effort to write any code. In this case, I'll use the JUnit test framework, which I've found very handy over the years. If you don't already have a recent version of JUnit, please download it from the above link before proceeding.
AllTests.java
package tests;
import junit.framework.*;
public class AllTests extends TestCase
{
public static TestSuite suite()
{
TestSuite ret = new TestSuite();
ret.addTest(new TestSuite(EmptyTest.class));
return ret;
}
}
If we try and compile this, it should fail. It should either complain about the absence of junit.framework in the import statement (which hints that the file "junit.jar" from the JUnit distribution needs to be in the classpath) or complain about the absence of the tests.EmptyTest class (which is fine. We haven't written it yet!). If it compiles without error you have either tried to compile the wrong file, or you already have both JUnit and a class tests.EmptyTest in your classpath, and you'll need to sort that out before we progress any further.
The most important thing to take away from this is that a failing test has given us a lot of information and confidence about the state of our system. The EmptyTest class still doesn't exist, so we can proceed and write it.
EmptyTest.java
package tests;
import junit.framework.*;
public class EmptyTest extends TestCase
{
public void testEmpty()
{
}
}
Now try and compile these two classes. They should compile OK. We've passed our first test! Of course our "system" doesn't do much. There is some test code, but no actual product, but we know that we can compile some real Java code, including all that stuff about jar files and classpaths. And we have the start of a test "scaffolding" to help us build the real code.
The next tiny step is to run the test code. Type:
java junit.swingui.TestRunner tests.AllTests
You should see a nice graphical dialog with a green bar indicating that 1 test has been run, with no errors and no failures. Excellent. We are now ready to write a real test for a real feature.
A first feature
Last session I promised that we'd get started on our template system, so let's begin with that. Just as above, the process is to start with a test. Even with a tiny addition to some existing software, the most important question is still "How will we know when we're done?". Simple is good, so let's start simple - as simple as we can. Let's test that a template with no characters in it "expands" to a template with no characters in it. How about something like:
TemplateTest.java
package tests;
import junit.framework.*;
public class TemplateTest extends TestCase
{
public void testEmptyTemplate()
{
TemplateEngine engine = new TemplateEngine();
assertEquals("", engine.expand(""));
}
}
Then add the line
ret.addTest(new TestSuite(TemplateTest.class));
to AllTests.java, next to the similar line for "EmptyTest".
Trying to compile our system again should now tell us that there is no TemplateEngine. If you look carefully, though, it's actually complaining that there is no class "tests.TemplateEngine". To fix this we can make two changes. Add the line
import friki.TemplateEngine;
to TemplateTest.java and create a new class - the first of our real code:
TemplateEngine.java
package friki;
public class TemplateEngine
{
public String expand(String input)
{
return input;
}
}
Hold on. Isn't that cheating? That method will never "expand" anything! Pah!
This is very important. Look back to where I said "We must free our minds from imagining any "requirements" which have not been asked for". The code we have written passes all our tests. If you think it should do more, you are guessing. Worse than that, there is a much more important problem with that code which we'll see in a minute. But first, if we want more code, we have to know when to stop coding. So we need more tests.
So let's think about an actual substitution. Say we want to substitute all occurrences of "~name~" with "Frank":
TemplateTest.java
package tests;
import junit.framework.*;
import friki.TemplateEngine;
public class TemplateTest extends TestCase
{
public void testEmptyTemplate()
{
TemplateEngine engine = new TemplateEngine();
assertEquals("", engine.expand(""));
}
public void testSingleToken()
{
TemplateEngine engine = new TemplateEngine();
assertEquals("Frank", engine.expand("~name~"));
}
}
We run the tests. They fail. Our TemplateEngine gives back "~name~", which is obviously not the same as "Frank". So let's fix the code:
TemplateEngine.java
package friki;
import java.text.CharacterIterator;
import java.text.StringCharacterIterator;
public class TemplateEngine
{
public String expand(String input)
{
boolean inToken = false;
StringBuffer token = new StringBuffer();
StringBuffer ret = new StringBuffer();
CharacterIterator it = new StringCharacterIterator(input);
for(char c = it.first(); c != CharacterIterator.DONE; c = it.next())
{
if (c == '~')
{
if (inToken)
{
ret.append("Frank");
token.setLength(0);
}
inToken = !inToken;
}
else
{
if (inToken)
{
token.append(c);
}
else
{
ret.append(c);
}
}
}
return ret.toString();
}
}
There are a few things to note about this code:
- It's quite short. If I had wanted to use a more compact layout style, I could have fitted it into about 15 lines.
- It uses system classes wherever possible, such as the fairly uncommon (but useful) CharacterIterator.
- There's no constructor, "getters and setters", member variables or any other nonsense. Just the code needed to extract tokens separated by '~' and return "Frank".
It's getting better, but still not complete. Yes, we need more tests. So let's see what happens if we ask for something else. Add the following to TemplateTest, and run it again:
public void testDifferentToken()
{
TemplateEngine engine = new TemplateEngine();
assertEquals("Margaret", engine.expand("~wife~"));
}
The test fails, of course. Apparently I'm married to myself. Much though I like the name Frank, I think it's unfair to return it as the value of every token. But that "Frank" is hard coded. If we want to return "Frank" for one token, and "Margaret" for another we have to get the names and values from somewhere. Sounds like a Map to me:
TemplateEngine.java
package friki;
import java.text.CharacterIterator;
import java.text.StringCharacterIterator;
import java.util.Map;
import java.util.HashMap;
public class TemplateEngine
{
private Map values;
public TemplateEngine()
{
values = new HashMap();
values.put("name", "Frank");
values.put("wife", "Margaret");
}
public String expand(String input)
{
boolean inToken = false;
StringBuffer token = new StringBuffer();
StringBuffer ret = new StringBuffer();
CharacterIterator it = new StringCharacterIterator(input);
for(char c = it.first(); c != CharacterIterator.DONE; c = it.next())
{
if (c == '~')
{
if (inToken)
{
ret.append(values.get(token.toString()));
token.setLength(0);
}
inToken = !inToken;
}
else
{
if (inToken)
{
token.append(c);
}
else
{
ret.append(c);
}
}
}
return ret.toString();
}
}
Great. The basic template expansion works, but it still feels a bit clumsy to me. I don't really like the way the values are built in to the class itself. Can we make it simpler and more flexible? I reckon so. Let's change our tests a little:
TemplateTest.java
package tests;
import java.util.Map;
import java.util.HashMap;
import junit.framework.*;
import friki.TemplateEngine;
public class TemplateTest extends TestCase
{
Map values;
public void setUp()
{
values = new HashMap();
values.put("name", "Frank");
values.put("wife", "Margaret");
}
public void testEmptyTemplate()
{
TemplateEngine engine = new TemplateEngine(values);
assertEquals("", engine.expand(""));
}
public void testSingleToken()
{
TemplateEngine engine = new TemplateEngine(values);
assertEquals("Frank", engine.expand("~name~"));
}
public void testDifferentToken()
{
TemplateEngine engine = new TemplateEngine(values);
assertEquals("Margaret", engine.expand("~wife~"));
}
}
That's better. We now have much more control over how the template expander works. We can give it whatever names and values we want to use, and expect it to fill them in to a supplied template. The "setUp" method is run just before each of the test methods, and makes this test class into what is known as a "fixture".
By the way. Have you noticed that there is a lot of duplication in this code? The code to create a new TemplateEngine is exactly the same in all the test cases. In the spirit of keeping things as small and simple as possible, let's move them into the setUp method as well:
TemplateTest.java
package tests;
import java.util.Map;
import java.util.HashMap;
import junit.framework.*;
import friki.TemplateEngine;
public class TemplateTest extends TestCase
{
Map values;
TemplateEngine engine;
public void setUp()
{
values = new HashMap();
values.put("name", "Frank");
values.put("wife", "Margaret");
engine = new TemplateEngine(values);
}
public void testEmptyTemplate()
{
assertEquals("", engine.expand(""));
}
public void testSingleToken()
{
assertEquals("Frank", engine.expand("~name~"));
}
public void testDifferentToken()
{
assertEquals("Margaret", engine.expand("~wife~"));
}
}
Of course, if we try and run this test it will fail (it won't even compile!). So the TemplateEngine code needs to be brought in line with our new design. Notice how at each stage what we need drives the tests, then the tests drive the implementation.
TemplateEngine.java
package friki;
import java.text.CharacterIterator;
import java.text.StringCharacterIterator;
import java.util.Map;
public class TemplateEngine
{
private Map values;
public TemplateEngine(Map values)
{
this.values = values;
}
public String expand(String input)
{
boolean inToken = false;
StringBuffer token = new StringBuffer();
StringBuffer ret = new StringBuffer();
CharacterIterator it = new StringCharacterIterator(input);
for(char c = it.first(); c != CharacterIterator.DONE; c = it.next())
{
if (c == '~')
{
if (inToken)
{
ret.append(values.get(token.toString()));
token.setLength(0);
}
inToken = !inToken;
}
else
{
if (inToken)
{
token.append(c);
}
else
{
ret.append(c);
}
}
}
return ret.toString();
}
}
And that is the "heart" of the templating system done. To make double sure it's what we wanted at the start, let's test the example template from last session:
public void testExamplePage()
{
values.put("title", "PageOne");
values.put("content", "This is the first page in our new Wiki");
assertEquals(
" <html><head><title>Friki: PageOne</title></head>\n" +
" <body>\n" +
" <h2>PageOne</h2>\n" +
" This is the first page in our new Wiki\n" +
" <hr width='100%'>\n" +
" <a href='edit?PageOne'>edit this page</a>\n" +
" </body>\n" +
" </html>\n",
engine.expand(
" <html><head><title>Friki: ~title~</title></head>\n" +
" <body>\n" +
" <h2>~title~</h2>\n" +
" ~content~\n" +
" <hr width='100%'>\n" +
" <a href='edit?~title~'>edit this page</a>\n" +
" </body>\n" +
" </html>\n"));
}
5 tests passed. Cool.
Although the implementation given here works as much as we need it to (as shown by that last test), you may want to think about what it can't do. What happens if we want to include a '~' character in our template? What happens if we ask it to expand a token it doesn't have a value for? If you are worried about these sorts of questions, ask them in the form of a test, then "fix" the code. But remember - every time you add a feature that's not needed by the application right now, you are making the final program bigger, you are making bugs harder to find, you are making the code harder to read. So think. Do you really need that feature yet?
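If you are curious what the current code actually does in those two cases, here is a small stand-alone sketch. It reuses the same loop as expand but as a static method in a made-up class, TemplateEdgeCases, so it can be run without JUnit; the generic Map type is also an addition for this example.

```java
import java.text.CharacterIterator;
import java.text.StringCharacterIterator;
import java.util.HashMap;
import java.util.Map;

public class TemplateEdgeCases {
    // Same logic as TemplateEngine.expand, written as a static method.
    static String expand(String input, Map<String, String> values) {
        boolean inToken = false;
        StringBuffer token = new StringBuffer();
        StringBuffer ret = new StringBuffer();
        CharacterIterator it = new StringCharacterIterator(input);
        for (char c = it.first(); c != CharacterIterator.DONE; c = it.next()) {
            if (c == '~') {
                if (inToken) {
                    // A missing key yields null, which is appended as "null"
                    ret.append(values.get(token.toString()));
                    token.setLength(0);
                }
                inToken = !inToken;
            } else if (inToken) {
                token.append(c);
            } else {
                ret.append(c);
            }
        }
        return ret.toString();
    }

    public static void main(String[] args) {
        Map<String, String> values = new HashMap<String, String>();
        values.put("name", "Frank");
        // Unknown token: prints "Hi null!"
        System.out.println(expand("Hi ~pet~!", values));
        // Lone '~': everything after it is treated as an unfinished token and dropped
        System.out.println("[" + expand("cost ~ 5", values) + "]"); // prints "[cost ]"
    }
}
```

Neither result is "wrong" as far as our current tests are concerned. They only become bugs on the day a customer requirement says otherwise.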
How are We Doing?
We still haven't made a Wiki yet! We have written, compiled, and run some real code to make sure we have a workable build environment. We have a useful "template expander" in a few lines of code which we can use for this project, but might also be handy in others. We have built a complete regression test suite which automatically tests every class and method in our system so if anything breaks we'll know straight away. We can go home confident that we are making real, measurable, repeatable progress, and come back fresh and ready next time.
Next session we'll add more customer features to our Wiki "engine" and look into automating the process of compiling and testing the code even more. In the meanwhile, I recommend reading more about JUnit. If you want to get ahead of the game, you could also look at the HTTPUnit web testing toolkit and the Ant build tool. If you want to read more about keeping the enjoyment in software development, check out my golden rules of stress-free programming.
RegexTutorial_02
This series of lessons covering regular expressions in Java was modeled
after the tutorial created to teach the com.stevesoft.pat package.
The com.stevesoft.pat package is available for download and use
from JavaRegex.com. It's an excellent
alternative package to harvest the power of regular expressions in Java.
This is the second part of a four part introduction to the java.util.regex
package. Part one can be found in The
September Newsletter.
Part 2: More Pattern Elements
Pattern Elements Introduced in this Lesson
One function of parentheses is to provide a grouping ability for parts of a
regular expression. The quantifiers and operators introduced in the previous
lesson, that were applied to a single character or character class, can then
be applied to a group.
String input =
"Fee! Fie! Foe! Fum! " +
"I smell the blood of an Englishman. " +
"Be he 'live, or be he dead, " +
"I'll grind his bones to make my bread." ;
Pattern pattern = Pattern.compile( "(F[a-z]{2}! ){4}" );
// Matches four occurrences of a pattern that begins
// with "F" followed by two lower case letters, a "!"
// and a space.
Matcher matcher = pattern.matcher( input );
System.out.println( matcher.find() ); // Prints true.
System.out.println( matcher.group() ); // Prints "Fee! Fie! Foe! Fum! ".
Capturing groups are numbered according to their appearance in the regular
expression. The first opening parenthesis is the start of the first capturing
group; the second opening parenthesis is the start of the second capturing group;
and so on. Each capturing group ends at the matching closing parenthesis. It
is possible to have one capturing group embedded in another. So, the pattern
"I (am (Sam))" has two capturing groups. The first capturing group
is the pattern "am Sam" and the second capturing group is the pattern
"Sam".
The capturing group count and corresponding matched subsequence data are maintained
in the Matcher object. A Matcher object's String
group( int ) method "returns the input subsequence captured
by the given group during the previous match operation." A Matcher
object's int groupCount() method "returns the number of capturing
groups in this matcher's pattern."
Group number zero refers to the entire pattern match, so matcher.group( 0 )
returns the entire previously matched subsequence and is equivalent
to matcher.group() . Note that capturing group number zero
is not included in the total group count returned by the groupCount()
method.
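The nested pattern "I (am (Sam))" described above can be tried out with a short sketch like the following. The class name NestedGroupsDemo, the helper method, and the sample input are made up for illustration; only Pattern, Matcher, find(), group( int ), and groupCount() come from java.util.regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the original lesson.
class NestedGroupsDemo {

    // Returns groups 0 through groupCount() of the first match,
    // or null if the regex does not match anywhere in the input.
    static String[] groupsOfFirstMatch(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        if (!matcher.find()) {
            return null;
        }
        // groupCount() does not include group zero, so add one slot for it.
        String[] groups = new String[matcher.groupCount() + 1];
        for (int i = 0; i <= matcher.groupCount(); i++) {
            groups[i] = matcher.group(i);
        }
        return groups;
    }

    public static void main(String[] args) {
        String[] groups = groupsOfFirstMatch("I (am (Sam))", "I am Sam. Sam I am.");
        for (int i = 0; i < groups.length; i++) {
            System.out.println(i + ": " + groups[i]);
        }
        // Prints:
        // 0: I am Sam
        // 1: am Sam
        // 2: Sam
    }
}
```

Groups are numbered by the position of their opening parenthesis, so the outer group comes before the group nested inside it.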
Consider this mildly more involved example demonstrating capturing groups.
Note that the group construct limits the scope of the OR operator.
input =
"Humpty Dumpty sat on a wall. " +
"Humpty Dumpty had a great fall. " +
"All the king's horses and all the king's men " +
"Couldn't put Humpty together again! " ;
pattern = Pattern.compile( "((H|D)(umpty) ){2}" );
// Matches two consecutive occurrences of "Humpty " or
// "Dumpty " (note the trailing space). Three capturing
// groups are defined and remembered by the Matcher.
matcher = pattern.matcher( input );
System.out.println( matcher.find() ); // Prints true.
System.out.println( matcher.groupCount() ); // Prints 3.
System.out.println( matcher.group( 1 ) ); // Prints "Dumpty ".
System.out.println( matcher.group( 2 ) ); // Prints "D".
System.out.println( matcher.group( 3 ) ); // Prints "umpty".
System.out.println( matcher.group( 0 ) ); // Prints "Humpty Dumpty ".
// If it was expected that matcher.group( 1 ) should contain
// "Humpty ", then remember that the group( int ) method
// returns the input subsequence most recently captured by the
// specified group. Because of the {2} quantifier, group 1 was
// applied two times during this match - the first time capturing
// "Humpty " and the second time capturing "Dumpty ".
Each matched group maintained in the Matcher object is called
a "back reference". Referencing a matched group as demonstrated above
is one style of back referencing in Java regular expressions. A later lesson
will introduce another style and use of back referencing.
A slight performance cost is associated with maintaining back references (the
group count and matched subsequence data) in the Matcher object.
The non-capturing group construct provides the function of grouping pattern
elements without the cost of remembering each matched group as a back reference.
The syntax for a non-capturing group is simply "(?:X)". A non-capturing
group functions much like a capturing group with the distinction that no capturing
group specific data is maintained in the Matcher .
input =
"Humpty Dumpty sat on a wall. " +
"Humpty Dumpty had a great fall. " +
"All the king's horses and all the king's men " +
"Couldn't put Humpty together again! " ;
pattern = Pattern.compile( "((?:H|D)(?:umpty) ){2}" );
// Matches two consecutive occurrences of "Humpty " or
// "Dumpty " (note the trailing space). Three groups
// are defined, one of which is a capturing group that
// will be remembered by the Matcher.
matcher = pattern.matcher( input );
System.out.println( matcher.find() ); // Prints true.
System.out.println( matcher.groupCount() ); // Prints 1.
System.out.println( matcher.group( 1 ) ); // Prints "Dumpty ".
System.out.println( matcher.group( 0 ) ); // Prints "Humpty Dumpty ".
According to J-Sprint's memory profiler,
when the previous two code examples (the searches for Humpty Dumpty) were performed
one hundred thousand times each, the non-capturing group strategy demonstrated
a performance improvement of roughly 0.0003% - not much to write home
about. In contrast, tests against a larger input character sequence (50KB),
with one hundred groups per match, resulted in
the non-capturing group test consuming approximately 40% as much memory as
the capturing group test.
Java regular expressions provide two "look ahead" constructs: positive look
ahead, "(?=X)", and negative look ahead, "(?!X)". With these constructs, a
specified pattern matches only if it is followed (positive look ahead) or is
not followed (negative look ahead) by the pattern described in the look ahead
construct. The pattern described in the look ahead construct is not part of any
matched subsequence described by the Matcher object - it is only a requirement
that must be met in order for the specified pattern to match. Though the look
ahead construct is contained within matching opening and closing parentheses,
it is a non-capturing group construct.
input =
"Today's specials are apple chocolate pie and cherry banana pie." ;
pattern = Pattern.compile( "(apple|cherry)(?= chocolate)" );
// Matches "apple" or "cherry" where the following pattern
// matches " chocolate". " chocolate" is not a part of the
// resulting match, it follows it.
matcher = pattern.matcher( input );
System.out.println( matcher.find() ); // Prints true.
System.out.println( matcher.groupCount() ); // Prints 1.
System.out.println( matcher.group( 1 ) ); // Prints "apple".
System.out.println( matcher.group() ); // Prints "apple".
pattern = Pattern.compile( "(apple|cherry)(?! chocolate)" );
// Matches "apple" or "cherry" where the following pattern
// does not match " chocolate".
matcher = pattern.matcher( input );
System.out.println( matcher.find() ); // Prints true.
System.out.println( matcher.groupCount() ); // Prints 1.
System.out.println( matcher.group( 1 ) ); // Prints "cherry".
System.out.println( matcher.group() ); // Prints "cherry".
Two "look behind" constructs, positive look behind "(?<=X)" and negative look
behind "(?<!X)", provide a function similar to that of the look ahead constructs,
the distinction being that the look behind constructs test whatever precedes
a specified pattern. The pattern described in the
look behind construct is not part of any matched subsequence described by the
Matcher object - it is only a requirement that must be met in order for the
specified pattern to match. The look behind construct is also non-capturing.
input =
"Tomorrow's special is fried bananas with baked clam." ;
pattern = Pattern.compile( "(?<=fried )(bananas|clam)" );
// Matches "bananas" or "clam" if preceded by "fried ".
// "fried " is not part of the resulting match, it precedes it.
matcher = pattern.matcher( input );
System.out.println( matcher.find() ); // Prints true.
System.out.println( matcher.groupCount() ); // Prints 1.
System.out.println( matcher.group( 1 ) ); // Prints "bananas".
System.out.println( matcher.group() ); // Prints "bananas".
pattern = Pattern.compile( "(?<!fried )(bananas|clam)" );
// Matches "bananas" or "clam" if not preceded by "fried ".
matcher = pattern.matcher( input );
System.out.println( matcher.find() ); // Prints true.
System.out.println( matcher.groupCount() ); // Prints 1.
System.out.println( matcher.group( 1 ) ); // Prints "clam".
System.out.println( matcher.group() ); // Prints "clam".
During the critiquing process of this lesson, Jim Yingst pointed out an important
issue concerning look ahead and look behind constructs:
Normally when we use find() repeatedly to match a pattern
several times within a given input, each find() "consumes"
characters in the input string up to the last character matched by the
find. This means that a subsequent find() will not normally
"see" any characters which were already matched by previous finds. Lookahead
and lookbehind are workarounds for this limitation. We can look ahead
to match characters without consuming them, or we can look back to match
characters which were already consumed.
This non-consumptive behavior of the look ahead and look behind constructs
is also known as zero-width matching.
Consider this related example, also suggested by Jim (and adapted to fit a
nursery rhyme).
input =
"John Jacob Jingleheimer Schmidt " +
"His name is my name, too! " +
"Whenever we go out, " +
"The people always shout " +
"There goes John Jacob Jingleheimer Schmidt!" ;
pattern = Pattern.compile( "(J\\w+)(?=.+Schmidt )" );
// Matches all words starting with "J" that
// precede "Schmidt " (note the space following the t).
// The ".+Schmidt " part of the regular
// expression is not consumed.
matcher = pattern.matcher( input );
while ( matcher.find() ) // Prints
{ // "John"
System.out.println( matcher.group() ); // "Jacob"
} // "Jingleheimer"
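One practical consequence of zero-width matching is that a look ahead lets find() report matches that overlap in the input. The following is a minimal sketch of that idea; the class name OverlapDemo, the helper method, and the input "ababa" are invented for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the original lesson.
class OverlapDemo {

    // Collects the start index of every match that find() reports.
    static List<Integer> matchStarts(String regex, String input) {
        List<Integer> starts = new ArrayList<Integer>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            starts.add(matcher.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        String input = "ababa";
        // The plain pattern consumes "aba", so the second, overlapping
        // occurrence that starts at index 2 is never seen.
        System.out.println(matchStarts("aba", input));     // Prints [0].
        // Here only the leading "a" is consumed; "ba" is merely looked at,
        // so the "a" at index 2 is still available to the next find().
        System.out.println(matchStarts("a(?=ba)", input)); // Prints [0, 2].
    }
}
```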
As previously mentioned, the Pattern class contains two static
factory methods that create and return references to a Pattern
object. Pattern.compile( String ) creates a new Pattern
object from the specified String . Pattern.compile( String ,
int ) creates a new Pattern object from the specified
String with the specified flags. This integer argument changes the way
a Pattern matches an input sequence, for example by turning case
sensitivity on or off or by controlling whether the "." meta character
can match a line terminator.
Another way to specify flags that adjust the way a Pattern matches
is with a flag construct embedded in the regular expression itself. This construct
takes the form "(?flags)" and appears as part of the String
used to construct a Pattern . The embedded flag construct is a non-capturing
group construct (in other words, the opening parenthesis does not count towards
the overall group count). The embedded flag construct can be used in combination
with the flag passed to the Pattern.compile( String , int )
factory method.
Available embedded flag constructs and available flags to specify when constructing
a new Pattern object include:
embedded flags | construction flags | meanings *
(?i) | Pattern.CASE_INSENSITIVE | Enables case-insensitive matching.
(?d) | Pattern.UNIX_LINES | Enables Unix lines mode.
(?m) | Pattern.MULTILINE | Enables multi line mode.
(?s) | Pattern.DOTALL | Enables "." to match line terminators.
(?u) | Pattern.UNICODE_CASE | Enables Unicode-aware case folding.
(?x) | Pattern.COMMENTS | Permits white space and comments in the pattern.
--- | Pattern.CANON_EQ | Enables canonical equivalence.
* Please refer to The
Pattern Class Documentation for more complete descriptions.
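To make two of the less familiar rows in the table concrete, here is a small sketch of Pattern.MULTILINE and Pattern.COMMENTS at work. The class name FlagsDemo and its helper method are invented for this illustration, and the "^" anchor used here is assumed to be familiar; passing 0 as the flags argument simply means that no flags are set.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the original lesson.
class FlagsDemo {

    // Compiles the regex with the given flags and collects every match.
    static List<String> allMatches(String regex, int flags, String input) {
        List<String> matches = new ArrayList<String>();
        Matcher matcher = Pattern.compile(regex, flags).matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input =
            "Green cheese,\n" +
            "Yellow laces,\n" +
            "Up and down\n" +
            "The market places.";

        // With no flags, "^" matches only at the very start of the input.
        System.out.println(allMatches("^\\w+", 0, input));
        // Prints [Green].

        // In multi line mode, "^" also matches just after each line terminator.
        System.out.println(allMatches("^\\w+", Pattern.MULTILINE, input));
        // Prints [Green, Yellow, Up, The].

        // In comments mode, white space in the pattern is ignored and "#"
        // starts a comment, so this pattern behaves like "Yellow\\slaces".
        String commented = "Yellow \\s laces   # white space here is ignored";
        System.out.println(allMatches(commented, Pattern.COMMENTS, input));
        // Prints [Yellow laces].
    }
}
```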
Consider an example with case insensitivity turned on.
input =
"Hey, diddle, diddle, " +
"The cat and the fiddle, " +
"The cow jumped over the moon. " +
"The little dog laughed " +
"To see such sport, " +
"And the dish ran away with the spoon." ;
pattern = Pattern.compile( "the \\w+?(?=\\W)" , Pattern.CASE_INSENSITIVE );
// Matches "the " followed by any word, regardless of case.
matcher = pattern.matcher( input );
while ( matcher.find() ) // Prints
{ // The cat
System.out.println( matcher.group() ); // the fiddle
} // The cow
// the moon
// The little
// the dish
// the spoon
An equivalent Pattern could be constructed using the "(?i)" embedded
flag. If embedded at the beginning of the regular expression, this embedded
flag would affect the entire regular expression, just as any integer flag
specified in Pattern.compile( String , int ) would. In the previous
example, pattern = Pattern.compile( "(?i)the \\w+?(?=\\W)" ) would
have resulted in the same matched subsequences.
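The claimed equivalence is easy to check by running both patterns against the rhyme and comparing the results. This is only a sketch; the class name FlagEquivalenceDemo and its helper method are made up for the purpose.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the original lesson.
class FlagEquivalenceDemo {

    // Compiles the regex with the given flags and collects every match.
    static List<String> allMatches(String regex, int flags, String input) {
        List<String> matches = new ArrayList<String>();
        Matcher matcher = Pattern.compile(regex, flags).matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input =
            "Hey, diddle, diddle, " +
            "The cat and the fiddle, " +
            "The cow jumped over the moon. " +
            "The little dog laughed " +
            "To see such sport, " +
            "And the dish ran away with the spoon.";

        // Case insensitivity requested through the integer flag argument.
        List<String> viaConstant =
            allMatches("the \\w+?(?=\\W)", Pattern.CASE_INSENSITIVE, input);
        // Case insensitivity requested through the embedded flag; 0 = no flags.
        List<String> viaEmbedded =
            allMatches("(?i)the \\w+?(?=\\W)", 0, input);

        System.out.println(viaConstant.size());              // Prints 7.
        System.out.println(viaConstant.equals(viaEmbedded)); // Prints true.
    }
}
```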
Multiple flags can be specified as embedded flags or as the integer argument
to Pattern.compile( String , int ) . To specify
multiple embedded flags, simply list them, one after the other. "(?is)"
would specify that the Pattern is to match regardless of case and
the "." meta character can match line terminators. To specify multiple
flags as the integer argument to Pattern.compile( String , int ) ,
combine the integer constants of the Pattern class that represent
the desired behavior using the bitwise OR operator ( | ).
Consider the following example that demonstrates equivalent use of multiple
embedded flags and OR'ed together integer constants passed as the integer flag
argument to Pattern.compile( String , int ) .
input =
"Green cheese,\n" +
"Yellow laces,\n" +
"Up and down\n" +
"The market places." ;
pattern = Pattern.compile( "(?is)[a-z]*,.[a-z]*" );
// Regardless of case, matches consecutive letters
// followed by a comma, any character, then more
// consecutive letters where the meta character "."
// may match line terminators.
matcher = pattern.matcher( input );
while ( matcher.find() ) // Prints
{ // cheese,
System.out.println( matcher.group() ); // Yellow
} // laces,
// Up
int flags = Pattern.CASE_INSENSITIVE | Pattern.DOTALL ;
pattern = Pattern.compile( "[a-z]*,.[a-z]*" , flags );
// Regardless of case, matches consecutive letters
// followed by a comma, any character, then more
// consecutive letters where the meta character "."
// may match line terminators.
matcher = pattern.matcher( input );
while ( matcher.find() ) // Prints
{ // cheese,
System.out.println( matcher.group() ); // Yellow
} // laces,
// Up
The embedded flag construct affects the parts of a regular expression that
appear after the embedded flag. If an embedded flag appears in the middle of
a regular expression, then only the part of the expression that appears after
the flag is affected.
Using an embedded flag construct, it is possible to specify that only a section
(a group) of a regular expression be affected by the flag. The syntax for such
a construct is "(?flags:X)" , where "X" is the regular
expression to be affected by the flags. This limits the scope of the flags to
within the closing parentheses.
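The difference between a flag placed in the middle of an expression and a flag scoped to a group can be seen in a short sketch. The class name ScopedFlagDemo, the helper method, and the sample strings are invented for this illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the original lesson.
class ScopedFlagDemo {

    // True only if the regex matches the entire input.
    static boolean matchesFully(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        // "(?i)" in mid-expression: "the " stays case sensitive,
        // only "cat", which follows the flag, ignores case.
        System.out.println(matchesFully("the (?i)cat", "the CAT")); // Prints true.
        System.out.println(matchesFully("the (?i)cat", "THE cat")); // Prints false.

        // "(?i:X)": only "the", inside the group, ignores case;
        // " cat", outside the closing parenthesis, stays case sensitive.
        System.out.println(matchesFully("(?i:the) cat", "THE cat")); // Prints true.
        System.out.println(matchesFully("(?i:the) cat", "the CAT")); // Prints false.
    }
}
```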
The embedded flag construct can also be used to turn off flags. The syntax
to turn off a flag or flags is "(?-flags)". So, it is possible to
specify that a flag be turned on for the entire regular expression by specifying
the appropriate integer using Pattern.compile( String , int )
and that the same flag be turned off for a specified group of the regular
expression using "(?-flags:X)".
input =
"HARK! HARK! The dogs do bark, " +
"The beggars are coming to town. " +
"Some in rags, " +
"And some in tags, " +
"And one in a velvet gown!" ;
pattern = Pattern.compile( "(?-i:[A-Z])[A-Z]*" , Pattern.CASE_INSENSITIVE );
// Matches any word, regardless of case except
// the first letter which must be capitalized.
matcher = pattern.matcher( input ); // Prints
while ( matcher.find() ) // HARK
{ // HARK
System.out.println( matcher.group() ); // The
} // The
// Some
// And
// And
The embedded flag constructs are non-capturing. So, the enclosing parentheses
do not contribute to the Matcher's group count.
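A quick way to confirm this is to ask a Matcher for its group count, which is available even before any match is attempted. The class name GroupCountDemo and its helper method below are made up for this sketch.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the original lesson.
class GroupCountDemo {

    // The group count depends only on the pattern, so an empty input will do.
    static int countGroups(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        System.out.println(countGroups("(the) cat"));           // Prints 1.
        System.out.println(countGroups("(?i)the cat"));         // Prints 0.
        System.out.println(countGroups("(?i:the) cat"));        // Prints 0.
        System.out.println(countGroups("(?i)(the) (?-i:cat)")); // Prints 1.
    }
}
```

Only the plain parentheses around "the" count as a capturing group; the parentheses belonging to the flag constructs do not.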
Last Updated: 2003.03.31 0135
Movin' them doggies on the Cattle Drive
It's where you come to learn Java, and just like the cattle drivers
of the old west, you're expected to pull your weight along the way.
The Cattle Drive forum is where the drivers get together to complain,
uh rather, discuss their assignments and encourage each other. Thanks to the
enthusiastic initiative of Johannes de Jong, you can keep track of your progress on the drive
with the Assignment Log. If you're tough enough to get through the
nitpicking, you'll start collecting moose heads at the Cattle Drive Hall of Fame.
Gettin' them doggies...
We got 'round about a good dozen ranchers drivin' along the trail, most of 'em
rasslin' with the good ol' Java-4 Say and Servlets-4 assignments.
Fresh riders on the Drive...
Got a new Cattle Driver signed up on the assignment log, and this 'un a real bronco rider.
A big welcome to our latest new rider:
John Hembree.
This rider's been purdy busy, we'll tell y'all why in just a bit...
Another moose on the wall for...
Yep, that's right, you make it through, you get yerself a nice
little moose to hang on the wall. Well, OK, so it's a virtual moose
on a virtual wall, but you can be right proud of 'em! Thanks to the
efforts of Marilyn deQueiroz, you can now proudly admire your
codin' accomplishments at the recently opened Cattle Drive Hall of Fame. Check it out, pardner, it's
worth a wink.
Juliane Gross has been ridin' real steady and bagged her second
moose on the OOP Trail of the drive. We know her secret to success: chocolate! Keep it
wrapped up in yer saddle bag, cowgirl, that stuff is more valuable than a bag o' gold nuggets!
Richard Hawkes is gettin' right comfy in the saddle, he bagged a moose on the Java
Basics part of the Drive. Nice ridin' Richard!
Cattle Drivers gittin' all kinds of honors...
Wouldn't ya know it, a couple hard workin' Cattle Drivers went an' took a side trail and
came back with a little extra dust and a purdey little cerrrrtificate to boot: Barry Gaunt
and John Hembree got themselves a SCJP! Big congrats fellas.
Nitpicking is hard work too...
We know they're workin' reeeeeally hard, after all, we've seen what those assignments
look like once the nitpickers have combed through 'em. Hats off to Marilyn deQueiroz, Pauline McNamara, and Jason Adam for their dedication and patience with the pesky
vermin that always manage to make their way into those assignments.
Those gentlemen behind the bar...
Notice the fellas wipin' the bar and shinin' up the sarsaparilla glasses? Updating that assignment log
is tough work too. We got us another batch of bartenders. Big thanks to Dirk Schreckmann
and Matthew Phillips and welcome to Michael Matola and Barry Gaunt.
Mosey up and place yer orders.
Tips for the Trail...
You know the feeling, right? Yer bustin' yer head over that ol' favorite, Java-4 Say, an' you jest
asks yerself, "So why the tootin' am I doing this to myself?!" Well hang in there cowpokes, there's
life outside the Cattle Drive, we got proof, and all that hair you been pulling outta yer head will
pay off one day. Yep, check out this real-life
testimonial - ain't no slick talkin', it's the truth!
Pauline McNamara
Book Review of the Month
Mastering Regular Expressions, by Jeffrey E. F. Friedl
Regular expressions ("regexes" for short) have been officially integrated into Java with the release of J2SE 1.4. While many Java developers are just discovering them, they have been a fixture in other languages and tools for quite some time. Regular expressions are powerful tools for performing all kinds of text processing, but they require no small amount of knowledge to use effectively and efficiently. This is where "Mastering Regular Expressions" comes to the rescue.
The book's nine chapters are organized into three sections. The book first teaches the basics of regular expressions, crafting simple regexes, and the different features and flavors available in various regex packages. Next, the reader is given invaluable information about how the different types of regular expression engines work, as well as techniques for crafting practical and efficient expressions. The final section covers language-specific issues in Perl, Java, and .NET.
The author does an outstanding job leading the reader from regex novice to master. The book is extremely easy to read and chock full of useful and relevant examples. The author offers up questions along the way designed to encourage the reader to apply what he has learned. In-line references to other parts of the book containing information pertinent to the particular topic being discussed are also very helpful.
Regular expressions are valuable tools that every developer should have in their toolbox. "Mastering Regular Expressions" is the definitive guide to the subject, and an outstanding resource that belongs on every programmer's bookshelf.
(Jason Menard - Bartender, March 2003)
More info at Amazon.com
More info at Amazon.co.uk
April Scheduled Book Promotions:
Managing Editor: Johannes de Jong
Comments or suggestions for JavaRanch's NewsLetter can be sent to the NewsLetter Staff
For advertising opportunities contact NewsLetter Advertising Staff