Javaranch rocks at the Software Development Conference

Kathy Sierra, April 2003

Imagine a huge ballroom filled to capacity with developers (including more than a handful of alpha geeks and techno-luminaries). Two projection screens, each as large as a two-story building, are at the front. A hush falls over the crowd as the finalists for the Jolt Cola Award in the Web Sites category are displayed. IBM, BEA, Microsoft, Javaranch, ... what? Javaranch? Up against the likes of the big corporate developer sites? Yes, there was the name Javaranch in a font that had to be at least 7000 points.

There was a drum roll... (no, seriously, there really *was* a drum roll) and the winners of the Software Development Productivity Awards were announced. Javaranch is a winner! The crowd goes wild.

So we didn't win the coveted Jolt Cola award, but what's that about anyway? Javaranch is all about PRODUCTIVITY, not beverages with an illegal amount of caffeine.

Most importantly, Javaranch was a finalist in a category of huge, big-name, big-budget developer sites. And here we are, 100% volunteer. Paul couldn't make it, so I was there to accept the award for him, and if we'd won the Jolt award, I would have had to give a short acceptance speech. Here's what I would have said:

"Javaranch is an all-volunteer site. There's no company, no budget, no "Press three for worldwide marketing" on the phone system. There's no phone. The value of javaranch is the hundreds of people who run the site, man the site, and moderate the forums. The value of javaranch is in the hundreds of thousands who participate, asking and answering questions to help one another out. This award is for all those individuals, and to Java -- for being a language that inspires that much passion.

You'll never see this much enthusiasm at a .netRanch." (OK, I wouldn't have said that last thing, but I would have been thinking it strongly enough to telepathically convey it to at least the entire front row of the audience).

I was intensely proud to be there representing Javaranch.


Win the hottest new Java teaching book on the block

Head First Java

Kathy Sierra and Bert Bates have kindly donated a signed, yes signed, copy of their latest book, Head First Java, to the 1st person that posts the correct solution to their JavaCross crossword from chapter 10 of their book.

If you wish to win the book you must post your attempt in the thread JavaCross in the Java in General (beginner) forum here on the JavaRanch. The thread will be started by myself, Johannes de Jong, on April the 3rd. I'll attempt to start the thread as close to 07:00 am server time as possible.

So set your alarm or try and stay awake, depending on where in the world you are, as the post time in the thread will determine the winner. If you edit your reply to change your attempt, the entry becomes invalid. If you want to change your attempt, post a new attempt.

Good luck


This puzzle is focused on Chapter 10 content (exceptions), but anything from the first nine chapters is fair game.

Across

1. To give value
4. Flew off the top
6. All this and more!
8. Start
10. The family tree
13. No ducking
15. Problem objects
18. One of Java's '49'
20. Class hierarchy
21. Too hot to handle
24. Common primitive
25. Code recipe
27. Unruly method action
28. Defined behavior
29. Start a chain reaction

Down

2. Currently usable
3. Template's creation
4. Don't show the kids
5. Mostly static API class
7. Not about behavior
9. The template
11. Roll another one off the line
12. javac saw it coming
14. Attempt risk
16. Automatic acquisition
17. Changing method
19. Announce a duck
22. Deal with it
23. Create bad news
26. One of my roles


Mutable and Immutable Objects

by David O'Meara

Mutable and Immutable objects are a simple idea with wide ranging consequences. Often when immutability is mentioned it isn't defined, or when it is defined there are no details of how to ensure immutability or the consequences.

Before we start, the terminology follows the article Pass-by-Value, Please in the JavaRanch Camp fire. If you haven't learnt to say Java is pass-by-value, you might want to head there first.

Crappy Definition to start off with:

Mutable Objects: When you have a reference to an instance of an object, the contents of that instance can be altered
Immutable Objects: When you have a reference to an instance of an object, the contents of that instance cannot be altered

Immutability and Instances

To demonstrate this behaviour, we'll use java.lang.String as the immutable class and java.awt.Point as the mutable class.

	Point myPoint = new Point( 0, 0 );
	System.out.println( myPoint );
	myPoint.setLocation( 1.0, 0.0 );
	System.out.println( myPoint );

	String myString = new String( "old String" );
	System.out.println( myString );
	myString.replaceAll( "old", "new" );
	System.out.println( myString );

In case you can't see what the output is, here it is:

	java.awt.Point[x=0,y=0]
	java.awt.Point[x=1,y=0]
	old String
	old String

We are only looking at a single instance of each object, but we can see that the contents of myPoint have changed, while the contents of myString have not. To show what happens when we try to change the value of myString, we'll extend the previous example.

	String myString = new String( "old String" );
	System.out.println( myString );
	myString = new String( "new String" );
	System.out.println( myString );

The output from this is:

	old String
	new String

Now we find that the value displayed by the myString variable has changed. We have defined immutable objects as being unable to change in value, so what is happening? Let's extend the example again to watch the myString variable more closely.

	String myString = new String( "old String" );
	String myCache = myString;
	System.out.println( "equal: " + myString.equals( myCache ) );
	System.out.println( "same:  " + ( myString == myCache ) );

	myString = "not " + myString;
	System.out.println( "equal: " + myString.equals( myCache ) );
	System.out.println( "same:  " + ( myString == myCache ) );

The result from executing this is:

	equal: true
	same:  true
	equal: false
	same:  false

What this shows is that variable myString is referencing a new instance of the String class. The contents of the object didn't change; we discarded the instance and changed our reference to a new one with new contents.

Variable Values and Instance Contents

If you look at the example above, you can see the point I'm trying to sneak through. You can always change the value of a variable by getting your variable to reference a new object. Sometimes you can change the value of a variable by keeping a reference to the same instance and changing the contents of that instance.
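Here are the two cases side by side, as a rough sketch. The class name AliasingDemo and its two helper methods are made up for illustration; only Point and String come from the examples above.

```java
import java.awt.Point;

// Hypothetical demo class; the names are illustrative only.
class AliasingDemo
{
    // Case 1: two variables, one mutable instance. A change made
    // through one reference is visible through the other.
    static boolean sharedPointSeesChange()
    {
        Point first = new Point( 0, 0 );
        Point second = first;             // same instance, two variables
        second.setLocation( 1.0, 0.0 );   // change the instance contents
        return first.x == 1;
    }

    // Case 2: one variable is pointed at a new object. The other
    // variable still references the original, untouched instance.
    static boolean reassignedStringLeavesOriginal()
    {
        String first = "old String";
        String second = first;
        second = "new String";            // new reference, old object unchanged
        return first.equals( "old String" );
    }

    public static void main( String[] args )
    {
        System.out.println( sharedPointSeesChange() );
        System.out.println( reassignedStringLeavesOriginal() );
    }
}
```

Both methods return true: the Point was altered in place, while the String was merely swapped for another one.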

After you have eliminated those possibilities, you have a variable that retains its reference to an object, but the contents of this object cannot change. Doesn't sound like a very interesting idea, and it sounds a bit too simple to be useful.

It turns out that Immutable Objects, that is, objects whose contents cannot be changed after they have been set, are a very handy tool when used in the right place. They can promote thread safety in your code, you can share them around without being afraid that they will change without your knowledge, and they are great for caching and constants. But we're not going to cover any of that yet; we are going to concentrate on building immutable objects.

Building an Immutable class

So what is it about the String class that makes it Immutable while a Point is mutable?

In this case, Strings have no mutators while Points do. If we removed all of the mutators from the Point class, would it be Immutable? No it wouldn't. Removing mutators is a necessary first step, but immutability requires more than that to ensure that the contents of an instance never change.

Fields must be private

Obviously all of the fields must be private. There is little point removing the mutators if they aren't even required to change the instance contents.

	public class ImmutablePoint
	{
		//note there are no mutators!
		private double x;
		private double y;

		//and the rest...

This is almost enough, but there are two more steps to consider.

Make sure methods can't be overridden.

If your class gets extended, it could add extra fields that are not immutable, or the methods could be overridden to return a different value each time. There are two ways to protect against this.

The preferred way is to make the class final. This is sometimes referred to as "Strong Immutability". It prevents anyone from extending your class and accidentally or deliberately making it mutable.

The second way, also called "Weak Immutability" is to make your methods final. It allows others to extend your class to add more behaviour, but protects the original contract specified by the class. If you want a more verbose description, imagine a class A is weakly immutable. If you have an instance of object A, it is immutable. If someone creates class B that extends A, it is only the behaviour defined by the A class that is immutable. Any added behaviour from class B may not be immutable.
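A minimal sketch of the weak case, assuming two throwaway classes named after the A and B in the description above. Nothing here is from a real library.

```java
// Hypothetical classes, named after the A and B in the text.
class A
{
    private int value;

    A( int value )
    {
        this.value = value;
    }

    // final: a subclass cannot override this to return something else
    public final int getValue()
    {
        return value;
    }
}

class B extends A
{
    // State added by the subclass is outside A's contract
    // and is free to change.
    private int counter;

    B( int value )
    {
        super( value );
    }

    public void increment()
    {
        counter++;
    }

    public int getCounter()
    {
        return counter;
    }
}
```

An instance of B can be altered through increment(), but the value it inherited from A stays whatever it was given at construction.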

Protect mutable fields

The last requirement, which many people fall victim to, is to build your immutable class from primitive types or immutable fields; otherwise you have to protect mutable fields from manipulation.

To highlight this problem, we'll use the example of a supposedly immutable class representing a person. Our class has a first and last name, as well as a date of birth.

	import java.util.Date;

	public final class BrokenPerson
	{
		private String firstName;
		private String lastName;
		private Date dob;

		public BrokenPerson( String firstName, String lastName, Date dob)
		{
			this.firstName = firstName;
			this.lastName = lastName;
			this.dob = dob;
		}

		public String getFirstName()
		{
			return this.firstName;
		}

		public String getLastName()
		{
			return this.lastName;
		}

		public Date getDOB()
		{
			return this.dob;
		}
	}

This all looks fine, until someone uses it like this:

	Date myDate = new Date();
	BrokenPerson myPerson = new BrokenPerson( "David", "O'Meara", myDate );
	System.out.println( myPerson.getDOB() );
	myDate.setMonth( myDate.getMonth() + 1 );
	System.out.println( myPerson.getDOB() );

Depending on the dates entered, the output could be something like this:

	Mon Mar 24 21:34:16 GMT+08:00 2003
	Thu Apr 24 21:34:16 GMT+08:00 2003

The Date object is mutable, and the myPerson variable is referencing the same instance of the Date object as the myDate variable. When myDate changes the instance it is referencing, the myPerson instance changes too. It isn't immutable!

We can defend against this by taking a copy of the Date instance when it is passed in, rather than trusting the reference to the instance we are given.

	import java.util.Date;

	public final class BetterPerson
	{
		private String firstName;
		private String lastName;
		private Date dob;

		public BetterPerson( String firstName, String lastName, Date dob)
		{
			this.firstName = firstName;
			this.lastName = lastName;
			this.dob = new Date( dob.getTime() );
		}

		//etc...

Now we're close, but we're still not quite there. Our class is still open to abuse.

	BetterPerson myPerson = new BetterPerson( "David", "O'Meara", new Date() );
	System.out.println( myPerson.getDOB() );
	Date myDate = myPerson.getDOB();
	myDate.setMonth( myDate.getMonth() + 1 );
	System.out.println( myPerson.getDOB() );

We see here that taking a copy on the way in wasn't enough; we also need to prevent anyone from getting a reference to our mutable Date field when we pass it out.

	public Date getDOB()
	{
		return new Date( this.dob.getTime() );
	}

Make deep copies of mutable data

The only point to add is that when you copy the instance on the way in and the way out, you need to make a deep copy. Otherwise you run the risk of leaving some mutable data in your immutable class!

If you are confused about the need to provide a deep copy, keep in mind that a single piece of shared mutable data, no matter how deep it is buried inside an object, makes your class mutable. When you create a copy of an object to defend against the value changing, you need to make sure your copy doesn't include this shared mutable class. You need to copy any mutable objects all the way down to the last field, and copy any nested fields until you have a completely new copy of your own. It's the only way to be safe!
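To make the difference between a shallow and a deep copy concrete, here is a sketch using an array of Dates. The Schedule class is invented for this illustration, and it assumes the array it is given contains no null elements.

```java
import java.util.Date;

// Hypothetical class for illustration; not part of the Person examples.
final class Schedule
{
    private final Date[] appointments;

    Schedule( Date[] appointments )
    {
        this.appointments = deepCopy( appointments );   // copy on the way in
    }

    public Date[] getAppointments()
    {
        return deepCopy( appointments );                // copy on the way out
    }

    // A new array on its own would be a shallow copy: it would still
    // share the mutable Date instances. Each element is copied as well.
    // Assumes no element is null.
    private static Date[] deepCopy( Date[] source )
    {
        Date[] copy = new Date[ source.length ];
        for ( int i = 0; i < source.length; i++ )
        {
            copy[ i ] = new Date( source[ i ].getTime() );
        }
        return copy;
    }
}
```

If deepCopy only cloned the array, a caller could still reach into the Schedule by calling setTime on one of the Dates it passed in.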

Our Template for Immutable Classes

Now we have a template for creating immutable objects.

Make all fields private
Don't provide mutators
Ensure that methods can't be overridden by either making the class final (Strong Immutability) or making your methods final (Weak Immutability)
If a field isn't primitive or immutable, make a deep clone on the way in and the way out.
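Putting the four rules together in one place gives something like the sketch below. The class name ImmutablePerson is made up, and marking the fields final is an extra safeguard that is not one of the four rules listed above.

```java
import java.util.Date;

// Sketch of the four rules applied together; the class name is invented.
final class ImmutablePerson                     // rule 3: class is final
{
    private final String name;                  // rule 1: private fields
    private final Date dob;                     // mutable type, so it gets copied

    ImmutablePerson( String name, Date dob )
    {
        this.name = name;                       // String is immutable, safe to share
        this.dob = new Date( dob.getTime() );   // rule 4: copy on the way in
    }

    // rule 2: accessors only, no mutators
    public String getName()
    {
        return name;
    }

    public Date getDOB()
    {
        return new Date( dob.getTime() );       // rule 4: copy on the way out
    }
}
```

A Date holds nothing but a primitive long, so copying it with getTime() is already a deep copy. A field with nested mutable objects would need more work.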

Which classes are Immutable?

To finish up, let's discuss the common Java classes that are immutable and those that aren't. Firstly, all of the java.lang wrapper classes are immutable (Boolean, Byte, Character, Double, Float, Integer, Long, Short), and so is String.

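As in the Person classes we discussed, java.util.Date objects are not immutable. The classes java.math.BigInteger and BigDecimal are documented as immutable, but neither class is final, so an instance you are handed could belong to a mutable subclass. They are not strongly immutable, although maybe they should have been.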

And we're done...

...for now. This concludes an introduction to Mutable and Immutable Classes in Java. Hopefully there will be a second part that will go into more detail on weak and strong immutability, reasons why you'd make classes immutable and reasons to avoid them, and some other miscellaneous topics on immutable types in Java.


Small and Simple Web Applications - the Friki Way (Part 2)

Frank Carver, March 2003

Abstract

This article is the second of a series which laments the bloated and unmaintainable state of so many J2EE web applications and looks at ways to keep web applications small, simple and flexible. The series uses the author's Friki software as a case study, and discusses ways to design and build a powerful, flexible and usable web application in less space than a typical gif image.

This article continues the design process begun last time, and starts the serious work of coding a solution. On the way it discusses some techniques to help make software development more trustworthy and less stressful.

Introduction

If you've read the first article, you should be aware that the aim of this project is to develop a small, simple and understandable "Wiki" (editable web site) application. We've considered and decided to defer the relatively heavy decision of how to store and retrieve pages by introducing a Java interface, and thought through some ways of generating the HTML for each displayed page.

Before we really get started, there's one vital question which needs to be answered. It's such an important question, that it's often forgotten until it's too late!

How will we know when we're done?

I recommend that this question be asked and answered at the start of every software development project. I'm not suggesting that anyone build complex and detailed plans (also known as "guesses" or "wishes") before starting work. I'm not suggesting that some all-powerful "architect" lay down a complete design before starting the real work. I'm not suggesting that everything should be described in UML or pseudo code. Almost the opposite.

How will we know when we're done? is a simple plea to whoever wants the work done. A development team needs to know when to stop developing. Everyone involved in designing and coding a solution needs to have an idea of the end goal, if they are to make sensible decisions along the way. The answers to this question can vary enormously, but beware of "answers" which don't actually answer the question: "stop when I tell you to stop" might as well be "don't bother doing any work, it won't make any difference"; "stop when the customer is happy" gives no guidance on what makes the customer happy - why not just buy him or her a beer and go home?

Useful answers to this question include things like "when it does this and this and this", "when average response time is less than 5 seconds under peak load" and so on. What these answers have in common is that they are measurable and testable. You could (theoretically, at least) make a test, and when it passes you can stop developing.

So let's ask our "virtual customer" this hard but very useful question. How will we know when we are done?

For the purposes of this article, our customer says we will be "done" when we have a Java web application in which:

each page may be viewed using its own unique URL
page content may contain links to other pages by name
links to nonexistent pages will be marked with a "?"
page content may be created and edited using just a browser
at least 100 different pages can be stored

That should be enough to get started. Let's see how simply and quickly we can code a solution to this.

Just one final reminder. The above points are our complete "acceptance criteria". Any solution which meets these goals is a valid one. We must free our minds from imagining any "requirements" which have not been asked for.

What do we write first?

So we know what we have to do, but we don't know where or how to start. Strangely enough, I'm not going to start with design. I'm not even going to start with coding a solution. I'm going to start with a test! I always start with a test to make sure I can compile and run something. Without that, there's not much point putting in a lot of effort to write any code. In this case, I'll use the JUnit test framework, which I've found very handy over the years. If you don't already have a recent version of JUnit, please download it from the above link before proceeding.

AllTests.java

package tests;

import junit.framework.*;

public class AllTests extends TestCase
{
    public static TestSuite suite()
    {
        TestSuite ret = new TestSuite();

        ret.addTest(new TestSuite(EmptyTest.class));

        return ret;
    }
}

If we try and compile this, it should fail. It should either complain about the absence of junit.framework in the import statement (which hints that the file "junit.jar" from the JUnit distribution needs to be in the classpath) or complain about the absence of the tests.EmptyTest class (which is fine. We haven't written it yet!). If it compiles without error you have either tried to compile the wrong file, or you already have both JUnit and a class tests.EmptyTest in your classpath, and you'll need to sort that out before we progress any further.

The most important thing to take away from this is that a failing test has given us a lot of information and confidence about the state of our system. The EmptyTest class still doesn't exist, so we can proceed and write it.

EmptyTest.java

package tests;

import junit.framework.*;

public class EmptyTest extends TestCase
{
    public void testEmpty()
    {
    }
}

Now try and compile these two classes. They should compile OK. We've passed our first test! Of course our "system" doesn't do much. There is some test code, but no actual product, but we know that we can compile some real Java code, including all that stuff about jar files and classpaths. And we have the start of a test "scaffolding" to help us build the real code.

The next tiny step is to run the test code. Type:

java junit.swingui.TestRunner tests.AllTests

You should see a nice graphical dialog with a green bar indicating that 1 test has been run, with no errors and no failures. Excellent. We are now ready to write a real test for a real feature.

A first feature

Last session I promised that we'd get started on our template system, so let's begin with that. Just as above, the process is to start with a test. Even with a tiny addition to some existing software, the most important question is still "How will we know when we're done?". Simple is good, so let's start simple - as simple as we can. Let's test that a template with no characters in it "expands" to a template with no characters in it. How about something like:

TemplateTest.java

package tests;

import junit.framework.*;

public class TemplateTest extends TestCase
{
    public void testEmptyTemplate()
    {
        TemplateEngine engine = new TemplateEngine();
        assertEquals("", engine.expand(""));
    }
}

Then add the line

        ret.addTest(new TestSuite(TemplateTest.class));

to AllTests.java, next to the similar line for "EmptyTest".

Trying to compile our system again should now tell us that there is no TemplateEngine. If you look carefully, though, it's actually complaining that there is no class "tests.TemplateEngine". To fix this we can make two changes. Add the line

import friki.TemplateEngine;

to TemplateTest.java and create a new class - the first of our real code:

TemplateEngine.java

package friki;

public class TemplateEngine
{
    public String expand(String input)
    {
        return input;
    }
}

Hold on. Isn't that cheating? That method will never "expand" anything! Pah!

This is very important. Look back to where I said "We must free our minds from imagining any "requirements" which have not been asked for". The code we have written passes all our tests. If you think it should do more, you are guessing. Worse than that, there is a much more important problem with that code which we'll see in a minute. But first, if we want more code, we have to know when to stop coding. So we need more tests.

So let's think about an actual substitution. Say we want to substitute all occurrences of "~name~" with "Frank":

TemplateTest.java

package tests;

import junit.framework.*;
import friki.TemplateEngine;

public class TemplateTest extends TestCase
{
    public void testEmptyTemplate()
    {
        TemplateEngine engine = new TemplateEngine();
        assertEquals("", engine.expand(""));
    }

    public void testSingleToken()
    {
        TemplateEngine engine = new TemplateEngine();
        assertEquals("Frank", engine.expand("~name~"));
    }
}

We run the tests. They fail. Our TemplateEngine gives back "~name~", which is obviously not the same as "Frank". So let's fix the code:

TemplateEngine.java

package friki;

import java.text.CharacterIterator;
import java.text.StringCharacterIterator;

public class TemplateEngine
{
    public String expand(String input)
    {
        boolean inToken = false;
        StringBuffer token = new StringBuffer();
        StringBuffer ret = new StringBuffer();

        CharacterIterator it = new StringCharacterIterator(input);
        for(char c = it.first(); c != CharacterIterator.DONE; c = it.next())
        {
            if (c == '~')
            {
                if (inToken)
                {
                    ret.append("Frank");
                    token.setLength(0);
                }

                inToken = !inToken;
            }
            else
            {
                if (inToken)
                {
                    token.append(c);
                }
                else
                {
                    ret.append(c);
                }
            }
        }

        return ret.toString();
    }
}

There are a few things to note about this code:

It's quite short. If I had wanted to use a more compact layout style, I could have fitted it into about 15 lines.
It uses system classes wherever possible, such as the fairly uncommon (but useful) CharacterIterator.
There's no constructor, "getters and setters", member variables or any other nonsense. Just the code needed to extract tokens separated by '~' and return "Frank".

It's getting better, but still not complete. Yes, we need more tests. So let's see what happens if we ask for something else. Add the following to TemplateTest, and run it again:

    public void testDifferentToken()
    {
        TemplateEngine engine = new TemplateEngine();
        assertEquals("Margaret", engine.expand("~wife~"));
    }

The test fails, of course. Apparently I'm married to myself. Much though I like the name Frank, I think it's unfair to return it as the value of every token. But that "Frank" is hard coded. If we want to return "Frank" for one token, and "Margaret" for another we have to get the names and values from somewhere. Sounds like a Map to me:

TemplateEngine.java

package friki;

import java.text.CharacterIterator;
import java.text.StringCharacterIterator;
import java.util.Map;
import java.util.HashMap;

public class TemplateEngine
{
    private Map values;

    public TemplateEngine()
    {
        values = new HashMap();
        values.put("name", "Frank");
        values.put("wife", "Margaret");
    }

    public String expand(String input)
    {
        boolean inToken = false;
        StringBuffer token = new StringBuffer();
        StringBuffer ret = new StringBuffer();

        CharacterIterator it = new StringCharacterIterator(input);
        for(char c = it.first(); c != CharacterIterator.DONE; c = it.next())
        {
            if (c == '~')
            {
                if (inToken)
                {
                    ret.append(values.get(token.toString()));
                    token.setLength(0);
                }

                inToken = !inToken;
            }
            else
            {
                if (inToken)
                {
                    token.append(c);
                }
                else
                {
                    ret.append(c);
                }
            }
        }

        return ret.toString();
    }
}

Great. The basic template expansion works, but it still feels a bit clumsy to me. I don't really like the way the values are built in to the class itself. Can we make it simpler and more flexible? I reckon so. Let's change our tests a little:

TemplateTest.java

package tests;

import java.util.Map;
import java.util.HashMap;
import junit.framework.*;
import friki.TemplateEngine;

public class TemplateTest extends TestCase
{
    Map values;

    public void setUp()
    {
        values = new HashMap();
        values.put("name", "Frank");
        values.put("wife", "Margaret");
    }

    public void testEmptyTemplate()
    {
        TemplateEngine engine = new TemplateEngine(values);
        assertEquals("", engine.expand(""));
    }

    public void testSingleToken()
    {
        TemplateEngine engine = new TemplateEngine(values);
        assertEquals("Frank", engine.expand("~name~"));
    }

    public void testDifferentToken()
    {
        TemplateEngine engine = new TemplateEngine(values);
        assertEquals("Margaret", engine.expand("~wife~"));
    }
}

That's better. We now have much more control over how the template expander works. We can give it whatever names and values we want to use, and expect it to fill them in to a supplied template. The "setUp" method is run just before each of the test methods, and makes this test class into what is known as a "fixture".

By the way. Have you noticed that there is a lot of duplication in this code? The code to create a new TemplateEngine is exactly the same in all the test cases. In the spirit of keeping things as small and simple as possible, let's move them into the setUp method as well:

TemplateTest.java

package tests;

import java.util.Map;
import java.util.HashMap;
import junit.framework.*;
import friki.TemplateEngine;

public class TemplateTest extends TestCase
{
    Map values;
    TemplateEngine engine;

    public void setUp()
    {
        values = new HashMap();
        values.put("name", "Frank");
        values.put("wife", "Margaret");
        engine = new TemplateEngine(values);
    }

    public void testEmptyTemplate()
    {
        assertEquals("", engine.expand(""));
    }

    public void testSingleToken()
    {
        assertEquals("Frank", engine.expand("~name~"));
    }

    public void testDifferentToken()
    {
        assertEquals("Margaret", engine.expand("~wife~"));
    }
}

Of course, if we try and run this test it will fail (it won't even compile!). So the TemplateEngine code needs to be brought in line with our new design. Notice how at each stage what we need drives the tests, then the tests drive the implementation.

TemplateEngine.java

package friki;

import java.text.CharacterIterator;
import java.text.StringCharacterIterator;
import java.util.Map;

public class TemplateEngine
{
    private Map values;

    public TemplateEngine(Map values)
    {
        this.values = values;
    }

    public String expand(String input)
    {
        boolean inToken = false;
        StringBuffer token = new StringBuffer();
        StringBuffer ret = new StringBuffer();

        CharacterIterator it = new StringCharacterIterator(input);
        for(char c = it.first(); c != CharacterIterator.DONE; c = it.next())
        {
            if (c == '~')
            {
                if (inToken)
                {
                    ret.append(values.get(token.toString()));
                    token.setLength(0);
                }

                inToken = !inToken;
            }
            else
            {
                if (inToken)
                {
                    token.append(c);
                }
                else
                {
                    ret.append(c);
                }
            }
        }

        return ret.toString();
    }
}

And that is the "heart" of the templating system done. To make double sure it's what we wanted at the start, let's test the example template from last session:

    public void testExamplePage()
    {
        values.put("title", "PageOne");
        values.put("content", "This is the first page in our new Wiki");

        assertEquals(
            "  <html><head><title>Friki: PageOne</title></head>\n" +
            "  <body>\n" +
            "  <h2>PageOne</h2>\n" +
            "  This is the first page in our new Wiki\n" +
            "  <hr width='100%'>\n" +
            "  <a href='edit?PageOne'>edit this page</a>\n" +
            "  </body>\n" +
            "  </html>\n",
            engine.expand(
                "  <html><head><title>Friki: ~title~</title></head>\n" +
                "  <body>\n" +
                "  <h2>~title~</h2>\n" +
                "  ~content~\n" +
                "  <hr width='100%'>\n" +
                "  <a href='edit?~title~'>edit this page</a>\n" +
                "  </body>\n" +
                "  </html>\n"));
    }

5 tests passed. Cool.

Although the implementation given here works as much as we need it to (as shown by that last test), you may want to think about what it can't do. What happens if we want to include a '~' character in our template? What happens if we ask it to expand a token it doesn't have a value for? If you are worried about these sorts of questions, ask them in the form of a test, then "fix" the code. But remember - every time you add a feature that's not needed by the application right now, you are making the final program bigger, you are making bugs harder to find, you are making the code harder to read. So think. Do you really need that feature yet?
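If you are curious what the current loop does with those two awkward inputs, here is a stand-alone sketch you can poke at. It is not the Friki code: TemplateSketch and its "lenient" flag are invented for this illustration, and it uses generics and StringBuilder from later Java versions. With lenient set to false it follows the same logic as the TemplateEngine above. With lenient set to true it copies unknown tokens through unchanged, which is only a guess at what a customer might ask for.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-alone sketch, not the Friki code. Names are hypothetical.
class TemplateSketch
{
    static String expand( String input, Map<String, String> values, boolean lenient )
    {
        boolean inToken = false;
        StringBuilder token = new StringBuilder();
        StringBuilder ret = new StringBuilder();

        for ( int i = 0; i < input.length(); i++ )
        {
            char c = input.charAt( i );
            if ( c == '~' )
            {
                if ( inToken )
                {
                    String name = token.toString();
                    String value = values.get( name );
                    if ( value == null && lenient )
                    {
                        // assumed behaviour: leave the unknown token as written
                        ret.append( '~' ).append( name ).append( '~' );
                    }
                    else
                    {
                        // same as the article's engine: a missing value
                        // is appended as the four characters "null"
                        ret.append( value );
                    }
                    token.setLength( 0 );
                }
                inToken = !inToken;
            }
            else if ( inToken )
            {
                token.append( c );
            }
            else
            {
                ret.append( c );
            }
        }
        return ret.toString();
    }

    public static void main( String[] args )
    {
        Map<String, String> values = new HashMap<String, String>();
        values.put( "name", "Frank" );
        System.out.println( expand( "Hi ~nobody~", values, false ) );
        System.out.println( expand( "Hi ~nobody~", values, true ) );
        System.out.println( expand( "a ~ b", values, false ) );
    }
}
```

In the non-lenient mode an unknown token comes out as the text "null", and a lone '~' silently swallows everything after it. Whether either of those matters is exactly the kind of question to put to the customer before writing more code.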

How are We Doing?

We still haven't made a Wiki yet! We have written, compiled, and run some real code to make sure we have a workable build environment. We have a useful "template expander" in a few lines of code which we can use for this project, but might also be handy in others. We have built a complete regression test suite which automatically tests every class and method in our system so if anything breaks we'll know straight away. We can go home confident that we are making real, measurable, repeatable progress, and come back fresh and ready next time.

Next session we'll add more customer features to our Wiki "engine" and look into automating the process of compiling and testing the code even more. In the meantime, I recommend reading more about JUnit. If you want to get ahead of the game, you could also look at the HTTPUnit web testing toolkit and the Ant build tool. If you want to read more about keeping the enjoyment in software development, check out my golden rules of stress-free programming.

An Introduction to `java.util.regex`

This series of lessons covering regular expressions in Java was modeled after the tutorial created to teach the com.stevesoft.pat package. The com.stevesoft.pat package is available for download and use from JavaRegex.com. It's an excellent alternative package for harnessing the power of regular expressions in Java.

This is the second part of a four-part introduction to the java.util.regex package. Part one can be found in The September Newsletter.

Part 2: More Pattern Elements

Pattern Elements Introduced in this Lesson

capturing groups and back references -- (X) , Matcher's group(int)
non-capturing groups -- (?:X)
"look ahead" and "look behind" constructs -- (?=X) , (?!X) , (?<=X) , (?<!X)
flags -- (?idmsux-idmsux) , (?idmsux-idmsux:X) , Pattern.compile(String, int)

Capturing Groups and Back References - (X) , Matcher's group(int)

One function of parentheses is to provide a grouping ability for parts of a regular expression. The quantifiers and operators introduced in the previous lesson, which were applied there to a single character or character class, can then be applied to a group.

    String input =
      "Fee! Fie! Foe! Fum! " +
      "I smell the blood of an Englishman. " +
      "Be he 'live, or be he dead, " +
      "I'll grind his bones to make my bread." ;

    Pattern pattern = Pattern.compile( "(F[a-z]{2}! ){4}" );
    // Matches four occurrences of a pattern that begins 
    // with "F" followed by two lower case letters, a "!" 
    // and a space.

    Matcher matcher = pattern.matcher( input );

    System.out.println( matcher.find() );  // Prints true.
    System.out.println( matcher.group() ); // Prints "Fee! Fie! Foe! Fum! ".

Capturing groups are numbered according to their appearance in the regular expression. The first opening parenthesis is the start of the first capturing group; the second opening parenthesis is the start of the second capturing group; and so on. Each capturing group ends at the matching closing parenthesis. It is possible to have one capturing group embedded in another. So, the pattern "I (am (Sam))" has two capturing groups. The first capturing group is the pattern "am Sam" and the second capturing group is the pattern "Sam".
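
The numbering rule is easy to check for yourself. Here is a minimal sketch of the "I (am (Sam))" pattern in action; the class name NestedGroupsDemo and its capture helper are made up for this illustration and are not part of the java.util.regex package.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original lesson.
class NestedGroupsDemo {

    // Returns what the given group captured in the first match of
    // "I (am (Sam))", or null when the input contains no match.
    static String capture(String input, int group) {
        Matcher matcher = Pattern.compile("I (am (Sam))").matcher(input);
        return matcher.find() ? matcher.group(group) : null;
    }

    public static void main(String[] args) {
        String input = "I am Sam";
        System.out.println(capture(input, 1)); // Prints "am Sam".
        System.out.println(capture(input, 2)); // Prints "Sam".
    }
}
```

The outer group opens first, so it gets number one even though it closes last.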

The capturing group count and corresponding matched subsequence data are maintained in the Matcher object. A Matcher object's String group( int ) method "returns the input subsequence captured by the given group during the previous match operation." A Matcher object's int groupCount() method "returns the number of capturing groups in this matcher's pattern⁵."

Group number zero refers to the entire pattern match, so matcher.group( 0 ) returns the entire previously matched subsequence and is equivalent to matcher.group() . Note that capturing group number zero is not included in the total group count returned by the groupCount() method⁷.
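
One practical consequence is that a loop over every group has to run from zero up to and including groupCount(). The following sketch shows this; GroupIndexDemo and allGroups are hypothetical names chosen for the example, and the pattern is a small variation on the "Fee! Fie!" search above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original lesson.
class GroupIndexDemo {

    // Collects group( 0 ) through group( groupCount() ) for the first
    // match of the regular expression; returns an empty list if none.
    static List<String> allGroups(String regex, String input) {
        List<String> groups = new ArrayList<String>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        if (matcher.find()) {
            for (int i = 0; i <= matcher.groupCount(); i++) {
                groups.add(matcher.group(i));
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> groups =
            allGroups("(F[a-z]{2})! (F[a-z]{2})!", "Fee! Fie! Foe! Fum!");
        System.out.println(groups.size()); // Prints 3.
        System.out.println(groups);        // Prints [Fee! Fie!, Fee, Fie].
    }
}
```

The list holds three entries although groupCount() reports two, because group zero (the whole match) comes along for free.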

Consider this mildly more involved example demonstrating capturing groups. Note that the group construct limits the scope of the OR operator.

    input =
      "Humpty Dumpty sat on a wall. " +
      "Humpty Dumpty had a great fall. " +
      "All the king's horses and all the king's men " +
      "Couldn't put Humpty together again! " ;

    pattern = Pattern.compile( "((H|D)(umpty) ){2}" );
    // Matches two consecutive six-character words, each 
    // beginning with "H" or "D", ending in "umpty" and 
    // followed by a space.  Three capturing groups are 
    // defined and remembered by the Matcher.

    matcher = pattern.matcher( input );

    System.out.println( matcher.find() );       // Prints true.
    System.out.println( matcher.groupCount() ); // Prints 3.
    System.out.println( matcher.group( 1 ) );   // Prints "Dumpty ".
    System.out.println( matcher.group( 2 ) );   // Prints "D".
    System.out.println( matcher.group( 3 ) );   // Prints "umpty".
    System.out.println( matcher.group( 0 ) );   // Prints "Humpty Dumpty ".

    // If it was expected that matcher.group( 1 ) should contain 
    // "Humpty ", then remember that the group( int ) method 
    // returns the input subsequence captured by the specified 
    // group during the previous match operation.  Because of 
    // the {2} quantifier, group 1 was matched twice within that 
    // operation - first "Humpty " and then "Dumpty " - and only 
    // the last capture is kept.

Each matched group maintained in the Matcher object is called a "back reference". Referencing a matched group as demonstrated above is one style of back referencing in Java regular expressions. A later lesson will introduce another style and use of back referencing.

Non-Capturing Groups - (?:X)

A slight performance cost is associated with maintaining back references (the group count and matched subsequence data) in the Matcher object. The non-capturing group construct provides the function of grouping pattern elements without the cost of remembering each matched group as a back reference. The syntax for a non-capturing group is simply "(?:X)". A non-capturing group functions much like a capturing group with the distinction that no capturing group specific data is maintained in the Matcher.

    input =
      "Humpty Dumpty sat on a wall. " +
      "Humpty Dumpty had a great fall. " +
      "All the king's horses and all the king's men " +
      "Couldn't put Humpty together again! " ;

    pattern = Pattern.compile( "((?:H|D)(?:umpty) ){2}" );
    // Matches two consecutive six-character words, each 
    // beginning with "H" or "D", ending in "umpty" and 
    // followed by a space.  Three groups are defined; 
    // only one is a capturing group that will be 
    // remembered by the Matcher.

    matcher = pattern.matcher( input );

    System.out.println( matcher.find() );       // Prints true.
    System.out.println( matcher.groupCount() ); // Prints 1.
    System.out.println( matcher.group( 1 ) );   // Prints "Dumpty ".
    System.out.println( matcher.group( 0 ) );   // Prints "Humpty Dumpty ".

According to J-Sprint's memory profiler, when the previous two code examples (the searches for Humpty Dumpty) were performed one hundred thousand times each, the non-capturing group strategy demonstrated a performance improvement of roughly 0.0003% - not much to write home about. By contrast, in tests against a larger input character sequence (50KB) composed of one hundred groups per match, the non-capturing group test consumed approximately 40% as much memory as the capturing group test.

Look Ahead and Look Behind Constructs - (?=X) , (?!X) , (?<=X) , (?<!X)

Java regular expressions provide two "look ahead" constructs. These constructs allow the description of a pattern where a specified pattern only matches if it is followed by the pattern described in the look ahead construct (positive look ahead, "(?=X)"), or only if it is not followed by that pattern (negative look ahead, "(?!X)"). The pattern described in the look ahead construct is not part of any matched subsequence described by the Matcher object - it is only a requirement that must be met in order for the specified pattern to match. Though the look ahead construct is contained within matching opening and closing parentheses, it is a non-capturing group construct.

    input =
      "Today's specials are apple chocolate pie and cherry banana pie." ;

    pattern = Pattern.compile( "(apple|cherry)(?= chocolate)" );
    // Matches "apple" or "cherry" where the following pattern 
    // matches " chocolate".  " chocolate" is not a part of the 
    // resulting match, it follows it.

    matcher = pattern.matcher( input );

    System.out.println( matcher.find() );       // Prints true.
    System.out.println( matcher.groupCount() ); // Prints 1.
    System.out.println( matcher.group( 1 ) );   // Prints "apple".
    System.out.println( matcher.group() );      // Prints "apple".

    pattern = Pattern.compile( "(apple|cherry)(?! chocolate)" );
    // Matches "apple" or "cherry" where the following pattern 
    // does not match " chocolate".

    matcher = pattern.matcher( input );

    System.out.println( matcher.find() );       // Prints true.
    System.out.println( matcher.groupCount() ); // Prints 1.
    System.out.println( matcher.group( 1 ) );   // Prints "cherry".
    System.out.println( matcher.group() );      // Prints "cherry".

Two "look behind" constructs provide a function similar to that of the look ahead constructs, the distinction being that the look behind constructs try to match whatever precedes a specified pattern. The positive form, "(?<=X)", requires that X precede the specified pattern; the negative form, "(?<!X)", requires that it does not. The pattern described in the look behind construct is not part of any matched subsequence described by the Matcher object - it is only a requirement that must be met in order for the specified pattern to match. The look behind construct is also non-capturing.

    input =
      "Tomorrow's special is fried bananas with baked clam." ;

    pattern = Pattern.compile( "(?<=fried )(bananas|clam)" );
    // Matches "bananas" or "clam" if preceded by "fried ". 
    // "fried " is not part of the resulting match, it precedes it.

    matcher = pattern.matcher( input );

    System.out.println( matcher.find() );       // Prints true.
    System.out.println( matcher.groupCount() ); // Prints 1.
    System.out.println( matcher.group( 1 ) );   // Prints "bananas".
    System.out.println( matcher.group() );      // Prints "bananas".

    pattern = Pattern.compile( "(?<!fried )(bananas|clam)" );
    // Matches "bananas" or "clam" if not preceded by "fried ". 

    matcher = pattern.matcher( input );

    System.out.println( matcher.find() );       // Prints true.
    System.out.println( matcher.groupCount() ); // Prints 1.
    System.out.println( matcher.group( 1 ) );   // Prints "clam".
    System.out.println( matcher.group() );      // Prints "clam".

During the critiquing process of this lesson, Jim Yingst pointed out an important issue concerning look ahead and look behind constructs:

Normally when we use find() repeatedly to match a pattern several times within a given input, each find() "consumes" characters in the input string up to the last character matched by the find. This means that a subsequent find() will not normally "see" any characters which were already matched by previous finds. Lookahead and lookbehind are workarounds for this limitation. We can look ahead to match characters without consuming them, or we can look back to match characters which were already consumed.

This non-consumptive behavior of the look ahead and look behind constructs is also known as zero-width matching.
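
To see the difference between consuming and zero-width matching in numbers, consider this small sketch. The class ZeroWidthDemo and its two helper methods are hypothetical names invented for the illustration; the "ababa" input is likewise just a convenient test string with two overlapping occurrences of "aba".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original lesson.
class ZeroWidthDemo {

    // Returns the start index of every match found by repeated find() calls.
    static List<Integer> matchStarts(String regex, String input) {
        List<Integer> starts = new ArrayList<Integer>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            starts.add(Integer.valueOf(matcher.start()));
        }
        return starts;
    }

    // Returns the number of characters consumed by the first match,
    // or -1 if there is no match at all.
    static int firstMatchLength(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.find() ? matcher.end() - matcher.start() : -1;
    }

    public static void main(String[] args) {
        // "aba" occurs at index 0 and again, overlapping, at index 2.
        System.out.println(matchStarts("aba", "ababa"));     // Prints [0].
        System.out.println(matchStarts("(?=aba)", "ababa")); // Prints [0, 2].

        // A look ahead on its own consumes nothing.
        System.out.println(firstMatchLength("(?=aba)", "ababa")); // Prints 0.
        System.out.println(
            firstMatchLength("apple(?= chocolate)", "apple chocolate pie")); // Prints 5.
    }
}
```

The plain pattern swallows the shared "a" and misses the second occurrence, while the look ahead version leaves every character available for the next find().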

Consider this related example, also suggested by Jim (and adapted to fit a nursery rhyme).

    input =
      "John Jacob Jingleheimer Schmidt " +
      "His name is my name, too! " +
      "Whenever we go out, " +
      "The people always shout " +
      "There goes John Jacob Jingleheimer Schmidt!" ;

    pattern = Pattern.compile( "(J\\w+)(?=.+Schmidt )" );
    // Matches all words starting with "J" that 
    // precede "Schmidt " (note the space following the t). 
    // The ".+Schmidt " part of the regular 
    // expression is not consumed.

    matcher = pattern.matcher( input );
    while ( matcher.find() )                 // Prints 
    {                                        //   "John"
      System.out.println( matcher.group() ); //   "Jacob"
    }                                        //   "Jingleheimer"

Flags - (?idmsux-idmsux) , (?idmsux-idmsux:X) , and `Pattern.compile(String, int)`

As previously mentioned, the Pattern class contains two static factory methods that create and return references to a Pattern object. Pattern.compile( String ) creates a new Pattern object from the specified String. Pattern.compile( String , int ) creates a new Pattern object from the specified String with the specified flags. This integer argument changes the way a Pattern matches an input sequence, such as turning case sensitivity on or off, or controlling whether the "." meta character can match a line terminator.

Another way to specify flags that adjust the way a Pattern matches is with a flag construct embedded in the regular expression itself. This construct takes the form "(?flags)" and appears as part of the String used to construct a Pattern. The embedded flag construct is a non-capturing group construct (in other words, the opening parenthesis does not count towards the overall group count). The embedded flag construct can be used in combination with the flag passed to the Pattern.compile( String , int ) factory method.
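
As a quick sketch of the two mechanisms working together, the example below puts DOTALL in the regular expression itself and passes case insensitivity as the integer argument. CombinedFlagsDemo and matchesAll are hypothetical names, and the "A\nB" input was picked only because it needs both flags to match.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original lesson.
class CombinedFlagsDemo {

    // Compiles the regular expression with the given integer flags
    // (0 means none) and reports whether it matches the whole input.
    static boolean matchesAll(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        String input = "A\nB";

        // Case insensitivity from the argument, DOTALL from the embedded flag.
        System.out.println(
            matchesAll("(?s)a.b", Pattern.CASE_INSENSITIVE, input)); // Prints true.

        // Either flag alone is not enough for this input.
        System.out.println(
            matchesAll("a.b", Pattern.CASE_INSENSITIVE, input));     // Prints false.
        System.out.println(matchesAll("(?s)a.b", 0, input));         // Prints false.
    }
}
```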

Available embedded flag constructs and available flags to specify when constructing a new Pattern object include:

| embedded flags | construction flags | meanings * |
| --- | --- | --- |
| (?i) | `Pattern.CASE_INSENSITIVE` | Enables case-insensitive matching. |
| (?d) | `Pattern.UNIX_LINES` | Enables Unix lines mode. |
| (?m) | `Pattern.MULTILINE` | Enables multi line mode. |
| (?s) | `Pattern.DOTALL` | Enables "." to match line terminators. |
| (?u) | `Pattern.UNICODE_CASE` | Enables Unicode-aware case folding. |
| (?x) | `Pattern.COMMENTS` | Permits white space and comments in the pattern. |
| (none) | `Pattern.CANON_EQ` | Enables canonical equivalence. |

* Please refer to The Pattern Class Documentation for more complete descriptions.
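
Two of the less obvious entries, multi line mode and comments mode, are sketched below. This is only an illustration under a few assumptions: FlagTableDemo and findAll are made-up names, and the phone-number style pattern is an invented example rather than anything from the lesson.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original lesson.
class FlagTableDemo {

    // Returns every subsequence matched by the regular expression when
    // it is compiled with the given integer flags (0 means none).
    static List<String> findAll(String regex, int flags, String input) {
        List<String> matches = new ArrayList<String>();
        Matcher matcher = Pattern.compile(regex, flags).matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String rhyme = "Green cheese,\nYellow laces,";

        // Without MULTILINE, "^" only matches at the start of the input.
        System.out.println(findAll("^\\w+", 0, rhyme));                 // Prints [Green].
        // With MULTILINE, "^" also matches just after a line terminator.
        System.out.println(findAll("^\\w+", Pattern.MULTILINE, rhyme)); // Prints [Green, Yellow].

        // With COMMENTS, white space in the pattern is ignored and "#"
        // starts a comment that runs to the end of the line.
        String regex = "\\d{3} - \\d{4}  # three digits, a dash, four digits";
        System.out.println(findAll(regex, Pattern.COMMENTS, "Call 555-1234")); // Prints [555-1234].
        // Without the flag, the spaces and the "#..." text are taken literally.
        System.out.println(findAll(regex, 0, "Call 555-1234"));                // Prints [].
    }
}
```

Note that in comments mode a space you actually want to match has to be written some other way, for example as "\\s" or inside a character class.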

Consider an example with case insensitivity turned on.

    input =
      "Hey, diddle, diddle, " +
      "The cat and the fiddle, " +
      "The cow jumped over the moon. " +
      "The little dog laughed " +
      "To see such sport, " +
      "And the dish ran away with the spoon." ;

    pattern = Pattern.compile( "the \\w+?(?=\\W)" , Pattern.CASE_INSENSITIVE );
    // Matches "the " followed by any word, regardless of case.

    matcher = pattern.matcher( input );

    while ( matcher.find() )                   // Prints 
    {                                          //  The cat
        System.out.println( matcher.group() ); //  the fiddle
    }                                          //  The cow
                                               //  the moon
                                               //  The little
                                               //  the dish
                                               //  the spoon

An equivalent Pattern could be constructed using the "(?i)" embedded flag. If embedded at the beginning of the regular expression, this embedded flag would affect the entire regular expression as would any integer flag specified in Pattern.compile( String , int ) . In the previous example, pattern = Pattern.compile( "(?i)the \\w+?(?=\\W)" ) would have resulted in the same matched subsequences.
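
The equivalence can be verified with a short side-by-side run. In this sketch the class EmbeddedCaseFlagDemo and its helper methods are hypothetical names, and the input is a shortened piece of the rhyme used above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original lesson.
class EmbeddedCaseFlagDemo {

    // Returns every subsequence matched by the pattern in the input.
    static List<String> findAll(Pattern pattern, String input) {
        List<String> matches = new ArrayList<String>();
        Matcher matcher = pattern.matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    // Case insensitivity requested through the integer argument.
    static List<String> withArgument(String input) {
        return findAll(
            Pattern.compile("the \\w+?(?=\\W)", Pattern.CASE_INSENSITIVE), input);
    }

    // Case insensitivity requested through the embedded "(?i)" flag.
    static List<String> withEmbeddedFlag(String input) {
        return findAll(Pattern.compile("(?i)the \\w+?(?=\\W)"), input);
    }

    public static void main(String[] args) {
        String input = "The cat and the fiddle, The cow jumped over the moon.";
        System.out.println(withArgument(input));     // Prints [The cat, the fiddle, The cow, the moon].
        System.out.println(withEmbeddedFlag(input)); // Prints the same list.
    }
}
```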

Multiple flags can be specified as embedded flags or as the integer argument to Pattern.compile( String , int ) . To specify multiple embedded flags, simply list them, one after the other. "(?is)" would specify that the Pattern is to match regardless of case and the "." meta character can match line terminators. To specify multiple flags as the integer argument to Pattern.compile( String , int ) , OR together the integer constants of the Pattern class that represent the desired behavior.

Consider the following example that demonstrates equivalent use of multiple embedded flags and OR'ed together integer constants passed as the integer flag argument to Pattern.compile( String , int ) .

    input =
      "Green cheese,\n" +
      "Yellow laces,\n" +
      "Up and down\n" +
      "The market places." ;

    pattern = Pattern.compile( "(?is)[a-z]*,.[a-z]*" );
    // Regardless of case, matches consecutive letters 
    // followed by a comma, any character, then more 
    // consecutive letters where the meta character "." 
    // may match line terminators.

    matcher = pattern.matcher( input );
    while ( matcher.find() )                 // Prints 
    {                                        //   cheese,
      System.out.println( matcher.group() ); //   Yellow
    }                                        //   laces,
                                             //   Up

    int flags = Pattern.CASE_INSENSITIVE | Pattern.DOTALL ;
    pattern = Pattern.compile( "[a-z]*,.[a-z]*" , flags );
    // Regardless of case, matches consecutive letters 
    // followed by a comma, any character, then more 
    // consecutive letters where the meta character "." 
    // may match line terminators.

    matcher = pattern.matcher( input );
    while ( matcher.find() )                 // Prints 
    {                                        //   cheese,
      System.out.println( matcher.group() ); //   Yellow
    }                                        //   laces,
                                             //   Up

The embedded flag construct affects the parts of a regular expression that appear after the embedded flag. If an embedded flag appears in the middle of a regular expression, then only the part of the expression that appears after the flag is potentially affected.

Using an embedded flag construct, it is possible to specify that only a section (a group) of a regular expression be affected by the flag. The syntax for such a construct is "(?flags:X)" , where "X" is the regular expression to be affected by the flags. This limits the scope of the flags to within the closing parentheses.
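
Both scoping rules, a flag placed mid-expression and a flag limited to a group, can be sketched in a few lines. FlagScopeDemo and matchesAll are hypothetical names, and the "abcdef" and "abc" patterns are throwaway examples chosen only to make the scope visible.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original lesson.
class FlagScopeDemo {

    // Reports whether the regular expression matches the whole input.
    static boolean matchesAll(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // "(?i)" in the middle: only "def" is matched regardless of case.
        System.out.println(matchesAll("abc(?i)def", "abcDEF")); // Prints true.
        System.out.println(matchesAll("abc(?i)def", "ABCdef")); // Prints false.

        // "(?i:X)": only the "b" inside the group ignores case.
        System.out.println(matchesAll("a(?i:b)c", "aBc"));      // Prints true.
        System.out.println(matchesAll("a(?i:b)c", "aBC"));      // Prints false.
    }
}
```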

The embedded flag construct can also be used to turn off flags. The syntax to turn off a flag or flags is "(?-flags)". So, it is possible to specify a flag be turned on for the entire regular expression by specifying the appropriate integer using Pattern.compile( String , int ) and that the same flag should be turned off for a specified group of the regular expression using "(?-flag:X)".

    input =
      "HARK! HARK! The dogs do bark, " +
      "The beggars are coming to town. " +
      "Some in rags, " +
      "And some in tags, " +
      "And one in a velvet gown!" ;

    pattern = Pattern.compile( "(?-i:[A-Z])[A-Z]*" , Pattern.CASE_INSENSITIVE );
    // Matches any word, regardless of case except 
    // the first letter which must be capitalized.

    matcher = pattern.matcher( input );        // Prints 
    while ( matcher.find() )                   //   HARK
    {                                          //   HARK
        System.out.println( matcher.group() ); //   The
    }                                          //   The
                                               //   Some
                                               //   And
                                               //   And

The embedded flag constructs are non-capturing. So, the enclosing parentheses do not contribute to the Matcher's group count.
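
A short sketch makes the point concrete. The class FlagGroupCountDemo and its groupCount helper are hypothetical names, and the patterns are invented for the illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original lesson.
class FlagGroupCountDemo {

    // The group count depends only on the pattern, so an empty
    // input is enough to ask the Matcher for it.
    static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        // Two flag constructs and one capturing group.
        System.out.println(groupCount("(?i)(hark)(?-i:!)")); // Prints 1.
        // A flag group on its own captures nothing.
        System.out.println(groupCount("(?i:hark)"));         // Prints 0.
    }
}
```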

Notes and Resources
1	The Regular Expressions Tutorial at JavaRegex.com
2	`java.util.regex` Package API Documentation
3	`java.lang.CharSequence` is a new Interface in Java 1.4. A `CharSequence` is a readable sequence of characters. This interface provides uniform, read-only access to many different kinds of character sequences. `String` and `StringBuffer` both implement `CharSequence`. -- `CharSequence` API Documentation
4	`java.util.regex.Pattern` API Documentation
5	`java.util.regex.Matcher` API Documentation
6	The text for the nursery rhymes in this lesson can be found at http://www.zelo.com/family/nursery/index.asp .
7	Jim Yingst notes: The API documentation for the `groupCount()` method of `Matcher` is misleading - it should say "Any non-negative integer smaller than *or equal to* the value returned by this method is guaranteed to be a valid group index for this matcher." The italicized section does not appear in the current API. (It's fixed in 1.4.1 beta source code, but that's not what you see when you browse the API online.)
8	The Regular Expression Library
9	For further reading on regular expressions, take a look at Mastering Regular Expressions by Jeffrey Friedl. A sample chapter from O'Reilly is available on-line. Jeffrey Friedl maintains a website associated with his book at http://regex.info , where he has posted a nice list of alternative Java Regex Packages.

Last Updated: 2003.03.31 0135

Movin' them doggies on the Cattle Drive

It's where you come to learn Java, and just like the cattle drivers of the old west, you're expected to pull your weight along the way.

The Cattle Drive forum is where the drivers get together to complain, uh rather, discuss their assignments and encourage each other. Thanks to the enthusiastic initiative of Johannes de Jong, you can keep track of your progress on the drive with the Assignment Log. If you're tough enough to get through the nitpicking, you'll start collecting moose heads at the Cattle Drive Hall of Fame.

Gettin' them doggies...
We got 'round about a good dozen ranchers drivin' along the trail, most of 'em in rasslin' good ol' Java-4 Say and Servlets-4 assignments.

Fresh riders on the Drive...
Got a new Cattle Driver signed up on the assignment log, and this 'un a real bronco rider. A big welcome to our latest new rider: John Hembree. This rider's been purdy busy, we'll tell y'all why in just a bit...

Another moose on the wall for...
Yep, that's right, you make it through, you get yerself a nice little moose to hang on the wall. Well, OK, so it's a virtual moose on a virtual wall, but you can be right proud of 'em! Thanks to the efforts of Marilyn deQueiroz, you can now proudly admire your codin' accomplishments at the recently opened Cattle Drive Hall of Fame. Check it out, pardner, it's worth a wink.

Juliane Gross has been ridin' real steady and bagged her second moose on the OOP Trail of the drive. We know her secret to success: chocolate! Keep it wrapped up in yer saddle bag, cowgirl, that stuff is more valuable than a bag o' gold nuggets! Richard Hawkes is gettin' right comfy in the saddle, he bagged a moose on the Java Basics part of the Drive. Nice ridin' Richard!

Cattle Drivers gittin' all kinds of honors...
Wouldn't ya know it, a couple hard workin' Cattle Drivers went an' took a side trail and came back with a little extra dust and a purdey little cerrrrtificate to boot: Barry Gaunt and John Hembree got themselves a SCJP! Big congrats fellas.

Nitpicking is hard work too...
We know they're workin' reeeeeally hard, after all, we've seen what those assignments look like once the nitpickers have combed through 'em. Hats off to Marilyn deQueiroz, Pauline McNamara, and Jason Adam for their dedication and patience with the pesky vermin that always manage to make their way into those assignments.

Those gentlemen behind the bar...
Notice the fellas wipin' the bar and shinin' up the sarsparila glasses? Updating that assignment log is tough work too. We got us another batch of bartenders. Big thanks to Dirk Schreckmann and Matthew Phillips and welcome to Michael Matola and Barry Gaunt. Mosey up and place yer orders.

Tips for the Trail...
You know the feeling, right? Yer bustin' yer head over that ol' favorite, Java-4 Say, an' you jest asks yerself, "So why the tootin' am I doing this to myself?!" Well hang in there cowpokes, there's life outside the Cattle Drive, we got proof, and all that hair you been pulling outta yer head will pay off one day. Yep, check out this real life testimonial - ain't no slick talkin' it's the truth!

Pauline McNamara

Book Review of the Month

Mastering Regular Expressions Jeffrey E. F. Friedl

Regular Expressions ("regexes" for short) have been officially integrated into Java with the release of J2SE 1.4. While many Java developers are just discovering them, they have been a fixture in other languages and tools for quite some time. Regular expressions are powerful tools for performing all kinds of text processing, but they require no small amount of knowledge to use effectively and efficiently. This is where "Mastering Regular Expressions" comes to the rescue. The book's nine chapters are categorized into three sections. The book first teaches the basics of regular expressions, crafting simple regexes, and the different features and flavors available in various regex packages. Next, the reader is given invaluable information about how the different types of regular expression engines work, as well as techniques for crafting practical and efficient expressions. The final section covers language specific issues in Perl, Java, and .NET. The author does an outstanding job leading the reader from regex novice to master. The book is extremely easy to read and chock full of useful and relevant examples. The author offers up questions along the way designed to engage the reader to apply what he has learned. In-line references to other parts of the book containing information pertinent to the particular topic being discussed are also very helpful. Regular expressions are valuable tools that every developer should have in their toolbox. "Mastering Regular Expressions" is the definitive guide to the subject, and an outstanding resource that belongs on every programmer's bookshelf. (Jason Menard - Bartender, March 2003)

April Scheduled Book Promotions:

April 1	Swing 2nd Ed	Matthew Robinson, Pavel Vorobiev	Manning	Swing/AWT/JFC	confirmed
April 8	Java 2 Programmer Exam Cram 2 (Exam CX-310-035)	Bill Brogden, Marcus Green	Que	Programmer Certification Study	confirmed
April 15	Java NIO	Ron Hitchens	O'Reilly	I/O and Streams	confirmed
April 22	Core JSTL	David Geary	Addison-Wesley	JSP	confirmed
April 29	Java Performance Tuning	Jack Shirazi	O'Reilly	Performance	confirmed

Managing Editor: Johannes de Jong

Comments or suggestions for JavaRanch's NewsLetter can be sent to the NewsLetter Staff

For advertising opportunities contact NewsLetter Advertising Staff
