Debugging Threads with a Little Help from javap
by Andrew Monkhouse

In a recent post, Swapnil Trivedi asked for help with some non thread-safe code that was not behaving the way he expected. While that particular problem could be solved without delving down into the depths of the produced bytecode, looking at it did present some interesting possibilities. So in this article I am going to look at a few simple threading issues, and solving them at a theoretical level, before eventually getting to a more complex example that may be easier to diagnose by looking at the bytecode.

While I will be describing the bytecode in my particular example in some detail, I would recommend readers to Corey McGlone's excellent article: ' Looking "Under the Hood" with javap' from the August 2004 JavaRanch Journal for more details on javap.

The code Swapnil presented could be boiled down to:

public class TestThreads extends Thread {
    static int i = 0;

    public void run() {
        for( ; i < 5; i++) {
            System.out.println(getName() + " " + i);
        }
    }

    public static void main(String args []) {
        new TestThreads().start();
        new TestThreads().start();
    }
}

I am sure that most of you will have experimented with code like this at some point. And you will have seen that it is not possible to determine which threads will run in which order. For example, many of you will have seen output such as:

Thread-0 0
Thread-0 1

While a later run might cause output such as:

Thread-1 0
Thread-1 1

And yet another might cause output such as:

Thread-0 0
Thread-1 1

And yet another might cause output such as:

Thread-1 0
Thread-0 1

All fairly standard and expected. But what about output such as:

Thread-0 0
Thread-1 0
Thread-1 1
Thread-0 2
Thread-0 3
Thread-1 4

That was the problem faced by Swapnil - not only did a number appear twice, but as a result 6 lines appeared in the output, not the expected 5.

Thinking back to the code being run, it is very straightforward logic:

        for( ; i < 5; i++) {
            System.out.println(getName() + " " + i);
        }

We can consider there to be at least 3 code fragments in that block of code (the reason for the "at least" statement will be made clearer when we solve our final problem). Lets rewrite that code as a comparable do ... while statement to make the 3 code fragments a bit more obvious:

        do {
            System.out.println(getName() + " " + i);
            i++;
        } while (i < 5);

We can now see that there is one line that is outputting data, another that is incrementing i and a third that is verifying that i is still in range.

Since the Java Thread Scheduler can swap the threads at any point, it is quite plausible that Thread-0 ran the fragment System.out.println(getName() + " " + i); then the Java Thread Scheduler swapped to Thread-1 and ran the same fragment before either of them had a chance to increment i.

Lets look at one possibility of that in slow motion:

  Thread-0              Thread-1               i    output
  ----------------      ------------------    --    ----------
  ...println(i)                                0    Thread-0 0
                        ...println(i)          0    Thread-1 0
                        i++                    1
                        (i < 5) -> continue    1
                        ...println(i)          1    Thread-1 1
  i++                                          2
  (i < 5) -> continue                          2
  ...println(i)                                2    Thread-0 2
  i++                                          3
  (i < 5) -> continue                          3
  ...println(i)                                3    Thread-0 3
                        i++                    4
                        (i < 5) -> continue    4
                        ...println(i)          4    Thread-1 4
                        i++                    5
                        (i == 5) -> stop       5
  (i == 5) -> stop                             5

So far this seems simple. We have managed to find a probable cause, and even demonstrated it in theory, all without worrying about the nasty 'javap' command - but don't worry, I haven't forgotten about it :-).

Lets go to a tiny bit more exotic case. In this example we will be incrementing and using the common variable in only one place:

public class TestThreads extends Thread {
    static int i = 0;

    public void run() {
        for (int j = 0; j < 50000; j++) {
            System.out.println(getName() + " " + i++);
        }
    }

    public static void main(String args []) {
        new TestThreads().start();
        new TestThreads().start();
    }
}

Here's a small section of the output:

...
Thread-0 10449
Thread-0 10450
Thread-0 10451
Thread-1 10453
Thread-1 10454
Thread-1 10455
...
Thread-1 12314
Thread-1 12315
Thread-1 12316
Thread-0 10452
Thread-0 12317
Thread-0 12318
...

Did you notice that the value "10451" is missing in the first block then appeared in the middle of the second block? But the variable i is only being referenced in one place? So therefore Java must consider this to be more than a single statement.

Thinking about it for a moment, we do know that the println() method only accepts a String. Therefore logically that line could be rewritten as:

            String output = getName() + " " + i++;
            System.out.println(output);

This allows us to logically see how the output above occurred - Thread-0 had incremented the value of i when creating the output string, but got swapped out before using it in the println() method.

Making my example a little more complex, we are now using the same variable twice, but in what appears to be a single line of code:

public class TestThreads extends Thread {
    static int i = 0;

    public void run() {
        for (int j = 0; j < 50000; j++) {
            String output = getName() + " " + ++i + " " + i;
            System.out.println(output);
        }
    }

    public static void main(String args []) {
        new TestThreads().start();
        new TestThreads().start();
    }
}

Here's a small section of the output:

...
Thread-0 69067 69067
Thread-0 69068 69068
Thread-0 69069 69069
Thread-1 69071 69071
Thread-1 69072 69072
Thread-1 69073 69073
...
Thread-1 70778 70778
Thread-1 70779 70779
Thread-0 69070 70779
Thread-0 70780 70780
Thread-0 70781 70781
...

Now we have something really interesting - the line that is out of sequence has 69070 in the first column and 70779 in the second column. It appears that Java swapped the threads while in the middle of the following line, which is building the String:

            String output = getName() + " " + ++i + " " + i;

This is interesting - the answer to why this is happening is in the API documentation for the String class, but at only a single sentence it is easy to miss - it can be worthwhile to look at the API documentation now and see if you can spot the problem before continuing.

In cases like this, where it is not immediately obvious what the problem is, it can be valuable to create a very simple test case that demonstrates the problem, then look at what the bytecode shows you. That is effectively what the code we are using is - the smallest amount of code that clearly shows a problem occurring.

Rather than dump the entire javap -c TestThreads output and then discuss it, I am going to discuss some of the simpler sections first and get them out of the way, then move on to the all important run() method and break it up to make it simpler to understand (yes, the entire 4 lines of the run() method shown above do deserve to be broken up as you will see).

In the following output I have reduced fully qualified class descriptions in the javap comments to the simpler class name so that I do not need to wrap lines. For example, where you would normally see java/lang/Thread referred to in the javap comments, I have reduced it to Thread.

When we run javap -c TestThreads, we are first told what source code contains this particular class, then given the definition of the class:

Compiled from "TestThreads.java"
public class TestThreads extends java.lang.Thread{

Next we have the definition of our static variable:

static int i;

Notice that in our source code we had provided an initial value for i but it is not listed here? That is because the java compiler moved the initialization to the static block. Normally we find the static block much later in the javap output, but I have moved it here to fit with the discussion's flow:

static {};
  Code:
    0:iconst_0
    1:putstatic      #8; //Field i:I
    4:return
So we can see that the constant value 0 will be put into storage area #8 which is now effectively our Field i. By the way - the reason this is storage area #8 is because for the sake of narrative flow I have moved that method from where it appears in the output. Normally it would appear at the end of the javap output. When we get to the later methods we will see the other storage areas being used.

Keeping the narrative approach, I am going to skip code once again to the main() method, which looks like:

public static void main(java.lang.String[]);
  Code:
    0:new            #13; //class TestThreads
    3:dup
    4:invokespecial  #14; //Method "<init>":()V
    7:invokevirtual  #15; //Method start:()V

   10:new            #13; //class TestThreads
   13:dup
   14:invokespecial  #14; //Method "<init>":()V
   17:invokevirtual  #15; //Method start:()V

   20:return

While not being completely straightforward, this is fairly close to our source code of two instances of new TestThreads().start(); The bytecode is a bit more verbose, but basically it creates a new instance of the TestThreads class, runs the special init method, then runs the start method. Not too difficult.

While off-topic, I am sure somebody is going to ask me what the dup commands are doing in that block, so here goes ... We create a reference to the new instance of the class in line zero, then duplicate the reference in line 3 (hence dup). We use one reference when we call the <init> method, then use the other reference when we invoke the start method.

As for why the invocations of the methods remove the reference to class from the stack - that is a longer discussion that is more suited to the advanced forum (or a computer science course).

For the sake of completeness, I will also mention that the default constructor will appear in our decompiled output, as shown below:

public TestThreads();
 Code:
    0:aload_0
    1:invokespecial   #1; //Method Thread."<init>":()V
    4:return

Whenever you do not provide any constructors, the Java compiler will create a default constructor for you, which will call the parent class' default constructor. And this is what we see here - the created constructor calls the default constructor for the Thread class.

Finally the bit you have been waiting for - now that everything has been set up, we get to the run() method. Just to save you scrolling backwards and forwards, here is the Java code for that method:

    public void run() {
        for (int j = 0; j < 50000; j++) {
            String output = getName() + " " + ++i + " " + i;
            System.out.println(output);
        }
    }

So we start with the initialization of j to zero:

public void run();
 Code:
    0:iconst_0
    1:istore_1

That is the code that equates to j = 0; in our initializer.

We immediately reload j and compare it with the value in storage area #2, which happens to be our magic number of 50000. If the two are equal we jump to line 67 (which is the implicit return statement at the end of the method).

    2:iload_1
    3:ldc            #2; //int 50000
    5:if_icmpge   67

We reloaded j, because this is a separate statement as far as both our original source code and the resultant code is concerned. This is the code that equates to j < 50000 in our evaluation expression, and we return to this block of code each time we go through the loop.

If we are at this line, it must mean that j is less than 50000. So we need to build our String. We start by creating a StringBuilder to hold the interim string:

    8:new            #3; //class StringBuilder
   11:dup
   12:invokespecial  #4; //Method StringBuilder."<init>":()V
   15:aload_0

(hmmm, StringBuilder? More on that later)

The first thing we want to do is find out the name of our current thread, and append it to the StringBuilder:

   16:invokevirtual  #5; //Method getName:()LString;
   19:invokevirtual  #6; //Method StringBuilder.append:(LString;)LStringBuilder;

We can then load the String " " and append it to the StringBuilder:

   22:ldc            #7; //String
   24:invokevirtual  #6; //Method StringBuilder.append:(LString;)LStringBuilder;

Then we will load the contents of the field i, increment it by 1, store a copy back in field i, and append a copy to the StringBuilder:

   27:getstatic      #8; //Field i:I
   30:iconst_1
   31:iadd
   32:dup
   33:putstatic      #8; //Field i:I
   36:invokevirtual  #9; //Method StringBuilder.append:(I)LStringBuilder;

We then append the same String " " used previously in line 22 to our StringBuilder:

   39:ldc            #7; //String
   41:invokevirtual  #6; //Method StringBuilder.append:(LString;)LStringBuilder;

Note that we are constantly told that the Java compiler will recognize when we are reusing a String - here is an example in action. Even though our original line was getName() + " " + ++i + " " + i, the Java compiler recognized that the two instances of " " were the same, and only used the one reference to storage area #7 to get to the single instance of that String that it has put into the class file.

Now we reload the value of i, and append it:

   44:getstatic      #8; //Field i:I
   47:invokevirtual  #9; //Method StringBuilder.append:(I)LStringBuilder;

Finally our interim string building is complete and we can convert it back to a real String:

   50:invokevirtual  #10; //Method StringBuilder.toString:()LString;
   53:astore_2

Then we can output the string:

   54:getstatic      #11; //Field System.out:LPrintStream;
   57:aload_2
   58:invokevirtual  #12; //Method PrintStream.println:(LString;)V

Finally we increment j by 1, and then go back to the top of the loop to repeat:

   61:iinc    1, 1
   64:goto    2

Just for completeness, we do have one more line: the implicit return statement at the end of the method.

   67:return

So - what does all this mean?

It means that the following line from our original code is not considered a single statement by Java:

            String output = getName() + " " + ++i + " " + i;

Rather it is more like:

            StringBuilder tmp = new StringBuilder();
            tmp.append(getName());
            tmp.append(" ");
            int z = i;
            z = z + 1;
            i = z;
            tmp.append(z);
            tmp.append(" ");
            tmp.append(i);
            String output = tmp.toString();

Wow! It is hardly surprising that we were able to cause the Java runtime to swap threads in the middle of building the String given that we were doing all that work in a single line.

Remember earlier that I mentioned that the answer to why this problem occurred in the API? Here is the explicit statement from the JavaDoc for String:

String concatenation is implemented through the StringBuilder(or StringBuffer) class and its append method.

Pretty simple to miss, but of of course we have just proven that ourselves.

Hopefully this breakdown of a decompilation will help you in doing similar decompilations in the future, whenever you find code not acting in the manner you expected.

As mentioned earlier, I recommend readers to Corey McGlone's excellent article: ' Looking "Under the Hood" with javap' from the August 2004 JavaRanch for more details on javap.

On a final note, for those who would like to run those programs above and verify that the output can appear as I have shown, here is a very simple validator that should work for all the above cases:

import java.io.*;

public class ValidateResult {
    public static void main(String[] args) throws Exception {
        if (args.length == 1) {
            new ValidateResult(args[0]);
        }
    }

    public ValidateResult(String fileName) throws Exception {
        File f = new File(fileName);
        if (f.exists() && f.canRead()) {
            FileInputStream fis = new FileInputStream(f);
            InputStreamReader isr = new InputStreamReader(fis);
            BufferedReader in = new BufferedReader(isr);

            int expectedLineNumber = 0;
            String line = null;
            for (line = in.readLine(); line != null; line = in.readLine()) {
                String[] lineParts = line.split(" ");
                String threadName = lineParts[0];
                int lineNumber = Integer.parseInt(lineParts[1]);
                int lineNumber2 = (lineParts.length == 3)
                                ? Integer.parseInt(lineParts[2])
                                : lineNumber;
                if (lineNumber == expectedLineNumber) {
                    expectedLineNumber++;
                } else {
                    System.out.println(line);
                    expectedLineNumber = lineNumber + 1;
                }
                if (lineNumber != lineNumber2) {
                    System.out.println(">>>> " + line);
                }
            }
            in.close();
        }
    }
}

You can send the output of one of the above programs to a file, and then run the above program on it. Something similar to:

java TestThreads > tmp
java ValidateResult tmp

It should be noted, however, that Java might not be the best language for such a simple task. For example, when ensuring that my test cases generated the right output, I first started with some simple awk statements to quickly see the results:

java TestThreads > tmp
awk 'BEGIN{EXP = 1}{if ($2 != EXP || $2 != $3) print; EXP = $2 + 1}' tmp

I find that single line much easier to write quickly than the equivalent Java program (but it would be much harder to maintain, and of course there are many things Java can do that awk and other tools can't). However knowing that there are other languages than just Java, and more importantly which ones to use in a given situation will make you a much better developer and far more sought after in the marketplace. But that could be a discussion for another time. :-)