The SCJP Tip Line
hashCodes Uncovered
by Corey McGlone

In this edition of the JavaRanch Journal SCJP Tipline, author Corey McGlone investigates hashCodes, expanding on an April blog entry.

What is a hashCode?

First of all, what the heck are hashcodes for? Well, oddly enough, they're used heavily in Hashtables. Hashtables, along with the other classes that implement the Map interface, are discussed in this Journal article. For the purposes of this article, I'm going to assume you know what a Hashtable is and how useful it can be. Needless to say, Hashtables can help you keep a large number of objects organized and allow you to access them very quickly. Of course, to do so, a Hashtable relies on the power of the hashCode method.

In essence, when you invoke the get(Object o) method of a Hashtable, the Hashtable will use the hashCode method of the object you passed to it in order to access an object (or list of objects). As long as the hashCode method is working properly, everything works just fine. If it doesn't, however, you can have some rather serious problems.
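To make that lookup step a little more concrete, here is a minimal sketch of how a hash table might turn a hashCode into a bucket position. The class name BucketSketch, the method bucketIndex, and the table size of 11 are assumptions made for illustration; a real Hashtable implementation is free to map hash codes to buckets differently.

```java
// Illustrative only: BucketSketch and bucketIndex are hypothetical names,
// not part of java.util.
class BucketSketch
{
    // Map a hash code onto one of 'capacity' buckets.
    // Masking off the sign bit keeps the index non-negative.
    static int bucketIndex(int hash, int capacity)
    {
        return (hash & 0x7FFFFFFF) % capacity;
    }

    public static void main(String[] args)
    {
        int capacity = 11; // assumed table size for this sketch

        // Each key's hashCode decides which bucket would be searched.
        System.out.println(bucketIndex("apple".hashCode(), capacity));
        System.out.println(bucketIndex(Integer.valueOf(42).hashCode(), capacity));
        System.out.println(bucketIndex(-7, capacity));
    }
}
```

Only the keys that land in the chosen bucket need to be compared with equals, which is why a lookup can skip most of the table.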

So, what makes a valid hashCode? Well, here's what is said about hashCodes in the API Specification for Object:

The general contract of hashCode is:
  1. Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
  2. If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
  3. It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hashtables.
As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the Java(TM) programming language.)
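The first two rules of the contract can be spot-checked in code. The following is a rough sketch, using String as a class that is known to override both equals and hashCode. The class name ContractCheck and its two helper methods are hypothetical, and passing these checks for a few sample objects does not prove that a hashCode method is correct in general.

```java
// Illustrative only: ContractCheck and its helpers are hypothetical names.
class ContractCheck
{
    // Rule 1: repeated calls on an unmodified object return the same value.
    static boolean obeysRuleOne(Object o)
    {
        return o.hashCode() == o.hashCode();
    }

    // Rule 2: if a.equals(b), both must report the same hashCode.
    // Unequal objects pass automatically, since rule 2 says nothing about them.
    static boolean obeysRuleTwo(Object a, Object b)
    {
        return !a.equals(b) || a.hashCode() == b.hashCode();
    }

    public static void main(String[] args)
    {
        String first = new String("ranch");
        String second = new String("ranch"); // distinct object, equal contents

        System.out.println(first.equals(second));
        System.out.println(obeysRuleOne(first));
        System.out.println(obeysRuleTwo(first, second));
    }
}
```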

Study: A Proper hashCode

Let's start by looking at a good program. In this case, we've defined a new class, MyObject, which defines its own equals and hashCode methods. In general, it is good practice to define your own hashCode method any time you override equals (more about this later). Here it is:

import java.util.Hashtable;
import java.util.Date;

public class MyObject
{
    int a;
    
    public MyObject(int val)
    {
        a = val;
    }
    
    public boolean equals(Object o)
    {
        boolean isEqual = false;
        
        if ( o instanceof MyObject )
        {
            if ( ((MyObject)o).a == a )
            {
                isEqual = true;
            }
        }
        
        return isEqual;
    }
    
    public int hashCode()
    {
        return a;
    }
    
    public static void main(String[] args)
    {
        Hashtable h = new Hashtable();
        
        MyObject[] keys = 
        {
            new MyObject(11),
            new MyObject(12),
            new MyObject(13),
            new MyObject(14),
            new MyObject(15),
            new MyObject(16),
            new MyObject(17),
            new MyObject(18),
            new MyObject(19),
            new MyObject(110)
        };
        
        for ( int i = 0; i < 10; i++ )
        {
            h.put(keys[i], Integer.toString(i+1));
        }
        
        long startTime = new Date().getTime();
        
        for ( int i = 0; i < 10; i++ )
        {
            System.out.println(h.get(keys[i]));
        }
   
        long endTime = new Date().getTime();
        
        System.out.println("Elapsed Time: " + (endTime - startTime) + " ms");
    }
}

Executing the above code leaves you with this output:

1
2
3
4
5
6
7
8
9
10
Elapsed Time: 0 ms

As you can see, we easily retrieved the objects we had originally put into the Hashtable and it took practically no time at all. How does our hashCode method do? Does it pass all 3 of the criteria laid out earlier? Let's look at each of the criteria one at a time.

1. Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.

Does our hashCode meet that criterion? Does our hashCode continually return the same value (assuming that our variable, a, hasn't changed)? Certainly, it does - it returns the value of a. Okay, next criterion.

2. If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.

How about this one? Does our hashCode method still work here? Sure, it does. If two objects have the same value for a, they will be equal (by the equals method). In such a situation, they would also return the same hashCode value. Our hashCode method works here. Okay, on to the final criterion.

3. It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hashtables.

Well, this isn't really a requirement at all - it's more of a suggestion, if anything. It is best if the hashCodes for unequal objects are different, but it's not required. We'll look at this a little more in a few minutes.

So, there you have it - we've successfully overridden the hashCode method. So, how do you know when you should do such a thing? Well, in general, it's considered good practice to override hashCode any time you override equals. The reason lies in the default behavior of the equals and hashCode methods.

When Should you Override hashCode()?

In Java, the default equals method (as defined in Object) compares the two objects to see if they are, in fact, the same object. That implementation does not take into account any data that might be contained by the object. For example, had we not overridden the equals method in our class, MyObject, we'd see that the following code:

MyObject obj1 = new MyObject(1);
MyObject obj2 = new MyObject(1);
System.out.println(obj1.equals(obj2));

...would produce the output "false." The reason for this is that, even though the two objects contain the same data, they are different objects - two separate objects on the heap. Fortunately, we overrode the equals method and, given the MyObject class as defined originally, we'd get the output "true" from this example. However, what about the hashCode?

Well, the default hashCode method works in a similar fashion to the default equals method. It typically converts the internal address of the object into an int and uses that as the hashCode. Well, that won't work so well here. We just defined a way for two distinct objects (which will, necessarily, have distinct memory addresses) to be considered "equal." The default hashCode implementation, however, will almost certainly return different hashCodes for the two objects. That violates the second rule defined above - any objects that are considered equal (by their equals method) must generate the same hashCode value. Therefore, whenever you override the equals method, you should also override the hashCode method.
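As a small illustration of that mismatch, here is a sketch of a class that overrides equals but leaves hashCode alone. The class name EqualsOnly is hypothetical; it mirrors MyObject with the hashCode method removed.

```java
// Illustrative only: EqualsOnly is a hypothetical class that mirrors
// MyObject but does not override hashCode.
class EqualsOnly
{
    int a;

    EqualsOnly(int val)
    {
        a = val;
    }

    // equals is overridden; hashCode is deliberately left as the
    // default inherited from Object.
    @Override
    public boolean equals(Object o)
    {
        return (o instanceof EqualsOnly) && ((EqualsOnly) o).a == a;
    }

    public static void main(String[] args)
    {
        EqualsOnly obj1 = new EqualsOnly(1);
        EqualsOnly obj2 = new EqualsOnly(1);

        // The two objects are equal according to equals...
        System.out.println(obj1.equals(obj2));

        // ...but their default hash codes will almost always differ,
        // although the specification does not strictly guarantee it.
        System.out.println(obj1.hashCode() == obj2.hashCode());
    }
}
```

If obj1 were used as a Hashtable key, a lookup with obj2 would very likely come back null, even though the two keys are "equal."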

Study: Faulty hashCodes

What happens if we override the hashCode method poorly? Let's violate one of the rules, shall we? Let's change our hashCode method to look like this:

public int hashCode()
{
    return (int)(Math.random() * 5);
}

This one actually violates a couple of rules. Not only does it not guarantee that two objects that are equal have the same hashCode, it doesn't even guarantee that the same object will keep the same hashCode from one invocation to the next. Any idea what havoc this might wreak on a poor, unsuspecting Hashtable? If I execute the main method again, as I did before, I get this output:

null
2
null
4
null
6
null
null
null
null
Elapsed Time: 0 ms

Eek! Look at all of the objects I'm missing! Without a properly functioning hashCode function, our Hashtable can't do its job. Objects are being put into the Hashtable but we can't properly get at them because our hashCode is random. This is certainly not the way to go. If you were to run this, you might even get different output than I got!

Even if the hashCode that is returned is always the same for a given object, we must ensure that the hashCodes that are returned for two objects that are equal are identical (Rule #2). Let's modify our MyObject class so that we hold true to Rule #1 but not to Rule #2. Below are the modified parts of our class:

public class MyObject
{
    int a;
    int b;
    
    public MyObject(int val1, int val2)
    {
        a = val1;
        b = val2;
    }
    
    ...
    
    public int hashCode()
    {
        return a - b;
    }
    
    ...
    
    public static void main(String[] args)
    {
        ....
        MyObject[] keys = 
        {
            new MyObject(11, 0),
            new MyObject(11, 1),
            new MyObject(11, 2),
            new MyObject(11, 3),
            new MyObject(11, 4),
            new MyObject(11, 5),
            new MyObject(11, 6),
            new MyObject(11, 7),
            new MyObject(11, 8),
            new MyObject(11, 9)
        };
        ...
    }
}

Executing this code gives us some more disturbing results, although they may not appear that way at first. Here's my output:

1
2
3
4
5
6
7
8
9
10
Elapsed Time: 0 ms

So what's wrong with that, you ask? Well, what should the put method do? If you first put an object into a Hashtable using a specific key and then put a new value into the Hashtable using a key that is equal to that one, the original value should be replaced with this new one. That's not what's happening here. Instead, our Hashtable is treating our keys as if they're all unequal. Eek! This is the same result you could expect if you were to override equals without overriding hashCode. Here's the output we should get, assuming we have a good hashCode method:

10
10
10
10
10
10
10
10
10
10
Elapsed Time: 0 ms

Inefficient hashCodes

Okay, one more thing to go over. What happens if we have a valid hashCode, but the values aren't very distinct? In this case, I'm going to hold to requirements 1 and 2, but I'm going to ignore requirement 3 entirely. Let's modify MyObject.hashCode() and our main method to look like this:

public int hashCode()
{
    return 0;
}

public static void main(String[] args)
{
    Hashtable h = new Hashtable();
    
    MyObject[] keys = new MyObject[10000];
    for ( int i = 0; i < 10000; i++ )
    {
        keys[i] = new MyObject(i);
    }
    
    for ( int i = 0; i < 10000; i++ )
    {
        h.put(keys[i], Integer.toString(i+1));
    }
    
    long startTime = new Date().getTime();
    
    for ( int i = 0; i < 10000; i++ )
    {
        h.get(keys[i]);
    }

    long endTime = new Date().getTime();
    
    System.out.println("Elapsed Time: " + (endTime - startTime) + " ms");   
}

Note that this is a valid hashCode method. It always returns the same value for a given object, assuming that nothing used in the equals method changes (not that it cares anything about that). It also returns the same value for two objects that are "equal." What it doesn't do is return a different value for objects that are not equal. It always returns the same value. This is a valid hashCode but, as this example will show, an inefficient one. Executing this 5 times using this new hashCode method, I get this output:

Elapsed Time: 7016 ms
Elapsed Time: 7125 ms
Elapsed Time: 7297 ms
Elapsed Time: 7047 ms
Elapsed Time: 7218 ms

That gives me an average time of 7140.6 ms - roughly 7 seconds. Switching back to my original hashCode method and running the same main method, I got this output:

Elapsed Time: 16 ms
Elapsed Time: 16 ms
Elapsed Time: 16 ms
Elapsed Time: 15 ms
Elapsed Time: 16 ms

That's an average of about 16 milliseconds - a far cry from the 7 seconds we saw earlier! We're seeing a dramatic increase in the amount of time required to retrieve objects from our Hashtable using the poor hashCode method. The reason for this is that, with every key object having the same hashCode, our Hashtable has no choice but to index every value under the same hashCode. In such a case, we've got everything in one long list and our Hashtable does us no good at all - we might as well have stored the objects in an ArrayList.
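One way to see the problem without a stopwatch is to count how many different hash codes a set of keys produces. The sketch below does that for two hypothetical key classes, ConstantKey (always returns 0) and ValueKey (returns its field a). The class names and the distinctHashCodes helper are assumptions made for illustration, and equals is left out because only hashCode matters for this count.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative only: HashSpread, ConstantKey and ValueKey are hypothetical.
class HashSpread
{
    // Every instance reports the same hash code, as in the slow example.
    static class ConstantKey
    {
        final int a;
        ConstantKey(int val) { a = val; }

        @Override
        public int hashCode() { return 0; }
    }

    // Each instance reports its own value, as in the original MyObject.
    static class ValueKey
    {
        final int a;
        ValueKey(int val) { a = val; }

        @Override
        public int hashCode() { return a; }
    }

    // Count how many different hash codes a group of keys produces.
    static int distinctHashCodes(Object[] keys)
    {
        Set<Integer> seen = new HashSet<Integer>();
        for ( Object key : keys )
        {
            seen.add(key.hashCode());
        }
        return seen.size();
    }

    public static void main(String[] args)
    {
        Object[] constantKeys = new Object[10000];
        Object[] valueKeys = new Object[10000];
        for ( int i = 0; i < 10000; i++ )
        {
            constantKeys[i] = new ConstantKey(i);
            valueKeys[i] = new ValueKey(i);
        }

        System.out.println(distinctHashCodes(constantKeys)); // prints 1
        System.out.println(distinctHashCodes(valueKeys));    // prints 10000
    }
}
```

With only one distinct hash code, all 10,000 keys compete for the same slot; with 10,000 distinct codes, the table has a chance to spread them out.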

Summary

Hopefully you see the importance of creating a valid hashCode for any objects you're using. Whenever creating your own hashCode method, it's important to remember the 3 rules. I've listed them here in a summarized format; refer to the API Spec for full details.

  1. The hashCode method must always return the same value (assuming the object has not changed) from one invocation to the next.
  2. The hashCode for multiple "equal" objects must be identical.
  3. The hashCodes for multiple "unequal" objects should be different, but it is not required.
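The three rules above can be satisfied together with very little code. The following is one possible sketch, not the only way to do it: it assumes Java 7 or later for java.util.Objects.hash, and the class name GridPoint and its fields are hypothetical. The fields are final so the hashCode cannot change after construction, equals and hashCode use exactly the same fields, and combining both fields helps unequal objects get different hash codes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Illustrative only: GridPoint is a hypothetical class.
final class GridPoint
{
    private final int x;
    private final int y;

    GridPoint(int x, int y)
    {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o)
    {
        if ( this == o )
        {
            return true;
        }
        if ( !(o instanceof GridPoint) )
        {
            return false;
        }
        GridPoint other = (GridPoint) o;
        return x == other.x && y == other.y;
    }

    // Built from the same fields that equals compares (rule 2).
    @Override
    public int hashCode()
    {
        return Objects.hash(x, y);
    }

    public static void main(String[] args)
    {
        Map<GridPoint, String> labels = new HashMap<GridPoint, String>();
        labels.put(new GridPoint(1, 2), "start");

        // A different but equal key finds the same entry.
        System.out.println(labels.get(new GridPoint(1, 2))); // prints start
    }
}
```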

With that knowledge, you should have a firm grasp on how to override hashCodes and make the most out of the Hashtable (and other Map classes) in Java. Remember to tune in to The SCJP Tipline for more updates.