In a previous article,
we took a first look at the authentication aspect of JAAS; now we'll look at authorization.
Before we start, it's important to understand the basics of the Java security model.
By default, applications run (are interpreted) without a security manager, which means that there
are no restrictions on what the code is allowed to do. There are a number of problems with giving
untrusted programs full control of the JVM, as the following examples show.
Applets are downloaded from a remote site, but run on the client. If we give the applet code
full permissions then it is free to open a file and start appending junk data to it,
until that finally leads to a disk-full error. Or it can transfer confidential files elsewhere
over the network.
Denial-of-service attacks are another example. Suppose a web hosting company provides web
application server space for lease. If the server does not run with a security manager, then
a servlet could simply call System.exit(0), which would bring the server down
any time that servlet is accessed.
From this discussion it should be clear why security functions are an integral part of Java.
This is a challenge for any portable language (this discussion also applies to Flash/Flex and
JavaScript), so the Java language designers have taken great care in handling the security aspects.
So how does a security manager help prevent these kinds of attacks?
First, let's have a look at Java permissions and policy files. The developer generally
documents the permissions his code requires to work properly when running on a JVM with
a security manager enabled. It is quite likely that the freshly developed code will run on
someone else's JVM, so the person running the program must go through that documentation
and grant the code the required permissions (assuming that a security manager is used,
which we shall assume from here on). Only then will the program be able to run. If the program
is run with a security manager enabled but without the proper permissions granted (using the
policy file), a variety of scary-looking security exceptions will be thrown. The policy file is
used to grant the required permissions to a piece of code. This is the foundation of JAAS
authorization, and we will dig deeper into it later in this article.
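For example, a grant entry that allows all code from a particular jar to change the default file encoding might look roughly like this (the jar name here is just an illustration):

grant codeBase "file:./jaas-example.jar" {
    permission java.util.PropertyPermission "file.encoding", "write";
};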
What is a Principal?
A Principal is a representation of a subject's identity in our system; essentially, it is an
implementation of the java.security.Principal interface. For example, one could have a
DepartmentName principal which represents the department of an authenticated subject. So when
the subject (e.g., an employee) is authenticated, one might want to create a department principal
representing the respective department and associate it with that subject. This is the core of
JAAS authorization: one can grant various permissions to principals using a policy file,
as described in the next section.
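A minimal principal implementation might look like the following sketch (the class name is illustrative and not part of the example code discussed later):

import java.security.Principal;

public class DepartmentPrincipal implements Principal {

    private final String name;

    public DepartmentPrincipal(String name) {
        this.name = name;
    }

    public String getName() {
        return name;
    }

    // equals() and hashCode() matter, because the policy implementation compares
    // principals when deciding whether a grant entry applies to a Subject.
    public boolean equals(Object o) {
        return (o instanceof DepartmentPrincipal)
                && name.equals(((DepartmentPrincipal) o).name);
    }

    public int hashCode() {
        return name.hashCode();
    }
}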
But what if we want our code to be run only by a certain set of users? This is where
authorization enters the picture. JAAS authorization combines the permission domain of a
piece of code with the permissions of a user before running the code. The previous
section explained the permission domain of the code, but what is the permission domain
of the user, and how do we grant permissions to individual users? This is done using a principal.
We grant principals various permissions using the policy file (the same policy
file which we used to grant various permissions to our code). After the user is authenticated,
we populate the Subject with the principal. JAAS thus extends code-based security to user-based
security. Now only authenticated users having the required principal can invoke a
particular piece of code or operation (because the principal in turn has the required permissions
to run that operation). But where is the actual code that associates the subject with the
access control context and runs the operation? For that, we need to run our code in the following
manner, which takes care of associating the subject with the access control context.
For that we use the doAsPrivileged method of the Subject class, passing the authenticated
Subject as the first parameter and, as the second parameter, the PrivilegedAction
containing the code that should be run in the context of the newly created protection domain.
The third parameter is null, as we want an empty context to be used:
Subject.doAsPrivileged(subject, new PrivilegedAction() {
    public Object run() {
        System.setProperty("file.encoding", "ASCII");
        return null;
    }
}, null);
To understand the above piece of code, let's examine the basics of Java security again.
First, there is the AccessController class, which performs security checks. Java has a method
stack (outside the heap space) where the currently executing methods live; each stack can be
considered an independent thread of execution. The AccessController checks whether the current
stack of methods has all the permissions required to invoke an operation. Some of its
important methods are:
static void checkPermission (Permission)
Used to determine whether the code has the required permissions mentioned in the parameter.
If it does, then the method returns quietly, otherwise, it throws a runtime exception.
Not only should the current call have the permission but the callers beneath it should
also have the required permission for this method to succeed.
public static Object doPrivileged (PrivilegedAction)
A caller can be marked as privileged, which indicates that one does not want the
permission check to proceed further down the stack. In that case, if the caller has the
required permission, the check returns quietly without any exception; it will not go further
down the stack to find out whether all the methods beneath this one also have
the required permissions. There's also an overloaded method that takes a
PrivilegedExceptionAction instead of a PrivilegedAction, which can be used in case the
operation throws a checked exception.
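As a small illustration, a privileged read of a system property could look like this (the property name is only an example):

String encoding = (String) AccessController.doPrivileged(new PrivilegedAction() {
    public Object run() {
        // only this frame needs the PropertyPermission; callers further
        // down the stack are not checked
        return System.getProperty("file.encoding");
    }
});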
I would recommend reading the class-level javadoc of the AccessController class;
it includes a good example of how to use it.
Now let's take a look at the authorization-related methods defined in Subject.
There are two sets of interesting methods: doAs and doAsPrivileged.
Their main purpose is to associate the authenticated subject with the current
AccessControlContext. The only difference between the two sets of methods is that the latter
lets you provide an AccessControlContext to use, while the former uses the current thread's
AccessControlContext. If the AccessControlContext passed to doAsPrivileged is null,
it doesn't use the current thread's context; instead it creates an empty context
(which has no protection domain) and merges it with the subject. So eventually it is
only the protection domain of the subject which is used for the authorization.
Running the examples
Make sure that jaas.config, policy.config and the jaas-example.jar file are in the same directory.
(There are also Ant targets named runCase1, runCase2, runCase3 and runCase4 that will execute the
test program with the specified parameters.) The complete, ready-to-run code can be
downloaded here.
The example does nothing fancier than changing the default encoding that is used for writing files.
The Java code for that is System.setProperty("file.encoding", "ASCII"), and you can find
it in the TestJaasAuthorization class. It's a simple example, but since this property
has an impact on a wide range of I/O actions, it's protection-worthy nonetheless.
The login module for all examples is written so that authentication is successful if the
username matches the password (a bit simplistic, but since the subject here is authorization,
not secure authentication, it's quite sufficient), after which a principal with the username
is created and attached to the authenticated subject. Then the authorization
is performed. You can run the first example by invoking ant runCase1.
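Under the hood, the Ant target launches the test class with a security manager, the JAAS login configuration and the policy file enabled; a rough sketch of such a command (the exact class name and arguments are in the downloadable build file):

java -Djava.security.manager \
     -Djava.security.auth.login.config=jaas.config \
     -Djava.security.policy=policy.config \
     -cp jaas-example.jar TestJaasAuthorization rahul rahul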
In this case the authentication succeeds and a RanchPrincipal is attached
to the authenticated subject. Through the policy file we have granted the
required permission only to the RanchPrincipal of user "rahul". The code runs properly to the
end and successfully changes the value of the file.encoding property to ASCII, regardless of
the current value.
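The corresponding grant entry in the policy file looks roughly like this (in a real policy file the principal class would be referred to by its fully qualified name, which is omitted here):

grant principal RanchPrincipal "rahul" {
    permission java.util.PropertyPermission "file.encoding", "write";
};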
While user bob is successfully authenticated (username and password are identical),
we're still using the policy config file that gives the critical permission to user "rahul" only.
So in this case the authentication succeeds, but the authorization fails and the
subject is not able to invoke the operation.
The third case fixes that. Again, the authentication succeeds, and a RanchPrincipal
with the name "bob" is created and attached to the authenticated subject. Since we're now
using a different policy config file (which gives the permission to all authenticated users
– that's what the "*" after RanchPrincipal means), the code proceeds smoothly.
The fourth case demonstrates authentication failure. In this case the subject is not
populated with any Principal; in fact, a null subject is returned from the LoginContext.
And since only authenticated users are allowed to run the code, the execution is denied.
This is just to make sure that authentication still works (without which proper handling
of authorization wouldn't be worth much).
Have you ever wondered how some websites are able to determine where you are visiting from? It all begins with the IP address assigned to you when you connect to the internet. Every time you log onto the internet your ISP assigns an IP address to you, and this IP address belongs to a specific range that is allotted to a region of the world.
So how do you find out which IP address belongs to a particular location? There are a couple of ways to do this. Solutions include commercial web services as well as free databases which you can download, import and use.
A number of commercial offerings are available, but I am not in favor of any solution that involves calling a web service or requesting the data remotely through some other means. This adds latency to every call, and it also means that your service depends on the service of your vendor; for some websites, downtime of such a service is simply not acceptable. On the positive side, you don't have to worry about maintaining the data if you are making remote calls to get the location data. There are also commercial solutions that offer the entire database to you for a fee; you can then import this data into your favorite database and make queries as necessary. In this article I will
concentrate on a free solution instead.
A free solution
It involves downloading a CSV (comma-separated values) file and then
loading this data into a persistent store, say a database. This is comparable to what the commercial products offer. The free CSV is updated on a daily basis, which is good news. Depending on the accuracy you need, you will have to refresh the data in your tables daily, weekly or monthly; how much data gets added depends on how many organizations/people register new IP addresses. Let's have a look at the format of this CSV file (the download link is available in the references section).
CSV File format
From     | To       | Registered with | When  | Country (2) | Country (3) | Country (full)
50331648 | 67108863 | ARIN            | 57257 | US          | USA         | UNITED STATES
From and To: Now what's with these From and To values? An IP address can be converted to a single number. With the IPv4 addressing system, the maximum value of each component x in x.y.z.a is 255. Let's say we have the IP address 1.2.3.4. To convert this to a number, we multiply each component by the appropriate power of 256 and add the results.
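For 1.2.3.4 that works out to 1*256^3 + 2*256^2 + 3*256 + 4 = 16777216 + 131072 + 768 + 4 = 16909060.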
The IPv4 system is a 32-bit representation of an IP address, so every address maps to a number in this way. These numbers are what the From and To columns represent: the first and last address of a registered range.
Registered with: The registry that the IP range is associated with. Examples are
APNIC, ARIN, LACNIC, RIPE NCC, AfriNIC and IANA.
When: Date when this range was assigned in milliseconds.
Country: Two letter and three letter abbreviations as well as the full country name.
Now let's import this data into a database, e.g. PostgreSQL. You can do this with the following command:
copy ip_location from 'd:/temp/IpToCountry_withoutID.csv' WITH CSV
This command copies the data in the CSV file into the ip_location table (which we have already defined; the structure is available in the references section).
Note that copy is not a standard SQL command – it is specific to PostgreSQL. (Other databases have similar ETL tools available; check the documentation of the one you're using.) Once we have the data, we can write a small program to return the location, given an IP address. Let's have a look at a code snippet: here is a utility method to get a location for a given IP address.
public void findLocation (String[] args) {
BigInteger base = new BigInteger("256");
if (args.length > 0 && args[0] != null && !args[0].equals("")) {
String ip = args[0];
String[] numberTokens = ip.split("\\.");
int power = 3;
BigInteger bigResult = new BigInteger("0");
for (String number : numberTokens) {
BigInteger bigNumber = new BigInteger(number);
BigInteger raise = raiseToPow(base, power--);
bigNumber = bigNumber.multiply(raise);
bigResult = bigResult.add(bigNumber);
}
System.out.println("Calculated result for " + ip + " is " + bigResult);
printLocationData(bigResult);
} else {
System.out.println("USAGE: java LocationFinder x.y.z.a");
}
}
The method printLocationData() connects to the database and fetches the right values to display. The method raiseToPow() is a utility method that simply raises a number to a given power and returns the result. The code splits the IP address into string tokens containing the numbers that we are interested in. It then generates a result which is looked up in the database using the following query:
select * from ip_location where ? > from_address and ? < to_address
The "?" is parameterized with the IP address number that we calculated. If the number falls within a particular range, we have what we want. Here is a sample output that the program threw for a given IP address
Calculated result for 66.92.21.32 is 1113330976
Registered with: ARIN
Country: UNITED STATES
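For completeness, printLocationData() could be implemented along these lines inside the same class as findLocation() (a sketch: the connection URL, credentials and the registry/country_name column names are assumptions, not the actual table definition from the references section):

private void printLocationData(BigInteger ipNumber) throws Exception {
    String sql = "select * from ip_location where ? > from_address and ? < to_address";
    Connection con = DriverManager.getConnection(
            "jdbc:postgresql://localhost/geodb", "user", "password");
    try {
        PreparedStatement ps = con.prepareStatement(sql);
        // IPv4 numbers can exceed Integer.MAX_VALUE, so bind them as longs
        ps.setLong(1, ipNumber.longValue());
        ps.setLong(2, ipNumber.longValue());
        ResultSet rs = ps.executeQuery();
        if (rs.next()) {
            System.out.println("Registered with: " + rs.getString("registry"));
            System.out.println("Country: " + rs.getString("country_name"));
        }
    } finally {
        con.close();
    }
}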
So there you have it – free IP to location conversion at your service.
There is plenty of content around that discusses the relative performance of the Java Collections classes,
mostly in terms of adding and retrieving items, along with some of the specifics under the covers such as the effect of the
initial capacity. HashMaps and HashSets are special cases that make heavy use of the Object.hashCode()
contract and as such are particularly sensitive to the behaviour of that method.
This article was inspired by a couple of classes in the application I support at work. The application has a couple of very large
HashMaps: one contains several thousand properties and settings, the other tens of thousands of translations.
Since they are so pervasive, their performance has a direct impact on the application, and since they are so large, it was worth
taking the time to see how they performed. From there I took a tangent into the specifics affecting HashMap performance, which became the article below.
Firstly a recommendation: If you haven't read Joshua Bloch's Effective Java, you should.
Much of the information below follows on from concepts in his book, but there is no replacement for the wealth contained in the book.
While looking at HashMaps, we will of course look at the effects on loading and retrieving data, but we'll
also address the time to iterate the contents and combine the effects of the initial capacity at the same time. Having read the API,
you are of course aware that the HashSet class is backed by a HashMap, which is why we cover one and not the other.
HashMaps With String Keys
Our HashMaps are keyed against either a String or Integer value, so I thought I would start with Strings.
Loading the Strings
I created three lists of random Strings containing 1,000, 10,000 and 100,000 items. The Strings were from 5 to 10 characters long and all upper case.
The String values were loaded into an array, and the time taken to load them from the array into the HashMap was measured for a variety of initial capacities, using something like this...
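A minimal sketch of such a timing loop (the array of random Strings and the initial capacity under test are assumed to be set up beforehand):

Map<String, String> map = new HashMap<String, String>(initialCapacity);
long start = System.nanoTime();
for (String key : array) {
    map.put(key, key);
}
System.out.println("Took " + (System.nanoTime() - start) / 1000000 + " ms");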
The resulting load times are non-threatening and may even be a little faster than expected. It is certainly the case that you wouldn't think twice
about putting 100,000 records in a HashMap if you knew it was going to take about a tenth of a second to load.
Retrieving the Strings
Now that we have the data loaded, it isn't much good unless we can get it back out. Since HashMaps are renowned for their ability to return data quickly, we will start with that.
Note that to make sure the time spent selecting an item doesn't affect our results, we will pick the first item from our initial list and look it up repeatedly, like this...
String searchKey = array[0];
long start = System.nanoTime();
for (int i = 0; i < LOOP_SIZE; i++) {
    map.get(searchKey);
}
System.out.println("Took " + (System.nanoTime() - start) / 1000000 + " ms");
Key Size | Initial Capacity | Reads       | Read time (ms)
1000     | 10               | 100 million | ~2400
1000     | 100              | 100 million | ~2400
1000     | 1000             | 100 million | ~2400
1000     | 10,000           | 100 million | ~2400
1000     | 100,000          | 100 million | ~2400
1000     | 1,000,000        | 100 million | ~2400
10,000   | 10               | 100 million | ~2400
10,000   | 100              | 100 million | ~2400
10,000   | 1000             | 100 million | ~2400
10,000   | 10,000           | 100 million | ~2400
10,000   | 100,000          | 100 million | ~2400
10,000   | 1,000,000        | 100 million | ~2400
100,000  | 10               | 100 million | ~2400
100,000  | 100              | 100 million | ~2400
100,000  | 1000             | 100 million | ~2400
100,000  | 10,000           | 100 million | ~2400
100,000  | 100,000          | 100 million | ~2400
100,000  | 1,000,000        | 100 million | ~2400
Table 2 - String key read times
Once again I hope these results are non-threatening and pretty much as expected. Regardless of the size of the HashMap and the number of items it
contains, the time taken to retrieve an item does not change.
Note that 100 million reads is a fair amount of work for our HashMap to do, and is almost one operation for every millisecond in the day.
Iterating the Strings
The final piece of work to build our basic results is to test the amount of time taken to iterate the results.
To get measurable times, we iterate each map's key set a number of times that depends on how many items it contains, like this...
long start = System.nanoTime();
for (int i = 0; i < ITERATE_COUNT; i++) {
    for (String string : map.keySet()) {
        // This operation should be 'free', and the compiler
        // should not be able to compile it out of the byte code.
        string.hashCode();
    }
}
System.out.println("Took " + (System.nanoTime() - start) / 1000000 + " ms");
Key Size | Initial Capacity | Iterations       | Iteration Time (ms)
1000     | 10               | 10,000 (x 1000)  | ~510
1000     | 100              | 10,000 (x 1000)  | ~510
1000     | 1000             | 10,000 (x 1000)  | ~550
1000     | 10,000           | 10,000 (x 1000)  | 1,100
1000     | 100,000          | 10,000 (x 1000)  | 6,200
1000     | 1,000,000        | 10,000 (x 1000)  | 50,000
10,000   | 10               | 1,000 (x 10,000) | ~520
10,000   | 100              | 1,000 (x 10,000) | ~520
10,000   | 1000             | 1,000 (x 10,000) | ~520
10,000   | 10,000           | 1,000 (x 10,000) | ~550
10,000   | 100,000          | 1,000 (x 10,000) | 1,200
10,000   | 1,000,000        | 1,000 (x 10,000) | -
100,000  | 10               | 100 (x 100,000)  | 1800
100,000  | 100              | 100 (x 100,000)  | 1800
100,000  | 1000             | 100 (x 100,000)  | 1800
100,000  | 10,000           | 100 (x 100,000)  | 1800
100,000  | 100,000          | 100 (x 100,000)  | 1800
100,000  | 1,000,000        | 100 (x 100,000)  | -
Table 3 - String key iteration times
Summarising Strings as Keys
Now that we have our baseline results using String as our keys, we can summarise:
- It is very fast to load String keys into a HashMap.
- While the javadoc warns against making the initial capacity too small due to rehashing costs, this wasn't apparent in the results above.
- The 'read time' appears constant regardless of the number of items in the map or the current capacity. This is described in the HashMap API.
- The iteration time appears fairly constant unless the capacity is very large. This is described in the HashMap API too.
To be fair, the String class is a first-class citizen in Java, and benefits from careful design and special treatment by the compiler.
The biggest help is that String's hashCode() value is computed lazily and then cached.
This helped in the iteration example, since the hash code had already been calculated and could be reused without being recalculated,
but it is also critical to the results we have seen so far, as we are about to discover.
HashMaps With Exotic Keys
While Strings and Integers are common choices as keys, it is often necessary to use classes that are more complicated, and the complicated nature of
the class could relate directly to the behaviour of the HashMap utilising the class. Without trying to go overboard creating a class that incurs a
cost while calculating its hash code, we will create a class that contains an array of primitive ints and provides a hash code based on the array contents.
The LargeHash Class
import java.util.Arrays;

// This class is only designed to illustrate the cost of hashing.
// It is not for production use!
public class LargeHash {

    private final int[] intValues;

    public LargeHash(final int intValue) {
        intValues = new int[1000];
        for (int i = 0; i < intValues.length; i++) {
            intValues[i] = intValue + i;
        }
    }

    @Override
    public int hashCode() {
        final int prime = 31;
        int result = 17;
        result = prime * result + Arrays.hashCode(intValues);
        return result;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (obj == null) return false;
        if (getClass() != obj.getClass()) return false;
        final LargeHash other = (LargeHash) obj;
        return Arrays.equals(intValues, other.intValues);
    }
}
The class is designed to be a HashMap key, and once an instance is created it has only limited functionality.
The hashCode() and equals() methods were initially generated by Eclipse, and use methods from the Arrays helper class.
We will be performing the same operations of loading, reading and iterating a HashMap containing a large number of instances as keys, and the results are...
Key Size | Initial Capacity | Load Time (ms) | Reads       | Read Time (ms) | Iterations       | Iteration Time (ms)
1000     | 10               | 5              | 100 million | 275,000 *      | 10,000 (x 1000)  | 500
1000     | 100              | 5              | 100 million | 280,000 *      | 10,000 (x 1000)  | 500
1000     | 1000             | 4              | 100 million | 275,000 *      | 10,000 (x 1000)  | 500
1000     | 10,000           | 5              | 100 million | 280,000 *      | 10,000 (x 1000)  | 1100
1000     | 100,000          | 5              | 100 million | 275,000 *      | 10,000 (x 1000)  | 6200 **
1000     | 1,000,000        | 4              | 100 million | 275,000 *      | 10,000 (x 1000)  | 50,000 **
10,000   | 10               | 40             | 100 million | 280,000 *      | 1000 (x 10,000)  | 500
10,000   | 100              | 40             | 100 million | 280,000 *      | 1000 (x 10,000)  | 500
10,000   | 1000             | 40             | 100 million | 280,000 *      | 1000 (x 10,000)  | 520
10,000   | 10,000           | 42             | 100 million | 290,000 *      | 1000 (x 10,000)  | 520
10,000   | 100,000          | 43             | 100 million | 290,000 *      | 1000 (x 10,000)  | 1100 **
10,000   | 1,000,000        | 42             | 100 million | 280,000 *      | 1000 (x 10,000)  | 5800 **
100,000  | 10               | 350            | 100 million | 280,000 *      | 100 (x 100,000)  | 2100
100,000  | 100              | 350            | 100 million | 280,000 *      | 100 (x 100,000)  | 2100
100,000  | 1000             | 350            | 100 million | 280,000 *      | 100 (x 100,000)  | 2100
100,000  | 10,000           | 350            | 100 million | 280,000 *      | 100 (x 100,000)  | 2100
100,000  | 100,000          | 330            | 100 million | 280,000 *      | 100 (x 100,000)  | 2000 **
100,000  | 1,000,000        | 330            | 100 million | 280,000 *      | 100 (x 100,000)  | 2400 **
* Values are extrapolated from repeated runs of 1 million reads.
** See the afterthoughts...
Table 4 - Simple LargeHash times
The LargeHash Behaviour
Firstly, the load time was larger but still not really noticeable. It took possibly twice as long, but twice 'not a hell of a lot' is still 'not much'.
Interestingly, the API recommends setting the initial size to some realistic value to prevent incurring the cost of rehashing, but that cost doesn't appear in the results.
It is possible that the initial capacity made a difference when we use 100,000 items in our Map, but not enough to provide any conclusive results.
The read time was obviously worse, to the extent that I decided to extrapolate the results rather than wait several minutes for each run to complete.
This shows us that the hashCode() method is too important to be taken lightly when used in HashMaps, and care should be taken when choosing or designing a class that will be used as a hashed key.
There is nothing incredibly wrong with the class itself, but when put under pressure it fails to perform and can cause a drag on the code.
Building a Responsible Key
The lesson to learn from the String class is that lazily computing and caching the hash value can have a dramatic effect on read times, since hashCode() is crucial to HashMap performance.
This is particularly true when you have high-usage HashMaps like the ones in the introduction.
It is important to note that the hash code cannot be cached if the instance's state can change in a way that alters the hash code, which implies that the key should first be made immutable before the hash code can be lazily cached.
The LargeHash class is almost immutable already, since it doesn't require defensive copies of incoming data and no internal data is exposed. This means we are only required to make the class final to fulfil the immutability contract.
There are a few options for lazily computing and caching the hash code, but we will copy the approach used in the String class.
Since no synchronization is used, it could lead to a race condition in which the hash code is computed by more than one thread, but the immutable internal state protects us from the value ever differing.
Once the value is set it is defined forever, so we no longer need to worry about it changing and do not pay for synchronization that is not required.
It is also important to note that the local variable 'h' is used throughout the method to ensure thread safety; otherwise another thread could see the hash field in an intermediate (non-zero) state and return an incorrect value.
The updated method looks like this...
// Style copied from String.hashCode(); assumes a new field in LargeHash:
//     private int hash;   // defaults to 0, filled in on first use
@Override
public int hashCode() {
    int h = hash;
    if (h == 0) {
        final int prime = 31;
        h = 17;
        h = prime * h + Arrays.hashCode(intValues);
        hash = h;
    }
    return h;
}
...and now has performance comparable to the String key while being all the more exotic in nature.
Key Size | Initial Capacity | Load Time (ms) | Reads       | Read Time (ms) | Iterations       | Iteration Time (ms)
1000     | 10               | 5              | 100 million | 2400           | 10,000 (x 1000)  | 500
1000     | 100              | 5              | 100 million | 2400           | 10,000 (x 1000)  | 500
1000     | 1000             | 5              | 100 million | 2400           | 10,000 (x 1000)  | 500
1000     | 10,000           | 4              | 100 million | 2400           | 10,000 (x 1000)  | 1100
1000     | 100,000          | 5              | 100 million | 2400           | 10,000 (x 1000)  | 6200 **
1000     | 1,000,000        | 5              | 100 million | 2400           | 10,000 (x 1000)  | 50,000 **
10,000   | 10               | 40             | 100 million | 2400           | 1000 (x 10,000)  | 500
10,000   | 100              | 40             | 100 million | 2400           | 1000 (x 10,000)  | 500
10,000   | 1000             | 42             | 100 million | 2400           | 1000 (x 10,000)  | 500
10,000   | 10,000           | 40             | 100 million | 2400           | 1000 (x 10,000)  | 500
10,000   | 100,000          | 40             | 100 million | 2400           | 1000 (x 10,000)  | 1000 **
10,000   | 1,000,000        | 40             | 100 million | 2400           | 1000 (x 10,000)  | 5800 **
100,000  | 10               | 350            | 100 million | 2400           | 100 (x 100,000)  | 2000
100,000  | 100              | 350            | 100 million | 2400           | 100 (x 100,000)  | 2000
100,000  | 1000             | 440 *          | 100 million | 2400           | 100 (x 100,000)  | 1800 *
100,000  | 10,000           | 350            | 100 million | 2400           | 100 (x 100,000)  | 2000
100,000  | 100,000          | 350            | 100 million | 2400           | 100 (x 100,000)  | 2000 **
100,000  | 1,000,000        | 350            | 100 million | 2400           | 100 (x 100,000)  | 2400 **
* These values were a repeatable aberration.
** See the afterthoughts...
Table 5 - Optimised LargeHash times
In Conclusion
The hashCode() method matters. In fact, I would extend that to state that all aspects of all contract methods matter.
Joshua Bloch has plenty to say about the hashCode(), equals() and compareTo() contracts,
but when you consider that these methods are the backbone of large parts of the Java programming language, it may not be sufficient just to get them right.
In critical places, the default hashing algorithm may not be enough.
While the API warns that making the initial size too small can incur an overhead due to rehashing when the capacity increases, this was not apparent in the tables above.
The cost to iteration due to large capacities is easier to see. Therefore the data suggests that HashMaps should be initialised to a smaller size and definitely not overly large.
Afterthoughts
Unusual Iteration Times
Tables 4 and 5 show that the iteration time increases as the capacity increases, and we expected this result.
What was unexpected were times that decreased when the Maps contained more entries.
I don't have much more to say about this than "gosh".
Actual Size
An interesting diversion is to look at the actual capacity of the HashMap.
The internal size is always kept as a power of two, presumably for performance purposes, but it is also important to note that the 'power of two' property is
enforced under the covers and not exposed via the API; as a rule you should ignore it in your own code. One thing this means is that you
shouldn't try to initialise HashMaps to an appropriate 'power of two' value yourself: it is already done for you, and by doing so you may be actively harming yourself
if the particulars change in later implementations.
The actual capacity is the smallest power of two that is greater than the initial capacity and greater than the number of items divided by the load factor.
This second point is the reason that an initial capacity of 1000 initially sets the actual capacity to 1024 (i.e. 2^10), but once this is 75% full (load factor 0.75)
the capacity is increased to 2048 (i.e. 2^11), as seen in Table 6.
While the actual capacity is not exposed directly, there are many ways to inspect the value and I found it easiest to attach a debugger to interrogate the HashMap instance.
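If you prefer code to a debugger, reflection does the trick too; the sketch below relies on the private field being called "table", which is an implementation detail that may differ between JDK versions:

static int actualCapacity(HashMap<?, ?> map) throws Exception {
    java.lang.reflect.Field tableField = HashMap.class.getDeclaredField("table");
    tableField.setAccessible(true);
    // the bucket array; may be null before the first entry is added in some JDKs
    Object[] table = (Object[]) tableField.get(map);
    return (table == null) ? 0 : table.length;
}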
Key Size | Initial Capacity | Actual Capacity
1000     | 10               | 2^11
1000     | 100              | 2^11
1000     | 1000             | 2^11
1000     | 10,000           | 2^14
1000     | 100,000          | 2^17
1000     | 1,000,000        | 2^20
10,000   | 10               | 2^14
10,000   | 100              | 2^14
10,000   | 1000             | 2^14
10,000   | 10,000           | 2^14
10,000   | 100,000          | 2^17
10,000   | 1,000,000        | 2^20
100,000  | 10               | 2^17
100,000  | 100              | 2^17
100,000  | 1000             | 2^17
100,000  | 10,000           | 2^17
100,000  | 100,000          | 2^17
100,000  | 1,000,000        | 2^20
Table 6 - Actual capacities in the previous tests
Iterating Map.Entry Sets
As a final aside, there are many articles that recommend iterating the Map.Entry set from HashMaps in preference to iterating the keys and then using the key to obtain the value.
We already have the time required to iterate in the data above, and have also shown that the time to retrieve the value is constant, but we already knew that from the API.
long start = System.nanoTime();
for (int i = 0; i < ITERATE_COUNT; i++) {
    for (Map.Entry<String, String> entry : map.entrySet()) {
        entry.getKey().hashCode();
    }
}
System.out.println("Took " + (System.nanoTime() - start) / 1000000 + " ms");
While potentially faster, this code is less readable and would therefore need to be quite a bit faster to warrant adoption, so let's run the tests one last time...
Key Size | Initial Capacity | Iterations       | Get then Read * (ms) | Map.Entry (ms)
1000     | 10,000           | 10,000 (x 1000)  | 1340                 | 1330
1000     | 100,000          | 10,000 (x 1000)  | 6400                 | 6400
1000     | 1,000,000        | 10,000 (x 1000)  | 50,000               | 50,000
10,000   | 10,000           | 1,000 (x 10,000) | 740                  | 700
10,000   | 100,000          | 1,000 (x 10,000) | 1200                 | 1200
10,000   | 1,000,000        | 1,000 (x 10,000) | 6000                 | 5900
100,000  | 10,000           | 100 (x 100,000)  | 2200                 | 1900
100,000  | 100,000          | 100 (x 100,000)  | 2200                 | 1500
100,000  | 1,000,000        | 100 (x 100,000)  | 2600                 | 1900
* Times were calculated using the data from Table 5
Table 7 - Map.Entry iteration times
The Map.Entry does show faster iteration times at the extreme end of the test data, but otherwise I'm not so convinced.
The first time I heard the word pagination, I thought "What's that?". Then I googled and
found out that it's a fancy word for splitting search results into bite-size pages that the
user can snack on. How you paginate depends on several factors. It might even depend on what type of
framework you use to deliver your solution. An off-the-shelf API that paginates in a web browser
might not work for a Swing application that wants to paginate results. Let's look at some of the
ways to paginate.
Using an off the shelf solution to paginate results
This method has its own pros and cons like the rest. To begin with, it's quite easy to find an existing API
out there and use its functions to paginate your results. Often all you have to do is
provide the API with details about the number of results, how many pages to display, and so on.
The API may even be configurable by the user if it is flexible enough. For example, JavaWorld
featured an article quite a while back about using such an API, built around a custom pagination tag.
Given the start position, the range and the total number of results, we can let the API do its job and
paginate the results. This kind of solution requires the least work from the developer.
The code is ready-made and available; all the developer needs to do is learn how to use
the API and things work by themselves. (DisplayTag is another such tag library; see the reference
section below for a link.) Although such APIs may reduce development time, they may
not be as flexible as we might want them to be. If a client requires changes to the way pagination
occurs in the application, we may have to dive down into the guts of the API to find a solution
if one is not available. Worse still, if you don't have the source code to the library
you may be in trouble. Imagine needing to change the contents of the custom pagination tag
because the client wants the pagination behaviour changed. Any ready-made solution must be used with caution:
anything that is free and saves time is welcome, as long as it does not complicate the solution
at a later stage in its life.
Writing your own pagination logic
There are (mostly) two ways in which you can fetch your data. One is to fetch all the
results up front (hence the term greedy) and display them to the user after caching the results.
The other approach is to fetch a selection of the results each time the user requests data
from the search query.
Fetching results the greedy way
I am a big fan of patterns. A particular pattern called the ValueListHandler is very useful
when dealing with paginating greedily. This pattern is reusable across several queries and can
help your pagination process. A class diagram of the ValueListHandler pattern is illustrated below.
Class diagram: ValueListHandler
The idea is that a client uses a ValueListHandler, which accesses a DAO to return a ValueList.
Each value list contains a collection of value objects (the DAO and VO patterns are mentioned
in the references section if you need to read up on them). Every time the user needs results
to be displayed, this can be handled gracefully by asking the ValueListHandler for a sub-list of
the ValueList. Since the ValueList caches the results, there is no need to go to the database
again to fetch the next set of results when the user clicks "Next". Here is a sequence
diagram to explain it better.
Sequence diagram: ValueListHandler
Every time the user needs the next (or previous) result set, you ask the ValueListHandler
for the next/previous sub-list. If you allow the value list to store a list of Objects, you can
use this solution to put any type of VO into the ValueList and apply the pattern across several
searches. You will of course have to cache the ValueListHandler in memory, and you will also have to
downcast the objects from the ValueList to your VO type every time you want to display
the data they contain.
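A bare-bones sketch of the idea (the class and method names are illustrative, not the canonical Core J2EE pattern API):

import java.util.List;

public class ValueListHandler {

    private final List<Object> valueList;   // cached search results (the ValueList)
    private final int pageSize;

    public ValueListHandler(List<Object> results, int pageSize) {
        this.valueList = results;
        this.pageSize = pageSize;
    }

    // Returns one page of the cached results; no database round trip is needed
    // when the user clicks "Next" or "Previous".
    public List<Object> getSubList(int pageNumber) {
        int from = Math.min(pageNumber * pageSize, valueList.size());
        int to = Math.min(from + pageSize, valueList.size());
        return valueList.subList(from, to);
    }
}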
Fetching results the not so greedy way
There is a little problem with the solution that was described above. What if your search result
returns 100,000 matches? Your user may have to wait for a long time and be a very unhappy camper indeed.
So it often makes sense to fetch only the number of records the user wants to see.
Different databases have different ways to limit the size of a query's result. Oracle uses ROWNUM, while MySQL
uses LIMIT. The queries below give an example of how you can restrict your result set.
MySQL: (Give me the first 10 rows of the result)
SELECT * FROM TABLE WHERE {CONDITION} LIMIT 0, 10
Oracle: (Give me the rows between min_row_to_fetch and max_row_to_fetch )
SELECT *
FROM ( SELECT /*+ FIRST_ROWS(n) */ a.*, ROWNUM rnum
FROM ( your_query_goes_here, with order by ) a
WHERE ROWNUM <= :MAX_ROW_TO_FETCH )
WHERE rnum >= :MIN_ROW_TO_FETCH;
Oracle's version of the query limitation is a little different because of the way it treats ROWNUM.
To make the SQL more database-agnostic, you can paginate by requesting data using
a subset of the primary keys that you are interested in. For example, if your data list contains
the primary keys {1,34,25,56,333,23,99,101}, you can create a sub-list {1,34,25,56} and fetch it;
the next sub-list would be {333,23,99,101}. Pagination is then possible with the following SQL:
SELECT * FROM TABLE WHERE PRIMARY_KEY IN (MY_SUB_LIST)
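A small sketch of how the IN clause for one page of keys might be built (the method and table names are assumed), binding the actual key values through a PreparedStatement afterwards:

public static String buildPageQuery(java.util.List<Long> keySubList) {
    StringBuilder sql = new StringBuilder("SELECT * FROM TABLE WHERE PRIMARY_KEY IN (");
    for (int i = 0; i < keySubList.size(); i++) {
        if (i > 0) {
            sql.append(", ");
        }
        sql.append('?');   // one placeholder per key in the sub-list
    }
    return sql.append(')').toString();
}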
Another drawback with the greedy approach is that the data becomes stale once you cache it.
If your application changes the data in the result set frequently you may have to consider the
accuracy of your results when you choose a solution.
Which technique should I use?
Whichever solution fits your requirements; it may not even be one of the three listed here.
These are three common approaches to pagination that usually fit the bill, depending on the user's
response-time requirements. My personal favorite is to add a Strategy pattern on top of this.
If the number of results is less than 500 (say) I use the greedy technique to display the results.
If it is more than 500 I use the non-greedy technique to display the results since the user may
have to wait for a long time. Be sure to consider the drawbacks carefully in each case.
At the risk of dumbing down the Strategy pattern, an illustration for this scenario is given below.
In practice one would switch between various implementations of an interface to achieve this.
Check the reference section below for more information; I would recommend the
Head First Design Patterns book or the Gang of Four design patterns book.
Strategy based on result set size
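In code, the switch could be sketched roughly like this (the names and the threshold of 500 results are illustrative, not prescriptive):

import java.util.List;

public class PaginationContext {

    public interface PaginationStrategy {
        List<String> fetchPage(int pageNumber, int pageSize);
    }

    private static final int THRESHOLD = 500;
    private final PaginationStrategy greedy;
    private final PaginationStrategy nonGreedy;

    public PaginationContext(PaginationStrategy greedy, PaginationStrategy nonGreedy) {
        this.greedy = greedy;
        this.nonGreedy = nonGreedy;
    }

    // Small result sets are fetched and cached in full; large ones are fetched page by page.
    public List<String> fetchPage(int totalResults, int pageNumber, int pageSize) {
        PaginationStrategy strategy = (totalResults < THRESHOLD) ? greedy : nonGreedy;
        return strategy.fetchPage(pageNumber, pageSize);
    }
}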
Conclusion
Each approach solves the problem within certain boundaries. You should choose wisely
based on which solution best suits you. In the end it does not matter how we solve the problem,
as long as the client's requirements are met and the design of the solution is flexible enough to
accommodate future changes.
Not all the good stuff is in these Journal articles. The Saloon sees lots of insight and wisdom
all the time, and the staff is continuously reviewing interesting books and blogging about
this, that and the other. Plus, we've got book authors hanging out with us all the time.
So here are a few nuggets you may have missed that should prove interesting to many folks.
Big Moose Saloon
Recognizing that Java isn't the be-all and end-all of languages, and that JVM-based languages are all
the rage nowadays, we've started new forums for Groovy
and Scala.
So head over there with all your questions, and be sure to let your programming buddies know about them.
A rancher wonders when is the right time to commit source code to the repository in
Macro vs Micro commits:
(just about) every time a change is made, or only after major chunks of work have been completed and tested?
A discussion on the various possibilities of generating Swing GUIs ensues in
Designing GUIs.
While the original question concerns using a GUI builder tool like Matisse vs. coding GUIs by hand,
ranchers' opinions also cover building GUIs from XML descriptions, and point to various libraries that may help.
Using JAAS and Single-Sign-On (SSO) are frequently-discussed topics on the ranch. Now a member asks about
a portable JAAS solution for multiple web apps, for which SSO will also be needed. Various options and
approaches are discussed in Understanding JAAS/Web app SSO.
For years the Word Association
thread has been the longest (and longest-running) topic on JavaRanch. Now there's a newcomer with an interesting twist:
Picture Association.
Thanks to fellow rancher Robert Liguori for starting this one – it's an intriguing concept.
Book Reviews
To make it easier to find books we have reviewed, there's now a Book Search.
It allows you to search the library of reviews by title, author, publisher, ISBN, reviewer, review contents
and category. The search criteria can also be combined, and the results bookmarked.
The Code Barn is one of the oldest sections of the ranch,
and maybe one of the lesser-known ones. It collects fully-functioning code snippets that demonstrate how various
Java features (or 3rd party libraries) work. It was moved to the wiki a while ago (so it is editable by anyone
who's logged into the Saloon), and we welcome comments and contributions. So if you have a piece you wrote for
a particular purpose, and you think it might help others, let us know about it (or add it yourself, if you feel
strong with the wiki force). Currently, there are sections for applets, servlets/JSP, XML, 3rd party libraries,
and various uncategorized items in beginner, intermediate and advanced coding levels, but we can certainly add
new categories if there's interest and code to start with.
Upcoming Promotions
On August 5, Dan Allen will be answering questions about Seam in Action.
On August 12, Jevgeni Kabanov & Toomas Römer introduce JavaRebel.