// Java 19's new Map/Set Factory Methods

Besides the big preview features of Java 19 (e.g. virtual threads, which I blogged about over two years ago - time flies), there are also some noteworthy API updates which might fly under the radar. One of these API changes is a set of new factory methods for creating mutable, pre-sized Map and Set collections. Let's take a quick look and find out how they differ from the old constructor counterparts and when to use them instead.

Pre-sized Maps/Sets

Many collections can be created with pre-allocated initial capacity if the number of items which will be stored in them is already known or can be estimated. This avoids unnecessary resizing operations which happen in the background while the implementation has to dynamically expand the collection based on item count.
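As a minimal sketch of the idea, assuming the element count is known up front (the class name PreSizedList and the method squares are made up for illustration):

```java
import java.util.ArrayList;
import java.util.List;

final class PreSizedList {

    // Builds a list of n squares; the capacity hint lets ArrayList
    // allocate its backing array once instead of growing it step by step.
    static List<Integer> squares(int n) {
        List<Integer> list = new ArrayList<>(n); // capacity hint, not a size
        for (int i = 0; i < n; i++) {
            list.add(i * i);
        }
        return list;
    }

    public static void main(String[] args) {
        System.out.println(squares(5)); // [0, 1, 4, 9, 16]
    }
}
```

For an ArrayList the constructor argument really is the number of elements it can hold, which is exactly the expectation that does not carry over to HashMap.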

This is all straightforward, but there is one little anomaly: HashMaps (and related collections) have an initialCapacity and a loadFactor (default: 0.75) parameter. The initialCapacity, however, is not the expected entry count; it is the initial size of an internal table (impl. detail: rounded up to the next power of two). This table has to be larger than the entry count of the map, since it is only filled until the given loadFactor is reached before it is resized. [javadoc]

This detail is very easy to overlook, for example:


        Map<String, String> map = new HashMap<>(4);
        map.put("one", "1");
        map.put("two", "2");
        map.put("three", "3");
        map.put("four", "4");

may look correct at first glance, but it will resize the internal table when more than 0.75*4 = 3 entries are added (which is the case above). Resizing a (large) Map can be comparatively expensive, and the code responsible for it isn't trivial. Further, the javadoc mentions that "creating a HashMap with a sufficiently large capacity will allow the mappings to be stored more efficiently than letting it perform automatic rehashing as needed to grow the table".
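The arithmetic behind this can be sketched in a few lines. This is only a model of the rounding and threshold described above, not the JDK's code; the class TableMath and its methods are hypothetical, and HashMap's maximum capacity is ignored:

```java
final class TableMath {

    // Smallest power of two >= capacity (for capacity >= 1).
    static int tableSize(int capacity) {
        int highest = Integer.highestOneBit(capacity);
        return highest == capacity ? capacity : highest << 1;
    }

    // Number of entries the table holds before a further put triggers a resize.
    static int threshold(int capacity, float loadFactor) {
        return (int) (tableSize(capacity) * loadFactor);
    }

    public static void main(String[] args) {
        System.out.println(threshold(4, 0.75f)); // 3 -> the 4th put resizes
        System.out.println(threshold(6, 0.75f)); // 6 -> table of 8, no resize for 4 entries
    }
}
```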

What we want is a Map which can actually hold 4 entries. For this, we unfortunately have to calculate the capacity parameter ourselves:


// Java 18
Map<String, String> map = new HashMap<>((int) Math.ceil(4 / 0.75));

The new factory methods simply hide this calculation.


// Java 19+
// analogous methods exist for HashSet, LinkedHashSet, LinkedHashMap and WeakHashMap
Map<String, String> map = HashMap.newHashMap(4);
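For completeness, here is a small sketch of the sibling factory methods (requires Java 19+). The class PreSizedFactories and the helper setOf are made up for this example; only the newXxx methods are JDK API:

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.WeakHashMap;

final class PreSizedFactories {

    // Copies the given names into a HashSet sized for that many elements.
    static Set<String> setOf(String... names) {
        Set<String> set = HashSet.newHashSet(names.length);
        for (String name : names) {
            set.add(name);
        }
        return set;
    }

    public static void main(String[] args) {
        int expected = 4; // arbitrary example count

        Set<String> linkedSet = LinkedHashSet.newLinkedHashSet(expected);
        Map<String, String> linkedMap = LinkedHashMap.newLinkedHashMap(expected);
        Map<String, String> weakMap = WeakHashMap.newWeakHashMap(expected);

        linkedSet.add("one");
        linkedMap.put("one", "1");
        weakMap.put("one", "1");

        System.out.println(setOf("one", "two", "three", "four").size()); // 4
        System.out.println(linkedSet + " " + linkedMap + " " + weakMap.size());
    }
}
```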

If you think you have made this mistake before - don't feel bad :). Even OpenJDK code overlooked this detail on some occasions, as can be seen in PR1 and PR2, which introduce the new factory methods and also refactor JDK code to use them (instead of the various ways the capacity was calculated before). Stuart Marks gives a few more examples here of how not to calculate it. The javadoc of the respective constructors (e.g. for HashMap#HashMap(int)) now also contains an API Note which points to the factory methods.

note: ConcurrentHashMap and IdentityHashMap already expect the parameter to be the actual entry count, which means they didn't receive factory methods.
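In other words, for these two the plain constructor can be used with the expected entry count directly. A minimal sketch (class and method names are made up for illustration):

```java
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class EntryCountConstructors {

    // The argument is the number of elements the map should accommodate.
    static Map<String, Integer> concurrent(int expectedEntries) {
        return new ConcurrentHashMap<>(expectedEntries);
    }

    // The argument is the expected maximum size of the map.
    static Map<Object, String> identity(int expectedEntries) {
        return new IdentityHashMap<>(expectedEntries);
    }

    public static void main(String[] args) {
        Map<String, Integer> map = concurrent(4);
        map.put("one", 1);
        System.out.println(map); // {one=1}
    }
}
```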

I might add some jackpot code transformation rules to this collection to make migration a bit more convenient (update: done).

Just for Fun: A Quick Experiment

We can show the difference by observing the internal table size of the Map while adding entries:


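    // needs: import java.lang.reflect.Field; import java.util.*;
    // on recent JDKs, run with: --add-opens java.base/java.util=ALL-UNNAMED
    // (otherwise the reflective access to HashMap's private 'table' field is denied)
    public static void main(String[] args) throws ReflectiveOperationException {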
        System.out.println(Runtime.version());
        
        int entries = 4; // small number to fit the output on a blog entry
        inspect("new HashMap<>(entries)", new HashMap<>(entries), entries);
        inspect("HashMap.newHashMap(entries)", HashMap.newHashMap(entries), entries);
        inspect("new HashMap<>(((int) Math.ceil(entries / 0.75)))", new HashMap<>(((int) Math.ceil(entries / 0.75))), entries);
    }

    private static void inspect(String desc, Map<? super Object, ? super Object> map,
                                int entries) throws ReflectiveOperationException {

        System.out.println();
        System.out.println("filling '"+desc+"' with "+entries+" entries...");
        System.out.println("table size: [content]");
        for (int i = 0; i < entries; i++) {
            map.put("key"+i, "value");

            Field field = HashMap.class.getDeclaredField("table");
            field.setAccessible(true);
            Object[] table = (Object[]) field.get(map);
            System.out.println(table.length+": "+Arrays.asList(table));
        }

        System.out.println("map size: {content}");
        System.out.println(map.size()+": "+map);
    }

output:


19+36-2238

filling 'new HashMap<>(entries)' with 4 entries...
table size: [content]
4: [null, null, null, key0=value]
4: [key1=value, null, null, key0=value]
4: [key1=value, key2=value, null, key0=value]
8: [key1=value, key2=value, null, key0=value, null, null, key3=value, null]
map size: {content}
4: {key1=value, key2=value, key0=value, key3=value}

filling 'HashMap.newHashMap(entries)' with 4 entries...
table size: [content]
8: [null, null, null, key0=value, null, null, null, null]
8: [key1=value, null, null, key0=value, null, null, null, null]
8: [key1=value, key2=value, null, key0=value, null, null, null, null]
8: [key1=value, key2=value, null, key0=value, null, null, key3=value, null]
map size: {content}
4: {key1=value, key2=value, key0=value, key3=value}

filling 'new HashMap<>(((int) Math.ceil(entries / 0.75)))' with 4 entries...
table size: [content]
8: [null, null, null, key0=value, null, null, null, null]
8: [key1=value, null, null, key0=value, null, null, null, null]
8: [key1=value, key2=value, null, key0=value, null, null, null, null]
8: [key1=value, key2=value, null, key0=value, null, null, key3=value, null]
map size: {content}
4: {key1=value, key2=value, key0=value, key3=value}

Conclusion

In Java 19 and later, we can use the new factory methods to create mutable Sets/Maps which are pre-sized for an expected item count, without having to calculate the internal capacity ourselves. The old constructor counterparts are now only needed when non-standard load factors are chosen.
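For the non-standard load factor case, the calculation still has to be done by hand. A possible helper could look like this; it is a sketch, and the names CustomLoadFactor, capacityFor and newMap are hypothetical, not JDK API:

```java
import java.util.HashMap;
import java.util.Map;

final class CustomLoadFactor {

    // Capacity to pass to the constructor so that expectedEntries fit without a resize.
    static int capacityFor(int expectedEntries, float loadFactor) {
        return (int) Math.ceil(expectedEntries / (double) loadFactor);
    }

    // Creates a HashMap with a custom load factor, sized for expectedEntries.
    static <K, V> Map<K, V> newMap(int expectedEntries, float loadFactor) {
        return new HashMap<>(capacityFor(expectedEntries, loadFactor), loadFactor);
    }

    public static void main(String[] args) {
        System.out.println(capacityFor(4, 0.75f)); // 6
        System.out.println(capacityFor(4, 0.5f));  // 8

        Map<String, String> map = newMap(4, 0.5f);
        map.put("one", "1");
        System.out.println(map); // {one=1}
    }
}
```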

- - - sidenotes - - -

HashMaps initialize their internal table lazily, so calling it pre-allocation isn't entirely correct. Adding the first entry will, however, allocate the correctly sized table, even if it's done later.

Since HashMap's internal table size is rounded up to the next power of two, the capacity might still be sufficient to avoid resize ops even when the constructor was used incorrectly, without properly calculating the initial capacity (still no excuse for not fixing it ;)).
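To get a feeling for how often the rounding hides the mistake, the following sketch lists the entry counts n for which new HashMap<>(n) would still have to resize while n entries are added. It only models the arithmetic (power-of-two rounding, default load factor 0.75) and does not measure a real HashMap; all names are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;

final class MaskedMistakes {

    // Smallest power of two >= capacity (for capacity >= 1).
    static int tableSize(int capacity) {
        int highest = Integer.highestOneBit(capacity);
        return highest == capacity ? capacity : highest << 1;
    }

    // True if n entries exceed the resize threshold of a table created for capacity n.
    static boolean resizes(int n) {
        return n > (int) (tableSize(n) * 0.75f);
    }

    // All n in [from, to] for which the naive constructor call leads to a resize.
    static List<Integer> affected(int from, int to) {
        List<Integer> result = new ArrayList<>();
        for (int n = from; n <= to; n++) {
            if (resizes(n)) {
                result.add(n);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(affected(1, 16)); // [1, 2, 4, 7, 8, 13, 14, 15, 16]
    }
}
```

So in this model counts like 3, 5 or 9 to 12 get away with it, while 4 (the example from above) does not.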



