// Java 19's new Map/Set Factory Methods

Besides the big preview features of Java 19 (e.g. virtual threads, which I blogged about over two years ago - time flies), there are also some noteworthy API updates which might fall under the radar. One of them is a set of new factory methods for creating mutable, pre-sized Map and Set collections. Let's take a quick look and find out how they differ from the old constructor counterparts and when to use them instead.

Pre-sized Maps/Sets

Many collections can be created with a pre-allocated initial capacity if the number of items which will be stored in them is already known or can be estimated. This avoids unnecessary resizing operations, which otherwise happen in the background as the implementation dynamically expands the collection to match the item count.
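
A minimal illustration with ArrayList, where the constructor parameter is simply the expected element count:


        // the backing array is allocated for 1_000 elements up front,
        // so adding up to 1_000 elements never triggers a grow-and-copy
        List<String> names = new ArrayList<>(1_000);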

This is all straightforward, but there is one little anomaly: HashMap (and related collections) has an initialCapacity and a loadFactor (default: 0.75) parameter. The initialCapacity, however, is not the expected entry count; it is the initial size of an internal table (implementation detail: rounded up to the nearest power of two), which is larger than the actual entry count in the map, since the table is only filled until the given loadFactor is reached before it is resized. [javadoc]

This detail is very easy to overlook, for example:


        Map<String, String> map = new HashMap<>(4);
        map.put("one", "1");
        map.put("two", "2");
        map.put("three", "3");
        map.put("four", "4");

may look correct at first glance, but it will resize the internal table once more than 0.75*4 entries are added (which is the case above). Resizing a (large) Map can be comparatively expensive, and the code responsible for it isn't trivial. Further, the javadoc mentions that "creating a HashMap with a sufficiently large capacity will allow the mappings to be stored more efficiently than letting it perform automatic rehashing as needed to grow the table".

What we want is to create a Map which can actually hold 4 entries; for that we unfortunately have to calculate the capacity parameter ourselves:


// Java 18
Map<String, String> map = new HashMap<>((int) Math.ceil(4 / 0.75));

The new factory methods simply hide this calculation.


// Java 19+
// analogous methods exist for HashSet, LinkedHashSet, LinkedHashMap and WeakHashMap
Map<String, String> map = HashMap.newHashMap(4);
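
The analogous methods mentioned in the comment above follow the same pattern, for example:


        Set<String> set          = HashSet.newHashSet(4);
        Set<String> ordered      = LinkedHashSet.newLinkedHashSet(4);
        Map<String, String> lmap = LinkedHashMap.newLinkedHashMap(4);
        Map<Object, String> weak = WeakHashMap.newWeakHashMap(4);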

If you think you made this mistake before - don't feel bad :), since even OpenJDK code overlooked this detail on some occasions, as can be seen in PR1 and PR2, which introduce the new factory methods and also refactor JDK code to use them (instead of the various ways the capacity was calculated before). Stuart Marks gives here a few more examples of how not to calculate it. The javadoc of the respective constructors (e.g. for HashMap#HashMap(int)) now also contains an API Note which points to the factory methods.

note: ConcurrentHashMap and IdentityHashMap already expect the parameter to be the actual entry count, which is why they didn't receive new factory methods.
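
For comparison, a minimal sketch with ConcurrentHashMap, where the constructor argument already means "expected number of entries":


        // no load-factor math needed: the table is sized so that 4 entries fit without resizing
        Map<String, String> concurrent = new ConcurrentHashMap<>(4);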

I might add some jackpot code transformation rules to this collection to make migration a bit more convenient (update: done).

Just for Fun: A Quick Experiment

We can observe this behavior by inspecting the internal table size of the Map (via reflection) while adding entries:


    public static void main(String[] args) throws ReflectiveOperationException {
        System.out.println(Runtime.version());
        
        int entries = 4; // small number to fit the output on a blog entry
        inspect("new HashMap<>(entries)", new HashMap<>(entries), entries);
        inspect("HashMap.newHashMap(entries)", HashMap.newHashMap(entries), entries);
        inspect("new HashMap<>(((int) Math.ceil(entries / 0.75)))", new HashMap<>(((int) Math.ceil(entries / 0.75))), entries);
    }

    private static void inspect(String desc, Map<? super Object, ? super Object> map,
                                int entries) throws ReflectiveOperationException {

        System.out.println();
        System.out.println("filling '"+desc+"' with "+entries+" entries...");
        System.out.println("table size: [content]");
        for (int i = 0; i < entries; i++) {
            map.put("key"+i, "value");

            // note: deep reflection into java.base requires --add-opens java.base/java.util=ALL-UNNAMED on JDK 16+
            Field field = HashMap.class.getDeclaredField("table");
            field.setAccessible(true);
            Object[] table = (Object[]) field.get(map);
            System.out.println(table.length+": "+Arrays.asList(table));
        }

        System.out.println("map size: {content}");
        System.out.println(map.size()+": "+map);
    }

output:


19+36-2238

filling 'new HashMap<>(entries)' with 4 entries...
table size: [content]
4: [null, null, null, key0=value]
4: [key1=value, null, null, key0=value]
4: [key1=value, key2=value, null, key0=value]
8: [key1=value, key2=value, null, key0=value, null, null, key3=value, null]
map size: {content}
4: {key1=value, key2=value, key0=value, key3=value}

filling 'HashMap.newHashMap(entries)' with 4 entries...
table size: [content]
8: [null, null, null, key0=value, null, null, null, null]
8: [key1=value, null, null, key0=value, null, null, null, null]
8: [key1=value, key2=value, null, key0=value, null, null, null, null]
8: [key1=value, key2=value, null, key0=value, null, null, key3=value, null]
map size: {content}
4: {key1=value, key2=value, key0=value, key3=value}

filling 'new HashMap<>(((int) Math.ceil(entries / 0.75)))' with 4 entries...
table size: [content]
8: [null, null, null, key0=value, null, null, null, null]
8: [key1=value, null, null, key0=value, null, null, null, null]
8: [key1=value, key2=value, null, key0=value, null, null, null, null]
8: [key1=value, key2=value, null, key0=value, null, null, key3=value, null]
map size: {content}
4: {key1=value, key2=value, key0=value, key3=value}

Conclusion

In Java 19 and later, we can use the new factory methods to create mutable Sets/Maps pre-sized for the expected number of entries, without having to calculate the internal capacity ourselves. The old constructor counterparts are now only needed when non-standard load factors are chosen.

- - - sidenotes - - -

HashMaps initialize their internal table lazily, so calling it pre-allocation isn't entirely correct. Adding the first entry will, however, allocate the correctly sized table even if it's done later.

Since HashMap's internal table size is rounded up to the nearest power of two, the capacity might still be sufficient to avoid resize operations even when the constructor was used by mistake without properly calculating the initial capacity (still no excuse for not fixing it ;)).
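
A quick worked example of that lucky case (assuming the default load factor of 0.75):


        // 12 is rounded up to a table size of 16, and 16 * 0.75 = 12,
        // so exactly 12 entries happen to fit without a resize
        Map<String, String> lucky = new HashMap<>(12);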


// Java 17's Enhanced Pseudo-Random Number Generators

JEP 356 adds a new set of pseudo-random number generators to Java 17 and a nice new API to list and instantiate them. Let's take a look.

RandomGeneratorFactory

The new main entry point is java.util.random.RandomGeneratorFactory, which can list all available factories (all()), get one by name (of("..")) or return the default factory (getDefault()). Let's first see what JDK 17 ships with.


            RandomGeneratorFactory.all()
                .map(fac -> fac.group()+":"+fac.name()
                                + " {"
                                + (fac.isSplittable()?" splittable":"")
                                + (fac.isStreamable()?" streamable":"")
                                + (fac.isJumpable()?" jumpable":"")
                                + (fac.isArbitrarilyJumpable()?" arbitrary-jumpable":"")
                                + (fac.isLeapable()?" leapable":"")
                                + (fac.isHardware()?" hardware":"")
                                + (fac.isStatistical()?" statistical":"")
                                + (fac.isStochastic()?" stochastic":"")
                                + " stateBits: "+fac.stateBits()
                                + " }"
                    )
                .sorted().forEach(System.out::println);

prints...


LXM:L128X1024MixRandom { splittable streamable statistical stateBits: 1152 }
LXM:L128X128MixRandom { splittable streamable statistical stateBits: 256 }
LXM:L128X256MixRandom { splittable streamable statistical stateBits: 384 }
LXM:L32X64MixRandom { splittable streamable statistical stateBits: 96 }
LXM:L64X1024MixRandom { splittable streamable statistical stateBits: 1088 }
LXM:L64X128MixRandom { splittable streamable statistical stateBits: 192 }
LXM:L64X128StarStarRandom { splittable streamable statistical stateBits: 192 }
LXM:L64X256MixRandom { splittable streamable statistical stateBits: 320 }
Legacy:Random { statistical stateBits: 48 }
Legacy:SecureRandom { stochastic stateBits: 2147483647 }
Legacy:SplittableRandom { splittable streamable statistical stateBits: 64 }
Xoroshiro:Xoroshiro128PlusPlus { streamable jumpable leapable statistical stateBits: 128 }
Xoshiro:Xoshiro256PlusPlus { streamable jumpable leapable statistical stateBits: 256 }

The Legacy group represents the old PRNGs. For example, the "Random" factory will produce java.util.Random instances, while "SecureRandom" produces java.security.SecureRandom.


        RandomGenerator rng1 = RandomGeneratorFactory.of("Random").create(42);   // new way
        RandomGenerator rng2 = new Random(42);                                   // old way
        RandomGenerator rng3 = RandomGeneratorFactory.getDefault().create(42);   // new default
        RandomGenerator rng4 = RandomGenerator.getDefault();                     // shortcut to new default

        System.out.println(rng1.getClass()); // class java.util.Random
        System.out.println(rng2.getClass()); // class java.util.Random
        System.out.println(rng3.getClass()); // class jdk.random.L32X64MixRandom
        System.out.println(rng4.getClass()); // class jdk.random.L32X64MixRandom

The default implementation is already a new algorithm of the LXM group, which is fine since the API didn't exist before - existing applications won't be affected. From the doc: "Returns a RandomGenerator meeting the minimal requirement of having an algorithm whose state bits are greater than or equal 64."
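
All of these instances are handed out via the common RandomGenerator interface, which offers the familiar nextInt/nextLong/nextDouble/nextGaussian methods as well as the bulk stream methods known from java.util.Random. A small sketch:


        RandomGenerator rng = RandomGenerator.getDefault();
        int die = rng.nextInt(1, 7);              // a die roll: origin inclusive, bound exclusive
        double gaussian = rng.nextGaussian();     // mean 0.0, stddev 1.0
        int sum = rng.ints(1_000, 0, 100).sum();  // IntStream of 1000 values in [0, 100)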

No Thread Safety

None of the new implementations are thread safe, while both java.util.Random and java.security.SecureRandom are.

Although it is not very common to share the same instance between threads (there is even ThreadLocalRandom for this specific purpose if it doesn't have to be cryptographically secure), I would advise against blindly refactoring the code into something like


        RandomGenerator threadSafeQuestionMark = RandomGeneratorFactory.all()
                                    .filter(RandomGeneratorFactory::isStochastic)
                                    .sorted((g1, g2) -> g2.stateBits() - g1.stateBits())
                                    .findFirst().get().create();

This will return a thread safe SecureRandom today, but if a better implementation shows up in the future which isn't thread safe, your application might break if it relied on that fact. There is no isThreadSafe() in the API, so there is no good way to filter for it. Make sure you don't rely on the special nature of the legacy implementations before using filters in a forward-incompatible way. See the SplittableGenerator section below for a better alternative to sharing.

Which random to pick?

...if you are using java.security.SecureRandom

If you look at the capability list above, you will notice that one algorithm is not quite like the others. SecureRandom is the only stochastic algorithm; it is initialized by some entropy source, usually the responsibility of your kernel (/dev/random) during boot, or a lava lamp, etc. So if your application used SecureRandom before, keep using it - there is currently only one cryptographically strong RNG in the JDK.

...if you are using java.util.Random

You now have several options to pick from (as long as you don't share the instance between threads). The javadoc for the java.util.random package has a great description of the new algorithms and mixing functions, and also a section which helps with choosing the right one.

Consider getDefault() before picking a factory at random ;)
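
A small sketch of those options (the algorithm name below is just an example picked from the list above):


        RandomGenerator rng = RandomGenerator.getDefault();                 // let the JDK pick
        RandomGenerator byName = RandomGenerator.of("Xoshiro256PlusPlus");  // pick by name
        RandomGenerator seeded = RandomGeneratorFactory
                .of("Xoshiro256PlusPlus").create(42);                       // reproducible sequence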

...if you are using java.util.SplittableRandom

Consider switching to the new SplittableGenerators; see the quote in the benchmark section.

SplittableGenerator and Threads

As soon as multiple threads are involved, you want to make sure that individual threads don't generate the same random numbers in parallel. A quick and dirty way of doing this in the past was to simply share a thread safe java.util.Random. A slightly better approach is ThreadLocalRandom.current() (however, thread locals will face scalability issues once virtual threads arrive). But a much better approach is Java 8's java.util.SplittableRandom (see the Legacy group above).

Java 17 adds several LXM implementations which all implement the SplittableGenerator interface of the new java.util.random package. The general idea is to split a new instance off a local source before a new thread (or task) is spawned, without causing any contention. This ensures that the instances are initialized in a way that they don't end up in the same cycle of pseudo-random numbers.


        ExecutorService vte = Executors.newVirtualThreadExecutor();
        
        SplittableGenerator source = RandomGeneratorFactory.<SplittableGenerator>of("L128X1024MixRandom").create();
        source.splits(100).forEach((rng) -> {
            vte.submit(() -> {
                // this is one of 100 virtual threads with its own independent rng instance
                // the instance uses the same "L128X1024MixRandom" algorithm
                long random = rng.nextLong();
                ...
            });
        });

Each split-off generator is also a SplittableGenerator, so tasks can recursively split more generators for their subtasks on demand (useful for ForkJoinPools).
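
A minimal sketch of that recursive idea using a plain ForkJoinPool (the task layout and the threshold are made up for illustration; imports omitted like in the snippets above):


    static class RandomSum extends RecursiveTask<Long> {
        private final SplittableGenerator rng;
        private final int count;

        RandomSum(SplittableGenerator rng, int count) {
            this.rng = rng;
            this.count = count;
        }

        @Override protected Long compute() {
            if (count <= 1_000) {                       // small enough: compute locally
                long sum = 0;
                for (int i = 0; i < count; i++) sum += rng.nextLong();
                return sum;
            }
            // the forked half gets its own split-off generator, this task keeps the original
            RandomSum forked = new RandomSum(rng.split(), count / 2);
            forked.fork();
            long local = new RandomSum(rng, count - count / 2).compute();
            return local + forked.join();
        }
    }

    public static void main(String[] args) {
        SplittableGenerator root =
                RandomGeneratorFactory.<SplittableGenerator>of("L128X1024MixRandom").create();
        System.out.println(ForkJoinPool.commonPool().invoke(new RandomSum(root, 1_000_000)));
    }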

ScopeLocals from Project Loom will be another way to inject context-dependent variables into tasks (but this is beyond JDK 17).


    private final static ScopeLocal<SplittableGenerator> rng_scope = 
                                            ScopeLocal.inheritableForType(SplittableGenerator.class);
    
    public static void main(String[] args) throws InterruptedException {
                
        SplittableGenerator rng1 =
                RandomGeneratorFactory.<SplittableGenerator>of("L128X1024MixRandom").create();
        SplittableGenerator rng2 =
                RandomGeneratorFactory.<SplittableGenerator>of("L32X64MixRandom").create();
                
        try (ExecutorService vte = Executors.newVirtualThreadExecutor()) {
            for (int i = 0; i < 5; i++) {
                ScopeLocal.where(rng_scope, rng1.split(), () -> { vte.submit(new Task()); });
            }   
            for (int i = 0; i < 5; i++) {
                ScopeLocal.where(rng_scope, rng2.split(), () -> { vte.submit(new Task()); });
            }
        }
    }
    
    private static class Task implements Runnable {
        @Override public void run() {
            SplittableGenerator rng = rng_scope.get();
            System.out.println(rng);
        }
    }

prints 5x L128X1024MixRandom and 5x L32X64MixRandom, with every virtual thread having its own instance.

jdk.random.L128X1024MixRandom@2d7b71b1
jdk.random.L128X1024MixRandom@7ab82aa3
jdk.random.L128X1024MixRandom@704041d3
jdk.random.L32X64MixRandom@3542c1bf
jdk.random.L32X64MixRandom@e941886
jdk.random.L32X64MixRandom@43dd13b
jdk.random.L32X64MixRandom@760156b6
jdk.random.L32X64MixRandom@556d3ef0
jdk.random.L128X1024MixRandom@456e8e4d
jdk.random.L128X1024MixRandom@316b0e77

Sources

A SplittableGenerator can also split a new instance of its implementation from a different source. Interestingly, the source has to be a SplittableGenerator as well.


interface SplittableGenerator extends StreamableGenerator {
...
        SplittableGenerator split();
...
        SplittableGenerator split(SplittableGenerator source);
...
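
A small usage sketch of the two variants (the algorithm names are just picked from the list above):


        SplittableGenerator base =
                RandomGeneratorFactory.<SplittableGenerator>of("L128X1024MixRandom").create();
        SplittableGenerator seeder =
                RandomGeneratorFactory.<SplittableGenerator>of("L64X256MixRandom").create();

        SplittableGenerator a = base.split();        // seed material drawn from base itself
        SplittableGenerator b = base.split(seeder);  // seed material drawn from the other splittable source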

After going through the source code, I couldn't find a reason why the source couldn't be a different generator type, for example a high-entropy SecureRandom instance. So I asked on the core-libs-dev list, and it turns out it is deliberate:

"(...) You are right that the comment in the JEP was a little loose, and that the implementation(s) of the split/splits methods could in principle draw random values from a RandomGenerator that is not itself splittable. There might even be applications for such functionality.

However, we chose not to support that more general functionality for a fairly subtle reason: there are concerns that if a PNRG is less than perfect, using it as a source of entropy for seeding a PRNG that uses a different algorithm might result in unexpected correlations that could drastically reduce the quality of the output of the new PRNG instance. (...) —Guy Steele " full mail

Basically, implementations of SplittableGenerator have a certain baseline quality which makes them viable as a source for splits of their own or other splittable implementations. Other algorithms which don't implement SplittableGenerator might not necessarily have this quality and could cause problems down the line - interesting.

Benchmark

There can't be a blog post about algorithms without the obligatory benchmark.
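
The full benchmark source is linked below the results; a stripped-down JMH sketch of what rngInt/rngLong measure could look roughly like this (class layout and parameter values are my assumptions, not the actual linked source):


    import java.util.concurrent.TimeUnit;
    import java.util.random.RandomGenerator;
    import java.util.random.RandomGeneratorFactory;
    import org.openjdk.jmh.annotations.*;

    @State(Scope.Thread)
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    public class RandomJMH {

        @Param({"L32X64MixRandom", "Xoshiro256PlusPlus", "Random", "SecureRandom"})
        public String name;

        public RandomGenerator rng;

        @Setup
        public void setup() {
            rng = RandomGeneratorFactory.of(name).create();
        }

        @Benchmark
        public int rngInt() { return rng.nextInt(); }

        @Benchmark
        public long rngLong() { return rng.nextLong(); }
    }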


# Run complete. Total time: 00:21:44

Benchmark                         (name)  Mode  Cnt    Score    Error  Units
RandomJMH.rngInt      L128X1024MixRandom  avgt    5    5.037 ±  0.035  ns/op
RandomJMH.rngInt       L128X128MixRandom  avgt    5    3.640 ±  0.035  ns/op
RandomJMH.rngInt       L128X256MixRandom  avgt    5    3.948 ±  0.014  ns/op
RandomJMH.rngInt         L32X64MixRandom  avgt    5    1.983 ±  0.001  ns/op
RandomJMH.rngInt       L64X1024MixRandom  avgt    5    2.545 ±  0.001  ns/op
RandomJMH.rngInt        L64X128MixRandom  avgt    5    2.045 ±  0.006  ns/op
RandomJMH.rngInt   L64X128StarStarRandom  avgt    5    2.055 ±  0.023  ns/op
RandomJMH.rngInt        L64X256MixRandom  avgt    5    2.659 ±  1.715  ns/op
RandomJMH.rngInt                  Random  avgt    5    8.979 ±  0.001  ns/op
RandomJMH.rngInt            SecureRandom  avgt    5  183.858 ±  0.798  ns/op
RandomJMH.rngInt        SplittableRandom  avgt    5    1.291 ±  0.003  ns/op
RandomJMH.rngInt    Xoroshiro128PlusPlus  avgt    5    1.771 ±  0.001  ns/op
RandomJMH.rngInt      Xoshiro256PlusPlus  avgt    5    2.063 ±  0.023  ns/op
RandomJMH.rngLong     L128X1024MixRandom  avgt    5    5.035 ±  0.037  ns/op
RandomJMH.rngLong      L128X128MixRandom  avgt    5    3.647 ±  0.046  ns/op
RandomJMH.rngLong      L128X256MixRandom  avgt    5    3.953 ±  0.042  ns/op
RandomJMH.rngLong        L32X64MixRandom  avgt    5    3.003 ±  0.001  ns/op
RandomJMH.rngLong      L64X1024MixRandom  avgt    5    2.589 ±  0.030  ns/op
RandomJMH.rngLong       L64X128MixRandom  avgt    5    2.046 ±  0.005  ns/op
RandomJMH.rngLong  L64X128StarStarRandom  avgt    5    2.052 ±  0.027  ns/op
RandomJMH.rngLong       L64X256MixRandom  avgt    5    2.455 ±  0.001  ns/op
RandomJMH.rngLong                 Random  avgt    5   17.983 ±  0.190  ns/op
RandomJMH.rngLong           SecureRandom  avgt    5  367.623 ±  2.274  ns/op
RandomJMH.rngLong       SplittableRandom  avgt    5    1.296 ±  0.014  ns/op
RandomJMH.rngLong   Xoroshiro128PlusPlus  avgt    5    1.776 ±  0.023  ns/op
RandomJMH.rngLong     Xoshiro256PlusPlus  avgt    5    2.063 ±  0.001  ns/op

linux 5.10.49; jdk-17+28; CPU i7-6700K, HT off, boost time limit off, boost thread limit off. source.

The bad performance of the old Random class can most likely be attributed to its thread-safety guarantee: it has to work with atomic longs and CAS loops, while the new implementations can just compute and return the next value.
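
For reference, the seed update in java.util.Random is essentially a CAS retry loop over an AtomicLong (sketched and slightly simplified from the OpenJDK sources):


    // java.util.Random (simplified): every call has to win a compareAndSet race on the shared seed
    protected int next(int bits) {
        long oldSeed, nextSeed;
        do {
            oldSeed = seed.get();                                          // seed is an AtomicLong
            nextSeed = (oldSeed * 0x5DEECE66DL + 0xBL) & ((1L << 48) - 1); // 48-bit LCG step
        } while (!seed.compareAndSet(oldSeed, nextSeed));
        return (int) (nextSeed >>> (48 - bits));
    }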

Keep in mind this is just CPU time; it does not take per-instance memory footprint or the mathematical properties of the algorithms into account (see the javadoc for more info). This benchmark also only tests two methods.

The old SplittableRandom for example, although performing very well, has its own problems. Quoting the JEP: "In 2016, testing revealed two new weaknesses in the algorithm used by class SplittableRandom. On the one hand, a relatively minor revision can avoid those weaknesses. On the other hand, a new class of splittable PRNG algorithms (LXM) has also been discovered that are almost as fast, even easier to implement, and appear to completely avoid the three classes of weakness to which SplittableRandom is prone."

Summary

Java 17 adds the java.util.random package with new APIs and PRNG implementations. Switching to it can be worth it, but be careful when migrating old code due to changes in thread safety. If you are using SecureRandom, keep using it; in all other cases, consider getDefault() instead of the legacy Random, or pick a specific implementation which fits your use case best. Take a look at the SplittableGenerators of the LXM group for multi-threaded scenarios. If it doesn't have to be splittable, consider the Xoroshiro and Xoshiro implementations.

That's all for now. Until next time ;)