Michael Bien's Weblog

// Java 17's Enhanced Pseudo-Random Number Generators

Posted on July 16, 2021 by Michael Bien

JEP 356 adds a new set of pseudo-random number generators to Java 17 and a nice new API to list and instantiate them. Lets take a look.

RandomGeneratorFactory

The new main entry point is java.util.random.RandomGeneratorFactory, which can list all available factories (all()), get one by name (of("..")) or return the default factory (getDefault()). Lets see first what JDK 17 ships with.


            RandomGeneratorFactory.all()
                .map(fac -> fac.group()+":"+fac.name()
                                + " {"
                                + (fac.isSplittable()?" splitable":"")
                                + (fac.isStreamable()?" streamable":"")
                                + (fac.isJumpable()?" jumpable":"")
                                + (fac.isArbitrarilyJumpable()?" arbitrary-jumpable":"")
                                + (fac.isLeapable()?" leapable":"")
                                + (fac.isHardware()?" hardware":"")
                                + (fac.isStatistical()?" statistical":"")
                                + (fac.isStochastic()?" stochastic":"")
                                + " stateBits: "+fac.stateBits()
                                + " }"
                    )
                .sorted().forEach(System.out::println);

prints...


LXM:L128X1024MixRandom { splitable streamable statistical stateBits: 1152 }
LXM:L128X128MixRandom { splitable streamable statistical stateBits: 256 }
LXM:L128X256MixRandom { splitable streamable statistical stateBits: 384 }
LXM:L32X64MixRandom { splitable streamable statistical stateBits: 96 }
LXM:L64X1024MixRandom { splitable streamable statistical stateBits: 1088 }
LXM:L64X128MixRandom { splitable streamable statistical stateBits: 192 }
LXM:L64X128StarStarRandom { splitable streamable statistical stateBits: 192 }
LXM:L64X256MixRandom { splitable streamable statistical stateBits: 320 }
Legacy:Random { statistical stateBits: 48 }
Legacy:SecureRandom { stochastic stateBits: 2147483647 }
Legacy:SplittableRandom { splitable streamable statistical stateBits: 64 }
Xoroshiro:Xoroshiro128PlusPlus { streamable jumpable leapable statistical stateBits: 128 }
Xoshiro:Xoshiro256PlusPlus { streamable jumpable leapable statistical stateBits: 256 }

The Legecy group represents the old PRNGs. For example the "Random" factory will produce java.util.Random instances while "SecureRandom" produces java.security.SecureRandom.


        RandomGenerator rng1 = RandomGeneratorFactory.of("Random").create(42);   // new way
        RandomGenerator rng2 = new Random(42);                                   // old way
        RandomGenerator rng3 = RandomGeneratorFactory.getDefault().create(42);   // new default
        RandomGenerator rng4 = RandomGenerator.getDefault();                     // shortcut to new default

        System.out.println(rng1.getClass()); // class java.util.Random
        System.out.println(rng2.getClass()); // class java.util.Random
        System.out.println(rng3.getClass()); // class jdk.random.L32X64MixRandom
        System.out.println(rng4.getClass()); // class jdk.random.L32X64MixRandom

The default implementation is already a new algorithm of the LXM group, which is fine since the API didn't exist before - existing applications won't be affected. From the doc: "Returns a RandomGenerator meeting the minimal requirement of having an algorithm whose state bits are greater than or equal 64."

No Thread Safety

None of the new implementations are thread safe while both java.util.Random and java.security.SecureRandom are.

Although it is not very common to share the same instance between threads (there is even ThreadLocalRandom for this specific purpose if it doesn't have to be cryptographically secure), I would advice against blindly refactoring the code into something like


        RandomGenerator threadSafeQuestionMark = RandomGeneratorFactory.all()
                                    .filter(RandomGeneratorFactory::isStochastic)
                                    .sorted((g1, g2) -> g2.stateBits() - g1.stateBits())
                                    .findFirst().get().create();

This will return a thread safe SecureRandom now, but if there would be a better implementation in future, which isn't thread safe, your application might break if it relied on that fact. There is no isThreadSafe() in the API so there is no good way to filter it. Make sure you don't rely on the special nature of the legacy implementations before using filters in a forward-incompatible way. See SplittableGenerator section for a better solution instead to sharing.

Which random to pick?

...if you are using java.security.SecureRandom

If you look at the capability list above you will notice that one algorithm is not quite like the others. SecureRandom is the only stochastic algorithm and it initialized by some entropy source, usually responsibility of your kernel (/dev/random) during boot, or a lava lamp, etc. So if your application used SecureRandom before, keep using it, there is currently only one cryptographically strong RNG in the JDK.

...if you are using java.util.Random

You have several options to pick from now on (as long you don't share the instance between threads). The javadoc for the java.util.random package has a great description of the new algorithms, mixing functions and also a section which helps with choosing the right one.

Consider getDefault() before picking a factory at random ;)

...if you are using java.util.SplittableRandom

Consider switching to the new SplittableGenerators, see quote in benchmark section.

SplittableGenerator and Threads

As soon multiple threads are involved you want to make sure that individual threads don't generate the same random numbers in parallel. A quick and dirty way of doing this in past, was by simply sharing a thread safe java.util.Random. A slightly better aproach is ThreadLocalRandom.current() (however thread locals will face scalability issues once virtual threads arrive). But a much better approach is Java 8 java.util.SplittableRandom (see Legacy group above).

Java 17 adds several LXM implementations which all implement the SplittableGenerator interface of the new java.util.random package. The general idea is to split a new instance from a local source, before a new thread (or task) is spawned without causing any contention. This ensures that the instances are initialized in a way that they don't end up in the same cycle of pseudo random numbers.


        ExecutorService vte = Executors.newVirtualThreadExecutor();
        
        SplittableGenerator source = RandomGeneratorFactory.<SplittableGenerator>of("L128X1024MixRandom").create();
        source.splits(100).forEach((rng) -> {
            vte.submit(() -> {
                // this is one of 100 virtual threads with its own independent rng instance
                // the instance uses the same "L128X1024MixRandom" algorithm
                long random = rng.nextLong();
                ...
            });
        });

Each splitted generator is also a SplittableGenerator, so tasks can split more generators for their subtasks recursively on demand (useful for ForkJoinPools).

ScopeLocals of project loom will be another way to inject context dependent variables into tasks (but this is beyond JDK 17).


    private final static ScopeLocal<SplittableGenerator> rng_scope = 
                                            ScopeLocal.inheritableForType(SplittableGenerator.class);
    
    public static void main(String[] args) throws InterruptedException {
                
        SplittableGenerator rng1 =
                RandomGeneratorFactory.<SplittableGenerator>of("L128X1024MixRandom").create();
        SplittableGenerator rng2 =
                RandomGeneratorFactory.<SplittableGenerator>of("L32X64MixRandom").create();
                
        try (ExecutorService vte = Executors.newVirtualThreadExecutor()) {
            for (int i = 0; i < 5; i++) {
                ScopeLocal.where(rng_scope, rng1.split(), () -> { vte.submit(new Task()); });
            }   
            for (int i = 0; i < 5; i++) {
                ScopeLocal.where(rng_scope, rng2.split(), () -> { vte.submit(new Task()); });
            }
        }
    }
    
    private static class Task implements Runnable {
        @Override public void run() {
            SplittableGenerator rng = rng_scope.get();
            System.out.println(rng);
        }
    }

prints 5x L128X1024MixRandom and 5x L32X64MixRandom, with every virtual thread having its own instance.


jdk.random.L128X1024MixRandom@2d7b71b1
jdk.random.L128X1024MixRandom@7ab82aa3
jdk.random.L128X1024MixRandom@704041d3
jdk.random.L32X64MixRandom@3542c1bf
jdk.random.L32X64MixRandom@e941886
jdk.random.L32X64MixRandom@43dd13b
jdk.random.L32X64MixRandom@760156b6
jdk.random.L32X64MixRandom@556d3ef0
jdk.random.L128X1024MixRandom@456e8e4d
jdk.random.L128X1024MixRandom@316b0e77

Sources

A SplittableGenerator can also split a new instance of its implementation from a different source. Interestingly, the source has to be a SplittableGenerator as well.


interface SplittableGenerator extends StreamableGenerator {
...
        SplittableGenerator split();
...
        SplittableGenerator split(SplittableGenerator source);
...

After going through the sourcecode I couldn't find a reason why the source couldn't be a different generator type, for example a high entropy SecureRandom instance. So I asked on the core-libs-dev list and it turns out it is deliberate:

"(...) You are right that the comment in the JEP was a little loose, and that the implementation(s) of the split/splits methods could in principle draw random values from a RandomGenerator that is not itself splittable. There might even be applications for such functionality.

However, we chose not to support that more general functionality for a fairly subtle reason: there are concerns that if a PNRG is less than perfect, using it as a source of entropy for seeding a PRNG that uses a different algorithm might result in unexpected correlations that could drastically reduce the quality of the output of the new PRNG instance. (...) —Guy Steele " full mail

Basically implementations of SplittableGenerator have a certain baseline quality which makes them viable as source for splits of their own or other splittable implementations. Other algorithms which don't implement SplittableGenerator might not necessarily have this quality and could cause problems down the line - interesting.

Benchmark

There can't be a blog post about algorithms without the obligatory benchmark.


# Run complete. Total time: 00:21:44

Benchmark                         (name)  Mode  Cnt    Score    Error  Units
RandomJMH.rngInt      L128X1024MixRandom  avgt    5    5.037 ±  0.035  ns/op
RandomJMH.rngInt       L128X128MixRandom  avgt    5    3.640 ±  0.035  ns/op
RandomJMH.rngInt       L128X256MixRandom  avgt    5    3.948 ±  0.014  ns/op
RandomJMH.rngInt         L32X64MixRandom  avgt    5    1.983 ±  0.001  ns/op
RandomJMH.rngInt       L64X1024MixRandom  avgt    5    2.545 ±  0.001  ns/op
RandomJMH.rngInt        L64X128MixRandom  avgt    5    2.045 ±  0.006  ns/op
RandomJMH.rngInt   L64X128StarStarRandom  avgt    5    2.055 ±  0.023  ns/op
RandomJMH.rngInt        L64X256MixRandom  avgt    5    2.659 ±  1.715  ns/op
RandomJMH.rngInt                  Random  avgt    5    8.979 ±  0.001  ns/op
RandomJMH.rngInt            SecureRandom  avgt    5  183.858 ±  0.798  ns/op
RandomJMH.rngInt        SplittableRandom  avgt    5    1.291 ±  0.003  ns/op
RandomJMH.rngInt    Xoroshiro128PlusPlus  avgt    5    1.771 ±  0.001  ns/op
RandomJMH.rngInt      Xoshiro256PlusPlus  avgt    5    2.063 ±  0.023  ns/op
RandomJMH.rngLong     L128X1024MixRandom  avgt    5    5.035 ±  0.037  ns/op
RandomJMH.rngLong      L128X128MixRandom  avgt    5    3.647 ±  0.046  ns/op
RandomJMH.rngLong      L128X256MixRandom  avgt    5    3.953 ±  0.042  ns/op
RandomJMH.rngLong        L32X64MixRandom  avgt    5    3.003 ±  0.001  ns/op
RandomJMH.rngLong      L64X1024MixRandom  avgt    5    2.589 ±  0.030  ns/op
RandomJMH.rngLong       L64X128MixRandom  avgt    5    2.046 ±  0.005  ns/op
RandomJMH.rngLong  L64X128StarStarRandom  avgt    5    2.052 ±  0.027  ns/op
RandomJMH.rngLong       L64X256MixRandom  avgt    5    2.455 ±  0.001  ns/op
RandomJMH.rngLong                 Random  avgt    5   17.983 ±  0.190  ns/op
RandomJMH.rngLong           SecureRandom  avgt    5  367.623 ±  2.274  ns/op
RandomJMH.rngLong       SplittableRandom  avgt    5    1.296 ±  0.014  ns/op
RandomJMH.rngLong   Xoroshiro128PlusPlus  avgt    5    1.776 ±  0.023  ns/op
RandomJMH.rngLong     Xoshiro256PlusPlus  avgt    5    2.063 ±  0.001  ns/op

linux 5.10.49; jdk-17+28; CPU i7-6700K, HT off, boost time limit off, boost thread limit off. source.

The bad performance of the old Random class is most likely attributed to its thread safe promise, it has to work with atomic longs and CAS loops while the new implementations can just compute and return the next value.

Keep in mind this is just CPU time, this does not take per instance memory footprint or the mathematical properties of the algorithms into account (javadoc for more info). This benchmark also only tests two methods.

The old SplittableRandom for example, although performing very well, has its own problems. Quoting the JEP: "In 2016, testing revealed two new weaknesses in the algorithm used by class SplittableRandom. On the one hand, a relatively minor revision can avoid those weaknesses. On the other hand, a new class of splittable PRNG algorithms (LXM) has also been discovered that are almost as fast, even easier to implement, and appear to completely avoid the three classes of weakness to which SplittableRandom is prone."

Summary

Java 17 adds the java.util.random package with new APIs and PRNG implementations. Switching to it can be worth it, but be careful when migrating old code due to changes in thread safety. If you are using SecureRandom, keep using it, in all other cases consider getDefault() instead of legacy Random or pick a specific implementation which fits your use case best. Take a look at SplittableGenerators of the LXM group for multi threaded scenarios. If it doesn't have to be splittable, consider the Xoroshiro and Xoshiro implementations.

Thats all for now. Until next time ;)

Tags: apis concurrency java rng Permalink 3 Comments

// Configuring Eclipse Jetty to use Virtual Threads

Posted on August 05, 2020 by Michael Bien

A quick guide about how to configure Jetty to use Project Loom's virtual threads instead of plain old java threads.

Jetty's default thread pool implementation can be swapped out by implementing Jetty's ThreadPool interface and passing an instance to the Server constructor. If you are using jetty stand alone, everything is initialized by xml files.

Assuming you are using the recommended jetty home / jetty base folder structure, all what is needed is to create jetty-threadpool.xml in [jetty-base]/etc containing the following:


<Configure>
<New id="threadPool" class="dev.mbien.virtualthreads4jetty.VirtualThreadExecutor"/>
</Configure>

and put a jar containing the custom VirtualThreadExecutor into [jetty-base]/lib/ext. I uploaded a build to the release section of the vt4jetty github project.

If you don't have an lib/ext folder yet you can enable it with:


java -jar $JETTY_HOME/start.jar --add-to-start=ext

here the code:


package dev.mbien.virtualthreads4jetty;

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import org.eclipse.jetty.util.thread.ThreadPool;

/**
 * Executes each task in a new virtual thread.
 * 
 * <p>Java's default ForkJoinPool is used as scheduler. To influence carrier
 * thread count use -Djdk.defaultScheduler.parallelism=N. Default is
 * {@link Runtime#availableProcessors()}.
 * 
 * @author mbien
 */
public class VirtualThreadExecutor implements ThreadPool {
    
    private final ExecutorService executor;

    public VirtualThreadExecutor() {
        executor = Executors.newThreadExecutor(
                Thread.builder().virtual().name("jetty-vt#", 0).factory());
        // too early for logging libs
        System.out.println("VirtualThreadExecutor is active.");
    }
    
    @Override
    public void execute(Runnable command) {
        executor.execute(command);
    }

    @Override
    public void join() throws InterruptedException {
        executor.shutdown();
        executor.awaitTermination(3, TimeUnit.SECONDS);
    }

    // those are hopefully only used for stats/dashboards etc
    @Override
    public int getThreads() { return -1; }

    @Override
    public int getIdleThreads() { return -1; }

    @Override
    public boolean isLowOnThreads() { return false; }
    
}

Tested with JDK16-loom+4-56 (2020/7/25) early access build from here and latest Jetty.

I encountered some JVM crashes while load testing Apache Roller with virtual threads enabled - keep in mind this is still all very much work in progress.

Tags: concurrency java jetty jvm Permalink No Comments

// Taking a look at Virtual Threads (Project Loom)

Posted on May 22, 2020 by Michael Bien

Project Loom introduces lightweight, JVM managed, virtual threads (old name: fibers) to java. Lets take a look how the project is progressing and see how they compare to plain old OS managed threads.

Loom is currently a separate project based on JDK 15, but since there is no JEP available yet and the deadline is approaching, it is unlikely that it will land in the main repository as preview feature before JDK 16. Early access binaries are available from jdk.java.net/loom. I used Build 15-loom+7-141 (2020/5/11) for the experiments in this blog entry - be warned the API is not final.

virtual threads

Virtual threads are lightweight threads scheduled by the java runtime to run on plain old java threads ("real" threads). The threads used to run virtual threads are called carrier threads. While plain old java threads (POTs) can be fairly heavyweight due to the fact that they represent OS threads, millions of virtual threads can be spawned without causing problems.

The main feature however is that a virtual thread doesn't block its current carrier thread on blocking operations, like IO (Sockets, NIO channels...) or blocking java.util.concurrent API calls (Semaphores, Queues, Locks, Future.get()...) and even Thread.sleep(). Instead of blocking, the carrier will mount and resume a different virtual thread while the blocked virtual thread is waiting for a resource or an event to occur. Once the virtual thread is no longer blocked, it simply resumes execution on the next available carrier thread.

This should allow more efficient use of the CPU and additionally reduce the total number of POTs, since a thread running on a core which would normally be idle while waiting for a resource, can now work on something else, by replacing a blocked virtual thread with a another which isn't blocked.

some properties:

virtual threads are java entities, independent of OS threads
java.lang.Thread is used for both kinds of threads, virtual and OS
all virtual threads are daemons
spawning and blocking virtual threads is cheap
virtual threads require carrier threads to run on
- a carrier thread runs a virtual thread by mounting it
- if the VT blocks, the stack is stored and the VT is unmounted to be resumed later
j.u.c.Executor like a ForkJoinPool or ThreadPoolExecutor is used to schedule VTs to carriers
- custom schedulers can be provided by implementing the Executor interface
millions of virtual threads can run on few carrier threads
Continuations (basically a VT without a scheduler) won't be in the initial release but might appear later
serialization is planned but currently low priority

edge cases:

ThreadLocals must be used with care but will still work
- Thread.Builder#disallowThreadLocals() can be used to prohibit it entirely
- better solutions like Scopes, Carrier- or ProcessorLocals might be implemented in future
some situations will cause pinning which will block the carrier if the virtual thread blocks while pinned
- native stack frames will cause pinning
- blocking a VT while holding a monitor (i.e. synchronized block) will currently block the carrier
  - this might be only a temporary limitation
  - doesn't apply to alternatives like j.u.c.ReentrantLock which can be used instead
- -Djdk.tracePinnedThreads=short or -Djdk.tracePinnedThreads=full will log pinned threads

for more info: [State of Loom 1, 2] [Loom Proposal] [loom-dev mailing list] [ea javadoc]

a quick test

Since Loom is implemented as a preview feature, the flag --enable-preview has to be passed to both javac and also to the JVM at launch. This will load the preview module and tell the JVM that it is ok to run bytecode which has been compiled with preview features. This should reduce the risk of it accidentally landing on production machines via a maven repository :).


    public static void main(String[] args) {
        Thread.startVirtualThread(() -> {
            System.out.println("Hello Loom from "+Thread.currentThread()+"!");
        });
    }

output:

Hello Loom from VirtualThread[<unnamed>,ForkJoinPool-1-worker-3,CarrierThreads]!

The code above attaches a Runnable via Thread.startVirtualThread(Runnable task) to a new virtual thread and schedules it for execution on the global carrier thread pool. The output shows that the carrier thread pool in use is in fact a j.u.c.ForkJoinPool which has a work-stealing scheduler. The size of the global carrier pool can be set with the flag -Djdk.defaultScheduler.parallelism=N, the default is set to the available core count (or hardware thread count or whatever the container is configured to return).

a better test

The following example can run tasks on either a classic fixed size thread pool (with POTs) or on a unbounded virtual thread pool. The virtual thread pool attaches each task to a new virtual thread and uses the fixed size thread pool as carrier pool. The tasks consist of a simulated IO part and a computational part, the carrier thread count and the number of tasks can be adjusted.

This is no benchmark or load test, but rather an attempt to demonstrate the differences between the two thread types.



    public static void main(String[] args) throws InterruptedException {
       
        final boolean USE_VIRTUAL_THREADS = true;
        final int CARRIER_THREAD_COUNT = 1;
        final int TASK_COUNT = 2;
        
        // plain old thread factory and thread pool using the new builder
        ThreadFactory carrierTF = Thread.builder().name("carrier#", 0)
                                                  .daemon(true).factory();
        ExecutorService carrierPool = Executors.newFixedThreadPool(
                                          CARRIER_THREAD_COUNT, carrierTF);
        
        ExecutorService executor;
        
        if(USE_VIRTUAL_THREADS) {

            // factory for virtual threads scheduled on the carrier pool
            ThreadFactory virtualTF = Thread.builder()
                    .virtual(carrierPool)
                    .name("virtual#", 0).factory();

            // thread executor will spawn a new virtual thread for each task
            executor = Executors.newThreadExecutor(virtualTF);
            
        }else{
            executor = carrierPool;
        }
        
        for (int i = 0; i < TASK_COUNT; i++)
            executor.submit(new WaitAndHurry());
        
        executor.shutdown();
        executor.awaitTermination(20, TimeUnit.SECONDS); // virtual threads are daemons
        
    }

The task itself is less interesting:


    
    private final static class WaitAndHurry implements Runnable {

        private final static long START_TIME = System.nanoTime();
        
        @Override
        public void run() {
            doIO();    // block for 2s
            doWork();  // compute something for ~2s
            print("done");
        }

        private void doIO() {
            print("io");
            try {
                Thread.sleep(2000);
            } catch (InterruptedException ex) {
                throw new RuntimeException(ex);
            }
        }

        private void doWork() {
            print("work");
            long number = 479001599; 
            boolean prime = true;
            for(long i = 2; i <= number/2; ++i) {
                if(number % i == 0)  {
                    prime = false;
                    break;
                }
            }
            if (!prime) {throw new RuntimeException("wrong result");} // to prevent the JIT to optimize everything away
        }

        private void print(String msg) {
            double elapsed = (System.nanoTime()-START_TIME)/1_000_000_000.0d;
            String timestamp = String.format("%.2fs", elapsed);
            System.out.println(timestamp + " " + Thread.currentThread() + " " + msg);
        }
        
    }

output for 1 carrier thread and 2 tasks attached to virtual threads:

0.00s VirtualThread[virtual#0,carrier#0,main] io
0.01s VirtualThread[virtual#1,carrier#0,main] io
2.03s VirtualThread[virtual#0,carrier#0,main] work
3.88s VirtualThread[virtual#0,carrier#0,main] done
3.88s VirtualThread[virtual#1,carrier#0,main] work
5.67s VirtualThread[virtual#1,carrier#0,main] done

Knowing that the IO part of the task takes 2s and the computational part about 1.8s (on my system without warmup) we can put it into a chart by looking at the timestamps:

  VT0: |WAIT||WORK|
  VT1: |WAIT|      |WORK|

If we view the carrier thread as a resource we can draw a less abstract version which is closer to reality:

  CT0: |IDLE||WORK||WORK|
  VT0: |WAIT|     .
  VT1: |WAIT|           .

This shows that virtual threads already have ability to wait in parallel, even when run on just a single carrier thread. The carrier thread is also the only entity which is able to do work since it can only mount one virtual thread at a time.

Rule of thumb: virtual threads are concurrent waiters while real threads are concurrent workers.

Classic thread pool for reference using a single thread:

0.00s Thread[carrier#0,5,main] io
2.02s Thread[carrier#0,5,main] work
3.84s Thread[carrier#0,5,main] done
3.84s Thread[carrier#0,5,main] io
5.84s Thread[carrier#0,5,main] work
7.67s Thread[carrier#0,5,main] done

  CT0: |WAIT||WORK||WAIT||WORK|

Sequential as expected.

lets bump it to 2 carrier threads and 4 tasks:

0.02s VirtualThread[virtual#0,carrier#0,main] io
0.03s VirtualThread[virtual#2,carrier#0,main] io
0.03s VirtualThread[virtual#3,carrier#0,main] io
0.02s VirtualThread[virtual#1,carrier#1,main] io
2.03s VirtualThread[virtual#0,carrier#0,main] work
2.04s VirtualThread[virtual#2,carrier#1,main] work
3.85s VirtualThread[virtual#2,carrier#1,main] done
3.85s VirtualThread[virtual#3,carrier#1,main] work
3.86s VirtualThread[virtual#0,carrier#0,main] done
3.86s VirtualThread[virtual#1,carrier#0,main] work
5.63s VirtualThread[virtual#3,carrier#1,main] done
5.69s VirtualThread[virtual#1,carrier#0,main] done

  VT0: |WAIT||WORK|
  VT1: |WAIT||WORK|
  VT2: |WAIT|      |WORK|
  VT3: |WAIT|      |WORK|
        
  CT0: |IDLE||WORK||WORK|
  CT1: |IDLE||WORK||WORK|
  VT0: |WAIT|     .
  VT1: |WAIT|     .
  VT2: |WAIT|           .
  VT3: |WAIT|           .

Now we gained the ability to work in parallel using two threads while using all virtual threads to wait in parallel - best of both worlds.

Classic thread pool for reference using 2 threads:

0.00s Thread[carrier#1,5,main] io
0.00s Thread[carrier#0,5,main] io
2.03s Thread[carrier#1,5,main] work
2.03s Thread[carrier#0,5,main] work
3.87s Thread[carrier#0,5,main] done
3.87s Thread[carrier#0,5,main] io
3.88s Thread[carrier#1,5,main] done
3.88s Thread[carrier#1,5,main] io
5.87s Thread[carrier#0,5,main] work
5.88s Thread[carrier#1,5,main] work
7.67s Thread[carrier#0,5,main] done
7.70s Thread[carrier#1,5,main] done

  CT0: |WAIT||WORK||WAIT||WORK|
  CT1: |WAIT||WORK||WAIT||WORK|

No surprises.

real threads in a virtual world

Virtual threads implicitly convert blocking APIs into a async/await pattern - and you won't even have to be aware of it as user of an API (most of the time at least). Entire callback based frameworks (buzzword: reactive) could be made obsolete, since their main purpose has always been to avoid that programmers have to deal with any kind of concurrency problems, often even accepting that nothing can run in parallel in them (only parallel waiting is happening behind the scenes, basically like in python, or virtual threads on a single carrier in our example). Even Node.js received basic worker_threads in v12 using a language which is single threaded by design (data is copied to the worker when it starts and copied back again in a callback when the job is done).

Java on the other hand was multi threaded since the beginning (25 years ago; time flies) and is only now getting virtual threads (if you don't count green threads of Java 1.1). Since virtual threads are using the same java.lang.Thread class as the OS threads do, they are pretty much interchangeable with each other and can keep using established APIs. Asynchronous IO APIs are hopefully going to be used less often in future, because code which does async IO now, can be made less error prone and easier to read by using simple blocking IO APIs from within virtual threads.

Plain old java threads will most likely still have a purpose (beside being a carrier) in future however: Not every long running background task which is periodically reading a few bytes from a file will benefit from being virtual and limitations like pinning due to native stack frames in virtual threads which also block the carrier, will probably always require some additional POTs for special cases.

summary

Project Loom made significant progress recently and is already in a fairly usable state. I am looking forward to it being integrated into the main repository (hopefully JDK 16 or 17). Virtual Threads have the potential to be a big game changer for Java: better concurrency while using fewer OS threads without significant code changes - what more to ask.

Debugging a few million virtual threads is going to be interesting, thread dumps of the future will require a bit more tooling, e.g. hierarchical views etc or at least a good scroll wheel on the mouse :)

Tags: concurrency forkjoin java jvm Permalink No Comments