// Taking a look at Virtual Threads (Project Loom)

Project Loom introduces lightweight, JVM managed, virtual threads (old name: fibers) to java. Lets take a look how the project is progressing and see how they compare to plain old OS managed threads.

Loom is currently a separate project based on JDK 15, but since there is no JEP available yet and the deadline is approaching, it is unlikely that it will land in the main repository as preview feature before JDK 16. Early access binaries are available from jdk.java.net/loom. I used Build 15-loom+7-141 (2020/5/11) for the experiments in this blog entry - be warned the API is not final.

virtual threads

Virtual threads are lightweight threads scheduled by the java runtime to run on plain old java threads ("real" threads). The threads used to run virtual threads are called carrier threads. While plain old java threads (POTs) can be fairly heavyweight due to the fact that they represent OS threads, millions of virtual threads can be spawned without causing problems.

The main feature however is that a virtual thread doesn't block its current carrier thread on blocking operations, like IO (Sockets, NIO channels...) or blocking java.util.concurrent API calls (Semaphores, Queues, Locks, Future.get()...) and even Thread.sleep(). Instead of blocking, the carrier will mount and resume a different virtual thread while the blocked virtual thread is waiting for a resource or an event to occur. Once the virtual thread is no longer blocked, it simply resumes execution on the next available carrier thread.

This should allow more efficient use of the CPU and additionally reduce the total number of POTs, since a thread running on a core which would normally be idle while waiting for a resource, can now work on something else, by replacing a blocked virtual thread with a another which isn't blocked.

some properties:

  • virtual threads are java entities, independent of OS threads
  • java.lang.Thread is used for both kinds of threads, virtual and OS
  • all virtual threads are daemons
  • spawning and blocking virtual threads is cheap
  • virtual threads require carrier threads to run on
    • a carrier thread runs a virtual thread by mounting it
    • if the VT blocks, the stack is stored and the VT is unmounted to be resumed later
  • j.u.c.Executor like a ForkJoinPool or ThreadPoolExecutor is used to schedule VTs to carriers
    • custom schedulers can be provided by implementing the Executor interface
  • millions of virtual threads can run on few carrier threads
  • Continuations (basically a VT without a scheduler) won't be in the initial release but might appear later
  • serialization is planned but currently low priority

edge cases:

  • ThreadLocals must be used with care but will still work
    • Thread.Builder#disallowThreadLocals() can be used to prohibit it entirely
    • better solutions like Scopes, Carrier- or ProcessorLocals might be implemented in future
  • some situations will cause pinning which will block the carrier if the virtual thread blocks while pinned
    • native stack frames will cause pinning
    • blocking a VT while holding a monitor (i.e. synchronized block) will currently block the carrier
      • this might be only a temporary limitation
      • doesn't apply to alternatives like j.u.c.ReentrantLock which can be used instead
    • -Djdk.tracePinnedThreads=short or -Djdk.tracePinnedThreads=full will log pinned threads

for more info: [State of Loom 1, 2] [Loom Proposal] [loom-dev mailing list] [ea javadoc]

a quick test

Since Loom is implemented as a preview feature, the flag --enable-preview has to be passed to both javac and also to the JVM at launch. This will load the preview module and tell the JVM that it is ok to run bytecode which has been compiled with preview features. This should reduce the risk of it accidentally landing on production machines via a maven repository :).


    public static void main(String[] args) {
        Thread.startVirtualThread(() -> {
            System.out.println("Hello Loom from "+Thread.currentThread()+"!");
        });
    }
output:
Hello Loom from VirtualThread[<unnamed>,ForkJoinPool-1-worker-3,CarrierThreads]!

The code above attaches a Runnable via Thread.startVirtualThread(Runnable task) to a new virtual thread and schedules it for execution on the global carrier thread pool. The output shows that the carrier thread pool in use is in fact a j.u.c.ForkJoinPool which has a work-stealing scheduler. The size of the global carrier pool can be set with the flag -Djdk.defaultScheduler.parallelism=N, the default is set to the available core count (or hardware thread count or whatever the container is configured to return).

a better test

The following example can run tasks on either a classic fixed size thread pool (with POTs) or on a unbounded virtual thread pool. The virtual thread pool attaches each task to a new virtual thread and uses the fixed size thread pool as carrier pool. The tasks consist of a simulated IO part and a computational part, the carrier thread count and the number of tasks can be adjusted.

This is no benchmark or load test, but rather an attempt to demonstrate the differences between the two thread types.



    public static void main(String[] args) throws InterruptedException {
       
        final boolean USE_VIRTUAL_THREADS = true;
        final int CARRIER_THREAD_COUNT = 1;
        final int TASK_COUNT = 2;
        
        // plain old thread factory and thread pool using the new builder
        ThreadFactory carrierTF = Thread.builder().name("carrier#", 0)
                                                  .daemon(true).factory();
        ExecutorService carrierPool = Executors.newFixedThreadPool(
                                          CARRIER_THREAD_COUNT, carrierTF);
        
        ExecutorService executor;
        
        if(USE_VIRTUAL_THREADS) {

            // factory for virtual threads scheduled on the carrier pool
            ThreadFactory virtualTF = Thread.builder()
                    .virtual(carrierPool)
                    .name("virtual#", 0).factory();

            // thread executor will spawn a new virtual thread for each task
            executor = Executors.newThreadExecutor(virtualTF);
            
        }else{
            executor = carrierPool;
        }
        
        for (int i = 0; i < TASK_COUNT; i++)
            executor.submit(new WaitAndHurry());
        
        executor.shutdown();
        executor.awaitTermination(20, TimeUnit.SECONDS); // virtual threads are daemons
        
    }

The task itself is less interesting:


    
    private final static class WaitAndHurry implements Runnable {

        private final static long START_TIME = System.nanoTime();
        
        @Override
        public void run() {
            doIO();    // block for 2s
            doWork();  // compute something for ~2s
            print("done");
        }

        private void doIO() {
            print("io");
            try {
                Thread.sleep(2000);
            } catch (InterruptedException ex) {
                throw new RuntimeException(ex);
            }
        }

        private void doWork() {
            print("work");
            long number = 479001599; 
            boolean prime = true;
            for(long i = 2; i <= number/2; ++i) {
                if(number % i == 0)  {
                    prime = false;
                    break;
                }
            }
            if (!prime) {throw new RuntimeException("wrong result");} // to prevent the JIT to optimize everything away
        }

        private void print(String msg) {
            double elapsed = (System.nanoTime()-START_TIME)/1_000_000_000.0d;
            String timestamp = String.format("%.2fs", elapsed);
            System.out.println(timestamp + " " + Thread.currentThread() + " " + msg);
        }
        
    }

output for 1 carrier thread and 2 tasks attached to virtual threads:

0.00s VirtualThread[virtual#0,carrier#0,main] io
0.01s VirtualThread[virtual#1,carrier#0,main] io
2.03s VirtualThread[virtual#0,carrier#0,main] work
3.88s VirtualThread[virtual#0,carrier#0,main] done
3.88s VirtualThread[virtual#1,carrier#0,main] work
5.67s VirtualThread[virtual#1,carrier#0,main] done
Knowing that the IO part of the task takes 2s and the computational part about 1.8s (on my system without warmup) we can put it into a chart by looking at the timestamps:
  VT0: |WAIT||WORK|
  VT1: |WAIT|      |WORK|
If we view the carrier thread as a resource we can draw a less abstract version which is closer to reality:
  CT0: |IDLE||WORK||WORK|
  VT0: |WAIT|     .
  VT1: |WAIT|           .

This shows that virtual threads already have ability to wait in parallel, even when run on just a single carrier thread. The carrier thread is also the only entity which is able to do work since it can only mount one virtual thread at a time.

Rule of thumb: virtual threads are concurrent waiters while real threads are concurrent workers.

Classic thread pool for reference using a single thread:
0.00s Thread[carrier#0,5,main] io
2.02s Thread[carrier#0,5,main] work
3.84s Thread[carrier#0,5,main] done
3.84s Thread[carrier#0,5,main] io
5.84s Thread[carrier#0,5,main] work
7.67s Thread[carrier#0,5,main] done

  CT0: |WAIT||WORK||WAIT||WORK|

Sequential as expected.


lets bump it to 2 carrier threads and 4 tasks:

0.02s VirtualThread[virtual#0,carrier#0,main] io
0.03s VirtualThread[virtual#2,carrier#0,main] io
0.03s VirtualThread[virtual#3,carrier#0,main] io
0.02s VirtualThread[virtual#1,carrier#1,main] io
2.03s VirtualThread[virtual#0,carrier#0,main] work
2.04s VirtualThread[virtual#2,carrier#1,main] work
3.85s VirtualThread[virtual#2,carrier#1,main] done
3.85s VirtualThread[virtual#3,carrier#1,main] work
3.86s VirtualThread[virtual#0,carrier#0,main] done
3.86s VirtualThread[virtual#1,carrier#0,main] work
5.63s VirtualThread[virtual#3,carrier#1,main] done
5.69s VirtualThread[virtual#1,carrier#0,main] done

  VT0: |WAIT||WORK|
  VT1: |WAIT||WORK|
  VT2: |WAIT|      |WORK|
  VT3: |WAIT|      |WORK|
        
  CT0: |IDLE||WORK||WORK|
  CT1: |IDLE||WORK||WORK|
  VT0: |WAIT|     .
  VT1: |WAIT|     .
  VT2: |WAIT|           .
  VT3: |WAIT|           .

Now we gained the ability to work in parallel using two threads while using all virtual threads to wait in parallel - best of both worlds.

Classic thread pool for reference using 2 threads:

0.00s Thread[carrier#1,5,main] io
0.00s Thread[carrier#0,5,main] io
2.03s Thread[carrier#1,5,main] work
2.03s Thread[carrier#0,5,main] work
3.87s Thread[carrier#0,5,main] done
3.87s Thread[carrier#0,5,main] io
3.88s Thread[carrier#1,5,main] done
3.88s Thread[carrier#1,5,main] io
5.87s Thread[carrier#0,5,main] work
5.88s Thread[carrier#1,5,main] work
7.67s Thread[carrier#0,5,main] done
7.70s Thread[carrier#1,5,main] done

  CT0: |WAIT||WORK||WAIT||WORK|
  CT1: |WAIT||WORK||WAIT||WORK|

No surprises.


real threads in a virtual world

Virtual threads implicitly convert blocking APIs into a async/await pattern - and you won't even have to be aware of it as user of an API (most of the time at least). Entire callback based frameworks (buzzword: reactive) could be made obsolete, since their main purpose has always been to avoid that programmers have to deal with any kind of concurrency problems, often even accepting that nothing can run in parallel in them (only parallel waiting is happening behind the scenes, basically like in python, or virtual threads on a single carrier in our example). Even Node.js received basic worker_threads in v12 using a language which is single threaded by design (data is copied to the worker when it starts and copied back again in a callback when the job is done).

Java on the other hand was multi threaded since the beginning (25 years ago; time flies) and is only now getting virtual threads (if you don't count green threads of Java 1.1). Since virtual threads are using the same java.lang.Thread class as the OS threads do, they are pretty much interchangeable with each other and can keep using established APIs. Asynchronous IO APIs are hopefully going to be used less often in future, because code which does async IO now, can be made less error prone and easier to read by using simple blocking IO APIs from within virtual threads.

Plain old java threads will most likely still have a purpose (beside being a carrier) in future however: Not every long running background task which is periodically reading a few bytes from a file will benefit from being virtual and limitations like pinning due to native stack frames in virtual threads which also block the carrier, will probably always require some additional POTs for special cases.


summary

Project Loom made significant progress recently and is already in a fairly usable state. I am looking forward to it being integrated into the main repository (hopefully JDK 16 or 17). Virtual Threads have the potential to be a big game changer for Java: better concurrency while using fewer OS threads without significant code changes - what more to ask.

Debugging a few million virtual threads is going to be interesting, thread dumps of the future will require a bit more tooling, e.g. hierarchical views etc or at least a good scroll wheel on the mouse :)




Comments:

Post a Comment:
  • HTML Syntax: NOT allowed