// CRaC - Coordinated Restore at Checkpoint

Last year I experimented a little bit with the instant restoration of started and warmed up Java programs from disk, beside a couple of other potential use cases for checkpoints. To achieve this, I accessed a rootless build of CRIU directly from Java via its C/RPC-API (using Panama as binding layer). Although it worked surprisingly well, it quickly became clear that a proper implementation would require help from the JVM on a lower level and also an API to coordinate checkpoint/restore events between libraries.

I was pleased to see that there is a decent chance this might actually happen, since a new project with the name CRaC is currently in the voting stage to be officially started as OpenJDK sub-project. Lets take a look at the prototype.

update: CRaC has been approved (OpenJDK project, github).

With a little Help from the JVM

Why would checkpoint/restore benefit from JVM and OpenJDK support? Several reasons. CRIU does not like it when files change between C/R, a simple log file might spoil the fun if a JVM is restored, shut down and then restored again (which will fail). A JVM is also in an excellent position to run heap cleanup and compaction prior to calling CRIU to dump the process to disk. Checkpointing could be also done after driving the JVM into a safe point and making sure that everything stopped.

The CRaC prototype covers all of that already and more:

  • CheckpointException is thrown if files or sockets are open at a checkpoint
  • a simple API allows coordination with C/R events
  • Heap is cleaned, compacted and the checkpoint is made when the JVM reached a safe point
  • CRaC handles some JVM produced files automatically (no need to set -XX:-UsePerfData for example)
  • The jcmd tool can be used to checkpoint a JVM from a shell
  • CRIU is bundled in the JDK as a bonus - no need to have it installed

Since CRaC would be potentially part of OpenJDK one day, it could manage the files of JFR repositories automatically, and help with other tasks like the re-seeding SecureRandom instances or updating SSL certificates in future, which would be difficult (or impossible) to achieve as a third party library.

Coordinated Restore at Checkpoint

The API is very simple and somewhat similar to what I wrote for JCRIU, the main difference is that the current implementation does not allow the JVM to continue running after a checkpoint is created (But I don't see why this can't change in future).


Core.checkpointRestore();

serves currently both as checkpoint and program exit. It is also at the same time the entry point for a restore.


Core.getGlobalContext().register(resource);

A global context is used to register resources which will be notified before a checkpoint is created and in reverse order after the process is restored.

Minimal Example

Lets say we have a class CRACTest which can write Strings to a file (like a logger). To coordinate with C/Rs, it would need to close the file before checkpoint and reopen it after restore.


public class CRACTest implements Resource, AutoCloseable {

    private OutputStreamWriter writer;

    public CRACTest() {
        writer = newWriter();
        Core.getGlobalContext().register(this); // register as resource
    }
...
...
    @Override
    public void beforeCheckpoint(Context<? extends Resource> context) throws Exception {
        System.out.println("resource pre-checkpoint");
        writer.close();
        writer = null;
    }

    @Override 
    public void afterRestore(Context<? extends Resource> context) throws Exception {
        System.out.println("resource post-restore");
        writer = newWriter();
    }
    
    public static void main(String[] args) throws IOException {
        System.out.println(Runtime.version());
        
        try (CRACTest writer = new CRACTest()) {
            writer.append("hello");
            try {
                System.out.println("pre-checkpoint PID: "+ProcessHandle.current().pid());
                Core.checkpointRestore();   // exit and restore point
                System.out.println("post-restore PID: "+ProcessHandle.current().pid());
            } catch (CheckpointException | RestoreException ex) {
                throw new RuntimeException("C/R failed", ex);
            }
            writer.append(" there!\n");
        }
    }
}

start + checkpoint + exit:


$CRaC/bin/java -XX:CRaCCheckpointTo=/tmp/cp -cp target/CRACTest-0.1-SNAPSHOT.jar dev.mbien.CRACTest
14-crac+0-adhoc..crac-jdk
pre-checkpoint PID: 12119
resource pre-checkpoint

restore at checkpoint:


$CRaC/bin/java -XX:CRaCRestoreFrom=/tmp/cp -cp target/CRACTest-0.1-SNAPSHOT.jar dev.mbien.CRACTest
resource post-restore
post-restore PID: 12119

lets see what we wrote to the file:


cat /tmp/test/CRACTest/out.txt
hello there!

restore 3 more times as a test:


./restore.sh
resource post-restore
post-restore PID: 12119
./restore.sh
resource post-restore
post-restore PID: 12119
./restore.sh
resource post-restore
post-restore PID: 12119

cat /tmp/test/CRACTest/out.txt
hello there!
 there!
 there!
 there!

works as expected.

What happens when we leave an io stream open? Lets remove writer.close() from beforeCheckpoint() and attempt to run a fresh instance.


./run.sh
14-crac+0-adhoc..crac-jdk
pre-checkpoint PID: 12431
resource pre-checkpoint
resource post-restore
Exception in thread "main" java.lang.RuntimeException: C/R failed
	at dev.mbien.cractest.CRACTest.main(CRACTest.java:72)
Caused by: jdk.crac.CheckpointException
	at java.base/jdk.crac.Core.checkpointRestore1(Core.java:134)
	at java.base/jdk.crac.Core.checkpointRestore(Core.java:177)
	at dev.mbien.cractest.CRACTest.main(CRACTest.java:69)
	Suppressed: jdk.crac.impl.CheckpointOpenFileException: /tmp/test/CRACTest/out.txt
		at java.base/jdk.crac.Core.translateJVMExceptions(Core.java:76)
		at java.base/jdk.crac.Core.checkpointRestore1(Core.java:137)
		... 2 more

The JVM will detect and tell us which files are still open before a checkpoint is attempted. In this case no checkpoint is made and the JVM continues. By adding this restriction, CRaC avoids a big list of potential restore failures.

Tool Integration

Checkpoints can be also triggered externally by using the jcmd tool.


jcmd 15119 JDK.checkpoint
15119:
Command executed successfully

Context and Resources

The Context itself implements Resource. This allows hierarchies of custom contexts to be registered to the global context. Since the context of a resource is passed to the beforeCheckpoint and afterRestore methods, it can be used to carry information to assist in C/R of specific resources.

Performance

As demonstrated with JCRIU, restoring initialized and warmed up Java applications can be really fast - CRaC however can be even faster due to the fact that the process image is much more compact. The average time to restore the JVM running this blog from a checkpoint using JCRIU was ~200 ms, while CRaC can restore JVMs in ~50 ms. Although this will depend on the size of the process image and IO read speed.

Potential use-cases beside instant restore

CRaC seems to be concentrating mainly on the use-case of restoring a started and warmed up JVM as fast as possible. This makes of course sense, since why would someone start a JVM in a container, on-demand, when it could have been already started when the container image was built? The purpose of the container is most likely to run business logic, not to start programs.

However, if CRaC would allow programs to continue running after a checkpoint, it would open up many other possibilities. For example:

  • time traveling debuggers, stepping backwards to past breakpoints (checkpoints)
  • snapshotting of a production JVM to restore and test/inspect it locally, do heap dumps etc
  • maybe some niche use-cases of periodic checkpoints and automatic restoration on failure (incremental dumps)
  • instantly starting IDEs (although this won't be a small task)

in any case... exciting times :)

Thanks to Anton Kozlov from Azul for immediately fixing a bug I encountered during testing.


- - - sidenotes - - -

jdk14-crac/lib/criu and jdk14-crac/lib/action-script might require cap_sys_ptrace to be set on some systems to not fail during restore.

The rootless mode for CRIU hasn't made it yet into the master branch which means that the JVM or criu has to be run with root privileges for now.

C/R of UI doesn't work at all, since disposing a window will still leave some cached resources behind (opened sockets, file descriptors etc) - but this is another aspect which could be only solved on the JDK level (although this won't be trivial).


// Custom Java Runtimes with jlink [and jdeps for classpath applications]

The jlink command line tool can be used to create custom java runtimes, which only include the functionality required by the (modular) java application. However, what if the application isn't modular and still uses the classpath? In this case an extra step is needed to determine which JDK modules are required by the application before jlink can be used.

classic classpaths: finding module dependencies with jdeps

jdeps is excellent for porting classic classpath based applications to java modules. It analyzes jars and list all their dependencies, which can be other jars, or modules, with package granularity. Although we don't want to port the dusty application to the module system for this blog post, listing all the module dependencies is exactly what we need for jlink, to be able to create a custom java runtime.

jdeps produces the following output:


# foo.jar depends on bar.jar
foo.jar -> bar.jar
...
# or foo.jar depends on a module
foo.jar -> module.name
...

Since the tool is intended to assist with porting applications to java modules, the default output will be fairly detailed down to package dependencies. The summary (-s) omits all that and only lists jars or modules.

All we have to do is to go recursively through all jars and remember the module names they depend on.


# -s omit detailed package dependencies
# -R analyze recursively through all found dependencies
# --multi-release 16 for the case that there are multi release jars involved
$JDK/bin/jdeps -s -R --multi-release 16 --class-path 'lib/*' dusty-application.jar
jakarta.persistence-2.2.3.jar -> java.base
jakarta.persistence-2.2.3.jar -> java.instrument
jakarta.persistence-2.2.3.jar -> java.logging
jakarta.persistence-2.2.3.jar -> java.sql
foo.jar -> bar.jar
...

Some greping and deduplication and we have a short list of JDK modules our application depends on.


$JDK/bin/jdeps -s -R --multi-release 16 --class-path 'lib/*' dusty-application.jar\
 | grep -Ev '\.jar$' | cut -d " " -f 3 | sort -u
java.base
java.desktop
java.instrument
java.logging
java.naming
java.net.http
java.security.jgss
java.sql
java.xml
jdk.unsupported

Thats it? Not quite. Analyzing an application like that won't show dependencies which are caused via reflection. So you will have to take a good look at the resulting modules and probably add some manually. A good candidate are jdk.crypto.* modules. jlink can assist with that task too by listing service providers.


$JDK/bin/jlink --suggest-providers java.security.Provider
Suggested providers:
  java.naming provides java.security.Provider used by java.base
  java.security.jgss provides java.security.Provider used by java.base
  jdk.crypto.ec provides java.security.Provider used by java.base
...

You might also want to add modules like jdk.jfr, java.management or jdk.localedata even when the application isn't direclty depending on them. You can experiment with options like --compile-time which will usually list more dependencies (default is runtime analysis). jlink adds transitive dependencies automatically.

Any missing modules should be quickly noticed during integration tests.

custom runtimes with jlink

Once we have the module list we can give it to jlink for the actual heavy lifting.


MODS=...
JDK=/home/mbien/dev/java/jdk-16.0.1+9
DIST=custom-$(basename $JDK)

$JDK/bin/jlink -v --add-modules $MODS\
 --compress=2 --no-header-files --no-man-pages\
 --vendor-version="[mbien.dev pod REv1]"\
 --output $DIST

du -s $DIST

jlink is automatically using the modules of the JDK which contains the tool, which means that the example above will create a runtime based on jdk-16.0.1+9. The flag --module-path would set a path to a alternative module folder. If the application is already modular, the path could also include the application modules, if they should be part of the runtime too.

some noteworthy flags:

  • --strip-debug this is going to strip debug symbols from both the native binaries and bytecode, you probably don't want to use this since this will remove all line numbers from stack traces. Its likely that the binaries of the JDK distribution you are using have most of their symbols already stripped.
  • --strip-native-debug-symbols=objcopy=/usr/bin/objcopy Same as above, but only for native binaries
  • --compress=0|1|2 0 for no compression, 1 for string deduplication, 2 for zip compressed modules. This might influence startup time slightly; see CDS section below
  • --include-locales=langtag[,langtag]* include only a subset of locales instead of the full module
  • --vendor-version="i made this" this looks uninteresting at first glance but it is very useful if you want to recognize your custom runtime again once you have multiple variants in containers. Adding domain name/project name or purpose of the base image helps.
    It will appear on the second line of the output of java -version

full JDK as baseline


MODS=ALL-MODULE-PATH

# --compress=1
138372 (151812 with CDS)

# --compress=2
102988 (116428 with CDS)

# --compress=2 --strip-debug
90848 (102904 with CDS)

custom runtime example


MODS=java.base,java.instrument,java.logging,java.naming,java.net.http,\
java.security.jgss,java.sql,java.xml,jdk.jfr,jdk.unsupported,java.rmi,\
java.management,java.datatransfer,java.transaction.xa,\
jdk.crypto.cryptoki,jdk.crypto.ec

# --compress=1
55996 (69036 with CDS)

# --compress=2
45304 (58344 with CDS)

# --compress=2 --strip-debug
40592 (52240 with CDS)

(this a aarch64 build of OpenJDK, x64 binaries are slightly larger)

Most modules are actually fairly small, the 5 largest modules are java.base, java.desktop, jdk.localedata, jdk.compiler and jdk.internal.vm.compiler. Since java.base is mandatory anyway, adding more modules won't significantly influence the runtime size unless you can't avoid some of the big ones.

Once you are happy with the custom runtime you should add it to your test environment of the project and IDE.

CDS - to share or not to share?

I wrote about class data sharing before so I keep this short. A CDS archive is a file which is mapped into memory by the JVM on startup and is shared between JVM instances. This even works for co-located containers, sharing the same image layer which includes the CDS archive.

Although it adds to the image size, zip compression + CDS seems to be always smaller than uncompressed without CDS. The CDS file should also eliminate the need to decompress modules during startup since it should contain the most important classes already. So the decision seems to be made easy: compact size + improved startup time and potential (small) memory footprint savings as bonus.

Leaving the CDS out frees up ~10 MB of image size. If this matters to your project, benchmark it to see if it makes a difference. It is also possible to put application classes into the shared archive or create a separate archive for the application which extends the runtime archive (dynamic class data sharing). Or go a step further and bundle the application and runtime in a single, AOT compiled, compact, native image with GraalVM (although this might reduce peak throughput due to lack of JIT and have a smaller choice of GCs beside other restrictions) - but this probably won't happen for dusty applications.


# create CDS archive for the custom runtime
$DIST/bin/java -Xshare:dump

# check if it worked, this will fail if it can't map the archive
$DIST/bin/java -Xshare:on -version

# list all modules included in the custom java runtime
$DIST/bin/java --list-modules

summary

Only a single extra step is needed to determine most of the dependencies of an application, even if it hasn't been ported to java modules yet. Maintaining a module list won't be difficult since it should be fairly static (backend services won't suddenly start using swing packages when they are updated). Make sure that the custom runtime is used in your automated tests and IDE.

Stop using java 8, time to move on - even without a modular application :)


- - - sidenotes - - -

If you want to create a runtime which can compile and run single-file-sourcode-programs adding just jdk.compiler isn't enough. This will result in a a little bit misleading "IllegalArgumentException: error: release version 16 not supported" exception. Solution is to add jdk.zipfs too - I haven't investigated it any further.

If jlink has to be run from within a container (can be useful for building for foreign archs, e.g aarch64 on x64), you might have to change the process fork mechanism if you run into trouble (java.io.IOException: Cannot run program "objcopy": error=0, Failed to exec spawn helper: pid: 934, exit value: 1). (export JAVA_TOOL_OPTIONS="-Djdk.lang.Process.launchMechanism=vfork" worked for me)


// Configuring Eclipse Jetty to use Virtual Threads

A quick guide about how to configure Jetty to use Project Loom's virtual threads instead of plain old java threads.

Jetty's default thread pool implementation can be swapped out by implementing Jetty's ThreadPool interface and passing an instance to the Server constructor. If you are using jetty stand alone, everything is initialized by xml files.

Assuming you are using the recommended jetty home / jetty base folder structure, all what is needed is to create jetty-threadpool.xml in [jetty-base]/etc containing the following:


<Configure>
<New id="threadPool" class="dev.mbien.virtualthreads4jetty.VirtualThreadExecutor"/>
</Configure>

and put a jar containing the custom VirtualThreadExecutor into [jetty-base]/lib/ext. I uploaded a build to the release section of the vt4jetty github project.

If you don't have an lib/ext folder yet you can enable it with:


java -jar $JETTY_HOME/start.jar --add-to-start=ext

here the code:


package dev.mbien.virtualthreads4jetty;

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import org.eclipse.jetty.util.thread.ThreadPool;

/**
 * Executes each task in a new virtual thread.
 * 
 * <p>Java's default ForkJoinPool is used as scheduler. To influence carrier
 * thread count use -Djdk.defaultScheduler.parallelism=N. Default is
 * {@link Runtime#availableProcessors()}.
 * 
 * @author mbien
 */
public class VirtualThreadExecutor implements ThreadPool {
    
    private final ExecutorService executor;

    public VirtualThreadExecutor() {
        executor = Executors.newThreadExecutor(
                Thread.builder().virtual().name("jetty-vt#", 0).factory());
        // too early for logging libs
        System.out.println("VirtualThreadExecutor is active.");
    }
    
    @Override
    public void execute(Runnable command) {
        executor.execute(command);
    }

    @Override
    public void join() throws InterruptedException {
        executor.shutdown();
        executor.awaitTermination(3, TimeUnit.SECONDS);
    }

    // those are hopefully only used for stats/dashboards etc
    @Override
    public int getThreads() { return -1; }

    @Override
    public int getIdleThreads() { return -1; }

    @Override
    public boolean isLowOnThreads() { return false; }
    
}

Tested with JDK16-loom+4-56 (2020/7/25) early access build from here and latest Jetty.

I encountered some JVM crashes while load testing Apache Roller with virtual threads enabled - keep in mind this is still all very much work in progress.


// [Java in] Rootless Containers with Podman

I have always been a little surprised how quickly it became acceptable to run applications wrapped in containers as root processes. Nobody would have run a web server as root before docker became mainstream if there was some way to avoid it. But with docker it became OK to have the docker daemon and the container processes all running as root. The first item in most docker tutorials became how to elevate your user rights so that you don't have to type sudo before every docker command.

But this doesn't have to be the case of course. One project I had an eye on was podman which is a container engine implementing the docker command line interface with quite good support for rootless operations. With the release of Podman 2.0.x (and the fact that it is slowly making it into the debian repositories) I started to experiment with it a bit more. (for the experimental rootless mode of Docker check out this page)

cgroups v2

Containers rely heavily on kernel namespaces and a feature called control groups. To properly run rootless containers the kernel must be supporting and running with cgroups v2 enabled. To check if cgroups v2 are enabled simply run:


ls /sys/fs/cgroup
cgroup.controllers  cgroup.max.depth  cgroup.max.descendants  cgroup.procs ...

If the files are prefixed with cgroup. you are running cgroups v2, if not, its still v1.

Many distributions will still run with cgroups v1 enabled by default for backwards compatibility. But you can enable cgroups v2 with the systemd kernel flag: systemd.unified_cgroup_hierarchy=1. To do this with grub for example:

  • edit /etc/default/grub and
  • add systemd.unified_cgroup_hierarchy=1 to the key GRUB_CMDLINE_LINUX_DEFAULT (space separated list)
  • then run sudo grub-mkconfig -o /boot/grub/grub.cfg and reboot.

... and make sure you are not running an ancient linux kernel.

crun

The underlying OCI implementation has to support cgroups v2 too. I tested mostly on crun which is a super fast and lightweight alternative to runc. The runtime can be passed to podman via the --runtime flag


podman --runtime /usr/bin/crun <commands>

but it got picked up automatically in my case after I installed the package (manjaro linux, runc is still installed too).


podman info | grep -A5 ociRuntime
  ociRuntime:
    name: crun
    package: Unknown
    path: /usr/bin/crun
    version: |-
      crun version 0.14.1

subordinate uid and gids

The last step required to set up rootless containers are /etc/subuid and /etc/subgid. If the files don't exist yet, create them and add a mapping range from your user name to container users. For example the line:

duke:100000:65536

Gives duke the right to create 65536 users in container images, starting from UID 100000. Duke himself will be mapped by default to root (0) in the container. (Same must be done for groups in subgid). The range should never overlap with UIDs on the host system. Details in man subuid. More on users later in the volumes section.

rootless containers

Some things to keep in mind:
  • rootless podman runs containers with less privileges than the user which started the container
    • some of these restrictions can be lifted (via --privileged, for example)
    • but rootless containers will never have more privileges than the user that launched them
    • root in the container is the user on the host
  • rootless containers have no IP or MAC address, because nw device association requires root privileges
    • podman uses slirp4netns for user mode networking
    • pinging something from within a container won't work out of the box - but don't panic: it can be configured if desired

podman

Podman uses the same command-line interface as Docker and it also understands Dockerfiles. So if everything is configured correctly it should all look familiar:

$ podman version
Version:      2.0.2
API Version:  1
Go Version:   go1.14.4
Git Commit:   201c9505b88f451ca877d29a73ed0f1836bb96c7
Built:        Sun Jul 12 22:46:58 2020
OS/Arch:      linux/amd64

$ podman pull debian:stable-slim
...
$ podman images
REPOSITORY                       TAG          IMAGE ID      CREATED       SIZE
docker.io/library/debian         stable-slim  56fae066253c  4 days ago    72.5 MB
...
$ podman run --rm debian:stable-slim cat /etc/debian_version
10.4

Setting alias docker=podman allows existing scripts to be reused without modification - but I stick here to podman to not cause confusion.

container communication

Rootless containers don't have their own IP addresses but you can bind them to ports (>1024). Host2container communication works therefore analog to communicating with any service you would have running on the host.


$ podman run --name wiki --rm -d -p 8443:8443 jspwiki-jetty
$ podman port -a
fd4c06b454ee	8443/tcp -> 0.0.0.0:8443
$ firefox https://localhost:8443/wiki

To setup quick and dirty container2container communication you can let them communicate over the IP address of the host (or host name) and ports, if the firewall is OK with that. But a better maintainable approach are pods. Pods are groups of containers which belong together. It is basically a infrastructure container, containing the actual containers. All containers in a pod share the same localhost and use it for pod-local communication. The outside world is reached via opened ports on the pod.

Lets say we have a blog and a db. The blog requires the db but all the host cares about is the https port of the blog container. So we can simply put blog container and db container into a blog-pod and let both communicate via pod-local localhost (podhost?). The https port is opened on the blog-pod for the host while the db isn't reachable from the outside.


$ podman pod create --name blogpod -p 8443:8443

# note: a pod starts out with one container already in it,
# it is the infrastructure container - basically the pod itself
$ podman pod list
POD ID        NAME     STATUS   CREATED        # OF CONTAINERS  INFRA ID
39ad88b8892f  blogpod  Created  7 seconds ago  1                af7baf0e7fde

$ podman run --pod blogpod --name blogdb --rm -d blog-db
$ podman run --pod blogpod --name apacheroller --rm -d roller-jetty

$ podman pod list
POD ID        NAME     STATUS   CREATED         # OF CONTAINERS  INFRA ID
39ad88b8892f  blogpod  Created  30 seconds ago  3                af7baf0e7fde

$ firefox https://localhost:8443/blog

Now we already have two containers able to communicate with each other and a host which is able to communicate with a container in the pod - and no sudo in sight.

volumes and users

We already know that the user on the outside is root on the inside, but lets quickly check it just to be sure:


$ whoami
duke
$ id -u
1000
$ mkdir /tmp/outside
$ podman run --rm -it -v /tmp/outside:/home/inside debian:stable-slim bash
root@2fbc9edaa0ee:/$ id -u
0
root@2fbc9edaa0ee:/$ touch /home/inside/hello_from_inside && exit
$ ls -l /tmp/outside
-rw-r--r-- 1 duke duke 0 31. Jul 06:55 hello_from_inside

Indeed, duke's UID of 1000 was mapped to 0 on the inside.

Since we are using rootless containers and not half-rootless containers we can let the blog and the db within their containers run in their own user namespaces too, what if they write logs to mounted volumes? That is when the subuid and subgid files come into play we configured earlier.

Lets say the blog-db container process should run in the namespace of the user dbduke. Since dbduke doesn't have root rights on the inside (as intended), dbduke won't also have rights to write to the mounted volume which is owned by duke. The easiest way to solve this problem is to simply chown the volume folder on the host to the mapped user of the container.


# scripts starts blog-db
# query the UID from the container and chown the volumes folder
UID_INSIDE=$(podman run --name UID_probe --rm blog-db /usr/bin/id -u)
podman unshare chown -R $UID_INSIDE /path/to/volume

podman run -v /path/to/volume:/path/inside ...  blog-db

Podman ships with a tool called unshare (the name is going to make less sense the longer you think about it) which lets you execute commands in the namespace of a different user. The command podman unshare allows to use the rights of duke to chown a folder to the internal UID of dbduke.

If we would check the folder rights from both sides, we would see that the UID was mapped from:


podman run --name UID_probe --rm blog-db /usr/bin/id -u
998
to

$ ls -l volume/
drwxr-xr-x 2 100997 100997 4096 31. Jul 07:54 logs
on the outside which is within the range specified in the /etc/subuid file - so everything works as intended. This allows user namespace isolation between containers (dbduke, wikiduke etc) and also between containers and the user on the host who launched the containers (duke himself).

And still no sudo in sight.

memory and cpu limits [and java]

Memory limits should work out of the box in rootless containers

$ podman run -it --rm -m 256m blog-db java -Xlog:os+container -version
[0.003s][info][os,container] Memory Limit is: 268435456
...

This allows the JVM to make smarter choices without having to provide absolute -Xmx flags (but you still can).

Setting CPU limits might not work out of the box without root (tested on Manjaro which is basically Arch), since the cgroups config might have user delegation turned off by default. But it is very easy to change:


# assuming your user id is 1000 like duke
$ sudo systemctl edit user@1000.service

# now modify the file so that it contains
[Service]
Delegate=yes

# and check if it worked
$ cat /sys/fs/cgroup/user.slice/user-1000.slice/cgroup.controllers
cpuset cpu io memory pids

You might have to reboot - it worked right away in my case.


# default CPU settings uses all cores
$ podman run -it --rm blog-db sh -c\
 "echo 'Runtime.getRuntime().availableProcessors();/exit' | jshell -q"
jshell> Runtime.getRuntime().availableProcessors()$1 ==> 4

# assign specific cores to container
$ podman run -it --rm --cpuset-cpus 1,2 blog-db sh -c\
 "echo 'Runtime.getRuntime().availableProcessors();/exit' | jshell -q"
jshell> Runtime.getRuntime().availableProcessors()$1 ==> 2

Container CPU core limits should become less relevant in the java world going forward, especially once projects like Loom [blog post] have been integrated. Since most things in java will run on virtual threads on top of a static carrier thread pool, it will be really easy to restrict the parallelism level of a JVM (basically -Djdk.defaultScheduler.parallelism=N and maybe another flag to limit max GC thread count).

But it works if you need it for rootless containers too.

class data sharing

Podman uses fuse-overlayfs for image management by default, which is overlayfs running in user mode.

$ podman info | grep -A5 overlay.mount_program
    overlay.mount_program:
      Executable: /usr/bin/fuse-overlayfs
      Package: Unknown
      Version: |-
        fusermount3 version: 3.9.2
        fuse-overlayfs: version 1.1.0

This means that JVM class data sharing is also supported out of the box if the image containing the class data archive is shared in the image graph between multiple rootless containers.

The class data stored in debian-slim-jdk (a local image I created) will be mapped only once into memory and shared between all child containers. Which are in the example below blog-db, roller-jetty and wiki-jetty.


$ podman image tree --whatrequires debian-slim-jdk
Image ID: 57c885825969
Tags:     [localhost/debian-slim-jdk:latest]
Size:     340.1MB
Image Layers
└── ID: ab5467b188d7 Size: 267.6MB Top Layer of: [localhost/debian-slim-jdk:latest]
 ├── ID: fa04d6485aa5 Size:  7.68kB
 │   └── ID: 649de1f63ecc Size: 11.53MB Top Layer of: [localhost/debian-slim-jdk-jetty:latest]
 │    ├── ID: 34ce3917399d Size: 8.192kB
 │    │   └── ID: d128b826459d Size:  56.7MB Top Layer of: [localhost/roller-jetty:latest]
 │    ├── ID: 9a9c51927e42 Size: 8.192kB
 │    └── ID: d5c7176c1073 Size: 27.56MB Top Layer of: [localhost/wiki-jetty:latest]
 ├── ID: 06acb45dd590 Size:  7.68kB
 └── ID: 529ff7e97882 Size: 1.789MB Top Layer of: [localhost/blog-db:latest]

stopping java containers cleanly

A little tip in the end: In my experience JVMs are quite often launched from scripts in containers. This has automatically the side effect that the JVM process won't be PID 1, since it isn't the entry point. Commands like podman stop <container> will post SIGKILL to the shell script (!) wait 10s then simply kill the container process without the JVM ever knowing what was going on. More on that in the stopping containers blog entry.

Shutdown actions like dumping the JFR file won't be executed and IO writes might not have completed. So unless you trap the signal in the script and send it to the JVM somehow, there are more direct ways to stop a java container:


# script to cleanly shut down a container
$ podman exec -it $(container) sh -c\
 "kill \$(/jdk/bin/jps | grep -v Jps | cut -f1 -d' ')"

#... or if killall is installed (it usually isn't)
$ podman exec -it $(container) killall java

Once the JVM has cleanly shut down the launch script will finish which will be also noticed by the container once PID 1 is gone and it will cleanly shutdown too.

- - -

fly safe podman


// Taking a look at Virtual Threads (Project Loom)

Project Loom introduces lightweight, JVM managed, virtual threads (old name: fibers) to java. Lets take a look how the project is progressing and see how they compare to plain old OS managed threads.

Loom is currently a separate project based on JDK 15, but since there is no JEP available yet and the deadline is approaching, it is unlikely that it will land in the main repository as preview feature before JDK 16. Early access binaries are available from jdk.java.net/loom. I used Build 15-loom+7-141 (2020/5/11) for the experiments in this blog entry - be warned the API is not final.

virtual threads

Virtual threads are lightweight threads scheduled by the java runtime to run on plain old java threads ("real" threads). The threads used to run virtual threads are called carrier threads. While plain old java threads (POTs) can be fairly heavyweight due to the fact that they represent OS threads, millions of virtual threads can be spawned without causing problems.

The main feature however is that a virtual thread doesn't block its current carrier thread on blocking operations, like IO (Sockets, NIO channels...) or blocking java.util.concurrent API calls (Semaphores, Queues, Locks, Future.get()...) and even Thread.sleep(). Instead of blocking, the carrier will mount and resume a different virtual thread while the blocked virtual thread is waiting for a resource or an event to occur. Once the virtual thread is no longer blocked, it simply resumes execution on the next available carrier thread.

This should allow more efficient use of the CPU and additionally reduce the total number of POTs, since a thread running on a core which would normally be idle while waiting for a resource, can now work on something else, by replacing a blocked virtual thread with a another which isn't blocked.

some properties:

  • virtual threads are java entities, independent of OS threads
  • java.lang.Thread is used for both kinds of threads, virtual and OS
  • all virtual threads are daemons
  • spawning and blocking virtual threads is cheap
  • virtual threads require carrier threads to run on
    • a carrier thread runs a virtual thread by mounting it
    • if the VT blocks, the stack is stored and the VT is unmounted to be resumed later
  • j.u.c.Executor like a ForkJoinPool or ThreadPoolExecutor is used to schedule VTs to carriers
    • custom schedulers can be provided by implementing the Executor interface
  • millions of virtual threads can run on few carrier threads
  • Continuations (basically a VT without a scheduler) won't be in the initial release but might appear later
  • serialization is planned but currently low priority

edge cases:

  • ThreadLocals must be used with care but will still work
    • Thread.Builder#disallowThreadLocals() can be used to prohibit it entirely
    • better solutions like Scopes, Carrier- or ProcessorLocals might be implemented in future
  • some situations will cause pinning which will block the carrier if the virtual thread blocks while pinned
    • native stack frames will cause pinning
    • blocking a VT while holding a monitor (i.e. synchronized block) will currently block the carrier
      • this might be only a temporary limitation
      • doesn't apply to alternatives like j.u.c.ReentrantLock which can be used instead
    • -Djdk.tracePinnedThreads=short or -Djdk.tracePinnedThreads=full will log pinned threads

for more info: [State of Loom 1, 2] [Loom Proposal] [loom-dev mailing list] [ea javadoc]

a quick test

Since Loom is implemented as a preview feature, the flag --enable-preview has to be passed to both javac and also to the JVM at launch. This will load the preview module and tell the JVM that it is ok to run bytecode which has been compiled with preview features. This should reduce the risk of it accidentally landing on production machines via a maven repository :).


    public static void main(String[] args) {
        Thread.startVirtualThread(() -> {
            System.out.println("Hello Loom from "+Thread.currentThread()+"!");
        });
    }
output:
Hello Loom from VirtualThread[<unnamed>,ForkJoinPool-1-worker-3,CarrierThreads]!

The code above attaches a Runnable via Thread.startVirtualThread(Runnable task) to a new virtual thread and schedules it for execution on the global carrier thread pool. The output shows that the carrier thread pool in use is in fact a j.u.c.ForkJoinPool which has a work-stealing scheduler. The size of the global carrier pool can be set with the flag -Djdk.defaultScheduler.parallelism=N, the default is set to the available core count (or hardware thread count or whatever the container is configured to return).

a better test

The following example can run tasks on either a classic fixed size thread pool (with POTs) or on a unbounded virtual thread pool. The virtual thread pool attaches each task to a new virtual thread and uses the fixed size thread pool as carrier pool. The tasks consist of a simulated IO part and a computational part, the carrier thread count and the number of tasks can be adjusted.

This is no benchmark or load test, but rather an attempt to demonstrate the differences between the two thread types.



    public static void main(String[] args) throws InterruptedException {
       
        final boolean USE_VIRTUAL_THREADS = true;
        final int CARRIER_THREAD_COUNT = 1;
        final int TASK_COUNT = 2;
        
        // plain old thread factory and thread pool using the new builder
        ThreadFactory carrierTF = Thread.builder().name("carrier#", 0)
                                                  .daemon(true).factory();
        ExecutorService carrierPool = Executors.newFixedThreadPool(
                                          CARRIER_THREAD_COUNT, carrierTF);
        
        ExecutorService executor;
        
        if(USE_VIRTUAL_THREADS) {

            // factory for virtual threads scheduled on the carrier pool
            ThreadFactory virtualTF = Thread.builder()
                    .virtual(carrierPool)
                    .name("virtual#", 0).factory();

            // thread executor will spawn a new virtual thread for each task
            executor = Executors.newThreadExecutor(virtualTF);
            
        }else{
            executor = carrierPool;
        }
        
        for (int i = 0; i < TASK_COUNT; i++)
            executor.submit(new WaitAndHurry());
        
        executor.shutdown();
        executor.awaitTermination(20, TimeUnit.SECONDS); // virtual threads are daemons
        
    }

The task itself is less interesting:


    
    private final static class WaitAndHurry implements Runnable {

        private final static long START_TIME = System.nanoTime();
        
        @Override
        public void run() {
            doIO();    // block for 2s
            doWork();  // compute something for ~2s
            print("done");
        }

        private void doIO() {
            print("io");
            try {
                Thread.sleep(2000);
            } catch (InterruptedException ex) {
                throw new RuntimeException(ex);
            }
        }

        private void doWork() {
            print("work");
            long number = 479001599; 
            boolean prime = true;
            for(long i = 2; i <= number/2; ++i) {
                if(number % i == 0)  {
                    prime = false;
                    break;
                }
            }
            if (!prime) {throw new RuntimeException("wrong result");} // to prevent the JIT to optimize everything away
        }

        private void print(String msg) {
            double elapsed = (System.nanoTime()-START_TIME)/1_000_000_000.0d;
            String timestamp = String.format("%.2fs", elapsed);
            System.out.println(timestamp + " " + Thread.currentThread() + " " + msg);
        }
        
    }

output for 1 carrier thread and 2 tasks attached to virtual threads:

0.00s VirtualThread[virtual#0,carrier#0,main] io
0.01s VirtualThread[virtual#1,carrier#0,main] io
2.03s VirtualThread[virtual#0,carrier#0,main] work
3.88s VirtualThread[virtual#0,carrier#0,main] done
3.88s VirtualThread[virtual#1,carrier#0,main] work
5.67s VirtualThread[virtual#1,carrier#0,main] done
Knowing that the IO part of the task takes 2s and the computational part about 1.8s (on my system without warmup) we can put it into a chart by looking at the timestamps:
  VT0: |WAIT||WORK|
  VT1: |WAIT|      |WORK|
If we view the carrier thread as a resource we can draw a less abstract version which is closer to reality:
  CT0: |IDLE||WORK||WORK|
  VT0: |WAIT|     .
  VT1: |WAIT|           .

This shows that virtual threads already have ability to wait in parallel, even when run on just a single carrier thread. The carrier thread is also the only entity which is able to do work since it can only mount one virtual thread at a time.

Rule of thumb: virtual threads are concurrent waiters while real threads are concurrent workers.

Classic thread pool for reference using a single thread:
0.00s Thread[carrier#0,5,main] io
2.02s Thread[carrier#0,5,main] work
3.84s Thread[carrier#0,5,main] done
3.84s Thread[carrier#0,5,main] io
5.84s Thread[carrier#0,5,main] work
7.67s Thread[carrier#0,5,main] done

  CT0: |WAIT||WORK||WAIT||WORK|

Sequential as expected.


lets bump it to 2 carrier threads and 4 tasks:

0.02s VirtualThread[virtual#0,carrier#0,main] io
0.03s VirtualThread[virtual#2,carrier#0,main] io
0.03s VirtualThread[virtual#3,carrier#0,main] io
0.02s VirtualThread[virtual#1,carrier#1,main] io
2.03s VirtualThread[virtual#0,carrier#0,main] work
2.04s VirtualThread[virtual#2,carrier#1,main] work
3.85s VirtualThread[virtual#2,carrier#1,main] done
3.85s VirtualThread[virtual#3,carrier#1,main] work
3.86s VirtualThread[virtual#0,carrier#0,main] done
3.86s VirtualThread[virtual#1,carrier#0,main] work
5.63s VirtualThread[virtual#3,carrier#1,main] done
5.69s VirtualThread[virtual#1,carrier#0,main] done

  VT0: |WAIT||WORK|
  VT1: |WAIT||WORK|
  VT2: |WAIT|      |WORK|
  VT3: |WAIT|      |WORK|
        
  CT0: |IDLE||WORK||WORK|
  CT1: |IDLE||WORK||WORK|
  VT0: |WAIT|     .
  VT1: |WAIT|     .
  VT2: |WAIT|           .
  VT3: |WAIT|           .

Now we gained the ability to work in parallel using two threads while using all virtual threads to wait in parallel - best of both worlds.

Classic thread pool for reference using 2 threads:

0.00s Thread[carrier#1,5,main] io
0.00s Thread[carrier#0,5,main] io
2.03s Thread[carrier#1,5,main] work
2.03s Thread[carrier#0,5,main] work
3.87s Thread[carrier#0,5,main] done
3.87s Thread[carrier#0,5,main] io
3.88s Thread[carrier#1,5,main] done
3.88s Thread[carrier#1,5,main] io
5.87s Thread[carrier#0,5,main] work
5.88s Thread[carrier#1,5,main] work
7.67s Thread[carrier#0,5,main] done
7.70s Thread[carrier#1,5,main] done

  CT0: |WAIT||WORK||WAIT||WORK|
  CT1: |WAIT||WORK||WAIT||WORK|

No surprises.


real threads in a virtual world

Virtual threads implicitly convert blocking APIs into a async/await pattern - and you won't even have to be aware of it as user of an API (most of the time at least). Entire callback based frameworks (buzzword: reactive) could be made obsolete, since their main purpose has always been to avoid that programmers have to deal with any kind of concurrency problems, often even accepting that nothing can run in parallel in them (only parallel waiting is happening behind the scenes, basically like in python, or virtual threads on a single carrier in our example). Even Node.js received basic worker_threads in v12 using a language which is single threaded by design (data is copied to the worker when it starts and copied back again in a callback when the job is done).

Java on the other hand was multi threaded since the beginning (25 years ago; time flies) and is only now getting virtual threads (if you don't count green threads of Java 1.1). Since virtual threads are using the same java.lang.Thread class as the OS threads do, they are pretty much interchangeable with each other and can keep using established APIs. Asynchronous IO APIs are hopefully going to be used less often in future, because code which does async IO now, can be made less error prone and easier to read by using simple blocking IO APIs from within virtual threads.

Plain old java threads will most likely still have a purpose (beside being a carrier) in future however: Not every long running background task which is periodically reading a few bytes from a file will benefit from being virtual and limitations like pinning due to native stack frames in virtual threads which also block the carrier, will probably always require some additional POTs for special cases.


summary

Project Loom made significant progress recently and is already in a fairly usable state. I am looking forward to it being integrated into the main repository (hopefully JDK 16 or 17). Virtual Threads have the potential to be a big game changer for Java: better concurrency while using fewer OS threads without significant code changes - what more to ask.

Debugging a few million virtual threads is going to be interesting, thread dumps of the future will require a bit more tooling, e.g. hierarchical views etc or at least a good scroll wheel on the mouse :)


// JFR Event Streaming with Java 14

The Java Flight Recorder has a long history. It used to be part of the BEA JRockit JVM, became a commercial feature of the Oracle JDK (7+) after Oracle acquired BEA and Sun and was finally fully open sourced with the release of OpenJDK 11 (backports exist for 8) (JEP328). OpenJDK 14 will add some improvements to the JFR.

JEP 349 will allow the continuous consumption of java flight recorder events in-memory from within the same JVM, or out-of-process from a different JVM via its JFR repository file.

JEP 349 made it already into the early access builds and can be experimented with since OpenJDK 14 build 22+. Lets check it out.

In-Process Streaming

The base JFR configuration files (XML) can be found in JDK_HOME/lib/jfr. The default configuration (default.jfc) is relatively low overhead, while profile.jfc will provide more data. Java Mission Control can create custom settings based on templates if needed. I am using the default config for the examples.

The first example will start JFR on the local JVM using the default recorder settings and register a few event handlers to check if it is working.


import java.io.IOException;
import java.text.ParseException;
import jdk.jfr.Configuration;
import jdk.jfr.consumer.EventStream;
import jdk.jfr.consumer.RecordingStream;

public class JFRStreamTest {
    
    public static void main(String[] args) throws IOException, ParseException  {
        
        Configuration config = Configuration.getConfiguration("default");
        System.out.println(config.getDescription());
        System.out.println("settings:");
        config.getSettings().forEach((key, value) -> System.out.println(key+": "+value));
        
        // open a stream and start local recording
        try (EventStream es = new RecordingStream(config)) {
            
            // register event handlers
            es.onEvent("jdk.GarbageCollection", System.out::println);
            es.onEvent("jdk.CPULoad", System.out::println);
            es.onEvent("jdk.JVMInformation", System.out::println);
            
            // start and block
            es.start();
        }
    
    }
    
}

The above example should print information about the running JVM once, CPU load periodically and GC events when they occur.

Out-of-Process Streaming

Simply start the flight recorder as usual via jcmd <PID> JFR.start or via the JVM flag -XX:+FlightRecorder at startup. The repository location will be stored in the jdk.jfr.repository system property as soon JFR is running (new in Java 14). It can also be set at startup via a comma separated list of flight recorder options: -XX:FlightRecorderOptions:repository=./blackbox

Update thanks to Erik from Oracle in the comments section: The repository location can be also set using jcmd <PID> JFR.configure repositorypath=<directory>. If you set it after a recording has started, new data will be written in the new location.


$ jcmd -l | grep netbeans
8492 org.netbeans.Main ... --branding nb

$ jcmd 8492 JFR.start name=streamtest
Started recording 1. No limit specified, using maxsize=250MB as default.
Use jcmd 8492 JFR.dump name=streamtest filename=FILEPATH to copy recording data to file.

$ jinfo -sysprops 8492 | grep jfr
jdk.jfr.repository=/tmp/2019_11_18_02_19_59_8492

Now that the recording is running and we know where the repository is, a second JVM can open a stream to the live JFR repository and monitor the application. Note that we did not dump any JFR records to a file, we connect to the live repository directly.


import java.io.IOException;
import java.nio.file.Path;
import jdk.jfr.consumer.EventStream;

public class JFRStreamTest {
    
    public static void main(String[] args) throws IOException  {
        // connect to JFR repository
        try (EventStream es = EventStream.openRepository(Path.of("/tmp/2019_11_18_02_19_59_8492"))) {
            
            // register some event handlers
            //es.onEvent("jdk.CPULoad", System.out::println);
            es.onEvent("jdk.SocketRead", System.out::println);
            es.onEvent("jdk.SocketWrite", System.out::println);
            
            // start and block
            es.start();
        }
    
    }
    
}

As a quick test i monitored with the example above a NetBeans instance running on Java 14 and let the IDE check for updates. Since we watch for SocketRead and Write events the output looked like:

jdk.SocketRead {
  startTime = 04:34:09.571
  duration = 117.764 ms
  host = "netbeans.apache.org"
  address = "40.79.78.1"
  port = 443
  timeout = 30.000 s
  bytesRead = 5 bytes
  endOfStream = false
  eventThread = "pool-5-thread-1" (javaThreadId = 163)
  stackTrace = [
    java.net.Socket$SocketInputStream.read(byte[], int, int) line: 68
    sun.security.ssl.SSLSocketInputRecord.read(InputStream, byte[], int, int) line: 457
    sun.security.ssl.SSLSocketInputRecord.decode(ByteBuffer[], int, int) line: 165
    sun.security.ssl.SSLTransport.decode(TransportContext, ByteBuffer[], int, int, ...
    sun.security.ssl.SSLSocketImpl.decode(ByteBuffer) line: 1460
    ...
  ]
}
...

Streaming Dumped Records

Opening streams (EventStream.openFile(path)) to JFR record dumps (jcmd <PID> JFR.dump filename=foo.jfr) is also possible of course - works as expected.

Conclusion

Pretty cool new feature! It is currently not possible to do in-memory but out-of-process steaming without syncing on repository files. But since a ramdisk can workaround this issue so easily I am not even sure if this capability would be worth it.

fly safe


// [Dynamic] [Application] Class Data Sharing in Java

Class Data Sharing as a JVM feature exists for quite some time already, but it became more popular in context of containers. CDS maps a pre-defined class archive into memory and makes it shareable between JVM processes. This can improve startup time but also reduce per-JVM footprint in some cases.

Although basic CDS was already supported in Java 5, this blog entry assumes that Java 11 or later is used, since JVM flags and overall capabilities changed over time. CDS might not be supported on all architectures or GCs: G1, CMS, ShenandoahGC, Serial GC, Parallel GC, and ParallelOld GC do all support CDS on 64bit linux, while ZGC for example doesn't have support for it yet.


JDK Class Data Sharing

The most basic form of CDS is setup to share only JDK class files. This is enabled by default since JDK 12 (JEP 341). If you check the java version using a shell you notice that the JVM version string will contain "sharing".

[mbien@hulk server]$ java -version
openjdk version "12.0.1" 2019-04-16
OpenJDK Runtime Environment AdoptOpenJDK (build 12.0.1+12)
OpenJDK 64-Bit Server VM AdoptOpenJDK (build 12.0.1+12, mixed mode, sharing)
The shared class archive can be found pre installed in ${JAVA_HOME}/lib/server/ called classes.jsa. By temporary renaming the file or switching to an older JDK, "sharing" won't appear anymore in the version string.

[mbien@hulk server]$ java -version
openjdk version "12.0.1" 2019-04-16
OpenJDK Runtime Environment AdoptOpenJDK (build 12.0.1+12)
OpenJDK 64-Bit Server VM AdoptOpenJDK (build 12.0.1+12, mixed mode)

The JDK CDS archive can be manually generated by invoking java -Xshare:dump with root privileges. Recent JVMs are configured to use -Xshare:auto at startup, which will automatically use CDS if available. Enforcing CDS with -Xshare:on will cause the JVM to fail if no archive is found.


Application Class Data Sharing

The basic CDS archive only contains JDK class data. Adding application class data (JEP 310) is fairly easy but comes with some limitations.

  • the classpath used at archive creation time must be the same as (or a prefix of) the classpath used at run time
  • wildcards or exploded JARs are not allowed in the path (no cheating)

1) create a list of classes (JDK and APP; Java 11+) to include in the shared archive:


$ java -Xshare:off -XX:DumpLoadedClassList=classes.list -jar foo.jar

2) create the archive using that list:


$ java -Xshare:dump -XX:SharedClassListFile=classes.list -XX:SharedArchiveFile=foo.jsa -cp foo.jar

3) launch the applications using the shared archive:


$ java -Xshare:on -XX:SharedArchiveFile=foo.jsa -jar foo.jar

-Xlog:class+load helps to verify that it is working properly.

[0.062s][info][class,load] java.lang.Object source: shared objects file
[0.062s][info][class,load] java.io.Serializable source: shared objects file
[0.062s][info][class,load] java.lang.Comparable source: shared objects file
[0.062s][info][class,load] java.lang.CharSequence source: shared objects file
...

The applications (containers) don't have to be identical to make use of custom CDS archives. The archive is based on the class list of step 1 which can be freely modified or merged with other lists. The main limitation is the classpath prefix rule.

Some Results

As quick test I used OpenJDK 11 + Eclipse Jetty + Apache JSP Wiki and edited a single page (to ensure the classes are loaded), generated a custom archive and started a few more instances with CDS enabled using the archive.


classes.list size = 4296 entries
wiki.jsa size = 57,6 MB 

 VIRT    RES     SHR  %MEM    TIME    COMMAND
2771292 315128  22448  8,3   0:20.54 java -Xmx128m -Xms128m -Xshare:off -cp ... 
2735468 259148  45184  6,9   0:18.38 java -Xmx128m -Xms128m -XX:SharedArchiveFile=wiki.jsa -cp ...
2735296 270044  45192  7,1   0:19.66 java -Xmx128m -Xms128m -XX:SharedArchiveFile=wiki.jsa -cp ...
2735468 259812  45060  6,9   0:18.26 java -Xmx128m -Xms128m -XX:SharedArchiveFile=wiki.jsa -cp ...
2735468 256800  45196  6,8   0:19.10 java -Xmx128m -Xms128m -XX:SharedArchiveFile=wiki.jsa -cp ...

The first JVM is started with CDS turned off as reference. The four other instances have sharing enabled using the custom archive.


Dynamic Class Data Archives

The proposal in JEP 350 which is currently targeted for Java 13 will allow to combine a static base archive with a dynamically generated archive.

The dynamic class data archive is created in a setup phase at application exit (-XX:ArchiveClassesAtExit=dynamic.jsa) and essentially automates step 1 and 2. Just like before -XX:SharedArchiveFile=dynamic.jsa will tell the JVM to map the shared archive. Since the dynamic archive references the static JDK archive, both will be used automatically.

The main advantage (beside the added convenience) is that the dynamic archive will support both builtin class loaders and user-defined class loaders.

now go forth and share class data :)