<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="https://mbien.dev/roller-ui/styles/rss.xsl" media="screen"?><rss version="2.0" 
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
  <title>Michael Bien&apos;s Weblog</title>
  <link>https://mbien.dev/blog/</link>
    <atom:link rel="self" type="application/rss+xml" href="https://mbien.dev/blog/feed/entries/rss?tags=jcriu" />
  <description>don&apos;t panic</description>
  <language>en-us</language>
  <copyright>Copyright 2024</copyright>
  <lastBuildDate>Sat, 24 Aug 2024 07:57:58 +0000</lastBuildDate>
  <generator>Apache Roller 6.1.4</generator>
  <item>
    <guid isPermaLink="true">https://mbien.dev/blog/entry/java-and-rootless-criu-using</guid>
    <title>Defrosting Warmed-up Java [using Rootless CRIU and Project Panama]</title>
    <dc:creator>mbien</dc:creator>
    <link>https://mbien.dev/blog/entry/java-and-rootless-criu-using</link>
    <pubDate>Sat, 21 Nov 2020 03:22:16 +0000</pubDate>
    <category>Java</category>
    <category>criu</category>
    <category>java</category>
    <category>jcriu</category>
    <category>linux</category>
    <category>panama</category>
    <category>tools</category>
<description>&lt;p&gt;
I needed a toy project to experiment with &lt;a href=&quot;https://openjdk.java.net/jeps/389&quot;&gt;JEP 389&lt;/a&gt; of &lt;a href=&quot;https://github.com/openjdk/panama-foreign&quot;&gt;Project Panama&lt;/a&gt; (modern &lt;a href=&quot;https://en.wikipedia.org/wiki/Java_Native_Interface&quot;&gt;JNI&lt;/a&gt;) but wanted to take a better look at &lt;a href=&quot;https://criu.org&quot;&gt;CRIU&lt;/a&gt; (Checkpoint/Restore In Userspace) too. So I thought, let&apos;s try to combine both, and created &lt;a href=&quot;https://github.com/mbien/JCRIU/&quot;&gt;JCRIU&lt;/a&gt;. The immediate questions I had were: how fast can it defrost a warmed-up JVM, and can it make a program time travel?
&lt;/p&gt;
&lt;p&gt;
Let&apos;s attempt to investigate the first question in this blog entry.
&lt;/p&gt;

&lt;h3&gt;CRIU Crash Course&lt;/h3&gt;
&lt;p&gt;
CRIU, which is implemented in user space, can dump process trees to disk (checkpoint) and restore them at any later time - it&apos;s all in the name.
&lt;/p&gt;

&lt;p&gt;
Let&apos;s run a minimal test first.
&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;
#!/bin/bash
echo my pid: $$
i=0
while true
do
    echo $i &amp;&amp; ((i=i+1)) &amp;&amp; sleep 1
done
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;
The script above will print its PID initially and then continue to print and increment a number. It isn&apos;t important that this is a bash script; it could be any process.
&lt;/p&gt;

&lt;h3&gt;shell 1:&lt;/h3&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;
$ sh test.sh 
my pid: 14255
0
1
...
9
Killed
&lt;/code&gt;&lt;/pre&gt;

&lt;h3&gt;shell 2:&lt;/h3&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;
$ criu dump -t 14255 --shell-job -v -D dump/
...
(00.021161) Dumping finished successfully
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;
This command will let CRIU dump (checkpoint) the process with the specified PID and store its image in &lt;code&gt;./dump&lt;/code&gt; (overwriting any older image on the same path). The flag &lt;code&gt;--shell-job&lt;/code&gt; tells CRIU that the process is attached to a console. Dumping a process will automatically kill it, like in this example, unless &lt;code&gt;-R&lt;/code&gt; is specified.
&lt;/p&gt;

&lt;h3&gt;shell 2:&lt;/h3&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;
$ criu restore --shell-job -D dump/
10
11
12
...
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;
To restore, simply replace &quot;dump&quot; with &quot;restore&quot; and omit the PID. As expected, the program continues counting in shell 2, right where it was stopped in shell 1.
&lt;/p&gt;
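&lt;p&gt;
The two commands above can also be assembled programmatically, for example to launch them via &lt;code&gt;ProcessBuilder&lt;/code&gt;. The following is only a minimal sketch: the class and method names are made up for illustration, and it uses nothing but the flags already shown above. It just prints the command lines, since actually running them requires an installed CRIU and root permissions.
&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;
import java.nio.file.Path;

// hypothetical helper, not part of CRIU or JCRIU
class CriuCommandSketch {

    // argv for: criu dump -t PID --shell-job -D DIR
    static String[] dumpCommand(long pid, Path imageDir) {
        return new String[] {
            &quot;criu&quot;, &quot;dump&quot;, &quot;-t&quot;, Long.toString(pid), &quot;--shell-job&quot;, &quot;-D&quot;, imageDir.toString()
        };
    }

    // argv for: criu restore --shell-job -D DIR (no PID argument on restore)
    static String[] restoreCommand(Path imageDir) {
        return new String[] {
            &quot;criu&quot;, &quot;restore&quot;, &quot;--shell-job&quot;, &quot;-D&quot;, imageDir.toString()
        };
    }

    public static void main(String[] args) {
        Path dir = Path.of(&quot;dump&quot;);
        // only print the commands; running them needs CRIU and root permissions
        System.out.println(String.join(&quot; &quot;, dumpCommand(14255, dir)));
        System.out.println(String.join(&quot; &quot;, restoreCommand(dir)));
        // to run one: new ProcessBuilder(restoreCommand(dir)).inheritIO().start().waitFor();
    }
}
&lt;/code&gt;&lt;/pre&gt;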

&lt;h3&gt;Rootless CRIU&lt;/h3&gt;
&lt;p&gt;
As of now (Nov. 2020) the CRIU commands above still require root permissions. But this might change soon. Linux 5.9 received &lt;code&gt;cap_checkpoint_restore&lt;/code&gt; (&lt;a href=&quot;http://lkml.iu.edu/hypermail/linux/kernel/2008.0/02646.html&quot;&gt;patch&lt;/a&gt;) and CRIU is also already &lt;a href=&quot;https://github.com/checkpoint-restore/criu/pull/1155&quot;&gt;being prepared&lt;/a&gt;.
To test rootless CRIU, simply build the non-root branch and set &lt;code&gt;cap_checkpoint_restore&lt;/code&gt; on the resulting binary (no need to install, you can use &lt;code&gt;criu&lt;/code&gt; directly).
&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;
sudo setcap cap_checkpoint_restore=eip /path/to/criu/binary
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;
&lt;b&gt;Note:&lt;/b&gt; Depending on your Linux distribution, you might have to set &lt;code&gt;cap_sys_ptrace&lt;/code&gt; too. Some features might not work yet, for example restoring as &lt;code&gt;--shell-job&lt;/code&gt; or using the CRIU API. Use a recent kernel (at least 5.9.8) before trying to restore a JVM.
&lt;/p&gt;
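&lt;p&gt;
The kernel requirement from the note can be checked from Java before attempting a restore. This is a rough sketch with hypothetical names; it assumes that the &lt;code&gt;os.version&lt;/code&gt; system property reports the kernel release on Linux (something like 5.9.8-arch1-1) and only looks at the leading major.minor.patch part.
&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// hypothetical helper, not part of CRIU or JCRIU
class KernelVersionSketch {

    // parses the leading major.minor.patch of a release string, missing parts count as 0
    static int[] parse(String release) {
        Matcher m = Pattern.compile(&quot;^(\\d+)(?:\\.(\\d+))?(?:\\.(\\d+))?&quot;).matcher(release);
        if (!m.find()) {
            throw new IllegalArgumentException(&quot;not a version: &quot; + release);
        }
        int[] version = new int[3];
        for (int i = 0; i != 3; i++) {
            String group = m.group(i + 1);
            version[i] = (group == null) ? 0 : Integer.parseInt(group);
        }
        return version;
    }

    static boolean isAtLeast(String release, int major, int minor, int patch) {
        int[] have = parse(release);
        int[] want = { major, minor, patch };
        for (int i = 0; i != 3; i++) {
            if (have[i] != want[i]) {
                return have[i] > want[i]; // the first differing component decides
            }
        }
        return true; // identical versions
    }

    public static void main(String[] args) {
        // assumption: on Linux this property holds the kernel release
        String release = System.getProperty(&quot;os.version&quot;);
        System.out.println(release + &quot; is recent enough: &quot; + isAtLeast(release, 5, 9, 8));
    }
}
&lt;/code&gt;&lt;/pre&gt;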

&lt;h3&gt;CRIU + Java + Panama = JCRIU&lt;/h3&gt;
&lt;p&gt;
&lt;a href=&quot;https://github.com/mbien/JCRIU/&quot;&gt;JCRIU&lt;/a&gt; uses Panama&apos;s &lt;code&gt;jextract&lt;/code&gt; tool at build time to generate a low-level (1:1) binding directly from the header of the CRIU API. The low-level binding isn&apos;t exposed through the public API, however; it&apos;s just an implementation detail. Both &lt;code&gt;jextract&lt;/code&gt; and the foreign function module are part of Project Panama; early access builds are available &lt;a href=&quot;https://jdk.java.net/panama/&quot;&gt;here&lt;/a&gt;. &lt;a href=&quot;https://openjdk.java.net/jeps/389&quot;&gt;JEP 389&lt;/a&gt;: Foreign Linker API has been (&lt;a href=&quot;https://mail.openjdk.java.net/pipermail/jdk-dev/2020-November/004893.html&quot;&gt;today&lt;/a&gt;) accepted for inclusion as a JDK 16 incubator module - it might appear in mainline builds soon.
&lt;/p&gt;
&lt;p&gt;
The main entry point is &lt;code&gt;CRIUContext&lt;/code&gt;, which implements &lt;code&gt;AutoCloseable&lt;/code&gt; to cleanly dispose of resources after use. Potential errors are mapped to &lt;code&gt;CRIUException&lt;/code&gt;s. Checkpointing should be fairly robust since the communication with the actual CRIU process is done over RPC. A crashing CRIU most likely won&apos;t take the JVM down with it.
&lt;/p&gt;
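&lt;p&gt;
To illustrate the general pattern (a closeable context with chained options, and return codes turned into exceptions), here is a small standalone sketch. It is not the actual JCRIU source: all names are hypothetical, the native call is replaced by an &lt;code&gt;IntSupplier&lt;/code&gt;, and both the unchecked exception type and the &quot;negative return code means failure&quot; convention are assumptions.
&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;
import java.util.function.IntSupplier;

// hypothetical names throughout, this is not the actual JCRIU source
class NativeCallException extends RuntimeException {
    NativeCallException(String message) {
        super(message);
    }
}

class NativeContextSketch implements AutoCloseable {

    private boolean leaveRunning;
    private boolean closed;

    static NativeContextSketch create() {
        return new NativeContextSketch();
    }

    // fluent setter, returns this so that options can be chained
    NativeContextSketch leaveRunning(boolean value) {
        this.leaveRunning = value;
        return this;
    }

    // nativeCall stands in for a generated binding; assumed to return a negative int on failure
    void invoke(String name, IntSupplier nativeCall) {
        if (closed) {
            throw new IllegalStateException(&quot;context already closed&quot;);
        }
        int code = nativeCall.getAsInt();
        if (0 > code) {
            throw new NativeCallException(name + &quot; failed with code &quot; + code);
        }
    }

    boolean isClosed() {
        return closed;
    }

    @Override
    public void close() {
        closed = true; // a real binding would release its native resources here
    }

    public static void main(String[] args) {
        try (NativeContextSketch ctx = NativeContextSketch.create().leaveRunning(true)) {
            ctx.invoke(&quot;noop&quot;, () -> 0);     // succeeds
            ctx.invoke(&quot;broken&quot;, () -> -1);  // throws NativeCallException
        } catch (NativeCallException ex) {
            System.out.println(&quot;caught: &quot; + ex.getMessage());
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;
The try-with-resources block guarantees that &lt;code&gt;close()&lt;/code&gt; runs even when a call fails, which is the main reason to model such a context as &lt;code&gt;AutoCloseable&lt;/code&gt;.
&lt;/p&gt;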
&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;
    public static void main(String[] args) throws IOException, InterruptedException {
        
        // create empty dir for images
        Path image = Paths.get(&quot;checkpoint_test_image&quot;);

        if (!Files.exists(image))
            Files.createDirectory(image);
        
        // checkpoint the JVM every second
        try (CRIUContext criu = CRIUContext.create()
                .logLevel(WARNING).leaveRunning(true).shellJob(true)) {
            
            int n = 0;
            
            while(true) {
                Thread.sleep(1000);

                criu.checkpoint(image); // checkpoint and entry point for a restore

                long pid = ProcessHandle.current().pid();
                System.out.println(&quot;my PID: &quot;+pid+&quot; checkpont# &quot;+n++);
            }
        }
    }
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;
The above example is somewhat similar to the simple bash script. The main difference is that the Java program is checkpointing itself every second. This allows us to &lt;b&gt;CTRL+C&lt;/b&gt; at any time - if restored, the program will keep counting and checkpointing where it left off.
&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;
[mbien@longbow JCRIUTest]$ sudo sh start-demo.sh 
WARNING: Using incubator modules: jdk.incubator.foreign
my PID: 16195 checkpont# 0
my PID: 16195 checkpont# 1
my PID: 16195 checkpont# 2
my PID: 16195 checkpont# 3
my PID: 16195 checkpont# 4
my PID: 16195 checkpont# 5
CTRL+C
[mbien@longbow JCRIUTest]$ sudo criu restore --shell-job -D checkpoint_test_image/
my PID: 16195 checkpont# 5
my PID: 16195 checkpont# 6
my PID: 16195 checkpont# 7
my PID: 16195 checkpont# 8
my PID: 16195 checkpont# 9
CTRL+C
[mbien@longbow JCRIUTest]$ sudo criu restore --shell-job -D checkpoint_test_image/
my PID: 16195 checkpont# 9
my PID: 16195 checkpont# 10
my PID: 16195 checkpont# 11
my PID: 16195 checkpont# 12
my PID: 16195 checkpont# 13
my PID: 16195 checkpont# 14
CTRL+C
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;
&lt;b&gt;Note:&lt;/b&gt; start-demo.sh just points the environment variables to an early access JDK 16 Panama build, enables &lt;code&gt;jdk.incubator.foreign&lt;/code&gt;, etc. The project README has the details.
&lt;/p&gt;

&lt;h3&gt;Important Details and Considerations&lt;/h3&gt;
&lt;p&gt;
&lt;ul&gt;
&lt;li&gt;CRIU restores images with the same PIDs the processes had during checkpoint. This won&apos;t cause much trouble in containers since the namespace should be quite empty, but might conflict from time to time on a workstation. If the same image should be restored multiple times concurrently, it will have to run in its own PID namespace. This can be achieved with &lt;code&gt;sudo unshare -p -m -f [restore command]&lt;/code&gt;. See &lt;code&gt;man unshare&lt;/code&gt; for details.&lt;/li&gt;
&lt;li&gt;Opened files are not allowed to change (in size) between checkpoint and restore. If they do, the restore operation will fail. (watch out for log files, JFR repos, JVM perf data or temporary files)&lt;/li&gt;
&lt;li&gt;If the application established TCP connections, you have to tell CRIU about it via the &lt;code&gt;--tcp-established&lt;/code&gt; flag (or the similarly named method in CRIUContext). CRIU will try to restore all connections in their correct states. &lt;a href=&quot;https://criu.org/CLI&quot;&gt;wiki link to more options&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;The first checkpoint or restore after system boot can take a few seconds because CRIU has to gather information about the system configuration first; this information is cached for subsequent uses&lt;/li&gt;
&lt;li&gt;Some application dependent post-restore tasks might be required, for example keystore/cert replacement or RNG re-initialization (...)&lt;/li&gt;
&lt;li&gt;CRIU can&apos;t checkpoint resources it can&apos;t reach. An X window or state stored on a GPU can&apos;t be dumped&lt;/li&gt;
&lt;li&gt;Migration should probably only be attempted between (very) similar systems and hardware&lt;/li&gt;
&lt;/ul&gt;
&lt;/p&gt;
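&lt;p&gt;
The open-file restriction from the list above is easy to trip over, so it can help to find out which files changed before a restore is attempted. The following is a minimal sketch with hypothetical names: it records file sizes at one point in time and later counts the files whose size differs. In a real setup the recorded sizes would have to be persisted next to the image, since the checkpointed JVM is gone by the time a restore happens.
&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// hypothetical helper, not part of CRIU or JCRIU
class FileSizeSnapshotSketch {

    private final Path[] files;
    private final long[] sizes;

    // records the current size of every given file (intended for right before a checkpoint)
    FileSizeSnapshotSketch(Path... files) throws IOException {
        this.files = files.clone();
        this.sizes = new long[files.length];
        for (int i = 0; i != files.length; i++) {
            sizes[i] = Files.size(files[i]);
        }
    }

    // counts the files which are missing or whose size differs from the recorded one
    int countChanged() throws IOException {
        int changed = 0;
        for (int i = 0; i != files.length; i++) {
            if (!Files.exists(files[i]) || Files.size(files[i]) != sizes[i]) {
                changed++;
            }
        }
        return changed;
    }

    public static void main(String[] args) throws IOException {
        Path log = Files.createTempFile(&quot;sketch&quot;, &quot;.log&quot;);
        FileSizeSnapshotSketch snapshot = new FileSizeSnapshotSketch(log);
        Files.writeString(log, &quot;written after the snapshot&quot;);
        System.out.println(&quot;changed files: &quot; + snapshot.countChanged());
        Files.delete(log);
    }
}
&lt;/code&gt;&lt;/pre&gt;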

&lt;h3&gt;Instant Defrosting of Warmed-up JVMs&lt;/h3&gt;
&lt;p&gt;
Let&apos;s take a look at what you can do with superluminal, absolute-zero, instant-defrosting JCRIU (ok I&apos;ll stop ;)) when applied to my favorite dusty Java web monolith: Apache Roller. I measured the time this blog needs to start on my workstation when loaded from an NVMe drive on JDK 16 + Jetty 9.4.34. (I consider it started when the website has loaded in the browser, not when the app server reports that it has started.)
&lt;/p&gt;
&lt;p&gt;
classic start: &lt;b&gt;~6.5 s&lt;/b&gt;
&lt;/p&gt;
&lt;p&gt;
(for comparison: it takes about a minute to start on a Raspberry Pi 3b+, which is serving this page you are reading right now)
&lt;/p&gt;
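&lt;p&gt;
A start-up time defined as &quot;the page is reachable&quot; can also be approximated without a browser. The sketch below polls a URL until it answers with HTTP 200 and reports the elapsed time. It is not how the numbers in this entry were taken: the class name and the default address are hypothetical, and a plain HTTP request does not include the time a browser needs to render the page.
&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// hypothetical helper, not part of Roller or JCRIU
class StartupTimerSketch {

    // polls the URL until it answers with HTTP 200 and returns the elapsed milliseconds,
    // or -1 if the timeout expired first
    static long millisUntilReachable(URI url, Duration timeout) throws InterruptedException {
        HttpClient client = HttpClient.newBuilder().connectTimeout(Duration.ofSeconds(1)).build();
        HttpRequest request = HttpRequest.newBuilder(url).timeout(Duration.ofSeconds(5)).GET().build();

        long start = System.nanoTime();
        long deadline = start + timeout.toNanos();

        while (deadline - System.nanoTime() > 0) {
            try {
                var response = client.send(request, HttpResponse.BodyHandlers.discarding());
                if (response.statusCode() == 200) {
                    return (System.nanoTime() - start) / 1_000_000;
                }
            } catch (IOException notUpYet) {
                // connection refused or timed out: the server is not answering yet
            }
            Thread.sleep(10);
        }
        return -1;
    }

    public static void main(String[] args) throws InterruptedException {
        // hypothetical address; start this right before launching the server
        URI url = URI.create(args.length > 0 ? args[0] : &quot;http://localhost:8080/&quot;);
        System.out.println(millisUntilReachable(url, Duration.ofSeconds(60)) + &quot; ms&quot;);
    }
}
&lt;/code&gt;&lt;/pre&gt;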

&lt;p&gt;
Now let&apos;s try this again. But this time Roller will warm itself up, generate RSS feeds, populate the in-memory cache, give the JIT a chance to compile hot paths, compact the heap by calling &lt;code&gt;System.gc()&lt;/code&gt; and finally shock-frost itself via &lt;code&gt;criu.checkpoint(...)&lt;/code&gt;.
&lt;/p&gt;

&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;
        warmup();    // generates/caches landing page/RSS feeds and first 20 blog entries
        System.gc(); // give the GC a chance to clean up unused objects before checkpoint

        try (CRIUContext criu = CRIUContext.create()
                .logLevel(WARNING).leaveRunning(false).tcpEstablished(true)) {

            criu.checkpoint(imagePath);  // checkpoint + exit

        } catch (CRIUException ex) {
            jfrlog.warn(&quot;post warmup checkpoint failed&quot;, ex);
        }
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;
(The uncompressed image size was between 500 and 600 MB during my tests; the heap was set to 1 GB with ParallelGC active)
&lt;/p&gt;

&lt;p&gt;
restore:
&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;
$ sudo time criu restore --shell-job --tcp-established -d -D blog_image/

real 0m0,204s
user 0m0,015s
sys  0m0,022s
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;
instant defrosting: &lt;b&gt;204 ms&lt;/b&gt;
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;Note:&lt;/b&gt; &lt;code&gt;-d&lt;/code&gt; detaches the shell after the restore operation has completed. An alternative way to measure defrosting time is to enable verbose logging with &lt;code&gt;-v&lt;/code&gt; and compare the last timestamp; this is slightly slower (+20ms) since CRIU tends to log a lot on lower log levels. Let me know if there is a better way of measuring this, but I double-checked everything and the image loading speed would be well below the average read speed of my M.2 NVMe.
&lt;/p&gt;
&lt;p&gt;
The blog is immediately reachable in the browser, served by a warmed-up JVM.
&lt;/p&gt;

&lt;h3&gt;Conclusion &amp;&amp; Discussion&lt;/h3&gt;
&lt;p&gt;
CRIU is quite interesting for use cases where Java startup time matters. Quarkus, for example, moves slow framework initialization from startup to build time. Native images with GraalVM further improve initialization by AOT compiling the application into a single binary, but this also sacrifices a little bit of throughput. CRIU can be another tool in the toolbox to quickly map a running JVM, including its application, into memory (no noteworthy code changes required).
&lt;/p&gt;
&lt;p&gt;
The Foreign Linker API (JEP 389), a major part of Project Panama, has been accepted as an incubator module for OpenJDK 16. However, to use JCRIU on older JDKs, another implementation of CRIUContext would be needed. An implementation which communicates with CRIU via Google Protocol Buffers, for example, would completely avoid binding to the CRIU C API.
&lt;/p&gt;
&lt;p&gt;
The JVM would be in an excellent position to aid CRIU in many ways. It already is an operating system for Java/bytecode-based programs (soon even with its own implementation for &lt;a href=&quot;https://mbien.dev/blog/entry/taking-a-look-at-virtual&quot;&gt;threads&lt;/a&gt;) and knows how to drive itself to safepoints (checkpointing an application which is under load is probably a bad idea), how to compact or resize the heap, how to invalidate the code cache, etc. - I see great potential there.
&lt;/p&gt;
&lt;p&gt;
Let me know what you think.
&lt;/p&gt;
&lt;p&gt;
Thanks a lot to Adrian Reber (&lt;a href=&quot;https://twitter.com/adrian__reber&quot;&gt;@adrian__reber&lt;/a&gt;) who patiently answered all my questions about CRIU.
&lt;/p&gt;
</description>  </item>
</channel>
</rss>