Michael Bien's Weblog - don't panic

https://mbien.dev/blog/entry/java-19-s-map-set
Java 19's new Map/Set Factory Methods - mbien - published 2022-09-19, updated 2022-09-29
<p>
Besides the big preview features of Java 19 (e.g. virtual threads, which I blogged about over <a href="https://mbien.dev/blog/entry/taking-a-look-at-virtual">two years ago</a> - time flies), there are also some noteworthy API updates which might fly under the radar.
One such change is a set of <a href="https://bugs.openjdk.org/browse/JDK-8284975">new factory methods</a> for creating mutable, pre-sized Map and Set collections. Let's take a quick look and find out how they differ from their old constructor counterparts and when to use them instead.
</p>
<h3>Pre-sized Maps/Sets</h3>
<p>
Many collections can be created with pre-allocated initial capacity if the number of items which will be stored in them is already known or can be estimated. This avoids unnecessary resizing operations which otherwise happen in the background as the implementation dynamically grows the collection with the item count.
</p>
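<p>
For most collections this parameter is intuitive. ArrayList's initial capacity, for example, simply pre-sizes the backing array for the expected element count (a quick illustration):
</p>
<pre><code class="language-java">
import java.util.ArrayList;
import java.util.List;

public class PreSizedList {
    public static void main(String[] args) {
        // the constructor argument is the expected element count -
        // no grow operations happen while filling up to that count
        List<Integer> list = new ArrayList<>(1000);
        for (int i = 0; i < 1000; i++) {
            list.add(i);
        }
        System.out.println(list.size()); // prints 1000
    }
}
</code></pre>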
<p>
This is all straightforward, but there is one little anomaly: HashMaps (and related collections) have an <code>initialCapacity</code> and a <code>loadFactor</code> (default: 0.75) parameter. The <code>initialCapacity</code> however is <b>not</b> the expected entry count; it is the initial size of an internal table (impl. detail: rounded up to the nearest power-of-two), which is larger than the actual entry count in the map, since the table is only filled until the given <code>loadFactor</code> is reached before it is resized. [<a href="https://docs.oracle.com/en/java/javase/19/docs/api/java.base/java/util/HashMap.html">javadoc</a>]
</p>
<p>
This detail is very easy to overlook, for example:
</p>
<pre><code class="language-java">
Map<String, String> map = new HashMap<>(4);
map.put("one", "1");
map.put("two", "2");
map.put("three", "3");
map.put("four", "4");
</code></pre>
<p>
may look correct at first glance, but it will resize the internal table once more than 0.75*4 entries are added (which is the case above). Resizing a (large) Map can be comparatively expensive; the <a href="https://github.com/openjdk/jdk/blob/4b03e135e157cb6cc9ba5eebf4a1f1b6e9143f48/src/java.base/share/classes/java/util/HashMap.java#L674-L755">code</a> responsible for it isn't trivial. Further, the javadoc mentions that <i>"creating a HashMap with a sufficiently large capacity will allow the mappings to be stored more efficiently than letting it perform automatic rehashing as needed to grow the table"</i>.
</p>
<p>
What we want is a Map which can actually hold 4 entries; unfortunately, for this we have to calculate the capacity parameter ourselves:
</p>
<pre><code class="language-java">
// Java 18
Map<String, String> map = new HashMap<>((int) Math.ceil(4 / 0.75));
</code></pre>
<p>
The new factory methods are simply <a href="https://github.com/openjdk/jdk/blob/4b03e135e157cb6cc9ba5eebf4a1f1b6e9143f48/src/java.base/share/classes/java/util/HashMap.java#L2556-L2585">hiding this calculation</a>.
</p>
<pre><code class="language-java">
// Java 19+
// analog exists for HashSet, LinkedHashSet, LinkedHashMap and WeakHashMap
Map<String, String> map = HashMap.newHashMap(4);
</code></pre>
<p>
If you think you have made this mistake before - don't feel bad :). Even OpenJDK code overlooked this detail on some occasions, as can be seen in <a href="https://github.com/openjdk/jdk/pull/7928/files">PR1</a> and <a href="https://github.com/openjdk/jdk/pull/8302/files">PR2</a>, which introduce the new factory methods and also refactor JDK code to use them (instead of the various ways the capacity was calculated before). Stuart Marks gives a few more examples of how not to calculate it <a href="https://bugs.openjdk.org/browse/JDK-8186958?focusedCommentId=14478251&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14478251">here</a>. The javadoc of the respective constructors (e.g. for <a href="https://docs.oracle.com/en/java/javase/19/docs/api/java.base/java/util/HashMap.html#%3Cinit%3E(int)">HashMap#HashMap(int)</a>) now also contains an API Note which points to the factory methods.
</p>
<p>
<b>note:</b> ConcurrentHashMap and IdentityHashMap already expect the parameter to be the actual entry count, which is why they didn't receive factory methods.
</p>
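<p>
A small sketch for comparison: with ConcurrentHashMap the parameter can be used directly, no load-factor arithmetic required.
</p>
<pre><code class="language-java">
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CHMDemo {
    public static void main(String[] args) {
        // ConcurrentHashMap performs internal sizing to accommodate
        // the given number of entries - 4 means 4 here
        Map<String, String> map = new ConcurrentHashMap<>(4);
        map.put("one", "1");
        map.put("two", "2");
        map.put("three", "3");
        map.put("four", "4");
        System.out.println(map.size()); // prints 4
    }
}
</code></pre>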
<p>
I might add some jackpot code transformation rules to this <a href="https://github.com/mbien/jackpot-inspections">collection</a> to make migration a bit more convenient (update: <a href="https://github.com/mbien/jackpot-inspections/blob/2c9887940fd5cca4dbb0a0db20d68dc718a591ee/CollectionPerformance.hint#L330-L459">done</a>).
</p>
<h3>Just for Fun: A Quick Experiment</h3>
<p>
We can demonstrate the difference by observing the internal table size of the Map while adding entries:
</p>
<pre><code class="language-java">
public static void main(String[] args) throws ReflectiveOperationException {
    System.out.println(Runtime.version());

    int entries = 4; // small number to fit the output on a blog entry

    inspect("new HashMap<>(entries)", new HashMap<>(entries), entries);
    inspect("HashMap.newHashMap(entries)", HashMap.newHashMap(entries), entries);
    inspect("new HashMap<>(((int) Math.ceil(entries / 0.75)))",
            new HashMap<>(((int) Math.ceil(entries / 0.75))), entries);
}

private static void inspect(String desc, Map<? super Object, ? super Object> map,
        int entries) throws ReflectiveOperationException {

    System.out.println();
    System.out.println("filling '"+desc+"' with "+entries+" entries...");
    System.out.println("table size: [content]");

    Field field = HashMap.class.getDeclaredField("table");
    field.setAccessible(true);

    for (int i = 0; i < entries; i++) {
        map.put("key"+i, "value");
        Object[] table = (Object[]) field.get(map);
        System.out.println(table.length+": "+Arrays.asList(table));
    }

    System.out.println("map size: {content}");
    System.out.println(map.size()+": "+map);
}
</code></pre>
<p>
output:
</p>
<pre><code class="language-java">
19+36-2238
filling 'new HashMap<>(entries)' with 4 entries...
table size: [content]
4: [null, null, null, key0=value]
4: [key1=value, null, null, key0=value]
4: [key1=value, key2=value, null, key0=value]
8: [key1=value, key2=value, null, key0=value, null, null, key3=value, null]
map size: {content}
4: {key1=value, key2=value, key0=value, key3=value}
filling 'HashMap.newHashMap(entries)' with 4 entries...
table size: [content]
8: [null, null, null, key0=value, null, null, null, null]
8: [key1=value, null, null, key0=value, null, null, null, null]
8: [key1=value, key2=value, null, key0=value, null, null, null, null]
8: [key1=value, key2=value, null, key0=value, null, null, key3=value, null]
map size: {content}
4: {key1=value, key2=value, key0=value, key3=value}
filling 'new HashMap<>(((int) Math.ceil(entries / 0.75)))' with 4 entries...
table size: [content]
8: [null, null, null, key0=value, null, null, null, null]
8: [key1=value, null, null, key0=value, null, null, null, null]
8: [key1=value, key2=value, null, key0=value, null, null, null, null]
8: [key1=value, key2=value, null, key0=value, null, null, key3=value, null]
map size: {content}
4: {key1=value, key2=value, key0=value, key3=value}
</code></pre>
<h3>Conclusion</h3>
<p>
In Java 19 and later, we can use the new factory methods to create mutable Sets/Maps sized for an expected entry count, without having to calculate the internal capacity ourselves. The old constructor counterparts are now only needed when non-standard load factors are chosen.
</p>
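<p>
For example, a sketch of a map with a denser, non-default load factor (the values here are hypothetical), where the manual calculation is still necessary:
</p>
<pre><code class="language-java">
import java.util.HashMap;
import java.util.Map;

public class DenseMap {
    public static void main(String[] args) {
        // hypothetical: fill the table to 90% before resizing to save memory;
        // the capacity still has to be derived from the expected entry count
        float loadFactor = 0.9f;
        int expectedEntries = 100;
        Map<String, Integer> dense =
                new HashMap<>((int) Math.ceil(expectedEntries / loadFactor), loadFactor);
        for (int i = 0; i < expectedEntries; i++) {
            dense.put("key" + i, i);
        }
        System.out.println(dense.size()); // prints 100
    }
}
</code></pre>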
<p>- - - sidenotes - - -</p>
<p>
HashMaps initialize their internal table lazily, so calling it pre-allocation isn't entirely correct. Adding the first entry will, however, allocate the correctly sized table, even if that happens much later.
</p>
<p>
Since HashMap's internal table size is rounded up to the nearest power-of-two, the capacity might still be sufficient to avoid resize ops even when the constructor was used incorrectly, without properly calculating the initial capacity (still no excuse for not fixing it ;)).
</p>
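<p>
The rounding can be sketched with a small helper (simplified from HashMap's internal <code>tableSizeFor</code>; the clamping for edge cases is omitted):
</p>
<pre><code class="language-java">
public class TableSize {

    // rounds the requested capacity up to the next power of two,
    // like HashMap does internally (clamping omitted, assumes cap >= 2)
    static int tableSizeFor(int cap) {
        int n = -1 >>> Integer.numberOfLeadingZeros(cap - 1);
        return n + 1;
    }

    public static void main(String[] args) {
        // new HashMap<>(4): table of 4, threshold 4*0.75 = 3 -> resize at the 4th entry
        System.out.println(tableSizeFor(4)); // prints 4
        // correct for 4 entries: ceil(4/0.75) = 6 -> table of 8, threshold 6
        System.out.println(tableSizeFor(6)); // prints 8
        // an incorrect new HashMap<>(5) happens to work for 4 entries too,
        // since 5 also rounds up to 8
        System.out.println(tableSizeFor(5)); // prints 8
    }
}
</code></pre>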
https://mbien.dev/blog/entry/crac-coordinated-restore-at-checkpoint
CRaC - Coordinated Restore at Checkpoint - mbien - published 2021-08-11, updated 2021-09-15
<p>
Last year I experimented a little with instantly restoring started and warmed-up Java programs from disk, among a couple of other potential use cases for checkpoints. To achieve this, I accessed a rootless build of <a href="https://criu.org">CRIU</a> directly from <a href="https://mbien.dev/blog/entry/java-and-rootless-criu-using">Java via its C/RPC-API</a> (using Panama as the binding layer). Although it worked surprisingly well, it quickly became clear that a proper implementation would require help from the JVM on a lower level, and also an API to coordinate checkpoint/restore events between libraries.
</p>
<p>
I was pleased to see that there is a decent chance this might actually happen, since a new project with the name <a href="https://github.com/CRaC/jdk">CRaC</a> is currently in the <a href="https://mail.openjdk.java.net/pipermail/announce/2021-July/000304.html">voting stage</a> to be officially started as an OpenJDK sub-project. Let's take a look at the prototype.
</p>
<p>
<b>update:</b> CRaC has been <a href="https://mail.openjdk.java.net/pipermail/crac-dev/2021-September/000000.html">approved</a> (<a href="https://openjdk.java.net/projects/crac/">OpenJDK project</a>, <a href="https://github.com/openjdk/crac">github</a>).
</p>
<h3>With a little Help from the JVM</h3>
<p>
Why would checkpoint/restore benefit from JVM and OpenJDK support? Several reasons. CRIU does not like it when files change between C/R; a simple log file might spoil the fun if a JVM is restored, shut down and then restored again (which will fail). A JVM is also in an excellent position to run heap cleanup and compaction prior to calling CRIU to dump the process to disk. Checkpointing can also be done after driving the JVM into a safepoint and making sure that everything has stopped.
</p>
<p>
The <a href="https://github.com/CRaC/jdk">CRaC prototype</a> covers all of that already and more:
</p>
<ul>
<li>CheckpointException is thrown if files or sockets are open at a checkpoint</li>
<li>a simple API allows coordination with C/R events</li>
<li>the heap is cleaned, compacted, and the checkpoint is made once the JVM has reached a safepoint</li>
<li>CRaC handles some JVM-produced files automatically (no need to set <code>-XX:-UsePerfData</code> for example)</li>
<li>The <code>jcmd</code> tool can be used to checkpoint a JVM from a shell</li>
<li>CRIU is bundled in the JDK as a bonus - no need to have it installed</li>
</ul>
<p>
Since CRaC could be part of OpenJDK one day, it could manage the files of JFR repositories automatically and help with other tasks like re-seeding SecureRandom instances or updating SSL certificates in the future - something that would be difficult (or impossible) to achieve as a third-party library.
</p>
<h3>Coordinated Restore at Checkpoint</h3>
<p>
The API is very simple and somewhat similar to what I wrote for <a href="https://github.com/mbien/JCRIU/">JCRIU</a>; the main difference is that the current implementation does not allow the JVM to continue running after a checkpoint is created (but I don't see why this couldn't change in the future).
</p>
<pre><code class="language-java">
Core.checkpointRestore();
</code></pre>
<p>
currently serves as both checkpoint and program exit. It is at the same time the entry point for a restore.
</p>
<pre><code class="language-java">
Core.getGlobalContext().register(resource);
</code></pre>
<p>
A global context is used to register resources which will be notified before a checkpoint is created and in reverse order after the process is restored.
</p>
<h3>Minimal Example</h3>
<p>
Let's say we have a class CRACTest which can write Strings to a file (like a logger). To coordinate with C/R, it would need to close the file before a checkpoint and reopen it after a restore.
</p>
<pre><code class="language-java">
public class CRACTest implements Resource, AutoCloseable {

    private OutputStreamWriter writer;

    public CRACTest() {
        writer = newWriter();
        Core.getGlobalContext().register(this); // register as resource
    }

    ...
    ...

    @Override
    public void beforeCheckpoint(Context<? extends Resource> context) throws Exception {
        System.out.println("resource pre-checkpoint");
        writer.close();
        writer = null;
    }

    @Override
    public void afterRestore(Context<? extends Resource> context) throws Exception {
        System.out.println("resource post-restore");
        writer = newWriter();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(Runtime.version());
        try (CRACTest writer = new CRACTest()) {
            writer.append("hello");
            try {
                System.out.println("pre-checkpoint PID: "+ProcessHandle.current().pid());
                Core.checkpointRestore(); // exit and restore point
                System.out.println("post-restore PID: "+ProcessHandle.current().pid());
            } catch (CheckpointException | RestoreException ex) {
                throw new RuntimeException("C/R failed", ex);
            }
            writer.append(" there!\n");
        }
    }
}
</code></pre>
<p>
start + checkpoint + exit:
</p>
<pre><code class="language-bash">
$CRaC/bin/java -XX:CRaCCheckpointTo=/tmp/cp -cp target/CRACTest-0.1-SNAPSHOT.jar dev.mbien.CRACTest
14-crac+0-adhoc..crac-jdk
pre-checkpoint PID: 12119
resource pre-checkpoint
</code></pre>
<p>
restore at checkpoint:
</p>
<pre><code class="language-bash">
$CRaC/bin/java -XX:CRaCRestoreFrom=/tmp/cp -cp target/CRACTest-0.1-SNAPSHOT.jar dev.mbien.CRACTest
resource post-restore
post-restore PID: 12119
</code></pre>
<p>
let's see what we wrote to the file:
</p>
<pre><code class="language-bash">
cat /tmp/test/CRACTest/out.txt
hello there!
</code></pre>
<p>
restore 3 more times as a test:
</p>
<pre><code class="language-bash">
./restore.sh
resource post-restore
post-restore PID: 12119
./restore.sh
resource post-restore
post-restore PID: 12119
./restore.sh
resource post-restore
post-restore PID: 12119
cat /tmp/test/CRACTest/out.txt
hello there!
there!
there!
there!
</code></pre>
<p>
works as expected.
</p>
<p>
What happens when we leave an IO stream open? Let's remove <code>writer.close()</code> from <code>beforeCheckpoint()</code> and run a fresh instance.
</p>
<pre><code class="language-bash">
./run.sh
14-crac+0-adhoc..crac-jdk
pre-checkpoint PID: 12431
resource pre-checkpoint
resource post-restore
Exception in thread "main" java.lang.RuntimeException: C/R failed
at dev.mbien.cractest.CRACTest.main(CRACTest.java:72)
Caused by: jdk.crac.CheckpointException
at java.base/jdk.crac.Core.checkpointRestore1(Core.java:134)
at java.base/jdk.crac.Core.checkpointRestore(Core.java:177)
at dev.mbien.cractest.CRACTest.main(CRACTest.java:69)
Suppressed: jdk.crac.impl.CheckpointOpenFileException: /tmp/test/CRACTest/out.txt
at java.base/jdk.crac.Core.translateJVMExceptions(Core.java:76)
at java.base/jdk.crac.Core.checkpointRestore1(Core.java:137)
... 2 more
</code></pre>
<p>
The JVM detects and tells us which files are still open before a checkpoint is attempted. In this case no checkpoint is made and the JVM continues running. By adding this restriction, CRaC avoids a whole class of potential restore failures.
</p>
<h3>Tool Integration</h3>
<p>
Checkpoints can also be triggered externally by using the <code>jcmd</code> tool.
</p>
<pre><code class="language-bash">
jcmd 15119 JDK.checkpoint
15119:
Command executed successfully
</code></pre>
<h3>Context and Resources</h3>
<p>
The Context itself implements Resource. This allows hierarchies of custom contexts to be registered to the global context. Since the context of a resource is passed to the <code>beforeCheckpoint</code> and <code>afterRestore</code> methods, it can be used to carry information to assist in C/R of specific resources.
</p>
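<p>
The mechanics can be sketched with self-contained stand-in types (hypothetical names, not the actual prototype API): resources are notified in registration order before a checkpoint and in reverse order after a restore, and a context can be registered to another context just like any other resource.
</p>
<pre><code class="language-java">
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;

// hypothetical stand-in for the Resource concept described above
interface Resource {
    void beforeCheckpoint() throws Exception;
    void afterRestore() throws Exception;
}

// since a Context is itself a Resource, contexts can form hierarchies
class SimpleContext implements Resource {
    private final Deque<Resource> resources = new ArrayDeque<>();

    void register(Resource r) { resources.addLast(r); }

    @Override public void beforeCheckpoint() throws Exception {
        for (Resource r : resources) r.beforeCheckpoint(); // registration order
    }
    @Override public void afterRestore() throws Exception {
        Iterator<Resource> it = resources.descendingIterator();
        while (it.hasNext()) it.next().afterRestore();     // reverse order
    }
}

public class ContextDemo {
    public static void main(String[] args) throws Exception {
        SimpleContext global = new SimpleContext();
        for (String name : new String[] {"db", "log", "cache"}) {
            global.register(new Resource() {
                @Override public void beforeCheckpoint() { System.out.println("pre: " + name); }
                @Override public void afterRestore() { System.out.println("post: " + name); }
            });
        }
        global.beforeCheckpoint(); // pre: db, pre: log, pre: cache
        global.afterRestore();     // post: cache, post: log, post: db
    }
}
</code></pre>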
<h3>Performance</h3>
<p>
As demonstrated with <a href="https://github.com/mbien/JCRIU/">JCRIU</a>, restoring initialized and warmed-up Java applications can be really fast - CRaC however can be even faster, since its process image is much more compact.
The average time to restore the JVM running this blog from a checkpoint using <a href="https://mbien.dev/blog/entry/java-and-rootless-criu-using">JCRIU was ~200 ms</a>, while CRaC can <a href="https://github.com/crac/docs#results">restore JVMs in ~50 ms</a> - although this will depend on the size of the process image and IO read speed.
</p>
<h3>Potential use-cases beside instant restore</h3>
<p>
CRaC seems to be concentrating mainly on the use case of restoring a started and warmed-up JVM as fast as possible. This of course makes sense - why would someone start a JVM in a container on demand, when it could have already been started when the container image was built? The purpose of the container is most likely to run business logic, not to start programs.
</p>
<p>
However, if CRaC allowed programs to continue running after a checkpoint, it would open up many other possibilities. For example:
</p>
<ul>
<li>time traveling debuggers, stepping backwards to past breakpoints (checkpoints)</li>
<li>snapshotting of a production JVM to restore and test/inspect it locally, do heap dumps etc</li>
<li>maybe some niche use-cases of periodic checkpoints and automatic restoration on failure (<a href="https://criu.org/Incremental_dumps">incremental dumps</a>)</li>
<li>instantly starting IDEs (although this won't be a small task)</li>
</ul>
<p>
in any case... exciting times :)
</p>
<p>
Thanks to Anton Kozlov from Azul for immediately <a href="https://github.com/AntonKozlov/crac-jdk/commit/066dd04112b32a7171c5fa6d56c8e6688b2993f0">fixing a bug</a> I encountered during testing.
</p>
<br/>
<p>- - - sidenotes - - -</p>
<p>
jdk14-crac/lib/criu and jdk14-crac/lib/action-script might require cap_sys_ptrace to be set on some systems to not fail during restore.
</p>
<p>
The <a href="https://github.com/checkpoint-restore/criu/pull/1155">rootless mode for CRIU</a> hasn't made it into the master branch yet, which means the JVM or criu has to be run with root privileges for now.
</p>
<p>
C/R of UIs doesn't work at all, since disposing a window still leaves some cached resources behind (open sockets, file descriptors etc.) - but this is another aspect which could only be solved at the JDK level (and it won't be trivial).
</p>
https://mbien.dev/blog/entry/enhanced-pseudo-random-number-generators
Java 17's Enhanced Pseudo-Random Number Generators - mbien - published 2021-07-16, updated 2021-12-21
<p>
<a href="https://openjdk.java.net/jeps/356">JEP 356</a> adds a new set of pseudo-random number generators to Java 17 and a nice new API to list and instantiate them. Let's take a look.
</p>
<h3>RandomGeneratorFactory</h3>
<p>
The new main entry point is <code>java.util.random.RandomGeneratorFactory</code>, which can list all available factories (<i>all()</i>), get one by name (<i>of("..")</i>), or return the default factory (<i>getDefault()</i>). Let's first see what JDK 17 ships with.
</p>
<pre><code class="language-java">
RandomGeneratorFactory.all()
        .map(fac -> fac.group()+":"+fac.name()
            + " {"
            + (fac.isSplittable()          ? " splitable"          : "")
            + (fac.isStreamable()          ? " streamable"         : "")
            + (fac.isJumpable()            ? " jumpable"           : "")
            + (fac.isArbitrarilyJumpable() ? " arbitrary-jumpable" : "")
            + (fac.isLeapable()            ? " leapable"           : "")
            + (fac.isHardware()            ? " hardware"           : "")
            + (fac.isStatistical()         ? " statistical"        : "")
            + (fac.isStochastic()          ? " stochastic"         : "")
            + " stateBits: "+fac.stateBits()
            + " }"
        )
        .sorted().forEach(System.out::println);
</code></pre>
<p>
prints...
</p>
<pre><code class="language-bash">
LXM:L128X1024MixRandom { splitable streamable statistical stateBits: 1152 }
LXM:L128X128MixRandom { splitable streamable statistical stateBits: 256 }
LXM:L128X256MixRandom { splitable streamable statistical stateBits: 384 }
LXM:L32X64MixRandom { splitable streamable statistical stateBits: 96 }
LXM:L64X1024MixRandom { splitable streamable statistical stateBits: 1088 }
LXM:L64X128MixRandom { splitable streamable statistical stateBits: 192 }
LXM:L64X128StarStarRandom { splitable streamable statistical stateBits: 192 }
LXM:L64X256MixRandom { splitable streamable statistical stateBits: 320 }
Legacy:Random { statistical stateBits: 48 }
Legacy:SecureRandom { stochastic stateBits: 2147483647 }
Legacy:SplittableRandom { splitable streamable statistical stateBits: 64 }
Xoroshiro:Xoroshiro128PlusPlus { streamable jumpable leapable statistical stateBits: 128 }
Xoshiro:Xoshiro256PlusPlus { streamable jumpable leapable statistical stateBits: 256 }
</code></pre>
<p>
The <code>Legacy</code> group represents the old PRNGs. For example, the "Random" factory will produce <code>java.util.Random</code> instances, while "SecureRandom" produces <code>java.security.SecureRandom</code>.
</p>
<pre><code class="language-java">
RandomGenerator rng1 = RandomGeneratorFactory.of("Random").create(42); // new way
RandomGenerator rng2 = new Random(42); // old way
RandomGenerator rng3 = RandomGeneratorFactory.getDefault().create(42); // new default
RandomGenerator rng4 = RandomGenerator.getDefault(); // shortcut to new default
System.out.println(rng1.getClass()); // class java.util.Random
System.out.println(rng2.getClass()); // class java.util.Random
System.out.println(rng3.getClass()); // class jdk.random.L32X64MixRandom
System.out.println(rng4.getClass()); // class jdk.random.L32X64MixRandom
</code></pre>
<p>
The default implementation is already a new algorithm of the LXM group, which is fine since the API didn't exist before - existing applications won't be affected. From the doc: <i>"Returns a RandomGenerator meeting the minimal requirement of having an algorithm whose state bits are greater than or equal 64."</i>
</p>
<h3>No Thread Safety</h3>
<p>
None of the new implementations are thread safe while both <code>java.util.Random</code> and <code>java.security.SecureRandom</code> are.
</p>
<p>
Although it is not very common to share the same instance between threads (there is even ThreadLocalRandom for this specific purpose, if it doesn't have to be cryptographically secure), I would advise against blindly refactoring code into something like
</p>
<pre><code class="language-java">
RandomGenerator threadSafeQuestionMark = RandomGeneratorFactory.all()
        .filter(RandomGeneratorFactory::isStochastic)
        .sorted((g1, g2) -> g2.stateBits() - g1.stateBits())
        .findFirst().get().create();
</code></pre>
<p>
This returns a thread-safe SecureRandom today, but if a better implementation which isn't thread safe appears in the future, an application relying on that fact might break. There is no <i>isThreadSafe()</i> in the API, so there is no good way to filter for it. Make sure you don't rely on the special nature of the legacy implementations before using filters in a forward-incompatible way. See the SplittableGenerator section for a better alternative to sharing.
</p>
<h3>Which random to pick?</h3>
<h3>...if you are using java.security.SecureRandom</h3>
<p>
If you look at the capability list above you will notice that one algorithm is not quite like the others. SecureRandom is the only stochastic algorithm; it is initialized by some entropy source, usually the responsibility of your kernel (/dev/random) during boot, or a lava lamp, etc. So if your application used SecureRandom before, keep using it - there is currently only one cryptographically strong RNG in the JDK.
</p>
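<p>
For most applications the default instance is all that's needed; a minimal sketch:
</p>
<pre><code class="language-java">
import java.security.SecureRandom;

public class SecureRandomDemo {
    public static void main(String[] args) {
        // the no-arg constructor picks a platform default algorithm,
        // seeded from the OS entropy source
        SecureRandom sr = new SecureRandom();
        byte[] key = new byte[32];
        sr.nextBytes(key); // suitable for keys, tokens and salts
        System.out.println(key.length); // prints 32
    }
}
</code></pre>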
<h3>...if you are using java.util.Random</h3>
<p>
You have several options to pick from now on (as long as you don't share the instance between threads). The javadoc for the <a href="https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/random/package-summary.html">java.util.random package</a> has a great description of the new algorithms and mixing functions, and also a section which helps with choosing the right one.
</p>
<p>
Consider <i>getDefault()</i> before picking a factory at random ;)
</p>
<h3>...if you are using java.util.SplittableRandom</h3>
<p>
Consider switching to the new SplittableGenerators; see the quote in the benchmark section.
</p>
<h3>SplittableGenerator and Threads</h3>
<p>
As soon as multiple threads are involved, you want to make sure that individual threads don't generate the same random numbers in parallel. A quick and dirty way of doing this in the past was to simply share a thread-safe <code>java.util.Random</code>. A slightly better approach is <code>ThreadLocalRandom.current()</code> (however, thread locals will face scalability issues once <a href="https://mbien.dev/blog/entry/taking-a-look-at-virtual">virtual threads</a> arrive). A much better approach is Java 8's <code>java.util.SplittableRandom</code> (see Legacy group above).
</p>
<p>
Java 17 adds several LXM implementations which all implement the <code>SplittableGenerator</code> interface of the new <code>java.util.random</code> package. The general idea is to split a new instance off a local source before a new thread (or task) is spawned, without causing any contention. This ensures that the instances are initialized in a way that they don't end up in the same cycle of pseudo-random numbers.
</p>
<pre><code class="language-java">
ExecutorService vte = Executors.newVirtualThreadExecutor();
SplittableGenerator source =
        RandomGeneratorFactory.<SplittableGenerator>of("L128X1024MixRandom").create();

source.splits(100).forEach((rng) -> {
    vte.submit(() -> {
        // this is one of 100 virtual threads with its own independent rng instance
        // the instance uses the same "L128X1024MixRandom" algorithm
        long random = rng.nextLong();
        ...
    });
});
</code></pre>
<p>
Each split generator is also a SplittableGenerator, so tasks can split more generators for their subtasks recursively on demand (useful for ForkJoinPools).
</p>
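<p>
The recursive pattern can be sketched with Java 8's SplittableRandom, which implements the same splitting idea (the class and parameter names here are hypothetical):
</p>
<pre><code class="language-java">
import java.util.SplittableRandom;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// each forked subtask receives its own generator via split(),
// so no two tasks share state or produce correlated sequences
public class RandomSum extends RecursiveTask<Long> {

    private final SplittableRandom rng;
    private final int count;

    RandomSum(SplittableRandom rng, int count) {
        this.rng = rng;
        this.count = count;
    }

    @Override protected Long compute() {
        if (count <= 1000) {
            long sum = 0;
            for (int i = 0; i < count; i++) sum += rng.nextInt(1, 100);
            return sum;
        }
        // split() derives an independent generator for the forked half
        RandomSum left = new RandomSum(rng.split(), count / 2);
        left.fork();
        long right = new RandomSum(rng, count - count / 2).compute();
        return right + left.join();
    }

    public static void main(String[] args) {
        long sum = ForkJoinPool.commonPool().invoke(
                new RandomSum(new SplittableRandom(42), 100_000));
        System.out.println(sum > 0); // prints true
    }
}
</code></pre>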
<p>
<a href="https://openjdk.java.net/jeps/8263012">ScopeLocals</a> of project loom will be another way to inject context dependent variables into tasks (but this is beyond JDK 17).
</p>
<pre><code class="language-java">
private final static ScopeLocal<SplittableGenerator> rng_scope =
        ScopeLocal.inheritableForType(SplittableGenerator.class);

public static void main(String[] args) throws InterruptedException {

    SplittableGenerator rng1 =
            RandomGeneratorFactory.<SplittableGenerator>of("L128X1024MixRandom").create();
    SplittableGenerator rng2 =
            RandomGeneratorFactory.<SplittableGenerator>of("L32X64MixRandom").create();

    try (ExecutorService vte = Executors.newVirtualThreadExecutor()) {
        for (int i = 0; i < 5; i++) {
            ScopeLocal.where(rng_scope, rng1.split(), () -> { vte.submit(new Task()); });
        }
        for (int i = 0; i < 5; i++) {
            ScopeLocal.where(rng_scope, rng2.split(), () -> { vte.submit(new Task()); });
        }
    }
}

private static class Task implements Runnable {
    @Override public void run() {
        SplittableGenerator rng = rng_scope.get();
        System.out.println(rng);
    }
}
</code></pre>
<p>
This prints 5x L128X1024MixRandom and 5x L32X64MixRandom, with every virtual thread having its own instance:
</p>
<pre><code class="language-bash">
jdk.random.L128X1024MixRandom@2d7b71b1
jdk.random.L128X1024MixRandom@7ab82aa3
jdk.random.L128X1024MixRandom@704041d3
jdk.random.L32X64MixRandom@3542c1bf
jdk.random.L32X64MixRandom@e941886
jdk.random.L32X64MixRandom@43dd13b
jdk.random.L32X64MixRandom@760156b6
jdk.random.L32X64MixRandom@556d3ef0
jdk.random.L128X1024MixRandom@456e8e4d
jdk.random.L128X1024MixRandom@316b0e77
</code></pre>
<h3>Sources</h3>
<p>
A SplittableGenerator can also split a new instance of its implementation from a different source. Interestingly, the source has to be a SplittableGenerator as well.
</p>
<pre><code class="language-java">
interface SplittableGenerator extends StreamableGenerator {
    ...
    SplittableGenerator split();
    ...
    SplittableGenerator split(SplittableGenerator source);
    ...
}
</code></pre>
<p>
After going through the <a href="https://github.com/openjdk/jdk/blob/b4371e9bcaa1c8aa394b5eca409c5afc669cc146/src/jdk.random/share/classes/jdk/random/L128X1024MixRandom.java#L298-L310">source code</a> I couldn't find a reason why the source couldn't be a different generator type, for example a high-entropy SecureRandom instance. So I <a href="https://mail.openjdk.java.net/pipermail/core-libs-dev/2021-July/079863.html">asked</a> on the core-libs-dev list, and it turns out it is deliberate:
</p>
<p><i>
"(...) You are right that the comment in the JEP was a little loose, and that the implementation(s) of the split/splits methods could in principle draw random values from a RandomGenerator that is not itself splittable. There might even be applications for such functionality.</i>
</p><p><i>
However, we chose not to support that more general functionality for a fairly subtle reason: there are concerns that if a PNRG is less than perfect, using it as a source of entropy for seeding a PRNG that uses a different algorithm might result in unexpected correlations that could drastically reduce the quality of the output of the new PRNG instance.
(...) —Guy Steele
"</i>
<a href="https://mail.openjdk.java.net/pipermail/core-libs-dev/2021-July/079872.html">full mail</a>
</p>
<p>
Basically, implementations of SplittableGenerator guarantee a certain baseline quality, which makes them viable as a source for splits of their own or other splittable implementations. Other algorithms which don't implement SplittableGenerator might not have this quality and could cause problems down the line - interesting.
</p>
<h3>Benchmark</h3>
<p>
There can't be a blog post about algorithms without the obligatory benchmark.
</p>
<pre><code class="language-bash">
# Run complete. Total time: 00:21:44
Benchmark (name) Mode Cnt Score Error Units
RandomJMH.rngInt L128X1024MixRandom avgt 5 5.037 ± 0.035 ns/op
RandomJMH.rngInt L128X128MixRandom avgt 5 3.640 ± 0.035 ns/op
RandomJMH.rngInt L128X256MixRandom avgt 5 3.948 ± 0.014 ns/op
RandomJMH.rngInt L32X64MixRandom avgt 5 1.983 ± 0.001 ns/op
RandomJMH.rngInt L64X1024MixRandom avgt 5 2.545 ± 0.001 ns/op
RandomJMH.rngInt L64X128MixRandom avgt 5 2.045 ± 0.006 ns/op
RandomJMH.rngInt L64X128StarStarRandom avgt 5 2.055 ± 0.023 ns/op
RandomJMH.rngInt L64X256MixRandom avgt 5 2.659 ± 1.715 ns/op
RandomJMH.rngInt Random avgt 5 8.979 ± 0.001 ns/op
RandomJMH.rngInt SecureRandom avgt 5 183.858 ± 0.798 ns/op
RandomJMH.rngInt SplittableRandom avgt 5 1.291 ± 0.003 ns/op
RandomJMH.rngInt Xoroshiro128PlusPlus avgt 5 1.771 ± 0.001 ns/op
RandomJMH.rngInt Xoshiro256PlusPlus avgt 5 2.063 ± 0.023 ns/op
RandomJMH.rngLong L128X1024MixRandom avgt 5 5.035 ± 0.037 ns/op
RandomJMH.rngLong L128X128MixRandom avgt 5 3.647 ± 0.046 ns/op
RandomJMH.rngLong L128X256MixRandom avgt 5 3.953 ± 0.042 ns/op
RandomJMH.rngLong L32X64MixRandom avgt 5 3.003 ± 0.001 ns/op
RandomJMH.rngLong L64X1024MixRandom avgt 5 2.589 ± 0.030 ns/op
RandomJMH.rngLong L64X128MixRandom avgt 5 2.046 ± 0.005 ns/op
RandomJMH.rngLong L64X128StarStarRandom avgt 5 2.052 ± 0.027 ns/op
RandomJMH.rngLong L64X256MixRandom avgt 5 2.455 ± 0.001 ns/op
RandomJMH.rngLong Random avgt 5 17.983 ± 0.190 ns/op
RandomJMH.rngLong SecureRandom avgt 5 367.623 ± 2.274 ns/op
RandomJMH.rngLong SplittableRandom avgt 5 1.296 ± 0.014 ns/op
RandomJMH.rngLong Xoroshiro128PlusPlus avgt 5 1.776 ± 0.023 ns/op
RandomJMH.rngLong Xoshiro256PlusPlus avgt 5 2.063 ± 0.001 ns/op
</code></pre>
<p>
linux 5.10.49; jdk-17+28; CPU i7-6700K, HT off, boost time limit off, boost thread limit off. <a href="https://gist.github.com/mbien/0f0def456076cf2bc3af3910ed151bce">source</a>.
</p>
<p>
The bad performance of the old Random class can most likely be attributed to its thread-safety promise: it has to work with atomic longs and CAS loops, while the new implementations can simply compute and return the next value.
</p>
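<p>
A simplified sketch of that seed update (constants and scrambling taken from java.util.Random's linear congruential generator):
</p>
<pre><code class="language-java">
import java.util.Random;
import java.util.concurrent.atomic.AtomicLong;

public class LegacyRandomCost {
    public static void main(String[] args) {
        // the LCG constants used by java.util.Random
        final long multiplier = 0x5DEECE66DL, addend = 0xBL, mask = (1L << 48) - 1;

        // Random scrambles the user-provided seed before use
        AtomicLong seed = new AtomicLong((42L ^ multiplier) & mask);

        // simplified Random.next(): every single call pays for a CAS loop,
        // so concurrent callers can never observe the same seed transition
        long oldseed, nextseed;
        do {
            oldseed = seed.get();
            nextseed = (oldseed * multiplier + addend) & mask;
        } while (!seed.compareAndSet(oldseed, nextseed));

        int next = (int) (nextseed >>> 16); // next(32)
        System.out.println(next == new Random(42).nextInt()); // prints true
    }
}
</code></pre>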
<p>
Keep in mind this is just CPU time; it does not take per-instance memory footprint or the mathematical properties of the algorithms into account (<a href="https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/random/package-summary.html">javadoc for more info</a>). This benchmark also only tests two methods.
</p>
<p>
The old SplittableRandom for example, although performing very well, has its own problems. Quoting the JEP: <i>"In 2016, testing revealed two new weaknesses in the algorithm used by class SplittableRandom. On the one hand, a relatively minor revision can avoid those weaknesses. On the other hand, a new class of splittable PRNG algorithms (LXM) has also been discovered that are almost as fast, even easier to implement, and appear to completely avoid the three classes of weakness to which SplittableRandom is prone."</i>
</p>
<h3>Summary</h3>
<p>
Java 17 adds the <code>java.util.random</code> package with new APIs and PRNG implementations. Switching can be worth it, but be careful when migrating old code due to the changes in thread safety. If you are using SecureRandom, keep using it; in all other cases consider <code>getDefault()</code> instead of the legacy Random class, or pick a specific implementation which fits your use case best. Take a look at the SplittableGenerators of the LXM group for multi-threaded scenarios. If it doesn't have to be splittable, consider the Xoroshiro and Xoshiro implementations.
</p>
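<p>
To make the summary concrete, here is a small usage sketch of the new API (JDK 17+); the algorithm names are the ones benchmarked above:
</p>
<pre><code class="language-java">
import java.util.random.RandomGenerator;
import java.util.random.RandomGeneratorFactory;

public class RngDemo {
    public static void main(String[] args) {
        // reasonable default chosen by the JDK (the concrete algorithm may change between releases)
        RandomGenerator def = RandomGenerator.getDefault();
        System.out.println(def.nextLong());

        // pick a specific algorithm by name
        RandomGenerator xo = RandomGenerator.of("Xoshiro256PlusPlus");
        System.out.println(xo.nextInt(100));

        // factories allow seeding: same seed -> same sequence, handy for reproducible tests
        RandomGeneratorFactory<RandomGenerator> factory = RandomGeneratorFactory.of("L64X128MixRandom");
        RandomGenerator a = factory.create(42);
        RandomGenerator b = factory.create(42);
        System.out.println(a.nextLong() == b.nextLong()); // true
    }
}
</code></pre>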
<p>
That's all for now. Until next time ;)
</p>
https://mbien.dev/blog/entry/custom-java-runtimes-with-jlinkCustom Java Runtimes with jlink [and jdeps for classpath applications]mbien2021-06-20T18:05:34+00:002021-07-18T15:49:11+00:00<p>
The <code>jlink</code> command line tool can be used to create custom java runtimes, which only include the functionality required by the (modular) java application. However, what if the application isn't modular and still uses the classpath? In this case an extra step is needed to determine which JDK modules are required by the application before <code>jlink</code> can be used.
</p>
<h3>classic classpaths: finding module dependencies with jdeps</h3>
<p>
<code>jdeps</code> is excellent for porting classic classpath based applications to java modules. It analyzes jars and lists all their dependencies, which can be other jars or modules, with package granularity. Although we don't want to port the dusty application to the module system for this blog post, listing all the module dependencies is exactly what we need for <code>jlink</code> to be able to create a custom java runtime.
</p>
<p>
<code>jdeps</code> produces the following output:
</p>
<pre><code class="language-bash">
# foo.jar depends on bar.jar
foo.jar -> bar.jar
...
# or foo.jar depends on a module
foo.jar -> module.name
...
</code></pre>
<p>
Since the tool is intended to assist with porting applications to java modules, the default output will be fairly detailed down to package dependencies. The summary (<code>-s</code>) omits all that and only lists jars or modules.
</p>
<p>
All we have to do is to go recursively through all jars and remember the module names they depend on.
</p>
<pre><code class="language-bash">
# -s omit detailed package dependencies
# -R analyze recursively through all found dependencies
# --multi-release 16 for the case that there are multi release jars involved
$JDK/bin/jdeps -s -R --multi-release 16 --class-path 'lib/*' dusty-application.jar
jakarta.persistence-2.2.3.jar -> java.base
jakarta.persistence-2.2.3.jar -> java.instrument
jakarta.persistence-2.2.3.jar -> java.logging
jakarta.persistence-2.2.3.jar -> java.sql
foo.jar -> bar.jar
...
</code></pre>
<p>
Some grepping and deduplication and we have a short list of JDK modules our application depends on.
</p>
<pre><code class="language-bash">
$JDK/bin/jdeps -s -R --multi-release 16 --class-path 'lib/*' dusty-application.jar\
| grep -Ev '\.jar$' | cut -d " " -f 3 | sort -u
java.base
java.desktop
java.instrument
java.logging
java.naming
java.net.http
java.security.jgss
java.sql
java.xml
jdk.unsupported
</code></pre>
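<p>
The same jdeps run can also be scripted from Java itself via the <code>ToolProvider</code> SPI, e.g. as part of a build step - a sketch; the jar names are the placeholders from above:
</p>
<pre><code class="language-java">
import java.util.spi.ToolProvider;

public class JdepsRunner {
    public static void main(String[] args) {
        // jdeps has been exposed as a ToolProvider since JDK 9
        ToolProvider jdeps = ToolProvider.findFirst("jdeps")
                .orElseThrow(() -> new IllegalStateException("jdeps not available in this JDK"));
        // same flags as on the command line; a return value of 0 means success
        int code = jdeps.run(System.out, System.err,
                "-s", "-R", "--multi-release", "16",
                "--class-path", "lib/*", "dusty-application.jar");
        System.out.println("jdeps exit code: " + code);
    }
}
</code></pre>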
<p>
That's it? Not quite. Analyzing an application like that won't show dependencies which are introduced via reflection. So you will have to take a good look at the resulting modules and probably add some manually. Good candidates are the <i>jdk.crypto.*</i> modules. <code>jlink</code> can assist with that task too by listing service providers.
</p>
<pre><code class="language-bash">
$JDK/bin/jlink --suggest-providers java.security.Provider
Suggested providers:
java.naming provides java.security.Provider used by java.base
java.security.jgss provides java.security.Provider used by java.base
jdk.crypto.ec provides java.security.Provider used by java.base
...
</code></pre>
<p>
You might also want to add modules like <i>jdk.jfr</i>, <i>java.management</i> or <i>jdk.localedata</i> even when the application isn't directly depending on them. You can experiment with options like <code>--compile-time</code>, which will usually list more dependencies (the default is runtime analysis). <code>jlink</code> adds transitive dependencies automatically.
</p>
<p>
Any missing modules should be quickly noticed during integration tests.
</p>
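<p>
Such an integration test can be as simple as querying the boot module layer of the custom runtime for the modules the application expects - a sketch, with module names taken from the list above:
</p>
<pre><code class="language-java">
import java.util.List;

public class ModuleCheck {
    public static void main(String[] args) {
        List<String> required = List.of("java.base", "java.sql", "java.net.http");
        for (String name : required) {
            // the boot layer contains all modules the runtime was linked with
            boolean present = ModuleLayer.boot().findModule(name).isPresent();
            System.out.println(name + ": " + (present ? "ok" : "MISSING"));
        }
    }
}
</code></pre>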
<h3>custom runtimes with jlink</h3>
<p>
Once we have the module list we can give it to <code>jlink</code> for the actual heavy lifting.
</p>
<pre><code class="language-bash">
MODS=...
JDK=/home/mbien/dev/java/jdk-16.0.1+9
DIST=custom-$(basename $JDK)
$JDK/bin/jlink -v --add-modules $MODS\
--compress=2 --no-header-files --no-man-pages\
--vendor-version="[mbien.dev pod REv1]"\
--output $DIST
du -s $DIST
</code></pre>
<p>
<code>jlink</code> automatically uses the modules of the JDK which contains the tool, which means the example above will create a runtime based on jdk-16.0.1+9. The <code>--module-path</code> flag would set a path to an alternative module folder. If the application is already modular, the path could also include the application modules in case they should be part of the runtime too.
</p>
<p>
some noteworthy flags:
</p>
<ul>
<li><code>--strip-debug</code> strips debug symbols from both the native binaries and the bytecode. You probably don't want to use this, since it removes all line numbers from stack traces. It's likely that the binaries of the JDK distribution you are using have most of their symbols stripped already.</li>
<li><code>--strip-native-debug-symbols=objcopy=/usr/bin/objcopy</code> Same as above, but only for native binaries</li>
<li><code>--compress=0|1|2</code> 0 for no compression, 1 for string deduplication, 2 for zip compressed modules. This might influence startup time slightly; see CDS section below</li>
<li><code>--include-locales=langtag[,langtag]*</code> include only a subset of locales instead of the full module</li>
<li><code>--vendor-version="i made this"</code> this looks uninteresting at first glance but it is very useful if you want to recognize your custom runtime again once you have multiple variants in containers. Adding domain name/project name or purpose of the base image helps. <br/> It will appear on the second line of the output of <code>java -version</code></li>
</ul>
<h3>full JDK as baseline</h3>
<pre><code class="language-bash">
MODS=ALL-MODULE-PATH
# --compress=1
138372 (151812 with CDS)
# --compress=2
102988 (116428 with CDS)
# --compress=2 --strip-debug
90848 (102904 with CDS)
</code></pre>
<h3>custom runtime example</h3>
<pre><code class="language-bash">
MODS=java.base,java.instrument,java.logging,java.naming,java.net.http,\
java.security.jgss,java.sql,java.xml,jdk.jfr,jdk.unsupported,java.rmi,\
java.management,java.datatransfer,java.transaction.xa,\
jdk.crypto.cryptoki,jdk.crypto.ec
# --compress=1
55996 (69036 with CDS)
# --compress=2
45304 (58344 with CDS)
# --compress=2 --strip-debug
40592 (52240 with CDS)
</code></pre>
<p>(this is an aarch64 build of OpenJDK, x64 binaries are slightly larger)</p>
<p>
Most modules are actually fairly small, the 5 largest modules are <i>java.base, java.desktop, jdk.localedata, jdk.compiler</i> and <i>jdk.internal.vm.compiler</i>. Since java.base is mandatory anyway, adding more modules won't significantly influence the runtime size unless you can't avoid some of the big ones.
</p>
<p>
Once you are happy with the custom runtime, you should add it to the test environment of your project and to the IDE.
</p>
<h3>CDS - to share or not to share?</h3>
<p>
I wrote about <a href="https://mbien.dev/blog/entry/dynamic-application-class-data-sharing">class data sharing</a> before, so I'll keep this short. A CDS archive is a file which is mapped into memory by the JVM on startup and is shared between JVM instances. This even works for co-located containers sharing the same image layer which includes the CDS archive.
</p>
<p>
Although it adds to the image size, zip compression + CDS seems to always be smaller than uncompressed without CDS. The CDS file should also eliminate the need to decompress modules during startup, since it already contains the most important classes. So the decision seems easy: compact size + improved startup time, with potential (small) memory footprint savings as a bonus.
</p>
<p>
Leaving the CDS archive out frees up ~10 MB of image size. If this matters to your project, benchmark it to see if it makes a difference. It is also possible to put application classes into the shared archive, or to create a separate archive for the application which extends the runtime archive (dynamic class data sharing). Or go a step further and bundle the application and runtime in a single, AOT-compiled, compact, native image with GraalVM (although this might reduce peak throughput due to the lack of JIT and offers a smaller choice of GCs, besides other restrictions) - but this probably won't happen for dusty applications.
</p>
<pre><code class="language-bash">
# create CDS archive for the custom runtime
$DIST/bin/java -Xshare:dump
# check if it worked, this will fail if it can't map the archive
$DIST/bin/java -Xshare:on -version
# list all modules included in the custom java runtime
$DIST/bin/java --list-modules
</code></pre>
<h3>summary</h3>
<p>
Only a single extra step is needed to determine most of the dependencies of an application, even if it hasn't been ported to java modules yet. Maintaining a module list won't be difficult since it should be fairly static (backend services won't suddenly start using swing packages when they are updated). Make sure that the custom runtime is used in your automated tests and IDE.
</p>
<p>
Stop using java 8, time to move on - even without a modular application :)
</p>
<br/>
<p>- - - sidenotes - - -</p>
<p>
If you want to create a runtime which can compile and run <a href="https://mbien.dev/blog/entry/cleaning-bash-history-using-a">single-file-source-code programs</a>, adding just jdk.compiler isn't enough. This will result in a somewhat misleading "IllegalArgumentException: error: release version 16 not supported" exception. The solution is to add jdk.zipfs too - I haven't investigated it any further.
</p>
<p>
If jlink has to be run from within a container (which can be useful when building for foreign archs, e.g. aarch64 on x64), you might have to change the process fork mechanism if you run into trouble (java.io.IOException: Cannot run program "objcopy": error=0, Failed to exec spawn helper: pid: 934, exit value: 1).
(export JAVA_TOOL_OPTIONS="-Djdk.lang.Process.launchMechanism=vfork" worked for me)
</p>https://mbien.dev/blog/entry/java-and-rootless-criu-usingDefrosting Warmed-up Java [using Rootless CRIU and Project Panama]mbien2020-11-21T03:22:16+00:002020-11-24T21:01:09+00:00<p>
I needed a toy project to experiment with <a href="https://openjdk.java.net/jeps/389">JEP 389</a> of <a href="https://github.com/openjdk/panama-foreign">Project Panama</a> (modern <a href="https://en.wikipedia.org/wiki/Java_Native_Interface">JNI</a>), but also wanted to take a better look at <a href="https://criu.org">CRIU</a> (Checkpoint/Restore In Userspace). So I thought: let's try to combine both - and created <a href="https://github.com/mbien/JCRIU/">JCRIU</a>. The immediate questions I had were: how fast can it defrost a warmed-up JVM, and can it make a program time travel?
</p>
<p>
Let's attempt to investigate the first question in this blog entry.
</p>
<h3>CRIU Crash Course</h3>
<p>
CRIU can dump process trees to disk (checkpoint) and restore them at any time later (implemented in user space) - it's all in the name.
</p>
<p>
Let's run a minimal test first.
</p>
<pre><code class="language-bash">
#!/bin/bash
echo my pid: $$
i=0
while true
do
echo $i && ((i=i+1)) && sleep 1
done
</code></pre>
<p>
The script above will print its PID initially and then continue to print and increment a number. It isn't important that this is a bash script, it could be any process.
</p>
<h3>shell 1:</h3>
<pre><code class="language-bash">
$ sh test.sh
my pid: 14255
0
1
...
9
Killed
</code></pre>
<h3>shell 2:</h3>
<pre><code class="language-bash">
$ criu dump -t 14255 --shell-job -v -D dump/
...
(00.021161) Dumping finished successfully
</code></pre>
<p>
This command will let CRIU dump (checkpoint) the process with the specified PID and store its image in <code>./dump</code> (overwriting any older image on the same path). The flag <code>--shell-job</code> tells CRIU that the process is attached to a console. Dumping a process will automatically kill it, like in this example, unless <code>-R</code> is specified.
</p>
<h3>shell 2:</h3>
<pre><code class="language-bash">
$ criu restore --shell-job -D dump/
10
11
12
...
</code></pre>
<p>
To restore, simply replace "dump" with "restore", without specifying the PID. As expected the program continues counting in shell 2, right where it was stopped in shell 1.
</p>
<h3>Rootless CRIU</h3>
<p>
As of now (Nov. 2020) the CRIU commands above still require root permissions. But this might change soon. Linux 5.9 received <code>cap_checkpoint_restore</code> (<a href="http://lkml.iu.edu/hypermail/linux/kernel/2008.0/02646.html">patch</a>) and CRIU is also already <a href="https://github.com/checkpoint-restore/criu/pull/1155">being prepared</a>.
To test rootless CRIU, simply build the non-root branch and set <code>cap_checkpoint_restore</code> to the resulting binary (no need to install, you can use <code>criu</code> directly).
</p>
<pre><code class="language-bash">
sudo setcap cap_checkpoint_restore=eip /path/to/criu/binary
</code></pre>
<p>
<b>Note:</b> Depending on your Linux distribution you might have to set <code>cap_sys_ptrace</code> too. Some features might not work yet, for example restoring as <code>--shell-job</code> or using the CRIU API. Use a recent kernel (at least 5.9.8) before trying to restore a JVM.
</p>
<h3>CRIU + Java + Panama = JCRIU</h3>
<p>
<a href="https://github.com/mbien/JCRIU/">JCRIU</a> uses Panama's <code>jextract</code> tool during build time to generate a low-level (1:1) binding directly from the header of the CRIU API. The low-level binding isn't exposed through the public API however, it's just an implementation detail. Both <code>jextract</code> and the foreign function module are part of Project Panama; early access builds are available <a href="https://jdk.java.net/panama/">here</a>. <a href="https://openjdk.java.net/jeps/389">JEP 389</a>: Foreign Linker API has been accepted (<a href="https://mail.openjdk.java.net/pipermail/jdk-dev/2020-November/004893.html">today</a>) for inclusion as a JDK 16 incubator module - it might appear in mainline builds soon.
</p>
<p>
The main entry point is <code>CRIUContext</code>, which implements <code>AutoCloseable</code> to cleanly dispose resources after use. Potential errors are mapped to <code>CRIUException</code>s. Checkpointing should be fairly robust since the communication with the actual CRIU process is done over RPC. Crashing CRIU most likely won't take the JVM down with it.
</p>
<pre><code class="language-java">
public static void main(String[] args) throws IOException, InterruptedException {
// create empty dir for images
Path image = Paths.get("checkpoint_test_image");
if (!Files.exists(image))
Files.createDirectory(image);
// checkpoint the JVM every second
try (CRIUContext criu = CRIUContext.create()
.logLevel(WARNING).leaveRunning(true).shellJob(true)) {
int n = 0;
while(true) {
Thread.sleep(1000);
criu.checkpoint(image); // checkpoint and entry point for a restore
long pid = ProcessHandle.current().pid();
System.out.println("my PID: "+pid+" checkpoint# "+n++);
}
}
}
</code></pre>
<p>
The above example is somewhat similar to the simple bash script. The main difference is that the Java program is checkpointing itself every second. This allows us to press <b>CTRL+C</b> at any time - the program will keep counting and checkpointing where it left off, once restored.
</p>
<pre><code class="language-bash">
[mbien@longbow JCRIUTest]$ sudo sh start-demo.sh
WARNING: Using incubator modules: jdk.incubator.foreign
my PID: 16195 checkpoint# 0
my PID: 16195 checkpoint# 1
my PID: 16195 checkpoint# 2
my PID: 16195 checkpoint# 3
my PID: 16195 checkpoint# 4
my PID: 16195 checkpoint# 5
CTRL+C
[mbien@longbow JCRIUTest]$ sudo criu restore --shell-job -D checkpoint_test_image/
my PID: 16195 checkpoint# 5
my PID: 16195 checkpoint# 6
my PID: 16195 checkpoint# 7
my PID: 16195 checkpoint# 8
my PID: 16195 checkpoint# 9
CTRL+C
[mbien@longbow JCRIUTest]$ sudo criu restore --shell-job -D checkpoint_test_image/
my PID: 16195 checkpoint# 9
my PID: 16195 checkpoint# 10
my PID: 16195 checkpoint# 11
my PID: 16195 checkpoint# 12
my PID: 16195 checkpoint# 13
my PID: 16195 checkpoint# 14
CTRL+C
</code></pre>
<p>
<b>Note:</b> start-demo.sh is just setting env variables to an early access JDK 16 panama build, enables <code>jdk.incubator.foreign</code> etc. The project README has the details.
</p>
<h3>Important Details and Considerations</h3>
<p>
<ul>
<li>CRIU restores images with the same PIDs the processes had during checkpoint. This won't cause much trouble in containers since the namespace should be quite empty, but might conflict from time to time on a workstation. If the same image should be restored multiple times concurrently, it will have to run in its own PID namespace. This can be achieved with <code>sudo unshare -p -m -f [restore command]</code>. See <code>man unshare</code> for details.</li>
<li>Opened files are not allowed to change (in size) between checkpoint and restore. If they do, the restore operation will fail. (watch out for log files, JFR repos, JVM perf data or temporary files)</li>
<li>If the application established TCP connections you have to tell CRIU that via the <code>--tcp-established </code> flag (or similar named method in CRIUContext). CRIU will try to restore all connections in their correct states. <a href="https://criu.org/CLI">wiki link to more options</a></li>
<li>The first checkpoint or restore after system boot can take a few seconds because CRIU has to gather information about the system configuration first; this information is cached for subsequent uses</li>
<li>Some application dependent post-restore tasks might be required, for example keystore/cert replacement or RNG re-initialization (...)</li>
<li>CRIU can't checkpoint resources it can't reach. A X Window or state stored on a GPU can't be dumped</li>
<li>Migration should probably only be attempted between (very) similar systems and hardware</li>
</ul>
</p>
<h3>Instant Defrosting of Warmed-up JVMs</h3>
<p>
Let's take a look at what you can do with super luminal, absolute zero, instant defrosting JCRIU (ok I'll stop ;)) when applied to my favorite dusty Java web monolith: Apache Roller. I measured the time this very blog requires to start on my workstation when loaded from an NVMe on JDK 16 + Jetty 9.4.34. (I consider it started when the website has loaded in the browser, not when the app server reports it started)
</p>
<p>
classic start: <b>~6.5 s</b>
</p>
<p>
(for comparison: it takes about a minute to start on a Raspberry Pi 3b+, which is serving this page you are reading right now)
</p>
<p>
Now let's try this again. But this time Roller will warm itself up: generate RSS feeds, populate the in-memory cache, give the JIT a chance to compile hot paths, compact the heap by calling <code>System.gc()</code> and finally shock-frost itself via <code>criu.checkpoint(...)</code>.
</p>
<pre><code class="language-java">
warmup(); // generates/caches landing page/RSS feeds and first 20 blog entries
System.gc(); // give the GC a chance to clean up unused objects before checkpoint
try (CRIUContext criu = CRIUContext.create()
.logLevel(WARNING).leaveRunning(false).tcpEstablished(true)) {
criu.checkpoint(imagePath); // checkpoint + exit
} catch (CRIUException ex) {
jfrlog.warn("post warmup checkpoint failed", ex);
}
</code></pre>
<p>
(The uncompressed image size was between 500-600 MB during my tests, heap was set to 1 GB with ParallelGC active)
</p>
<p>
restore:
</p>
<pre><code class="language-bash">
$ sudo time criu restore --shell-job --tcp-established -d -D blog_image/
real 0m0,204s
user 0m0,015s
sys 0m0,022s
</code></pre>
<p>
instant defrosting: <b>204 ms</b>
</p>
<p>
<b>Note:</b> <code>-d</code> detaches the shell after the restore operation completed. An alternative way to measure the defrosting time is enabling verbose logging with <code>-v</code> and comparing the last timestamp; this is slightly slower (+20 ms) since CRIU tends to log a lot on lower log levels. Let me know if there is a better way of measuring this, but I double-checked everything and the image loading speed would be well below the average read speed of my M.2 NVMe.
</p>
<p>
The blog is immediately reachable in the browser, served by a warmed-up JVM.
</p>
<h3>Conclusion && Discussion</h3>
<p>
CRIU is quite interesting for use cases where Java startup time matters. Quarkus, for example, moves slow framework initialization from startup to build time; native images with GraalVM further improve initialization by AOT-compiling the application into a single binary, but this also sacrifices a little bit of throughput. CRIU can be another tool in the toolbox, one which quickly maps a running JVM with its application back into memory (no noteworthy code changes required).
</p>
<p>
The Foreign Linker API (JEP 389), a major part of Project Panama, is currently proposed as an incubating API for OpenJDK 16. However, to use JCRIU on older JDKs, another implementation of CRIUContext would be needed. An implementation which communicates with CRIU via Google Protocol Buffers, for example, would completely avoid binding to the CRIU C-API.
</p>
<p>
The JVM would be in an excellent position to aid CRIU in many ways. It already is an operating system for Java/bytecode-based programs (soon even with its own implementation of <a href="https://mbien.dev/blog/entry/taking-a-look-at-virtual">threads</a>) and knows how to drive itself to safepoints (checkpointing an application which is under load is probably a bad idea), how to compact or resize the heap, invalidate the code cache etc. - I see great potential there.
</p>
<p>
Let me know what you think.
</p>
<p>
Thanks a lot to Adrian Reber (<a href="https://twitter.com/adrian__reber">@adrian__reber</a>) who patiently answered all my questions about CRIU.
</p>
https://mbien.dev/blog/entry/jfrlog-commandline-toolsFormatting JFR Events with J'Bang [and JFRLog]mbien2020-10-28T22:29:29+00:002020-10-30T19:20:10+00:00<p>
Once <a href="https://github.com/mbien/JFRLog/">JFRLog</a> stored all logs as events in JFR records, you might want to read them out again for inspection and maybe even in an easy readable format which resembles classic log files a bit more.
</p>
<p>
For this I wrote the <a href="https://github.com/mbien/JFRLog/blob/master/cli/src/main/java/dev/mbien/jfrlog/cli/JFRPrint.java">JFRPrint</a> utility, which was originally a single-file java program (<a href="//mbien.dev/blog/entry/cleaning-bash-history-using-a">SFJP</a>) but can now be used as a <a href="https://github.com/jbangdev/jbang/">jbang</a> script. The utility can format any JFR event, not only log messages.
</p>
<h3>setup and example</h3>
<pre><code class="language-bash">
# add the catalog to jbang
$ jbang catalog add jfrlog https://github.com/mbien/JFRLog/blob/master/cli/jbang-catalog.json
# define a log pattern
$ MSG_PATTERN="{eventName,0d,C} {startTime,dt:yyyy-MM-dd HH:mm:ss.SSS}\
[{eventThread.javaName}] {origin,1d}: {message} {throwable,o,n}"
# print jfr records using the pattern
$ jbang jfrprint 10h log.* "$MSG_PATTERN" dump.jfr
INFO 2020-09-30 16:12:42.458 [main] jfrlogtest.LogTest: Hello There!
INFO 2020-09-30 16:12:42.460 [main] jfrlogtest.LogTest: 1 2 3 test
WARN 2020-09-30 16:12:42.461 [main] jfrlogtest.LogTest: don't panic
java.lang.IllegalArgumentException: test, please ignore
at dev.mbien.jfrlogtest.LogTest.main(LogTest.java:12)
...
</code></pre>
<p>
Usage is as follows:
</p>
<pre><code class="language-bash">
jfrprint timespan event_name pattern [jfr_record_file | jfr_repository_folder]
</code></pre>
<p>
<code>timespan</code> can be set to only print events which happened after now-timespan. If a JFR record file is passed as argument, the util will print all events matching <code>event_name</code> (the * wildcard is supported as postfix). If it is a repository folder however, it behaves similar to <code>tail -f</code> and streams all upcoming events from the live JFR repository.
</p>
<p>
To print the usage and more examples simply type:
</p>
<pre><code class="language-bash">
jbang jfrprint help
</code></pre>
<h3>message pattern</h3>
<p>
<code>{fieldName, option1, option2, ..}</code>
</p>
<p>
The message pattern format is fairly simple. The curly-brace blocks are replaced with the event field defined by <code>fieldName</code>. Printing the name of the event thread, for example, becomes <code>{eventThread.javaName}</code>.
</p>
<p>
Options can be appended in a comma-separated list after <code>fieldName</code>.
</p>
<ul>
<li><code>dt:</code> prefix defines a date-time format and supports everything that <a href="https://docs.oracle.com/en/java/javase/15/docs/api/java.base/java/time/format/DateTimeFormatter.html">DateTimeFormatter.ofPattern()</a> is able to parse</li>
<li><code>c</code> sets the String to lower case, <code>C</code> to upper case</li>
<li><code>[0-n]d</code> defines how many dots you want to see. <code>0d</code> would format "log.Warn" to "Warn". <code>1d</code> formats "foo.bar.Bazz" to "bar.Bazz"</li>
<li><code>o</code> stands for optional and won't print anything if the field is null. This can be useful for printing exceptions when combined with <code>n</code> which adds a new line before the field</li>
<li>for more options check the <a href="https://github.com/mbien/JFRLog/blob/master/cli/src/main/java/dev/mbien/jfrlog/cli/JFRPrint.java#L333">source</a></li>
</ul>
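<p>
A toy sketch of how such a pattern could be expanded (JFRPrint's actual implementation differs - see the linked source; only the <code>c</code>/<code>C</code> options are handled here):
</p>
<pre><code class="language-java">
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatternDemo {
    // matches {fieldName,options} tokens; group 1 = field name, group 2 = options
    private static final Pattern TOKEN = Pattern.compile("\\{([^},]+)([^}]*)\\}");

    static String format(String pattern, Map<String, Object> fields) {
        Matcher m = TOKEN.matcher(pattern);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            Object value = fields.get(m.group(1).trim());
            String s = value == null ? "" : value.toString();
            String opts = m.group(2);
            if (opts.contains("C")) s = s.toUpperCase(); // upper case option
            if (opts.contains("c")) s = s.toLowerCase(); // lower case option
            m.appendReplacement(sb, Matcher.quoteReplacement(s));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(format("{level,C} {origin}: {message}",
                Map.of("level", "info",
                       "origin", "jfrlogtest.LogTest",
                       "message", "Hello There!")));
        // prints: INFO jfrlogtest.LogTest: Hello There!
    }
}
</code></pre>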
<p>
<code>{...}</code> is a special token which prints all fields which haven't been printed yet. This is especially useful for events which aren't log messages and might have unknown fields.
</p>
<p>
The following pattern will print all events with all their fields which happened in the last hour:
</p>
<pre><code class="language-bash">
jfrprint 1h * "{eventName} {startTime} {...}" record.jfr
</code></pre>
<p>
Note: if no pattern is provided the output will match the multi-line output of OpenJDK's <code>jfr print</code> CLI tool which is also the same format as used in <code>jdk.jfr.Event::toString()</code>.
</p>
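<p>
For the curious: the core loop of such a tool is just the <code>jdk.jfr</code> consumer API. A self-contained sketch which records a single illustrative event and reads it back (the event layout is made up for the demo, it is not JFRLog's actual schema):
</p>
<pre><code class="language-java">
import java.nio.file.Files;
import java.nio.file.Path;
import jdk.jfr.Event;
import jdk.jfr.Name;
import jdk.jfr.Recording;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingFile;

public class ReadRecord {

    @Name("log.Info") // illustrative event type, registered automatically on first use
    static class InfoLog extends Event {
        String message;
    }

    public static void main(String[] args) throws Exception {
        Path dump = Files.createTempFile("demo", ".jfr");

        // record one event so there is something to read
        try (Recording recording = new Recording()) {
            recording.start();
            InfoLog log = new InfoLog();
            log.message = "Hello There!";
            log.commit();
            recording.stop();
            recording.dump(dump);
        }

        // iterate over all events in the record file, filter and format them
        try (RecordingFile record = new RecordingFile(dump)) {
            while (record.hasMoreEvents()) {
                RecordedEvent event = record.readEvent();
                if (event.getEventType().getName().startsWith("log.")) {
                    System.out.println(event.getStartTime() + " " + event.getValue("message"));
                }
            }
        }
        Files.delete(dump);
    }
}
</code></pre>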
<h3>it is still very basic</h3>
<p>
I wrote it originally as SFJP and tried to keep everything as simple and concise as possible. But since it is now set up as a jbang "script", it would allow making the CLI experience a bit nicer - which I might do in the future ;)
</p>
<p>
Let me know if you find it useful or want to see a particular feature.
</p>https://mbien.dev/blog/entry/stopping-containers-correctlyStopping Containers Correctlymbien2020-09-08T07:27:31+00:002020-09-08T10:10:54+00:00<p>
Stopping a container with
</p>
<pre><code class="language-bash">
$ podman stop container-name
</code></pre>
or
<pre><code class="language-bash">
$ docker stop container-name
</code></pre>
<p>
will send <code>SIGTERM</code> to the first process (PID 1) and shut down the container when the process terminates. If this doesn't happen within a certain time frame (default is 10s), the runtime will send <code>SIGKILL</code> to the process and take the container down.
</p>
<p>
So far so good, things are getting interesting when your container process isn't PID 1.
</p>
<p>
This is already the case if the process is started via a shell script.
</p>
<pre><code class="language-bash">
#!/bin/bash
...
java $FLAGS $APP
</code></pre>
<p>
Attempting to stop this container will terminate the script while the JVM keeps running. The container runtime is usually smart enough to notice that a process is still active after the script terminated and will wait out the grace period anyway before shutting down the container forcefully. The JVM however won't notice anything and won't have the opportunity to call shutdown hooks, write JFR dumps or finish transactions.
</p>
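<p>
On the JVM side the difference matters because shutdown hooks only run on a normal exit or on <code>SIGTERM</code> - never on <code>SIGKILL</code>. A minimal example to experiment with:
</p>
<pre><code class="language-java">
public class GracefulShutdown {
    public static void main(String[] args) throws InterruptedException {
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            // close connections, finish transactions, dump JFR records etc.
            System.out.println("shutdown hook ran - cleaning up");
        }));
        System.out.println("running with PID " + ProcessHandle.current().pid()
                + " - send SIGTERM to trigger the hook");
        Thread.sleep(1_000); // a real service would block here indefinitely
    }
}
</code></pre>
<p>
Killing the process with <code>kill -9</code> instead skips the hook - which is exactly what happens once the grace period expires.
</p>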
<h3>signal delegation</h3>
<p>
One way to solve this is by delegating the signal from the shell script to the main process:
</p>
<pre><code class="language-bash">
...
java $FLAGS $APP & # detach process from script
PID=$! # remember process ID
trap 'kill -TERM $PID' INT TERM # delegate kill signal to JVM
wait $PID # attach script to JVM again; note: TERM signal unblocks this wait
trap - TERM INT
wait $PID # wait for JVM to exit after signal delegation
EXIT_STATUS=$?
</code></pre>
<p>
The second wait prevents the script from exiting before the JVM has finished termination; it is required since the first wait is unblocked as soon as the script receives the signal.
</p>
<h3>it still didn't work</h3>
<p>
Interestingly, after implementing this (and trying out other variations of the same concept) it still didn't work for some reason - debugging showed the trap never fired.
</p>
<p>
Turns out nothing was wrong with the signal delegation - the signals just never reached the script :). So I searched around a bit and found this <a href="https://veithen.io/2014/11/16/sigterm-propagation.html">article</a> which basically described the same async/wait/delegate method in greater detail (that's where I stole the EXIT_STATUS line from), so I knew it had to work. Another <a href="https://blog.true-kubernetes.com/why-does-my-docker-container-take-10-seconds-to-stop/">great article</a> gave me the idea to check the Dockerfile again.
</p>
<pre><code class="language-bash">
FROM ...
...
CMD ./start.sh
</code></pre>
<p>
The <code>sh</code> shell interpreting the bash script was the first process!
</p>
<pre><code class="language-bash">
$ podman ps
CONTAINER ID IMAGE COMMAND ...
de216106ff39 localhost/test:latest /bin/sh -c ./start.sh ...
</code></pre>
<p>
htop (on the host) in tree view shows it more clearly:
</p>
<pre><code class="language-bash">
$ htop
1 root ... /sbin/init
15643 podpilot ... - /usr/libexec/podman/conmon --api-version ...
15646 100996 ... | - /bin/sh -c ./start.sh ...
15648 100996 ... | - /bin/bash ./start.sh
15662 100996 ... | - /home/jdk/bin/java -Xshare:on ...
</code></pre>
<p>
To fix this a different CMD (or ENTRYPOINT) syntax is needed:
</p>
<pre><code class="language-bash">
FROM ...
...
CMD [ "./start.sh" ]
</code></pre>
<p>
Let's check again after rebuild + run:
</p>
<pre><code class="language-bash">
$ podman ps
CONTAINER ID IMAGE COMMAND ...
72e3e60ed60b localhost/test:latest ./start.sh ...
$ htop
1 root ... /sbin/init
15746 podpilot ... - /usr/libexec/podman/conmon --api-version ...
15749 100996 ... | - /bin/bash ./start.sh ...
15771 100996 ... | - /home/jdk/bin/java -Xshare:on ...
</code></pre>
<p>
Much better. Since the script is now executed directly, it is able to receive and delegate the signals to the JVM. The Java Flight Recorder records also appeared in the volume, which meant that the JVM had enough time to convert the JFR repository into a single record file. The <code>podman stop</code> command also returned within a fraction of a second.
</p>
<p>
Since the trap is listening to <code>SIGINT</code> too, even the <code>CTRL+C</code> signal is properly propagated when the container is started in non-detached mode. Nice bonus for manual testing.
</p>
<h3>alternatives</h3>
<p>
Starting the JVM with
</p>
<pre><code class="language-bash">
exec java $FLAGS $APP
</code></pre>
<p>
will replace the shell process with the JVM process without changing the PID or process name. Disadvantage: no java command line in top, and the shell won't execute any lines after the <code>exec</code> line (because it basically doesn't exist anymore).
</p>
<p>
... and if you don't care about the container life cycle too much, you can always tell the JVM directly to shut down; this will close all parent shells bottom-up until PID 1 has terminated, which finally stops the container.
</p>
<pre><code class="language-bash">
podman exec -it container sh -c "kill \$(jps | grep -v Jps | cut -f1 -d' ')"
</code></pre>
<p>
- - -
</p>
<p>
lessons learned:<br/>
sometimes two square brackets make the difference :)
</p>
https://mbien.dev/blog/entry/jfrlog-logging-directly-to-theJFRLog - Logging directly to the Java Flight Recordermbien2020-08-23T13:10:05+00:002021-04-16T13:44:48+00:00<p>
I just wrote <a href="https://github.com/mbien/JFRLog">JFRLog</a> - a lightweight SLF4J logger implementation which lets you log directly into the Java Flight Recorder. Since SLF4J (Simple Logging Facade for Java) has a binding to basically every logging framework out there, it doesn't even matter how the logging is done or how many frameworks the application or the dependencies use (hopefully not many).
</p>
<p>
Why? I was mostly curious how practical something like that would be - and it turned out it works great. If you are running Linux, it's likely that most of the logging is already centralized and stored in binary form, so you need tools like <code>journalctl</code> to read it. If you ever had to correlate log files with the events of a JFR recording, you probably also asked yourself why everything isn't in the recording in the first place. Having everything in one place simplifies containers too.
</p>
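<p>
Conceptually, the logger implementation just maps each log call to a JFR event and commits it. A minimal sketch of the idea - a hypothetical event class, not JFRLog's actual implementation:
</p>
<pre><code class="language-java">
import jdk.jfr.Event;
import jdk.jfr.Label;
import jdk.jfr.Name;

@Name("log.Info")
@Label("Info Log Message")
public class LogInfoEvent extends Event {

    @Label("Message")
    public String message;

    @Label("Origin")
    public String origin;

    public static void main(String[] args) {
        LogInfoEvent event = new LogInfoEvent();
        event.message = "does it work?";
        event.origin = "dev.mbien.jfrlogtestproject.HelloJFRLog";
        // commit() is close to a no-op when no recording is running,
        // which keeps the logging overhead low
        event.commit();
    }
}
</code></pre>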
<h3>JFRLog</h3>
<p>
The project is available <a href="https://github.com/mbien/JFRLog">on github</a> under the MIT License. It requires JDK 8+ to run but JDK 14+ to build/test since the JUnit tests rely on the <a href="//mbien.dev/blog/entry/jfr-event-streaming-with-java">event streaming API</a>. JFRLog's only dependency is the SLF4J API.
</p>
<h3>quickstart</h3>
<p>
The basic 2.5 steps to get it working are:
</p>
<ol>
<li>replace your SLF4J compatible logging impl with slf4j-jfr-bridge-x.x.x.jar</li>
<li>start the JVM with JFR enabled (e.g. <code>-XX:StartFlightRecording:filename=logs/dump.jfr,dumponexit=true</code>) or enable recording later</li>
<li>check the flight recorder repository or recording dump for log.* events</li>
</ol>
<p>
<b>update:</b> <a href="https://search.maven.org/artifact/dev.mbien.jfrlog/slf4j-jfr-bridge">artifacts</a> of the <a href="https://search.maven.org/artifact/dev.mbien.jfrlog/slf4j-jfr-bridge/0.1.0/jar">first release</a> are now available on maven central.
</p>
<b>minimal maven dependencies:</b>
<pre><code class="language-xml">
<dependencies>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-api</artifactId>
<version>1.7.30</version>
</dependency>
<!-- depends on slf4j-api -->
<dependency>
<groupId>dev.mbien.jfrlog</groupId>
<artifactId>slf4j-jfr-bridge</artifactId>
<version>0.1.0</version>
</dependency>
</dependencies>
</code></pre>
<p>
(declaring slf4j-api explicitly is optional since it is already a transitive dependency of JFRLog, but it is good practice to depend on the interface and it also makes it more obvious what is going on)
</p>
<b>minimal example:</b>
<pre><code class="language-java">
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
public class HelloJFRLog {
private final static Logger log = LoggerFactory.getLogger(HelloJFRLog.class);
public static void main(String[] args) {
log.info("does it work?");
}
}
</code></pre>
<b>start with JFR enabled</b>, for example by using the following JVM flag:
<code>-XX:StartFlightRecording:filename=test_dump.jfr,dumponexit=true</code>
<p>
<b>inspect the flight record:</b>
</p>
<pre><code class="language-bash">
$ jfr print --events "log.*" test_dump.jfr
log.Info {
startTime = 11:52:15.954
message = "does it work?"
origin = "dev.mbien.jfrlogtestproject.HelloJFRLog"
throwable = N/A
eventThread = "main" (javaThreadId = 1)
}
</code></pre>
<p>
That's it! Log and JFR events all in one place.
</p>
<h3>configuration</h3>
<p>
Configuration is straightforward. You can set log levels via java -D arguments or via a <code>jfrlog.properties</code> file placed on the classpath. The -D arguments take priority over the configuration file to simplify testing.
</p>
example <code>jfrlog.properties</code>:
<pre><code class="language-bash">
# Sets MyKlass to debug, the rest of the package to error and the default log level
# to info (for everything which isn't explicitly configured).
# The most specific rule wins (order does not matter).
jfrlog.default=info
jfrlog.dev.cool.app=error
jfrlog.dev.cool.app.MyKlass=debug
</code></pre>
<p>
You can also disable entire log levels by disabling the corresponding events in the JFR recording profile if so desired.
</p>
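<p>
The most-specific-rule-wins resolution can be illustrated in a few lines of java. This is only a sketch of the lookup logic as described above, not JFRLog's actual implementation:
</p>
<pre><code class="language-java">
import java.util.Map;

public class LevelResolver {

    // walks from the full logger name up the package hierarchy
    // and falls back to jfrlog.default if nothing matches
    static String resolve(Map<String, String> config, String logger) {
        String name = logger;
        while (!name.isEmpty()) {
            String level = config.get("jfrlog." + name);
            if (level != null)
                return level;
            int dot = name.lastIndexOf('.');
            name = dot > 0 ? name.substring(0, dot) : "";
        }
        return config.getOrDefault("jfrlog.default", "info");
    }

    public static void main(String[] args) {
        Map<String, String> config = Map.of(
                "jfrlog.default", "info",
                "jfrlog.dev.cool.app", "error",
                "jfrlog.dev.cool.app.MyKlass", "debug");
        System.out.println(resolve(config, "dev.cool.app.MyKlass")); // debug
        System.out.println(resolve(config, "dev.cool.app.Other"));   // error
        System.out.println(resolve(config, "some.other.Class"));     // info
    }
}
</code></pre>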
<h3>this blog you are reading is using it right now</h3>
<p>
The Apache Roller blog is a big dusty webapp with many dependencies. Since every dependency uses its own favorite logging framework, it's a nice example of a near-worst-case scenario a.k.a. the LoggingHell (extends JarHell). Half of the Apache libs use the old Java Commons Logging layer, struts2 depends on log4j2 for some reason, some webjars use java.util.logging, eclipselink uses its own (!) logging but thankfully with a SLF4J compatibility extension, and roller used log4j1 at some point (the version in my <a href="https://github.com/mbien/roller">WIP</a> branch uses log4j2). Ah, and the web server needs some logging too, but thankfully Jetty uses the SLF4J layer so it doesn't care that much.
</p>
<p>
So, to see some text in a file, all that is needed is:
</p>
<ul>
<li><code>jcl-over-slf4j</code> to connect the JCL abstraction layer to the SLF4J layer for spring etc</li>
<li><code>log4j-to-slf4j</code> to make struts happy</li>
<li><code>jul-to-slf4j</code> for webjars (i think)</li>
<li><code>org.eclipse.persistence.extension</code> to bridge eclipselink's personal logger to SLF4J</li>
</ul>
<p>
Beautiful, all logs lead to SLF4J now. The only thing missing is the logging implementation itself. Usually you would have a <code>log4j-slf4j-impl</code> or <code>logback-classic</code> there - most likely.
But to enable JFRLog, all that is needed is to replace it with <code>slf4j-jfr-bridge</code> - and that is exactly what I did.
</p>
<p>
I also wrote some java (<a href="//mbien.dev/blog/entry/cleaning-bash-history-using-a">single file java program</a>) scripts using the JFR streaming API for easy inspection of the logging repository. (<s>but that is probably a story for another time</s> <b>update:</b> <a href="//mbien.dev/blog/entry/jfrlog-commandline-tools">CLI tools blog entry</a>)
</p>
<p>
How many new logging frameworks or abstraction layers will the current pandemic give us?
</p>
<p>
Let me know what you think.
</p>
<p>
fly safe (and keep JFR on)
</p>https://mbien.dev/blog/entry/configuring-eclipse-jetty-to-useConfiguring Eclipse Jetty to use Virtual Threadsmbien2020-08-05T12:16:33+00:002020-08-06T00:44:31+00:00<p>
A quick guide about how to configure Jetty to use <a href="https://wiki.openjdk.java.net/display/loom">Project Loom's</a> <a href="//mbien.dev/blog/entry/taking-a-look-at-virtual">virtual threads</a> instead of plain old java threads.
</p>
<p>
Jetty's default thread pool implementation can be swapped out by implementing Jetty's <code>ThreadPool</code> interface and passing an instance to the <code>Server</code> constructor. If you are using jetty standalone, everything is initialized via xml files.
</p>
<p>
Assuming you are using the recommended <a href="https://www.eclipse.org/jetty/documentation/current/startup-base-and-home.html">jetty home / jetty base</a> folder structure, all that is needed is to create <code>jetty-threadpool.xml</code> in <code>[jetty-base]/etc</code> containing the following:
</p>
<pre><code class="language-xml">
<Configure>
<New id="threadPool" class="dev.mbien.virtualthreads4jetty.VirtualThreadExecutor"/>
</Configure>
</code></pre>
<p>
and put a jar containing the custom <code>VirtualThreadExecutor</code> into <code>[jetty-base]/lib/ext</code>.
I uploaded a build to the release section of the <a href="https://github.com/mbien/vt4jetty">vt4jetty</a> github project.
</p>
<p>
If you don't have a lib/ext folder yet, you can enable it with:
</p>
<pre><code class="language-bash">
java -jar $JETTY_HOME/start.jar --add-to-start=ext
</code></pre>
<p>
Here is the code:
</p>
<pre><code class="language-java">
package dev.mbien.virtualthreads4jetty;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import org.eclipse.jetty.util.thread.ThreadPool;
/**
* Executes each task in a new virtual thread.
*
* <p>Java's default ForkJoinPool is used as scheduler. To influence carrier
* thread count use -Djdk.defaultScheduler.parallelism=N. Default is
* {@link Runtime#availableProcessors()}.
*
* @author mbien
*/
public class VirtualThreadExecutor implements ThreadPool {
private final ExecutorService executor;
public VirtualThreadExecutor() {
executor = Executors.newThreadExecutor(
Thread.builder().virtual().name("jetty-vt#", 0).factory());
// too early for logging libs
System.out.println("VirtualThreadExecutor is active.");
}
@Override
public void execute(Runnable command) {
executor.execute(command);
}
@Override
public void join() throws InterruptedException {
executor.shutdown();
executor.awaitTermination(3, TimeUnit.SECONDS);
}
// those are hopefully only used for stats/dashboards etc
@Override
public int getThreads() { return -1; }
@Override
public int getIdleThreads() { return -1; }
@Override
public boolean isLowOnThreads() { return false; }
}
</code></pre>
<p>
Tested with <b>JDK16-loom+4-56 (2020/7/25)</b> early access build from <a href="https://jdk.java.net/loom/">here</a> and latest Jetty.
</p>
<p>
I encountered some JVM crashes while load testing Apache Roller with virtual threads enabled - keep in mind this is still all very much work in progress.
</p>
https://mbien.dev/blog/entry/java-in-rootless-containers-with[Java in] Rootless Containers with Podmanmbien2020-08-01T08:39:50+00:002020-09-08T07:32:12+00:00<p>
I have always been a little surprised how quickly it became acceptable to run applications wrapped in containers as root processes. Nobody would have run a web server as root before docker became mainstream if there was some way to avoid it. But with docker it became OK to have the docker daemon and the container processes all running as root. The first item in most docker tutorials became how to elevate your user rights so that you don't have to type sudo before every docker command.
</p>
<p>
But this doesn't have to be the case, of course. One project I had an eye on was <a href="https://github.com/containers/podman">podman</a>, a container engine implementing the docker command-line interface with quite good support for rootless operation. With the release of Podman 2.0.x (and the fact that it is slowly making it into the <a href="https://tracker.debian.org/pkg/libpod">debian repositories</a>), I started to experiment with it a bit more. (For the experimental rootless mode of Docker, check out this <a href="https://docs.docker.com/engine/security/rootless/">page</a>.)
</p>
<h3>cgroups v2</h3>
<p>
Containers rely heavily on kernel namespaces and a feature called <a href="https://en.wikipedia.org/wiki/Cgroups">control groups</a>. To properly run rootless containers, the kernel must support cgroups v2 and have it enabled. To check whether cgroups v2 is enabled, simply run:
</p>
<pre><code class="language-bash">
ls /sys/fs/cgroup
cgroup.controllers cgroup.max.depth cgroup.max.descendants cgroup.procs ...
</code></pre>
<p>
If the files are prefixed with <code>cgroup.</code> you are running cgroups v2; if not, it's still v1.
</p>
<p>
Many distributions will still run with cgroups v1 enabled by default for backwards compatibility. But you can enable cgroups v2 with the systemd kernel flag: <code>systemd.unified_cgroup_hierarchy=1</code>. To do this with grub for example:
</p>
<ul>
<li>edit <code>/etc/default/grub</code> and</li>
<li>add <code>systemd.unified_cgroup_hierarchy=1</code> to the key <code>GRUB_CMDLINE_LINUX_DEFAULT</code> (space separated list)</li>
<li>then run <code>sudo grub-mkconfig -o /boot/grub/grub.cfg</code> and reboot.</li>
</ul>
<p>
... and make sure you are not running an ancient linux kernel.
</p>
<h3>crun</h3>
<p>
The underlying OCI implementation has to support cgroups v2 too. I tested mostly with <a href="https://github.com/containers/crun">crun</a>, which is a super fast and lightweight alternative to runc. The runtime can be passed to podman via the <code>--runtime</code> flag:
</p>
<pre><code class="language-bash">
podman --runtime /usr/bin/crun <commands>
</code></pre>
<p>
but it got picked up automatically in my case after I installed the package (manjaro linux, runc is still installed too).
</p>
<pre><code class="language-bash">
podman info | grep -A5 ociRuntime
ociRuntime:
name: crun
package: Unknown
path: /usr/bin/crun
version: |-
crun version 0.14.1
</code></pre>
<h3>subordinate uid and gids</h3>
The last step required to set up rootless containers is to configure <code>/etc/subuid</code> and <code>/etc/subgid</code>. If the files don't exist yet, create them and add a mapping range from your user name to container users.
For example, the line:
<pre><code class="language-bash">
duke:100000:65536
</code></pre>
<p>
gives duke the right to map 65536 user IDs in containers, starting from host UID 100000. Duke himself will be mapped by default to root (0) in the container. (The same must be done for groups in <code>/etc/subgid</code>.)
The range should never overlap with UIDs on the host system. Details in <code>man subuid</code>. More on users later in the volumes section.
</p>
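<p>
The mapping itself is simple arithmetic. A small sketch, assuming the subuid entry above (container UID 0 maps to the launching user, container UID n maps to the range start + n - 1):
</p>
<pre><code class="language-java">
public class UidMap {

    // container UID 0 -> own host UID; container UID n -> subuidStart + n - 1
    static long hostUid(long containerUid, long ownUid, long subuidStart) {
        return containerUid == 0 ? ownUid : subuidStart + containerUid - 1;
    }

    public static void main(String[] args) {
        // duke (UID 1000) with the subuid entry duke:100000:65536
        System.out.println(hostUid(0, 1000, 100000));   // 1000: root inside is duke outside
        System.out.println(hostUid(998, 1000, 100000)); // 100997: an unprivileged container user
    }
}
</code></pre>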
<h3>rootless containers</h3>
Some things to keep in mind:
<ul>
<li>rootless podman runs containers with fewer privileges than the user which started the container
<ul>
<li>some of these restrictions can be lifted (via <code>--privileged</code>, for example)</li>
<li>but rootless containers will never have more privileges than the user that launched them</li>
<li>root in the container is the user on the host</li>
</ul>
</li>
<li>rootless containers have no IP or MAC address, because network device association requires root privileges
<ul>
<li>podman uses <a href="https://github.com/rootless-containers/slirp4netns">slirp4netns</a> for user mode networking</li>
<li>pinging something from within a container won't work out of the box - but don't panic: it can be <a href="https://github.com/containers/podman/blob/master/troubleshooting.md#5-rootless-containers-cannot-ping-hosts">configured</a> if desired</li>
</ul>
</li>
</ul>
<h3>podman</h3>
Podman uses the same command-line interface as Docker and it also understands Dockerfiles. So if everything is configured correctly it should all look familiar:
<pre><code class="language-bash">
$ podman version
Version: 2.0.2
API Version: 1
Go Version: go1.14.4
Git Commit: 201c9505b88f451ca877d29a73ed0f1836bb96c7
Built: Sun Jul 12 22:46:58 2020
OS/Arch: linux/amd64
$ podman pull debian:stable-slim
...
$ podman images
REPOSITORY TAG IMAGE ID CREATED SIZE
docker.io/library/debian stable-slim 56fae066253c 4 days ago 72.5 MB
...
$ podman run --rm debian:stable-slim cat /etc/debian_version
10.4
</code></pre>
<p>
Setting <code>alias docker=podman</code> allows existing scripts to be reused without modification - but I stick here to podman to not cause confusion.
</p>
<h3>container communication</h3>
<p>
Rootless containers don't have their own IP addresses, but you can bind them to host ports (>1024). Host-to-container communication therefore works just like communicating with any service running on the host.
</p>
<pre><code class="language-bash">
$ podman run --name wiki --rm -d -p 8443:8443 jspwiki-jetty
$ podman port -a
fd4c06b454ee 8443/tcp -> 0.0.0.0:8443
$ firefox https://localhost:8443/wiki
</code></pre>
<p>
To set up quick and dirty container-to-container communication, you can let containers communicate over the IP address (or host name) and ports of the host, if the firewall is OK with that. But a more maintainable approach is to use pods. A pod is a group of containers which belong together - basically an infrastructure container containing the actual containers. All containers in a pod share the same localhost and use it for pod-local communication. The outside world is reached via opened ports on the pod.
</p>
<p>
Let's say we have a blog and a db. The blog requires the db, but all the host cares about is the https port of the blog container. So we can simply put the blog container and the db container into a blog-pod and let both communicate via the pod-local localhost (podhost?). The https port is opened on the blog-pod for the host, while the db isn't reachable from the outside.
</p>
<pre><code class="language-bash">
$ podman pod create --name blogpod -p 8443:8443
# note: a pod starts out with one container already in it,
# it is the infrastructure container - basically the pod itself
$ podman pod list
POD ID NAME STATUS CREATED # OF CONTAINERS INFRA ID
39ad88b8892f blogpod Created 7 seconds ago 1 af7baf0e7fde
$ podman run --pod blogpod --name blogdb --rm -d blog-db
$ podman run --pod blogpod --name apacheroller --rm -d roller-jetty
$ podman pod list
POD ID NAME STATUS CREATED # OF CONTAINERS INFRA ID
39ad88b8892f blogpod Created 30 seconds ago 3 af7baf0e7fde
$ firefox https://localhost:8443/blog
</code></pre>
<p>
Now we already have two containers able to communicate with each other and a host which is able to communicate with a container in the pod - and no sudo in sight.
</p>
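<p>
From the application's perspective, pod-local communication is nothing special: the blog simply connects to localhost, where the db appears to be listening. A self-contained sketch simulating both sides in one process (the port is picked automatically here; a real setup would use the db's fixed port):
</p>
<pre><code class="language-java">
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class PodLocalDemo {

    // stand-in for the db container listening on the pod-shared loopback interface
    public static String ping() throws IOException, InterruptedException {
        try (ServerSocket db = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread server = new Thread(() -> {
                try (Socket s = db.accept()) {
                    s.getOutputStream().write("pong".getBytes());
                } catch (IOException ex) {
                    throw new RuntimeException(ex);
                }
            });
            server.start();
            // stand-in for the blog container connecting via localhost:port
            try (Socket s = new Socket(InetAddress.getLoopbackAddress(), db.getLocalPort())) {
                String reply = new String(s.getInputStream().readAllBytes());
                server.join();
                return reply;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(ping());
    }
}
</code></pre>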
<h3>volumes and users</h3>
<p>
We already know that the user on the outside is root on the inside, but let's quickly check just to be sure:
</p>
<pre><code class="language-bash">
$ whoami
duke
$ id -u
1000
$ mkdir /tmp/outside
$ podman run --rm -it -v /tmp/outside:/home/inside debian:stable-slim bash
root@2fbc9edaa0ee:/$ id -u
0
root@2fbc9edaa0ee:/$ touch /home/inside/hello_from_inside && exit
$ ls -l /tmp/outside
-rw-r--r-- 1 duke duke 0 31. Jul 06:55 hello_from_inside
</code></pre>
<p>
Indeed, duke's UID of 1000 was mapped to 0 on the inside.
</p>
<p>
Since we are using rootless containers and not half-rootless containers, we can let the blog and the db run under their own users inside their containers too - but what if they write logs to mounted volumes? That is where the subuid and subgid files we configured earlier come into play.
</p>
<p>
Let's say the blog-db container process should run in the namespace of the user dbduke. Since dbduke doesn't have root rights on the inside (as intended), dbduke also won't have the rights to write to the mounted volume, which is owned by duke. The easiest way to solve this problem is to simply chown the volume folder on the host to the mapped user of the container.
</p>
<pre><code class="language-bash">
# script starts blog-db
# query the UID from the container and chown the volumes folder
UID_INSIDE=$(podman run --name UID_probe --rm blog-db /usr/bin/id -u)
podman unshare chown -R $UID_INSIDE /path/to/volume
podman run -v /path/to/volume:/path/inside ... blog-db
</code></pre>
<p>
Podman ships with a tool called unshare (the name is going to make less sense the longer you think about it) which lets you execute commands in the namespace of a different user. The <code>podman unshare</code> command uses duke's rights to chown a folder to the internal UID of dbduke.
</p>
<p>
If we check the folder rights from both sides, we see that the UID was mapped from:
</p>
<pre><code class="language-bash">
podman run --name UID_probe --rm blog-db /usr/bin/id -u
998
</code></pre>
to
<pre><code class="language-bash">
$ ls -l volume/
drwxr-xr-x 2 100997 100997 4096 31. Jul 07:54 logs
</code></pre>
on the outside, which is within the range specified in the /etc/subuid file - so everything works as intended. This allows user namespace isolation between containers (dbduke, wikiduke, etc.) and also between the containers and the host user who launched them (duke himself).
<p>
And still no sudo in sight.
</p>
<h3>memory and cpu limits [and java]</h3>
Memory limits should work out of the box in rootless containers:
<pre><code class="language-bash">
$ podman run -it --rm -m 256m blog-db java -Xlog:os+container -version
[0.003s][info][os,container] Memory Limit is: 268435456
...
</code></pre>
<p>
This allows the JVM to make smarter choices without having to provide absolute <code>-Xmx</code> flags (but you still can).
</p>
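<p>
What the JVM derives from the container limits can be checked from within java itself. A small sketch (the actual values depend on the container configuration and the JVM's default <code>MaxRAMPercentage</code>):
</p>
<pre><code class="language-java">
public class ContainerInfo {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // with -m 256m the max heap will be a fraction of the 256 MiB container limit
        System.out.println("max heap: " + rt.maxMemory() / (1024 * 1024) + " MiB");
        // respects --cpuset-cpus and cpu quota settings
        System.out.println("available processors: " + rt.availableProcessors());
    }
}
</code></pre>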
<p>
Setting CPU limits might not work out of the box without root (tested on Manjaro which is basically Arch), since the cgroups config might have user delegation turned off by default. But it is very easy to change:
</p>
<pre><code class="language-bash">
# assuming your user id is 1000 like duke
$ sudo systemctl edit user@1000.service
# now modify the file so that it contains
[Service]
Delegate=yes
# and check if it worked
$ cat /sys/fs/cgroup/user.slice/user-1000.slice/cgroup.controllers
cpuset cpu io memory pids
</code></pre>
<p>
You might have to reboot - it worked right away in my case.
</p>
<pre><code class="language-bash">
# default CPU settings uses all cores
$ podman run -it --rm blog-db sh -c\
"echo 'Runtime.getRuntime().availableProcessors();/exit' | jshell -q"
jshell> Runtime.getRuntime().availableProcessors()$1 ==> 4
# assign specific cores to container
$ podman run -it --rm --cpuset-cpus 1,2 blog-db sh -c\
"echo 'Runtime.getRuntime().availableProcessors();/exit' | jshell -q"
jshell> Runtime.getRuntime().availableProcessors()$1 ==> 2
</code></pre>
<p>
Container CPU core limits should become less relevant in the java world going forward, especially once projects like <a href="https://wiki.openjdk.java.net/display/loom/Main">Loom</a> [<a href="//mbien.dev/blog/entry/taking-a-look-at-virtual">blog post</a>] have been integrated. Since most things in java will run on virtual threads on top of a static carrier thread pool, it will be really easy to restrict the parallelism level of a JVM (basically <code>-Djdk.defaultScheduler.parallelism=N</code> and maybe another flag to limit max GC thread count).
</p>
<p>
But it works if you need it for rootless containers too.
</p>
<h3>class data sharing</h3>
Podman uses fuse-overlayfs for image management by default, which is overlayfs running in user mode.
<pre><code class="language-bash">
$ podman info | grep -A5 overlay.mount_program
overlay.mount_program:
Executable: /usr/bin/fuse-overlayfs
Package: Unknown
Version: |-
fusermount3 version: 3.9.2
fuse-overlayfs: version 1.1.0
</code></pre>
<p>
This means that JVM <a href="//mbien.dev/blog/entry/dynamic-application-class-data-sharing">class data sharing</a> is also supported out of the box if the image containing the class data archive is shared in the image graph between multiple rootless containers.
</p>
<p>
The class data stored in debian-slim-jdk (a local image I created) will be mapped into memory only once and shared between all child containers - which in the example below are blog-db, roller-jetty and wiki-jetty.
</p>
<pre><code class="language-bash">
$ podman image tree --whatrequires debian-slim-jdk
Image ID: 57c885825969
Tags: [localhost/debian-slim-jdk:latest]
Size: 340.1MB
Image Layers
└── ID: ab5467b188d7 Size: 267.6MB Top Layer of: [localhost/debian-slim-jdk:latest]
├── ID: fa04d6485aa5 Size: 7.68kB
│ └── ID: 649de1f63ecc Size: 11.53MB Top Layer of: [localhost/debian-slim-jdk-jetty:latest]
│ ├── ID: 34ce3917399d Size: 8.192kB
│ │ └── ID: d128b826459d Size: 56.7MB Top Layer of: [localhost/roller-jetty:latest]
│ ├── ID: 9a9c51927e42 Size: 8.192kB
│ └── ID: d5c7176c1073 Size: 27.56MB Top Layer of: [localhost/wiki-jetty:latest]
├── ID: 06acb45dd590 Size: 7.68kB
└── ID: 529ff7e97882 Size: 1.789MB Top Layer of: [localhost/blog-db:latest]
</code></pre>
<h3>stopping java containers cleanly</h3>
<p>
A little tip at the end: in my experience, JVMs are quite often launched from scripts in containers. This automatically has the side effect that the JVM process won't be PID 1, since it isn't the entry point. Commands like <code>podman stop <container></code> will post SIGTERM to the shell script (!), wait 10s, and then simply kill the container processes without the JVM ever knowing what was going on. More on that in the <a href="//mbien.dev/blog/entry/stopping-containers-correctly">stopping containers</a> blog entry.
</p>
<p>
Shutdown actions like dumping the JFR file won't be executed and IO writes might not have completed. So unless you trap the signal in the script and forward it to the JVM somehow, a more direct way to stop a java container is needed:
</p>
<pre><code class="language-bash">
# script to cleanly shut down a container
$ podman exec -it $(container) sh -c\
"kill \$(/jdk/bin/jps | grep -v Jps | cut -f1 -d' ')"
#... or if killall is installed (it usually isn't)
$ podman exec -it $(container) killall java
</code></pre>
<p>
Once the JVM has cleanly shut down, the launch script will finish; as soon as PID 1 is gone, the container will notice and shut down cleanly too.
</p>
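<p>
The reason a plain signal is enough: the JVM runs all registered shutdown hooks before exiting, and features like JFR's dump-on-exit rely on the same mechanism. A minimal sketch (the hook also runs on normal exit, but not on SIGKILL - which is why the stop signal has to actually reach the JVM):
</p>
<pre><code class="language-java">
public class CleanShutdown {

    static final Thread cleanup = new Thread(() ->
            System.out.println("cleanup: flushing IO, completing the JFR dump"));

    public static void main(String[] args) {
        // invoked on normal termination and on SIGTERM/SIGINT, skipped on SIGKILL
        Runtime.getRuntime().addShutdownHook(cleanup);
        System.out.println("app running - stop the container to trigger the hook");
    }
}
</code></pre>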
- - -
<p>
<a href="https://www.youtube.com/watch?v=T5VicNzqsQA">fly safe podman</a>
</p>https://mbien.dev/blog/entry/taking-a-look-at-virtualTaking a look at Virtual Threads (Project Loom)mbien2020-05-22T11:23:48+00:002020-08-04T14:51:57+00:00<p>
<a href="https://wiki.openjdk.java.net/display/loom/Main">Project Loom</a> introduces lightweight, JVM managed, virtual threads (old name: fibers) to java. Lets take a look how the project is progressing and see how they compare to plain old OS managed threads.
</p>
<p>
Loom is currently a separate project based on JDK 15, but since there is no JEP available yet and the deadline is approaching, it is unlikely that it will land in the main repository as a preview feature before JDK 16. Early access binaries are available from <a href="https://jdk.java.net/loom/">jdk.java.net/loom</a>. I used <b>Build 15-loom+7-141 (2020/5/11)</b> for the experiments in this blog entry - <b>be warned, the API is not final</b>.
</p>
<h3>virtual threads</h3>
<p>
Virtual threads are lightweight threads scheduled by the java runtime to run on plain old java threads ("real" threads). The threads used to run virtual threads are called carrier threads. While plain old java threads (POTs) can be fairly heavyweight due to the fact that they represent OS threads, millions of virtual threads can be spawned without causing problems.
</p>
<p>
The main feature however is that a virtual thread doesn't block its current carrier thread on blocking operations, like IO (<a href="https://openjdk.java.net/jeps/353">Sockets</a>, NIO channels...) or blocking <code>java.util.concurrent</code> API calls (Semaphores, Queues, Locks, <code>Future.get()</code>...) and even <code>Thread.sleep()</code>. Instead of blocking, the carrier will mount and resume a different virtual thread while the blocked virtual thread is waiting for a resource or an event to occur. Once the virtual thread is no longer blocked, it simply resumes execution on the next available carrier thread.
</p>
<p>
This should allow more efficient use of the CPU and additionally reduce the total number of POTs: a thread running on a core which would normally be idle while waiting for a resource can now work on something else, by replacing the blocked virtual thread with another one which isn't blocked.
</p>
<p>
some properties:
</p>
<ul>
<li>virtual threads are java entities, independent of OS threads</li>
<li><code>java.lang.Thread</code> is used for both kinds of threads, virtual and OS</li>
<li>all virtual threads are daemons</li>
<li>spawning and blocking virtual threads is cheap</li>
<li>virtual threads require carrier threads to run on
<ul>
<li>a carrier thread runs a virtual thread by mounting it</li>
<li>if the VT blocks, the stack is stored and the VT is unmounted to be resumed later</li>
</ul>
</li>
<li><code>j.u.c.Executor</code> like a <code>ForkJoinPool</code> or <code>ThreadPoolExecutor</code> is used to schedule VTs to carriers
<ul>
<li>custom schedulers can be provided by implementing the <code>Executor</code> interface</li>
</ul>
</li>
<li>millions of virtual threads can run on few carrier threads</li>
<li><code>Continuation</code>s (basically a VT without a scheduler) won't be in the initial release but might appear later</li>
<li>serialization is planned but currently low priority</li>
</ul>
<p>
edge cases:
</p>
<ul>
<li><code>ThreadLocal</code>s must be used with care but will still work
<ul>
<li><code>Thread.Builder#disallowThreadLocals()</code> can be used to prohibit it entirely</li>
<li>better solutions like Scopes, Carrier- or ProcessorLocals might be implemented in future</li>
</ul>
</li>
<li>some situations will cause pinning which will block the carrier if the virtual thread blocks while pinned
<ul>
<li>native stack frames will cause pinning</li>
<li>blocking a VT while holding a monitor (i.e. <code>synchronized</code> block) will currently block the carrier
<ul>
<li>this might be only a temporary limitation</li>
<li>doesn't apply to alternatives like <code>j.u.c.ReentrantLock</code> which can be used instead</li>
</ul>
</li>
<li><code>-Djdk.tracePinnedThreads=short</code> or <code>-Djdk.tracePinnedThreads=full</code> will log pinned threads</li>
</ul>
</li>
</ul>
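<p>
The monitor pinning caveat above means that guarded sections intended to run on virtual threads are better expressed with <code>j.u.c.ReentrantLock</code>. The pattern itself is identical on plain old threads (shown here so it runs on any JDK):
</p>
<pre><code class="language-java">
import java.util.concurrent.locks.ReentrantLock;

public class LockedCounter {

    private static final ReentrantLock lock = new ReentrantLock();
    private static int counter = 0;

    // blocking on a ReentrantLock parks a virtual thread without pinning its carrier,
    // unlike blocking inside a synchronized block (at the time of writing)
    static void increment() {
        lock.lock();
        try {
            counter++;
        } finally {
            lock.unlock();
        }
    }

    static int run(int threads, int incrementsPerThread) throws InterruptedException {
        counter = 0; // reset so repeated runs are deterministic
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++)
                    increment();
            });
            workers[i].start();
        }
        for (Thread worker : workers)
            worker.join();
        return counter;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(4, 1000)); // 4000
    }
}
</code></pre>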
<p>
for more info: [State of Loom <a href="https://cr.openjdk.java.net/~rpressler/loom/loom/sol1_part1.html">1</a>, <a href="https://cr.openjdk.java.net/~rpressler/loom/loom/sol1_part2.html">2</a>]
[<a href="https://cr.openjdk.java.net/~rpressler/loom/Loom-Proposal.html">Loom Proposal</a>] [<a href="https://mail.openjdk.java.net/pipermail/loom-dev/">loom-dev</a> mailing list] [<a href="https://download.java.net/java/early_access/loom/docs/api/">ea javadoc</a>]
</p>
<h3>a quick test</h3>
<p>
Since Loom is implemented as a preview feature, the flag <b>--enable-preview</b> has to be passed to both javac and also to the JVM at launch. This will load the preview module and tell the JVM that it is ok to run bytecode which has been compiled with preview features. This should reduce the risk of it accidentally landing on production machines via a maven repository :).
</p>
<pre><code class="language-java">
public static void main(String[] args) {
Thread.startVirtualThread(() -> {
System.out.println("Hello Loom from "+Thread.currentThread()+"!");
});
}
</code></pre>
output:
<pre>
Hello Loom from VirtualThread[<unnamed>,ForkJoinPool-1-worker-3,CarrierThreads]!
</pre>
<p>
The code above attaches a <code>Runnable</code> via <code>Thread.startVirtualThread(Runnable task)</code> to a new virtual thread and schedules it for execution on the global carrier thread pool. The output shows that the carrier thread pool in use is in fact a <a href="https://docs.oracle.com/en/java/javase/14/docs/api/java.base/java/util/concurrent/ForkJoinPool.html"><code>j.u.c.ForkJoinPool</code></a> which has a work-stealing scheduler. The size of the global carrier pool can be set with the flag <b>-Djdk.defaultScheduler.parallelism=N</b>, the default is set to the available core count (or hardware thread count or whatever the container is configured to return).
</p>
<h3>a better test</h3>
<p>
The following example can run tasks on either a classic fixed-size thread pool (with POTs) or on an unbounded virtual thread pool. The virtual thread pool attaches each task to a new virtual thread and uses the fixed-size thread pool as carrier pool.
The tasks consist of a simulated IO part and a computational part, the carrier thread count and the number of tasks can be adjusted.
</p>
<p>This is no benchmark or load test, but rather an attempt to demonstrate the differences between the two thread types.</p>
<pre><code class="language-java">
public static void main(String[] args) throws InterruptedException {

    final boolean USE_VIRTUAL_THREADS = true;
    final int CARRIER_THREAD_COUNT = 1;
    final int TASK_COUNT = 2;

    // plain old thread factory and thread pool using the new builder
    ThreadFactory carrierTF = Thread.builder().name("carrier#", 0)
                                              .daemon(true).factory();
    ExecutorService carrierPool = Executors.newFixedThreadPool(
            CARRIER_THREAD_COUNT, carrierTF);

    ExecutorService executor;
    if (USE_VIRTUAL_THREADS) {
        // factory for virtual threads scheduled on the carrier pool
        ThreadFactory virtualTF = Thread.builder()
                                        .virtual(carrierPool)
                                        .name("virtual#", 0).factory();
        // thread executor will spawn a new virtual thread for each task
        executor = Executors.newThreadExecutor(virtualTF);
    } else {
        executor = carrierPool;
    }

    for (int i = 0; i < TASK_COUNT; i++)
        executor.submit(new WaitAndHurry());

    executor.shutdown();
    executor.awaitTermination(20, TimeUnit.SECONDS); // virtual threads are daemons
}
</code></pre>
<p>
The task itself is less interesting:
</p>
<pre class="compact_scroll"><code class="language-java">
private final static class WaitAndHurry implements Runnable {

    private final static long START_TIME = System.nanoTime();

    @Override
    public void run() {
        doIO();   // block for 2s
        doWork(); // compute something for ~2s
        print("done");
    }

    private void doIO() {
        print("io");
        try {
            Thread.sleep(2000);
        } catch (InterruptedException ex) {
            throw new RuntimeException(ex);
        }
    }

    private void doWork() {
        print("work");
        long number = 479001599;
        boolean prime = true;
        for (long i = 2; i <= number/2; ++i) {
            if (number % i == 0) {
                prime = false;
                break;
            }
        }
        if (!prime) {throw new RuntimeException("wrong result");} // prevent the JIT from optimizing everything away
    }

    private void print(String msg) {
        double elapsed = (System.nanoTime() - START_TIME) / 1_000_000_000.0d;
        String timestamp = String.format("%.2fs", elapsed);
        System.out.println(timestamp + " " + Thread.currentThread() + " " + msg);
    }
}
</code></pre>
<br/>
<h3>
output for 1 carrier thread and 2 tasks attached to virtual threads:
</h3>
<pre>
0.00s VirtualThread[virtual#0,carrier#0,main] io
0.01s VirtualThread[virtual#1,carrier#0,main] io
2.03s VirtualThread[virtual#0,carrier#0,main] work
3.88s VirtualThread[virtual#0,carrier#0,main] done
3.88s VirtualThread[virtual#1,carrier#0,main] work
5.67s VirtualThread[virtual#1,carrier#0,main] done
</pre>
Knowing that the IO part of the task takes 2s and the computational part about 1.8s (on my system, without warmup), we can derive a timeline from the timestamps:
<pre>
VT0: |WAIT||WORK|
VT1: |WAIT| |WORK|
</pre>
If we view the carrier thread as a resource we can draw a less abstract version which is closer to reality:
<pre>
CT0: |IDLE||WORK||WORK|
VT0: |WAIT| .
VT1: |WAIT| .
</pre>
<p>
This shows that virtual threads already have the ability to wait in parallel, even when running on just a single carrier thread. The carrier thread is also the only entity able to do actual work, since it can only mount one virtual thread at a time.
</p>
<p>
Rule of thumb: virtual threads are concurrent waiters while real threads are concurrent workers.
</p>
Classic thread pool for reference using a single thread:
<pre>
0.00s Thread[carrier#0,5,main] io
2.02s Thread[carrier#0,5,main] work
3.84s Thread[carrier#0,5,main] done
3.84s Thread[carrier#0,5,main] io
5.84s Thread[carrier#0,5,main] work
7.67s Thread[carrier#0,5,main] done
CT0: |WAIT||WORK||WAIT||WORK|
</pre>
<p>
Sequential as expected.
</p>
<br/>
<h3>
let's bump it to 2 carrier threads and 4 tasks:
</h3>
<pre>
0.02s VirtualThread[virtual#0,carrier#0,main] io
0.03s VirtualThread[virtual#2,carrier#0,main] io
0.03s VirtualThread[virtual#3,carrier#0,main] io
0.02s VirtualThread[virtual#1,carrier#1,main] io
2.03s VirtualThread[virtual#0,carrier#0,main] work
2.04s VirtualThread[virtual#2,carrier#1,main] work
3.85s VirtualThread[virtual#2,carrier#1,main] done
3.85s VirtualThread[virtual#3,carrier#1,main] work
3.86s VirtualThread[virtual#0,carrier#0,main] done
3.86s VirtualThread[virtual#1,carrier#0,main] work
5.63s VirtualThread[virtual#3,carrier#1,main] done
5.69s VirtualThread[virtual#1,carrier#0,main] done
VT0: |WAIT||WORK|
VT1: |WAIT||WORK|
VT2: |WAIT| |WORK|
VT3: |WAIT| |WORK|
CT0: |IDLE||WORK||WORK|
CT1: |IDLE||WORK||WORK|
VT0: |WAIT| .
VT1: |WAIT| .
VT2: |WAIT| .
VT3: |WAIT| .
</pre>
<p>
Now we gained the ability to work in parallel on two threads, while all virtual threads wait in parallel - best of both worlds.
</p>
<p>
Classic thread pool for reference using 2 threads:
</p>
<pre>
0.00s Thread[carrier#1,5,main] io
0.00s Thread[carrier#0,5,main] io
2.03s Thread[carrier#1,5,main] work
2.03s Thread[carrier#0,5,main] work
3.87s Thread[carrier#0,5,main] done
3.87s Thread[carrier#0,5,main] io
3.88s Thread[carrier#1,5,main] done
3.88s Thread[carrier#1,5,main] io
5.87s Thread[carrier#0,5,main] work
5.88s Thread[carrier#1,5,main] work
7.67s Thread[carrier#0,5,main] done
7.70s Thread[carrier#1,5,main] done
CT0: |WAIT||WORK||WAIT||WORK|
CT1: |WAIT||WORK||WAIT||WORK|
</pre>
<p>
No surprises.
</p>
<br/>
<h3>real threads in a virtual world</h3>
<p>
Virtual threads implicitly convert blocking APIs into an async/await pattern - and you won't even have to be aware of it as a user of an API (most of the time, at least). Entire callback-based frameworks (buzzword: reactive) could be made obsolete, since their main purpose has always been to spare programmers from dealing with concurrency problems, often even accepting that nothing can run in parallel in them (only parallel waiting happens behind the scenes, <a href="https://wiki.python.org/moin/GlobalInterpreterLock">basically like in python</a>, or like virtual threads on a single carrier in our example). Even Node.js received basic worker_threads in v12, in a language which is single threaded by design (data is copied to the worker when it starts and copied back again in a callback when the job is done).
</p>
<p>
Java, on the other hand, has been multi-threaded since the beginning (25 years ago; time flies) and is only now getting virtual threads (if you don't count the <a href="https://en.wikipedia.org/wiki/Green_threads">green threads</a> of Java 1.1). Since virtual threads use the same <code>java.lang.Thread</code> class as OS threads do, they are pretty much interchangeable and can keep using established APIs. Asynchronous IO APIs will hopefully be used less often in the future, because code which does async IO today can be made less error-prone and easier to read by using simple blocking IO APIs from within virtual threads.
</p>
<p>
Plain old Java threads will most likely still have a purpose (beside being carriers) in the future, however: not every long-running background task which periodically reads a few bytes from a file will benefit from being virtual, and limitations like pinning (native stack frames in a virtual thread also block the carrier) will probably always require some additional POTs for special cases.
</p>
<br/>
<h3>summary</h3>
<p>
Project Loom made significant progress recently and is already in a fairly usable state. I am looking forward to it being integrated into the main repository (hopefully in JDK 16 or 17).
Virtual threads have the potential to be a big game changer for Java: better concurrency while using fewer OS threads and without significant code changes - what more could you ask for.
</p>
<p>
Debugging a few million virtual threads is going to be interesting; thread dumps of the future will require a bit more tooling, e.g. hierarchical views, or at least a good scroll wheel on the mouse :)
</p>https://mbien.dev/blog/entry/tracking-apache-roller-page-viewsTracking Apache Roller page views using JFR custom eventsmbien2020-03-09T01:02:48+00:002020-05-28T11:09:55+00:00<p>
The Java Flight Recorder is great for monitoring the JVM, but after checking out the <a href ="//mbien.dev/blog/entry/jfr-event-streaming-with-java">new JFR Event Streaming in JDK 14</a>, I was curious how well it would work in practice for basic logging needs. I always wanted to generate some page view statistics for this blog but never liked the idea of tracking users in their browsers (or at least those few who turned off tracking protection for some reason or used an outdated browser).
</p>
<p>
So let's see whether we can record custom page view events using JFR instead of logging or storing stats in a database.
</p>
<p>
The goal is to record the link the anonymous reader clicked to get to the page (or null) and the actual page on the blog which is visited.
</p>
<h3>Custom Event</h3>
<pre><code class="language-java">
@Label("Incoming Referrer Event")
@Description("Reader landed on the Blog.")
@StackTrace(false)
public class IncomingReferrerEvent extends Event {

    @Label("Referrer URL")
    private final String referrerUrl;

    @Label("Request URL")
    private final String requestUrl;

    public IncomingReferrerEvent(String referrerUrl, String requestUrl) {
        this.referrerUrl = referrerUrl;
        this.requestUrl = requestUrl;
    }
}
</code></pre>
<p>
Defining a custom JFR event is trivial: simply extend <b>jdk.jfr.Event</b> and annotate the fields - the code should be self-explanatory. <b>@StackTrace(false)</b> disables the recording of stack traces since we don't need them for this event. Custom events are enabled by default. The event name defaults to the fully qualified class name, but can be overwritten with <b>@Name("dev.foo.NewName")</b>.
</p>
<pre><code class="language-java">
new IncomingReferrerEvent(referrerUrl, requestUrl).commit();
</code></pre>
<p>
This records the event (placed somewhere in doGet() of Apache Roller's HttpServlet). Events can have a beginning and an end; by calling <b>commit()</b> without <b>begin()</b> or <b>end()</b>, only the timestamp at commit and the event values are stored. Due to the design of JFR you don't really have to worry about concurrency, because events are held in thread-local buffers before they are flushed to disk, which should make recording fairly cheap even in highly parallel settings.
</p>
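<p>
Since <code>jdk.jfr.Event</code> also offers <b>begin()</b> and <b>end()</b>, the same mechanism can measure durations explicitly. A minimal, self-contained sketch (the event name and field are made up for illustration):
</p>

```java
import jdk.jfr.Event;
import jdk.jfr.Label;
import jdk.jfr.Name;

public class DurationEventDemo {

    @Name("demo.PageRender")
    @Label("Page Render")
    static class PageRenderEvent extends Event {
        @Label("Request URL")
        String requestUrl;
    }

    public static void main(String[] args) throws InterruptedException {
        PageRenderEvent event = new PageRenderEvent();
        event.requestUrl = "/blog/entry/example";
        event.begin();     // records the start timestamp
        Thread.sleep(10);  // the work being measured
        event.end();       // records the end timestamp
        event.commit();    // a no-op unless a recording is running
    }
}
```

<p>
With an explicit begin/end, settings like <b>threshold</b> can later filter out events that were too short to be interesting.
</p>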
<h3>Configuration</h3>
<p>
The next step is to setup a JFR configuration file.
</p>
<pre><code class="language-xml">
<?xml version="1.0" encoding="UTF-8"?>
<configuration version="2.0" label="Blog" description="Roller JFR Events" provider="Duke Himself">
<event name="org.apache.roller.weblogger.jfr.IncomingReferrerEvent">
<setting name="enabled">true</setting>
</event>
</configuration>
</code></pre>
<p>
I could have used the default configuration provided by the JDK, but this would lead to much larger file sizes after some uptime. The configuration above only enables our custom event; everything else is turned off. For a larger blueprint see <b>JDK_HOME/lib/jfr/default.jfc</b>.
</p>
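<p>
Besides <b>enabled</b>, JDK events support further settings such as <b>stackTrace</b>, <b>threshold</b> or <b>period</b> (default.jfc is the best reference). For example, an entry like the following would record monitor contention longer than 10 ms, including stack traces:
</p>

```xml
<event name="jdk.JavaMonitorEnter">
    <setting name="enabled">true</setting>
    <setting name="stackTrace">true</setting>
    <setting name="threshold">10 ms</setting>
</event>
```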
<p>
Once the server is running, we can switch the recording on using jcmd ($1 is the PID):
</p>
<pre><code class="language-bash">
jcmd $1 JFR.configure repositorypath="/home/server/jfr" samplethreads=false
jcmd $1 JFR.start name="blogstats" settings="/home/server/jfr/blog.jfc"
</code></pre>
<h3>First Test - 8 MB, not great, not terrible</h3>
<p>
After a quick test we can plot the summary of the recording file in the repository to check if we see any events.
</p>
<pre><code class="language-bash">
jfr summary 2020_02_03_04_35_07_519/2020_02_07_16_46_39.jfr

 Version: 2.1
 Chunks: 1
 Start: 2020-02-07 15:46:39 (UTC)

 Event Type                                              Count  Size (bytes)
==============================================================================
 jdk.CheckPoint                                         130997      12587784
 org.apache.roller.weblogger.jfr.IncomingReferrerEvent      66          4735
 jdk.Metadata                                                1         81383
 jdk.ThreadEnd                                               0             0
 jdk.TLSHandshake                                            0             0
 jdk.ThreadSleep                                             0             0
 jdk.ThreadPark                                              0             0
 jdk.JavaMonitorEnter                                        0             0
 ...
</code></pre>
<p>
There it is, we can see <b>IncomingReferrerEvent</b> in the record. You might be wondering what those <b>jdk.CheckPoint</b> events are. I wasn't sure either since I noticed them the first time with JDK 14 and a websearch resulted in no hits, so I <a href="https://mail.openjdk.java.net/pipermail/hotspot-jfr-dev/2020-February/001153.html">asked on hotspot-jfr-dev</a>. It turns out that they are part of the file format - for more details read the <a href="https://mail.openjdk.java.net/pipermail/hotspot-jfr-dev/2020-February/001154.html">reply</a> or even better the rest of the mail thread too.
</p>
<p>
The check points will create around 8 MB of data per day (one check point per second), which can be seen as static footprint without recording any actual events. 8 MB, not great, not terrible.
</p>
<h3>More Events</h3>
<p>
Having a good estimate of the static footprint, it was time to enable more events: correlating IncomingReferrerEvents with GC events, network/CPU load or thread count might be interesting, given that everything runs on a <a href="//mbien.dev/blog/entry/running-64bit-debian-buster-with">Raspberry Pi 3b+</a> with limited resources. I ended up enabling some more events after grep'ing the VisualVM code base for event strings (it looks like JFR support is coming soon - the custom build I am using looks fairly complete already) and adding them to the blog.jfc config file.
</p>
<pre><code class="language-bash">
jfr summary blog_dump06.jfr
...
 Event Type                                              Count  Size (bytes)
==============================================================================
 jdk.CheckPoint                                        1223057     118202307
 jdk.NetworkUtilization                                 264966       4731868
 jdk.CPULoad                                            242765       6198487
 jdk.ActiveSetting                                        3348        123156
 org.apache.roller.weblogger.jfr.IncomingReferrerEvent    6367         89338
 jdk.GCHeapSummary                                         830         35942
 jdk.GarbageCollection                                     415         12577
 jdk.Metadata                                               13       1038982
 jdk.JVMInformation                                         12         27113
 jdk.GCHeapConfiguration                                    12           353
 jdk.ActiveRecording                                        12           727
</code></pre>
<p>
Even with the added events and another 14-day test run I ended up with a file size of about 130 MB. That is about 9.28 MB per day - mostly CheckPoints ;) - in fact ~118 MB of them. Luckily the file compresses nicely.
</p>
<h3>The undocumented Flag</h3>
If you take a look at the JDK sources you will notice a <a href="https://github.com/AdoptOpenJDK/openjdk-jdk/blob/f5e865a9c17365cd56b02e704dc96584e549a9c2/src/hotspot/share/jfr/dcmd/jfrDcmds.cpp#L352">flag</a> with the description: <i>"flush-interval, Minimum time before flushing buffers, measured in (s)econds..."</i> - which sounds like exactly what we need.
<pre><code class="language-bash">
jcmd $1 JFR.start name="blogstats" flush-interval=10s settings="/home/server/jfr/blog.jfc"
</code></pre>
<p>
Results look promising: as expected, around a 10x reduction in CheckPoint events. (I don't have more data to show since I literally just added the flag and rebooted the server as I am typing this.)
</p>
<p>
update: the size of the record dump file after ~14.3 days (343h) is 25.13 MB, which is about 1.8 MB per day - much better.
</p>
<p>
I can't recommend relying on undocumented or hidden flags, since they can change or go away without notice (<a href="https://mail.openjdk.java.net/pipermail/hotspot-jfr-dev/2020-February/001158.html">see this hotspot-jfr-dev discussion</a>). The realtime nature of event streaming is also gone with a buffer delay of 10s, so keep that in mind.
</p>
<h3>The File Format</h3>
<p>
The file format for the JFR repository and record dumps isn't standardized and can change. So another thing to keep in mind when JFR is used for logging is that it might not be possible to read old JDK 14 dumps with a future JDK 20 (although reading JDK 11 dumps with 14 works just fine). But this doesn't sound like a big issue to me, since JDKs don't just disappear: it will always be possible to open JDK 14 dumps with JDK 14 (and maybe even with a future JDK 20, if backwards compatibility can be maintained).
</p>
<h3>Conclusion</h3>
<p>
Although the static footprint of the JFR in JDK 14 came as a surprise, I am not too concerned about it. Storage is cheap and I am sure the issue will be solved or reduced to a minimum in one way or another.
</p>
<p>
The benefit of tracking my blog stats with JFR is that I can inspect them with the same tools I would use for JVM performance monitoring (VisualVM, JDK Mission Control, etc). <a href="//mbien.dev/blog/entry/jfr-event-streaming-with-java">JFR Event Streaming</a> and the OpenJDK command line tools make it also easy to access the data in a structured manner.
</p>
<p>
JFR can not only be used for JVM monitoring but certainly also for structured logging in some scenarios. Log files will probably never disappear completely, and I am sure JFR (or the original JRockit Flight Recorder implementation) was never intended to fully replace a logging framework - but maybe one day logs will mostly contain lines produced by Event#toString(), running dual stacks where it is still required :).
</p>
<p>
fly safe
</p>
https://mbien.dev/blog/entry/jfr-event-streaming-with-javaJFR Event Streaming with Java 14mbien2019-11-18T05:30:14+00:002021-04-16T13:48:42+00:00<p>
The Java Flight Recorder has a long history. It used to be part of the BEA JRockit JVM, became a commercial feature of the Oracle JDK (7+) after Oracle acquired BEA and Sun, and was finally fully open sourced with the release of OpenJDK 11 (backports to 8 exist) (<a href="https://openjdk.java.net/jeps/328">JEP 328</a>). OpenJDK 14 will add some improvements to the JFR.
</p>
<p>
<a href="https://openjdk.java.net/jeps/349">JEP 349</a> will allow the continuous consumption of java flight recorder events in-memory from within the same JVM, or out-of-process from a different JVM via its JFR repository file.
</p>
<p>
JEP 349 already made it into the early access builds and can be experimented with since <a href="https://jdk.java.net/14/">OpenJDK 14 build 22+</a>. Let's check it out.
</p>
<h3>In-Process Streaming</h3>
<p>
The base JFR configuration files (XML) can be found in <b>JDK_HOME/lib/jfr</b>. The default configuration (default.jfc) is relatively low overhead, while profile.jfc will provide more data. <a href="https://adoptopenjdk.net/jmc.html">Java Mission Control</a> can create custom settings based on templates if needed. I am using the default config for the examples.
</p>
<p>
The first example will start JFR on the local JVM using the default recorder settings and register a few event handlers to check if it is working.
</p>
<pre><code class="language-java">
import java.io.IOException;
import java.text.ParseException;
import jdk.jfr.Configuration;
import jdk.jfr.consumer.EventStream;
import jdk.jfr.consumer.RecordingStream;

public class JFRStreamTest {

    public static void main(String[] args) throws IOException, ParseException {

        Configuration config = Configuration.getConfiguration("default");
        System.out.println(config.getDescription());
        System.out.println("settings:");
        config.getSettings().forEach((key, value) -> System.out.println(key + ": " + value));

        // open a stream and start local recording
        try (EventStream es = new RecordingStream(config)) {

            // register event handlers
            es.onEvent("jdk.GarbageCollection", System.out::println);
            es.onEvent("jdk.CPULoad", System.out::println);
            es.onEvent("jdk.JVMInformation", System.out::println);

            // start and block
            es.start();
        }
    }
}
</code></pre>
<p>
The above example should print information about the running JVM once, CPU load periodically and GC events when they occur.
</p>
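<p>
Custom events work the same way. The following self-contained sketch (the event name is made up) commits a custom event and receives it through the in-process stream, using <b>startAsync()</b>, the non-blocking sibling of <b>start()</b>:
</p>

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import jdk.jfr.Event;
import jdk.jfr.Name;
import jdk.jfr.consumer.RecordingStream;

public class InProcessStreamDemo {

    @Name("demo.Hello")
    static class HelloEvent extends Event {
        String message;
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch received = new CountDownLatch(1);
        try (RecordingStream rs = new RecordingStream()) {
            rs.onEvent("demo.Hello", event -> {
                System.out.println("received: " + event.getString("message"));
                received.countDown();
            });
            rs.startAsync(); // starts recording and stream without blocking

            HelloEvent event = new HelloEvent();
            event.message = "from the same JVM";
            event.commit();

            // events are flushed to the stream periodically (about once per second)
            received.await(10, TimeUnit.SECONDS);
        }
    }
}
```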
<h3>Out-of-Process Streaming</h3>
<p>
Simply start the flight recorder as usual via <b>jcmd <PID> JFR.start</b> or via the JVM flag <b>-XX:+FlightRecorder</b> at startup. The repository location will be stored in the <b>jdk.jfr.repository</b> system property as soon as JFR is running (new in Java 14). It can also be set at startup via a comma separated list of flight recorder options: <b>-XX:FlightRecorderOptions:repository=./blackbox</b>
</p>
<p>
Update, thanks to Erik from Oracle in the comments section: the repository location can also be set using <b>jcmd <PID> JFR.configure repositorypath=<directory></b>. If you set it after a recording has started, new data will be written to the new location.
</p>
<pre><code class="language-bash">
$ jcmd -l | grep netbeans
8492 org.netbeans.Main ... --branding nb
$ jcmd 8492 JFR.start name=streamtest
Started recording 1. No limit specified, using maxsize=250MB as default.
Use jcmd 8492 JFR.dump name=streamtest filename=FILEPATH to copy recording data to file.
$ jinfo -sysprops 8492 | grep jfr
jdk.jfr.repository=/tmp/2019_11_18_02_19_59_8492
</code></pre>
<p>
Now that the recording is running and we know where the repository is, a second JVM can open a stream to the live JFR repository and monitor the application. Note that we did not dump any JFR records to a file, we connect to the live repository directly.
</p>
<pre><code class="language-java">
import java.io.IOException;
import java.nio.file.Path;
import jdk.jfr.consumer.EventStream;

public class JFRStreamTest {

    public static void main(String[] args) throws IOException {

        // connect to the JFR repository of the observed JVM
        try (EventStream es = EventStream.openRepository(Path.of("/tmp/2019_11_18_02_19_59_8492"))) {

            // register some event handlers
            //es.onEvent("jdk.CPULoad", System.out::println);
            es.onEvent("jdk.SocketRead", System.out::println);
            es.onEvent("jdk.SocketWrite", System.out::println);

            // start and block
            es.start();
        }
    }
}
</code></pre>
<p>
As a quick test I monitored a NetBeans instance running on Java 14 with the example above and let the IDE check for updates. Since we watch for SocketRead and SocketWrite events, the output looked like this:
</p>
<pre>
jdk.SocketRead {
    startTime = 04:34:09.571
    duration = 117.764 ms
    host = "netbeans.apache.org"
    address = "40.79.78.1"
    port = 443
    timeout = 30.000 s
    bytesRead = 5 bytes
    endOfStream = false
    eventThread = "pool-5-thread-1" (javaThreadId = 163)
    stackTrace = [
        java.net.Socket$SocketInputStream.read(byte[], int, int) line: 68
        sun.security.ssl.SSLSocketInputRecord.read(InputStream, byte[], int, int) line: 457
        sun.security.ssl.SSLSocketInputRecord.decode(ByteBuffer[], int, int) line: 165
        sun.security.ssl.SSLTransport.decode(TransportContext, ByteBuffer[], int, int, ...
        sun.security.ssl.SSLSocketImpl.decode(ByteBuffer) line: 1460
        ...
    ]
}
...
</pre>
<h3>Streaming Dumped Records</h3>
<p>
Opening streams (EventStream.openFile(path)) to JFR record dumps (<b>jcmd <PID> JFR.dump filename=foo.jfr</b>) is of course also possible - and works as expected.
</p>
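<p>
A self-contained sketch of this (using the <b>Recording</b> API and a made-up event instead of an external JVM and jcmd): record an event, dump it to a temp file and replay the file with <b>EventStream.openFile(path)</b>:
</p>

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.atomic.AtomicInteger;
import jdk.jfr.Event;
import jdk.jfr.Name;
import jdk.jfr.Recording;
import jdk.jfr.consumer.EventStream;

public class DumpStreamDemo {

    @Name("demo.Ping")
    static class PingEvent extends Event {
        int number;
    }

    static int countPings(Path dump) throws Exception {
        AtomicInteger count = new AtomicInteger();
        try (EventStream es = EventStream.openFile(dump)) {
            es.onEvent("demo.Ping", event -> count.incrementAndGet());
            es.start(); // replays the file, returns once it is fully consumed
        }
        return count.get();
    }

    public static void main(String[] args) throws Exception {
        Path dump = Files.createTempFile("demo", ".jfr");
        try (Recording recording = new Recording()) {
            recording.start();
            PingEvent event = new PingEvent();
            event.number = 42;
            event.commit();       // custom events are enabled by default
            recording.dump(dump); // flushes buffers and writes the file
        }
        System.out.println("ping events in dump: " + countPings(dump));
    }
}
```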
<h3>Conclusion</h3>
<p>
Pretty cool new feature! It is currently not possible to do in-memory but out-of-process streaming without syncing on repository files. But since a ramdisk can work around this issue so easily, I am not even sure this capability would be worth it.
</p>
<p>
fly safe
</p>
https://mbien.dev/blog/entry/dynamic-application-class-data-sharing[Dynamic] [Application] Class Data Sharing in Javambien2019-04-29T23:04:10+00:002020-04-09T09:38:01+00:00<p>
Class Data Sharing as a JVM feature has existed for quite some time already, but it became more popular in the context of containers. CDS maps a pre-defined class archive into memory and makes it shareable between JVM processes. This can improve startup time and also reduce per-JVM footprint in some cases.
</p>
<p>
Although basic CDS was already supported in Java 5, this blog entry assumes that Java 11 or later is used, since JVM flags and overall capabilities changed over time. CDS might not be supported on all architectures or GCs: G1, CMS, Shenandoah, Serial GC, Parallel GC and ParallelOld GC all support CDS on 64-bit Linux, while ZGC for example doesn't support it yet.
</p>
<br/>
<h2>JDK Class Data Sharing</h2>
The most basic form of CDS is set up to share only JDK class files. It is enabled by default since JDK 12 (<a href="https://openjdk.java.net/jeps/341">JEP 341</a>).
If you check the Java version in a shell, you will notice that the JVM version string contains "sharing".
<pre><code class="language-bash">
[mbien@hulk server]$ java -version
openjdk version "12.0.1" 2019-04-16
OpenJDK Runtime Environment AdoptOpenJDK (build 12.0.1+12)
OpenJDK 64-Bit Server VM AdoptOpenJDK (build 12.0.1+12, mixed mode, sharing)
</code></pre>
The shared class archive comes pre-installed in <b> ${JAVA_HOME}/lib/server/</b> as <b>classes.jsa</b>.
By temporarily renaming the file or switching to an older JDK, "sharing" won't appear in the version string anymore.
<pre><code class="language-bash">
[mbien@hulk server]$ java -version
openjdk version "12.0.1" 2019-04-16
OpenJDK Runtime Environment AdoptOpenJDK (build 12.0.1+12)
OpenJDK 64-Bit Server VM AdoptOpenJDK (build 12.0.1+12, mixed mode)
</code></pre>
<p>
The JDK CDS archive can be manually generated by invoking <b>java -Xshare:dump</b> with root privileges. Recent JVMs are configured to use <b>-Xshare:auto</b> at startup, which will automatically use CDS if available. Enforcing CDS with <b>-Xshare:on</b> will cause the JVM to fail if no archive is found.
</p>
<br/>
<h2>Application Class Data Sharing</h2>
<p>
The basic CDS archive only contains JDK class data. Adding application class data (<a href="https://openjdk.java.net/jeps/310">JEP 310</a>) is fairly easy but comes with some limitations.
</p>
<ul>
<li>the classpath used at archive creation time must be the same as (or a prefix of) the classpath used at run time</li>
<li>wildcards or exploded JARs are not allowed in the path (no cheating)</li>
</ul>
<h3>1) create a list of classes (JDK and APP; Java 11+) to include in the shared archive:</h3>
<pre><code class="language-bash">
$ java -Xshare:off -XX:DumpLoadedClassList=classes.list -jar foo.jar
</code></pre>
<h3>2) create the archive using that list:</h3>
<pre><code class="language-bash">
$ java -Xshare:dump -XX:SharedClassListFile=classes.list -XX:SharedArchiveFile=foo.jsa -cp foo.jar
</code></pre>
<h3>3) launch the applications using the shared archive:</h3>
<pre><code class="language-bash">
$ java -Xshare:on -XX:SharedArchiveFile=foo.jsa -jar foo.jar
</code></pre>
<p>
<b>-Xlog:class+load</b> helps to verify that it is working properly.
</p>
<pre>
[0.062s][info][class,load] java.lang.Object source: shared objects file
[0.062s][info][class,load] java.io.Serializable source: shared objects file
[0.062s][info][class,load] java.lang.Comparable source: shared objects file
[0.062s][info][class,load] java.lang.CharSequence source: shared objects file
...
</pre>
<p>
The applications (containers) don't have to be identical to make use of custom CDS archives. The archive is based on the class list from step 1, which can be freely modified or merged with other lists. The main limitation is the classpath prefix rule.
</p>
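<p>
Since a class list is just a text file with one class name per line, merging is plain text processing. A sketch with made-up file names and example data:
</p>

```shell
# class lists contain one class name per line (made-up example data)
printf 'java/lang/Object\njava/lang/String\n' > app1.list
printf 'java/lang/Object\njava/util/List\n' > app2.list

# merge the lists, dropping duplicates
cat app1.list app2.list | sort -u > merged.list

# the merged list can then be used for step 2, e.g.:
# java -Xshare:dump -XX:SharedClassListFile=merged.list \
#      -XX:SharedArchiveFile=shared.jsa -cp foo.jar
```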
<h3>Some Results</h3>
<p>
As a quick test I used OpenJDK 11 + Eclipse Jetty + Apache JSPWiki, edited a single page (to ensure the classes are loaded), generated a custom archive and started a few more instances with CDS enabled using that archive.
</p>
<pre><code class="language-bash">
classes.list size = 4296 entries
wiki.jsa     size = 57,6 MB

   VIRT    RES   SHR %MEM    TIME COMMAND
2771292 315128 22448  8,3 0:20.54 java -Xmx128m -Xms128m -Xshare:off -cp ...
2735468 259148 45184  6,9 0:18.38 java -Xmx128m -Xms128m -XX:SharedArchiveFile=wiki.jsa -cp ...
2735296 270044 45192  7,1 0:19.66 java -Xmx128m -Xms128m -XX:SharedArchiveFile=wiki.jsa -cp ...
2735468 259812 45060  6,9 0:18.26 java -Xmx128m -Xms128m -XX:SharedArchiveFile=wiki.jsa -cp ...
2735468 256800 45196  6,8 0:19.10 java -Xmx128m -Xms128m -XX:SharedArchiveFile=wiki.jsa -cp ...
</code></pre>
<p>
The first JVM is started with CDS turned off as reference. The four other instances have sharing enabled using the custom archive.
</p>
<br/>
<h2>Dynamic Class Data Archives</h2>
<p>
The proposal in <a href="https://openjdk.java.net/jeps/350">JEP 350</a>, which is currently targeted for Java 13, will allow combining a static base archive with a dynamically generated archive.
</p>
<p>
The dynamic class data archive is created in a setup phase at application exit (<b>-XX:ArchiveClassesAtExit=dynamic.jsa</b>), which essentially automates steps 1 and 2. Just like before, <b>-XX:SharedArchiveFile=dynamic.jsa</b> will tell the JVM to map the shared archive. Since the dynamic archive references the static JDK archive, both will be used automatically.
</p>
<p>
The main advantage (beside the added convenience) is that the dynamic archive will support both builtin class loaders and user-defined class loaders.
</p>
<p>now go forth and share class data :)</p>https://mbien.dev/blog/entry/cleaning-bash-history-using-aCleaning Bash History using a Java 11 Single-File Sourcecode Programmbien2018-02-21T15:13:42+00:002021-08-18T01:05:08+00:00<p>
With <a href="https://openjdk.java.net/jeps/330">JEP 330</a>, Java 11 adds the ability to launch a Java source file directly from the command line without having to compile it explicitly first. I can see this feature being convenient for simple scripting or quick tests.
Although Java isn't the most concise language, it still has the benefit of being quite readable and having powerful utility APIs.
Although Java isn't the most concise language it still has the benefit of being quite readable and having powerful utility APIs.
</p>
<p>
The following example reads the bash shell history from a file into a LinkedHashSet to retain ordering and keeps the most recently used line if duplicates exist, essentially cleaning up the history.
</p>
<pre><code class="language-java">
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.stream.Stream;

public class HistoryCleaner {

    public static void main(String[] args) throws IOException {

        Path path = Path.of(System.getProperty("user.home") + "/.bash_history");
        Set<String> deduplicated = new LinkedHashSet<>(1024);

        try (Stream<String> lines = Files.lines(path)) {
            lines.forEach(line -> {
                deduplicated.remove(line); // remove first, so re-adding moves the line to the end
                deduplicated.add(line);
            });
        }

        Files.write(path, deduplicated);
    }
}
</code></pre>
<pre><code class="language-bash">
duke@virtual1:~$ java HistoryCleaner.java
</code></pre>
<p>
will compile and run the "script".
</p>
<br/>
<h3>Shebangs are supported too</h3>
<pre><code class="language-java">
#!/usr/bin/java --source 11 -Xmx32m -XX:+UseSerialGC
import java.io....
</code></pre>
<p>
However, since this is not part of the Java language specification, single-file Java programs (SFJP) with a <b>#!</b> (<a href="https://en.wikipedia.org/wiki/Shebang_(Unix)">shebang</a>) in the first line are technically not regular Java files and will not compile with javac. This also means that the usual file naming conventions are not required for pure SFJPs (i.e. the file name does not have to match the class name).
</p>
<p>
Using a different file extension is advised. We can't call it <b>.js</b> - so let's call it <b>.sfjp</b> :).
</p>
<pre><code class="language-bash">
duke@virtual1:~$ ./HistoryCleaner.sfjp
</code></pre>
<p>
This will compile and run the HistoryCleaner SFJP containing the shebang declaration in the first line. Keeping the <b>.java</b> file extension would result in a compiler error.
As bonus we can also set JVM flags using the shebang mechanism. Keep in mind that <code>-source [version]</code> must be the first flag to tell the java launcher that it should expect a source file instead of bytecode.
</p>https://mbien.dev/blog/entry/netbeans_opencl_packNetBeans OpenCL Packmbien2011-08-30T02:16:02+00:002020-08-01T08:43:44+00:00<p>
Since I am doing a lot with OpenCL lately, I decided to improve the tooling around OpenCL a bit. A weekend later the NetBeans OpenCL Pack was born :).
</p>
<h3>Features Including:</h3>
<ul>
<li>OpenCL editor with syntax highlighting, code completion and CL reference page integration</li>
<li>OpenCL compiler integration</li>
<li>In-editor annotations of compiler warnings and errors updated as you type</li>
<li>JOCL project template</li>
</ul>
<h3>Technical Details:</h3>
<p>
The editor uses ANTLR as parser and lexer. This enables simple things like keyword highlighting as well as more complex features like semantic highlighting, formatting and auto completion (formatting is not yet implemented). It can also detect and report syntax errors; however, this feature is automatically disabled if an OpenCL compiler is present on the host system. All <a href="//mbien.dev/blog/entry/developing_with_jocl_on_amd">OpenCL implementations</a> detected with the help of JOCL can be used as compiler backends.
</p>
<p>
Instead of using the old OpenGL Pack as a template, I decided to write it from scratch using the latest NetBeans 7 and Java 7 APIs. So you will have to start NetBeans with JDK 7 to be able to use it.
</p>
<h3>Download</h3>
<p>
You can download it from the NetBeans <a href="http://plugins.netbeans.org/plugin/39980/?show=true">plugin portal</a>
[<a href="https://github.com/mbien/netbeans-opencl-pack/downloads">mirror</a>]; the sourcecode is on <a href="https://github.com/mbien/netbeans-opencl-pack">github</a>.
</p>
<p>
feedback, contributions and bug reports are as always appreciated
</p>
<h3>Screenshots:</h3>
<img src="https://lh5.googleusercontent.com/-DuJOgPf6ntA/TlwDOi5edBI/AAAAAAAAAIc/ptvPNatMwMU/cl-completion.png" alt="auto completion"/>
<img src="https://lh3.googleusercontent.com/SamaXSwwQCQHCzXjz7iZsqkvIGriiGpDfqOF86E17kfq73tOIDcADKEakVk0uK1WLdDe4Sc1xucEeft2G63uDVEmMikXxWZ6MY_FA96RvRu4OYIY2QKUG4jx4--t8hNSIpYcbvOvhxA_9x4kVCwIa1yR_CuRZq-qBknPPuBAa_JqG3jUG2vStXhrgBOpMJE9ITx6yciXcAclP4QUHBh4DsXeH-FTP5qT4G3UhsPrloxmPIOeCXam1YGmydpgOdpaP9iC-HDIBQHJwEdF7H6Bn838XhAB8z2q4hdTbQPG8ZLl91nWKlgwIYZtRadJ0gtJ7_K4m2dk1bFvQM4-sQRxryjz6acualS3EMms8Nxyc5mmeUQafu9FLuzWTvlbkaWMNyV04x3E9jjKCfpYJ5N3dJfZbnPNoXJlXnXBMQfCMh8tby5BOXfIHVgnCsCMDEkpGGvbZ5nvE51VZmZUvntxlA5WyiSchRnOqWamcsZ1hAQlkaFL_ctsTNOZStq8J4vsP6ywE8x5uSnQ71wIgpWvFYHC0IMF19QSixrt1wwZosd0j_toU20hdQ9T1OmAWznmRgQIt_dWw79mC1B4SxLXJ3nDZ9USbr0mOLxs7LO5MDuzE71EACGXY1N_wH9pwtcjBP2i1NkGM2q1jRynalZfoX8oFR7g10A=w815-h604-no" width="750" height="556" alt="editor"/>
<img src="https://lh3.googleusercontent.com/-JdGJHriqOYI/TlwDO9j7xBI/AAAAAAAAAIg/AJnl47x7X2k/jocl-project.png" alt="project templates"/>
<p>
have fun!
</p>
https://mbien.dev/blog/entry/many_little_improvements_made_itMany little improvements made it into JOCL recentlymbien2011-08-05T00:24:59+00:002020-04-09T09:50:29+00:00<p>
OK, some of them are big, but I will only cover the little things in this blog entry :).
</p>
<h3>CLKernel</h3>
<p>
I added multiple utility methods to CLKernel and related classes. For example, it is now possible to create a kernel and set its arguments in one line.
</p>
<pre><code class="language-java">
CLKernel sha512 = program.createCLKernel("sha512", padBuffer, digestBuffer, rangeBuffer);
</code></pre>
<p>
Thanks to feedback in the JOCL forums I also added methods to set vector typed arguments directly. In the past you could only set them via a java.nio.Buffer.
</p>
<pre><code class="language-java">
kernel.setArg(index, x, y, z, w);
</code></pre>
<p>
Another small feature of CLKernel is the ability to enforce 32 bit arguments. If you want to switch between single and double floating point precision at runtime, or mix both to improve performance, you will have to compile the program with the double FP extension enabled. By setting <code>kernel.setForce32bitArgs(true)</code>, all Java doubles used as kernel arguments will be automatically cast down to 32 bit CL floats (see the MultiDeviceFractal demo for an example). This is nothing special but might save you several <code>if(single){setArg((float)foo)}else{setArg(foo)}</code> constructs.
</p>
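<p>
The narrowing boils down to a conditional down-cast at argument-setting time. The following is a standalone sketch of that behavior, not JOCL code; the <code>ArgWriter</code> class is a hypothetical stand-in for a kernel's argument list:
</p>

```java
import java.util.ArrayList;
import java.util.List;

public class Force32BitSketch {

    // hypothetical stand-in for a kernel's argument list
    static final class ArgWriter {
        final boolean force32bit;
        final List<Number> args = new ArrayList<>();

        ArgWriter(boolean force32bit) { this.force32bit = force32bit; }

        // doubles are narrowed to float when 32 bit arguments are enforced
        void setArg(double value) {
            if (force32bit) {
                args.add((float) value); // one down-cast replaces many if/else constructs
            } else {
                args.add(value);
            }
        }
    }

    public static void main(String[] args) {
        ArgWriter writer = new ArgWriter(true);
        writer.setArg(Math.PI);
        System.out.println(writer.args.get(0).getClass().getSimpleName()); // Float
    }
}
```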
<h3>CLWork</h3>
<p>
CLKernel still only represents the function in the OpenCL program you want to call - nothing more. The new CLWork object contains everything required for kernel execution, like the NDRange and the kernel itself.
</p>
<pre><code class="language-java">
int size = buffer.getNIOCapacity();
CLWork1D work = CLWork.create1D(program.createCLKernel("sum", buffer, size));
work.setWorkSize(size, 1).optimizeFor(device);

// execute
queue.putWriteBuffer(buffer, false)
     .putWork(work)
     .putReadBuffer(buffer, true);
</code></pre>
<p>
optimizeFor(device) adjusts the workgroup size to meet device-specific recommended values. This should make sure that all compute units of your GPU are used by dividing the work into groups (however, this only works if your task does not care about the workgroup size, see javadoc).
</p>
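<p>
The following self-contained sketch illustrates the kind of adjustment such an optimization performs (the actual JOCL heuristic may differ): shrink the requested group size to the largest value that still evenly divides the global work size, as OpenCL 1.x requires, without exceeding the device maximum:
</p>

```java
public class GroupSizeSketch {

    // pick the largest group size <= min(requested, deviceMax)
    // that evenly divides the global work size (an OpenCL 1.x requirement)
    static int adjustGroupSize(int globalSize, int requested, int deviceMax) {
        int group = Math.min(requested, deviceMax);
        while (group > 1 && globalSize % group != 0) {
            group--;
        }
        return group;
    }

    public static void main(String[] args) {
        // 1000 is not divisible by 256; the largest fitting divisor is 250
        System.out.println(adjustGroupSize(1000, 256, 512)); // 250
    }
}
```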
<h3>CLSubDevice</h3>
<p>
Sometimes you don't want to put your CLDevice under 100% load. This might be the case, for example, if your device is the CPU your application is running on, or if you have to share the GPU with an OpenGL context for rendering. One easy way of controlling device load is to limit the number of compute units used for a task.
</p>
<pre><code class="language-java">
CLPlatform platform = CLPlatform.getDefault(version(CL_1_1), type(CPU));
CLDevice device = platform.getMaxFLOPSDevice(type(CPU));
CLSubDevice[] subs = device.createSubDevicesByCount(4, 4);
// the array now contains two virtual devices with four CPU cores each
CLContext context = CLContext.create(subs);
CLCommandQueue queue = subs[0].createCommandQueue();
...
</code></pre>
<p>
CLSubDevice extends CLDevice and can be used for context creation, queue creation and everywhere else you would use a CLDevice. Prior to creating subdevices you should check that device.isFissionSupported() returns true.
</p>
<h3>CLProgram builder</h3>
<p>
OK, this utility is not that new, but I haven't blogged about it yet. If program.build() isn't enough, you should take a look at the program builder.
CLBuildConfiguration stores everything needed for program compilation and is easily configurable via the builder pattern :).
</p>
<pre><code class="language-java">
// reusable builder
CLBuildConfiguration builder = CLProgramBuilder.createConfiguration()
        .withOption(ENABLE_MAD)
        .forDevices(context.getDevices())
        .withDefine("RADIUS", 5)
        .withDefine("ENABLE_FOOBAR");

builder.build(programA);
builder.build(programB);
...
</code></pre>
<p>
CLBuildConfiguration is fully reusable and can be upgraded to a CLProgramConfiguration by combining it with a CLProgram. Both can be serialized, which allows storing the build configuration or the entire prebuilt program on disk or sending it over the network. (Caching binaries on disk can save startup time, for example.)
</p>
<pre><code class="language-java">
// program configuration
ois = new ObjectInputStream(new FileInputStream(file));
CLProgramConfiguration programConfig = CLProgramBuilder.loadConfiguration(ois, context);
assertNotNull(programConfig.getProgram());
ois.close();
program = programConfig.build(); // builds from source or loads binaries if possible
assertTrue(program.isExecutable());
</code></pre>
<p>
Note: loading binaries and associating them with the right driver/device is currently not trivial with OpenCL. Even if everything works as intended, it is still possible that the driver refuses the binaries for some reason (driver update etc). That's why it's recommended to add the program source to the configuration before calling build() to allow an automatic rebuild as fallback.
</p>
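<p>
The fallback logic itself is simple and can be sketched without any OpenCL dependency. <code>Loader</code> and <code>Compiler</code> below are hypothetical stand-ins, not JOCL types:
</p>

```java
public class BinaryCacheSketch {

    interface Loader   { boolean loadBinaries(); }  // may fail, e.g. after a driver update
    interface Compiler { void buildFromSource(); }

    // try the cached binaries first, recompile from source if the driver refuses them
    static String build(Loader loader, Compiler compiler) {
        if (loader.loadBinaries()) {
            return "binaries"; // fast path: reuse cached binaries
        }
        compiler.buildFromSource(); // automatic rebuild as fallback
        return "source";
    }

    public static void main(String[] args) {
        // simulate a driver that rejects the cached binaries
        System.out.println(build(() -> false, () -> {})); // source
    }
}
```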
<pre><code class="language-java">
// another entry point for complex builds (prepare() returns CLProgramConfiguration)
program.prepare().withOption(ENABLE_MAD).forDevice(context.getMaxFlopsDevice()).build();
</code></pre>
<p>
(all snippets have been stolen from the junit tests)<br/>
I am sure I forgot something... but this should cover at least some of the incremental improvements. Expect a few more blog entries for the larger features soon.
</p>
<p>
- - - - - - <br/>
In other news: Nvidia released OpenCL 1.1 drivers; some of us thought this would never happen. All major vendors (AMD, Intel, NV, IBM, ZiiLABS, ...) now support OpenCL 1.1 (<a href="https://picasaweb.google.com/lh/photo/NraNQ0dsxwHqk_tzmozDJg">screenshot</a>)
</p>
<p>
have fun!
</p>
https://mbien.dev/blog/entry/developing_with_jocl_on_amdDeveloping with JOCL on AMD, Intel and Nvidia OpenCL platformsmbien2011-05-17T20:30:55+00:002011-05-17T21:10:29+00:00<p>
One nice feature of OpenCL is that the platform abstraction was handled in the spec from the first day on. You can install all OpenCL drivers side by side and let the application choose at runtime, on which device and on which platform it should execute the kernels.
</p>
<p>
As of today there are <del>three</del> four vendors which provide OpenCL implementations for the desktop. AMD and Intel support the OpenCL 1.1 specification, while Nvidia apparently tries to stick with 1.0 to encourage their customers to stick with CUDA ;-). [edit] And of course there is also Apple, providing out-of-the-box OpenCL 1.0 support in MacOS 10.6.
</p>
<p>
<a href="http://jogamp.org/jocl/www/">JOCL</a> contains a small <a href="http://jogamp.org/deployment/webstart-next/jocl-demos/clinfo.jnlp">CLInfo</a> utility which can be used to quickly verify OpenCL installations. Here is the output on my system (Ubuntu was booted) with all three SDKs installed:
</p>
<div style="overflow:auto; height:400px;">
<table border="1"><tr><td>CL_PLATFORM_NAME</td><th colspan="1">ATI Stream</th><th colspan="2">NVIDIA CUDA</th><th colspan="1">Intel(R) OpenCL</th></tr><tr><td>CL_PLATFORM_VERSION</td><td colspan="1">OpenCL 1.1 ATI-Stream-v2.2 (302)</td><td colspan="2">OpenCL 1.0 CUDA 4.0.1</td><td colspan="1">OpenCL 1.1 LINUX</td></tr><tr><td>CL_PLATFORM_PROFILE</td><td colspan="1">FULL_PROFILE</td><td colspan="2">FULL_PROFILE</td><td colspan="1">FULL_PROFILE</td></tr><tr><td>CL_PLATFORM_VENDOR</td><td colspan="1">Advanced Micro Devices, Inc.</td><td colspan="2">NVIDIA Corporation</td><td colspan="1">Intel(R) Corporation</td></tr><tr><td>CL_PLATFORM_ICD_SUFFIX_KHR</td><td colspan="1">AMD</td><td colspan="2">NV</td><td colspan="1">Intel</td></tr><tr><td>CL_PLATFORM_EXTENSIONS</td><td colspan="1">[cl_khr_icd, cl_amd_event_callback]</td><td colspan="2">[cl_khr_icd, cl_khr_byte_addressable_store, cl_nv_compiler_options, cl_nv_pragma_unroll, cl_nv_device_attribute_query, cl_khr_gl_sharing]</td><td colspan="1">[cl_khr_icd, cl_khr_byte_addressable_store, cl_khr_fp64, cl_khr_local_int32_extended_atomics, cl_khr_local_int32_base_atomics, cl_khr_global_int32_base_atomics, cl_khr_gl_sharing, cl_intel_printf, cl_khr_global_int32_extended_atomics, cl_ext_device_fission]</td></tr><tr><td>CL_DEVICE_NAME</td><th colspan="1">Intel(R) Core(TM) i7 CPU 940 @ 2.93GHz</th><th colspan="1">GeForce GTX 295</th><th colspan="1">GeForce GTX 295</th><th colspan="1">Intel(R) Core(TM) i7 CPU 940 @ 2.93GHz</th></tr><tr><td>CL_DEVICE_TYPE</td><td colspan="1">CPU</td><td colspan="1">GPU</td><td colspan="1">GPU</td><td colspan="1">CPU</td></tr><tr><td>CL_DEVICE_AVAILABLE</td><td colspan="1">true</td><td colspan="1">true</td><td colspan="1">true</td><td colspan="1">true</td></tr><tr><td>CL_DEVICE_VERSION</td><td colspan="1">OpenCL 1.1 ATI-Stream-v2.2 (302)</td><td colspan="1">OpenCL 1.0 CUDA</td><td colspan="1">OpenCL 1.0 CUDA</td><td colspan="1">OpenCL 1.1 </td></tr><tr><td>CL_DEVICE_PROFILE</td><td 
colspan="1">FULL_PROFILE</td><td colspan="1">FULL_PROFILE</td><td colspan="1">FULL_PROFILE</td><td colspan="1">FULL_PROFILE</td></tr><tr><td>CL_DEVICE_ENDIAN_LITTLE</td><td colspan="1">true</td><td colspan="1">true</td><td colspan="1">true</td><td colspan="1">true</td></tr><tr><td>CL_DEVICE_VENDOR</td><td colspan="1">GenuineIntel</td><td colspan="1">NVIDIA Corporation</td><td colspan="1">NVIDIA Corporation</td><td colspan="1">Intel(R) Corporation</td></tr><tr><td>CL_DEVICE_EXTENSIONS</td><td colspan="1">[cl_amd_device_attribute_query, cl_khr_byte_addressable_store, cl_khr_int64_extended_atomics, cl_khr_local_int32_extended_atomics, cl_amd_fp64, cl_amd_printf, cl_khr_local_int32_base_atomics, cl_khr_int64_base_atomics, cl_khr_global_int32_base_atomics, cl_khr_gl_sharing, cl_khr_global_int32_extended_atomics, cl_ext_device_fission]</td><td colspan="1">[cl_khr_icd, cl_khr_byte_addressable_store, cl_khr_fp64, cl_khr_local_int32_extended_atomics, cl_khr_local_int32_base_atomics, cl_nv_compiler_options, cl_nv_pragma_unroll, cl_nv_device_attribute_query, cl_khr_global_int32_base_atomics, cl_khr_gl_sharing, cl_khr_global_int32_extended_atomics]</td><td colspan="1">[cl_khr_icd, cl_khr_byte_addressable_store, cl_khr_fp64, cl_khr_local_int32_extended_atomics, cl_khr_local_int32_base_atomics, cl_nv_compiler_options, cl_nv_pragma_unroll, cl_nv_device_attribute_query, cl_khr_global_int32_base_atomics, cl_khr_gl_sharing, cl_khr_global_int32_extended_atomics]</td><td colspan="1">[cl_khr_byte_addressable_store, cl_khr_fp64, cl_khr_local_int32_extended_atomics, cl_khr_local_int32_base_atomics, cl_khr_global_int32_base_atomics, cl_khr_gl_sharing, cl_intel_printf, cl_khr_global_int32_extended_atomics, cl_ext_device_fission]</td></tr><tr><td>CL_DEVICE_MAX_COMPUTE_UNITS</td><td colspan="1">8</td><td colspan="1">30</td><td colspan="1">30</td><td colspan="1">8</td></tr><tr><td>CL_DEVICE_MAX_CLOCK_FREQUENCY</td><td colspan="1">2934</td><td colspan="1">1242</td><td colspan="1">1242</td><td 
colspan="1">2930</td></tr><tr><td>CL_DEVICE_VENDOR_ID</td><td colspan="1">4098</td><td colspan="1">4318</td><td colspan="1">4318</td><td colspan="1">32902</td></tr><tr><td>CL_DEVICE_OPENCL_C_VERSION</td><td colspan="1">OpenCL C 1.1 </td><td colspan="1">com.jogamp.opencl.CLException$CLInvalidValueException: error while asking for info string [error: CL_INVALID_VALUE]</td><td colspan="1">com.jogamp.opencl.CLException$CLInvalidValueException: error while asking for info string [error: CL_INVALID_VALUE]</td><td colspan="1">OpenCL C 1.1 </td></tr><tr><td>CL_DRIVER_VERSION</td><td colspan="1">2.0</td><td colspan="1">270.41.06</td><td colspan="1">270.41.06</td><td colspan="1">1.1</td></tr><tr><td>CL_DEVICE_ADDRESS_BITS</td><td colspan="1">64</td><td colspan="1">32</td><td colspan="1">32</td><td colspan="1">64</td></tr><tr><td>CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT</td><td colspan="1">8</td><td colspan="1">1</td><td colspan="1">1</td><td colspan="1">8</td></tr><tr><td>CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR</td><td colspan="1">16</td><td colspan="1">1</td><td colspan="1">1</td><td colspan="1">16</td></tr><tr><td>CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT</td><td colspan="1">4</td><td colspan="1">1</td><td colspan="1">1</td><td colspan="1">4</td></tr><tr><td>CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG</td><td colspan="1">2</td><td colspan="1">1</td><td colspan="1">1</td><td colspan="1">2</td></tr><tr><td>CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT</td><td colspan="1">4</td><td colspan="1">1</td><td colspan="1">1</td><td colspan="1">4</td></tr><tr><td>CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE</td><td colspan="1">0</td><td colspan="1">1</td><td colspan="1">1</td><td colspan="1">2</td></tr><tr><td>CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR</td><td colspan="1">16</td><td colspan="1">com.jogamp.opencl.CLException$CLInvalidValueException: error while asking for info value [error: CL_INVALID_VALUE]</td><td colspan="1">com.jogamp.opencl.CLException$CLInvalidValueException: error while asking for info value 
[error: CL_INVALID_VALUE]</td><td colspan="1">16</td></tr><tr><td>CL_DEVICE_NATIVE_VECTOR_WIDTH_SHORT</td><td colspan="1">8</td><td colspan="1">com.jogamp.opencl.CLException$CLInvalidValueException: error while asking for info value [error: CL_INVALID_VALUE]</td><td colspan="1">com.jogamp.opencl.CLException$CLInvalidValueException: error while asking for info value [error: CL_INVALID_VALUE]</td><td colspan="1">8</td></tr><tr><td>CL_DEVICE_NATIVE_VECTOR_WIDTH_INT</td><td colspan="1">4</td><td colspan="1">com.jogamp.opencl.CLException$CLInvalidValueException: error while asking for info value [error: CL_INVALID_VALUE]</td><td colspan="1">com.jogamp.opencl.CLException$CLInvalidValueException: error while asking for info value [error: CL_INVALID_VALUE]</td><td colspan="1">4</td></tr><tr><td>CL_DEVICE_NATIVE_VECTOR_WIDTH_LONG</td><td colspan="1">2</td><td colspan="1">com.jogamp.opencl.CLException$CLInvalidValueException: error while asking for info value [error: CL_INVALID_VALUE]</td><td colspan="1">com.jogamp.opencl.CLException$CLInvalidValueException: error while asking for info value [error: CL_INVALID_VALUE]</td><td colspan="1">2</td></tr><tr><td>CL_DEVICE_NATIVE_VECTOR_WIDTH_HALF</td><td colspan="1">0</td><td colspan="1">com.jogamp.opencl.CLException$CLInvalidValueException: error while asking for info value [error: CL_INVALID_VALUE]</td><td colspan="1">com.jogamp.opencl.CLException$CLInvalidValueException: error while asking for info value [error: CL_INVALID_VALUE]</td><td colspan="1">0</td></tr><tr><td>CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT</td><td colspan="1">4</td><td colspan="1">com.jogamp.opencl.CLException$CLInvalidValueException: error while asking for info value [error: CL_INVALID_VALUE]</td><td colspan="1">com.jogamp.opencl.CLException$CLInvalidValueException: error while asking for info value [error: CL_INVALID_VALUE]</td><td colspan="1">4</td></tr><tr><td>CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE</td><td colspan="1">0</td><td 
colspan="1">com.jogamp.opencl.CLException$CLInvalidValueException: error while asking for info value [error: CL_INVALID_VALUE]</td><td colspan="1">com.jogamp.opencl.CLException$CLInvalidValueException: error while asking for info value [error: CL_INVALID_VALUE]</td><td colspan="1">2</td></tr><tr><td>CL_DEVICE_MAX_WORK_GROUP_SIZE</td><td colspan="1">1024</td><td colspan="1">512</td><td colspan="1">512</td><td colspan="1">1024</td></tr><tr><td>CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS</td><td colspan="1">3</td><td colspan="1">3</td><td colspan="1">3</td><td colspan="1">3</td></tr><tr><td>CL_DEVICE_MAX_WORK_ITEM_SIZES</td><td colspan="1">[1024, 1024, 1024]</td><td colspan="1">[512, 512, 64]</td><td colspan="1">[512, 512, 64]</td><td colspan="1">[1024, 1024, 1024]</td></tr><tr><td>CL_DEVICE_MAX_PARAMETER_SIZE</td><td colspan="1">4096</td><td colspan="1">4352</td><td colspan="1">4352</td><td colspan="1">1024</td></tr><tr><td>CL_DEVICE_MAX_MEM_ALLOC_SIZE</td><td colspan="1">1073741824</td><td colspan="1">234831872</td><td colspan="1">234700800</td><td colspan="1">3154703360</td></tr><tr><td>CL_DEVICE_GLOBAL_MEM_SIZE</td><td colspan="1">3221225472</td><td colspan="1">939327488</td><td colspan="1">938803200</td><td colspan="1">12618813440</td></tr><tr><td>CL_DEVICE_LOCAL_MEM_SIZE</td><td colspan="1">32768</td><td colspan="1">16384</td><td colspan="1">16384</td><td colspan="1">32768</td></tr><tr><td>CL_DEVICE_HOST_UNIFIED_MEMORY</td><td colspan="1">true</td><td colspan="1">com.jogamp.opencl.CLException$CLInvalidValueException: error while asking for info value [error: CL_INVALID_VALUE]</td><td colspan="1">com.jogamp.opencl.CLException$CLInvalidValueException: error while asking for info value [error: CL_INVALID_VALUE]</td><td colspan="1">true</td></tr><tr><td>CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE</td><td colspan="1">65536</td><td colspan="1">65536</td><td colspan="1">65536</td><td colspan="1">131072</td></tr><tr><td>CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE</td><td colspan="1">64</td><td 
colspan="1">0</td><td colspan="1">0</td><td colspan="1">64</td></tr><tr><td>CL_DEVICE_GLOBAL_MEM_CACHE_SIZE</td><td colspan="1">32768</td><td colspan="1">0</td><td colspan="1">0</td><td colspan="1">262144</td></tr><tr><td>CL_DEVICE_MAX_CONSTANT_ARGS</td><td colspan="1">8</td><td colspan="1">9</td><td colspan="1">9</td><td colspan="1">128</td></tr><tr><td>CL_DEVICE_IMAGE_SUPPORT</td><td colspan="1">false</td><td colspan="1">true</td><td colspan="1">true</td><td colspan="1">true</td></tr><tr><td>CL_DEVICE_MAX_READ_IMAGE_ARGS</td><td colspan="1">0</td><td colspan="1">128</td><td colspan="1">128</td><td colspan="1">128</td></tr><tr><td>CL_DEVICE_MAX_WRITE_IMAGE_ARGS</td><td colspan="1">0</td><td colspan="1">8</td><td colspan="1">8</td><td colspan="1">128</td></tr><tr><td>CL_DEVICE_IMAGE2D_MAX_WIDTH</td><td colspan="1">0</td><td colspan="1">4096</td><td colspan="1">4096</td><td colspan="1">8192</td></tr><tr><td>CL_DEVICE_IMAGE2D_MAX_HEIGHT</td><td colspan="1">0</td><td colspan="1">32768</td><td colspan="1">32768</td><td colspan="1">8192</td></tr><tr><td>CL_DEVICE_IMAGE3D_MAX_WIDTH</td><td colspan="1">0</td><td colspan="1">2048</td><td colspan="1">2048</td><td colspan="1">2048</td></tr><tr><td>CL_DEVICE_IMAGE3D_MAX_HEIGHT</td><td colspan="1">0</td><td colspan="1">2048</td><td colspan="1">2048</td><td colspan="1">2048</td></tr><tr><td>CL_DEVICE_IMAGE3D_MAX_DEPTH</td><td colspan="1">0</td><td colspan="1">2048</td><td colspan="1">2048</td><td colspan="1">2048</td></tr><tr><td>CL_DEVICE_MAX_SAMPLERS</td><td colspan="1">0</td><td colspan="1">16</td><td colspan="1">16</td><td colspan="1">128</td></tr><tr><td>CL_DEVICE_PROFILING_TIMER_RESOLUTION</td><td colspan="1">1</td><td colspan="1">1000</td><td colspan="1">1000</td><td colspan="1">340831</td></tr><tr><td>CL_DEVICE_EXECUTION_CAPABILITIES</td><td colspan="1">[EXEC_KERNEL, EXEC_NATIVE_KERNEL]</td><td colspan="1">[EXEC_KERNEL]</td><td colspan="1">[EXEC_KERNEL]</td><td colspan="1">[EXEC_KERNEL, 
EXEC_NATIVE_KERNEL]</td></tr><tr><td>CL_DEVICE_HALF_FP_CONFIG</td><td colspan="1">[]</td><td colspan="1">[]</td><td colspan="1">[]</td><td colspan="1">[]</td></tr><tr><td>CL_DEVICE_SINGLE_FP_CONFIG</td><td colspan="1">[DENORM, INF_NAN, ROUND_TO_NEAREST, ROUND_TO_INF, ROUND_TO_ZERO]</td><td colspan="1">[INF_NAN, ROUND_TO_NEAREST, ROUND_TO_INF, ROUND_TO_ZERO, FMA]</td><td colspan="1">[INF_NAN, ROUND_TO_NEAREST, ROUND_TO_INF, ROUND_TO_ZERO, FMA]</td><td colspan="1">[DENORM, INF_NAN, ROUND_TO_NEAREST]</td></tr><tr><td>CL_DEVICE_DOUBLE_FP_CONFIG</td><td colspan="1">[]</td><td colspan="1">[DENORM, INF_NAN, ROUND_TO_NEAREST, ROUND_TO_INF, ROUND_TO_ZERO, FMA]</td><td colspan="1">[DENORM, INF_NAN, ROUND_TO_NEAREST, ROUND_TO_INF, ROUND_TO_ZERO, FMA]</td><td colspan="1">[DENORM, INF_NAN, ROUND_TO_NEAREST, ROUND_TO_INF, ROUND_TO_ZERO, FMA]</td></tr><tr><td>CL_DEVICE_LOCAL_MEM_TYPE</td><td colspan="1">GLOBAL</td><td colspan="1">LOCAL</td><td colspan="1">LOCAL</td><td colspan="1">GLOBAL</td></tr><tr><td>CL_DEVICE_GLOBAL_MEM_CACHE_TYPE</td><td colspan="1">READ_WRITE</td><td colspan="1">NONE</td><td colspan="1">NONE</td><td colspan="1">READ_WRITE</td></tr><tr><td>CL_DEVICE_QUEUE_PROPERTIES</td><td colspan="1">[PROFILING_MODE]</td><td colspan="1">[OUT_OF_ORDER_MODE, PROFILING_MODE]</td><td colspan="1">[OUT_OF_ORDER_MODE, PROFILING_MODE]</td><td colspan="1">[OUT_OF_ORDER_MODE, PROFILING_MODE]</td></tr><tr><td>CL_DEVICE_COMPILER_AVAILABLE</td><td colspan="1">true</td><td colspan="1">true</td><td colspan="1">true</td><td colspan="1">true</td></tr><tr><td>CL_DEVICE_ERROR_CORRECTION_SUPPORT</td><td colspan="1">false</td><td colspan="1">false</td><td colspan="1">false</td><td colspan="1">false</td></tr><tr><td>cl_khr_fp16</td><td colspan="1">false</td><td colspan="1">false</td><td colspan="1">false</td><td colspan="1">false</td></tr><tr><td>cl_khr_fp64</td><td colspan="1">false</td><td colspan="1">true</td><td colspan="1">true</td><td colspan="1">true</td></tr><tr><td>cl_khr_gl_sharing | 
cl_APPLE_gl_sharing</td><td colspan="1">true</td><td colspan="1">true</td><td colspan="1">true</td><td colspan="1">true</td></tr></table>
</div>
<p>
The CLInfo utility is part of the <a href="http://jogamp.org/jocl-demos/www/">jocl-demos</a> project and is also available via webstart. For a plain text version of the above output you can run:
</p>
<pre>
java -cp jocl.jar:gluegen-rt.jar \
     -Djava.library.path="path/to/jocl/libs:path/to/gluegen/libs" com.jogamp.opencl.util.CLInfo
</pre>
<p>
(BTW, to install the Intel SDK on Debian-based systems follow this <a href="http://mhr3.blogspot.com/2011/05/opencl-on-ubuntu.html">link</a>)
</p>
<p>
happy coding!
</p>https://mbien.dev/blog/entry/java_binding_for_the_openclJava Binding for the OpenCL API mbien2010-09-10T22:49:42+00:002020-04-09T09:54:27+00:00<p>
I am currently working on a Java Binding for the <a href="http://en.wikipedia.org/wiki/Opencl">OpenCL</a> API using <a href="https://jogamp.org/gluegen/www/">GlueGen</a> (as used in <a href="https://jogl.jogamp.org">JOGL</a>, <a href="https://joal.jogamp.org">JOAL</a>). The project started as part of my bachelor of CS thesis shortly after the release of the first OpenCL specification draft and is now fully feature complete with OpenCL 1.1. <a href="https://jogamp.org/jocl/www/">JOCL</a> is currently in the stabilization phase, a beta release shouldn't be far away.
</p>
<h3>Overview - How does it work?</h3>
JOCL enables applications running on the JVM to use OpenCL for massively parallel, high performance computing tasks, executed on heterogeneous hardware (GPUs, CPUs, FPGAs etc) in a platform independent manner. JOCL consists of two parts, the low level and the high level binding.
<p>
The <b>low level bindings (LLB)</b> are automatically generated using the official OpenCL <a href="http://www.khronos.org/registry/cl/">headers</a> as input and provide a high performance, JNI based, 1:1 mapping to the C functions.
</p>
<p>This has the following advantages:</p>
<ul>
<li>reduces maintenance overhead and ensures spec conformance</li>
<li>compile-time JNI bindings are the fastest way to access native libs from the JVM</li>
<li>makes translating OpenCL C code into Java + JOCL very easy (e.g. from books or tutorials)</li>
<li>flexibility and stability: OpenCL libs are loaded dynamically and accessed via function pointers</li>
</ul>
<p>
The hand written <b>high level bindings (HLB)</b> are built on top of the LLB and hide most boilerplate code (like object IDs, pointers and resource management) behind easy to use Java objects.
The HLB use direct NIO buffers internally for fast memory transfers between the JVM and the OpenCL implementation and are very GC friendly. Most of the API is designed for method chaining, but of course you don't have to use it this way if you don't want to. JOCL also seamlessly integrates with JOGL 2 (both are built and tested together): just pass the JOGL context as parameter to the JOCL context factory and you will receive a shared context. If you already know OpenCL and Java, the HLB should feel very intuitive.
</p>
<p>
The project is available on <a href="https://jocl.jogamp.org">jogamp.org</a>. Please use the <a href="https://jogamp.org/forum.html">mailinglist / forum</a> for feedback or questions and the <a href="https://jogamp.org/bugzilla/">bugtracker</a> if you experience any issues.
The JOCL <a href="http://github.com/mbien/jocl">root repository</a> is located on github, you may also want to take a look at the <a href="http://github.com/mbien/jocl-demos">jocl-demos</a> project. (If the demos are not enough you might also want to take a look at the junit tests)
</p>
<h3>Screenshots (sourcecode in jocl-demos project):</h3>
<a href="https://jogamp.org/jocl/www/Julia3d.png">
<img src="https://jogamp.org/jocl/www/Julia3d_sm.png" width="400" height="300" alt="JOCL Julia Set"/>
</a>
<img src="https://jogamp.org/jocl/www/mandelbrot64_sm.png" width="256" height="256" alt="high precision"/>
<p>
More regarding OpenGL interoperability and other features in upcoming blog entries.
</p>
<p>
The following sample shows basic setup, computation and cleanup using the high level APIs.
</p>
<h3>Hello World or parallel a+b=c</h3>
<pre><code class="language-java">
import com.jogamp.opencl.CLBuffer;
import com.jogamp.opencl.CLCommandQueue;
import com.jogamp.opencl.CLContext;
import com.jogamp.opencl.CLKernel;
import com.jogamp.opencl.CLProgram;

import java.io.IOException;
import java.nio.FloatBuffer;
import java.util.Random;

import static com.jogamp.opencl.CLMemory.Mem.*;
import static java.lang.System.*;

/**
 * Hello Java OpenCL example. Adds all elements of buffer A to buffer B
 * and stores the result in buffer C.
 * Sample was inspired by the Nvidia VectorAdd example written in C/C++
 * which is bundled in the Nvidia OpenCL SDK.
 * @author Michael Bien
 */
public class HelloJOCL {

    public static void main(String[] args) throws IOException {

        // Length of arrays to process (arbitrary number)
        int elementCount = 11444777;
        // Local work size dimensions
        int localWorkSize = 256;
        // rounded up to the nearest multiple of the localWorkSize
        int globalWorkSize = roundUp(localWorkSize, elementCount);

        // setup
        CLContext context = CLContext.create();

        CLProgram program = context.createProgram(
                HelloJOCL.class.getResourceAsStream("VectorAdd.cl")
        ).build();

        CLBuffer<FloatBuffer> clBufferA =
                context.createFloatBuffer(globalWorkSize, READ_ONLY);
        CLBuffer<FloatBuffer> clBufferB =
                context.createFloatBuffer(globalWorkSize, READ_ONLY);
        CLBuffer<FloatBuffer> clBufferC =
                context.createFloatBuffer(globalWorkSize, WRITE_ONLY);

        out.println("used device memory: "
                + (clBufferA.getSize() + clBufferB.getSize() + clBufferC.getSize()) / 1000000 + "MB");

        // fill read buffers with random numbers (just to have test data).
        fillBuffer(clBufferA.getBuffer(), 12345);
        fillBuffer(clBufferB.getBuffer(), 67890);

        // get a reference to the kernel function with the name 'VectorAdd'
        // and map the buffers to its input parameters.
        CLKernel kernel = program.createCLKernel("VectorAdd");
        kernel.putArgs(clBufferA, clBufferB, clBufferC).putArg(elementCount);

        // create command queue on fastest device.
        CLCommandQueue queue = context.getMaxFlopsDevice().createCommandQueue();

        // asynchronous write to GPU device,
        // blocking read later to get the computed results back.
        long time = nanoTime();
        queue.putWriteBuffer(clBufferA, false)
             .putWriteBuffer(clBufferB, false)
             .put1DRangeKernel(kernel, 0, globalWorkSize, localWorkSize)
             .putReadBuffer(clBufferC, true);
        time = nanoTime() - time;

        // print first few elements of the resulting buffer to the console.
        out.println("a+b=c results snapshot: ");
        for (int i = 0; i < 10; i++)
            out.print(clBufferC.getBuffer().get() + ", ");
        out.println("...; " + clBufferC.getBuffer().remaining() + " more");

        out.println("computation took: " + (time / 1000000) + "ms");

        // cleanup all resources associated with this context
        // (done last, since the result buffer is read above).
        context.release();
    }

    private static void fillBuffer(FloatBuffer buffer, int seed) {
        Random rnd = new Random(seed);
        while (buffer.remaining() != 0)
            buffer.put(rnd.nextFloat() * 100);
        buffer.rewind();
    }

    private static int roundUp(int groupSize, int globalSize) {
        int r = globalSize % groupSize;
        if (r == 0) {
            return globalSize;
        } else {
            return globalSize + groupSize - r;
        }
    }
}
</code></pre>
<h3>VectorAdd.cl</h3>
<pre><code class="language-opencl">
// OpenCL Kernel Function for element by element vector addition
kernel void VectorAdd(global const float* a,
                      global const float* b,
                      global float* c, int numElements) {

    // get index into global data array
    int iGID = get_global_id(0);

    // bound check (equivalent to the limit on a 'for' loop)
    if (iGID >= numElements) {
        return;
    }

    // add the vector elements
    c[iGID] = a[iGID] + b[iGID];
}
</code></pre>
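<p>
A plain Java reference implementation of the same computation is handy for verifying the results the kernel produces on the device. This helper is not part of the demo, just a sketch:
</p>

```java
import java.util.Arrays;

public class VectorAddReference {

    // element-wise a + b = c, the same computation the kernel performs
    static float[] add(float[] a, float[] b) {
        float[] c = new float[a.length];
        for (int i = 0; i < a.length; i++) {
            c[i] = a[i] + b[i];
        }
        return c;
    }

    public static void main(String[] args) {
        float[] c = add(new float[]{1, 2, 3}, new float[]{4, 5, 6});
        System.out.println(Arrays.toString(c)); // [5.0, 7.0, 9.0]
    }
}
```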
https://mbien.dev/blog/entry/new_getting_started_with_joglNew Getting Started with JOGL 2 tutorialsmbien2010-09-04T00:22:38+00:002010-09-04T00:22:38+00:00<p>
Thanks to <a href="https://sites.google.com/site/justinscsstuff/Home">Justin Stoecker</a>, computer science graduate student at the University of Miami, JOGL gets a new set of getting started tutorials:
</p>
<i>
<p>
JOGL, or Java Bindings for OpenGL, allows Java programs to access the OpenGL API for graphics programming. The graphics code in JOGL programs will look almost identical to that found in C or C++ OpenGL programs, as the API is automatically generated from C header files. This is one of the greatest strengths of JOGL, as it is quite easy to port OpenGL programs written in C or C++ to JOGL; learning JOGL is essentially learning OpenGL[...]
</p>
</i>
<h3>Tutorials:</h3>
<ul>
<li><a href="https://sites.google.com/site/justinscsstuff/jogl-tutorials">index</a></li>
<li><a href="https://sites.google.com/site/justinscsstuff/jogl-tutorial-1">Tutorial 1 - Environment Setup</a></li>
<li><a href="https://sites.google.com/site/justinscsstuff/jogl-tutorial-2">Tutorial 2 - Creating a Window</a></li>
<li><a href="https://sites.google.com/site/justinscsstuff/jogl-tutorial-3">Tutorial 3 - Creating a Render Loop</a></li>
</ul>
Thanks Justin!<br/>