//Tracking Apache Roller page views using JFR custom events

The Java Flight Recorder is great for monitoring the JVM, but after checking out the new JFR Event Streaming in JDK 14, I was curious how well it would work in practice for basic logging needs. I always wanted to generate some page view statistics for this blog but never liked the idea of tracking users in their browsers (or at least those few who turned off tracking protection for some reason or used an outdated browser).

So let's take a look at whether we can record custom page view events using JFR instead of logging or storing stats in a database.

The goal is to record the link the anonymous reader clicked to get to the page (the referrer URL, or null) and the actual blog page that is visited.

Custom Event

    @Label("Incoming Referrer Event")
    @Description("Reader landed on the Blog.")
    @StackTrace(false)
    private class IncomingReferrerEvent extends Event {

        @Label("Referrer URL")
        private final String referrerUrl;
        
        @Label("Request URL")
        private final String requestUrl;

        private IncomingReferrerEvent(String referrerUrl, String requestUrl) {
            this.referrerUrl = referrerUrl;
            this.requestUrl = requestUrl;
        }

        public String getReferrerUrl() {
            return referrerUrl;
        }

        public String getRequestUrl() {
            return requestUrl;
        }

    }

Defining a custom JFR event is trivial: simply extend jdk.jfr.Event and annotate the fields; the code should be self explanatory. @StackTrace(false) disables the recording of stack traces since we don't need them for this event. Custom events are enabled by default.

    new IncomingReferrerEvent(referrerUrl, requestUrl).commit();

This records the event (placed somewhere in doGet() of Apache Roller's HttpServlet). Events can have a beginning and an end; by calling commit() without begin() or end(), only the time stamp at commit and the event values are stored. Due to the design of JFR you don't really have to worry about concurrency, because events are held in thread-local buffers before they are flushed to disk, which should make it fairly performant even in highly parallel settings.
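For reference, a duration-style variant of the same event could look roughly like this (a minimal sketch; the request handling in the middle is just a placeholder):

    IncomingReferrerEvent event = new IncomingReferrerEvent(referrerUrl, requestUrl);
    event.begin();                 // start the clock
    try {
        // ... handle the request ...
    } finally {
        event.commit();            // records the duration between begin() and commit()
    }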

Configuration

The next step is to set up a JFR configuration file.


<?xml version="1.0" encoding="UTF-8"?>
<configuration version="2.0" label="Blog" description="Roller JFR Events" provider="Duke Himself">

    <event name="org.apache.roller.weblogger.jfr.IncomingReferrerEvent">
        <setting name="enabled">true</setting>
    </event>

</configuration>

I could have used the default configuration provided by the JDK, but this would lead to much larger file sizes after some uptime. The configuration above only enables our custom event; everything else is turned off. For a larger blueprint see JDK_HOME/lib/jfr/default.jfc.

Once the server is running we can switch the recording on using jcmd. ($1 is the PID)

jcmd $1 JFR.configure repositorypath="/home/server/jfr" samplethreads=false
jcmd $1 JFR.start name="blogstats" settings="/home/server/jfr/blog.jfc"

First Test - 8 MB, not great, not terrible

After a quick test we can plot the summary of the recording file in the repository to check if we see any events.

jfr summary 2020_02_03_04_35_07_519/2020_02_07_16_46_39.jfr 

 Version: 2.1
 Chunks: 1
 Start: 2020-02-07 15:46:39 (UTC)

 Event Type                                             Count  Size (bytes) 
==============================================================================
 jdk.CheckPoint                                        130997      12587784
 org.apache.roller.weblogger.jfr.IncomingReferrerEvent     66          4735
 jdk.Metadata                                               1         81383
 jdk.ThreadEnd                                              0             0
 jdk.TLSHandshake                                           0             0
 jdk.ThreadSleep                                            0             0
 jdk.ThreadPark                                             0             0
 jdk.JavaMonitorEnter                                       0             0
...

There it is, we can see IncomingReferrerEvent in the recording. You might be wondering what those jdk.CheckPoint events are. I wasn't sure either, since I noticed them for the first time with JDK 14 and a web search resulted in no hits, so I asked on hotspot-jfr-dev. It turns out that they are part of the file format - for more details read the reply, or even better, the rest of the mail thread too.

The check points will create around 8 MB of data per day (one check point per second), which can be seen as a static footprint without recording any actual events. 8 MB, not great, not terrible.
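As a quick sanity check against the summary above: 12587784 bytes / 130997 check point events is roughly 96 bytes per check point, and at one check point per second that is 86400 * 96 bytes, or about 8 MB per day - which matches the observed footprint.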

More Events

Having a good estimate for the footprint, it was time to enable more events. Correlating IncomingReferrerEvents with GC events, network/CPU load or thread count might be interesting, given that it all runs on a Raspberry Pi 3b+ with limited resources. I ended up enabling some more events after grep'ing the code base of VisualVM for event strings (it looks like JFR support is coming soon, the custom build I am using looks fairly complete already) and then adding them to the blog.jfc config file.

jfr summary blog_dump06.jfr 
...
 Event Type                                              Count  Size (bytes) 
==============================================================================
 jdk.CheckPoint                                        1223057     118202307
 jdk.NetworkUtilization                                 264966       4731868
 jdk.CPULoad                                            242765       6198487
 jdk.ActiveSetting                                        3348        123156
 org.apache.roller.weblogger.jfr.IncomingReferrerEvent    6367         89338
 jdk.GCHeapSummary                                         830         35942
 jdk.GarbageCollection                                     415         12577
 jdk.Metadata                                               13       1038982
 jdk.JVMInformation                                         12         27113
 jdk.GCHeapConfiguration                                    12           353
 jdk.ActiveRecording                                        12           727

Even with the added events and another 14-day test run I ended up with a file size of about 130 MB. That is about 9.28 MB per day - mostly CheckPoints ;) - in fact ~118 MB of them. Luckily the file compresses nicely.

The undocumented Flag

If you take a look at the JDK sources you will notice a flag with the description "flush-interval, Minimum time before flushing buffers, measured in (s)econds..." - sounds exactly like what we need.
jcmd $1 JFR.start name="blogstats" flush-interval=10s settings="/home/server/jfr/blog.jfc"

Results look promising: as expected, around a 10x reduction in CheckPoint events. (I don't have more data to show since I literally just added the flag and rebooted the server as I am typing this.)

Update: the size of the record dump file after ~14.3 days (343h) is 25.13 MB, which is about 1.8 MB per day - much better.

I can't recommend relying on undocumented or hidden flags since they can change or go away without notice (@see hotspot-jfr-dev discussion). The real-time nature of event streaming is also gone with a buffer delay of 10s, so keep that in mind.

The File Format

The file format for the JFR repository and record dumps isn't standardized and can change. So another thing to keep in mind if JFR is used for logging is that it might not be possible to read old JDK 14 dumps with JDK 20 in the future (although reading JDK 11 dumps with 14 works just fine). But this doesn't sound like a big issue to me since JDKs don't just disappear; it will always be possible to open JDK 14 dumps with JDK 14 (and maybe even with JDK 20 from the future if backwards compatibility can be maintained).

Conclusion

Although the static footprint of the JFR in JDK 14 came as a surprise, I am not too concerned about it. Storage is cheap and I am sure the issue will be solved or reduced to a minimum in one way or another.

The benefit of tracking my blog stats with JFR is that I can inspect them with the same tools I would use for JVM performance monitoring (VisualVM, JDK Mission Control, etc). JFR Event Streaming and the OpenJDK command line tools make it also easy to access the data in a structured manner.

JFR can not only be used for JVM monitoring but certainly also for structured logging in some scenarios. Log files will probably never disappear completely, and I am sure JFR (or the original JRockit Flight Recorder implementation) was never intended to fully replace a logging framework - but maybe one day logs will mostly contain lines produced by Event#toString(), with dual stacks running only where they are still required :).

fly safe


//JFR Event Streaming with Java 14

The Java Flight Recorder has a long history. It used to be part of the BEA JRockit JVM, became a commercial feature of the Oracle JDK (7+) after Oracle acquired BEA and Sun, and was finally fully open sourced with the release of OpenJDK 11 (backports exist for 8) (JEP 328). OpenJDK 14 will add some improvements to the JFR.

JEP 349 will allow the continuous consumption of Java Flight Recorder events in-memory from within the same JVM, or out-of-process from a different JVM via its JFR repository file.

JEP 349 already made it into the early access builds and can be experimented with since OpenJDK 14 build 22+. Let's check it out.

In-Process Streaming

The base JFR configuration files (XML) can be found in JDK_HOME/lib/jfr. The default configuration (default.jfc) is relatively low overhead, while profile.jfc will provide more data. Java Mission Control can create custom settings based on templates if needed. I am using the default config for the examples.

The first example will start JFR on the local JVM using the default recorder settings and register a few event handlers to check if it is working.

import java.io.IOException;
import java.text.ParseException;
import jdk.jfr.Configuration;
import jdk.jfr.consumer.EventStream;
import jdk.jfr.consumer.RecordingStream;

public class JFRStreamTest {
    
    public static void main(String[] args) throws IOException, ParseException  {
        
        Configuration config = Configuration.getConfiguration("default");
        System.out.println(config.getDescription());
        System.out.println("settings:");
        config.getSettings().forEach((key, value) -> System.out.println(key+": "+value));
        
        // open a stream and start local recording
        try (EventStream es = new RecordingStream(config)) {
            
            // register event handlers
            es.onEvent("jdk.GarbageCollection", System.out::println);
            es.onEvent("jdk.CPULoad", System.out::println);
            es.onEvent("jdk.JVMInformation", System.out::println);
            
            // start and block
            es.start();
        }
    
    }
    
}

The above example should print information about the running JVM once, CPU load periodically and GC events when they occur.

Out-of-Process Streaming

Simply start the flight recorder as usual via jcmd <PID> JFR.start or via the JVM flag -XX:+FlightRecorder at startup. The repository location will be stored in the jdk.jfr.repository system property as soon as JFR is running (new in Java 14). It can also be set at startup via a comma separated list of flight recorder options: -XX:FlightRecorderOptions=repository=./blackbox

Update, thanks to Erik from Oracle in the comments section: the repository location can also be set using jcmd <PID> JFR.configure repositorypath=<directory>. If you set it after a recording has started, new data will be written to the new location.

$ jcmd -l | grep netbeans
8492 org.netbeans.Main ... --branding nb

$ jcmd 8492 JFR.start name=streamtest
Started recording 1. No limit specified, using maxsize=250MB as default.
Use jcmd 8492 JFR.dump name=streamtest filename=FILEPATH to copy recording data to file.

$ jinfo -sysprops 8492 | grep jfr
jdk.jfr.repository=/tmp/2019_11_18_02_19_59_8492

Now that the recording is running and we know where the repository is, a second JVM can open a stream to the live JFR repository and monitor the application. Note that we did not dump any JFR records to a file; we connect to the live repository directly.

import java.io.IOException;
import java.nio.file.Path;
import jdk.jfr.consumer.EventStream;

public class JFRStreamTest {
    
    public static void main(String[] args) throws IOException  {
        // connect to JFR repository
        try (EventStream es = EventStream.openRepository(Path.of("/tmp/2019_11_18_02_19_59_8492"))) {
            
            // register some event handlers
            //es.onEvent("jdk.CPULoad", System.out::println);
            es.onEvent("jdk.SocketRead", System.out::println);
            es.onEvent("jdk.SocketWrite", System.out::println);
            
            // start and block
            es.start();
        }
    
    }
    
}

As a quick test I monitored, with the example above, a NetBeans instance running on Java 14 and let the IDE check for updates. Since we watch for SocketRead and SocketWrite events the output looked like:

jdk.SocketRead {
  startTime = 04:34:09.571
  duration = 117.764 ms
  host = "netbeans.apache.org"
  address = "40.79.78.1"
  port = 443
  timeout = 30.000 s
  bytesRead = 5 bytes
  endOfStream = false
  eventThread = "pool-5-thread-1" (javaThreadId = 163)
  stackTrace = [
    java.net.Socket$SocketInputStream.read(byte[], int, int) line: 68
    sun.security.ssl.SSLSocketInputRecord.read(InputStream, byte[], int, int) line: 457
    sun.security.ssl.SSLSocketInputRecord.decode(ByteBuffer[], int, int) line: 165
    sun.security.ssl.SSLTransport.decode(TransportContext, ByteBuffer[], int, int, ...
    sun.security.ssl.SSLSocketImpl.decode(ByteBuffer) line: 1460
    ...
  ]
}
...

Streaming Dumped Records

Opening streams (EventStream.openFile(path)) to JFR record dumps (jcmd <PID> JFR.dump filename=foo.jfr) is of course also possible - it works as expected.
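A minimal sketch of streaming such a dump (the file name is just a placeholder):

import java.io.IOException;
import java.nio.file.Path;
import jdk.jfr.consumer.EventStream;

public class JFRDumpStreamTest {

    public static void main(String[] args) throws IOException {
        // replay all events stored in the dump file
        try (EventStream es = EventStream.openFile(Path.of("foo.jfr"))) {
            es.onEvent("jdk.CPULoad", System.out::println);
            // start() processes the whole file and returns when it reaches the end
            es.start();
        }
    }
}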

Conclusion

Pretty cool new feature! It is currently not possible to do in-memory but out-of-process streaming without syncing on repository files. But since a ramdisk can work around this issue so easily, I am not even sure if this capability would be worth it.

fly safe


//[Dynamic] [Application] Class Data Sharing in Java

Class Data Sharing has existed as a JVM feature for quite some time, but it became more popular in the context of containers. CDS maps a pre-defined class archive into memory and makes it shareable between JVM processes. This can improve startup time but also reduce per-JVM footprint in some cases.

Although basic CDS was already supported in Java 5, this blog entry assumes that Java 11 or later is used, since JVM flags and overall capabilities changed over time. CDS might not be supported on all architectures or GCs: G1, CMS, ShenandoahGC, Serial GC, Parallel GC, and ParallelOld GC all support CDS on 64-bit Linux, while ZGC for example doesn't have support for it yet.


JDK Class Data Sharing

The most basic form of CDS is set up to share only JDK class files. This is enabled by default since JDK 12 (JEP 341). If you check the Java version using a shell you will notice that the JVM version string contains "sharing".
[mbien@hulk server]$ java -version
openjdk version "12.0.1" 2019-04-16
OpenJDK Runtime Environment AdoptOpenJDK (build 12.0.1+12)
OpenJDK 64-Bit Server VM AdoptOpenJDK (build 12.0.1+12, mixed mode, sharing)
The shared class archive can be found pre-installed as classes.jsa in ${JAVA_HOME}/lib/server/. By temporarily renaming the file or switching to an older JDK, "sharing" won't appear in the version string anymore.
[mbien@hulk server]$ java -version
openjdk version "12.0.1" 2019-04-16
OpenJDK Runtime Environment AdoptOpenJDK (build 12.0.1+12)
OpenJDK 64-Bit Server VM AdoptOpenJDK (build 12.0.1+12, mixed mode)

The JDK CDS archive can be manually generated by invoking java -Xshare:dump with root privileges. Recent JVMs are configured to use -Xshare:auto at startup, which will automatically use CDS if available. Enforcing CDS with -Xshare:on will cause the JVM to fail if no archive is found.
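For example (a sketch; paths and required privileges depend on how the JDK is installed):

$ sudo java -Xshare:dump      # regenerates classes.jsa in ${JAVA_HOME}/lib/server/
$ java -Xshare:on -version    # fails fast if no archive can be mapped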


Application Class Data Sharing

The basic CDS archive only contains JDK class data. Adding application class data (JEP 310) is fairly easy but comes with some limitations.

  • the classpath used at archive creation time must be the same as (or a prefix of) the classpath used at run time
  • wildcards or exploded JARs are not allowed in the path (no cheating)

1) create a list of classes (JDK and APP; Java 11+) to include in the shared archive:

$ java -Xshare:off -XX:DumpLoadedClassList=classes.list -jar foo.jar

2) create the archive using that list:

$ java -Xshare:dump -XX:SharedClassListFile=classes.list -XX:SharedArchiveFile=foo.jsa -cp foo.jar

3) launch the applications using the shared archive:

$ java -Xshare:on -XX:SharedArchiveFile=foo.jsa -jar foo.jar

-Xlog:class+load helps to verify that it is working properly.

[0.062s][info][class,load] java.lang.Object source: shared objects file
[0.062s][info][class,load] java.io.Serializable source: shared objects file
[0.062s][info][class,load] java.lang.Comparable source: shared objects file
[0.062s][info][class,load] java.lang.CharSequence source: shared objects file
...

The applications (containers) don't have to be identical to make use of custom CDS archives. The archive is based on the class list of step 1 which can be freely modified or merged with other lists. The main limitation is the classpath prefix rule.
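For example, two class lists could be combined like this before running step 2 (a sketch, assuming plain text lists with one class entry per line; the file names are made up):

$ cat app1.list app2.list | sort -u > merged.list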

Some Results

As a quick test I used OpenJDK 11 + Eclipse Jetty + Apache JSP Wiki and edited a single page (to ensure the classes are loaded), generated a custom archive, and started a few more instances with CDS enabled using the archive.

classes.list size = 4296 entries
wiki.jsa size = 57.6 MB 

 VIRT    RES     SHR  %MEM    TIME    COMMAND
2771292 315128  22448  8,3   0:20.54 java -Xmx128m -Xms128m -Xshare:off -cp ... 
2735468 259148  45184  6,9   0:18.38 java -Xmx128m -Xms128m -XX:SharedArchiveFile=wiki.jsa -cp ...
2735296 270044  45192  7,1   0:19.66 java -Xmx128m -Xms128m -XX:SharedArchiveFile=wiki.jsa -cp ...
2735468 259812  45060  6,9   0:18.26 java -Xmx128m -Xms128m -XX:SharedArchiveFile=wiki.jsa -cp ...
2735468 256800  45196  6,8   0:19.10 java -Xmx128m -Xms128m -XX:SharedArchiveFile=wiki.jsa -cp ...

The first JVM is started with CDS turned off as reference. The four other instances have sharing enabled using the custom archive.


Dynamic Class Data Archives

The proposal in JEP 350, which is currently targeted for Java 13, will allow combining a static base archive with a dynamically generated archive.

The dynamic class data archive is created in a setup phase at application exit (-XX:ArchiveClassesAtExit=dynamic.jsa) and essentially automates steps 1 and 2. Just like before, -XX:SharedArchiveFile=dynamic.jsa will tell the JVM to map the shared archive. Since the dynamic archive references the static JDK archive, both will be used automatically.
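Assuming the foo.jar example from above, the whole flow might look like this (a sketch based on the flags described in the JEP):

$ java -XX:ArchiveClassesAtExit=dynamic.jsa -jar foo.jar   # trial run, archive is written at exit
$ java -XX:SharedArchiveFile=dynamic.jsa -jar foo.jar      # subsequent runs map the archive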

The main advantage (beside the added convenience) is that the dynamic archive will support both builtin class loaders and user-defined class loaders.

now go forth and share class data :)


//Cleaning Bash History using a Java 11 Single-File Sourcecode Program

Java 11 adds, with JEP 330, the ability to launch a Java source file directly from the command line without having to explicitly compile it first. I can see this feature being convenient for some simple scripting or quick tests. Although Java isn't the most concise language, it still has the benefit of being quite readable and having powerful utility APIs.

The following example reads the bash shell history from a file into a LinkedHashSet and only keeps the last recently used line if duplicates exist, essentially cleaning up the history.


import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashSet;

public class HistoryCleaner {

    public static void main(String[] args) throws IOException {

        Path path = Path.of(System.getProperty("user.home")+"/.bash_history");

        LinkedHashSet<String> deduplicated = new LinkedHashSet<>(1024);
        Files.lines(path).forEachOrdered(line -> {
            deduplicated.remove(line);
            deduplicated.add(line);
        });

        Files.write(path, deduplicated);
    }

}

duke@virtual1:~$ java HistoryCleaner.java

will compile and run the "script".


Shebangs are supported too

#!/usr/bin/java --source 11

import java.io....
However, since this is not part of the Java language specification, single-file Java programs (SFJP) with a #! are technically not regular Java files and will not compile when used with javac. This also means that the common file naming conventions are not required for pure SFJPs - using a different file extension is advised. We can't call it .js - so let's call it .sfjp :).
duke@virtual1:~$ ./HistoryCleaner.sfjp
This will compile and run the HistoryCleaner SFJP containing the shebang declaration in the first line. Keeping the .java file extension would result in a compiler error.

//NetBeans OpenCL Pack

Since I am doing a lot with OpenCL lately I decided to try to improve the tooling around OpenCL a bit. A weekend later the NetBeans OpenCL Pack was born :).

Features include:

  • OpenCL Editor with syntax highlighting, code completion and CL reference pages integration
  • OpenCL compiler integration
  • In-editor annotations of compiler warnings and errors updated as you type
  • JOCL project template

Technical Details:

The editor uses ANTLR as parser and lexer. This allows simple things like keyword highlighting as well as more complex features like semantic highlighting, formatting and auto completion (formatting is not yet implemented). It can also detect and report syntax errors; however, this feature is automatically disabled if an OpenCL compiler is present on the host system. All OpenCL implementations detected with the help of JOCL can be used as compiler backends.

Instead of using the old OpenGL Pack as a template I decided to write it from scratch using the latest NetBeans 7 and Java 7 APIs. So you will have to start NetBeans with JDK 7 to be able to use it.

Download

You can download it from the NetBeans plugin portal [mirror]; the source code is on GitHub.

Feedback, contributions and bug reports are, as always, appreciated.

Screenshots:

(auto completion, editor, project templates)

have fun!


//Many little improvements made it into JOCL recently

OK, some of them are big, but I will only cover the little things in this blog entry :).

CLKernel

I added multiple utility methods to CLKernel and related classes. It is for example now possible to create a kernel and set its arguments in one line.

CLKernel sha512 = program.createCLKernel("sha512", padBuffer, digestBuffer, rangeBuffer);

Thanks to feedback in the JOCL forums I also added methods to set vector typed arguments directly. In the past you could do this only by setting them via a java.nio.Buffer.

kernel.setArg(index, x, y, z, w);

Another small feature of CLKernel is the ability to enforce 32bit arguments. You may want to switch between single and double floating point precision at runtime, or mix both to improve performance; in that case you have to compile the program with the double FP extension enabled. By setting kernel.setForce32bitArgs(true), all Java doubles used as kernel arguments will be automatically cast down to 32bit CL floats (see the MultiDeviceFractal demo for an example). This is nothing special but might save you several if(single){setArg((float)foo)}else{setArg(foo)} constructs.
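A tiny sketch of what this could look like (the argument index and value are made up for illustration):

kernel.setForce32bitArgs(true);
// the double below arrives in the CL program as a 32bit float
kernel.setArg(0, 0.5);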

CLWork

CLKernel still only represents the function in the OpenCL program you want to call - nothing more. The new CLWork object contains everything required for kernel execution, like the NDRange and the kernel itself.

    int size = buffer.getNIOCapacity();
    CLWork1D work = CLWork.create1D(program.createCLKernel("sum", buffer, size));
    work.setWorkSize(size, 1).optimizeFor(device);

    // execute
    queue.putWriteBuffer(buffer, false)
         .putWork(work)
         .putReadBuffer(buffer, true);

optimizeFor(device) adjusts the workgroup size to meet device-specific recommended values. This should make sure that all compute units of your GPU are used by dividing the work into groups (however, this only works if your task does not care about the workgroup size, see javadoc).

CLSubDevice

Sometimes you don't want to put your CLDevice under 100% load. This might be the case for example if your device is the CPU your application is running on or if you have to share the GPU with an OpenGL context for rendering. One easy way of controlling device load is to limit the amount of compute units used for a task.

    CLPlatform platform = CLPlatform.getDefault(version(CL_1_1), type(CPU));

    CLDevice device = platform.getMaxFLOPSDevice(type(CPU));
    CLSubDevice[] subs = device.createSubDevicesByCount(4, 4);
    // the array now contains two virtual devices with four CPU cores each

    CLContext context = CLContext.create(subs);
    CLCommandQueue queue = subs[0].createCommandQueue();
    ...

CLSubDevice extends CLDevice and can be used for context creation, queue creation and everywhere else you would use a CLDevice. Prior to creating subdevices you should check if device.isFissionSupported() returns true.

CLProgram builder

OK, this utility is not that new but I haven't blogged about it yet. If program.build() isn't enough you should take a look at the program builder. CLBuildConfiguration stores everything needed for program compilation and is easily configurable via the builder pattern :).

        // reusable builder
        CLBuildConfiguration builder = CLProgramBuilder.createConfiguration()
                                     .withOption(ENABLE_MAD)
                                     .forDevices(context.getDevices())
                                     .withDefine("RADIUS", 5)
                                     .withDefine("ENABLE_FOOBAR");
        builder.build(programA);
        builder.build(programB);
        ...

CLBuildConfiguration is fully reusable and can be upgraded to a CLProgramConfiguration if you combine it with a CLProgram. Both can be serialized, which allows storing the build configuration or the entire prebuilt program on disk or sending it over the network. (Caching binaries on disk can save startup time, for example.)

        // program configuration
        ois = new ObjectInputStream(new FileInputStream(file));
        CLProgramConfiguration programConfig = CLProgramBuilder.loadConfiguration(ois, context);
        assertNotNull(programConfig.getProgram());
        ois.close();
        program = programConfig.build(); // builds from source or loads binaries if possible
        assertTrue(program.isExecutable());

Note: loading binaries and associating them with the right driver/device is currently not trivial with OpenCL. Even if everything works as intended, it is still possible that the driver refuses the binaries for some reason (driver update... etc). That's why it's recommended to add the program source to the configuration before calling build() to allow an automatic rebuild as a fallback.

        // another entry point for complex builds (prepare() returns CLProgramConfiguration)
        program.prepare().withOption(ENABLE_MAD).forDevice(context.getMaxFlopsDevice()).build();

(all snippets have been stolen from the JUnit tests)
I am sure I forgot something... but this should cover at least some of the incremental improvements. Expect a few more blog entries for the larger features soon.

- - - - - -
In other news: Nvidia released OpenCL 1.1 drivers - some of us thought this would never happen -> all major vendors (AMD, Intel, NV, IBM, ZiiLABS ..) now support OpenCL 1.1 (screenshot)

have fun!


//Developing with JOCL on AMD, Intel and Nvidia OpenCL platforms

One nice feature of OpenCL is that platform abstraction was handled in the spec from day one. You can install all OpenCL drivers side by side and let the application choose at runtime on which device and on which platform it should execute the kernels.

As of today there are four vendors which provide OpenCL implementations for the desktop. AMD and Intel support the OpenCL 1.1 specification, while Nvidia apparently tries to stick with 1.0 to encourage their customers to stick with CUDA ;-). [edit] And of course there is also Apple, providing out-of-the-box OpenCL 1.0 support in Mac OS X 10.6.

JOCL contains a small CLInfo utility which can be used to quickly verify OpenCL installations. Here is the output on my system (Ubuntu was booted) with all three SDKs installed; platform properties are listed first (one column per platform), followed by device properties (one column per device):

CL_PLATFORM_NAME: ATI Stream | NVIDIA CUDA | Intel(R) OpenCL
CL_PLATFORM_VERSION: OpenCL 1.1 ATI-Stream-v2.2 (302) | OpenCL 1.0 CUDA 4.0.1 | OpenCL 1.1 LINUX
CL_PLATFORM_PROFILE: FULL_PROFILE | FULL_PROFILE | FULL_PROFILE
CL_PLATFORM_VENDOR: Advanced Micro Devices, Inc. | NVIDIA Corporation | Intel(R) Corporation
CL_PLATFORM_ICD_SUFFIX_KHR: AMD | NV | Intel
CL_PLATFORM_EXTENSIONS: [cl_khr_icd, cl_amd_event_callback] | [cl_khr_icd, cl_khr_byte_addressable_store, cl_nv_compiler_options, cl_nv_pragma_unroll, cl_nv_device_attribute_query, cl_khr_gl_sharing] | [cl_khr_icd, cl_khr_byte_addressable_store, cl_khr_fp64, cl_khr_local_int32_extended_atomics, cl_khr_local_int32_base_atomics, cl_khr_global_int32_base_atomics, cl_khr_gl_sharing, cl_intel_printf, cl_khr_global_int32_extended_atomics, cl_ext_device_fission]

CL_DEVICE_NAME: Intel(R) Core(TM) i7 CPU 940 @ 2.93GHz | GeForce GTX 295 | GeForce GTX 295 | Intel(R) Core(TM) i7 CPU 940 @ 2.93GHz
CL_DEVICE_TYPE: CPU | GPU | GPU | CPU
CL_DEVICE_AVAILABLE: true | true | true | true
CL_DEVICE_VERSION: OpenCL 1.1 ATI-Stream-v2.2 (302) | OpenCL 1.0 CUDA | OpenCL 1.0 CUDA | OpenCL 1.1
CL_DEVICE_PROFILE: FULL_PROFILE | FULL_PROFILE | FULL_PROFILE | FULL_PROFILE
CL_DEVICE_ENDIAN_LITTLE: true | true | true | true
CL_DEVICE_VENDOR: GenuineIntel | NVIDIA Corporation | NVIDIA Corporation | Intel(R) Corporation
CL_DEVICE_EXTENSIONS: [cl_amd_device_attribute_query, cl_khr_byte_addressable_store, cl_khr_int64_extended_atomics, cl_khr_local_int32_extended_atomics, cl_amd_fp64, cl_amd_printf, cl_khr_local_int32_base_atomics, cl_khr_int64_base_atomics, cl_khr_global_int32_base_atomics, cl_khr_gl_sharing, cl_khr_global_int32_extended_atomics, cl_ext_device_fission] | [cl_khr_icd, cl_khr_byte_addressable_store, cl_khr_fp64, cl_khr_local_int32_extended_atomics, cl_khr_local_int32_base_atomics, cl_nv_compiler_options, cl_nv_pragma_unroll, cl_nv_device_attribute_query, cl_khr_global_int32_base_atomics, cl_khr_gl_sharing, cl_khr_global_int32_extended_atomics] | [cl_khr_icd, cl_khr_byte_addressable_store, cl_khr_fp64, cl_khr_local_int32_extended_atomics, cl_khr_local_int32_base_atomics, cl_nv_compiler_options, cl_nv_pragma_unroll, cl_nv_device_attribute_query, cl_khr_global_int32_base_atomics, cl_khr_gl_sharing, cl_khr_global_int32_extended_atomics] | [cl_khr_byte_addressable_store, cl_khr_fp64, cl_khr_local_int32_extended_atomics, cl_khr_local_int32_base_atomics, cl_khr_global_int32_base_atomics, cl_khr_gl_sharing, cl_intel_printf, cl_khr_global_int32_extended_atomics, cl_ext_device_fission]
CL_DEVICE_MAX_COMPUTE_UNITS: 8 | 30 | 30 | 8
CL_DEVICE_MAX_CLOCK_FREQUENCY: 2934 | 1242 | 1242 | 2930
CL_DEVICE_VENDOR_ID: 4098 | 4318 | 4318 | 32902
CL_DEVICE_OPENCL_C_VERSION: OpenCL C 1.1 | com.jogamp.opencl.CLException$CLInvalidValueException: error while asking for info string [error: CL_INVALID_VALUE] | com.jogamp.opencl.CLException$CLInvalidValueException: error while asking for info string [error: CL_INVALID_VALUE] | OpenCL C 1.1
CL_DRIVER_VERSION: 2.0 | 270.41.06 | 270.41.06 | 1.1
CL_DEVICE_ADDRESS_BITS: 64 | 32 | 32 | 64
CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT: 8 | 1 | 1 | 8
CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR: 16 | 1 | 1 | 16
CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT: 4 | 1 | 1 | 4
CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG: 2 | 1 | 1 | 2
CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT: 4 | 1 | 1 | 4
CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE: 0 | 1 | 1 | 2
CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR: 16 | com.jogamp.opencl.CLException$CLInvalidValueException: error while asking for info value [error: CL_INVALID_VALUE] | com.jogamp.opencl.CLException$CLInvalidValueException: error while asking for info value [error: CL_INVALID_VALUE] | 16
CL_DEVICE_NATIVE_VECTOR_WIDTH_SHORT: 8 | com.jogamp.opencl.CLException$CLInvalidValueException: error while asking for info value [error: CL_INVALID_VALUE] | com.jogamp.opencl.CLException$CLInvalidValueException: error while asking for info value [error: CL_INVALID_VALUE] | 8
CL_DEVICE_NATIVE_VECTOR_WIDTH_INT: 4 | com.jogamp.opencl.CLException$CLInvalidValueException: error while asking for info value [error: CL_INVALID_VALUE] | com.jogamp.opencl.CLException$CLInvalidValueException: error while asking for info value [error: CL_INVALID_VALUE] | 4
CL_DEVICE_NATIVE_VECTOR_WIDTH_LONG: 2 | com.jogamp.opencl.CLException$CLInvalidValueException: error while asking for info value [error: CL_INVALID_VALUE] | com.jogamp.opencl.CLException$CLInvalidValueException: error while asking for info value [error: CL_INVALID_VALUE] | 2
CL_DEVICE_NATIVE_VECTOR_WIDTH_HALF: 0 | com.jogamp.opencl.CLException$CLInvalidValueException: error while asking for info value [error: CL_INVALID_VALUE] | com.jogamp.opencl.CLException$CLInvalidValueException: error while asking for info value [error: CL_INVALID_VALUE] | 0
CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT: 4 | com.jogamp.opencl.CLException$CLInvalidValueException: error while asking for info value [error: CL_INVALID_VALUE] | com.jogamp.opencl.CLException$CLInvalidValueException: error while asking for info value [error: CL_INVALID_VALUE] | 4
CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE: 0 | com.jogamp.opencl.CLException$CLInvalidValueException: error while asking for info value [error: CL_INVALID_VALUE] | com.jogamp.opencl.CLException$CLInvalidValueException: error while asking for info value [error: CL_INVALID_VALUE] | 2
CL_DEVICE_MAX_WORK_GROUP_SIZE: 1024 | 512 | 512 | 1024
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS: 3 | 3 | 3 | 3
CL_DEVICE_MAX_WORK_ITEM_SIZES: [1024, 1024, 1024] | [512, 512, 64] | [512, 512, 64] | [1024, 1024, 1024]
CL_DEVICE_MAX_PARAMETER_SIZE: 4096 | 4352 | 4352 | 1024
CL_DEVICE_MAX_MEM_ALLOC_SIZE: 1073741824 | 234831872 | 234700800 | 3154703360
CL_DEVICE_GLOBAL_MEM_SIZE: 3221225472 | 939327488 | 938803200 | 12618813440
CL_DEVICE_LOCAL_MEM_SIZE: 32768 | 16384 | 16384 | 32768
CL_DEVICE_HOST_UNIFIED_MEMORY: true | com.jogamp.opencl.CLException$CLInvalidValueException: error while asking for info value [error: CL_INVALID_VALUE] | com.jogamp.opencl.CLException$CLInvalidValueException: error while asking for info value [error: CL_INVALID_VALUE] | true
CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE: 65536 | 65536 | 65536 | 131072
CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE: 64 | 0 | 0 | 64
CL_DEVICE_GLOBAL_MEM_CACHE_SIZE: 32768 | 0 | 0 | 262144
CL_DEVICE_MAX_CONSTANT_ARGS: 8 | 9 | 9 | 128
CL_DEVICE_IMAGE_SUPPORT: false | true | true | true
CL_DEVICE_MAX_READ_IMAGE_ARGS: 0 | 128 | 128 | 128
CL_DEVICE_MAX_WRITE_IMAGE_ARGS: 0 | 8 | 8 | 128
CL_DEVICE_IMAGE2D_MAX_WIDTH: 0 | 4096 | 4096 | 8192
CL_DEVICE_IMAGE2D_MAX_HEIGHT: 0 | 32768 | 32768 | 8192
CL_DEVICE_IMAGE3D_MAX_WIDTH: 0 | 2048 | 2048 | 2048
CL_DEVICE_IMAGE3D_MAX_HEIGHT: 0 | 2048 | 2048 | 2048
CL_DEVICE_IMAGE3D_MAX_DEPTH: 0 | 2048 | 2048 | 2048
CL_DEVICE_MAX_SAMPLERS: 0 | 16 | 16 | 128
CL_DEVICE_PROFILING_TIMER_RESOLUTION: 1 | 1000 | 1000 | 340831
CL_DEVICE_EXECUTION_CAPABILITIES: [EXEC_KERNEL, EXEC_NATIVE_KERNEL] | [EXEC_KERNEL] | [EXEC_KERNEL] | [EXEC_KERNEL, EXEC_NATIVE_KERNEL]
CL_DEVICE_HALF_FP_CONFIG: [] | [] | [] | []
CL_DEVICE_SINGLE_FP_CONFIG: [DENORM, INF_NAN, ROUND_TO_NEAREST, ROUND_TO_INF, ROUND_TO_ZERO] | [INF_NAN, ROUND_TO_NEAREST, ROUND_TO_INF, ROUND_TO_ZERO, FMA] | [INF_NAN, ROUND_TO_NEAREST, ROUND_TO_INF, ROUND_TO_ZERO, FMA] | [DENORM, INF_NAN, ROUND_TO_NEAREST]
CL_DEVICE_DOUBLE_FP_CONFIG: [] | [DENORM, INF_NAN, ROUND_TO_NEAREST, ROUND_TO_INF, ROUND_TO_ZERO, FMA] | [DENORM, INF_NAN, ROUND_TO_NEAREST, ROUND_TO_INF, ROUND_TO_ZERO, FMA] | [DENORM, INF_NAN, ROUND_TO_NEAREST, ROUND_TO_INF, ROUND_TO_ZERO, FMA]
CL_DEVICE_LOCAL_MEM_TYPE: GLOBAL | LOCAL | LOCAL | GLOBAL
CL_DEVICE_GLOBAL_MEM_CACHE_TYPE: READ_WRITE | NONE | NONE | READ_WRITE
CL_DEVICE_QUEUE_PROPERTIES: [PROFILING_MODE] | [OUT_OF_ORDER_MODE, PROFILING_MODE] | [OUT_OF_ORDER_MODE, PROFILING_MODE] | [OUT_OF_ORDER_MODE, PROFILING_MODE]
CL_DEVICE_COMPILER_AVAILABLE: true | true | true | true
CL_DEVICE_ERROR_CORRECTION_SUPPORT: false | false | false | false
cl_khr_fp16: false | false | false | false
cl_khr_fp64: false | true | true | true
cl_khr_gl_sharing | cl_APPLE_gl_sharing: true | true | true | true

The CLInfo utility is part of the jocl-demos project and is also available via webstart. For a plain text version of the above output you can run:

 java -cp jocl.jar:gluegen-rt.jar \
    -Djava.library.path="path/to/jocl/libs:path/to/gluegen/libs" com.jogamp.opencl.util.CLInfo

(btw, to install the Intel SDK on Debian based systems follow this link)

happy coding!


//Java Binding for the OpenCL API

I am currently working on a Java binding for the OpenCL API using GlueGen (as used in JOGL, JOAL). The project started as part of my bachelor of CS thesis shortly after the release of the first OpenCL specification draft and is now fully feature complete with OpenCL 1.1. JOCL is currently in the stabilization phase; a beta release shouldn't be far away.

Overview - How does it work?

JOCL enables applications running on the JVM to use OpenCL for massively parallel, high performance computing tasks, executed on heterogeneous hardware (GPUs, CPUs, FPGAs etc) in a platform independent manner. JOCL consists of two parts, the low level and the high level binding.

The low level bindings (LLB) are automatically generated using the official OpenCL headers as input and provide a high performance, JNI based, 1:1 mapping to the C functions.

This has the following advantages:

  • reduces maintenance overhead and ensures spec conformance
  • compiletime JNI bindings are the fastest way to access native libs from the JVM
  • makes translating OpenCL C code into Java + JOCL very easy (e.g. from books or tutorials)
  • flexibility and stability: OpenCL libs are loaded dynamically and accessed via function pointers

The hand-written high level bindings (HLB) are built on top of the LLB and hide most boilerplate code (like object IDs, pointers and resource management) behind easy to use Java objects. The HLB use direct NIO buffers internally for fast memory transfers between the JVM and the OpenCL implementation and are very GC friendly. Most of the API is designed for method chaining, but of course you don't have to use it this way if you don't want to. JOCL also seamlessly integrates with JOGL 2 (both are built and tested together). Just pass the JOGL context as a parameter to the JOCL context factory and you will receive a shared context. If you already know OpenCL and Java, the HLB should be very intuitive for you.
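For the JOGL interoperability mentioned above, context creation could look roughly like this (a sketch; it assumes a current JOGL GLContext named glContext is already available, e.g. from your GLAutoDrawable):

// glContext is the JOGL context of your drawable
CLGLContext sharedContext = CLGLContext.create(glContext);
CLCommandQueue queue = sharedContext.getMaxFlopsDevice().createCommandQueue();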

The project is available on jogamp.org. Please use the mailing list / forum for feedback or questions and the bug tracker if you experience any issues. The JOCL root repository is located on GitHub; you may also want to take a look at the jocl-demos project. (If the demos are not enough you might also want to take a look at the JUnit tests.)

Screenshots (sourcecode in jocl-demos project):

JOCL Julia Set high precision

More regarding OpenGL interoperability and other features in upcoming blog entries.

The following sample shows basic setup, computation and cleanup using the high level APIs.

Hello World or parallel a+b=c

import com.jogamp.opencl.CLBuffer;
import com.jogamp.opencl.CLCommandQueue;
import com.jogamp.opencl.CLContext;
import com.jogamp.opencl.CLKernel;
import com.jogamp.opencl.CLProgram;

import java.io.IOException;
import java.nio.FloatBuffer;
import java.util.Random;

import static com.jogamp.opencl.CLMemory.Mem.READ_ONLY;
import static com.jogamp.opencl.CLMemory.Mem.WRITE_ONLY;
import static java.lang.System.nanoTime;
import static java.lang.System.out;

/**
 * Hello Java OpenCL example. Adds all elements of buffer A to buffer B
 * and stores the result in buffer C.
 * Sample was inspired by the Nvidia VectorAdd example written in C/C++
 * which is bundled in the Nvidia OpenCL SDK.
 * @author Michael Bien
 */
public class HelloJOCL {

    public static void main(String[] args) throws IOException {
        // Length of arrays to process (arbitrary number)
        int elementCount = 11444777;
        // Local work size dimensions
        int localWorkSize = 256;
        // rounded up to the nearest multiple of the localWorkSize
        int globalWorkSize = roundUp(localWorkSize, elementCount);

        // setup
        CLContext context = CLContext.create();

        CLProgram program = context.createProgram(
                       HelloJOCL.class.getResourceAsStream("VectorAdd.cl")
                                 ).build();

        CLBuffer<FloatBuffer> clBufferA =
                       context.createFloatBuffer(globalWorkSize, READ_ONLY);
        CLBuffer<FloatBuffer> clBufferB =
                       context.createFloatBuffer(globalWorkSize, READ_ONLY);
        CLBuffer<FloatBuffer> clBufferC =
                       context.createFloatBuffer(globalWorkSize, WRITE_ONLY);

        out.println("used device memory: "
            + (clBufferA.getSize()+clBufferB.getSize()+clBufferC.getSize())/1000000 +"MB");

        // fill read buffers with random numbers (just to have test data).
        fillBuffer(clBufferA.getBuffer(), 12345);
        fillBuffer(clBufferB.getBuffer(), 67890);

        // get a reference to the kernel function with the name 'VectorAdd'
        // and map the buffers to its input parameters.
        CLKernel kernel = program.createCLKernel("VectorAdd");
        kernel.putArgs(clBufferA, clBufferB, clBufferC).putArg(elementCount);

        // create command queue on fastest device.
        CLCommandQueue queue = context.getMaxFlopsDevice().createCommandQueue();

        // asynchronous write to GPU device,
        // blocking read later to get the computed results back.
        long time = nanoTime();
        queue.putWriteBuffer(clBufferA, false)
             .putWriteBuffer(clBufferB, false)
             .put1DRangeKernel(kernel, 0, globalWorkSize, localWorkSize)
             .putReadBuffer(clBufferC, true);
        time = nanoTime() - time;

        // cleanup all resources associated with this context.
        context.release();

        // print first few elements of the resulting buffer to the console.
        out.println("a+b=c results snapshot: ");
        for(int i = 0; i < 10; i++)
            out.print(clBufferC.getBuffer().get() + ", ");
        out.println("...; " + clBufferC.getBuffer().remaining() + " more");

        out.println("computation took: "+(time/1000000)+"ms");

    }

    private static final void fillBuffer(FloatBuffer buffer, int seed) {
        Random rnd = new Random(seed);
        while(buffer.remaining() != 0)
            buffer.put(rnd.nextFloat()*100);
        buffer.rewind();
    }

    private static final int roundUp(int groupSize, int globalSize) {
        int r = globalSize % groupSize;
        if (r == 0) {
            return globalSize;
        } else {
            return globalSize + groupSize - r;
        }
    }

}

VectorAdd.cl

    // OpenCL Kernel Function for element by element vector addition
    kernel void VectorAdd(global const float* a,
                          global const float* b,
                          global float* c, int numElements) {

        // get index into global data array
        int iGID = get_global_id(0);

        // bound check (equivalent to the limit on a 'for' loop)
        if (iGID >= numElements)  {
            return;
        }

        // add the vector elements
        c[iGID] = a[iGID] + b[iGID];
    }

//New Getting Started with JOGL 2 tutorials

Thanks to Justin Stoecker, computer science graduate student at the University of Miami, JOGL gets a new set of getting started tutorials:

JOGL, or Java Bindings for OpenGL, allows Java programs to access the OpenGL API for graphics programming. The graphics code in JOGL programs will look almost identical to that found in C or C++ OpenGL programs, as the API is automatically generated from C header files. This is one of the greatest strengths of JOGL, as it is quite easy to port OpenGL programs written in C or C++ to JOGL; learning JOGL is essentially learning OpenGL[...]

Tutorials:

Thanks Justin!

//JOGL 2 - Composeable Pipeline

JOGL provides a feature called 'composeable pipeline', which can be quite useful in some situations. It enables you to put additional delegating layers between your Java application and the OpenGL driver. A few use cases could be:
  • performance metrics
  • logging, debugging or diagnostics
  • to ignore specific function calls
It is very easy to set up. Just put this line into your code and the DebugGL layer will throw a GLException as soon as an error occurs (you usually want this while you are developing the software).
    public void init(GLAutoDrawable drawable) {
        // wrap composeable pipeline in a Debug utility, all OpenGL error codes are automatically
        // converted to GLExceptions as soon as they appear
        drawable.setGL(new DebugGL3(drawable.getGL().getGL3()));
        //..
    }
Another predefined layer is TraceGL which intercepts all OpenGL calls and prints them to an output stream.
        drawable.setGL(new TraceGL3(drawable.getGL().getGL3(), System.out));
see also GL Profiles

//NetBeans GIT support

If you are using Git as SCM and NetBeans as IDE you should probably check out NBGit. The plugin integrates Git into NetBeans in the same way as the out-of-the-box Mercurial support does. In fact both modules have the same origin, since NBGit is a fork of the Mercurial integration project and incrementally adds features to catch up.

NBGit version 0.3 is already fairly stable and provides the basic set of features you would expect from an IDE integration of a distributed version control system.

Features

  • Graph visualization of parallel branches (Browser similar to giggle)
  • Versioning History (git log)
  • Show changes (git status)
  • update/commit/reset
  • clone/clone other/git init
  • custom actions (custom git commands)
  • diff
  • in-editor annotation of code changes
  • ignore files (parsing '.gitignore' files)
  • git properties (username, email etc via options)

The project is developed by volunteers outside Sun; if you would like to see Git integration as an out-of-the-box feature in a future version of NetBeans, please vote for this RFE.

I use the plugin for most of my open source projects and haven't experienced any serious issues so far. I would say it's already safe to use, since you can't do anything wrong if you do a 'git status' -> 'git push' via the command line as the last step anyway.


//JOGL 2 - OpenGL Profiles explained

June 16 2010, updated blogpost: OpenGL 4

JOGL 2 supports several OpenGL Profiles. In this blog entry I try to explain what profiles are and why they are needed.

History

SGI released the first OpenGL specification in 1992. Since then OpenGL 1.x constantly evolved (under the ARB and later the Khronos Group) by adding new functions to the core API. This went well until programmable graphics hardware became mainstream and shaders suddenly became more flexible and efficient than the generic fixed function pipeline.

OpenGL 2.x was the last version in which you could freely mix the fixed function pipeline with the programmable pipeline (as a core feature).

With the release of OpenGL 3.0 the whole fixed function pipeline was deprecated, but you could still use it as long as you didn't request a forward compatible context.

OpenGL 3.1 and 3.2 removed most deprecated functionality from the core specification; however, some implementations (e.g. Nvidia drivers) still allow getting it back via an optional compatibility extension. Since 3.1 was the first release which broke compatibility, it is often seen as the major OpenGL 3 release.

JOGL 2 (JSR 231)

JOGL 1.1.1 lived in the timeframe up to OpenGL 3.0, which made it easy to stay in sync with the spec. To solve the issue with the deprecation of functionality, JOGL 2 (JSR maintenance release) introduces an abstraction of the original OpenGL versioning called Profile. Profiles allow Java applications to be written in a way which allows compatibility with multiple OpenGL versions at the same time. Since OpenGL ES (GL for embedded systems) has overlapping functionality with OpenGL itself, this even opened the opportunity to add Profiles which bridge desktop and embedded implementations. The class diagram below shows the dependencies between all available Profiles.

Before you start writing a JOGL application you will have to decide first which GLProfile you want to use. The code snippet below lists all currently supported profiles (extracted from GLProfile).


Current list of supported profiles and their mapping to the implementation versions

    /** The desktop OpenGL compatibility profile 4.x, with x >= 0, ie GL2 plus GL4.
        bc stands for backward compatibility. */
    public static final String GL4bc = "GL4bc";

    /** The desktop OpenGL core profile 4.x, with x >= 0 */
    public static final String GL4 = "GL4";

    /** The desktop OpenGL compatibility profile 3.x, with x >= 1, ie GL2 plus GL3.
        bc stands for backward compatibility. */
    public static final String GL3bc = "GL3bc";

    /** The desktop OpenGL core profile 3.x, with x >= 1 */
    public static final String GL3 = "GL3";

    /** The desktop OpenGL profile 1.x up to 3.0 */
    public static final String GL2 = "GL2";

    /** The embedded OpenGL profile ES 1.x, with x >= 0 */
    public static final String GLES1 = "GLES1";

    /** The embedded OpenGL profile ES 2.x, with x >= 0 */
    public static final String GLES2 = "GLES2";

    /** The intersection of the desktop GL2 and embedded ES1 profile */
    public static final String GL2ES1 = "GL2ES1";

    /** The intersection of the desktop GL3, GL2 and embedded ES2 profile */
    public static final String GL2ES2 = "GL2ES2";

    /** The intersection of the desktop GL3 and GL2 profile */
    public static final String GL2GL3 = "GL2GL3";

Note: the GL2 Profile supports OpenGL up to version 3.0 (inclusive) - this is not a bug: OpenGL 3.1 was the big game changer.

The next two code snippets show the basic steps how to set up OpenGL with JOGL 2.

Context creation

        //create a profile, in this case OpenGL 3.1 or later
        GLProfile profile = GLProfile.get(GLProfile.GL3);
        
        //configure context
        GLCapabilities capabilities = new GLCapabilities(profile);
        capabilities.setNumSamples(2); // enable anti-aliasing - just as an example
        capabilities.setSampleBuffers(true);
        
        //initialize a GLDrawable of your choice
        GLCanvas canvas = new GLCanvas(capabilities);

        //register GLEventListener
        canvas.addGLEventListener(...);
        //... (start rendering thread -> start rendering...)

Rendering

    public void display(GLAutoDrawable drawable) {
        GL3 gl = drawable.getGL().getGL3();
        gl.glClear(GL.GL_COLOR_BUFFER_BIT | GL.GL_DEPTH_BUFFER_BIT);
        //.. render something
    }

Summary

Profiles make JOGL 2 very flexible and allow it to build modular and portable applications. For instance, part A of an application can be written against the GL2ES2 interface and part B (which is more hardware specific) against the GL3 interface. This would in theory allow reusing A in an embedded application, and B could e.g. disable itself on old desktop hardware which only runs OpenGL 2.x, or fall back to a GL2 implementation.
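A hedged sketch of such a fallback (assuming the profile availability check offered by current JOGL 2 builds):

        // part B: prefer the GL3 core profile, otherwise fall back to GL2ES2
        GLProfile profile;
        if (GLProfile.isAvailable(GLProfile.GL3)) {
            profile = GLProfile.get(GLProfile.GL3);
        } else {
            profile = GLProfile.get(GLProfile.GL2ES2);
        }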

More information can be found on JogAmp.org (direct link to javadoc)

The next release of the OpenGL Pack for NetBeans will fully support JOGL 2. Beta builds can be found here (builds contain JOGL2 beta5):


//XPath plugin now available via NetBeans plugin portal

The XPath utility I submitted to the NetBeans Plugin Portal over two months ago has recently been verified against NetBeans 6.7. This makes the plugin directly available from within the IDE via the Plugin Manager (Tools -> Plugins).

One .nbm less to carry with me ;)


//Object Pooling - Determinism vs. Throughput

Object pooling in Java is often seen as an anti-pattern and/or wasted effort - but there are still valid reasons to think about pooling for certain kinds of applications.

The JVM allocates objects from the managed heap (young generation; contiguous and defragmented) much faster than you could ever recycle objects from a self-written pool running on top of the VM. A well-configured garbage collector is also able to delete unused objects fast. In fact, GCs don't delete objects explicitly; they rather evacuate all surviving objects and sweep whole memory regions in a very efficient manner, and only when it is necessary, to reduce runtime overhead.

Object allocation (of small objects) on modern JVMs is even so fast that making a copy of an immutable object sometimes outperforms modifying a mutable (and often old) object. JVM languages like Scala or Clojure make heavy use of this observation. One of the reasons for that anomaly is that generational JVMs are designed to deal with loads of short-lived objects, which makes them inexpensive compared to long-lived objects in old generations.

Performance does not always mean Throughput

Rendering a game at 60fps might be optimal throughput for a renderer, but the performance might still be unacceptable when all frames are rendered in the first half of the second and the second half is spent on GC ;). Even if object pools may not increase system throughput, they can still increase the determinism of your application. Here are some observations and tips which might help:

When should I consider Object Pools?

  • GC tuning did not help - you want to try something else
  • The application creates a lot of objects which die in the old generation
  • Your objects are expensive to create but easy to recycle
  • Determinism, e.g. response time (soft real time requirements), is more important for you than throughput

Pro Pooling:

  • pools reduce GC activity in peak times (worst case scenarios)
  • are easy to implement and test (it's basically an array ;))
  • are easy to disable (inject a fake pool which returns only new Objects)

Con Pooling:

  • more (old) objects are referenced when a GC kicks in (increases GC overhead)
  • memory leaks (don't forget to reclaim your objects!)
  • cause additional problems in a multi-threaded scenario (new Object() is thread safe!)
  • may decrease throughput
  • cumbersome, repetitive client code

When you decide to use pools you have to make sure to reclaim all objects as soon as they are no longer used. One way of doing this is by applying the static factory method pattern for object allocation and a per-object dispose method for deallocation.

/**not Thread safe!**/
public class Vector3f {
    
    private static final ObjectPool<Vector3f> pool;
    public float x, y, z;
    private boolean disposed;
    
    static{
        pool = new ObjectPool<Vector3f>(1024);
        for(int i = 0; i  < 1024; i++) {
            pool.reclaim(new Vector3f());
        }
    }

    private Vector3f() {}

    public static Vector3f create(float x, float y, float z) {
        Vector3f v = pool.isEmpty() ? new Vector3f() : pool.get();
        v.x = x;
        v.y = y;
        v.z = z;
        v.disposed = false;
        return v;
    }
    
    public void dispose() {
        if(!disposed) {
            disposed = true;
            pool.reclaim(this);
        }
    }
}
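The ObjectPool used above is not shown in the snippet; a minimal array-backed sketch (matching the "it's basically an array" remark, hypothetical and not thread safe) could look like this:

/** Minimal, hypothetical pool implementation - not thread safe. */
public class ObjectPool<T> {

    private final Object[] pool;
    private int size;

    public ObjectPool(int capacity) {
        pool = new Object[capacity];
    }

    public boolean isEmpty() {
        return size == 0;
    }

    @SuppressWarnings("unchecked")
    public T get() {
        T obj = (T) pool[--size];
        pool[size] = null; // don't keep a stale reference around
        return obj;
    }

    public void reclaim(T obj) {
        if (size < pool.length) {
            pool[size++] = obj; // silently drop the object if the pool is full
        }
    }
}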

To demonstrate the perceived performance difference I captured two flyovers of my old 3D engine. The second flyover was captured with object pools disabled. The terrain engine triangulates the ground depending on the position and view direction of the observer, which makes object allocation hard to predict. The triangulation runs in parallel to the rendering thread, which made the pool implementations a bit more complex than the example above.

Every vertex, normal, triangle and quad-tree node is a pooled object

on the left: flyover with pre-allocated object pools; right: dynamic object allocation (new Object())

Notice the pauses at 7, 17 and 26s on the flyover with disabled pools (right video).

Note on the videos: The quality is very bad since the tool I used created 700 MB files for the 30s videos, and a lot of frames got skipped. I even sampled them down from 1600x1200 to 1024x768 and limited the fps to 30, but the bottleneck was still the hard disk. This is the main reason why even the left video does not look smooth. (I even had to boot Windows for the first time in 2 years to use the tool!) I'll try to capture better videos next time.

Conclusion

Using pools requires discipline, is error prone, is not good for system throughput and does not play very well with threads. However, there are some attempts to make them more usable in case you think you need them. The physics engine JBullet for example uses JStackAlloc to prevent repetitive and cumbersome code by using automatic bytecode instrumentation in the build process. Type Annotations (JSR 308, targeted for OpenJDK 7) in combination with Project Lombok and/or the automatic resource management proposal might provide further possibilities for simplifying the usage of object pools in Java and reducing the risk of memory leaks.


//NetBeans OpenGL Pack 0.5.5 released

The NetBeans 6.7 compatible OpenGL Pack has been updated to version 0.5.5 and is now also available on the plugin portal. The current release is feature compatible with 0.5.4 (release notes); only JOGL and the project webstart extensions have been updated to the JOGL 1.1.1a security update.