//Tracking Apache Roller page views using JFR custom events

The Java Flight Recorder is great for monitoring the JVM but after checking out the new JFR Event Streaming in JDK 14, I was curious how well it would work in practice for basic logging needs. I always wanted to generate some page view statistics for this web blog but never liked the idea of tracking users in their browsers (or at least those few who turned off tracking protection for some reason or used a outdated browser).

So lets take a look if we can record custom page view events using JFR instead of logging or storing stats in a database.

The goal is to record the link the anonymous reader clicked to get to the page (or null) and the actual page on the blog which is visited.

Custom Event


    @Label("Incoming Referrer Event")
    @Description("Reader landed on the Blog.")
    @StackTrace(false)
    public class IncomingReferrerEvent extends Event {

        @Label("Referrer URL")
        private final String referrerUrl;
        
        @Label("Request URL")
        private final String requestUrl;

        public IncomingReferrerEvent(String referrerUrl, String requestUrl) {
            this.referrerUrl = referrerUrl;
            this.requestUrl = requestUrl;
        }

    }

Defining a custom JFR event is trivial, simply extend jdk.jfr.Event and annotate the fields, the code should be self explanatory.@StackTrace(false) disables the recording of stack traces since we don't need it for this event. Custom events are enabled by default. The event name is the fully qualified class name by default, but can be overwritten with @Name("dev.foo.NewName").


    new IncomingReferrerEvent(referrerUrl, requestUrl).commit();

Records the event (placed somewhere in doGet() of Apache Roller's HttpServlet). Events can have a beginning and an end, by simply calling commit() without begin() or end(), only the time stamp at commit and the event values are stored. Due to the design of the JFR you don't really have to worry about concurrency because the events are held in thread-local buffers before they are flushed to disk which should make it fairly performant even in highly parallel settings.

Configuration

The next step is to setup a JFR configuration file.



<?xml version="1.0" encoding="UTF-8"?>
<configuration version="2.0" label="Blog" description="Roller JFR Events" provider="Duke Himself">

    <event name="org.apache.roller.weblogger.jfr.IncomingReferrerEvent">
        <setting name="enabled">true</setting>
    </event>

</configuration>

I could have used the default configuration provided by the jdk but this would lead to much larger file sizes after some uptime. The configuration above only enables our custom event, everything else is turned off. For a larger blueprint see JDK_HOME/lib/jfr/default.jfc.

Once the server is running we can switch the recording on using jcmd. ($1 is the PID)


jcmd $1 JFR.configure repositorypath="/home/server/jfr" samplethreads=false
jcmd $1 JFR.start name="blogstats" settings="/home/server/jfr/blog.jfc"

First Test - 8 MB, not great, not terrible

After a quick test we can plot the summary of the recording file in the repository to check if we see any events.


jfr summary 2020_02_03_04_35_07_519/2020_02_07_16_46_39.jfr 

 Version: 2.1
 Chunks: 1
 Start: 2020-02-07 15:46:39 (UTC)

 Event Type                                             Count  Size (bytes) 
==============================================================================
 jdk.CheckPoint                                        130997      12587784
 org.apache.roller.weblogger.jfr.IncomingReferrerEvent     66          4735
 jdk.Metadata                                               1         81383
 jdk.ThreadEnd                                              0             0
 jdk.TLSHandshake                                           0             0
 jdk.ThreadSleep                                            0             0
 jdk.ThreadPark                                             0             0
 jdk.JavaMonitorEnter                                       0             0
...

There it is, we can see IncomingReferrerEvent in the record. You might be wondering what those jdk.CheckPoint events are. I wasn't sure either since I noticed them the first time with JDK 14 and a websearch resulted in no hits, so I asked on hotspot-jfr-dev. It turns out that they are part of the file format - for more details read the reply or even better the rest of the mail thread too.

The check points will create around 8 MB of data per day (one check point per second), which can be seen as static footprint without recording any actual events. 8 MB, not great, not terrible.

More Events

Having now a good estimate for the footprint, it was time to enable more events. Since the correlation of IncomingReferrerEvents with GC events, network/CPU load or thread count might be interesting given that it all is running on a Raspberry Pi 3b+ with limited resources. I ended up enabling some more events after grep'ing the code base of VisualVM for event Strings (it looks like JFR support is coming soon, the custom build I am using looks fairly complete already) and then adding them to the blog.jfc config file.


jfr summary blog_dump06.jfr 
...
 Event Type                                              Count  Size (bytes) 
==============================================================================
 jdk.CheckPoint                                        1223057     118202307
 jdk.NetworkUtilization                                 264966       4731868
 jdk.CPULoad                                            242765       6198487
 jdk.ActiveSetting                                        3348        123156
 org.apache.roller.weblogger.jfr.IncomingReferrerEvent    6367         89338
 jdk.GCHeapSummary                                         830         35942
 jdk.GarbageCollection                                     415         12577
 jdk.Metadata                                               13       1038982
 jdk.JVMInformation                                         12         27113
 jdk.GCHeapConfiguration                                    12           353
 jdk.ActiveRecording                                        12           727

Even with the added events and another 14 day test run I ended up with a file size of about 130 MB. That is about 9.28 MB per day - mostly CheckPoints ;) - in fact ~118 MB of them. luckily the file compresses nicely.

The undocumented Flag

If you take a look at the JDK sources you notice a flag with the description: "flush-interval, Minimum time before flushing buffers, measured in (s)econds..." - sounds exactly what we need.

jcmd $1 JFR.start name="blogstats" flush-interval=10s settings="/home/server/jfr/blog.jfc"

Results look promising: as expected around 10x reduction in CheckPoint events. (I don't have more data to show since I literally just added the flag and rebooted the server as I am typing this)

update: size of the record dump file after ~14.3 days (343h) is 25.13 MB which is about 1.8 MB per day - much better.

I can't recommend relying on undocumented or hidden flags since they can change or go away without notice (@see hotspot-jfr-dev discussion). The realtime nature of event streaming is also gone with a buffer delay of 10s, so keep that in mind.

The File Format

The file format for the JFR repository and record dumps isn't standardized and can change. So another thing to keep in mind if JFR is used for logging is that it might not be possible to read old JDK 14 dumps with JDK 20 in future (although reading JDK 11 dumps with 14 works just fine). But this doesn't sound like a big issue to me since JDKs don't just disappear, it will always be possible to open JDK 14 dumps with JDK 14 (and maybe even with JDK 20 from the future if backwards compatibility can be maintained).

Conclusion

Although the static footprint of the JFR in JDK 14 came as a surprise I am not too concerned of it. Storage is cheap and I am sure the issue will be solved or reduced to a minimum in one way or another.

The benefit of tracking my blog stats with JFR is that I can inspect them with the same tools I would use for JVM performance monitoring (VisualVM, JDK Mission Control, etc). JFR Event Streaming and the OpenJDK command line tools make it also easy to access the data in a structured manner.

JFR can not only be used for JVM monitoring but certainly also for structured logging in some scenarios. Log files will probably never disappear completely and I am sure JFR (or the original JRockit Flight Recorder implementation) was never intended to fully replace a logging framework but maybe one day logs will mostly contain lines produced by Event#toString() running dual stacks where it is still required :).

fly safe