Benchmarking Clojure Application

This is a snippet for benchmarking clojure application which I have used quite often in clojure applications at my job.

First an introduction to the tools:

clj-asyc-profiler: for flamegraphs and call graph.
criterion: for statistically correct benchmarking.
tufte: application level profiling.

from this we can keep both #1 and #2 in separate profile just for benchmarking and #3 must be included in the :default profile, otherwise it would defeat its purpose.

below is the code snippet for having these libraries setup in project:

(defproject clojure-benchmark "0.1.0-SNAPSHOT"
  :description "snippets to setup benchamrking tests in clojure"
  :url "http://example.com/FIXME"
  :license {:name "EPL-2.0 OR GPL-2.0-or-later WITH Classpath-exception-2.0"
            :url "https://www.eclipse.org/legal/epl-2.0/"}
  :test-selectors {:default (complement :bench)
                   :not     (fn [m s] (not (contains? m (keyword s))))
                   :bench   :bench}
  :dependencies [[org.clojure/clojure "1.11.1"]
                 [com.taoensso/tufte "2.4.5"]]
  :profiles {:test    [{:dependencies [[org.clojure/test.check "1.1.1"]
                                      [criterium "0.4.6"]]}]
             :bench   [{:test-paths   ["bench"]
                        :global-vars  {*assert* true}
                        :jvm-opts     ["-Djdk.attach.allowAttachSelf"
                                       "-XX:+UnlockDiagnosticVMOptions"
                                       "-XX:+DebugNonSafepoints"]
                        :dependencies [[com.clojure-goes-fast/clj-async-profiler "1.0.4"]]}
                       :test]}
  :repl-options {:init-ns clojure-benchmark.core})

important points to note are:

We have created a new source locaction called bench, this is added so that all of these overhead of benchmarking is not included in production code, and it can be isolated to run during local run or pipeline run.
There are :test-selectors being added so that we can specifically mark some test to be benchmark tests and run only them using lein with-profile +bench test :bench.

The overhead mentioned above will be due to including below functionlities in the code:

Runnig flamgraph profiling for the whole duration of the benchmark run.
Making sure that tufte check all the namespaces for gathering the application level metrics.
Making sure that all benchmark are being written to a file.

The code for above mentioned functionalities is:

(ns clojure-benchmark.bench
  (:require
    [clojure.java.io :as io]
    [clj-async-profiler.core :as prof]
    [taoensso.tufte :as tufte]))

;; necessary to print it so that jvm know that we are using its result
(def prof-started (let [start (atom (prof/start {}))
                        _     (println "profiling started." (prof/status))]
                       @start))

(alter-var-root #'tufte/*ns-filter* (constantly "*"))

;; accumulator for tufte result
(def stats-accumulator (tufte/add-accumulating-handler! {:ns-pattern "*"}))

;; before exiting make sure that we are printing profiling result
(System/setSecurityManager
  (proxy [SecurityManager] []
         (checkExit [status]
                    (let [_         (println "profiling ended." (prof/status))
                          _         (io/make-parents "profiling-result/random-file")
                          result    (prof/stop)
                          file-name (.getName result)]
                      (io/make-parents (format "profiling-result/%s" file-name))
                      (io/copy result (io/as-file (format "profiling-result/%s" file-name)))
                         (with-open [writer (io/writer (format "profiling-result/tufte-prof-%d.txt" (System/currentTimeMillis)))]
                                    (.write writer (tufte/format-grouped-pstats @stats-accumulator)))))
         (checkCreateClassLoader [& args]
                                 true)
         (checkPermission [& args]
                          true)))

(ns clojure-benchmark.util
  (:require [clojure.java.io :as io]
            [clojure.test :refer :all]))


(defn write-to-file [file-name f & args]
  (let [file-name (format "profiling-result/%s" file-name)
        _         (io/make-parents file-name)]
    (with-open [file (io/writer file-name)]
      (binding [*out* file]
        (apply f args)))))

Please check the sample test to understand how to use write the benchmark

(ns clojure-benchmark.core-test
  (:require [clojure.test :refer :all]
            [taoensso.tufte :refer :all]
            [criterium.core :refer [benchmark report-result]]
            [clojure-benchmark.util :refer [write-to-file]]
            [clojure-benchmark.core :refer :all]))

(defn fib [n]
  (cond (<= n 0) 0
        (= n 1) 1
        :else (+ (fib (- n 1)) (fib (- n 2)))))

(defn noob-fib [n] (p :noob (fib n)))

(def lazy-fib (lazy-cat [0 1] (map + lazy-fib (rest lazy-fib))))

(defn pro-fib [n]
    (p :pro (nth lazy-fib n)))

(deftest a-test
  (testing "test"
    (is (= (noob-fib 10) 55))
    (is (= (noob-fib 10) 55))
    (is (= (pro-fib 13) 233))
    (is (= (pro-fib 13) 233))))


(deftest ^:bench benchmark-fib
  (profile {}
    (let [input    [10 13]
          noob-fib (benchmark (doseq [inp input] (noob-fib inp)) {:verbose true :runtime true})
          pro-fib  (benchmark (doseq [inp input] (pro-fib inp)) {:verbose true :runtime true})]
      (write-to-file "pro-fib.txt" report-result pro-fib :verbose :os :runtime)
      (write-to-file "noob-fib.txt" report-result noob-fib :verbose :os :runtime))))

and its output for lein test:

lein test clojure-benchmark.core-test

Ran 1 tests containing 4 assertions.
0 failures, 0 errors.

lein with-profile +bench test :bench

profiling started. Profiling is running for 0 seconds

lein test clojure-benchmark.core-test

Ran 1 tests containing 0 assertions.
0 failures, 0 errors.
profiling ended. Profiling is running for 182 seconds

Profiling started

output of tufte:

:tufte/nil-id,
pId                   nCalls        Min      50% ≤      90% ≤      95% ≤      99% ≤        Max       Mean   MAD      Clock  Total

:noob              2,426,660     6.13μs    28.29μs    60.32μs    78.76μs   299.16μs    25.60ms    38.21μs  ±76%     1.55m     52%
:pro              34,272,466   396.00ns   786.00ns     1.24μs     2.29μs     3.17μs   917.90ms     1.04μs  ±59%    35.66s     20%
:tufte/compaction         45    39.54ms   155.80ms   456.94ms   791.48ms     1.52s      1.52s    219.91ms  ±77%     9.90s      6%

Accounted                                                                                                           2.30m     78%
Clock                                                                                                               2.96m    100%

output of benchmark:

[hdggxin@nixos:~/workspace/clojure-benchmark/profiling-result]$ cat noob-fib.txt
amd64 Linux 6.1.51 4 cpu(s)
OpenJDK 64-Bit Server VM 25.362-bga
Runtime arguments: -Dfile.encoding=UTF-8 -Djdk.attach.allowAttachSelf -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints -Dclojure.compile.path=/
home/hdggxin/workspace/clojure-benchmark/target/classes -Dclojure-benchmark.version=0.1.0-SNAPSHOT -Dclojure.debug=false
Evaluation count : 2773980 in 60 samples of 46233 calls.
      Execution time sample mean : 21.751842 µs
             Execution time mean : 21.807202 µs
Execution time sample std-deviation : 2.868742 µs
    Execution time std-deviation : 2.972753 µs
   Execution time lower quantile : 20.927824 µs ( 2.5%)
   Execution time upper quantile : 25.620445 µs (97.5%)
                   Overhead used : 2.454320 ns

Found 13 outliers in 60 samples (21.6667 %)
        low-severe       4 (6.6667 %)
        low-mild         9 (15.0000 %)
 Variance from outliers : 80.7320 % Variance is severely inflated by outliers

[hdggxin@nixos:~/workspace/clojure-benchmark/profiling-result]$ cat pro-fib.txt
amd64 Linux 6.1.51 4 cpu(s)
OpenJDK 64-Bit Server VM 25.362-bga
Runtime arguments: -Dfile.encoding=UTF-8 -Djdk.attach.allowAttachSelf -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints -Dclojure.compile.path=/
home/hdggxin/workspace/clojure-benchmark/target/classes -Dclojure-benchmark.version=0.1.0-SNAPSHOT -Dclojure.debug=false
Evaluation count : 34700400 in 60 samples of 578340 calls.
      Execution time sample mean : 1.807223 µs
             Execution time mean : 1.808515 µs
Execution time sample std-deviation : 165.871690 ns
    Execution time std-deviation : 166.917813 ns
   Execution time lower quantile : 1.516781 µs ( 2.5%)
   Execution time upper quantile : 2.079813 µs (97.5%)
                   Overhead used : 2.454320 ns

and finally the flamegraph:

please check full sample repo here: https://github.com/upendra1997/clojure-benchmark