When I started working with Clojure, one of the challenges I faced was understanding
exactly what Clojure was doing with my code. I was intimately familiar with how Java
and the JVM work, and tried to slot Clojure into that mental model, but found too
many cases where my model didn't quite represent reality. The docs weren't much
help: they talked about the features of the language (including compilation), but
didn't explain what “automatically compiled to JVM bytecode on the fly” actually
meant.
I think that detail is important, especially if you come to Clojure from Java or are
running your Clojure code within a Java-centric framework. And I see enough questions
on the Internet to realize that not a lot of people actually understand how Clojure
works. So this post demonstrates a few key points that are the basis of my new mental
model.
I'm using Clojure 1.8 for examples, but I believe that everything that I say is
correct for versions as early as 1.6 and probably before.
Clojure is a scripting language
I'll start with some definitions:
- Compiled languages translate source code into an artifact that is then loaded,
unchanged, into the runtime. Any decisions in the code rely on state
maintained within the executing program.
- Scripting languages load the source code into the runtime, and execute that
code as it is loading. Scripts may make decisions based on state that they
manage, as well as global state that has been set by other scripts.
Clojure is very much a scripting language, even though it compiles its scripts into
JVM bytecode. All Clojure source files are processed one expression at a time: the
reader reads characters from the source until it has a complete form (usually a
list delimited by balanced parentheses), expands any macros, compiles the result to
bytecode, then executes that bytecode.
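The read, expand, compile, execute cycle can be sketched in a few lines of Clojure. This is a simplified illustration, not the runtime's actual loader (which lives in the Java Compiler class and also tracks line numbers and binds globals such as *ns*), and the helper name load-forms is hypothetical. It reads one form, hands it to eval (which macroexpands, compiles, and runs it), and only then reads the next:

```clojure
(defn load-forms
  "Hypothetical helper: reads `src` one top-level form at a time, evaluating
  each form (eval macroexpands, compiles, and runs it) before reading the
  next one. Returns a vector of the results, in order."
  [src]
  (let [rdr (java.io.PushbackReader. (java.io.StringReader. src))
        eof (Object.)]                        ; sentinel returned at end of input
    (loop [results []]
      (let [form (read rdr false eof)]
        (if (identical? form eof)
          results
          (recur (conj results (eval form))))))))
```

For example, (load-forms "(defmacro forty-two [] 42) (forty-two)") returns a vector whose last element is 42: the macro already exists by the time the second form is read and expanded.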
It doesn't matter whether the source code is entered by hand into the REPL or read
from a file as part of a (require ...) form: it's processed a single top-level
expression at a time.
Note my term: “top-level” expression. You won't find this term in the
Clojure docs; they refer to “expressions” and “forms” more-or-less
interchangeably, and don't distinguish top-level expressions from those nested
within other expressions. The reason that I do will become apparent later on.
In my opinion, this form of evaluation is what gives macros their power (the
oft-proclaimed homoiconicity of the language simply means that they're easier
to write). A macro is able to use any information that has already been loaded
into the runtime, including variables that have been created earlier by the same
or a different script.
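As a sketch of that point, here is a macro that consults a var defined by an earlier top-level form. The names field-names, def-getters, and the generated get-* functions are hypothetical:

```clojure
;; An ordinary var, defined (and evaluated) by an earlier top-level form.
(def field-names [:name :email])

(defmacro def-getters
  "Expands into one getter function per keyword in `field-names`. The macro
  reads the var's current value at expansion time, so the def above must
  already have been evaluated."
  []
  `(do
     ~@(for [f field-names]
         `(defn ~(symbol (str "get-" (name f))) [~'m]
            (get ~'m ~f)))))

(def-getters)   ; defines get-name and get-email
```

Because (def-getters) is expanded after the def has run, it produces get-name and get-email; for example, (get-name {:name "Ann"}) returns "Ann". Changing field-names later won't change those functions until the macro call is evaluated again.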
Top-level expressions turn into classes
Here's an interesting experiment: start up a Clojure REPL with the following command
(using the correct path to the Clojure distribution JAR):
java -XX:+TraceClassLoading -jar clojure-1.8.0.jar
You'll see a long list of classloading messages flash by before you end up at the
REPL. Now enter a simple expression, such as (+ 1 2); you'll see more messages, as
the Clojure runtime loads the classes that it needs. Enter that same expression
again, and you'll see something like this:
user=> (+ 1 2)
[Loaded user$eval3 from __JVM_DefineClass__]
3
This message indicates that Clojure compiled that expression to bytecode and then
loaded the newly-created class to execute it. The class definition is still in
memory, and you can inspect it. For example, you can look at its superclass
(I've removed the now-distracting classloader messages):
user=> (.getSuperclass (Class/forName "user$eval3"))
clojure.lang.AFunction
AFunction is a class within the Clojure runtime; it is a subclass of AFn, which
implements the IFn interface and its invoke() methods (the generated class overrides
the zero-argument invoke() with the compiled expression). With this knowledge, it's
apparent that the evaluation of this simple expression has four steps:
- Parse the expression (including macro expansion, which doesn't apply in this case)
and generate the bytes that correspond to a Java .class file.
- Pass these bytes to a classloader, getting a Java Class back.
- Instantiate this class.
- Call invoke on the resulting object.
You can, in fact, do all of this by hand, provided that you know the classname:
user=> (.invoke (.newInstance (Class/forName "user$eval3")))
3
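You can see steps 3 and 4 at work without knowing a generated class name, because every fn value is an instance of a compiled class. This sketch assumes a fn that closes over no locals (so its class has a no-argument constructor); the helper name invoke-fresh-instance is hypothetical:

```clojure
(defn invoke-fresh-instance
  "Hypothetical helper that repeats steps 3 and 4 by hand for a fn that
  closes over no locals: look up the fn's generated class, create a new
  instance through reflection, and call its zero-argument invoke()."
  [f]
  (let [^Class cls (class f)
        ^clojure.lang.IFn instance (.newInstance cls)]
    (.invoke instance)))
```

(invoke-fresh-instance (fn [] (+ 1 2))) should return 3, just like the by-hand call on user$eval3. Class.newInstance is deprecated on newer JVMs, but still present.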
OK, so far so good. Now I want to show why earlier I called out a distinction between
“top-level” expressions and nested expressions:
user=> (* 3 (+ 1 2))
[Loaded user$eval5 from __JVM_DefineClass__]
9
Here we have an expression that contains a nested expression. However, note that
only one class was generated as a result. In theory, every expression could turn
into its own class. Clojure takes a more pragmatic approach: the nested (+ 1 2) is
compiled into the same invoke() method as the outer multiplication, which is a good
thing for our memory footprint.
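One rough way to see which nested forms get classes of their own is to compare the classes of fn values. In this sketch (make-adder and add-ten are hypothetical names), the nested arithmetic is compiled into the inner fn's invoke(), while the nested fn form gets its own class, shared by every closure it produces:

```clojure
(def make-adder
  (fn [n]
    ;; The nested (* 2 x) and (+ ...) become bytecode inside this inner
    ;; fn's invoke(); the inner fn form itself becomes a separate class.
    (fn [x] (+ n (* 2 x)))))

(def add-ten (make-adder 10))

(not= (class make-adder) (class add-ten))       ; true: outer and inner fn classes
(= (class add-ten) (class (make-adder 99)))     ; true: one class, many instances
```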
Variables are wrapped in objects
To outward appearances, a Clojure variable is similar to a final Java variable:
you assign it (once) with def, and retrieve its value simply by inserting the
variable in an expression:
user=> (def x 10)
#'user/x
user=> (class x)
java.lang.Long
user=> (+ x 2)
12
A hint of the truth can be seen if you attempt to use an unbound var in an
expression:
user=> (def y)
#'user/y
user=> (+ 2 y)
ClassCastException clojure.lang.Var$Unbound cannot be cast to java.lang.Number clojure.lang.Numbers.add (Numbers.java:128)
In fact, variables are instances of clojure.lang.Var, which provides methods to get
and set the variable's value. When you reference a variable within an expression,
that reference translates into a method call that retrieves the actual value.
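The Var object behind a name is easy to get at with the var special form (written #'x). This sketch mirrors the x and y from the transcripts above; deref on the var behaves like a plain reference to x:

```clojure
(def x 10)
(def y)

;; #'x (shorthand for (var x)) is the clojure.lang.Var object itself.
(instance? clojure.lang.Var #'x)               ; true
;; Dereferencing the var yields the value that a bare x would.
(deref #'x)                                    ; 10
;; An unbound var holds a placeholder object rather than nil.
(bound? #'y)                                   ; false
(.getName (class (.getRawRoot #'y)))           ; "clojure.lang.Var$Unbound"
```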
This allows a great deal of flexibility, including the ability to redefine variables.
Application code can do this on a per-thread basis using binding and set! (for vars
declared ^:dynamic), or temporarily for every thread using with-redefs, which
replaces a var's root value for the duration of its body. The Clojure runtime does
much more, such as redefining all variables when you reload a namespace.
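The difference between the two mechanisms shows up when another thread reads the var. In this sketch (*greeting*, greet, and greet-on-new-thread are hypothetical names), binding changes what the current thread sees, while with-redefs changes the root value that every thread sees. It uses a raw java.lang.Thread because future, unlike a plain thread, conveys the caller's bindings:

```clojure
;; *greeting* must be declared ^:dynamic for binding and set! to apply.
(def ^:dynamic *greeting* "hello")

(defn greet [] *greeting*)

(defn greet-on-new-thread
  "Calls greet on a freshly started java.lang.Thread and returns the result.
  A raw Thread does not see the caller's dynamic bindings."
  []
  (let [result (promise)]
    (doto (Thread. ^Runnable (fn [] (deliver result (greet))))
      (.start))
    @result))

(binding [*greeting* "hi"] (greet))                 ; "hi"
(binding [*greeting* "hi"] (greet-on-new-thread))   ; "hello": binding is per-thread
(binding [*greeting* "hi"]
  (set! *greeting* "yo")                            ; set! changes this thread's binding
  (greet))                                          ; "yo"
(with-redefs [greet (fn [] "redefined")]
  (greet-on-new-thread))                            ; "redefined": root value changed
(greet)                                             ; "hello": with-redefs restored the root
```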
A namespace is not a class
For someone coming from a Java background, this is perhaps the hardest thing to
grasp. A namespace definition certainly looks like a class definition:
you have a dot-delimited namespace identifier, which corresponds to the path
where you save the source code. And when you invoke a function from a namespace,
you use the same syntax that you would to invoke a static method from a Java
class.
The first hint that the two aren't equivalent is that the ns macro doesn't enclose
the definitions within the namespace; it just makes that namespace current
(creating it if necessary). Another is that you can switch between namespaces at
will and add new definitions to each:
user=> (ns foo)
nil
foo=> (def x 123)
#'foo/x
foo=> (ns bar)
nil
bar=> (def x 456)
#'bar/x
bar=> (ns foo)
nil
foo=> (def y 987)
#'foo/y
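Since a namespace is a runtime object that maps symbols to vars, you can also build one without an ns form at all. A short sketch, using a hypothetical example.scratch namespace:

```clojure
;; Create the namespace object, then add vars to it from outside.
(create-ns 'example.scratch)
(intern 'example.scratch 'x 123)
(intern 'example.scratch 'y 987)

(sort (keys (ns-interns 'example.scratch)))   ; (x y)
@(ns-resolve 'example.scratch 'x)             ; 123
```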
You could take the above code snippets, save them in an arbitrary file, and then
use the load-file function to execute that file as a script. In fact, you could
write your entire application, with namespaces, as a single script.
But most (sane) people don't do that. Instead, they create one source file per
namespace, store that file in a directory derived from the namespace name, and use
the require function to load it (or more often, a :require directive in some other
ns declaration).
Loading code from the classpath: require and load
The :require directive is another point of confusion for a Java developer new to
Clojure. It certainly looks like the import statement that we already know,
especially when it's used in an ns invocation:
(ns example.main
  (:require [example.utils :as utils]))
In reality, :require is almost, but not quite, entirely unlike import. The Java
compiler uses import only so that the class being compiled can refer to
already-compiled classes by their short names; the definitions themselves come from
.class files on the classpath. On a superficial level, the Clojure runtime does the
same thing when it sees :require, but it does this by loading (and compiling) the
source code for that namespace.
OK, there are some caveats to that statement. The first is that require only loads
a namespace once, unless you specify the :reload option: if the required namespace
has already been loaded, it won't be loaded again. The second is that if the
namespace has already been compiled, and the source file is older than the compiled
files, then the runtime loads the already-compiled form. But still, there's a lot of
stuff happening as the result of a seemingly simple directive.
So, let's dig into the behavior of require, along with its step-brother load.
Earlier I wrote about using load-file to load an arbitrary file into the REPL.
Here's that file, followed by the command to load and run it:
(ns foo)
(def x 123)
(ns bar)
(def x 456)
(ns user)
(do (println "myscript!") (+ foo/x bar/x))
user=> (load-file "src/example/myscript.clj")
myscript!
579
When you load the file, it creates definitions within the two namespaces, then
evaluates an expression that adds them. After loading the file, you can access those
variables from the REPL:
user=> (* foo/x bar/x)
56088
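The load function is similar, but loads files relative to the classpath (strictly,
a path without a leading slash is resolved relative to the current namespace's
directory, which for the user namespace is the classpath root). It also assumes a
.clj extension. I'm using Leiningen, which puts src on my classpath; therefore, I can
load the same file like so: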
user=> (load "example/myscript")
myscript!
nil
Wait a second, what happened to the expression at the end of the script? It was still
evaluated (the println executed), but the result was discarded and load returned nil.
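The return values are easy to compare from the REPL. A small sketch, using load-string (which, like load-file, returns the value of the last form) and the clojure/set.clj file that ships in the Clojure JAR; the leading slash makes the path classpath-relative:

```clojure
;; load-string (like load-file) evaluates every form and returns the value
;; of the last one.
(load-string "(println \"myscript!\") (+ 1 2)")   ; prints myscript!, returns 3

;; load evaluates every form too, but always returns nil.
(load "/clojure/set")                             ; nil
```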
Now let's try loading this same script with require:
user=> (require 'example.myscript :reload)
myscript!
nil
Different syntax, same result. The two variables are defined in their respective
namespaces, and the stand-alone expression was evaluated. So what's the difference?
The first difference is that require gives you a bunch of options. For example, you
can use :as to create a short alias for the namespace, so that you don't have to
reference its vars with fully-qualified names. The way that the runtime uses these
flags is probably worthy of a post of its own.
Another difference is that require is a little smarter about loading scripts: it
only loads (and compiles) a script if it hasn't already done so, unless, of course,
you use the :reload or :reload-all options, like I did here. Omitting that option,
we see that a second require doesn't invoke the println.
user=> (require 'example.myscript)
nil
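The bookkeeping behind this is visible through loaded-libs, which returns the set of lib names that require consults. By contrast, load-file keeps no record and re-runs its script every time. In this sketch, script-runs, load-file-twice, and the example.counter namespace are hypothetical names, created only so that a throwaway script has something to update:

```clojure
;; require records each lib it loads, and skips libs already in the set.
(require 'clojure.set)
(contains? (loaded-libs) 'clojure.set)   ; true

;; A counter that the throwaway script bumps, reachable as example.counter/n.
(def script-runs (atom 0))
(intern (create-ns 'example.counter) 'n script-runs)

(defn load-file-twice
  "Writes a one-line script to a temp file, loads it twice with load-file,
  and returns how many times the script body ran."
  []
  (reset! script-runs 0)
  (let [f (java.io.File/createTempFile "myscript" ".clj")]
    (.deleteOnExit f)
    (spit f "(swap! example.counter/n inc)")
    (dotimes [_ 2] (load-file (.getPath f)))
    @script-runs))
```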
Compiling your code (or, :gen-class doesn't do what you might think)
As you've seen above, the Clojure runtime normally compiles your code when it's
loaded, producing the bytes of a .class file but not writing them to the filesystem.
However, there are times that you want a real, on-disk class: for example, so that
you can invoke that class from Java (note that you'll still need the Clojure JAR on
your classpath), or so that you can reduce startup time for a Clojure application by
avoiding load-time compilation (although I think this is probably premature
optimization).
The compile function turns Clojure scripts into classes:
user=> (compile 'example.foo)
example.foo
That was simple enough. Note, however, that I was running in lein repl, which sets
the *compile-path* runtime global to a directory that it knows exists. If you try to
execute this function from the clojure.main REPL, it will fail unless you create the
directory classes (the default value of *compile-path*); that directory must also be
on the classpath.
Here's the example file that I compiled:
(ns example.foo)
(def x 123)
(defn what [] "I'm compiled!")
(defn add2 [x] (+ 2 x))
And here are the classes that it produced:
-rw-rw-r-- 1 kgregory kgregory 3008 May 7 09:29 target/base+system+user+dev/classes/example/foo__init.class
-rw-rw-r-- 1 kgregory kgregory 683 May 7 09:29 target/base+system+user+dev/classes/example/foo$add2.class
-rw-rw-r-- 1 kgregory kgregory 1320 May 7 09:29 target/base+system+user+dev/classes/example/foo$fn__1194.class
-rw-rw-r-- 1 kgregory kgregory 1503 May 7 09:29 target/base+system+user+dev/classes/example/foo$loading__5569__auto____1192.class
-rw-rw-r-- 1 kgregory kgregory 513 May 7 09:29 target/base+system+user+dev/classes/example/foo$what.class
If you're a bytecode geek like me, you'll of course run javap -c on those files to
see what they contain (especially fn__1194, which doesn't appear anywhere in the
source!). Have at it. For everyone else, here are the two things I think are
important:
- Every function turns into its own class. If you've been reading along,
you already knew that.
- The foo__init class is responsible for pulling all of the other classes into
memory, creating instances of those classes, and assigning them to vars in the
namespace.
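The reason every function gets its own class is that defn is just a macro around def and fn, and each fn form compiles to a class. A quick check, using the add2 function from the example file:

```clojure
(defn add2 [x] (+ 2 x))

;; defn expands into a def whose value is a clojure.core/fn form.
(macroexpand '(defn add2 [x] (+ 2 x)))
;; => a (def add2 (clojure.core/fn ...)) form

;; The fn's class is named after the var, as in the foo$add2 listing above;
;; at the REPL this prints something like "user$add2".
(.getName (class add2))
```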
If you use Leiningen, you've probably noticed that it adds a :gen-class directive to
the main namespace of any “app” project that it creates. If you skim the docs for
gen-class, you might think this will produce a Java class that exposes all of your
namespace's functions. Let's see what really happens, by adding a :gen-class
directive to the example script:
(ns example.foo
(:gen-class))
When you compile, the list of classes now looks like this:
-rw-rw-r-- 1 kgregory kgregory 1823 May 7 09:31 target/base+system+user+dev/classes/example/foo.class
-rw-rw-r-- 1 kgregory kgregory 3009 May 7 09:31 target/base+system+user+dev/classes/example/foo__init.class
-rw-rw-r-- 1 kgregory kgregory 683 May 7 09:31 target/base+system+user+dev/classes/example/foo$add2.class
-rw-rw-r-- 1 kgregory kgregory 1320 May 7 09:31 target/base+system+user+dev/classes/example/foo$fn__1194.class
-rw-rw-r-- 1 kgregory kgregory 1505 May 7 09:31 target/base+system+user+dev/classes/example/foo$loading__5569__auto____1192.class
-rw-rw-r-- 1 kgregory kgregory 513 May 7 09:31 target/base+system+user+dev/classes/example/foo$what.class
Everything's the same, except that we now have foo.class. Looking at this class with
javap, we find that it contains overrides of the basic Object methods: equals(),
hashCode(), toString(), and clone(). It also creates a Java-standard main() method,
which looks for the Clojure-standard -main (which doesn't exist for our script, so
will fail if invoked). But it doesn't expose any of your functions.
Reading the doc more closely, if you want to use :gen-class to expose your
functions, you need to declare the exposed methods in the directive itself (with the
:methods option), and implement them with functions whose names follow a naming
format that separates the Clojure implementations from the names exposed to Java (by
default, the Java method name prefixed with “-”).
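As a sketch of what that looks like (based on my reading of the gen-class docs, not something compiled for the listings above), here is the example namespace exposing add2 as a static Java method. The :methods entry declares the Java signature, ^:static marks it static, and the implementing function carries the default “-” prefix:

```clojure
(ns example.foo
  (:gen-class
   ;; [method-name [parameter-types] return-type]; ^:static makes it static.
   :methods [^:static [add2 [long] long]]))

;; Implementation of the Java method add2: same name plus the "-" prefix.
(defn -add2 [x] (+ 2 x))
```

After AOT compilation, Java code should be able to call example.foo.add2(40). When the namespace is loaded without compiling, gen-class does nothing, and -add2 is just an ordinary Clojure function.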
Pitfalls of compiling your code
Let's change the namespace declaration on foo, so that it requires bar:
(ns example.foo
(:require [example.bar :as bar]))
This results in the expected classes for foo, but also several for bar (which
doesn't define any functions):
-rw-rw-r-- 1 kgregory kgregory 2219 May 7 09:39 target/base+system+user+dev/classes/example/bar__init.class
-rw-rw-r-- 1 kgregory kgregory 1320 May 7 09:39 target/base+system+user+dev/classes/example/bar$fn__1196.class
-rw-rw-r-- 1 kgregory kgregory 1503 May 7 09:39 target/base+system+user+dev/classes/example/bar$loading__5569__auto____1194.class
-rw-rw-r-- 1 kgregory kgregory 3009 May 7 09:39 target/base+system+user+dev/classes/example/foo__init.class
-rw-rw-r-- 1 kgregory kgregory 683 May 7 09:39 target/base+system+user+dev/classes/example/foo$add2.class
-rw-rw-r-- 1 kgregory kgregory 1320 May 7 09:39 target/base+system+user+dev/classes/example/foo$fn__1198.class
-rw-rw-r-- 1 kgregory kgregory 1891 May 7 09:39 target/base+system+user+dev/classes/example/foo$loading__5569__auto____1192.class
-rw-rw-r-- 1 kgregory kgregory 513 May 7 09:39 target/base+system+user+dev/classes/example/foo$what.class
This makes perfect sense: if you want to ahead-of-time compile one namespace, you
probably don't want its dependencies to be compiled at runtime. But recognize that
the tree of dependencies can run very deep, and will include any third-party
libraries that you use (poking around Clojars, there aren't a lot of libraries
that come precompiled).
There is one other detail of compilation that may cause concern: require loads a
namespace from the file(s) with the latest modification time. If you have both
source and compiled classes on your classpath, this could mean that you're not
loading what you think you are. Fortunately, in practice this primarily affects work
in the REPL: Leiningen removes the target directory as part of the jar and uberjar
tasks, so you won't produce an artifact with a source/class mismatch.
Wrap-up
This has been a long post, so I'll wrap up with what I consider the main points.
- Startup times for Clojure applications will be longer than for normal Java
applications, because of the additional step of compiling and evaluating each
expression. This isn't going to be an issue if you've written a long-running
server in Clojure, but it does add significant overhead to short-running
programs (so Clojure is even less appropriate for small command-line utilities
than Java).
- Pay attention to the Clojure version used by your dependencies, because they
might rely on functions from a newer version than your application; this
problem manifests itself as an “Unable to resolve symbol” runtime
error. While this is a general issue with transitive dependencies, I've found
that third-party libraries tend to be at the latest version, while corporate
applications tend to use whatever was current when they were begun.
- As far as I can tell, the Clojure runtime doesn't ever unload the classes that
it creates. This means that on pre-1.8 JVMs (before Java 8 replaced permgen with
metaspace) you can fill the permgen space. Not a big problem in development, but be
careful if you use a REPL when connected to a production instance.
- Every script that you load adds to the global state of the runtime. Be aware
that the behavior of your scripts may be dependent on the order that they're
loaded.