blog.kdgregory.com

Thursday, July 14, 2016

Advice for Young Programmers

I am sometimes asked if I have any words of advice for young people. Well, here are a few simple admonitions.

–William S Burroughs

It's easier to solve problems if you don't panic. But panic may be a mandatory team activity.

There's more than one way to skin a cat. The way you choose determines whether you end up with a clean pelt or a tattered bloody mess.*

The difference between a retrospective and second-guessing is that the former tries to reconcile benefits and consequences, while the latter says “this time we'll get it right.”

Experience doesn't make you smarter than anyone else. But once you've seen and made enough mistakes it sometimes seems that way. Unless you didn't learn from them.

Occam's razor is your most important tool. But if you're careless it will cut you.

Unit testing with mock objects lets you verify that your incorrect assumptions are internally consistent.**

Startups are a lot of fun, but eventually they'll burn you out. Join one for the challenge and excitement, not the stock options. You'll be long gone before they become worthless.

By all means spend some time in management. You might like it. You might even be good at it. But always remember: the role of a manager is to satisfy infinite desires with limited resources. And there's only a little stigma to saying “I don't like this” and returning to the front lines.

Everybody has a brand; it's the sum of everything that everybody else perceives about them. The secret to personal and professional success is to understand your brand, actively change what you don't like, and think carefully before changing what you do.

If you interview at a company that refers to people as “resources,” get up and leave. It doesn't matter if the person interviewing you is a nice person, or that your potential manager says the right things. Upper management doesn't value you, and won't think twice about replacing you with someone cheaper.

The world is not divisible into two groups of people, but three: thinkers, non-thinkers, and dogmatic non-thinkers. Non-thinkers will turn into thinkers if they value the outcome. Dogmatic non-thinkers won't, and will hate you for trying to enlighten them.

Whatever you call them, fools are unavoidable. The important thing is to avoid becoming one yourself.


* Stolen from Mike Boldi

** Stolen from Drew Sudell

Saturday, June 4, 2016

Target Fixation

Motorcyclists have a saying: you go where you look. If an animal runs out in front of you, or there's a patch of sand in the middle of the corner, or a Corvette is coming the other way, your first response has to be to look elsewhere. If not, you'll almost certainly hit whatever it is that you didn't want to hit.

Another name for this phenomenon is target fixation, and that name was driven home to me quite literally (and painfully) in a paintball game many years ago. I was slowly and carefully positioning myself to shoot one of the other players, when all of a sudden I felt a paintball hit the middle of my back. I was so fixated on my target that I stopped paying attention to what was around me.

I suspect that target fixation was an enormous help to our hunter-gatherer ancestors stalking their dinner. They would only get one chance to bring down their quarry, and didn't have the benefit of high-powered rifles and telescopic sights. To a modern human, surrounded by opportunities to fixate on the wrong thing, it's not so great.

Physical dangers are one thing, but we're also faced with intellectual dangers. If you focus too closely on the scary thing that's right in front of you, you'll ignore all the pitfalls that lie just beyond. This is a particular concern for software developers, who may adopt and implement a particular design without taking the time to think of the ways that it can fail, or of alternative designs that are simpler and more robust.

For example, you might implement a web application that requires shared state, and become so fixated on transactional access to that state that you don't think about contention … until you start running at scale, and discover the delays that synchronization introduces. If you weren't fixated on concurrent access, you might have thought of better ways to share the state without the need for transactions.

So, how to avoid becoming fixated? In the physical world, where fixation has potentially deadly consequences, training programs focus on prevention via ritual. For motorcyclists, the ritual is “SEE”: search, evaluate, execute. For pilots, there are many rituals, but one that was burned into my brain is aviate, navigate, communicate.

For software development, I think that a pre-emptive “five whys” exercise is a useful way to avoid design fixation. This exercise is usually used after a problem occurs, to identify the root cause of the problem and potential solutions: you keep asking “why did this happen” until there are no more answers. Recast as a pre-emptive exercise, it is meant to challenge, and ultimately validate, the assumptions that underlie your design.

Returning to the concurrency example, the first question might be “why do I want to prevent concurrent access?” One possible answer is “this is inventory data, and we don't want two customers to buy the last item.” That could lead to several other questions, such as “do I need to use a database transaction?” and “do I need to make that guarantee at this point in the process?”

The chief danger in this exercise is “analysis paralysis,” which is itself a form of target fixation. To move forward, you must accept that you are making assumptions, and be comfortable that they're valid assumptions. If you fixate on the possibility that your assumptions are invalid, you'll never move.

You also need to recognize that, while target fixation is often dangerous, it can have a positive side: preventing you from paying attention to irrelevant details.

I had a real-world experience of this sort a few weeks ago, while riding my motorcycle on a twisting country road: I saw a pickup truck coming the other way and not keeping to his lane. With a closing speed in excess of 100 miles per hour there wasn't much time to make a decision, and not many good decisions to make. I could continue as I was going, and assume that the driver would see me and be able to keep within his lane; if I was wrong in that assumption, my trip would be over. I could get on the brakes hard, now, but would come to a stop at the exact point where the pickup would leave his lane while exiting the corner.

My best option was to stop just past the apex of the corner, which would be where the pickup was most likely to be within his lane. I fixated on that spot, and let the muscle memory of 100,000+ miles balance the braking and turning forces necessary to get me there. I have no idea how close the truck came to hitting me; my riding partner said that it was an “oh shit” moment. But once I picked my destination, the pickup truck and everything around me simply disappeared.

Which leads me to think that there might be another name for the phenomenon: “flow.”

Monday, May 23, 2016

How (and When) Clojure Compiles Your Code

When I started working with Clojure, one of the challenges I faced was understanding exactly what Clojure was doing with my code. I was intimately familiar with how Java and the JVM work, and tried to slot Clojure into that mental model, but found too many cases where my model didn't quite represent reality. The docs weren't much help: they talked about the features of the language (including compilation), but didn't provide detail on what “automatically compiled to JVM bytecode on the fly” actually meant.

I think that detail is important, especially if you come to Clojure from Java or are running your Clojure code within a Java-centric framework. And I see enough questions on the Internet to realize that not a lot of people actually understand how Clojure works. So this post demonstrates a few key points that are the basis of my new mental model.

I'm using Clojure 1.8 for examples, but I believe that everything that I say is correct for versions as early as 1.6 and probably before.

Clojure is a scripting language

I'll start with some definitions:
  • Compiled languages translate source code into an artifact that is then loaded, unchanged, into the runtime. Any decisions in the code rely on state maintained within the executing program.
  • Scripting languages load the source code into the runtime, and execute that code as it is loading. Scripts may make decisions based on state that they manage, as well as global state that has been set by other scripts.

Clojure is very much a scripting language, even though it compiles its scripts into JVM bytecode. All Clojure source files are processed one expression at a time: the reader reads characters from the source until it finds a valid expression (a list of symbols delimited by balanced parentheses), expands any macros, compiles the result to bytecode, then executes that bytecode.

It doesn't matter whether the source code is entered by hand into the REPL, or read from a file as part of a (require ...) form. It's processed a single top-level expression at a time.
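
To make that loop concrete, here is a minimal sketch that reads and evaluates one form at a time from a string. The name eval-all is my own invention, not part of Clojure, and the real loader does more bookkeeping than this, so treat it as an illustration rather than the actual implementation.

```clojure
(import '(java.io PushbackReader StringReader))

;; Hypothetical helper: read one top-level form, evaluate it, repeat.
(defn eval-all [source]
  (let [rdr (PushbackReader. (StringReader. source))]
    (loop [result nil]
      (let [form (read rdr false ::eof)]
        (if (= form ::eof)
          result                     ; value of the last form evaluated
          (recur (eval form)))))))

;; The second form can only be compiled because the first has already
;; been evaluated: the var `a` exists by the time `b` is read.
(eval-all "(def a 1) (def b (+ a 1)) (* a b)")
;; => 2
```

If the whole string were compiled before anything was executed, the reference to a in the second form would have nothing to resolve against.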

Note my term: “top-level” expression. You won't find this term in the Clojure docs; they refer to “expressions” and “forms” more-or-less interchangeably, but don't differentiate between top-level expressions and expressions that are nested within other expressions. The reason that I do will become apparent later on.

In my opinion, this form of evaluation is what gives macros their power (the oft-proclaimed homoiconicity of the language simply means that they're easier to write). A macro is able to use any information that has already been loaded into the runtime, including variables that have been created earlier by the same or a different script.
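
As a small illustration of a macro consulting runtime state, consider the sketch below. The names known-levels and checked-level are hypothetical, chosen only for this example. The point is that the macro body runs while the calling expression is being compiled, and by then the earlier def has already been executed.

```clojure
;; Runtime state created by an earlier top-level expression.
(def known-levels #{:debug :info :warn})

;; Hypothetical macro: the check happens at macro-expansion time, so an
;; unknown level is rejected while the calling expression is compiled.
(defmacro checked-level [level]
  (if (contains? known-levels level)
    level
    (throw (IllegalArgumentException. (str "unknown level: " level)))))

(checked-level :info)
;; => :info
```

An expression such as (checked-level :trace) would fail during compilation, before any of its code could run.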

Top-level expressions turn into classes

Here's an interesting experiment: start up a Clojure REPL with the following command (using the correct path to the Clojure distribution JAR):

java -XX:+TraceClassLoading -jar clojure-1.8.0.jar

You'll see a long list of classloading messages flash by before you end up at the REPL. Now enter a simple expression, such as (+ 1 2); you'll see more messages, as the Clojure runtime loads the classes that it needs. Enter that same expression again, and you'll see something like this:

user=> (+ 1 2)
[Loaded user$eval3 from __JVM_DefineClass__]
3

This message indicates that Clojure compiled that expression to bytecode and then loaded the newly-created class to execute it. The class definition is still in memory, and you can inspect it. For example, you can look at its superclass (I've removed the now-distracting classloader messages):

user=> (.getSuperclass (Class/forName "user$eval3"))
clojure.lang.AFunction

AFunction is a class within the Clojure runtime; it is a subclass of AFn, which implements the invoke() method. With this knowledge, it's apparent that the evaluation of this simple expression has four steps:

  1. Parse the expression (including macro expansion, which doesn't apply to this case) and generate the bytes that correspond to a Java .class file.
  2. Pass these bytes to a classloader, getting a Java Class back.
  3. Instantiate this class.
  4. Call invoke on the resulting object.

You can, in fact, do all of this by hand, provided that you know the classname:

user=> (.invoke (.newInstance (Class/forName "user$eval3")))
3

OK, so far so good. Now I want to show why earlier I called out a distinction between “top-level” expressions and nested expressions:

user=> (* 3 (+ 1 2))
[Loaded user$eval5 from __JVM_DefineClass__]
9

Here we have an expression that contains a nested expression. However, note that only one class was generated as a result. In theory, every expression could turn into its own class. Clojure takes a more pragmatic approach, which is a good thing for our memory footprint.

Variables are wrapped in objects

To outward appearances, a Clojure variable is similar to a final Java variable: you assign it (once) with def, and retrieve its value simply by inserting the variable in an expression:

user=> (def x 10)
#'user/x

user=> (class x)
java.lang.Long

user=> (+ x 2)
12

A hint of the truth can be seen if you attempt to use an unbound var in an expression:

user=> (def y)
#'user/y

user=> (+ 2 y)

ClassCastException clojure.lang.Var$Unbound cannot be cast to java.lang.Number  clojure.lang.Numbers.add (Numbers.java:128)

In fact, variables are instances of clojure.lang.Var, which provides functions to get and set the variable's value. When you reference a variable within an expression, that reference translates into a method call that retrieves the actual value.
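
You can look at the wrapper directly. This is only a sketch of the idea: the compiled code does not literally call deref, but the effect of a bare reference is roughly the same.

```clojure
(def x 10)

;; (var x), also written #'x, gives you the wrapper rather than the value.
(class (var x))
;; => clojure.lang.Var

;; Dereferencing the var retrieves the value it currently holds.
(deref (var x))
;; => 10

;; A bare reference to x yields the same thing.
(= x @#'x)
;; => true
```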

This allows a great deal of flexibility, including the ability to redefine variables. Application code can do this on a per-thread basis using binding and set!, or within a call tree using with-redefs. The Clojure runtime does much more, such as redefining all variables when you reload a namespace.
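
Here is a short sketch of those mechanisms; all of the names (*level*, current-level, greeting) are made up for the example. One detail worth knowing: binding only works on vars that have been declared ^:dynamic.

```clojure
;; binding only works on vars marked ^:dynamic.
(def ^:dynamic *level* 1)
(defn current-level [] *level*)

(binding [*level* 2]
  (set! *level* 3)          ; allowed only inside a thread-local binding
  (current-level))
;; => 3

(current-level)             ; the root value is untouched
;; => 1

;; with-redefs swaps the root value for the duration of its body.
(defn greeting [] "hello")

(with-redefs [greeting (fn [] "redefined")]
  (greeting))
;; => "redefined"

(greeting)
;; => "hello"
```

Neither form changes the functions that use the var; they keep going through the Var object and simply see a different value.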

A namespace is not a class

For someone coming from a Java background, this is perhaps the hardest thing to grasp. A namespace definition certainly looks like a class definition: you have a dot-delimited namespace identifier, which corresponds to the path where you save the source code. And when you invoke a function from a namespace, you use the same syntax that you would to invoke a static method from a Java class.

The first hint that the two aren't equivalent is that the ns macro doesn't enclose the definitions within the namespace. Another is that you can switch between namespaces at will and add new definitions to each:

user=> (ns foo)
nil
foo=> (def x 123)
#'foo/x
foo=> (ns bar)
nil
bar=> (def x 456)
#'bar/x
bar=> (ns foo)
nil
foo=> (def y 987)
#'foo/y

You could take the above code snippets, save them in an arbitrary file, and then use the load-file function to execute that file as a script. In fact, you could write your entire application, with namespaces, as a single script.

But most (sane) people don't do that. Instead, they create one source file per namespace, store that file in a directory derived from the namespace name, and use the require function to load it (or more often, a :require directive in some other ns declaration).
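
The mapping from namespace name to file is mechanical: dots become directory separators and dashes become underscores. The helper below is hypothetical (it is not a Clojure function, and the runtime has its own internal version); it just sketches the rule.

```clojure
(require '[clojure.string :as string])

;; Hypothetical helper, not part of Clojure: dots become directory
;; separators and dashes become underscores.
(defn ns->path [ns-sym]
  (str (-> (name ns-sym)
           (string/replace "-" "_")
           (string/replace "." "/"))
       ".clj"))

(ns->path 'example.my-utils)
;; => "example/my_utils.clj"
```

The resulting path is relative to a classpath root, such as src in a Leiningen project.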

Loading code from the classpath: require and load

The :require directive is another point of confusion for a Java developer starting Clojure. It certainly looks like the import statement that we already know, especially when it's used in an ns invocation:

(ns example.main
    (:require [example.utils :as utils]))

In reality, :require is almost, but not quite, entirely unlike import. The Java compiler uses import to load definitions from an already-compiled class so that they can be referenced by the class that's currently being compiled. On a superficial level, the Clojure runtime does the same thing when it sees :require, but it does this by loading (and compiling) the source code for that namespace.

OK, there are some caveats to that statement. First is that require only loads a namespace once, unless you specify the :reload option. So if the required namespace has already been loaded, it won't be loaded again. And if the namespace has already been compiled, and the source file is older than the compiled files, then the runtime loads the already compiled form. But still, there's a lot of stuff happening as the result of a seemingly simple directive.

So, let's dig into the behavior of require, along with its step-brother load. Earlier I wrote about using load-file to load an arbitrary file into the REPL. Here's that file, followed by the command to load and run it:

(ns foo)
(def x 123)

(ns bar)
(def x 456)

(ns user)
(do (println "myscript!") (+ foo/x bar/x))

user=> (load-file "src/example/myscript.clj")
myscript!
579

When you load the file, it creates definitions within the two namespaces, then invokes an expression to add them. After loading the file, you can access those variables from the REPL:

user=> (* foo/x bar/x)
56088

The load function is similar, but loads files relative to the classpath. It also assumes a .clj extension. I'm using Leiningen, so my classpath is everything under src; therefore, I can load the same file like so:

user=> (load "example/myscript")
myscript!
nil

Wait a second, what happened to the expression at the end of the script? It was still evaluated — the println executed — but the result was discarded and load returned nil.

Now let's try loading this same script with require:

user=> (require 'example.myscript :reload)
myscript!
nil

Different syntax, same result. The two variables are defined in their respective namespaces, and the stand-alone expression was evaluated. So what's the difference?

The first difference is that require gives you a bunch of options. For example, you can use :as to create a short alias for the namespace, so that you don't have to reference its vars with fully-qualified names. The way that the runtime uses these flags is probably worthy of a post of its own.
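
As a quick sketch of :as, using a namespace that ships with Clojure (the alias s is an arbitrary choice):

```clojure
(require '[clojure.string :as s])

;; The alias stands in for the full namespace name.
(s/upper-case "clojure")
;; => "CLOJURE"

;; The alias is recorded in the current namespace, not in clojure.string.
(ns-name (get (ns-aliases *ns*) 's))
;; => clojure.string
```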

Another difference is that require is a little smarter about loading scripts: it only loads (and compiles) a script if it hasn't already done so — unless, of course, you use the :reload or :reload-all options, like I did here. Omitting that option, we see that a second require doesn't invoke the println.

user=> (require 'example.myscript)
nil
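
One way to peek at this bookkeeping is the loaded-libs function, which returns the names of the namespaces that have already been loaded. I'm assuming here that this set is what require consults, which matches the behavior above; the sketch uses clojure.set only because it ships with Clojure.

```clojure
(require 'clojure.set)

;; loaded-libs returns a set of symbols naming the loaded namespaces.
(contains? (loaded-libs) 'clojure.set)
;; => true
```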

Compiling your code (or, :gen-class doesn't do what you might think)

As you've seen above, the Clojure runtime normally compiles your code when it's loaded, producing the bytes of a .class file but not writing them to the filesystem. However, there are times that you want a real, on-disk class. For example, so that you can invoke that class from Java (note that you'll still need the Clojure JAR on your classpath). Or so that you can reduce startup time for a Clojure application, by avoiding load-time compilation (although I think this is probably premature optimization).

The compile function turns Clojure scripts into classes:

user=> (compile 'example.foo)
example.foo

That was simple enough. Note, however, that I was running in lein repl, which sets the *compile-path* runtime global to a directory that it knows exists. If you try to execute this function from the clojure.main REPL, it will fail unless you create the directory classes.

Here's the example file that I compiled:

(ns example.foo)

(def x 123)

(defn what [] "I'm compiled!")

(defn add2 [x] (+ 2 x))

And here are the classes that it produced:

-rw-rw-r--   1 kgregory kgregory     3008 May  7 09:29 target/base+system+user+dev/classes/example/foo__init.class
-rw-rw-r--   1 kgregory kgregory      683 May  7 09:29 target/base+system+user+dev/classes/example/foo$add2.class
-rw-rw-r--   1 kgregory kgregory     1320 May  7 09:29 target/base+system+user+dev/classes/example/foo$fn__1194.class
-rw-rw-r--   1 kgregory kgregory     1503 May  7 09:29 target/base+system+user+dev/classes/example/foo$loading__5569__auto____1192.class
-rw-rw-r--   1 kgregory kgregory      513 May  7 09:29 target/base+system+user+dev/classes/example/foo$what.class

If you're a bytecode geek like me, you'll of course run javap -c on those files to see what they contain (especially fn__1194, which doesn't appear anywhere in the source!). Have at it. For everyone else, here are the two things I think are important:

  • Every function turns into its own class. If you've been reading along, you already knew that.
  • The foo__init class is responsible for pulling all of the other classes into memory, creating instances of those classes, and assigning them to vars in the namespace.
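
You don't need to compile to disk to see the first point. A sketch at the REPL, reusing add2 from the example file:

```clojure
(defn add2 [x] (+ 2 x))

;; The function value is an instance of a generated class.
(instance? clojure.lang.AFunction add2)
;; => true

;; Calling the function and calling invoke on the object are equivalent.
(= (add2 3) (.invoke ^clojure.lang.IFn add2 3))
;; => true
```

The var add2 holds an instance of that class, which is the same arrangement that foo__init sets up when it loads a compiled namespace.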

If you use Leiningen, you've probably noted that it adds a :gen-class directive to the main class of any “app” project that it creates. If you skim the docs for gen-class you might think this will produce a Java class that exposes all of your namespace's functions. Let's see what really happens, by adding a :gen-class directive to the example script:

(ns example.foo
  (:gen-class))

When you compile, the list of classes now looks like this:

-rw-rw-r--   1 kgregory kgregory     1823 May  7 09:31 target/base+system+user+dev/classes/example/foo.class
-rw-rw-r--   1 kgregory kgregory     3009 May  7 09:31 target/base+system+user+dev/classes/example/foo__init.class
-rw-rw-r--   1 kgregory kgregory      683 May  7 09:31 target/base+system+user+dev/classes/example/foo$add2.class
-rw-rw-r--   1 kgregory kgregory     1320 May  7 09:31 target/base+system+user+dev/classes/example/foo$fn__1194.class
-rw-rw-r--   1 kgregory kgregory     1505 May  7 09:31 target/base+system+user+dev/classes/example/foo$loading__5569__auto____1192.class
-rw-rw-r--   1 kgregory kgregory      513 May  7 09:31 target/base+system+user+dev/classes/example/foo$what.class

Everything's the same, except that we now have foo.class. Looking at this class with javap, we find that it contains overrides of the basic Object methods: equals(), hashCode(), toString(), and clone(). It also creates a Java-standard main() function, which looks for the Clojure-standard -main (which doesn't exist for our script, so will fail if invoked). But it doesn't expose any of your functions.

Reading the doc more closely, if you want to use :gen-class to expose your functions, you need to specify the exposed functions in the directive itself — and use a specified naming format that separates the Clojure method implementations from the names exposed to Java.
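
As a sketch of what that looks like for add2: the class name example.Foo and the choice of a static method taking a long are my assumptions for illustration, not something the example project defines.

```clojure
(ns example.foo
  (:gen-class
    :name example.Foo                          ; hypothetical Java-visible class name
    :methods [^:static [add2 [long] long]]))   ; signature exposed to Java

;; With the default prefix "-", the exposed method add2 is implemented
;; by a function named -add2.
(defn -add2 [x] (+ 2 x))
```

Once this namespace has been compiled, Java code should be able to call example.Foo.add2(3), provided that the Clojure JAR and the compiled classes are on its classpath. At the REPL, without compilation, the :gen-class directive does nothing and -add2 is just an ordinary function.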

Pitfalls of compiling your code

Let's change the namespace declaration on foo, so that it requires bar:

(ns example.foo
  (:require [example.bar :as bar]))

This results in the expected classes for foo, but also several for bar (which doesn't define any functions):

-rw-rw-r--   1 kgregory kgregory     2219 May  7 09:39 target/base+system+user+dev/classes/example/bar__init.class
-rw-rw-r--   1 kgregory kgregory     1320 May  7 09:39 target/base+system+user+dev/classes/example/bar$fn__1196.class
-rw-rw-r--   1 kgregory kgregory     1503 May  7 09:39 target/base+system+user+dev/classes/example/bar$loading__5569__auto____1194.class
-rw-rw-r--   1 kgregory kgregory     3009 May  7 09:39 target/base+system+user+dev/classes/example/foo__init.class
-rw-rw-r--   1 kgregory kgregory      683 May  7 09:39 target/base+system+user+dev/classes/example/foo$add2.class
-rw-rw-r--   1 kgregory kgregory     1320 May  7 09:39 target/base+system+user+dev/classes/example/foo$fn__1198.class
-rw-rw-r--   1 kgregory kgregory     1891 May  7 09:39 target/base+system+user+dev/classes/example/foo$loading__5569__auto____1192.class
-rw-rw-r--   1 kgregory kgregory      513 May  7 09:39 target/base+system+user+dev/classes/example/foo$what.class

This makes perfect sense: if you want to ahead-of-time compile one namespace, you probably don't want its dependencies to be compiled at runtime. But recognize that the tree of dependencies can run very deep, and will include any third-party libraries that you use (poking around Clojars, there aren't a lot of libraries that come precompiled).

There is one other detail of compilation that may cause concern: require loads a namespace from the file(s) with the latest modification time. If you have both source and compiled classes on your classpath, this could mean that you're not loading what you think you are. Fortunately, in practice this primarily affects work in the REPL: Leiningen removes the target directory as part of the jar and uberjar tasks, so you won't produce an artifact with a source/class mismatch.

Wrap-up

This has been a long post, so I'll wrap up with what I consider the main points.

  • Startup times for Clojure applications will be longer than for normal Java applications, because of the additional step of compiling and evaluating each expression. This isn't going to be an issue if you've written a long-running server in Clojure, but it does add significant overhead to short-running programs (so Clojure is even less appropriate for small command-line utilities than Java).
  • Pay attention to the Clojure version used by your dependencies, because they might rely on functions from a newer version than your application; this problem manifests itself as an “Unable to resolve symbol” runtime error. While this is a general issue with transitive dependencies, I've found that third-party libraries tend to be at the latest version, while corporate applications tend to use whatever was current when they were begun.
  • As far as I can tell, the Clojure runtime doesn't ever unload the classes that it creates. This means that — on pre-1.8 JVMs — you can fill the permgen space. Not a big problem in development, but be careful if you use a REPL when connected to a production instance.
  • Every script that you load adds to the global state of the runtime. Be aware that the behavior of your scripts may be dependent on the order that they're loaded.