Thursday, April 9, 2020

Stupid Git Tricks: Interactive Rebase

I like to provide a nice, clean history in my pull requests. Reviewers should be able to follow each commit, and see how the functionality is built up. No extraneous commits. Nothing out of order. Everything looking like one smooth path from idea to implementation.

Unfortunately, my development process doesn't quite work that way. For one thing, I commit (and push) frequently — as in every 10-15 minutes when I'm making lots of changes. For another, I'll often realize that there's a small change that should have been introduced several commits previously. For these, and other reasons, I find git rebase -i invaluable.

OK, some of you are probably outraged: “you're changing history!” Settle down. This is for development branches, not master. And I'm willing to adapt in a team setting: if my team members want to see messy commit histories in a pull request, I'm OK with giving that to them. But only if they squash merges.

So, here are a few of the ways that I change history. You're free to avoid them.

Combining commits

Here's one morning's commit history:

commit 6aefd6989ba7712cb047d661b68d34c888badea4 (HEAD -> dev-writing_log4j2, origin/dev-writing_log4j2)
Author: Me 
Date:   Sun Apr 5 12:13:19 2020 -0400

    checkpoint: content updates

...

commit e8503f01c72618709ac5231a78cfa8549fcfb7b3
Author: Me 
Date:   Sun Apr 5 09:22:51 2020 -0400

    checkpoint: content updates

commit 8bdb788421c56cb0defe73ce87b9e1ffe4266b0c
Author: Me 
Date:   Sat Apr 4 13:57:27 2020 -0400

    add reference to sample project

Three hours of changes, split up over eight commits, with regular pushes so I wouldn't lose work if my SSD picked today to fail. I really don't want to see all of those in my history.

The solution is to squash those commits down using an interactive rebase:

git rebase -i 8bdb788421c56cb0defe73ce87b9e1ffe4266b0c

When I run this, it starts my editor and shows me the following:

pick e8503f0 checkpoint: content updates
pick f71ddca checkpoint: content updates
pick a8d7a25 checkpoint: content updates
pick 6b87b9b checkpoint: content updates
pick 556a346 checkpoint: content updates
pick 466dd26 checkpoint: content updates
pick 0034657 checkpoint: content updates
pick 6aefd69 checkpoint: content updates

# Rebase 8bdb788..6aefd69 onto 8bdb788 (8 commands)
#
# Commands:
# p, pick = use commit
# r, reword = use commit, but edit the commit message
# e, edit = use commit, but stop for amending
# s, squash = use commit, but meld into previous commit
# f, fixup = like "squash", but discard this commit's log message
# x, exec = run command (the rest of the line) using shell
# d, drop = remove commit
#
# These lines can be re-ordered; they are executed from top to bottom.
#
# If you remove a line here THAT COMMIT WILL BE LOST.
#
# However, if you remove everything, the rebase will be aborted.
#
# Note that empty commits are commented out

A list of commits, instructions on how to work with them, and a few warnings about what happens if I do something dumb. To squash these commits I update all but the first to be a “fixup”:

pick e8503f0 checkpoint: content updates
f f71ddca checkpoint: content updates
f a8d7a25 checkpoint: content updates
f 6b87b9b checkpoint: content updates
f 556a346 checkpoint: content updates
f 466dd26 checkpoint: content updates
f 0034657 checkpoint: content updates
f 6aefd69 checkpoint: content updates

Save this and exit the editor, and Git applies all of those changes:

Successfully rebased and updated refs/heads/dev-writing_log4j2.
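As an aside, the same todo-list edit can be scripted rather than typed. What follows is only a sketch, run against a throwaway repository: the temp directory, the file name notes.txt, and the three checkpoint commits are made up for the demo, and the sed invocation assumes GNU sed. It relies on the GIT_SEQUENCE_EDITOR environment variable, which Git uses in place of your editor when it hands over the rebase todo list.

```shell
#!/bin/sh
# Sketch: squash a run of checkpoint commits without opening an editor.
# The repository, file name, and commit contents are invented for this demo.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Me"
git config user.email "me@example.com"

# The commit we will rebase onto (plays the role of 8bdb788 in the text).
echo "base" > notes.txt
git add notes.txt
git commit -q -m "add reference to sample project"
base=$(git rev-parse HEAD)

# A few checkpoint commits on top of it.
for i in 1 2 3; do
    echo "change $i" >> notes.txt
    git commit -q -am "checkpoint: content updates"
done

# Git appends the todo file's path to this command and runs it.
# sed leaves line 1 as "pick" and turns every later "pick" into "fixup".
GIT_SEQUENCE_EDITOR="sed -i -e '2,\$s/^pick/fixup/'" git rebase -q -i "$base"

# Only one commit should remain on top of the base.
count=$(git rev-list --count "$base"..HEAD)
subject=$(git log -1 --format=%s)
echo "commits after base: $count"
echo "subject: $subject"
```

I wouldn't do this for a one-off squash, where the editor is quicker, but it shows that the todo list is just a text file that Git reads back once the "editor" exits.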

And now when I look at my history, this is what I see:

commit 51f5130422b524603d6249ef40e012aeecde5422 (HEAD -> dev-writing_log4j2)
Author: Me 
Date:   Sun Apr 5 09:22:51 2020 -0400

    checkpoint: content updates

commit 8bdb788421c56cb0defe73ce87b9e1ffe4266b0c
Author: Me 
Date:   Sat Apr 4 13:57:27 2020 -0400

    add reference to sample project

Note that the last commit hash has changed, and my local branch no longer matches origin/dev-writing_log4j2 (the log decoration now shows only HEAD -> dev-writing_log4j2). This means that I'm going to need to force-push these changes. But before that, there's one more thing that I want to do:

git commit --amend -m "content updates" --reset-author

This command does two things. First, it updates my commit message: this is no longer a “checkpoint” commit. Second, it resets the author information, which in this case changes just the timestamp. If you looked closely at the history above, you saw that the squashed commit carried the timestamp of the first checkpoint; --reset-author makes the history more closely reflect what actually happened (it can also be used to pretend that other people didn't contribute to the commit, but I'll assume you're more honorable than that).

Now the log looks like this:

commit fdef5d6f0a19218784b87a596322816347db2232 (HEAD -> dev-writing_log4j2)
Author: Me 
Date:   Sun Apr 5 12:22:46 2020 -0400

    content updates

commit 8bdb788421c56cb0defe73ce87b9e1ffe4266b0c
Author: Me 
Date:   Sat Apr 4 13:57:27 2020 -0400

    add reference to sample project

Which is what I want to see, so time to force-push and overwrite the previous chain of commits:

> git push -f
Counting objects: 4, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (4/4), done.
Writing objects: 100% (4/4), 4.52 KiB | 2.26 MiB/s, done.
Total 4 (delta 3), reused 0 (delta 0)
To ssh://git-codecommit.us-east-1.amazonaws.com/v1/repos/Website
 + 6aefd69...fdef5d6 dev-writing_log4j2 -> dev-writing_log4j2 (forced update)

I should note here that the previous chain of commits still exists in the local repository, at least until Git garbage-collects unreachable objects. If, for some reason, you want to retrieve them, you can explicitly check out the former head commit:

git checkout -b recovery 6aefd6989ba7712cb047d661b68d34c888badea4

Of course, if you close your terminal window, you might not find that commit hash again, so if you're worried you should write it down somewhere (git reflog will usually still list it, but only for a limited time). When I'm making a large set of changes, I'll often create a temporary branch from the one that's being rebased, just in case (unfortunately, I often forget to switch back to the branch that I want to rebase).
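Here is a minimal sketch of that safety net, again in a throwaway repository. The branch name backup-before-rebase and the file contents are hypothetical, and the scripted squash assumes GNU sed. The point is that git branch creates the marker without switching to it, so there is nothing to forget to switch back from.

```shell
#!/bin/sh
# Sketch: mark the old tip with a branch before rewriting history.
# Repository contents and the branch name are invented for this demo.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Me"
git config user.email "me@example.com"

echo "one" > notes.txt
git add notes.txt
git commit -q -m "first"
echo "two" >> notes.txt
git commit -q -am "checkpoint"
echo "three" >> notes.txt
git commit -q -am "checkpoint"

# "git branch NAME" creates the branch but stays on the current one.
git branch backup-before-rebase
old_tip=$(git rev-parse HEAD)

# Rewrite history: fold the last commit into its parent.
GIT_SEQUENCE_EDITOR="sed -i -e '2,\$s/^pick/fixup/'" git rebase -q -i HEAD~2

new_tip=$(git rev-parse HEAD)
backup_tip=$(git rev-parse backup-before-rebase)
echo "rewritten tip: $new_tip"
echo "backup tip:    $backup_tip"

# Even without the backup branch, the reflog records where HEAD has been.
git reflog -n 5
```

Once I'm happy with the rewritten branch, git branch -D backup-before-rebase removes the marker.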

Re-ordering commits

Especially at the start of a new project, I might jump around and edit several different things, resulting in a messy commit history:

pick fc46b5b update docs
pick 0f734fb minor change to feature X
pick 2233c01 update docs
pick fe56f59 another change to feature X
pick d3fb025 related change to feature Y
pick aec87c1 update docs
pick 66ef266 something unrelated to either X or Y
pick 96179b3 changing Y
pick 904a779 update docs

Interactive rebase allows you to move commits around, and then optionally squash those moved commits:

pick fc46b5b update docs
f 2233c01 update docs
f aec87c1 update docs
f 904a779 update docs
pick 0f734fb minor change to feature X
f fe56f59 another change to feature X
pick d3fb025 related change to feature Y
f 96179b3 changing Y
pick 66ef266 something unrelated to either X or Y

There are a couple of gotchas when you do this. First, you need to make sure that you're not changing both X and Y in the same commit. If you do, you can still squash the commits together, but it's pointless to try to track the work in each feature separately.

Second, make sure that you preserve order: in my example, commit 0f734fb happened before fe56f59, and the interactive rebase needs to keep them in this order. If you don't, you can end up with merge conflicts that are challenging to resolve.

Lastly, and most important, make sure you have the same number of commits that you started with. If you accidentally delete a commit rather than move it, you will lose that work. For this reason, I tend to use interactive rebase on small pieces of my history, perhaps making several passes over it.
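Counting lines works, but once fixups are involved the number of commits changes anyway. One check I find more reliable is to compare the final tree before and after: if the reorder lost nothing, the two are identical. The sketch below assumes a backup branch made before the rebase (the name before-reorder is hypothetical, as are the files and commits), and it feeds Git a pre-written todo list so that it runs unattended.

```shell
#!/bin/sh
# Sketch: reorder and squash commits, then verify that no work was lost.
# Files, commits, and the branch name are invented for this demo.
set -e
repo=$(mktemp -d)
todo=$(mktemp)
cd "$repo"
git init -q
git config user.name "Me"
git config user.email "me@example.com"

echo "docs" > docs.txt
echo "feature X" > x.txt
git add docs.txt x.txt
git commit -q -m "initial"

# Interleaved work on two unrelated files.
echo "docs 1" >> docs.txt; git commit -q -am "update docs"
echo "x 1" >> x.txt;       git commit -q -am "minor change to feature X"
echo "docs 2" >> docs.txt; git commit -q -am "update docs"
echo "x 2" >> x.txt;       git commit -q -am "another change to feature X"

git branch before-reorder

# The reordered todo list: docs commits together, then feature X commits,
# each group keeping its original relative order.
{
    echo "pick $(git rev-parse HEAD~3)"
    echo "fixup $(git rev-parse HEAD~1)"
    echo "pick $(git rev-parse HEAD~2)"
    echo "fixup $(git rev-parse HEAD)"
} > "$todo"

# Git appends the real todo path, so this copies our list over it.
GIT_SEQUENCE_EDITOR="cp '$todo'" git rebase -q -i HEAD~4

# Same files, same contents? Then nothing was dropped along the way.
if git diff --quiet before-reorder HEAD; then
    result="identical"
else
    result="DIFFERENT"
fi
count=$(git rev-list --count HEAD)
echo "final tree vs backup: $result"
echo "commits now on branch: $count"
```

If the comparison reports a difference, git diff before-reorder HEAD shows exactly what went missing, and the backup branch still has it.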

Editing commits

When writing my article about Log4J2 appenders, I saw a comment that I wanted to change in the accompanying example code. Unfortunately, it wasn't the HEAD commit:

commit 8007214ef232abf528baf2968162b51dcd2c09ca
Author: Me 
Date:   Sat Apr 4 09:34:53 2020 -0400

    README

commit 38c610db6a02747d7017dff0a9c2b7ed290e30e1
Author: Me 
Date:   Sat Apr 4 08:34:12 2020 -0400

    stage-10: add tests

commit 5dfd79e3f879038e915fa04c83f8eb9b0f695e35
Author: Me 
Date:   Tue Mar 31 08:38:17 2020 -0400

    stage-9: implement a lookup

There are two ways that I could have approached this. The first would be to create a new commit and then reorder it and turn it into a fixup. The second is to edit the file as part of an interactive rebase, by marking the commit with an "e":

pick 5dfd79e stage-9: implement a lookup
e 38c610d stage-10: add tests
pick 8007214 README

When I do this, Git works through the commits, and stops when it reaches the marked one:

Stopped at 38c610d...  stage-10: add tests
You can amend the commit now, with

  git commit --amend 

Once you are satisfied with your changes, run

  git rebase --continue

I can now edit any files in my working tree (they don't have to be part of the original commit). Once I'm done, I do a git add for changed files, followed by both git commit --amend and git rebase --continue. After the rebase completes, I can force-push the changes.

Beware that editing commits can introduce merge conflicts: if a later commit touches the same code, you'll have to stop and resolve the conflict. This is more likely when you edit early commits, or when the edits are wide-ranging. It is far less likely for changes like comments.
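For completeness, here is a sketch of the first approach (a new commit that gets turned into a fixup), using Git's own support for it: git commit --fixup writes a commit whose message starts with "fixup!", and git rebase -i --autosquash moves that commit under its target and marks it as a fixup for you. The repository, the file names, and their contents are made up; setting GIT_SEQUENCE_EDITOR to true simply accepts the generated todo list unchanged.

```shell
#!/bin/sh
# Sketch: fix an older commit with --fixup and --autosquash.
# Files and their contents are invented for this demo.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Me"
git config user.email "me@example.com"

echo "lookup" > lookup.txt
git add lookup.txt
git commit -q -m "stage-9: implement a lookup"

echo "// old comment" > tests.txt
git add tests.txt
git commit -q -m "stage-10: add tests"

echo "readme" > README
git add README
git commit -q -m "README"

# The commit that needs the change is one behind HEAD.
target=$(git rev-parse HEAD~1)

# Make the change and record it as a fixup for that commit.
echo "// better comment" > tests.txt
git commit -q -a --fixup="$target"

# --autosquash reorders the todo list; "true" accepts it without editing.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash HEAD~3

total=$(git rev-list --count HEAD)
fixed_subject=$(git log -1 --format=%s HEAD~1)
fixed_content=$(git show HEAD~1:tests.txt)
echo "commits: $total"
echo "$fixed_subject -> $fixed_content"
```

The same caveat about merge conflicts applies: if a later commit touches the lines that the fixup changes, the rebase stops and asks you to sort it out.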

Cherry-picking into the middle of a branch

You may be familiar with git cherry-pick, which takes an arbitrary commit and puts it at the HEAD of your current branch. This can be useful when two teams are working on the same general area of the codebase: often one team will incidentally do something that the other team finds valuable.

Interactive rebase is like cherry-picking on steroids: you can insert a commit anywhere in your commit tree. To be honest, I find this more risky than beneficial; instead I would cherry-pick to HEAD and then perhaps use an interactive rebase to move the commit to where it “belongs.” But in the interest of “stupid git tricks,” here we go.

Let's say that you've been working on a branch and have been making changes, starting with changes to the build scripts. Then you talk with a colleague, and learn that she has also made changes to the build. You could cherry-pick her change to the end of your branch and use it moving forward, but you're somewhat OCD and want to keep the build changes together. So you fire up git rebase -i and add your colleague's commit as a new “pick”:

pick ffc954d build scripts
p 1438a13d11d6001de876a034f434a050c09b587d
pick b497403 update 1
pick 18e8415 update 2
pick 33a4e9d update 3

Now when you do a git log, you see something like this:

... skipping two commits

commit 7b62acb8d9100f379a0d43e3227c36ae91c1edd9
Author: Me 
Date:   Fri Mar 27 10:11:01 2020 -0400

    update 1

commit c579ed88403354faed83213da63d4546c5aa13b5
Author: Someone Else 
Date:   Sun Jan 5 09:30:14 2020 -0500

    some build changes

commit ffc954dc41555282ece3e2b7a0197472c0af9f11
Author: Me 
Date:   Mon Jan 6 08:02:30 2020 -0500

    build scripts

Note that the commit hash has changed: from 1438a13d to c579ed88. This is because the commit now has a different parent: a commit hash is based not merely on the content of the commit, but also on the commit chain that it's a part of. However, the author's name and the author date, which are what git log shows by default, are unchanged; the committer fields are rewritten to record who applied the commit and when.
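Git tracks an author and a committer separately for every commit, and you can see both for yourself. The sketch below uses a plain git cherry-pick in a throwaway repository rather than a pick line in a rebase; as far as I know the two apply a commit in the same way. The names, email addresses, files, and the colleague branch are all invented for the demo.

```shell
#!/bin/sh
# Sketch: compare author and committer fields before and after a cherry-pick.
# Identities, files, and branch names are invented for this demo.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Me"
git config user.email "me@example.com"

echo "build" > build.txt
git add build.txt
git commit -q -m "build scripts"
mine=$(git symbolic-ref --short HEAD)

# A colleague's branch with one commit, made under her identity and date.
git checkout -q -b colleague
echo "her change" > other.txt
git add other.txt
GIT_AUTHOR_NAME="Someone Else" GIT_AUTHOR_EMAIL="else@example.com" \
GIT_COMMITTER_NAME="Someone Else" GIT_COMMITTER_EMAIL="else@example.com" \
GIT_AUTHOR_DATE="2020-01-05 09:30:14 -0500" \
GIT_COMMITTER_DATE="2020-01-05 09:30:14 -0500" \
    git commit -q -m "some build changes"
theirs=$(git rev-parse HEAD)

# Back on my branch: one more commit of my own, then take hers.
git checkout -q "$mine"
echo "update 1" >> build.txt
git commit -q -am "update 1"
git cherry-pick "$theirs" > /dev/null
copy=$(git rev-parse HEAD)

author=$(git log -1 --format=%an "$copy")
committer=$(git log -1 --format=%cn "$copy")
orig_author_date=$(git log -1 --format=%ad "$theirs")
copy_author_date=$(git log -1 --format=%ad "$copy")

echo "original: $theirs"
echo "copy:     $copy"
echo "author: $author / committer: $committer"
echo "author date kept: $orig_author_date -> $copy_author_date"
```

In a real repository, git log --format=fuller shows the same four fields (Author, AuthorDate, Commit, CommitDate) for every commit.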

Wrapping up: a plea for clean commit histories

A standard Git merge, by preserving the chains of commits that led to the merge point, is both incredibly useful and incredibly annoying. It's useful, in that you can move along the original branch to understand the context of a commit. It's annoying, in that your first view of the log shows the commits intermingled and ordered by date, completely removing context.

I find messy histories to be similar. Software development doesn't happen in a clean, orderly fashion: developers often attack a problem from multiple sides at once. And that can result in commit histories that jump around: instead of “fix foo” you have “add test for bar”, followed by “make test work”, followed by an endless string of “no, really, this time everything works”.

Maybe you find that informative. If not, do your future self (and your coworkers) a favor and clean it up before making it part of the permanent record.