Subtree Merging
Now that you’ve seen the difficulties of the submodule system, let’s look at an alternate way to solve the same problem. When Git merges, it looks at what it has to merge together and then chooses an appropriate merging strategy to use. If you’re merging two branches, Git uses a recursive strategy. If you’re merging more than two branches, Git picks the octopus strategy. These strategies are automatically chosen for you because the recursive strategy can handle complex three-way merge situations — for example, more than one common ancestor — but it can only handle merging two branches. The octopus merge can handle multiple branches but is more cautious to avoid difficult conflicts, so it’s chosen as the default strategy if you’re trying to merge more than two branches.
However, there are other strategies you can choose as well. One of them is the subtree merge, and you can use it to deal with the subproject issue. Here you’ll see how to do the same rack embedding as in the last section, but using subtree merges instead.
The idea of the subtree merge is that you have two projects, and one of the projects maps to a subdirectory of the other one and vice versa. When you specify a subtree merge, Git is smart enough to figure out that one is a subtree of the other and merge appropriately — it’s pretty amazing.
You first add the Rack application to your project. You add the Rack project as a remote reference in your own project and then check it out into its own branch:
$ git remote add rack_remote git@github.com:schacon/rack.git
$ git fetch rack_remote
warning: no common commits
remote: Counting objects: 3184, done.
remote: Compressing objects: 100% (1465/1465), done.
remote: Total 3184 (delta 1952), reused 2770 (delta 1675)
Receiving objects: 100% (3184/3184), 677.42 KiB | 4 KiB/s, done.
Resolving deltas: 100% (1952/1952), done.
From git@github.com:schacon/rack
* [new branch] build -> rack_remote/build
* [new branch] master -> rack_remote/master
* [new branch] rack-0.4 -> rack_remote/rack-0.4
* [new branch] rack-0.9 -> rack_remote/rack-0.9
$ git checkout -b rack_branch rack_remote/master
Branch rack_branch set up to track remote branch refs/remotes/rack_remote/master.
Switched to a new branch "rack_branch"
Now you have the root of the Rack project in your rack_branch
branch and your own project in the master
branch. If you check out one and then the other, you can see that they have different project roots:
$ ls
AUTHORS KNOWN-ISSUES Rakefile contrib lib
COPYING README bin example test
$ git checkout master
Switched to branch "master"
$ ls
README
You want to pull the Rack project into your master
project as a subdirectory. You can do that in Git with git read-tree
. You’ll learn more about read-tree
and its friends in Chapter 9, but for now know that it reads the root tree of one branch into your current staging area and working directory. You just switched back to your master
branch, and you pull the rack_branch
branch into the rack
subdirectory of your master
branch of your main project:
$ git read-tree --prefix=rack/ -u rack_branch
When you commit, it looks like you have all the Rack files under that subdirectory — as though you copied them in from a tarball. What gets interesting is that you can fairly easily merge changes from one of the branches to the other. So, if the Rack project updates, you can pull in upstream changes by switching to that branch and pulling:
$ git checkout rack_branch
$ git pull
Then, you can merge those changes back into your master branch. You can use git merge -s subtree
and it will work fine; but Git will also merge the histories together, which you probably don’t want. To pull in the changes and prepopulate the commit message, use the --squash
and --no-commit
options as well as the -s subtree
strategy option:
$ git checkout master
$ git merge --squash -s subtree --no-commit rack_branch
Squash commit -- not updating HEAD
Automatic merge went well; stopped before committing as requested
All the changes from your Rack project are merged in and ready to be committed locally. You can also do the opposite — make changes in the rack
subdirectory of your master branch and then merge them into your rack_branch
branch later to submit them to the maintainers or push them upstream.
To get a diff between what you have in your rack
subdirectory and the code in your rack_branch
branch — to see if you need to merge them — you can’t use the normal diff
command. Instead, you must run git diff-tree
with the branch you want to compare to:
$ git diff-tree -p rack_branch
Or, to compare what is in your rack
subdirectory with what the master
branch on the server was the last time you fetched, you can run
$ git diff-tree -p rack_remote/master