Introduction

Sometimes you need to clean up your Git history, for instance to remove a production secret that was deleted from the code but is still present in older commits, or to change an old committer identity.

:warning: As such operations are very dangerous, please read this post fully before running anything, and note that I hereby decline any responsibility (as always) if something bad happens to your project.

The problem

If you reached this page, you probably already know the problem: rewriting Git history changes the identifier (SHA) of the first affected commit and of every commit that descends from it, and there is nothing you can do about it.

If developers referenced other commits in their commit messages (like This commit follows 40d5014 [...]), those references will no longer point to anything once the history is rewritten.
The same goes for commits that “revert” others: their messages embed the identifier of the reverted commit, and Git does not update it automatically.
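
Before rewriting anything, it may help to see how many references are at stake. Here is a minimal sketch that lists every token looking like a commit reference (7 to 40 hexadecimal digits) in a text read from standard input. The function name find_commit_refs is mine, and the pattern relies on the \b operator as GNU grep understands it.

```shell
# List every token that looks like a commit reference (7 to 40 hex digits)
# in the text read from standard input, one per line, without duplicates.
# Hypothetical helper: the name is only an illustration.
find_commit_refs() {
	LC_ALL=C grep -oE "\b[0-9a-fA-F]{7,40}\b" | LC_ALL=C sort -u
}

# Typical use from inside the repository (left commented out so that the
# snippet has no side effect when pasted):
# git log --all --format=%B | find_commit_refs
```

Every line printed is only a candidate: the token may be a real reference, or just a word that happens to be made of hexadecimal digits.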

The workaround

So we need to “update” commit references on the fly, while the history is being rewritten, so that they match the new commit identifiers.

Below is a script that implements this. It is derived from one of the examples in the official GIT-FILTER-BRANCH(1) manual page, and replaces the root <root@localhost> identity with John Doe <john@example.com>:

git filter-branch \
	--env-filter '
		if test "$GIT_AUTHOR_NAME" = "root"
		then
			GIT_AUTHOR_NAME="John Doe"
		fi
		if test "$GIT_AUTHOR_EMAIL" = "root@localhost"
		then
			GIT_AUTHOR_EMAIL=john@example.com
		fi
		if test "$GIT_COMMITTER_NAME" = "root"
		then
			GIT_COMMITTER_NAME="John Doe"
		fi
		if test "$GIT_COMMITTER_EMAIL" = "root@localhost"
		then
			GIT_COMMITTER_EMAIL=john@example.com
		fi
	' \
	--commit-filter '
		printf "%s" "${GIT_COMMIT}," >> ../commits_mapping
		git commit-tree "$@" | tee -a ../commits_mapping
	' \
	--tag-name-filter cat \
	--msg-filter '
		message="$(cat)"
		# Collect every token that looks like a commit reference (7 to 40 hex digits).
		commit_refs="$(printf "%s\n" "$message" | LC_ALL=C grep -oE "\b[0-9a-fA-F]{7,40}\b")"
		for commit_ref in $commit_refs; do
			# Look up the new identifier and keep the first match if the prefix is ambiguous.
			new_sha="$(grep "^${commit_ref}" ../commits_mapping | head -n 1 | cut -d, -f2)"
			if test -z "$new_sha"
			then
				continue
			fi
			# Abbreviate the new identifier to the length of the original reference.
			new_commit_ref="$(printf "%s" "$new_sha" | cut -c "1-${#commit_ref}")"
			message="$(printf "%s\n" "$message" | sed "s/${commit_ref}/${new_commit_ref}/g")"
		done

		printf "%s\n" "$message"
	' \
	-- --all
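
To see what the msg-filter does without touching a repository, here is the same logic isolated in a function, as a sketch. The commit-filter above appends one old_sha,new_sha line per rewritten commit to the mapping file, and the message filter looks each reference up in it. The function name rewrite_refs and the identifiers in the demonstration are made up for illustration.

```shell
# Rewrite commit references in a message read from standard input, using a
# mapping file made of "old_sha,new_sha" lines (path given as first argument).
# Hypothetical helper mirroring the msg-filter logic.
rewrite_refs() {
	mapping_file="$1"
	message="$(cat)"
	commit_refs="$(printf "%s\n" "$message" | LC_ALL=C grep -oE "\b[0-9a-fA-F]{7,40}\b" || true)"
	for commit_ref in $commit_refs; do
		# Keep the first mapping whose old SHA starts with the reference.
		new_sha="$(grep "^${commit_ref}" "$mapping_file" | head -n 1 | cut -d, -f2)"
		# Tokens without a mapping are left untouched.
		test -n "$new_sha" || continue
		# Abbreviate the new SHA to the length of the original reference.
		new_commit_ref="$(printf "%s" "$new_sha" | cut -c "1-${#commit_ref}")"
		message="$(printf "%s\n" "$message" | sed "s/${commit_ref}/${new_commit_ref}/g")"
	done
	printf "%s\n" "$message"
}

# Demonstration with made-up identifiers.
mapping="$(mktemp)"
printf "%s\n" "40d50140123456789abcdef0123456789abcdef0,9f3c2e1fedcba9876543210fedcba9876543210f" > "$mapping"
echo "This commit follows 40d5014 [...]" | rewrite_refs "$mapping"
# prints: This commit follows 9f3c2e1 [...]
rm -f "$mapping"
```

Note that a token is only replaced when it matches the beginning of an old identifier recorded in the mapping file; anything else goes through unchanged.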

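You may have noticed that the filtering scripts are written in plain sh, without any Bash-specific syntax, so they are supposed to work in most environments (maybe even yours :wink:). One caveat: the \b word-boundary operator is not part of POSIX regular expressions; GNU grep supports it, but other implementations may not.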

The script does a few other things too:

  • Committer identities are updated as well, not only author identities;

  • All branches are rewritten (this may not be what you want!);

  • Tags are updated too (they will point to the same effective version of the code).

A workaround pitfall

TL;DR: beware of word collisions across commit messages.

There is one caveat worth sharing, which comes from the use of regular expressions in the msg-filter script:

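You might run into collisions between commit references and real words of your language. Such a word is only rewritten if it also happens to match the beginning of a commit identifier recorded in the mapping file, but when that happens, the message is silently corrupted.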

For a project whose commit messages are written in English, you can safely run the above migration, because the dictionary contains no such word (the command below prints nothing):

LC_ALL=C grep -oE "\b[0-9a-fA-F]{7,40}\b" /usr/share/hunspell/en_US.dic

If you happen to use shorter references (let’s say, 6 characters long), there are collisions in English:

LC_ALL=C grep -oE "\b[0-9a-fA-F]{6,40}\b" /usr/share/hunspell/en_US.dic
accede
bedded
cabbed
dabbed
decade
efface
facade

For an Italian project, there are collisions even with 7-character references (:fearful:):

LC_ALL=C grep -oE "\b[0-9a-fA-F]{7,40}\b" /usr/share/hunspell/it_IT.dic
accadde
decadde

Last words

Please also note that the use of git filter-branch has been discouraged since Git v2.24.0 (the command now prints a warning when it runs), and that git filter-repo should be preferred.
If you managed to adapt the solution described in this post to this tool, feel free to post a comment below!

It actually turned out that filter-repo supports this feature by default! :tada:
So it definitely should be preferred over git filter-branch, but sometimes, only legacy tools are available…
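
For the record, here is a rough sketch of what the identity change could look like with git filter-repo, using its --mailmap option. I have not run the whole migration this way, so treat it as a starting point. The function name write_mailmap and the file name identity.mailmap are hypothetical, and the mailmap entry matches on the email address only, which is slightly different from the name and email tests of the script above.

```shell
# Write a mailmap entry: commits recorded with <root@localhost> get the
# name "John Doe" and the address <john@example.com>.
# Hypothetical helper: the name is only an illustration.
write_mailmap() {
	printf "%s\n" "John Doe <john@example.com> <root@localhost>" > "$1"
}

# Assumed invocation, to run in a fresh clone (git filter-repo is a separate
# tool that must be installed first). Left commented out on purpose:
# write_mailmap ../identity.mailmap
# git filter-repo --mailmap ../identity.mailmap
```

Commit references found in commit messages are updated by filter-repo itself, so no mapping file or msg-filter is needed here.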


Many thanks to the co-author of this script, who will recognize himself :pray: