Introduction
Sometimes you may want to clean up your Git history (say, to remove a production secret that was redacted but is still present in older commits, or to change an old committer identity).
As such operations are dangerous, please read this post in full before running anything. As always, I decline any responsibility if something bad happens to your project.
The problem
If you have reached this page, you probably already know the problem: rewriting Git history changes the identifiers (SHA) of the first affected commit and of every commit that follows it, and there is nothing you can do about it.
If developers referenced other commits in their commit messages (like "This commit follows 40d5014 [...]"), those references will no longer mean anything once the rewrite is done.
Moreover, commits that "revert" others are affected too: the default message of git revert ("This reverts commit <SHA>.") embeds the identifier of the reverted commit, and Git does not update it automatically.
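Before rewriting anything, it can help to estimate how many messages are affected by listing the tokens that look like commit identifiers. The following POSIX sh function is a minimal sketch. sha_candidates is a hypothetical name, and the function assumes references are written as 7 to 40 lowercase hexadecimal characters. That range covers both abbreviated ids and the full ids written by git revert.

```sh
# Hypothetical helper: print every token of the text read on stdin that
# looks like a commit identifier, i.e. 7 to 40 lowercase hexadecimal
# characters delimited by non-alphanumeric characters.
sha_candidates() {
    # 1. Turn every run of non-alphanumeric characters into a newline,
    #    so that each word ends up on its own line.
    # 2. Keep only the words made of 7 to 40 lowercase hex characters.
    LC_ALL=C tr -cs '0-9A-Za-z' '\n' |
        LC_ALL=C grep -E '^[0-9a-f]{7,40}$'
}
```

For instance, git log --all --format=%B | sha_candidates | sort | uniq -c would count how often each candidate appears in the commit messages of all branches.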
The workaround
So we somehow have to "update" commit references dynamically while rewriting the history, according to the new commit identifiers.
Our script implementing this is derived from one of the official GIT-FILTER-BRANCH(1) manual page examples, and it replaces the root <root@localhost> identity with John Doe <john@example.net>.
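The reference-updating part of such a msg-filter can be sketched as follows. This is a minimal illustration, not the script this post refers to, and it assumes references are 7 to 40 lowercase hexadecimal characters. The file name rewrite-refs.sh and the functions resolve_ref and rewrite_refs are hypothetical. map is the function git filter-branch itself provides to filters: it prints the rewritten id of a commit, or the original id if that commit has not been rewritten.

```sh
# rewrite-refs.sh (hypothetical file name): helpers meant to be sourced
# from git filter-branch's --msg-filter, where filter-branch defines the
# map function (original commit id in, rewritten commit id out).
# POSIX sh has no "local", hence the rr_ prefix on every variable.

# Expand a (possibly abbreviated) hex token into the full id of an
# existing commit; fails if the token is ambiguous or names no commit.
resolve_ref() {
    git rev-parse --verify --quiet "$1^{commit}" 2>/dev/null
}

# Read a commit message on stdin and print it with every commit reference
# (7 to 40 lowercase hex characters) replaced by the id of the rewritten
# commit, abbreviated to the same length. Trailing blank lines are dropped.
rewrite_refs() {
    rr_msg=$(cat)
    rr_map=
    for rr_tok in $(printf '%s\n' "$rr_msg" |
        LC_ALL=C tr -cs '0-9A-Za-z' '\n' |
        LC_ALL=C grep -E '^[0-9a-f]{7,40}$' | sort -u)
    do
        rr_full=$(resolve_ref "$rr_tok") || continue
        rr_new=$(map "$rr_full" | head -n 1)
        rr_short=$(printf '%s\n' "$rr_new" | cut -c "1-${#rr_tok}")
        rr_map=$(printf '%s\n%s %s' "$rr_map" "$rr_tok" "$rr_short")
    done
    # Feed awk the "old new" pairs, a "--" separator, then the message.
    printf '%s\n--\n%s\n' "$rr_map" "$rr_msg" | LC_ALL=C awk '
        !body && $0 == "--" { body = 1; next }
        !body { if (NF == 2) repl[$1] = $2; next }
        {
            # Walk over maximal alphanumeric runs so that only whole
            # tokens are replaced, never parts of longer words.
            out = ""; rest = $0
            while (match(rest, /[0-9A-Za-z]+/)) {
                tok = substr(rest, RSTART, RLENGTH)
                if (tok in repl) tok = repl[tok]
                out = out substr(rest, 1, RSTART - 1) tok
                rest = substr(rest, RSTART + RLENGTH)
            }
            print out rest
        }'
}

# Example (use an absolute path: filters are not necessarily run from
# the repository root):
# git filter-branch --msg-filter \
#     '. /absolute/path/rewrite-refs.sh && rewrite_refs' -- --all
```

Each candidate is checked with git rev-parse before being replaced, so a hexadecimal-looking word changes only if it names an existing commit. filter-branch processes parents before children, so referenced ancestors have already been rewritten when map is called. A reference to a commit that has not been rewritten yet is left as it was.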
The filtering scripts are fully POSIX-compatible, so they should work in most environments (maybe even yours).
The script has other features too:
- committer identities are updated as well;
- all branches are rewritten (this may not be what you want!);
- tags are updated too, so they keep pointing to the same effective version of the code.
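The identity update and the branch and tag handling listed above can be sketched as follows. This is an illustration shaped like the env-filter example of the GIT-FILTER-BRANCH(1) manual page, not the original script. It assumes the helper is saved in a file (identity.sh, a hypothetical name) that the env-filter sources. fix_identity is also a hypothetical name.

```sh
# identity.sh (hypothetical file name): meant to be sourced from
# git filter-branch's --env-filter, which runs with the GIT_AUTHOR_* and
# GIT_COMMITTER_* variables of the commit being rewritten already set.
# Replaces root <root@localhost> with John Doe <john@example.net>,
# both as author and as committer.
fix_identity() {
    if [ "$GIT_AUTHOR_EMAIL" = 'root@localhost' ]; then
        GIT_AUTHOR_NAME='John Doe'
        GIT_AUTHOR_EMAIL='john@example.net'
        export GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL
    fi
    if [ "$GIT_COMMITTER_EMAIL" = 'root@localhost' ]; then
        GIT_COMMITTER_NAME='John Doe'
        GIT_COMMITTER_EMAIL='john@example.net'
        export GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL
    fi
}

# Example: rewrite every branch (-- --all) and move tags along with the
# rewritten commits (--tag-name-filter cat). Use an absolute path, and
# work on a backup of the repository.
# git filter-branch \
#     --env-filter '. /absolute/path/identity.sh && fix_identity' \
#     --tag-name-filter cat -- --all
```

The explicit export lines are harmless if filter-branch already exports these variables. They are kept so that the new values reach git commit-tree either way.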
A workaround pitfall
TL;DR: beware of collisions between commit references and ordinary words in commit messages.
There is one caveat we have to share, though, which comes from the use of regular expressions in the msg-filter script: real words of your language may collide with commit references.
A word made only of hexadecimal characters (digits and the letters a to f) matches the same pattern as an abbreviated SHA.
For a project whose commit messages are written in English, you can safely run the Git migration described above with 7-character references, because there are no such collisions.
If you happened to use shorter SHA (say, 6-character references), however, there are collisions in English.
For an Italian project, there are collisions even with 7-character references.
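To check a language yourself, you can filter a word list for entries that such a regular expression would mistake for commit references. The sketch below is only an illustration. hex_words is a hypothetical name, and the function assumes one word per line on stdin and the same "lowercase hex, up to 40 characters" pattern as above.

```sh
# Hypothetical helper: read a word list (one word per line) on stdin and
# print, sorted and deduplicated, the words that consist only of
# hexadecimal characters and are at least MIN_LENGTH characters long.
# Words are lowercased first, since a capitalized dictionary entry can
# still appear in lowercase in a commit message.
# Usage: hex_words MIN_LENGTH < word-list
hex_words() {
    LC_ALL=C tr 'A-Z' 'a-z' |
        LC_ALL=C grep -E "^[0-9a-f]{$1,40}\$" |
        LC_ALL=C sort -u
}
```

For example, hex_words 6 < /usr/share/dict/words lists the candidates at 6 characters. This assumes a word list is installed at that path, which is common but not universal. Any word printed is only a potential collision: it causes trouble only if it also resolves to an existing commit.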
Last words
Please also note that git filter-branch has been deprecated since Git v2.24.0, and that filter-repo should be preferred.
If you manage to adapt the solution described in this post to this tool, feel free to post a comment below!
As it turns out, filter-repo supports this feature by default!
So it should definitely be preferred over git filter-branch, but sometimes only legacy tools are available…
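For comparison, here is a hedged sketch of the same identity migration with filter-repo. migrate_identity is a hypothetical name. The sketch assumes git filter-repo is installed, since it is a separate project that does not ship with Git, and that you run it in a fresh clone.

```sh
# Hypothetical helper: perform the identity migration with git filter-repo.
# The mailmap line format is "Proper Name <proper@email> <commit@email>"
# (see gitmailmap(5)). By default, filter-repo also rewrites the commit ids
# quoted in commit messages (--preserve-commit-hashes turns that off).
# Usage: migrate_identity PATH_TO_TEMPORARY_MAILMAP   (in a fresh clone)
migrate_identity() {
    printf '%s\n' 'John Doe <john@example.net> <root@localhost>' > "$1" ||
        return 1
    git filter-repo --mailmap "$1"
}
```

Here the commit message references need no extra filter, which is exactly what makes filter-repo the better choice when it is available.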
Many thanks to the co-author of this script, who will recognize himself.