How to find/identify large commits in git history?

小编典典


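I have a 300 MB git repository. The files I currently have checked out total 2 MB, and the rest of the git repo totals 298 MB. This is basically a code-only repo that should not be more than a few MB.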

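I suspect someone accidentally committed some large files (videos, images, etc.) and then removed them... but not from git, so the history still contains useless large files. How can I find the large files in the git history? There are 400+ commits, so going through them one by one is not practical.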

NOTE: my question is not about how to remove the file, but about how to find it in the first place.


2022-03-11

1 answer

小编典典

In the past, I have found this script very useful for finding large (and non-obvious) objects in a git repository:

http://stubbisms.wordpress.com/2009/07/10/git-script-to-show-largest-pack-objects-and-trim-your-waist-line/

#!/bin/bash
#set -x

# Shows you the largest objects in your repo's pack file.
# Written for osx.
#
# @see https://stubbisms.wordpress.com/2009/07/10/git-script-to-show-largest-pack-objects-and-trim-your-waist-line/
# @author Antony Stubbs

# set the internal field separator to line break, so that we can iterate easily over the verify-pack output
IFS=$'\n'

# list all objects including their size, sort by size, take top 10
# (only lines starting with an object hash are kept, which drops the "chain length" and "non delta" summary lines)
objects=$(git verify-pack -v .git/objects/pack/pack-*.idx | grep -E '^[0-9a-f]{40,} ' | sort -k3nr | head -n 10)

echo "All sizes are in kB. The pack column is the size of the object, compressed, inside the pack file."

output="size,pack,SHA,location"
allObjects=$(git rev-list --all --objects)
for y in $objects
do
    # extract the size in bytes and convert it to kB (awk splits on runs of spaces, so column padding does not matter)
    size=$(( $(echo "$y" | awk '{print $3}') / 1024 ))
    # extract the compressed size in bytes and convert it to kB
    compressedSize=$(( $(echo "$y" | awk '{print $4}') / 1024 ))
    # extract the SHA
    sha=$(echo "$y" | awk '{print $1}')
    # find the object's location in the repository tree (empty if the object is unreachable)
    other=$(echo "${allObjects}" | grep "$sha")
    output="${output}\n${size},${compressedSize},${other}"
done

echo -e "$output" | column -t -s ', '
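
If the repository has not been packed yet, or you would rather not parse `verify-pack` output, a shorter pipeline built on `git rev-list` and `git cat-file --batch-check` can produce a similar list. This is only a sketch and not part of the original script: the function name `list_largest_blobs` is made up for illustration, and the sizes it prints are uncompressed sizes in bytes rather than sizes inside the pack file.

```shell
#!/bin/bash

# list_largest_blobs [N] (hypothetical helper name)
# Prints the N largest blobs reachable from any ref as "SHA size-in-bytes path".
# N defaults to 10. Run it from inside the repository you want to inspect.
list_largest_blobs() {
    local count="${1:-10}"
    # rev-list emits "SHA path" for every reachable object;
    # cat-file adds the type and size, and %(rest) carries the path through.
    git rev-list --objects --all |
        git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
        sed -n 's/^blob //p' |   # keep blobs only and drop the type column
        sort -k2,2nr |           # sort by size, largest first
        head -n "$count"
}

# Example: list_largest_blobs 20
```

Because it walks every ref, this also lists files that were deleted from the working tree but still live in the history, which is the case described in the question.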

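This will give you the object name (SHA-1 hash) of each blob, which you can then feed to a script like the one in:

Which commit has this blob?

…to find the commits that point to each of those blobs.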
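
As a rough illustration of that second step, and not the script from the linked question, newer versions of Git can do the lookup directly with the `--find-object` option of `git log`. The sketch below assumes Git 2.16 or later; the function name `commits_touching_blob` is hypothetical.

```shell
#!/bin/bash

# commits_touching_blob <blob-sha> (hypothetical helper name)
# Lists the commits on any ref whose diff adds or removes the given blob,
# one per line, as "short-hash date subject".
# Assumes Git 2.16 or newer, which introduced --find-object.
commits_touching_blob() {
    local blob="$1"
    git log --all --find-object="$blob" --format='%h %ad %s' --date=short
}

# Example: commits_touching_blob <SHA taken from the size listing>
```

The output normally includes both the commit that introduced the file and the one that deleted it, which is usually enough to track down where a large file came from.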

2022-03-11