The Gitential Guide on How to Reduce the Size of Your Git Repository
By GITENTIAL TEAM
Unfortunately, there is no easy way to reduce the size of your git repositories. However, there are quite a few ways to keep them at an efficient and easily manageable size. Many of them are simple and can be set up quickly, but they probably won’t help if you are already exceeding your repository cap on Bitbucket, GitHub, GitLab, or Azure DevOps. For that scenario, we can help you identify what to target when it becomes necessary to deal with a large repository.
The Impact of Excessively Large Git Repositories
As your git repository expands, the time developers wait on it to complete processes also increases. While there’ll always be some wait time involved, the cost difference between a well-managed and poorly managed git repository can be significant. Left unchecked, the size of your git repository can result in:
- Git activities slowing down and costing developer time.
- More storage space being used, possibly increasing service costs.
- Storage limits with git repository managers being exceeded, which can bring development to a halt until the repository’s size is reduced.
Measuring the Cost of Inefficiency
Time is a precious commodity if only because it’s charged by the hour. This provides an excuse to get geeky about the cost of processes.
What’s the cost of 1 minute per developer per day over a year? We’re glad you asked! The formula looks like this: ((# of Developers * 261 Work Days * # of Minutes per Day) / 60) * Average Hourly Wage = Cost
Applying the average US software developer wage of $107k per year, or $51.25 per hour, 1 minute per day for one developer adds up to about 4.35 hours per year, or roughly $225 in base wages. Fully loaded, the annual cost of 1 minute per day swells to roughly $300. That’s trivial, yes, but wait!
While applicable to any process, we’re focusing on how to decrease the size of your git repository and the cost-benefit of doing so. If 10 developers are forced to wait 10 additional unnecessary minutes daily on their interactions with git, the cost swells to 435 hours per year, or roughly $30k fully loaded. What then for 100 developers?
The interest here is not to obsess over how every minute is spent, but to be cognizant of how incremental inefficiency compounds over time. Some may focus on the dollar cost, but the real economy here concerns the 2.7 months of possible extra development time for this scenario.
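For readers who want to plug in their own numbers, here is a minimal Python sketch of the formula above. The function names and the `loaded_multiplier` and `hours_per_month` parameters are our own assumptions for illustration. The 1.33 multiplier is simply inferred from the $225-to-$300 jump quoted earlier, and 160 working hours per month is an assumed figure for converting hours into months.

```python
WORK_DAYS_PER_YEAR = 261  # work days per year, as used in the formula above


def annual_wait_hours(developers, minutes_per_day):
    """Hours per year that a team spends waiting."""
    return developers * WORK_DAYS_PER_YEAR * minutes_per_day / 60


def annual_wait_cost(developers, minutes_per_day, hourly_wage, loaded_multiplier=1.0):
    """Yearly cost of that waiting; loaded_multiplier is an assumed overhead factor."""
    hours = annual_wait_hours(developers, minutes_per_day)
    return hours * hourly_wage * loaded_multiplier


def months_lost(hours, hours_per_month=160):
    """Convert hours into working months (160 hours per month is an assumption)."""
    return hours / hours_per_month


if __name__ == "__main__":
    for devs, minutes in [(1, 1), (10, 10), (100, 10)]:
        hours = annual_wait_hours(devs, minutes)
        base = annual_wait_cost(devs, minutes, 51.25)
        loaded = annual_wait_cost(devs, minutes, 51.25, loaded_multiplier=1.33)
        print(f"{devs} devs x {minutes} min/day: {hours:.2f} h, "
              f"${base:,.0f} base, ${loaded:,.0f} loaded, "
              f"{months_lost(hours):.1f} months")
```

The last row of the loop covers the 100-developer question under the same assumptions.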
Maximum Repo Sizes
Git has no limit on repo size, but repository managers typically do cap repository sizes unless you work out special arrangements with them:
- Bitbucket – 2 GB,
- GitHub – 2 GB with provisions for 5 GB,
- GitLab and Azure DevOps – 10 GB.
These services warn you as you get close to their caps, and they also allow account managers to set their own repo size limit to avoid the disruption of hitting the hard cap. Despite the caps, nearly everyone recommends limiting git repositories to 1 GB and keeping individual files under 100 MB.
There are projects that radically exceed these parameters, but they are more the exception than the rule. In 2017, Microsoft introduced its Git Virtual File System for working with the world’s largest git repository, exceeding 300 GB. As noted in the linked article, “A clone from North Carolina with no proxy server took almost 25 minutes. With a proxy configured and up to date, it took 70 seconds… a 95% improvement.”
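To see where one of your own repositories stands against the 1 GB guideline, one option is to read the output of `git count-objects -v`, which reports loose-object and pack sizes in KiB. The sketch below is only an illustration: the helper names and the default limit are our assumptions, and the number a hosting service reports for the same repository may differ from the local figure.

```python
import subprocess

ONE_GB_IN_KIB = 1024 * 1024  # assumed threshold, mirroring the 1 GB guideline


def parse_count_objects(text):
    """Turn 'key: value' lines from `git count-objects -v` into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        value = value.strip()
        if sep and value.isdigit():  # skip non-numeric lines such as 'alternate: ...'
            stats[key.strip()] = int(value)
    return stats


def object_store_kib(stats):
    """Loose objects ('size') plus packed objects ('size-pack'), both in KiB."""
    return stats.get("size", 0) + stats.get("size-pack", 0)


def measure(repo_path="."):
    """Run git in repo_path and return the parsed statistics."""
    result = subprocess.run(
        ["git", "count-objects", "-v"],
        cwd=repo_path, check=True, capture_output=True, text=True,
    )
    return parse_count_objects(result.stdout)


def over_guideline(stats, limit_kib=ONE_GB_IN_KIB):
    """True when the object store is larger than limit_kib."""
    return object_store_kib(stats) > limit_kib
```

Calling `over_guideline(measure("."))` from inside a working copy returns `True` once the local object store passes the assumed limit.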
Git Repository Size and Preventative Maintenance
- Think about how to logically organize your git repositories from the very beginning.
- Set a repo size limit in your repository management account (GitLab, GitHub, etc.) for advance warning that it’s approaching their hard cap. They’ll send you email notifications, too.
- Split up repositories when you can.
- Use Git LFS. This is the best place to keep your large files if they need to be in your repository.
- Consider using artifact repositories like JFrog Artifactory, Sonatype Nexus, MyGet, or other universal package repository managers to store, version, and deploy artifacts.
- Use .gitignore to specify files, file types (like .mp4, .exe, .jar, etc.), IDE settings, artifacts, and dependencies to exclude from commits.
- Check out this collection of gitignore templates.
- Use Git hooks to check for and prevent a commit if it is too large.
- Use git notes to supplement a commit message without changing the commit.
- Try to use a database instead of giant, frequently modified data files.
- Avoid excessively long file/path names.
- Reinforce good repository management practices as part of your coding standards and during code reviews.
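As an illustration of the Git hooks idea in the list above, here is a minimal sketch of a pre-commit hook written in Python. It is one possible approach, not a recommended standard: the 100 MB limit simply mirrors the per-file guideline mentioned earlier, the function names are our own, and the choice to skip the check (rather than block the commit) when git cannot be queried is an assumption you may want to reverse.

```python
#!/usr/bin/env python3
"""Pre-commit hook sketch: reject commits that stage files above a size limit."""
import subprocess
import sys

MAX_BYTES = 100 * 1024 * 1024  # assumed limit, mirroring the 100 MB guideline


def staged_paths():
    """Paths that are added, copied, or modified in the index."""
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM", "-z"],
        check=True, capture_output=True, text=True,
    ).stdout
    return [path for path in out.split("\0") if path]


def staged_size(path):
    """Size in bytes of the staged blob (':path' names the index entry)."""
    out = subprocess.run(
        ["git", "cat-file", "-s", ":" + path],
        check=True, capture_output=True, text=True,
    ).stdout
    return int(out.strip())


def oversized(sizes, limit=MAX_BYTES):
    """Given {path: size_in_bytes}, return the sorted paths larger than limit."""
    return sorted(path for path, size in sizes.items() if size > limit)


def main():
    try:
        sizes = {path: staged_size(path) for path in staged_paths()}
    except (OSError, subprocess.CalledProcessError) as err:
        # Assumption: fail open if git cannot be queried, rather than block work.
        print(f"size check skipped: {err}", file=sys.stderr)
        return 0
    too_big = oversized(sizes)
    for path in too_big:
        print(f"{path}: {sizes[path]} bytes exceeds {MAX_BYTES}", file=sys.stderr)
    return 1 if too_big else 0


if __name__ == "__main__":
    status = main()
    if status:  # a non-zero exit status makes git abort the commit
        sys.exit(status)
```

To try it, save the script as `.git/hooks/pre-commit` in a repository and make it executable. Hooks in that directory are local to each clone, so a team would still need its own way of distributing the script.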
Back Up Your Git Repository
How to Reduce the Size of Your Git Repository
- split repository – Changing git history can be risky. Even if the repository becomes big, it is better to start a new one and keep the old one for the changelog. Rewriting git history in place is a possibility, but it also affects our analysis. That’s why a better solution is to create new, smaller repositories as clones of the original and use filter-branch on those clones to achieve the “split”.
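The clone-then-filter approach described above can be sketched in Python as follows. This is a hedged illustration, not a prescribed procedure: the function names and example paths are hypothetical, and `--subdirectory-filter` is just one way of using filter-branch to carve out part of a repository. Git’s own documentation cautions about filter-branch and suggests git filter-repo as an alternative; we use filter-branch here only because it is the tool named above.

```python
import subprocess


def split_commands(source_repo, new_repo, subdirectory):
    """Build the git commands that clone source_repo and keep only subdirectory."""
    return [
        # Work on a fresh clone so the original repository stays untouched.
        ["git", "clone", source_repo, new_repo],
        # Rewrite the clone's history so that subdirectory becomes its root.
        ["git", "-C", new_repo, "filter-branch",
         "--subdirectory-filter", subdirectory, "--", "--all"],
    ]


def split_repository(source_repo, new_repo, subdirectory, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for command in split_commands(source_repo, new_repo, subdirectory):
        print(" ".join(command))
        if not dry_run:
            subprocess.run(command, check=True)
```

With the default `dry_run=True`, a call such as `split_repository("path/to/original", "docs-only", "docs")` only prints the commands, which makes it easy to review them before anything is rewritten.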
Some additional ways to reduce the size of your git repository include:
- git repack – combines all objects that aren’t already in a pack into one, or combines existing packs into a single, more efficient pack.
- git prune-packed – searches the $GIT_OBJECT_DIRECTORY and removes loose objects that already exist in a pack file.
- git reflog – its expire subcommand lets you manually remove old reflog entries based on a time period that you select.
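The three commands above can be chained in a small Python sketch like the one below. The function names, the 90-day default, and the particular flags (`-a -d` for repack, `--all` for reflog expire) are our assumptions about a reasonable starting point, not settings prescribed by Git or by this guide. Expiring reflog entries can make old commits unrecoverable, so the sketch defaults to a dry run.

```python
import subprocess


def cleanup_commands(reflog_expire="90.days.ago"):
    """Build the maintenance commands in the order they would be run."""
    return [
        # Drop reflog entries older than the chosen cutoff, for all refs.
        ["git", "reflog", "expire", f"--expire={reflog_expire}", "--all"],
        # Pack everything into a single pack and delete the redundant old packs.
        ["git", "repack", "-a", "-d"],
        # Remove loose objects that already live in a pack.
        ["git", "prune-packed"],
    ]


def run_cleanup(repo_path=".", reflog_expire="90.days.ago", dry_run=True):
    """Print each command; execute it in repo_path only when dry_run is False."""
    for command in cleanup_commands(reflog_expire):
        print(" ".join(command))
        if not dry_run:
            subprocess.run(command, cwd=repo_path, check=True)
```

Since `git repack -d` already runs git prune-packed on its own, the last step is usually redundant; it is kept here only to mirror the list above.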
Delete unneeded branches and tags or move them to a separate fork in your repository that other developers won’t fetch from.
After you’ve deleted your unwanted files, branches, and tags, and rewritten your repo history, don’t forget to implement the preventative maintenance measures discussed previously.
The Developer Coefficient by Stripe indicates software developer inefficiency may be as high as 31.6%. As we all know, developers are lucky if they can spend half of their work hours actually coding. Now, one minute isn’t much. But the size of your git repository can be forcing your developers to wait longer and longer on their git processes. Over the course of a year, this can result in weeks and months of lost development time. The larger your team, the greater the impact.
In tech companies, we like to talk about how our systems have 99.9+% uptime. Using our initial example, the size of your git repository could be costing your team of 10 developers 2.7 months’ worth of development time. If so, that’s 2.25% right off the top. Again, the objective isn’t to be obsessing over how developers spend their every minute. It’s a matter of being mindful of work processes that can introduce inefficiency.
Article updated: August 17, 2022