Setting up and using Git Large File Storage (LFS)

My personal website includes a lot of extra files which I make available for various reasons. Historically I have committed these files directly, but the repo has grown to about 7 GB, which is quite large for a site with such a small scope.

I use Netlify to build and host my main site, and the build has grown to over 10 minutes because it has to download the whole repo every time it runs. I’m also looking to consolidate my other services (discussed here) into this repo, and I’m pretty sure that GitHub Actions is not going to be happy about the size of the repo either.

For that reason, I’m looking to introduce LFS to the repo so that I can store these binary files in an object store and hopefully gain a bunch of extra performance from version control.

Installing LFS

The first step is to install git LFS on my local machine. I don’t use brew (a subject for another post) but installing manually is a simple case of downloading the tar file, uncompressing it and then running the install.sh script.

Once LFS is installed globally on the git side of things, it is a pretty simple case of moving to my site repo directory and running git lfs install. This registers the LFS filters in the git config and installs the LFS hooks in the repo. Pretty easy so far.
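
A quick way to confirm the install worked before touching the repo, as a small sketch. It only assumes that git finds the lfs subcommand through a git-lfs executable on the PATH; lfs_status is just a name picked for this example.

```shell
# Report whether the git-lfs binary is reachable from this shell.
if command -v git-lfs >/dev/null 2>&1; then
    lfs_status="installed"
    git lfs version          # prints the installed git-lfs version
else
    lfs_status="missing"     # install.sh probably needs another run
fi
echo "git-lfs is $lfs_status"
```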

Migrating the repo to LFS

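The main part of using LFS is telling git which files you want to track as LFS files. I have two main directories that I want tracked: static/audio and static/games. Telling git to track these directories is as simple as running git lfs track "static/audio/**/*" and git lfs track "static/games/**/*". The quotes matter, as they stop the shell from expanding the pattern before git sees it. Each call records its pattern in .gitattributes, which needs to be committed along with everything else.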
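
One detail worth checking when passing patterns like these on the command line is quoting. The sketch below uses a throwaway directory (the album folder and the mp3 names are made up for the demo) to show what the shell hands to a command with and without quotes around the pattern.

```shell
# Throwaway directory with a couple of fake audio files.
demo=$(mktemp -d)
mkdir -p "$demo/static/audio/album"
touch "$demo/static/audio/album/a.mp3" "$demo/static/audio/album/b.mp3"
cd "$demo"

# Unquoted: the shell expands the glob, so a command receives file paths.
set -- static/audio/**/*
unquoted_count=$#

# Quoted: the pattern is passed through as one literal argument.
set -- "static/audio/**/*"
quoted_count=$#
quoted_arg=$1

echo "unquoted: $unquoted_count arguments"
echo "quoted:   $quoted_count argument ($quoted_arg)"
```

With the quoted form, git lfs track receives the pattern itself and writes it to .gitattributes as a line of the form static/audio/**/* filter=lfs diff=lfs merge=lfs -text, so files added later are covered too.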

At this point, all new changes to files in these directories will be versioned as small pointer files, with the content itself kept in object storage. There is still the issue that the repo is as large as before, because these files were previously committed as normal git objects and still sit in the history. We want to rewrite every commit that references these files as normal git files, and luckily LFS comes with a solid migration command to help with this process.

Before moving forward with this, be aware that this command will rewrite the history of every branch that includes any of the files you’re migrating to LFS. For example, if you merged a binary file into your master branch two years ago that you now want to migrate to LFS, your current master and the new master will diverge at that point two years ago, because the migrate task changes that commit and hence the hashes of every commit after it.

For me, being the only user of my personal repo, this isn’t an issue, but is definitely something to keep in mind in other contexts. This is also a good reason to know about LFS even if you currently don’t have a good reason to use it; you don’t want the hassle of retrofitting LFS on a larger scale codebase with many users and an involved CI system.
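
Because the rewrite is destructive, it can be worth keeping a copy of the old history around first. This is a minimal sketch of one way to do that with a mirror clone. It is not a step from my own migration, and the repo here is an empty stand-in created in a temp directory (the names site and site-backup.git are placeholders).

```shell
# Stand-in for the real site repo, with a single empty commit.
work=$(mktemp -d)
git init -q "$work/site"
git -C "$work/site" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial commit"

# A mirror clone copies every branch and tag, so the pre-migration
# hashes stay recoverable even after the original is rewritten.
git clone -q --mirror "$work/site" "$work/site-backup.git"

original_head=$(git -C "$work/site" rev-parse HEAD)
backup_head=$(git -C "$work/site-backup.git" rev-parse HEAD)
echo "site:   $original_head"
echo "backup: $backup_head"
```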

So at this point, I’ve told git which files I want to track with LFS, but migrating them takes a little more work. The following command gives me output as to what files migrate will actually transform into LFS and the space savings that will occur.

git lfs migrate info --everything --include="static/audio/**/*,static/games/**/*" --top=100

Breaking this down, specifying info tells the migrate command that I just want to see what would happen. The --everything flag tells it to consider all branches, not just a single branch like master. This can be important as you’ll likely want to rewrite every commit with these LFS files in it. --include is a comma-separated list of patterns for the directories and files that you want to migrate, and --top tells the migrate command to return the top 100 file types. 100 is likely more than you’ll need (a lot more), but the default is 5, which can leave out important information when you’re making such a large change to the repo.

After checking that the info command returns what is expected, moving forward is just a case of replacing the info subcommand with import and removing --top.

git lfs migrate import --everything --include="static/audio/**/*,static/games/**/*"

The migrate process now runs through all history, rewriting every commit that contains the matched files so that it stores LFS pointers in place of the binary files themselves. The process can take some time, as there can be quite a bit to do, such as rewriting commits and copying files into the .git/lfs/objects/ directory.
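
To make the pointer idea concrete, the sketch below builds by hand the text that an LFS pointer contains for a small sample file. It is for illustration only, since git-lfs generates these itself. It assumes sha256sum is available (on macOS, shasum -a 256 is the usual equivalent), and the sample file content is made up.

```shell
# A small sample file standing in for a large binary.
sample=$(mktemp)
printf 'hello lfs\n' > "$sample"

# A pointer records the SHA-256 of the content and its size in bytes.
oid=$(sha256sum "$sample" | cut -d' ' -f1)
size=$(wc -c < "$sample" | tr -d ' ')

# Three lines: spec version, object id, size.
pointer=$(printf 'version https://git-lfs.github.com/spec/v1\noid sha256:%s\nsize %s\n' "$oid" "$size")
echo "$pointer"
```

In the migrated repo itself, git lfs ls-files lists the paths that are now stored as pointers, and git cat-file -p HEAD:path/to/file (with a real path substituted) prints the raw pointer as git stores it.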

After this, running git push --force will push the new history to your remote and start uploading all of the LFS files. This takes a fair amount of time for a few GB of data, as GitHub’s upload speeds are typically pretty slow and sit around 1 to 2 MB/s.
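
On a repo with other contributors, a slightly safer variant is git push --force-with-lease, which only overwrites the remote branch if it is still where you last saw it. The sketch below swaps that in for plain --force and plays it out against a local bare repo standing in for the remote. The amended commit is a stand-in for the migrate rewrite, and all paths and names are placeholders.

```shell
# Local bare repo standing in for the hosted remote.
work=$(mktemp -d)
git init -q --bare "$work/remote.git"
git init -q "$work/site"
cd "$work/site"
git remote add origin "$work/remote.git"

git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "old history"
git push -q origin HEAD
old_head=$(git rev-parse HEAD)

# Stand-in for the history rewrite: same commit, new hash.
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --amend --allow-empty -m "rewritten history"

# Overwrite the remote branch, but only if nobody pushed in between.
git push -q --force-with-lease origin HEAD

branch=$(git symbolic-ref --short HEAD)
local_head=$(git rev-parse HEAD)
remote_head=$(git ls-remote origin "refs/heads/$branch" | cut -f1)
echo "local:  $local_head"
echo "remote: $remote_head"
```

Every rewritten branch needs to be pushed this way, not just the one currently checked out.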

In my case, the first attempt at pushing didn’t work. The process ended without an error, but I didn’t see the updated history on GitHub. I pushed a second time, and after re-uploading all LFS files, the repo appeared to be updated properly. I’d be interested to hear whether this is an isolated event or others have had a similar issue.

Using a different object store (Netlify) for LFS files

Netlify encourages you to use its own Large Media service for larger files. One of the benefits of this system is that, for LFS files, Netlify puts an on-demand image processor in front of the LFS store, allowing you to do transformations on the fly. A pretty neat, free service if this is something that you’re after.

For me though, my site does transformations at compile time so there isn’t any benefit, but I figure that as everything else is hosted on Netlify, it makes sense to move the LFS store there as well.

Initially it’s a simple case of installing netlify-cli via npm install netlify-cli in the project directory. The documentation suggests installing globally but I’m not a fan of installing things in a global fashion. This is only going to be used in my personal blog so setting up the project in a way that means I don’t need to install dependencies when I start using another machine is a good habit to get into.

Once installed, node_modules/.bin/netlify login redirects me to Netlify to authorize my current machine. Smooth and easy process so far!

I then need to link the repo via node_modules/.bin/netlify link, which links the current project with the project on Netlify. This is a pretty easy setup as it can use the git remote as the lookup.

The Netlify CLI needs the Large Media plugin for the next steps of the process, and this is installed via node_modules/.bin/netlify plugins:install netlify-lm-plugin.

After this, running node_modules/.bin/netlify lm:setup will set up the Netlify authentication for the LFS server and add .lfsconfig to the root of the repo, so that git knows there is a different LFS store to use other than the default one for the remote.

This file looks like the following:

[lfs]
	url = https://id-xxxxxxxxxxxxxxx.netlify.com/.netlify/large-media
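
Since .lfsconfig uses the same format as any other git config file, git itself can read the value back, which is a handy way to check for typos. A small sketch, run in a temp directory with the same placeholder URL as above:

```shell
# Recreate the file in a scratch directory (placeholder URL).
demo=$(mktemp -d)
cd "$demo"
printf '[lfs]\n\turl = https://id-xxxxxxxxxxxxxxx.netlify.com/.netlify/large-media\n' > .lfsconfig

# git config can read any file in this format via -f.
lfs_url=$(git config -f .lfsconfig lfs.url)
echo "$lfs_url"
```

In the real repo, git lfs env prints the endpoint that git-lfs has actually resolved, which should match this URL once the file is in place.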

Running git push --force uploads all LFS files again, this time to Netlify’s LFS storage. This process mirrors the speeds you get with GitHub.

After this process ends, I visit the Large Media section in Netlify for the project and am greeted with a message that there are ‘No assets found’. That’s not really expected. I do a bit of searching around what the issue might be, and it turns out that with LFS, Netlify’s servers can take a considerable amount of time to catch up. That’s fair given I just uploaded several GB of data.

After a few minutes the files appear but with a state of ‘Ready for Upload’ or similar. I give it even more time and they eventually appear.

Pushing a new commit to the repo fires off a new build. This build completes a lot quicker at about 3 minutes compared to 10+ minutes previously, and checking the site, everything seems to work well.

Wrap up

Overall, LFS helps get the size of your repo down, but it still isn’t at a level that really solves the problem of working locally with a repo that contains a lot of large files.

As the idea of monorepos gains more and more popularity, the concept of having many large version-controlled files makes more sense. I started looking into this as I want to add a considerable amount of extra data to my repo without it becoming a burden on my local machine. One of the limitations of LFS for me at the moment is that there is no easy way to change LFS files back to pointers locally, reclaiming the space from files that I may not need on my local machine.

This was something considered for version 2.5 of git-lfs, but still hasn’t been addressed.

Overall though, the performance gains in different contexts are well worth the effort.
