Monday, December 9, 2013

Splitting and joining compressed folders in Linux

GitHub warns about files larger than 50MB (and rejects them outright above 100MB), and I had a repository with an archive of compressed files that exceeded that. So I thought it might be nice if I could split the archive up into segments and then rejoin and decompress them when they were needed. But how to do this efficiently?

There is plenty of advice available on the Web, and initially I was attracted to tar, which has a means of creating archive sections, originally used to create files for separate tapes. But it doesn't compress and split at the same time, and you need to write a clumsy script to rename each of the sections. So I went back to the method everyone said was bad: split. All you have to do is create a tar archive in the usual way:

tar czf archive.tar.gz myfolder

Now split it into numbered sections:

split -a 1 -n 5 -d archive.tar.gz archive.tar.gz.

And what you end up with is a set of 5 more or less equal-sized files named archive.tar.gz.0, archive.tar.gz.1, etc. (Stack Overflow suggests using a pipe, but then the segments have to be of a fixed size, since the stream length can't be measured.) Note that the -n option belongs to GNU split, so this assumes GNU coreutils. If you need more than 10 segments (-n option), increase the -a option, which specifies the length of the suffix. To sew them back up again all you need is:

cat archive.tar.gz.* > archive.tar.gz

The reason this works is that the shell expands the wildcard in sorted order, and because the numeric suffixes all have the same length, sorted order is the same as segment order. So the pieces are joined up correctly. Otherwise you'd have to specify each file. Now to decompress:

tar xzf archive.tar.gz
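
If you want to convince yourself that nothing gets lost along the way, a round trip in a scratch directory is a cheap check. The following is only a sketch: it assumes GNU split, and the sample folder, the file names inside it, and the name rejoined.tar.gz are made up for the test. It rejoins the segments under a different name so the original archive is still there to compare against.

```shell
#!/bin/sh
# Round-trip check: archive, split, rejoin, compare, extract.
# All file and directory names below are illustrative.
set -eu

work=$(mktemp -d)
cd "$work"

# Build a small sample folder to archive.
mkdir myfolder
head -c 200000 /dev/urandom > myfolder/data.bin
echo "hello" > myfolder/readme.txt

# Same commands as in the post.
tar czf archive.tar.gz myfolder
split -a 1 -n 5 -d archive.tar.gz archive.tar.gz.

# Rejoin under a new name so the original can be compared with the result.
cat archive.tar.gz.* > rejoined.tar.gz

# cmp exits non-zero if the files differ, and set -e then aborts the script.
cmp archive.tar.gz rejoined.tar.gz

# Extract into a separate directory and compare the two trees.
mkdir restored
tar xzf rejoined.tar.gz -C restored
diff -r myfolder restored/myfolder

echo "round trip OK in $work"
```

The scratch directory is left in place so you can look at the segments; delete it when you are done.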

The only drawback with this method is that you need to know the number of segments in advance. This is not a problem for me, as I can put it into a script and adjust it when needed, which will be rarely, if ever.
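
If adjusting the number by hand becomes a nuisance, the script can work it out from the size of the archive. Here is a minimal sketch of that idea, not something I have needed in practice. The 45MB ceiling and the helper names segments_needed and suffix_width are my own assumptions, and it again relies on GNU split for the -n option.

```shell
#!/bin/sh
# Derive the segment count and suffix length from the archive size.
# The limit and the function names are assumptions for illustration.
set -eu

# Number of segments needed so that none exceeds the limit.
# $1 = archive size in bytes, $2 = limit in bytes (ceiling division).
segments_needed() {
    n=$(( ($1 + $2 - 1) / $2 ))
    [ "$n" -ge 1 ] || n=1      # an empty file still gets one segment
    echo "$n"
}

# Digits needed for the largest suffix, which is n-1 because numbering starts at 0.
suffix_width() {
    last=$(( $1 - 1 ))
    echo "${#last}"
}

archive=${1:-archive.tar.gz}
limit=$((45 * 1024 * 1024))    # assumed ceiling, a little under 50MB

if [ -f "$archive" ]; then
    size=$(wc -c < "$archive")
    n=$(segments_needed "$size" "$limit")
    width=$(suffix_width "$n")
    split -a "$width" -n "$n" -d "$archive" "$archive."
    echo "split $archive into $n segments"
else
    echo "no such file: $archive" >&2
fi
```

Joining is unchanged: cat archive.tar.gz.* still works, because every suffix has the same length and so the wildcard still sorts the segments correctly.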
