Compression

Commands covered in this section: gzip compress tar gunzip uncompress gzcat

Saving Space

However much space you've got, it's still limited, and one day you're going to wish you had more. You should regularly clean up your filespace by removing unwanted files, but there are usually some things you don't use frequently and still want to keep, such as completed assignments or old e-mail messages. These are perfect candidates for compression, a way of reducing the size of files using algorithms that take advantage of patterns in the file. This means that some files compress better than others. Plain text compresses very well, due to the limited character range and the repeated groups of letters found in natural language; text files can typically be reduced to about a third of their original size. Image files, however, often hardly compress at all, as their format (such as gif or jpeg) already uses compression. There is little point trying to compress an already compressed file, as any uneven distributions of characters and repeated sections have already been detected and replaced by a more compact encoding.
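
As a rough illustration of the difference, the sketch below compresses a repetitive text file and a file of random bytes. The file names are hypothetical, the random data only stands in for an already compressed file such as a jpeg, and the sample text is artificially repetitive, so it shrinks far more than real text would.

```shell
# Work in a scratch directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Build a repetitive plain-text file (hypothetical sample data).
i=0
while [ "$i" -lt 2000 ]; do
    echo "the quick brown fox jumps over the lazy dog"
    i=$((i + 1))
done > sample.txt

# Random bytes have no patterns to exploit, much like compressed data.
dd if=/dev/urandom of=random.bin bs=1024 count=100 2>/dev/null

# -c writes to standard output, so the originals are kept for comparison.
gzip -c sample.txt > sample.txt.gz
gzip -c random.bin > random.bin.gz

# Compare the sizes in bytes.
wc -c sample.txt sample.txt.gz random.bin random.bin.gz
```

The text file should come out much smaller, while the random file should stay about the same size or grow slightly.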

You can either compress files individually or collect groups of files or directories together and compress them into a single file. Each file actually takes up a certain minimum amount of space in the filesystem, regardless of its given size in bytes. This is known as the block size, and could be 1K, 4K, or more. Any file smaller than this will still take up this amount of space. Compressing a file that already fits within a single block therefore saves no disk space, but if small files are grouped together and compressed into a single file, they will take up much less space, because the large file will use the blocks more effectively, in addition to being compressed.
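
One way to see this effect is to compare a file's size in bytes with the space allocated to it. The sketch below uses a hypothetical file called tiny.txt; the figure reported by du depends on the filesystem, so no particular value should be expected.

```shell
# Work in a scratch directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir"

# A six-byte file: five letters plus a newline.
printf 'hello\n' > tiny.txt

# Size of the contents in bytes.
wc -c < tiny.txt

# Space allocated on disk, in units of 1K (filesystem dependent).
du -k tiny.txt
```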

Compressing files individually

There are many programs for compressing files, and one of the most popular is gzip. To use it, simply give the name of the file, or files, to compress, for example gzip largefile. This will create a compressed file called largefile.gz and delete the original file. Another popular choice is the old compress utility, which creates files with a .Z extension, although it rarely compresses files as well as gzip.
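
A minimal sketch of this, run in a scratch directory with a generated file standing in for largefile (the contents are made up for the example):

```shell
# Work in a scratch directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Generate a stand-in for largefile (hypothetical contents).
i=0
while [ "$i" -lt 1000 ]; do
    echo "line $i of a rather repetitive file"
    i=$((i + 1))
done > largefile

ls -l largefile

# Compress it: largefile is replaced by largefile.gz.
gzip largefile

ls -l largefile.gz
```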

Compressing files collectively

This is a two-stage process: the files are grouped into one large file, which is then compressed as discussed above. The grouping can be performed using the tar program (for Tape ARchive, although a tape device doesn't have to be used). You need to specify -cf as an option to tar, to Create a File archive (not a tape one). This is followed by the name of the archive you wish to create, usually with a .tar extension. Finally you list the files and directories to be included in the archive. If a directory is given it will be added to the archive, along with its entire contents, including sub-directories. It is often best to put all the files you wish to group together into a directory, so that you need only specify this directory. Also, when you extract the contents of the archive, the directory will be created (if it doesn't exist), so all the extracted files will be contained on their own in a single directory.
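
The sketch below archives a small made-up directory tree and then lists what the archive contains. The -tf options, which list the contents of an archive without extracting it, are not covered in this section; the file names are hypothetical.

```shell
# Work in a scratch directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir"

# A small directory tree to archive (hypothetical contents).
mkdir -p project/docs
echo "some notes" > project/notes.txt
echo "a report" > project/docs/report.txt

# Create a file archive containing the directory and everything in it.
tar -cf project.tar project

# List the archive: every entry is stored under the project directory.
tar -tf project.tar
```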

For example, suppose you have a directory called project which contains everything you wish to compress. You could then do tar -cf project.tar project. It is a good idea to use the directory name as the archive name, after adding .tar, so you can tell the contents of the archive from its name. This archive will be similar in size to the sum of the individual file sizes, as no compression has been performed yet. Therefore the last step is to do gzip project.tar, which will replace the file with project.tar.gz. The result may be considerably smaller than all the individual files. You can perform the whole operation using a pipe, by doing tar -cf - project | gzip > project.tar.gz. Specifying a dash as the archive name causes tar to send the archive to standard output. This is piped to gzip, which reads it as input and writes the compressed data to its own standard output, which is redirected to the required filename.
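
Both routes can be tried side by side, as sketched below. The directory contents are made up, and the piped version writes to project-piped.tar.gz (a name chosen only for this example) so that it does not overwrite the result of the two-step version.

```shell
# Work in a scratch directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir"

# A directory to archive (hypothetical contents).
mkdir project
i=0
while [ "$i" -lt 500 ]; do
    echo "line $i of the project report"
    i=$((i + 1))
done > project/report.txt

# Two steps: group the files, then compress the archive.
tar -cf project.tar project
gzip project.tar            # replaces project.tar with project.tar.gz

# One step: tar writes the archive to standard output, gzip compresses it.
tar -cf - project | gzip > project-piped.tar.gz

ls -l project.tar.gz project-piped.tar.gz
```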

Note that you can also use this technique to create backups of important files and directories. It conveniently groups files together, and compresses them to save space. Also note that sometimes the extension .tgz is used as a short form of .tar.gz.

Uncompressing files

Of course, there's little point compressing files if you don't know how to uncompress them, but this is a simple matter. To uncompress largefile.gz simply do gunzip largefile.gz. This will replace the compressed file with the uncompressed file largefile. Using gunzip is equivalent to gzip -d. As with many UNIX commands you can specify a group of files using wildcards, so gunzip *.gz will uncompress all the files in the current directory that have been compressed using gzip. If you used the compress command, the process is reversed with the aptly named uncompress command.
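
The three forms described above can be tried as follows, using a few throwaway files whose names and contents are made up for the example:

```shell
# Work in a scratch directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Create and compress some small files (hypothetical contents).
echo "first file" > one.txt
echo "second file" > two.txt
echo "third file" > three.txt
echo "fourth file" > four.txt
gzip one.txt two.txt three.txt four.txt
ls

# Uncompress a single file: one.txt.gz is replaced by one.txt.
gunzip one.txt.gz

# The equivalent using gzip itself.
gzip -d two.txt.gz

# Uncompress everything that is left with a wildcard.
gunzip *.gz
ls

cat one.txt
```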

To extract the contents of a tar archive, after uncompressing it if necessary, use tar with options -xf, for eXtract from a File archive. So, for example, tar -xf project.tar would create a directory called project, if it doesn't already exist, and extract all the files and sub-directories into it. Again, a pipe can be used to combine the two operations of uncompressing and extracting by doing gunzip -c project.tar.gz | tar -xf -. The -c option (where available, gzcat can be used in place of gunzip -c) writes the uncompressed contents to standard output, leaving the original file unchanged, and the dash on its own is used in this case to specify standard input as the source of the tar archive. Leaving the compressed file unchanged is often useful: less space is required for the operation, because the uncompressed archive is never written to disk, and if you want to keep the compressed file you would otherwise have to compress it again after performing the two operations separately.
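
A short sketch of the piped extraction, using a made-up project directory that is archived, deleted, and then restored from the compressed archive:

```shell
# Work in a scratch directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Build a compressed archive to extract from (hypothetical contents).
mkdir project
echo "some notes" > project/notes.txt
tar -cf - project | gzip > project.tar.gz

# Remove the original directory so the extraction has to recreate it.
rm -r project

# Uncompress to standard output and extract from standard input.
gunzip -c project.tar.gz | tar -xf -

# The directory is back, and the compressed archive is still there.
ls project
ls -l project.tar.gz
```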



Matt Chapman
Last modified: Sun Aug 10 15:16:22 BST 1997