Archiving and Compressing Files

Archive, compress, unpack, and uncompress files using tar, gzip, and bzip2

CIS126RH | RHEL System Administration 1
Mesa Community College

Archiving and compression are essential daily skills for Linux administrators. You will use them to back up configuration files before making changes, transfer directory trees between systems, and package software for deployment. Understanding the difference between archiving and compression — and knowing which tools to combine — makes these tasks fast and reliable. These skills are tested on the RHCSA exam.

Learning Objectives

  1. Distinguish archiving from compression — Explain the role of tar, gzip, bzip2, and xz and when to use each
  2. Create and inspect tar archives — Use tar to bundle files, list contents, and extract archives
  3. Compress and decompress files — Use gzip and bzip2 directly on individual files
  4. Create and extract compressed archives — Combine tar with a compression flag to produce .tar.gz and .tar.bz2 files in a single command

Archiving vs Compression

These are two separate operations that are often combined but serve different purposes.

Operation | What it does | Tools on RHEL
Archiving | Combines many files and directories into one file, preserving names, permissions, ownership, and directory structure | tar
Compression | Reduces the size of a file by encoding repeated patterns; no files are combined, only one file in and one file out | gzip, bzip2, xz
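
The split between the two operations can be seen by measuring sizes. This is a minimal sketch that works in a scratch directory from mktemp; the demo directory and file names are made up for illustration.

```shell
# Sketch: tar bundles without shrinking; gzip shrinks a single file.
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
mkdir "$workdir/demo"

# Three highly repetitive text files (repetition compresses well).
for n in 1 2 3; do
    awk 'BEGIN { for (i = 0; i < 2000; i++) print "sample log line" }' \
        > "$workdir/demo/file$n.txt"
done

# Archive only: one file out, no size reduction.
tar -cf "$workdir/demo.tar" -C "$workdir" demo

# Compress only: one file in (the tarball), one file out.
gzip -k "$workdir/demo.tar"

content_size=$(cat "$workdir"/demo/*.txt | wc -c)
tar_size=$(wc -c < "$workdir/demo.tar")
gz_size=$(wc -c < "$workdir/demo.tar.gz")

echo "file contents: $content_size bytes"
echo "demo.tar:      $tar_size bytes"
echo "demo.tar.gz:   $gz_size bytes"
```

The tarball comes out slightly larger than the files it holds because tar adds headers and padding; only the gzip step reduces the size.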

Common File Extensions

Extension | Meaning
.tar | tar archive, no compression
.tar.gz or .tgz | tar archive compressed with gzip
.tar.bz2 | tar archive compressed with bzip2
.tar.xz | tar archive compressed with xz
.gz | Single file compressed with gzip, no archive
.bz2 | Single file compressed with bzip2, no archive

tar Fundamentals

tar (short for tape archive) is the standard tool for bundling files into a single archive. The archive file is called a tarball.

Flag | Long form | Meaning
-c | --create | Create a new archive
-x | --extract | Extract files from an archive
-t | --list | List the contents of an archive
-f | --file | Specify the archive filename (always required)
-v | --verbose | Print each filename as it is processed
-z | --gzip | Compress or decompress with gzip
-j | --bzip2 | Compress or decompress with bzip2
-J | --xz | Compress or decompress with xz
-C | --directory | Change to directory before operating
-p | --preserve-permissions | Restore original permissions on extract
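
The short and long forms in the table are interchangeable. The sketch below, which assumes GNU tar and uses a made-up conf directory in a scratch location, builds the same archive both ways and compares the listings.

```shell
# Sketch: short flags and long options produce the same archive contents.
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
mkdir "$workdir/conf"
echo "Port 22" > "$workdir/conf/example.conf"   # placeholder file

# Short form: create, gzip, file.
tar -czf "$workdir/short.tar.gz" -C "$workdir" conf

# Long form of the same command.
tar --create --gzip --file="$workdir/long.tar.gz" --directory="$workdir" conf

short_list=$(tar -tzf "$workdir/short.tar.gz")
long_list=$(tar --list --gzip --file="$workdir/long.tar.gz")

echo "$short_list"
```

Long options are easier to read in scripts; the short flags are quicker to type at the prompt and on the exam.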

Creating tar Archives

Use the -c flag to create a new archive. Always specify the archive filename with -f.

# Archive a single directory
$ tar -cvf backup.tar /etc/ssh
tar: Removing leading `/' from member names
/etc/ssh/
/etc/ssh/sshd_config
/etc/ssh/ssh_config

# Archive multiple directories
$ tar -cvf configs.tar /etc/ssh /etc/httpd

# Archive files in the current directory
$ tar -cvf project.tar .

# Exclude a subdirectory from the archive
$ tar -cvf home.tar --exclude=/home/student/Downloads /home/student
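
Absolute vs relative paths

When you archive an absolute path such as /etc/ssh, tar on some systems stores the leading slash. On RHEL, GNU tar strips the leading slash, stores the member as etc/ssh/..., and prints the warning "Removing leading `/' from member names". This is the safe behaviour: files are extracted relative to the current (or -C) directory, which prevents accidental overwrites of the live files under /.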
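
The path handling described above can be checked with a short experiment. This sketch assumes GNU tar; the etc-demo directory is a made-up stand-in so nothing under the real /etc is touched.

```shell
# Sketch: how member names are stored for absolute vs relative sources.
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
mkdir "$workdir/etc-demo"
echo "demo" > "$workdir/etc-demo/app.conf"

# Absolute source path: the leading slash is stripped; the warning goes to stderr.
tar -cf "$workdir/abs.tar" "$workdir/etc-demo" 2> "$workdir/warning.txt"
abs_first=$(tar -tf "$workdir/abs.tar" | head -n 1)

# -C changes directory first, so only the short relative name is stored.
tar -cf "$workdir/rel.tar" -C "$workdir" etc-demo
rel_first=$(tar -tf "$workdir/rel.tar" | head -n 1)

cat "$workdir/warning.txt"
echo "absolute source stored as: $abs_first"
echo "relative source stored as: $rel_first"
```

Using -C at creation time keeps member names short, which makes it easier to predict where files land on extraction.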

Listing Archive Contents

Use the -t flag to inspect what is inside an archive without extracting it. This is an essential step before extracting an unknown archive.

# List the contents of an uncompressed archive
$ tar -tvf backup.tar
drwxr-xr-x root/root    0 2026-05-25 etc/ssh/
-rw-r--r-- root/root  3905 2026-05-25 etc/ssh/sshd_config
-rw-r--r-- root/root  1770 2026-05-25 etc/ssh/ssh_config

# List the contents of a gzip-compressed archive
$ tar -tzvf backup.tar.gz

# List the contents of a bzip2-compressed archive
$ tar -tjvf backup.tar.bz2

# Let tar detect the compression automatically
$ tar -tvf backup.tar.bz2

Always list before extracting

Listing an archive first shows you where files will land, reveals member names that use absolute paths, and confirms that tar can read the archive from start to end before you commit to extracting it.
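
The pre-extraction check can be scripted. In this sketch, check_archive is a hypothetical helper (not a standard command) that lists the archive and flags member names that are absolute or contain a ".." component; the sample archive is built in a scratch directory.

```shell
# Sketch: inspect member names before extracting an unfamiliar archive.
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
mkdir "$workdir/src"
echo "data" > "$workdir/src/file.txt"
tar -czf "$workdir/sample.tar.gz" -C "$workdir" src

# Hypothetical helper: returns 0 if names look safe, 1 if suspicious,
# 2 if tar cannot read the archive at all.
check_archive() {
    archive=$1
    members=$(tar -tf "$archive") || return 2
    # Flag absolute paths and any ".." path component.
    if printf '%s\n' "$members" | grep -Eq '^/|(^|/)\.\.(/|$)'; then
        return 1
    fi
    return 0
}

if check_archive "$workdir/sample.tar.gz"; then
    result="safe"
else
    result="suspicious"
fi
echo "sample.tar.gz looks $result"
```

A check like this supplements, rather than replaces, reading the listing yourself.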

Extracting tar Archives

Use the -x flag to extract files from an archive.

# Extract into the current directory
$ tar -xvf backup.tar

# Extract to a specific directory with -C
$ tar -xvf backup.tar -C /tmp/restore

# Extract only specific files from an archive
$ tar -xvf backup.tar etc/ssh/sshd_config

# Extract a gzip-compressed archive to a specific directory
$ tar -xzvf backup.tar.gz -C /tmp/restore

# Preserve original permissions on extraction (useful as root)
$ tar -xpvf backup.tar -C /tmp/restore

RHCSA Focus

The exam frequently asks you to extract an archive to a specific directory. The -C flag is essential — the target directory must exist before you run the command.
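
The requirement that the target directory exists is easy to demonstrate. The sketch below assumes a scratch directory and a small stand-in for /etc/ssh; it tries the extraction before and after creating the destination.

```shell
# Sketch: tar -C fails on a missing directory and succeeds after mkdir -p.
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
mkdir -p "$workdir/src/etc/ssh"
echo "PermitRootLogin no" > "$workdir/src/etc/ssh/sshd_config"
tar -cf "$workdir/backup.tar" -C "$workdir/src" etc/ssh

dest="$workdir/restore"

# First attempt: the destination does not exist yet.
if tar -xf "$workdir/backup.tar" -C "$dest" 2>/dev/null; then
    first_try="ok"
else
    first_try="failed"
fi
echo "extract before mkdir: $first_try"

# Create the destination, then extract again.
mkdir -p "$dest"
tar -xf "$workdir/backup.tar" -C "$dest"
ls -R "$dest"
```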

gzip: Compressing Individual Files

gzip compresses a single file and replaces it with a .gz version. The original file is removed by default.

# Compress a file — replaces messages with messages.gz
$ gzip messages
$ ls
messages.gz

# Decompress — replaces messages.gz with messages
$ gunzip messages.gz

# Keep the original file while compressing
$ gzip -k messages
$ ls
messages  messages.gz

# View a compressed file without decompressing it
$ zcat messages.gz
$ zless messages.gz

# Show compression ratio and file size
$ gzip -l messages.gz
         compressed        uncompressed  ratio uncompressed_name
             102400              512000  80.0% messages
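
Before deleting or moving an original, it can be worth confirming that the compressed copy is intact. A minimal sketch, assuming a generated stand-in for the messages file in a scratch directory:

```shell
# Sketch: verify a gzip file and confirm a lossless round trip.
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
awk 'BEGIN { for (i = 0; i < 1000; i++) print "kernel: sample message " i }' \
    > "$workdir/messages"

# Keep the original so the two can be compared.
gzip -k "$workdir/messages"

# -t tests integrity; it exits non-zero if the file is damaged.
gzip -t "$workdir/messages.gz"

# Decompress to stdout and compare byte-for-byte with the original.
if gunzip -c "$workdir/messages.gz" | cmp -s - "$workdir/messages"; then
    roundtrip="identical"
else
    roundtrip="different"
fi
echo "round trip: $roundtrip"
```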

bzip2: Higher Compression

bzip2 usually produces smaller files than gzip but takes longer to compress and decompress. It works the same way: one file in, one .bz2 file out.

# Compress a file
$ bzip2 messages
$ ls
messages.bz2

# Decompress — bunzip2 is an alias for bzip2 -d
$ bunzip2 messages.bz2

# Keep the original file while compressing
$ bzip2 -k messages

# View a bzip2-compressed file without decompressing
$ bzcat messages.bz2
$ bzless messages.bz2

Tool | Speed | Compression ratio | Best for
gzip | Fast | Good | Logs, quick backups, streaming
bzip2 | Slower | Better | Distribution archives, large text files
xz | Slowest | Best | Software packages, maximum space savings
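
The trade-offs in the table depend on the data, so it helps to measure on your own files. This sketch compresses one generated sample log with each tool that is installed and prints the resulting sizes; the sample content is made up, and real logs will give different numbers.

```shell
# Sketch: compare output sizes of gzip, bzip2, and xz on the same input.
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
awk 'BEGIN { for (i = 0; i < 20000; i++)
         print "Jan 01 00:00:00 host service[" i "]: sample entry" }' \
    > "$workdir/sample.log"

orig_size=$(wc -c < "$workdir/sample.log")
echo "original: $orig_size bytes"

for tool in gzip bzip2 xz; do
    # Skip tools that are not installed on this system.
    if ! command -v "$tool" >/dev/null 2>&1; then
        echo "$tool: not installed, skipped"
        continue
    fi
    # -c writes to stdout and leaves the input file untouched.
    "$tool" -c "$workdir/sample.log" > "$workdir/sample.$tool"
    size=$(wc -c < "$workdir/sample.$tool")
    echo "$tool: $size bytes"
done

gzip_size=$(wc -c < "$workdir/sample.gzip")
```

Prefixing each compression command with time gives a rough speed comparison as well.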

xz: Maximum Compression

xz typically achieves the highest compression ratio of the three tools but is the slowest to compress. It has been widely used for RPM package payloads on RHEL.

# Compress a file
$ xz messages
$ ls
messages.xz

# Decompress
$ unxz messages.xz

# Keep the original file
$ xz -k messages

# View without decompressing
$ xzcat messages.xz
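
xz and RHEL packages

RPM payloads on RHEL were compressed with xz for many releases; recent releases have moved to zstd, so check a package with rpm -qp --qf '%{PAYLOADCOMPRESSOR}\n' package.rpm. When you extract an RPM with rpm2cpio package.rpm | cpio -idv, the payload is decompressed behind the scenes. A query such as rpm -qf /path/to/file only reads the RPM database and decompresses nothing.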

RHCSA Note

The RHCSA exam objective names gzip and bzip2 explicitly. Know all three — xz archives appear frequently in the real world and on exam systems.

Creating Compressed Archives

Adding a compression flag to tar -c creates and compresses in one step.

# Create a gzip-compressed archive (.tar.gz)
$ tar -czvf ssh-backup.tar.gz /etc/ssh
tar: Removing leading `/' from member names
/etc/ssh/
/etc/ssh/sshd_config
/etc/ssh/ssh_config

# Create a bzip2-compressed archive (.tar.bz2)
$ tar -cjvf ssh-backup.tar.bz2 /etc/ssh

# Create an xz-compressed archive (.tar.xz)
$ tar -cJvf ssh-backup.tar.xz /etc/ssh

# Archive multiple sources into one compressed file
$ tar -czvf configs.tar.gz /etc/ssh /etc/httpd /etc/firewalld

# Use a datestamp in the filename for rotation
$ tar -czvf backup-$(date +%F).tar.gz /etc

Name your archives clearly

Use the full extension (.tar.gz not just .tgz) and include a date stamp in backup filenames. This makes it immediately obvious what format the file is and when it was created.
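
A dated backup is only useful if it can be read back. The sketch below assumes a scratch directory with a made-up etc-demo tree standing in for /etc; it builds a date-stamped archive and then lists it to confirm tar can read it.

```shell
# Sketch: date-stamped backup followed by a readability check.
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
mkdir -p "$workdir/etc-demo"               # stand-in for /etc
echo "setting=1" > "$workdir/etc-demo/app.conf"

stamp=$(date +%F)                          # ISO date, YYYY-MM-DD
archive="$workdir/backup-$stamp.tar.gz"

tar -czf "$archive" -C "$workdir" etc-demo

# Listing reads the whole archive; a failure means it should not be trusted.
if tar -tzf "$archive" >/dev/null; then
    status="verified"
else
    status="unreadable"
fi
echo "$(basename "$archive"): $status"
```

ISO dates sort correctly with ls, which keeps a directory of rotated backups in chronological order.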

Extracting Compressed Archives

Adding a compression flag to tar -x decompresses and extracts in one step.

# Extract a gzip-compressed archive here
$ tar -xzvf ssh-backup.tar.gz

# Extract a bzip2-compressed archive to a specific directory
$ tar -xjvf ssh-backup.tar.bz2 -C /tmp/restore

# Let tar detect the compression automatically (RHEL / GNU tar)
$ tar -xvf ssh-backup.tar.gz -C /tmp/restore

# List contents before extracting — always a good habit
$ tar -tzvf ssh-backup.tar.gz
$ tar -xzvf ssh-backup.tar.gz -C /tmp/restore

The target directory must exist

tar will not create the directory specified with -C. Create it first: mkdir -p /tmp/restore

RHCSA Tip

On the exam, read the task carefully — it will specify where to extract. Use -C /path/to/destination and confirm the directory exists first.

tar Quick Reference

Task | Command
Create an uncompressed archive | tar -cvf archive.tar /path
Create a gzip-compressed archive | tar -czvf archive.tar.gz /path
Create a bzip2-compressed archive | tar -cjvf archive.tar.bz2 /path
Create an xz-compressed archive | tar -cJvf archive.tar.xz /path
List archive contents | tar -tvf archive.tar
List compressed archive contents | tar -tzvf archive.tar.gz
Extract to current directory | tar -xvf archive.tar
Extract to a specific directory | tar -xvf archive.tar -C /dest
Extract compressed to a directory | tar -xzvf archive.tar.gz -C /dest
Extract one file from an archive | tar -xvf archive.tar etc/ssh/sshd_config

Memory aid

The three operations are create, extract, and list. Always add f with the filename. Add v to see progress. Add z, j, or J for compression.

Working with Compressed Log Files

Linux log rotation compresses old log files automatically. These utilities let you read compressed logs without decompressing them first.

Compressed format | View with cat | Page through with less | Search with grep
.gz | zcat | zless | zgrep
.bz2 | bzcat | bzless | bzgrep
.xz | xzcat | xzless | xzgrep

# Search for errors across all rotated log files
$ zgrep -i error /var/log/messages-*.gz

# Page through a compressed log
$ zless /var/log/messages-20260518.gz
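
The same search can be tried safely on throwaway files. This sketch assumes zgrep and zcat from the gzip package; the two rotated log names and their contents are made up and live in a scratch directory.

```shell
# Sketch: search gzip-compressed logs without decompressing them on disk.
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
printf 'service started\nERROR: disk full\n' > "$workdir/messages-20260511"
printf 'error: timeout\nservice stopped\n'   > "$workdir/messages-20260518"

# Replace both files with .gz versions, as log rotation would.
gzip "$workdir"/messages-*

# -i ignores case, -c prints a per-file count of matching lines.
zgrep -ic error "$workdir"/messages-*.gz

# Total across all files: decompress to a pipe and count once.
matches=$(zcat "$workdir"/messages-*.gz | grep -ic error)
echo "total matching lines: $matches"
```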

Practical Admin Scenarios

Back Up Config Files Before Editing

$ tar -czvf /root/ssh-before-$(date +%F).tar.gz /etc/ssh

Transfer a Directory Tree to Another Server

# Store paths relative to /var/www so the tree lands in /var/www/html on serverb
$ tar -czvf /tmp/webapp.tar.gz -C /var/www html
$ scp /tmp/webapp.tar.gz student@serverb:/tmp/
$ ssh student@serverb 'tar -xzvf /tmp/webapp.tar.gz -C /var/www'
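
When no intermediate file is wanted, tar can write the archive to standard output and a second tar can read it from standard input. The sketch below shows the pipeline locally between two scratch directories; the html tree is a made-up stand-in, and the SSH variant in the comment is an untested illustration using the host name from this slide.

```shell
# Sketch: copy a directory tree through a tar pipe, no temporary archive.
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
mkdir -p "$workdir/src/html/css" "$workdir/dst"
echo "<h1>demo</h1>" > "$workdir/src/html/index.html"
echo "body {}"       > "$workdir/src/html/css/site.css"

# "-f -" means stdout when creating and stdin when extracting.
tar -czf - -C "$workdir/src" html | tar -xzf - -C "$workdir/dst"

# Over SSH the right-hand side would become something like:
#   tar -czf - -C /var/www html | ssh student@serverb 'tar -xzf - -C /var/www'

# Confirm the copy matches the source.
if diff -r "$workdir/src/html" "$workdir/dst/html" >/dev/null; then
    copy_state="match"
else
    copy_state="mismatch"
fi
echo "trees $copy_state"
```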

Restore a Specific File from a Backup

# List to find the exact path in the archive
$ tar -tzvf ssh-backup.tar.gz
# Extract just that one file
$ tar -xzvf ssh-backup.tar.gz -C /tmp etc/ssh/sshd_config

Compress a Large Log Before Archiving

$ gzip -k /var/log/messages
$ mv /var/log/messages.gz /mnt/archive/

Common Mistakes

Mistake | What goes wrong | Correct approach
Forgetting or misplacing -f | Without -f, tar uses its default device (often standard output) and refuses or fails; with f in the middle of a flag group such as -cfv, tar takes the next characters as the archive name | Always include -f, last in the flag group and directly followed by the archive name
Extracting without -C | Files land in the current directory, possibly overwriting existing files | List first, then extract with -C /destination
Target directory does not exist | tar exits with an error and no files are extracted | Run mkdir -p /destination first
Compressing an already-compressed file | Little or no space is saved and the file can even grow slightly, because compressed data has almost no repeated patterns left | Do not gzip a .tar.gz or .jpg; it will not help
Wrong compression flag for the format | tar reports "not in gzip format" or similar | Match the flag to the extension: -z for .gz, -j for .bz2, -J for .xz
Using gzip on a directory | gzip only compresses single files; it cannot bundle a directory | Use tar -czvf to archive and compress together
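
The "already compressed" row can be demonstrated with random bytes, which behave much like compressed data because they contain no repeated patterns. A minimal sketch in a scratch directory; random.bin is a made-up file name.

```shell
# Sketch: gzip gains nothing on data that has no redundancy left.
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Random bytes stand in for a .jpg or an existing .tar.gz.
head -c 100000 /dev/urandom > "$workdir/random.bin"

gzip -k "$workdir/random.bin"

raw_size=$(wc -c < "$workdir/random.bin")
gz_size=$(wc -c < "$workdir/random.bin.gz")

echo "before gzip: $raw_size bytes"
echo "after gzip:  $gz_size bytes"
```

The output is typically a few bytes larger than the input, since gzip still has to add its own header and trailer.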

Knowledge Check

Answer these before moving to the next slide.

  1. What is the difference between archiving and compression? Which tool on RHEL handles each?
  2. Write the command to create a gzip-compressed archive of /etc/httpd named httpd-backup.tar.gz.
  3. Write the command to list the contents of httpd-backup.tar.gz without extracting it.
  4. Write the command to extract httpd-backup.tar.gz into /tmp/restore. What must you do before running the command?
  5. What is the difference between gzip and bzip2? When would you prefer one over the other?
  6. You want to search for the word "error" in a rotated, gzip-compressed log file without decompressing it. What command do you use?

Knowledge Check — Answers

  1. Archiving combines multiple files into one, preserving directory structure and metadata — handled by tar. Compression reduces the size of a single file — handled by gzip, bzip2, and xz.
  2. tar -czvf httpd-backup.tar.gz /etc/httpd
  3. tar -tzvf httpd-backup.tar.gz
  4. tar -xzvf httpd-backup.tar.gz -C /tmp/restore
    Before running this command, create the destination directory: mkdir -p /tmp/restore
  5. gzip is faster but compresses less. bzip2 is slower but produces smaller files. Prefer gzip for speed-sensitive tasks like log compression and interactive backups. Prefer bzip2 when file size matters more than time, such as distributing large archives.
  6. zgrep -i error /var/log/messages-20260518.gz
    Or to search all rotated copies: zgrep -i error /var/log/messages*.gz

Key Takeaways

  1. Archiving and compression are separate operations. tar bundles files; gzip, bzip2, and xz reduce size. Combine them with tar's -z, -j, and -J flags to create compressed archives in one command.
  2. The three tar operations are create, extract, and list. Always add -f with the archive filename. Add -v to see progress. Add a compression flag to match the archive format.
  3. List before you extract. Use tar -tvf archive to confirm contents and paths before extracting. Use -C /destination to extract to a specific directory — create that directory first with mkdir -p.
  4. Use the right tool for the format. gzip — fast, good compression, .gz. bzip2 — slower, better compression, .bz2. xz — slowest, best compression, .xz. Use zcat, bzcat, and xzcat to read compressed files without decompressing.

Graded Lab

  • Create a gzip-compressed tar archive of /etc/ssh named ssh-backup.tar.gz in /tmp
  • List the contents of ssh-backup.tar.gz without extracting it and note the exact path stored for sshd_config
  • Create /tmp/restore and extract ssh-backup.tar.gz into it — confirm the files are present
  • Extract only the sshd_config file from the archive into /tmp/single
  • Create a bzip2-compressed archive of the same directory and compare file sizes: which compressor produced the smaller archive?
  • Compress a copy of /var/log/messages with gzip -k and use zgrep to search the compressed file for the word "error"

RHCSA Objective

"Archive, compress, unpack, and uncompress files using tar, gzip, and bzip2." — Create and extract compressed archives to and from specific locations.