Why separate tools? The Unix philosophy: each tool does one thing well. tar archives, gzip compresses. Combined, they are powerful and flexible.
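The separation is easy to see when you chain the tools by hand. A minimal sketch (the -z shortcut covered later does the same thing in one step):
# Archive with tar, compress with gzip - two tools, one pipeline
[student@server ~]$ tar -cf - documents/ | gzip > documents.tar.gz
# Reverse the pipeline to list the contents
[student@server ~]$ gunzip -c documents.tar.gz | tar -tf -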
The tar Command
tar (tape archive) is the standard Unix/Linux tool for creating archives. It preserves file metadata and directory structure.
c   Create archive
x   Extract archive
t   List contents
v   Verbose output
f   Filename
# Basic tar syntax
[student@server ~]$ tar [operation][options] -f archive.tar [files...]
# The -f option MUST be followed by the filename
# Operations: c (create), x (extract), t (list) - pick ONE
# Options: v (verbose), z/j/J (compression), p (preserve permissions)
Remember: -f must be immediately followed by the archive filename: tar -cvf archive.tar, not tar -cfv archive.tar.
Creating Archives
# Create archive of a directory
[student@server ~]$ tar -cvf backup.tar documents/
documents/
documents/report.txt
documents/data/
documents/data/file1.csv
documents/data/file2.csv
# Create archive of multiple items
[student@server ~]$ tar -cvf project.tar file1.txt file2.txt mydir/
# Create archive with absolute paths removed (default behavior)
[student@server ~]$ tar -cvf backup.tar /etc/hosts /etc/hostname
tar: Removing leading '/' from member names
etc/hosts
etc/hostname
# Preserve absolute paths (use with caution!)
[student@server ~]$ tar -cvPf backup.tar /etc/hosts
# Verify archive size
[student@server ~]$ ls -lh backup.tar
-rw-r--r--. 1 student student 15M Jan 20 14:00 backup.tar
Leading / removed: By default, tar strips the leading / from paths. This is a safety feature - extraction won't accidentally overwrite system files.
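You can see the safety feature in action by extracting the archive above in a scratch directory; the members land relative to the current directory instead of touching /etc (a sketch):
# Members extract relative to the current directory, not to /
[student@server ~]$ mkdir /tmp/demo && cd /tmp/demo
[student@server demo]$ tar -xvf ~/backup.tar
etc/hosts
etc/hostname
[student@server demo]$ ls etc/
hosts  hostname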
Listing and Extracting
# List archive contents without extracting
[student@server ~]$ tar -tvf backup.tar
drwxr-xr-x student/student 0 2024-01-20 14:00 documents/
-rw-r--r-- student/student 1024 2024-01-20 13:55 documents/report.txt
drwxr-xr-x student/student 0 2024-01-20 14:00 documents/data/
-rw-r--r-- student/student 2048 2024-01-20 13:50 documents/data/file1.csv
# Extract entire archive to current directory
[student@server ~]$ tar -xvf backup.tar
# Extract to a specific directory
[student@server ~]$ tar -xvf backup.tar -C /tmp/restore/
# Extract specific files only
[student@server ~]$ tar -xvf backup.tar documents/report.txt
# Extract files matching a pattern
[student@server ~]$ tar -xvf backup.tar --wildcards "*.csv"
Best Practice: Always use tar -tvf to inspect an archive before extracting. Know what you're about to unpack!
Choose wisely: gzip for speed, xz for size, bzip2 for balance. Results vary by data type - text compresses well, already-compressed files (images, videos) don't.
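The trade-offs are easy to measure on your own data. A quick sketch, using a hypothetical sample.log and keeping the original each time with -k:
# Compress the same file with each tool, then compare sizes
[student@server ~]$ gzip -k sample.log
[student@server ~]$ bzip2 -k sample.log
[student@server ~]$ xz -k sample.log
[student@server ~]$ ls -lh sample.log*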
Using gzip
# Compress a file (replaces original with .gz)
[student@server ~]$ gzip largefile.txt
[student@server ~]$ ls
largefile.txt.gz
# Decompress (replaces .gz with original)
[student@server ~]$ gunzip largefile.txt.gz
# Or: gzip -d largefile.txt.gz
# Keep original file while compressing
[student@server ~]$ gzip -k largefile.txt
[student@server ~]$ ls
largefile.txt  largefile.txt.gz
# Compress to stdout (useful for pipes)
[student@server ~]$ gzip -c largefile.txt > largefile.txt.gz
# View compressed file without decompressing
[student@server ~]$ zcat largefile.txt.gz | head
[student@server ~]$ zless largefile.txt.gz
# Set compression level (1=fast, 9=best compression)
[student@server ~]$ gzip -9 largefile.txt    # Maximum compression
[student@server ~]$ gzip -1 largefile.txt    # Fastest
Note: gzip replaces the original file by default! Use -k to keep the original, or -c to output to stdout.
Using bzip2 and xz
# bzip2 - better compression, slower
[student@server ~]$ bzip2 largefile.txt           # Creates .bz2
[student@server ~]$ bunzip2 largefile.txt.bz2     # Decompress
[student@server ~]$ bzip2 -k largefile.txt        # Keep original
[student@server ~]$ bzcat largefile.txt.bz2       # View without decompressing
# xz - best compression, slowest
[student@server ~]$ xz largefile.txt              # Creates .xz
[student@server ~]$ unxz largefile.txt.xz         # Decompress
[student@server ~]$ xz -k largefile.txt           # Keep original
[student@server ~]$ xzcat largefile.txt.xz        # View without decompressing
# xz compression levels (0-9, default 6)
[student@server ~]$ xz -9 largefile.txt           # Maximum (very slow)
[student@server ~]$ xz -0 largefile.txt           # Fastest
# xz with threads for faster compression
[student@server ~]$ xz -T 4 largefile.txt         # Use 4 CPU threads
[student@server ~]$ xz -T 0 largefile.txt         # Use all available CPUs
Performance tip: xz supports multi-threading with -T. Use -T 0 to automatically use all CPU cores for faster compression.
Compressed tar Archives
# Create gzip-compressed archive
[student@server ~]$ tar -czvf backup.tar.gz /home/student/documents/
# -z tells tar to use gzip compression
# Create bzip2-compressed archive
[student@server ~]$ tar -cjvf backup.tar.bz2 /home/student/documents/
# -j tells tar to use bzip2 compression
# Create xz-compressed archive
[student@server ~]$ tar -cJvf backup.tar.xz /home/student/documents/
# -J (capital J) tells tar to use xz compression
# Extract compressed archives (tar auto-detects compression)
[student@server ~]$ tar -xvf backup.tar.gz      # Works!
[student@server ~]$ tar -xzvf backup.tar.gz     # Explicit gzip
[student@server ~]$ tar -xjvf backup.tar.bz2    # Explicit bzip2
[student@server ~]$ tar -xJvf backup.tar.xz     # Explicit xz
Option   Compression   Extension
-z       gzip          .tar.gz or .tgz
-j       bzip2         .tar.bz2 or .tbz2
-J       xz            .tar.xz or .txz
Common tar Operations
# Preserve permissions (important for system backups)
[root@server ~]# tar -cvpzf backup.tar.gz /etc/
# -p preserves permissions (default when root, explicit is clearer)
# Exclude files/directories
[student@server ~]$ tar -czvf backup.tar.gz --exclude='*.log' /home/student/
[student@server ~]$ tar -czvf backup.tar.gz --exclude='cache' --exclude='tmp' /var/www/
# Exclude from file
[student@server ~]$ cat exclude.txt
*.tmp
*.log
cache/
.git/
[student@server ~]$ tar -czvf backup.tar.gz -X exclude.txt /home/student/
# Update archive with newer files only
[student@server ~]$ tar -uvf backup.tar /home/student/documents/
# Append files to existing archive (uncompressed only)
[student@server ~]$ tar -rvf backup.tar newfile.txt
Note: Update (-u) and append (-r) only work with uncompressed archives. Compressed archives must be recreated entirely.
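One way to add a file to a compressed archive is to peel off the compression layer, append, and recompress; a sketch (newfile.txt is illustrative):
# Decompress the gzip layer, append to the plain tar, recompress
[student@server ~]$ gunzip backup.tar.gz
[student@server ~]$ tar -rvf backup.tar newfile.txt
[student@server ~]$ gzip backup.tar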
File Extensions Reference
Extension          Type                   Create                 Extract
.tar               Uncompressed archive   tar -cvf               tar -xvf
.tar.gz / .tgz     Gzip compressed        tar -czvf              tar -xzvf
.tar.bz2 / .tbz2   Bzip2 compressed       tar -cjvf              tar -xjvf
.tar.xz / .txz     XZ compressed          tar -cJvf              tar -xJvf
.gz                Gzip single file       gzip file              gunzip file.gz
.bz2               Bzip2 single file      bzip2 file             bunzip2 file.bz2
.xz                XZ single file         xz file                unxz file.xz
.zip               Zip archive            zip -r arch.zip dir/   unzip arch.zip
Convention matters: Extensions tell users what type of file it is and how to handle it. Always use appropriate extensions.
The zip Command
zip creates archives compatible with Windows and other operating systems. It combines archiving and compression in one format.
# Create zip archive of files
[student@server ~]$ zip archive.zip file1.txt file2.txt file3.txt
# Create zip archive of directory (recursive)
[student@server ~]$ zip -r project.zip project/
adding: project/ (stored 0%)
adding: project/README.md (deflated 45%)
adding: project/src/ (stored 0%)
adding: project/src/main.c (deflated 62%)
# Extract zip archive
[student@server ~]$ unzip project.zip
# Extract to specific directory
[student@server ~]$ unzip project.zip -d /tmp/extract/
# List contents without extracting
[student@server ~]$ unzip -l project.zip
# Add password protection
[student@server ~]$ zip -e -r secure.zip sensitive/
Enter password:
Verify password:
When to use zip: Sharing with Windows users, email attachments, when recipients expect .zip format. For Linux-to-Linux, tar.gz is more common.
Transferring with scp
scp (secure copy) transfers files between systems over SSH. It encrypts data in transit.
# Copy local file to remote system
[student@local ~]$ scp backup.tar.gz student@server:/home/student/
backup.tar.gz                 100%   15MB  10.2MB/s   00:01
# Copy from remote to local
[student@local ~]$ scp student@server:/home/student/data.tar.gz ./
# Copy directory recursively
[student@local ~]$ scp -r student@server:/var/www/ ./backup/
# Use specific port
[student@local ~]$ scp -P 2222 backup.tar.gz student@server:/home/student/
# Copy between two remote systems
[student@local ~]$ scp student@server1:/data/file.tar.gz student@server2:/backup/
# Preserve timestamps and permissions
[student@local ~]$ scp -p backup.tar.gz student@server:/home/student/
# Verbose mode for debugging
[student@local ~]$ scp -v backup.tar.gz student@server:/home/student/
Transferring with rsync
rsync efficiently synchronizes files, transferring only the differences. Ideal for backups and mirroring.
# Basic rsync (archive mode preserves everything)
[student@local ~]$ rsync -av /home/student/documents/ student@server:/backup/docs/
# Sync with compression during transfer
[student@local ~]$ rsync -avz /home/student/ student@server:/backup/
# Delete files on destination that don't exist on source (mirror)
[student@local ~]$ rsync -av --delete /source/ /destination/
# Dry run - show what would happen without doing it
[student@local ~]$ rsync -av --dry-run /source/ /destination/
# Exclude files
[student@local ~]$ rsync -av --exclude='*.log' --exclude='cache/' /src/ /dst/
# Show progress for large transfers
[student@local ~]$ rsync -av --progress /large/directory/ /backup/
# Resume interrupted transfer
[student@local ~]$ rsync -av --partial /source/ /destination/
Key advantage: rsync only transfers changed portions of files. Syncing a 10GB directory where 100MB changed transfers only ~100MB.
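You can see the savings for yourself with the --stats option, which summarizes how much data actually moved; a sketch:
# Compare "Total file size" with "Total transferred file size" in the summary
[student@local ~]$ rsync -av --stats /home/student/documents/ student@server:/backup/docs/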
Backup Best Practices
# Create timestamped backup
[student@server ~]$ tar -czvf backup-$(date +%Y%m%d).tar.gz /home/student/
# Or with full timestamp
[student@server ~]$ tar -czvf backup-$(date +%Y%m%d-%H%M%S).tar.gz /home/student/
[student@server ~]$ ls backup-*.tar.gz
backup-20240120-143052.tar.gz
# Verify archive integrity after creation
[student@server ~]$ tar -tzvf backup-20240120.tar.gz > /dev/null && echo "Archive OK"
Archive OK
# Create checksum for verification
[student@server ~]$ sha256sum backup-20240120.tar.gz > backup-20240120.tar.gz.sha256
[student@server ~]$ sha256sum -c backup-20240120.tar.gz.sha256
backup-20240120.tar.gz: OK
# Full backup script example
[student@server ~]$ cat backup.sh
#!/bin/bash
DATE=$(date +%Y%m%d)
BACKUP_DIR="/backup"
tar -czvf "$BACKUP_DIR/home-$DATE.tar.gz" /home/
sha256sum "$BACKUP_DIR/home-$DATE.tar.gz" > "$BACKUP_DIR/home-$DATE.tar.gz.sha256"
# Remove backups older than 7 days
find "$BACKUP_DIR" -name "home-*.tar.gz" -mtime +7 -delete
System Recovery Archives
# Backup critical system directories (as root)
[root@server ~]# tar -czvpf system-config.tar.gz \
/etc \
/var/spool/cron \
/root \
--exclude='/etc/mtab'
# Backup user home directories
[root@server ~]# tar -czvpf homes.tar.gz /home/
# Full system backup (excluding pseudo-filesystems)
[root@server ~]# tar -czvpf full-backup.tar.gz \
--exclude=/proc \
--exclude=/sys \
--exclude=/dev \
--exclude=/run \
--exclude=/tmp \
--exclude=/mnt \
--exclude=/media \
--exclude=/lost+found \
--exclude='full-backup.tar.gz' \
/
# Restore while preserving ownership (as root)
[root@server ~]# tar -xzvpf system-config.tar.gz -C /
# -p preserves permissions
# -C / extracts to root filesystem
⚠ Caution: Restoring system files can break your system. Test on non-production systems first. Have rescue media ready.
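A low-risk way to test is a trial extraction into a scratch directory, inspecting the result before a real restore; a sketch:
# Trial extraction - nothing outside /tmp/restore-test is touched
[root@server ~]# mkdir /tmp/restore-test
[root@server ~]# tar -xzvpf system-config.tar.gz -C /tmp/restore-test
[root@server ~]# diff -r /tmp/restore-test/etc /etc | less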
Working with Large Archives
# Split large archive into smaller pieces
[student@server ~]$ tar -czvf - /large/directory/ | split -b 1G - backup.tar.gz.part
backup.tar.gz.partaa
backup.tar.gz.partab
backup.tar.gz.partac
# Reassemble and extract
[student@server ~]$ cat backup.tar.gz.part* | tar -xzvf -
# Create archive directly to remote system (no local storage needed)
[student@server ~]$ tar -czvf - /home/student/ | ssh user@backup-server "cat > /backup/home.tar.gz"
# Extract from remote archive without downloading
[student@server ~]$ ssh user@backup-server "cat /backup/home.tar.gz" | tar -xzvf -
# Stream archive through compression to remote
[student@server ~]$ tar -cvf - /data/ | xz -T 0 | ssh user@server "cat > /backup/data.tar.xz"
# Check archive progress (with pv if available; drop -v so the file
# listing doesn't clobber pv's display)
[student@server ~]$ tar -cf - /large/ | pv | gzip > large.tar.gz
45.2GiB 0:12:34 [61.5MiB/s] [=========> ] 32% ETA 0:26:12
Streaming: Using -f - makes tar read/write to stdin/stdout, enabling powerful pipeline operations.
Troubleshooting Archives
# Archive is corrupted - try to extract what's possible
[student@server ~]$ tar -xvf damaged.tar --ignore-zeros
[student@server ~]$ gzip -d -f damaged.tar.gz    # Force despite errors
# Test archive integrity
[student@server ~]$ gzip -t backup.tar.gz
[student@server ~]$ bzip2 -t backup.tar.bz2
[student@server ~]$ xz -t backup.tar.xz
# Identify compression type of unknown file
[student@server ~]$ file mystery.archive
mystery.archive: gzip compressed data, last modified: Sat Jan 20 14:00:00 2024
# Check what's using disk space during archiving
[student@server ~]$ du -sh /home/student/* | sort -h | tail -10
# Permission denied during extraction - need root?
[student@server ~]$ tar -xvf backup.tar 2>&1 | grep -i "permission denied"
# File changed during archiving
tar: /var/log/messages: file changed as we read it
# This warning is usually OK - file was modified during backup
Common Issues: Permission denied (run as root), disk full (check space), corrupted archives (verify checksums), wrong compression flag.
Best Practices
✓ Do
Use timestamp in backup filenames
Verify archives after creation with -t
Create checksums for verification
Use -p for system backups (preserve permissions)
Test restoration process periodically
Use appropriate compression for the situation
List contents before extracting unknown archives
Use rsync for incremental/regular syncs
✗ Don't
Trust backups without verification
Extract archives as root without checking contents
Forget -r with zip for directories
Use absolute paths without understanding implications