Article 4: Backup Strategy, Rsync Scripts, Failure Recovery
Preparing for failures — power loss, disk death, and human error — so the system survives when things go wrong.
Safety, redundancy, and disaster readiness
This article covers the most important part of a NAS: backups and recovery.
My goal is:
If something breaks, I should know exactly what to restore and how.
Data Classification (Critical Step)
Before writing scripts, the data must be classified: what do you want to back up, and what not? This matters more, and needs more thought, than the script itself, which can be written with the help of GPTs (no need to learn bash just for this), as I did.
Primary Data Disk — /mnt/hdd4tb
Contains:
- Media
- Torrents
- Games
- Files
- Photos and Videos
This disk is NOT fully backed up.
Backup Disk — /mnt/hdd1tb
Contains only irreplaceable data:
- Photos and videos
- Important documents
- Immich library (final processed output)
- Docker compose files
This disk exists only for backups.
Backup Model
- Tool: rsync
- Frequency: daily
- Retention: last 180 days
- Direction: /mnt/hdd4tb → /mnt/hdd1tb and /home/<user>/home_nas → /mnt/hdd1tb
Creating the Backup View
The problem with backing up data from true paths:
/mnt/hdd4tb
/home/<user>/home_nas
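- Files may change while rsync is reading them, which may lead to errors, corrupted backups and inconsistencies.
A stable backup view built from read-only bind mounts of the required data gives rsync one fixed tree to copy and guarantees that the backup job can never write to the source. It does not freeze the files: anything written through the original paths is still visible in the view, so data that changes constantly (a live database, for example) can still be caught mid-write. The view is a safety boundary, not a snapshot.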
Directory Layout on Backup Disk
/mnt/hdd1tb/backups/
├── snapshots/
│   ├── 2026-01-06/
│   │   ├── hdd4tb/
│   │   └── home_nas/
│   ├── 2026-01-07/
│   ├── ...
│   └── 2026-01-12/
│       ├── hdd4tb/
│       └── home_nas/
└── daily/
    └── latest -> /mnt/hdd1tb/backups/snapshots/2026-01-12
- Each snapshot is like a full backup
- Snapshots share unchanged files using hard links
- latest always points to the most recent backup
- Each snapshot takes up space for the incremental changes only
This looks like 180 full backups but does not use 180× space.
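The space saving comes from hard links: two directory entries that point at the same inode. The sketch below reproduces the idea in a throwaway temp directory using only GNU coreutils (`ln`, `stat`, `du`). The directory and file names are made up for illustration, and it is not part of the backup script.

```shell
#!/bin/bash
set -euo pipefail

# Throwaway sandbox; day1/day2 stand in for two dated snapshots (hypothetical names).
demo="$(mktemp -d)"
mkdir "$demo/day1" "$demo/day2"

# "Day 1" holds a real 1 MiB file.
head -c 1048576 /dev/zero > "$demo/day1/photo.bin"

# "Day 2": the file is unchanged, so it is hard-linked instead of copied.
# This is what rsync --link-dest does for unchanged files.
ln "$demo/day1/photo.bin" "$demo/day2/photo.bin"

# Both names refer to the same inode, and the link count is 2.
inode1="$(stat -c %i "$demo/day1/photo.bin")"
inode2="$(stat -c %i "$demo/day2/photo.bin")"
links="$(stat -c %h "$demo/day1/photo.bin")"
echo "inodes: $inode1 $inode2, link count: $links"

# du counts a hard-linked file once when both directories are given together,
# so the second directory shows up as almost empty.
du -sk "$demo/day1" "$demo/day2"

rm -rf "$demo"
```

The flip side is that hard-linked copies share their content: writing into one of them in place changes the file in every snapshot that links to it.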
The rsync Script
Create a script:
sudo vim /usr/local/bin/snapshot_backup.sh
snapshot_backup.sh
#!/bin/bash
set -euo pipefail

################ CONFIG ################
BACKUP_ROOT="/mnt/hdd1tb/backups"
SNAPSHOTS="$BACKUP_ROOT/snapshots"
VIEW="/home/<user>/backup_view"
HOME_NAS="/home/<user>/home_nas"
RETENTION_DAYS=180
MIN_FREE_GB=10
LOG="/var/log/snapshot_backup.log"
LOCK_FILE="/var/lock/snapshot_backup.lock"
LOCK_FD=9
TODAY="$(date +%F)"
#######################################

# Unmount the bind mounts that make up the backup view (safe to call repeatedly).
cleanup_mounts() {
    for p in \
        "$VIEW/hdd4tb/files" \
        "$VIEW/hdd4tb/immich" \
        "$VIEW/home_nas/immich_app"
    do
        mountpoint -q "$p" && umount "$p" || true
    done
}

# EXIT handler: always unmount the view, then log the outcome.
finish() {
    status=$?
    cleanup_mounts
    if [ "$status" -eq 0 ]; then
        echo "=== Backup finished successfully at $(date) ===" >> "$LOG"
    else
        echo "=== Backup FAILED at $(date) (exit code $status) ===" >> "$LOG"
    fi
}

#######################################
# Safety checks
#######################################
[ "$(id -u)" -eq 0 ] || { echo "Must run as root" >> "$LOG"; exit 1; }
for cmd in rsync mount umount df flock date mountpoint; do
    command -v "$cmd" >/dev/null || { echo "$cmd missing" >> "$LOG"; exit 1; }
done

# Single-instance lock: if another run holds it, exit quietly.
exec {LOCK_FD}>"$LOCK_FILE"
flock -n "$LOCK_FD" || exit 0

# Install the EXIT trap only after taking the lock, so a second instance that
# bails out above cannot unmount the view under a running backup.
trap finish EXIT

#######################################
echo "=== Backup started at $(date) ===" >> "$LOG"
mkdir -p \
    "$SNAPSHOTS" \
    "$BACKUP_ROOT/daily" \
    "$VIEW/hdd4tb/files" \
    "$VIEW/hdd4tb/immich" \
    "$VIEW/home_nas/immich_app" \
    "$VIEW/home_nas/configs"

#######################################
# Bind mounts (read-only)
#######################################
cleanup_mounts
mount --bind /mnt/hdd4tb/files "$VIEW/hdd4tb/files"
mount --bind /mnt/hdd4tb/immich "$VIEW/hdd4tb/immich"
mount --bind "$HOME_NAS/immich_app" "$VIEW/home_nas/immich_app"
# A bind mount becomes read-only only after a separate remount.
mount -o remount,bind,ro "$VIEW/hdd4tb/files"
mount -o remount,bind,ro "$VIEW/hdd4tb/immich"
mount -o remount,bind,ro "$VIEW/home_nas/immich_app"

#######################################
# Disk space check
#######################################
FREE_GB=$(df --output=avail -BG "$BACKUP_ROOT" | tail -1 | tr -dc '0-9')
if [ "$FREE_GB" -lt "$MIN_FREE_GB" ]; then
    echo "Low disk space: ${FREE_GB}GB free" >> "$LOG"
    exit 1
fi

#######################################
# Collect configs
#######################################
# Copy only compose/env files and this script into the view (a plain directory, not a mount).
rsync -a --delete --prune-empty-dirs \
    --include='*/' \
    --include='*.yml' \
    --include='*.env' \
    --include='snapshot_backup.sh' \
    --exclude='*' \
    "$HOME_NAS/" \
    "$VIEW/home_nas/configs/"

#######################################
# Snapshot creation
#######################################
DEST="$SNAPSHOTS/$TODAY"
PREV="$(readlink -f "$BACKUP_ROOT/daily/latest" 2>/dev/null || true)"
mkdir -p "$DEST"
RSYNC_OPTS=(
    -aAXH
    --numeric-ids
    --delete-after
    --partial
    # --inplace is deliberately not used: on a same-day re-run it would write
    # through hard links and silently alter older snapshots.
    --ignore-errors
    --exclude='*.sock'
    --exclude='*.pid'
    --exclude='*.lock'
)
# Hard-link against the previous snapshot only if it completed and is not today's.
if [ -n "$PREV" ] && [ -d "$PREV" ] && [ -f "$PREV/.snapshot_complete" ] && [ "$PREV" != "$SNAPSHOTS/$TODAY" ]; then
    echo "Incremental snapshot using link-dest=$PREV" >> "$LOG"
    rsync "${RSYNC_OPTS[@]}" --link-dest="$PREV" "$VIEW/" "$DEST/"
else
    echo "First snapshot or fallback full sync" >> "$LOG"
    rsync "${RSYNC_OPTS[@]}" "$VIEW/" "$DEST/"
fi
# Reached only if rsync succeeded (set -e), so "latest" never points at a failed run.
touch "$DEST/.snapshot_complete"
ln -sfn "$DEST" "$BACKUP_ROOT/daily/latest"

#######################################
# Pruning (date-based, safe)
#######################################
# Snapshot names are ISO dates, so a plain sort is chronological;
# everything except the newest $RETENTION_DAYS entries is removed.
ls -1d "$SNAPSHOTS/"[0-9]* 2>/dev/null \
    | sort \
    | head -n "-$RETENTION_DAYS" \
    | xargs -r rm -rf
FREE_GB_AFTER=$(df --output=avail -BG "$BACKUP_ROOT" | tail -1 | tr -dc '0-9')
echo "Backup completed. Free space remaining: ${FREE_GB_AFTER}GB" >> "$LOG"
Make executable:
sudo chmod +x /usr/local/bin/snapshot_backup.sh
What This Script Does
- Creates a dated snapshot directory
- Uses --link-dest for hard-linked incremental backups
- Copies only changed data
- Deletes files removed from the source
- Updates daily/latest only on success
- Prunes snapshots by date, not modification time
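The pruning step is the part most worth trying out before trusting it, since it ends in `rm -rf`. Here is a small sandboxed sketch of the same pipeline, with a retention of 3 instead of 180. The `keep` variable and the dated directories are made up for the demo.

```shell
#!/bin/bash
set -euo pipefail

# Sandbox standing in for $SNAPSHOTS; the dates are made up.
snapshots="$(mktemp -d)"
mkdir "$snapshots"/2026-01-0{1,2,3,4,5}

keep=3   # the real script uses RETENTION_DAYS=180

# ISO dates (YYYY-MM-DD) sort chronologically as plain text, so everything
# except the last $keep lines is older than the retention window.
# head -n -N (GNU coreutils) prints all lines but the last N.
ls -1d "$snapshots"/[0-9]* | sort | head -n "-$keep" | xargs -r rm -rf

# Collect the surviving directory names on one line.
remaining="$(ls -1 "$snapshots" | xargs)"
echo "kept: $remaining"

rm -rf "$snapshots"
```

Note that this keeps the newest N snapshot directories rather than N calendar days. If the NAS is off for a week, the oldest snapshot kept will simply be more than 180 days old.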
Scheduling with cron
Edit crontab:
sudo crontab -e
Add:
0 2 * * * /usr/local/bin/snapshot_backup.sh
Backups run daily at 2 AM, and the script writes its log to /var/log/snapshot_backup.log.
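Cron runs silently, so it helps to have a quick way to ask "did last night's backup work?". The helper below is a sketch that reads the result lines the script writes to its log. The function name `last_backup_status` and the sample log with its timestamps are my own inventions for the demo.

```shell
#!/bin/bash
set -euo pipefail

# Print OK or FAILED depending on the most recent result line in the log.
# Prints UNKNOWN if the log has no result line yet (or does not exist).
last_backup_status() {
    local log="$1" line
    line="$(grep -E '^=== Backup (finished successfully|FAILED) at' "$log" 2>/dev/null | tail -n 1 || true)"
    case "$line" in
        *"finished successfully"*) echo "OK" ;;
        *FAILED*)                  echo "FAILED" ;;
        *)                         echo "UNKNOWN" ;;
    esac
}

# Demo against a sample log in the format the script writes (made-up timestamps).
sample="$(mktemp)"
cat > "$sample" <<'EOF'
=== Backup started at Mon Jan 12 02:00:01 2026 ===
=== Backup finished successfully at Mon Jan 12 02:14:30 2026 ===
=== Backup started at Tue Jan 13 02:00:01 2026 ===
=== Backup FAILED at Tue Jan 13 02:00:05 2026 (exit code 1) ===
EOF

last_backup_status "$sample"

rm -f "$sample"
```

On the NAS itself the call would be `last_backup_status /var/log/snapshot_backup.log`.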
What Is Backed Up (Explicit List)
I back up:
- /mnt/hdd4tb/immich
- /mnt/hdd4tb/files
- /home/<user>/home_nas: only the yaml files and the Immich PostgreSQL data
- snapshot_backup.sh (the script itself 😅)
Testing Backups
Once a month:
ls /mnt/hdd1tb/backups/snapshots
cd /mnt/hdd1tb/backups/daily/latest
Verify:
- Files exist
- Old versions exist
- Hard links are present (ls -li)
A backup you’ve never restored from is not a backup.
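One way to turn that into a real restore test is to copy a folder out of a snapshot into a scratch directory and compare the two trees by content. The sketch below does this with `find` and `sha256sum`. The helper name `tree_digest` and the sandbox directories are hypothetical. On the NAS you would point it at a snapshot folder and the scratch copy instead.

```shell
#!/bin/bash
set -euo pipefail

# Print one SHA-256 digest covering the relative path and content of every
# regular file under directory $1. Two trees with identical files give the
# same digest.
tree_digest() {
    ( cd "$1" && find . -type f -print0 | sort -z | xargs -0 -r sha256sum | sha256sum | cut -d' ' -f1 )
}

# Demo in a sandbox: "snapshot" and "restored" are stand-ins for a snapshot
# folder and a scratch directory it was restored into.
work="$(mktemp -d)"
mkdir -p "$work/snapshot/docs" "$work/restored"
echo "tax return 2025" > "$work/snapshot/docs/tax.txt"
cp -a "$work/snapshot/." "$work/restored/"

if [ "$(tree_digest "$work/snapshot")" = "$(tree_digest "$work/restored")" ]; then
    result="restore matches snapshot"
else
    result="restore DIFFERS from snapshot"
fi
echo "$result"

rm -rf "$work"
```

Comparing against the snapshot rather than the live disk is deliberate: the live data may have changed since the backup ran, which would show up as a false alarm.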
Failure Scenarios and Recovery
Case 1: File Deleted Accidentally
cp /mnt/hdd1tb/backups/snapshots/<date>/path/to/file /restore/location
That’s it 🙌
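If you don't know which date to pick, it helps to list every snapshot that still has the file. The sketch below prints the snapshot date, inode and size for one relative path. The function name `list_versions` and the sandbox contents are made up for the demo.

```shell
#!/bin/bash
set -euo pipefail

# List every snapshot that contains the given relative path, oldest first,
# with inode number and size. The same inode in two snapshots means the file
# was unchanged (hard-linked) between them.
list_versions() {
    local snapshots="$1" relpath="$2" snap
    for snap in "$snapshots"/[0-9]*/; do
        if [ -f "$snap$relpath" ]; then
            printf '%s inode=%s size=%s\n' \
                "$(basename "$snap")" \
                "$(stat -c %i "$snap$relpath")" \
                "$(stat -c %s "$snap$relpath")"
        fi
    done
}

# Demo: three fake snapshots; the file is unchanged on day 2 and edited on day 3.
demo="$(mktemp -d)"
mkdir -p "$demo"/2026-01-0{6,7,8}/hdd4tb/files
echo "v1" > "$demo/2026-01-06/hdd4tb/files/notes.txt"
ln "$demo/2026-01-06/hdd4tb/files/notes.txt" "$demo/2026-01-07/hdd4tb/files/notes.txt"
echo "v2 edited" > "$demo/2026-01-08/hdd4tb/files/notes.txt"

versions="$(list_versions "$demo" hdd4tb/files/notes.txt)"
echo "$versions"

rm -rf "$demo"
```

On the NAS the first argument would be /mnt/hdd1tb/backups/snapshots. A change in inode or size between two dates tells you where the file was modified, so you can copy the version from just before that.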
Case 2: Primary Disk Failure (4 TB)
Steps:
- Power off
- Replace disk
- Recreate mount point
- Mount new disk
- Restore data from backup:
rsync -a /mnt/hdd1tb/backups/daily/latest/hdd4tb/ /mnt/hdd4tb/
System is back.
Case 3: Backup Disk Failure (1 TB)
- No immediate data loss
- Replace disk
- Reinitialise backup directory
- Next backup recreates everything
This is why backups are not mirrored live.
My Perspective
This is not the best approach, and I still need to work on it myself, but this article should give you a rough idea of how to approach the problem.
Other Technologies
You will often see the following names when reading about other NAS setups. They solve storage and redundancy problems, not backups by default. They are mentioned here only for awareness and as a brief introduction.
RAID
RAID combines multiple disks into a single logical unit to improve fault tolerance, performance, or both, depending on the RAID level.
RAID only protects against disk failure. It does not protect against accidental deletion, corruption, or ransomware. RAID is not a backup.
MergerFS
MergerFS is a simple filesystem layer that merges multiple disks into a single directory tree.
It provides no redundancy by itself. Each disk remains independent. If one disk fails, only the data on that disk is lost.
Its main advantage is simplicity and flexibility, and it pairs well with file-level backups like rsync.
I plan to use MergerFS when I upgrade my NAS to more disks.
ZFS
ZFS is both a filesystem and a volume manager built around data integrity. It uses checksums, copy-on-write, and snapshots to detect and prevent silent data corruption. It also includes built-in RAID-like redundancy.
ZFS is very reliable but also strict, memory-hungry, and less forgiving of mistakes. It works best when correctness and long-term integrity matter more than simplicity.
SnapRAID
SnapRAID is a parity-based system designed for mostly static data such as media collections.
Unlike RAID, parity is updated manually or on a schedule, not in real time. This makes it safer against accidental deletion but unsuitable for frequently changing data.
SnapRAID is often used alongside MergerFS.
Final Thought
People may obsess over:
- RAID levels
- Filesystems
- Benchmarks
But recovery is what matters.
A simple backup that restores correctly beats a clever system you don’t understand.