Hi
Backups that complete successfully (virtnbdbackup exits 0, all data + checkpoints written) are marked status=failed in vm_backups with error_code=VIRTNBD_FALSE_SUCCESS whenever the virtnbdbackup output contains the substring error anywhere — including in INFO/WARNING-level lines.
The Affected version is current
- vmbackup
0.5.6 (current main at time of writing, installed via make install from git tag v0.5.6)
- Tested against virtnbdbackup
2.46-1.noarch.rhel9 on AlmaLinux 9.6
Root cause
vmbackup.sh line 3594:
if [[ -f "$backup_log" ]] && grep -qi "ERROR" "$backup_log" 2>/dev/null; then
grep -qi "ERROR" is a substring match. It triggers on benign lines that happen to contain "error" or "ERROR":
| Log line (real captured) | Source | Should fail backup? |
| [32m...] INFO root metadata - backupBitlockerRecoveryKey [main]: Unable to pull recovery keys for [C:]: internal error: unable to execute QEMU agent command... (logged at WARNING level, but contains "internal error") |
virtnbdbackup metadata module | No — backup data is intact |
| INFO ... FSTRIM <vm> [/]: ERROR — trim not supported (logged at INFO level by fstrim_optimization_module.sh) |
vmbackup itself | No — trim is unrelated to backup |
| INFO root metadata - backupBitlockerRecoveryKey [main]: System does not appear to have bitlocker tools installed, skipping. | virtnbdbackup | No — author explicitly says "skipping" |
All three of those produce VIRTNBD_FALSE_SUCCESS despite the backup file being on disk, checkpoint XML being written, qemu-img check passing, and vmrestore --list-restore-points returning a valid FULL entry.
We hit this 6 times in one fleet run across 3 hypervisors (~85 VMs total), about ~7%. Specifically:
- 4 Windows VMs (BitLocker key extraction WARNING)
- 1 Linux VM with VirtIO disk without TRIM (FSTRIM error message)
- 1 VM where virtnbdbackup said "bitlocker not installed, skipping"
Each one requires manual SQLite intervention to flip status back to success, or the next monthly cycle will treat the VM as having failed and re-do a full instead of an incremental — wasting hours and TB of NFS write.
The error_code VIRTNBD_FALSE_SUCCESS suggests the author was already aware this is "false" — but the status field is still set to "failed", which is what downstream tools and humans look at.
The Suggested fix
Replace the substring grep with a level-prefix match. virtnbdbackup formats ERROR lines with a distinct prefix (timestamp + ERROR token, plus ANSI [31m).
One option:
# Match only lines where the LEVEL token is "ERROR", not any substring.
# Strip ANSI first so the regex works on both colored and uncolored output.
if [[ -f "$backup_log" ]] && \
sed -E 's/\x1b\[[0-9;]*m//g' "$backup_log" | grep -qE '^\[[0-9-]+ [0-9:]+\] ERROR ' 2>/dev/null; then
Or use virtnbdbackup's exit code as the source of truth (it already exits non-zero on real failures) and stop trying
to re-classify based on log greppping at all — but I assume the current logic exists because some virtnbdbackup
versions did exit 0 on genuine errors.
Happy to send a PR if helpful — just confirm which approach you prefer.
Reproduction
- Run
vmbackup --run --vm <any-windows-vm-without-bitlocker> on a host with virtnbdbackup ≥ 2.40
- Wait for it to complete
- Observe:
vmbackup --status shows the VM as failed with error_code=VIRTNBD_FALSE_SUCCESS
- But
vmrestore --list-restore-points <vm> shows a valid FULL restore point
ls -la $BACKUP_PATH/<vm>/YYYYMM/ shows vda.full.data (no .partial), .chksum,
checkpoints/virtnbdbackup.0.xml — all the markers of a successful run
Workaround (for anyone hitting this before a fix lands)
After each run, patch the SQLite state:
UPDATE vm_backups
SET status='success', event_type='false_failure_corrected'
WHERE status='failed' AND error_code='VIRTNBD_FALSE_SUCCESS';
Hi
Backups that complete successfully (virtnbdbackup exits 0, all data + checkpoints written) are marked
status=failedinvm_backupswitherror_code=VIRTNBD_FALSE_SUCCESSwhenever the virtnbdbackup output contains the substring error anywhere — including in INFO/WARNING-level lines.The Affected version is current
0.5.6(currentmainat time of writing, installed viamake installfrom git tagv0.5.6)2.46-1.noarch.rhel9on AlmaLinux 9.6Root cause
vmbackup.shline 3594:grep -qi "ERROR"is a substring match. It triggers on benign lines that happen to contain "error" or "ERROR":| Log line (real captured) | Source | Should fail backup? |
|
[32m...] INFO root metadata - backupBitlockerRecoveryKey [main]: Unable to pull recovery keys for [C:]: internal error: unable to execute QEMU agent command...(logged at WARNING level, but contains "internal error") |virtnbdbackup metadata module | No — backup data is intact |
|
INFO ... FSTRIM <vm> [/]: ERROR — trim not supported(logged at INFO level byfstrim_optimization_module.sh) |vmbackup itself | No — trim is unrelated to backup |
|
INFO root metadata - backupBitlockerRecoveryKey [main]: System does not appear to have bitlocker tools installed, skipping.| virtnbdbackup | No — author explicitly says "skipping" |All three of those produce
VIRTNBD_FALSE_SUCCESSdespite the backup file being on disk, checkpoint XML being written,qemu-img checkpassing, andvmrestore --list-restore-pointsreturning a valid FULL entry.We hit this 6 times in one fleet run across 3 hypervisors (~85 VMs total), about ~7%. Specifically:
Each one requires manual SQLite intervention to flip status back to success, or the next monthly cycle will treat the VM as having failed and re-do a full instead of an incremental — wasting hours and TB of NFS write.
The error_code
VIRTNBD_FALSE_SUCCESSsuggests the author was already aware this is "false" — but the status field is still set to "failed", which is what downstream tools and humans look at.The Suggested fix
Replace the substring grep with a level-prefix match. virtnbdbackup formats ERROR lines with a distinct prefix (timestamp +
ERRORtoken, plus ANSI[31m).One option:
Or use virtnbdbackup's exit code as the source of truth (it already exits non-zero on real failures) and stop trying
to re-classify based on log greppping at all — but I assume the current logic exists because some virtnbdbackup
versions did exit 0 on genuine errors.
Happy to send a PR if helpful — just confirm which approach you prefer.
Reproduction
vmbackup --run --vm <any-windows-vm-without-bitlocker>on a host with virtnbdbackup ≥ 2.40vmbackup --statusshows the VM asfailedwitherror_code=VIRTNBD_FALSE_SUCCESSvmrestore --list-restore-points <vm>shows a valid FULL restore pointls -la $BACKUP_PATH/<vm>/YYYYMM/showsvda.full.data(no.partial),.chksum,checkpoints/virtnbdbackup.0.xml— all the markers of a successful runWorkaround (for anyone hitting this before a fix lands)
After each run, patch the SQLite state: