Skip to content

vmbackup misclassifies successful backups as VIRTNBD_FALSE_SUCCESS due to too-permissive ERROR regex #1

@hostarts

Description

@hostarts

Hi

Backups that complete successfully (virtnbdbackup exits 0, all data + checkpoints written) are marked status=failed in vm_backups with error_code=VIRTNBD_FALSE_SUCCESS whenever the virtnbdbackup output contains the substring error anywhere — including in INFO/WARNING-level lines.

The Affected version is current

  • vmbackup 0.5.6 (current main at time of writing, installed via make install from git tag v0.5.6)
  • Tested against virtnbdbackup 2.46-1.noarch.rhel9 on AlmaLinux 9.6

Root cause

vmbackup.sh line 3594:

  if [[ -f "$backup_log" ]] && grep -qi "ERROR" "$backup_log" 2>/dev/null; then

grep -qi "ERROR" is a substring match. It triggers on benign lines that happen to contain "error" or "ERROR":

| Log line (real captured) | Source | Should fail backup? |

| [32m...] INFO root metadata - backupBitlockerRecoveryKey [main]: Unable to pull recovery keys for [C:]: internal error: unable to execute QEMU agent command... (logged at WARNING level, but contains "internal error") |
virtnbdbackup metadata module | No — backup data is intact |
| INFO ... FSTRIM <vm> [/]: ERROR — trim not supported (logged at INFO level by fstrim_optimization_module.sh) |
vmbackup itself | No — trim is unrelated to backup |
| INFO root metadata - backupBitlockerRecoveryKey [main]: System does not appear to have bitlocker tools installed, skipping. | virtnbdbackup | No — author explicitly says "skipping" |

All three of those produce VIRTNBD_FALSE_SUCCESS despite the backup file being on disk, checkpoint XML being written, qemu-img check passing, and vmrestore --list-restore-points returning a valid FULL entry.

We hit this 6 times in one fleet run across 3 hypervisors (~85 VMs total), about ~7%. Specifically:

  • 4 Windows VMs (BitLocker key extraction WARNING)
  • 1 Linux VM with VirtIO disk without TRIM (FSTRIM error message)
  • 1 VM where virtnbdbackup said "bitlocker not installed, skipping"

Each one requires manual SQLite intervention to flip status back to success, or the next monthly cycle will treat the VM as having failed and re-do a full instead of an incremental — wasting hours and TB of NFS write.

The error_code VIRTNBD_FALSE_SUCCESS suggests the author was already aware this is "false" — but the status field is still set to "failed", which is what downstream tools and humans look at.

The Suggested fix
Replace the substring grep with a level-prefix match. virtnbdbackup formats ERROR lines with a distinct prefix (timestamp + ERROR token, plus ANSI [31m).

One option:

  # Match only lines where the LEVEL token is "ERROR", not any substring.
  # Strip ANSI first so the regex works on both colored and uncolored output.                                           
  if [[ -f "$backup_log" ]] && \
     sed -E 's/\x1b\[[0-9;]*m//g' "$backup_log" | grep -qE '^\[[0-9-]+ [0-9:]+\] ERROR ' 2>/dev/null; then              

Or use virtnbdbackup's exit code as the source of truth (it already exits non-zero on real failures) and stop trying
to re-classify based on log greppping at all — but I assume the current logic exists because some virtnbdbackup
versions did exit 0 on genuine errors.

Happy to send a PR if helpful — just confirm which approach you prefer.

Reproduction

  1. Run vmbackup --run --vm <any-windows-vm-without-bitlocker> on a host with virtnbdbackup ≥ 2.40
  2. Wait for it to complete
  3. Observe: vmbackup --status shows the VM as failed with error_code=VIRTNBD_FALSE_SUCCESS
  4. But vmrestore --list-restore-points <vm> shows a valid FULL restore point
  5. ls -la $BACKUP_PATH/<vm>/YYYYMM/ shows vda.full.data (no .partial), .chksum,
    checkpoints/virtnbdbackup.0.xml — all the markers of a successful run

Workaround (for anyone hitting this before a fix lands)

After each run, patch the SQLite state:

  UPDATE vm_backups                                                                                                     
  SET status='success', event_type='false_failure_corrected'
  WHERE status='failed' AND error_code='VIRTNBD_FALSE_SUCCESS';                                                         

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions