Data Decimation

VGroshkov edited this page May 28, 2026 · 2 revisions

High-rate sensor data — magnetometer logs at 50–500 Hz, dense GNSS tracks, RSS radar streams — can easily reach hundreds of thousands of rows in a single file. That makes plotting, gridding and any further processing slower than it needs to be, and rarely adds detail that survives the next downsampling step anyway.

The Data Decimation script reduces the row count of the opened CSV by a fixed factor while preserving the shape of every column — numeric, textual and angular alike. It is a generic tool: the script declares the "csv" wildcard in its templates, so it appears in the Scripts drop-down for any opened CSV file, regardless of the sensor it came from.



How it works

For every group of N consecutive rows (where N is the Decimation factor) the script keeps a single representative row. The way that representative is built is controlled by the Algorithm parameter:

| Algorithm | Numeric columns | Text / categorical columns | Angular columns (0–360°) |
| --- | --- | --- | --- |
| MEDIAN (default) | Median of the window | Value from the center sample of the window | Unwrapped over the window, median, wrapped back to 0–360° |
| MEAN | Mean of the window | Value from the center sample of the window | Unwrapped over the window, mean, wrapped back to 0–360° |
| INTERPOLATED | Akima 1-D spline resampled on the new, sparser time axis | Value from the nearest center sample | Unwrapped before interpolation, wrapped back to 0–360° |
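
The windowing for MEDIAN and MEAN can be sketched in a few lines of plain Python. This is an illustration of the idea, not the script's actual code: the function name `decimate`, the list-of-dicts row layout, the choice of `len(window) // 2` as the "center" sample, and keeping a shorter final window are all assumptions. Angular columns are not handled here and would need the unwrapping step described in the table.

```python
import statistics

def decimate(rows, factor, algorithm="MEDIAN"):
    """Collapse every `factor` consecutive rows into one representative row.

    `rows` is a list of dicts sharing the same keys (hypothetical layout).
    """
    aggregate = {"MEDIAN": statistics.median, "MEAN": statistics.fmean}[algorithm]
    out = []
    for start in range(0, len(rows), factor):
        window = rows[start:start + factor]   # last window may be shorter
        center = window[len(window) // 2]     # assumed definition of "center"
        merged = {}
        for column in center:
            values = [row[column] for row in window]
            is_numeric = all(
                isinstance(v, (int, float)) and not isinstance(v, bool)
                for v in values
            )
            # Numeric columns are aggregated, everything else is copied
            # from the center sample of the window.
            merged[column] = aggregate(values) if is_numeric else center[column]
        out.append(merged)
    return out

# Made-up sample: six rows, one numeric and one text column.
rows = [{"Field": float(v), "Mark": m}
        for v, m in zip([1, 2, 100, 4, 5, 6], "abcdef")]
median_rows = decimate(rows, 3, "MEDIAN")
mean_rows = decimate(rows, 3, "MEAN")
```

In the first window the spike of 100 leaves the median at 2.0, while the mean of the same window is pulled well above it.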

A few details worth knowing:

  • If the file contains a Data valid column, rows where it equals 0 are dropped before decimation, so invalid samples never contribute to a window's median, mean or spline fit.
  • Angular columns are detected automatically: any numeric column whose values stay within 0..360 and that contains jumps larger than 180° between neighbouring samples is treated as an angle (typical examples: heading, azimuth, yaw). Unwrapping prevents a sample at 359° and a sample at 1° from averaging to 180° instead of 0°.
  • The script always works on a temporary copy of the opened file (see Getting Started with Python Scripts). The decimated result is reloaded into the chart, but is not saved to disk until you press "Save" or "Save to...".
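
The validity filter and the angle handling described above can be sketched as follows. The helper names (`drop_invalid`, `is_angular`, `unwrap_degrees`, `angular_aggregate`) are hypothetical, and the detection rule is a literal reading of the description in this section; the real script may apply additional checks.

```python
import statistics

def drop_invalid(rows, flag="Data valid"):
    """Remove rows whose validity flag equals 0 (no-op if the column is absent)."""
    if not rows or flag not in rows[0]:
        return rows
    return [row for row in rows if float(row[flag]) != 0.0]

def is_angular(values):
    """Values stay within 0..360 and at least one neighbour jump exceeds 180."""
    in_range = all(0.0 <= v <= 360.0 for v in values)
    has_wrap = any(abs(b - a) > 180.0 for a, b in zip(values, values[1:]))
    return in_range and has_wrap

def unwrap_degrees(values):
    """Remove 360-degree jumps so that neighbouring samples stay close."""
    out = [values[0]]
    for v in values[1:]:
        # Shortest signed step from the previous sample, in [-180, 180).
        step = (v - out[-1] + 180.0) % 360.0 - 180.0
        out.append(out[-1] + step)
    return out

def angular_aggregate(values, aggregate):
    """Unwrap, aggregate (median or mean), then wrap back to 0..360."""
    return aggregate(unwrap_degrees(values)) % 360.0

heading = [359.0, 1.0, 3.0]              # made-up samples crossing north
naive = statistics.mean(heading)         # 121.0, points the wrong way
wrapped = angular_aggregate(heading, statistics.mean)   # 1.0
```

Without unwrapping, the mean of these three headings lands at 121°, although all three samples point almost due north.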


Parameters

| Parameter | Type | Default | Range | Description |
| --- | --- | --- | --- | --- |
| Decimation factor | integer | 10 | 2..500 | Number of input rows collapsed into a single output row. A factor of 10 reduces a 100 000-row file to 10 000 rows. |
| Algorithm | enum | MEDIAN | MEDIAN, MEAN, INTERPOLATED | Aggregation method used to build the representative row. |
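
As a quick check of the arithmetic, the expected output size for a given factor can be estimated like this. The function `output_rows` is a hypothetical helper; it assumes that a shorter final window still produces one output row, which matches the sample log on this page (123456 rows become 12346 with a factor of 10).

```python
FACTOR_MIN, FACTOR_MAX = 2, 500   # range from the parameter table

def output_rows(input_rows, factor):
    """Estimated number of rows after decimation by `factor`."""
    if not FACTOR_MIN <= factor <= FACTOR_MAX:
        raise ValueError(f"factor must be within {FACTOR_MIN}..{FACTOR_MAX}")
    # Ceiling division: a partial last window still yields one row (assumption).
    return (input_rows + factor - 1) // factor

small = output_rows(100_000, 10)
partial = output_rows(123_456, 10)
```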


Running the script

  1. Open a CSV file in GeoHammer ("Open files" or drag & drop) and select it in the file list so that its chart is active.
Opened CSV file in GeoHammer
  2. In the right-hand processing panel switch to the "Scripts" tab and pick Data Decimation from the drop-down. Since the script declares the "csv" wildcard in its templates, it is available for any CSV file regardless of the sensor.
Selecting Data Decimation in the Scripts panel
  3. Enter a Decimation factor. A factor of 10 is a safe starting point for most high-rate sensors. More aggressive values (e.g. 100) are convenient for previewing very large files or for producing lightweight overview maps.
Decimation factor 100
  4. Pick an Algorithm:

    • MEDIAN: most robust to spikes and short bursts of noise. Use this by default for raw magnetometer / radiometer data with occasional outliers.
    • MEAN: preserves the overall energy of the signal slightly better than median, but a single spike inside a window will pull the result toward it. Best after the data has already been cleaned.
    • INTERPOLATED: fits an Akima spline through the original samples and resamples it at the new, sparser time axis. Produces the smoothest output and yields evenly spaced samples even when the input has small gaps. Use it on already-filtered data, since the spline does not reject outliers on its own.
    Algorithm drop-down
  5. Click "Apply" to run the script on the current file, or "Apply to All" to decimate every opened CSV with the same parameters.
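
To make the INTERPOLATED option from step 4 more concrete, here is a small hand-written Akima interpolation in plain Python, following the standard Akima formulas for the sample derivatives. It is only a sketch: the script itself may well rely on a library routine (SciPy, for instance, ships `Akima1DInterpolator`), and the helper names `akima_resample` and `decimated_axis`, as well as the choice of an evenly spaced output axis spanning the original time range, are assumptions.

```python
import bisect

def akima_resample(x, y, x_new):
    """Evaluate an Akima spline through (x, y) at the points in x_new.

    Assumes x is strictly increasing, len(x) >= 3, x[0] <= q <= x[-1].
    """
    n = len(x)
    # Slopes of the n-1 segments between neighbouring samples.
    m = [(y[i + 1] - y[i]) / (x[i + 1] - x[i]) for i in range(n - 1)]
    # Akima's end condition: extrapolate two extra slopes on each side.
    m = ([3 * m[0] - 2 * m[1], 2 * m[0] - m[1]] + m
         + [2 * m[-1] - m[-2], 3 * m[-1] - 2 * m[-2]])
    # Derivative at each sample: weighted mix of the two adjacent slopes.
    t = []
    for i in range(n):
        left, right = m[i + 1], m[i + 2]
        w_left = abs(m[i + 3] - m[i + 2])
        w_right = abs(m[i + 1] - m[i])
        total = w_left + w_right
        t.append((w_left * left + w_right * right) / total if total > 0
                 else 0.5 * (left + right))
    out = []
    for q in x_new:
        # Index of the segment containing q, clamped to a valid segment.
        k = min(max(bisect.bisect_right(x, q) - 1, 0), n - 2)
        h = x[k + 1] - x[k]
        s = (q - x[k]) / h
        # Cubic Hermite basis functions on the unit interval.
        h00 = 2 * s**3 - 3 * s**2 + 1
        h10 = s**3 - 2 * s**2 + s
        h01 = -2 * s**3 + 3 * s**2
        h11 = s**3 - s**2
        out.append(h00 * y[k] + h10 * h * t[k]
                   + h01 * y[k + 1] + h11 * h * t[k + 1])
    return out

def decimated_axis(x, factor):
    """Evenly spaced axis with about len(x) / factor points (assumed layout)."""
    n_out = -(-len(x) // factor)          # ceiling division
    if n_out == 1:
        return [x[0]]
    step = (x[-1] - x[0]) / (n_out - 1)
    return [x[0] + i * step for i in range(n_out)]

time = [float(i) for i in range(10)]      # made-up time stamps
signal = [2.0 * v + 1.0 for v in time]    # made-up straight-line signal
new_time = decimated_axis(time, 3)
new_signal = akima_resample(time, signal, new_time)
```

Because the spline passes through every input sample, a spike in the input is carried into the fitted curve, which is why this option is best applied to data that has already been filtered.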

While the script is running, the bottom status bar reports its progress, for example:

Reading file 2024-04-09-08-08-14-quspin.csv
Detected angular columns: ['Heading']
Decimating with factor 10 using MEDIAN
Rows: 123456 → 12346
Writing result to 2024-04-09-08-08-14-quspin.csv
Done

When it finishes, the chart, map and column list refresh with the decimated dataset. Remember to Save if you want to keep the result on disk.

Decimation result

When to use which algorithm

  • MEDIAN — safe default for raw sensor data.
  • MEAN — slightly smoother decimation after a median / low-pass filter has already removed outliers.
  • INTERPOLATED — best when the decimated data will be plotted or compared with other resampled signals: it preserves the shape of the curve and produces evenly spaced output samples.
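
The difference between MEDIAN and MEAN on spiky data is easy to see on a single window. The values below are made up for illustration and only use Python's standard `statistics` module.

```python
import statistics

# One window of five samples with a single spike (made-up values).
window = [50000.0, 50001.0, 50002.0, 58000.0, 50001.0]

median_value = statistics.median(window)   # ignores the spike
mean_value = statistics.fmean(window)      # pulled toward the spike
```

Here the median stays at 50001.0, while the mean rises to about 51600.8, roughly 1600 units above every non-spike sample in the window.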

For very large files a two-step approach often works well: first decimate with a small factor (10..20) to bring the data to a manageable size, run any further processing, and only then decimate again down to the target resolution.
