Data Decimation

High-rate sensor data (magnetometer logs at 50–500 Hz, dense GNSS tracks, RSS radar streams) can easily reach hundreds of thousands of rows in a single file. Files of that size make plotting, gridding and any further processing slower than necessary, while the extra rows rarely add detail that survives the next downsampling step anyway.

The Data Decimation script reduces the row count of the opened CSV by a fixed factor while preserving the shape of every column — numeric, textual and angular alike. It is a generic tool: the script declares the "csv" wildcard in its templates, so it appears in the Scripts drop-down for any opened CSV file, regardless of the sensor it came from.

How it works

For every group of N consecutive rows (where N is the Decimation factor), the script keeps a single representative row. How that representative is built is controlled by the Algorithm parameter:

Algorithm	Numeric columns	Text / categorical columns	Angular columns (0–360°)
`MEDIAN` (default)	Median of the window	Value from the center sample of the window	Unwrapped over the window, median, wrapped back to 0–360°
`MEAN`	Mean of the window	Value from the center sample of the window	Unwrapped over the window, mean, wrapped back to 0–360°
`INTERPOLATED`	Akima 1-D spline resampled on the new, sparser time axis	Value from the nearest center sample	Unwrapped before interpolation, wrapped back to 0–360°
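The windowing rule for `MEDIAN` and `MEAN` can be sketched in plain Python. This is a minimal illustration, not the script's own code. The function names are hypothetical and columns are assumed to be plain lists. A shorter final window is assumed to still produce one output row, which matches the 123456 → 12346 example in the status output further down.

```python
import statistics
from typing import Sequence


def window_bounds(n_rows: int, factor: int) -> list[tuple[int, int]]:
    """Split row indices 0..n_rows-1 into consecutive [start, end) windows.

    The last window is shorter when n_rows is not a multiple of factor.
    """
    return [(start, min(start + factor, n_rows)) for start in range(0, n_rows, factor)]


def decimate_numeric(values: Sequence[float], factor: int, algorithm: str = "MEDIAN") -> list[float]:
    """Collapse each window of numeric samples into its median or mean."""
    reducers = {"MEDIAN": statistics.median, "MEAN": statistics.fmean}
    reduce_window = reducers[algorithm]
    return [reduce_window(values[a:b]) for a, b in window_bounds(len(values), factor)]


def decimate_text(values: Sequence[str], factor: int) -> list[str]:
    """Keep the value of the center sample of each window."""
    return [values[a + (b - a - 1) // 2] for a, b in window_bounds(len(values), factor)]


field = [1.0, 2.0, 3.0, 4.0, 100.0, 6.0, 7.0]
labels = ["a", "b", "c", "d", "e", "f", "g"]
print(decimate_numeric(field, 3))           # [2.0, 6.0, 7.0]
print(decimate_numeric(field, 3, "MEAN"))   # the 100.0 spike pulls the second window up
print(decimate_text(labels, 3))             # ['b', 'e', 'g']
```

With a factor of 3, the seven samples form the windows [1, 2, 3], [4, 100, 6] and [7]. The median of the middle window ignores the spike, and the mean does not.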

A few details worth knowing:

- If the file contains a Data valid column, rows where it equals 0 are dropped before decimation, so invalid samples never contribute to a window's median, mean or interpolated value.
- Angular columns are detected automatically. Any numeric column whose values stay within 0..360 and that contains jumps larger than 180° between neighbouring samples is treated as an angle (typical examples: heading, azimuth, yaw). Unwrapping prevents a sample at 359° and a sample at 1° from averaging to 180°.
- The script always works on a temporary copy of the opened file (see Getting Started with Python Scripts). The decimated result is reloaded into the chart but is not saved to disk until you press "Save" or "Save to...".
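The validity filter and the angular handling described above can be sketched as follows. This is a hedged illustration under stated assumptions:

- Rows are plain dicts read from the CSV.
- The `Data valid` flag may arrive as a string such as `"0"`.
- All helper names are hypothetical.

The detection function applies the stated rule literally: every value lies within 0..360, and at least one neighbour-to-neighbour jump exceeds 180°.

```python
import statistics
from typing import Sequence


def drop_invalid(rows: list[dict], valid_key: str = "Data valid") -> list[dict]:
    """Drop rows whose validity flag equals 0; keep every row if the column is absent."""
    return [r for r in rows if valid_key not in r or float(r[valid_key]) != 0.0]


def looks_angular(values: Sequence[float]) -> bool:
    """Detection rule: all values within 0..360 and a jump > 180 deg between neighbours."""
    in_range = all(0.0 <= v <= 360.0 for v in values)
    has_wrap_jump = any(abs(b - a) > 180.0 for a, b in zip(values, values[1:]))
    return in_range and has_wrap_jump


def unwrap_degrees(values: Sequence[float]) -> list[float]:
    """Remove 360-degree jumps so consecutive samples differ by the shortest signed step."""
    if not values:
        return []
    out = [float(values[0])]
    for v in values[1:]:
        prev = out[-1]
        step = (v - prev + 180.0) % 360.0 - 180.0  # shortest step, in [-180, 180)
        out.append(prev + step)
    return out


def angular_median(window: Sequence[float]) -> float:
    """Unwrap the window, take the median, then wrap the result back into 0..360."""
    return statistics.median(unwrap_degrees(window)) % 360.0


heading = [358.0, 359.0, 1.0, 2.0]
print(looks_angular(heading))       # True
print(statistics.median(heading))   # 180.0 (naive median, wrong for angles)
print(angular_median(heading))      # 0.0
```

After unwrapping, the heading window becomes [358, 359, 361, 362], whose median of 360 wraps back to 0°. The same unwrap-then-wrap pattern applies to the mean.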

Parameters

Parameter	Type	Default	Range	Description
Decimation factor	integer	`10`	`2..500`	Number of input rows collapsed into a single output row. A factor of `10` reduces a 100 000-row file to 10 000 rows.
Algorithm	enum	`MEDIAN`	`MEDIAN`, `MEAN`, `INTERPOLATED`	Aggregation method used to build the representative row.
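The effect of the Decimation factor on the row count is a ceiling division. This minimal sketch rests on two assumptions:

- A shorter last window still yields one row, as the example status output suggests.
- Rows dropped by the Data valid filter are removed before counting.

`output_rows` is a hypothetical helper, not part of the script.

```python
def output_rows(n_rows: int, factor: int) -> int:
    """Number of rows left after decimating n_rows valid rows by `factor`."""
    if not 2 <= factor <= 500:
        raise ValueError("Decimation factor must be in the range 2..500")
    return (n_rows + factor - 1) // factor  # integer ceiling division


print(output_rows(100_000, 10))   # 10000
print(output_rows(123_456, 10))   # 12346
print(output_rows(123_456, 100))  # 1235
```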

Running the script

Open a CSV file in GeoHammer ("Open files" or drag & drop) and select it in the file list so that its chart is active.

In the right-hand processing panel, switch to the "Scripts" tab and pick Data Decimation from the drop-down. Because of its wildcard csv template, the script is listed for every CSV file.

Figure: Selecting Data Decimation in the Scripts panel

Enter a Decimation factor. A factor of 10 is a safe starting point for most high-rate sensors. More aggressive values (e.g. 100) are convenient for previewing very large files or for producing lightweight overview maps.

Pick an Algorithm:
- MEDIAN — most robust to spikes and short bursts of noise. Use this by default for raw magnetometer / radiometer data with occasional outliers.
- MEAN — preserves the overall energy of the signal slightly better than median, but a single spike inside a window will pull the result toward it. Best after the data has already been cleaned.
- INTERPOLATED — fits an Akima spline through the original samples and resamples it at the new, sparser time axis. Produces the smoothest output and yields evenly spaced samples even when the input has small gaps. Use it on already-filtered data, since the spline does not reject outliers on its own.
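The `INTERPOLATED` mode can be sketched with a pure-Python Akima spline, which avoids assuming any particular library. This is an illustration, not the script's implementation. The helper names are hypothetical. The new time axis is assumed to span the first to the last timestamp with ceil(rows / factor) evenly spaced points. Angular columns would be unwrapped before calling it and wrapped with `% 360` afterwards. SciPy users can get a comparable spline from `scipy.interpolate.Akima1DInterpolator`.

```python
import bisect
from typing import Sequence


def akima_slopes(x: Sequence[float], y: Sequence[float]) -> list[float]:
    """Derivative at every knot, using Akima's weighting of neighbouring secant slopes."""
    n = len(x)
    if n < 3:
        raise ValueError("Akima interpolation needs at least 3 points")
    m = [(y[i + 1] - y[i]) / (x[i + 1] - x[i]) for i in range(n - 1)]
    # Extrapolate two extra secant slopes at each end, as in Akima's method.
    left1 = 2 * m[0] - m[1]
    left2 = 2 * left1 - m[0]
    right1 = 2 * m[-1] - m[-2]
    right2 = 2 * right1 - m[-1]
    mp = [left2, left1] + m + [right1, right2]  # mp[k + 2] is the secant slope m_k
    t = []
    for i in range(n):
        w1 = abs(mp[i + 3] - mp[i + 2])  # |m_{i+1} - m_i|
        w2 = abs(mp[i + 1] - mp[i])      # |m_{i-1} - m_{i-2}|
        if w1 + w2 == 0.0:
            t.append(0.5 * (mp[i + 1] + mp[i + 2]))  # degenerate case: plain average
        else:
            t.append((w1 * mp[i + 1] + w2 * mp[i + 2]) / (w1 + w2))
    return t


def akima_eval(x: Sequence[float], y: Sequence[float], t: Sequence[float], xq: float) -> float:
    """Evaluate the piecewise cubic Hermite spline defined by knots (x, y) and slopes t."""
    i = bisect.bisect_right(x, xq) - 1
    i = max(0, min(i, len(x) - 2))  # clamp to an existing segment
    h = x[i + 1] - x[i]
    s = (xq - x[i]) / h
    h00 = 2 * s**3 - 3 * s**2 + 1
    h10 = s**3 - 2 * s**2 + s
    h01 = -2 * s**3 + 3 * s**2
    h11 = s**3 - s**2
    return h00 * y[i] + h10 * h * t[i] + h01 * y[i + 1] + h11 * h * t[i + 1]


def resample_akima(x: Sequence[float], y: Sequence[float], factor: int):
    """Resample (x, y) onto an evenly spaced axis with ceil(len(x) / factor) points."""
    n_out = (len(x) + factor - 1) // factor
    t = akima_slopes(x, y)
    step = (x[-1] - x[0]) / (n_out - 1) if n_out > 1 else 0.0
    grid = [x[0] + k * step for k in range(n_out)]
    return grid, [akima_eval(x, y, t, g) for g in grid]


time_s = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
field = [2.0 * v + 1.0 for v in time_s]   # a straight line is reproduced exactly
print(resample_akima(time_s, field, 2))   # ([0.0, 2.5, 5.0], [1.0, 6.0, 11.0])
```

The spline passes through every original sample, so one spike in the input also bends the curve around it. This is why the mode is recommended only for already-filtered data.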

Click "Apply" to run the script on the current file, or "Apply to All" to decimate every opened CSV with the same parameters.

While the script is running, the bottom status bar reports its progress, for example:

```
Reading file 2024-04-09-08-08-14-quspin.csv
Detected angular columns: ['Heading']
Decimating with factor 10 using MEDIAN
Rows: 123456 → 12346
Writing result to 2024-04-09-08-08-14-quspin.csv
Done
```

When it finishes, the chart, map and column list refresh with the decimated dataset. Remember to press "Save" (or "Save to...") if you want to keep the result on disk.

When to use which algorithm

- MEDIAN: safe default for raw sensor data.
- MEAN: slightly smoother decimation once a median or low-pass filter has already removed outliers.
- INTERPOLATED: best when the decimated data will be plotted or compared with other resampled signals, since it preserves the shape of the curve and produces evenly spaced output samples.
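The difference between the first two rules is easy to see on a single window containing one spike. The values below are made up for illustration. Python's `statistics` module stands in for whatever the script uses internally.

```python
import statistics

# One decimation window of 10 samples (made-up values) with a single spike.
window = [50.0, 51.0, 49.0, 50.0, 950.0, 50.0, 49.0, 51.0, 50.0, 50.0]

print(statistics.median(window))  # 50.0: the spike is ignored
print(statistics.fmean(window))   # 140.0: the spike drags the whole window up
```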

For very large files a two-step approach often works well: first decimate with a small factor (10..20) to bring the data to a manageable size, run any further processing, and only then decimate again down to the target resolution.
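The two-step approach can be sketched with a hypothetical median-only helper and a synthetic signal. Neither the helper nor the signal comes from the script. The sketch compares decimating 123 456 samples straight to a factor of 100 against two passes of 10.

```python
import statistics
from typing import Sequence


def decimate_median(values: Sequence[float], factor: int) -> list[float]:
    """Median of each run of `factor` consecutive samples (the last run may be shorter)."""
    return [statistics.median(values[i:i + factor]) for i in range(0, len(values), factor)]


# Hypothetical high-rate signal with 123 456 samples.
signal = [float((i * 7919) % 1000) for i in range(123_456)]

one_pass = decimate_median(signal, 100)   # straight to the target resolution
coarse = decimate_median(signal, 10)      # step 1: bring the data to a manageable size
# ... further processing of `coarse` would happen here ...
two_pass = decimate_median(coarse, 10)    # step 2: down to the target resolution

print(len(coarse), len(one_pass), len(two_pass))  # 12346 1235 1235
```

Both routes end with the same row count, because nested ceiling divisions compose: ceil(ceil(n/10)/10) = ceil(n/100). The values can still differ, since a median of window medians is not in general the median of the full 100-sample window.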
