Baseline Strategies

A baseline strategy determines which historical snapshot is used as the reference point when comparing a new run. FlameIQ v1.0 provides three strategies, each suited to different workflows.

Overview

Strategy	How it selects the baseline	Best for
`last_successful`	Most recently stored snapshot	Simple projects, low-noise CI
`rolling_median`	Median of the last N snapshots	Shared CI runners, noisy benchmarks
`tagged`	Snapshot with a specific release tag	Release-to-release comparisons

last_successful

Uses the single most recent snapshot saved via flameiq baseline set.

This is the default and simplest strategy. On every merge to main, the baseline advances to the latest measurements.

baseline:
  strategy: last_successful

Workflow:

main branch:
  commit A → flameiq baseline set  →  baseline = A
  commit B → flameiq baseline set  →  baseline = B
  ...

PR branch:
  current commit → flameiq compare → compared against B (latest)

Characteristics:

Deterministic: same baseline file → same result
Sensitive to one-off performance spikes on main
Requires flameiq baseline set to be run on every main commit

rolling_median

Computes a synthetic baseline from the median values across the last N snapshots. This filters out one-off measurement noise that can cause false regressions or false passes.

baseline:
  strategy: rolling_median
  rolling_window: 5

How the synthetic baseline is computed:

For each metric key (e.g. latency.p95), FlameIQ collects the values from the last N stored snapshots and computes the median:

\[\text{baseline}_{key} = \text{median}(v_1, v_2, \ldots, v_N)\]

The synthetic snapshot uses the most recent snapshot’s metadata (commit, branch, tags) and is marked with tags["flameiq_synthetic"] = "rolling_median".

Choosing a window size:

Window	Guidance
3	Minimal smoothing. Responsive to real changes.
5 (default)	Balanced. Recommended for most projects.
10	Heavy smoothing. Good for very noisy CI.

Characteristics:

Immune to single outlier runs
Requires at least N prior runs before becoming fully effective
The first run after flameiq init uses only 1 snapshot regardless

tagged

Uses a snapshot explicitly tagged with a label such as "v1.0.0". All subsequent comparisons are made against that fixed point, regardless of how many other baselines have been set in between.

Tag a release:

# After your v1.0.0 release:
flameiq baseline set \
  --metrics release_v1.0.0.json \
  --tag v1.0.0

Compare PRs against v1.0.0:

baseline:
  strategy: tagged
  # FlameIQ will search history for any snapshot tagged "v1.0.0"

Characteristics:

Pinned reference — comparisons are always against the same baseline
Ideal for main development comparing against a release tag
Requires --tag to have been used when setting the baseline

Switching strategies

You can switch strategies at any time by editing flameiq.yaml. The strategy only affects how flameiq compare selects the baseline — all stored history is retained.

# Change strategy by editing flameiq.yaml, then re-run comparison:
flameiq compare --metrics current.json --fail-on-regression

The new strategy takes effect immediately.