Configuration Reference

FlameIQ is configured via flameiq.yaml in your project root. Run flameiq init to generate the file with annotated defaults.

Complete example

# flameiq.yaml
# Full reference: https://docs.flameiq.dev/guides/configuration.html

# ── Regression Thresholds ────────────────────────────────────────────────
thresholds:
  latency.mean:  10%    # Allow up to 10% mean latency increase
  latency.p50:   10%
  latency.p95:   10%    # Primary regression signal — recommended
  latency.p99:   15%    # Tail latency — wider tolerance
  throughput:    -5%    # Allow up to 5% throughput decrease
  memory_mb:      8%    # Allow up to 8% memory increase
  cpu_percent:   10%
  custom.score:   5%    # User-defined metric

# ── Baseline Management ──────────────────────────────────────────────────
baseline:
  strategy: rolling_median    # last_successful | rolling_median | tagged
  rolling_window: 5           # Used only with rolling_median

# ── Statistical Significance Testing ────────────────────────────────────
statistics:
  enabled: false              # Enable Mann-Whitney U test
  confidence: 0.95            # 95% confidence level

# ── Noise Handling ───────────────────────────────────────────────────────
noise:
  warmup_runs: 0              # Discard N leading samples before computing

# ── Metric Provider ──────────────────────────────────────────────────────
provider: json                # json | pytest-benchmark

Thresholds

Thresholds define how much change is allowed before a metric is declared a regression. They are percent strings with an optional sign:

thresholds:
  latency.p95: 10%     # ← positive: allow up to 10% increase
  throughput:  -5%     # ← negative: allow up to 5% decrease

Direction semantics

FlameIQ applies direction-aware logic for known metrics:

Metric

Regression when…

Example threshold

latency.*

change_percent > +threshold

10%

memory_mb

change_percent > +threshold

8%

cpu_percent

change_percent > +threshold

10%

throughput

change_percent < -(threshold)

-5% or 5%

custom.*

abs(change_percent) > abs(threshold)

5%

The default threshold is 10% for any metric not explicitly configured.

Per-metric threshold examples

thresholds:
  # Strict: production API latency gate
  latency.p95:  5%
  latency.p99: 10%

  # Relaxed: noisy ML inference benchmark
  latency.mean: 20%

  # Throughput: allow 10% drop
  throughput: 10%

  # Custom metrics from your benchmark
  custom.tokens_per_second: -8%
  custom.db_query_ms:        5%

Baseline strategies

The baseline strategy controls which snapshot is used as the reference point for each comparison.

last_successful (default)

Use the most recently stored snapshot. Simple, predictable, and correct for most use cases.

baseline:
  strategy: last_successful

rolling_median

Compute a synthetic baseline from the median of the last N snapshots. More resistant to noise from a single outlier run. Recommended for benchmarks running on shared CI infrastructure.

baseline:
  strategy: rolling_median
  rolling_window: 5    # Use last 5 runs (default)

The synthetic baseline uses median values for each metric key individually.

tagged

Use a snapshot explicitly tagged with a release label. Useful for comparing all PRs against a known-good release.

baseline:
  strategy: tagged

Tag a snapshot at release time:

flameiq baseline set --metrics release.json --tag v1.0.0

Future PRs will compare against the v1.0.0 baseline.

Statistical mode

When enabled, FlameIQ applies the Mann-Whitney U test alongside threshold comparison. A regression is only declared if both the threshold is exceeded and the difference is statistically significant.

statistics:
  enabled: true
  confidence: 0.95    # 95% confidence (α = 0.05)

Requires at least 3 samples per metric (configurable).

Note

Statistical mode requires your benchmark to produce sample arrays rather than single summary values. See the Statistical Methodology Specification specification for details.

Noise handling

If your benchmark framework produces multiple raw timing samples, FlameIQ can discard leading warmup samples before computing medians:

noise:
  warmup_runs: 2    # Discard the first 2 samples

Config file location

By default FlameIQ looks for flameiq.yaml in the current working directory. Override with the global --config option:

flameiq --config path/to/custom.yaml compare --metrics metrics.json