flameiq.core — Comparison Engine

Comparator

FlameIQ deterministic comparison engine.

This is the most critical module in the codebase. It compares a current PerformanceSnapshot against a baseline and produces a ComparisonResult.

Determinism guarantee

Given identical inputs this module always produces identical outputs.

  • No randomness of any kind.

  • No datetime.now() calls.

  • No network I/O.

  • Floating-point arithmetic is explicit and documented.

  • All rounding uses Python’s built-in round() with fixed precision.

Floating-point policy

change_percent is computed as:

((current - baseline) / baseline) * 100

rounded to 4 decimal places for stable threshold comparisons. Division by zero is guarded: if baseline == 0, a ComparisonError is raised internally, then caught and logged as a warning, and the metric is skipped.
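The policy above can be illustrated with a minimal sketch. It is not the FlameIQ source: the names PRECISION, change_percent, and safe_change_percent are hypothetical, and the local ComparisonError class only stands in for flameiq.core.errors.ComparisonError.

```python
import logging

logger = logging.getLogger("flameiq.sketch")  # hypothetical logger name

PRECISION = 4  # mirrors the documented 4 decimal places; the name is an assumption


class ComparisonError(Exception):
    """Local stand-in for flameiq.core.errors.ComparisonError."""


def change_percent(baseline: float, current: float) -> float:
    """Signed percentage change, rounded with the built-in round()."""
    if baseline == 0:
        raise ComparisonError("baseline is zero; percentage change is undefined")
    return round(((current - baseline) / baseline) * 100, PRECISION)


def safe_change_percent(key: str, baseline: float, current: float):
    """Return the change, or None when the metric has to be skipped."""
    try:
        return change_percent(baseline, current)
    except ComparisonError as exc:
        # The error is caught and logged rather than propagated.
        logger.warning("skipping %s: %s", key, exc)
        return None
```

Rounding before the threshold comparison means that two runs with the same inputs compare the same 4-decimal value against the threshold, which is what makes the pass / regression decision reproducible.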

flameiq.core.comparator.compute_change_percent(baseline, current)[source]

Compute the signed percentage change from baseline to current.

Formula:

((current - baseline) / baseline) * 100

Rounded to _CHANGE_PERCENT_PRECISION decimal places.

Parameters:
  • baseline (float) – Reference value. Must be non-zero.

  • current (float) – Measured value.

Returns:

Signed percentage change, rounded to 4 decimal places. Positive means current is larger than baseline.

Raises:

ComparisonError – If baseline is exactly zero.

Return type:

float

Examples:

compute_change_percent(100.0, 110.0)  # →  10.0
compute_change_percent(100.0,  90.0)  # → -10.0
compute_change_percent(100.0, 100.0)  # →   0.0
flameiq.core.comparator.compare_snapshots(baseline, current, threshold_config=None, warning_margin_percent=5.0)[source]

Compare current against baseline and return a full diff.

For every metric present in the baseline, the engine:

  1. Computes change_percent via compute_change_percent().

  2. Looks up the configured threshold (or applies the default).

  3. Calls evaluate_threshold() to determine pass / warning / regression.

Metrics present in current but absent from baseline are ignored — they have no reference value and cannot regress.

Parameters:
  • baseline (PerformanceSnapshot) – The reference snapshot.

  • current (PerformanceSnapshot) – The snapshot under evaluation.

  • threshold_config (dict[str, str | float] | None) – Raw threshold dict from flameiq.yaml, e.g. {"latency.p95": "10%"}. Falls back to defaults if None.

  • warning_margin_percent (float) – Distance from threshold that triggers a WARNING instead of PASS.

Returns:

A ComparisonResult with complete per-metric diffs and an overall RegressionStatus.

Return type:

ComparisonResult
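The per-metric loop described for compare_snapshots() can be sketched with plain dicts in place of PerformanceSnapshot objects. This is a simplified illustration under several assumptions: sketch_compare is a hypothetical name, every metric is treated as an unknown / custom metric (absolute deviation), the warning margin is read as "within warning_margin percentage points of the threshold", and a baseline metric that is missing from current is simply skipped. The real engine may differ on each of these points.

```python
DEFAULT_THRESHOLD = 10.0  # matches the documented default allowance


def sketch_compare(baseline, current, thresholds=None, warning_margin=5.0):
    """Classify each baseline metric as 'pass', 'warning' or 'regression'."""
    thresholds = thresholds or {}
    outcome = {}
    for key in sorted(baseline):
        # Assumption: skip metrics with no current value or a zero baseline.
        if key not in current or baseline[key] == 0:
            continue
        change = round((current[key] - baseline[key]) / baseline[key] * 100, 4)
        limit = abs(thresholds.get(key, DEFAULT_THRESHOLD))
        if abs(change) > limit:
            outcome[key] = "regression"
        elif abs(change) > limit - warning_margin:
            outcome[key] = "warning"  # inside the threshold but close to it
        else:
            outcome[key] = "pass"
    return outcome


result = sketch_compare(
    {"latency.p95": 100.0, "memory_mb": 200.0, "cpu": 50.0},
    {"latency.p95": 112.0, "memory_mb": 214.0, "cpu": 51.0, "new_metric": 1.0},
)
print(result)
```

Here "new_metric" never appears in the result because the loop iterates over the baseline keys only.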

Models

FlameIQ core domain models.

These dataclasses represent the results of FlameIQ operations. They are distinct from schema models (flameiq.schema.v1.models), which represent input data.

No external dependencies.

class flameiq.core.models.RegressionStatus(*values)[source]

Bases: str, Enum

The overall outcome of a baseline-vs-current comparison.

PASS = 'pass'

All metrics are within their configured thresholds.

REGRESSION = 'regression'

One or more metrics exceeded their threshold.

WARNING = 'warning'

No threshold breached, but metrics are approaching limits.

INSUFFICIENT_DATA = 'insufficient_data'

Statistical mode requested but sample count too low.
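Because the enum mixes in str, its members compare equal to their string values and serialise to JSON without a custom encoder. The sketch below redefines the enum locally from the values listed above, so it runs without FlameIQ installed.

```python
import json
from enum import Enum


class RegressionStatus(str, Enum):
    """Local copy of the documented members, for illustration only."""

    PASS = "pass"
    REGRESSION = "regression"
    WARNING = "warning"
    INSUFFICIENT_DATA = "insufficient_data"


status = RegressionStatus("regression")        # look up a member by value
print(status is RegressionStatus.REGRESSION)   # True
print(status == "regression")                  # True, thanks to the str mixin
print(json.dumps({"status": status}))          # {"status": "regression"}
```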

class flameiq.core.models.MetricDiff(metric_key, baseline_value, current_value, change_percent, threshold_percent, is_regression, is_warning=False, p_value=None, effect_size=None)[source]

Bases: object

The computed difference for a single metric key.

Parameters:
metric_key: str

Dotted key, e.g. "latency.p95".

baseline_value: float

The reference measurement.

current_value: float

The current measurement.

change_percent: float

Signed % change. Positive = current is larger.

threshold_percent: float

Configured allowance for this metric.

is_regression: bool

Whether the threshold was exceeded.

is_warning: bool = False

Within threshold but approaching the limit.

p_value: float | None = None

Statistical p-value (if statistical mode used).

effect_size: float | None = None

Cohen’s d (if statistical mode used).

property direction: str

Human-readable direction: 'increased', 'decreased', or 'unchanged'.

property abs_change_percent: float

Absolute magnitude of the change.

__init__(metric_key, baseline_value, current_value, change_percent, threshold_percent, is_regression, is_warning=False, p_value=None, effect_size=None)
Return type:

None

property status_label: str

Short status string suitable for display.
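To make the field layout concrete, here is a sketch of a dataclass with the same fields and the two simplest properties. MetricDiffSketch is a hypothetical name, and deriving direction from the sign of change_percent is an assumption; status_label is left out because its format is not documented here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MetricDiffSketch:
    """Illustrative stand-in for flameiq.core.models.MetricDiff."""

    metric_key: str
    baseline_value: float
    current_value: float
    change_percent: float
    threshold_percent: float
    is_regression: bool
    is_warning: bool = False
    p_value: Optional[float] = None
    effect_size: Optional[float] = None

    @property
    def direction(self) -> str:
        # Assumption: direction follows the sign of change_percent.
        if self.change_percent > 0:
            return "increased"
        if self.change_percent < 0:
            return "decreased"
        return "unchanged"

    @property
    def abs_change_percent(self) -> float:
        return abs(self.change_percent)


diff = MetricDiffSketch("latency.p95", 100.0, 112.0, 12.0, 10.0, is_regression=True)
print(diff.direction, diff.abs_change_percent)
```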

class flameiq.core.models.ComparisonResult(status, diffs=<factory>, baseline_commit=None, current_commit=None, statistical_mode=False, summary=None)[source]

Bases: object

The complete result of a baseline-vs-current comparison.

__init__(status, diffs=<factory>, baseline_commit=None, current_commit=None, statistical_mode=False, summary=None)
Return type:

None

status: RegressionStatus

Overall pass/regression/warning outcome.

diffs: list[MetricDiff]

Per-metric differences, in metric-key order.

baseline_commit: str | None = None

Git SHA of the baseline snapshot, if available.

current_commit: str | None = None

Git SHA of the current snapshot, if available.

statistical_mode: bool = False

Whether the Mann-Whitney U test was applied.

summary: str | None = None

Optional human-readable summary string.

property regressions: list[MetricDiff]

Metrics that breached their threshold.

property warnings: list[MetricDiff]

Metrics within threshold but approaching the limit.

property passed: list[MetricDiff]

Metrics that passed cleanly, with no warning.

property exit_code: int

Standard CI exit code.

Returns:

0 for PASS / WARNING, 1 for REGRESSION.

to_dict()[source]

Serialise to a plain dict (e.g. for --json CLI output).

Return type:

dict[str, object]
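The three list properties and exit_code can be pictured as simple filters over diffs. The sketch below is an assumption-laden illustration: Diff is a trimmed-down hypothetical stand-in for MetricDiff, and the exit code is derived from the diffs rather than from the status field, which the real property may use instead. How INSUFFICIENT_DATA maps to an exit code is not stated above, so the sketch does not model it.

```python
from dataclasses import dataclass


@dataclass
class Diff:
    """Hypothetical, trimmed-down stand-in for MetricDiff."""

    metric_key: str
    is_regression: bool
    is_warning: bool = False


def partition(diffs):
    """Split diffs into (regressions, warnings, passed)."""
    regressions = [d for d in diffs if d.is_regression]
    warnings = [d for d in diffs if d.is_warning and not d.is_regression]
    passed = [d for d in diffs if not d.is_regression and not d.is_warning]
    return regressions, warnings, passed


def exit_code(diffs) -> int:
    """0 when nothing regressed (PASS or WARNING), 1 otherwise."""
    regressions, _, _ = partition(diffs)
    return 1 if regressions else 0


diffs = [
    Diff("cpu_percent", is_regression=False),
    Diff("latency.p95", is_regression=True),
    Diff("memory_mb", is_regression=False, is_warning=True),
]
print(exit_code(diffs))
```

In a CI script the value would typically be passed to sys.exit() so that a regression fails the job while a warning does not.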

Thresholds

FlameIQ threshold configuration and evaluation.

Thresholds are specified in flameiq.yaml as percent strings:

thresholds:
  latency.p95:  10%     # Allow up to 10% latency increase
  throughput:   -5%     # Allow up to 5% throughput decrease
  memory_mb:     8%     # Allow up to 8% memory increase

Sign convention:

  • Positive threshold (e.g. 10%) → allow up to +10% increase.

  • Negative threshold (e.g. -5%) → allow up to 5% decrease.

  • For known metrics the direction is inferred automatically (see evaluate_threshold()).

Default threshold: 10% in either direction for unknown metrics.

flameiq.core.thresholds.DEFAULT_THRESHOLD_PERCENT: float = 10.0

Default allowance applied when no explicit threshold is configured.

flameiq.core.thresholds.parse_threshold(key, raw)[source]

Parse a raw threshold value into a signed float.

Parameters:
  • key (str) – Metric key (used only for error messages).

  • raw (str | float | int) – A percent string ("10%", "-5%") or a numeric value.

Returns:

Float percentage, e.g. 10.0 or -5.0.

Raises:

ThresholdConfigError – If the string is not a valid percent.

Return type:

float

Examples:

parse_threshold("latency.p95", "10%")  # → 10.0
parse_threshold("throughput",  "-5%")  # → -5.0
parse_threshold("memory_mb",   10.0)   # → 10.0
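One way such a parser could work is sketched below. It is not the FlameIQ implementation: parse_threshold_sketch is a hypothetical name, the error message format is invented, and rejecting booleans and bare numeric strings without a percent sign are assumptions about edge cases the text above does not cover.

```python
class ThresholdConfigError(Exception):
    """Local stand-in for flameiq.core.errors.ThresholdConfigError."""

    def __init__(self, key, value, reason):
        # The message format is an assumption made for this sketch.
        super().__init__(f"invalid threshold for {key!r}: {value!r} ({reason})")
        self.key, self.value, self.reason = key, value, reason


def parse_threshold_sketch(key, raw):
    """Turn '10%', '-5%' or a plain number into a signed float."""
    if isinstance(raw, bool):  # bool is a subclass of int, so check it first
        raise ThresholdConfigError(key, str(raw), "booleans are not thresholds")
    if isinstance(raw, (int, float)):
        return float(raw)
    text = str(raw).strip()
    if not text.endswith("%"):
        raise ThresholdConfigError(key, text, "expected a percent string such as '10%'")
    try:
        return float(text[:-1].strip())
    except ValueError:
        raise ThresholdConfigError(key, text, "not a number") from None
```

Carrying key, value, and reason on the exception lets a caller report exactly which entry in the configuration is at fault.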
flameiq.core.thresholds.evaluate_threshold(metric_key, change_percent, threshold_percent)[source]

Determine whether a change_percent breaches its threshold.

Direction semantics:

  • Higher-is-worse metrics (latency.*, memory_mb, cpu_percent): a regression is change_percent > +threshold.

  • Lower-is-worse metrics (throughput): a regression is change_percent < -abs(threshold).

  • Unknown / custom metrics: any absolute deviation beyond the threshold is flagged as a regression.

Parameters:
  • metric_key (str) – Dotted metric name.

  • change_percent (float) – Signed percent change (positive = increased).

  • threshold_percent (float) – The configured threshold as a signed float percentage, e.g. 10.0 or -5.0.

Returns:

True if the change is a regression.

Return type:

bool
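The direction semantics can be sketched as follows. The lookup tables and the function name are hypothetical, matching latency.* by prefix is an assumption, and the sketch takes abs() of the threshold in every branch, whereas the text above only states that explicitly for lower-is-worse metrics.

```python
# Assumed classification tables, built from the metric names listed above.
HIGHER_IS_WORSE_PREFIXES = ("latency.",)
HIGHER_IS_WORSE_KEYS = {"memory_mb", "cpu_percent"}
LOWER_IS_WORSE_KEYS = {"throughput"}


def evaluate_threshold_sketch(metric_key, change_percent, threshold_percent):
    """Return True when the change counts as a regression."""
    limit = abs(threshold_percent)
    if metric_key in HIGHER_IS_WORSE_KEYS or metric_key.startswith(HIGHER_IS_WORSE_PREFIXES):
        return change_percent > limit       # only increases can regress
    if metric_key in LOWER_IS_WORSE_KEYS:
        return change_percent < -limit      # only decreases can regress
    return abs(change_percent) > limit      # unknown metric: either direction


print(evaluate_threshold_sketch("latency.p95", 12.0, 10.0))   # True
print(evaluate_threshold_sketch("latency.p95", -30.0, 10.0))  # False
print(evaluate_threshold_sketch("throughput", -6.0, -5.0))    # True
```

Note that a large improvement, such as a 30% latency drop, is never flagged for a known metric, while the same swing on a custom metric would be.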

flameiq.core.thresholds.build_threshold_map(raw_config)[source]

Parse a full threshold config dict into float values.

Parameters:

raw_config (dict[str, str | float]) – Raw mapping from flameiq.yaml, e.g. {"latency.p95": "10%", "throughput": "-5%"}.

Returns:

Parsed mapping of metric key → float threshold.

Raises:

ThresholdConfigError – If any value is invalid.

Return type:

dict[str, float]

Errors

FlameIQ exception hierarchy.

All FlameIQ exceptions derive from FlameIQError. This lets callers catch the entire FlameIQ surface with a single except FlameIQError, or target specific classes for fine-grained handling.

Rule: Never raise a bare Exception anywhere in the FlameIQ codebase. Always raise from this hierarchy.

Hierarchy:

FlameIQError
├── ValidationError
│   └── SchemaVersionError
├── ConfigurationError
│   └── ThresholdConfigError
├── BaselineError
│   ├── BaselineNotFoundError
│   └── BaselineCorruptedError
├── ProviderError
│   ├── ProviderNotFoundError
│   └── MetricsFileNotFoundError
├── ComparisonError
│   └── InsufficientSamplesError
└── StorageError
    └── MigrationError
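The practical benefit of a single root is shown in the sketch below, which recreates a small slice of the tree locally (without the constructor arguments the real classes take) and orders the except clauses from most to least specific. The describe helper and its return strings are hypothetical.

```python
class FlameIQError(Exception):
    """Local stand-ins for a few classes from the hierarchy above."""


class ConfigurationError(FlameIQError):
    pass


class ThresholdConfigError(ConfigurationError):
    pass


class BaselineError(FlameIQError):
    pass


class BaselineNotFoundError(BaselineError):
    pass


def describe(exc: Exception) -> str:
    """Show which handler a given exception lands in."""
    try:
        raise exc
    except BaselineNotFoundError:
        return "no baseline yet"           # fine-grained handling
    except ConfigurationError:
        return "configuration problem"     # also catches ThresholdConfigError
    except FlameIQError:
        return "other FlameIQ failure"     # catch-all for the whole surface


print(describe(ThresholdConfigError("bad threshold")))
```

Anything that is not a FlameIQError, such as a KeyError from unrelated code, passes through describe() untouched, which is the point of never raising a bare Exception.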
exception flameiq.core.errors.FlameIQError[source]

Bases: Exception

Base class for all FlameIQ exceptions.

exception flameiq.core.errors.ValidationError[source]

Bases: FlameIQError

Raised when a snapshot fails schema validation.

exception flameiq.core.errors.SchemaVersionError(version)[source]

Bases: ValidationError

Raised when an unsupported schema version is encountered.

Parameters:

version (int) – The unsupported version number that was encountered.

__init__(version)[source]

Initialize the error with the unsupported version number.

Parameters:

version (int)

Return type:

None

exception flameiq.core.errors.ConfigurationError[source]

Bases: FlameIQError

Raised when flameiq.yaml is missing, malformed, or invalid.

exception flameiq.core.errors.ThresholdConfigError(key, value, reason)[source]

Bases: ConfigurationError

Raised when a threshold value cannot be parsed or is out of range.

Parameters:
  • key (str) – The metric key the threshold applies to.

  • value (str) – The raw invalid threshold string.

  • reason (str) – Human-readable explanation of the error.

__init__(key, value, reason)[source]

Initialize the error with the invalid threshold details.

Parameters:
  • key (str)

  • value (str)

  • reason (str)

Return type:

None

exception flameiq.core.errors.BaselineError[source]

Bases: FlameIQError

Raised for baseline management failures.

exception flameiq.core.errors.BaselineNotFoundError(path)[source]

Bases: BaselineError

Raised when no baseline snapshot exists for the current context.

Parameters:

path (str) – The filesystem path where the baseline was expected.

__init__(path)[source]

Initialize the error with the expected baseline path.

Parameters:

path (str)

Return type:

None

exception flameiq.core.errors.BaselineCorruptedError(path, reason)[source]

Bases: BaselineError

Raised when a baseline file exists but cannot be deserialised.

Parameters:
  • path (str) – Path to the corrupted file.

  • reason (str) – Description of the parse failure.

__init__(path, reason)[source]

Initialize the error with the corrupted file details.

Parameters:
  • path (str)

  • reason (str)

Return type:

None

exception flameiq.core.errors.ProviderError[source]

Bases: FlameIQError

Raised when a metric provider fails to collect or normalise data.

exception flameiq.core.errors.ProviderNotFoundError(name)[source]

Bases: ProviderError

Raised when a requested provider name is not registered.

Parameters:

name (str) – The requested provider name.

__init__(name)[source]

Initialize the error with the missing provider name.

Parameters:

name (str)

Return type:

None

exception flameiq.core.errors.MetricsFileNotFoundError(path)[source]

Bases: ProviderError

Raised when the metrics source file does not exist.

Parameters:

path (str) – The path that was not found.

__init__(path)[source]

Initialize the error with the missing file path.

Parameters:

path (str)

Return type:

None

exception flameiq.core.errors.ComparisonError[source]

Bases: FlameIQError

Raised when a comparison cannot be completed.

exception flameiq.core.errors.InsufficientSamplesError(metric, got, required)[source]

Bases: ComparisonError

Raised when statistical mode is enabled but sample count is too low.

Parameters:
  • metric (str) – The metric name with insufficient samples.

  • got (int) – Number of samples available.

  • required (int) – Minimum samples required.

__init__(metric, got, required)[source]

Initialize the error with the insufficient samples details.

Parameters:
  • metric (str)

  • got (int)

  • required (int)

Return type:

None

exception flameiq.core.errors.StorageError[source]

Bases: FlameIQError

Raised for storage read/write failures.

exception flameiq.core.errors.MigrationError[source]

Bases: StorageError

Raised when a storage schema migration fails.