flameiq.core — Comparison Engine

Comparator

FlameIQ deterministic comparison engine.

This is the most critical module in the codebase. It compares a current PerformanceSnapshot against a baseline and produces a ComparisonResult.

Determinism guarantee

Given identical inputs this module always produces identical outputs.

No randomness of any kind.
No datetime.now() calls.
No network I/O.
Floating-point arithmetic is explicit and documented.
All rounding uses Python’s built-in round() with fixed precision.

Floating-point policy

change_percent is computed as:

((current - baseline) / baseline) * 100

rounded to 4 decimal places for stable threshold comparisons. Division by zero is guarded: if baseline == 0, a ComparisonError is raised internally, then caught and logged, and the metric is skipped with a warning.

flameiq.core.comparator.compute_change_percent(baseline, current)[source]

Compute the signed percentage change from baseline to current.

Formula:

((current - baseline) / baseline) * 100

Rounded to _CHANGE_PERCENT_PRECISION decimal places.

Parameters:

baseline (float) – Reference value. Must be non-zero.
current (float) – Measured value.

Returns:

Signed percentage change, rounded to 4 decimal places. Positive means current is larger than baseline.

Raises:

ComparisonError – If baseline is exactly zero.

Return type:

float

Examples:

compute_change_percent(100.0, 110.0)  # →  10.0
compute_change_percent(100.0,  90.0)  # → -10.0
compute_change_percent(100.0, 100.0)  # →   0.0
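The formula, rounding, and zero guard described above can be sketched as follows. This is a minimal illustration, not the FlameIQ source: the ComparisonError class here is a local stand-in, the value 4 for _CHANGE_PERCENT_PRECISION is assumed from the "4 decimal places" stated in the floating-point policy, and the error message is invented.

```python
_CHANGE_PERCENT_PRECISION = 4  # assumed value, taken from the documented "4 decimal places"


class ComparisonError(Exception):
    """Local stand-in for flameiq.core.errors.ComparisonError."""


def compute_change_percent(baseline: float, current: float) -> float:
    """Signed percent change from baseline to current, rounded to fixed precision."""
    if baseline == 0:
        # A relative change against a zero reference is undefined.
        raise ComparisonError("baseline is zero; percent change is undefined")
    return round(((current - baseline) / baseline) * 100, _CHANGE_PERCENT_PRECISION)


print(compute_change_percent(100.0, 110.0))  # 10.0
print(compute_change_percent(97.1, 103.3))   # 6.3852
```

Rounding before the threshold comparison means that two runs producing the same raw measurements always yield the same change_percent value.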

flameiq.core.comparator.compare_snapshots(baseline, current, threshold_config=None, warning_margin_percent=5.0)[source]

Compare current against baseline and return a full diff.

For every metric present in the baseline, the engine:

Computes change_percent via compute_change_percent().
Looks up the configured threshold (or applies the default).
Calls evaluate_threshold() to determine pass / warning / regression.

Metrics present in current but absent from baseline are ignored — they have no reference value and cannot regress.

Parameters:

baseline (PerformanceSnapshot) – The reference snapshot.
current (PerformanceSnapshot) – The snapshot under evaluation.
threshold_config (dict[str, str | float] | None) – Raw threshold dict from flameiq.yaml, e.g. {"latency.p95": "10%"}. Falls back to defaults if None.
warning_margin_percent (float) – Distance from threshold that triggers a WARNING instead of PASS.

Returns:

A ComparisonResult with complete per-metric diffs and an overall RegressionStatus.

Return type:

ComparisonResult
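The per-metric loop described above can be sketched with plain dicts in place of PerformanceSnapshot, whose structure is not shown in this reference. Everything here is hypothetical: the function name compare_metrics, the tuple return shape, the simplified direction-agnostic check, and the reading of warning_margin_percent as "within that many percentage points of the threshold" are assumptions, as is skipping baseline metrics that are missing from current.

```python
def compare_metrics(baseline, current, thresholds=None,
                    default_threshold=10.0, warning_margin_percent=5.0):
    """Return {metric_key: (change_percent, status)} for metrics in the baseline."""
    thresholds = thresholds or {}
    results = {}
    for key in sorted(baseline):  # iterate baseline keys only, in a stable order
        if key not in current or baseline[key] == 0:
            continue  # assumption: nothing to compare, so the metric is skipped
        change = round((current[key] - baseline[key]) / baseline[key] * 100, 4)
        limit = abs(thresholds.get(key, default_threshold))
        if abs(change) > limit:
            status = "regression"
        elif abs(change) >= limit - warning_margin_percent:
            status = "warning"  # within the margin of the threshold
        else:
            status = "pass"
        results[key] = (change, status)
    return results


baseline = {"latency.p95": 100.0, "memory_mb": 200.0, "cpu_percent": 50.0}
current = {"latency.p95": 112.0, "memory_mb": 202.0, "cpu_percent": 53.5, "new_metric": 1.0}
for key, outcome in compare_metrics(baseline, current).items():
    print(key, outcome)
```

In this sketch new_metric never appears in the output because only baseline keys are iterated, matching the rule that metrics absent from the baseline are ignored.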

Models

FlameIQ core domain models.

These dataclasses represent the results of FlameIQ operations. They are distinct from schema models (flameiq.schema.v1.models), which represent input data.

No external dependencies.

class flameiq.core.models.RegressionStatus(*values)[source]

Bases: str, Enum

The overall outcome of a baseline-vs-current comparison.

PASS = 'pass': All metrics are within their configured thresholds.

REGRESSION = 'regression': One or more metrics exceeded their threshold.

WARNING = 'warning': No threshold breached, but metrics are approaching limits.

INSUFFICIENT_DATA = 'insufficient_data': Statistical mode requested but sample count too low.
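Because the bases are str and Enum, each member behaves like its string value. The following sketch redefines the enum locally from the member list above, purely to illustrate that behaviour; it is not imported from FlameIQ.

```python
from enum import Enum


class RegressionStatus(str, Enum):
    """Local re-declaration mirroring the documented members."""

    PASS = "pass"
    REGRESSION = "regression"
    WARNING = "warning"
    INSUFFICIENT_DATA = "insufficient_data"


# A member can be looked up from its raw string, e.g. when reading stored results.
parsed = RegressionStatus("regression")
print(parsed is RegressionStatus.REGRESSION)  # True

# The str mixin makes members compare equal to plain strings.
print(RegressionStatus.PASS == "pass")  # True
```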

class flameiq.core.models.MetricDiff(metric_key, baseline_value, current_value, change_percent, threshold_percent, is_regression, is_warning=False, p_value=None, effect_size=None)[source]

Bases: object

The computed difference for a single metric key.

Parameters:

metric_key (str)
baseline_value (float)
current_value (float)
change_percent (float)
threshold_percent (float)
is_regression (bool)
is_warning (bool)
p_value (float | None)
effect_size (float | None)

metric_key: str: Dotted key, e.g. "latency.p95".

baseline_value: float: The reference measurement.

current_value: float: The current measurement.

change_percent: float: Signed % change. Positive = current is larger.

threshold_percent: float: Configured allowance for this metric.

is_regression: bool: Whether the threshold was exceeded.

is_warning: bool = False: Within threshold but approaching the limit.

p_value: float | None = None: Statistical p-value (if statistical mode used).

effect_size: float | None = None: Cohen’s d (if statistical mode used).

property direction: str: Human-readable direction: 'increased', 'decreased', or 'unchanged'.

property abs_change_percent: float: Absolute magnitude of the change.

property status_label: str: Short status string suitable for display.
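The derived properties direction and abs_change_percent can be pictured with a reduced stand-in. MetricDiffSketch is a hypothetical class with only two of the documented fields, and the assumption that the sign of change_percent alone decides the direction label is not confirmed by this reference.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricDiffSketch:
    """Hypothetical, reduced stand-in for MetricDiff (not the FlameIQ class)."""

    metric_key: str
    change_percent: float

    @property
    def direction(self) -> str:
        # Assumption: the sign of change_percent alone decides the label.
        if self.change_percent > 0:
            return "increased"
        if self.change_percent < 0:
            return "decreased"
        return "unchanged"

    @property
    def abs_change_percent(self) -> float:
        return abs(self.change_percent)


diff = MetricDiffSketch("latency.p95", -3.25)
print(diff.direction, diff.abs_change_percent)  # decreased 3.25
```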

class flameiq.core.models.ComparisonResult(status, diffs=<factory>, baseline_commit=None, current_commit=None, statistical_mode=False, summary=None)[source]

Bases: object

The complete result of a baseline-vs-current comparison.

Parameters:

status (RegressionStatus)
diffs (list[MetricDiff])
baseline_commit (str | None)
current_commit (str | None)
statistical_mode (bool)
summary (str | None)

status: RegressionStatus: Overall pass/regression/warning outcome.

diffs: list[MetricDiff]: Per-metric differences, in metric-key order.

baseline_commit: str | None = None: Git SHA of the baseline snapshot, if available.

current_commit: str | None = None: Git SHA of the current snapshot, if available.

statistical_mode: bool = False: Whether the Mann-Whitney U test was applied.

summary: str | None = None: Optional human-readable summary string.

property regressions: list[MetricDiff]: Metrics that breached their threshold.

property warnings: list[MetricDiff]: Metrics within threshold but approaching the limit.

property passed: list[MetricDiff]: Metrics that passed cleanly, with no warning.

property exit_code: int

Standard CI exit code.

Returns:

0 for PASS / WARNING, 1 for REGRESSION.

to_dict()[source]

Serialise to a plain dict (e.g. for --json CLI output).

Return type:

dict[str, object]
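The exit-code rule and the regressions / warnings / passed views can be sketched with plain values. The helper names exit_code_for and partition are hypothetical, and the tuple layout stands in for MetricDiff. Only the three outcomes documented under exit_code are mapped; this reference does not state how INSUFFICIENT_DATA is handled, so the sketch leaves it out.

```python
# Status value -> CI exit code, as documented for exit_code.
_EXIT_CODES = {"pass": 0, "warning": 0, "regression": 1}


def exit_code_for(status: str) -> int:
    """Map a documented status value to a CI exit code."""
    return _EXIT_CODES[status]


def partition(diffs):
    """Split (key, is_regression, is_warning) tuples into three lists of keys."""
    regressions = [k for k, is_reg, _ in diffs if is_reg]
    warnings = [k for k, is_reg, is_warn in diffs if is_warn and not is_reg]
    passed = [k for k, is_reg, is_warn in diffs if not is_reg and not is_warn]
    return regressions, warnings, passed


diffs = [
    ("latency.p95", True, False),
    ("cpu_percent", False, True),
    ("memory_mb", False, False),
]
print(partition(diffs))
print(exit_code_for("regression"))  # 1
```

In a CI script the real property would typically be passed to sys.exit(), so that a REGRESSION outcome fails the job while WARNING does not.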

Thresholds

FlameIQ threshold configuration and evaluation.

Thresholds are specified in flameiq.yaml as percent strings:

thresholds:
  latency.p95:  10%     # Allow up to 10% latency increase
  throughput:   -5%     # Allow up to 5% throughput decrease
  memory_mb:     8%     # Allow up to 8% memory increase

Sign convention:

Positive threshold (e.g. 10%) → allow up to +10% increase.
Negative threshold (e.g. -5%) → allow up to 5% decrease.
For known metrics the direction is inferred automatically (see evaluate_threshold()).

Default threshold: 10%, applied when no explicit threshold is configured. For unknown metrics it applies in either direction.

flameiq.core.thresholds.DEFAULT_THRESHOLD_PERCENT: float = 10.0: Default allowance applied when no explicit threshold is configured.

flameiq.core.thresholds.parse_threshold(key, raw)[source]

Parse a raw threshold value into a signed float.

Parameters:

key (str) – Metric key (used only for error messages).
raw (str | float | int) – A percent string ("10%", "-5%") or a numeric value.

Returns:

Float percentage, e.g. 10.0 or -5.0.

Raises:

ThresholdConfigError – If the string is not a valid percent.

Return type:

float

Examples:

parse_threshold("latency.p95", "10%")  # → 10.0
parse_threshold("throughput",  "-5%")  # → -5.0
parse_threshold("memory_mb",   10.0)   # → 10.0
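A parser with the behaviour shown in these examples might look like the sketch below. It is an illustration under assumptions: ThresholdConfigError is a local stand-in whose message format is invented, and rejecting strings without a trailing "%" is a guess, since this reference only says the string must be "a valid percent".

```python
class ThresholdConfigError(Exception):
    """Local stand-in for flameiq.core.errors.ThresholdConfigError."""

    def __init__(self, key: str, value: str, reason: str) -> None:
        super().__init__(f"invalid threshold for {key!r}: {value!r} ({reason})")
        self.key, self.value, self.reason = key, value, reason


def parse_threshold(key, raw):
    """Turn '10%', '-5%', or a bare number into a signed float percentage."""
    if isinstance(raw, (int, float)) and not isinstance(raw, bool):
        return float(raw)
    text = str(raw).strip()
    if not text.endswith("%"):
        # Assumption: strings must carry an explicit percent sign.
        raise ThresholdConfigError(key, str(raw), "expected a percent string such as '10%'")
    try:
        return float(text[:-1].strip())
    except ValueError:
        raise ThresholdConfigError(key, str(raw), "not a number") from None


print(parse_threshold("latency.p95", "10%"))  # 10.0
print(parse_threshold("throughput", "-5%"))   # -5.0
```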

flameiq.core.thresholds.evaluate_threshold(metric_key, change_percent, threshold_percent)[source]

Determine whether a change_percent breaches its threshold.

Direction semantics:

Higher-is-worse metrics (latency.*, memory_mb, cpu_percent): a regression is change_percent > +threshold.
Lower-is-worse metrics (throughput): a regression is change_percent < -abs(threshold).
Unknown / custom metrics: any absolute deviation beyond the threshold is flagged as a regression.

Parameters:

metric_key (str) – Dotted metric name.
change_percent (float) – Signed percent change (positive = increased).
threshold_percent (float) – The configured threshold as a signed float percentage, e.g. 10.0 or -5.0.

Returns:

True if the change is a regression.

Return type:

bool
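The three direction rules can be sketched as below. The metric groupings come from the list above, but the way they are matched (a "latency." prefix test plus exact names) and the use of abs() on the threshold in every branch are assumptions made for this illustration.

```python
_HIGHER_IS_WORSE = {"memory_mb", "cpu_percent"}  # plus any key starting with "latency."
_LOWER_IS_WORSE = {"throughput"}


def evaluate_threshold(metric_key: str, change_percent: float, threshold_percent: float) -> bool:
    """Return True if change_percent counts as a regression for this metric."""
    limit = abs(threshold_percent)  # assumption: the sign is ignored once direction is known
    if metric_key.startswith("latency.") or metric_key in _HIGHER_IS_WORSE:
        return change_percent > limit
    if metric_key in _LOWER_IS_WORSE:
        return change_percent < -limit
    # Unknown / custom metric: a deviation in either direction is flagged.
    return abs(change_percent) > limit


print(evaluate_threshold("latency.p95", 12.0, 10.0))    # True: latency rose past 10%
print(evaluate_threshold("latency.p95", -30.0, 10.0))   # False: latency improved
print(evaluate_threshold("throughput", -6.0, -5.0))     # True: throughput fell past 5%
print(evaluate_threshold("custom.metric", -11.0, 10.0)) # True: either direction counts
```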

flameiq.core.thresholds.build_threshold_map(raw_config)[source]

Parse a full threshold config dict into float values.

Parameters:

raw_config (dict[str, str | float]) – Raw mapping from flameiq.yaml, e.g. {"latency.p95": "10%", "throughput": "-5%"}.

Returns:

Parsed mapping of metric key → float threshold.

Raises:

ThresholdConfigError – If any value is invalid.

Return type:

dict[str, float]

Errors

FlameIQ exception hierarchy.

All FlameIQ exceptions derive from FlameIQError. This lets callers catch the entire FlameIQ surface with a single except FlameIQError, or target specific classes for fine-grained handling.

Rule: Never raise a bare Exception anywhere in the FlameIQ codebase. Always raise from this hierarchy.

Hierarchy:

FlameIQError
├── ValidationError
│   └── SchemaVersionError
├── ConfigurationError
│   └── ThresholdConfigError
├── BaselineError
│   ├── BaselineNotFoundError
│   └── BaselineCorruptedError
├── ProviderError
│   ├── ProviderNotFoundError
│   └── MetricsFileNotFoundError
├── ComparisonError
│   └── InsufficientSamplesError
└── StorageError
    └── MigrationError
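The practical effect of the hierarchy is that a caller can handle one specific failure and still catch everything else FlameIQ raises with a single clause. The sketch below re-declares three of the classes locally to show the pattern; load_baseline and run are hypothetical helpers, and the path attribute and message text on BaselineNotFoundError are assumptions.

```python
class FlameIQError(Exception):
    """Local stand-in for the FlameIQ base exception."""


class BaselineError(FlameIQError):
    """Local stand-in for baseline management failures."""


class BaselineNotFoundError(BaselineError):
    def __init__(self, path: str) -> None:
        super().__init__(f"no baseline found at {path}")  # message format is assumed
        self.path = path


def load_baseline(path: str):
    # Hypothetical loader that always fails, to exercise the handlers.
    raise BaselineNotFoundError(path)


def run(path: str) -> str:
    try:
        load_baseline(path)
    except BaselineNotFoundError as exc:   # fine-grained handling first
        return f"missing baseline: {exc.path}"
    except FlameIQError as exc:            # then the whole FlameIQ surface
        return f"flameiq failure: {exc}"
    return "ok"


print(run("baselines/main.json"))  # missing baseline: baselines/main.json
```

The specific clause must come before the general one, because Python takes the first matching except clause.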

exception flameiq.core.errors.FlameIQError[source]

Bases: Exception

Base class for all FlameIQ exceptions.

exception flameiq.core.errors.ValidationError[source]

Bases: FlameIQError

Raised when a snapshot fails schema validation.

exception flameiq.core.errors.SchemaVersionError(version)[source]

Bases: ValidationError

Raised when an unsupported schema version is encountered.

Parameters:

version (int) – The unsupported version number that was encountered.

Return type:

None

Initialize the error with the unsupported version number.

exception flameiq.core.errors.ConfigurationError[source]

Bases: FlameIQError

Raised when flameiq.yaml is missing, malformed, or invalid.

exception flameiq.core.errors.ThresholdConfigError(key, value, reason)[source]

Bases: ConfigurationError

Raised when a threshold value cannot be parsed or is out of range.

Parameters:

key (str) – The metric key the threshold applies to.
value (str) – The raw invalid threshold string.
reason (str) – Human-readable explanation of the error.

Return type:

None

Initialize the error with the invalid threshold details.

exception flameiq.core.errors.BaselineError[source]

Bases: FlameIQError

Raised for baseline management failures.

exception flameiq.core.errors.BaselineNotFoundError(path)[source]

Bases: BaselineError

Raised when no baseline snapshot exists for the current context.

Parameters:

path (str) – The filesystem path where the baseline was expected.

Return type:

None

Initialize the error with the expected baseline path.

exception flameiq.core.errors.BaselineCorruptedError(path, reason)[source]

Bases: BaselineError

Raised when a baseline file exists but cannot be deserialised.

Parameters:

path (str) – Path to the corrupted file.
reason (str) – Description of the parse failure.

Return type:

None

Initialize the error with the corrupted file details.

exception flameiq.core.errors.ProviderError[source]

Bases: FlameIQError

Raised when a metric provider fails to collect or normalise data.

exception flameiq.core.errors.ProviderNotFoundError(name)[source]

Bases: ProviderError

Raised when a requested provider name is not registered.

Parameters:

name (str) – The requested provider name.

Return type:

None

Initialize the error with the missing provider name.

exception flameiq.core.errors.MetricsFileNotFoundError(path)[source]

Bases: ProviderError

Raised when the metrics source file does not exist.

Parameters:

path (str) – The path that was not found.

Return type:

None

Initialize the error with the missing file path.

exception flameiq.core.errors.ComparisonError[source]

Bases: FlameIQError

Raised when a comparison cannot be completed.

exception flameiq.core.errors.InsufficientSamplesError(metric, got, required)[source]

Bases: ComparisonError

Raised when statistical mode is enabled but sample count is too low.

Parameters:

metric (str) – The metric name with insufficient samples.
got (int) – Number of samples available.
required (int) – Minimum samples required.

Return type:

None

Initialize the error with the insufficient samples details.

exception flameiq.core.errors.StorageError[source]

Bases: FlameIQError

Raised for storage read/write failures.

exception flameiq.core.errors.MigrationError[source]

Bases: StorageError

Raised when a storage schema migration fails.