Testing Standards
FlameIQ has high testing standards. The core comparison engine is mathematically critical, and bugs here cause missed regressions in production systems at scale.
Test organisation
tests/
├── unit/
│ ├── test_core/ # comparator, thresholds, models
│ ├── test_engine/ # statistics, baseline strategies
│ ├── test_schema/ # schema models, serialisation
│ ├── test_storage/ # baseline store
│ └── test_providers/ # json, pytest-benchmark providers
├── statistical/ # Algorithm correctness on known distributions
├── integration/ # Full end-to-end pipeline tests
├── e2e/ # CLI invocation tests
└── fixtures/ # Shared test data and snapshot builders
Coverage requirements
Module |
Minimum coverage |
Notes |
|---|---|---|
|
95% |
Critical path. All branches must be tested. |
|
100% |
Immutable contract. Every field validated. |
|
90% |
All statistical paths tested with known inputs. |
|
85% |
Including corruption recovery and empty history. |
|
80% |
Valid, invalid, missing file for each provider. |
|
75% |
Integration tests count toward CLI coverage. |
Determinism tests
Any function in core/ or engine/ that is advertised as
deterministic must have a determinism test:
@pytest.mark.determinism
def test_compare_snapshots_is_deterministic():
results = [
compare_snapshots(BASELINE, CURRENT)
for _ in range(100)
]
first = results[0]
for r in results[1:]:
assert r.status == first.status
assert r.exit_code == first.exit_code
for d1, d2 in zip(r.diffs, first.diffs):
assert d1.change_percent == d2.change_percent
Statistical tests
Statistical tests use analytically known distributions — never random data:
@pytest.mark.statistical
@pytest.mark.parametrize("p95_current,threshold,expected", [
(105.0, "10%", RegressionStatus.PASS), # +5.0% < threshold
(110.0, "10%", RegressionStatus.PASS), # +10.0% = threshold (pass)
(110.1, "10%", RegressionStatus.REGRESSION), # +10.1% > threshold
(115.0, "10%", RegressionStatus.REGRESSION), # +15.0% > threshold
])
def test_latency_threshold(p95_current, threshold, expected):
baseline = make_snapshot(p95=100.0)
current = make_snapshot(p95=p95_current)
result = compare_snapshots(
baseline, current,
threshold_config={"latency.p95": threshold},
)
assert result.status == expected
Fixture conventions
Use
tmp_path(pytest built-in) for all filesystem tests — never write to the actual project directoryUse
make_snapshot()fromtests/fixtures/for all snapshot construction — never construct snapshots inline in testsMark all tests with the appropriate marker:
unit,integration,statistical,determinism