Neural ODE · Phase 7

Learned dynamics research layer

Name: COVID Flow Illinois wastewater activity indices
Creator: COVID Flow
License: https://www.usa.gov/government-works

COVID Flow keeps the ensemble on region dashboards. This page documents the Neural ODE research track: a constrained correction experiment on wastewater activity, not a replacement production forecaster.

Why this is not on production dashboards

v1.7.5-shrinkage-conservative is frozen as the canonical research model. It is calibrated and research-positive, but the promotion gate blocks it for one clear reason.

Four-week MAE is slightly above the allowed ensemble slack on the holdout slice (Illinois 0.773 vs 0.746 allowed; Cook 0.739 vs 0.722). Every other production safety check passes.

That is a useful finding, not a failure: the model learned selective h1/h2 corrections, knows when to abstain on longer horizons, and produces ~79% coverage on nominal 80% bands.

We do not promote this run and do not keep tuning the same gate to chase 4-week MAE. Simple ensemble rules remain stronger at four weeks on noisy weekly surveillance.

On Illinois and Cook dashboards

The default forecast is always the ensemble baseline. Neural ODE overlays appear only after an explicit manual promotion, which we do not plan for the frozen v1.7.5 research reference.

Where modeling goes next

The next product step is honest communication in Model Lab, not more aggressive 4-week Neural ODE tuning.

Short-horizon selective correction (h1/h2) with richer wastewater context, including site quality, coverage, and regime labels.
Better data and evaluation hygiene before new architectures.
Optional one-shot h4-abstention experiment (v1.7.6). If it cannot close the narrow 4-week gap without hurting h1/h2, stop forcing h4 and treat week 4 as ensemble territory.

Research conclusion (v1.7.5)

The model can sometimes help, knows when to mostly stay quiet, and produces calibrated uncertainty. At four weeks, simple ensemble baselines are still hard to beat, so COVID Flow keeps the ensemble on production dashboards.

What works

The h1 forecast is protected and competitive with persistence.
Correction gates work: h2 and h4 mostly abstain instead of inventing dynamics.
Nominal 80% intervals reach ~79% empirical coverage after recalibration.
Beats ensemble on many origins, especially at 2 weeks (IL ~59–79% improved by slice; Cook ~71%).
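One way to picture a correction gate is as a multiplier on a learned adjustment to the ensemble forecast. The sketch below is an assumption about the general shape of such a layer, not the COVID Flow implementation: the function name, the additive form, and the [0, 1] gate range are all hypothetical.

```python
def gated_forecast(ensemble: float, raw_correction: float, gate: float) -> float:
    """Hypothetical shrinkage gate: ensemble forecast plus a gate-scaled correction.

    gate = 0 means full abstention (the ensemble passes through unchanged);
    gate = 1 would apply the raw correction in full.
    """
    if not 0.0 <= gate <= 1.0:
        raise ValueError("gate must lie in [0, 1]")
    return ensemble + gate * raw_correction


# Illustrative values only: a small gate lets through a small share of the correction.
abstain = gated_forecast(ensemble=1.00, raw_correction=0.50, gate=0.0)
nudged = gated_forecast(ensemble=1.00, raw_correction=0.50, gate=0.1)
```

Under this reading, "mostly abstain" means the gate stays close to zero, so the output stays close to the ensemble.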

Why not production

4-week MAE is just above the promotion slack vs ensemble: IL 0.773 vs allowed 0.746; Cook 0.739 vs allowed 0.722.
Useful as a selective correction layer, not as a wholesale forecasting model.
The short/medium-horizon signal is scientifically interesting; reliable 4-week dynamics have not been demonstrated.

Frame Neural ODE as constrained dynamics learning under noisy weekly wastewater data: a positive research result, not a failed product bet.

Canonical research candidate (v1.7.5, frozen)

Not broken: bounded h1, shrinkage gates, and ~79% interval coverage with interpretable end-to-end behavior.
Not a production replacement: 4w MAE narrowly misses ensemble×1.05 (IL +3.6%, Cook +2.4% above slack on holdout).
Scientifically interesting at h2: large share of origins beat ensemble when the gate allows a correction.
Frozen reference run; the optional v1.7.6 experiment only tests stronger h4 abstention.
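The percentages quoted in the list above can be reproduced from the holdout figures on this page. The short check below uses only those published numbers; the function name is illustrative, and the implied ensemble MAE assumes the allowed value is exactly ensemble MAE × 1.05.

```python
# Holdout 4-week MAE and the allowed value, as quoted on this page.
holdout = {
    "Illinois": {"mae_4w": 0.773, "allowed": 0.746},
    "Cook": {"mae_4w": 0.739, "allowed": 0.722},
}


def excess_over_slack(mae: float, allowed: float) -> float:
    """Relative amount by which mae exceeds the allowed value (negative means it passes)."""
    return mae / allowed - 1.0


for region, m in holdout.items():
    excess = excess_over_slack(m["mae_4w"], m["allowed"])
    # Assumption: allowed = ensemble MAE * 1.05, so the ensemble MAE can be backed out.
    implied_ensemble = m["allowed"] / 1.05
    print(f"{region}: {excess:+.1%} above slack (implied ensemble MAE {implied_ensemble:.3f})")
```

Both regions miss by a few percent of the allowed value, which is why the run is classed as a near miss rather than a failure.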

Research track: not a case-count predictor

All models here predict a weekly wastewater activity index derived from NWSS surveillance. They do not directly forecast confirmed COVID cases, hospitalizations, or deaths.

The ensemble remains the trusted production forecast. v1.7.5 is the frozen canonical research candidate: safe, interpretable, and near-miss on 4-week lift only.

28 candidate runs in the database; none are promoted to production dashboards yet.

Phase 7 · Learned dynamics

What is a Neural ODE here?

Instead of a fixed rule like “next week equals this week,” we fit a small neural network that describes how fast activity changes at any moment along a trajectory. The model then integrates that law forward in time, similar in spirit to how physics simulates motion, but learned from wastewater history.
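In symbols, the idea can be sketched as follows. The notation is introduced here for illustration and is not taken from the COVID Flow code: x(t) is the activity index, f_θ is the small network with weights θ, t_0 is the forecast origin, and h is the horizon in weeks. Whether the learned rate also takes time or other covariates as input is not stated on this page, so the sketch shows the simplest case.

```latex
\frac{dx}{dt} = f_\theta\bigl(x(t)\bigr), \qquad
\hat{x}(t_0 + h) = x(t_0) + \int_{t_0}^{t_0 + h} f_\theta\bigl(x(t)\bigr)\,dt, \quad h \in \{1, 2, 4\}
```

The fixed rule "next week equals this week" is the special case where f_θ is zero everywhere, so the integral vanishes and the forecast equals x(t_0).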

How it works in three steps

Step 01

Observe the signal

We use the same weekly weighted activity index as the rest of COVID Flow, a normalized measure of viral RNA in community wastewater.

Step 02

Learn a dynamical law

A Neural ODE fits a smooth rate-of-change function to historical weeks. Think of it as learning “if activity is here today, how fast is it moving?” rather than memorizing one fixed rule.

Step 03

Project forward

The model integrates that law up to four weeks ahead, producing point forecasts, uncertainty bands, and an instantaneous rate-of-change curve you can read on the dashboard.
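The three steps can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the trained model: the one-hidden-unit network, the made-up weights in PARAMS, the Runge-Kutta (RK4) solver, and the seven substeps per week are all hypothetical choices, and uncertainty bands are left out.

```python
import math

# Made-up weights for illustration; they encode a gentle decay toward zero.
PARAMS = {"w1": 1.0, "b1": 0.0, "w2": -0.3, "b2": 0.0}


def rate(x: float, params: dict = PARAMS) -> float:
    """Hypothetical learned rate of change dx/dt: a one-hidden-unit tanh network."""
    hidden = math.tanh(params["w1"] * x + params["b1"])
    return params["w2"] * hidden + params["b2"]


def rk4_step(x: float, dt: float, params: dict = PARAMS) -> float:
    """Advance the state by one classical fourth-order Runge-Kutta step."""
    k1 = rate(x, params)
    k2 = rate(x + 0.5 * dt * k1, params)
    k3 = rate(x + 0.5 * dt * k2, params)
    k4 = rate(x + dt * k3, params)
    return x + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0


def project(x0: float, horizons=(1, 2, 4), steps_per_week: int = 7) -> dict:
    """Integrate forward from the latest observed index x0 (time measured in weeks)."""
    dt = 1.0 / steps_per_week
    x, out = x0, {}
    for week in range(1, max(horizons) + 1):
        for _ in range(steps_per_week):
            x = rk4_step(x, dt)
        if week in horizons:
            # Point forecast plus the instantaneous rate of change at that time.
            out[week] = {"forecast": x, "rate": rate(x)}
    return out


forecasts = project(x0=1.0)
```

With these toy weights the projected index declines smoothly, and the "rate" entry plays the role of the dx/dt curve described above.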

Promotion gate (held-out weeks)

Holdout metrics from training are compared to production baseline metrics in Model Lab. Rolling-origin scores after inference use the same evaluation pipeline as baselines.
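For readers unfamiliar with rolling-origin scoring, the sketch below shows the general technique: re-forecast from every eligible origin week using only the data available at that week, then average the absolute errors per horizon. The function names and the min_history parameter are illustrative and are not taken from the COVID Flow pipeline.

```python
def rolling_origin_mae(series, forecaster, horizons=(1, 2, 4), min_history=4):
    """MAE per horizon over every origin that has enough history and an observed target."""
    errors = {h: [] for h in horizons}
    for origin in range(min_history - 1, len(series)):
        history = series[: origin + 1]  # only data available at the origin week
        for h in horizons:
            target = origin + h
            if target < len(series):
                errors[h].append(abs(forecaster(history, h) - series[target]))
    return {h: sum(e) / len(e) for h, e in errors.items() if e}


def persistence(history, h):
    """Baseline rule: every future week equals the last observed week."""
    return history[-1]


# Toy series rising by 1 per week, so persistence is off by exactly h at horizon h.
scores = rolling_origin_mae(list(range(10)), persistence)
```

Any forecaster with the same (history, horizon) signature can be passed in, which is what makes scores comparable across models.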

1-week error must beat or match persistence (hard to beat at short horizon).
2-week error must beat or match the ensemble baseline.
4-week error may be up to 5% above ensemble MAE.
Trend-direction accuracy at 1 week must stay within 5 percentage points of persistence.
RMSE at each horizon must not exceed ensemble by more than 20%.
80% interval coverage must stay within a sane band (not empty, not trivially wide).
No severe degradation vs ensemble in rising, falling, stable, or turn-point regimes.
Training metadata (seed, data hash, artifact) must be recorded for reproducibility.
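The numeric rules in the list above can be expressed as a small checklist function. This is a sketch under stated assumptions, not the logic of promote_model.py: the dictionary layout and check names are hypothetical, the trend rule is read as "no more than 5 points below persistence", the coverage band limits are placeholders because the page does not give them, and the regime check is omitted because "severe degradation" is not quantified here. The example inputs are illustrative values, not results from this page.

```python
def promotion_checks(cand, persistence, ensemble, coverage_band=(0.70, 0.90)):
    """Return {check name: passed} for the numeric promotion rules (hypothetical layout)."""
    return {
        "h1_mae_vs_persistence": cand["mae"][1] <= persistence["mae"][1],
        "h2_mae_vs_ensemble": cand["mae"][2] <= ensemble["mae"][2],
        "h4_mae_within_5pct": cand["mae"][4] <= 1.05 * ensemble["mae"][4],
        # Assumed reading: candidate may trail persistence by at most 5 points.
        "h1_trend_within_5pp": cand["trend_1w"] >= persistence["trend_1w"] - 0.05,
        "rmse_within_20pct": all(
            cand["rmse"][h] <= 1.20 * ensemble["rmse"][h] for h in ensemble["rmse"]
        ),
        # Placeholder band limits; the real limits are not stated on this page.
        "coverage_in_band": coverage_band[0] <= cand["coverage_80"] <= coverage_band[1],
        "metadata_recorded": all(
            cand.get(k) is not None for k in ("seed", "data_hash", "artifact")
        ),
    }


# Illustrative inputs only.
candidate = {
    "mae": {1: 0.30, 2: 0.40, 4: 0.60},
    "rmse": {1: 0.40, 2: 0.50, 4: 0.70},
    "trend_1w": 0.47,
    "coverage_80": 0.79,
    "seed": 0,
    "data_hash": "example-hash",
    "artifact": "example-artifact",
}
persistence_ref = {"mae": {1: 0.31}, "trend_1w": 0.49}
ensemble_ref = {"mae": {2: 0.42, 4: 0.56}, "rmse": {1: 0.45, 2: 0.55, 4: 0.68}}

result = promotion_checks(candidate, persistence_ref, ensemble_ref)
failed = [name for name, ok in result.items() if not ok]
```

In this made-up case only the 4-week MAE rule fails, which mirrors the near-miss pattern described on this page.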

Promotion tiers

Production safe: all held-out production checks pass (manual promote only).
Near miss: model is safe on h1, intervals, and regimes, but 4w MAE is slightly above ensemble×1.05 (not enough long-horizon lift).
Not production safe: fails persistence, interval, regime, or multi-horizon checks (legacy v1.6-style behavior).
Research value: selective correction beats ensemble on enough origins at h2 with conservative gates (separate from production).
h4 abstention (research): when only h4 improvement fails but gates are conservative, that is expected abstention, not a broken model.

On region dashboards: use the forecast model selector to view ensemble baseline, Neural ODE, or both. The rate-of-change chart appears when Neural ODE is selected. It shows the model’s estimated derivative (dx/dt), not a separate laboratory measurement.

Training runs

Each region has its own model name (neural_ode_IL, neural_ode_17031). v1.7.5-shrinkage-conservative is the frozen canonical research reference (candidate only). Production dashboard forecasts remain ensemble-first.

neural_ode_17031

candidate · canonical conservative · Near miss (safe; 4w lift short) · Research (h4 abstention)

Neural ODE · v1.7.5-shrinkage-conservative · updated Jun 27

Holdout 1w MAE: 0.3340
80% coverage: 0.7913
vs ensemble (origins): 53.5% improved
Gate h2: 0.0259
Gate h4: 0.0953
Seed: 44
Data hash: db5dcda03527…

Safe on short horizons and intervals, but 4-week MAE is slightly above the ensemble slack. This remains a research candidate only.

neural_ode_IL

candidate · canonical conservative · Near miss (safe; 4w lift short) · Research (h4 abstention)

Neural ODE · v1.7.5-shrinkage-conservative · updated Jun 27

Holdout 1w MAE: 0.2863
80% coverage: 0.7913
vs ensemble (origins): 45.6% improved
Gate h2: 0.0287
Gate h4: 0.1015
Seed: 42
Data hash: c769f1d4b210…

Safe on short horizons and intervals, but 4-week MAE is slightly above the ensemble slack. This remains a research candidate only.

Holdout vs baselines

Holdout MAE from Neural ODE training compared to production baseline rolling-origin scores. Lower MAE is better. This table helps you see whether promotion criteria are within reach before running promote_model.py.

Region / model	Status	1w MAE	2w MAE	4w MAE	1w trend
Persistence (baseline)	production	0.309	0.412	0.581	48.5%
Ensemble (baseline)	production	0.348	0.422	0.557	n/a
Cook County · neural_ode_17031 · v1.7.5-shrinkage-conservative	candidate	0.334	0.434	0.612	49.1%
Illinois · neural_ode_IL · v1.7.5-shrinkage-conservative	candidate	0.286	0.399	0.577	45.8%

Calibration & regime breakdown

Interval calibration

Empirical coverage of nominal 80% forecast bands. Values near 80% suggest well-calibrated uncertainty; very low coverage means intervals are too narrow, very high coverage means they are too wide.

Region / model	Status	80% coverage
Cook County · v1.7.5-shrinkage-conservative	candidate	79.1%
Illinois · v1.7.5-shrinkage-conservative	candidate	79.1%
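Empirical coverage is the share of held-out observations that fall inside their forecast band. A minimal sketch of that calculation is shown below; the function name and the toy numbers are illustrative, and the recalibration step mentioned elsewhere on this page is not shown.

```python
def empirical_coverage(observed, lower, upper):
    """Share of observations inside their forecast interval (bounds inclusive)."""
    if not observed or not (len(observed) == len(lower) == len(upper)):
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(lo <= y <= hi for y, lo, hi in zip(observed, lower, upper))
    return hits / len(observed)


# Toy example: four of five observations fall inside their bands.
coverage = empirical_coverage(
    observed=[1, 2, 3, 4, 5],
    lower=[0, 0, 0, 0, 6],
    upper=[2, 3, 4, 5, 7],
)
```

For nominal 80% bands, a result close to 0.80 is the target; the 79.1% in the table above is that kind of figure.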

Regime-specific MAE

Error broken out by origin-week regime: rising, falling, stable, and turn-point weeks near ±0.25 activity-index thresholds.

Region / model	rising	falling	stable	turn point	unknown
Cook County · neural_ode_17031 · candidate	0.306	0.504	0.476	0.635	n/a
Illinois · neural_ode_IL · candidate	0.315	0.470	0.451	0.433	n/a
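The page does not spell out how origin weeks are assigned to regimes, so the sketch below is only one plausible reading. It assumes the ±0.25 threshold applies to the week-over-week change in the activity index, that a turn point is a flip between a clear rise and a clear fall, and that weeks without enough history are "unknown". The function and argument names are hypothetical.

```python
def label_regime(prev_change, curr_change, threshold=0.25):
    """Label an origin week from two consecutive week-over-week changes (assumed rule)."""
    if prev_change is None or curr_change is None:
        return "unknown"  # not enough history to decide

    def direction(change):
        if change > threshold:
            return 1
        if change < -threshold:
            return -1
        return 0

    prev_dir, curr_dir = direction(prev_change), direction(curr_change)
    # A clear rise followed by a clear fall (or the reverse) counts as a turn point.
    if prev_dir != 0 and curr_dir != 0 and prev_dir != curr_dir:
        return "turn point"
    return {1: "rising", -1: "falling", 0: "stable"}[curr_dir]


labels = [
    label_regime(0.3, 0.4),
    label_regime(0.3, -0.3),
    label_regime(0.0, 0.1),
    label_regime(0.1, -0.5),
    label_regime(None, 0.3),
]
```

Once each origin week has a label, regime-specific MAE is the mean absolute error over the weeks sharing that label.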

Data-quality segments

Holdout error on high-quality origin weeks (quality score ≥ 0.7) vs lower-quality weeks. Useful for spotting whether the model fails mainly on sparse or noisy periods.

Region / model	High quality MAE	Low quality MAE
Cook County	0.461	0.566
Illinois	0.300	0.437
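The split behind this table can be sketched as a simple threshold on the quality score. Only the 0.7 cutoff comes from this page; the function name, input layout, and toy numbers are illustrative.

```python
def mae_by_quality(abs_errors, quality_scores, threshold=0.7):
    """Mean absolute error for origin weeks at or above the threshold vs below it."""
    high = [e for e, q in zip(abs_errors, quality_scores) if q >= threshold]
    low = [e for e, q in zip(abs_errors, quality_scores) if q < threshold]

    def mean(values):
        return sum(values) / len(values) if values else None

    return {"high": mean(high), "low": mean(low)}


# Toy example: two high-quality weeks and two lower-quality weeks.
segments = mae_by_quality(
    abs_errors=[0.2, 0.4, 0.6, 0.8],
    quality_scores=[0.9, 0.7, 0.5, 0.1],
)
```

A large gap between the two values, as in the Cook County and Illinois rows above, suggests that errors concentrate in sparse or noisy periods.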