Neural ODE · Phase 7
Learned dynamics research layer
COVID Flow keeps the ensemble on region dashboards. This page documents the Neural ODE research track: a constrained correction experiment on wastewater activity, not a replacement production forecaster.
Why this is not on production dashboards
v1.7.5-shrinkage-conservative is frozen as the canonical research model. It is calibrated and research-positive, but the promotion gate blocks it for one clear reason.
Four-week MAE is slightly above the allowed ensemble slack on the holdout slice (Illinois 0.773 vs 0.746 allowed; Cook 0.739 vs 0.722). Every other production safety check passes.
That is a useful finding, not a failure: the model learned selective h1/h2 corrections, knows when to abstain on longer horizons, and produces ~79% coverage on nominal 80% bands.
We do not promote this run and do not keep tuning the same gate to chase 4-week MAE. Simple ensemble rules remain stronger at four weeks on noisy weekly surveillance.
On Illinois and Cook dashboards
The default forecast is always the ensemble baseline. Neural ODE overlays appear only after an explicit manual promotion, which we do not plan for the frozen v1.7.5 research reference.
Where modeling goes next
The next product step is honest communication in Model Lab, not more aggressive 4-week Neural ODE tuning.
- Short-horizon selective correction (h1/h2) with richer wastewater context, including site quality, coverage, and regime labels.
- Better data and evaluation hygiene before new architectures.
- Optional one-shot h4-abstention experiment (v1.7.6). If it cannot close the narrow 4-week gap without hurting h1/h2, stop forcing h4 and treat week 4 as ensemble territory.
Research conclusion (v1.7.5)
The model can sometimes help, knows when to mostly stay quiet, and produces calibrated uncertainty. At four weeks, simple ensemble baselines are still hard to beat, so COVID Flow keeps the ensemble on production dashboards.
What works
- H1 is protected and competitive with persistence.
- Correction gates work: h2 and h4 mostly abstain instead of inventing dynamics.
- 80% intervals land at ~79% empirical coverage after recalibration.
- Beats the ensemble on many origins, especially at 2 weeks (IL: ~59–79% of origins improved, depending on slice; Cook: ~71%).
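One way to picture a correction gate is as a shrinkage weight between a baseline forecast and the Neural ODE forecast. The sketch below is an assumption for illustration only: this page does not specify how the v1.7.5 gates are computed, and the function name `gated_forecast` and its arguments are hypothetical.

```python
def gated_forecast(baseline: float, neural_ode: float, gate: float) -> float:
    """Blend a baseline forecast with a Neural ODE forecast.

    gate = 0.0 means the model abstains (baseline is returned unchanged);
    gate = 1.0 means the full Neural ODE correction is applied.
    This blending rule is an assumed illustration, not the documented method.
    """
    if not 0.0 <= gate <= 1.0:
        raise ValueError("gate must lie in [0, 1]")
    return baseline + gate * (neural_ode - baseline)


# Illustrative values only, not real forecasts.
abstained = gated_forecast(1.0, 2.0, gate=0.0)   # stays at the baseline
partial = gated_forecast(1.0, 2.0, gate=0.25)    # a quarter of the correction
```

Under this reading, "mostly abstain" corresponds to a gate that stays close to zero at h2 and h4 on most origins.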
Why not production
- 4-week MAE is just above the promotion slack vs ensemble: IL 0.773 vs allowed 0.746; Cook 0.739 vs allowed 0.722.
- Useful as a selective correction layer, not as a wholesale forecasting model.
- Short/medium-horizon signal is scientifically interesting; reliable 4-week dynamics are not yet achievable.
Frame Neural ODE as constrained dynamics learning under noisy weekly wastewater data: a positive research result, not a failed product bet.
Canonical research candidate (v1.7.5, frozen)
- Not broken: bounded h1, shrinkage gates, and ~79% interval coverage with interpretable end-to-end behavior.
- Not a production replacement: 4w MAE narrowly misses ensemble×1.05 (IL +3.6%, Cook +2.4% above slack on holdout).
- Scientifically interesting at h2: a large share of origins beat the ensemble when the gate allows a correction.
- Frozen reference run; the optional v1.7.6 experiment only tests stronger h4 abstention.
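The near-miss percentages can be reproduced with a few lines of arithmetic. This is a minimal sketch: the helper name `excess_over_slack` is hypothetical, and the inputs are the holdout 4-week MAE values and allowed values (ensemble×1.05) quoted on this page.

```python
def excess_over_slack(model_mae: float, allowed_mae: float) -> float:
    """Return how far model_mae sits above the allowed value, in percent."""
    return (model_mae / allowed_mae - 1.0) * 100.0


# (model 4w MAE, allowed 4w MAE) on the holdout slice, as quoted on this page.
holdout_4w = {"IL": (0.773, 0.746), "Cook": (0.739, 0.722)}

excess = {
    region: excess_over_slack(model, allowed)
    for region, (model, allowed) in holdout_4w.items()
}
for region, pct in excess.items():
    print(f"{region}: +{pct:.1f}% above slack")
```

Rounded to one decimal place, this gives +3.6% for IL and +2.4% for Cook, matching the figures above.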
Research track: not a case-count predictor
All models here predict a weekly wastewater activity index derived from NWSS surveillance. They do not directly forecast confirmed COVID cases, hospitalizations, or deaths.
The ensemble remains the trusted production forecast. v1.7.5 is the frozen canonical research candidate: safe, interpretable, and a near miss only on 4-week lift.
28 candidate runs in the database; none are promoted to production dashboards yet.
Phase 7 · Learned dynamics
What is a Neural ODE here?
Instead of a fixed rule like “next week equals this week,” we fit a small neural network that describes how fast activity changes at any moment along a trajectory. The model then integrates that law forward in time, similar in spirit to how physics simulates motion, but learned from wastewater history.
How it works in three steps
Step 01
Observe the signal
We use the same weekly weighted activity index as the rest of COVID Flow, a normalized measure of viral RNA in community wastewater.
Step 02
Learn a dynamical law
A Neural ODE fits a smooth rate-of-change function to historical weeks. Think of it as learning “if activity is here today, how fast is it moving?” rather than memorizing one fixed rule.
Step 03
Project forward
The model integrates that law up to four weeks ahead, producing point forecasts, uncertainty bands, and an instantaneous rate-of-change curve you can read on the dashboard.
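The three steps above can be sketched in plain Python. This is an illustrative toy, not the COVID Flow implementation: the weights are untrained placeholders, the one-input tanh network and the Runge-Kutta integrator are assumptions, and the names `rate_of_change` and `integrate` are hypothetical. It covers point forecasts only, not uncertainty bands.

```python
import math

# Hypothetical, untrained weights for a 1-input, 3-hidden-unit, 1-output network.
W1 = [0.8, -0.5, 0.3]
B1 = [0.0, 0.1, -0.2]
W2 = [-0.4, 0.2, 0.1]
B2 = 0.0


def rate_of_change(y: float) -> float:
    """dy/dt as a small tanh network of the current activity index y."""
    hidden = [math.tanh(w * y + b) for w, b in zip(W1, B1)]
    return sum(w * h for w, h in zip(W2, hidden)) + B2


def integrate(y0: float, weeks: float, steps_per_week: int = 10) -> float:
    """Integrate dy/dt forward with classical fourth-order Runge-Kutta."""
    n = max(1, int(round(weeks * steps_per_week)))
    dt = weeks / n
    y = y0
    for _ in range(n):
        k1 = rate_of_change(y)
        k2 = rate_of_change(y + 0.5 * dt * k1)
        k3 = rate_of_change(y + 0.5 * dt * k2)
        k4 = rate_of_change(y + dt * k3)
        y += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return y


y0 = 1.0  # illustrative starting activity index, not a real observation
forecasts = {h: integrate(y0, h) for h in (1, 2, 4)}  # h1, h2, h4 point forecasts
current_rate = rate_of_change(y0)  # instantaneous rate of change at the origin
```

Because the learned law depends only on the current state, integrating two weeks in one pass gives the same result as integrating one week twice, which is a useful sanity check for any integrator of this kind.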
Promotion gate (held-out weeks)
Holdout metrics from training are compared to production baseline metrics in Model Lab. Rolling-origin scores after inference use the same evaluation pipeline as baselines.
- 1-week error must beat or match persistence (hard to beat at short horizon).
- 2-week error must beat or match the ensemble baseline.
- 4-week error may be up to 5% above ensemble MAE.
- Trend direction at 1 week must stay within 5 percentage points of persistence.
- RMSE at each horizon must not exceed ensemble by more than 20%.
- 80% interval coverage must stay within a sane band (not empty, not trivially wide).
- No severe degradation vs ensemble in rising, falling, stable, or turn-point regimes.
- Training metadata (seed, data hash, artifact) must be recorded for reproducibility.
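Four of the numeric checks above can be written as simple comparisons. The sketch below is a hedged illustration, not the code behind the promotion gate: the `Scores` class, its field names, and `numeric_gate_checks` are hypothetical, "within 5 percentage points" is read as an absolute difference, and the example scores are made-up placeholders. The RMSE, coverage, regime, and metadata checks are omitted.

```python
from dataclasses import dataclass


@dataclass
class Scores:
    """Held-out scores for one model. Field names are hypothetical."""
    mae_1w: float
    mae_2w: float
    mae_4w: float
    trend_1w_pct: float = 0.0  # "1w trend" value, in percent


def numeric_gate_checks(candidate, persistence, ensemble,
                        slack_4w=1.05, trend_tolerance_pp=5.0):
    """Evaluate four of the numeric promotion checks as pass/fail flags."""
    return {
        # 1-week error must beat or match persistence.
        "h1_vs_persistence": candidate.mae_1w <= persistence.mae_1w,
        # 2-week error must beat or match the ensemble.
        "h2_vs_ensemble": candidate.mae_2w <= ensemble.mae_2w,
        # 4-week error may be up to 5% above ensemble MAE.
        "h4_within_slack": candidate.mae_4w <= ensemble.mae_4w * slack_4w,
        # 1-week trend must stay within 5 percentage points of persistence
        # (assumed here to mean an absolute difference).
        "trend_within_5pp": abs(candidate.trend_1w_pct - persistence.trend_1w_pct)
        <= trend_tolerance_pp,
    }


# Made-up placeholder scores for illustration only, not results from any run.
candidate = Scores(0.30, 0.40, 0.60, 47.0)
persistence = Scores(0.31, 0.42, 0.58, 48.0)
ensemble = Scores(0.33, 0.41, 0.56)

checks = numeric_gate_checks(candidate, persistence, ensemble)
```

With these placeholder inputs only the 4-week check fails, which is the same pattern the gate reports for a near-miss run.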
Promotion tiers
- Production safe: all held-out production checks pass (manual promote only).
- Near miss: model is safe on h1, intervals, and regimes, but 4w MAE is slightly above ensemble×1.05 (not enough long-horizon lift).
- Not production safe: fails persistence, interval, regime, or multi-horizon checks (legacy v1.6-style behavior).
- Research value: selective correction beats ensemble on enough origins at h2 with conservative gates (separate from production).
- h4 abstention (research): when only h4 improvement fails but gates are conservative, that is expected abstention, not a broken model.
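The first three tiers can be read as a function of which production checks fail. This is a rough sketch under assumptions: the function name `promotion_tier` and the check key `h4_within_slack` are hypothetical, and the sketch ignores how far above the slack the 4-week MAE sits, which the "near miss" tier describes only as "slightly". The research-value and h4-abstention labels are separate from production and are not modeled here.

```python
def promotion_tier(checks: dict, h4_key: str = "h4_within_slack") -> str:
    """Map pass/fail production checks to a tier label.

    checks maps a check name to True (pass) or False (fail).
    """
    failed = {name for name, passed in checks.items() if not passed}
    if not failed:
        return "production safe"
    if failed == {h4_key}:
        # Everything is safe except the 4-week MAE slack.
        return "near miss"
    return "not production safe"


# Illustrative check results, not taken from any run.
example = {
    "h1_vs_persistence": True,
    "interval_coverage": True,
    "regimes": True,
    "h4_within_slack": False,
}
tier = promotion_tier(example)
```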
Training runs
Each region has its own model name (neural_ode_IL, neural_ode_17031). v1.7.5-shrinkage-conservative is the frozen canonical research reference (candidate only). Production dashboard forecasts remain ensemble-first.
Holdout vs baselines
Holdout MAE from Neural ODE training compared to production baseline rolling-origin scores. Lower MAE is better. This table helps you see whether promotion criteria are within reach before running promote_model.py.
| Region / model | Status | 1w MAE | 2w MAE | 4w MAE | 1w trend |
|---|---|---|---|---|---|
| Persistence (baseline) | production | 0.309 | 0.412 | 0.581 | 48.5% |
| Ensemble (baseline) | production | 0.348 | 0.422 | 0.557 | — |
| Cook County · neural_ode_17031 · v1.7.5-shrinkage-conservative | candidate | 0.334 | 0.434 | 0.612 | 49.1% |
| Illinois · neural_ode_IL · v1.7.5-shrinkage-conservative | candidate | 0.286 | 0.399 | 0.577 | 45.8% |
Calibration & regime breakdown
Interval calibration
Empirical coverage of nominal 80% forecast bands. Values near 80% suggest well-calibrated uncertainty; very low coverage means intervals are too narrow, very high coverage means they are too wide.
| Region / model | Status | 80% coverage |
|---|---|---|
| Cook County · v1.7.5-shrinkage-conservative | candidate | 79.1% |
| Illinois · v1.7.5-shrinkage-conservative | candidate | 79.1% |
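
Empirical coverage, as reported in the table above, is the share of held-out observations that fall inside their forecast band. A minimal sketch follows; the function name `empirical_coverage` is hypothetical and the toy arrays are illustrative values, not surveillance data.

```python
def empirical_coverage(actuals, lowers, uppers) -> float:
    """Percent of actual values that fall inside [lower, upper]."""
    if not actuals or not (len(actuals) == len(lowers) == len(uppers)):
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(lo <= y <= hi for y, lo, hi in zip(actuals, lowers, uppers))
    return 100.0 * hits / len(actuals)


# Toy example: four of the five observations fall inside their band.
actuals = [1.0, 1.2, 0.9, 1.5, 2.0]
lowers = [0.8, 1.0, 0.7, 1.0, 1.0]
uppers = [1.2, 1.4, 1.1, 1.6, 1.8]

coverage = empirical_coverage(actuals, lowers, uppers)
```

For nominal 80% bands, a result well below 80 indicates bands that are too narrow, and a result well above 80 indicates bands that are too wide.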
Regime-specific MAE
Error broken out by origin-week regime: rising, falling, stable, and turn-point weeks near ±0.25 activity-index thresholds.
| Region / model | rising | falling | stable | turn point | unknown |
|---|---|---|---|---|---|
| Cook County · neural_ode_17031 · candidate | 0.306 | 0.504 | 0.476 | 0.635 | — |
| Illinois · neural_ode_IL · candidate | 0.315 | 0.470 | 0.451 | 0.433 | — |
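
This page does not spell out the exact regime rule, so the sketch below shows one possible reading of the ±0.25 thresholds: a week-over-week change above +0.25 is rising, below −0.25 is falling, anything in between is stable, and a flip between rising and falling marks a turn point. The function name `label_regime` and this labeling rule are assumptions for illustration.

```python
def label_regime(prev_change, change, threshold=0.25):
    """Label an origin week from its week-over-week index changes.

    prev_change and change are the previous and current week-over-week
    changes in the activity index; None means the value is unavailable.
    """
    if change is None:
        return "unknown"

    def direction(delta):
        if delta > threshold:
            return "rising"
        if delta < -threshold:
            return "falling"
        return "stable"

    current = direction(change)
    if prev_change is not None:
        previous = direction(prev_change)
        # A flip between rising and falling is treated as a turn point.
        if {previous, current} == {"rising", "falling"}:
            return "turn point"
    return current


# Illustrative changes only, not real index values.
labels = [
    label_regime(None, 0.40),    # no history, clear increase
    label_regime(0.40, -0.30),   # direction flips
    label_regime(0.10, 0.10),    # small moves
    label_regime(None, None),    # nothing to compare
]
```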
Data-quality segments
Holdout error on high-quality origin weeks (quality score ≥ 0.7) vs lower-quality weeks. Useful for spotting whether the model fails mainly on sparse or noisy periods.
| Region / model | High quality MAE | Low quality MAE |
|---|---|---|
| Cook County | 0.461 | 0.566 |
| Illinois | 0.300 | 0.437 |
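
The split above can be computed by partitioning holdout errors at the quality threshold. A minimal sketch, assuming per-origin absolute errors and quality scores are available as parallel lists; the function name `mae_by_quality` and the toy values are hypothetical.

```python
def mae_by_quality(abs_errors, quality_scores, threshold=0.7):
    """Mean absolute error for high-quality (score >= threshold) and other weeks."""
    if len(abs_errors) != len(quality_scores):
        raise ValueError("inputs must have equal length")
    pairs = list(zip(abs_errors, quality_scores))
    high = [err for err, q in pairs if q >= threshold]
    low = [err for err, q in pairs if q < threshold]

    def mean(values):
        # None signals that a segment has no origin weeks.
        return sum(values) / len(values) if values else None

    return {"high_quality_mae": mean(high), "low_quality_mae": mean(low)}


# Toy inputs for illustration, not holdout results.
segments = mae_by_quality(
    abs_errors=[0.2, 0.4, 0.6, 0.8],
    quality_scores=[0.9, 0.7, 0.5, 0.3],
)
```

A large gap between the two values suggests the model struggles mainly on sparse or noisy periods rather than across the board.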