Mitigate cold starts with pools of pre-warmed models.
The first request after an idle period is noticeably slower.
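A warmed pool can be sketched as a small wrapper that loads a fixed number of model handles at startup, hands them out to requests, and pays the load cost only when the pool runs dry. This is a minimal sketch under stated assumptions, not any serving framework's API: `WarmPool`, the `loader` callable, and the string handles are hypothetical stand-ins for whatever loads weights or opens a session in your stack.

```python
import itertools
from collections import deque
from typing import Callable, Deque, Generic, TypeVar

T = TypeVar("T")


class WarmPool(Generic[T]):
    """Keeps `size` model handles loaded so most requests skip the cold-start path."""

    def __init__(self, loader: Callable[[], T], size: int) -> None:
        self._loader = loader              # hypothetical: loads weights or opens a session
        self._idle: Deque[T] = deque()
        self.cold_loads = 0                # requests that had to pay the load cost
        for _ in range(size):              # pre-warm at startup, before any traffic arrives
            self._idle.append(loader())

    def acquire(self) -> T:
        if self._idle:
            return self._idle.popleft()    # warm path: handle is already loaded
        self.cold_loads += 1
        return self._loader()              # pool exhausted: fall back to a cold load

    def release(self, handle: T) -> None:
        self._idle.append(handle)          # return the handle so it stays warm for the next request


# Usage with a stand-in loader that just names each handle it "loads".
_ids = itertools.count(1)
pool: WarmPool[str] = WarmPool(lambda: f"model-{next(_ids)}", size=2)
first, second = pool.acquire(), pool.acquire()  # both served from the pool
third = pool.acquire()                          # pool is empty: one cold load
pool.release(first)
```

A real deployment usually also needs keep-alive traffic or a minimum instance count so the platform does not reclaim idle instances; how that is configured depends on the hosting environment and is not shown here.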
Reduce response times with model routing, caching, and a streaming UI.
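The three techniques compose naturally: route each prompt to a model, serve repeated prompts from a cache, and forward chunks to the UI as soon as they arrive. The sketch below assumes a backend that yields text chunks; `fake_model_stream`, the model names, and the 200-character routing threshold are hypothetical placeholders, not a real provider's API.

```python
from typing import Callable, Dict, Iterator, List, Tuple


def fake_model_stream(model: str, prompt: str) -> Iterator[str]:
    """Hypothetical stand-in for a streaming model call: yields the answer word by word."""
    for word in f"[{model}] answer to: {prompt}".split():
        yield word + " "


def route(prompt: str, max_small_chars: int = 200) -> str:
    """Send short prompts to a faster, cheaper model (names and threshold are placeholders)."""
    return "small-model" if len(prompt) <= max_small_chars else "large-model"


class CachedStreamer:
    def __init__(self, backend: Callable[[str, str], Iterator[str]]) -> None:
        self._backend = backend
        self._cache: Dict[Tuple[str, str], str] = {}
        self.backend_calls = 0

    def stream(self, prompt: str) -> Iterator[str]:
        model = route(prompt)
        key = (model, prompt)
        if key in self._cache:
            yield self._cache[key]         # cache hit: the full answer in one chunk
            return
        self.backend_calls += 1
        parts: List[str] = []
        for chunk in self._backend(model, prompt):
            parts.append(chunk)
            yield chunk                    # forward each chunk so the UI can render it immediately
        self._cache[key] = "".join(parts)  # only reached if the stream ran to completion


streamer = CachedStreamer(fake_model_stream)
first = "".join(streamer.stream("What is a cold start?"))   # streamed from the backend
second = "".join(streamer.stream("What is a cold start?"))  # served from the cache
```

Writing to the cache only after the stream finishes means a response the user abandons halfway is never stored as if it were complete.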
Plan long workflows around rate limits with request pacing and batch scheduling.
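Pacing and batching can be planned ahead of time: group work items into batches, then space the batches evenly so the workflow never bursts past a requests-per-minute cap. This sketch assumes one batch costs one request and that the limit is a plain requests-per-minute cap; many providers also meter tokens, which this does not model. `plan_batches` and `run_plan` are hypothetical helpers, and `clock`/`sleep` are injectable so the schedule can be tested without actually waiting.

```python
import time
from dataclasses import dataclass
from typing import Callable, Generic, List, Sequence, TypeVar

T = TypeVar("T")


@dataclass
class ScheduledBatch(Generic[T]):
    start_s: float   # planned send time, in seconds after the workflow starts
    items: List[T]


def plan_batches(
    items: Sequence[T], batch_size: int, requests_per_minute: int
) -> List[ScheduledBatch[T]]:
    """Group items into batches and space them evenly under a requests-per-minute cap."""
    if batch_size < 1 or requests_per_minute < 1:
        raise ValueError("batch_size and requests_per_minute must be >= 1")
    interval = 60.0 / requests_per_minute  # assumption: one batch costs one request
    return [
        ScheduledBatch(start_s=n * interval, items=list(items[i : i + batch_size]))
        for n, i in enumerate(range(0, len(items), batch_size))
    ]


def run_plan(
    plan: List[ScheduledBatch[T]],
    send: Callable[[List[T]], None],
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Send each batch no earlier than its planned offset (pacing instead of bursting)."""
    t0 = clock()
    for batch in plan:
        delay = t0 + batch.start_s - clock()
        if delay > 0:
            sleep(delay)  # wait until this batch's slot rather than sending early
        send(batch.items)
```

For example, ten items in batches of four under a 30-requests-per-minute cap yield three batches planned at 0, 2, and 4 seconds.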
Recover partial outputs and resume interrupted generation without restarting from scratch.
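One way to recover from a dropped stream is to checkpoint every chunk as it arrives and, on failure, ask the backend to continue from the text already received instead of regenerating it. This is a minimal sketch that assumes the backend accepts such a prefix; whether a given API supports continuing from partial output varies. `StreamInterrupted`, `FlakyBackend`, and `generate_with_resume` are hypothetical names, and the stand-in backend simply drops its first stream after three words.

```python
from typing import Callable, Iterator, List


class StreamInterrupted(Exception):
    """Hypothetical error raised when a stream drops mid-generation."""


def generate_with_resume(
    backend: Callable[[str, str], Iterator[str]],
    prompt: str,
    max_attempts: int = 3,
) -> str:
    """Collect streamed chunks; on interruption, retry from the text received so far."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    partial: List[str] = []          # checkpoint: every chunk received so far
    attempts = 0
    while True:
        attempts += 1
        try:
            for chunk in backend(prompt, "".join(partial)):
                partial.append(chunk)
            return "".join(partial)
        except StreamInterrupted:
            if attempts >= max_attempts:
                raise                # out of retries: re-raise
            # otherwise loop: the next call continues from `partial` instead of restarting


class FlakyBackend:
    """Stand-in model: emits a fixed answer word by word and drops its first stream after 3 words."""

    def __init__(self, answer: str) -> None:
        self.words = [w + " " for w in answer.split()]
        self.calls = 0

    def __call__(self, prompt: str, prefix: str) -> Iterator[str]:
        self.calls += 1
        done = len(prefix.split())   # words the caller already has
        for n in range(done, len(self.words)):
            if self.calls == 1 and n == 3:
                raise StreamInterrupted()
            yield self.words[n]


backend = FlakyBackend("the quick brown fox jumps over the lazy dog")
text = generate_with_resume(backend, "Describe the fox.")
```

Here the second call resumes at the fourth word, so the caller receives the full answer after two backend calls without the first three words being generated twice.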