Convex traffic having downtime

Incident Report for Convex

Postmortem

From around 5:18am to 5:54am Pacific (12:18pm to 12:54pm UTC), Convex had a 36-minute period of intermittent downtime that affected all Convex services.

The specific issue was a cascading failure in our traffic layer. A traffic node (Caddy) ran out of memory due to an unforeseen load spike. Instead of being restarted or replaced, the node was marked as permanently down by our container management layer (Nomad), which led to the issue propagating to all traffic servers.

Since the incident, we've more than doubled the size of our traffic layer and fixed the failover behavior that left nodes failed after OOMing. We will also be investigating alternative traffic services.
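
As a rough illustration of why a single failed node can take down a whole pool, and why the two fixes above help, here is a toy model in Python. It is only a sketch under stated assumptions: it assumes load is spread evenly across healthy nodes and that a node fails as soon as its share exceeds its capacity. The function name, the load figures, and the node counts are hypothetical and do not reflect Convex's actual traffic or configuration.

```python
def surviving_nodes(total_load: float, nodes: int, capacity: float,
                    replace_failed: bool) -> int:
    """Toy model: how many traffic nodes remain healthy after one node OOMs.

    Assumptions (illustrative only): load is split evenly across healthy
    nodes, and a node fails when its share exceeds `capacity`.
    """
    if replace_failed:
        # The scheduler restarts or replaces the failed node, so the pool
        # returns to its full size and the load is spread as before.
        return nodes

    # The failed node is marked permanently down and never comes back.
    healthy = nodes - 1
    # Each remaining node now carries a larger share. If that share exceeds
    # capacity, another node fails, which raises the share again.
    while healthy > 0 and total_load / healthy > capacity:
        healthy -= 1
    return healthy


# Hypothetical numbers: 10 nodes running close to their limit.
print(surviving_nodes(90, 10, 9.5, replace_failed=False))  # whole pool fails
print(surviving_nodes(90, 10, 9.5, replace_failed=True))   # pool recovers
print(surviving_nodes(90, 20, 9.5, replace_failed=False))  # doubled pool absorbs the loss
```

In this model, either remediation alone stops the cascade: replacing failed nodes keeps the pool at full size, and a larger pool leaves enough headroom for the remaining nodes to absorb the load of one that is lost.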

As always, data was safe during this incident, but we sincerely apologize for the availability impact during that time.

Posted Sep 20, 2025 - 20:43 UTC

Resolved

This incident has been resolved.
Posted Sep 20, 2025 - 13:15 UTC

Monitoring

An unexpected traffic pattern overloaded some of our services, causing intermittent unavailability across Convex instances. We've added extra capacity and are monitoring to ensure that the system is stable.
Posted Sep 20, 2025 - 12:57 UTC

Investigating

We are currently investigating this issue.
Posted Sep 20, 2025 - 12:18 UTC