Incident with Databases and Filesystems

Incident Report for Latitude.sh

Postmortem

Impact: All customer databases in the Dallas region were unavailable for the duration of the incident. No data loss occurred.

On July 28th, 2025, the Latitude.sh Databases cluster in our Dallas region experienced a critical failure due to a broader site-level outage. The incident led to the loss of the cluster’s internal state, rendering it unsalvageable. As a result, all customer databases in this region became unavailable.

Our engineering team immediately initiated recovery efforts. We provisioned a new environment, redeployed the database control plane, and restored each customer database from off-site backups.

After 35 hours of continuous work, services were fully restored. All customer data is safe and accessible. However, configuration-level metadata (such as trusted sources) could not be recovered and must be manually recreated.

The failure originated from the control plane node pool of the Dallas cluster. The resulting corruption of the cluster’s internal state made it impossible to safely recover or rejoin the remaining nodes. Compounding the issue was the absence of recent control plane snapshots, which limited restoration options.
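
As an illustration of the snapshot gap described above, a freshness check along the following lines could flag a control plane whose most recent snapshot is missing or too old. This is a minimal sketch, not Latitude.sh's tooling: the function name `snapshot_is_stale`, its parameters, and the 24-hour threshold are all hypothetical.

```python
from datetime import datetime, timedelta, timezone


def snapshot_is_stale(last_snapshot_at, max_age, now=None):
    """Return True when the newest snapshot is missing or older than max_age.

    All names here are hypothetical; timestamps are expected to be
    timezone-aware datetimes.
    """
    if last_snapshot_at is None:
        # No snapshot at all is treated as the worst case.
        return True
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_snapshot_at > max_age


# Example with fixed timestamps and an assumed 24-hour freshness threshold.
checked_at = datetime(2025, 7, 28, 12, 0, tzinfo=timezone.utc)
three_days_old = checked_at - timedelta(days=3)
print(snapshot_is_stale(three_days_old, timedelta(hours=24), now=checked_at))
```

A check like this is only useful if it runs on a schedule and alerts someone; the threshold would depend on how much control plane state an operator can afford to lose.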

We are conducting a full forensic analysis to determine the root cause of the corruption and evaluate the failover mechanisms that were expected to mitigate such an event.

Impact

  • All databases in Dallas were unavailable for the duration of the incident
  • All customer data was preserved and restored to new clusters
  • Trusted sources (firewall rules) were lost and must be reconfigured by customers

Immediate Actions Taken

  • Isolated the failed cluster to prevent further damage
  • Deployed a new cluster in Dallas
  • Restored customer databases from off-site backups
  • Validated database integrity and access for each tenant

Customer Actions Required

Since all databases were recreated in a new environment, customers must take the following steps:

  1. Update Database Connection URIs and Credentials
    Your database URI and credentials have changed. Please check your dashboard or reach out to support to retrieve your new connection details.
  2. Recreate Trusted Sources
    Any previously configured trusted sources (firewall allowlists) were not recoverable and need to be manually re-added.
  3. Review Application Integrations
    If automated services or applications depend on the old URI or IP addresses, update them to avoid connectivity issues.
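
To help with steps 1 and 3, the sketch below scans a set of application settings for values that still point at a retired database host. It is a hedged example only: the function `find_stale_settings`, the setting names, and the hosts (`old-db.example.com`, `203.0.113.10`) are placeholders, not real Latitude.sh endpoints. Substitute the old host or IP from your previous connection details.

```python
from urllib.parse import urlsplit


def find_stale_settings(settings, old_hosts):
    """Return the sorted names of settings that still reference a retired host.

    `settings` maps setting names to string values (for example, a copy of
    os.environ); `old_hosts` is a collection of retired hostnames or IPs.
    """
    retired = {host.strip().lower() for host in old_hosts}
    stale = []
    for name, value in settings.items():
        try:
            host = urlsplit(value).hostname
        except ValueError:
            # Not parseable as a URL; fall through to the raw-value check.
            host = None
        # Bare hostnames and IPs have no URL hostname, so compare the raw value.
        candidate = (host or value).strip().lower()
        if candidate in retired:
            stale.append(name)
    return sorted(stale)


# Placeholder settings; in practice these might come from os.environ.
example_settings = {
    "DATABASE_URL": "postgresql://app:secret@old-db.example.com:5432/app",
    "REPLICA_URL": "postgresql://app:secret@new-db.example.com:5432/app",
    "CACHE_HOST": "203.0.113.10",
    "LOG_LEVEL": "info",
}
print(find_stale_settings(example_settings, {"old-db.example.com", "203.0.113.10"}))
```

Keeping connection details in environment variables or a secrets manager, rather than hard-coded in application source, makes this kind of update a configuration change instead of a redeploy.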
Posted Jul 31, 2025 - 19:00 UTC

Resolved

This incident has been resolved.
Posted Jul 31, 2025 - 18:52 UTC

Identified

The issue has been identified and a fix is being implemented.
Posted Jul 30, 2025 - 02:05 UTC

Investigating

We have identified an issue with our Databases cluster that is affecting application availability. Our team is actively working to restore these services.
Posted Jul 29, 2025 - 15:07 UTC
This incident affected: Services (Databases, Filesystem).