Last month I spent two hours staring at a perfectly healthy RDS instance while our staging application threw connection timeout errors on every deploy. The database was running, CloudWatch showed zero CPU pressure, and the credentials were correct. It turned out the RDS instance had been recreated in a new subnet that had no route to the application’s VPC CIDR, and the security group was still referencing the old application security group ID. Two tiny misconfigurations, invisible from the RDS console’s green “Available” status. In this post, I’ll walk through every layer that can cause RDS connection timeouts and exactly how to fix each one.

The Problem

Your RDS instance shows “Available” in the console, but application connections fail immediately or hang until timeout. You see errors like these:

Error Type           | Example Message
Connection Timed Out | FATAL: could not connect to server: Connection timed out (PostgreSQL) or Can't connect to MySQL server on 'hostname' (110) (MySQL)
Connection Refused   | Connection refused on the correct port (3306/5432)
No Route to Host     | No route to host when connecting from EC2 or Lambda
Too Many Connections | FATAL: too many connections for role (PostgreSQL) or ERROR 1040 (HY000): Too many connections (MySQL)

The application is dead in the water, and the RDS console gives you nothing useful because the database engine itself is fine.
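
Before working through the layers, make the failure reproduce quickly with an explicit client-side timeout, rather than waiting on the OS default (often two minutes or more). A minimal sketch, assuming a PostgreSQL endpoint — the hostname, user, and database below are placeholders:

```shell
# Fail fast instead of waiting on the OS default TCP timeout.
# Endpoint, user, and database are placeholders — substitute your own.
DB_HOST=my-app-database.c9aksdjf.us-east-1.rds.amazonaws.com

# PostgreSQL honors PGCONNECT_TIMEOUT (seconds); mysql has the
# equivalent --connect-timeout flag.
PGCONNECT_TIMEOUT=5 psql -h "$DB_HOST" -U admin -d mydb -c 'SELECT 1;' \
  || echo "Could not connect within 5 seconds"
```

A five-second bound turns a two-minute hang into an immediate, repeatable failure you can iterate against.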

Why Does This Happen?

  • Security group does not allow inbound traffic on the database port — The RDS instance’s security group must explicitly allow TCP traffic on port 3306 (MySQL/Aurora MySQL) or 5432 (PostgreSQL/Aurora PostgreSQL) from the application’s security group or CIDR range. If someone tightened the rules or the application moved to a new security group, connections are silently dropped.

  • RDS instance is in a private subnet with no route to the application — If the RDS instance sits in a subnet whose route table has no path to the application’s network (whether that’s another subnet in the same VPC, a peered VPC, or an on-premises network over VPN), packets never arrive. This is common after VPC peering changes or Transit Gateway route table updates.

  • Network ACL is blocking the port — Unlike security groups, NACLs are stateless. Even if you allow inbound 3306, you need a matching outbound rule for ephemeral ports (1024-65535) back to the client. A restrictive NACL on either the application subnet or the RDS subnet can kill connectivity.

  • RDS parameter group has a low max_connections value — If the parameter group sets max_connections too low (or if it’s derived from a small instance class via the {DBInstanceClassMemory} formula), legitimate connections get rejected once the limit is hit. The database is healthy but actively refusing new clients.

  • DNS resolution is failing for the RDS endpoint — The RDS endpoint is a DNS name, not an IP. If the application is in a VPC with enableDnsSupport or enableDnsHostnames turned off, or if the application is resolving against a custom DNS server that doesn’t forward to AmazonProvidedDNS, the hostname doesn’t resolve.

  • RDS instance is in a different Availability Zone with no cross-AZ routing — After a failover in a Multi-AZ deployment, the standby in a different AZ becomes primary. If the subnet group or routing doesn’t cover that AZ, clients lose connectivity.
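
Most of these causes can be ruled in or out in under a minute from the application host. A rough triage sketch — hostname and port are placeholders:

```shell
# Quick triage: DNS first, then raw TCP. Placeholders throughout.
DB_HOST=my-app-database.c9aksdjf.us-east-1.rds.amazonaws.com
DB_PORT=5432

# DNS: does the endpoint resolve at all?
if getent hosts "$DB_HOST" >/dev/null 2>&1; then
  echo "DNS OK"
else
  echo "DNS FAIL — check the VPC DNS attributes (Step 6)"
fi

# TCP: can we reach the port? (bash /dev/tcp trick, capped at 5 seconds)
if timeout 5 bash -c "echo > /dev/tcp/$DB_HOST/$DB_PORT" 2>/dev/null; then
  echo "TCP OK — look at max_connections and sessions (Steps 7-8)"
else
  echo "TCP FAIL — check security groups, NACLs, routing (Steps 3-5)"
fi
```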

The Fix

Work through these checks in order. Start with the cheapest, most common causes first, then move deeper.

Step 1: Verify the RDS instance is actually available and get its endpoint

# Confirm the RDS instance status and retrieve the endpoint
aws rds describe-db-instances \
  --db-instance-identifier my-app-database \
  --region us-east-1 \
  --query 'DBInstances[0].{Status:DBInstanceStatus,Endpoint:Endpoint.Address,Port:Endpoint.Port,VPC:DBSubnetGroup.VpcId,SecurityGroups:VpcSecurityGroups[*].VpcSecurityGroupId}' \
  --output table

If the status is anything other than “available,” wait for the operation to complete. Note the endpoint, port, VPC ID, and security group IDs for the next steps.

Step 2: Test basic connectivity from the application host

# From your EC2 instance or container, test TCP connectivity
# Replace the endpoint and port with your values
nc -zv -w 5 my-app-database.c9aksdjf.us-east-1.rds.amazonaws.com 5432

# If nc is not available, use timeout + bash
timeout 5 bash -c 'echo > /dev/tcp/my-app-database.c9aksdjf.us-east-1.rds.amazonaws.com/5432' && echo "Connected" || echo "Timed out"

If this times out, the problem is network-level (security group, NACL, routing) — continue with Steps 3-6. If it connects but your application still fails, skip to Step 7 for parameter group checks.
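
If you would rather not walk the layers by hand, VPC Reachability Analyzer can often name the blocking component directly. A sketch — the two ENI IDs are placeholders (your application instance's ENI as source, the RDS instance's ENI as destination), and note each analysis run carries a small per-run charge:

```shell
# Define a path from the application ENI to the RDS ENI on the DB port.
# Both ENI IDs are placeholders — substitute your own.
PATH_ID=$(aws ec2 create-network-insights-path \
  --source eni-0app1234567890abcd \
  --destination eni-0rds1234567890abcd \
  --protocol tcp \
  --destination-port 5432 \
  --region us-east-1 \
  --query 'NetworkInsightsPath.NetworkInsightsPathId' \
  --output text 2>/dev/null || echo "path-create-failed")
echo "Path: $PATH_ID"

# Run the analysis; the result names the exact security group, NACL,
# or route table that drops the packet.
aws ec2 start-network-insights-analysis \
  --network-insights-path-id "$PATH_ID" \
  --region us-east-1 2>/dev/null \
  || echo "Analysis did not start — check the path ID and IAM permissions"
```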

Step 3: Check the security group inbound rules

# Get the security group IDs attached to the RDS instance
SG_IDS=$(aws rds describe-db-instances \
  --db-instance-identifier my-app-database \
  --region us-east-1 \
  --query 'DBInstances[0].VpcSecurityGroups[*].VpcSecurityGroupId' \
  --output text)

# Check inbound rules for each security group
for sg in $SG_IDS; do
  echo "=== Security Group: $sg ==="
  aws ec2 describe-security-groups \
    --group-ids "$sg" \
    --region us-east-1 \
    --query 'SecurityGroups[0].IpPermissions' \
    --output table
done

You need an inbound rule allowing TCP on your database port (3306 or 5432) from either the application’s security group ID or its CIDR range. If it’s missing, add it:

# Add an inbound rule allowing PostgreSQL from the application security group
aws ec2 authorize-security-group-ingress \
  --group-id sg-0rds123456 \
  --protocol tcp \
  --port 5432 \
  --source-group sg-0app789012 \
  --region us-east-1

Step 4: Verify subnet routing and VPC peering

# Get the subnets in the RDS subnet group
aws rds describe-db-subnet-groups \
  --db-subnet-group-name my-db-subnet-group \
  --region us-east-1 \
  --query 'DBSubnetGroups[0].Subnets[*].{SubnetId:SubnetIdentifier,AZ:SubnetAvailabilityZone.Name}' \
  --output table

# Check the route table for each RDS subnet
aws ec2 describe-route-tables \
  --filters "Name=association.subnet-id,Values=subnet-0rds111aaa" \
  --region us-east-1 \
  --query 'RouteTables[0].Routes' \
  --output table

If the application is in a different VPC (peered or via Transit Gateway), confirm the route table includes a route to the application’s CIDR pointing to the peering connection or Transit Gateway. Check the application subnet’s route table too — routing must work in both directions.
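
For the peered-VPC case, also confirm the peering connection itself is alive — a route pointing at a deleted or still-pending connection blackholes traffic. A sketch with a placeholder peering connection ID:

```shell
# Check the peering connection status. The ID is a placeholder.
PEERING_STATUS=$(aws ec2 describe-vpc-peering-connections \
  --vpc-peering-connection-ids pcx-0abc123def456 \
  --region us-east-1 \
  --query 'VpcPeeringConnections[0].Status.Code' \
  --output text 2>/dev/null || echo "unknown")
echo "Peering status: $PEERING_STATUS (expect: active)"
```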

Step 5: Check Network ACLs on both subnets

# Check the NACL for the RDS subnet
aws ec2 describe-network-acls \
  --filters "Name=association.subnet-id,Values=subnet-0rds111aaa" \
  --region us-east-1 \
  --query 'NetworkAcls[0].{Inbound:Entries[?Egress==`false`],Outbound:Entries[?Egress==`true`]}' \
  --output json

Verify that the NACL allows inbound TCP on port 5432 (or 3306) and outbound TCP on ephemeral ports 1024-65535 back to the application’s CIDR. Do the same check for the application’s subnet NACL in reverse.
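
If a rule is missing, add it. A sketch assuming a PostgreSQL port, a placeholder NACL ID, and an application CIDR of 10.0.0.0/16 — adjust all three to your environment:

```shell
# NACLs are stateless: the inbound DB-port rule needs a matching outbound
# ephemeral-port rule. NACL ID, rule numbers, and CIDR are placeholders.
NACL_ID=acl-0rds111aaa

aws ec2 create-network-acl-entry \
  --network-acl-id "$NACL_ID" \
  --rule-number 120 \
  --protocol tcp \
  --port-range From=5432,To=5432 \
  --cidr-block 10.0.0.0/16 \
  --rule-action allow \
  --ingress \
  --region us-east-1 2>/dev/null || echo "Inbound rule not added"

aws ec2 create-network-acl-entry \
  --network-acl-id "$NACL_ID" \
  --rule-number 121 \
  --protocol tcp \
  --port-range From=1024,To=65535 \
  --cidr-block 10.0.0.0/16 \
  --rule-action allow \
  --egress \
  --region us-east-1 2>/dev/null || echo "Outbound rule not added"
```

Pick rule numbers below any existing deny rules, since NACLs evaluate entries in ascending order and stop at the first match.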

Step 6: Check DNS resolution

# From the application host, verify the RDS endpoint resolves
nslookup my-app-database.c9aksdjf.us-east-1.rds.amazonaws.com

# Verify VPC DNS settings
aws ec2 describe-vpc-attribute \
  --vpc-id vpc-0abc123def \
  --attribute enableDnsSupport \
  --region us-east-1

aws ec2 describe-vpc-attribute \
  --vpc-id vpc-0abc123def \
  --attribute enableDnsHostnames \
  --region us-east-1

Both enableDnsSupport and enableDnsHostnames must be true. If either is false, the RDS endpoint won’t resolve within the VPC.
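
If either attribute came back false, enable it. The API accepts only one attribute per call, so it takes two invocations — the VPC ID is a placeholder:

```shell
# Enable both VPC DNS attributes (one per call). VPC ID is a placeholder.
VPC_ID=vpc-0abc123def

aws ec2 modify-vpc-attribute \
  --vpc-id "$VPC_ID" \
  --enable-dns-support "{\"Value\":true}" \
  --region us-east-1 2>/dev/null || echo "Could not set enableDnsSupport"

aws ec2 modify-vpc-attribute \
  --vpc-id "$VPC_ID" \
  --enable-dns-hostnames "{\"Value\":true}" \
  --region us-east-1 2>/dev/null || echo "Could not set enableDnsHostnames"
```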

Step 7: Check and adjust max_connections in the parameter group

# Identify the parameter group
aws rds describe-db-instances \
  --db-instance-identifier my-app-database \
  --region us-east-1 \
  --query 'DBInstances[0].DBParameterGroups[*].DBParameterGroupName' \
  --output text

# Check the current max_connections value
aws rds describe-db-parameters \
  --db-parameter-group-name my-custom-params \
  --region us-east-1 \
  --query "Parameters[?ParameterName=='max_connections'].{Name:ParameterName,Value:ParameterValue,ApplyMethod:ApplyMethod}" \
  --output table

If max_connections is too low or set to the default formula on a small instance, increase it:

# Create a custom parameter group if using the default (you cannot modify default groups)
aws rds create-db-parameter-group \
  --db-parameter-group-name my-custom-params \
  --db-parameter-group-family postgres15 \
  --description "Custom params with higher max_connections" \
  --region us-east-1

# Set max_connections to a fixed value
aws rds modify-db-parameter-group \
  --db-parameter-group-name my-custom-params \
  --parameters "ParameterName=max_connections,ParameterValue=200,ApplyMethod=pending-reboot" \
  --region us-east-1

# Associate the parameter group and reboot to apply
aws rds modify-db-instance \
  --db-instance-identifier my-app-database \
  --db-parameter-group-name my-custom-params \
  --apply-immediately \
  --region us-east-1

aws rds reboot-db-instance \
  --db-instance-identifier my-app-database \
  --region us-east-1
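
To confirm you are actually hitting the limit (and to verify the fix afterwards), compare the DatabaseConnections CloudWatch metric against max_connections. A sketch — it uses GNU date syntax, which Linux ships by default:

```shell
# Pull the peak concurrent connections over the last hour.
# GNU date syntax ('-d'); on macOS use 'date -u -v-1H' instead.
START=$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ)
END=$(date -u +%Y-%m-%dT%H:%M:%SZ)

aws cloudwatch get-metric-statistics \
  --namespace AWS/RDS \
  --metric-name DatabaseConnections \
  --dimensions Name=DBInstanceIdentifier,Value=my-app-database \
  --start-time "$START" \
  --end-time "$END" \
  --period 300 \
  --statistics Maximum \
  --region us-east-1 \
  --output table 2>/dev/null || echo "Metric query failed — check credentials and region"
```

If the peak sits well below max_connections, the rejections are coming from somewhere else (per-role limits, a proxy, or the pool itself).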

Step 8: Check for connection leaks using database-level queries

If connections are exhausting the pool, connect with an admin account and inspect active sessions:

# PostgreSQL — list all active connections by application
psql -h my-app-database.c9aksdjf.us-east-1.rds.amazonaws.com -U admin -d mydb -c \
  "SELECT usename, application_name, client_addr, state, count(*)
   FROM pg_stat_activity
   GROUP BY usename, application_name, client_addr, state
   ORDER BY count DESC;"

# MySQL — list all active connections by user and host
mysql -h my-app-database.c9aksdjf.us-east-1.rds.amazonaws.com -u admin -p -e \
  "SELECT user, host, db, command, count(*) as conn_count
   FROM information_schema.processlist
   GROUP BY user, host, db, command
   ORDER BY conn_count DESC;"

If you see hundreds of connections in “idle” state from a single application, the application has a connection leak. Fix the application’s connection pool settings (set idle_timeout, max_idle, and max_lifetime appropriately) rather than just bumping max_connections higher.
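
As a stopgap while the pool settings are being fixed, you can reclaim capacity by terminating long-idle PostgreSQL sessions. A sketch with placeholder endpoint and credentials — run it as a role with pg_signal_backend (or the owner of the sessions):

```shell
# Terminate PostgreSQL backends idle for more than 10 minutes,
# excluding our own session. Endpoint, user, and database are placeholders.
RESULT=$(timeout 10 psql -h my-app-database.c9aksdjf.us-east-1.rds.amazonaws.com \
  -U admin -d mydb -t -c \
  "SELECT pg_terminate_backend(pid)
   FROM pg_stat_activity
   WHERE state = 'idle'
     AND state_change < now() - interval '10 minutes'
     AND pid <> pg_backend_pid();" 2>/dev/null || echo "could not connect")
echo "$RESULT"
```

This treats the symptom only; the leak itself lives in the application's pool configuration, and the idle sessions will reaccumulate until that is fixed.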

How to Run This

  1. Open the RDS Dashboard and note your DB instance identifier, endpoint, port, and security group IDs.
  2. SSH into an EC2 instance in the same VPC (or the same subnet) as the application and run the nc connectivity test to isolate whether this is a network issue or a database-level issue.
  3. Work through Steps 3-6 to verify security groups, routing, NACLs, and DNS. Fix any gaps you find.
  4. If TCP connectivity succeeds but the application still can’t connect, move to Steps 7-8 to check max_connections and active sessions.
  5. After any parameter group change, reboot the RDS instance during a maintenance window (or immediately if the application is already down).

Is This Safe?

All describe and diagnostic commands above are read-only. Adding a security group inbound rule is non-disruptive and takes effect immediately. Modifying a parameter group and rebooting the RDS instance causes a brief outage (typically 1-3 minutes for Single-AZ, near-zero for Multi-AZ with failover). Always confirm the maintenance window with your team before rebooting a production database.

Key Takeaway

When an RDS instance is “Available” but connections time out, the problem is almost never the database engine. Work the networking stack from the outside in: security groups first, then subnet routing, then NACLs, then DNS. Only after you’ve confirmed TCP connectivity should you look at max_connections or connection pool exhaustion. Keeping a runbook with your RDS security group IDs, subnet group names, and expected max_connections values saves significant time when this hits at 2 AM.


Have questions or ran into a different issue? Connect with me on LinkedIn or X.