A team I was helping had a DynamoDB table that was throttling requests for exactly four minutes every hour — like clockwork. Total consumed capacity was well under the provisioned limit, so the CloudWatch dashboard looked fine. But a single hot partition was absorbing 80% of the write traffic because their partition key was the current hour. Every new hour, requests stampeded one partition and got rejected until DynamoDB adaptively rebalanced. The fix wasn’t more capacity — it was a better key design.

Here’s how to diagnose throttling and the access-pattern issues behind it.

The Problem

DynamoDB requests fail with one of these errors despite the table looking “healthy”:

  • ProvisionedThroughputExceededException: the request exceeded the table's provisioned read/write capacity
  • ThrottlingException: an on-demand table hit its peak traffic ramp limit
  • RequestLimitExceeded: the account-level API rate limit was exceeded (usually control plane)
  • TransactionConflictException: a transactional write conflicted with another in-flight transaction
  • Internal server error on BatchGetItem: individual items within the batch were throttled

The table’s overall consumed capacity may be far below provisioned — a symptom that almost always points to a single hot partition rather than overall capacity starvation.

Why Does This Happen?

  • Hot partition due to poor key design: DynamoDB splits a table into partitions and each partition gets a fraction of total capacity (up to 3000 RCU / 1000 WCU). If one partition key value receives most of the traffic (user_id of a celebrity, date of today, status of PENDING), that partition exhausts its share while the rest of the table sits idle.
  • On-demand table scaling limits: On-demand tables auto-scale, but only up to 2x the previous peak within a 30-minute window. A sudden 10x traffic spike will throttle until DynamoDB has stepped up capacity multiple times.
  • Default retry behavior masks throttling: The AWS SDK retries throttled requests with exponential backoff, which hides throttling from application logs. By the time you see user-facing errors, the actual cause happened seconds earlier.
  • Global Secondary Index (GSI) separately throttled: GSIs have their own capacity and partitions. A write-heavy table with an undersized GSI will throttle writes even if the base table has plenty of headroom.
  • Burst capacity depleted: DynamoDB gives you up to 5 minutes of burst capacity, which hides capacity problems during brief spikes. Once the burst is consumed, sustained traffic at the provisioned limit starts throttling immediately.
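To see the throttling the SDK hides, you can wrap calls yourself and log every retry. Below is a minimal, illustrative sketch of the same capped exponential backoff with jitter that the SDK applies internally; ThrottledError and call_with_backoff are hypothetical names standing in for the real exception and SDK machinery:

```python
import random
import time

class ThrottledError(Exception):
    """Stand-in for the SDK's ProvisionedThroughputExceededException."""

def call_with_backoff(fn, max_attempts=5, base=0.05, cap=2.0):
    """Retry fn() on throttling with capped exponential backoff plus full jitter.

    Logging each retry here is exactly what surfaces the throttling that
    default SDK retries would otherwise absorb silently.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # retry budget exhausted: let the caller see the throttle
            # Full jitter: sleep a random amount up to the capped exponential delay.
            delay = random.uniform(0, min(cap, base * 2 ** attempt))
            print(f"throttled; retry {attempt + 1} in {delay:.3f}s")
            time.sleep(delay)
```

In boto3 specifically, the retry budget is tunable via `botocore.config.Config(retries={'mode': 'adaptive'})`, which also rate-limits the client instead of blindly retrying.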

The Fix

Step 1: Identify Which Operation Is Throttling

Check the table’s throttle metrics to distinguish reads from writes:

aws cloudwatch get-metric-statistics \
  --namespace AWS/DynamoDB \
  --metric-name ReadThrottleEvents \
  --dimensions Name=TableName,Value=my-table \
  --start-time 2026-04-19T00:00:00Z \
  --end-time 2026-04-20T23:59:59Z \
  --period 300 \
  --statistics Sum \
  --output table

Repeat for WriteThrottleEvents. Also check GSI throttling:

aws cloudwatch get-metric-statistics \
  --namespace AWS/DynamoDB \
  --metric-name WriteThrottleEvents \
  --dimensions Name=TableName,Value=my-table Name=GlobalSecondaryIndexName,Value=my-gsi \
  --start-time 2026-04-19T00:00:00Z \
  --end-time 2026-04-20T23:59:59Z \
  --period 300 \
  --statistics Sum \
  --output table

If the GSI is throttling but the base table is not, the GSI capacity is undersized.

Step 2: Find the Hot Partition

Enable Contributor Insights on the table to identify which partition keys are receiving the most traffic:

aws dynamodb update-contributor-insights \
  --table-name my-table \
  --contributor-insights-action ENABLE

Wait a few minutes, then pull the top contributors. Contributor Insights publishes its results through CloudWatch insight rules named after the table — the -PKC- rule tracks the most-accessed partition keys, and a parallel -PKT- rule tracks the most-throttled ones:

aws cloudwatch get-insight-rule-report \
  --rule-name DynamoDBContributorInsights-PKC-my-table \
  --start-time 2026-04-20T00:00:00Z \
  --end-time 2026-04-20T23:59:59Z \
  --period 3600 \
  --max-contributor-count 10

Look at the top keys. If one key appears disproportionately, that’s your hot partition.

Step 3: Fix Hot Partitions Through Key Design

The permanent fix is redistributing traffic across more partition key values. If your key is a date like 2026-04-20, add a sharding suffix:

Before:  pk = "2026-04-20"
After:   pk = "2026-04-20#7"  (where 7 is a random integer 0-9)

When reading, query all shards in parallel. This spreads one hot partition across 10. Another pattern is write-sharding with a computed suffix based on hash(item_id) % N.
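The deterministic variant of that pattern can be sketched in a few lines of Python. The names and shard count here are illustrative, not a library API; the one real constraint is that the shard count must be fixed up front, because changing it later requires rewriting existing keys:

```python
import hashlib

NUM_SHARDS = 10  # fixed up front; changing it means migrating existing keys

def sharded_pk(date: str, item_id: str, shards: int = NUM_SHARDS) -> str:
    """Deterministic write shard: the same item always lands on the same
    suffix, so a single item can still be read without fanning out."""
    suffix = int(hashlib.md5(item_id.encode()).hexdigest(), 16) % shards
    return f"{date}#{suffix}"

def all_shard_pks(date: str, shards: int = NUM_SHARDS) -> list[str]:
    """Every partition key to query (in parallel) when reading a whole day."""
    return [f"{date}#{s}" for s in range(shards)]
```

Reads that need the whole day issue one Query per entry of all_shard_pks and merge the results; reads of a single known item compute sharded_pk directly and stay a single request.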

If schema changes aren’t feasible short-term, switch to on-demand mode, which has better handling of uneven access patterns:

aws dynamodb update-table \
  --table-name my-table \
  --billing-mode PAY_PER_REQUEST

Note: you can only switch billing modes once every 24 hours per table.

Step 4: Increase Provisioned Capacity (Short-Term Relief)

If throttling is broad-based and not partition-specific, increase capacity:

aws dynamodb update-table \
  --table-name my-table \
  --provisioned-throughput ReadCapacityUnits=500,WriteCapacityUnits=200
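If you are sizing those numbers by hand, the arithmetic is simple: one WCU covers one write per second of up to 1 KB, and one RCU covers one strongly consistent read per second of up to 4 KB (eventually consistent reads cost half). A back-of-envelope helper, with no SDK involved and hypothetical function names:

```python
import math

def wcu_per_write(item_kb: float) -> int:
    # One WCU covers a write of up to 1 KB; larger items cost proportionally more.
    return math.ceil(item_kb / 1.0)

def rcu_per_read(item_kb: float, eventually_consistent: bool = True) -> float:
    # One RCU covers one strongly consistent read of up to 4 KB per second;
    # eventually consistent reads cost half as much.
    units = math.ceil(item_kb / 4.0)
    return units / 2 if eventually_consistent else units

def required_capacity(writes_per_sec, write_kb, reads_per_sec, read_kb):
    """Back-of-envelope provisioning, before adding any headroom for spikes."""
    return {
        "WriteCapacityUnits": writes_per_sec * wcu_per_write(write_kb),
        "ReadCapacityUnits": math.ceil(reads_per_sec * rcu_per_read(read_kb)),
    }
```

For example, 100 writes/sec of 1.5 KB items needs 200 WCU, since each write rounds up to 2 KB-units. Provision above the computed floor, since these figures assume perfectly even traffic.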

For GSIs, the capacity update is nested:

aws dynamodb update-table \
  --table-name my-table \
  --global-secondary-index-updates '[{
    "Update": {
      "IndexName": "my-gsi",
      "ProvisionedThroughput": {
        "ReadCapacityUnits": 300,
        "WriteCapacityUnits": 150
      }
    }
  }]'

Step 5: Enable Auto Scaling

For provisioned tables, auto scaling adjusts capacity automatically, so traffic swings no longer require manual intervention:

aws application-autoscaling register-scalable-target \
  --service-namespace dynamodb \
  --resource-id "table/my-table" \
  --scalable-dimension "dynamodb:table:WriteCapacityUnits" \
  --min-capacity 50 \
  --max-capacity 1000
aws application-autoscaling put-scaling-policy \
  --service-namespace dynamodb \
  --resource-id "table/my-table" \
  --scalable-dimension "dynamodb:table:WriteCapacityUnits" \
  --policy-name my-table-write-scaling \
  --policy-type TargetTrackingScaling \
  --target-tracking-scaling-policy-configuration '{
    "TargetValue": 70.0,
    "PredefinedMetricSpecification": {
      "PredefinedMetricType": "DynamoDBWriteCapacityUtilization"
    },
    "ScaleInCooldown": 60,
    "ScaleOutCooldown": 60
  }'

Step 6: Add DAX for Read-Heavy Workloads

If reads are the bottleneck and most are cacheable, add DynamoDB Accelerator (DAX) in front:

aws dax create-cluster \
  --cluster-name my-dax-cluster \
  --node-type dax.r4.large \
  --replication-factor 3 \
  --iam-role-arn arn:aws:iam::123456789012:role/DAXServiceRole \
  --subnet-group-name my-dax-subnet-group

DAX provides microsecond read latency and absorbs repeated reads of hot keys, as long as your access pattern tolerates eventually consistent reads: strongly consistent reads are not cached and pass straight through to the table.
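What DAX does for a hot key can be illustrated with a toy in-process read-through cache. This is only a sketch of the caching behavior, not the DAX client (which is a drop-in replacement for the DynamoDB client); the 5-minute default mirrors DAX's item cache TTL:

```python
import time

class TtlCache:
    """Toy read-through cache showing how DAX offloads repeated reads of
    the same hot key from the table. Illustrative only, not DAX itself."""

    def __init__(self, fetch, ttl_seconds=300):
        self.fetch = fetch      # function that actually reads from DynamoDB
        self.ttl = ttl_seconds  # DAX's item cache TTL defaults to 5 minutes
        self.store = {}
        self.misses = 0

    def get(self, key):
        hit = self.store.get(key)
        if hit and time.monotonic() - hit[1] < self.ttl:
            return hit[0]       # served from cache: no table read at all
        self.misses += 1
        value = self.fetch(key)  # only misses consume table RCUs
        self.store[key] = (value, time.monotonic())
        return value
```

A celebrity key read a thousand times per TTL window costs the table one read instead of a thousand, which is exactly the pressure relief a hot-read partition needs.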

Step 7: Verify the Fix

Watch throttling drop after your changes:

aws cloudwatch get-metric-statistics \
  --namespace AWS/DynamoDB \
  --metric-name WriteThrottleEvents \
  --dimensions Name=TableName,Value=my-table \
  --start-time 2026-04-20T00:00:00Z \
  --end-time 2026-04-20T23:59:59Z \
  --period 60 \
  --statistics Sum \
  --output table

Sustained zero values across multiple periods confirm the fix.

Is This Safe?

Mostly yes. Enabling Contributor Insights, auto scaling, and capacity increases are all non-disruptive. Switching from provisioned to on-demand is safe but irreversible for 24 hours. Adding DAX does not modify the underlying table, but your application code needs to point at the DAX endpoint to see the benefit. The one change to be careful with is restructuring partition keys — that requires application-level changes and a migration strategy for existing data.

Key Takeaway

Throughput errors rarely mean you need more capacity. They mean your access pattern is uneven, and one partition is shouldering most of the load. Before raising capacity or switching to on-demand, use Contributor Insights to find out which keys are hot — the answer is almost always a key that encodes time, status, or a small enumerated set. Fix the data model and throttling usually disappears without spending a cent more.


Have questions or ran into a different DynamoDB issue? Connect with me on LinkedIn or X.