Fix AWS Lambda Timeout, Permission Denied, and Cold Start Errors

I was investigating why an event-driven pipeline was silently dropping records. The Lambda function looked fine — the code was correct, the tests passed, and the trigger was configured. But CloudWatch showed a wall of Task timed out after 3.00 seconds errors mixed with occasional AccessDeniedException responses. The function was running but failing in ways that weren’t immediately obvious.

Here’s how to diagnose and fix the most common Lambda invocation failures.

The Problem

Lambda functions fail to execute successfully with one of these errors:

Error	What It Means
`Task timed out after X.XX seconds`	Function exceeded its configured timeout
`AccessDeniedException: User is not authorized to perform lambda:InvokeFunction`	Caller lacks permission to invoke the function
`ResourceNotFoundException: Function not found`	Wrong function name, region, or account
`RequestEntityTooLargeException: Request payload exceeded 6 MB`	Synchronous invocation payload exceeds the limit
`TooManyRequestsException: Rate Exceeded`	Concurrency limit hit — function is being throttled

The function may also succeed intermittently, with some invocations completing in 200ms and others timing out — a classic sign of VPC cold start issues.
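
As a quick triage aid, the error table above can be folded into a small shell function that maps a log line to a failure category. This is only a sketch: the name classify_lambda_error and the category labels are made up for illustration, and it does nothing smarter than substring matching on the error strings listed above.

```shell
# Hypothetical helper: map a Lambda error message to a rough failure category.
# The category names are illustrative labels, not AWS terminology.
classify_lambda_error() {
  case "$1" in
    *"Task timed out"*)                echo "timeout" ;;
    *AccessDeniedException*)           echo "permissions" ;;
    *ResourceNotFoundException*)       echo "not-found" ;;
    *RequestEntityTooLargeException*)  echo "payload-too-large" ;;
    *TooManyRequestsException*)        echo "throttled" ;;
    *)                                 echo "unknown" ;;
  esac
}

classify_lambda_error "Task timed out after 3.00 seconds"   # prints: timeout
```

Piping recent CloudWatch log lines through a function like this gives a rough count of which failure mode dominates before you start changing settings.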

Why Does This Happen?

Timeout set too low for the workload: The default Lambda timeout is 3 seconds. If your function makes API calls, database connections, or processes files, 3 seconds is almost never enough. Functions that work in testing fail under real-world latency.
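VPC-attached Lambda with cold starts: Before AWS moved Lambda to Hyperplane ENIs in 2019, every cold start in a VPC had to create and attach an ENI (Elastic Network Interface), which added 10-30 seconds. ENIs are now set up when the function is created or its VPC configuration changes, so most of that penalty is gone. Even so, the first invocation after idle can be significantly slower while the runtime initializes and opens connections to resources inside the VPC.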
Missing resource-based policy on the function: When another AWS service (S3, SNS, EventBridge) invokes your Lambda, the function needs a resource-based policy granting that service permission. Without it, the trigger silently fails.
Execution role missing downstream permissions: The function’s execution role controls what the function can do (write to DynamoDB, read from S3, etc.). A common mistake is configuring the trigger correctly but forgetting to give the execution role access to downstream resources.
Reserved concurrency set to zero: If someone set the function’s reserved concurrency to 0 (possibly to “disable” it temporarily), all invocations get throttled immediately.

The Fix

Step 1: Check the Function Configuration

Start by reviewing the current timeout, memory, and VPC settings:

aws lambda get-function-configuration \
  --function-name my-function \
  --query "{Timeout:Timeout,MemorySize:MemorySize,VpcConfig:VpcConfig,Runtime:Runtime,State:State}" \
  --output json

If the timeout is 3 seconds and the function does any network I/O, that’s likely your problem.

Step 2: Fix Timeout Issues

Update the timeout to something realistic for your workload. For functions calling APIs or databases, 30 seconds is a reasonable starting point:

aws lambda update-function-configuration \
  --function-name my-function \
  --timeout 30

Also check memory. Lambda allocates CPU proportionally to memory — a function with 128 MB gets very little CPU. Bumping to 512 MB or 1024 MB often fixes “slow” functions that are actually CPU-starved:

aws lambda update-function-configuration \
  --function-name my-function \
  --memory-size 512
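
To get a feel for how CPU-starved a small function is, the sketch below estimates the vCPU share for a given memory setting. It assumes the reference point from the Lambda documentation that 1,769 MB corresponds to one full vCPU, and scales linearly below that. The helper name approx_vcpu is hypothetical, and the result is a rough estimate, not a guarantee.

```shell
# Hypothetical helper: estimate the vCPU share for a memory size in MB.
# Assumes linear scaling up to the 1,769 MB = 1 vCPU reference point.
approx_vcpu() {
  awk -v mb="$1" 'BEGIN { printf "%.2f\n", mb / 1769 }'
}

approx_vcpu 128   # prints: 0.07
approx_vcpu 512   # prints: 0.29
```

By this estimate a 128 MB function gets well under a tenth of a core, which is why a memory bump can shorten run time even when the function never comes close to using the extra RAM.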

Step 3: Check and Fix Invoke Permissions

If another service is triggering the function, verify the resource-based policy:

aws lambda get-policy \
  --function-name my-function \
  --query "Policy" \
  --output text | python3 -m json.tool

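If this returns ResourceNotFoundException and the function name and region are correct, there’s no policy at all. Add a statement for the triggering service. The example below grants S3 permission to invoke the function for a bucket named my-bucket; swap the principal and source ARN for your own trigger: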

aws lambda add-permission \
  --function-name my-function \
  --statement-id AllowS3Invoke \
  --action lambda:InvokeFunction \
  --principal s3.amazonaws.com \
  --source-arn arn:aws:s3:::my-bucket \
  --source-account 123456789012
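
After adding the permission, it can help to confirm that the policy really names the service you expect. The following is a minimal sketch, not a policy evaluator: policy_names_service is a hypothetical helper that does a plain text match for a "Service" principal in the policy JSON. It ignores conditions and will miss principals written as arrays.

```shell
# Hypothetical helper: succeed if the policy JSON names the given service principal.
# Rough text match only; it does not evaluate conditions or handle principal arrays.
policy_names_service() {
  printf '%s' "$1" | tr -d ' \t\n' | grep -Fq "\"Service\":\"$2\""
}

# Usage against a real function (requires AWS credentials):
#   policy=$(aws lambda get-policy --function-name my-function --query Policy --output text)
#   policy_names_service "$policy" s3.amazonaws.com && echo "S3 statement found"
```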

Step 4: Verify the Execution Role

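Check what the function’s execution role can actually do. The first command returns the role’s full ARN; the IAM commands take the role name, which is the last path segment of that ARN (my-function-role in this example):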

aws lambda get-function-configuration \
  --function-name my-function \
  --query "Role" \
  --output text

aws iam list-attached-role-policies \
  --role-name my-function-role \
  --output table

aws iam list-role-policies \
  --role-name my-function-role \
  --output table

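Look for the downstream permissions your function needs. If it writes to DynamoDB, it needs dynamodb:PutItem. If it reads from S3, it needs s3:GetObject. The list commands only return policy names, so inspect each policy with aws iam get-role-policy (inline) or aws iam get-policy-version (managed) to see the actions it allows. Missing permissions cause AccessDeniedException in the function logs, not at invocation time.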
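
To avoid copying the role name by hand, it can be derived from the ARN that get-function-configuration returns. This is a small sketch using shell parameter expansion; role_name_from_arn is a hypothetical helper name, and the ARN shown is an example value.

```shell
# Hypothetical helper: strip everything up to the last "/" in a role ARN,
# leaving the role name that the IAM commands expect.
role_name_from_arn() {
  printf '%s\n' "${1##*/}"
}

role_name_from_arn "arn:aws:iam::123456789012:role/service-role/my-function-role"
# prints: my-function-role

# Usage against a real function (requires AWS credentials):
#   role_arn=$(aws lambda get-function-configuration \
#     --function-name my-function --query Role --output text)
#   aws iam list-attached-role-policies --role-name "$(role_name_from_arn "$role_arn")"
```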

Step 5: Fix Throttling

Check if reserved concurrency is limiting your function:

aws lambda get-function-concurrency \
  --function-name my-function

If this returns ReservedConcurrentExecutions: 0, the function is effectively disabled. Remove the reservation to use the account’s unreserved pool:

aws lambda delete-function-concurrency \
  --function-name my-function

Or set a reasonable reserve:

aws lambda put-function-concurrency \
  --function-name my-function \
  --reserved-concurrent-executions 100
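
The three possible states of the reservation can be made explicit with a small helper. This is a sketch under one assumption: when the value is queried with --query ReservedConcurrentExecutions --output text and no reservation exists, the CLI is expected to print None (an empty string is handled the same way). The name describe_reserved_concurrency is hypothetical.

```shell
# Hypothetical helper: explain a ReservedConcurrentExecutions value.
# "None" or an empty string is treated as "no reservation configured".
describe_reserved_concurrency() {
  case "$1" in
    ""|None) echo "no reservation: function draws from the unreserved pool" ;;
    0)       echo "reserved concurrency is 0: every invocation is throttled" ;;
    *)       echo "capped at $1 concurrent executions" ;;
  esac
}

describe_reserved_concurrency 0
# prints: reserved concurrency is 0: every invocation is throttled

# Usage against a real function (requires AWS credentials):
#   value=$(aws lambda get-function-concurrency --function-name my-function \
#     --query ReservedConcurrentExecutions --output text)
#   describe_reserved_concurrency "$value"
```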

Step 6: Address VPC Cold Starts

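If your function is in a VPC and cold starts are the issue, enable provisioned concurrency to keep warm instances ready. Provisioned concurrency is configured on a published version or an alias, not on $LATEST, so the alias used below (my-alias) must already exist: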

aws lambda put-provisioned-concurrency-config \
  --function-name my-function \
  --qualifier my-alias \
  --provisioned-concurrent-executions 5
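
Provisioned concurrency is not available the instant the command returns; the configuration reports a status of IN_PROGRESS until the instances are ready, then READY or FAILED. The polling loop below is a minimal sketch of waiting for that. wait_for_ready is a hypothetical helper that takes an attempt limit, a delay in seconds, and any command that prints the current status.

```shell
# Hypothetical helper: poll a status command until it prints READY or FAILED.
# $1 = max attempts, $2 = seconds between attempts, rest = command to run.
# Returns 0 on READY, 1 on FAILED, 2 if attempts run out.
wait_for_ready() {
  attempts=$1
  delay=$2
  shift 2
  i=0
  while [ "$i" -lt "$attempts" ]; do
    status=$("$@")
    case "$status" in
      READY)  return 0 ;;
      FAILED) return 1 ;;
    esac
    i=$((i + 1))
    sleep "$delay"
  done
  return 2
}

# Usage against a real function (requires AWS credentials):
#   wait_for_ready 30 10 aws lambda get-provisioned-concurrency-config \
#     --function-name my-function --qualifier my-alias \
#     --query Status --output text
```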

Alternatively, if the function doesn’t actually need VPC access (it was added “just in case”), remove the VPC configuration entirely:

# Quote the value so shells such as zsh don't try to glob-expand the empty brackets
aws lambda update-function-configuration \
  --function-name my-function \
  --vpc-config 'SubnetIds=[],SecurityGroupIds=[]'

This takes VPC networking out of the cold start path entirely.

Step 7: Test the Fix

Invoke the function directly and verify:

aws lambda invoke \
  --function-name my-function \
  --payload '{"test": true}' \
  --cli-binary-format raw-in-base64-out \
  response.json && cat response.json

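The CLI prints the StatusCode and, if the function threw an error, a FunctionError field; it does not print the execution time. To see the duration, add --log-type Tail to the command and base64-decode the LogResult field, or find the REPORT line for the invocation in CloudWatch Logs. If the function completes well within the timeout and returns the expected output, the fix is working.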
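
One way to pull the run time out of an invocation is to read the REPORT line that Lambda writes at the end of each execution. The helper below is a sketch that assumes the usual REPORT layout (Duration, Billed Duration, and on cold starts Init Duration); report_duration_ms is a hypothetical name, and the sample line is illustrative rather than real output.

```shell
# Hypothetical helper: read log text on stdin and print the Duration value (ms)
# from the first REPORT line, skipping "Billed Duration" and "Init Duration".
report_duration_ms() {
  awk '/^REPORT/ {
    for (i = 2; i <= NF; i++)
      if ($i == "Duration:" && $(i-1) != "Billed" && $(i-1) != "Init") {
        print $(i+1)
        exit
      }
  }'
}

# Illustrative sample line, not captured from a real function:
printf 'REPORT RequestId: abc\tDuration: 39.25 ms\tBilled Duration: 40 ms\tMemory Size: 128 MB\tMax Memory Used: 72 MB\tInit Duration: 173.64 ms\n' \
  | report_duration_ms   # prints: 39.25

# Usage against a real function (requires AWS credentials):
#   aws lambda invoke --function-name my-function --payload '{"test": true}' \
#     --cli-binary-format raw-in-base64-out --log-type Tail \
#     --query LogResult --output text response.json \
#     | base64 --decode | report_duration_ms
```

An Init Duration field on the REPORT line marks a cold start, which is useful when comparing the 200ms invocations with the slow ones.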

Is This Safe?

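Mostly, yes. Configuration changes like timeout and memory take effect on the next invocation and don’t interrupt running executions. Adding resource-based policies is additive. Two changes deserve a second look. Removing VPC configuration cuts the function off from VPC resources, so make sure it doesn’t actually need them before removing it. Setting reserved concurrency on one function subtracts that amount from the pool shared by every other function in the account and region, and provisioned concurrency is billed for as long as it stays configured.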

Key Takeaway

Lambda failures are frustrating because the default configuration is almost never right for production workloads. A 3-second timeout with 128 MB of memory in a VPC is a recipe for intermittent failures. The fix is usually straightforward — increase the timeout, bump memory, and verify both the resource-based policy and the execution role. Don’t confuse the two: the resource-based policy controls who can invoke the function, while the execution role controls what the function can do once it’s running.
