Troubleshoot Control Tower Landing Zone Repair Failures

I’ve debugged this scenario more times than I can count: Control Tower detects drift in your landing zone, you click “Repair,” and the repair either fails silently or completes while drift still shows. It’s maddening because the repair should fix it, but it doesn’t. The cause almost always sits deeper than the “Repair” button: you need to understand what caused the drift in the first place. In this post, I’ll show you how to diagnose and fix repair failures.

The Problem

When Control Tower detects drift and you attempt to repair it, the repair operation deploys CloudFormation StackSets to restore baseline resources. If the repair fails, or completes while drift persists, an underlying issue is blocking it. The common failure modes are:

Error Type	Description
Repair timeout	Stack operation took too long, likely due to manual changes
Permission denied	Execution role lacks necessary permissions
Role missing	IAM roles used by Control Tower were deleted or modified
Stack failed	CloudFormation template failed due to resource state
Security Hub disabled	Security Hub or GuardDuty manually disabled in account

Why Does This Happen?

Manual changes to CloudTrail, Config, or SNS: If you manually deleted or modified these Control Tower-managed resources, the repair tries to restore them but can hit conflicts when the resources are left in an inconsistent state.
Stack instance left OUTDATED after a failed operation: When a StackSet operation fails, the affected stack instances keep the status OUTDATED (FAILED appears as the instance’s detailed status, not as its status). Control Tower cannot complete a repair until these are resolved.
Missing execution role in the account: Control Tower uses the AWSControlTowerExecution role to deploy and update stacks. If this role was deleted, repair fails.
Security Hub or GuardDuty manually disabled: If you disabled these services in a member account, Control Tower’s repair may not be able to re-enable them, particularly when the resources the stack expects no longer exist.
IAM role used by Control Tower was modified: The AWSControlTowerAdmin role in the management account and the AWSControlTowerExecution role in member accounts must keep specific permissions. Modifying these roles breaks repair.
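
To spot the stuck stack instances described above, something like the following can help. It is a minimal sketch, not an official procedure: the function name non_current_instances is my own, and the us-east-1 default is an assumption you should replace with your Control Tower home Region.

```shell
# List stack instances of a StackSet whose status is not CURRENT.
# Usage: non_current_instances STACK_SET_NAME [REGION]
# REGION should be your Control Tower home Region; us-east-1 is only an assumed default.
non_current_instances() {
  stack_set="$1"
  region="${2:-us-east-1}"
  # Text output is one tab-separated row per stack instance: Account, Region, Status.
  aws cloudformation list-stack-instances \
    --stack-set-name "$stack_set" \
    --region "$region" \
    --query 'Summaries[].[Account,Region,Status]' \
    --output text |
    awk '$3 != "CURRENT" { print $1, $2, $3 }'
}
```

Call it as non_current_instances AWSControlTowerBP-BASELINE-CLOUDTRAIL from the management account. An empty result means every instance is CURRENT (or the call itself failed, so check stderr too).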

The Fix

Diagnose the specific issue causing repair failure and fix it systematically.

Step 1: Identify What’s Drifted

In the Control Tower console:

Control Tower → Landing Zone → Check for drift

Wait for the scan to complete. Then go to:

Control Tower → Organization → [Select drifted account/OU] → View drift details

Note exactly which resources are drifted (e.g., CloudTrail trail deleted, Config recorder modified, etc.).
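
If you prefer the CLI, a recent AWS CLI v2 can report the landing zone’s overall drift status. This is a sketch under assumptions: the function name is mine, it expects management-account credentials in the home Region, and it only returns the landing-zone-level status (DRIFTED or IN_SYNC), not the per-resource details the console shows.

```shell
# Print the landing zone drift status (DRIFTED or IN_SYNC).
# Assumes management-account credentials and the Control Tower home Region.
landing_zone_drift_status() {
  # Look up the ARN of the (first) landing zone in this account.
  lz_arn="$(aws controltower list-landing-zones \
    --query 'landingZones[0].arn' --output text)" || return 1
  if [ -z "$lz_arn" ] || [ "$lz_arn" = "None" ]; then
    echo "no landing zone found" >&2
    return 1
  fi
  # Read the drift status from the landing zone details.
  aws controltower get-landing-zone \
    --landing-zone-identifier "$lz_arn" \
    --query 'landingZone.driftStatus.status' --output text
}
```

Running landing_zone_drift_status before and after your fix gives you a quick yes/no signal without opening the console.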

Step 2: Manually Restore the Drifted Resource

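For each drifted resource, manually restore it to the expected state. For example, if the drift details report a deleted CloudTrail trail, first list the trails that still exist (use your Control Tower home Region in place of us-east-1):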

aws cloudtrail describe-trails \
  --region us-east-1 \
  --output text

If the expected trail is missing, you may need to contact AWS Support or consult the Control Tower baseline documentation for the exact trail name and configuration.
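
To turn that check into a yes/no answer, a small wrapper like this can work. Treat it as a sketch: check_trail is a hypothetical helper, and the default trail name aws-controltower-BaselineCloudTrail is my assumption about the baseline trail, so confirm the name against your own drift details before relying on it.

```shell
# Report whether a trail with the given name exists in a Region.
# Usage: check_trail [TRAIL_NAME] [REGION]
check_trail() {
  trail="${1:-aws-controltower-BaselineCloudTrail}"  # assumed default name, verify it
  region="${2:-us-east-1}"                           # assumed home Region
  # Count the trails whose Name matches exactly.
  count="$(aws cloudtrail describe-trails \
    --region "$region" \
    --query "length(trailList[?Name=='$trail'])" \
    --output text)" || return 2
  if [ "$count" -gt 0 ]; then
    echo "present: $trail"
  else
    echo "missing: $trail"
    return 1
  fi
}
```

The function returns 0 when the trail is present, 1 when it is missing, and 2 when the AWS call itself fails, so it slots into a pre-flight script easily.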

Step 3: Check StackSet Operations

View StackSet operations for the baseline stacks:

aws cloudformation list-stack-set-operations \
  --stack-set-name AWSControlTowerBP-BASELINE-CLOUDTRAIL \
  --output text

Look for operations in “FAILED” or “STOPPED” state, then pull the per-account results for each one:

aws cloudformation list-stack-set-operation-results \
  --stack-set-name AWSControlTowerBP-BASELINE-CLOUDTRAIL \
  --operation-id [operation-id-from-above] \
  --output table

If an operation failed, note which account and which resource caused the failure.
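
The two commands above can be chained so you get the failing account, Region, and reason in one pass. This is a rough sketch; the function names are my own and the us-east-1 default is an assumption.

```shell
# Print the IDs of StackSet operations that ended FAILED or STOPPED.
# Usage: failed_operation_ids STACK_SET_NAME [REGION]
failed_operation_ids() {
  aws cloudformation list-stack-set-operations \
    --stack-set-name "$1" \
    --region "${2:-us-east-1}" \
    --query 'Summaries[].[OperationId,Status]' \
    --output text |
    awk '$2 == "FAILED" || $2 == "STOPPED" { print $1 }'
}

# For each such operation, print the accounts whose result is FAILED.
# Output format: OPERATION_ID: ACCOUNT REGION REASON
# Usage: failed_results STACK_SET_NAME [REGION]
failed_results() {
  for op_id in $(failed_operation_ids "$1" "$2"); do
    aws cloudformation list-stack-set-operation-results \
      --stack-set-name "$1" \
      --region "${2:-us-east-1}" \
      --operation-id "$op_id" \
      --query 'Summaries[].[Account,Region,Status,StatusReason]' \
      --output text |
      # Split on tabs only, because StatusReason contains spaces.
      awk -F'\t' -v op="$op_id" '$3 == "FAILED" { print op ": " $1 " " $2 " " $4 }'
  done
}
```

For example, failed_results AWSControlTowerBP-BASELINE-CLOUDTRAIL prints one line per failing account, which is usually enough to tell which resource to restore.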

Step 4: Verify IAM Roles

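Check that the required roles exist and have proper permissions. AWSControlTowerAdmin lives in the management account and AWSControlTowerExecution in each member account, so run each command with credentials for the matching account: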

aws iam get-role \
  --role-name AWSControlTowerAdmin \
  --output text

aws iam get-role \
  --role-name AWSControlTowerExecution \
  --output text

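If AWSControlTowerExecution is missing from a member account, you can usually recreate it by following the AWS documentation for enrolling an existing account (the role must trust the management account). If AWSControlTowerAdmin is missing, or you cannot restore either role yourself, contact AWS Support.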
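
When you have several accounts to check, a loop is easier than reading raw get-role output. Here is a minimal sketch; check_roles is a hypothetical helper, and note that it reports MISSING for any get-role failure, which includes access-denied errors as well as a role that truly does not exist.

```shell
# Print "OK <role> <arn>" or "MISSING <role>" for each role name given.
# Returns 1 if at least one role could not be read.
# Usage: check_roles ROLE_NAME...
check_roles() {
  missing=0
  for role in "$@"; do
    # get-role exits non-zero when the role does not exist or cannot be read.
    if arn="$(aws iam get-role --role-name "$role" \
        --query 'Role.Arn' --output text 2>/dev/null)"; then
      echo "OK $role $arn"
    else
      echo "MISSING $role"
      missing=1
    fi
  done
  return "$missing"
}
```

Run check_roles AWSControlTowerAdmin in the management account and check_roles AWSControlTowerExecution with each member account’s credentials.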

Step 5: Re-Register the OU

After fixing underlying issues (restored drifted resources, verified roles), re-register the OU to trigger a fresh baseline deployment:

Control Tower → Organization → [Select OU] → Actions → Re-register OU

This is more thorough than “Repair” and will re-deploy all baseline resources.
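
If you would rather script this step, the Control Tower baseline APIs appear to offer an equivalent. The sketch below assumes that reset-enabled-baseline on the OU’s enabled baseline does the same job as the console’s re-register action, and that your AWS CLI version includes these commands; reregister_ou is my own name for the helper, and I would try it on a non-production OU first.

```shell
# Re-apply the enabled baseline for one OU and print the operation ID.
# Assumption: this mirrors the console's "Re-register OU" action.
# Usage: reregister_ou OU_ARN
reregister_ou() {
  ou_arn="$1"
  # Find the enabled baseline whose target is this OU (rows: target ARN, baseline ARN).
  eb_arn="$(aws controltower list-enabled-baselines \
    --query 'enabledBaselines[].[targetIdentifier,arn]' --output text |
    awk -v ou="$ou_arn" '$1 == ou { print $2; exit }')"
  if [ -z "$eb_arn" ]; then
    echo "no enabled baseline found for $ou_arn" >&2
    return 1
  fi
  # Start the reset; progress can be followed with get-baseline-operation.
  aws controltower reset-enabled-baseline \
    --enabled-baseline-identifier "$eb_arn" \
    --query 'operationIdentifier' --output text
}
```

Pass the full OU ARN from AWS Organizations. The printed operation ID can be fed to aws controltower get-baseline-operation --operation-identifier to watch for completion.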

How to Run This

Open the Control Tower console.
Go to Landing Zone → Check for drift and wait for the scan.
Select a drifted account or OU and view drift details.
Note which resources are drifted (CloudTrail, Config, SNS, etc.).
Run the CloudTrail and StackSet commands above to identify what’s missing.
Manually restore drifted resources (e.g., re-create a deleted CloudTrail).
Run the IAM commands to verify roles exist and are intact.
Go to Control Tower → Organization → [Select OU] → Actions → Re-register OU.
Wait for the re-registration to complete (often 10–20 minutes, longer for OUs with many accounts).
Check for drift again. It should be clear now.

Is This Safe?

Yes. Re-registering an OU is safe and repeatable. Control Tower will re-deploy baseline resources idempotently. Manually restoring drifted resources is safe as long as you’re restoring them to Control Tower’s expected state (which the drift details show you).

Key Takeaway

Landing Zone repair failures are usually caused by underlying manual changes to Control Tower-managed resources. Identify what drifted, manually restore it to the expected state, and then re-register the OU. With the underlying cause fixed, the next repair should succeed.

Have questions, or have you run into a different Control Tower issue? Connect with me on LinkedIn or X.