The cloud platform team spent six months deploying Control Tower. Multi-account structure, Service Control Policies, centralized CloudTrail logging to a dedicated log archive account, IAM Identity Center federated to the corporate identity provider. They documented it. They handed it off. They called it done.

Three years later, the CISO asks a question before a SOC 2 audit: “Which of our AWS accounts have unrestricted internet egress?” Nobody can answer it in under a week.

This is the AWS governance problem in a single sentence: a Landing Zone is an infrastructure deployment; governance is an ongoing operating model. Most enterprises confuse the two, and discover the difference during an audit, a cost crisis, or a breach.

The Illusion of Completion

Control Tower delivers real, valuable things: a repeatable account vending mechanism, a set of preventive guardrails via Service Control Policies, centralized logging, and a starting point for identity federation. None of that is trivial to stand up, and most organizations that have done it are genuinely better off for it.

The trap is the project framing. These capabilities are delivered with a kickoff, a timeline, a go-live, and a handoff. The team that built it moves to the next initiative. The governance posture, however, does not maintain itself.

The business keeps growing. Product teams spin up. New accounts get requested. New AWS services get adopted. New regions get activated. The Landing Zone itself tracks none of this. It is infrastructure, not governance. The gap between what was deployed and what is actually true in the environment widens silently — until something forces it into the open.

How Sprawl Happens — and Why It Is Structural

Account sprawl is not a failure of discipline. It is the predictable outcome of teams optimizing rationally for their own objectives.

A product team needs a new environment. The platform team’s account vending process takes two weeks and requires a ticket with fifteen fields. The engineering lead can have a proof-of-concept account stood up in twenty minutes by working around the process. They choose the path of least resistance. A new account now exists outside the organizational structure. It does not inherit the SCPs. It does not ship logs to the centralized account. It does not appear in Security Hub. It is invisible to everything the governance model depends on.

Multiply that by thirty product teams over three years.

The structural problem is that organizational incentives are misaligned. The team that owns the Landing Zone is measured on platform stability and deployment velocity. The teams consuming it are measured on product delivery. Neither team is measured on org-level governance posture. The outcome is entirely predictable — not because anyone made a bad decision, but because nobody was accountable for the aggregate result.

The Four Failure Modes

Enterprises that treat Landing Zone deployment as a one-time event eventually encounter one or more of the following. The order varies; the arrival does not.

The security audit failure. An external auditor requests evidence that all accounts meet your security baseline. You can produce Config conformance pack results for the accounts enrolled in Control Tower. You cannot enumerate accounts provisioned outside it. You cannot demonstrate that no account bypassed the guardrails. The audit finding is material, the remediation is expensive, and the timeline is compressed because a certification renewal is waiting on it.

The cost attribution breakdown. Finance runs month-end cloud cost allocation. Forty percent of spend maps to accounts with missing or inconsistent tags. Chargeback to business units is an approximation. Engineering disputes the numbers. The FinOps team spends two weeks manually reconciling every month, and the reconciliation is still not reliable enough to make decisions from.

The SCP exception compound. A business unit needed to use a service your SCPs blocked. An exception was granted — a separate OU with modified policies, or a targeted SCP exemption. Then another exception for a different team. Then another. Three years later, the effective policy posture is not the policy you designed. It is the original policy plus thirty-four exceptions that nobody has reviewed holistically. Some apply to accounts where the original use case no longer exists. The exceptions have become the policy.

The ghost account problem. You have accounts in AWS Organizations that nobody can confidently account for. The team that created them may no longer exist. The workloads may have been decommissioned — or may not. Resources may still be running, accumulating cost, holding data subject to retention obligations. You do not know, because account lifecycle management was never built. The platform has a provisioning process. It has no offboarding process.

A Governance Maturity Model

Governance is not binary. In practice, I see organizations progress through four stages — the transition from each to the next requires deliberate investment, not just time.

Stage 1 — Foundation (0–18 months). Account vending is operational. Core SCPs are applied. Centralized CloudTrail logging exists. IAM Identity Center is connected to the identity provider. The organizational structure reflects the intended environment taxonomy: production, non-production, sandbox. This is table stakes. Most teams with Control Tower reach this stage. It is necessary and insufficient.

Stage 2 — Drift (18–36 months). The structure exists, but reality has diverged from it. Accounts have been provisioned outside the vending process. SCP exceptions have accumulated without review. Tag compliance has degraded. Config conformance pack findings have accumulated with no assigned owners. Security Hub surfaces hundreds of findings across accounts; nobody is working a queue. The logs are centralized, but nobody queries them except during incidents. The gap between the documented governance posture and the operational one is widening and largely invisible.

Stage 3 — Reactive governance. Something triggers a reconciliation effort: an external audit, a compliance certification, a material security event, a CFO asking why the cloud bill is unexplainable. The organization mobilizes a dedicated effort to enumerate accounts, assess posture, remediate findings, and document current state. This costs months and significant engineering capacity. The output is accurate on the day it completes. Without structural change, it begins to drift again immediately.

Stage 4 — Continuous governance. Governance is treated as a product with an ongoing product owner, not a project with a completion date. Account lifecycle management exists end-to-end: provisioning, active, dormant, decommissioned. Drift detection runs continuously. Policy changes go through a documented review process. New AWS service adoption is evaluated against security guardrails before org-wide enablement. SCP exceptions carry expiry dates and named owners. Compliance posture is reportable on demand, not reconstructed under pressure. Most enterprises I work with are somewhere between Stage 2 and Stage 3. Reaching Stage 4 requires a deliberate organizational investment, not just more infrastructure.
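The end-to-end account lifecycle named in Stage 4 can be modeled as a small state machine. The following is a minimal sketch, not a prescribed design: the four states come from the text above, while the allowed transitions and the `transition` helper are assumptions made for illustration.

```python
from enum import Enum


class Lifecycle(Enum):
    PROVISIONING = "provisioning"
    ACTIVE = "active"
    DORMANT = "dormant"
    DECOMMISSIONED = "decommissioned"


# Assumed transition rules; a real process may allow different moves.
ALLOWED = {
    Lifecycle.PROVISIONING: {Lifecycle.ACTIVE, Lifecycle.DECOMMISSIONED},
    Lifecycle.ACTIVE: {Lifecycle.DORMANT, Lifecycle.DECOMMISSIONED},
    Lifecycle.DORMANT: {Lifecycle.ACTIVE, Lifecycle.DECOMMISSIONED},
    Lifecycle.DECOMMISSIONED: set(),  # terminal state
}


def transition(current, target):
    """Return the new state, or raise if the move is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target
```

Making the decommissioned state terminal is the point of the sketch: an account either has an explicit offboarding path or it becomes a ghost account.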

The Organizational Accountability Model

The technical controls for continuous governance are well understood. The organizational model to sustain them is where most enterprises fail. Three principles separate organizations that maintain Stage 4 from those that regress.

Governance is a product with a product owner. The cloud platform team needs an ongoing mandate — not to deploy a Landing Zone once, but to own the organization’s cloud governance posture continuously. This means a backlog, defined metrics, and accountability to a senior stakeholder (typically the CISO or CTO) for posture outcomes, not just platform uptime. If the team responsible for governance is also the team running shared infrastructure, governance will always lose when capacity is constrained. The mandate has to be explicit.

Every detective control has an owner, and every finding has an SLA. A Security Hub finding that routes to a shared Slack channel and ages for three months is not a control. It is documentation of known risk with no consequence for inaction. Detective controls only produce value when findings are routed to the team accountable for the affected account, with defined remediation SLAs and escalation paths for overdue findings. This requires a maintained account ownership registry — a mapping of accounts to engineering teams and cost centers that is treated as a live system of record, not a spreadsheet updated annually.
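The routing rule described above can be sketched in a few lines. This is an illustration under stated assumptions, not a reference implementation: the `Finding` fields, the SLA thresholds, and the `governance-triage` fallback queue are all hypothetical, and the `owners` mapping stands in for the account ownership registry.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical SLA thresholds per severity; real values are a policy decision.
SLA_BY_SEVERITY = {
    "CRITICAL": timedelta(days=7),
    "HIGH": timedelta(days=30),
    "MEDIUM": timedelta(days=90),
}


@dataclass(frozen=True)
class Finding:
    finding_id: str
    account_id: str
    severity: str
    first_observed: datetime


def route_finding(finding, owners, now):
    """Return (owning_team, overdue) for a finding.

    owners maps account ID to team name. An account missing from the
    registry goes to a fallback queue so it is never silently dropped.
    """
    team = owners.get(finding.account_id, "governance-triage")
    sla = SLA_BY_SEVERITY.get(finding.severity, timedelta(days=90))
    overdue = now - finding.first_observed > sla
    return team, overdue


owners = {"111111111111": "payments-platform"}
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
finding = Finding("f-1", "111111111111", "HIGH",
                  datetime(2024, 4, 1, tzinfo=timezone.utc))
print(route_finding(finding, owners, now))
```

The fallback queue matters as much as the happy path: a finding on an account with no registered owner is itself evidence that the registry has drifted.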

Policy changes are treated as production changes. An SCP modification that inadvertently denies a service to two hundred accounts simultaneously is a production incident. SCP changes should go through the same review process as infrastructure changes: documented rationale, peer review, validation in a non-production OU before org-wide application, and a rollback plan. Exception requests should require a named owner, a business justification, and an expiry date — enforced, not advisory. Exceptions that were never reviewed are how the SCP exception compound failure mode develops.
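One way to make "enforced, not advisory" concrete is to refuse exception records that lack the required fields and to report lapsed ones on a schedule. A minimal sketch follows; the `ScpException` record and its field names are hypothetical, and where such records are stored is left open.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ScpException:
    exception_id: str
    target_id: str        # OU or account the exception applies to
    owner: str
    justification: str
    expires_on: date

    def __post_init__(self):
        # Reject records that lack the fields the process requires.
        if not self.owner.strip():
            raise ValueError("exception requires a named owner")
        if not self.justification.strip():
            raise ValueError("exception requires a business justification")


def expired_exceptions(exceptions, today):
    """Return exceptions whose expiry date has passed, oldest first."""
    lapsed = [e for e in exceptions if e.expires_on < today]
    return sorted(lapsed, key=lambda e: e.expires_on)
```

What happens to a lapsed exception (automatic removal, or a forced renewal review) is a policy decision this sketch does not make.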

The Control Architecture

[Figure: AWS Governance Control Plane]

A mature governance control plane has four layers that operate continuously, not in sequence.

Foundation — Org Structure and Account Registry. The management account and OU taxonomy define the authorization boundary for every account in the organization. The account registry — whether stored in a CMDB, a DynamoDB table, or a purpose-built platform — is the source of truth for account ownership, lifecycle state, and cost attribution. Without a reliable account registry, everything downstream is unreliable: findings cannot be routed, cost cannot be attributed, and audits cannot be answered.
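Whether the registry is reliable can be checked by reconciling it against the list of accounts in the organization. The sketch below assumes the account IDs have already been fetched (for example from the Organizations ListAccounts API) and that the registry is a plain mapping; the registry shape and the `reconcile` function are illustrative assumptions.

```python
def reconcile(org_account_ids, registry):
    """Compare accounts in the organization against the ownership registry.

    org_account_ids: iterable of account IDs in the organization.
    registry: dict mapping account ID to a record such as
              {"owner": "...", "cost_center": "...", "state": "..."}.
    """
    org = set(org_account_ids)
    registered = set(registry)
    return {
        # In the organization but unknown to the registry.
        "unregistered": sorted(org - registered),
        # In the registry but no longer in the organization.
        "stale_registry_entries": sorted(registered - org),
        # Known to both, but with no owner recorded.
        "missing_owner": sorted(
            a for a in org & registered if not registry[a].get("owner")
        ),
    }
```

Accounts created entirely outside the organization appear in neither input, so this check cannot find them; they need a separate discovery path.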

Preventive Controls. Service Control Policies at the OU level define the maximum permission boundary for all accounts in that OU. No account-level IAM policy can exceed what the SCP permits, which is the enforcement mechanism that makes the organizational structure meaningful. (The management account itself is not restricted by SCPs.) Tag policies standardize tag keys and allowed values and, with enforcement enabled, block noncompliant tagging operations on supported resource types. Requiring that a tag be present at creation usually takes an SCP condition as well. New AWS service adoption should be evaluated against security guardrails at this layer before being unblocked org-wide. Critically, the SCP exception process belongs here: exceptions are policy changes, not accommodations.
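As an illustration of a preventive control, the sketch below builds an SCP document that denies launching an EC2 instance when a given tag is absent from the request. It follows the commonly documented pattern of a `Null` condition on `aws:RequestTag`; the `CostCenter` tag key and the helper name are assumptions, and any such policy should be validated in a non-production OU first.

```python
import json


def require_tag_on_instance_launch(tag_key):
    """Build an SCP document denying ec2:RunInstances when tag_key is absent."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyRunInstancesWithoutTag",
                "Effect": "Deny",
                "Action": "ec2:RunInstances",
                "Resource": "arn:aws:ec2:*:*:instance/*",
                # "Null": "true" matches requests where the tag key is missing.
                "Condition": {"Null": {f"aws:RequestTag/{tag_key}": "true"}},
            }
        ],
    }


# CostCenter is a hypothetical tag key used only for illustration.
print(json.dumps(require_tag_on_instance_launch("CostCenter"), indent=2))
```

Generating the policy from code rather than hand-editing JSON keeps it reviewable and testable like any other production change.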

Detective Controls. AWS Config with aggregated conformance packs provides a continuous compliance score across all accounts. Security Hub, aggregated to the security account, surfaces findings from Config, GuardDuty, Inspector, and integrated third-party tools in a single pane. CloudTrail Lake provides a queryable, long-retention audit log. Unlike S3-stored CloudTrail logs, it is designed to be queried directly, which makes it usable for governance reviews and incident investigations rather than just archival. These controls run continuously. They are not switched on for an audit or an incident; they are always on.

Remediation and Accountability. This is the layer most organizations deploy last, fund least, and find most valuable. Findings from detective controls route to the owning team via an integration with whatever ticket management system engineering uses. SLAs are enforced programmatically — a finding that ages past its threshold escalates automatically, not through a manual review cycle. Auto-remediation handles a defined class of findings (public S3 buckets, non-compliant tags, open security group rules) where the remediation is unambiguous and safe to apply without human review. Governance OKRs — posture score trends, mean time to remediation, exception aging — provide the reporting surface for executive accountability.
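The three outcomes described above (auto-remediate, escalate, or route as a ticket) and one of the reporting metrics can be sketched as follows. The finding type names, the `disposition` function, and the metric helper are hypothetical; which finding classes are safe to fix automatically is a decision each organization has to make for itself.

```python
from datetime import timedelta
from statistics import mean

# Hypothetical finding types judged safe to fix without human review.
AUTO_REMEDIABLE = {
    "s3-public-bucket",
    "noncompliant-tag",
    "open-security-group-rule",
}


def disposition(finding_type, age, sla):
    """Decide what happens to an open finding."""
    if finding_type in AUTO_REMEDIABLE:
        return "auto-remediate"
    if age > sla:
        return "escalate"      # aged past its threshold
    return "ticket"            # route to the owning team's queue


def mean_days_to_remediate(durations):
    """Mean time to remediation in days over closed findings, or None."""
    days = [d.total_seconds() / 86400 for d in durations]
    return mean(days) if days else None
```

Keeping the auto-remediation allowlist explicit and short is deliberate: anything not on it gets a human owner and a clock.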

The feedback loop connecting the fourth layer back to the earlier ones is what distinguishes this from a static architecture diagram. Findings in the detective layer should inform SCP and conformance pack updates. Patterns in the remediation queue should drive decisions about preventive controls. Governance posture is either a system that learns from its own outputs or a dashboard nobody acts on.

What Good Looks Like

These are the questions a mature governance model can answer immediately, without a multi-day investigation:

  • How many AWS accounts exist in the organization, what is each one for, and which team owns it?
  • Which accounts were not provisioned through the standard vending process?
  • What is the current Security Hub posture score across all accounts, and how has it trended over the past quarter?
  • Which accounts have open Security Hub findings older than thirty days, and who is accountable for remediation?
  • What SCPs are applied to each OU, when was each policy last reviewed, and which exceptions are currently active?
  • What is the monthly cost attributable to each product team, with confidence sufficient to base a chargeback model on?
  • Which accounts have had no activity in the past ninety days and are candidates for decommission?

If any of these questions require more than a few hours to answer, the organization is operating in Stage 2. The answer is not more tooling — it is ownership and process applied to the tooling that already exists.
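The last question in the list above reduces to a date comparison once a last-activity date exists per account. A minimal sketch, assuming that date has already been derived from some activity source; the `last_activity` mapping and the function name are hypothetical.

```python
from datetime import date, timedelta


def dormant_accounts(last_activity, today, idle_days=90):
    """Return account IDs with no recorded activity in the last idle_days.

    last_activity maps account ID to the date of the most recent activity,
    or None when no activity has ever been recorded.
    """
    cutoff = today - timedelta(days=idle_days)
    return sorted(
        acct
        for acct, seen in last_activity.items()
        if seen is None or seen < cutoff
    )
```

The output is a list of candidates, not a decommission order: each account still needs its owner to confirm before anything is shut down.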

The Bottom Line

AWS made it easier to establish a multi-account structure than it has ever been. That is not the hard part anymore. The hard part is treating the resulting organizational control plane as a living system — one that requires ownership, process, and ongoing investment to remain accurate and enforceable.

The difference between a Landing Zone and a governance strategy is the same as the difference between installing a security camera and having someone watch the feed.

One is infrastructure. The other is a commitment.

