AWS Least Privilege in Practice: SCPs, RCPs, and Zero Standing Credentials
How we built AWS infrastructure from scratch with least privilege IAM policies, Service Control Policies (SCPs), Resource Control Policies, OIDC federation, and organization-wide security guardrails — with code examples.
Dave Barton
Co-founder
The blank canvas problem
Starting a new AWS account is deceptively liberating. You’re the admin. Every IAM permission is available. There are no guardrails, no Service Control Policies, no least privilege constraints. It feels productive.
It’s also the most dangerous moment in your infrastructure’s life.
Every shortcut you take in those first days — the overly broad IAM policy, the hardcoded credential, the “we’ll lock this down later” security group — becomes load-bearing infrastructure within weeks. Just like technical debt, security debt compounds. By the time you notice, you’re not fixing a policy. You’re untangling a web of implicit trust relationships that your entire application depends on.
We made a deliberate choice: build it right from the start. Not because we had unlimited time — we didn’t. But we’d spent years helping organizations retrofit security onto existing infrastructure, and we knew how painful that gets. We decided to treat our own infrastructure the way we’d advise our customers to treat theirs.
We used the AWS Well-Architected Framework’s Serverless Applications Lens as a starting point — it’s a solid resource for structured security, reliability, and cost best practices.
Here’s what we learned.
Start with the structure
Before we wrote a single line of application code for AWS, we set up an organization structure. AWS Organizations lets you group accounts into organizational units (OUs) and apply policies that cascade down to every account in the group. Think of it as governance-as-code for your entire cloud footprint.
We use a management account that does nothing except govern. It doesn’t run workloads. Its only job is to define and enforce the rules that every other account must follow. Separate workload accounts handle dev and production, grouped under a Workloads OU that inherits all governance policies automatically.
This separation matters more than it might seem. If your management account also runs workloads, a vulnerability in your application could compromise your governance layer. Keeping them separate means an incident in one environment can’t escalate to control over the entire organization. Blast radius containment isn’t a buzzword — it’s an architecture decision.
For a small team, multiple AWS accounts feel like overkill. They’re not. The cost is near zero (empty accounts don’t generate charges), and the security benefit is substantial.
Service Control Policies: guardrails that prevent, not detect
Most security tooling is reactive. It alerts you after something has happened — a misconfigured bucket, an unexpected API call, a policy violation. That’s valuable, but it means you’re always playing catch-up. We wanted something different: preventive controls that make dangerous actions impossible in the first place.
Service Control Policies (SCPs) in AWS Organizations let you define hard limits on what any principal in an account can do, regardless of their IAM permissions. Even an account administrator can’t override an SCP. They’re the closest thing AWS has to a physical lock on the door.
We built several layers of preventive controls, each addressing a different category of risk. The first layer is a security guardrails SCP that prevents actions that should never happen in a well-managed account: disabling CloudTrail, turning off GuardDuty or SecurityHub, creating IAM users with long-lived credentials, or removing the account from the organization. These are categorically wrong in our environment, so we made them impossible.
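As a sketch, a guardrails statement along these lines would cover that category (the action names are real IAM actions; the exact list is our own policy choice and would be tuned per organization):

```json
{
  "Sid": "DenySecurityTampering",
  "Effect": "Deny",
  "Action": [
    "cloudtrail:StopLogging",
    "cloudtrail:DeleteTrail",
    "guardduty:DeleteDetector",
    "securityhub:DisableSecurityHub",
    "organizations:LeaveOrganization"
  ],
  "Resource": "*"
}
```

Because SCPs apply to every principal in the account, including the root user and administrators, no one inside a workload account can switch these protections off.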
Restricting the blast radius
We restrict operations to specific geographic regions. Our infrastructure runs in a single primary region plus a global edge region for CloudFront distributions. Everything else is off-limits. If an attacker compromises credentials, they can’t spin up crypto miners in a region we’re not monitoring. If an engineer makes a mistake, they can’t accidentally deploy resources to the wrong continent.
One nuance: some AI inference services route requests across regions for load balancing. We had to exempt those from region restrictions while keeping everything else locked down. The lesson: understand how your services actually operate before applying blanket restrictions.
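A region-deny statement along the lines AWS documents would look roughly like this (the region names are placeholders for your own primary and edge regions; the NotAction list exempts global services that don't honor a region condition):

```json
{
  "Sid": "DenyOutsideAllowedRegions",
  "Effect": "Deny",
  "NotAction": [
    "cloudfront:*",
    "iam:*",
    "route53:*",
    "support:*"
  ],
  "Resource": "*",
  "Condition": {
    "StringNotEquals": {
      "aws:RequestedRegion": ["eu-west-1", "us-east-1"]
    }
  }
}
```

Cross-region inference exemptions would be added to the NotAction list the same way, once you've confirmed which actions the service actually issues in other regions.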
Allowing only what you need
This is the control we’re most proud of, and the one that raised the most eyebrows internally: a service allowlist SCP. Instead of trying to block dangerous services (a game you can never win because new services launch constantly), we flipped the model. We defined the specific AWS services our application actually uses, and blocked everything else.
The key insight is the NotAction pattern — it denies everything except your allowlist:
{
  "Effect": "Deny",
  "NotAction": [
    "lambda:*",
    "dynamodb:*",
    "s3:*",
    "states:*",
    "bedrock:*",
    "apigateway:*"
  ],
  "Resource": "*"
}
We don’t need EC2 instances, RDS databases, EKS clusters, or SageMaker notebooks. So we blocked them all. An attacker who compromises credentials can’t spin up compute, provision databases, or launch containers. The services that enable most cloud-based attacks simply don’t exist in our environment. And as a bonus: zero risk of accidental spend on services we don’t use.
Encrypting everything, by policy
Rather than trusting that every developer remembers to enable encryption, we enforce it at the organization level. Our S3 security SCP requires encryption headers on every object upload, TLS for all operations, blocks changes to public access settings, and blocks unauthenticated Lambda function URLs.
These aren’t guidelines. They’re enforced constraints. Code that tries to upload an unencrypted object gets a hard denial, not a warning. It means we never have to wonder whether data at rest is encrypted. It always is.
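Two deny statements carry most of that weight. This is a sketch of the standard pattern: the first denies any object upload that arrives without a server-side encryption header, the second denies any S3 call made over plain HTTP:

```json
[
  {
    "Sid": "DenyUnencryptedUploads",
    "Effect": "Deny",
    "Action": "s3:PutObject",
    "Resource": "*",
    "Condition": {
      "Null": { "s3:x-amz-server-side-encryption": "true" }
    }
  },
  {
    "Sid": "DenyInsecureTransport",
    "Effect": "Deny",
    "Action": "s3:*",
    "Resource": "*",
    "Condition": {
      "Bool": { "aws:SecureTransport": "false" }
    }
  }
]
```

The Null condition is the important trick: it matches requests where the encryption header is absent entirely, which is exactly the mistake you're trying to catch.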
Controlling AI model access
We run multiple AI models through Amazon Bedrock, and model costs can vary by orders of magnitude. An SCP restricts which specific models can be invoked in our accounts. This prevents accidental use of expensive models during development and limits the blast radius if credentials are compromised — an attacker can’t rack up charges by invoking the most expensive model available.
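Because Bedrock foundation models are addressable as resources, the allowlist can be expressed with NotResource — deny invocation of anything outside the approved set. A sketch (the model pattern shown is illustrative, not our actual allowlist):

```json
{
  "Sid": "DenyUnapprovedModels",
  "Effect": "Deny",
  "Action": [
    "bedrock:InvokeModel",
    "bedrock:InvokeModelWithResponseStream"
  ],
  "NotResource": [
    "arn:aws:bedrock:*::foundation-model/anthropic.claude-*"
  ]
}
```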
Resource Control Policies: the data perimeter
SCPs control what actions can be taken. Resource Control Policies (RCPs) control who can access resources. They’re complementary, and RCPs are one of the most underused tools in the AWS security arsenal.
We use RCPs to create an organization-wide data perimeter. The policy is surprisingly concise:
{
  "Effect": "Deny",
  "Principal": "*",
  "Action": "s3:*",
  "Resource": "*",
  "Condition": {
    "StringNotEqualsIfExists": {
      "aws:PrincipalOrgID": "o-yourorgid"
    }
  }
}
This matters because IAM policies alone don’t prevent exfiltration to external accounts. An overly broad bucket policy could allow access from any AWS account. An RCP prevents that at the organizational level — even if someone misconfigures a resource policy, the data can’t leave the organization.
We applied this to S3, KMS, SQS, Secrets Manager, and STS, with targeted exemptions for CloudFront and our CI/CD OIDC federation. Everything else stays inside the perimeter.
Zero standing credentials: OIDC federation and SSO
This is the hill we chose to stand on: no long-lived credentials anywhere in our infrastructure.
Our SCP doesn’t just discourage IAM users — it makes creating them impossible. No one can create an IAM user, a login profile, or an access key in any workload account. All human access goes through AWS SSO (Identity Center), which provides short-lived session credentials that expire automatically.
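The enforcement is a deny on the three IAM actions that mint long-lived human credentials — something like this, using real IAM action names:

```json
{
  "Sid": "DenyLongLivedCredentials",
  "Effect": "Deny",
  "Action": [
    "iam:CreateUser",
    "iam:CreateLoginProfile",
    "iam:CreateAccessKey"
  ],
  "Resource": "*"
}
```

With this in place, "just create a quick IAM user for this script" stops being an option anyone can take, however good their intentions.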
For CI/CD, we use GitHub OIDC federation. No stored secrets. GitHub’s identity provider issues a token, AWS verifies it against a trust policy, and the workflow assumes a role with temporary credentials. The critical detail is locking the trust policy to a specific repository and branch:
{
  "Effect": "Allow",
  "Principal": {
    "Federated": "arn:aws:iam::123456789012:oidc-provider/token.actions.githubusercontent.com"
  },
  "Action": "sts:AssumeRoleWithWebIdentity",
  "Condition": {
    "StringEquals": {
      "token.actions.githubusercontent.com:aud": "sts.amazonaws.com",
      "token.actions.githubusercontent.com:sub": "repo:your-org/your-repo:ref:refs/heads/main"
    }
  }
}
Without that sub condition, any GitHub repository could assume your deployment role. We’ve seen this misconfiguration in the wild more than once.
We maintain separate CI and CD roles in each environment. The CI role is read-only. The CD role can deploy, but operates under a permissions boundary — a hard ceiling that blocks it from creating IAM roles or modifying policies, regardless of what the role’s own policy allows. Even if the deployment pipeline is compromised, it can’t grant itself admin access.
The result: zero stored AWS credentials in our GitHub repository, zero long-lived access keys in any account, zero permanent admin sessions. Every credential is temporary, scoped, and auditable.
Least privilege, function by function
Broad IAM roles are comfortable. A single role with wide permissions means you rarely hit access denied errors during development. It also means a compromised function can access everything.
Every Lambda function in our pipeline has its own IAM role, scoped to specific resources — not “allow S3 access” but “allow read access to this one bucket.” The difference between a skeleton key and a room key.
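Concretely, a function that only reads from one bucket gets a policy scoped to exactly that — something like this sketch (the bucket name is a placeholder):

```json
{
  "Effect": "Allow",
  "Action": ["s3:GetObject"],
  "Resource": "arn:aws:s3:::example-ingest-bucket/*"
}
```

Compare that with "Action": "s3:*" on "Resource": "*" — the permissive version is one line shorter and several orders of magnitude more dangerous.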
This granularity takes more effort upfront. Every new feature that touches a new resource requires a policy update. But it means we can reason about the blast radius of any individual function: if it’s compromised, what can it access? The answer is always a small, well-defined set of resources.
For deployment roles, we add a permissions boundary — a ceiling on what the role can ever do, regardless of what policies are attached. This prevents privilege escalation even if the role’s policy is overly broad:
PermissionsBoundary:
  PolicyName: DeploymentBoundary
  PolicyDocument:
    Statement:
      - Effect: Deny
        Action:
          - iam:CreateUser
          - iam:CreateRole
          - iam:PutRolePolicy
          - iam:AttachRolePolicy
        Resource: "*"
Even if the deployment pipeline is compromised, it can’t create new roles or escalate its own permissions. The boundary makes that impossible at the IAM level, not just by convention.
We also deployed IAM Access Analyzer across the organization. It continuously monitors IAM policies and flags permissions that are granted but never used — a strong signal that a role is overprivileged. Early on it flagged DynamoDB DeleteTable permissions on a Lambda function that only needed PutItem and GetItem. The kind of permission creep that’s invisible without automated analysis. It’s the feedback loop that keeps policies tight over time.
Tag everything, automatically
We enforce a tag policy across the organization requiring four tags on all resources: Service, Environment, Component, and ManagedBy. The Environment tag is enforced on nine different resource types to prevent drift between what a resource is labeled and where it actually runs.
Tags might seem like an operational convenience rather than a security control. They’re both. Consistent tagging enables:
- Cost allocation — spot anomalous spend by service or environment
- Compliance reporting — prove which resources belong to which environment
- Operational clarity — help incident responders understand what a resource does
At 2 AM during an incident, seeing Environment: production versus Environment: development changes how you respond.
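In AWS Organizations tag policy syntax, the Environment tag rule looks roughly like this (the allowed values and the enforced resource types here are illustrative, not our full list of nine):

```json
{
  "tags": {
    "Environment": {
      "tag_key": { "@@assign": "Environment" },
      "tag_value": { "@@assign": ["development", "production"] },
      "enforced_for": {
        "@@assign": ["s3:bucket", "dynamodb:table", "lambda:function"]
      }
    }
  }
}
```

The enforced_for block is what turns the tag from a convention into a constraint: tagging operations on those resource types with a non-compliant value are rejected.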
Eating our own cooking
Here’s where this story comes full circle.
After building all of this infrastructure, we did something obvious and terrifying: we ran ThreatKrew against our own architecture.
The assessment found things we’d missed. Implicit trust assumptions between components. Gaps in encryption for internal data flows. Least-privilege roles with more access than strictly necessary — not dangerously so, but more than the principle demands.
Some findings confirmed decisions we’d already made. Others sent us back to the drawing board. Humbling because we’d built it ourselves and still had blind spots. Validating because the tool we’d built actually found them.
This is the core of why we built ThreatKrew: security analysis shouldn’t require hiring an external firm for a six-week engagement. It should be something every team can do, regularly, as part of how they build. If we — a team that thinks about security every day — had blind spots in our own infrastructure, every team does.
Lessons for your team
You don’t need to implement everything we’ve described here on day one. But here’s what we’d suggest to any team building on AWS.
Start with organization structure. Even if you only have one workload account today, set up an organization and a management account. The overhead is negligible, and it gives you the ability to enforce policies as you grow. Retrofitting an organization onto existing accounts is painful.
Eliminate long-lived credentials immediately. This is the highest-impact change with the lowest effort. Set up SSO for human access and OIDC federation for CI/CD. Delete your IAM users and access keys. Every long-lived credential is a liability that doesn’t need to exist.
Block what you don’t use. A service allowlist SCP is more powerful than trying to secure services you shouldn’t be running in the first place. If you’re serverless, block EC2. If you don’t use containers, block ECS and EKS. Reduce the surface area.
Enforce encryption by policy, not convention. Conventions get forgotten. Policies don’t. If every S3 object must be encrypted, make it a hard requirement at the organization level.
Scope IAM roles to resources, not services. “Allow S3 access” and “Allow access to this specific S3 bucket” are different statements with very different blast radii. The extra specificity is worth the effort.
Threat model your own infrastructure. Whatever you build, examine it with fresh eyes — whether that’s an internal review, a structured threat model, or a conversation where someone plays devil’s advocate. The blind spots are always there. The question is whether you find them before someone else does.
The compound return
Every security decision we’ve described here was made before we had our first customer. Some of them felt premature at the time. An SCP that blocks EC2 when you’re not using EC2 anyway? A tag policy when there are only two people on the team?
But these decisions compound. Each one makes the next decision easier. Each constraint eliminates an entire category of potential problems. Each policy that’s enforced automatically is one fewer thing to remember, review, or worry about.
Six months in, our infrastructure has never had a misconfigured public bucket, an overprivileged role that was exploited, a long-lived credential that was leaked, or a resource deployed to the wrong region. Not because we’re exceptionally careful — but because the architecture makes those mistakes impossible.
That’s the real lesson. Security isn’t about vigilance. It’s about architecture. Build the guardrails, and the guardrails do the work.
If this resonated, you might also enjoy why it’s never too early for security, what we learned building reliable AI, and our overview of the NIST CSF maturity framework. See how ThreatKrew works, or for a deeper look at how we protect our platform and your data, visit our security page.
Want to find the blind spots in your own infrastructure? Try our threat modeling tool or join the Founders Program and run a threat model on your architecture in minutes.
Co-founder of ThreatKrew. Former AWS security specialist with years of experience securing enterprise infrastructure. Passionate about making professional security analysis accessible to every team.