Threat Modeling is a Conversation, Not a Checklist
Why iterative threat modeling produces dramatically better results than one-shot analysis. We show real before-and-after examples of how targeted clarification questions transform vague assumptions into precise, actionable security findings.
Dave Barton
Co-founder
The confidence paradox
Here’s something counterintuitive. We recently ran a threat model against a multi-tenant SaaS platform — a serverless architecture on AWS with Lambda functions, DynamoDB, S3, Cognito authentication, and an LLM-powered analysis pipeline. Standard modern stack.
The initial report returned a 94% confidence score and a CRITICAL risk rating. Nine threats identified, two critical, seven high-severity. Sounds thorough.
Then we ran our Clarify process — targeted questions about the architecture that the team answered over the course of an afternoon. We re-ran the report with their answers incorporated.
The new confidence score: 86%. The risk rating: HIGH, down from CRITICAL. Same architecture, same tool, dramatically different results.
That sounds like regression. It’s actually the opposite.
The baseline was 94% confident in assumptions it couldn’t verify. The clarified report was 86% confident in facts it knew to be true. Lower confidence, higher trust. That’s the difference between guessing and knowing.
What the baseline knows
When you run a threat model against documentation alone, you get the best analysis the documentation can support. The problem is that architecture documents are written for humans who already understand the system. They contain implicit assumptions that are obvious to the team but invisible to anyone — or anything — reading from the outside.
The tool reads the architecture, builds a model, and generates threats. But it has to make assumptions about things the documentation doesn’t say. The baseline flagged this as critical:
Pipeline Lambda functions likely share overly permissive IAM execution roles. An attacker compromising any single function can assume the shared role and access DynamoDB tables across all tenants if encryption context validation is not enforced.
Notice the word “likely.” The tool couldn’t determine from the documentation whether each function had its own least-privilege role. So it assumed the worst — which is the right default — and flagged a critical finding.
The assumptions behind this threat carried their own status flags:
- Lambda functions execute with least-privilege IAM roles restricted to only required services and resources — NEEDS CLARIFICATION
- KMS encryption at rest is enabled for all DynamoDB tables with tenant-specific encryption context — INVALID
That second assumption matters. The tool inferred that tenant-specific encryption context should be present, checked the documentation, and couldn’t find evidence for it. So it marked it invalid — the threat model was built on the worst-case interpretation.
The baseline was doing its job. But it was working with incomplete information.
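To make the gap concrete, here is a minimal sketch of the two policies the baseline had to choose between — the shared wildcard role it assumed versus the per-function scoped role the team actually had. The ARNs, table name, and the `uses_wildcards` helper are invented for illustration; the policy JSON shapes are standard IAM.

```python
# The worst case the baseline assumed: one shared role, every action, every resource.
SHARED_ROLE_POLICY = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": "dynamodb:*",   # every DynamoDB action...
        "Resource": "*",          # ...on every table, across all tenants
    }],
}

# What the documentation didn't say: a per-function role scoped to specific ARNs.
HANDLER_ROLE_POLICY = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["dynamodb:GetItem", "dynamodb:PutItem", "dynamodb:Query"],
        "Resource": "arn:aws:dynamodb:eu-west-1:123456789012:table/reports",
    }],
}

def uses_wildcards(policy: dict) -> bool:
    """Crude check for the pattern the baseline assumed: '*' in actions or resources."""
    for stmt in policy["Statement"]:
        actions = stmt["Action"] if isinstance(stmt["Action"], list) else [stmt["Action"]]
        resources = stmt["Resource"] if isinstance(stmt["Resource"], list) else [stmt["Resource"]]
        if any("*" in a for a in actions) or any(r == "*" for r in resources):
            return True
    return False
```

Nothing in the architecture document distinguished these two worlds — which is exactly why the tool defaulted to the first one.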
The questions that matter
This is where Clarify changes everything. Instead of accepting incomplete information, it generates targeted questions — the kind a security consultant would ask in a real engagement. The report generated 24 questions. Here are a few that mattered most:
Can you provide the IAM role policies for each Lambda function? Are they scoped to specific DynamoDB tables, S3 buckets, and API endpoints?
Are all DynamoDB tables encrypted at rest using KMS with customer-managed keys, and is encryption context enforced per tenant in all KMS operations?
Is MFA mandatory for all user accounts, or is it optional? What enforcement mechanism is in place?
Are Lambda functions deployed in VPCs with private subnets? Are VPC endpoints configured for DynamoDB, S3, and Secrets Manager access?
These aren’t generic security questionnaire items. Each one targets a specific ambiguous assumption from the baseline. The encryption context question exists because the tool found a gap between what the architecture implies and what the documentation confirms.
The team answered each question in a few sentences. Total time: about an hour. The answers were specific and sometimes surprising.
Lambda IAM roles? Scoped per function with specific resource ARNs — not wildcards. The handler had DynamoDB and S3 access, the API proxy had Secrets Manager access, the analysis functions had only the artifacts bucket and one table. The admin function had no access to report data at all.
VPC deployment? No — the functions weren’t in a VPC. They accessed AWS services via public endpoints over TLS, with data exfiltration mitigated by IAM role scoping and Service Control Policies restricting which services were available.
Encryption context? Only enforced for enterprise-tier tenants using customer-managed KMS keys. Standard-tier tenants used AWS-managed keys without tenant-specific encryption context.
MFA? Optional for regular users, mandatory for admin access, enforced via identity provider verification.
Every answer either confirmed a worst-case assumption (and kept the threat), or corrected it with specific evidence (and refined the threat).
What changes after you answer
Same architecture, same tool, but now grounded in verified facts instead of inferred assumptions.
That critical Lambda over-privilege threat? It changed shape entirely:
While least-privilege scoping is confirmed with resource-level ARNs, a compromised Lambda via code injection or dependency vulnerability could still access secrets, decrypt data, or modify records across tenants. Residual risk exists if resource policies on downstream services do not explicitly restrict access by role ARN.
The finding went from “likely shared overly permissive roles” (a guess) to “confirmed least-privilege with a specific residual risk in downstream resource policies” (something you can act on). The severity dropped from CRITICAL to HIGH — not because the risk disappeared, but because the team had already implemented the primary mitigation. What remained was a specific, testable gap.
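The residual gap the clarified finding points at — downstream resource policies that restrict access by role ARN — might look like the following sketch. The bucket name, role ARN, and `access_denied` simulator are hypothetical; the `ArnNotEquals` / `aws:PrincipalArn` condition is a standard IAM construct for this pattern.

```python
ANALYSIS_ROLE_ARN = "arn:aws:iam::123456789012:role/analysis-fn-role"

# A bucket policy that denies every principal except the one role that should
# touch the artifacts bucket — so the isolation holds even if some other
# role's IAM policy is accidentally loosened later.
ARTIFACTS_BUCKET_POLICY = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "DenyAllButAnalysisRole",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:*",
        "Resource": [
            "arn:aws:s3:::tenant-artifacts",
            "arn:aws:s3:::tenant-artifacts/*",
        ],
        "Condition": {
            "ArnNotEquals": {"aws:PrincipalArn": ANALYSIS_ROLE_ARN}
        },
    }],
}

def access_denied(policy: dict, principal_arn: str) -> bool:
    """Toy evaluation of the Deny: any principal other than the allowed role is blocked."""
    for stmt in policy["Statement"]:
        if stmt["Effect"] == "Deny":
            allowed = stmt["Condition"]["ArnNotEquals"]["aws:PrincipalArn"]
            if principal_arn != allowed:
                return True
    return False
```

This is defence in depth: the function's IAM role says what it may do, and the bucket policy independently says who may touch the bucket.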
The encryption threat transformed even more dramatically. The baseline flagged a generic critical finding: “Encryption at rest not explicitly enforced on DynamoDB tables.” After clarification:
Standard-tier tenants use AWS-managed KMS keys without tenant-specific encryption context validation. A compromised Lambda role could decrypt reports from other tenants by calling KMS Decrypt without proper encryption context enforcement. Enterprise BYOK tenants have this control, but standard tiers — the majority — lack it.
A precise architectural gap: encryption exists, but tenant isolation within that encryption has a specific weakness for a specific tier of customers. That’s something a team can fix in a sprint, with a clear scope and a clear test for success.
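To see why the missing encryption context matters, here is a toy simulation of how KMS enforces it — no real cryptography, invented field names, just the control-flow. KMS binds the context supplied at Encrypt time to the ciphertext and refuses to Decrypt unless the caller presents the exact same context; that match is what gives each tenant its own isolation boundary.

```python
class ContextMismatch(Exception):
    pass

def encrypt(plaintext: str, context: dict) -> dict:
    # Real KMS cryptographically binds the context; here we just carry it along.
    return {"ciphertext": plaintext[::-1], "bound_context": dict(context)}

def decrypt(blob: dict, context: dict) -> str:
    if blob["bound_context"] != context:
        raise ContextMismatch("encryption context does not match")
    return blob["ciphertext"][::-1]

# Enterprise tier: tenant-scoped context, so a cross-tenant Decrypt fails.
enterprise_report = encrypt("q3-findings", {"tenant_id": "tenant-a"})

# Standard tier, as described in the finding: no tenant context at all — any
# role with kms:Decrypt on the key can read any tenant's reports.
standard_report = encrypt("q3-findings", {})
```

With the empty context, the decrypt check is vacuously satisfied for every caller. That is the per-tier gap the clarified report isolated.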
Across the full report, the risk level dropped from CRITICAL to HIGH. The number of threats stayed at nine — but the nature of those threats changed from broad assumptions to specific, grounded findings. The executive summary went from “2 critical and 7 high-severity security gaps requiring immediate remediation” to “5 significant security gaps requiring immediate attention, alongside adequate controls for authentication and encryption at rest.”
That “alongside adequate controls” is new. The baseline couldn’t give credit for controls it couldn’t verify. The clarified report could.
The documentation gap nobody talks about
There’s a side effect of Clarify we didn’t anticipate: it systematically reveals what’s missing from your architecture documentation.
Every clarification question represents something the documentation should say but doesn’t. The process generates documentation improvement suggestions — in this case, 24 of them — each pointing to a specific gap with the team’s own words filling it in.
Document in your architecture: Do the authentication trigger functions validate tenant membership and enforce tenant isolation in JWT claims?
Your team confirmed: Yes. The pre-signup function validates email domain against configured SSO records and blocks public email providers. The post-authentication function enriches JWT with tenant ID claims. Tenant isolation is enforced at the API layer via org and user identity from JWT claims.
Document in your architecture: Are Lambda functions deployed in VPCs with private subnets?
Your team confirmed: Lambda functions are NOT in a VPC. They access AWS services via public endpoints over TLS. Data exfiltration is mitigated by IAM role scoping and SCPs restricting service access.
Document in your architecture: What TLS versions are enabled on CloudFront? Is HTTP redirected to HTTPS?
Your team confirmed: CloudFront enforces TLS 1.2 minimum. HTTP redirected to HTTPS. HSTS configured with max-age and includeSubDomains. Default AWS cipher suite excludes weak ciphers.
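The JWT enrichment the team described in the first answer could be sketched as follows. In Cognito, adding claims to the issued token happens in the pre token generation trigger; the lookup table, claim name, and rejection logic here are invented stand-ins for the team's SSO-record check.

```python
# Hypothetical stand-in for the configured SSO records.
TENANT_BY_EMAIL_DOMAIN = {"acme.example": "tenant-acme"}

def pre_token_generation(event, context=None):
    """Sketch of a Cognito pre token generation trigger that enriches the JWT
    with a tenant ID claim, using the event shape Cognito passes to the hook."""
    email = event["request"]["userAttributes"]["email"]
    domain = email.split("@", 1)[1]
    tenant_id = TENANT_BY_EMAIL_DOMAIN.get(domain)
    if tenant_id is None:
        # Mirrors the pre-signup behaviour: unknown or public domains are rejected.
        raise ValueError(f"no tenant configured for domain {domain}")
    event["response"] = {
        "claimsOverrideDetails": {
            "claimsToAddOrOverride": {"custom:tenant_id": tenant_id}
        }
    }
    return event
```

Downstream, the API layer only has to trust the signed claim — the tenant check happens once, at token issuance.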
Each suggestion is ready to paste into the architecture document. The next time anyone reviews the architecture — a new team member, an auditor, or a threat modeling tool — that information will be there.
Architecture documentation that improves every time you run a threat model. That’s a feedback loop worth building.
Why conversation beats computation
The traditional approach to automated security analysis is to throw more compute at the problem. Better models, larger context windows, more sophisticated reasoning. That helps — but it has a ceiling. No amount of inference can compensate for information that isn’t in the document.
The best security consultants know this instinctively. They don’t read a document and produce a report. They read a document, form hypotheses, ask questions, update their understanding, and then produce a report. The conversation is the analysis. The report is just the artifact.
That’s what iterative threat modeling does. The baseline is the hypothesis. The clarification questions are the consultant’s interview. The clarified report is the grounded analysis. You can’t skip the middle step and expect the same quality.
A 94% confidence score on unverified assumptions is a number that looks good in a dashboard but doesn’t help you prioritize. An 86% confidence score on verified facts tells you exactly where your real risks are. One number sells. The other one protects.
What this means for your architecture
If you’re running threat models — automated or manual — ask yourself: is the tool asking questions, or just guessing?
A one-shot analysis will always default to worst-case assumptions for anything it can’t verify. That’s appropriate for a first pass. But if you stop there, you’re making prioritization decisions based on guesses. Teams waste effort hardening components that are already secure while real gaps hide behind generic findings.
Iteration isn’t overhead. It’s the difference between a threat model that tells you “something might be wrong” and one that tells you “here’s exactly what’s wrong, here’s why, and here’s how to fix it.”
Threat modeling matters. But a threat model that doesn’t ask questions is just an educated guess.
If this resonated, you might also enjoy our deep dive into AWS least privilege, how STRIDE and MITRE ATT&CK compare, and why it’s never too early to think about security. See how ThreatKrew works or learn more about the automated threat modeling tool.
Want to see interactive threat modeling in action? Try the platform or join the Founders Program and experience the conversation for yourself.
Dave Barton
Co-founder
Co-founder of ThreatKrew. Former AWS security specialist with years of experience securing enterprise infrastructure. Passionate about making professional security analysis accessible to every team.