Every SSM session in your org lands as ssm-user. You know this is wrong. You’ve read the runAsEnabled docs. You know about SSMSessionRunAs session tags. You know ABAC can carry attributes through AssumeRoleWithSAML.
The problem: every blog post that covers this uses Okta. You run Entra ID.
This post covers the full Entra-specific implementation — the attribute chain, the SCIM gotchas, the two deployment models, and the agentic methodology that got me from stuck to shipped in a day.
The Identity Chain
extensionAttribute1 (Entra)
→ costCenter (SCIM enterprise extension)
→ ${path:enterprise.costCenter} (IdC ABAC)
→ SSMSessionRunAs (STS session tag)
→ runAsEnabled (SSM-SessionManagerRunShell)
→ Linux user on EC2
Four layers. Each one independently testable. Each one a potential silent failure point.
IdC’s SCIM endpoint only accepts the standard SCIM 2.0 core schema plus the enterprise extension attributes. No custom schemas. So you repurpose enterprise.costCenter as the carrier. Pick whichever enterprise field your HR system isn’t already using: employeeNumber, division, and department all work equally well.
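A rough sketch of the two ends of that mapping, assuming costCenter is the carrier. The first function builds a trimmed SCIM user payload with the Linux username in the enterprise extension; the second builds the ABAC attribute entry that reads it back out as SSMSessionRunAs. Function names and the sample values are hypothetical. The schema URNs are the standard SCIM 2.0 ones, and the attribute entry follows the Key / Value.Source shape used by AWS::SSO::InstanceAccessControlAttributeConfiguration. A real provisioning payload carries more fields (name, displayName, emails), omitted here.

```python
import json

CORE_USER = "urn:ietf:params:scim:schemas:core:2.0:User"
ENTERPRISE_USER = "urn:ietf:params:scim:schemas:extension:enterprise:2.0:User"


def build_scim_user(user_name: str, run_as: str) -> dict:
    """Trimmed SCIM user resource carrying the run-as name in costCenter."""
    return {
        "schemas": [CORE_USER, ENTERPRISE_USER],
        "userName": user_name,
        "active": True,
        # Assumption: costCenter is the repurposed carrier field.
        ENTERPRISE_USER: {"costCenter": run_as},
    }


def build_abac_attribute(tag_key: str = "SSMSessionRunAs",
                         scim_path: str = "enterprise.costCenter") -> dict:
    """One AccessControlAttributes entry mapping a SCIM path to a session tag."""
    return {"Key": tag_key, "Value": {"Source": ["${path:" + scim_path + "}"]}}


if __name__ == "__main__":
    # Hypothetical per-role example: alice lands as local user "developer".
    print(json.dumps(build_scim_user("alice@gitrdun.net", "developer"), indent=2))
    print(json.dumps(build_abac_attribute(), indent=2))
```

If you pick a different enterprise field, only the key inside the extension object and the scim_path argument change. The tag key stays SSMSessionRunAs.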
Two Deployment Models
Per-Role — extensionAttribute1=developer → local Linux user developer. Simple. No AD. Works everywhere. CloudTrail shows real identity; OS shows shared role.
Per-User — extensionAttribute1=alice → SSSD resolves alice@gitrdun.net against AD → personal identity, AD group-driven entitlements, full OS accountability. Requires AD reachable from VPC. The repo includes a self-contained Windows Server 2022 DC in CFN (5-tier stack: VPC → endpoints → IAM → SG → instance) if you need to stand one up.
Domain alignment matters for per-user: your AD domain must match your Entra UPN suffix. Set use_fully_qualified_names = True in SSSD so the resolved username matches the SSMSessionRunAs tag exactly.
The Entra-Specific Gotchas (The Part That Cost Weeks)
These are the ones that aren’t in any doc:
1. Metadata XML has 12 certs. IdC silently fails with “Retry Failed Steps”. Strip to 1 active cert + 2 SSO endpoints. Nothing else.
2. SCIM template ID is aWSSingleSignon, not aws. The MS Learn Graph API provisioning walkthrough shows templateId: "aws" — but that example targets the legacy AWS Single-Account Access gallery app (no user provisioning, separate applicationTemplateId). The IAM Identity Center app uses a different template entirely. Using aws against an IAM Identity Center service principal returns BadRequest. Run GET /servicePrincipals/{id}/synchronization/templates on your own SP to confirm.
3. ACS URL domain mismatch. Gallery wildcard covers *.signin.aws.amazon.com. Actual ACS URL uses signin.aws. Different domain. Add the specific URL explicitly to replyUrls via Graph API — in a separate PATCH body, not combined with other fields.
4. IdC identity source change deletes the IAM SAML provider. Role trust policies still reference the old ARN. 403 on AssumeRoleWithSAML. Fix: delete-account-assignment + create-account-assignment forces IdC to regenerate the SAML provider.
5. Fn::Sub eats PowerShell ${var} tokens. Capture all CFN parameters into plain variables at the top of your UserData. Use unbraced $var syntax throughout.
6. Windows EC2 UserData is two boots. Install-ADDSForest triggers an automatic reboot. Dead code follows. Use <persist>true</persist> + a registry phase flag. Signal CloudFormation from the second boot.
7. SSM DuplicateDocumentContent. ssm:UpdateDocument raises this when the content is identical on a re-run. Catch it explicitly and return success: an identical re-run is an idempotent no-op, not an error.
8. VPC Interface Endpoints required for private instances. ssm, ec2messages, ssmmessages, secretsmanager, cloudformation. DependsOn all five from the EC2 instance so it doesn’t start before private DNS resolves.
Nine more in the white paper.
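Gotcha 1 is the easiest one to script. A minimal sketch, assuming you already know which signing certificate is active and pass its base64 body in yourself: it keeps that one KeyDescriptor plus the SingleSignOnService endpoints and drops everything else. The function name is hypothetical, and this is not the repo’s tooling. The namespaces are the standard SAML 2.0 metadata and XML-DSig ones.

```python
import xml.etree.ElementTree as ET

MD = "urn:oasis:names:tc:SAML:2.0:metadata"
DS = "http://www.w3.org/2000/09/xmldsig#"
ET.register_namespace("md", MD)
ET.register_namespace("ds", DS)


def trim_metadata(xml_text: str, active_cert_b64: str) -> str:
    """Return metadata with one KeyDescriptor and only the SSO endpoints."""
    root = ET.fromstring(xml_text)
    idp = root.find(f"{{{MD}}}IDPSSODescriptor")
    if idp is None:
        raise ValueError("no IDPSSODescriptor in metadata")

    slim_root = ET.Element(f"{{{MD}}}EntityDescriptor",
                           {"entityID": root.get("entityID", "")})
    slim_idp = ET.SubElement(
        slim_root, f"{{{MD}}}IDPSSODescriptor",
        {"protocolSupportEnumeration": idp.get("protocolSupportEnumeration", "")})

    # Compare certificate bodies with whitespace removed, since metadata
    # files often wrap the base64 across lines.
    wanted = "".join(active_cert_b64.split())
    for key in idp.findall(f"{{{MD}}}KeyDescriptor"):
        cert = key.find(f".//{{{DS}}}X509Certificate")
        if cert is not None and "".join((cert.text or "").split()) == wanted:
            slim_idp.append(key)
            break
    else:
        raise ValueError("active certificate not found in metadata")

    # Keep the SSO endpoints; logout services, role descriptors, and the
    # metadata signature are intentionally left behind.
    for sso in idp.findall(f"{{{MD}}}SingleSignOnService"):
        slim_idp.append(sso)

    return ET.tostring(slim_root, encoding="unicode")
```

Diff the output against the original before uploading. The trimmed file is unsigned, so treat it as something you produced, not something Entra issued.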
The Infrastructure
Shared across both approaches:
shared/cfn/idc-abac.yaml ← AWS::SSO::InstanceAccessControlAttributeConfiguration
shared/cfn/ssm-preferences-member.yaml ← Lambda custom resource (UpdateDocument, not CreateDocument)
shared/cfn/ssm-preferences-stackset.yaml ← SERVICE_MANAGED, autoDeployment=true
shared/cfn/scp-guardrail.yaml ← denies ssm:UpdateDocument on SSM-SessionManagerRunShell
The Lambda custom resource is necessary because SSM-SessionManagerRunShell is a shared AWS-managed document that already exists in every account. AWS::SSM::Document only does CreateDocument, so it can’t take over a document that is already there. You need UpdateDocument. So you write a Lambda. Two things to watch: catch DuplicateDocumentContent on idempotent re-runs, and set ServiceTimeout: 130 on the custom resource so import errors fail fast instead of hanging for an hour.
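The core of that Lambda can be sketched as below. This is not the repo’s handler: the function names are hypothetical, the CloudFormation response signalling is left out, and ssm_client is assumed to be a boto3.client("ssm"). The error check reads the error code from the exception’s response dict, which is where botocore’s ClientError puts it, so the sketch itself needs nothing beyond the standard library.

```python
import json

DOCUMENT_NAME = "SSM-SessionManagerRunShell"


def build_preferences(run_as_default_user: str = "") -> str:
    """Session Manager preferences document with Run As switched on."""
    return json.dumps({
        "schemaVersion": "1.0",
        "description": "Session Manager preferences with Run As enabled",
        "sessionType": "Standard_Stream",
        "inputs": {
            "runAsEnabled": True,
            # Empty default: the SSMSessionRunAs tag decides the OS user.
            "runAsDefaultUser": run_as_default_user,
        },
    })


def apply_preferences(ssm_client, content: str) -> str:
    """Update the preferences document; identical content counts as success."""
    try:
        resp = ssm_client.update_document(
            Name=DOCUMENT_NAME,
            Content=content,
            DocumentVersion="$LATEST",
            DocumentFormat="JSON",
        )
    except Exception as exc:
        code = getattr(exc, "response", {}).get("Error", {}).get("Code")
        if code == "DuplicateDocumentContent":
            return "unchanged"  # re-run with the same content: a no-op
        raise

    # UpdateDocument creates a new version; promote it to the default.
    version = resp["DocumentDescription"]["DocumentVersion"]
    ssm_client.update_document_default_version(
        Name=DOCUMENT_NAME, DocumentVersion=version)
    return "updated"
```

Taking the client as a parameter keeps the function testable with a stub, which is how you exercise the DuplicateDocumentContent path without touching a real account.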
The Agentic Methodology
Two hours of research before planning. Research subagents scoured the internet — not just docs, everything — and pulled live API references via Context7 (MCP server for current library docs). Then formal property specs (SAFETY / LIVENESS / INVARIANT), each with a runnable observable. The plan was peer-reviewed before implementation started. TDD: test cases written from properties, implementation written to pass them. Parallel implementation subagents. Adversarial review across five dimensions. Engineer sign-off before anything deployed.
The output wasn’t “AI-written code that might work.” It was code that passed a formal spec, a test plan, shellcheck, CFN validation, a hardcoded-value scan, and a security review — then got reviewed by a human. More rigour than most sprint tickets get.
The methodology removed the bandwidth constraint. I was thinking at the architecture level the entire time — not simultaneously debugging PowerShell syntax and reasoning about SCP scope.
Repo
CloudFormation, scripts, PowerShell, verification tooling. Both approaches. Fully parameterized, no hardcoded values, shellcheck-clean.
Full white paper has every gotcha, the complete architecture, operational lifecycle, break-glass procedure, and cert rotation steps.
Written in the Conductor and Orchestra model. I directed. The agents executed. You’re reading the output.