Tag: defaultazurecredential

  • Authentication patterns for Microsoft Foundry — beyond DefaultAzureCredential

    DefaultAzureCredential is the right default, and I said as much in the getting-started guide that this post follows. It walks an ordered chain — environment variables, managed identity, Azure CLI, VS Code, interactive browser — and the same line of code works on a laptop, in CI, and on production compute. That is exactly why it earns its place on day one.

    The trouble starts in production, when the questions get more specific. Your production workload needs to authenticate as something stronger than “whichever managed identity the host happens to provide.” Your CI/CD pipeline has to deploy agents, model deployments, and role assignments without a client secret sitting on the build agent. Your app calls Foundry on behalf of a signed-in user, and the user’s own identity has to reach Foundry, both for RBAC and for audit. And a security review asks for a complete inventory of who can call what, and “DefaultAzureCredential” is not an answer to that question.

    What follows is the auth pattern catalogue I wish I had when I went from prototype to production on Foundry. Five patterns, a per-environment role assignment model, the multi-environment story, and the four things that will bite you.

    The big picture — one diagram

    Before the catalogue, the one diagram that summarises the relationships. Every identity — a developer’s laptop, a signed-in end user, a workload on Azure compute, a CI/CD pipeline — reaches Foundry by way of an Entra-issued access token. The pattern you pick determines how that token is minted, not whether Entra is in the loop.

    Authentication architecture for Microsoft Foundry. Every calling identity reaches Foundry via an Entra-issued access token.
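
When I am unsure which identity a given pattern actually produced, I look at the claims inside the access token. The following is a minimal sketch, assuming a standard three-part JWT. It decodes the payload without verifying the signature, so it is a debugging aid only. The helper names `peek_claims` and `summarize` are hypothetical, not part of any SDK.

```python
import base64
import json


def peek_claims(access_token: str) -> dict:
    """Decode the payload of a JWT WITHOUT verifying its signature.

    Debugging aid only: never make authorization decisions on unverified claims.
    """
    parts = access_token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a three-part JWT (header.payload.signature)")
    payload = parts[1]
    # JWTs use unpadded base64url, so restore the padding before decoding.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


def summarize(claims: dict) -> dict:
    """Pick out the claims that usually answer 'who is calling what'."""
    return {
        "audience": claims.get("aud"),   # the resource the token was issued for
        "tenant": claims.get("tid"),     # the Entra tenant that issued it
        "object_id": claims.get("oid"),  # the calling principal's object ID
    }
```

Running `summarize(peek_claims(token))` on a token from each environment is a quick way to confirm that the principal you assigned roles to is the one actually making the call.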

    1. The auth pattern catalogue

    1.1 System-assigned managed identity for single-resource workloads

    When to use it. A single App Service, Function, or Container App that calls one Foundry resource, has no shared identity needs with anything else, and never has to outlive its host.

    When not. Anything where two compute resources need the same identity, or where the identity must persist across redeploys.

    Trade-off. System-assigned managed identities are created and deleted with their host. Zero lifecycle work, zero secrets, and zero portability. If you delete the App Service, the identity is gone — along with every role assignment that ever referenced it.

    // Assumes `location`, `plan`, and `project` are declared elsewhere in the template.
    // Foundry User role ID, stable across the rename
    var foundryUserRoleId = '53ca6127-db72-4b80-b1b0-d745d6d5456d'

    resource app 'Microsoft.Web/sites@2023-12-01' = {
      name: 'app-foundry-prod'
      location: location
      identity: { type: 'SystemAssigned' }
      properties: { serverFarmId: plan.id }
    }

    // Assign Foundry User on the project (not the resource)
    resource roleAssignment 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
      name: guid(project.id, app.id, foundryUserRoleId)
      scope: project
      properties: {
        principalId: app.identity.principalId
        principalType: 'ServicePrincipal'
        roleDefinitionId: subscriptionResourceId(
          'Microsoft.Authorization/roleDefinitions',
          foundryUserRoleId
        )
      }
    }
    System-assigned managed identity lifecycle. The identity is created with the host and deleted with it — taking every role assignment with it.

    1.2 User-assigned managed identity for shared and durable workloads

    When to use it. Multiple compute resources sharing one identity (App Service plus a Function, two AKS workloads, a Container App plus a Logic App). Or anywhere the identity must survive a redeploy of the compute.

    When not. A single transient workload — system-assigned is simpler, and you do not have an identity hanging around with no host.

    Trade-off. Durable and shareable, but you own the lifecycle. Think of it as identity-as-a-resource: it gets its own Bicep module, its own naming convention, and its own teardown plan.

    // Assumes `location`, `plan`, and `project` are declared elsewhere in the template.
    var foundryUserRoleId = '53ca6127-db72-4b80-b1b0-d745d6d5456d'

    resource uami 'Microsoft.ManagedIdentity/userAssignedIdentities@2023-01-31' = {
      name: 'id-foundry-app-prod'
      location: location
    }

    resource app 'Microsoft.Web/sites@2023-12-01' = {
      name: 'app-foundry-prod'
      location: location
      identity: {
        type: 'UserAssigned'
        userAssignedIdentities: { '${uami.id}': {} }
      }
      properties: { serverFarmId: plan.id }
    }

    // The role is bound to the identity, not to the compute that uses it
    resource projectRole 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
      name: guid(project.id, uami.id, foundryUserRoleId)
      scope: project
      properties: {
        principalId: uami.properties.principalId
        principalType: 'ServicePrincipal'
        roleDefinitionId: subscriptionResourceId(
          'Microsoft.Authorization/roleDefinitions',
          foundryUserRoleId
        )
      }
    }
    User-assigned managed identity shared across App Service, Function, AKS, and Container Apps — one identity, one role assignment, multiple workloads.

    For anything in production, my default is user-assigned. The first time you redeploy a Container App and discover every role assignment has gone with it, you will thank yourself.

    1.3 Workload identity federation for GitHub Actions and other federated CI/CD

    When to use it. Any pipeline that deploys Foundry agents, model deployments, role assignments, or any other RBAC-protected operation. GitHub Actions, Azure DevOps with OIDC, Terraform Cloud, AKS workload identity — all federated subjects.

    When not. There is not a good “when not.” If your GitHub Actions workflow still has AZURE_CLIENT_SECRET in its repository secrets, you should be migrating off it.

    Trade-off. A bit of configuration up front — a federated credential on the app registration with the right subject claim and audience. Zero credential rotation forever after. The external identity provider (GitHub, Kubernetes, etc.) is trusted to assert the workload’s identity, and Entra exchanges that assertion for a token. No client secret ever crosses the wire.

    # Create the federated credential on an app registration
    az ad app federated-credential create \
      --id $APP_ID \
      --parameters '{
        "name": "github-main-prod",
        "issuer": "https://token.actions.githubusercontent.com",
        "subject": "repo:my-org/my-repo:ref:refs/heads/main",
        "audiences": ["api://AzureADTokenExchange"]
      }'

    # .github/workflows/deploy.yml
    permissions:
      id-token: write   # required to mint the OIDC token
      contents: read
    jobs:
      deploy:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - uses: azure/login@v2
            with:
              client-id: ${{ secrets.AZURE_CLIENT_ID }}
              tenant-id: ${{ secrets.AZURE_TENANT_ID }}
              subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
              enable-AzPSSession: false
          - run: az deployment group create ...
    Workload identity federation trust between GitHub Actions and Microsoft Entra ID. The runner sends an OIDC token, Entra validates it against the federated credential, and returns a Foundry-scoped access token.

    The pattern generalises. AKS workload identity uses the same federation primitive, with the cluster’s OIDC issuer URL as the issuer and the Kubernetes service account as the subject. Terraform Cloud has its own. The configuration changes; the model does not.

    1.4 On-Behalf-Of flow for apps that call Foundry as the signed-in user

    When to use it. A web app or API where the end user’s identity must reach Foundry — because the user’s own RBAC determines what they can see, because audit logs need the user not the app, or because a compliance regime requires per-user attribution all the way to the model call.

    When not. Pure machine-to-machine workloads. If there is no signed-in human in the loop, you want a managed identity, not OBO.

    Trade-off. More moving parts. The user signs into the front end, the front end calls your API with their access token, the API exchanges that token for a downstream token scoped to Foundry, and only then does the call go through. It is the only correct answer for user-scoped operations.

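    # Middle-tier API: exchange the incoming user token for a Foundry-scoped token
    import msal

    # API_CLIENT_ID, API_CLIENT_SECRET, and TENANT_ID come from the API's own configuration
    app = msal.ConfidentialClientApplication(
        client_id=API_CLIENT_ID,
        client_credential=API_CLIENT_SECRET,  # or a certificate / federated credential
        authority=f"https://login.microsoftonline.com/{TENANT_ID}",
    )

    # incoming_user_token comes from the Authorization header on the request
    result = app.acquire_token_on_behalf_of(
        user_assertion=incoming_user_token,
        scopes=["https://ai.azure.com/.default"],
    )

    # MSAL returns an error dict rather than raising, so check before indexing
    if "access_token" not in result:
        raise RuntimeError(f"OBO exchange failed: {result.get('error_description')}")
    foundry_access_token = result["access_token"]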
    On-Behalf-Of flow. The middle-tier API exchanges the user token for a Foundry-scoped token, and the call runs under the user identity with their RBAC and Conditional Access applied.

    One implication worth calling out: any Conditional Access policy on the user’s original sign-in propagates through the OBO exchange. If your CA policy says “no Foundry access from non-compliant devices,” the downstream Foundry call inherits that. That is almost always what you want.

    1.5 Application registrations with client secrets — when (rarely) still appropriate

    When to use it. Local developer machines that are not on a corporate-managed laptop with Entra-joined credentials. Genuinely headless scripts that cannot use a managed identity or federated workload identity. Third-party integrations that do not yet support OIDC federation. That is it.

    When not. Anything in production on Azure compute — use a managed identity. Anything in CI/CD on a platform that supports federation — use workload identity federation. Anything an auditor will ever look at.

    Trade-off. Simplest to set up, hardest to govern. Secrets rotate, they leak, they accumulate. If you have more than a handful, you have a secret-sprawl problem and you do not yet know it.

    If you must use one: short expiry (90 days), stored in Key Vault, never in a .env checked into a repo, and the role assigned to the app’s service principal is the minimum it needs — Foundry User scoped to the project, never Contributor scoped to the subscription.
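
If you do end up with a few secrets, the 90-day rule is easy to check mechanically. This is a minimal sketch, assuming each credential record carries ISO-8601 `startDateTime` and `endDateTime` fields plus an optional `displayName` or `keyId`, which is the shape I would expect from a credential listing. The function name `audit_secrets` is hypothetical, and timestamps with more than microsecond precision would need extra parsing.

```python
from datetime import datetime, timedelta, timezone

MAX_LIFETIME = timedelta(days=90)  # the expiry ceiling suggested above


def _parse(ts: str) -> datetime:
    # Accept a trailing "Z" as UTC; fromisoformat on older Pythons does not.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def audit_secrets(credentials: list, now: datetime) -> list:
    """Return one finding per secret that is expired or too long-lived."""
    findings = []
    for cred in credentials:
        name = cred.get("displayName") or cred.get("keyId", "<unnamed>")
        start = _parse(cred["startDateTime"])
        end = _parse(cred["endDateTime"])
        if end <= now:
            findings.append(f"{name}: expired")
        elif end - start > MAX_LIFETIME:
            findings.append(f"{name}: lifetime exceeds 90 days")
    return findings
```

Pass `datetime.now(timezone.utc)` as `now` in real use. Taking it as a parameter keeps the check deterministic in tests.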

    The hard line: if you are putting a client secret on a production workload, you have taken a wrong turn. Go back and use one of the four patterns above.

    Client secrets in production are an anti-pattern. Replace with managed identity for Azure compute, workload identity federation for CI/CD, or On-Behalf-Of for signed-in user apps.

    2. The role assignment model — least privilege without the spreadsheet

    Two principles. Roles are assigned to principals — managed identities, user accounts, Entra groups — at a scope. The scope can be project, Foundry resource, resource group, or subscription. Get the scope right and least privilege follows naturally. Get it wrong and you will be re-assigning Contributor every six months because somebody got blocked at a demo.

    In prose, here is the model I deploy:

    Application principals — the managed identity that the production app authenticates as, the federated workload identity the AKS pod assumes — get the Foundry User role, scoped to the project, not the resource. Project-scoped assignments mean a misconfigured app cannot accidentally see another project’s agents, threads, or connections.

    Build and deploy principals — the federated CI/CD identity that runs your GitHub Actions workflow — get Foundry Project Manager scoped to the project. If the same pipeline also creates projects, then it needs a resource-level role for that one operation; keep it as narrow as you can get away with.

    Human developers get Foundry Project Manager on the dev project, Foundry User on staging, and read-only on prod. Production changes go through the pipeline; they do not go through individual developer accounts.

    Resource-level roles — Foundry Account Owner and Foundry Owner — are platform-team territory, and even there they should be PIM-eligible rather than standing assignments. These are the roles that can create new projects, configure guardrails, and conditionally hand out other roles. Treat them accordingly.

    A few practical notes the docs are explicit about. Do not assign built-in roles that start with Cognitive Services for Foundry work — Microsoft’s RBAC documentation calls this out directly. Those roles are for accessing AI Services resources directly and do not apply to Foundry scenarios, even though Foundry sits on the Microsoft.CognitiveServices resource provider. Also avoid the Azure AI Developer role for Foundry — despite the name, it is scoped to Azure Machine Learning workspaces and Foundry hubs, not to Foundry projects or resources.

    One more practical note: reference role definition GUIDs in Bicep and Azure CLI, not display names. The Foundry roles were recently renamed from their Azure AI predecessors (Azure AI User → Foundry User, Azure AI Project Manager → Foundry Project Manager, Azure AI Account Owner → Foundry Account Owner). The GUIDs are stable; the display names are still mid-rollout across the portal and tooling.
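
To make the GUID advice concrete, here is a small sketch that builds the argument list for `az role assignment create` from a role GUID and a scope. The two GUIDs are the ones quoted in this series. The function name and the idea of generating the command from Python are my own illustration, and the scope value is whatever resource ID your project actually has.

```python
# Role definition GUIDs as quoted in this series; display names live in comments only.
FOUNDRY_USER = "53ca6127-db72-4b80-b1b0-d745d6d5456d"
FOUNDRY_PROJECT_MANAGER = "eadc314b-1a2d-4efa-be10-5d325db5065e"


def role_assignment_args(principal_object_id: str, role_guid: str, scope: str,
                         principal_type: str = "ServicePrincipal") -> list:
    """Build the argument list for `az role assignment create`."""
    return [
        "az", "role", "assignment", "create",
        "--assignee-object-id", principal_object_id,
        "--assignee-principal-type", principal_type,
        "--role", role_guid,   # GUID, not display name
        "--scope", scope,      # project resource ID for least privilege
    ]
```

The list can be handed to `subprocess.run(args, check=True)`. Keeping the GUIDs in named constants means a display-name rename never touches the pipeline.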

    Role assignment model. Application principals get Foundry User on the project, CI/CD and developer principals get Foundry Project Manager, and resource-level Foundry Account Owner / Foundry Owner stay with the platform team. Avoid Cognitive Services * roles and Azure AI Developer for Foundry work.

    3. The multi-environment story

    Dev, staging, and prod each get their own Foundry resource — not just their own project. Quotas are resource-scoped. Network configuration is resource-scoped. The blast radius of a misconfigured role assignment is resource-scoped. All of those argue for full resource separation between non-prod and prod, even if it means three sets of Bicep modules and three Application Insights workspaces. The cost of running an under-utilised dev resource is far less than the cost of an intern accidentally pointing a load test at a prod deployment.

    Each environment gets its own user-assigned managed identity for the application principal, its own federated credential on the CI/CD app registration (one per environment, with a distinct subject claim — environment:dev, environment:prod — so prod deploys only run from protected branches and reviewed environments), and its own Entra group for human access. Group membership rather than direct user assignment, always — that is how you get clean joiner/mover/leaver flows without a quarterly spreadsheet review.
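
The per-environment federated credentials differ only in name and subject claim, so they are easy to generate. A minimal sketch, assuming GitHub Actions as the issuer and GitHub environments named after your deployment stages. The naming scheme `github-<repo>-<env>` and both function names are hypothetical.

```python
import json

GITHUB_ISSUER = "https://token.actions.githubusercontent.com"
AUDIENCE = "api://AzureADTokenExchange"


def federated_credential(org: str, repo: str, environment: str) -> dict:
    """Parameters for one federated credential bound to a GitHub environment."""
    return {
        "name": f"github-{repo}-{environment}",
        "issuer": GITHUB_ISSUER,
        # Only workflow jobs that target this GitHub environment match the subject.
        "subject": f"repo:{org}/{repo}:environment:{environment}",
        "audiences": [AUDIENCE],
    }


def credentials_for(org: str, repo: str, environments: list) -> list:
    """One credential definition per environment, in the order given."""
    return [federated_credential(org, repo, env) for env in environments]
```

Each dict, serialised with `json.dumps`, has the same shape as the `--parameters` payload shown in Section 1.3.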

    Secrets that genuinely have to exist — third-party API keys, database connection strings — live in a per-environment Key Vault, accessed by the per-environment managed identity. Foundry credentials themselves are never in Key Vault. They are token exchanges via the patterns in Section 1.

    Elevated roles on the prod resource go through Privileged Identity Management. The platform team holds Foundry Owner on prod as PIM-eligible, not as a standing assignment. Activation requires justification, a time window, and an audit trail. If your auditor asks “who could have changed the prod guardrails on this date,” you want PIM logs to answer that, not Azure Activity Log archaeology.

    Per-environment isolation. Dev, staging, and production each get their own Foundry resource, user-assigned managed identity, federated credential, and Key Vault. Elevated roles on prod are PIM-eligible only.

    4. The four things that will bite you

    Token caching. The Azure SDK clients cache tokens for the lifetime of the credential object. Long-lived processes — anything stateful, anything that processes a queue, anything with a connection pool — need to handle credential refresh correctly. The right pattern is usually to reuse a single credential instance across all clients in the process, not to recreate DefaultAzureCredential() (or its successor) per call. Recreating it per call defeats the cache and, on a busy worker, will get you rate-limited at the IMDS endpoint before you have shipped a single completion.
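
The difference is easy to see with a toy model. The sketch below uses `FakeCredential`, a hypothetical stand-in that caches its token on the instance in the way described above. It is not the Azure SDK, and it only counts how often the simulated identity endpoint gets hit.

```python
import time


class FakeCredential:
    """Hypothetical stand-in for an SDK credential.

    It caches its token on the instance, so the cache is lost whenever
    the instance is thrown away.
    """

    def __init__(self, fetch_log: list):
        self._fetch_log = fetch_log   # records each simulated identity-endpoint call
        self._token = None
        self._expires_at = 0.0

    def get_token(self) -> str:
        if self._token is None or time.monotonic() >= self._expires_at:
            self._fetch_log.append("fetch")
            self._token = "fake-token"
            self._expires_at = time.monotonic() + 3600
        return self._token


def per_call(n: int) -> int:
    """Anti-pattern: a new credential per request, so every request refetches."""
    log = []
    for _ in range(n):
        FakeCredential(log).get_token()
    return len(log)


def shared(n: int) -> int:
    """Preferred: one credential for the process, so the cache is reused."""
    log = []
    credential = FakeCredential(log)
    for _ in range(n):
        credential.get_token()
    return len(log)
```

With a hundred requests, `per_call` makes a hundred token fetches and `shared` makes one. In a real service the shared instance would typically be created once at startup and passed to every client.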

    Cross-tenant scenarios. Foundry resources live in a single tenant. If you have a partner tenant whose users need to call your Foundry workload, you are in B2B territory and the patterns above need adapting. Managed identities do not cross tenants without explicit federation, and OBO has its own constraints when the user is a guest. Do not discover this two weeks before a launch — design for the tenant model on day one.

    Private endpoints and DNS. Authentication works, the call still fails. If you have put Foundry behind a private endpoint, the DNS for the resource FQDN must resolve to the private IP from the calling network. Public DNS will look correct, your nslookup from a different network will look correct, and the call from inside the VNet will time out with no useful error. Always check resolution from the calling subnet, not from your laptop.
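
A check like this only means something when it runs from the calling subnet, for example in a debug container or a Kudu console. A minimal sketch using only the standard library; the function names are hypothetical, and the hostname you pass in is your own resource FQDN.

```python
import ipaddress
import socket


def resolved_addresses(hostname: str, resolver=socket.getaddrinfo) -> set:
    """Resolve a hostname using the DNS configuration of the machine running this."""
    results = resolver(hostname, 443, proto=socket.IPPROTO_TCP)
    # Each result is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return {info[4][0] for info in results}


def all_private(addresses: set) -> bool:
    """True only if there is at least one address and every one is in a private range."""
    return bool(addresses) and all(
        ipaddress.ip_address(a).is_private for a in addresses
    )
```

If `all_private(resolved_addresses(fqdn))` is false from inside the VNet, the private DNS zone is not linked or the record is missing, and no amount of RBAC debugging will fix the timeout.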

    Role propagation latency. New role assignments take up to ten minutes to propagate. Pipelines that create a user-assigned managed identity and immediately use it against Foundry will hit 403s on the first run. Options: insert a wait step after role assignment, retry with exponential backoff in the calling code, or assign roles ahead of provisioning the compute they are attached to. I prefer the third — the assignment is declarative and the compute picks it up when it comes online.
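
For the second option, retrying in the calling code, the loop is short. A minimal sketch: `Forbidden` is a hypothetical stand-in for whatever exception your SDK raises on an HTTP 403, and the default delays are my own choice so that the total wait covers roughly the ten-minute window.

```python
import time


class Forbidden(Exception):
    """Hypothetical stand-in for the SDK error raised on an HTTP 403."""


def call_with_backoff(operation, attempts: int = 8, base_delay: float = 5.0,
                      sleep=time.sleep):
    """Retry `operation` on Forbidden, doubling the wait after each failure.

    With the defaults the waits are 5, 10, 20, 40, 80, 160, and 320 seconds,
    about ten and a half minutes in total.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except Forbidden:
            if attempt == attempts - 1:
                raise  # out of retries: surface the 403 to the caller
            sleep(base_delay * 2 ** attempt)
```

Only retry on 403 during the window right after a role assignment. A 403 that persists past the last attempt is a genuine permissions problem and should fail loudly.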

    Four things that will bite you in production: stale tokens in long-lived processes, cross-tenant scenarios needing multi-tenant app registrations, private-endpoint DNS failures, and the up-to-ten-minute delay before new role assignments take effect.

    5. When NOT to add another auth pattern

    Counterweight, briefly. If your workload is one App Service calling one Foundry resource for one tenant’s users, deployed by one GitHub Actions workflow, you do not need all five patterns. You need a user-assigned managed identity on the App Service and a federated workload identity for the pipeline. Stop there. Adding OBO, custom token exchange, or a second managed identity because “we might need it later” is the kind of architecture work that looks responsible in a design doc and creates three years of operational debt.

    And if you find yourself building a custom token-exchange layer — your own service that sits in front of Foundry and stamps tokens on requests — you are almost certainly reinventing something Entra already does. Read the workload identity federation and OBO docs again before you write more code. The thing you are about to build is probably a federated credential with the wrong subject claim.

    6. Closing

    DefaultAzureCredential is how you start. The patterns in this post are how you scale. Pick the right managed identity flavour for the workload’s lifecycle. Federate your CI/CD so no client secret ever lives on a build agent. Use OBO where the user’s identity has to reach Foundry, and do not use it where it does not. Get the role scope right at the project level. Separate environments by resource, not just by project.

  • Starting an Azure Foundry project — the getting-started guide nobody wrote

    Banner image: Microsoft Foundry. Source: Microsoft Tech Community — Introducing Microsoft Foundry.

    Most “getting started with Foundry” content is a screenshot tour of the portal. You watch someone click “Create resource,” pick a region from a dropdown, and end the post with a chat playground saying “Hello, world.” None of that helps you on Monday morning when you have to commit to a region, an auth pattern, and a project topology that you’ll be living with for the next year.

    This is the post I wish I’d had open in another tab when I started TrafficIQ, our multi-agent supply-chain transport intelligence build on Foundry Agent Service. Five decisions you make before you click Create, the auth pattern you should adopt from day one, a first-sprint checklist, and the three things that will bite you.

    1. The naming maze — what Foundry actually is in 2026

    Eighteen months ago you had four products: Azure OpenAI, Azure AI Studio, Azure AI Services, and a sprawling Cognitive Services back catalogue. Today you have one Azure resource type — kind: AIServices with allowProjectManagement: true — and Microsoft calls it Microsoft Foundry (formerly Azure AI Foundry). Single resource, single ARM object, and three FQDNs hanging off it: the Azure OpenAI-compatible inference endpoint, the cognitive-services endpoint, and the Foundry project endpoint your agents and Responses API code talks to.

    There are also two portals. Foundry (classic) is the hub-based experience that grew out of Azure AI Studio. Foundry (new) is the project-first experience built around the consolidated resource. Both still work. Classic is in maintenance mode. If you are starting a new project in 2026, start in the new portal and create a Foundry project — not a hub project. Hub projects still exist for backwards compatibility, but everything Microsoft is investing in — agent service, evaluations, the new model catalogue, observability — is wired up around Foundry projects first.

    One more piece of context before you create anything: the Assistants API retirement deadline of 26 August 2026 is real. If you are building anything new today, do not start on Assistants — go directly to Foundry Agent Service and the Responses API. I’ll cover the migration path in a dedicated post; for now, treat Assistants as legacy.

    Microsoft Foundry resource and project architecture: a top-level Foundry resource governance boundary containing model deployments, security settings, connections, and two projects.
    Microsoft Foundry resource and project architecture. Source: Microsoft Learn — Microsoft Foundry architecture.

    2. The five decisions you make before you click Create

    2.1. Foundry resource vs upgrading an existing Azure OpenAI resource

    Decision: create a brand-new Foundry resource, or upgrade an existing Azure OpenAI resource in place. Trade-off: the in-place upgrade keeps your existing endpoint, deployments, network config, and RBAC bindings, but it requires a system-assigned managed identity on the source resource and is effectively one-way once you commit (rollback exists, but it is a support operation, not a button).

    For TrafficIQ: new resource. The repo was greenfield, I wanted a clean project boundary, and I didn’t want to inherit eighteen months of ad-hoc role assignments from the old Azure OpenAI resource.

    2.2. Region

    Decision: which Azure region hosts the resource. Trade-off: model availability is not uniform. Sweden Central, East US 2, and France Central each have meaningfully different model catalogues, and frontier models often land in one region weeks before the others. Pick the wrong region and you’ll either rewrite code against a different deployment or pay cross-region latency.

    For TrafficIQ: Sweden Central. TrafficIQ shipped on gpt-4.1 and gpt-4.1-mini, and Sweden Central was the region that aligned with both the model availability I needed and my EU data-residency obligations. Starting fresh today, I’d still default to Sweden Central but I’d evaluate gpt-5-mini for the router/orchestrator.

    2.3. New portal vs classic portal

    Decision: which portal you do your work in. Trade-off: classic gives you hub projects (good if you have an existing hub and shared compute), new gives you Foundry projects (better isolation, simpler RBAC, where all the new features land first).

    For TrafficIQ: new portal, Foundry project. No hub.

    2.4. Single project vs multiple projects per resource

    Decision: how many projects to carve out of one Foundry resource. Trade-off: projects are the isolation and RBAC boundary in Foundry — a project owns its agents, threads, evaluations, connections, and the people who can see them. One project is simpler; multiple projects are how you separate prod from dev, or two workloads that should never see each other’s data.

    For TrafficIQ: I started with a single project and split as soon as evaluations grew enough to need their own connections and quotas. The pattern I’d recommend day one: two projects per environment — one for the agent runtime, one for evaluations and offline experiments — and prod in a separate Foundry resource entirely from non-prod, so a misconfigured RBAC binding can never reach production data.

    2.5. Direct Foundry-billed models vs Azure Marketplace third-party models

    Decision: how you procure non-OpenAI models — Anthropic, Cohere, Mistral, Meta, and the rest. Trade-off: direct (first-party in the Foundry catalogue, billed on your Azure invoice, full enterprise SLA, no separate contract) versus Azure Marketplace (third-party publisher, often the only way to get the very latest version of a partner model, but it’s a separate offer you have to accept and the billing line lands differently).

    For TrafficIQ: direct for everything I could, marketplace only where a specific model version wasn’t available first-party. One Azure invoice is worth real money in procurement time.

    3. Authentication and authorisation — the day-one setup

    If you take one thing from this post, take this: don’t use API keys. Foundry resources support Entra ID (Azure AD) authentication everywhere, and DefaultAzureCredential from azure-identity is the right pattern from day one. Keys feel quick on day one and become a rotation, secrets-sprawl, and audit nightmare by month three.

    The pattern I use in TrafficIQ, stripped down to its essentials:

    from azure.identity import DefaultAzureCredential
    from azure.ai.projects import AIProjectClient

    # DefaultAzureCredential walks an ordered chain:
    # env vars -> managed identity -> Azure CLI -> VS Code -> interactive
    # Same line of code works locally, in CI, and in production.
    credential = DefaultAzureCredential()

    project = AIProjectClient(
        endpoint="https://<your-foundry-resource>.services.ai.azure.com/api/projects/<project-name>",
        credential=credential,
    )

    # Now you can use Agents, Responses, evaluations, and connections,
    # all authenticated as the principal the host environment provides.
    agents = project.agents

    There are three roles you’ll actually find yourself assigning in the first week. Microsoft renamed these in the last release wave; both old and new names still appear across the portal and docs during the rollout, but the new names are what you should write into runbooks.

    • Foundry User (formerly Azure AI User) — read/use existing agents, run inference, call the Responses API. This is the role for your application’s managed identity in production, and for engineers who consume but don’t author. Role ID: 53ca6127-db72-4b80-b1b0-d745d6d5456d.
    • Foundry Project Manager (formerly Azure AI Project Manager) — create and modify agents, manage connections, deploy models into the project. The role for developers actually building. Role ID: eadc314b-1a2d-4efa-be10-5d325db5065e.
    • Foundry Account Owner (formerly Azure AI Account Owner) — resource-level operations like creating new Foundry resources and configuring guardrails. The elevated tier. Don’t grant casually.

    Two practical notes. In Azure CLI and Bicep, use the role definition GUIDs, not the names — names are still mid-rename and the GUIDs are stable. And don’t grant any role that starts with “Cognitive Services” for Foundry work. The Microsoft Learn RBAC doc explicitly calls these out as not applicable to Foundry, even though Foundry sits on the Microsoft.CognitiveServices provider under the hood.

    Foundry User role (formerly Azure AI User), scoped at the Foundry resource. Source: Microsoft Learn — RBAC for Microsoft Foundry.

    In production, the application principal is a managed identity — a user-assigned managed identity attached to your App Service, Container App, AKS workload identity, or Function. App registrations with client secrets are for local development and headless CI/CD only. If you find yourself putting an app registration secret on a production workload, you’ve taken a wrong turn — go back and attach a managed identity instead.

    Secrets that genuinely have to exist — third-party API keys, database connection strings, anything that isn’t a Foundry credential — live in Azure Key Vault and are injected at build time, not runtime where possible. TrafficIQ uses a Vite Key Vault plugin pattern for the frontend so that the bundle never contains a literal secret and the build agent’s managed identity is the only thing that ever touches the vault.

    One last thing the docs bury and I wish someone had said louder: private endpoints are the most-forgotten production step, and you have to recreate them after an in-place upgrade from Azure OpenAI to Foundry. The upgrade preserves most of your network configuration, but private endpoints targeting the new Foundry sub-resources need to be re-provisioned, and DNS will be wrong until you do. Put it on the upgrade runbook.

    Network isolation plan for Microsoft Foundry: inbound via Private Access, outbound to Storage/Key Vault/Cosmos via private endpoints, outbound from compute.
    Network isolation plan for Microsoft Foundry. Source: Microsoft Learn — Configure network isolation for Microsoft Foundry.

    4. The first sprint — a working checklist

    In order. One line on what to do, one line on the trap.

    1. Create the Foundry resource. Use kind: AIServices, allowProjectManagement: true, system-assigned managed identity on. Trap: if you let someone create it as a vanilla Azure OpenAI resource “for now,” you’ll be doing an upgrade migration in week three.
    2. Create the first Foundry project. Give it a name that survives renames: <workload>-<env> works. Trap: project name is in the endpoint URL, so renaming later means client config changes everywhere.
    3. Assign roles, not keys. Azure AI Project Manager for builders, Azure AI User for the app’s managed identity. Trap: don’t grant subscription-level Contributor “just to unblock the demo” — it never gets revoked.
    4. Set up Key Vault and managed identity. One vault per environment, user-assigned managed identity attached to your compute. Trap: system-assigned MIs disappear when you delete the compute resource; use user-assigned for anything you care about.
    5. Deploy a model. A reasonable default in 2026: gpt-5-mini for router/orchestrator agents and gpt-4.1 for specialists with heavier tool-calling. Trap: model availability is regional — check the catalogue in your target region before you write code against a specific deployment name.
    6. Wire a connection for any external data source. Foundry “connections” are the project-scoped credential store for storage accounts, search indexes, and tools. Trap: connections live inside the project — copy them when you split prod from dev, don’t share.
    7. Call the Responses API from a smoke-test script. AIProjectClient → get inference client → responses.create. Trap: if you copy a sample using the legacy chat-completions endpoint, you’ll miss the new tool-calling and reasoning surface entirely.
    8. Stand up your first agent in Foundry Agent Service. Tools, instructions, model — keep it boring. Trap: don’t start with a mega-agent; start with one narrow agent and add a second before you make the first one cleverer.
    9. Turn on Guardrails and review the defaults. They are on by default at “medium” across categories. Trap: defaults block legitimate enterprise content — see Section 5.
    10. Wire up observability before you ship. Application Insights connection on the project, distributed tracing through OpenTelemetry, Foundry’s built-in run/thread tracing on. Trap: adding observability after the fact is two orders of magnitude harder than turning it on now.
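
    Steps 2 and 7 can be sketched together. The endpoint shape below is an assumption (copy the real value from the project’s overview page), the helper names are hypothetical, and smoke_test assumes a release of azure-ai-projects that exposes get_openai_client(); some earlier betas exposed the client under a different name.

```python
def project_name(workload, env):
    """Compose a <workload>-<env> project name, e.g. 'trafficiq-dev'."""
    return f"{workload}-{env}".lower()


def project_endpoint(resource, project):
    """Assumed endpoint shape; the project name is part of the URL path."""
    return f"https://{resource}.services.ai.azure.com/api/projects/{project}"


def smoke_test(endpoint, deployment):
    """Send one prompt through the Responses API and return the text."""
    # Lazy imports: pip install azure-identity azure-ai-projects openai
    from azure.ai.projects import AIProjectClient
    from azure.identity import DefaultAzureCredential

    project = AIProjectClient(endpoint=endpoint, credential=DefaultAzureCredential())
    client = project.get_openai_client()  # assumption: available in your SDK release
    response = client.responses.create(model=deployment, input="Reply with the word ok.")
    return response.output_text
```

    Because the project name sits in the URL path, every client that calls project_endpoint picks up a rename from one place instead of from scattered config.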

    5. The three things that will bite you in the first sprint

    Quota. Tokens-per-minute (TPM) and requests-per-minute (RPM) limits are per-deployment and per-region, and the default quota you get on a fresh subscription is sized for demos, not production. The day you flip a real workload on, you will hit 429s. Mitigations: request quota increases early (the form is slow), spread deployments across multiple regions if your latency budget allows, and put Provisioned Throughput Units (PTU) under anything customer-facing where you cannot tolerate rate-limit jitter.
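
    Until the quota increase lands, the client side of a 429 is usually exponential backoff with jitter that honours any Retry-After hint. A minimal sketch, standard library only; RateLimited is a hypothetical stand-in for whatever rate-limit error your client raises, and the base and cap values are illustrative defaults rather than recommendations.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical stand-in for whatever 429 error your client raises."""

    def __init__(self, retry_after=None):
        super().__init__("429 Too Many Requests")
        self.retry_after = retry_after  # seconds, from the Retry-After header


def backoff_delay(attempt, retry_after=None, base=1.0, cap=30.0, jitter=random.random):
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        return min(float(retry_after), cap)  # the server's hint wins
    ceiling = min(cap, base * 2 ** attempt)
    return ceiling * jitter()  # "full jitter": uniform in [0, ceiling)


def call_with_retry(fn, max_attempts=5, sleep=time.sleep, **backoff):
    """Call fn(), retrying on RateLimited up to max_attempts times in total."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimited as err:
            if attempt == max_attempts - 1:
                raise
            sleep(backoff_delay(attempt, err.retry_after, **backoff))
```

    Backoff smooths over bursts; it does not create capacity. If the retries themselves are eating your latency budget, that is the signal to move to more regions or PTU.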

    Guardrails (formerly content filters). Foundry’s Guardrails system is on by default with sensible consumer settings — and it will block legitimate enterprise content. Customer-complaint emails trip the harm filter. Security logs trip the violence filter. Code review of an exploit-handling library trips multiple. You can tune controls per-model and per-agent under Guardrails in the portal, define custom guardrails with their own controls, and apply them at four intervention points: user input, tool call, tool response, and output (the final completion returned to the user). Audit the defaults the day you deploy your first model, not the day a business user shows you a screenshot of a blocked legitimate prompt.

    Observability. Foundry exposes distributed traces, per-run token accounting, evaluation hooks, and a thread/run viewer in the portal — but only if you wire it up. Wire it up on day one. The cost of adding tracing to a quiet new system is an afternoon. The cost of adding tracing to a live multi-agent system with real users is a sprint and a half, plus the customer trust you spend debugging the bug you can’t see.
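
    A sketch of what day-one wiring can look like, assuming the Azure Monitor OpenTelemetry distro (azure-monitor-opentelemetry) and a connection string supplied in the APPLICATIONINSIGHTS_CONNECTION_STRING environment variable. The function names and the tracer name are hypothetical; failing at startup when the variable is missing is my choice, not a Foundry requirement.

```python
import os


def app_insights_connection_string(environ=os.environ):
    """Fail fast at startup if telemetry has nowhere to go."""
    value = environ.get("APPLICATIONINSIGHTS_CONNECTION_STRING", "").strip()
    if not value:
        raise RuntimeError("APPLICATIONINSIGHTS_CONNECTION_STRING is not set")
    return value


def enable_tracing(environ=os.environ):
    """Route OpenTelemetry traces to Application Insights and return a tracer."""
    # Lazy imports: pip install azure-monitor-opentelemetry
    from azure.monitor.opentelemetry import configure_azure_monitor
    from opentelemetry import trace

    configure_azure_monitor(connection_string=app_insights_connection_string(environ))
    return trace.get_tracer("foundry-smoke-test")  # tracer name is arbitrary
```

    With the tracer in hand, wrapping each model or agent call in tracer.start_as_current_span(...) is enough to get a first end-to-end trace before any real user shows up.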

    6. When NOT to use Foundry

    I’m bullish on Foundry, but it isn’t the answer to every question.

    If you have exactly one OpenAI model in production and a stable PTU reservation on it, defer the upgrade. The in-place upgrade is non-trivial, and you get nothing from it if you aren’t using agents, evaluations, or the broader catalogue. Revisit when one of those becomes a “yes.”

    If you need offline or on-device inference — air-gapped environments, edge devices, sub-10ms latency budgets — you want Foundry Local, not cloud Foundry. Same model story, very different deployment shape, and trying to make cloud Foundry pretend to be local will end badly.

    If you have a price-sensitive, non-enterprise workload with no Entra or Azure compliance requirement — a side project, a hobby tool, a community OSS app — going direct to OpenAI’s or Anthropic’s API is still cheaper and operationally simpler. Foundry’s value is enterprise: SSO, RBAC, private networking, compliance attestations, one invoice. If you don’t need those, you’re paying for them anyway.

    7. Closing — and what’s next

    Foundry rewards a small amount of up-front thinking. Pick the region for the models you actually need. Use Entra and managed identities from line one of code. Go multi-project from the start if you’re going to run more than one environment. Turn on observability before the first user hits the first endpoint. Re-do your private endpoints after any upgrade. Most of the pain I see on Foundry projects comes from skipping one of those.

    Two follow-ups coming next on this blog: Foundry Agent Service migration from the Assistants API (with code from TrafficIQ) and an authentication-patterns deep-dive that goes well past DefaultAzureCredential into workload identity federation, on-behalf-of flows, and the per-environment role assignments I actually deploy. Subscribe if that’s useful — I’ll link them here as they go live.

    Image credits

    Diagrams in this post are reused from Microsoft Learn with attribution to Microsoft.

    All other commentary, code, and opinions in this post are my own and reflect lessons from building TrafficIQ.