Category: AI Agents & RAG

Copilot Cowork: the agent that does the work — and the extensibility model architects should actually study

Most Copilot feature announcements I read the way I read a release note: skim the capability list, note what changed, move on. Microsoft 365 Copilot Cowork was the first one in a while that made me stop and read the developer documentation twice. Not because of what it does — the "it sends your emails and builds your decks" story is everywhere — but because of what sits underneath it.

Chat assistants describe work. Cowork does it. That shift is the headline, and it is real. But the part worth an architect’s attention is the extensibility model: Cowork adopts an open skills standard, runs a multi-model architecture, and packages capability as standard Microsoft 365 app packages. Read that way, Cowork is less a product and more a distribution channel for agent capabilities you may already have built.

This post is a companion to my Microsoft IQ pillar post — Cowork is the most visible consumer of Work IQ to date. Here I want to cover the architecture, the extensibility model in detail, and what is worth doing (and not doing) with it while it is still preview.

What Cowork is — and the preview status everyone keeps getting wrong

Copilot Cowork carries out tasks across your Microsoft 365 environment rather than just answering questions about them. It drafts and sends email through Outlook, schedules meetings and manages your calendar, creates Word, Excel, PowerPoint, and PDF files, posts to Teams channels and chats, searches across your organisation, runs deep research, and can run prompts on a schedule for recurring work. Every step is visible in the conversation as it happens.

The control model is the important design choice. Before any sensitive action — sending, posting, creating — Cowork pauses and asks. Medium- and high-risk actions carry a risk-level indicator. The approval button is labelled for the specific action (Send, Post, Create), and you can pause, resume, or cancel at any point. Microsoft announced Cowork on 9 March 2026 and made it available through the Frontier preview programme; it runs in the browser at m365.cloud.microsoft, in the desktop app for Windows and Mac, and — since the May 2026 update — on iOS and Android through the Microsoft 365 Copilot app.

Here is the correction worth making early, because some third-party coverage has it wrong: Cowork is not generally available. Several write-ups have claimed a GA milestone. The Microsoft Learn documentation says the opposite, on every page, in a banner: this is prerelease documentation, the feature is in Frontier preview, and capabilities may change. Your admin account also has to be Frontier-enrolled (Copilot → Settings → Frontier) or Cowork will not even appear in Admin Center agent management. Treat anything you read about Cowork being "shipped" with that banner in mind.

The architecture underneath

Copilot Cowork architecture: connectors and integrations (Fabric IQ/Power BI, Dynamics 365, third-party MCP) feed the skills system, which feeds capability into Cowork; Work IQ grounds the plan; desktop and iOS/Android client surfaces feed in; every action passes a per-action approval gate before reaching Microsoft 365 surfaces. — Cowork architecture — Work IQ grounds the agent, the skills system feeds capability, and every action passes a per-action approval gate before it reaches a Microsoft 365 surface.

Cowork is built on Work IQ. The documentation is explicit that it "browses your entire Work IQ" to pull in the content it needs — emails, meetings, messages, files, and data across Outlook, Teams, Excel, SharePoint, and the rest of Microsoft 365. That grounding layer is what lets the agent act with context rather than starting cold from a prompt. If you want the deeper treatment of what that intelligence layer is and what it changes, that is the subject of the IQ pillar post.

The second architectural fact is that Cowork is multi-model. It uses Microsoft’s own models alongside Anthropic’s Claude — the model selector currently exposes Claude Opus 4.7 as an option, and Microsoft documents that it uses Anthropic models as a subprocessor. One consequence that belongs in your governance notes: access to the Anthropic models is limited to Anthropic-supported regions, and Cowork is not exempt from that restriction. If your tenant spans regions, verify coverage before you assume availability.

The third fact — the one that carries the rest of this post — is that the unit of capability is the skill. Cowork ships with built-in skills (Word, Excel, PowerPoint, PDF, Email, Scheduling, Calendar Management, Meetings, Daily Briefing, Enterprise Search, Communications, Deep Research, and Adaptive Cards), and it loads them dynamically during a conversation, showing you which are active in a side panel. Everything you can add to Cowork is expressed as a skill or a connector. So the extensibility model is the architecture that matters.

The extensibility model

There are two tiers, and they map cleanly to two audiences.

Tier one: OneDrive custom skills (no-code)

Any user can create up to 50 custom skills by dropping a SKILL.md file into a subfolder of their OneDrive at /Documents/Cowork/skills/<skill-name>/SKILL.md — for example /Documents/Cowork/skills/weekly-report/SKILL.md. Cowork discovers them automatically at the start of each conversation. A skill is a YAML frontmatter block (a name and a description) followed by a Markdown body of instructions: structure, tone, the steps you want followed, the output format you expect. No deployment, no packaging, no admin involvement. This is the path to test the model on a real recurring task this week.

Tier two: plugin packages (skills + MCP connectors)

The developer path uses the same distribution mechanism as Teams apps, Copilot agents, and Office add-ins: the Microsoft 365 app package. A Cowork plugin is a .zip containing a manifest.json (unified manifest v1.28), the two app icons, and a skills/ folder. It can carry two extension types. Skills are the prompt-based workflows already described. Connectors are remote MCP servers that give Cowork access to external data and APIs — Streamable HTTP over HTTPS (TLS 1.2+), JSON-RPC 2.0 message format, and support for tools/list and tools/call. The package limits are firm: a maximum of 20 skills and 10 connectors per package.

Packages are distributed through the Microsoft 365 App Store (submitted via Partner Center) or deployed by an admin. Connector credentials never live in the manifest or the skill files — they reference the Microsoft Enterprise Token Store, using OAuthPluginVault for OAuth 2.0 APIs or ApiKeyPluginVault for API-key services, with the secret held in the vault and only a reference ID in the package.

The Agent Skills open standard — and the Claude conversion path

This is the detail almost no one is covering properly. Cowork’s skills are not a proprietary Microsoft format. They use the Agent Skills open standard — the same SKILL.md format supported by Claude Code, Claude.ai projects, Visual Studio Code and GitHub Copilot, Gemini CLI, Cursor, JetBrains Junie, OpenAI Codex, and, per Microsoft’s own count, 30+ other AI tools. The skill text you write for one is the skill text you can use across all of them.

Microsoft leans into this with a PowerShell conversion script, Convert-ClaudePluginToMOS3.ps1, that turns an existing Claude Code plugin into a valid M365 package — Microsoft quotes roughly five minutes. It reads the plugin’s .claude-plugin/plugin.json, its .mcp.json, and its skills/ directory and emits a .zip with a generated manifest. The mapping is clean, but it is not complete. What converts and what does not is worth keeping in front of you:

Conversion path: a Claude Code plugin runs through the conversion script into an M365 app package, then App Store or admin deployment into a Cowork session. Slash commands, sub-agents, and hooks are not converted. — The Claude plugin to M365 package conversion path. Skills copy verbatim and MCP servers map to connectors; slash commands, sub-agents, and hooks do not convert.

Claude plugin artifact	M365 equivalent	Status
`plugin.json`	`manifest.json`	Name, description, author mapped; GUID auto-generated (deterministic UUID v5)
`skills/*/SKILL.md`	`agentSkills[]` + `skills/` folder	Copied verbatim — identical format
`.mcp.json` servers	`agentConnectors[]`	URL and auth type autodetected
`commands/` (slash commands)	—	Not yet supported
`agents/` (sub-agents)	—	Not yet supported
`hooks/` (event handlers)	—	Not yet supported

Skills copy across verbatim and MCP servers map to agentConnectors. Slash commands, sub-agents, and hooks do not convert. Hold that thought — it matters for the "portability" claim later.

How a skill is loaded — the three-layer context model

The skill format is designed around a context budget, and understanding it is the difference between a skill that triggers reliably and one that quietly never fires. The system loads a skill in three layers:

Layer	When loaded	Target size
Frontmatter (name + description)	Always — at startup	~100 tokens
`SKILL.md` body	When the skill triggers	Under 5,000 tokens (1,500–2,000 words)
`references/`	On demand, by the agent	Unlimited
`scripts/`	Executed, not loaded into context	N/A

The frontmatter is always resident, so the description is doing real work on every conversation. The body loads only when the skill triggers. References are pulled in when the agent decides it needs them, and scripts are run rather than read into the window at all. Each skill can carry up to 20 companion files, 5 MB each, 10 MB total. Keep the body lean and push depth into references/.

One gotcha causes more failures than any other, and Microsoft says so directly: the folder name must match the name field in the frontmatter, exactly. skills/contract-analysis/SKILL.md with name: contract-analysis works. The same folder with name: ContractAnalysis does not. Kebab-case only — lowercase alphanumerics and single hyphens, no underscores, no leading or trailing or consecutive hyphens. If a skill never activates and you have ruled out the description, check this first.

The connector surface is widening fast

The May 2026 update makes the distribution-channel framing more concrete. Microsoft is shipping native integrations into Cowork — Fabric IQ with Power BI for data, and Dynamics 365 across sales, customer service, and ERP for scenarios like pipeline reviews, case resolution, and order approvals — alongside connectors to third-party systems including LSEG, Miro, monday.com, and S&P Global Energy. Cowork has also moved onto iOS and Android, so a delegated task can run in the cloud while you are away from your desk.

Two things follow for architects. The reach is broadening quickly, which strengthens the case for treating Cowork as a delivery surface rather than a destination. But the pace also reinforces the preview caveat: Microsoft describes itself as “still early and moving fast,” with capabilities rolling out continuously. A connector that exists this month is not a contract for next month — verify availability against the live docs before you design around any specific integration.

Governance and control

The governance story is reassuringly standard, because Cowork reuses the Microsoft 365 controls you already operate. Cowork inherits the signed-in user’s Entra identity and permissions — it can only reach files and mail the user can already reach. Admins get tenant-level allow and block lists, can deploy plugins on behalf of users, and can apply compliance policies. When a plugin is revoked, its skills and connectors are removed from the user’s session on the next sync; active conversations are not interrupted, but new ones lose the capability.

Two things to note on the current preview surface. Purview sensitivity labels are surfaced in responses and citations, showing the highest-priority label across the data used — useful, but a display behaviour rather than a full enforcement story yet. And the May 2026 update brought Agent 365 integration, which is how Microsoft intends Cowork to come under enterprise observability, security, and governance through a single control plane. That integration is the direction of travel; it is not a substitute for verifying what is actually auditable in your tenant today.

Where this falls short today

It is preview, and the documentation is explicit that capabilities may change. That is not a disclaimer to skim past. Do not build production process dependencies on a Frontier preview feature whose behaviour Microsoft reserves the right to alter. The correction on the false GA claims matters precisely because someone, somewhere, is about to wire a business process to this on the assumption that it has shipped.

Per-action approval is the right default and a real friction at the same time. Human-in-the-loop on every send and post is exactly what enterprise governance requires. It also caps the autonomy story. A "delegate and walk away" workflow that pings you eight times for approval is supervised execution, not delegation — and that is fine, as long as you are honest about which one you are buying. There is a "don’t ask again" option, scoped to the current conversation, and an "Approve All" for batching pending approvals; both move risk from the system to the user, and that trade should be a conscious choice, not a reflex click.

Skill triggering is probabilistic, not configured. The description field is how the agent decides whether to activate a skill, which makes activation reliability a prompt-engineering problem rather than a guarantee. The docs push explicit trigger phrases ("use when the user asks to…") for exactly this reason. Anyone who has run function-calling agents in production will recognise the failure mode immediately; in my own work, intermittent tool selection was consistently the hardest class of bug to reproduce, precisely because it was non-deterministic. Plan for it: write specific descriptions, name your connector tools explicitly in the skill body, and measure activation rather than assuming it.

The open standard cuts both ways. Adopting Agent Skills is genuinely good — portable skill text, no proprietary lock-in on the part that holds your domain logic. But the conversion is lossy today: slash commands, sub-agents, and hooks do not come across. And the manifest, the store review process, and the Enterprise Token Store are all Microsoft-specific. Portability of the skill text is not portability of the whole solution; the wrapper stays platform-bound even when the contents travel.

Finally, cost clarity is thin. Cowork requires a Microsoft 365 Copilot licence and Frontier enrolment, and the consumption implications of agentic execution at scale — an agent that plans, calls tools, and acts across many steps — are not yet documented. This is the question to take to your account team before any pilot grows into something people depend on.

What I’d do this month

Concrete and bounded. Enrol a sandbox tenant in the Frontier programme so you are evaluating on real infrastructure rather than reading about it. Write one OneDrive custom skill for a genuine recurring task — a weekly status roll-up, a standard document format — and then measure how reliably it triggers across a dozen real prompts, not how well it works once when you phrase the prompt perfectly. That single measurement tells you more about production-readiness than any feature list.

If you already maintain Claude Code skills, run the conversion script against one and inventory exactly what is lost — the slash commands and hooks you relied on will not survive, and it is better to know that now. And hold production dependencies until GA. Build familiarity, build a couple of skills, build an opinion. Do not build a process that breaks the next time Microsoft changes a preview behaviour.

Where Cowork sits in the bigger picture

Cowork is the most visible thing Microsoft has built on top of Work IQ, and that is the right way to read it. The execution capabilities will get the attention, but the durable architectural story is the combination underneath: an intelligence layer that grounds the agent, a multi-model engine, and an extensibility model that adopts an open standard and rides the existing Microsoft 365 distribution rails. For an architect, the question is not "what can Cowork do" — that list will keep changing through preview. It is "what is the unit of capability, and how does it travel." The answer, for once, is a portable, open-standard skill. That is worth studying now, even while the product around it is still moving.

References

June 12, 2026

Microsoft IQ: the intelligence layer your agents inherit — and what it actually changes for enterprise AI builders
For a few years now, every enterprise agent I built started the same way: from scratch. A new connector here, a handcrafted retrieval pipeline there, a fresh attempt to teach the agent what the business already knew. I built agent grounding the hard way more than once — custom RAG, bespoke chunking, my own identity plumbing — and watched two agents in the same tenant give two different answers to the same question because they had been grounded against two different copies of reality.

That is the production problem nobody puts on a slide. Context gets rebuilt per agent. Connectors sprawl. Answers drift. And the cost of all that plumbing lands on the same small group of engineers every time a new use case appears.

Microsoft IQ is Microsoft’s answer to that problem, and it reached general availability at Build 2026. This is the pillar post for the wider Microsoft IQ cluster on this blog: what each layer does, how they compose, and — just as important — where the GA reality differs from the keynote framing.

What Microsoft IQ actually is

Microsoft IQ is a shared, permission-aware intelligence layer that agents inherit, rather than rebuild. The pitch is simple: stop stitching connectors and pipelines into every agent, and ground them all against one governed view of how people work, how the business operates, and how to reuse knowledge.

It is worth being precise about what it is not. Microsoft IQ is not a model, and it is not a chatbot feature. It sits underneath the agents you build in Copilot Studio, Microsoft Foundry, or GitHub Copilot, and feeds them context. It is composed of four layers — Work IQ, Fabric IQ, Foundry IQ, and Web IQ — and any agent across those three build surfaces can consume them.

The “GA” label, though, is an umbrella, not a uniform guarantee. Each of the four layers shipped at a different stage of maturity, and anyone scheduling a deployment this quarter — picking what to commit to in the next planning cycle — needs that breakdown up front, not buried in the small print. So I will give each layer its own section, with the state it actually shipped in.

Work IQ — the workplace context layer

Work IQ is the contextual intelligence layer for Microsoft 365. It captures the signals that describe how people actually work — emails, meetings, documents, Teams messages, people relationships, and collaboration patterns — and exposes them so an agent can reason over them in natural language.

The capability ships as a CLI and a Model Context Protocol server today, which is how AI assistants such as GitHub Copilot reach into a user’s M365 context. The broader public Work IQ APIs — REST, A2A, and MCP — are slated to reach GA on 16 June 2026.

The one thing an architect should know: as of writing, Microsoft Learn still labels Work IQ “public preview”, and accessing organisation data requires admin-consented permissions and tenant billing activation. Treat the GA date as imminent rather than banked, and plan the admin-consent step into your rollout — it is not a developer-self-serve switch.

Fabric IQ — the business-data semantic layer

Fabric IQ is the semantic layer over your business data. It elevates raw analytical, real-time, and operational data in OneLake into the language of the business — entities, relationships, rules, and actions — so that agents reason in terms of Customer, Shipment, or Breach rather than table columns.

It delivers this through two core items: semantic models and an ontology. And here is the GA nuance Microsoft’s umbrella headline glosses over — the Fabric IQ ontology is in preview. Learn marks it “ontology (preview)” consistently. The semantic-model side is more mature, and ontologies can be generated directly from Power BI semantic models already running in production, which is the realistic on-ramp for most estates that have years of Power BI behind them.

The one thing an architect should know: you can bootstrap a Fabric IQ ontology from an existing Power BI semantic model, keeping business terminology consistent across reports, agents, and apps — but treat the ontology itself as preview-grade until Microsoft says otherwise.

Foundry IQ — the managed knowledge layer

Foundry IQ is the layer that replaces the most plumbing, so it earns the most depth here. It turns fragmented enterprise content into governed, reusable knowledge that multiple agents can share.

The model has three concepts worth learning precisely. A knowledge base is the top-level resource that orchestrates retrieval and carries a retrieval reasoning-effort setting of minimal, low, or medium. It is composed of knowledge sources — connections to indexed or remote content such as Azure Blob Storage, SharePoint, OneLake, the web, MCP, Azure SQL, and File Search. And it is queried through agentic retrieval: an LLM plans the query, decomposes a complex question into parallel subqueries across sources, semantically reranks the results, and returns extractive answers with citations the agent can trace.

Azure AI Search provides the underlying infrastructure. Crucially, a knowledge base is shareable: one knowledge base can ground many agents, and those agents can run in Foundry Agent Service, the Microsoft Agent Framework, or any custom app via the Azure AI Search knowledge base APIs. At Build 2026, Microsoft positioned Foundry IQ knowledge bases as the unifying point — bringing Work IQ, Fabric IQ, File Search, Azure SQL, and MCP behind a single, SLA-backed retrieval endpoint.

Now the candour. Foundry IQ’s GA is uneven, and Microsoft says so on the concept page itself: some features are generally available while others remain in preview, and which is which depends on the Search Service REST API version you call. The same page notes that the Foundry portal and Azure portal still expose all agentic retrieval features as preview-only. So “Foundry IQ is GA” is true and incomplete at the same time — the answer depends on how you call it.

The one thing an architect should know: before you plan a production rollout, pin down which Search Service REST API version your code targets, because that single choice determines whether you are on a GA or a preview surface.

Web IQ — the live web-grounding layer

Web IQ is the newest layer, announced at Build 2026. It is web grounding rebuilt for LLMs and multi-step agents: a suite of AI-native APIs returning ranked, citation-ready context across web pages, news, images, and video, built on two decades of Bing infrastructure rather than SERP scraping.

The engineering numbers are the headline. Microsoft claims roughly 164ms P95 latency — close to 2.5x faster than the best alternative — with fewer tokens per query. It is model-agnostic and MCP-native over JSON-RPC 2.0, so there is no inference lock-in, and it is benchmarked against suites including DeepSearchQA.

The one thing an architect should know: Web IQ is limited access and waitlist-only today, prioritised for enterprise customers working with Microsoft account teams. If live web grounding matters to your roadmap, the action this month is to join the waitlist, not to design around guaranteed availability.

How the layers compose

The architecture is cleaner than the four-product naming suggests. The four IQ layers feed a shared intelligence layer that any agent — GitHub Copilot, a Foundry agent, or a Copilot Studio agent — consumes. Foundry IQ knowledge bases act as the retrieval hub that the others can flow through. Cutting across all of it is governance: queries run under the caller’s Microsoft Entra identity, ACLs synchronise for supported sources, and Microsoft Purview sensitivity labels are enforced end to end.

Microsoft IQ architecture. Work IQ, Fabric IQ, Foundry IQ, and Web IQ feed a shared intelligence layer that agents inherit, with Entra identity and Purview labels as a cross-cutting governance band. (Diagram placeholder — Mermaid source in the image plan.)

The retrieval path inside Foundry IQ is the part most worth internalising, because it is where the “no complex RAG” claim meets reality: a query is planned by an LLM, decomposed into parallel subqueries across sources, semantically reranked, and returned as a cited answer.

Foundry IQ agentic retrieval. Query, LLM query planning, parallel subqueries across knowledge sources, semantic rerank, cited answer. (Diagram placeholder — Mermaid source in the image plan.)

Where this falls short today

“No complex RAG” is a marketing claim, not an engineering fact. Foundry IQ genuinely removes connector and pipeline plumbing, and that is real value. It does not remove the need to understand chunking behaviour, to evaluate retrieval quality, or to model cost. You are not maintaining the pipeline; you are still responsible for whether it returns the right thing at a price you can defend.

The GA label is uneven. Microsoft IQ is GA as an umbrella, but Fabric IQ’s ontology is preview, Web IQ is waitlist-only, and Foundry IQ’s GA depends on which Search REST API version you call. If you are signing off a production plan, build that table honestly: per layer, what is GA, what is preview, what is gated.

Microsoft has promised programmable enterprise context before. Microsoft Graph was supposed to be this a decade ago. Semantic Kernel memory abstractions and earlier Copilot extensibility models each hit walls — latency, schema drift, identity resolution. What is different this time is a retrieval-planning layer doing the grounding and GA APIs rather than perpetual preview. That difference is meaningful. It is also unproven at production scale, and I would hold both thoughts at once.

Lock-in is the trade. An intelligence layer this deep couples your agent estate to the Microsoft stack. For organisations already on M365, Fabric, and Dynamics, that is leverage of investment you have already made. For hybrid estates, it is a strategic decision to make deliberately, not a default to drift into.

Pricing and billing clarity is thin at GA. Work IQ requires tenant billing activation, and the consumption models across the layers are not yet fully documented. Before you commit a budget, get your Microsoft account team to put the per-layer billing model in writing — that is the gap most likely to surprise you later.

What I would do this month

Concrete, role-aware next steps if you own enterprise agent architecture:
- Pin the Search Service REST API version your Foundry IQ code targets, and confirm whether that puts you on a GA or a preview surface before anything reaches production.
- Plan for the Work IQ API GA on 16 June 2026 — including the admin-consent and tenant-billing steps — rather than assuming developer self-serve.
- Bootstrap a Fabric IQ ontology from an existing Power BI semantic model in a sandbox, so you learn the preview behaviour without betting a production workload on it.
- Join the Web IQ waitlist now if live web grounding is on your roadmap, because availability is gated and lead time is unknown.
- Ask your account team for the per-layer consumption and billing model in writing before you size a budget.
Where this cluster goes next

This pillar is deliberately broad. Each layer deserves its own engineering deep-dive, and those are coming — starting with Foundry IQ versus a do-it-yourself RAG pipeline, the comparison I get asked about most. Fabric IQ ontologies, the Work IQ API surface, and Web IQ’s grounding economics will each get their own treatment. When those land, this post will link out to them from the sections above.

The short version: Microsoft IQ is the first time the “shared enterprise context” promise has arrived with a retrieval-planning layer and real GA APIs behind it. That is worth taking seriously. It is also worth reading the GA label one layer at a time.

References
Image credits

The layer diagrams in this post are reused from the Microsoft IQ product page with attribution to Microsoft:
- Banner and the four layer illustrations (Work IQ, Fabric IQ, Foundry IQ, Web IQ): Microsoft IQ
The two architecture diagrams are my own. All other commentary, code, and opinions in this post are my own and reflect lessons from building enterprise agent grounding the hard way.
June 10, 2026
Starting an Azure Foundry project — the getting-started guide nobody wrote
Banner image: Microsoft Foundry. Source: Microsoft Tech Community — Introducing Microsoft Foundry.

Most “getting started with Foundry” content is a screenshot tour of the portal. You watch someone click “Create resource,” pick a region from a dropdown, and end the post with a chat playground saying “Hello, world.” None of that helps you on Monday morning when you have to commit to a region, an auth pattern, and a project topology that you’ll be living with for the next year.

This is the post I wish I’d had open in another tab when I started TrafficIQ, our multi-agent supply-chain transport intelligence build on Foundry Agent Service. Five decisions you make before you click Create, the auth pattern you should adopt from day one, a first-sprint checklist, and the three things that will bite you.

1. The naming maze — what Foundry actually is in 2026

Eighteen months ago you had four products: Azure OpenAI, Azure AI Studio, Azure AI Services, and a sprawling Cognitive Services back catalogue. Today you have one Azure resource type — kind: AIServices with allowProjectManagement: true — and Microsoft calls it Microsoft Foundry (formerly Azure AI Foundry). Single resource, single ARM object, and three FQDNs hanging off it: the Azure OpenAI-compatible inference endpoint, the cognitive-services endpoint, and the Foundry project endpoint your agents and Responses API code talks to.

There are also two portals. Foundry (classic) is the hub-based experience that grew out of Azure AI Studio. Foundry (new) is the project-first experience built around the consolidated resource. Both still work. Classic is in maintenance mode. If you are starting a new project in 2026, start in the new portal and create a Foundry project — not a hub project. Hub projects still exist for backwards compatibility, but everything Microsoft is investing in — agent service, evaluations, the new model catalogue, observability — is wired up around Foundry projects first.

One more piece of context before you create anything: the Assistants API retirement deadline of 26 August 2026 is real. If you are building anything new today, do not start on Assistants — go directly to Foundry Agent Service and the Responses API. I’ll cover the migration path in a dedicated post; for now, treat Assistants as legacy.

Microsoft Foundry resource and project architecture. Source: Microsoft Learn — Microsoft Foundry architecture.

2. The five decisions you make before you click Create

2.1. Foundry resource vs upgrading an existing Azure OpenAI resource

Decision: create a brand-new Foundry resource, or upgrade an existing Azure OpenAI resource in place. Trade-off: the in-place upgrade keeps your existing endpoint, deployments, network config, and RBAC bindings — but it requires a system-assigned managed identity on the source resource and is one-way once you commit (rollback exists but is a support operation, not a button).

For TrafficIQ: new resource. The repo was greenfield, I wanted a clean project boundary, and I didn’t want to inherit eighteen months of ad-hoc role assignments from the old Azure OpenAI resource.

2.2. Region

Decision: which Azure region hosts the resource. Trade-off: model availability is not uniform. Sweden Central, East US 2, and France Central each have meaningfully different model catalogues, and frontier models often land in one region weeks before the others. Pick the wrong region and you’ll either rewrite code against a different deployment or pay cross-region latency. For TrafficIQ: Sweden Central. TrafficIQ shipped on gpt-4.1 and gpt-4.1-mini, and Sweden Central was the region that aligned with both the model availability I needed and my EU data-residency obligations. Starting fresh today, I’d still default to Sweden Central but I’d evaluate gpt-5-mini for the router/orchestrator.

2.3. New portal vs classic portal

Decision: which portal you do your work in. Trade-off: classic gives you hub projects (good if you have an existing hub and shared compute), new gives you Foundry projects (better isolation, simpler RBAC, where all the new features land first).

For TrafficIQ: new portal, Foundry project. No hub.

2.4. Single project vs multiple projects per resource

Decision: how many projects to carve out of one Foundry resource. Trade-off: projects are the isolation and RBAC boundary in Foundry — a project owns its agents, threads, evaluations, connections, and the people who can see them. One project is simpler; multiple projects are how you separate prod from dev, or two workloads that should never see each other’s data.

For TrafficIQ: I started with a single project and split as soon as evaluations grew enough to need their own connections and quotas. The pattern I’d recommend day one: two projects per environment — one for the agent runtime, one for evaluations and offline experiments — and prod in a separate Foundry resource entirely from non-prod, so a misconfigured RBAC binding can never reach production data.

2.5. Direct Foundry-billed models vs Azure Marketplace third-party models

Decision: how you procure non-OpenAI models — Anthropic, Cohere, Mistral, Meta, and the rest. Trade-off: direct (first-party in the Foundry catalogue, billed on your Azure invoice, full enterprise SLA, no separate contract) versus Azure Marketplace (third-party publisher, often the only way to get the very latest version of a partner model, but it’s a separate offer you have to accept and the billing line lands differently).

For TrafficIQ: direct for everything I could, marketplace only where a specific model version wasn’t available first-party. One Azure invoice is worth real money in procurement time.

3. Authentication and authorisation — the day-one setup

If you take one thing from this post, take this: don’t use API keys. Foundry resources support Entra ID (Azure AD) authentication everywhere, and DefaultAzureCredential from azure-identity is the right pattern from day one. Keys feel quick on day one and become a rotation, secrets-sprawl, and audit nightmare by month three.

The pattern I use in TrafficIQ, lifted down to its essentials:
from azure.identity import DefaultAzureCredential from azure.ai.projects import AIProjectClient # DefaultAzureCredential walks an ordered chain: # env vars -> managed identity -> Azure CLI -> VS Code -> interactive # Same line of code works locally, in CI, and in production. credential = DefaultAzureCredential() project = AIProjectClient( endpoint="https://<your-foundry-resource>.services.ai.azure.com/api/projects/<project-name>", credential=credential, ) # Now you can use Agents, Responses, evaluations, connections — # all authenticated as the principal the host environment provides. agents = project.agents
There are three roles you’ll actually find yourself assigning in the first week. Microsoft renamed these in the last release wave; both old and new names still appear across the portal and docs during the rollout, but the new names are what you should write into runbooks.
- Foundry User (formerly Azure AI User) — read/use existing agents, run inference, call the Responses API. This is the role for your application’s managed identity in production, and for engineers who consume but don’t author. Role ID: 53ca6127-db72-4b80-b1b0-d745d6d5456d.
- Foundry Project Manager (formerly Azure AI Project Manager) — create and modify agents, manage connections, deploy models into the project. The role for developers actually building. Role ID: eadc314b-1a2d-4efa-be10-5d325db5065e.
- Foundry Account Owner (formerly Azure AI Account Owner) — resource-level operations like creating new Foundry resources and configuring guardrails. The elevated tier. Don’t grant casually.
Two practical notes. In Azure CLI and Bicep, use the role definition GUIDs, not the names — names are still mid-rename and the GUIDs are stable. And don’t grant any role that starts with “Cognitive Services” for Foundry work. The Microsoft Learn RBAC doc explicitly calls these out as not applicable to Foundry, even though Foundry sits on the Microsoft.CognitiveServices provider under the hood.

Foundry User role (formerly Azure AI User), scoped at the Foundry resource. Source: Microsoft Learn — RBAC for Microsoft Foundry.

In production, the application principal is a managed identity — a user-assigned managed identity attached to your App Service, Container App, AKS workload identity, or Function. App registrations with client secrets are for local development and headless CI/CD only. If you find yourself putting an app registration secret on a production workload, you’ve taken a wrong turn — go back and attach a managed identity instead.

Secrets that genuinely have to exist — third-party API keys, database connection strings, anything that isn’t a Foundry credential — live in Azure Key Vault and are injected at build time, not runtime where possible. TrafficIQ uses a Vite Key Vault plugin pattern for the frontend so that the bundle never contains a literal secret and the build agent’s managed identity is the only thing that ever touches the vault.

One last thing the docs bury and I wish someone had said louder: private endpoints are the most-forgotten production step, and you have to recreate them after an in-place upgrade from Azure OpenAI to Foundry. The upgrade preserves most of your network configuration, but private endpoints targeting the new Foundry sub-resources need to be re-provisioned, and DNS will be wrong until you do. Put it on the upgrade runbook.

Network isolation plan for Microsoft Foundry. Source: Microsoft Learn — Configure network isolation for Microsoft Foundry.

4. The first sprint — a working checklist

In order. One line on what to do, one line on the trap.
1. Create the Foundry resource. Use kind: AIServices, allowProjectManagement: true, system-assigned managed identity on. Trap: if you let someone create it as a vanilla Azure OpenAI resource “for now,” you’ll be doing an upgrade migration in week three.
2. Create the first Foundry project. Give it a name that survives renames — <workload-<env works. Trap: project name is in the endpoint URL, so renaming later means client config changes everywhere.
3. Assign roles, not keys. Azure AI Project Manager for builders, Azure AI User for the app’s managed identity. Trap: don’t grant subscription-level Contributor “just to unblock the demo” — it never gets revoked.
4. Set up Key Vault and managed identity. One vault per environment, user-assigned managed identity attached to your compute. Trap: system-assigned MIs disappear when you delete the compute resource; use user-assigned for anything you care about.
5. Deploy a model. A reasonable default in 2026: gpt-5-mini for router/orchestrator agents and gpt-4.1 for specialists with heavier tool-calling. Trap: model availability is regional — check the catalogue in your target region before you write code against a specific deployment name.
6. Wire a connection for any external data source. Foundry “connections” are the project-scoped credential store for storage accounts, search indexes, and tools. Trap: connections live inside the project — copy them when you split prod from dev, don’t share.
7. Call the Responses API from a smoke-test script. AIProjectClient → get inference client → responses.create. Trap: if you copy a sample using the legacy chat-completions endpoint, you’ll miss the new tool-calling and reasoning surface entirely.
8. Stand up your first agent in Foundry Agent Service. Tools, instructions, model — keep it boring. Trap: don’t start with a mega-agent; start with one narrow agent and add a second before you make the first one cleverer.
9. Turn on Guardrails and review the defaults. They are on by default at “medium” across categories. Trap: defaults block legitimate enterprise content — see Section 5.
10. Wire up observability before you ship. Application Insights connection on the project, distributed tracing through opentelemetry, Foundry’s built-in run/thread tracing on. Trap: adding observability after the fact is two orders of magnitude harder than turning it on now.
5. The three things that will bite you in the first sprint

Quota. Tokens-per-minute (TPM) and requests-per-minute (RPM) limits are per-deployment and per-region, and the default quota you get on a fresh subscription is sized for demos, not production. The day you flip a real workload on, you will hit 429s. Mitigations: request quota increases early (the form is slow), spread deployments across multiple regions if your latency budget allows, and put Provisioned Throughput Units (PTU) under anything customer-facing where you cannot tolerate rate-limit jitter.

Guardrails (formerly content filters). Foundry’s Guardrails system is on by default with sensible consumer settings — and it will block legitimate enterprise content. Customer-complaint emails trip the harm filter. Security logs trip the violence filter. Code review of an exploit-handling library trips multiple. You can tune controls per-model and per-agent under Guardrails in the portal, define custom guardrails with their own controls, and apply them at four intervention points: user input, tool call, tool response, and output (the final completion returned to the user). Audit the defaults the day you deploy your first model, not the day a business user shows you a screenshot of a blocked legitimate prompt.

Observability. Foundry exposes distributed traces, per-run token accounting, evaluation hooks, and a thread/run viewer in the portal — but only if you wire it up. Wire it up on day one. The cost of adding tracing to a quiet new system is an afternoon. The cost of adding tracing to a live multi-agent system with real users is a sprint and a half, plus the customer trust you spend debugging the bug you can’t see.

6. When NOT to use Foundry

I’m bullish on Foundry, but it isn’t the answer to every question.

If you have exactly one OpenAI model in production and a stable PTU reservation on it, defer the upgrade. The in-place upgrade is non-trivial, and you get nothing from it if you aren’t using agents, evaluations, or the broader catalogue. Revisit when one of those becomes a “yes.”

If you need offline or on-device inference — air-gapped environments, edge devices, sub-10ms latency budgets — you want Foundry Local, not cloud Foundry. Same model story, very different deployment shape, and trying to make cloud Foundry pretend to be local will end badly.

If you have a price-sensitive, non-enterprise workload with no Entra or Azure compliance requirement — a side project, a hobby tool, a community OSS app — going direct to OpenAI’s or Anthropic’s API is still cheaper and operationally simpler. Foundry’s value is enterprise: SSO, RBAC, private networking, compliance attestations, one invoice. If you don’t need those, you’re paying for them anyway.

7. Closing — and what’s next

Foundry rewards a small amount of up-front thinking. Pick the region for the models you actually need. Use Entra and managed identities from line one of code. Multi-project from the start if you’re going to run more than one environment. Turn on observability before the first user hits the first endpoint. Re-do your private endpoints after any upgrade. Most of the pain I see on Foundry projects is pain that comes from skipping one of those.

Two follow-ups coming next on this blog: Foundry Agent Service migration from the Assistants API (with code from TrafficIQ) and an authentication-patterns deep-dive that goes well past DefaultAzureCredential into workload identity federation, on-behalf-of flows, and the per-environment role assignments I actually deploy. Subscribe if that’s useful — I’ll link them here as they go live.

Image credits

Diagrams in this post are reused from Microsoft Learn with attribution to Microsoft:
- Section 1 — Foundry resource architecture: Microsoft Foundry architecture
- Section 3 — Azure AI User role scope: RBAC for Microsoft Foundry
- Section 3 — Network isolation plan: Configure private link for Foundry
- Section 4 — Agent components: What is Microsoft Foundry Agent Service?
All other commentary, code, and opinions in this post are my own and reflect lessons from building TrafficIQ.
May 20, 2026
Why I built 6 agents instead of 1 mega-agent — lessons from TrafficIQ
I had two design choices for TrafficIQ: one super-agent holding 56 tools, or six specialist agents sharing them. I picked six. Here is what the one-agent path gets right, where it breaks, and the six lessons I took into production.

TrafficIQ went on to win Best Use of Microsoft Foundry at the AI Dev Days Hackathon — chosen from 401 projects and 2,041 registrants. The architecture choices below are what made that possible, and what I would actually defend in front of an enterprise architecture review board.

Why one-agent is genuinely tempting

The one-agent design is the simpler mental model. One assistant. One system prompt. One thread. One place to debug.

When you are sketching the first prototype, this is almost always the right move. Orchestration is not free — you have to write a router, define handoff contracts, manage cross-agent state. Skipping all of that gets you to a working demo in an afternoon. Most enterprise teams default here, and for a 10-tool assistant, they are right to.

The trouble starts later. It starts when the surface area grows past what a single model can hold in its head.

Where one-agent breaks

In my experience tool-selection accuracy degrades non-linearly past around 15 to 20 tools. The model does not fail loudly. It fails subtly. It picks get_shipment_status when the user clearly needed check_shipment_status, because the names overlap and the descriptions rhyme. It calls track_shipment when the right answer was get_proof_of_delivery.

The system prompt becomes the second symptom. To compensate for the confusion, you add disambiguation rules. “Use tool X only when the user mentions Y.” The prompt grows. By the time you have 40 tools, you are nursing a 4,000-token monolith that nobody on the team wants to touch.

And then there is context-window pressure. Every tool’s JSON schema, every parameter description, every example — it all lives in the agent’s context on every turn. With 56 tools, that alone is enough to crowd out the actual conversation.

A super-agent does not just get slower. It gets less correct. The failure mode is “looks plausible, called the wrong tool.”

The architecture I chose

Six specialist agents, each with a tight tool set scoped to its domain. One orchestrator on top. One router inside the orchestrator. GPT-4.1 under each agent. The whole orchestration layer is built on the Microsoft Foundry SDK — the MultiAgentOrchestrator, the specialists, and the RouterAgent are all SDK-native, using the Foundry Assistants pattern (agent, thread, message, run) end to end.

TrafficIQ multi-agent architecture — 6 specialist agents and the orchestrator.

The split is the part most people skip past, so it is worth being concrete:
- Traffic Agent — 17 tools. Routing, journeys, incidents, reroutes, weather, POI, isochrone, snap-to-road.
- Supply Chain Agent — 11 tools. Shipments, deliveries, inventory, ETAs, KPIs, proof of delivery. Backed by D365 F&O via the MCP Server.
- Fleet Agent — 7 tools. Vehicle positions, driver performance, health, maintenance.
- Operations Agent — 7 tools. Work orders, technician availability, schedule optimisation, returns.
- Field Service Agent — 7 tools. Service requests, customer assets, SLAs, dispatch, parts.
- IoT & Logistics Agent — 7 tools. Device health, geofences, driving behaviour, connectivity, batch route alternatives.
Plus 2 shared tools (navigate_to_page, show_input_form) that every agent can call. That is 56 tools total, none of which any single agent actually has to reason over.

Coordination sits in a MultiAgentOrchestrator. It runs a three-tier router: sticky → keyword → LLM classifier (the RouterAgent). Each specialist holds its own Foundry thread so its context stays clean. The orchestrator handles handoff when the user pivots from one domain to another.

Broader TrafficIQ architecture — agents, MCP, Azure services, Dataverse.

The rest of this post is the six lessons that fell out of building it.

Lesson 1 — route in tiers, not in one LLM call

The naive multi-agent router is “ask GPT which agent should handle this.” It works. It is also slow and expensive on every single turn, including the easy ones.

I run three tiers in order. First, sticky: if the user is mid-thread with the Supply Chain Agent and the next message is “and the one after that?”, stay put. Conversations are usually continuous. The default should be continuity, not re-evaluation.

Second, keyword. Each agent registers a small set of high-signal terms — “shipment”, “warehouse”, “geofence”, “technician”. A keyword match is effectively free. For roughly the queries you would expect — the obvious ones — this resolves the routing decision in microseconds with no token spend.

Only when both tiers miss do I fall back to the LLM classifier. That is the RouterAgent, and it is the only model call dedicated to routing. The result is a router that is fast on the common path, accurate on the ambiguous one, and cheap in aggregate. Putting the cheap checks first is the entire trick.

Lesson 2 — each agent owns its own thread

This one took me a while to land on, and I think it is the most underrated decision in the whole architecture.

The obvious approach is to share a single conversation thread across all agents, and have the orchestrator switch which agent reads from it. Do not do this. It is the worst of both worlds. Each agent now sees every tool’s history, including tools it does not own. The tool-set bleed contaminates selection. You also get token bloat: every agent re-reads the entire shared history on every run.

In TrafficIQ each specialist owns its own thread via the Microsoft Foundry SDK. The Supply Chain Agent’s thread only ever contains Supply Chain turns. Its tool schemas, its system prompt, its prior tool calls — none of it touches the Fleet Agent’s context. Each agent is, effectively, a tightly scoped assistant that does not know the others exist. The SDK’s thread primitive is what makes that isolation cheap to enforce.

The orchestrator is the only component that knows there are multiple agents. The agents themselves are blissfully ignorant. That isolation is what makes them stay accurate as the system grows.

Lesson 3 — context handoff is the hard problem, not routing

Once you have isolated threads, the next question is the obvious one: what happens when the user pivots? “What’s the ETA on that shipment?” — Supply Chain handles it. Then: “And dispatch a tech to the warehouse.” — that is Field Service, and Field Service has no idea what “that shipment” refers to.

You cannot dump the entire Supply Chain thread on Field Service. That would re-introduce every problem isolated threads were meant to solve. You also cannot hand over nothing — the user is mid-thought and expects continuity.

What I settled on is a small, deliberate handoff payload: a summary of the last N messages from the source agent, written into the destination agent’s thread as a context message before the user’s new turn lands. Enough grounding to resolve “that shipment”. Not enough to confuse tool selection. The summary is generated by the same Azure OpenAI deployment the agents use, with a tight system prompt — give me entities, IDs, and the last user intent. No prose.

Routing gets the headlines. Handoff is what actually breaks in production if you get it wrong.

Lesson 4 — tools must be MECE within an agent, not across all agents

MECE — mutually exclusive, collectively exhaustive. It is the rule I borrowed from consulting, and it is the cleanest way to think about tool design in a multi-agent system.

Across the whole platform, similar-sounding tools exist. Traffic’s plan_journey and Supply Chain’s optimize_delivery_route both compute routes. That is fine. They live in different agents and serve different intents — a personal commute is not a multi-stop delivery plan. The router decides which world the user is in. The agent never has to choose between them.

The rule that actually matters: within one agent, no two tools should be confusable. The Traffic Agent has 17 tools, and I spent more time on their names and descriptions than on any other part of the system. get_traffic_incidents queries an area. monitor_saved_journey watches a specific route. suggest_reroute triggers a recompute. Different verbs, different objects, no overlap.

If you cannot explain to a junior engineer in one sentence what makes two tools different, the model will not get it right either.

Lesson 5 — make agents observable from day one

You cannot debug a multi-agent system from the response text alone. You need to see which agent answered and which tool fired. So the chat panel in TrafficIQ shows both.

TRAFI chat panel with agent badges and tool-call indicators.

Every message carries an agent badge — colour-coded per domain. Every tool call streams in real time as a small inline indicator: tool name, parameters, status. When something looks off, I can see immediately whether the routing was wrong, the tool selection was wrong, or the tool itself returned bad data. Three different failure modes, three different fixes, and you cannot tell them apart without the visibility.

This is not UI polish. I would argue it is the single most important user-trust feature in the product. Users are sceptical of agents — rightly. When they can see “Supply Chain Agent → check_shipment_status → D365 F&O”, the agent stops being a black box. It becomes a transparent process they can audit.

Build the observability before you build the second agent. You will need it the moment routing decisions start mattering.

Lesson 6 — ground on enterprise data, not the LLM’s memory

Every tool in TrafficIQ resolves against a real system of record. D365 F&O via the MCP Server for shipments, inventory, work orders. Azure Maps for routing, traffic, weather, POI. Azure IoT Hub for device health and telemetry. Dataverse for application state.

The agents never “remember” entities. They look them up. If the user asks about shipment SH-10042, the agent does not summarise what it thinks it knows — it calls check_shipment_status and reads the live record. If GPT-4.1 hallucinates an ETA, the tool result overwrites it.

That single discipline is what separates a hackathon demo from something an enterprise IT team can own. The model is the reasoning surface. The tools are the truth surface. Keep them strictly separated and the agent’s answers become defensible, auditable, and — most importantly — refreshable when the underlying data changes.

What I would do differently next time

Two honest ones.

First, I would build the router evaluation harness before writing the router. I built it last. I now have a CSV of representative queries with the expected target agent, and it runs as a test suite — but I had to retrofit it after the architecture was already set. If I had started with the eval, I would have caught two keyword collisions weeks earlier.

Second, I would put a hard token budget on per-agent system prompts from day one. The Traffic Agent’s prompt drifted from 600 tokens to nearly 1,400 over the course of the build, because every new tool came with “and remember to use this when…” instructions. A budget forces the discipline of writing better tool descriptions instead of patching the prompt. Treat the system prompt like a constitution, not a notepad.

Closing

The headline is small but the implication is large: when a single agent’s tool surface grows past where its selection accuracy holds, the answer is not a smarter prompt. It is a smaller agent.

Six specialists with clear scopes, isolated threads, tiered routing, MECE tools, visible execution, and grounded data — that is the recipe that survived production hardening in TrafficIQ. None of it is exotic. All of it is boring engineering applied carefully.

If you want to see the code, the TrafficIQ repo is on GitHub. The Microsoft winner announcement is here. And the full demo video walks the router, the handoffs, and the tool execution in real time.

TrafficIQ operational dashboard.
May 18, 2026
Building LocalRAG — a fully local AI document search
LocalRAG is a fully local Retrieval-Augmented Generation application I built to answer one question: how much of a useful enterprise RAG can you run without sending a single byte to a cloud LLM?

The problem

Most “build a chatbot over your documents” tutorials assume an OpenAI key, a managed vector database and a cloud orchestrator. That’s fine for prototypes — and a dead end the moment you talk to a customer in regulated banking, healthcare or government. They want answers on their data, on their hardware, with no egress.

The shape of the solution

LocalRAG uses local Ollama models for both embeddings and generation, FAISS for the vector index, and a content-type-aware ingestion pipeline that handles PDF, DOCX, CSV, Excel, XML and images. Everything runs on a laptop. The full demo is on YouTube.
- Ingestion: multi-format extractors that preserve enough structure to chunk intelligently — tables stay together, lists stay together, headings become metadata.
- Indexing: FAISS index with content-type tags so retrieval can prefer the right shape of content for the question.
- Retrieval: semantic top-k with rate-limited retries and a simple fallback when a model is overloaded.
- Generation: a local Ollama model with grounded prompts and source citations.
What I’d do differently next time

Two things. First, evaluation should be a first-class subsystem from day one, not bolted on later — even a small golden-question set saves you from regression panic during refactors. Second, content-type awareness is more important than fancy reranking; a boring extractor that respects document structure beats a clever reranker that received bad chunks.

Repo: github.com/PowerAI-Labs/LocalRAG. Feedback and PRs welcome.
May 16, 2026