The Agent-to-Agent (A2A) protocol has opened up incredible possibilities for AI agent communication and collaboration across different platforms. However, there’s been a significant limitation that has puzzled many developers: why can’t we run multiple A2A agents on the same server?
Until now, deploying multiple A2A agents meant spinning up separate servers or using different ports for each agent, creating unnecessary complexity and resource overhead. But what if I told you there’s an elegant solution that changes everything?
The Problem: One Server, One Agent Limitation
Traditionally, A2A implementations have been constrained by a “one agent per server” model. This limitation stems from how agents are typically served and discovered within the A2A ecosystem. Developers who wanted to deploy multiple specialized agents had to:
- Set up multiple servers – increasing infrastructure costs
- Use different ports – complicating network configuration
- Manage separate deployments – multiplying maintenance overhead
The Solution: Path-Based Agent Routing
The breakthrough comes from a surprisingly simple yet powerful concept: serving multiple agents through unique URL paths on a single server. Instead of dedicating entire servers to individual agents, we can host them all under different paths.
Here’s how it works in practice:
Multiple Agents, One Host
Imagine running three different agents on the same server:
| Agent Type | Purpose | Agent Card URL |
|---|---|---|
| Conversational Agent | General conversation and Q&A | http://localhost:8000/a2a/conversation/agent-card.json |
| Trending Topics Agent | Real-time trend discovery | http://localhost:8000/a2a/trending_topics/agent-card.json |
| Analyzer Agent | Data analysis and insights | http://localhost:8000/a2a/analyzer/agent-card.json |
Each agent maintains its own unique identity and capabilities while sharing the same underlying infrastructure.
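To make the layout concrete, here is a minimal sketch that fetches all three agent cards from the shared host. It assumes the server built later in this post is running locally on port 8000, and the agent names match the paths registered there.

```python
# Minimal sketch: fetch each agent's card from the same host to confirm
# that three distinct A2A agents are being served side by side.
import asyncio

import httpx

AGENT_NAMES = ["conversation", "trending_topics", "analyzer"]

async def main() -> None:
    async with httpx.AsyncClient() as client:
        for name in AGENT_NAMES:
            response = await client.get(f"http://localhost:8000/a2a/{name}/agent-card.json")
            print(f"{name}: {response.json()['name']}")

asyncio.run(main())
```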
Technical Innovation: The Architecture Behind It
The solution leverages FastAPI’s powerful routing capabilities combined with the A2A SDK to create a multi-tenant agent hosting platform. Here’s what makes it work:
Smart URL Routing
Each agent is mapped to a specific URL path pattern (/a2a/{agent_name}/), allowing the server to route requests to the appropriate agent handler while maintaining A2A protocol compliance.
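Stripped to its essence, this is ordinary FastAPI routing: one app, one router per agent, each router mounted under its own prefix. The sketch below illustrates the pattern only; the real implementation, shown later, returns full A2A agent cards and JSON-RPC responses.

```python
# Sketch of the routing pattern: one FastAPI app, one router per agent.
from fastapi import APIRouter, FastAPI

app = FastAPI()

def register_agent(app: FastAPI, name: str) -> None:
    router = APIRouter(prefix=f"/a2a/{name}")

    @router.get("/agent-card.json")
    async def agent_card() -> dict[str, str]:
        # Placeholder; the real server returns the agent's full AgentCard.
        return {"name": name}

    app.include_router(router)

for agent_name in ("conversation", "trending_topics", "analyzer"):
    register_agent(app, agent_name)
```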
Context Isolation
Despite sharing the same server, each agent maintains its own context and conversation state; as shown later in the Request Handler Architecture section, every agent gets its own runner, session store, and memory service, so there is no cross-contamination between agent interactions.
Resource Optimization
By sharing the same Python process and server resources, this approach significantly reduces memory footprint and startup time compared to running separate server instances.
Real-World Benefits
1. Cost Efficiency
- Reduced Infrastructure: One server handles multiple agents
- Lower Resource Usage: Shared memory and CPU resources
- Simplified Deployment: Single deployment pipeline
2. Operational Simplicity
- Unified Monitoring: All agents under one roof
- Centralized Logging: Easier debugging and maintenance; a single middleware can cover every agent (see the sketch after this list)
- Single Configuration: One environment setup for all agents
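As an illustration of the "one roof" point, one middleware on the shared app can log traffic for all agents at once. This is a sketch under the assumption of a plain FastAPI app; the logger name and log format are illustrative, not part of the A2A SDK.

```python
# Sketch: one HTTP middleware logs requests for every agent on the host.
import logging
import time

from fastapi import FastAPI, Request

logger = logging.getLogger("a2a.access")
app = FastAPI()

@app.middleware("http")
async def log_agent_traffic(request: Request, call_next):
    start = time.perf_counter()
    response = await call_next(request)
    elapsed_ms = (time.perf_counter() - start) * 1000
    # The path identifies the agent, e.g. /a2a/conversation/ vs /a2a/analyzer/.
    logger.info("%s %s -> %d (%.1f ms)", request.method, request.url.path,
                response.status_code, elapsed_ms)
    return response
```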
3. Scalability
- Effortless Expansion: Add more agents without provisioning new servers
- Load Balancing: Distribute traffic across agent types
- Flexible Resource Allocation: Adjust resources based on actual usage
Getting Started: Your First Multi-Agent Server
Setting up your own multi-agent A2A server is surprisingly straightforward:
Creating Three Agents with the [google-adk](https://google.github.io/adk-docs/) Library
Conversation Agent
from google.adk.agents import LlmAgent
from google.adk.tools import google_search
from a2a.types import AgentCapabilities, AgentCard, AgentSkill, TransportProtocol
CONVERSATION_AGENT_INSTRUCTIONS = """
You are a Conversation Agent Enhanced with Web Search Capabilities.
## Core Behavior:
- Be conversational, friendly, and helpful
- Provide accurate, relevant, and well-structured responses
- Maintain context throughout the conversation
- Ask clarifying questions when user intent is unclear
- Admit when you don't know something and offer to search
## When to Use Web Search:
1. Current events or time-sensitive info
2. Precise, up-to-date facts
3. Latest technical details
4. Local information
5. Verification of uncertain info
6. Specialized topics needing expert sources
## Search Strategy:
- Use specific queries and authoritative sources
- Cross-reference results
- Distinguish between your knowledge and searched info
- Attribute sources when relevant
## Response Guidelines:
1. Direct answers first
2. Break down complex topics
3. Provide examples
4. Offer multiple perspectives
5. Suggest follow-ups
## Information Quality:
- Prioritize accuracy
- State confidence levels
- Warn about outdated info
- Suggest multiple sources for key decisions
- Fact-check critical points
## Conversation Management:
- Retain and build upon previous context
- Transition topics smoothly
- Match tone to user style
- Respect preferences
## Limitations and Transparency:
- Be honest about capabilities
- Explain when search might help
- Acknowledge incomplete info
- Suggest alternative resources
- Respect privacy
## Best Practices:
- Stay respectful and professional
- Avoid bias
- Use proactive search
- Structure answers clearly
- End with an offer to assist further
"""
def get_conversational_agent(model: str) -> LlmAgent:
return LlmAgent(
model=model,
name="conversational_agent",
description="An AI assistant that enhances conversations with live web search when needed.",
instruction=CONVERSATION_AGENT_INSTRUCTIONS,
tools=[google_search],
)
def get_conversational_agent_card(agent_url: str) -> AgentCard:
return AgentCard(
name="Conversational Agent",
description="Smart Conversational Agent Enhanced with Web Search Capabilities",
url=agent_url,
version="1.0",
capabilities=AgentCapabilities(streaming=True),
default_input_modes=["text/plain"],
default_output_modes=["text/plain"],
preferred_transport=TransportProtocol.jsonrpc,
skills=[
AgentSkill(
id="conversational_agent",
name="Conversational Agent",
description="A Smart Conversational Agent Enhanced with Web Search Capabilities",
tags=["SmartAssistant", "LiveSearch", "AIPowered", "Conversation"],
examples=[
"Find the latest market share statistics for electric vehicles.",
"Why is Trump's tariff a problem for India?",
"What are people talking about on social media?",
],
)
],
)
Trending Topics Agent
from google.adk.agents import LlmAgent
from google.adk.tools import google_search
from a2a.types import AgentCapabilities, AgentCard, AgentSkill, TransportProtocol
def get_trending_topics_agent(model: str) -> LlmAgent:
return LlmAgent(
model=model,
name="trending_topics_agent",
instruction="""
You are a social media trends analyst. Your job is to search the web for current trending topics,
particularly from social platforms.
When asked about trends:
1. Search for "trending topics today" or similar queries
2. Extract the top 3 trending topics
3. Return them in a JSON format
Focus on current, real-time trends from the last 24 hours.
You MUST return your response in the following JSON format:
{
"trends": [
{
"topic": "Topic name",
"description": "Brief description (1-2 sentences)",
"reason": "Why it's trending"
},
{
"topic": "Topic name",
"description": "Brief description (1-2 sentences)",
"reason": "Why it's trending"
},
{
"topic": "Topic name",
"description": "Brief description (1-2 sentences)",
"reason": "Why it's trending"
}
]
}
Only return the JSON object, no additional text.
""",
tools=[google_search],
)
def get_trending_topics_agent_card(agent_url: str) -> AgentCard:
return AgentCard(
name="Trending Topics Agent",
url=agent_url,
description="Searches the web for current trending topics from social media",
version="1.0",
capabilities=AgentCapabilities(streaming=True),
default_input_modes=["text/plain"],
default_output_modes=["text/plain"],
preferred_transport=TransportProtocol.jsonrpc,
skills=[
AgentSkill(
id="find_trends",
name="Find Trending Topics",
description="Searches for current trending topics on social media",
tags=["trends", "social media", "twitter", "current events"],
examples=[
"What's trending today?",
"Show me current Twitter trends",
"What are people talking about on social media?",
],
)
],
)
Trend Analyzer Agent
from google.adk.agents import LlmAgent
from google.adk.tools import google_search
from a2a.types import AgentCapabilities, AgentCard, AgentSkill, TransportProtocol
def get_analyzer_agent(model: str) -> LlmAgent:
return LlmAgent(
model=model,
name="trend_analyzer_agent",
instruction="""
You are a data analyst specializing in trend analysis. When given a trending topic,
perform deep research to find quantitative data and insights.
For each trend you analyze:
1. Search for statistics, numbers, and metrics related to the trend
2. Look for:
- Engagement metrics (views, shares, mentions)
- Growth rates and timeline
- Geographic distribution
- Related hashtags or keywords
3. Provide concrete numbers and data points
        Keep the analysis reasonably concise.
Always prioritize quantitative information over qualitative descriptions.
""",
tools=[google_search],
)
def get_analyzer_agent_card(agent_url: str) -> AgentCard:
return AgentCard(
name="Trend Analyzer Agent",
url=agent_url,
description="Performs deep analysis of trends with quantitative data",
version="1.0",
capabilities=AgentCapabilities(streaming=True),
default_input_modes=["text/plain"],
default_output_modes=["text/plain"],
preferred_transport=TransportProtocol.jsonrpc,
skills=[
AgentSkill(
id="analyze_trend",
name="Analyze Trend",
description="Provides quantitative analysis of a specific trend",
tags=["analysis", "data", "metrics", "statistics"],
examples=[
"Analyze the #ClimateChange trend",
"Get metrics for the Taylor Swift trend",
"Provide data analysis for AI adoption trend",
],
)
],
)
Smart Router Integration
By subclassing the A2A SDK’s abstract JSONRPCApplication class and delegating routing to FastAPI, this solution does more than work around the “one agent per server” limitation: it yields a robust, scalable architecture for enterprise A2A deployments.
The solution uses FastAPI’s APIRouter to create isolated routing contexts for each agent:
from collections.abc import Callable
from typing import Any
from a2a.server.apps.jsonrpc.jsonrpc_app import CallContextBuilder, JSONRPCApplication
from a2a.server.context import ServerCallContext
from a2a.server.request_handlers.request_handler import RequestHandler
from a2a.types import AgentCard
from a2a.utils.constants import AGENT_CARD_WELL_KNOWN_PATH, DEFAULT_RPC_URL, EXTENDED_AGENT_CARD_PATH
from fastapi import APIRouter, FastAPI
from starlette.applications import Starlette
class A2AFastApiApp(JSONRPCApplication):
def __init__(
self,
fastapi_app: FastAPI,
agent_card: AgentCard,
http_handler: RequestHandler,
extended_agent_card: AgentCard | None = None,
context_builder: CallContextBuilder | None = None,
card_modifier: Callable[[AgentCard], AgentCard] | None = None,
extended_card_modifier: Callable[[AgentCard, ServerCallContext], AgentCard] | None = None,
):
super().__init__(
agent_card=agent_card,
http_handler=http_handler,
extended_agent_card=extended_agent_card,
context_builder=context_builder,
card_modifier=card_modifier,
extended_card_modifier=extended_card_modifier,
)
self.fastapi_app = fastapi_app
    def build(
        self,
agent_card_url: str = AGENT_CARD_WELL_KNOWN_PATH,
rpc_url: str = DEFAULT_RPC_URL,
extended_agent_card_url: str = EXTENDED_AGENT_CARD_PATH,
**kwargs: Any,
) -> Starlette:
name_prefix = rpc_url.replace("/", "")
router = APIRouter()
# Add RPC endpoint
router.add_api_route(
rpc_url,
endpoint=self._handle_requests,
name=f"{name_prefix}_a2a_handler",
methods=["POST"],
)
# Add agent card endpoint
router.add_api_route(
agent_card_url,
endpoint=self._handle_get_agent_card,
methods=["GET"],
name=f"{name_prefix}_agent_card",
)
self.fastapi_app.include_router(router)
return self.fastapi_app
Request Handler Architecture
At the core of this solution is a request handler factory that gives each agent its own isolated execution environment: a dedicated ADK runner with in-memory session, memory, and artifact services:
from a2a.server.request_handlers import DefaultRequestHandler
from a2a.server.tasks import InMemoryTaskStore
from google.adk import Runner
from google.adk.a2a.executor.a2a_agent_executor import A2aAgentExecutor, A2aAgentExecutorConfig
from google.adk.agents import LlmAgent
from google.adk.artifacts import InMemoryArtifactService
from google.adk.memory import InMemoryMemoryService
from google.adk.sessions import InMemorySessionService
class A2ARequestHandler:
@staticmethod
def get_request_handler(agent: LlmAgent):
runner = Runner(
app_name=agent.name,
agent=agent,
artifact_service=InMemoryArtifactService(),
session_service=InMemorySessionService(),
memory_service=InMemoryMemoryService(),
)
config = A2aAgentExecutorConfig()
executor = A2aAgentExecutor(runner=runner, config=config)
return DefaultRequestHandler(agent_executor=executor, task_store=InMemoryTaskStore())
Integrating the Agent with A2A Server: A Utility Class
from typing import Callable
from a2a.types import AgentCard
from fastapi import FastAPI
from google.adk.agents import LlmAgent
from src.a2a.a2a_fastapi_app import A2AFastApiApp
from src.a2a.a2a_request_handler import A2ARequestHandler
class A2AUtils:
"""Utility class for A2A (Agent-to-Agent) communication."""
@staticmethod
def build(
name: str,
get_agent: Callable[[str], LlmAgent],
get_agent_card: Callable[[str], AgentCard],
model_name: str,
agent_base_url: str,
app: FastAPI,
) -> None:
agent = get_agent(model_name)
agent_request_handler = A2ARequestHandler.get_request_handler(agent)
agent_card = get_agent_card(f"{agent_base_url}/{name}/")
agent_server = A2AFastApiApp(fastapi_app=app, agent_card=agent_card, http_handler=agent_request_handler)
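        # Expose the JSON-RPC endpoint at /{name}/ and serve the agent card for
        # any GET under /{name}/ (including /agent-card.json) via a catch-all path.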
agent_server.build(rpc_url=f"/{name}/", agent_card_url=f"/{name}/{{path:path}}")
Integrating the Agents with the A2A Server and Starting It:
import logging
import os
import uvicorn
from dotenv import load_dotenv
from fastapi import FastAPI
from src.a2a.a2a_utils import A2AUtils
from src.agent.analyzer_agent import get_analyzer_agent, get_analyzer_agent_card
from src.agent.conversation_agent import get_conversational_agent, get_conversational_agent_card
from src.agent.trending_topics_agent import get_trending_topics_agent, get_trending_topics_agent_card
load_dotenv()
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger()
AGENT_BASE_URL = os.getenv("AGENT_BASE_URL")
if not AGENT_BASE_URL:
raise ValueError("AGENT_BASE_URL environment variable must be set")
MODEL_NAME = os.getenv("MODEL_NAME")
if not MODEL_NAME:
raise ValueError("MODEL_NAME environment variable must be set")
logger.info(f"AGENT BASE URL {AGENT_BASE_URL}")
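# With root_path="/a2a", recent Starlette releases strip the /a2a prefix from
# incoming paths before routing, so each agent is reachable under /a2a/{agent_name}/.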
app: FastAPI = FastAPI(
title="Run multiple agents on single host using A2A protocol.",
description="Run multiple agents on single host using A2A protocol.",
version="1.0.0",
root_path="/a2a",
)
@app.get("/health")
async def health_check() -> dict[str, str]:
return {"status": "ok"}
# conversation agent integration with A2A server
A2AUtils.build(
name="conversation",
get_agent=get_conversational_agent,
get_agent_card=get_conversational_agent_card,
model_name=MODEL_NAME,
agent_base_url=AGENT_BASE_URL,
app=app,
)
# trending_topics agent integration with A2A server
A2AUtils.build(
name="trending_topics",
get_agent=get_trending_topics_agent,
get_agent_card=get_trending_topics_agent_card,
model_name=MODEL_NAME,
agent_base_url=AGENT_BASE_URL,
app=app,
)
# analyzer agent integration with A2A server
A2AUtils.build(
name="analyzer",
get_agent=get_analyzer_agent,
get_agent_card=get_analyzer_agent_card,
model_name=MODEL_NAME,
agent_base_url=AGENT_BASE_URL,
app=app,
)
if __name__ == "__main__":
uvicorn.run(app, host="0.0.0.0", port=8000)
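Once the server is assembled, a quick sanity check is to print every registered route. This small sketch is illustrative and reuses the app object from the module above:

```python
# Sketch: list every API route on the multi-agent app to verify the wiring.
from fastapi.routing import APIRoute

for route in app.routes:
    if isinstance(route, APIRoute):
        print(route.path)
# Expected to include /health plus, for each agent:
#   /conversation/              (POST, the JSON-RPC endpoint)
#   /conversation/{path:path}   (GET, serves the agent card)
# and the matching pair for /trending_topics/ and /analyzer/.
```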
A2A Client for Testing the Agents
from typing import Any
from uuid import uuid4
import httpx
from a2a.client import ClientConfig, ClientFactory
from a2a.types import AgentCard, Message, Part, Role, TextPart, TransportProtocol
AGENT_CARD_PATH = "/agent-card.json"
class A2AClient:
def __init__(self, default_timeout: float = 240.0):
# Cache for agent metadata
self._agent_info_cache: dict[str, dict[str, Any] | None] = {}
self.default_timeout = default_timeout
async def create_task(self, agent_url: str, message: str, context_id: str) -> str:
"""Send a message following the official A2A SDK pattern."""
# Configure httpx client with timeout
timeout_config = httpx.Timeout(
timeout=self.default_timeout,
connect=10.0,
read=self.default_timeout,
write=10.0,
pool=5.0,
)
async with httpx.AsyncClient(timeout=timeout_config) as httpx_client:
# Check if we have cached agent card data
if agent_url in self._agent_info_cache and self._agent_info_cache[agent_url] is not None:
agent_card_data = self._agent_info_cache[agent_url]
else:
# Fetch the agent card
agent_card_response = await httpx_client.get(f"{agent_url}{AGENT_CARD_PATH}")
agent_card_data = self._agent_info_cache[agent_url] = agent_card_response.json()
# Create AgentCard from data
agent_card = AgentCard(**agent_card_data)
# Create A2A client with the agent card
config = ClientConfig(
httpx_client=httpx_client,
supported_transports=[
TransportProtocol.jsonrpc,
TransportProtocol.http_json,
],
use_client_preference=True,
)
factory = ClientFactory(config)
client = factory.create(agent_card)
message_obj = Message(
role=Role.user,
parts=[Part(TextPart(text=message))],
message_id=str(uuid4()),
context_id=context_id,
)
responses = []
async for response in client.send_message(message_obj):
responses.append(response)
# The response is a tuple - get the first element (Task object)
if responses and isinstance(responses[0], tuple) and len(responses[0]) > 0:
task = responses[0][0] # First element of the tuple
# Extract text: task.artifacts[0].parts[0].root.text
try:
return task.artifacts[0].parts[0].root.text
                except (AttributeError, IndexError, TypeError):  # artifacts may be None or empty
return str(task)
return "No response received"
Let’s start a conversation with the agent:
import asyncio
import uuid
from src.a2a.a2a_client import A2AClient
async def main():
a2a_client: A2AClient = A2AClient()
agent_host_url = "http://localhost:8000/a2a"
context_id = str(uuid.uuid4())
print(f"Starting conversation with context_id: {context_id}")
# Turn 1 — Start conversation
conversation_task = await a2a_client.create_task(
agent_url=f"{agent_host_url}/conversation",
message="Who is the Prime Minister of India?",
context_id=context_id,
)
print(f"Turn 1 → {conversation_task} \n\n")
# Turn 2 — Follow-up using pronoun (tests context memory)
conversation_task = await a2a_client.create_task(
agent_url=f"{agent_host_url}/conversation",
message="What is his wife's name?",
context_id=context_id,
)
print(f"Turn 2 → {conversation_task} \n\n")
    # Turn 3 — Another follow-up that relies on the conversation context
conversation_task = await a2a_client.create_task(
agent_url=f"{agent_host_url}/conversation",
message="List three major policies he introduced.",
context_id=context_id,
)
print(f"Turn 4 → {conversation_task}")
if __name__ == "__main__":
asyncio.run(main())
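Because all three agents share one host, chaining them is just two calls to different paths. The sketch below reuses the A2AClient from above; the JSON parsing assumes the trending agent honors its JSON-only instruction, which is not guaranteed with every model.

```python
import asyncio
import json
import uuid

from src.a2a.a2a_client import A2AClient

async def chain_agents() -> None:
    client = A2AClient()
    base_url = "http://localhost:8000/a2a"

    # Step 1: ask the trending agent for current topics (it replies in JSON).
    raw = await client.create_task(
        agent_url=f"{base_url}/trending_topics",
        message="What's trending today?",
        context_id=str(uuid.uuid4()),
    )
    top_topic = json.loads(raw)["trends"][0]["topic"]

    # Step 2: hand the first topic to the analyzer agent for quantitative data.
    analysis = await client.create_task(
        agent_url=f"{base_url}/analyzer",
        message=f"Analyze the {top_topic} trend",
        context_id=str(uuid.uuid4()),
    )
    print(analysis)

if __name__ == "__main__":
    asyncio.run(chain_agents())
```

Each call uses a fresh context_id because the two agents keep independent conversation state.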
Conclusion: Scalable AI Agent Infrastructure
What started as a simple question (“Why can’t we run multiple A2A agents on one server?”) has led to a fundamental rethinking of how we architect AI agent systems. This solution is more than a technical workaround: it puts sophisticated multi-agent deployments within reach of teams that could not previously afford them.
The transformation is profound:
- From resource-heavy to resource-efficient: One server now does the work of many
- From complex to simple: Unified deployment replaces fragmented infrastructure
- From experimental to enterprise-ready: Production-grade architecture enables real-world adoption
- From isolated to collaborative: Multiple agents can work together seamlessly
The impact extends beyond cost savings. Organizations can now experiment with AI agent architectures that were previously prohibitively expensive. Startups can prototype sophisticated multi-agent systems without massive infrastructure investments. Enterprise teams can deploy specialized agents for different departments while maintaining centralized governance and monitoring.
This is just the beginning. As the A2A protocol matures and adoption grows, we’ll see the emergence of agent marketplaces, sophisticated orchestration platforms, and AI ecosystems that rival today’s microservices architectures in complexity and capability.
The barrier has been broken. The future of AI isn’t just about better models—it’s about better architecture that makes those models accessible, scalable, and collaborative. Welcome to the era of democratized multi-agent AI.
References
For deeper understanding and continued learning, explore these essential resources:
A2A Protocol Documentation
- A2A Protocol Specification – Complete A2A protocol documentation and specifications.
- A2A Project Discussions – Community discussions on multi-agent architectures and deployment patterns.
Google ADK Resources
- Google ADK Documentation – Comprehensive guide to the Agent Development Kit used in this implementation.
A2A Sample Code
- A2A Samples Repository – The official repository for A2A sample code; some of the examples in this post are adapted from this collection.
These resources provide the theoretical foundation and practical examples behind this multi-agent architecture. Whether you want to understand the A2A protocol more deeply, explore Google ADK capabilities, or see additional implementation patterns, these references will accelerate your journey into sophisticated AI agent systems.
Check out the GitHub repository for the code: Multiple Agents on a Single A2A Server.