The Agent-to-Agent (A2A) protocol has opened up incredible possibilities for AI agent communication and collaboration across different platforms. However, there’s been a significant limitation that has puzzled many developers: why can’t we run multiple A2A agents on the same server?
Until now, deploying multiple A2A agents meant spinning up separate servers or using different ports for each agent, creating unnecessary complexity and resource overhead. But what if I told you there’s an elegant solution that changes everything?
The Problem: One Server, One Agent Limitation
Traditionally, A2A implementations have been constrained by a “one agent per server” model. This limitation stems from how agents are typically served and discovered within the A2A ecosystem. Developers who wanted to deploy multiple specialized agents had to:
- Set up multiple servers – increasing infrastructure costs
- Use different ports – complicating network configuration
- Manage separate deployments – multiplying maintenance overhead
The Solution: Path-Based Agent Routing
The breakthrough comes from a surprisingly simple yet powerful concept: serving multiple agents through unique URL paths on a single server. Instead of dedicating entire servers to individual agents, we can host them all under different paths.
Here’s how it works in practice:
Multiple Agents, One Host
Imagine running three different agents on the same server:
| Agent Type | Purpose | Agent Card URL |
|---|---|---|
| Conversational Agent | General conversation and Q&A | http://localhost:8000/a2a/conversation/agent-card.json |
| Trending Topics Agent | Real-time trend discovery | http://localhost:8000/a2a/trending_topics/agent-card.json |
| Analyzer Agent | Data analysis and insights | http://localhost:8000/a2a/analyzer/agent-card.json |
Each agent maintains its own unique identity and capabilities while sharing the same underlying infrastructure.
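To make the layout concrete, here is a minimal sketch that fetches all three agent cards from the shared host. It assumes the server built later in this post is running locally on port 8000, and the agent names match the paths registered there.

```python
# Minimal sketch: fetch each agent's card from the same host to confirm
# that three distinct A2A agents are being served side by side.
import asyncio

import httpx

AGENT_NAMES = ["conversation", "trending_topics", "analyzer"]

async def main() -> None:
    async with httpx.AsyncClient() as client:
        for name in AGENT_NAMES:
            response = await client.get(f"http://localhost:8000/a2a/{name}/agent-card.json")
            print(f"{name}: {response.json()['name']}")

asyncio.run(main())
```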
Technical Innovation: The Architecture Behind It
The solution leverages FastAPI’s powerful routing capabilities combined with the A2A SDK to create a multi-tenant agent hosting platform. Here’s what makes it work:
Smart URL Routing
Each agent is mapped to a specific URL path pattern (/a2a/{agent_name}/), allowing the server to route requests to the appropriate agent handler while maintaining A2A protocol compliance.
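Stripped to its essence, this is ordinary FastAPI routing: one app, one router per agent, each router mounted under its own prefix. The sketch below illustrates the pattern only; the real implementation, shown later, returns full A2A agent cards and JSON-RPC responses.

```python
# Sketch of the routing pattern: one FastAPI app, one router per agent.
from fastapi import APIRouter, FastAPI

app = FastAPI()

def register_agent(app: FastAPI, name: str) -> None:
    router = APIRouter(prefix=f"/a2a/{name}")

    @router.get("/agent-card.json")
    async def agent_card() -> dict[str, str]:
        # Placeholder; the real server returns the agent's full AgentCard.
        return {"name": name}

    app.include_router(router)

for agent_name in ("conversation", "trending_topics", "analyzer"):
    register_agent(app, agent_name)
```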
Context Isolation
Despite sharing the same server, each agent maintains its own context and conversation state; as shown later in the Request Handler Architecture section, every agent gets its own runner, session store, and memory service, so there is no cross-contamination between agent interactions.
Resource Optimization
By sharing the same Python process and server resources, this approach significantly reduces memory footprint and startup time compared to running separate server instances.
Real-World Benefits
1. Cost Efficiency
- Reduced Infrastructure: One server handles multiple agents
- Lower Resource Usage: Shared memory and CPU resources
- Simplified Deployment: Single deployment pipeline
2. Operational Simplicity
- Unified Monitoring: All agents under one roof
- Centralized Logging: Easier debugging and maintenance; a single middleware can cover every agent (see the sketch after this list)
- Single Configuration: One environment setup for all agents
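As an illustration of the "one roof" point, one middleware on the shared app can log traffic for all agents at once. This is a sketch under the assumption of a plain FastAPI app; the logger name and log format are illustrative, not part of the A2A SDK.

```python
# Sketch: one HTTP middleware logs requests for every agent on the host.
import logging
import time

from fastapi import FastAPI, Request

logger = logging.getLogger("a2a.access")
app = FastAPI()

@app.middleware("http")
async def log_agent_traffic(request: Request, call_next):
    start = time.perf_counter()
    response = await call_next(request)
    elapsed_ms = (time.perf_counter() - start) * 1000
    # The path identifies the agent, e.g. /a2a/conversation/ vs /a2a/analyzer/.
    logger.info("%s %s -> %d (%.1f ms)", request.method, request.url.path,
                response.status_code, elapsed_ms)
    return response
```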
3. Scalability
- Effortless Expansion: Add more agents without provisioning new servers
- Load Balancing: Distribute traffic across agent types
- Flexible Resource Allocation: Adjust resources based on actual usage
Getting Started: Your First Multi-Agent Server
Setting up your own multi-agent A2A server is surprisingly straightforward:
Creating Three Agents with the [google-adk](https://google.github.io/adk-docs/) Library
Conversation Agent
from google.adk.agents import LlmAgent
from google.adk.tools import google_search
from a2a.types import AgentCapabilities, AgentCard, AgentSkill, TransportProtocol
CONVERSATION_AGENT_INSTRUCTIONS = """
You are a Conversation Agent Enhanced with Web Search Capabilities.
## Core Behavior:
- Be conversational, friendly, and helpful
- Provide accurate, relevant, and well-structured responses
- Maintain context throughout the conversation
- Ask clarifying questions when user intent is unclear
- Admit when you don't know something and offer to search
## When to Use Web Search:
1. Current events or time-sensitive info
2. Precise, up-to-date facts
3. Latest technical details
4. Local information
5. Verification of uncertain info
6. Specialized topics needing expert sources
## Search Strategy:
- Use specific queries and authoritative sources
- Cross-reference results
- Distinguish between your knowledge and searched info
- Attribute sources when relevant
## Response Guidelines:
1. Direct answers first
2. Break down complex topics
3. Provide examples
4. Offer multiple perspectives
5. Suggest follow-ups
## Information Quality:
- Prioritize accuracy
- State confidence levels
- Warn about outdated info
- Suggest multiple sources for key decisions
- Fact-check critical points
## Conversation Management:
- Retain and build upon previous context
- Transition topics smoothly
- Match tone to user style
- Respect preferences
## Limitations and Transparency:
- Be honest about capabilities
- Explain when search might help
- Acknowledge incomplete info
- Suggest alternative resources
- Respect privacy
## Best Practices:
- Stay respectful and professional
- Avoid bias
- Use proactive search
- Structure answers clearly
- End with an offer to assist further
"""
def get_conversational_agent(model: str) -> LlmAgent:
return LlmAgent(
model=model,
name="conversational_agent",
description="An AI assistant that enhances conversations with live web search when needed.",
instruction=CONVERSATION_AGENT_INSTRUCTIONS,
tools=[google_search],
)
def get_conversational_agent_card(agent_url: str) -> AgentCard:
return AgentCard(
name="Conversational Agent",
description="Smart Conversational Agent Enhanced with Web Search Capabilities",
url=agent_url,
version="1.0",
capabilities=AgentCapabilities(streaming=True),
default_input_modes=["text/plain"],
default_output_modes=["text/plain"],
preferred_transport=TransportProtocol.jsonrpc,
skills=[
AgentSkill(
id="conversational_agent",
name="Conversational Agent",
description="A Smart Conversational Agent Enhanced with Web Search Capabilities",
tags=["SmartAssistant", "LiveSearch", "AIPowered", "Conversation"],
examples=[
"Find the latest market share statistics for electric vehicles.",
"Why is Trump's tariff a problem for India?",
"What are people talking about on social media?",
],
)
],
)
Trending Topics Agent
from google.adk.agents import LlmAgent
from google.adk.tools import google_search
from a2a.types import AgentCapabilities, AgentCard, AgentSkill, TransportProtocol
def get_trending_topics_agent(model: str) -> LlmAgent:
return LlmAgent(
model=model,
name="trending_topics_agent",
instruction="""
You are a social media trends analyst. Your job is to search the web for current trending topics,
particularly from social platforms.
When asked about trends:
1. Search for "trending topics today" or similar queries
2. Extract the top 3 trending topics
3. Return them in a JSON format
Focus on current, real-time trends from the last 24 hours.
You MUST return your response in the following JSON format:
{
"trends": [
{
"topic": "Topic name",
"description": "Brief description (1-2 sentences)",
"reason": "Why it's trending"
},
{
"topic": "Topic name",
"description": "Brief description (1-2 sentences)",
"reason": "Why it's trending"
},
{
"topic": "Topic name",
"description": "Brief description (1-2 sentences)",
"reason": "Why it's trending"
}
]
}
Only return the JSON object, no additional text.
""",
tools=[google_search],
)
def get_trending_topics_agent_card(agent_url: str) -> AgentCard:
return AgentCard(
name="Trending Topics Agent",
url=agent_url,
description="Searches the web for current trending topics from social media",
version="1.0",
capabilities=AgentCapabilities(streaming=True),
default_input_modes=["text/plain"],
default_output_modes=["text/plain"],
preferred_transport=TransportProtocol.jsonrpc,
skills=[
AgentSkill(
id="find_trends",
name="Find Trending Topics",
description="Searches for current trending topics on social media",
tags=["trends", "social media", "twitter", "current events"],
examples=[
"What's trending today?",
"Show me current Twitter trends",
"What are people talking about on social media?",
],
)
],
)
Trend Analyzer Agent
from google.adk.agents import LlmAgent
from google.adk.tools import google_search
from a2a.types import AgentCapabilities, AgentCard, AgentSkill, TransportProtocol
def get_analyzer_agent(model: str) -> LlmAgent:
return LlmAgent(
model=model,
name="trend_analyzer_agent",
instruction="""
You are a data analyst specializing in trend analysis. When given a trending topic,
perform deep research to find quantitative data and insights.
For each trend you analyze:
1. Search for statistics, numbers, and metrics related to the trend
2. Look for:
- Engagement metrics (views, shares, mentions)
- Growth rates and timeline
- Geographic distribution
- Related hashtags or keywords
3. Provide concrete numbers and data points
        Keep the analysis reasonably concise.
Always prioritize quantitative information over qualitative descriptions.
""",
tools=[google_search],
)
def get_analyzer_agent_card(agent_url: str) -> AgentCard:
return AgentCard(
name="Trend Analyzer Agent",
url=agent_url,
description="Performs deep analysis of trends with quantitative data",
version="1.0",
capabilities=AgentCapabilities(streaming=True),
default_input_modes=["text/plain"],
default_output_modes=["text/plain"],
preferred_transport=TransportProtocol.jsonrpc,
skills=[
AgentSkill(
id="analyze_trend",
name="Analyze Trend",
description="Provides quantitative analysis of a specific trend",
tags=["analysis", "data", "metrics", "statistics"],
examples=[
"Analyze the #ClimateChange trend",
"Get metrics for the Taylor Swift trend",
"Provide data analysis for AI adoption trend",
],
)
],
)
Smart Router Integration
By subclassing the A2A SDK’s abstract JSONRPCApplication class and delegating routing to FastAPI, this solution does more than work around the “one agent per server” limitation: it yields a robust, scalable architecture for enterprise A2A deployments.
The solution uses FastAPI’s APIRouter to create isolated routing contexts for each agent:
from collections.abc import Callable
from typing import Any
from a2a.server.apps.jsonrpc.jsonrpc_app import CallContextBuilder, JSONRPCApplication
from a2a.server.context import ServerCallContext
from a2a.server.request_handlers.request_handler import RequestHandler
from a2a.types import AgentCard
from a2a.utils.constants import AGENT_CARD_WELL_KNOWN_PATH, DEFAULT_RPC_URL, EXTENDED_AGENT_CARD_PATH
from fastapi import APIRouter, FastAPI
from starlette.applications import Starlette
class A2AFastApiApp(JSONRPCApplication):
def __init__(
self,
fastapi_app: FastAPI,
agent_card: AgentCard,
http_handler: RequestHandler,
extended_agent_card: AgentCard | None = None,
context_builder: CallContextBuilder | None = None,
card_modifier: Callable[[AgentCard], AgentCard] | None = None,
extended_card_modifier: Callable[[AgentCard, ServerCallContext], AgentCard] | None = None,
):
super().__init__(
agent_card=agent_card,
http_handler=http_handler,
extended_agent_card=extended_agent_card,
context_builder=context_builder,
card_modifier=card_modifier,
extended_card_modifier=extended_card_modifier,
)
self.fastapi_app = fastapi_app
    def build(
        self,
agent_card_url: str = AGENT_CARD_WELL_KNOWN_PATH,
rpc_url: str = DEFAULT_RPC_URL,
extended_agent_card_url: str = EXTENDED_AGENT_CARD_PATH,
**kwargs: Any,
) -> Starlette:
name_prefix = rpc_url.replace("/", "")
router = APIRouter()
# Add RPC endpoint
router.add_api_route(
rpc_url,
endpoint=self._handle_requests,
name=f"{name_prefix}_a2a_handler",
methods=["POST"],
)
# Add agent card endpoint
router.add_api_route(
agent_card_url,
endpoint=self._handle_get_agent_card,
methods=["GET"],
name=f"{name_prefix}_agent_card",
)
self.fastapi_app.include_router(router)
return self.fastapi_app
Request Handler Architecture
At the core of this solution is a request handler factory that gives each agent its own isolated execution environment: a dedicated ADK runner with in-memory session, memory, and artifact services:
from a2a.server.request_handlers import DefaultRequestHandler
from a2a.server.tasks import InMemoryTaskStore
from google.adk import Runner
from google.adk.a2a.executor.a2a_agent_executor import A2aAgentExecutor, A2aAgentExecutorConfig
from google.adk.agents import LlmAgent
from google.adk.artifacts import InMemoryArtifactService
from google.adk.memory import InMemoryMemoryService
from google.adk.sessions import InMemorySessionService
class A2ARequestHandler:
@staticmethod
def get_request_handler(agent: LlmAgent):
runner = Runner(
app_name=agent.name,
agent=agent,
artifact_service=InMemoryArtifactService(),
session_service=InMemorySessionService(),
memory_service=InMemoryMemoryService(),
)
config = A2aAgentExecutorConfig()
executor = A2aAgentExecutor(runner=runner, config=config)
return DefaultRequestHandler(agent_executor=executor, task_store=InMemoryTaskStore())
Integrating the Agent with A2A Server: A Utility Class
from typing import Callable
from a2a.types import AgentCard
from fastapi import FastAPI
from google.adk.agents import LlmAgent
from src.a2a.a2a_fastapi_app import A2AFastApiApp
from src.a2a.a2a_request_handler import A2ARequestHandler
class A2AUtils:
"""Utility class for A2A (Agent-to-Agent) communication."""
@staticmethod
def build(
name: str,
get_agent: Callable[[str], LlmAgent],
get_agent_card: Callable[[str], AgentCard],
model_name: str,
agent_base_url: str,
app: FastAPI,
) -> None:
agent = get_agent(model_name)
agent_request_handler = A2ARequestHandler.get_request_handler(agent)
agent_card = get_agent_card(f"{agent_base_url}/{name}/")
agent_server = A2AFastApiApp(fastapi_app=app, agent_card=agent_card, http_handler=agent_request_handler)
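        # Expose the JSON-RPC endpoint at /{name}/ and serve the agent card for
        # any GET under /{name}/ (including /agent-card.json) via a catch-all path.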
agent_server.build(rpc_url=f"/{name}/", agent_card_url=f"/{name}/{{path:path}}")
Integrating the Agents with the A2A Server and Starting It:
import logging
import os
import uvicorn
from dotenv import load_dotenv
from fastapi import FastAPI
from src.a2a.a2a_utils import A2AUtils
from src.agent.analyzer_agent import get_analyzer_agent, get_analyzer_agent_card
from src.agent.conversation_agent import get_conversational_agent, get_conversational_agent_card
from src.agent.trending_topics_agent import get_trending_topics_agent, get_trending_topics_agent_card
load_dotenv()
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger()
AGENT_BASE_URL = os.getenv("AGENT_BASE_URL")
if not AGENT_BASE_URL:
raise ValueError("AGENT_BASE_URL environment variable must be set")
MODEL_NAME = os.getenv("MODEL_NAME")
if not MODEL_NAME:
raise ValueError("MODEL_NAME environment variable must be set")
logger.info(f"AGENT BASE URL {AGENT_BASE_URL}")
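# With root_path="/a2a", recent Starlette releases strip the /a2a prefix from
# incoming paths before routing, so each agent is reachable under /a2a/{agent_name}/.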
app: FastAPI = FastAPI(
title="Run multiple agents on single host using A2A protocol.",
description="Run multiple agents on single host using A2A protocol.",
version="1.0.0",
root_path="/a2a",
)
@app.get("/health")
async def health_check() -> dict[str, str]:
return {"status": "ok"}
# conversation agent integration with A2A server
A2AUtils.build(
name="conversation",
get_agent=get_conversational_agent,
get_agent_card=get_conversational_agent_card,
model_name=MODEL_NAME,
agent_base_url=AGENT_BASE_URL,
app=app,
)
# trending_topics agent integration with A2A server
A2AUtils.build(
name="trending_topics",
get_agent=get_trending_topics_agent,
get_agent_card=get_trending_topics_agent_card,
model_name=MODEL_NAME,
agent_base_url=AGENT_BASE_URL,
app=app,
)
# analyzer agent integration with A2A server
A2AUtils.build(
name="analyzer",
get_agent=get_analyzer_agent,
get_agent_card=get_analyzer_agent_card,
model_name=MODEL_NAME,
agent_base_url=AGENT_BASE_URL,
app=app,
)
if __name__ == "__main__":
uvicorn.run(app, host="0.0.0.0", port=8000)
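Once the server is assembled, a quick sanity check is to print every registered route. This small sketch is illustrative and reuses the app object from the module above:

```python
# Sketch: list every API route on the multi-agent app to verify the wiring.
from fastapi.routing import APIRoute

for route in app.routes:
    if isinstance(route, APIRoute):
        print(route.path)
# Expected to include /health plus, for each agent:
#   /conversation/              (POST, the JSON-RPC endpoint)
#   /conversation/{path:path}   (GET, serves the agent card)
# and the matching pair for /trending_topics/ and /analyzer/.
```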
A2A Client for Testing the Agents
from typing import Any
from uuid import uuid4
import httpx
from a2a.client import ClientConfig, ClientFactory
from a2a.types import AgentCard, Message, Part, Role, TextPart, TransportProtocol
AGENT_CARD_PATH = "/agent-card.json"
class A2AClient:
def __init__(self, default_timeout: float = 240.0):
# Cache for agent metadata
self._agent_info_cache: dict[str, dict[str, Any] | None] = {}
self.default_timeout = default_timeout
async def create_task(self, agent_url: str, message: str, context_id: str) -> str:
"""Send a message following the official A2A SDK pattern."""
# Configure httpx client with timeout
timeout_config = httpx.Timeout(
timeout=self.default_timeout,
connect=10.0,
read=self.default_timeout,
write=10.0,
pool=5.0,
)
async with httpx.AsyncClient(timeout=timeout_config) as httpx_client:
# Check if we have cached agent card data
if agent_url in self._agent_info_cache and self._agent_info_cache[agent_url] is not None:
agent_card_data = self._agent_info_cache[agent_url]
else:
# Fetch the agent card
agent_card_response = await httpx_client.get(f"{agent_url}{AGENT_CARD_PATH}")
agent_card_data = self._agent_info_cache[agent_url] = agent_card_response.json()
# Create AgentCard from data
agent_card = AgentCard(**agent_card_data)
# Create A2A client with the agent card
config = ClientConfig(
httpx_client=httpx_client,
supported_transports=[
TransportProtocol.jsonrpc,
TransportProtocol.http_json,
],
use_client_preference=True,
)
factory = ClientFactory(config)
client = factory.create(agent_card)
message_obj = Message(
role=Role.user,
parts=[Part(TextPart(text=message))],
message_id=str(uuid4()),
context_id=context_id,
)
responses = []
async for response in client.send_message(message_obj):
responses.append(response)
# The response is a tuple - get the first element (Task object)
if responses and isinstance(responses[0], tuple) and len(responses[0]) > 0:
task = responses[0][0] # First element of the tuple
# Extract text: task.artifacts[0].parts[0].root.text
try:
return task.artifacts[0].parts[0].root.text
                except (AttributeError, IndexError, TypeError):  # artifacts may be None or empty
return str(task)
return "No response received"
Let’s start a conversation with the agent:
import asyncio
import uuid
from src.a2a.a2a_client import A2AClient
async def main():
a2a_client: A2AClient = A2AClient()
agent_host_url = "http://localhost:8000/a2a"
context_id = str(uuid.uuid4())
print(f"Starting conversation with context_id: {context_id}")
# Turn 1 — Start conversation
conversation_task = await a2a_client.create_task(
agent_url=f"{agent_host_url}/conversation",
message="Who is the Prime Minister of India?",
context_id=context_id,
)
print(f"Turn 1 → {conversation_task} \n\n")
# Turn 2 — Follow-up using pronoun (tests context memory)
conversation_task = await a2a_client.create_task(
agent_url=f"{agent_host_url}/conversation",
message="What is his wife's name?",
context_id=context_id,
)
print(f"Turn 2 → {conversation_task} \n\n")
    # Turn 3 — Another follow-up that relies on the conversation context
conversation_task = await a2a_client.create_task(
agent_url=f"{agent_host_url}/conversation",
message="List three major policies he introduced.",
context_id=context_id,
)
print(f"Turn 4 → {conversation_task}")
if __name__ == "__main__":
asyncio.run(main())
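Because all three agents share one host, chaining them is just two calls to different paths. The sketch below reuses the A2AClient from above; the JSON parsing assumes the trending agent honors its JSON-only instruction, which is not guaranteed with every model.

```python
import asyncio
import json
import uuid

from src.a2a.a2a_client import A2AClient

async def chain_agents() -> None:
    client = A2AClient()
    base_url = "http://localhost:8000/a2a"

    # Step 1: ask the trending agent for current topics (it replies in JSON).
    raw = await client.create_task(
        agent_url=f"{base_url}/trending_topics",
        message="What's trending today?",
        context_id=str(uuid.uuid4()),
    )
    top_topic = json.loads(raw)["trends"][0]["topic"]

    # Step 2: hand the first topic to the analyzer agent for quantitative data.
    analysis = await client.create_task(
        agent_url=f"{base_url}/analyzer",
        message=f"Analyze the {top_topic} trend",
        context_id=str(uuid.uuid4()),
    )
    print(analysis)

if __name__ == "__main__":
    asyncio.run(chain_agents())
```

Each call uses a fresh context_id because the two agents keep independent conversation state.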
Conclusion: Scalable AI Agent Infrastructure
What started as a simple question (“Why can’t we run multiple A2A agents on one server?”) has led to a fundamental rethinking of how we architect AI agent systems. This solution is more than a technical workaround: it puts sophisticated multi-agent deployments within reach of teams that could not previously afford them.
The transformation is profound:
- From resource-heavy to resource-efficient: One server now does the work of many
- From complex to simple: Unified deployment replaces fragmented infrastructure
- From experimental to enterprise-ready: Production-grade architecture enables real-world adoption
- From isolated to collaborative: Multiple agents can work together seamlessly
The impact extends beyond cost savings. Organizations can now experiment with AI agent architectures that were previously prohibitively expensive. Startups can prototype sophisticated multi-agent systems without massive infrastructure investments. Enterprise teams can deploy specialized agents for different departments while maintaining centralized governance and monitoring.
This is just the beginning. As the A2A protocol matures and adoption grows, we’ll see the emergence of agent marketplaces, sophisticated orchestration platforms, and AI ecosystems that rival today’s microservices architectures in complexity and capability.
The barrier has been broken. The future of AI isn’t just about better models—it’s about better architecture that makes those models accessible, scalable, and collaborative. Welcome to the era of democratized multi-agent AI.
References
For deeper understanding and continued learning, explore these essential resources:
A2A Protocol Documentation
- A2A Protocol Specification – Complete A2A protocol documentation and specifications.
- A2A Project Discussions – Community discussions on multi-agent architectures and deployment patterns.
Google ADK Resources
- Google ADK Documentation – Comprehensive guide to the Agent Development Kit used in this implementation.
A2A Sample Code
- A2A Samples Repository – The official repository for A2A sample code; some of the examples in this post are adapted from this collection.
These resources provide the theoretical foundation and practical examples behind this multi-agent architecture. Whether you want to understand the A2A protocol more deeply, explore Google ADK capabilities, or see additional implementation patterns, these references will accelerate your journey into sophisticated AI agent systems.
Check out the GitHub repository for the code: Multiple Agents on a Single A2A Server.