Introducing the Agent Communication & Discovery Protocol (ACDP): A proposal for AI agents to discover and collaborate with each other

Introduction

There has been an incredible amount of activity around the evolution of LLM-based agents. Most recently, Anthropic's Model Context Protocol (MCP) has seen adoption as the standard for how application context is provided to agents. My guess is we'll soon see improvements in transport layer options, along with authentication and authorization that provide a more granular permissions model. There are already a good handful of MCP marketplaces/registries centralizing and easing access to the thousands of servers that have already been developed. MCP gateways and agent discovery mechanisms will be next up, along with 'supply chain' attacks against that infrastructure. How do you verify and trust an agent or MCP server that wasn't developed in house?

All this got me thinking about how agents could discover, share and communicate information, publish capabilities and autonomously collaborate on tasks, even when they're from different agent providers. All in a standardized, easily implementable manner.

As LLM-based agents become more specialized and numerous, we need a common approach for these digital workers to find one another and collaborate.

Introducing the Agent Communication and Discovery Protocol (ACDP)

ACDP is a practical approach that leverages familiar technologies to create a secure, robust network of discoverable, interoperable AI agents.

Coincidentally, Google just released A2A as its approach to giving agents a common language for communication. It also lightly touches on discovery and registration of capabilities:

“...critical to support multi-agent communication by giving your agents a common language – irrespective of the framework or vendor they are built on...”

Regardless of industry or domain, this communication requirement will become commonplace. I think we'll see a fast evolution and merging of the capabilities provided by the services and companies solving this problem today.

Our initial focus with ACDP was on agent discovery and establishing peer relationships for the sharing of capabilities, all through the use of existing technologies.

The full write up (it comes with more diagrams!) of the protocol can be found here. This was all a bit of a thought exercise more than anything else, but thinking through the mechanics of something like this helps with reasoning about solutions in other areas of our platform. There is a simple PoC implementation of the concepts. It's somewhat contrived, as it forces discovery and collaboration with other agents whenever a question is asked of an agent, but the idea was to show that the approach was viable.

Why We Need a Discovery Protocol for AI Agents

Let's consider a simple scenario: Imagine you have a specialized accounting agent that needs help with language translation. How does it find a translation agent? How do they exchange information securely? (I have no idea if accountants need a translation agent but looking at my tax returns, I might.) Hey, the example could be worse, it could be yet another weather agent/tool.

What will likely happen in the agentic world today (even assuming MCP is universally adopted) is that the accounting agent will do what it’s capable of and deliver output that needs to be translated separately. Or even worse, the agent will need to work on the output of translated content and restart the process again, requiring human input and guidance along the process.

Without a standard protocol, every agent integration becomes a custom project. This specification defines an approach that allows LLM-based agents to advertise themselves via DNS and discover peers in a hybrid decentralized manner. It leverages standard DNS records (TXT, SRV) for discovery and metadata, augmented by a central registry for detailed capability listing, auth requirements and dynamic updates and search. All agent-to-agent and agent-to-registry communication uses HTTPS for security and interoperability. The protocol defines how agents register their endpoints, discover each other (both through DNS and peer-to-peer awareness), describe their capabilities in a structured way, and establish secure communications. ACDP enables them to:

Advertise their capabilities (like translation, summarization, coding, incident response, chess)

Discover other agents with complementary skills

Communicate securely

Collaborate on complex tasks

The best part? There is no need to reinvent the wheel – the protocol builds on technologies that have served the internet reliably (with a few bumps along the way) for decades.

DNS Service Discovery: DNS-based discovery mechanism leveraging DNS SRV and TXT records to publish agent endpoints and capabilities.

Central Registry + Peer Awareness: Like how we combine Google searches with recommendations from friends, ACDP uses both a central registry (for searchability) and peer-to-peer gossip (for resilience). LLM-based agents register themselves, discover peers, and collaborate on tasks.

HTTPS for Everything: All communication happens over standard HTTPS, ensuring compatibility with existing infrastructure and security tools.

Let's take a look at how these pieces could fit together in practice:

Agent Registration Process

When a new agent comes online, it performs two key registration steps:

DNS Registration: The agent registers SRV and TXT records in DNS, which advertise:

The agent's hostname and port (SRV record)

The agent's capabilities and description (TXT record)

Registry Registration: The agent sends its full metadata to the central registry including:

Basic information (ID, name, description)

Detailed capabilities list

Interface details (REST endpoints)

Model information

Endpoints for specific features

After registration, the agent maintains a heartbeat connection with the registry, sending periodic updates to confirm it's still active. If the registry loses track of the agent, the agent will automatically re-register during its next heartbeat attempt.
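The register-then-heartbeat loop can be sketched in a few lines. This is a minimal illustration, not the PoC code: the in-memory registry, the `send_heartbeat` helper, and the metadata field names are all assumptions made for the example.

```python
import time


class InMemoryRegistry:
    """Stand-in for the central registry (hypothetical, for illustration)."""

    def __init__(self):
        self.agents = {}  # agent_id -> {"metadata": ..., "last_seen": ...}

    def register(self, metadata):
        self.agents[metadata["id"]] = {"metadata": metadata, "last_seen": time.time()}

    def heartbeat(self, agent_id):
        """Return True if the agent is known, False if it must re-register."""
        entry = self.agents.get(agent_id)
        if entry is None:
            return False
        entry["last_seen"] = time.time()
        return True


def send_heartbeat(registry, metadata):
    """Send one heartbeat, re-registering if the registry lost track of us."""
    if registry.heartbeat(metadata["id"]):
        return "alive"
    registry.register(metadata)
    return "re-registered"


# Assumed metadata shape; the field names are illustrative only.
metadata = {
    "id": "translator.agents.example.com",
    "name": "Polyglot",
    "capabilities": ["translation", "summarization"],
    "interfaces": {"rest": "https://translator-1.example.com:8000"},
}

registry = InMemoryRegistry()
registry.register(metadata)
first = send_heartbeat(registry, metadata)   # registry knows the agent
registry.agents.clear()                      # simulate the registry losing state
second = send_heartbeat(registry, metadata)  # agent re-registers itself
```

In a real deployment the registry calls would be HTTPS requests and the heartbeat would run on a timer; the point here is only the "unknown agent triggers re-registration" behaviour.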

Agent Discovery Process

The Discovery Service component provides a unified way to discover other agents through multiple methods:

Registry-based Discovery: Agents can query the central registry to:

Find a specific agent by ID

Discover agents with specific capabilities

Search for agents matching multiple criteria

DNS-based Discovery: Agents can resolve other agents using DNS queries:

SRV lookup to find host and port

TXT lookup to get capabilities and description

Cache-based Optimization: The Discovery Service maintains a local cache of discovered agents to reduce network traffic and improve performance.
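To make the caching idea concrete, here is a small sketch of a time-limited cache sitting in front of an ordered list of lookup functions. The `DiscoveryCache` class, the `discover` function, and the fake lookups are hypothetical names invented for this example; the TTL value is an arbitrary assumption.

```python
import time


class DiscoveryCache:
    """Tiny TTL cache keyed by capability (hypothetical helper)."""

    def __init__(self, ttl_seconds=300, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}  # capability -> (expires_at, agents)

    def get(self, capability):
        entry = self._entries.get(capability)
        if entry is None or entry[0] <= self.clock():
            self._entries.pop(capability, None)  # drop expired entries
            return None
        return entry[1]

    def put(self, capability, agents):
        self._entries[capability] = (self.clock() + self.ttl, agents)


def discover(capability, cache, lookups):
    """Check the cache, then try each lookup callable in order."""
    cached = cache.get(capability)
    if cached is not None:
        return cached
    for lookup in lookups:
        agents = lookup(capability)
        if agents:
            cache.put(capability, agents)
            return agents
    return []


calls = []


def fake_registry(capability):
    calls.append("registry")
    return []  # pretend the registry has no match


def fake_dns(capability):
    calls.append("dns")
    return ["translator.agents.example.com"]


cache = DiscoveryCache(ttl_seconds=60)
first = discover("translation", cache, [fake_registry, fake_dns])
second = discover("translation", cache, [fake_registry, fake_dns])  # from cache
```

The second call never touches the registry or DNS, which is the traffic saving the cache is meant to provide.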

The implementation follows a fallback pattern: check the cache first, then try the registry, then fall back to DNS, and finally ask peers if needed. The snippet below shows the registry, DNS, and peer steps:

def find_agent():
    # registry_client, dns_resolver and ask_peers_for_agents are assumed to be
    # provided by the surrounding agent implementation.
    name = "_llm-agent._tcp.translator.agents.example.com"

    # Try registry first
    agents = registry_client.search(capability="translation")

    # If none found, try DNS lookup
    if not agents:
        srv_records = dns_resolver.query(name, "SRV")
        txt_records = dns_resolver.query(name, "TXT")
        # Parse records...

    # If still nothing, ask peers
    if not agents:
        agents = ask_peers_for_agents()

    return agents

When our accountant's AI agent needs to find an agent with the appropriate capabilities, it can:

Query the registry: "Show me agents with translation capability"

Directly look up DNS:

_llm-agent._tcp.translator.agents.example.com

Ask a peer agent: "Do you know any good translation agents?"

This hybrid approach means that if one discovery method fails, the others can pick up the slack. These discovery, communication and collaboration capabilities let agents share their abilities, which benefits every agent in the network. So, the accounting agent will find the translation support it needs and possibly collaborate with other agents to help them finish their work efficiently.

A discovery request would look like this:
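As a minimal sketch, assuming the registry's search endpoint lives at the placeholder URL used in this post's curl example and returns JSON with an "agents" list, a capability query and its response handling might look like the following. The helper names are hypothetical.

```python
import json
from urllib.parse import urlencode

REGISTRY_URL = "https://registry.example.com/agents"  # placeholder host


def build_discovery_url(**criteria):
    """Encode search criteria as a query string on the registry endpoint."""
    return f"{REGISTRY_URL}?{urlencode(criteria)}"


def parse_discovery_response(body):
    """Return (id, capabilities) pairs from a registry JSON response."""
    payload = json.loads(body)
    return [(a["id"], a["capabilities"]) for a in payload.get("agents", [])]


url = build_discovery_url(capability="translation")

# Sample response body, shaped like the registry output shown in this post.
sample_body = json.dumps({
    "agents": [{
        "id": "translator.agents.example.com",
        "name": "Polyglot",
        "capabilities": ["translation", "summarization"],
    }]
})
matches = parse_discovery_response(sample_body)
```

Sending the request itself is left out on purpose; any HTTPS client would do.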

Peer-to-Peer Awareness

The Peer Manager component implements a decentralized peer discovery mechanism through "gossiping":

Gossip Protocol: At regular intervals, each agent:

Selects a random subset of known peers

Exchanges peer lists with them

Discovers new peers through these exchanges

Health Checking: Agents maintain information about peer health:

They periodically check if peers are responsive

They update peer status (healthy, unhealthy, unknown)

They remove stale peers that haven't been seen for too long

Peer Exchange: When two agents gossip, they:

Send a list of their known peers to each other

Process any new peers received

Resolve details of new peers via the Discovery Service

This peer-to-peer approach ensures agents can discover new peers even if the central registry is unavailable, creating a resilient network.
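The gossip steps above can be sketched as follows. This is an illustration under assumptions, not the PoC's Peer Manager: the class layout, the fanout of 3, and the 600-second staleness threshold are all invented for the example.

```python
import random
import time


class PeerManager:
    """Minimal gossip bookkeeping; names and defaults are assumptions."""

    def __init__(self, self_id, stale_after=600):
        self.self_id = self_id
        self.stale_after = stale_after
        self.peers = {}  # peer_id -> timestamp when we learned of / saw it

    def pick_gossip_targets(self, fanout=3):
        """Select a random subset of known peers to gossip with."""
        known = sorted(self.peers)
        return random.sample(known, min(fanout, len(known)))

    def merge(self, peer_ids, now=None):
        """Process a received peer list and return the newly learned peers."""
        now = time.time() if now is None else now
        new = [p for p in peer_ids if p != self.self_id and p not in self.peers]
        for peer_id in new:
            self.peers[peer_id] = now
        return new

    def prune_stale(self, now=None):
        """Remove peers that have not been seen for too long."""
        now = time.time() if now is None else now
        stale = [p for p, seen in self.peers.items() if now - seen > self.stale_after]
        for peer_id in stale:
            del self.peers[peer_id]
        return stale


def gossip(a, b, now=None):
    """Exchange peer lists between two agents in both directions."""
    a_view = list(a.peers) + [a.self_id]
    b_view = list(b.peers) + [b.self_id]
    return a.merge(b_view, now), b.merge(a_view, now)


a = PeerManager("agent-a")
b = PeerManager("agent-b")
a.merge(["agent-c"], now=0)
b.merge(["agent-d"], now=0)
learned_by_a, learned_by_b = gossip(a, b, now=100)
removed = a.prune_stale(now=650)  # agent-c was last seen at t=0
```

Resolving the details of newly learned peers would then go through the Discovery Service, which is omitted here.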

Agent Collaboration

The implementation includes a collaboration mechanism where agents can work together to answer questions:

Assistance Requests: When an agent receives a question, it can:

Select relevant peer agents based on capabilities

Send assistance requests to those peers

Gather responses from peers in parallel

Collaborative Responses: The primary agent:

Incorporates peer responses into its own context

Generates a final response that synthesizes the collective knowledge

Credits peer agents in the response

Agent Selection: Agents prioritize peers for collaboration based on:

Relevance of capabilities to the query

Health status (preferring responsive peers)

Recent activity

This collaborative approach allows the system to leverage specialized knowledge across different agents.
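One way to express the selection criteria is a simple sort key. The sketch below is an assumption about how the three factors could be combined (capability overlap first, then health, then recency); the `Peer` type and the health weights are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Peer:
    agent_id: str
    capabilities: frozenset
    status: str        # "healthy", "unhealthy" or "unknown"
    last_seen: float   # seconds since some reference time


# Assumed weights: prefer responsive peers, tolerate unknown ones.
HEALTH_WEIGHT = {"healthy": 1.0, "unknown": 0.5, "unhealthy": 0.0}


def rank_peers(peers, needed, now, limit=3):
    """Order peers by capability overlap, then health, then recency."""
    needed = frozenset(needed)
    candidates = [p for p in peers if p.capabilities & needed]

    def sort_key(peer):
        overlap = len(peer.capabilities & needed)
        age = now - peer.last_seen
        return (-overlap, -HEALTH_WEIGHT.get(peer.status, 0.0), age)

    return sorted(candidates, key=sort_key)[:limit]


peers = [
    Peer("a", frozenset({"translation"}), "unhealthy", 990),
    Peer("b", frozenset({"translation", "summarization"}), "healthy", 900),
    Peer("c", frozenset({"translation"}), "healthy", 950),
    Peer("d", frozenset({"chess"}), "healthy", 999),
]
ranked = rank_peers(peers, {"translation", "summarization"}, now=1000)
ranked_ids = [p.agent_id for p in ranked]
```

Peer "d" is dropped because it has no relevant capability, and the unhealthy peer "a" sorts last even though it was seen most recently.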

Since DNS is somewhat key to the discovery mechanism, we should look at some actual DNS records for an agent:

; SRV record defines where to find the agent 
_llm-agent._tcp.translator.agents.example.com. IN SRV 10 10 8000 translator-1.example.com.  

; TXT record describes the agent's capabilities 
_llm-agent._tcp.translator.agents.example.com. IN TXT "ver=1.0" "caps=translation,summarization" "desc=Polyglot translator specialized in technical content for accountants"

; A record resolves the hostname to an IP 
translator-1.example.com. IN A 192.168.1.100

These records tell us:

The agent runs on host translator-1.example.com at port 8000

It has translation and summarization capabilities

It specializes in technical content (for accountants)

This structure can be queried with standard DNS tools. For example, to find the translator's address:

$ dig _llm-agent._tcp.translator.agents.example.com SRV
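Once the SRV and TXT answers come back, an agent has to turn them into something usable. Here is a small sketch that parses the record data shown above; it works on plain strings rather than a live resolver, and the function names are hypothetical.

```python
def parse_srv(rdata):
    """Split SRV rdata 'priority weight port target' into a dict."""
    priority, weight, port, target = rdata.split()
    return {
        "priority": int(priority),
        "weight": int(weight),
        "port": int(port),
        "host": target.rstrip("."),  # drop the trailing root dot
    }


def parse_txt(strings):
    """Turn TXT strings like 'caps=a,b' into a metadata dict."""
    meta = {}
    for item in strings:
        key, sep, value = item.partition("=")
        if sep:  # ignore strings that are not key=value pairs
            meta[key] = value
    if "caps" in meta:
        meta["caps"] = meta["caps"].split(",")
    return meta


srv = parse_srv("10 10 8000 translator-1.example.com.")
txt = parse_txt([
    "ver=1.0",
    "caps=translation,summarization",
    "desc=Polyglot translator specialized in technical content for accountants",
])
endpoint = f"https://{srv['host']}:{srv['port']}"
```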

Security Considerations

1. DNS Security: DNSSEC provides cryptographic verification of DNS records, preventing spoofing.

2. Authentication Layers:

Registry updates require authentication (API keys or digital signatures)

Agent-to-agent communication can use mutual TLS for bidirectional verification

Access controls determine which agents can see or interact with others

OAuth 2.0 should be easily implementable

3. Tiered Trust Model:

DNS provides baseline trust (domain ownership verification)

TLS certificates verify identity

Peer reputation tracks trustworthiness over time

4. Private Deployments:

Private Registries: Organizations can run their own registry servers internally.

Split DNS: Different DNS responses can be provided for internal vs. external queries.

Controlled Discovery: Access to registry and DNS can be restricted to authorized clients.

Security Measures: Authentication, authorization, and encryption can be applied to all communications.

These features make the system suitable for enterprise, government, healthcare, and financial deployments where privacy and security are paramount.

For example, if Agent A queries Agent B for help, it first verifies Agent B's identity through TLS, checks if Agent B has the necessary capabilities, and may require additional authentication before sharing sensitive data.
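The Agent A / Agent B check can be sketched with Python's standard `ssl` module. This is an assumption about how a client might be set up, not a description of the PoC: the function names and the optional certificate paths are hypothetical, and presenting a client certificate only yields mutual TLS if the server is configured to require one.

```python
import ssl


def build_client_context(ca_file=None, cert_file=None, key_file=None):
    """Create a TLS context that verifies the peer agent's certificate.

    Passing cert_file/key_file also presents our own certificate, which is
    what the other side needs in order to do mutual TLS.
    """
    context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    if cert_file:
        context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return context


def may_share(peer_caps, required_cap, peer_authenticated):
    """Share sensitive data only with an authenticated, capable peer."""
    return peer_authenticated and required_cap in peer_caps


context = build_client_context()
```

The default context already requires a valid certificate and checks the hostname, so Agent A only has to add its own certificate and the capability check on top.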

Private Deployments: Keep Your Agents to Yourself

For enterprises, healthcare providers, and government institutions, ACDP offers additional privacy controls:

Private Registry: Run your own internal registry that only indexes authorized agents:

AgentRegistry.internal.company.com
├── hr-assistant (capability: hr-policies)
├── code-reviewer (capability: code-review)
└── security-scanner (capability: vulnerability-detection)

Split Horizon DNS: Different DNS responses for internal vs. external queries:

; Internal DNS sees this 
_llm-agent._tcp.hr-bot.company.internal. IN SRV 0 0 8000 hr-bot.company.internal.  

; External DNS query returns nothing or an error

Network Isolation: Keep agents on a private network, accessible only via VPN or internal access points.

Enterprise AI Ecosystem: A large corporation deploys several specialized agents on their internal network:

An HR Virtual Assistant resides on the corporate intranet with the DNS entry _llm-agent._tcp.hr-bot.company.internal. Employees can discover it through the internal agent directory, but external parties have no visibility into this agent. It handles onboarding questions, benefits inquiries, and policy clarifications, while keeping sensitive HR data within company boundaries.

An IT Support Agent registered as it-assist.corp.internal helps staff troubleshoot technical issues and automate support tickets. The split DNS configuration ensures only devices on the corporate network (or VPN) can find the agent's address, speeding up internal support while preventing outsiders from accessing or even knowing about the support bot.

Intelligence agencies could implement ACDP within classified networks for secure agent collaboration:

; Example private DNS zone in a classified network 
_llm-agent._tcp.intel-analyst.siprnet.gov. IN SRV 0 0 443 analyst-agent.classified.gov. 
_llm-agent._tcp.intel-analyst.siprnet.gov. IN TXT "caps=intel-analysis,pattern-recognition" "desc=Classified intelligence analysis assistant"

Intelligence Processing Agents that analyze satellite imagery, intelligence reports, and signal intercepts likely operate exclusively on networks like SIPRNet. For example, a Defense Intelligence Agency analytical AI might be registered in a private directory so that analysts' workstations on the classified network can discover the "Intel Analyst AI" service, while the agent remains invisible outside that secure network.
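The split-horizon behaviour that all of these private deployments rely on boils down to "answer based on where the question came from". A toy sketch, assuming a 10.0.0.0/8 internal range and a one-record zone (both invented for the example; real deployments would configure this in the DNS server itself):

```python
import ipaddress

INTERNAL_NETWORKS = [ipaddress.ip_network("10.0.0.0/8")]  # assumed corporate range

INTERNAL_ZONE = {
    "_llm-agent._tcp.hr-bot.company.internal.": "0 0 8000 hr-bot.company.internal.",
}


def resolve_srv(name, client_ip):
    """Answer only clients on an internal network; others get no record."""
    address = ipaddress.ip_address(client_ip)
    if any(address in network for network in INTERNAL_NETWORKS):
        return INTERNAL_ZONE.get(name)
    return None


hr_bot = "_llm-agent._tcp.hr-bot.company.internal."
internal_answer = resolve_srv(hr_bot, "10.1.2.3")
external_answer = resolve_srv(hr_bot, "203.0.113.7")
```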

Public Agent Ecosystems: Agents for Everyone

For public agent ecosystems, ACDP enables open discovery while maintaining security:

Public Registry: A searchable directory of available agents and their capabilities, similar to an app store.

Verified Domains: DNS-based verification ensures agents come from legitimate sources.

Capability Matching: Find the right agent for any task based on its advertised capabilities:

$ curl "https://registry.example.com/agents?capability=translation"
{
  "agents": [
    {
      "id": "translator.agents.example.com",
      "name": "Polyglot",
      "capabilities": ["translation", "summarization"],
      "description": "Specialized in technical content for..."
    }
  ]
}

Imagine a researcher working on climate modeling who needs specialized analysis:

Their primary research assistant agent discovers three specialized agents through the registry:

climate-data.science-agents.org (data analysis)

viz-specialist.science-agents.org (visualization)

paper-formatter.science-agents.org (academic formatting)

The agents collaborate through a chain of specialized tasks

Each agent registers its expertise publicly while maintaining security through authentication:

_llm-agent._tcp.climate-data.science-agents.org. IN TXT "caps=data-analysis,time-series,climate" "auth=required"

Extending Capabilities with Tool Discovery

ACDP can do more than connect agents – it also helps them discover tools and data sources through integration with the Model Context Protocol (MCP).

Once an agent has discovered an MCP server, it can connect to it, list the available tools, and access them through a standardized interface.

This integration allows agents to:

Discover external tools

Access structured data sources

Find reusable prompt templates

For example, a hospital implements ACDP with MCP to safely integrate AI agents with electronic health record systems:

The hospital sets up an MCP server that provides access to patient data as read-only resources:

_mcp._tcp.health-tools.hospital.local. IN SRV 0 0 443 ehr-gateway.hospital.local. 
_mcp._tcp.health-tools.hospital.local. IN TXT "resources=labs,imaging,notes" "auth=oauth2"

A doctor's AI assistant discovers this MCP server through DNS and the hospital's private registry.

When the doctor asks: "Summarize this patient's recent lab results," the assistant:

Authenticates with the MCP server (enforcing HIPAA compliance)

Uses resources/read to retrieve authorized lab data

Generates the summary using only data the doctor is authorized to access

The entire interaction happens within the hospital's secure environment, with the MCP server enforcing role-based access controls
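The server-side access check in this flow might be sketched like this. Everything specific here is an assumption made for illustration: the role-to-resource mapping, the `ehr://` URI scheme, and the error code are invented, and a real gateway would derive the role from the authenticated OAuth2 token rather than take it as an argument.

```python
ROLE_RESOURCES = {  # hypothetical role-to-resource mapping
    "physician": {"labs", "imaging", "notes"},
    "billing": set(),
}


def handle_read(request, role):
    """Serve a resources/read call only if the caller's role allows it."""
    uri = request["params"]["uri"]
    resource = uri.rsplit("/", 1)[-1]  # last path segment names the resource
    if resource not in ROLE_RESOURCES.get(role, set()):
        return {
            "jsonrpc": "2.0",
            "id": request["id"],
            "error": {"code": -32000, "message": "access denied"},
        }
    return {
        "jsonrpc": "2.0",
        "id": request["id"],
        "result": {"contents": [{"uri": uri, "text": "(lab data placeholder)"}]},
    }


request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "resources/read",
    "params": {"uri": "ehr://patients/12345/labs"},  # invented URI scheme
}
allowed = handle_read(request, "physician")
denied = handle_read(request, "billing")
```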

Sample Security Operations Use Case: Integrated Threat Detection and Response

Consider a financial institution's Security Operations Center (SOC):

Private Security Infrastructure: The bank maintains several specialized security agents behind their firewall:

; Internal DNS records for security agents 
_llm-agent._tcp.responder.bank-sec.internal. IN SRV 0 0 443 ir.bank-sec.internal. 
_llm-agent._tcp.responder.bank-sec.internal. IN TXT "caps=log-analysis,threat-hunting,incident-response" "desc=Advanced threat detection and investigation agent"  

_llm-agent._tcp.forensics.bank-sec.internal. IN SRV 0 0 443 forensics.bank-sec.internal. 
_llm-agent._tcp.forensics.bank-sec.internal. IN TXT "caps=memory-forensics,disk-forensics,network-forensics" "desc=Digital forensics assistant"

Bridging to Public Intelligence: When investigating a suspicious alert, these private agents leverage public threat intelligence through authenticated ACDP connections:

; Public DNS records for threat intelligence agents 
_llm-agent._tcp.threat-intel.security-alliance.org. IN SRV 0 0 443 osint.security-alliance.org. 
_llm-agent._tcp.threat-intel.security-alliance.org. IN TXT "caps=ioc-lookup,threat-actor-profiles,vulnerability-data" "auth=required"

Here's how this hybrid approach might handle a potential security incident:

A notification is generated by their Identity Management platform and alerts the Incident Response Agent

The agent:

Analyzes internal logs

Discovers/Extracts indicators (IP addresses, UA strings, UPNs, emails, domains, etc.)

Needs additional context on these indicators

Using ACDP, it discovers and authenticates to public threat intelligence agents

Based on the combined internal and external intelligence, it may:

Collaborate with the Forensics Agent for deeper investigation

Generate automated response recommendations

Create a detailed report for the security team
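The indicator extraction step in this flow is easy to picture in code. A rough sketch, assuming plain-text log lines and covering only IPv4 addresses and email/UPN-style strings; the patterns are deliberately simple and the log format is made up for the example.

```python
import re

IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


def extract_indicators(log_lines):
    """Collect unique IPv4 addresses and email/UPN-style strings from logs."""
    ips, emails = set(), set()
    for line in log_lines:
        ips.update(IPV4.findall(line))
        emails.update(EMAIL.findall(line))
    return {"ips": sorted(ips), "emails": sorted(emails)}


logs = [
    "2025-01-01T10:00:00Z login failed user=alice@example.com src=203.0.113.7",
    "2025-01-01T10:00:05Z login ok user=alice@example.com src=198.51.100.23",
]
indicators = extract_indicators(logs)
```

The resulting lists are what the Incident Response Agent would hand to a threat intelligence agent for lookup.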

The system also leverages MCP for discovering and using enterprise security tools, including integration with common security platforms:

; MCP server for security operations tools 
_mcp._tcp.security-tools.bank-sec.internal. IN SRV 0 0 443 security-ops.bank-sec.internal.
_mcp._tcp.security-tools.bank-sec.internal. IN TXT "tools=firewall-control,endpoint-isolation,malware-scan" "auth=mTLS"  

; MCP server for Microsoft security tools integration 
_mcp._tcp.ms-security.bank-sec.internal. IN SRV 0 0 443 security-tools.bank-sec.internal. 
_mcp._tcp.ms-security.bank-sec.internal. IN TXT "tools=ms-graph,defender-endpoint,sentinel" "auth=mTLS"

The agents can:

Discover security tools via the MCP server:

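// Request to list available tools
// ("category" is an illustrative filter; the MCP specification itself only defines pagination for this call)
{
  "jsonrpc": "2.0",
  "method": "tools/list",
  "params": {"category": "response"},
  "id": 1
}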

Execute authorized actions through these tools:

// Request to isolate compromised endpoint
{
  "jsonrpc": "2.0",
  "method": "tools/call",
  "params": {
    "name": "endpoint-isolation",
    "arguments": {
      "hostname": "compromised-workstation",
      "reason": "Suspected malware infection",
      "duration": "until_manual_review"
    }
  },
  "id": 2
}

The ability to maintain security boundaries while enabling authenticated cross-boundary collaboration makes ACDP particularly valuable in cybersecurity contexts, where both protecting internal systems and leveraging external intelligence are critical.

For example, the Forensics Agent might discover that it can query Windows Event Logs but not perform active responses, while the Incident Response Agent has authorization to isolate endpoints when suspicious activity passes a certain threshold.

The beauty of this approach is that new security tools can be added to the MCP server without modifying the agents themselves. If the bank adds a new security tool, it simply registers the tool, and all authorized agents can immediately discover and leverage its capabilities.
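To illustrate the authorization split between agents, here is a sketch of a per-agent policy check in front of a tools/call request. The policy table and the `build_tool_call` helper are hypothetical; in practice the MCP server would enforce this, and the agent names are just labels for the example.

```python
AGENT_TOOLS = {  # hypothetical per-agent authorization policy
    "incident-response": {"endpoint-isolation", "malware-scan"},
    "forensics": {"malware-scan"},
}


def build_tool_call(agent, tool, arguments, request_id):
    """Return a JSON-RPC tools/call request, or raise if the agent lacks access."""
    if tool not in AGENT_TOOLS.get(agent, set()):
        raise PermissionError(f"{agent} is not authorized to call {tool}")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


call = build_tool_call(
    "incident-response",
    "endpoint-isolation",
    {"hostname": "compromised-workstation", "reason": "Suspected malware infection"},
    request_id=2,
)

# Registering a new tool is a policy change only; no agent code is touched.
AGENT_TOOLS["forensics"].add("event-log-query")
new_call = build_tool_call("forensics", "event-log-query", {"host": "ws-042"}, 3)
```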

Conclusion

By leveraging familiar technologies while adding agent-specific discovery mechanisms, we're creating a foundation that's both robust and accessible. If this protocol were expanded and improved upon, I’d envision:

Standard capability taxonomies for consistent agent classification

Trust networks that help identify reliable collaborators

Additional security layers for the various domains and use cases

Specialized discovery mechanisms for different domains (cybersecurity, healthcare, finance, etc.)

Enhanced collaboration/communication patterns beyond simple request-response. See Google's A2A

Shared History/Memory management capabilities

Rate limiting and usage policies especially for open-access agents. Each agent’s metadata can include rate limits (e.g. “max 100 requests per hour from a single IP”) or licensing terms
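As one example from this list, the "max 100 requests per hour from a single IP" policy could be enforced with a sliding window. This is a sketch of one possible approach, not part of the protocol; the class name is hypothetical and the caller supplies the timestamps.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per client within window_seconds."""

    def __init__(self, max_requests=100, window_seconds=3600):
        self.max_requests = max_requests
        self.window = window_seconds
        self._hits = defaultdict(deque)  # client IP -> request timestamps

    def allow(self, client_ip, now):
        hits = self._hits[client_ip]
        while hits and now - hits[0] >= self.window:
            hits.popleft()  # forget requests that left the window
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


# Small limit so the behaviour is easy to follow.
limiter = SlidingWindowLimiter(max_requests=2, window_seconds=3600)
results = [limiter.allow("203.0.113.7", t) for t in (0, 10, 20, 3600)]
```

The third request is rejected because two requests are already inside the window; by t=3600 the first one has aged out, so the fourth is allowed again.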

I don't know that this approach makes sense in closed ecosystems or for single platforms running agents. But as inter-agent communication and discovery requirements become commonplace, we'll benefit from having an easily implementable and scalable approach, whether it's this or something entirely different.

Regardless of whether this approach makes sense to you or not, if you're building an ecosystem of enterprise agents or contributing to an open network of public agents, it's worth thinking about this up front.

Take a look at the full write up of the protocol and sample code on GitHub. I’d be curious to hear your thoughts on it.