Agent-D API Integration Guide

Agent-D provides a FastAPI wrapper, allowing you to send commands via HTTP and receive streaming results. This guide covers the available API endpoints, configuration options, and best practices for using the Agent-D API.

API Endpoints

Execute Task

Send a POST request to the /execute_task endpoint to run a task. For example, using cURL:

curl --location 'http://127.0.0.1:8000/execute_task' \
--header 'Content-Type: application/json' \
--data '{
    "command": "go to espn, look for soccer news, report the names of the most recent soccer champs",
    "llm_config": {"planner_agent": {...}, "browser_nav_agent": {...}},
    "planner_max_chat_round": 50,
    "browser_nav_max_chat_round": 10,
    "clientid": "optional-client-id",
    "request_originator": "optional-request-originator"
}'

command: The command related to web navigation to execute.
llm_config: Optional. Configuration for planner and browser navigation agents.
planner_max_chat_round: Optional. Maximum chat rounds for the planner agent (default: 50).
browser_nav_max_chat_round: Optional. Maximum chat rounds for the browser navigation agent (default: 10).
clientid: Optional. Client identifier.
request_originator: Optional. ID of the request originator.
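
The same request can also be sent from Python. The sketch below uses only the standard library. The helper names (build_execute_task_payload, stream_task) are illustrative and not part of Agent-D, and the code assumes the server streams its results as line-delimited text, which this guide does not specify.

```python
import json
import urllib.request


def build_execute_task_payload(command, llm_config=None,
                               planner_max_chat_round=50,
                               browser_nav_max_chat_round=10,
                               clientid=None, request_originator=None):
    """Build the JSON body for /execute_task, omitting unset optional fields."""
    payload = {
        "command": command,
        "planner_max_chat_round": planner_max_chat_round,
        "browser_nav_max_chat_round": browser_nav_max_chat_round,
    }
    optional = {
        "llm_config": llm_config,
        "clientid": clientid,
        "request_originator": request_originator,
    }
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload


def stream_task(payload, base_url="http://127.0.0.1:8000"):
    """POST the payload and yield each non-empty line of the response.

    Assumption: the response is line-delimited text; adjust the parsing
    if the server frames its stream differently.
    """
    request = urllib.request.Request(
        f"{base_url}/execute_task",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        for raw_line in response:
            line = raw_line.decode("utf-8").strip()
            if line:
                yield line
```

With a server running at the address above, iterating over stream_task(build_execute_task_payload("your command")) would yield each chunk as it arrives.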

Set Credentials

Before executing tasks that require authentication, you must set credentials using the secure API endpoint:

POST /api/set_credentials
Content-Type: application/json

Request body:

{
    "username": "your_username",
    "password": "your_password",
    "client_secret": "your-api-client-secret"
}

username: The username credential.
password: The password credential.
client_secret: Required. Client secret for authentication.
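
As a rough illustration, the request above could be prepared and sent from Python as follows. The function names are hypothetical, and the base URL is assumed to match the address used in the cURL example; adjust it to your deployment.

```python
import json
import urllib.request


def build_set_credentials_request(username, password, client_secret,
                                  base_url="http://127.0.0.1:8000"):
    """Prepare (but do not send) the POST request for /api/set_credentials."""
    body = {
        "username": username,
        "password": password,
        "client_secret": client_secret,
    }
    return urllib.request.Request(
        f"{base_url}/api/set_credentials",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def set_credentials(username, password, client_secret, **kwargs):
    """Send the request and return the HTTP status code.

    urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
    """
    request = build_set_credentials_request(
        username, password, client_secret, **kwargs)
    with urllib.request.urlopen(request) as response:
        return response.status
```

Splitting the request construction from the network call keeps the payload easy to inspect and test without a running server.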

LLM Configuration

Agent-D supports multiple LLM providers and models through a flexible configuration system. The API accepts custom LLM configurations for both the planner and browser navigation agents.

Supported LLM Providers

OpenAI
- Suggested Models: GPT-4o, GPT-4-Turbo
- Requires OpenAI API key
Anthropic
- Suggested Models: Claude-3-Opus, Claude-3-Sonnet, Claude-3-Haiku
- Requires Anthropic API key
Mistral
- Suggested Models: Mistral-Large, Mistral-Medium, Mistral-Small
- Requires Mistral API key and base URL
Llama (via Groq)
- Suggested Models: Llama-3.1-70b-versatile
- Requires Groq API key

Configuration Structure

The LLM configuration can be provided in the API request body using the following structure:

{
    "command": "your automation command",
    "llm_config": {
        "planner_agent": {
            "model_name": "string",
            "model_api_key": "string",
            "model_base_url": "string",
            "model_api_type": "string",
            "system_prompt": "string",
            "llm_config_params": {
                "temperature": number,
                "top_p": number,
                "cache_seed": null,
                "seed": number
            }
        },
        "browser_nav_agent": {
            // Same structure as planner_agent
        }
    }
}
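
When the request is assembled in Python, a small helper can keep both agent configurations consistent with this structure. This is a minimal sketch: make_agent_config and make_llm_config are hypothetical names, and it assumes that model_base_url, model_api_type, system_prompt, and seed may be left out when unset, which the guide's example suggests but does not state.

```python
def make_agent_config(model_name, model_api_key, model_base_url=None,
                      model_api_type=None, system_prompt=None,
                      temperature=0.0, top_p=0.001, seed=None):
    """Return one agent's configuration dict in the structure shown above."""
    params = {"temperature": temperature, "top_p": top_p, "cache_seed": None}
    if seed is not None:
        params["seed"] = seed

    config = {"model_name": model_name, "model_api_key": model_api_key}
    optional = {
        "model_base_url": model_base_url,
        "model_api_type": model_api_type,
        "system_prompt": system_prompt,
    }
    # Assumption: unset optional fields can be omitted from the request.
    config.update({k: v for k, v in optional.items() if v is not None})
    config["llm_config_params"] = params
    return config


def make_llm_config(planner_agent, browser_nav_agent):
    """Combine the two agent configs under the keys the API expects."""
    return {
        "planner_agent": planner_agent,
        "browser_nav_agent": browser_nav_agent,
    }
```

The result of make_llm_config can be placed under the "llm_config" key of the request body.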

Example Configurations

OpenAI GPT-4

{
    "command": "1. Navigate to the URL https://comfortdentalsid.curvehero.com/#/. 2. Check if redirected to a login page. 3. If on a login page, use the enter secret credentials tool to input the username and password. 4. Verify successful login or capture any error message if login fails.",
    "llm_config": {
        "planner_agent": {
            "model_name": "gpt-4",
            "model_api_key": "sk-...",
            "model_base_url": "https://...",
            "system_prompt": "You are a web automation task planner....",
            "llm_config_params": {
                "temperature": 0.0,
                "top_p": 0.001,
                "seed": 12345
            }
        },
        "browser_nav_agent": {
            "model_name": "gpt-4",
            "model_api_key": "sk-...",
            "model_base_url": "https://...",
            "system_prompt": "You will perform web navigation tasks with the functions that you have...\nOnce a task is completed, confirm completion with ##TERMINATE TASK##.",
            "llm_config_params": {
                "temperature": 0.0,
                "top_p": 0.001,
                "seed": 12345
            }
        }
    }
}

Secure Credential Management

Agent-D provides a secure mechanism for handling automated logins through its credential management system. This is particularly useful for automating tasks that require authentication.

How Credential Management Works

Storage:
- Credentials are stored securely in environment variables within the server's process memory.
- Not written to disk or persistent storage.
- Isolated to the specific server instance.
- Cleared when the server restarts.
Usage Flow:
- When a task requires authentication, the browser navigation agent automatically:
  1. Detects login forms.
  2. Retrieves credentials from environment variables.
  3. Fills in appropriate fields.
  4. Handles form submission.
- None of these steps expose credentials in logs or responses.
Security Features:
- Client authentication using constant-time comparison.
- Credentials never included in error messages or logs.
- Environment variable storage is intended to reduce credential exposure in crash dumps.
- Automatic credential clearing on server restart.
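
To make the constant-time comparison and in-process storage concrete, here is a minimal sketch of how a server could implement them. It is not Agent-D's actual code: the function names and the AGENT_USERNAME and AGENT_PASSWORD variable names are hypothetical, and only API_CLIENT_SECRET is taken from this guide.

```python
import hmac
import os


def client_secret_is_valid(supplied_secret):
    """Compare the supplied secret to API_CLIENT_SECRET in constant time."""
    expected = os.environ.get("API_CLIENT_SECRET", "")
    if not expected:
        return False  # refuse all requests if no secret is configured
    # hmac.compare_digest avoids leaking how many leading bytes matched.
    return hmac.compare_digest(supplied_secret.encode("utf-8"),
                               expected.encode("utf-8"))


def store_credentials(username, password, supplied_secret):
    """Keep credentials in this process's environment only.

    AGENT_USERNAME and AGENT_PASSWORD are hypothetical variable names.
    Nothing is written to disk, so the values vanish when the process exits.
    """
    if not client_secret_is_valid(supplied_secret):
        return False
    os.environ["AGENT_USERNAME"] = username
    os.environ["AGENT_PASSWORD"] = password
    return True
```

Because os.environ changes affect only the running process and its children, restarting the server discards the stored values, which matches the behaviour described above.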

Limitations

Session Management:
- Credentials are valid only for the current server instance.
- Must be reset after server restarts.
- No persistent storage of credentials.
Concurrent Usage:
- Single set of credentials per server instance.
- Because only one set is stored, credentials may need to be reset before issuing a command that logs in to a different site.
Authentication Methods:
- Currently supports basic username/password authentication.
- No support for:
  - Multi-factor authentication
  - OAuth flows
  - SSO systems
  - Biometric authentication

Best Practices

Model Selection
- Use GPT-4 or Claude-3-Opus for complex navigation tasks.
- Consider Mistral or Llama for simpler tasks to optimize costs.
- Test different models to find the best balance for your use case.
Parameter Tuning
- Lower temperature (0.0-0.1) for consistent results.
- Higher temperature (0.2-0.7) for more creative problem-solving.
- Use seed values for reproducible results.
System Prompts
- Keep default system prompts unless you have specific requirements.
- Custom system prompts should maintain task completion signals (e.g., ##TERMINATE TASK##).
- Avoid overly complex prompts that might confuse the model.
API Keys
- Store API keys securely.
- Use environment variables when possible.
- Rotate keys regularly following security best practices.
Environment Configuration
- Set environment variables like API_CLIENT_SECRET, HOST, PORT, and CONTAINER_ID for proper application configuration.
- Ensure these variables are securely managed and not exposed in logs or error messages.
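
Two of these practices are easy to enforce in client code. The sketch below reads an API key from an environment variable without ever echoing its value, and checks that a custom system prompt still contains the completion signal. Both helper names are hypothetical, and the variable name you pass to load_api_key is your own choice.

```python
import os


def load_api_key(env_var):
    """Read an API key from the environment and fail loudly if it is missing.

    The error message names the variable but never includes a key value.
    """
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Environment variable {env_var} is not set")
    return key


def has_termination_signal(system_prompt, signal="##TERMINATE TASK##"):
    """Return True if a custom system prompt keeps the completion signal."""
    return signal in system_prompt
```

Running has_termination_signal on a custom prompt before sending it is a cheap guard against dropping the completion signal by accident.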

This guide gives an overview of the Agent-D API, its features, and best practices for integration and usage.
