AFMBridge API Documentation

This document provides detailed API documentation for AFMBridge endpoints.

Base URL

http://localhost:8080

Configure the base URL using environment variables HOST and PORT.

Endpoints

Health Check

GET /health

Returns the health status of the server.

Response

Status: 200 OK
Body: OK (plain text)

Example

curl http://localhost:8080/health

Chat Completions

POST /v1/chat/completions

OpenAI-compatible chat completions endpoint. Generates a model response for the given conversation.

Request

Headers

Content-Type: application/json
Authorization: Bearer <token> (if authentication is enabled)

Body

Field	Type	Required	Description
`model`	string	Yes	Model identifier (e.g., "gpt-4o")
`messages`	array	Yes	Array of message objects
`stream`	boolean	No	Enable streaming (SSE with text/event-stream)
`max_tokens`	integer	No	Maximum tokens to generate (default: 1024)
`temperature`	number	No	Sampling temperature 0.0-2.0 (default: 1.0)
`tools`	array	No	Array of tool definitions (Phase 3)
`tool_choice`	string/object	No	Control tool selection (Phase 3)

Message Object

Field	Type	Required	Description
`role`	string	Yes	Message role: "system", "user", "assistant", "tool"
`content`	string	Conditional	Message content (required except for assistant with tool_calls)
`tool_calls`	array	No	Tool calls made by assistant (Phase 3)
`tool_call_id`	string	Conditional	ID of tool call this message responds to (required for role="tool")
`name`	string	Conditional	Name of tool that produced this content (required for role="tool")

Response

Success - 200 OK

Field	Type	Description
`id`	string	Unique completion ID
`object`	string	Object type: "chat.completion"
`created`	integer	Unix timestamp
`model`	string	Model used
`choices`	array	Array of completion choices

Choice Object

Field	Type	Description
`index`	integer	Choice index (0-based)
`message`	object	Generated message
`finish_reason`	string	Reason for stopping: "stop", "tool_calls", etc.

Error Responses

Status	Reason
400	Bad Request (invalid JSON, invalid tool definition, no user message)
401	Unauthorized (invalid or missing API key)
503	Service Unavailable (model not available)

Tool Calling (Phase 3)

AFMBridge supports OpenAI-compatible tool calling, enabling the model to request execution of functions with structured arguments. Tool execution happens client-side following the OpenAI pattern.

Tool Definition Schema

Tools are defined using JSON Schema to specify function signatures:

{
  "type": "function",
  "function": {
    "name": "get_weather",
    "description": "Get current weather for a location",
    "parameters": {
      "type": "object",
      "properties": {
        "location": {
          "type": "string",
          "description": "City name"
        },
        "unit": {
          "type": "string",
          "enum": ["celsius", "fahrenheit"]
        }
      },
      "required": ["location"]
    }
  }
}

Tool Calling Flow

Client sends request with tools - Include tool definitions in tools array
Model decides to use tools - Returns finish_reason: "tool_calls" with tool call details
Client executes tools - Run the requested functions locally
Client submits results - Send new request with tool messages containing results
Model generates final response - Returns finish_reason: "stop" with answer

Note: When stream: true is set with tools, the server automatically falls back to non-streaming responses, as Apple FoundationModels does not yet support streaming tool calls.

Examples

Basic Request

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'

Basic Response

{
  "id": "chatcmpl-a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d",
  "object": "chat.completion",
  "created": 1734678901,
  "model": "apple-afm-on-device",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ]
}

Request with System Message

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "system", "content": "You are a helpful geography tutor."},
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'

Request with Parameters

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "user", "content": "Write a haiku about programming"}
    ],
    "max_tokens": 50,
    "temperature": 0.7
  }'

Tool Calling Request

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "user", "content": "What is the weather in Boston?"}
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get current weather for a location",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {"type": "string", "description": "City name"}
            },
            "required": ["location"]
          }
        }
      }
    ]
  }'

Tool Calling Response

{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1734678901,
  "model": "apple-afm-on-device",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "I'll check the weather for you.",
        "tool_calls": [
          {
            "id": "call_abc123",
            "type": "function",
            "function": {
              "name": "get_weather",
              "arguments": "{\"location\":\"Boston\"}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ]
}

Multi-Turn Tool Conversation

After receiving tool calls, execute them locally and submit results:

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "user", "content": "What is the weather in Boston?"},
      {
        "role": "assistant",
        "content": null,
        "tool_calls": [
          {
            "id": "call_abc123",
            "type": "function",
            "function": {
              "name": "get_weather",
              "arguments": "{\"location\":\"Boston\"}"
            }
          }
        ]
      },
      {
        "role": "tool",
        "tool_call_id": "call_abc123",
        "name": "get_weather",
        "content": "Temperature: 72°F, Conditions: Sunny"
      }
    ]
  }'

Final Response with Tool Results

{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1734678902,
  "model": "apple-afm-on-device",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The weather in Boston is currently sunny with a temperature of 72°F."
      },
      "finish_reason": "stop"
    }
  ]
}

Messages

POST /v1/messages

Anthropic-compatible messages endpoint. Generates a model response for the given conversation.

Request

Headers

Content-Type: application/json
Authorization: Bearer <token> (if authentication is enabled)

Body

Field	Type	Required	Description
`model`	string	Yes	Model identifier (e.g., "claude-opus-4-5-20251101")
`max_tokens`	integer	Yes	Maximum tokens to generate
`messages`	array	Yes	Array of message objects
`system`	string	No	System prompt to set assistant behavior
`stream`	boolean	No	Enable streaming (SSE with text/event-stream)
`temperature`	number	No	Sampling temperature 0.0-1.0 (default: 1.0)

Message Object

Field	Type	Required	Description
`role`	string	Yes	Message role: "user" or "assistant"
`content`	string or array	Yes	Message content (string or content blocks)

Response

Success - 200 OK

Field	Type	Description
`id`	string	Unique message ID
`type`	string	Object type: "message"
`role`	string	Always "assistant"
`model`	string	Model used
`content`	array	Array of content blocks
`stop_reason`	string	Reason for stopping: "end_turn", "max_tokens"
`usage`	object	Token usage statistics

Content Block Object

Field	Type	Description
`type`	string	Block type: "text"
`text`	string	Generated text

Usage Object

Field	Type	Description
`input_tokens`	integer	Number of input tokens
`output_tokens`	integer	Number of output tokens

Error Responses

Status	Reason
400	Bad Request (invalid JSON, no user message)
401	Unauthorized (invalid or missing API key)
503	Service Unavailable (model not available)

Examples

Basic Message Request

curl -X POST http://localhost:8080/v1/messages \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-opus-4-5-20251101",
    "max_tokens": 1024,
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'

Basic Message Response

{
  "id": "msg-a1b2c3d4e5f6",
  "type": "message",
  "role": "assistant",
  "model": "claude-opus-4-5-20251101",
  "content": [
    {
      "type": "text",
      "text": "Hello! How can I assist you today?"
    }
  ],
  "stop_reason": "end_turn",
  "usage": {
    "input_tokens": 10,
    "output_tokens": 12
  }
}

Message with System Parameter

curl -X POST http://localhost:8080/v1/messages \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-opus-4-5-20251101",
    "max_tokens": 1024,
    "system": "You are a helpful assistant who speaks like a pirate.",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'

Streaming Message Request

curl -X POST http://localhost:8080/v1/messages \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-opus-4-5-20251101",
    "max_tokens": 1024,
    "messages": [
      {"role": "user", "content": "Write a haiku"}
    ],
    "stream": true
  }'

Streaming Events

When stream: true, the server returns Server-Sent Events with the following event sequence:

message_start - Message metadata with input token count
content_block_start - Start of text content block
content_block_delta - Streaming text deltas (multiple events)
content_block_stop - End of content block
message_delta - Final message metadata with stop reason
message_stop - Stream completion

Example streaming response:

event: message_start
data: {"type":"message_start","message":{"id":"msg-...","type":"message","role":"assistant","model":"claude-opus-4-5-20251101","content":[],"stop_reason":null,"usage":{"input_tokens":29,"output_tokens":0}}}

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Code"}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":" flows"}}

event: content_block_stop
data: {"type":"content_block_stop","index":0}

event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn"},"usage":{"output_tokens":10}}

event: message_stop
data: {"type":"message_stop"}

Authentication

API key authentication is optional and disabled by default. When enabled, all API requests must include a Bearer token in the Authorization header.

Enable Authentication

Set the API_KEY environment variable to enable Bearer token authentication:

API_KEY=your-secret-key just run

Making Authenticated Requests

Include the Bearer token in the Authorization header:

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer your-secret-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Authentication Errors

When authentication is enabled and fails, the server returns a 401 Unauthorized error with the appropriate error format (OpenAI or Anthropic depending on the endpoint).

Configuration

Server configuration is managed through environment variables:

Variable	Default	Description
`HOST`	127.0.0.1	Server bind address
`PORT`	8080	Server port
`MAX_TOKENS`	1024	Default maximum tokens per request
`LOG_LEVEL`	info	Log level (trace, debug, info, warning, error)
`API_KEY`	(none)	Optional Bearer token for authentication

Integration Examples

Python with OpenAI SDK

from openai import OpenAI

# Without authentication (default)
client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="not-needed"  # API key not required if authentication disabled
)

# With authentication (if API_KEY is set)
client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="your-secret-key"  # Must match API_KEY environment variable
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"}
    ]
)

print(response.choices[0].message.content)

Python with Anthropic SDK

from anthropic import Anthropic

# With authentication (if API_KEY is set)
client = Anthropic(
    base_url="http://localhost:8080",
    api_key="your-secret-key"  # Must match API_KEY environment variable
)

message = client.messages.create(
    model="claude-opus-4-5-20251101",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Hello!"}
    ]
)

print(message.content[0].text)

JavaScript with OpenAI SDK

import OpenAI from 'openai';

// Without authentication (default)
const client = new OpenAI({
  baseURL: 'http://localhost:8080/v1',
  apiKey: 'not-needed' // API key not required if authentication disabled
});

// With authentication (if API_KEY is set)
const client = new OpenAI({
  baseURL: 'http://localhost:8080/v1',
  apiKey: 'your-secret-key' // Must match API_KEY environment variable
});

const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'Hello!' }
  ]
});

console.log(response.choices[0].message.content);

cURL

#!/bin/bash
API_URL="http://localhost:8080/v1/chat/completions"

curl -X POST "$API_URL" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "user", "content": "Hello, world!"}
    ]
  }' | jq

Phase Roadmap

Phase 1 (Complete)

✅ POST /v1/chat/completions (non-streaming)
✅ GET /health
✅ System message support
✅ Error handling and validation

Phase 2 (Complete)

✅ Server-Sent Events (SSE) streaming
✅ Streaming response chunks
✅ True token-by-token streaming via FoundationModels AsyncSequence

Phase 3 (Complete)

✅ OpenAI-compatible tool calling
✅ Tool definition schema with JSON Schema
✅ Multi-turn conversation with tool results
✅ Streaming DTOs for tool calls (falls back to non-streaming)
✅ Client-side tool execution pattern

Phase 4 (Complete)

✅ POST /v1/messages (Anthropic compatibility)
✅ Anthropic Messages API with streaming support
✅ API key authentication
✅ Error middleware with formatted responses
✅ Request logging and metrics
✅ Anthropic-compatible tool calling

Phase 5 (In Progress)

✅ API key authentication
✅ Request logging and metrics
✅ 80% code coverage (239 tests, 81.61% coverage)
🚧 Production documentation and deployment guide
🚧 Production documentation

Error Handling

AFMBridge provides detailed, API-specific error messages to help debug issues.

Error Response Formats

Errors are returned in the format matching the endpoint being called:

OpenAI Format (`/v1/chat/completions`)

{
  "error": {
    "message": "No user message found in conversation",
    "type": "invalid_request_error",
    "param": null,
    "code": null
  }
}

Anthropic Format (`/v1/messages`)

{
  "type": "error",
  "error": {
    "type": "invalid_request_error",
    "message": "Invalid message format: No user message found in conversation"
  }
}

Common Error Types

invalid_request_error (400 Bad Request)

Invalid JSON in request body
Missing required fields (model, messages, max_tokens)
No user message in conversation
Invalid message format

authentication_error (401 Unauthorized)

Missing Authorization header (when authentication is enabled)
Invalid Authorization header format
Invalid API key

api_error (503 Service Unavailable)

FoundationModels framework not available (requires macOS 26.0+)
Model not available
LLM generation failure

Best Practices

Include System Messages: Use system messages to set the behavior and context for the assistant
Handle Errors Gracefully: Check response status codes and handle errors appropriately
Set Appropriate Limits: Use max_tokens to control response length
Monitor Logs: Check server logs for detailed error information (configured via LOG_LEVEL)
Use Health Checks: Monitor the /health endpoint for server availability

Support

For issues, questions, or contributions:

GitHub Issues: https://github.com/kolohelios/afmbridge/issues
Documentation: See README.md and PLAN.md
Contributing: See CONTRIBUTING.md

FilesExpand file tree

API.md

Latest commit

History

API.md

File metadata and controls

AFMBridge API Documentation

Base URL

Endpoints

Health Check

GET /health

Response

Example

Chat Completions

POST /v1/chat/completions

Request

Headers

Body

Message Object

Response

Success - 200 OK

Choice Object

Error Responses

Tool Calling (Phase 3)

Tool Definition Schema

Tool Calling Flow

Examples

Basic Request

Basic Response

Request with System Message

Request with Parameters

Tool Calling Request

Tool Calling Response

Multi-Turn Tool Conversation

Final Response with Tool Results

Messages

POST /v1/messages

Request

Headers

Body

Message Object

Response

Success - 200 OK

Content Block Object

Usage Object

Error Responses

Examples

Basic Message Request

Basic Message Response

Message with System Parameter

Streaming Message Request

Streaming Events

Authentication

Enable Authentication

Making Authenticated Requests

Authentication Errors

Configuration

Integration Examples

Python with OpenAI SDK

Python with Anthropic SDK

JavaScript with OpenAI SDK

cURL

Phase Roadmap

Phase 1 (Complete)

Phase 2 (Complete)

Phase 3 (Complete)

Phase 4 (Complete)

Phase 5 (In Progress)

Error Handling

Error Response Formats

OpenAI Format (/v1/chat/completions)

Anthropic Format (/v1/messages)

Common Error Types

invalid_request_error (400 Bad Request)

authentication_error (401 Unauthorized)

api_error (503 Service Unavailable)

Best Practices

Support

OpenAI Format (`/v1/chat/completions`)

Anthropic Format (`/v1/messages`)