This document provides detailed API documentation for AFMBridge endpoints.
The default base URL is `http://localhost:8080`. Configure it using the `HOST` and `PORT` environment variables.
`GET /health`

Returns the health status of the server.
- Status: 200 OK
- Body: `OK` (plain text)

```bash
curl http://localhost:8080/health
```

`POST /v1/chat/completions`

OpenAI-compatible chat completions endpoint. Generates a model response for the given conversation.
Headers:

- `Content-Type: application/json`
- `Authorization: Bearer <token>` (if authentication is enabled)
Request body fields:

| Field | Type | Required | Description |
|---|---|---|---|
| `model` | string | Yes | Model identifier (e.g., "gpt-4o") |
| `messages` | array | Yes | Array of message objects |
| `stream` | boolean | No | Enable streaming (SSE with text/event-stream) |
| `max_tokens` | integer | No | Maximum tokens to generate (default: 1024) |
| `temperature` | number | No | Sampling temperature 0.0-2.0 (default: 1.0) |
| `tools` | array | No | Array of tool definitions (Phase 3) |
| `tool_choice` | string/object | No | Control tool selection (Phase 3) |
Message object fields:

| Field | Type | Required | Description |
|---|---|---|---|
| `role` | string | Yes | Message role: "system", "user", "assistant", "tool" |
| `content` | string | Conditional | Message content (required except for assistant with tool_calls) |
| `tool_calls` | array | No | Tool calls made by assistant (Phase 3) |
| `tool_call_id` | string | Conditional | ID of tool call this message responds to (required for role="tool") |
| `name` | string | Conditional | Name of tool that produced this content (required for role="tool") |
Response fields:

| Field | Type | Description |
|---|---|---|
| `id` | string | Unique completion ID |
| `object` | string | Object type: "chat.completion" |
| `created` | integer | Unix timestamp |
| `model` | string | Model used |
| `choices` | array | Array of completion choices |
Choice object fields:

| Field | Type | Description |
|---|---|---|
| `index` | integer | Choice index (0-based) |
| `message` | object | Generated message |
| `finish_reason` | string | Reason for stopping: "stop", "tool_calls", etc. |
| Status | Reason |
|---|---|
| 400 | Bad Request (invalid JSON, invalid tool definition, no user message) |
| 401 | Unauthorized (invalid or missing API key) |
| 503 | Service Unavailable (model not available) |
AFMBridge supports OpenAI-compatible tool calling, enabling the model to request execution of functions with structured arguments. Tool execution happens client-side following the OpenAI pattern.
Tools are defined using JSON Schema to specify function signatures:
```json
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get current weather for a location",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "City name"
},
"unit": {
"type": "string",
"enum": ["celsius", "fahrenheit"]
}
},
"required": ["location"]
}
}
}
```

The tool-calling flow:

1. Client sends request with tools: Include tool definitions in the `tools` array
2. Model decides to use tools: Returns `finish_reason: "tool_calls"` with tool call details
3. Client executes tools: Run the requested functions locally
4. Client submits results: Send a new request with tool messages containing the results
5. Model generates final response: Returns `finish_reason: "stop"` with the answer
Note: When `stream: true` is set with tools, the server automatically falls back to non-streaming responses, as Apple FoundationModels does not yet support streaming tool calls.
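To make the loop concrete, here is a minimal client-side sketch using the `openai` Python SDK against a local AFMBridge instance; the `get_weather` implementation and its canned result are hypothetical stand-ins for whatever tools your client actually provides.

```python
import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

# Hypothetical local tool implementation; replace with real logic.
def get_weather(location: str) -> str:
    return f"Temperature in {location}: 72°F, Conditions: Sunny"

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a location",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string", "description": "City name"}},
            "required": ["location"],
        },
    },
}]

messages = [{"role": "user", "content": "What is the weather in Boston?"}]

# Steps 1-2: send the request with tool definitions and let the model decide.
response = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)
choice = response.choices[0]

# Steps 3-4: if tool calls were requested, run them locally and send the results back.
while choice.finish_reason == "tool_calls":
    messages.append(choice.message)
    for call in choice.message.tool_calls:
        args = json.loads(call.function.arguments)
        result = get_weather(**args)  # dispatch on call.function.name in real code
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "name": call.function.name,
            "content": result,
        })
    response = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)
    choice = response.choices[0]

# Step 5: the model produces its final answer.
print(choice.message.content)
```

The loop re-sends the full conversation on each turn, which matches the stateless request/response pattern shown in the curl examples below.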
Example: basic request

```bash
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"messages": [
{"role": "user", "content": "What is the capital of France?"}
]
}'
```

Response:

```json
{
"id": "chatcmpl-a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d",
"object": "chat.completion",
"created": 1734678901,
"model": "apple-afm-on-device",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "The capital of France is Paris."
},
"finish_reason": "stop"
}
]
}
```

Example with a system message:

```bash
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"messages": [
{"role": "system", "content": "You are a helpful geography tutor."},
{"role": "user", "content": "What is the capital of France?"}
]
}'
```

Example with generation parameters:

```bash
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"messages": [
{"role": "user", "content": "Write a haiku about programming"}
],
"max_tokens": 50,
"temperature": 0.7
}'
```

Example with tools:

```bash
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"messages": [
{"role": "user", "content": "What is the weather in Boston?"}
],
"tools": [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get current weather for a location",
"parameters": {
"type": "object",
"properties": {
"location": {"type": "string", "description": "City name"}
},
"required": ["location"]
}
}
}
]
}'
```

Response with tool calls:

```json
{
"id": "chatcmpl-...",
"object": "chat.completion",
"created": 1734678901,
"model": "apple-afm-on-device",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "I'll check the weather for you.",
"tool_calls": [
{
"id": "call_abc123",
"type": "function",
"function": {
"name": "get_weather",
"arguments": "{\"location\":\"Boston\"}"
}
}
]
},
"finish_reason": "tool_calls"
}
]
}
```

After receiving tool calls, execute them locally and submit results:

```bash
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"messages": [
{"role": "user", "content": "What is the weather in Boston?"},
{
"role": "assistant",
"content": null,
"tool_calls": [
{
"id": "call_abc123",
"type": "function",
"function": {
"name": "get_weather",
"arguments": "{\"location\":\"Boston\"}"
}
}
]
},
{
"role": "tool",
"tool_call_id": "call_abc123",
"name": "get_weather",
"content": "Temperature: 72°F, Conditions: Sunny"
}
]
}'
```

Response:

```json
{
"id": "chatcmpl-...",
"object": "chat.completion",
"created": 1734678902,
"model": "apple-afm-on-device",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "The weather in Boston is currently sunny with a temperature of 72°F."
},
"finish_reason": "stop"
}
]
}
```

`POST /v1/messages`

Anthropic-compatible messages endpoint. Generates a model response for the given conversation.
Headers:

- `Content-Type: application/json`
- `Authorization: Bearer <token>` (if authentication is enabled)
Request body fields:

| Field | Type | Required | Description |
|---|---|---|---|
| `model` | string | Yes | Model identifier (e.g., "claude-opus-4-5-20251101") |
| `max_tokens` | integer | Yes | Maximum tokens to generate |
| `messages` | array | Yes | Array of message objects |
| `system` | string | No | System prompt to set assistant behavior |
| `stream` | boolean | No | Enable streaming (SSE with text/event-stream) |
| `temperature` | number | No | Sampling temperature 0.0-1.0 (default: 1.0) |
Message object fields:

| Field | Type | Required | Description |
|---|---|---|---|
| `role` | string | Yes | Message role: "user" or "assistant" |
| `content` | string or array | Yes | Message content (string or content blocks) |
Response fields:

| Field | Type | Description |
|---|---|---|
| `id` | string | Unique message ID |
| `type` | string | Object type: "message" |
| `role` | string | Always "assistant" |
| `model` | string | Model used |
| `content` | array | Array of content blocks |
| `stop_reason` | string | Reason for stopping: "end_turn", "max_tokens" |
| `usage` | object | Token usage statistics |
Content block fields:

| Field | Type | Description |
|---|---|---|
| `type` | string | Block type: "text" |
| `text` | string | Generated text |
Usage fields:

| Field | Type | Description |
|---|---|---|
| `input_tokens` | integer | Number of input tokens |
| `output_tokens` | integer | Number of output tokens |
| Status | Reason |
|---|---|
| 400 | Bad Request (invalid JSON, no user message) |
| 401 | Unauthorized (invalid or missing API key) |
| 503 | Service Unavailable (model not available) |
Example: basic request

```bash
curl -X POST http://localhost:8080/v1/messages \
-H "Content-Type: application/json" \
-d '{
"model": "claude-opus-4-5-20251101",
"max_tokens": 1024,
"messages": [
{"role": "user", "content": "Hello!"}
]
}'
```

Response:

```json
{
"id": "msg-a1b2c3d4e5f6",
"type": "message",
"role": "assistant",
"model": "claude-opus-4-5-20251101",
"content": [
{
"type": "text",
"text": "Hello! How can I assist you today?"
}
],
"stop_reason": "end_turn",
"usage": {
"input_tokens": 10,
"output_tokens": 12
}
}
```

Example with a system prompt:

```bash
curl -X POST http://localhost:8080/v1/messages \
-H "Content-Type: application/json" \
-d '{
"model": "claude-opus-4-5-20251101",
"max_tokens": 1024,
"system": "You are a helpful assistant who speaks like a pirate.",
"messages": [
{"role": "user", "content": "Hello!"}
]
}'
```

Example with streaming:

```bash
curl -X POST http://localhost:8080/v1/messages \
-H "Content-Type: application/json" \
-d '{
"model": "claude-opus-4-5-20251101",
"max_tokens": 1024,
"messages": [
{"role": "user", "content": "Write a haiku"}
],
"stream": true
}'
```

When `stream: true`, the server returns Server-Sent Events with the following event sequence:
- `message_start`: Message metadata with input token count
- `content_block_start`: Start of text content block
- `content_block_delta`: Streaming text deltas (multiple events)
- `content_block_stop`: End of content block
- `message_delta`: Final message metadata with stop reason
- `message_stop`: Stream completion
Example streaming response:
```
event: message_start
data: {"type":"message_start","message":{"id":"msg-...","type":"message","role":"assistant","model":"claude-opus-4-5-20251101","content":[],"stop_reason":null,"usage":{"input_tokens":29,"output_tokens":0}}}
event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}
event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Code"}}
event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":" flows"}}
event: content_block_stop
data: {"type":"content_block_stop","index":0}
event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn"},"usage":{"output_tokens":10}}
event: message_stop
data: {"type":"message_stop"}
API key authentication is optional and disabled by default. When enabled, all API requests must include a Bearer token in the Authorization header.
Set the `API_KEY` environment variable to enable Bearer token authentication:

```bash
API_KEY=your-secret-key just run
```

Include the Bearer token in the Authorization header:

```bash
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Authorization: Bearer your-secret-key" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"messages": [{"role": "user", "content": "Hello!"}]
}'
```

When authentication is enabled and fails, the server returns a 401 Unauthorized error with the appropriate error format (OpenAI or Anthropic, depending on the endpoint).
Server configuration is managed through environment variables:
| Variable | Default | Description |
|---|---|---|
| `HOST` | 127.0.0.1 | Server bind address |
| `PORT` | 8080 | Server port |
| `MAX_TOKENS` | 1024 | Default maximum tokens per request |
| `LOG_LEVEL` | info | Log level (trace, debug, info, warning, error) |
| `API_KEY` | (none) | Optional Bearer token for authentication |
Python (OpenAI SDK):

```python
from openai import OpenAI
# Without authentication (default)
client = OpenAI(
base_url="http://localhost:8080/v1",
api_key="not-needed" # API key not required if authentication disabled
)
# With authentication (if API_KEY is set)
client = OpenAI(
base_url="http://localhost:8080/v1",
api_key="your-secret-key" # Must match API_KEY environment variable
)
response = client.chat.completions.create(
model="gpt-4o",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hello!"}
]
)
print(response.choices[0].message.content)
```

Python (Anthropic SDK):

```python
from anthropic import Anthropic
# With authentication (if API_KEY is set)
client = Anthropic(
base_url="http://localhost:8080",
api_key="your-secret-key" # Must match API_KEY environment variable
)
message = client.messages.create(
model="claude-opus-4-5-20251101",
max_tokens=1024,
messages=[
{"role": "user", "content": "Hello!"}
]
)
print(message.content[0].text)
```

TypeScript (OpenAI SDK):

```typescript
import OpenAI from 'openai';
// Without authentication (default)
const client = new OpenAI({
baseURL: 'http://localhost:8080/v1',
apiKey: 'not-needed' // API key not required if authentication disabled
});
// With authentication (if API_KEY is set), construct the client like this instead:
// const client = new OpenAI({
//   baseURL: 'http://localhost:8080/v1',
//   apiKey: 'your-secret-key' // Must match API_KEY environment variable
// });
const response = await client.chat.completions.create({
model: 'gpt-4o',
messages: [
{ role: 'system', content: 'You are a helpful assistant.' },
{ role: 'user', content: 'Hello!' }
]
});
console.log(response.choices[0].message.content);
```

Shell (curl):

```bash
#!/bin/bash
API_URL="http://localhost:8080/v1/chat/completions"
curl -X POST "$API_URL" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o",
"messages": [
{"role": "user", "content": "Hello, world!"}
]
}' | jq
```

Feature status:

- ✅ POST /v1/chat/completions (non-streaming)
- ✅ GET /health
- ✅ System message support
- ✅ Error handling and validation
- ✅ Server-Sent Events (SSE) streaming
- ✅ Streaming response chunks
- ✅ True token-by-token streaming via FoundationModels AsyncSequence
- ✅ OpenAI-compatible tool calling
- ✅ Tool definition schema with JSON Schema
- ✅ Multi-turn conversation with tool results
- ✅ Streaming DTOs for tool calls (falls back to non-streaming)
- ✅ Client-side tool execution pattern
- ✅ POST /v1/messages (Anthropic compatibility)
- ✅ Anthropic Messages API with streaming support
- ✅ API key authentication
- ✅ Error middleware with formatted responses
- ✅ Request logging and metrics
- ✅ Anthropic-compatible tool calling
- ✅ 80% code coverage target met (239 tests, 81.61% coverage)
- 🚧 Production documentation and deployment guide
AFMBridge provides detailed, API-specific error messages to help debug issues.
Errors are returned in the format matching the endpoint being called:
OpenAI-compatible endpoints:

```json
{
"error": {
"message": "No user message found in conversation",
"type": "invalid_request_error",
"param": null,
"code": null
}
}
```

Anthropic-compatible endpoints:

```json
{
"type": "error",
"error": {
"type": "invalid_request_error",
"message": "Invalid message format: No user message found in conversation"
}
}
```

Common error causes:

- 400 Bad Request:
  - Invalid JSON in request body
  - Missing required fields (`model`, `messages`, `max_tokens`)
  - No user message in conversation
  - Invalid message format
- 401 Unauthorized:
  - Missing Authorization header (when authentication is enabled)
  - Invalid Authorization header format
  - Invalid API key
- 503 Service Unavailable:
  - FoundationModels framework not available (requires macOS 26.0+)
  - Model not available
  - LLM generation failure
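As an illustrative sketch (the helper name and the use of the `requests` library are assumptions, not part of AFMBridge), a client can branch on the status code and surface the message from the OpenAI-style error body:

```python
import requests

# Hypothetical helper: send a chat request and surface AFMBridge error details.
def chat(messages, base_url="http://localhost:8080", api_key=None):
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"

    resp = requests.post(
        f"{base_url}/v1/chat/completions",
        json={"model": "gpt-4o", "messages": messages},
        headers=headers,
        timeout=60,
    )
    if resp.status_code != 200:
        # OpenAI-style errors nest the message under "error".
        detail = resp.json().get("error", {}).get("message", resp.text)
        raise RuntimeError(f"AFMBridge error {resp.status_code}: {detail}")
    return resp.json()["choices"][0]["message"]["content"]

print(chat([{"role": "user", "content": "Hello!"}]))
```

Anthropic-style errors nest the message under `error.message` inside a top-level `{"type": "error", ...}` object, so a client for `/v1/messages` would read that path instead.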
- Include System Messages: Use system messages to set the behavior and context for the assistant
- Handle Errors Gracefully: Check response status codes and handle errors appropriately
- Set Appropriate Limits: Use `max_tokens` to control response length
- Monitor Logs: Check server logs for detailed error information (configured via `LOG_LEVEL`)
- Use Health Checks: Monitor the `/health` endpoint for server availability (see the sketch after this list)
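For the health-check point, a tiny hypothetical poller is enough; any monitoring tool that can issue an HTTP GET works just as well:

```python
import time
import requests

# Poll the /health endpoint until the server answers "OK" or attempts run out.
def wait_until_healthy(base_url="http://localhost:8080", attempts=10, delay=1.0):
    for _ in range(attempts):
        try:
            resp = requests.get(f"{base_url}/health", timeout=5)
            if resp.status_code == 200 and resp.text.strip() == "OK":
                return True
        except requests.ConnectionError:
            pass  # server not up yet
        time.sleep(delay)
    return False

print("healthy" if wait_until_healthy() else "unreachable")
```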
For issues, questions, or contributions:
- GitHub Issues: https://github.com/kolohelios/afmbridge/issues
- Documentation: See README.md and PLAN.md
- Contributing: See CONTRIBUTING.md