
Basic Identity Service #13

@tfius

Description


Parent: #10 (EPIC: Transform DataCortex into Context Engine)
Priority: HIGH | Phase: 2 - Connectivity | Complexity: Medium

What to Implement

Implement an IdentityService that maps disparate identifiers (email addresses, usernames, handles) to canonical Person entities, enabling agents to understand that "Sarah" in Slack is @sarah on GitHub.

Features

  1. Person entity type based on Schema.org
  2. Alias mapping system (email → person, @handle → person)
  3. Manual alias configuration in YAML
  4. Identity resolution API for agents
  5. Future hook for LLM-based resolution
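Feature 5 leaves the shape of the LLM hook open. One way to keep it pluggable is to treat every resolver (manual alias lookup, heuristic, LLM) as a function from an identifier to an optional `(person_id, confidence)` pair, try them in order, and accept the first result that clears a confidence threshold. This is a sketch under that assumption: `Resolver`, `resolve_with_fallbacks`, the stand-in resolvers, and the `0.5` threshold are hypothetical, not part of the issue.

```python
from typing import Callable, Optional

# Hypothetical hook shape: identifier -> (person_id, confidence), or None if unknown.
Resolver = Callable[[str], Optional[tuple[str, float]]]


def resolve_with_fallbacks(
    identifier: str,
    resolvers: list[Resolver],
    min_confidence: float = 0.5,
) -> Optional[tuple[str, float]]:
    """Try resolvers in order; return the first result at or above min_confidence.

    Results below the threshold are skipped and the next resolver is tried.
    """
    for resolver in resolvers:
        result = resolver(identifier)
        if result is not None and result[1] >= min_confidence:
            return result
    return None


# Stand-in resolvers for illustration only.
MANUAL_ALIASES = {"@schen": "person:sarah-chen"}


def manual_resolver(identifier: str) -> Optional[tuple[str, float]]:
    person_id = MANUAL_ALIASES.get(identifier)
    return (person_id, 1.0) if person_id else None


def llm_stub(identifier: str) -> Optional[tuple[str, float]]:
    # Placeholder for a future LLM call that returns a low-confidence guess.
    return ("person:sarah-chen", 0.3) if "sarah" in identifier.lower() else None


print(resolve_with_fallbacks("@schen", [manual_resolver, llm_stub]))    # → ('person:sarah-chen', 1.0)
print(resolve_with_fallbacks("Sarah C.", [manual_resolver, llm_stub]))  # → None
```

Because the manual resolver runs first, YAML-configured aliases always win over guesses, and the threshold keeps a weak LLM guess from being returned as a resolution.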

How to Implement

Step 1: Define Person Entity Model

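```python
# src/datacortex/identity/models.py
from datetime import datetime
from typing import Optional

from pydantic import BaseModel, Field


class PersonAlias(BaseModel):
    value: str                                       # the alias value
    type: str                                        # email, github, slack, twitter, name
    confidence: float = Field(1.0, ge=0.0, le=1.0)   # 0.0-1.0; default matches the SQL schema
    source: str = "manual"                           # manual, heuristic, llm
    verified: bool = False


class Person(BaseModel):
    id: str                                          # person:sarah-chen
    name: str
    given_name: Optional[str] = None
    family_name: Optional[str] = None
    email: Optional[str] = None
    aliases: list[PersonAlias] = Field(default_factory=list)
    metadata: dict = Field(default_factory=dict)
    created_at: datetime
    updated_at: datetime

    def matches(self, identifier: str) -> tuple[bool, float]:
        """Check if identifier matches this person; return (matched, confidence)."""
        needle = identifier.strip().lower()
        if self.email and self.email.lower() == needle:
            return True, 1.0
        for alias in self.aliases:
            if alias.value.lower() == needle:
                return True, alias.confidence
        return False, 0.0
```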
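`matches` and the alias lookups only behave predictably if aliases are stored and queried in one canonical form; otherwise `@schen`, `schen`, and `SChen` are three different keys, and whether `resolve @sarah` succeeds depends on whether the `@` was kept. A minimal sketch, assuming handles are case-insensitive and a leading `@` is presentation only; `normalize_alias` is a hypothetical helper. Lowercasing the whole email is a practical choice, since RFC 5321 technically allows a case-sensitive local part.

```python
HANDLE_TYPES = {"github", "slack", "twitter"}


def normalize_alias(value: str, alias_type: str) -> str:
    """Canonicalize an alias before storing or looking it up (hypothetical rules).

    - email: trim and lowercase the whole address
    - github/slack/twitter: trim, drop one leading "@", lowercase
    - name: trim, collapse internal whitespace, casefold
    - anything else: trim only
    """
    value = value.strip()
    if alias_type == "email":
        return value.lower()
    if alias_type in HANDLE_TYPES:
        return value.removeprefix("@").lower()
    if alias_type == "name":
        return " ".join(value.split()).casefold()
    return value


print(normalize_alias("@SChen", "slack"))                    # → schen
print(normalize_alias(" Sarah.Chen@Company.com ", "email"))  # → sarah.chen@company.com
print(normalize_alias("  Sarah   Chen ", "name"))            # → sarah chen
```

Applying the same function both when `sync` writes aliases and when `resolve` looks identifiers up is what makes the two sides agree.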

Step 2: Create Database Schema

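```sql
-- SQLite only enforces REFERENCES when the connection runs: PRAGMA foreign_keys = ON;
CREATE TABLE persons (
    id TEXT PRIMARY KEY,
    name TEXT NOT NULL,
    given_name TEXT,
    family_name TEXT,
    email TEXT,
    metadata JSON,
    created_at TEXT,
    updated_at TEXT
);

CREATE TABLE person_aliases (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    person_id TEXT NOT NULL REFERENCES persons(id),
    alias_value TEXT NOT NULL,
    alias_type TEXT NOT NULL,
    confidence REAL DEFAULT 1.0,
    source TEXT DEFAULT 'manual',
    verified INTEGER DEFAULT 0,
    UNIQUE(alias_value, alias_type)   -- its automatic index also serves lookups by alias_value
);

-- Lookups by alias_value already use the UNIQUE index above;
-- this index speeds up listing and merging a person's aliases.
CREATE INDEX idx_aliases_person ON person_aliases(person_id);
```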
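To check that the schema supports the lookup that `IdentityService._query_by_alias` needs, here is a self-contained run against an in-memory database using Python's standard `sqlite3` module. The tables are trimmed to the columns the query touches, and `query_by_alias` is a hypothetical helper. Because the same value can exist under several alias types, the query orders by confidence and keeps one row.

```python
import sqlite3
from typing import Optional

SCHEMA = """
CREATE TABLE persons (
    id TEXT PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE person_aliases (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    person_id TEXT NOT NULL REFERENCES persons(id),
    alias_value TEXT NOT NULL,
    alias_type TEXT NOT NULL,
    confidence REAL DEFAULT 1.0,
    UNIQUE(alias_value, alias_type)
);
"""


def query_by_alias(conn: sqlite3.Connection, value: str) -> Optional[tuple[str, str, float]]:
    """Return (person_id, name, confidence) for the highest-confidence alias match."""
    return conn.execute(
        """
        SELECT p.id, p.name, a.confidence
        FROM person_aliases AS a
        JOIN persons AS p ON p.id = a.person_id
        WHERE a.alias_value = ?
        ORDER BY a.confidence DESC
        LIMIT 1
        """,
        (value,),
    ).fetchone()


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript(SCHEMA)
conn.execute("INSERT INTO persons VALUES ('person:sarah-chen', 'Sarah Chen')")
conn.executemany(
    "INSERT INTO person_aliases (person_id, alias_value, alias_type) VALUES (?, ?, ?)",
    [("person:sarah-chen", "schen", "slack"), ("person:sarah-chen", "sarahchen", "github")],
)
print(query_by_alias(conn, "schen"))  # → ('person:sarah-chen', 'Sarah Chen', 1.0)
```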

Step 3: Create Identity Service

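```python
# src/datacortex/identity/service.py
from typing import Optional

from datacortex.identity.models import Person, PersonAlias


class IdentityService:
    def __init__(self, db_path: str):
        self.db_path = db_path
        self._cache: dict[str, Person] = {}  # identifier -> resolved Person

    def resolve(self, identifier: str) -> Optional[Person]:
        """Resolve an identifier to a canonical Person."""
        if identifier in self._cache:
            return self._cache[identifier]
        person = self._query_by_alias(identifier)
        if person is not None:
            self._cache[identifier] = person
        return person

    def _query_by_alias(self, identifier: str) -> Optional[Person]:
        """Look up person_aliases (and persons.email) for a matching row."""
        raise NotImplementedError

    def add_person(self, person: Person) -> None:
        """Add a new person to the identity store."""
        raise NotImplementedError

    def add_alias(self, person_id: str, alias: PersonAlias) -> None:
        """Add an alias to an existing person.

        Must also invalidate self._cache: cached Person objects lack the new alias.
        """
        raise NotImplementedError

    def merge_persons(self, primary_id: str, secondary_id: str) -> Person:
        """Merge two persons, keeping primary as canonical.

        Repoints the secondary's aliases to the primary and must clear
        self._cache, which may still map identifiers to the secondary.
        """
        raise NotImplementedError
```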
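The `_cache` in `IdentityService` stores resolved results, so any write that changes who an alias belongs to has to invalidate it; otherwise `resolve` keeps returning the pre-merge person. The dict-backed store below is a hypothetical stand-in (not the SQLite-backed service) that illustrates the `merge_persons` semantics described above: the secondary's aliases are repointed to the primary, the secondary is dropped, and the cache is cleared.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class InMemoryIdentityStore:
    """Dict-backed stand-in for IdentityService (hypothetical, for illustration)."""
    names: dict[str, str]    # person_id -> display name
    aliases: dict[str, str]  # alias value -> person_id
    _cache: dict[str, str] = field(default_factory=dict, init=False, repr=False)

    def resolve(self, identifier: str) -> Optional[str]:
        if identifier in self._cache:
            return self._cache[identifier]
        person_id = self.aliases.get(identifier)
        if person_id is not None:
            self._cache[identifier] = person_id
        return person_id

    def merge_persons(self, primary_id: str, secondary_id: str) -> str:
        """Repoint every alias of secondary to primary, then drop secondary."""
        if primary_id == secondary_id:
            raise ValueError("cannot merge a person into itself")
        for value, owner in self.aliases.items():
            if owner == secondary_id:
                self.aliases[value] = primary_id  # overwriting values is safe mid-iteration
        del self.names[secondary_id]
        self._cache.clear()  # cached identifiers may still point at secondary_id
        return primary_id


store = InMemoryIdentityStore(
    names={"person:sarah-chen": "Sarah Chen", "person:s-chen": "S. Chen"},
    aliases={"sarahchen": "person:sarah-chen", "@schen": "person:s-chen"},
)
print(store.resolve("@schen"))  # → person:s-chen (now cached)
store.merge_persons("person:sarah-chen", "person:s-chen")
print(store.resolve("@schen"))  # → person:sarah-chen
```

Without the `self._cache.clear()` call, the second `resolve` would still return `person:s-chen`, an id that no longer exists.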

Step 4: YAML Configuration

```yaml
# config/identities.yaml
persons:
  - id: person:sarah-chen
    name: Sarah Chen
    email: sarah.chen@company.com
    aliases:
      - value: sarah
        type: name
      - value: "@schen"
        type: slack
      - value: sarahchen
        type: github

  - id: person:john-doe
    name: John Doe
    email: john@example.com
    aliases:
      - value: johnd
        type: github
```
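`datacortex identity sync` has to turn this file into alias rows. A sketch of that step, assuming the file has already been parsed (for example with PyYAML's `yaml.safe_load`) into plain dicts and lists; `build_alias_index` is a hypothetical helper. It indexes each person's `email` as an `email` alias and rejects configs where two people claim the same `(type, value)` pair, which would otherwise fail later on the schema's `UNIQUE(alias_value, alias_type)` constraint.

```python
def build_alias_index(config: dict) -> dict[tuple[str, str], str]:
    """Map (alias_type, alias_value) -> person_id from a parsed identities.yaml.

    Raises ValueError on a duplicate person id or on an alias claimed by two people.
    """
    index: dict[tuple[str, str], str] = {}
    seen_ids: set[str] = set()
    for person in config.get("persons", []):
        person_id = person["id"]
        if person_id in seen_ids:
            raise ValueError(f"duplicate person id {person_id!r}")
        seen_ids.add(person_id)
        keys = [(a["type"], a["value"]) for a in person.get("aliases", [])]
        if person.get("email"):
            keys.append(("email", person["email"]))  # primary email doubles as an alias
        for key in keys:
            owner = index.setdefault(key, person_id)
            if owner != person_id:
                raise ValueError(f"alias {key} claimed by {owner} and {person_id}")
    return index


# The same data as config/identities.yaml, as yaml.safe_load would return it.
config = {
    "persons": [
        {"id": "person:sarah-chen", "name": "Sarah Chen", "email": "sarah.chen@company.com",
         "aliases": [{"value": "sarah", "type": "name"},
                     {"value": "@schen", "type": "slack"},
                     {"value": "sarahchen", "type": "github"}]},
        {"id": "person:john-doe", "name": "John Doe", "email": "john@example.com",
         "aliases": [{"value": "johnd", "type": "github"}]},
    ]
}
index = build_alias_index(config)
print(index[("github", "johnd")])  # → person:john-doe
```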

Step 5: Add API Endpoints

```python
# src/datacortex/api/routes/identity.py
from fastapi import APIRouter

from datacortex.identity.models import Person

router = APIRouter()


@router.get("/resolve/{identifier}")
async def resolve_identity(identifier: str):
    """Resolve an identifier to a canonical person."""


@router.get("/persons")
async def list_persons():
    """List all known persons."""


@router.post("/persons")
async def add_person(person: Person):
    """Add a new person."""
```
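The schema makes an alias unique only per `(alias_value, alias_type)`, so a bare identifier passed to `GET /resolve/{identifier}` can match rows belonging to different people under different types. One way the endpoint could report that, sketched as a framework-free function so it runs without FastAPI: 404 for no match, 409 with the candidate ids when the match is ambiguous, 200 otherwise. `resolve_status` and the choice of 409 are assumptions, not something the issue specifies.

```python
from typing import Optional


def resolve_status(candidates: list[tuple[str, float]]) -> tuple[int, Optional[dict]]:
    """Turn alias-lookup candidates into an HTTP status code and JSON body.

    candidates: (person_id, confidence) for every alias row whose value matched,
    across all alias types.
    """
    if not candidates:
        return 404, None
    person_ids = {person_id for person_id, _ in candidates}
    if len(person_ids) > 1:
        # The same value is an alias of different people in different namespaces.
        return 409, {"detail": "ambiguous identifier", "candidates": sorted(person_ids)}
    best = max(confidence for _, confidence in candidates)
    return 200, {"person_id": person_ids.pop(), "confidence": best}


print(resolve_status([("person:sarah-chen", 1.0), ("person:sarah-chen", 0.8)]))
# → (200, {'person_id': 'person:sarah-chen', 'confidence': 1.0})
```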

Step 6: Add CLI Commands

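```python
import click


@click.group()
def cli():
    """DataCortex command-line interface (the existing top-level group)."""


@cli.group()
def identity():
    """Manage identity resolution."""


@identity.command()
@click.argument("identifier")
def resolve(identifier: str):
    """Resolve an identifier to a canonical person."""


@identity.command()
def sync():
    """Sync identities from config/identities.yaml."""


# Named list_persons so the module does not shadow the builtin list();
# click still exposes it as `datacortex identity list`.
@identity.command(name="list")
def list_persons():
    """List all known persons."""
```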
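For `sync` to be safe to run repeatedly, it can compute a diff between the aliases declared in `config/identities.yaml` and the `manual`-source rows already in `person_aliases`, instead of re-inserting everything. A minimal sketch with plain sets of `(person_id, alias_type, alias_value)` tuples; `plan_sync` and `Row` are hypothetical names.

```python
Row = tuple[str, str, str]  # (person_id, alias_type, alias_value)


def plan_sync(desired: set[Row], current: set[Row]) -> tuple[set[Row], set[Row]]:
    """Return (to_add, to_remove) that turn `current` into `desired`.

    `current` should contain only source='manual' rows, so heuristic and
    LLM-learned aliases are never scheduled for removal.
    """
    return desired - current, current - desired


desired = {("person:sarah-chen", "slack", "@schen"), ("person:sarah-chen", "github", "sarahchen")}
current = {("person:sarah-chen", "slack", "@schen"), ("person:sarah-chen", "github", "schen-old")}
to_add, to_remove = plan_sync(desired, current)
print(to_add)     # → {('person:sarah-chen', 'github', 'sarahchen')}
print(to_remove)  # → {('person:sarah-chen', 'github', 'schen-old')}
```

When an alias moves from one person to another, it shows up as one removal and one addition with the same `(alias_type, alias_value)`; applying removals before additions avoids tripping the `UNIQUE(alias_value, alias_type)` constraint. After the plan is applied, a second sync computes two empty sets.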

Acceptance Criteria

  • Person entities can be defined in YAML config
  • datacortex identity resolve @sarah returns the canonical person
  • GET /resolve/{identifier} returns the canonical person for a known alias
  • Multiple aliases map to same person
  • Identity service integrates with graph (Person nodes)
  • Documentation for identity configuration
