Epic: AI-Driven Semantic Search for envited-x Data Assets #485

@jdsika

Description

Goal: Implement an intelligent, scalable search solution for envited-x data assets to enable semantic discovery, unlock Work Package 1 (Indexing), and prepare the platform for handling extremely large datasets.

Problem: LLM context windows are limited, so searching massive datasets (e.g., terabytes of video footage, gigabytes of point clouds) directly is infeasible. We need to structure and index the associated metadata so that these assets become searchable and can be related to one another.

Proposed Solution: The Agent-Based Search Bar

This Epic proposes creating a search interface where an AI agent acts as an intelligent intermediary between the user and the data catalog.

  1. Metadata Indexing:

    • Index JSON-LD files (referenced by tokens/IPFS) into a structured database.
    • Primary Focus: Implement the indexing into a Graph Database to leverage existing ontologies and create connections (triples) between entities.
    • Investigation: Research and compare the benefits of a Vector Database versus a Graph Database, or explore a combined approach.
  2. Agent-Led Query Translation:

    • The AI Agent will receive natural language queries from the user.
    • The Agent will use its knowledge of the ontologies to understand the query's intent (e.g., "map of Munich" -> "GEO reference").
    • The Agent will generate and execute precise, structured queries (e.g., SPARQL) against the indexed database.
  3. Interactive Refinement:

    • The Agent will engage in a conversational flow to resolve ambiguities, asking the user for clarification (e.g., "You mentioned Munich. Are you searching for a Geo reference, or something else?").
    • This interactive process will also provide valuable feedback to refine ontologies and understand actual user search intent.
  4. Flexible Search Tiers:

    • Implement an optional "Extended Search" or "Deep Search" feature that allows users to pay for a more comprehensive search.
    • In a Deep Search, the agent would load a large portion of the pre-sorted data into its context window (its working memory, comparable to RAM or an L1 cache) and run a full LLM search over it.
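
The query translation and refinement steps above could be sketched roughly as follows. This is a minimal illustration, not a proposed design: the `ex:` prefix, the property names, the `PLACES` table, and the keyword rule standing in for the agent's ontology knowledge are all assumptions made for the example. A real agent would derive candidates from the envited-x ontologies.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical lookup of known place names (assumption for this sketch).
PLACES = {"munich": "Munich", "berlin": "Berlin"}

# Hypothetical properties a place name could refer to (not real ontology terms).
PLACE_PROPERTIES = ("ex:geoReference", "ex:title")


@dataclass
class Translation:
    """Either a ready SPARQL query or a clarifying question for the user."""
    sparql: Optional[str] = None
    question: Optional[str] = None
    options: tuple = ()


def build_sparql(prop: str, value: str) -> str:
    # Values come from the fixed PLACES table, so no literal escaping is done here.
    return (
        "PREFIX ex: <https://example.org/ontology#>\n"
        "SELECT ?asset WHERE {\n"
        f'  ?asset {prop} "{value}" .\n'
        "}"
    )


def translate(query: str, chosen_property: Optional[str] = None) -> Translation:
    words = [w.strip(".,?!\"'").lower() for w in query.split()]
    place = next((PLACES[w] for w in words if w in PLACES), None)
    if place is None:
        return Translation(question="Which location or asset type are you looking for?")
    if chosen_property is None:
        if "map" in words:
            # "map of Munich" is read as a geo reference, as in the example above.
            chosen_property = "ex:geoReference"
        else:
            # Ambiguous: ask the user instead of guessing.
            return Translation(
                question=(
                    f"You mentioned {place}. Are you searching for a "
                    "geo reference, or something else?"
                ),
                options=PLACE_PROPERTIES,
            )
    return Translation(sparql=build_sparql(chosen_property, place))


direct = translate("map of Munich")
ambiguous = translate("Munich")
resolved = translate("Munich", chosen_property="ex:title")
```

Here `direct` carries a query immediately, `ambiguous` carries only a question, and `resolved` shows the second turn after the user picks one of the offered options. Logging which option users pick is one way the refinement loop could feed back into the ontologies.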

Key Objectives & Deliverables

  • Database schema and indexing logic for Graph DB.
  • Basic indexer functionality for JSON-LD metadata.
  • Functional AI agent capable of translating user text into database queries.
  • Working prototype of the search bar integrated with the agent and database.
  • Unblock Work Package 1 by demonstrating a working understanding and implementation of the data structure and indexing solution.
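
To make the indexer deliverable more concrete, here is a minimal sketch of turning a JSON-LD document into subject-predicate-object triples. It is deliberately simplified and rests on assumptions: it does not expand `@context`, it ignores features such as `@graph` and `@list`, and the sample document with its `ex:` terms is invented for illustration. A production indexer would more likely rely on a full JSON-LD processor.

```python
import json
from itertools import count


def jsonld_to_triples(doc: dict) -> list:
    """Flatten a JSON-LD-like dict into (subject, predicate, object) triples.

    Simplification: keys are used as predicates verbatim (no @context expansion).
    """
    triples = []
    blank_ids = count()

    def visit(node: dict) -> str:
        # Nodes without an @id get a generated blank-node label.
        subject = node.get("@id") or f"_:b{next(blank_ids)}"
        for key, value in node.items():
            if key in ("@id", "@context"):
                continue
            predicate = "rdf:type" if key == "@type" else key
            values = value if isinstance(value, list) else [value]
            for item in values:
                if isinstance(item, dict) and "@value" in item:
                    # Value object: keep only the literal.
                    triples.append((subject, predicate, item["@value"]))
                elif isinstance(item, dict):
                    # Nested node: index it and link to its identifier.
                    triples.append((subject, predicate, visit(item)))
                else:
                    triples.append((subject, predicate, item))
        return subject

    visit(doc)
    return triples


# Invented sample metadata; not an actual envited-x asset description.
sample = json.loads("""
{
  "@context": {"ex": "https://example.org/ontology#"},
  "@id": "ex:asset-1",
  "@type": "ex:HdMap",
  "ex:geoReference": {"@id": "ex:geo-1", "ex:city": "Munich"}
}
""")

triples = jsonld_to_triples(sample)
```

The resulting triples are the unit a graph database would store, which is why the nested `ex:geoReference` node becomes both a link from the asset and a subject of its own.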

Potential Sub-Tasks (Initial Sprint Focus)

  • Define and set up the Graph Database infrastructure.
  • Implement a tool to index token-referenced JSON-LD files.
  • Develop a minimal agent skill to map a natural language input to a basic SPARQL query.
  • Create an initial set of test data and ontologies to demonstrate core search capabilities.
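
Before the Graph Database is in place, core search behaviour could be tried out against a handful of in-memory test triples. The sketch below is only an assumption about how such a smoke test might look: the asset identifiers, the `ex:` terms, and the helper names are invented, and the wildcard matcher merely imitates what a basic SPARQL triple pattern would do.

```python
# Invented test data; identifiers and terms are placeholders.
TEST_TRIPLES = [
    ("ex:asset-1", "rdf:type", "ex:HdMap"),
    ("ex:asset-1", "ex:city", "Munich"),
    ("ex:asset-2", "rdf:type", "ex:Scenario"),
    ("ex:asset-2", "ex:city", "Munich"),
    ("ex:asset-3", "rdf:type", "ex:HdMap"),
    ("ex:asset-3", "ex:city", "Berlin"),
]


def match(triples, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def assets_of_type_in_city(triples, asset_type, city):
    """Join two patterns on the subject, like a two-pattern SPARQL query."""
    typed = {s for s, _, _ in match(triples, p="rdf:type", o=asset_type)}
    located = {s for s, _, _ in match(triples, p="ex:city", o=city)}
    return sorted(typed & located)


maps_in_munich = assets_of_type_in_city(TEST_TRIPLES, "ex:HdMap", "Munich")
anything_in_munich = [s for s, _, _ in match(TEST_TRIPLES, p="ex:city", o="Munich")]
```

With this data, `maps_in_munich` contains only `ex:asset-1`, while `anything_in_munich` also includes the scenario `ex:asset-2`, which is the kind of difference the agent's clarifying questions are meant to resolve.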
