Epic: AI-Driven Semantic Search for envited-x Data Assets
Description
Goal: Implement an intelligent, scalable search solution for envited-x data assets to enable semantic discovery, unlock Work Package 1 (Indexing), and prepare the platform for handling extremely large datasets.
Problem: LLM context windows are limited, so it is infeasible to search massive datasets (e.g., terabytes of video footage, gigabytes of point clouds) directly. We need a way to structure and index the associated metadata so that these assets become searchable and can be related to one another.
Proposed Solution: The Agent-Based Search Bar
This Epic proposes creating a search interface where an AI agent acts as an intelligent intermediary between the user and the data catalog.
Metadata Indexing:
- Index JSON-LD files (referenced by tokens/IPFS) into a structured database.
- Primary Focus: Implement the indexing into a Graph Database to leverage existing ontologies and create connections (triples) between entities.
- Investigation: Research and compare the benefits of a Vector Database versus a Graph Database, or explore a combined approach.
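A minimal sketch of the indexing step, assuming the metadata can be treated as plain nested JSON: it flattens a JSON-LD-like document into (subject, predicate, object) triples that a graph database could ingest. It deliberately skips `@context` expansion and value objects, which a real JSON-LD processor would handle, and the `ex:` vocabulary and `urn:example:` identifiers are invented for illustration.

```python
import json
from itertools import count


def to_triples(node, triples, blank_ids):
    """Append (subject, predicate, object) tuples for `node`; return its id."""
    # Nodes without an @id get a generated blank-node label.
    subject = node.get("@id") or f"_:b{next(blank_ids)}"
    for key, value in node.items():
        if key in ("@id", "@context"):
            continue  # Simplification: the context is ignored, not expanded.
        predicate = "rdf:type" if key == "@type" else key
        values = value if isinstance(value, list) else [value]
        for item in values:
            if isinstance(item, dict):
                # Nested node: recurse and link to its identifier.
                obj = to_triples(item, triples, blank_ids)
            else:
                obj = item
            triples.append((subject, predicate, obj))
    return subject


def index_document(raw_json):
    """Parse one metadata document and return its triples."""
    triples = []
    to_triples(json.loads(raw_json), triples, count())
    return triples


# Hypothetical metadata record; not taken from the envited-x ontologies.
SAMPLE = json.dumps({
    "@id": "urn:example:asset-1",
    "@type": "ex:HdMap",
    "ex:title": "Munich city centre",
    "ex:georeference": {"@id": "urn:example:geo-1", "ex:city": "Munich"},
})

triples = index_document(SAMPLE)
for triple in triples:
    print(triple)
```

In a real indexer the triples would be written to the chosen graph store rather than printed, and the JSON-LD would first be expanded against its context so that predicates become full IRIs.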
Agent-Led Query Translation:
- The AI Agent will receive natural language queries from the user.
- The Agent will use its knowledge of the ontologies to understand the query's intent (e.g., "map of Munich" -> "GEO reference").
- The Agent will generate and execute precise, structured queries (e.g., SPARQL) against the indexed database.
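The translation step above can be illustrated with a rule-based stand-in for the agent. In the proposed design an LLM would do the mapping; this sketch only shows the shape of the input and output. The keyword tables, the `ex:` prefix and its IRI, and the property names are all assumptions made for the example.

```python
# Hypothetical lookup tables; a real agent would derive these from the ontologies.
ASSET_TYPES = {"map": "ex:HdMap", "scenario": "ex:Scenario"}
KNOWN_CITIES = {"munich": "Munich", "berlin": "Berlin"}

PREFIXES = "PREFIX ex: <https://example.org/ontology#>\n"


def build_sparql(query):
    """Translate a short natural-language query into a SPARQL SELECT, or None."""
    words = [w.strip(".,?!\"'").lower() for w in query.split()]
    patterns = []
    for word in words:
        if word in ASSET_TYPES:
            patterns.append(f"?asset a {ASSET_TYPES[word]} .")
        if word in KNOWN_CITIES:
            # A place name is interpreted as a geo reference on the asset.
            patterns.append("?asset ex:georeference ?geo .")
            patterns.append(f'?geo ex:city "{KNOWN_CITIES[word]}" .')
    if not patterns:
        return None  # Nothing recognised: the agent should ask for clarification.
    body = "\n  ".join(patterns)
    return f"{PREFIXES}SELECT ?asset WHERE {{\n  {body}\n}}"


print(build_sparql("map of Munich"))
```

Literal values are taken from a fixed table rather than copied from user input, which keeps untrusted text out of the generated query. An LLM-based agent would need an equivalent safeguard.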
Interactive Refinement:
- The Agent will engage in a conversational flow to resolve ambiguities, asking the user for clarification (e.g., "You mentioned Munich. Are you searching for a GEO reference, or something else?").
- This interactive process will also provide valuable feedback to refine ontologies and understand actual user search intent.
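One way the refinement loop could look, as a sketch under assumptions: a term with more than one known interpretation triggers a clarification question, and the user's answer is counted so that frequent choices can later inform the ontologies. The interpretation table and the second option ("data provider location") are hypothetical.

```python
from collections import Counter

# Hypothetical table of terms with more than one plausible interpretation.
INTERPRETATIONS = {"munich": ["GEO reference", "data provider location"]}

# Tally of (term, chosen interpretation) pairs, kept as feedback.
choice_log = Counter()


def clarification_for(query):
    """Return (term, question) for the first ambiguous term, or None."""
    for word in query.lower().split():
        term = word.strip(".,?!\"'")
        options = INTERPRETATIONS.get(term, [])
        if len(options) > 1:
            listed = ", ".join(options)
            question = (
                f'You mentioned "{term}". Which did you mean: '
                f"{listed}, or something else?"
            )
            return term, question
    return None


def record_choice(term, choice):
    """Store the user's answer so that search intent can be analysed later."""
    choice_log[(term, choice)] += 1


result = clarification_for("map of Munich")
if result is not None:
    term, question = result
    print(question)
    record_choice(term, "GEO reference")  # Simulated user answer.
```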
Flexible Search Tiers:
- Implement an optional "Extended Search" or "Deep Search" feature that allows users to pay for a more comprehensive search.
- In a Deep Search, the agent would load a large portion of the pre-sorted data into its context window (the equivalent of RAM or L1 cache) and run a full LLM search over it.
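For the Deep Search tier, a simple way to decide how much pre-sorted data fits into the context is greedy packing against a token budget. The sketch below assumes the records are already ordered by relevance and uses a rough estimate of four characters per token. That estimate is a placeholder, not a real tokenizer, and the budget value is arbitrary.

```python
def estimate_tokens(text):
    """Rough size estimate (about four characters per token); an assumption."""
    return max(1, len(text) // 4)


def pack_context(records, budget_tokens):
    """Take records in their given order until the token budget would be exceeded."""
    selected, used = [], 0
    for record in records:
        cost = estimate_tokens(record)
        if used + cost > budget_tokens:
            break  # Stop at the first record that does not fit.
        selected.append(record)
        used += cost
    return selected, used


# Three hypothetical metadata records of 40 characters each (about 10 tokens).
records = ["a" * 40, "b" * 40, "c" * 40]
selected, used = pack_context(records, budget_tokens=25)
print(len(selected), used)
```

Because the records are pre-sorted, stopping at the first one that does not fit keeps the most relevant material in the context. A paid tier could simply raise `budget_tokens`.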
Key Objectives & Deliverables
- Database schema and indexing logic for Graph DB.
- Basic indexer functionality for JSON-LD metadata.
- Functional AI agent capable of translating user text into database queries.
- Working prototype of the search bar integrated with the agent and database.
- Unblock Work Package 1 by demonstrating a solid understanding of the data structure and a working implementation of the indexing solution.
Potential Sub-Tasks (Initial Sprint Focus)
- Define and set up the Graph Database infrastructure.
- Implement a tool to index token-referenced JSON-LD files.
- Develop a minimal agent skill to map a natural language input to a basic SPARQL query.
- Create an initial set of test data and ontologies to demonstrate core search capabilities.