Vchen7629/Mosaic


CI codecov License: MIT Go Python

Mosaic is a desktop application designed to assist people with memory loss. It identifies visitors via webcam and displays an AI-generated briefing about who they are and what was discussed last time, so users always have context before a conversation begins. Originally built for HackMerced XI (Devpost).

Installation

  • Download the build for your operating system from the latest release in the repository's Releases section and install it.

How It Works

When a known face appears on webcam, Mosaic shows a personalized card with the visitor's name and a summary of past conversations. Audio is transcribed in real time during the session. When recording stops, the transcript is summarized and stored, updating the briefing for the next visit.

Unknown visitors can be enrolled by name during their first appearance. Over time, the system builds a longitudinal memory of each relationship.
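
The session lifecycle described above can be sketched in a few lines of Python. This is an illustration only, not Mosaic's actual code: `VisitorMemory`, `summarize`, and `end_session` are hypothetical names, and `summarize` is a trivial stand-in for the LLM-based summarizer.

```python
from dataclasses import dataclass, field


@dataclass
class VisitorMemory:
    """Hypothetical per-visitor record: a name plus one summary per past session."""

    name: str
    summaries: list[str] = field(default_factory=list)

    def briefing(self) -> str:
        # The card shows the most recent summary, or a placeholder on a first visit.
        if not self.summaries:
            return f"{self.name}: no previous conversations recorded."
        return f"{self.name}: last time, {self.summaries[-1]}"


def summarize(transcript: str, max_words: int = 12) -> str:
    """Stand-in for the real summarizer: keeps only the first few words."""
    return " ".join(transcript.split()[:max_words])


def end_session(memory: VisitorMemory, transcript: str) -> None:
    # When recording stops, the transcript is summarized and stored,
    # which changes what the briefing shows on the next visit.
    memory.summaries.append(summarize(transcript))
```

Enrolling an unknown visitor corresponds to creating a new `VisitorMemory` with an empty history; each completed session then appends one summary to it.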

Architecture

flowchart LR
    classDef client fill:none,stroke:#4A90D9,stroke-width:2px,color:#4A90D9
    classDef service fill:none,stroke:#5BA85A,stroke-width:2px,color:#5BA85A
    classDef datastore fill:none,stroke:#8B6BB1,stroke-width:2px,color:#8B6BB1
    classDef legend fill:none,stroke:none,color:#888,font-size:12px

    Frontend["Frontend<br/>React + Tauri"]:::client
    Client["Client<br/>Go"]:::client

    FD["Face Detection Service<br/>Go — dlib, face_recognition"]:::service
    AT["Audio Transcription Service<br/>Python — Whisper"]:::service
    CB["Conversation Briefing Service<br/>Go — Ollama + RAG"]:::service

    DB[("PostgreSQL")]:::datastore
    Valkey[("Valkey")]:::datastore

    Frontend --> Client
    Client --> FD
    Client --> AT
    Client --> CB
    FD <--> Valkey
    FD <--> DB
    AT <--> DB
    CB <--> DB

    linkStyle 0 stroke:#E8A838,stroke-width:2px
    linkStyle 1,2,3 stroke:#00897B,stroke-width:2px

Green arrows — gRPC  |  Orange arrows — WebSocket
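
As a rough illustration of the client's role in the diagram, the sketch below routes incoming messages to a backend by message kind. It is an assumption-laden toy: the `Router` class and the kind names (`video_frame`, `audio_chunk`) are hypothetical, and plain Python callables stand in for the gRPC stubs. The real message types are defined in backend/proto/.

```python
from typing import Callable

# A handler takes a raw payload and returns a short status string.
# In the real client these would be gRPC calls; here they are plain functions.
Handler = Callable[[bytes], str]


class Router:
    """Hypothetical dispatcher: maps a message kind to one backend handler."""

    def __init__(self) -> None:
        self._handlers: dict[str, Handler] = {}

    def register(self, kind: str, handler: Handler) -> None:
        self._handlers[kind] = handler

    def dispatch(self, kind: str, payload: bytes) -> str:
        handler = self._handlers.get(kind)
        if handler is None:
            raise ValueError(f"no service registered for {kind!r}")
        return handler(payload)


router = Router()
# Webcam frames go to face detection, audio chunks to transcription.
router.register("video_frame", lambda p: f"face_detection received {len(p)} bytes")
router.register("audio_chunk", lambda p: f"audio_transcription received {len(p)} bytes")
```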

Services

| Service | Language | Description |
| --- | --- | --- |
| frontend | React / Tauri | Captures webcam frames and audio, displays face overlays and briefing cards, streams to client via WebSocket |
| client | Go | Receives webcam and audio streams from the frontend, forwards to backend services via gRPC |
| face_detection | Go | Identifies faces using dlib embeddings, registers new visitors |
| audio_transcription | Python | Transcribes audio in real time using Whisper, persists transcripts |
| conversation_briefing | Go | Generates visitor briefings using a self-hosted Qwen model and RAG over conversation history |
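
Identifying a face from embeddings usually comes down to a nearest-neighbor lookup with a distance cutoff. The sketch below shows that idea in plain Python. It is not the face_detection service's implementation: `identify` is a hypothetical helper, the two-dimensional vectors are toy data, and the default threshold of 0.6 is an assumption borrowed from the value commonly used with dlib's face embeddings.

```python
import math
from typing import Optional


def euclidean(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def identify(
    embedding: list[float],
    known: dict[str, list[float]],
    threshold: float = 0.6,  # assumed cutoff, tune for the real embedding model
) -> Optional[str]:
    """Return the closest known name, or None if nobody is within the threshold."""
    best_name, best_dist = None, float("inf")
    for name, reference in known.items():
        dist = euclidean(embedding, reference)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist <= threshold else None
```

A `None` result corresponds to the unknown-visitor case, where the user can enroll the person by name.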

Tech Stack

  • Transport: gRPC with Protocol Buffers, WebSockets
  • Backend services: Go 1.24
  • ML services: Python 3.12 (Whisper), Go (dlib / face_recognition)
  • LLM: Qwen via Ollama (self-hosted)
  • Vector search: pgvector with similarity search
  • Databases: PostgreSQL, Valkey
  • Desktop shell: Tauri
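
To make the RAG step more concrete, here is a minimal sketch of similarity-based retrieval followed by prompt assembly. It runs in memory with toy vectors; in the real stack the ranking would presumably happen inside PostgreSQL via pgvector (for example with its cosine distance operator `<=>`). The function names, the row layout, and the prompt wording are all assumptions made for illustration.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: list[float], rows: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Return the texts of the k rows whose embeddings are most similar to the query."""
    ranked = sorted(rows, key=lambda row: cosine_similarity(query, row[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(visitor: str, snippets: list[str]) -> str:
    """Assemble a briefing prompt from retrieved conversation snippets (hypothetical wording)."""
    context = "\n".join(f"- {s}" for s in snippets)
    return f"Summarize past conversations with {visitor}:\n{context}"


# Toy data: (conversation snippet, embedding) pairs.
rows = [
    ("talked about the garden", [1.0, 0.0]),
    ("discussed taxes", [0.0, 1.0]),
    ("planted tomatoes", [0.9, 0.1]),
]
```

The retrieved snippets would then be passed to the LLM as context, so the briefing draws on the most relevant past conversations rather than the full history.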

Local Development

Each service has its own build tooling. From the repo root:

Audio Transcription (Python)

cd backend/audio_transcription
uv sync --locked --all-extras --dev
make test_all

Client / Conversation Briefing / Face Detection (Go)

cd backend/<service>
go mod download
make test_all

Frontend

cd frontend
npm install
npm run dev

Proto definitions live in backend/proto/. Re-generate bindings with:

cd backend/proto
make generate

