Cosmind

A local-first, privacy-first AI second brain. Multi-agent RAG turns chaotic Markdown vaults into a queryable archive and a navigable 3D galaxy of ideas.

Timeline

2026 - Present

Status

Open-source (MIT)

Overview

Cosmind is a local-first knowledge system that turns a disconnected folder of Markdown notes into an intelligent, queryable archive — and renders the whole thing as a navigable 3D galaxy of ideas.

Instead of keyword search, a team of autonomous AI agents ingests raw notes, splits them into atomic concepts, extracts semantic links, and indexes everything in a local vector database. You no longer search text: you query an intelligent archivist, or you fly through the spatial map of how your concepts cluster together. Crucially, it runs entirely on your own hardware — no prompt or note ever has to leave your machine.

The problem

The friction of scaling personal knowledge is fragmentation: as notes pile up, cross-domain connections become impossible to see, and the archive turns into a write-only graveyard. Meanwhile, most "chat with your notes" tools solve this by shipping your private knowledge to a commercial cloud API — trading privacy for convenience.

The core engineering constraints I set:

  • Data sovereignty: run the full Retrieval-Augmented Generation loop on local, open-weight models, so sensitive prompts and notes never touch a remote server. Cloud models are an option, never a dependency.
  • Unstructured in, structure out: convert a disorganized mass of files into a deterministic RAG system — without forcing the user into endless manual tagging.

Architecture

Cosmind is an end-to-end, containerized system (Docker Compose) built around three pillars: a Python orchestrator on the Agno agent framework, ChromaDB for persistent vector storage, and open-weight LLMs served locally via Ollama (Qwen2.5, Llama 3, Llama 3.2-Vision). A FastAPI backend exposes the engine; a React/TypeScript frontend wraps it in a usable interface.

The pipeline is built as a team of specialized agents, each with a narrow job:

  • Splitter — breaks raw text into atomic Zettelkasten notes (one concept per note) and emits structured output.
  • Researcher — searches the web (DuckDuckGo) and reads real pages to enrich a note, always citing sources.
  • Vision — reads images and screenshots embedded in notes, turning them into searchable text.
  • Lecturer — writes a literature-note summary that ties the extracted concepts together.
  • Chat — answers questions grounded strictly in retrieved context.
  • Editor copilot — expands, fact-checks, or tutors a selection on demand.

Raw notes flow in, the agents transform them into linked atomic notes plus embeddings, and everything lands in a persistent vector store that both the chat and the visualizer read from.
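
To make the "structured output" idea concrete, here is a minimal sketch of how a Splitter-style result could be parsed and checked. The `AtomicNote` schema, the JSON field names, and `parse_splitter_output` are assumptions made for illustration, not Cosmind's actual data model or the Agno API. The check that each note body appears verbatim in the source is one possible way to enforce that the splitter only cuts text and never rewrites it.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class AtomicNote:
    """One concept per note (hypothetical schema, for illustration only)."""
    title: str
    body: str
    links: tuple[str, ...] = ()


def parse_splitter_output(raw_json: str, source_text: str) -> list[AtomicNote]:
    """Parse an agent's JSON output and reject notes that alter the source text."""
    notes = []
    for item in json.loads(raw_json):
        note = AtomicNote(
            title=item["title"].strip(),
            body=item["body"].strip(),
            links=tuple(item.get("links", [])),
        )
        if not note.title or not note.body:
            raise ValueError("every atomic note needs a title and a body")
        # Assumed rule: the body must be a verbatim slice of the original note.
        if note.body not in source_text:
            raise ValueError(f"note {note.title!r} altered the original wording")
        notes.append(note)
    return notes
```

Validating at the boundary like this means a malformed or over-eager model response fails loudly before anything reaches the vector store.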

Cloudless RAG

On a query, the system embeds the question locally, retrieves the closest chunks from ChromaDB by cosine similarity, and feeds only that retrieved context to a local LLM for synthesis. The embeddings are computed on-device by ChromaDB's built-in model — so retrieval, generation, and storage all happen without a network call.
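
The retrieval step can be sketched in plain Python. In Cosmind this work is delegated to ChromaDB, so the code below is only a stand-in showing what "closest chunks by cosine similarity" means. The in-memory `index` dictionary and the function names are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_vec: list[float], index: dict[str, list[float]], k: int = 3):
    """Return the k (note_id, score) pairs most similar to the query, best first."""
    scored = [(note_id, cosine_similarity(query_vec, vec)) for note_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

Only the texts behind the returned ids would be passed to the local model, which is what keeps the generation step scoped to the vault.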

Two design choices keep it trustworthy:

  • Grounding over guessing: the chat agent is instructed to answer only from retrieved notes. If the answer isn't in your vault, it is told to say so and offer a web-search fallback rather than invent one.
  • Your text is sacred: the ingest pipeline never rewrites or "improves" your original words; it splits and links them, preserving your voice and meaning.
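
A rough sketch of the grounding rule, under stated assumptions: the prompt wording, the `min_score` cutoff of 0.35, and the fallback message are all made up for illustration, and `generate` stands for whatever callable wraps the local model. The point is the control flow: if nothing relevant was retrieved, the model is never asked to answer.

```python
from typing import Callable, Optional

# Hypothetical wording; the real agent instructions are not shown in this write-up.
WEB_FALLBACK_MESSAGE = (
    "I could not find this in your notes. Would you like me to search the web instead?"
)


def build_grounded_prompt(
    question: str, hits: list[tuple[str, float]], min_score: float = 0.35
) -> Optional[str]:
    """Build a context-only prompt from (note_text, similarity) hits, or None if none qualify."""
    relevant = [text for text, score in hits if score >= min_score]
    if not relevant:
        return None
    context = "\n\n".join(f"[note {i}] {text}" for i, text in enumerate(relevant, start=1))
    return (
        "Answer using only the notes below. "
        "If they do not contain the answer, say so.\n\n"
        f"{context}\n\nQuestion: {question}"
    )


def answer(question: str, hits: list[tuple[str, float]], generate: Callable[[str], str]) -> str:
    """Call the model only when there is retrieved context to ground it."""
    prompt = build_grounded_prompt(question, hits)
    if prompt is None:
        return WEB_FALLBACK_MESSAGE
    return generate(prompt)
```

An instruction alone cannot guarantee the model stays on topic, which is why the empty-context case is handled in code rather than left to the prompt.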

The 3D knowledge galaxy

Retrieval is only half the value — seeing your knowledge is the other half. Cosmind takes the high-dimensional embeddings and reduces them two ways: PCA for a fast, navigable 3D map, and t-SNE for 2D concept "islands". A cosine-similarity pass then links notes above a threshold into a knowledge graph.
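
The linking pass can be illustrated with a small sketch. It covers only the cosine-threshold step, not the PCA or t-SNE projections, and it is not Cosmind's actual code: the function name, the 0.8 default threshold, and the toy embeddings are assumptions chosen for readability.

```python
import math
from itertools import combinations


def _cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity, returning 0.0 for zero-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_links(
    embeddings: dict[str, list[float]], threshold: float = 0.8
) -> list[tuple[str, str, float]]:
    """Return an undirected edge (id_a, id_b, score) for every pair at or above the threshold."""
    edges = []
    # Sorting by note id makes the edge order reproducible across runs.
    for (id_a, vec_a), (id_b, vec_b) in combinations(sorted(embeddings.items()), 2):
        score = _cosine(vec_a, vec_b)
        if score >= threshold:
            edges.append((id_a, id_b, score))
    return edges


# Toy 3-dimensional vectors; real embeddings have hundreds of dimensions.
toy = {
    "stoicism": [1.0, 0.1, 0.0],
    "cbt": [0.9, 0.2, 0.0],
    "sourdough": [0.0, 0.1, 1.0],
}
links = build_links(toy)
```

Comparing every pair is quadratic in the number of notes, so a large vault would likely need a nearest-neighbor query per note instead of this brute-force loop.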

The result is a spatial map that shows not just where ideas sit, but how they relate — clusters of related concepts emerge visually, and you can spot cross-domain bridges you'd never find by scrolling a file list.

Engineering decisions

A few constraints shaped the build:

  • Local-first, not local-only: the architecture treats cloud models as a hot-swappable option (set an API key and the same agents route through OpenAI), so privacy is the default without sacrificing flexibility.
  • Deterministic sync: a sync step reconciles the vector store with the files on disk — added, modified, and deleted notes stay consistent, so the index never silently drifts from reality.
  • Anti-hallucination guardrails: structured agent outputs and strict grounding instructions keep the system honest about what it does and doesn't know.
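
The sync idea can be sketched as a diff between two fingerprint maps, one built from the files on disk and one remembered by the index. This is a minimal illustration under my own assumptions (SHA-256 content hashes, relative paths as keys, hypothetical function names), not the project's actual sync code.

```python
import hashlib
from pathlib import Path


def fingerprint(text: str) -> str:
    """Content hash used to detect edits without comparing full texts."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def scan_vault(vault: Path) -> dict[str, str]:
    """Map each Markdown file's relative path to the hash of its content."""
    return {
        path.relative_to(vault).as_posix(): fingerprint(path.read_text(encoding="utf-8"))
        for path in sorted(vault.rglob("*.md"))
    }


def plan_sync(on_disk: dict[str, str], indexed: dict[str, str]) -> dict[str, list[str]]:
    """Classify paths as added, modified, or deleted relative to the index."""
    shared = on_disk.keys() & indexed.keys()
    return {
        "added": sorted(on_disk.keys() - indexed.keys()),
        "modified": sorted(p for p in shared if on_disk[p] != indexed[p]),
        "deleted": sorted(indexed.keys() - on_disk.keys()),
    }
```

The resulting plan would drive the vector store: embed and insert the added notes, re-embed the modified ones, and remove the deleted ones. Because the plan depends only on the two maps, running it twice on an unchanged vault yields three empty lists.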

What's next

Cosmind is open-source under the MIT license. The current focus: dropping the last optional cloud dependency entirely (fully local embedding models), incremental re-indexing of only-changed notes, chunking long notes before embedding, and exporting the knowledge graph to external tools like Obsidian Canvas. It's the working proof of a thesis I keep returning to: capable AI does not require surrendering your data.

Tech stack

Python · Agno (multi-agent orchestration) · Ollama (Qwen2.5, Llama 3, Llama 3.2-Vision) · ChromaDB (vector database) · scikit-learn (PCA / t-SNE) · FastAPI · TypeScript · React + Vite · Docker.
