Cosmind
A local-first, privacy AI second brain. Multi-agent RAG turns chaotic Markdown vaults into a queryable archive — and a navigable 3D galaxy of ideas.
Overview
Cosmind is a local-first knowledge system that turns a disconnected folder of Markdown notes into an intelligent, queryable archive — and renders the whole thing as a navigable 3D galaxy of ideas.
Instead of keyword search, a team of autonomous AI agents ingests raw notes, splits them into atomic concepts, extracts semantic links, and indexes everything in a local vector database. You no longer search text: you query an intelligent archivist, or you fly through the spatial map of how your concepts cluster together. Crucially, it runs entirely on your own hardware — no prompt or note ever has to leave your machine.
The problem
The friction of scaling personal knowledge is fragmentation: as notes pile up, cross-domain connections become impossible to see, and the archive turns into a write-only graveyard. Meanwhile, most "chat with your notes" tools solve this by shipping your private knowledge to a commercial cloud API — trading privacy for convenience.
The core engineering constraints I set:
- Data sovereignty: run the full Retrieval-Augmented Generation loop on local, open-weight models, so sensitive prompts and notes never touch a remote server. Cloud models are an option, never a dependency.
- Unstructured in, structure out: convert a disorganized mass of files into a deterministic RAG system — without forcing the user into endless manual tagging.
Architecture
Cosmind is an end-to-end, containerized system (Docker Compose) built around three pillars: a Python orchestrator on the Agno agent framework, ChromaDB for persistent vector storage, and open-weight LLMs served locally via Ollama (Qwen2.5, Llama 3, Llama 3.2-Vision). A FastAPI backend exposes the engine; a React/TypeScript frontend wraps it in a usable interface.
The pipeline is built as a team of specialized agents, each with a narrow job:
- Splitter — breaks raw text into atomic Zettelkasten notes (one concept per note) and emits structured output.
- Researcher — searches the web (DuckDuckGo) and reads real pages to enrich a note, always citing sources.
- Vision — reads images and screenshots embedded in notes, turning them into searchable text.
- Lecturer — writes a literature-note summary that ties the extracted concepts together.
- Chat — answers questions grounded strictly in retrieved context.
- Editor copilot — expands, fact-checks, or tutors a selection on demand.
Raw notes flow in, the agents transform them into linked atomic notes plus embeddings, and everything lands in a persistent vector store that both the chat and the visualizer read from.
Cloudless RAG
On a query, the system embeds the question locally, retrieves the closest chunks from ChromaDB by cosine similarity, and feeds only that retrieved context to a local LLM for synthesis. The embeddings are computed on-device by ChromaDB's built-in model — so retrieval, generation, and storage all happen without a network call.
Two design choices keep it trustworthy:
- Grounding over guessing: the chat agent is instructed to answer only from retrieved notes. If the answer isn't in your vault, it doesn't hallucinate — it explicitly offers a web-search fallback instead.
- Your text is sacred: the ingest pipeline never rewrites or "improves" your original words; it splits and links them, preserving your voice and meaning.
The 3D knowledge galaxy
Retrieval is only half the value — seeing your knowledge is the other half. Cosmind takes the high-dimensional embeddings and reduces them two ways: PCA for a fast, navigable 3D map, and t-SNE for 2D concept "islands". A cosine-similarity pass then links notes above a threshold into a knowledge graph.
The result is a spatial map that shows not just where ideas sit, but how they relate — clusters of related concepts emerge visually, and you can spot cross-domain bridges you'd never find by scrolling a file list.
Engineering decisions
A few constraints shaped the build:
- Local-first, not local-only: the architecture treats cloud models as a hot-swappable option (set an API key and the same agents route through OpenAI), so privacy is the default without sacrificing flexibility.
- Deterministic sync: a sync step reconciles the vector store with the files on disk — added, modified, and deleted notes stay consistent, so the index never silently drifts from reality.
- Anti-hallucination guardrails: structured agent outputs and strict grounding instructions keep the system honest about what it does and doesn't know.
What's next
Cosmind is open-source under the MIT license. The current focus: dropping the last optional cloud dependency entirely (fully local embedding models), incremental re-indexing of only-changed notes, chunking long notes before embedding, and exporting the knowledge graph to external tools like Obsidian Canvas. It's the working proof of a thesis I keep returning to: capable AI does not require surrendering your data.
Tech stack
Python · Agno (multi-agent orchestration) · Ollama (Qwen2.5, Llama 3, Llama 3.2-Vision) · ChromaDB (vector database) · scikit-learn (PCA / t-SNE) · FastAPI · TypeScript · React + Vite · Docker.
Related
Tackling real-world problems with modern technologies in a university-level competition.
Artificial Intelligence didn’t start with ChatGPT. From its origins in the 1950s to today’s deep learning models, let’s explore what AI really is, how it learns from data, and why it’s not just computer “magic.”
Do you have a complex architecture to build?
Tell me about your idea. Let's design a solid, automated, and scalable solution together.