Graphify — codebases & docs to knowledge graphs
A Claude Code skill that turns any folder of code, docs, PDFs, and images into a queryable knowledge graph.
graphify is a Claude Code skill. Type /graphify in any folder and it reads your files — code, PDFs, markdown, screenshots, diagrams, even images in other languages — and builds a queryable knowledge graph out of them. It uses Claude vision to extract concepts and relationships, connects everything into one graph, and is honest about what it found versus what it guessed. Queries run at roughly 71× fewer tokens than re-reading the raw files, and the graph persists across sessions.
Install
Requires Claude Code and Python 3.10+.
1. Install the package and the skill
   One command installs the CLI and registers the /graphify skill.
       pip install graphifyy && graphify install
2. Run it in any directory
   Open Claude Code in a project and build the graph.
       /graphify .
Naming & PATH
The PyPI package is temporarily graphifyy (double-y) while the graphify name is reclaimed; the CLI and skill are still graphify. If you hit “externally-managed” errors on macOS or PATH issues on Windows, use pipx install graphifyy instead.
What a run produces
Every run writes a graphify-out/ folder you can browse, query, or commit so your team starts from the cached graph:
graph.html interactive graph — click nodes, search, filter by community
obsidian/ open as an Obsidian vault
wiki/ Wikipedia-style articles for agent navigation (--wiki)
GRAPH_REPORT.md god nodes, surprising connections, suggested questions
graph.json persistent graph — query weeks later without re-reading
cache/ SHA256 cache: re-runs only process changed files
The full command surface
graphify is a single skill with one command and a set of flags. Everything it does:
Build & grow the graph
| Command | What it does |
|---|---|
| /graphify [path] | Build a knowledge graph from a folder (defaults to the current directory). |
| --mode deep | More aggressive extraction with richer INFERRED edges. |
| --update | Incremental: re-extract only changed files and merge into the existing graph. |
| add <url> | Fetch a paper, tweet, or web page, save it, and fold it into the graph. |
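The incremental behaviour behind --update and the SHA256 cache can be pictured as content hashing: a file is re-extracted only when its digest differs from the one recorded on the previous run. The following is a minimal sketch of that idea, not graphify's actual implementation. The JSON manifest format and the function names `file_digest` and `changed_files` are assumptions made for illustration.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def changed_files(root: Path, manifest_path: Path) -> list[Path]:
    """Return files under root whose content hash differs from the saved manifest.

    The manifest (a JSON map of relative path -> digest) is a hypothetical
    format chosen for this sketch, not graphify's real cache layout.
    """
    old = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    new: dict[str, str] = {}
    changed: list[Path] = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        # Skip the manifest itself in case it lives inside root.
        if path.resolve() == manifest_path.resolve():
            continue
        rel = path.relative_to(root).as_posix()
        new[rel] = file_digest(path)
        if old.get(rel) != new[rel]:
            changed.append(path)
    # Persist the fresh digests so the next run compares against them.
    manifest_path.write_text(json.dumps(new, indent=2))
    return changed
```

Hashing content rather than comparing modification times means a file that is touched but not edited is not reprocessed. Deleted files simply drop out of the new manifest; a real tool would also need to remove their nodes from the graph.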
Query it
| Command | What it does |
|---|---|
| query "…" | Ask the graph a question in natural language. |
| path "A" "B" | Find how two nodes connect to each other. |
| explain "X" | Explain a node and the relationships around it. |
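Conceptually, a path query asks for the shortest chain of edges between two nodes. A hedged sketch of that idea using breadth-first search in plain Python follows. It assumes edges are undirected and unweighted, which the source does not specify, and the function name `shortest_path` and the toy node names are hypothetical.

```python
from collections import deque


def shortest_path(edges, start, goal):
    """Breadth-first search over an undirected edge list.

    Returns the list of nodes from start to goal, or None if no path exists.
    """
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    if start not in adjacency or goal not in adjacency:
        return None
    parents = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the parent links back to the start, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in sorted(adjacency[node]):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None


# Toy graph with invented nodes and edges, for illustration only.
edges = [("A", "C"), ("C", "D"), ("D", "B"), ("A", "E")]
print(shortest_path(edges, "A", "B"))  # ['A', 'C', 'D', 'B']
```

Breadth-first search returns a path with the fewest hops, which suits an unweighted concept graph. If edges carried weights, a weighted algorithm such as Dijkstra's would be the natural replacement.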
Keep it fresh
| Command | What it does |
|---|---|
| --watch | Auto-sync as files change — instant for code, notifies you for docs. |
| graphify hook install | Post-commit git hook that rebuilds the graph after every commit. |
Export & integrate
| Command | What it does |
|---|---|
| --wiki | Build an agent-crawlable wiki — an index.md plus one article per community. |
| --svg | Export graph.svg. |
| --graphml | Export graph.graphml for Gephi or yEd. |
| --neo4j | Generate cypher.txt to load the graph into Neo4j. |
| --mcp | Start an MCP stdio server so agents can query the graph as a tool. |
Quick examples
/graphify ./raw --mode deep # thorough build with richer inferred edges
/graphify add https://arxiv.org/abs/1706.03762 # pull in a paper
/graphify query "what connects attention to the optimizer?"
/graphify path "DigestAuth" "Response"
/graphify explain "SwinTransformer"
What it reads
Fully multimodal — drop in any mix of file types and it extracts from all of them:
| Type | Extensions | Extraction |
|---|---|---|
| Code | .py .ts .js .go .rs .java .c .cpp .rb .cs .kt .scala .php | AST via tree-sitter + a call-graph pass |
| Docs | .md .txt .rst | Concepts + relationships via Claude |
| Papers | .pdf | Citation mining + concept extraction |
| Images | .png .jpg .webp .gif | Claude vision — screenshots, diagrams, any language |
What you get back
God nodes
The highest-degree concepts — what everything else connects through.
Surprising connections
Ranked by a composite score (code↔paper edges rank above code↔code), each with a plain-English why.
Suggested questions
Four or five questions the graph is uniquely positioned to answer.
Token benchmark
Printed after every run — e.g. ~71.5× fewer tokens per query vs reading the raw files.
Communities & confidence
Leiden clustering groups related concepts; every edge is tagged EXTRACTED, INFERRED, or AMBIGUOUS.
Stays current
--watch rebuilds as code changes; the git hook rebuilds on every commit.
Honest by design
Because every edge carries a confidence tag, you always know what graphify found in the source versus what it inferred, with no silent guessing.
Under the hood
Built on NetworkX + Leiden (graspologic) + tree-sitter + Claude + vis.js. It runs entirely locally — no Neo4j, no server. Beyond Claude Code, the same skill installs into Cursor, Gemini CLI, Codex, and Copilot CLI via graphify <platform> install, and the --mcp mode exposes the graph to any MCP client.
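To make two of the report terms concrete: god nodes are the highest-degree concepts, and each edge carries one of the tags EXTRACTED, INFERRED, or AMBIGUOUS. The sketch below shows both ideas in plain Python. The edge tuples, node names, and the functions `god_nodes` and `edges_with_tag` are hypothetical; graphify's real data model and its graph.json schema may differ.

```python
from collections import Counter

# Hypothetical edge records: (source, target, confidence tag).
edges = [
    ("Attention", "Transformer", "EXTRACTED"),
    ("Transformer", "Optimizer", "INFERRED"),
    ("Transformer", "LayerNorm", "EXTRACTED"),
    ("Optimizer", "LayerNorm", "AMBIGUOUS"),
]


def god_nodes(edges, top_n=3):
    """Rank nodes by degree (number of incident edges), highest first."""
    degree = Counter()
    for source, target, _tag in edges:
        degree[source] += 1
        degree[target] += 1
    return degree.most_common(top_n)


def edges_with_tag(edges, tag):
    """Keep only the edges carrying the given confidence tag."""
    return [(s, t) for s, t, edge_tag in edges if edge_tag == tag]


print(god_nodes(edges, top_n=1))           # [('Transformer', 3)]
print(edges_with_tag(edges, "EXTRACTED"))  # edges found directly in the source
```

Filtering on the tag is what lets a reader restrict a question to edges found in the source, or widen it to include inferred ones.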
Learn more
- github.com/safishamsi/graphify — the skill, CLI, and worked examples
- worked/ — real corpora with their actual graph output and token numbers
- ARCHITECTURE.md — module responsibilities and how to add a language