GitHub Repo
MIT
April 16, 2026 at 12:49 AM
Vision Claw
@safishamsi (Project Author)
- Overview
- Graphify is an AI-powered coding assistant skill that reads your files, builds a structured knowledge graph, and reveals architectural insights you might not notice from raw code alone. It works across folders, notes, papers, PDFs, images, videos, and more, turning disparate sources into a single, navigable graph.
- The system is fully multimodal: you drop in code, PDFs, Markdown, diagrams, whiteboard photos, screenshots, or even audio and video, and graphify extracts concepts and relationships from all of it, connecting them into one coherent graph.
- It supports 25 languages via tree-sitter AST parsing for code and structured text, enabling broad language coverage (Python, JavaScript, TypeScript, Go, Rust, Java, C, C++, Ruby, C#, Kotlin, Scala, PHP, Swift, and more).
- The output artifacts include an interactive graph (graph.html), a persistent JSON graph (graph.json), a human-readable audit (GRAPH_REPORT.md), and a cache that reuses work on subsequent runs. This architecture allows you to re-query weeks later without re-reading the entire corpus.
- The project emphasizes transparency: every relationship in the graph is tagged as EXTRACTED (found directly in source), INFERRED (reasonable inference with a confidence score), or AMBIGUOUS (flagged for review). You always know what was found versus what was guessed.
- A noteworthy efficiency claim: on mixed corpora, graphify uses roughly 71.5x fewer tokens per query than reading the raw files, thanks to a persistent, compact graph representation and a domain-aware extraction workflow. The benefit compounds over time as the graph is reused.
- Included at the outset are badges and indicators of project status and community engagement (CI, PyPI, downloads, sponsorship, and LinkedIn presence) to signal current maintenance, distribution, and support.
- Core capabilities and intent
- Purpose: graphify turns a folder of files into a structured knowledge graph that reveals “god nodes,” surprising connections, and design rationales that are often buried in comments, docs, or related papers.
- Multimodal ingestion: supports code, docs, papers, images, video and audio, and web-derived content through URLs. Transcripts from audio/video are created locally with Whisper via a domain-aware prompt, and transcripts are cached for fast re-runs.
- Language breadth: 25 languages supported for analysis via tree-sitter AST, enabling robust extraction across ecosystems and communities.
- Graph-centric design: the output is a NetworkX graph, enriched with communities discovered by Leiden clustering, and augmented by explicit semantic similarity edges (INFERRED) that reflect cross-document concept ties.
- Transparency in reasoning: each edge is labeled as EXTRACTED, INFERRED (with a confidence score), or AMBIGUOUS, so users can audit the basis of connections and decisions.
- How it works: the three-pass pipeline
- Pass 1 — Deterministic AST extraction (no LLM needed): code files are analyzed to extract structure such as classes, functions, imports, call graphs, docstrings, and rationale comments. This deterministic step grounds the graph in code topology without relying on large language models.
- Pass 2 — Local transcription for media: video and audio files are transcribed locally using faster-whisper. Transcripts are cached, enabling instant re-runs. Transcriptions are domain-aware and integrated as first-class inputs to the knowledge graph (god nodes, edges, and rationale lines).
- Pass 3 — Subagent-driven semantic extraction: Claude subagents (or equivalents on each platform) run in parallel over docs, papers, images, and transcripts to extract concepts, relationships, and design rationales. The results are merged, clustered, and exported into the graph, with a topological clustering approach (Leiden) rather than embedding-based clustering.
- Output produced: an interactive HTML graph (graph.html), a persistent graph in JSON (graph.json), and a narrative audit report (GRAPH_REPORT.md). A SHA256-based cache ensures only changed files are reprocessed.
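The SHA256-based cache step above can be sketched with the standard library: a file is reprocessed only when its content digest changes between runs. The function names and cache layout here are illustrative, not graphify's internals.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def needs_reprocessing(path: Path, cache: dict) -> bool:
    """True if the file is new or its contents changed since the cached run."""
    return cache.get(str(path)) != sha256_of(path)


def update_cache(path: Path, cache: dict) -> None:
    cache[str(path)] = sha256_of(path)


# Round trip on a throwaway file: unchanged files are skipped on run two.
with tempfile.TemporaryDirectory() as d:
    f = Path(d) / "module.py"
    f.write_text("def hello(): pass\n")
    cache: dict = {}
    assert needs_reprocessing(f, cache)      # first run: process
    update_cache(f, cache)
    assert not needs_reprocessing(f, cache)  # unchanged: skip
    f.write_text("def hello(): return 1\n")
    assert needs_reprocessing(f, cache)      # edited: reprocess
```

In a real pipeline the cache dict would be persisted (for example as JSON under the output directory) so that re-runs only touch changed files.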
- Output artifacts and how to use them
- graph.html: an interactive graph where you can click nodes, search, and filter by community.
- GRAPH_REPORT.md: a narrative report highlighting god nodes, surprising connections, and notable design rationales.
- graph.json: the persistent graph that supports queries weeks later without re-reading the corpus.
- cache/: a snapshot cache of processed files and transcriptions to accelerate re-runs.
- A .graphifyignore file can be added at the repo root to exclude folders from the graph (patterns follow .gitignore syntax, e.g., vendor/, node_modules/, dist/). The ignore rules apply when graphify runs on subfolders as well.
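A simplified sketch of how .graphifyignore-style patterns can be applied, using `fnmatch`. Real .gitignore semantics (negation, anchoring, `**`) are richer than shown; this function is an illustration, not graphify's matcher.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath


def is_ignored(rel_path: str, patterns: list[str]) -> bool:
    """Return True if any pattern matches the file or one of its parent dirs.

    Patterns ending in '/' (e.g. 'node_modules/') ignore whole directories.
    This is a deliberate simplification of full .gitignore matching.
    """
    parts = PurePosixPath(rel_path).parts
    for pat in patterns:
        stripped = pat.rstrip("/")
        # Directory pattern: ignore anything under a matching directory.
        if any(fnmatch(part, stripped) for part in parts[:-1]):
            return True
        # File pattern: match the file name itself.
        if fnmatch(parts[-1], stripped):
            return True
    return False


patterns = ["node_modules/", "dist/", "*.log"]
assert is_ignored("node_modules/react/index.js", patterns)
assert is_ignored("build/app.log", patterns)
assert not is_ignored("src/main.py", patterns)
```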
- Graph structure and relationship semantics
- Edge types:
- EXTRACTED: direct, source-found relationships with a weight of 1.0.
- INFERRED: reasonable inferences with a confidence_score (0.0–1.0) indicating how strongly the system believes the link is valid.
- AMBIGUOUS: flagged for human review when a connection is unclear.
- Clustering is topology-based: Leiden detection computes communities by edge density, not via embedding similarity alone.
- Semantic similarity edges exist as cross-document conceptual links even when there is no direct structural connection—these edges are INFERRED and influence community structure.
- The graph includes hyperedges to articulate multi-node, multi-step relationships (e.g., a shared protocol across classes, or a combined sequence in an authentication flow).
- All information remains local to the graph, with a clear separation between factual extractions and inferred reasoning.
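The EXTRACTED / INFERRED / AMBIGUOUS semantics above can be modeled with a small value type that enforces the stated invariants. The field names are illustrative, not graphify's actual schema.

```python
from dataclasses import dataclass

EDGE_TYPES = {"EXTRACTED", "INFERRED", "AMBIGUOUS"}


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    edge_type: str
    confidence: float = 1.0  # EXTRACTED edges are definitive

    def __post_init__(self):
        if self.edge_type not in EDGE_TYPES:
            raise ValueError(f"unknown edge type: {self.edge_type}")
        if self.edge_type == "EXTRACTED" and self.confidence != 1.0:
            raise ValueError("EXTRACTED edges carry weight 1.0")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be in [0.0, 1.0]")


# EXTRACTED: found directly in source; INFERRED: a scored guess.
calls = Edge("DigestAuth", "Response", "EXTRACTED")
similar = Edge("retry_policy", "backoff_paper", "INFERRED", confidence=0.72)
assert calls.confidence == 1.0
assert 0.0 <= similar.confidence <= 1.0
```

Keeping the invariant in the type itself means a graph built from these edges can always be audited: every link either states a fact or carries an explicit confidence.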
- Installation, platforms, and first setup
- Requires: Python 3.10+ and one of a long list of AI agents or toolchains (Claude Code, Codex, OpenCode, Cursor, Gemini CLI, Copilot CLI, Aider, OpenClaw, Factory Droid, Trae, Kiro, Hermes, Google Antigravity).
- Official package: PyPI package is graphifyy (install with pip install graphifyy). Note that other packages named graphify* on PyPI are not affiliated with this project; the official repository is safishamsi/graphify.
- Core install command (one-liner):
```bash
pip install graphifyy && graphify install
```
- Platform-specific install commands (high level):
- Claude Code (Linux/Mac): graphify install
- Claude Code (Windows): graphify install (auto-detected) or graphify install --platform windows
- Codex: graphify install --platform codex
- OpenCode: graphify install --platform opencode
- GitHub Copilot CLI: graphify install --platform copilot
- Aider: graphify install --platform aider
- OpenClaw: graphify install --platform claw
- Factory Droid: graphify install --platform droid
- Trae: graphify install --platform trae
- Trae CN: graphify install --platform trae-cn
- Gemini CLI: graphify install --platform gemini
- Hermes: graphify install --platform hermes
- Kiro IDE/CLI: graphify kiro install
- Cursor: graphify cursor install
- Google Antigravity: graphify antigravity install
- Post-install housekeeping (platform-specific):
- Claude Code: installs CLAUDE.md and a PreToolUse hook to read GRAPH_REPORT.md before answering architecture questions.
- Codex/OpenCode/Gemini/Hermes: install the corresponding AGENTS.md and tool hooks that inject graph reminders or always-on rules.
- Cursor, Kiro, Antigravity: hook or rules files that make the graph info visible by default.
- After installation, the recommended start is:
- /graphify . (run on current directory)
- The slash format is platform-specific on some assistants; for Codex, use $graphify . instead of /graphify .
- Making the assistant always use the graph (recommended):
- Each platform has a dedicated install command to enable always-on behavior (Claude Code, Codex, OpenCode, Copilot CLI, Aider, OpenClaw, Factory Droid, Trae, Trae-CN, Cursor, Gemini CLI, Hermes, Kiro, Google Antigravity, etc.).
- Claude Code adds a CLAUDE.md and a PreToolUse hook; Codex adds AGENTS.md and a PreToolUse hook; OpenCode adds a tool.execute.before plugin; Cursor adds a rules file; Gemini copies a SKILL.md and installs BeforeTool hooks; Aider/OpenClaw/Factory Droid/Trae/Hermes copy rules to a platform-wide skill area; Kiro injects rules via a steering file; Antigravity uses always-on rules. The end result is the assistant navigates by the graph rather than grepping raw files.
- General workflow concept:
- After a graph exists, run the platform-specific install to ensure the AI assistant consults the graph whenever answering questions about the project.
- Working with graph.json in an LLM-enabled workflow
- The graph.json can be exposed as an MCP server for structured querying:
- python -m graphify.serve graphify-out/graph.json
- Typical usage pattern:
- Start by inspecting the high-level view with graphify-out/GRAPH_REPORT.md
- Then run focused graph queries to pull subgraphs for precise questions
- Feed a focused graph snippet to the LLM as context rather than pasting the entire raw corpus
- Example workflow:
- graphify query "show the auth flow" --graph graphify-out/graph.json
- graphify query "what connects DigestAuth to Response?" --graph graphify-out/graph.json
- The outputs include node labels, edge types, confidence tags, source files, and exact locations in the corpus.
- If your assistant supports tool calling or MCP, you can export graph.json as an MCP server and query it programmatically:
- graphify-out/graph.json can be served and queried against via an API for repeatable, structured access.
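Outside the CLI, a focused subgraph can also be pulled from graph.json by hand before handing it to an LLM. This sketch assumes a node-link style layout (`{"nodes": [...], "links": [...]}`); the real graph.json schema may differ, and the miniature graph below is invented for illustration.

```python
import json
from collections import deque


def neighborhood(graph: dict, start: str, hops: int = 1) -> set[str]:
    """BFS over an undirected view of a node-link graph dict."""
    adj: dict[str, set[str]] = {}
    for link in graph["links"]:
        adj.setdefault(link["source"], set()).add(link["target"])
        adj.setdefault(link["target"], set()).add(link["source"])
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == hops:
            continue
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return seen


# Hypothetical miniature graph standing in for graphify-out/graph.json.
doc = json.loads("""{
  "nodes": [{"id": "DigestAuth"}, {"id": "Response"}, {"id": "Session"}],
  "links": [{"source": "DigestAuth", "target": "Response"},
            {"source": "Response", "target": "Session"}]
}""")
assert neighborhood(doc, "DigestAuth", hops=1) == {"DigestAuth", "Response"}
assert neighborhood(doc, "DigestAuth", hops=2) == {"DigestAuth", "Response", "Session"}
```

Feeding only such a k-hop neighborhood to the model, rather than the whole corpus, is what drives the token savings described earlier.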
- Supported file types and extraction details
- Code (.py, .ts, .js, .jsx, .tsx, .go, .rs, .java, .c, .cpp, .rb, .cs, .kt, .scala, .php, .swift, .lua, .zig, .ps1, .ex, .exs, .m, .mm, .jl, .vue, .svelte):
- Extraction: AST via tree-sitter plus cross-file call graphs and docstring/rationale comments.
- Docs (.md, .txt, .rst):
- Extraction: Concepts, relationships, and design rationale via Claude-like extraction.
- Office (.docx, .xlsx):
- Extraction: Converted to Markdown and processed by Claude (requires the graphifyy office extension).
- Papers (.pdf):
- Extraction: Citation mining plus concept extraction to scaffold scholarly connections.
- Images (.png, .jpg, .webp, .gif):
- Extraction: Claude vision processing to interpret screenshots, diagrams, and visuals, including cross-language references.
- Video / Audio (.mp4, .mov, .mkv, .webm, .avi, .m4v, .mp3, .wav, .m4a, .ogg):
- Extraction: Transcribed locally with faster-whisper; transcript fed into Claude extraction; transcripts cached.
- YouTube / URLs:
- Extraction: yt-dlp downloads audio, transcribes locally, and ingests into the same extraction pipeline.
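The per-type routing in the list above amounts to a suffix-to-extractor dispatch. The extractor names below are placeholders for illustration, not graphify's internal API, and the suffix table is abbreviated.

```python
from pathlib import Path

# Placeholder extractor names keyed by file suffix (abbreviated table).
EXTRACTORS = {
    **dict.fromkeys([".py", ".ts", ".go", ".rs", ".java"], "ast_tree_sitter"),
    **dict.fromkeys([".md", ".txt", ".rst"], "llm_concepts"),
    ".pdf": "citations_plus_concepts",
    **dict.fromkeys([".png", ".jpg", ".webp", ".gif"], "vision"),
    **dict.fromkeys([".mp4", ".mp3", ".wav"], "whisper_then_llm"),
}


def route(path: str) -> str:
    """Pick an extractor by suffix; unknown types are skipped."""
    return EXTRACTORS.get(Path(path).suffix.lower(), "skip")


assert route("auth/digest.py") == "ast_tree_sitter"
assert route("docs/DESIGN.md") == "llm_concepts"
assert route("talk.mp4") == "whisper_then_llm"
assert route("archive.tar.gz") == "skip"
```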
- Video and audio corpus workflow
- To enable automatic transcription:
- pip install 'graphifyy[video]'
- /graphify ./my-corpus
- You can add a public video directly by URL:
- /graphify add
- You can specify the Whisper model for better accuracy:
- /graphify ./my-corpus --whisper-model medium
- Privacy note: audio and video transcription runs locally; no audio or raw media leaves your machine.
- What you get: a rich set of insights
- God nodes: highest-degree concepts that anchor the graph.
- Surprising connections: ranked by a composite score; code-paper edges often rank higher than code-code edges.
- Suggested questions: a curated list of questions the graph is uniquely positioned to answer.
- The “why”: rationale nodes are derived from docstrings, inline comments (# NOTE:, # IMPORTANT:, # HACK:, # WHY:), and design rationale from docs.
- Confidence scores: INFERRED edges carry a numerical confidence; EXTRACTED edges are definitive (1.0).
- Semantic similarity edges: cross-document links that reflect conceptual overlap across files and domains.
- Hyperedges: multi-node relationships that cannot be expressed by pairwise edges alone, such as all classes implementing a shared protocol or a sequence of related concepts from a paper section.
- Token efficiency: a token benchmark is printed after each run; the system demonstrates substantial token savings on larger corpora.
- Auto-sync (--watch): background updates when code changes; quick AST-only rebuilds for code, and notifications for doc/image changes that prompt a re-run of the LLM pass.
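The rationale markers listed above (# NOTE:, # IMPORTANT:, # HACK:, # WHY:) can be harvested with a simple scan. This regex sketch is illustrative rather than graphify's actual extractor.

```python
import re

RATIONALE = re.compile(r"#\s*(NOTE|IMPORTANT|HACK|WHY):\s*(.+)")


def rationale_lines(source: str) -> list[tuple[int, str, str]]:
    """Return (line_number, tag, text) for each rationale comment."""
    found = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        m = RATIONALE.search(line)
        if m:
            found.append((lineno, m.group(1), m.group(2).strip()))
    return found


code = """\
import time
# WHY: upstream rate-limits bursts, so we back off exponentially.
def fetch(url):
    time.sleep(1)  # HACK: crude throttle until retry middleware lands
"""
hits = rationale_lines(code)
assert [(n, tag) for n, tag, _ in hits] == [(2, "WHY"), (4, "HACK")]
```

Each hit carries a line number, so the resulting rationale nodes can point back to exact source locations, matching the audit-friendly output described above.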
- Worked examples and privacy assurances
- Worked examples encourage practical validation. You can run /graphify on a real corpus, save the outputs to worked/{slug}/, write an honest review (review.md), and contribute back to the project.
- Privacy philosophy:
- Graphify processes code locally via tree-sitter; no source code leaves your machine.
- Video and audio are transcribed locally; transcripts are stored in graphify-out/transcripts/ for reuse.
- The only network interactions are with your platform’s model API during extraction, using your own API key.
- No telemetry or analytics are emitted by graphify itself.
- Tech stack and building blocks
- Core components:
- NetworkX for graph representation and manipulation.
- Leiden community detection for graph topology clustering.
- tree-sitter for fast, accurate language-aware parsing of code.
- vis.js for interactive graph visualization in the browser.
- Semantic extraction:
- Claude (Claude Code) or GPT-family models on your platform to extract concepts, relationships, and design rationales.
- Media handling:
- faster-whisper for local transcription of audio/video.
- yt-dlp for video URL ingestion when needed.
- Local operation: no Neo4j server required; graphify runs entirely on-device, preserving privacy and avoiding external data sinks.
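A minimal sketch of the NetworkX side of this stack: build a graph, detect communities, and persist it in node-link JSON form. Note one substitution: NetworkX ships Louvain rather than Leiden, so Louvain stands in here for the clustering step (both partition by edge density); the node names are invented.

```python
import json

import networkx as nx
from networkx.algorithms.community import louvain_communities

G = nx.Graph()
G.add_edges_from([
    ("DigestAuth", "Response"), ("Response", "Session"),
    ("Session", "Adapter"), ("tokenizer", "vocab"), ("vocab", "merges"),
])

# Louvain stands in for Leiden here; both cluster by edge density.
communities = louvain_communities(G, seed=42)
assert {n for c in communities for n in c} == set(G.nodes)
assert len(communities) >= 2  # the two disconnected clusters split apart

# Persist in node-link form, the shape typically reloaded for later queries.
payload = json.dumps(nx.node_link_data(G))
H = nx.node_link_graph(json.loads(payload))
assert {frozenset(e) for e in H.edges} == {frozenset(e) for e in G.edges}
```

The JSON round trip is what makes "re-query weeks later without re-reading the corpus" possible: the graph, not the raw files, becomes the durable artifact.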
- Built on Graphify — Penpax integration
- Penpax is described as the enterprise layer atop graphify, designed to extend a project-level graph into a broader life-wide graph.
- Penpax concept:
- Input: from a single project (graphify) to a broader set of data sources including browser history, meetings, emails, files, and code—creating a continuous, on-device digital twin.
- Runs: graphify operates on demand, while Penpax runs continuously in the background.
- Scope: graphify targets a project; Penpax targets your entire working life.
- Querying: graphify uses CLI/MCP/AI-skill interfaces; Penpax offers natural-language, always-on access.
- Privacy: graphify is local by default; Penpax emphasizes fully on-device operation without cloud data exposure.
- The combined vision is to provide a layered approach: a graph layer (graphify) paired with an always-on layer (Penpax) that binds meetings, browser activity, and documents into a living knowledge graph.
- Star history and community momentum
- The project tracks its popularity with a star history visualization: a dynamic indicator of community engagement over time.
- [Star History Chart] image: https://api.star-history.com/svg?repos=safishamsi/graphify&type=Date
- Ongoing contributions are encouraged, with worked examples and real-world testing as strong catalysts for trust and validation.
- How to contribute and what’s next
- Worked examples are the most trustworthy contributions. Run /graphify on a real corpus, save the output under worked/{slug}/, and publish a review outlining what the graph captured correctly and what it missed.
- If you encounter extraction bugs, open an issue with:
- The input files
- The cache entry at graphify-out/cache/
- What was missed or invented
- Architecture and extensibility:
- See ARCHITECTURE.md for module responsibilities and language extension guidelines.
- You can add a new language by implementing extraction hooks and integrating with the AST/corpus pipeline.
- Future directions:
- Deeper integration with enterprise data: more robust policy controls, finer-grained permissions, and more granular privacy modes.
- Expanded multi-agent orchestration to handle larger, more diverse corpora without sacrificing performance.
- Improved UI/UX for graph exploration, with guided storytelling around god nodes and design rationales.
- Starred status and ongoing availability
- Star history indicates sustained interest and ongoing activity.
- The project continues to welcome maintainers, contributors, and users who want to validate and extend graphify in real-world scenarios.
- A note on distribution: graphifyy (double-y) is the official PyPI package; ensure you install the correct package to align with the project’s ecosystem.
- Quick reference: essential commands and concepts
- Install and initialize:
- pip install graphifyy && graphify install
- Run on a folder:
- /graphify .
- Review outputs:
- graphify-out/GRAPH_REPORT.md
- graphify-out/graph.json
- graph.html for interactive exploration
- Make graphify always-on (platform-specific install):
- Claude Code: graphify claude install
- Codex: graphify codex install
- OpenCode: graphify opencode install
- Cursor: graphify cursor install
- Gemini CLI: graphify gemini install
- Aider/OpenClaw/Factory Droid/Trae/Hermes: respective install commands
- Kiro: graphify kiro install
- Google Antigravity: graphify antigravity install
- Querying the graph:
- graphify query "what connects attention to the optimizer?" --graph graphify-out/graph.json
- graphify path "DigestAuth" "Response"
- graphify explain "SwinTransformer"
- You can expose graph.json as an MCP server for repeated, structured queries:
- python -m graphify.serve graphify-out/graph.json
- Closing note
- Graphify provides a disciplined, transparent, and scalable way to transform a dispersed corpus into a navigable knowledge graph, with robust handling of code, docs, media, and scholarly content. It emphasizes privacy, local processing, and a clear separation between what is observed and what is inferred. Its ecosystem of platform-specific hooks and always-on configurations ensures that your AI assistant can leverage the graph effectively, reducing cognitive load and revealing architectural insights that would be difficult to surface through traditional file-by-file reading. The combination of interactive visualization, persistent graphs, and domain-aware extraction makes graphify a powerful tool for developers, researchers, and teams who want to understand not just what a codebase does, but why it is designed that way, and how its components relate to broader ideas across papers and media.
Repository:https://github.com/safishamsi/graphify
Created: April 16 · Last updated: April 16, 2026 at 12:49 AM