
feat: add Solidity language support (supersedes #299) #667

Open
naiba wants to merge 1 commit into colbymchenry:main from naiba-forks:feat/solidity-support

naiba commented Jun 3, 2026

Summary

Adds Solidity (.sol) as a first-class indexed language. It uses the
off-the-shelf tree-sitter-wasms grammar (ABI 14, verified healthy under the
multi-grammar runtime with scripts/add-lang/check-grammar.mjs), with no
vendored *.wasm and no new runtime dependency.

Supersedes #299. That PR is currently DIRTY (merge conflicts) and misses
several edges that codegraph relies on for cross-contract audit flows:
calls from event emits, error reverts, and modifier invocations; `is X, Y`
inheritance; the constant_variable_declaration kind; and parameter-aware
signatures.

This PR closes those gaps end to end. If maintainers prefer this
implementation, #299 can be closed in favor of it. I'm also happy to defer
to #299 if its author rebases and absorbs the deltas; flagging here for
maintainer choice.

What gets indexed

| Solidity construct | Graph kind | Notes |
| --- | --- | --- |
| contract_declaration, library_declaration | class | is X, Y clause emits extends refs |
| interface_declaration | interface | same inheritance handling |
| struct_declaration / struct_member | struct / field | grammar has no body: field on struct, so visitNode walks direct children |
| enum_declaration / enum_value | enum / enum_member | enum_value has no name: field; the node text is the name |
| function_definition (file-level) | function | free functions (Solidity 0.7.1+) |
| function_definition (in contract/library) | method | dispatched by isInsideClassLikeNode |
| modifier_definition | method | onlyRole, whenNotPaused, etc. become real targets of modifier_invocation |
| constructor_definition | method (name = "constructor") | nameless in the AST; name synthesized via resolveName |
| fallback_receive_definition | method (name = "fallback" / "receive") | same; the keyword is an unnamed child, picked up by walking all children |
| state_variable_declaration (in contract) | field | |
| constant_variable_declaration (file-level) | constant | distinct AST node |
| event_definition / error_declaration | field | lets emit X(...) and revert X() resolve to real targets |
| import_directive | import | source path is the module name; the existing import resolver matches it on disk |
| call_expression / emit_statement / revert_statement / modifier_invocation | calls refs | all four are call-shaped but use distinct AST node types |
| using X for Y library-method calls | calls refs | resolved via the existing name matcher, e.g. total.add(amount) → SafeMath.add |
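Three of the rows above have no name: field: constructor, fallback/receive, and enum_value. For those, a name has to be synthesized. The sketch below shows one way such a rule set could look. It is not the PR's resolveName. The node shape DeclNodeSketch and the function resolveNameSketch are hypothetical. It also assumes, following tree-sitter's convention, that an unnamed keyword child reports the keyword itself as its node type.

```typescript
// Hypothetical, simplified node shape for this sketch (not web-tree-sitter's
// SyntaxNode): just enough to show the name-synthesis rules from the table.
interface DeclNodeSketch {
  type: string;         // tree-sitter node type, e.g. 'constructor_definition'
  text: string;         // full source text of the node
  nameField?: string;   // text of the `name:` field, when the grammar has one
  childTypes: string[]; // types of ALL children; unnamed keywords use the keyword as type
}

// Returns the symbol name to index, or undefined if none can be derived.
function resolveNameSketch(node: DeclNodeSketch): string | undefined {
  // Ordinary declarations (contract, function, modifier, ...) carry a name: field.
  if (node.nameField !== undefined) return node.nameField;

  switch (node.type) {
    case 'constructor_definition':
      // Nameless in the AST; synthesize a stable name.
      return 'constructor';
    case 'fallback_receive_definition':
      // The keyword is an unnamed child, so scan every child, not only named ones.
      if (node.childTypes.includes('receive')) return 'receive';
      if (node.childTypes.includes('fallback')) return 'fallback';
      return undefined;
    case 'enum_value':
      // enum_value has no name: field; its text is the member name.
      return node.text.trim();
    default:
      return undefined;
  }
}
```

Checking for an explicit name first keeps the special cases limited to the nameless node types. Every other declaration goes through the normal field lookup.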

Signature shape

tree-sitter-solidity does not wrap parameters in a parameters: field;
each parameter is a direct named child of function_definition. The
generic getChildByField(node, 'parameters') therefore returns null, which
would drop the param list and produce signatures like
external returns (bool) with no params.

getSignature walks namedChildren and reconstructs:

(t1 a, t2 b) visibility mutability returns (...)
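The reconstruction described above can be sketched as follows. This is a minimal illustration, not the PR's getSignature. SigNodeSketch and buildSignatureSketch are hypothetical names. The node type names parameter, visibility, state_mutability and return_type_definition are assumed from the tree-sitter-solidity grammar. Other children, such as virtual, override or modifier invocations, are simply ignored here.

```typescript
// Hypothetical minimal node shape (not web-tree-sitter's SyntaxNode).
interface SigNodeSketch {
  type: string;
  text: string;
  namedChildren: SigNodeSketch[];
}

// Assumed node types for the trailing parts, in the order the
// reconstructed signature prints them.
const TAIL_ORDER = ['visibility', 'state_mutability', 'return_type_definition'] as const;

// Collapse runs of whitespace so multi-line declarations print on one line.
const squash = (s: string): string => s.replace(/\s+/g, ' ').trim();

function buildSignatureSketch(fn: SigNodeSketch): string {
  const params: string[] = [];
  const tail = new Map<string, string>();
  // Parameters are direct named children, so walk namedChildren instead of
  // looking up a `parameters` field (which does not exist in this grammar).
  for (const child of fn.namedChildren) {
    if (child.type === 'parameter') {
      params.push(squash(child.text));
    } else if ((TAIL_ORDER as readonly string[]).includes(child.type)) {
      tail.set(child.type, squash(child.text));
    }
  }
  // Emit "(t1 a, t2 b)" followed by visibility, mutability, returns in a fixed order,
  // regardless of the order they appeared in the source (e.g. "pure internal").
  const parts = [`(${params.join(', ')})`];
  for (const t of TAIL_ORDER) {
    const text = tail.get(t);
    if (text !== undefined) parts.push(text);
  }
  return parts.join(' ');
}
```

For a transferFrom node with three parameter children, an external visibility child and a returns (bool) child, this yields the first sample line shown in the output that follows.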

Real index output on the included sample:

method  transferFrom   (address from, address to, uint256 amount) external returns (bool)
method  add            (uint256 a, uint256 b) internal pure returns (uint256)
method  constructor    (uint256 supply_)
function freeAdd       (uint256 a, uint256 b) pure returns (uint256)

Files changed

.claude/skills/agent-eval/corpus.json
CHANGELOG.md
README.md
__tests__/extraction.test.ts
src/extraction/grammars.ts
src/extraction/languages/index.ts
src/extraction/languages/solidity.ts
src/types.ts

No new dependencies, no vendored .wasm, no package.json changes, and no
copy-assets impact because the grammar is loaded from tree-sitter-wasms.

Tests

12 Solidity extraction cases were added under describe('Solidity Extraction'):

  • language detection through detectLanguage, isLanguageSupported, and getSupportedLanguages
  • contract, library, and interface extraction
  • is X, Y inheritance refs
  • functions, methods, modifiers, constructor, fallback, and receive extraction
  • parameter-aware signature assertions
  • structs, enums, fields, and file-level constants
  • imports and call refs for ordinary calls, emits, reverts, modifier invocations, and using X for Y library calls

npx vitest run __tests__/extraction.test.ts
# Test Files  1 passed (1)
# Tests       289 passed (289)

Real-repo benchmark

Followed the /add-lang workflow on three popular Solidity codebases.
Extraction passed on all three, and in the with-codegraph A/B runs each
cross-contract audit flow was answered in a single codegraph_explore call.

| Repo | Tier | Files | Indexed nodes / edges | with-cg tools | without-cg tools |
| --- | --- | ---: | --- | --- | --- |
| transmissions11/solmate | Small | 60 | 1,277 / 3,125 | 1 codegraph_explore | 2 tools |
| Vectorized/solady | Medium | 305 | 10,896 / 34,664 | 1 codegraph_explore | 5 tools |
| OpenZeppelin/openzeppelin-contracts | Large | 681 | 7,554 / 16,695 | 1 codegraph_explore | 2 tools |

Sample cross-contract flow verified in the index:

contract MyToken is Token { ... }   →  extends MyToken → Token
function transferFrom(...) { _spendAllowance(...); }
                                    →  calls transferFrom → _spendAllowance
                                       (resolves across the inheritance edge)
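The cross-inheritance resolution in this flow can be pictured as a walk up the extends edges. Starting from the calling contract, it looks for the first contract that declares the callee. The sketch below is a hypothetical illustration and not codegraph's resolver. ContractSketch and resolveInheritedCall are made-up names. It uses a plain breadth-first walk, which is a simplification: Solidity itself orders bases by C3 linearization, and this sketch does not implement that.

```typescript
// Hypothetical in-memory view of indexed contracts; not codegraph's resolver.
interface ContractSketch {
  bases: string[];      // targets of `extends` refs from `is X, Y`, in source order
  methods: Set<string>; // method names declared directly in the contract
}

// Find which contract declares `callee`, starting at `from` and following
// extends edges breadth-first (simplified; not Solidity's C3 linearization).
function resolveInheritedCall(
  contracts: Map<string, ContractSketch>,
  from: string,
  callee: string,
): string | undefined {
  const queue: string[] = [from];
  const seen = new Set<string>();
  while (queue.length > 0) {
    const current = queue.shift()!;
    if (seen.has(current)) continue; // guard against diamonds and cycles
    seen.add(current);
    const contract = contracts.get(current);
    if (contract === undefined) continue; // base not in the index
    if (contract.methods.has(callee)) return `${current}.${callee}`;
    queue.push(...contract.bases);
  }
  return undefined;
}
```

Suppose MyToken extends Token and only Token declares _spendAllowance. Then resolveInheritedCall(index, 'MyToken', '_spendAllowance') returns 'Token._spendAllowance'. That is the edge the flow above depends on.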

Why not vendor the wasm

tree-sitter-wasms already ships tree-sitter-solidity.wasm at ABI 14.
Running scripts/add-lang/check-grammar.mjs solidity sample.sol confirms
that it parses the sample cleanly and is reused safely under the
multi-grammar runtime. Vendoring would add repo size without changing
behavior.

Closes

Closes #299, which this PR supersedes with broader Solidity extraction
coverage, full test coverage, no vendored wasm, and real-repo validation.

Indexes .sol files end-to-end via the off-the-shelf tree-sitter-wasms
grammar (no vendored wasm needed). Maps every callable form, every
member declaration, and every cross-symbol edge that an audit/trace
flow depends on:

- contract / library  → class
- interface           → interface
- struct / enum       → struct / enum (with enum_value as enum_member;
  the grammar has no body: field on struct/enum, so we walk direct
  children in visitNode instead of bailing)
- contract-level function / modifier / constructor / fallback / receive
  → method; file-level functions → function (constructor/fallback/receive
  have no name: field, so the name is synthesized via resolveName)
- event / error       → field-shaped node carrying the name so emit X
  / revert X resolve to the declaration
- state_variable_declaration / struct_member → field
- constant_variable_declaration              → constant (isConst hook)
- import_directive    → import (source path is the moduleName)
- emit_statement / revert_statement / modifier_invocation are all
  registered as call types so onlyRole / emit Withdrawn / revert
  NotOwner / library `using X for Y` calls become real calls edges
- contract `is X, Y` → extends references on the contract node, with
  the resolver's interface-impl synthesizer reclassifying class→
  interface as implements at resolve time (Java/C# pattern)
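The call registration described in the list above can be sketched as follows. Four node types are treated as call-shaped, and each is reduced to a bare callee name that a name matcher can look up. This is a text-slicing simplification written for illustration only. A real extractor would read the callee from AST fields. calleeNameSketch and matchByName are hypothetical helpers, not codegraph functions.

```typescript
// The four node types the PR registers as call-shaped.
const CALL_TYPES: ReadonlySet<string> = new Set([
  'call_expression',
  'emit_statement',
  'revert_statement',
  'modifier_invocation',
]);

// Text-based simplification: slice the callee out of the node's source text.
function calleeNameSketch(nodeType: string, text: string): string | undefined {
  if (!CALL_TYPES.has(nodeType)) return undefined;
  // Drop the leading keyword of `emit X(...)` / `revert X(...)`.
  let s = text.trim().replace(/^(emit|revert)\s+/, '');
  // Keep only what precedes the argument list (modifiers may have none).
  const paren = s.indexOf('(');
  if (paren >= 0) s = s.slice(0, paren);
  s = s.replace(/;$/, '').trim();
  // Member calls such as `total.add` (enabled by `using SafeMath for uint256`)
  // reduce to the bare member name for the name matcher.
  const dot = s.lastIndexOf('.');
  return dot >= 0 ? s.slice(dot + 1) : s;
}

// Hypothetical name index: bare name -> qualified declarations.
function matchByName(name: string, index: ReadonlyMap<string, string[]>): string[] {
  return index.get(name) ?? [];
}
```

Under these assumptions, `emit Withdrawn(...)`, `revert NotOwner()` and `onlyRole(ADMIN_ROLE)` reduce to Withdrawn, NotOwner and onlyRole. `total.add(amount)` reduces to add, which the index maps to SafeMath.add.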

tree-sitter-solidity puts each `parameter` as a direct named child of
function_definition rather than under a `parameters:` field, so the
generic getChildByField walk would lose the param list. getSignature
walks namedChildren and reconstructs (t1 a, t2 b) visibility mutability
returns (...) by hand. Without this, Solidity signatures would drop
their params and show only something like `external returns (bool)`,
which is exactly the gap that drives the agent back to Read.

Validated end-to-end on three real repos via scripts/add-lang/bench.sh:
solmate (60 files), solady (305 files), openzeppelin-contracts (681
files) — all PASS verify-extraction; with-codegraph A/B answers each
audit-flow question in a single codegraph_explore call vs 2-5 Glob/
Read/Grep calls without.

12 vitest cases under describe('Solidity Extraction') covering
language detection, container extraction (incl. is X inheritance refs),
method extraction (incl. signature text + visibility mapping),
struct/enum/field extraction (incl. constant kind), and import/calls
extraction. 289/289 tests green. Updates README's language table,
CHANGELOG [Unreleased], and the agent-eval corpus.