feat: add Solidity language support (supersedes #299) #667
Open
naiba wants to merge 1 commit into
Indexes .sol files end-to-end via the off-the-shelf tree-sitter-wasms
grammar (no vendored wasm needed). Maps every callable form, every
member declaration, and every cross-symbol edge that an audit/trace
flow depends on:
- contract / library → class
- interface → interface
- struct / enum → struct / enum (with enum_value as enum_member;
the grammar has no body: field on struct/enum, so we walk direct
children in visitNode instead of bailing)
- function / modifier / constructor / fallback / receive → method
(constructor/fallback/receive have no name: field — synthesized via
resolveName)
- event / error → field-shaped node carrying the name so emit X
/ revert X resolve to the declaration
- state_variable_declaration / struct_member → field
- constant_variable_declaration → constant (isConst hook)
- import_directive → import (source path is the moduleName)
- emit_statement / revert_statement / modifier_invocation are all
registered as call types so onlyRole / emit Withdrawn / revert
NotOwner / library `using X for Y` calls become real calls edges
- contract `is X, Y` → extends references on the contract node, with
the resolver's interface-impl synthesizer reclassifying class→
interface as implements at resolve time (Java/C# pattern)
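
As a rough illustration of the mapping listed above, the sketch below pairs each grammar node type with an index kind and shows the resolve-time step that turns a class→interface `extends` reference into `implements`. The grammar node type names come from this PR; everything else (`Kind`, `SOLIDITY_KINDS`, `CALL_TYPES`, `kindFor`, `Ref`, `reclassifyInheritance`) is a hypothetical name chosen for the example and is not codegraph's actual API.

```typescript
// Hypothetical types for illustration only; not codegraph's real data model.
type Kind =
  | "class" | "interface" | "struct" | "enum" | "enum_member"
  | "function" | "method" | "field" | "constant" | "import";

// Grammar node type -> index kind, following the list above.
const SOLIDITY_KINDS: Record<string, Kind> = {
  contract_declaration: "class",
  library_declaration: "class",
  interface_declaration: "interface",
  struct_declaration: "struct",
  struct_member: "field",
  enum_declaration: "enum",
  enum_value: "enum_member",
  modifier_definition: "method",
  constructor_definition: "method",
  fallback_receive_definition: "method",
  event_definition: "field",
  error_declaration: "field",
  state_variable_declaration: "field",
  constant_variable_declaration: "constant",
  import_directive: "import",
};

// Node types treated as call sites, so emit/revert/modifier uses become edges.
const CALL_TYPES = new Set([
  "call_expression",
  "emit_statement",
  "revert_statement",
  "modifier_invocation",
]);

// A function_definition is a method inside a contract/library, else a function.
function kindFor(nodeType: string, insideClassLike: boolean): Kind | undefined {
  if (nodeType === "function_definition") {
    return insideClassLike ? "method" : "function";
  }
  return SOLIDITY_KINDS[nodeType];
}

interface Ref {
  from: string;
  to: string;
  edge: "extends" | "implements";
}

// Resolve-time pass: `extends` from a class to an interface becomes `implements`.
function reclassifyInheritance(refs: Ref[], kindOf: Map<string, Kind>): Ref[] {
  return refs.map((ref): Ref =>
    ref.edge === "extends" &&
    kindOf.get(ref.from) === "class" &&
    kindOf.get(ref.to) === "interface"
      ? { ...ref, edge: "implements" }
      : ref
  );
}
```

Keeping the reclassification as a separate pass means extraction can emit `extends` for every name in an `is X, Y` clause without knowing yet whether `X` is a contract or an interface.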
tree-sitter-solidity puts each `parameter` as a direct named child of
function_definition rather than under a `parameters:` field, so the
generic getChildByField walk would lose the param list. getSignature
walks namedChildren and reconstructs (t1 a, t2 b) visibility mutability
returns (...) by hand — without this every Solidity signature would
show only `external returns (bool)` with no params, which is exactly
the gap that drives the agent back to Read.
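
The walk described above can be sketched roughly as follows. This is a minimal illustration, not the PR's `getSignature`: `SyntaxNodeLike` is a stand-in for a tree-sitter node, and the child type names `visibility`, `state_mutability`, and `return_type_definition` are assumptions about the grammar (only `parameter` is confirmed by the text above).

```typescript
// Minimal stand-in for a tree-sitter syntax node; only the members used here.
interface SyntaxNodeLike {
  type: string;
  text: string;
  namedChildren: SyntaxNodeLike[];
}

// Rebuild "(t1 a, t2 b) visibility mutability returns (...)" from direct children.
function getSignatureSketch(fn: SyntaxNodeLike): string {
  const params: string[] = [];
  const modifiers: string[] = [];
  let returns = "";

  for (const child of fn.namedChildren) {
    switch (child.type) {
      case "parameter": // direct named child, not under a `parameters:` field
        params.push(child.text);
        break;
      case "visibility": // assumed node type name
      case "state_mutability": // assumed node type name
        modifiers.push(child.text);
        break;
      case "return_type_definition": // assumed to carry the text "returns (...)"
        returns = child.text;
        break;
      default:
        break; // name, body, and other children are not part of the signature
    }
  }

  // Drop the empty `returns` slot when the function returns nothing.
  return [`(${params.join(", ")})`, ...modifiers, returns]
    .filter((part) => part.length > 0)
    .join(" ");
}
```

A field lookup such as `getChildByField(node, 'parameters')` would find nothing on such a node, which is why the sketch iterates over the children directly.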
Validated end-to-end on three real repos via scripts/add-lang/bench.sh:
solmate (60 files), solady (305 files), openzeppelin-contracts (681
files) — all PASS verify-extraction; with-codegraph A/B answers each
audit-flow question in a single codegraph_explore call vs 2-5 Glob/
Read/Grep calls without.
12 vitest cases under describe('Solidity Extraction') covering
language detection, container extraction (incl. is X inheritance refs),
method extraction (incl. signature text + visibility mapping),
struct/enum/field extraction (incl. constant kind), and import/calls
extraction. 289/289 tests green. Updates README's language table,
CHANGELOG [Unreleased], and the agent-eval corpus.
Summary
Adds Solidity (`.sol`) as a first-class indexed language. It uses the off-the-shelf `tree-sitter-wasms` grammar (ABI 14, healthy under the multi-grammar runtime, verified via `scripts/add-lang/check-grammar.mjs`), with no vendored `*.wasm` and no new runtime dependency.

Supersedes #299. That PR is currently DIRTY/conflicting and missed several edges that codegraph relies on for cross-contract audit flows: event/error/modifier invocation calls, `is X, Y` inheritance, the `constant_variable_declaration` kind, and parameter-aware signatures.

This PR closes those gaps end-to-end. If maintainers prefer this implementation, #299 can be closed in favor of this PR. Happy to defer to #299 if the original author rebases and absorbs the deltas; flagging here for maintainer choice.
What gets indexed
| Solidity node | Kind | Notes |
| --- | --- | --- |
| `contract_declaration`, `library_declaration` | `class` | `is X, Y` clause emits `extends` refs |
| `interface_declaration` | `interface` | |
| `struct_declaration` / `struct_member` | `struct` / `field` | no `body:` field on struct, so `visitNode` walks direct children |
| `enum_declaration` / `enum_value` | `enum` / `enum_member` | `enum_value` has no `name:` field; node text is the name |
| `function_definition` (file-level) | `function` | |
| `function_definition` (in contract/library) | `method` | `isInsideClassLikeNode` |
| `modifier_definition` | `method` | `onlyRole` / `whenNotPaused` etc. become real targets of `modifier_invocation` |
| `constructor_definition` | `method` (name = `"constructor"`) | `resolveName` |
| `fallback_receive_definition` | `method` (name = `"fallback"` / `"receive"`) | |
| `state_variable_declaration` (in contract) | `field` | |
| `constant_variable_declaration` (file-level) | `constant` | |
| `event_definition` / `error_declaration` | `field` | `emit X(...)` and `revert X()` resolve to real targets |
| `import_directive` | `import` | |
| `call_expression` / `emit_statement` / `revert_statement` / `modifier_invocation` | `calls` refs | |
| `using X for Y` library-method calls | `calls` refs | `total.add(amount)` → `SafeMath.add` |

Signature shape
`tree-sitter-solidity` does not wrap parameters in a `parameters:` field; each `parameter` is a direct named child of `function_definition`. The generic `getChildByField(node, 'parameters')` returns null, which would lose the param list and produce signatures like `external returns (bool)` with no params. `getSignature` walks `namedChildren` and reconstructs `(t1 a, t2 b) visibility mutability returns (...)` by hand.

Real index output on the included sample:
Files changed
No new dependencies, no vendored `.wasm`, no `package.json` changes, and no `copy-assets` impact because the grammar is loaded from `tree-sitter-wasms`.

Tests
12 Solidity extraction cases were added under `describe('Solidity Extraction')`:
- `detectLanguage`, `isLanguageSupported`, and `getSupportedLanguages`
- `is X, Y` inheritance refs
- `using X for Y` library calls

Real-repo benchmark
Followed the `/add-lang` workflow on three popular Solidity codebases: solmate (60 files), solady (305 files), and openzeppelin-contracts (681 files). Extraction passed on all three; the with-codegraph A/B run answered each cross-contract audit flow in one `codegraph_explore` call.

Sample cross-contract flow verified in the index:
Why not vendor the wasm
`tree-sitter-wasms` already ships `tree-sitter-solidity.wasm` at ABI 14. Running `scripts/add-lang/check-grammar.mjs solidity sample.sol` shows that the grammar parses cleanly and can be reused safely under the multi-grammar runtime. Vendoring would add repo size without changing behavior.
Closes
Closes #299, which this PR supersedes with broader Solidity extraction
coverage, full test coverage, no vendored wasm, and real-repo validation.