Skip to content

slave node design#148

Open
syntrust wants to merge 6 commits into
mainfrom
dl-goshard-slave
Open

slave node design#148
syntrust wants to merge 6 commits into
mainfrom
dl-goshard-slave

Conversation

@syntrust

Copy link
Copy Markdown
Collaborator

Comment thread L1/slave-node-bootstrap.md Outdated
Comment thread L1/slave-node-bootstrap.md
Comment thread L1/slave-node-bootstrap.md
facts geth's block format cannot hold yet:

```
GenesisMeta {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the GenesisMeta record (prev-root-block hash, xshard cursor, full shard id) is a temporary scaffold that exists only because geth's stock header has no home for these fields yet, the implementation should mark it as such explicitly — not just note it in this design doc.

Concretely, when implementing:

  • Tag the GenesisMeta struct, its rawdb accessors, and the Reconcile() path with a grep-able marker, e.g. // TODO: temporary — remove once QKC block format lands; these fields move into the genesis block's header/meta and Reconcile() should compare the genesis block itself.
  • Make the marker say plainly that once the block issue merges, this code is re-implemented, not patched: the fields are read from the genesis block's header/meta, and Reconcile() switches to the geth-native genesis-hash check (SetupGenesisBlock-style). The GenesisMeta record is then deleted, not migrated — at that stage the db only holds the genesis block, so --clean re-bootstrap suffices and no migration code is needed.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 9081a96

@syntrust syntrust requested a review from qzhodl June 23, 2026 06:51
Comment thread L1/slave-node-bootstrap.md Outdated

1. **Validation at load time.** `LoadClusterConfig` runs `Validate()` before any database
is opened: every full shard id in the resolved slave must resolve to a configured
chain/shard; no shard may be owned twice; `ShardGenesis.ROOT_HEIGHT` must equal

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Validate(): "no shard may be owned twice" needs clarification — as written it may be a bug.

pyquarkchain supports the same shard being run by multiple slaves as replicas: branch_to_slaves is typed Dict[int, List[SlaveConnection]] (master.py:775), built by appending (branch_to_slaves.setdefault(full_shard_id, []).append(slave), master.py:907), with the explicit comment "Slaves may run multiple copies of the same branch" (master.py:1124). Routing is replica-aware: writes (add_transaction) fan out to all replicas (master.py:1240-1241), reads / PoSW take the first (master.py:1050). So the same full shard id appearing in more than one slave's FULL_SHARD_ID_LIST is a valid multi-replica deployment.

If this check means cross-slave unique ownership (a shard may belong to only one slave), it would reject those legitimate configs and must be dropped. It also contradicts this issue's narrowed SlaveContext, which only holds the resolved slave and can't see the others — so a cross-slave check isn't even possible here.

This issue is single-slave bootstrap, so the only thing that's both meaningful and doable is: the resolved slave's own FULL_SHARD_ID_LIST must contain no duplicate entries. Suggest rewording to exactly that, and avoiding "owned" since it implies cross-slave ownership. Any global shard-coverage / replica-sanity validation belongs at the master layer and must allow replicas.

Worth confirming whether any current mainnet/testnet config actually uses multiple slaves per shard: if so, implementing this as "unique ownership" is a P0 (rejects a live config); even if not, hardcoding "one slave per shard" would block adding replicas later.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 2e7e79f

Comment thread L1/slave-node-bootstrap.md Outdated

Two reference implementations inform the design:

- **pyquarkchain** (`/Users/dl/code/pyquarkchain`) is the *compatibility source of

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be a GitHub URL, not a local path.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 398c3e5

Comment thread L1/slave-node-bootstrap.md Outdated
truth*. The slave must consume the exact `cluster_config.json` that the unmodified
Python master (`quarkchain/cluster/cluster.py`) writes when it launches slaves, and the
root genesis hash it derives must be byte-identical to pyquarkchain's.
- **goquarkchain** (`/Users/dl/code/goquarkchain`) is the *shape reference* for the Go

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 398c3e5

@syntrust syntrust requested a review from iteyelmp June 23, 2026 08:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants