slave node design#148
Conversation
| facts geth's block format cannot hold yet: | ||
|
|
||
| ``` | ||
| GenesisMeta { |
There was a problem hiding this comment.
Since the GenesisMeta record (prev-root-block hash, xshard cursor, full shard id) is a temporary scaffold that exists only because geth's stock header has no home for these fields yet, the implementation should mark it as such explicitly — not just note it in this design doc.
Concretely, when implementing:
- Tag the GenesisMeta struct, its rawdb accessors, and the Reconcile() path with a grep-able marker, e.g. // TODO: temporary — remove once QKC block format lands; these fields move into the genesis block's header/meta and Reconcile() should compare the genesis block itself.
- Make the marker say plainly that once the block issue merges, this code is re-implemented, not patched: the fields are read from the genesis block's header/meta, and Reconcile() switches to the geth-native genesis-hash check (SetupGenesisBlock-style). The GenesisMeta record is then deleted, not migrated — at that stage the db only holds the genesis block, so --clean re-bootstrap suffices and no migration code is needed.
|
|
||
| 1. **Validation at load time.** `LoadClusterConfig` runs `Validate()` before any database | ||
| is opened: every full shard id in the resolved slave must resolve to a configured | ||
| chain/shard; no shard may be owned twice; `ShardGenesis.ROOT_HEIGHT` must equal |
There was a problem hiding this comment.
Validate(): "no shard may be owned twice" needs clarification — as written it may be a bug.
pyquarkchain supports the same shard being run by multiple slaves as replicas: branch_to_slaves is typed Dict[int, List[SlaveConnection]] (master.py:775), built by appending (branch_to_slaves.setdefault(full_shard_id, []).append(slave), master.py:907), with the explicit comment "Slaves may run multiple copies of the same branch" (master.py:1124). Routing is replica-aware: writes (add_transaction) fan out to all replicas (master.py:1240-1241), reads / PoSW take the first (master.py:1050). So the same full shard id appearing in more than one slave's FULL_SHARD_ID_LIST is a valid multi-replica deployment.
If this check means cross-slave unique ownership (a shard may belong to only one slave), it would reject those legitimate configs and must be dropped. It also contradicts this issue's narrowed SlaveContext, which only holds the resolved slave and can't see the others — so a cross-slave check isn't even possible here.
This issue is single-slave bootstrap, so the only thing that's both meaningful and doable is: the resolved slave's own FULL_SHARD_ID_LIST must contain no duplicate entries. Suggest rewording to exactly that, and avoiding "owned" since it implies cross-slave ownership. Any global shard-coverage / replica-sanity validation belongs at the master layer and must allow replicas.
Worth confirming whether any current mainnet/testnet config actually uses multiple slaves per shard: if so, implementing this as "unique ownership" is a P0 (rejects a live config); even if not, hardcoding "one slave per shard" would block adding replicas later.
|
|
||
| Two reference implementations inform the design: | ||
|
|
||
| - **pyquarkchain** (`/Users/dl/code/pyquarkchain`) is the *compatibility source of |
There was a problem hiding this comment.
This should be a GitHub URL, not a local path.
| truth*. The slave must consume the exact `cluster_config.json` that the unmodified | ||
| Python master (`quarkchain/cluster/cluster.py`) writes when it launches slaves, and the | ||
| root genesis hash it derives must be byte-identical to pyquarkchain's. | ||
| - **goquarkchain** (`/Users/dl/code/goquarkchain`) is the *shape reference* for the Go |
Tracking QuarkChain/goshard#17