Misc fixes#398
Conversation
…tReadyAddresses = true
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1b15cd6bfd
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| // requires that you install the DNS server component on your k8s cluster. | ||
| member self.ToService() : V1Service = | ||
| let serviceSpec = V1ServiceSpec(clusterIP = "None", selector = CfgVal.labels) | ||
| let serviceSpec = V1ServiceSpec(clusterIP = "None", selector = CfgVal.labels, publishNotReadyAddresses = true) |
There was a problem hiding this comment.
Apply publishNotReadyAddresses to Helm workers
This only updates the NetworkCfg-generated headless service; the ParallelCatchupV2 Helm chart still defines its own headless Service with clusterIP: None and wires the StatefulSet to it in src/MissionParallelCatchup/parallel_catchup_helm/templates/catchup_workers.yaml:1-31, but that Service still lacks publishNotReadyAddresses. When running the pubnet parallel catchup V2 mission, worker pod DNS records are still withheld until readiness, so the startup DNS race this change is meant to remove remains for that mission.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Pull request overview
This PR adds a mechanism to pass arbitrary environment variables from the ssc CLI through to deployed stellar-core containers (both in generated Kubernetes specs and the parallel-catchup Helm chart), and adjusts headless Service behavior to publish DNS entries before readiness.
Changes:
- Add
coreEnvtoMissionContext, plumb it from CLI parsing, and apply it to generatedstellar-corecontainer specs. - Expose
worker.coreEnvin the parallel catchup Helm chart (values + template) and wire it from the mission installer viahelm --set. - Set
publishNotReadyAddresses=trueon the headless Service produced byNetworkCfg.ToService().
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| src/MissionParallelCatchup/parallel_catchup_helm/values.yaml | Adds worker.coreEnv values entry for env passthrough configuration. |
| src/MissionParallelCatchup/parallel_catchup_helm/templates/catchup_workers.yaml | Renders worker.coreEnv entries into the StatefulSet container env list. |
| src/FSLibrary/StellarMissionContext.fs | Extends MissionContext with a coreEnv field. |
| src/FSLibrary/StellarKubeSpecs.fs | Passes coreEnv into core containers and sets publishNotReadyAddresses on the headless Service. |
| src/FSLibrary/MissionHistoryPubnetParallelCatchupV2.fs | Converts coreEnv into Helm --set overrides for the parallel catchup chart. |
| src/FSLibrary.Tests/Tests.fs | Updates test MissionContext initialization to include coreEnv. |
| src/App/Program.fs | Adds --core-env option and parsing to populate MissionContext.coreEnv. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| V1Container( | ||
| name = containerName, | ||
| image = imageName, | ||
| command = [| "/bin/sh" |], | ||
| args = [| "-x"; "-c"; allCmdsAndCleanup.ToString() |], | ||
| env = [| peerNameEnvVar; asanOptionsEnvVar |], | ||
| env = Array.concat [ [| peerNameEnvVar; asanOptionsEnvVar |]; coreEnvVars ], | ||
| resources = res, |
| member self.ToService() : V1Service = | ||
| let serviceSpec = V1ServiceSpec(clusterIP = "None", selector = CfgVal.labels) | ||
| let serviceSpec = V1ServiceSpec(clusterIP = "None", selector = CfgVal.labels, publishNotReadyAddresses = true) | ||
| V1Service(spec = serviceSpec, metadata = self.NamespacedMeta self.ServiceName) |
Two separate fixes here:
publishNotReadyAddresses=trueto the headless service configuration for each statefulset that wires it into DNS. this will get DNS entries into k8s' managed DNS as soon as possible -- before k8s has successfully completed a readiness probe -- which in turn means that other core nodes that try to resolve their names are going to at least get DNS answers regardless of the pod being ready. I believe (though it's hard to test) that this narrows startup races that might cause failed connection-establishments due to cached negative DNS lookups. Fairly hypothetical change, but it doesn't seem to do any harm in my testing and I think it might improve things a bit.