
Add transport config to blueprints #2499

Open

Dreamsorcerer wants to merge 4 commits into feat/webrtc-transport from sam/config-transports

Dreamsorcerer (Collaborator) commented Jun 15, 2026

This merges into #2048. It moves the transport config into the general blueprint config.

codecov Bot commented Jun 15, 2026

❌ 2 Tests Failed:

| Tests completed | Failed | Passed | Skipped |
|---|---|---|---|
| 2158 | 2 | 2156 | 71 |
View the full list of 2 ❄️ flaky test(s)
dimos.e2e_tests.test_dimsim_path_replaning::test_path_replanning

Flake rate in main: 7.14% (Passed 13 times, Failed 1 time)

Stack Traces | 242s run time
lcm_spy = <dimos.e2e_tests.lcm_spy.LcmSpy object at 0x7198d521f020>
start_blueprint = <function start_blueprint.<locals>.set_name_and_start at 0x7198d54c4360>
dim_sim = <dimos.e2e_tests.dim_sim_client.DimSimClient object at 0x7198d55d8d10>
direct_cmd_vel_explorer = <dimos.simulation.mujoco.direct_cmd_vel_explorer.DirectCmdVelExplorer object at 0x7198d5533d10>
spawn_wall_on_pose = <function spawn_wall_on_pose.<locals>.spawn at 0x7198d54c49a0>

    @pytest.mark.self_hosted_large
    def test_path_replanning(
        lcm_spy, start_blueprint, dim_sim, direct_cmd_vel_explorer, spawn_wall_on_pose
    ) -> None:
        start_blueprint(
            "--dimsim-scene=empty",
            "run",
            "unitree-go2-agentic",
            simulator="dimsim",
        )
        lcm_spy.save_topic(".../McpClient/on_system_modules/res")
        lcm_spy.wait_for_saved_topic(".../McpClient/on_system_modules/res", timeout=1200.0)
    
        # robot spawns at (3, 2)
    
        # side wall
        dim_sim.add_wall(2, -2.5, 12, -2.5)
        # other side wall
        dim_sim.add_wall(2, 3.5, 12, 3.5)
        # back wall (behind robot)
        dim_sim.add_wall(2, -2.5, 2, 3.5)
        # forward wall (far end)
        dim_sim.add_wall(12, -2.5, 12, 3.5)
        # dividing wall at x=7 with doors at y=[-1.5,-0.5] and y=[1.5,2.5]
        dim_sim.add_wall(7, -2.5, 7, -1.5)
        dim_sim.add_wall(7, -0.5, 7, 1.5)
        dim_sim.add_wall(7, 2.5, 7, 3.5)
    
        direct_cmd_vel_explorer.linear_speed = 0.8
        direct_cmd_vel_explorer.follow_points([(10, 2), (2.5, 2), (3, 2)])
    
        # When the robot comes within 1.5 m of the left door's centre, drop a wall
        # in the opening so the planner has to bail out and route through the
        # right door at y=-1 instead.
        spawn_wall_on_pose(
            point=(7, 2),
            threshold=1.5,
            wall=(7, 1.5, 7, 2.5),
        )
    
        dim_sim.publish_goal(10.913, 0.588)
    
>       lcm_spy.wait_until_odom_position(10.913, 0.588, threshold=1, timeout=120)

dim_sim    = <dimos.e2e_tests.dim_sim_client.DimSimClient object at 0x7198d55d8d10>
direct_cmd_vel_explorer = <dimos.simulation.mujoco.direct_cmd_vel_explorer.DirectCmdVelExplorer object at 0x7198d5533d10>
lcm_spy    = <dimos.e2e_tests.lcm_spy.LcmSpy object at 0x7198d521f020>
spawn_wall_on_pose = <function spawn_wall_on_pose.<locals>.spawn at 0x7198d54c49a0>
start_blueprint = <function start_blueprint.<locals>.set_name_and_start at 0x7198d54c4360>

dimos/e2e_tests/test_dimsim_path_replaning.py:60: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
dimos/e2e_tests/lcm_spy.py:182: in wait_until_odom_position
    self.wait_for_message_result(
        predicate  = <function LcmSpy.wait_until_odom_position.<locals>.predicate at 0x7198d54c5300>
        self       = <dimos.e2e_tests.lcm_spy.LcmSpy object at 0x7198d521f020>
        threshold  = 1
        timeout    = 120
        x          = 10.913
        y          = 0.588
dimos/e2e_tests/lcm_spy.py:168: in wait_for_message_result
    self.wait_until(
        event      = <threading.Event at 0x7198d4d31df0: unset>
        fail_message = 'Failed to get to position x=10.913, y=0.588'
        listener   = <function LcmSpy.wait_for_message_result.<locals>.listener at 0x7198d54c53a0>
        predicate  = <function LcmSpy.wait_until_odom_position.<locals>.predicate at 0x7198d54c5300>
        self       = <dimos.e2e_tests.lcm_spy.LcmSpy object at 0x7198d521f020>
        timeout    = 120
        topic      = '/odom#geometry_msgs.PoseStamped'
        type       = <class 'dimos.msgs.geometry_msgs.PoseStamped.PoseStamped'>
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <dimos.e2e_tests.lcm_spy.LcmSpy object at 0x7198d521f020>

    def wait_until(
        self,
        *,
        condition: Callable[[], bool],
        timeout: float,
        error_message: str,
        poll_interval: float = 0.1,
    ) -> None:
        start_time = time.time()
        while time.time() - start_time < timeout:
            if condition():
                return
            time.sleep(poll_interval)
>       raise TimeoutError(error_message)
E       TimeoutError: Failed to get to position x=10.913, y=0.588

condition  = <bound method Event.is_set of <threading.Event at 0x7198d4d31df0: unset>>
error_message = 'Failed to get to position x=10.913, y=0.588'
poll_interval = 0.1
self       = <dimos.e2e_tests.lcm_spy.LcmSpy object at 0x7198d521f020>
start_time = 1781808335.9697042
timeout    = 120

dimos/e2e_tests/lcm_spy.py:105: TimeoutError
dimos.e2e_tests.test_dimsim_walk_forward::test_walk_forward

Flake rate in main: 33.33% (Passed 12 times, Failed 6 times)

Stack Traces | 206s run time
lcm_spy = <dimos.e2e_tests.lcm_spy.LcmSpy object at 0x7198d4d33020>
start_blueprint = <function start_blueprint.<locals>.set_name_and_start at 0x7198d54c59e0>
human_input = <function human_input.<locals>.send_human_input at 0x7198d54c5e40>
dim_sim = <dimos.e2e_tests.dim_sim_client.DimSimClient object at 0x7198d4d54d40>

    @pytest.mark.self_hosted_large
    def test_walk_forward(lcm_spy, start_blueprint, human_input, dim_sim) -> None:
        start_blueprint(
            "run",
            "--disable",
            "spatial-memory",
            "--disable",
            "security-module",
            "unitree-go2-agentic",
            simulator="dimsim",
        )
        lcm_spy.save_topic(".../McpClient/on_system_modules/res")
        lcm_spy.wait_for_saved_topic(".../McpClient/on_system_modules/res", timeout=1200.0)
    
        origin_x, origin_y = 1, 2
        dim_sim.set_agent_position(origin_x, origin_y)
    
        human_input("move forward 3 meter")
    
>       lcm_spy.wait_until_odom_position(origin_x + 3, origin_y, threshold=0.4, timeout=120)

dim_sim    = <dimos.e2e_tests.dim_sim_client.DimSimClient object at 0x7198d4d54d40>
human_input = <function human_input.<locals>.send_human_input at 0x7198d54c5e40>
lcm_spy    = <dimos.e2e_tests.lcm_spy.LcmSpy object at 0x7198d4d33020>
origin_x   = 1
origin_y   = 2
start_blueprint = <function start_blueprint.<locals>.set_name_and_start at 0x7198d54c59e0>

dimos/e2e_tests/test_dimsim_walk_forward.py:37: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
dimos/e2e_tests/lcm_spy.py:182: in wait_until_odom_position
    self.wait_for_message_result(
        predicate  = <function LcmSpy.wait_until_odom_position.<locals>.predicate at 0x7198d54c5a80>
        self       = <dimos.e2e_tests.lcm_spy.LcmSpy object at 0x7198d4d33020>
        threshold  = 0.4
        timeout    = 120
        x          = 4
        y          = 2
dimos/e2e_tests/lcm_spy.py:168: in wait_for_message_result
    self.wait_until(
        event      = <threading.Event at 0x7198d4d542f0: unset>
        fail_message = 'Failed to get to position x=4, y=2'
        listener   = <function LcmSpy.wait_for_message_result.<locals>.listener at 0x7198d54c5bc0>
        predicate  = <function LcmSpy.wait_until_odom_position.<locals>.predicate at 0x7198d54c5a80>
        self       = <dimos.e2e_tests.lcm_spy.LcmSpy object at 0x7198d4d33020>
        timeout    = 120
        topic      = '/odom#geometry_msgs.PoseStamped'
        type       = <class 'dimos.msgs.geometry_msgs.PoseStamped.PoseStamped'>
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <dimos.e2e_tests.lcm_spy.LcmSpy object at 0x7198d4d33020>

    def wait_until(
        self,
        *,
        condition: Callable[[], bool],
        timeout: float,
        error_message: str,
        poll_interval: float = 0.1,
    ) -> None:
        start_time = time.time()
        while time.time() - start_time < timeout:
            if condition():
                return
            time.sleep(poll_interval)
>       raise TimeoutError(error_message)
E       TimeoutError: Failed to get to position x=4, y=2

condition  = <bound method Event.is_set of <threading.Event at 0x7198d4d542f0: unset>>
error_message = 'Failed to get to position x=4, y=2'
poll_interval = 0.1
self       = <dimos.e2e_tests.lcm_spy.LcmSpy object at 0x7198d4d33020>
start_time = 1781809026.7287138
timeout    = 120

dimos/e2e_tests/lcm_spy.py:105: TimeoutError


Dreamsorcerer marked this pull request as ready for review on June 18, 2026 at 19:29

greptile-apps Bot (Contributor) commented Jun 18, 2026

Greptile Summary

This PR moves WebRTC transport configuration into the standard blueprint config schema, so credentials and transport settings can be supplied via -o transports.<name>.<field>=... CLI flags or TRANSPORTS__<NAME>__<FIELD>=... env vars instead of the old transport-specific env vars (TELEOP_*, CF_TELEOP_*). As part of this, ProviderConfig and its subclasses are migrated from @dataclass(frozen=True) to Pydantic BaseModel with frozen=True, extra="forbid".
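To make the two override spellings concrete, here is a minimal standard-library sketch of how they could collapse into one nested dict. The helpers env_to_nested and cli_to_nested are hypothetical and are not the dimos parser; what the sketch shows is that every leaf value arrives as a str, which is why the model_copy concern further down matters.

```python
from collections.abc import Mapping
from typing import Any


def _set_path(tree: dict[str, Any], path: list[str], value: str) -> None:
    """Store value at tree[path[0]][path[1]]..., creating inner dicts as needed."""
    node = tree
    for key in path[:-1]:
        node = node.setdefault(key, {})
    node[path[-1]] = value


def env_to_nested(environ: Mapping[str, str], prefix: str = "TRANSPORTS") -> dict[str, Any]:
    """Turn PREFIX__<NAME>__<FIELD>=value variables into a nested dict."""
    tree: dict[str, Any] = {}
    for name, value in environ.items():
        parts = name.split("__")
        if len(parts) > 1 and parts[0] == prefix:
            _set_path(tree, [part.lower() for part in parts], value)
    return tree


def cli_to_nested(options: list[str]) -> dict[str, Any]:
    """Turn the values passed to -o (transports.<name>.<field>=value) into a nested dict."""
    tree: dict[str, Any] = {}
    for option in options:
        dotted, _, value = option.partition("=")
        _set_path(tree, dotted.split("."), value)
    return tree


env = {"TRANSPORTS__BROKER__ORDERED": "false", "TRANSPORTS__BROKER__HEARTBEAT_HZ": "2.0"}
print(env_to_nested(env))
# {'transports': {'broker': {'ordered': 'false', 'heartbeat_hz': '2.0'}}}
```

Whatever the real parser looks like, a schema-aware validation step has to run afterwards to turn "false" into False.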

  • Blueprint.config() dynamically appends a transports sub-model to the schema by introspecting each transport's _config_cls; _apply_transport_overrides in the coordinator then rebuilds affected transports with the merged config before startup.
  • BrokerConfig and CloudflareConfig have their TELEOP_* / CF_TELEOP_* env-var fallbacks unconditionally removed, making this a hard breaking change for any deployment that relied on those names.
  • with_config_overrides on both WebRTCTransport and WebRTCVideoTransport uses model_copy(update=...), which in Pydantic v2 does not coerce types, so non-string fields (ordered: bool, heartbeat_hz: float, max_retransmits: int | None) will hold raw strings when the override originates from the CLI or an env var.
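A hedged sketch of the first bullet, assuming Pydantic v2: each transport class exposes a config class, duplicates are skipped with a seen set, and pydantic.create_model assembles the transports sub-model. The config_name attribute and all stand-in classes and fields are hypothetical (the PR uses a transport_config_name helper whose details are not shown). Validating through the assembled model coerces nested strings, which is exactly the step model_copy(update=...) skips.

```python
from pydantic import BaseModel, ConfigDict, create_model


class BrokerConfig(BaseModel):
    # Hypothetical stand-in; frozen/extra mirror the settings described above.
    model_config = ConfigDict(frozen=True, extra="forbid")
    ordered: bool = True
    heartbeat_hz: float = 1.0


class CloudflareConfig(BaseModel):
    model_config = ConfigDict(frozen=True, extra="forbid")
    app_id: str = ""


class BrokerTransport:
    _config_cls = BrokerConfig
    config_name = "broker"  # hypothetical naming attribute


class CloudflareTransport:
    _config_cls = CloudflareConfig
    config_name = "cloudflare"


def build_transports_model(transports: list[type]) -> type[BaseModel]:
    """Create a transports model with one field per distinct transport config."""
    seen: set[str] = set()
    fields: dict[str, tuple[type, object]] = {}
    for transport in transports:
        name = transport.config_name
        if name in seen:  # several streams can share one transport type
            continue
        seen.add(name)
        # (type, default) tuples are create_model's field-definition form.
        fields[name] = (transport._config_cls, transport._config_cls())
    return create_model("TransportsConfig", **fields)


TransportsConfig = build_transports_model([BrokerTransport, BrokerTransport, CloudflareTransport])
cfg = TransportsConfig.model_validate({"broker": {"ordered": "false", "heartbeat_hz": "2.0"}})
print(cfg.broker)  # ordered=False heartbeat_hz=2.0
```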

Confidence Score: 3/5

The core config-routing wiring is structurally sound, but two transports silently accept raw-string values for typed fields, and the hard removal of legacy env vars will break existing deployments without warning.

Both with_config_overrides implementations use model_copy(update=...) without re-validation. CLI flags and env vars arrive as strings, so fields such as ordered: bool and heartbeat_hz: float end up holding raw strings, which either misbehave (the string "false" is truthy) or raise TypeError when used in arithmetic. In addition, any deployment that still supplies credentials through the old TELEOP_API_KEY / CF_TELEOP_APP_ID env var family will hit a RuntimeError at startup, with no deprecation grace period.

Files that need attention: dimos/core/transport.py (both with_config_overrides implementations need model_validate instead of model_copy), and dimos/protocol/pubsub/impl/webrtc/providers/broker.py and cloudflare.py (the breaking env-var removal).

Important Files Changed

| Filename | Overview |
|---|---|
| dimos/core/coordination/blueprints.py | Adds a dynamic transports.* sub-model to the blueprint config schema. The logic is clean: deduplication via a seen set and the transport_config_name helper are straightforward. No issues found. |
| dimos/core/coordination/module_coordinator.py | Extracts transports overrides from blueprint_args and applies them via _apply_transport_overrides. Raw string values from CLI/env are forwarded to with_config_overrides without re-validation, causing type mismatches for non-string fields. |
| dimos/core/transport.py | Implements with_config_overrides on WebRTCTransport and WebRTCVideoTransport using model_copy(update=...), which does not coerce types in Pydantic v2, leaving non-string fields as raw strings when overrides originate from CLI or env. |
| dimos/protocol/pubsub/impl/webrtc/providers/spec.py | Migrates ProviderConfig from @dataclass(frozen=True) to a Pydantic BaseModel. Equality and hashing semantics are preserved; the singleton _providers dict continues to work correctly. |
| dimos/protocol/pubsub/impl/webrtc/providers/broker.py | Migrates BrokerConfig to Pydantic, hardens defaults, and removes all TELEOP_* env-var fallbacks, which is a hard breaking change for existing deployments. |
| dimos/protocol/pubsub/impl/webrtc/providers/cloudflare.py | Migrates CloudflareConfig to Pydantic, removes CF_TELEOP_* env-var fallbacks, and exports MAX_MSG_SIZE for the benchmark. Same breaking-change concern as broker.py. |
| dimos/robot/cli/dimos.py | Fixes arg_help to handle transport config nodes that have no backing BlueprintAtom. The null-safe lookup and guarded _atom access are correct. |
| dimos/protocol/pubsub/impl/webrtc/test_transport.py | Adds thorough new tests for Pydantic frozen semantics, blueprint config schema exposure, and transport override rebuild. Removes now-unnecessary monkeypatching of deleted env vars. |
| dimos/teleop/quest_hosted/blueprints.py | Updates run comments to reflect the new CLI syntax. No code changes. |

Comments Outside Diff (1)

  1. dimos/protocol/pubsub/impl/webrtc/providers/broker.py, lines 64-74

    P2 Breaking removal of TELEOP_* env vars

    The previous implementation fell back to TELEOP_API_KEY, TELEOP_BROKER_URL, TELEOP_ROBOT_ID, and TELEOP_ROBOT_NAME. All four are now silently ignored, so any deployment that still relies on them will fail at BrokerProvider construction with a runtime error rather than a helpful deprecation warning. The same applies to CF_TELEOP_APP_ID / CF_TELEOP_APP_SECRET in CloudflareConfig. Are the TELEOP_* and CF_TELEOP_* env vars officially retired, or should there be a deprecation shim that reads the old names and warns before this merges?
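    One possible shape for such a shim, sketched with the standard library only. It copies each retired variable to a new nested name when that name is unset and emits a warning. The retired names come from the comment above; the target names on the right (TRANSPORTS__BROKER__API_KEY and so on) are assumptions, since the PR does not spell out the new field names.

```python
import warnings
from collections.abc import MutableMapping

# Retired names are from the review; the new names are hypothetical guesses
# at the TRANSPORTS__<NAME>__<FIELD> spelling.
LEGACY_ENV_VARS = {
    "TELEOP_API_KEY": "TRANSPORTS__BROKER__API_KEY",
    "TELEOP_BROKER_URL": "TRANSPORTS__BROKER__BROKER_URL",
    "TELEOP_ROBOT_ID": "TRANSPORTS__BROKER__ROBOT_ID",
    "TELEOP_ROBOT_NAME": "TRANSPORTS__BROKER__ROBOT_NAME",
    "CF_TELEOP_APP_ID": "TRANSPORTS__CLOUDFLARE__APP_ID",
    "CF_TELEOP_APP_SECRET": "TRANSPORTS__CLOUDFLARE__APP_SECRET",
}


def apply_legacy_env(environ: MutableMapping[str, str], mapping: dict[str, str]) -> list[str]:
    """Copy retired variables to their new names and warn about each one.

    A value under the new name always wins; the legacy value is copied only
    when the new name is unset. Returns the legacy names that were migrated.
    """
    migrated: list[str] = []
    for old, new in mapping.items():
        if old in environ and new not in environ:
            environ[new] = environ[old]
            migrated.append(old)
            warnings.warn(f"{old} is deprecated; set {new} instead", FutureWarning, stacklevel=2)
    return migrated
```

    Calling apply_legacy_env(os.environ, LEGACY_ENV_VARS) before the blueprint config is loaded would keep old deployments starting. FutureWarning is used rather than DeprecationWarning because Python hides DeprecationWarning by default unless it is triggered from __main__, and this warning is aimed at operators rather than developers.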

Reviews (1): Last reviewed commit: "Cleanup"

Comment thread dimos/core/transport.py
Comment on lines +392 to +394
def with_config_overrides(self, overrides: Mapping[str, Any]) -> Self:
    new_config = self._config.model_copy(update=dict(overrides))
    return type(self)(self.topic, self._msg_type, config=new_config)


P1 model_copy bypasses type coercion for CLI/env overrides

model_copy(update=...) in Pydantic v2 does not run validation at all, so non-string fields receive the raw strings that come from the CLI or env. For example, setting TRANSPORTS__BROKER__ORDERED=false stores the string "false" (which is truthy) in ordered: bool, and TRANSPORTS__BROKER__HEARTBEAT_HZ=2.0 stores "2.0" in heartbeat_hz: float, which raises TypeError as soon as it is used in arithmetic such as 1 / heartbeat_hz. The fix is to re-validate after the merge, e.g. type(self._config).model_validate({**self._config.model_dump(), **overrides}).
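The difference is easy to reproduce. This sketch assumes Pydantic v2 and uses a hypothetical TransportConfig with the field types mentioned in this thread; merge_overrides applies the model_validate fix suggested in the comment.

```python
from collections.abc import Mapping
from typing import Any, Optional

from pydantic import BaseModel, ConfigDict


class TransportConfig(BaseModel):
    # Hypothetical stand-in using the field types named in the review.
    model_config = ConfigDict(frozen=True, extra="forbid")
    ordered: bool = True
    heartbeat_hz: float = 1.0
    max_retransmits: Optional[int] = None


def merge_overrides(config: BaseModel, overrides: Mapping[str, Any]) -> BaseModel:
    """Apply overrides and re-run validation on the merged values."""
    # model_validate coerces lax inputs ("false" -> False, "2.0" -> 2.0) and,
    # because of extra="forbid", rejects keys the config does not declare.
    return type(config).model_validate({**config.model_dump(), **overrides})


base = TransportConfig()
overrides = {"ordered": "false", "heartbeat_hz": "2.0"}

unchecked = base.model_copy(update=overrides)  # stores the strings as-is
print(repr(unchecked.ordered), bool(unchecked.ordered))  # 'false' True

checked = merge_overrides(base, overrides)
print(repr(checked.ordered), checked.heartbeat_hz)  # False 2.0
```

One side effect of the dump-and-validate approach: every field of the result counts as explicitly set (model_fields_set), which only matters if other code relies on exclude_unset.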

Comment thread dimos/core/transport.py
Comment on lines 497 to 499
)
return lambda: None


P1 Same model_copy validation gap on WebRTCVideoTransport

The WebRTCVideoTransport.with_config_overrides implementation has the same issue: self._config.model_copy(update=dict(overrides)) won't coerce string values coming from the CLI/env into the declared field types (e.g., ordered: bool, max_retransmits: int | None). It should use model_validate to re-run coercion, consistent with whatever fix is applied to WebRTCTransport.
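Since both transports need the same fix, one option is to implement it once. This is only a sketch: ValidatedOverridesMixin and the Fake* classes are hypothetical, and the mixin assumes, as in the WebRTCTransport excerpt earlier in this thread, that each transport stores topic, _msg_type and _config and accepts (topic, msg_type, config=...).

```python
from collections.abc import Mapping
from typing import Any, Optional, TypeVar

from pydantic import BaseModel, ConfigDict

T = TypeVar("T", bound="ValidatedOverridesMixin")


class ValidatedOverridesMixin:
    """Hypothetical mixin sharing one validated override path."""

    # Assumed to be set by the host transport's __init__.
    topic: str
    _msg_type: type
    _config: BaseModel

    def with_config_overrides(self: T, overrides: Mapping[str, Any]) -> T:
        # Merge, then re-run validation so CLI/env strings are coerced.
        merged = {**self._config.model_dump(), **dict(overrides)}
        new_config = type(self._config).model_validate(merged)
        return type(self)(self.topic, self._msg_type, config=new_config)


class VideoConfig(BaseModel):
    # Hypothetical config with the field types named in the comment.
    model_config = ConfigDict(frozen=True, extra="forbid")
    ordered: bool = False
    max_retransmits: Optional[int] = None


class FakeWebRTCTransport(ValidatedOverridesMixin):
    def __init__(self, topic: str, msg_type: type, config: BaseModel) -> None:
        self.topic = topic
        self._msg_type = msg_type
        self._config = config


class FakeWebRTCVideoTransport(FakeWebRTCTransport):
    """Inherits the same with_config_overrides; nothing is duplicated."""
```

With this layout the two transports cannot drift apart: any later change to the override path is made in one place.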


Labels: None yet
Projects: None yet
Participants: 2