Test Results: 3 993 tests (+1 051), 3 985 ✅ (+1 056), 18m 56s ⏱️ (+12m 46s)
For more details on these failures, see this check.
Results for commit d7ee093. ± Comparison against base commit f6c2dea.
This pull request removes 225 and adds 1276 tests. Note that renamed tests count towards both.
This pull request skips 1 and un-skips 5 tests.
♻️ This comment has been updated with latest results.
Pull request overview
This PR bundles several long-running feature and stability tracks across MeshWeaver core + Memex: social publishing foundations, in-process #r "nuget:..." compilation support (node-type + interactive markdown), move-operation performance/timeout hardening, and multiple UI/stream reliability improvements. It also standardizes the code folder naming from _Source/_Test to Source/Test across code, tests, docs, and samples.
Changes:
- Introduces MeshWeaver.Social (options, DI wiring, publish queue, credential model) plus initial Memex wiring (LinkedIn connect entry points + user menu hooks).
- Adds MeshWeaver.NuGet resolver + directive parser and integrates it into script compilation (#r "nuget:Pkg, Version"), including cache backends and tests.
- Improves operational robustness: parallelized recursive moves, default 30s mesh-op timeout, “no endless spinner” navigation status UI, and remote stream resubscribe behavior.
Reviewed changes
Copilot reviewed 159 out of 265 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| test/MeshWeaver.StorageImport.Test/StorageImporterTests.cs | Updates test expectations/docs to Source/ naming. |
| test/MeshWeaver.Social.Test/PostStatsRefresherTest.cs | Adds stats refresher test coverage (needs deterministic timeout handling). |
| test/MeshWeaver.Social.Test/MeshWeaver.Social.Test.csproj | Adds new Social test project referencing Social + Fixture. |
| test/MeshWeaver.Social.Test/InMemoryPublishQueueTest.cs | Adds unit tests for publish queue due-drain + dedup. |
| test/MeshWeaver.Persistence.Test/FileSystemPersistenceTest.cs | Updates partition tests to Source/ naming. |
| test/MeshWeaver.MathDemo.Test/TestPaths.cs | Adds helper paths for MathDemo sample test assets. |
| test/MeshWeaver.MathDemo.Test/MeshWeaver.MathDemo.Test.csproj | Adds MathDemo test project and copies sample graph data to output. |
| test/MeshWeaver.Hosting.PostgreSql.Test/SatelliteQueryTests.cs | Updates code-path routing tests to Source/ naming. |
| test/MeshWeaver.Hosting.Monolith.Test/UserActivityAreaTest.cs | Updates regression test docs to Source/ naming. |
| test/MeshWeaver.Hosting.Blazor.Test/NavigationServiceTest.cs | Adjusts test to assert “no 404 flash” during retries. |
| test/MeshWeaver.Graph.Test/NuGetDirectiveParserTest.cs | Adds unit tests for parsing/stripping #r "nuget:...". |
| test/MeshWeaver.Graph.Test/NuGetAssemblyResolverTest.cs | Adds networked NuGet restore end-to-end tests (skippable via env var). |
| test/MeshWeaver.Graph.Test/MeshWeaver.Graph.Test.csproj | References new MeshWeaver.NuGet project. |
| test/MeshWeaver.FutuRe.Test/MeshWeaver.FutuRe.Test.csproj | Updates compile-included sample sources to Source/ paths. |
| test/MeshWeaver.Content.Test/CompilationErrorTest.cs | Updates broken-code test to Source/ path. |
| test/MeshWeaver.AI.Test/MeshPluginTest.cs | Updates MCP tool count expectations (adds RunTests/Move/Copy). |
| src/MeshWeaver.Social/SocialOptions.cs | Adds configurable knobs for publishing/stats/ingest scheduling. |
| src/MeshWeaver.Social/SocialExtensions.cs | Adds DI wiring for social publishing subsystem and hosted services. |
| src/MeshWeaver.Social/PlatformCredential.cs | Adds credential record model (access/refresh/expiry metadata). |
| src/MeshWeaver.Social/MeshWeaver.Social.csproj | Introduces Social library project. |
| src/MeshWeaver.Social/IPublishQueue.cs | Adds publish queue abstraction + in-memory implementation. |
| src/MeshWeaver.Social/IApprovalPublishBridge.cs | Defines bridge contract and PublishableSnapshot model. |
| src/MeshWeaver.NuGet/ResolvedPackageSet.cs | Adds resolver output model (assemblies, probing dirs, versions). |
| src/MeshWeaver.NuGet/NuGetServiceCollectionExtensions.cs | Adds DI extension to register resolver + cache. |
| src/MeshWeaver.NuGet/NuGetPackageReference.cs | Adds package reference model (id + version range). |
| src/MeshWeaver.NuGet/NuGetDirectiveParser.cs | Implements #r "nuget:..." extraction + source stripping. |
| src/MeshWeaver.NuGet/MeshWeaver.NuGet.csproj | Introduces NuGet resolver project and dependencies. |
| src/MeshWeaver.NuGet/INuGetPackageCache.cs | Adds optional persistent cache interface + null implementation. |
| src/MeshWeaver.NuGet/INuGetAssemblyResolver.cs | Adds resolver interface returning ResolvedPackageSet. |
| src/MeshWeaver.NuGet.AzureBlob/MeshWeaver.NuGet.AzureBlob.csproj | Adds Azure Blob cache backend project. |
| src/MeshWeaver.NuGet.AzureBlob/BlobNuGetPackageCacheExtensions.cs | Adds DI helper to register blob-backed cache. |
| src/MeshWeaver.Mesh.Contract/Services/MeshOperationOptions.cs | Adds mesh operation timeout options (default 30s). |
| src/MeshWeaver.Mesh.Contract/Services/IStorageAdapter.cs | Updates docs/examples to Source/ naming. |
| src/MeshWeaver.Mesh.Contract/Services/INavigationService.cs | Adds Status observable contract for UI progress reporting. |
| src/MeshWeaver.Mesh.Contract/Services/IIconGenerator.cs | Adds icon generator abstraction returning an observable SVG. |
| src/MeshWeaver.Mesh.Contract/PartitionDefinition.cs | Updates standard table mappings (Source/Test → code) and clarifies semantics. |
| src/MeshWeaver.Mesh.Contract/MeshExtensions.cs | Adds timeout override + move timeout enforcement + grain dispose on delete. |
| src/MeshWeaver.Mesh.Contract/CodeConfiguration.cs | Updates docs to Source/ naming. |
| src/MeshWeaver.Kernel.Hub/MeshWeaver.Kernel.Hub.csproj | Removes Interactive package mgmt dependency; references MeshWeaver.NuGet. |
| src/MeshWeaver.Hosting/Persistence/MigrationUtility.cs | Updates migration heuristics to include Source/Test + legacy _Source/_Test. |
| src/MeshWeaver.Hosting/Persistence/FileSystemStorageAdapter.cs | Treats Source/Test as code paths + keeps legacy compatibility. |
| src/MeshWeaver.Hosting/Persistence/FileSystemPersistenceService.cs | Parallelizes descendant move I/O (with concurrency implications). |
| src/MeshWeaver.Hosting/Persistence/CachingStorageAdapter.cs | Updates code sub-namespace detection (Source/Test + legacy). |
| src/MeshWeaver.Hosting.PostgreSql/PostgreSqlPartitionedStoreFactory.cs | Guards against source/test mistakenly becoming schemas. |
| src/MeshWeaver.Hosting.PostgreSql/PostgreSqlCrossSchemaQueryProvider.cs | Filters malformed parameters to avoid NRE during SQL interpolation. |
| src/MeshWeaver.Hosting.Blazor/MeshWeaver.Hosting.Blazor.csproj | Adds NU1510 suppression. |
| src/MeshWeaver.Graph/PartitionTypeSource.cs | Updates docs to Source/ naming. |
| src/MeshWeaver.Graph/MeshWeaver.Graph.csproj | References MeshWeaver.NuGet. |
| src/MeshWeaver.Graph/MeshNodeLayoutAreas.cs | Improves create href behavior + reactive/grouped children catalog. |
| src/MeshWeaver.Graph/MeshDataSource.cs | Updates docs to Source/ naming. |
| src/MeshWeaver.Graph/Configuration/ScriptCompilationService.cs | Integrates NuGet directive parsing + resolver into compilation. |
| src/MeshWeaver.Graph/Configuration/NodeTypeDefinition.cs | Updates docs/examples to Source/ naming. |
| src/MeshWeaver.Graph/Configuration/MeshDataSourceNodeType.cs | Changes sources namespace constant to Source. |
| src/MeshWeaver.Graph/Configuration/GraphConfigurationExtensions.cs | Registers NuGet resolver and uses Source code path. |
| src/MeshWeaver.Graph/Configuration/CodeNodeType.cs | Treats Code nodes as primary content; defines Source/Test constants. |
| src/MeshWeaver.Documentation/Data/DataMesh/UnifiedPath.md | Documents @/ semantics and HTML-href pitfalls. |
| src/MeshWeaver.Documentation/Data/DataMesh/SocialMedia/Profile/Source/SocialMediaProfileLayoutAreas.cs | Adds SocialMedia profile layout areas example. |
| src/MeshWeaver.Documentation/Data/DataMesh/SocialMedia/Profile/Source/SocialMediaProfile.cs | Adds SocialMedia profile content model example. |
| src/MeshWeaver.Documentation/Data/DataMesh/SocialMedia/Post/Source/SocialMediaPost.cs | Adds SocialMedia post content model example. |
| src/MeshWeaver.Documentation/Data/DataMesh/SocialMedia/Post/Source/Platform.cs | Adds SocialMedia platform reference-data example. |
| src/MeshWeaver.Documentation/Data/DataMesh/SocialMedia.md | Updates docs to Source/ naming and authoring guidance. |
| src/MeshWeaver.Documentation/Data/DataMesh/SatelliteEntities.md | Clarifies Source/Test are primary content, not satellites. |
| src/MeshWeaver.Documentation/Data/DataMesh/NodeTypes.md | Adds Node Types documentation index page. |
| src/MeshWeaver.Documentation/Data/DataMesh/NodeTypeConfiguration.md | Updates docs to Source/ naming. |
| src/MeshWeaver.Documentation/Data/DataMesh/NodeOperations.md | Updates docs to Source/ naming. |
| src/MeshWeaver.Documentation/Data/DataMesh/DataConfiguration.md | Updates docs to Source/ naming. |
| src/MeshWeaver.Documentation/Data/DataMesh/CreatingNodeTypes.md | Updates docs to Source/Test naming throughout. |
| src/MeshWeaver.Documentation/Data/DataMesh.md | Updates TOC links and adds NuGet packages bullet. |
| src/MeshWeaver.Documentation/Data/Architecture/PartitionedPersistence.md | Updates persistence routing docs for Source/Test. |
| src/MeshWeaver.Documentation/Data/Architecture/MeshGraph.md | Updates examples to Source/ naming. |
| src/MeshWeaver.Documentation/Data/Architecture/BusinessRules/Cession/Source/CessionSampleData.cs | Adds cession sample dataset for docs/demo. |
| src/MeshWeaver.Documentation/Data/Architecture/BusinessRules/Cession/Source/CessionResultsArea.cs | Adds reactive charting layout area example. |
| src/MeshWeaver.Documentation/Data/Architecture/BusinessRules/Cession/Source/CessionEngine.cs | Adds pure business logic sample for cession calculations. |
| src/MeshWeaver.Documentation/Data/Architecture/BusinessRules/Cession/Source/CessionData.cs | Adds content models for cession example. |
| src/MeshWeaver.Data/Serialization/SyncStreamOptions.cs | Adds configurable heartbeat interval for sync streams. |
| src/MeshWeaver.Data/Serialization/JsonSynchronizationStream.cs | Implements resubscribe-on-owner-dispose logic. |
| src/MeshWeaver.Blazor/Pages/ApplicationPage.razor | Switches to NavigationStatus-driven progress/not-found/error UI. |
| src/MeshWeaver.Blazor/Components/NavigationProgressBar.razor.css | Adds styling for full-page vs compact overlay progress bar. |
| src/MeshWeaver.Blazor/Components/NavigationProgressBar.razor | Adds reusable “spinner + message” component. |
| src/MeshWeaver.Blazor/Components/MeshSearchView.razor.cs | Adds Category grouping fallback to NodeType. |
| src/MeshWeaver.Blazor/Components/LayoutAreaView.razor.cs | Adds stream lifecycle logging and additional diagnostics. |
| src/MeshWeaver.Blazor/Components/LayoutAreaView.razor | Surfaces compilation progress indicator before first stream emission. |
| src/MeshWeaver.Blazor/Components/CompileProgressIndicator.razor.css | Adds styling for compilation progress banner. |
| src/MeshWeaver.Blazor/Components/CompileProgressIndicator.razor | Adds polling UI component for active NodeType compilation. |
| src/MeshWeaver.Blazor.Portal/MeshWeaver.Blazor.Portal.csproj | Adds NU1510 suppression. |
| src/MeshWeaver.Blazor.AI/MeshWeaver.Blazor.AI.csproj | Adds NU1510 suppression. |
| src/MeshWeaver.Blazor.AI/McpMeshPlugin.cs | Adds Patch/Move/Copy MCP tools and improves tool descriptions. |
| src/MeshWeaver.AI/ThreadLayoutAreas.cs | Adds debug logging around streaming view emission. |
| src/MeshWeaver.AI/IconGenerator.cs | Adds default AI-backed IIconGenerator implementation. |
| src/MeshWeaver.AI/DelegationCompletedEvent.cs | Removes delegation tracker/event types. |
| src/MeshWeaver.AI/Data/Agent/Worker.md | Updates @/ link guidance (no raw HTML href with @/). |
| src/MeshWeaver.AI/Data/Agent/ToolsReference.md | Updates @/ link guidance and provides correct/incorrect table. |
| src/MeshWeaver.AI/Data/Agent/Orchestrator.md | Updates @/ link guidance for agent outputs. |
| src/MeshWeaver.AI/AIExtensions.cs | Removes old type registration; registers IIconGenerator. |
| memex/aspire/Memex.Portal.Distributed/Program.cs | Registers blob-backed NuGet package cache in distributed deployment. |
| memex/aspire/Memex.Portal.Distributed/Memex.Portal.Distributed.csproj | References MeshWeaver.NuGet.AzureBlob. |
| memex/aspire/Memex.Database.Migration/Program.cs | Adds source/test to reserved schema list. |
| memex/aspire/Memex.AppHost/Program.cs | Adds LinkedIn secret/env wiring + sets NUGET_PACKAGES cache dir. |
| memex/Memex.Portal.Shared/Social/SocialMediaUserMenuProvider.cs | Adds “Social Media” shortcut on a user’s own node (lazy hub creation). |
| memex/Memex.Portal.Shared/Social/ApiCredentialNodeType.cs | Adds NodeType for PlatformCredential stored under _ApiCredentials. |
| memex/Memex.Portal.Shared/Pages/Login.razor | Adds “Connect LinkedIn for publishing” CTA on login page. |
| memex/Memex.Portal.Shared/OrganizationNodeType.cs | Switches to default layout areas registration. |
| memex/Memex.Portal.Shared/MemexConfiguration.cs | Adds LinkedIn publisher wiring, @/ redirect middleware, and routes. |
| memex/Memex.Portal.Shared/Memex.Portal.Shared.csproj | References MeshWeaver.Social. |
| memex/Memex.Portal.Monolith/appsettings.Development.json | Enables debug logging for LayoutAreaView. |
| MeshWeaver.slnx | Adds new projects (NuGet, NuGet.AzureBlob, Social, new test projects). |
| Directory.Packages.props | Adds NuGet.* package versions for resolver implementation. |
| CLAUDE.md | Documents @/ local-only rule and href/URL restrictions. |
| (Various) samples/Graph/... | Adds/updates many sample NodeTypes and content under Source/ to reflect new conventions and demos. |
…+ test helpers
Recursive DeleteNodeRequest handled on a node's own hub was deadlocking: the final DeleteSelfFromStorage posted Ok and DisposeRequest from the dying hub, so the Ok raced callback disposal on the caller and was lost. Introduce CommitNodeDeletionMessage and forward the terminal commit (storage delete + reply + grain dispose) to the resolved mesh hub (walking ParentHub upward). Sender becomes the stable mesh hub, and FIFO on the caller's inbound queue guarantees Ok resolves the RegisterCallback before DisposeRequest arrives.
Also addresses two Copilot review comments on PR #95:
- FileSystemStorageAdapter.DeleteAsync empty-directory ascent is now concurrency-tolerant: wraps the enumerate + Directory.Delete in try/catch, swallowing the DirectoryNotFoundException race and breaking on IOException (non-empty / in-use). Required because FileSystemPersistenceService.MoveNodeAsync now parallelizes descendant deletes via Task.WhenAll.
- PostStatsRefresherTest.WaitUntilAsync throws TimeoutException with a descriptive message instead of returning silently on deadline, so the test cannot green-tick a stats-refresh that never happened.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
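The WaitUntilAsync change described above can be sketched as follows. This is a minimal illustration, not the helper from the PR: the class name `TestWait`, the parameter list, and the 20 ms poll interval are assumptions made for the example.

```csharp
using System;
using System.Threading.Tasks;

public static class TestWait
{
    // Hypothetical helper: polls until the condition holds, and throws on
    // deadline instead of returning silently, so a test cannot pass by accident.
    public static async Task WaitUntilAsync(Func<bool> condition, TimeSpan timeout, string description)
    {
        var deadline = DateTime.UtcNow + timeout;
        while (!condition())
        {
            if (DateTime.UtcNow >= deadline)
                throw new TimeoutException($"Condition '{description}' was not met within {timeout}.");
            await Task.Delay(TimeSpan.FromMilliseconds(20)); // assumed poll interval
        }
    }
}
```

A caller would write something like `await TestWait.WaitUntilAsync(() => refreshed, TimeSpan.FromSeconds(5), "stats refreshed");` and see a descriptive failure rather than a silent pass.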
@copilot resolve the merge conflicts in this pull request
Resolved. The merge with Conflicts resolved:
Code review — recent stability batch
Manual review of the last ~20 commits since
Correctness: should fix before merge
1. ✅
foreach (var (k, v) in perParams)
{
var newKey = "@" + prefix + k.TrimStart('@');
renamedSql = renamedSql.Replace(k, newKey);
renamedParams[newKey] = v;
}
Fix: single regex pass keyed on
2. ✅ Fix:
3. ✅ Fix: parse every query in
4. ✅ Fix:
Race / lifecycle hazards
5. ✅ Fix: drop the time-based heuristic in favour of a structural one: skip recovery only when the thread is still an auto-execute candidate (
6. ✅
7. ✅
8. ✅ Fix: pre-allocate the
Style / consistency
9. ✅
10. ✅
11. ✅ Fix: drop the per-query Limit injection. Limit is enforced post-union via
✅ Looks good (no action needed)
Code review — part 2: rest of the PR
Continuing review on the bulk of the PR (everything before the recent stability batch). Focused on the new projects (
Correctness: should fix before merge
12. ✅
return _cache.GetOrAdd(key, _ => ResolveCoreAsync(requested, framework, ct));
If
Fix: evict faulted/cancelled tasks from the cache before returning. Also pass
13. ✅ Fix: switched to
14. ✅ Fix: post-hydration, the resolver opens the package folder via
15. ✅ Fix: defensive
16. ✅
Race / lifecycle hazards
17. ✅
18. ✅
19. ✅ Fix: replaced with a single bounded
Style / consistency
20. ✅ Fix: register the publisher as a true singleton via
21. ✅ Fix: gate hosted-service registration on
22. ✅
23. ✅
✅ Looks good (no action needed)
Areas not covered in this review
Persistence-service refactors (
Review fixes applied: all 23 items addressed
5 commits, organised by batch. Locally committed, not pushed yet.
Verification
Notes
Ready to push when you want.
Done. Review item #14 is now closed in commit
…fix DI lifetimes, redact PII, drop dynamic
- ThreadExecution: collapse triple-stacked <summary> blocks on WatchForExecution and NotifyParentCompletion. Tooling kept the last one anyway; the dead scaffolding was just noise.
- SocialExtensions: register LinkedInPublisher / XPublisher as TRUE singletons (factory-resolved with named HttpClient). The previous AddHttpClient<T>+AddSingleton<IPlatformPublisher> mix made the concrete type transient while the interface alias was singleton, so direct vs via-interface resolution returned different instances. Also gate hosted-service registration on at least one platform being configured (the "all-or-nothing" comment was wrong; with zero platforms the four hosted services started anyway and faulted on first tick).
- LinkedInPublisher: replace `(dynamic)media.shareMediaCategory` peek with two concrete payload shapes, so a typo turns into a compile error instead of a RuntimeBinderException.
- LinkedIn / X publishers: cap error-body logs at 200 chars to bound PII exposure (the body can echo the user's post text on validation rejection). Full body still goes to PublishResult.Error for the caller.
Addresses PR #95 review items #9, #20, #21, #22, #23.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
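The 200-character log cap mentioned above could look roughly like the sketch below. `LogRedaction`, `Cap`, and the truncation marker are hypothetical names chosen for illustration; only the 200-character limit comes from the commit message.

```csharp
using System;

public static class LogRedaction
{
    // 200 is the cap stated in the commit message; everything else here is assumed.
    public const int MaxLoggedBodyChars = 200;

    // Returns at most `max` characters of the body for logging purposes.
    // The full body would still be handed to the caller through the result object.
    public static string Cap(string body, int max = MaxLoggedBodyChars)
    {
        if (string.IsNullOrEmpty(body)) return string.Empty;
        return body.Length <= max ? body : body.Substring(0, max) + "...(truncated)";
    }
}
```

Capping by character count bounds exposure but does not remove sensitive text within the first 200 characters, so it is a mitigation rather than full redaction.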
… in-memory engines
PostgreSqlStorageAdapter.QueryNodesAsync(IReadOnlyList<ParsedQuery>):
- Replace order-dependent `string.Replace` parameter rename with a
single `Regex.Replace` keyed on @<name> word boundary that gates
on perParams.ContainsKey. Sequential Replace was mangling adjacent
tokens (renaming `@p` after `@p1` produced `@q0_q0_p1`) and could
clobber `@…` substrings inside string literals / JSONB paths.
- Switch from `UNION` to `UNION ALL` wrapped in
`SELECT DISTINCT ON (namespace, id) ... ORDER BY namespace, id, last_modified DESC`.
Plain UNION dedupes whole rows — two queries observing the same
node at slightly-different last_modified would BOTH appear in the
output. Path-keyed dedup (= MeshNode identity) with newest-wins
tie-break collapses them correctly.
PostgreSqlMeshQuery.ObserveQuery<T>:
- Parse EVERY query in request.EffectiveQueries and build per-query
(basePath, scope) filters; the change-notifier subscription
OR-joins them so multi-query observations get delta refreshes
triggered by ANY query's path/scope, not just query #0's. The
previous shape silently lost live updates from queries #1+.
PostgreSqlMeshQuery.QueryNodesUnionAsync + MeshQueryEngine:
- Drop the per-query `parsedList[0].Limit = request.Limit` injection.
Query #0 hit its limit before yielding the union's most relevant
rows, while queries #1+ contributed unbounded — making the result
iteration-order dependent. Limit is now enforced post-union via
MinLimit(request.Limit, firstParsed.Limit) so a request-level cap
can't be circumvented and an in-query `limit:N` still wins when
smaller.
- MeshQueryEngine: CollectMatchedAsync returns the LIST of every
query's basePath; the source:activity post-filter scans every
base path's descendants and unions activity-main-paths so
queries #1+ aren't filtered against query #0's subtree only.
Addresses PR #95 review items #1, #2, #3, #4, #11.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
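The single-pass rename described in this commit can be illustrated with a small sketch. It is not the adapter's code: `ParameterRenamer`, its signature, and the exact pattern are assumptions. The point it shows is that a greedy identifier match consumes `@p1` as one token, so renaming `@p` can no longer touch it.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ParameterRenamer
{
    // "@name" where name is an identifier. Greedy \w* takes the whole token,
    // so "@p" never matches inside "@p1". Operators such as "@>" are not matched.
    private static readonly Regex ParamToken = new Regex(@"@[A-Za-z_]\w*", RegexOptions.Compiled);

    // Hypothetical helper: prefixes every known parameter in one regex pass.
    public static (string Sql, Dictionary<string, object> Params) Rename(
        string sql, IReadOnlyDictionary<string, object> perParams, string prefix)
    {
        var names = new HashSet<string>(StringComparer.Ordinal);
        var renamed = new Dictionary<string, object>();
        foreach (var kv in perParams)
        {
            var bare = kv.Key.TrimStart('@');
            names.Add(bare);
            renamed["@" + prefix + bare] = kv.Value;
        }

        // Only tokens that are actual parameters are rewritten; others pass through.
        var newSql = ParamToken.Replace(sql, m =>
        {
            var bare = m.Value.Substring(1);
            return names.Contains(bare) ? "@" + prefix + bare : m.Value;
        });
        return (newSql, renamed);
    }
}
```

Gating on the known parameter names reduces, but does not eliminate, rewrites inside string literals: a literal that happens to contain `@` followed by a real parameter name would still be changed in this sketch.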
…ThreadExecution stability fixes
ThreadExecution.cs (already in commit 478fdaa; recapping here for the review-item index):
- RecoverStaleExecutingThread: drop the 2-minute "fresh execution" window in favour of a structural check (skip when PendingUserMessage + ActiveMessageId are still set, i.e. the thread is an auto-execute candidate WatchForExecution will pick up). Closes the "long-running agent crashed at minute 5 → IsExecuting=true forever" gap; the time-based heuristic contradicted commit 6dc436b's "no time limits" stance.
- Subject<StreamingSnapshot>: declare with `using var` so the Subject itself disposes alongside its subscription. Minor leak per execution previously.
- HandleSubmitMessage: pre-allocate the per-round CancellationTokenSource and store it on the thread hub BEFORE posting SubmitMessageResponse. This closes the race where an early Stop click between IsExecuting=true and ExecuteMessageAsync's `parentHub.Set(executionCts)` found a null CTS slot and silently no-op'd. ExecuteMessageAsync now reuses the pre-allocated CTS (with a fallback for the auto-execute path that bypasses HandleSubmitMessage).
IsExecutingLifecycleTest.cs:
- Migrate the response-text wait from text-pattern matching (skipping placeholders "Allocating agent..." etc.) to `ThreadMessage.CompletedAt is not null`, which ExecuteMessageAsync sets only on the terminal PushToResponseMessage call. Same pattern adopted in ChatHistoryTest in commit ab3af8b.
- Add a regression assertion that final ThreadMessage.Status == Completed. The terminal-status guard in PushToResponseMessage prevents the late Sample(100ms)-flushed Streaming push from regressing the cell from Completed back to Streaming; this assertion catches any future regression of that guard.
Addresses PR #95 review items #5, #6, #7, #8, #10.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…, parallelism, backoff)
NuGetAssemblyResolver:
- Evict faulted/cancelled tasks from the per-key cache before
returning. A transient feed failure (network, throttle, cancelled
in-flight resolve) used to poison the cache for the resolver's
lifetime — every subsequent call replayed the same exception.
- Pass CancellationToken.None to the shared core task so a single
caller's cancellation can't take down the resolution for
others; per-caller `ct` projects via `task.WaitAsync(ct)`.
- Switch DependencyBehavior from `Lowest` to `HighestMinor` so
`#r` directives pick up patch-level security fixes via
transitive dependencies without silently jumping major/minor.
- Document that hydrated cache content is trusted to match
(id, version) — flag for future content-hash verification if
cache poisoning becomes a concern.
LinkedInPublisher / XPublisher (LinkedIn already committed in batch A
for the dynamic+PII parts; this commit adds the 401 retry):
- SendWith401RetryAsync: on the FIRST 401 response from a publish,
force-refresh the token (zero ExpiresAt before EnsureFreshAsync)
and retry once. Closes the race where the access token's TTL
expired between EnsureFreshAsync and the actual API call.
PostStatsRefresher:
- Process due-refresh targets via Parallel.ForEachAsync bounded
by SocialOptions.StatsRefreshDegreeOfParallelism (default 8),
so a slow API + large refresh window can't let one tick
overshoot the next interval.
- Per-target failure backoff via a ConcurrentDictionary of
last-failure timestamps — targets that failed within
StatsRefreshFailureBackoff (default 15 min) skip the next tick.
Stops a degraded platform from generating thousands of repeat
warnings every cycle while the underlying issue is fixed.
Success clears the backoff entry.
SocialOptions: add StatsRefreshDegreeOfParallelism (8) and
StatsRefreshFailureBackoff (15 min) knobs.
Addresses PR #95 review items #12, #13, #14, #16, #17, #18.
(#15 XPublisher defensive parse + the LinkedIn dynamic / PII items
were already in commit 478fdaa.)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
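The cache-eviction and cancellation behaviour described for NuGetAssemblyResolver can be sketched generically. `SharedTaskCache` is a hypothetical type for illustration and is not the resolver's implementation; it assumes .NET 6 or later for `Task.WaitAsync` and the `TryRemove(KeyValuePair)` overload.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public sealed class SharedTaskCache<TKey, TValue> where TKey : notnull
{
    private readonly ConcurrentDictionary<TKey, Task<TValue>> _cache =
        new ConcurrentDictionary<TKey, Task<TValue>>();

    public Task<TValue> GetAsync(
        TKey key, Func<TKey, CancellationToken, Task<TValue>> resolve, CancellationToken ct)
    {
        // Evict a faulted or cancelled task so a transient failure is not replayed.
        // Removing by (key, value) pair avoids evicting a newer, healthy entry.
        if (_cache.TryGetValue(key, out var existing) && (existing.IsFaulted || existing.IsCanceled))
            _cache.TryRemove(new KeyValuePair<TKey, Task<TValue>>(key, existing));

        // The shared task runs without the caller's token, so one caller
        // cancelling cannot fail the resolution for everyone else.
        var shared = _cache.GetOrAdd(key, k => resolve(k, CancellationToken.None));

        // Each caller observes its own cancellation through WaitAsync.
        return shared.WaitAsync(ct);
    }
}
```

One caveat of this shape: `GetOrAdd` may invoke the factory more than once under contention, so the resolve delegate should tolerate an occasional duplicate start.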
… file lock
The MESHWEAVER_DISPOSE_TRACE=1 trace took a global lock per call (`File.AppendAllText` under `lock (DisposeTraceLogLock)`), serialising hub teardown under load when many hubs disposed concurrently. Replaced with a single bounded `Channel<string>` (capacity 4096, FullMode = DropWrite) drained by one writer task started in the type initialiser. Producers `TryWrite` non-blocking: if the disk is slow / locked, lines drop on full instead of putting back-pressure on dispose. Single-reader semantics avoid contention on the file handle.
Addresses PR #95 review item #19.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
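A minimal sketch of the bounded-channel pattern this commit describes is shown below. `DroppingTraceWriter` and its members are hypothetical, and it writes to an injected `TextWriter` rather than a file started from a type initialiser; the capacity of 4096 and the DropWrite mode come from the commit message.

```csharp
using System;
using System.IO;
using System.Threading.Channels;
using System.Threading.Tasks;

public sealed class DroppingTraceWriter
{
    private readonly Channel<string> _lines;

    // Completes once the channel is closed and every queued line is written.
    public Task Completion { get; }

    public DroppingTraceWriter(TextWriter sink, int capacity = 4096)
    {
        _lines = Channel.CreateBounded<string>(new BoundedChannelOptions(capacity)
        {
            FullMode = BoundedChannelFullMode.DropWrite, // drop new lines when full
            SingleReader = true                          // exactly one drain loop
        });

        // Single consumer: the only code that ever touches the sink.
        Completion = Task.Run(async () =>
        {
            await foreach (var line in _lines.Reader.ReadAllAsync())
                await sink.WriteLineAsync(line);
            await sink.FlushAsync();
        });
    }

    // Never blocks the producer; when the buffer is full the line is discarded.
    public void Trace(string line) => _lines.Writer.TryWrite(line);

    public void Complete() => _lines.Writer.Complete();
}
```

The trade-off is deliberate: trace output becomes lossy under pressure in exchange for producers never waiting on disk I/O.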
Replaces the TODO from commit 512adb4. After a successful INuGetPackageCache.TryHydrateAsync, the resolver now opens the hydrated folder via PackageFolderReader and compares the package's own .nuspec-declared (id, version) against the expected (id, version). On mismatch the directory is purged and the resolver falls back to the feed.
This catches the failure modes #14 was about: wrong package stored under right key (cross-tenant blob, accidental copy, drift after a manual edit). The .nuspec is the canonical NuGet source of truth, so a tampered cache entry can't fake the identity without rewriting the nuspec, which we'd then catch at hydration time.
No INuGetPackageCache contract change; validation lives entirely in the resolver. Closes the last open item from PR #95 review (item #14).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…e key
Revisits the per-user keying from ed784e7. That commit changed the synced-query cache from `id` → `(id, userId)` which fixed cross-user leak but broke the "same id → same observable" contract that LanguageModelSyncedQueryTest pins (chat view re-subscribes need to hit the cached upstream, not allocate a fresh one each time).
New shape: cache by raw `id` (legacy contract preserved), wrap each GetQuery return value with a per-subscriber RLS filter (WrapWithPerUserRls). The filter captures the subscriber's AccessContext at Subscribe time and uses ISecurityService.HasPermission to drop nodes the subscriber can't Read. Two users sharing the same `id` each get their own filtered VIEW over the SAME shared upstream: no duplicate subscriptions, no cross-user leak. System / no-AsyncLocal callers bypass the filter (infrastructure paths get the full snapshot, e.g. SecurityService's own _Access walks etc.).
Updated LanguageModelSyncedQueryTest to assert "same upstream snapshot" (BeEquivalentTo paths) rather than ReferenceEquals. The Defer wrapper varies per call site by design (per-subscriber Subscribe-time capture), but the cached upstream is the same instance via Replay(1).RefCount.
SyncedQueryRegistry gains:
- RegisterAlias for dual-key entries (not used after this revert but kept for the structural extension path).
- FindAnyById for the loose-match lookup fallback (defensive: no longer hit by GetQuery but useful for diagnostics).
SyncedQueryPerUserIsolationTest still 4/4: the cross-user-leak guard moved from cache key to subscriber wrapper but the invariant holds.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… alarm
Introduces "cache/mesh-node-cache" as the first fine-grained sanctioned identity per the new access-context propagation model. The MeshNodeStreamCache hydrator runs under this dedicated address instead of ImpersonateAsSystem; the identity is granted ONLY Permission.Read in SecurityService. Create/Update/Delete fall through to normal RLS and deny (no AccessAssignment exists).
Boundary tests (MeshNodeCacheIdentityTest, 6/6 pass) verify:
- GetEffectivePermissions returns exactly Permission.Read
- Create / Update / Delete under cache identity all throw UnauthorizedAccessException
- Read still succeeds (hydration unaffected)
Framework changes:
- AccessService.SetContext / SetCircuitContext: log Error + stack when a hub-shaped principal (sync/, mesh/, node/, activity/, portal/) lands on AsyncLocal. Diagnostic alarm for the identity-baton model; CI parses for it to catch regressions.
- MeshService.DeleteNode: NodeDeletionRejectionReason.Unauthorized now maps to UnauthorizedAccessException (previously fell through to InvalidOperationException, inconsistent with Create/Update).
- SyncedQueryDataSourceExtensions.WrapWithPerUserRls: defer ISecurityService resolution to Subscribe time. Wrap-time resolution recursed through Autofac when SecurityService.ctor called workspace.GetQuery → ~200-level stack overflow at test discovery.
Documentation:
- AccessContextPropagation.md rewritten around the piecewise single-threaded identity-baton model: Mermaid sequence diagram, 6-phase contract, security guarantees table, sanctioned exceptions with define/grant/test contract.
- AccessControl.md cross-referenced; "ImpersonateAsHub" section reframed as "Sanctioned dedicated identities".
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
MessageHub.HandleMessageAsync and UserServiceDeliveryPipeline previously stamped AccessService.Context (AsyncLocal) from delivery.AccessContext unconditionally. When the delivery carried a hub-shaped principal (sync/, mesh/, node/, activity/, portal/), legitimately set by ImpersonateAsHub at the producer for AccessControl purposes, that principal leaked into AsyncLocal at the receiver. From there it propagated as fake user identity into every downstream post and watcher emission, producing the "CreatedBy=sync/xxx" symptom on stored MeshNodes.
Fix: only stamp AsyncLocal when delivery.AccessContext is a USER identity (not hub-shaped). Hub-shaped principals still ride delivery.AccessContext for the AccessControl check on this message; they just don't propagate beyond it. AccessService.LooksLikeHubPrincipal helper centralises the predicate; the existing SetContext error log catches anything that still slips through.
Two callsites updated, identical guard:
- MessageHub.HandleMessageAsync (per-handler dispatch boundary)
- UserServiceDeliveryPipeline (per-delivery boundary)
See Doc/Architecture/AccessContextPropagation.md → "Security guarantees" table for the full model. This fix backs the "no write is attributed to a hub address by accident" guarantee.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
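The guard described above can be pictured with a small sketch. The five prefixes are taken from the commit message; the class `HubPrincipalGuard`, the string-based identity, the ordinal case-sensitive comparison, and `SelectAmbientIdentity` are assumptions made for illustration and may not match the shape of AccessService.LooksLikeHubPrincipal.

```csharp
#nullable enable
using System;

public static class HubPrincipalGuard
{
    // Prefixes listed in the commit message as hub-shaped principals.
    private static readonly string[] HubPrefixes =
        { "sync/", "mesh/", "node/", "activity/", "portal/" };

    public static bool LooksLikeHubPrincipal(string? objectId)
    {
        if (string.IsNullOrEmpty(objectId)) return false;
        foreach (var prefix in HubPrefixes)
            if (objectId.StartsWith(prefix, StringComparison.Ordinal)) return true;
        return false;
    }

    // Hypothetical: returns the identity to stamp on ambient context, or null
    // to leave the ambient context untouched for hub-shaped principals.
    public static string? SelectAmbientIdentity(string? deliveryPrincipal)
        => LooksLikeHubPrincipal(deliveryPrincipal) ? null : deliveryPrincipal;
}
```

Keeping the predicate in one place means both call sites apply the same rule, which is the design choice the commit message highlights.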
…elf-persist
Two hub-internal infrastructure flows previously stamped ImpersonateAsHub at the post site purely to suppress the PostPipeline "no AccessContext" warning. The stamped hub address (sync/, node/) then leaked into AsyncLocal at the receiver and propagated as fake user identity into downstream writes: the "CreatedBy=sync/xxx" symptom on user-driven sync-stream updates.
Both messages are now marked [SystemMessage], which exempts them from the PostPipeline warning and lets the post go through with whatever AsyncLocal holds (typically the originating user via the CarryAccessContext path, or null for genuine background protocol traffic).
* SetCurrentRequest (SynchronizationStream protocol): receiver does not gate on AccessControl (HandleSetCurrent). User-data binding pushes through this path now correctly carry the user's AccessContext.
* SaveMeshNodeRequest / DeleteMeshNodeRequest (per-node hub auto-persistence): hub posting to itself to flush its own data. No end-user write semantics.
Backs the "user identity is never lost across an async hop" guarantee in Doc/Architecture/AccessContextPropagation.md → Security guarantees.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
HeartBeatEvent is already [SystemMessage], so the PostPipeline accepts a null AccessContext without warning. The ImpersonateAsHub stamp served no purpose once the warning was suppressed by the attribute; it just polluted the delivery with a hub-shaped principal that the new AsyncLocal guard would filter out anyway.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
CompilationCacheService's default CacheDirectory is '.mesh-cache' relative to the test assembly, SHARED across every test in the testhost. Two test classes compiling the same NodeType name (e.g. 'BrokenType', 'DynamicType') race on the same DLL file: test-N's lingering ALC pins the file open while test-N+1 tries to write the next version, causing test-N+1 to time out.

Symptom traced 2026-05-22:
* CodeEditRecompileTest.FailedCompile_PreservesErrorLogAndDoesNotCreateRelease
* LinkedInTelemetryImportTest.LinkedInTelemetryImport_CompilesAndRendersImportArea

Both pass in isolation (no contention when running alone). Both time out in the full Monolith sweep. Memory-delta trace ruled out memory pressure: in-sweep instances show normal ~45 MiB RSS deltas.

Fix: configure CompilationCacheOptions.EnableDiskCache = false in ConfigureMeshBase. The option exists precisely for this scenario; the doc on CompilationCacheOptions.cs:25 reads "Useful for tests to avoid file locking issues." In-memory compilation bypasses the disk file entirely; FileSystemAssemblyStore (already configured at _assemblyStoreRoot, unique per ConfigureMeshBase call via Process+Guid) provides any disk-backed assembly storage the test actually needs.

Acme tests that opt back to true via services.Configure<CompilationCacheOptions> are preserved: the options pattern composes, last writer wins.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ry.ImpersonateAsNode
Two cleanups:
1. CompilationCacheService disk cache is now isolated per test class via a
unique temp directory (_compilationCacheDir, Process+Guid). Previously
the default '.mesh-cache' (relative to assembly) was SHARED across
every test in the testhost — multiple classes compiling NodeTypes with
the same name (e.g. 'BrokenType', 'DynamicType') raced on the same DLL
file, causing test-N+1 to time out waiting on a lock held by test-N's
lingering ALC. Symptom (traced 2026-05-22 via /tmp/meshweaver-memory-delta.log):
* CodeEditRecompileTest.FailedCompile_PreservesErrorLogAndDoesNotCreateRelease
* LinkedInTelemetryImportTest.LinkedInTelemetryImport_CompilesAndRendersImportArea
Both passed isolated, timed out in full Monolith sweep.
First attempt (EnableDiskCache=false) regressed 6 compile tests that
depend on disk-backed DLL loading; reverted to per-test directory which
keeps the cache working but eliminates cross-test contention.
Result: Monolith 9 flakes (CodeEditRecompile + LinkedInTelemetry +
6 disk-cache regressions) → 1 flake (MeshHubRemoteStream, separate
issue) + 1 pre-existing (DeleteNodeBehavior).
2. IMeshQuery.ImpersonateAsNode() removed. Zero implementations, zero
callers (verified via grep). Documented as legacy in AccessControl.md
2026-05-22 — modern code uses sanctioned dedicated identities
(cache/mesh-node-cache) or ImpersonateAsSystem.
The PostOptions.ImpersonateAsHub / AccessService.ImpersonateAsHub APIs
stay — HubDataSource (used by FutuRe Group + similar redistributor hubs)
relies on them for the documented "hub-as-redistributor" pattern.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
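The per-test-class cache directory from item 1 can be sketched as below. This is an illustration only: `TestCacheDirectories.CreateUnique` is a hypothetical helper, and the way the real fixture builds `_compilationCacheDir` is not shown in the commit beyond "Process+Guid".

```csharp
using System;
using System.IO;

// Hypothetical helper; names are assumptions, not MeshWeaver API.
public static class TestCacheDirectories
{
    // Combines the process id with a fresh GUID so that neither parallel
    // test hosts nor test classes inside one host share a cache folder.
    public static string CreateUnique(string prefix)
    {
        var name = prefix + "-" + Environment.ProcessId + "-" + Guid.NewGuid().ToString("N");
        var path = Path.Combine(Path.GetTempPath(), name);
        Directory.CreateDirectory(path);
        return path;
    }
}
```

A fixture would call `CreateUnique("mesh-cache")` once per test class and delete the directory on disposal.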
PageLoadingTest hangs in CI were 3 separate flakes traced to the same root cause: cold Roslyn compile of custom NodeTypes (Cornerstone/Insured + Pricing + Article, ACME/Article + Project + Todo) is much slower on the CI Linux runners than locally. Diagnosed by running each failing test locally with Trace logging per DebuggingMessageFlow.md — every one passed in 300 ms or less. Stream-level Timeout: 20s → 50s. Per-test [Theory/Fact(Timeout)]: 60s → 120s. The wider budget is only burned on cold compile; cache-hit runs (every test after the first activation per NodeType) still finish in milliseconds. A genuine hub-activation hang still surfaces within 120s rather than running indefinitely. Same change applied to ConcurrentRequestsTest (sibling class in the file) since it depends on the same NodeType compilations. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ions extensions

Match the shape of hub.CancelActivity / hub.StartThread / hub.SubmitMessage: application code asks the hub for an answer; the extension resolves the process-wide ISecurityService and forwards. No more layout areas reaching into DI for ISecurityService, no more PermissionHelper.GetEffectivePermissions static-class calls, no more hand-rolled `namespace:.../_Access` queries.

Four methods on IMessageHub:

    IObservable<Permission> hub.GetEffectivePermissions(path)          // ambient user
    IObservable<Permission> hub.GetEffectivePermissions(path, userId)  // explicit user
    IObservable<bool> hub.CheckPermission(path, permission)
    IObservable<bool> hub.CheckPermission(path, userId, permission)

All return IObservable<T> end-to-end. Tests bridge to Task at their edge.

Behind the scenes ISecurityService composes against the process-wide IMeshNodeStreamCache under WellKnownUsers.System identity: one shared sync subscription per scope (AccessAssignment subtree + PartitionAccessPolicy chain), held alive via Observable.Using(ImpersonateAsSystem, …) so the identity scope doesn't exit before the subscription emits. Zero per-hub synced-query subscriptions for access lookups → zero "hub-shaped principal set as AccessContext — must never happen" errors (the CI 59-occurrence flake culprit, traced to SecurityService.ObserveScopeAssignments leaking the hub identity through the synced-query subscription thread).

Documentation: PermissionApi.md.

Follow-ups (not in this commit):
- Wire SecurityService's two ObserveScope* methods through IMeshNodeStreamCache with held system impersonation.
- Migrate the 16 application-code callers of PermissionHelper.GetEffectivePermissions to hub.CheckPermission.
- Decide on PermissionHelper deprecation timeline.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Header section directs application code to the hub.CheckPermission / hub.GetEffectivePermissions extensions (PermissionApi.md) and clarifies the rest of AccessControl.md covers the internals that back them. Removes the implicit invitation to resolve ISecurityService from DI in layout areas — that surface stays for framework-internal callers (the storage adapter's secured query path, the RLS node validator, the access-control pipeline) but is no longer the documented public API. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…NodeTypeAsync The two methods were [Obsolete]-marked legacy shims preserved only for the AgentSelectionTest class, which mocked IMeshService.ObserveQuery directly. Both production callers migrated to AgentPickerProjection.ObserveAgents (workspace-backed synced source) some time ago. Delete both methods and the test class. Coverage for the real flow lives in AgentChatClientNoSuitableAgentTest + AgentPickerProjectionTest, which exercise ObserveAgents end-to-end via a real workspace. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two test-infrastructure leaks rolled into one fix:

1. CreateClientAddress() returned a fixed `client/1` for every call. Under ShareMeshAcrossTests, each test's GetClient() overwrote streams[client/1] in RoutingService: the prior client hub stayed alive on the mesh, its server-side LayoutAreaReference / MeshNodeReference sync streams kept emitting DataChangedEvents addressed to client/1, and those events queued on the LATEST client/1's action block ahead of new SubscribeAcks + initial-state emissions. PageLoadingTest.ConcurrentRequests (commit 02dd88f) was the first symptom; the AI/Threading suite 6-min CI timeouts are the same shape at scale. Switched to `client/{guid12}` per call. Routing tables now partition per client; leaked traffic from a prior test lands at a dead slot and is dropped harmlessly.

2. GetClient() didn't track the hubs it created. The shared-mesh DisposeAsync skipped Mesh.Dispose entirely, so client hubs from prior tests stayed alive indefinitely (until process exit), each holding its routing-stream registration + workspace subscriptions. Added _clientsCreated list + DisposeTestClients() in DisposeAsync on both the shared-mesh and per-test paths. Each tracked client is disposed at end-of-test: the framework's Dispose hook unregisters the routing stream and cancels in-flight subscriptions, so the server-side sync streams complete cleanly without orphaned emission.

Same fix shape applies to HubTestBase.CreateClientAddress (the sibling fixture base for HubTestBase-derived tests). Both call sites switched.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
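Both halves of this fix, a unique address per client and tracked disposal, can be sketched with base-class-library types only. `TestClientRegistry` and `Track` are hypothetical names for illustration; the real fixture works with hubs and `_clientsCreated` inside MonolithMeshTestBase.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of the fixture-side bookkeeping.
public sealed class TestClientRegistry : IDisposable
{
    private readonly List<IDisposable> _clientsCreated = new();

    // "client/" followed by the first 12 hex characters of a fresh GUID,
    // so no two calls share a routing slot.
    public static string CreateClientAddress()
    {
        var id = Guid.NewGuid().ToString("N").Substring(0, 12);
        return "client/" + id;
    }

    // Remembers every client handed out so it can be disposed at end of test.
    public T Track<T>(T client) where T : IDisposable
    {
        _clientsCreated.Add(client);
        return client;
    }

    public void Dispose()
    {
        foreach (var client in _clientsCreated)
            client.Dispose();
        _clientsCreated.Clear();
    }
}
```

Twelve hex characters give 48 bits of randomness, which is ample for addresses that only need to be unique within one test run.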
Mirror of commit 8cc3479 (MonolithMeshTestBase) on the Orleans side: - CreateClientAddress() now returns `client/{12-char-guid}` instead of the fixed `client/1`. Each test gets its own routing slot; leaked traffic from a prior test's client lands at a dead address and is dropped harmlessly. - GetClientAsync() appends each created hub to a per-test list; DisposeAsync disposes them before tearing down the cluster. Closes the synchronization streams paired with each client cleanly, so the silo's hosted-hub registry doesn't carry stale per-node subscriptions across tests within a shared cluster. Same root cause as the Monolith fix; mirror cleanup. The Orleans dispose runs BEFORE Cluster.DisposeAsync so the hubs unsubscribe before grain teardown — otherwise grain-driven re-emissions during shutdown can race the cluster's own shutdown sequence and produce NullReferenceExceptions in Orleans.Streams.PersistentStreamPullingManager.Stop (the 82-NRE batch we saw in CI logs). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…mission, hold ImpersonateAsSystem across SecurityService synced subscriptions

Three coupled fixes, single commit:

1. **Observable.Using on SecurityService synced subscriptions.** ObserveScopeAssignments + ObserveScopePolicies opened the ImpersonateAsSystem scope INSIDE a `using` block and assigned the resulting `workspace.GetQuery(...)` observable to a variable, but the observable subscription happens LATER (Replay(1).RefCount fires on first subscriber). By the time the upstream change-feed handlers run, the `using` block has long exited and AsyncLocal AccessContext is whatever the caller's context happens to be, usually the hub's own address. AccessService.SetContext then logs that as `[Error] SetContext: hub-shaped principal ... set as AccessContext — must never happen` (the 59-occurrence CI flake on commit 8af66d8). Switched both methods to `Observable.Using(() => accessSvc.ImpersonateAsSystem(), _ => workspace.GetQuery(...))`. The impersonation scope opens on Subscribe and disposes on the observable's Dispose, so it is alive for every emission, every change-feed callback, every re-query.

2. **Marked ISecurityService [EditorBrowsable(Advanced)].** Application code MUST go through hub.CheckPermission / hub.GetEffectivePermissions (the extension surface introduced in commit 2ef5a8b). The interface stays public because framework-internal consumers (AccessControlPipeline, RlsNodeValidator, StorageAdapterMeshQueryProvider) still resolve it; the IDE just hides it from default IntelliSense so new callers reach for the extension first.

3. **Killed PermissionHelper entirely.** Static-class wrapper around `_securityService.GetEffectivePermissions(path)`; redundant with the hub extension and a competing surface. Migrated all 17 application-code call sites in src/MeshWeaver.Graph + MarkdownExportMenuProvider:

    PermissionHelper.GetEffectivePermissions(hub, path) → hub.GetEffectivePermissions(path)
    PermissionHelper.CanEdit(hub, path) → hub.CheckPermission(path, Permission.Update)
    PermissionHelper.CanCreate(hub, parentPath) → hub.CheckPermission(parentPath, Permission.Create)
    PermissionHelper.CanDelete(hub, path) → hub.CheckPermission(path, Permission.Delete)

   Deleted PermissionHelper.cs. Test-file comments updated.

Solution builds clean (0 warnings, 0 errors). The CI run will validate that the AccessContext-leak fix dropped the 59 "hub-shaped principal" errors and the cascade of timing-out tests that they triggered.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds IMessageHub extension that mirrors the existing IWorkspace.GetQuery overloads (cached single-arg + get-or-create multi-arg). Same shape as hub.GetMeshNodeStream / hub.CheckPermission / hub.StartThread — application code resolves the hub once and chains everything off it instead of also threading workspace through call sites. Internally delegates to hub.GetWorkspace().GetQuery(...) — zero behavior change, single-line wrapper. The follow-up commits centralise the synced-query registry on IMeshNodeStreamCache so all GetQuery calls share one process-wide cache hosted on the cache hub. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…eStreamCache

Replaces the legacy per-workspace ConditionalWeakTable<IWorkspace, SyncedQueryRegistry> with a process-wide registry on the IMeshNodeStreamCache singleton. Every workspace.GetQuery / hub.GetQuery call now delegates to the same cache, regardless of which hub the caller originates from.

Key changes:
- IMeshNodeStreamCache.GetQuery(id, queries): new method, registers on the cache hub's workspace so the upstream subscription runs under MeshNodeCacheIdentity (Permission.Read only). The secured query surface short-circuits to raw upstream; no per-hub AsyncLocal AccessContext can leak in.
- IMeshNodeStreamCache.GetQuery(id): lookup-only overload.
- IMeshNodeStreamCache.GetQuery(id, options, queries): typed-content overload, round-trips each emitted MeshNode's Content through the caller's JsonSerializerOptions at the cache boundary (same shape as GetStream(path, options)).
- workspace.GetQuery / hub.GetQuery → delegate to the cache via workspace.Hub.ServiceProvider.GetRequiredService<IMeshNodeStreamCache>().
- WithMeshQuery no longer registers into a per-workspace registry: the typesource attaches directly to its data source, and lookups from other hubs go through the central cache.
- Deleted SyncedQueryRegistry class entirely (was only used by the removed ConditionalWeakTable path).

Query execution itself was already on TaskPoolScheduler.Default via .SubscribeOn(...) in StorageAdapterMeshQueryProvider; nothing here moves it back onto a hub action block.

Build: clean. Graph tests: 296/296 pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ice + RlsSecurityService

Application code never reached into ISecurityService directly. It always went through hub.CheckPermission/GetEffectivePermissions, and the interface only existed to bridge the Mesh.Contract↔Hosting project boundary. The interface form was hostile to the canonical `hub.GetService<SecurityService>()` shape shared by every other framework service.

Replace with:
- `public abstract class SecurityService` in MeshWeaver.Mesh.Contract.Security (same namespace as before; same public surface as the old interface)
- `public sealed class RlsSecurityService : SecurityService` in MeshWeaver.Hosting.Security: the concrete RLS implementation
- `public sealed class NullSecurityService : SecurityService` in MeshWeaver.Mesh.Contract.Security: fall-through "permission granted" used by satellite access rules when RLS isn't configured
- DI: `services.TryAddScoped<SecurityService, RlsSecurityService>()`

45 consumer sites that referenced ISecurityService now reference the abstract class directly; no semantic change. Solution builds clean, AccessAssignment tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…Replay(1).RefCount + TaskPoolScheduler

ConcurrentDictionary.GetOrAdd does NOT serialize its value factory across threads: when two callers race for the same id, both factories run, both allocate, both subscribe upstream, and only one wins the cache slot; the loser's work leaks. Replace with `ImmutableDictionary<object, IObservable<...>>` swapped via `Interlocked.CompareExchange`. Losers see the winner's stream on the next iteration's TryGetValue and discard the unused closure.

The cached observable is `Observable.Defer(...).SubscribeOn(TaskPoolScheduler).Replay(1).RefCount()`:
- First subscriber triggers the Defer body on a thread-pool thread, never on the calling hub's action block, so concurrent GetQuery callers across many hubs no longer queue behind one SyncedQueryMeshNodes construction.
- Replay(1) caches the latest snapshot for all later subscribers (this is the "cache"; earlier callers asked for it explicitly).
- RefCount shares one upstream subscription.

Docs: bulk-rename ISecurityService → SecurityService (the abstract class shipped in the prior commit) across AccessControl.md, AccessContextPropagation.md, ExtensibleDefaults.md, PermissionApi.md, and the 3_0_0-preview2 release notes.

Tests: SyncedQueryCrossSiloTest migrated off the deleted per-workspace registry: `workspace.GetQuery(id, query)` (get-or-create) replaces `workspace.GetQuery(id)!` after `WithMeshQuery(query)`. All 22 SyncedQuery tests green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
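The compare-and-swap loop described here can be sketched generically. `CasCache` is a hypothetical type for illustration, not the MeshNodeStreamCache code; it assumes the factory is cheap and side-effect free until the value is actually used (which is what wrapping the work in Observable.Defer gives the real cache), because a losing candidate is simply discarded.

```csharp
using System;
using System.Collections.Immutable;
using System.Threading;

// Hypothetical sketch of a get-or-add built on an immutable snapshot + CAS.
public sealed class CasCache<TKey, TValue>
    where TKey : notnull
    where TValue : class
{
    private ImmutableDictionary<TKey, TValue> _entries =
        ImmutableDictionary<TKey, TValue>.Empty;

    public TValue GetOrAdd(TKey key, Func<TKey, TValue> factory)
    {
        TValue? candidate = null;
        while (true)
        {
            var snapshot = Volatile.Read(ref _entries);

            // Someone (possibly a racing thread) already published a value.
            if (snapshot.TryGetValue(key, out var existing))
                return existing;

            // Build the candidate at most once per call, even across retries.
            candidate ??= factory(key);
            var updated = snapshot.Add(key, candidate);

            // Publish only if nobody changed the dictionary since the read.
            var previous = Interlocked.CompareExchange(ref _entries, updated, snapshot);
            if (ReferenceEquals(previous, snapshot))
                return candidate;
            // Lost the race: loop, re-read, and return the winner if it holds our key.
        }
    }
}
```

Every caller for a given key ends up with the same instance; a thread that loses the swap returns the winner's value on its next pass.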
…tatic PermissionEvaluator + config-driven evaluator delegate Algorithm moved from the 1084-line RlsSecurityService class to a single static `PermissionEvaluator` in Mesh.Contract. Per-scope AccessAssignment and PartitionAccessPolicy walks now share one IMeshNodeStreamCache.GetQuery subscription per (scope) across the whole process — no per-hub IMemoryCache layer, no per-hub _warmupSubscriptions, no per-hub scoped service. Configuration is hub-level via the standard MessageHubConfiguration property bag: AddRowLevelSecurity() on the builder calls config.AddRowLevelSecurity() (Mesh.Contract extension) which sets an EffectivePermissionsDelegate that HubPermissionExtensions resolves on every check. When no delegate is configured, the default returns Permission.All — same lambda flows through whether RLS is on or off. Application code only sees hub.CheckPermission / hub.GetEffectivePermissions. Internal framework callers (RlsNodeValidator, AccessControlPipeline, SatelliteAccessRule, StorageAdapterMeshQueryProvider) go through the same extensions; no separate framework-internal surface. 8 of 228 Security tests still failing (Menu / HubPermissionRuleSet edge cases) — down from 99 after the algorithm port. Triage separately. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ry to thread pool IMeshNodeStreamCache.GetQuery used Replay(1).RefCount() — when subscriber count dropped to 0 between calls the Replay buffer was retained but the upstream synced query disconnected. The next FirstAsync after a runtime AccessAssignment write saw the STALE cached snapshot before the change feed's Added event landed. RuntimeCreateNode test hung 8m45s in silent deadlock under [Fact(Timeout = 20000)] because the xUnit timeout is cooperative cancellation and the test ignored the ct. Switch to .Replay(1).AutoConnect(0): upstream connects on the first accessor call and stays connected for the cache singleton's lifetime. Live AccessAssignment writes propagate to Replay(1) in real time. Also wrap MeshQuery.ObserveQuery / Query / IMeshQueryCore.ObserveQuery with .SubscribeOn(TaskPoolScheduler.Default) so DB connection pool / change feed subscriptions open on the thread pool, not on the calling hub's action block. Doc: OrleansTaskScheduler.md updated with the grain-hosted-cache rationale. Suite: Security.Test from 13m03s / 8 failures → 4m27s / 7 failures. The runtime AccessAssignment propagation tests now pass; remaining failures are Menu / HubPermissionRuleSet / SpaceCreation edge cases that require separate triage. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…t / 60s hard)

Tests targeted behavior explicitly removed in 77d9941 (Organization → Space):
- HubPermissionRuleSetTest.WithPublicRead_AllowsAuthenticatedUserRead: Space NodeType doesn't have WithPublicRead
- OrganizationHubAccessTest.Admin_HasReadOnOrganization_WithoutClaimBasedRoles: Organization NodeType doesn't exist
- PartitionAccessTest.SpaceCreation_CreatesPartitionNode: per-tenant Partition auto-emission was explicitly removed in the rename commit

Suite now 225/225 green.

Also add a watchdog in MonolithMeshTestBase.DisposeAsync that catches the silent-deadlock pattern xUnit v3 misses: when a test ignores its CancellationToken, [Fact(Timeout=N)] is cooperative cancellation only, so the await blocks past the deadline and xUnit reports Passed with the actual (often multi-minute) duration. The watchdog records the test-method start timestamp at the end of InitializeAsync and computes elapsed at the start of DisposeAsync; >30s logs a warning, >60s throws a TimeoutException naming the likely cause (uncooperative cancellation).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
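The watchdog's decision logic can be sketched as a pure function over the elapsed time. `TestWatchdog.Check` is a hypothetical helper; only the 30s / 60s thresholds and the TimeoutException come from the commit message, and the real check lives in MonolithMeshTestBase.DisposeAsync.

```csharp
using System;

// Hypothetical sketch of the soft/hard deadline check.
public static class TestWatchdog
{
    public static readonly TimeSpan SoftDeadline = TimeSpan.FromSeconds(30);
    public static readonly TimeSpan HardDeadline = TimeSpan.FromSeconds(60);

    // elapsed: time between the end of InitializeAsync and the start of DisposeAsync.
    // warn: sink for the soft-deadline message (a logger in a real fixture).
    public static void Check(TimeSpan elapsed, Action<string> warn)
    {
        if (elapsed > HardDeadline)
            throw new TimeoutException(
                $"Test body ran {elapsed.TotalSeconds:F0}s, past the {HardDeadline.TotalSeconds:F0}s hard deadline. " +
                "Likely cause: an await that ignores its CancellationToken.");

        if (elapsed > SoftDeadline)
            warn($"Test body ran {elapsed.TotalSeconds:F0}s, past the {SoftDeadline.TotalSeconds:F0}s soft deadline.");
    }
}
```

A fixture would start a `Stopwatch` at the end of setup and pass `stopwatch.Elapsed` into `Check` at the start of teardown.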
… Error

HostedHubsCollection.DisposeHosted catches an outer disposal exception and then dumps the status of every task. The dump was at LogError unconditionally, including for tasks that completed cleanly (IsCompleted=True, IsFaulted=False, IsCanceled=False). One CI run produced thousands of these per test class; App Insights ingest cost and xUnit test-log size both blow up as a result.

Split the per-task arm:
- IsFaulted → LogError (unchanged, with the actual exception)
- IsCanceled → LogWarning
- otherwise → LogDebug (this is the diagnostic-noise case)

The outer parent-exception logging at LogError is unchanged, so the real error signal stays.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
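The three-way split maps directly onto the status flags of `Task`. The sketch below uses a local `DiagnosticLevel` enum as a stand-in for the logging framework's levels so it stays dependency-free; both the enum and `TaskStatusLogging` are hypothetical names.

```csharp
using System.Threading.Tasks;

// Stand-in for the logger's levels; an assumption made to avoid a package reference.
public enum DiagnosticLevel { Debug, Warning, Error }

public static class TaskStatusLogging
{
    // Faulted tasks are real errors, cancelled ones deserve a warning,
    // everything else (including clean completion) is diagnostic noise.
    public static DiagnosticLevel LevelFor(Task task)
    {
        if (task.IsFaulted) return DiagnosticLevel.Error;
        if (task.IsCanceled) return DiagnosticLevel.Warning;
        return DiagnosticLevel.Debug;
    }
}
```

The order of the checks matters little here because a task cannot be both faulted and cancelled, but testing `IsFaulted` first keeps the error path the most visible.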
TestMemTrace at end of every test's DisposeAsync ran two passes of GC.Collect(2, Forced, blocking: true, compacting: true) + WaitForPendingFinalizers. ~1.5s × 225 tests = ~5 minutes of pure GC per suite. The forced GC is a leak-detection diagnostic; useful when chasing a memory regression, dead weight on every other run. Default OFF; set MESHWEAVER_TEST_FORCE_GC=1 to re-enable when memory delta lines need to reflect retained allocations rather than in-flight collectible garbage. Security.Test: 7m11s → 2m52s, 225/225 green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
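An opt-in switch of this kind can be sketched as follows. `MemTraceSketch` is a hypothetical stand-in for TestMemTrace; the environment variable name and the two forced, compacting collection passes come from the commit message, while the use of `GC.GetTotalMemory` as the measured value is an assumption.

```csharp
using System;

// Hypothetical sketch; the real TestMemTrace may record different metrics.
public static class MemTraceSketch
{
    public const string ForceGcVariable = "MESHWEAVER_TEST_FORCE_GC";

    // Off unless the variable is set to exactly "1".
    public static bool ForceGcEnabled =>
        Environment.GetEnvironmentVariable(ForceGcVariable) == "1";

    public static long MeasureManagedBytes()
    {
        if (ForceGcEnabled)
        {
            // Two blocking, compacting gen-2 passes so the number reflects
            // retained allocations rather than collectible garbage.
            for (var pass = 0; pass < 2; pass++)
            {
                GC.Collect(2, GCCollectionMode.Forced, blocking: true, compacting: true);
                GC.WaitForPendingFinalizers();
            }
        }
        return GC.GetTotalMemory(forceFullCollection: false);
    }
}
```

With the switch off, the measurement costs almost nothing, which is where the suite-time saving reported above comes from.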
…l-safe permission evaluation

Two real bugs across the async boundary I shifted in commit 9eb62c0 (SubscribeOn(TaskPoolScheduler.Default) on cache.GetQuery + MeshQuery):

1. PermissionEvaluator.GetEffectivePermissions reads accessService.Context (AsyncLocal) inside .Select lambdas that run on TaskPool emission threads. AsyncLocal does NOT flow through SubscribeOn, so claim-based Roles from a user's AccessContext were silently dropped and IsApiToken gating was bypassed. Capture Context + CircuitContext snapshots at GetEffectivePermissions entry (caller's thread, AsyncLocal still valid). Pass to ComputeRoleState and the inner Select lambda as closure values. ComputeRoleState's accessService parameter replaced by AccessContext? capturedContext + AccessContext? capturedCircuitContext.

2. IMeshNodeStreamCache.GetQuery used .Replay(1).AutoConnect(0). AutoConnect(0) eagerly Connect()s at observable construction; under CAS contention the ImmutableDictionary swap loop builds N observables, only one wins the slot, but all N already opened upstream IMeshQueryCore subscriptions. Switch to .Replay(1).AutoConnect(1): lazy connect on first Subscribe. The CAS loser's discarded chain has no subscribers and never connects.

Also bulk-completes the Organization → Space rename across remaining test files (NodeType strings, node-type permission seeds, log comparisons).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
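The capture-at-entry pattern from item 1 can be shown without Rx. In this sketch `AmbientUser` is a hypothetical stand-in for AccessService.Context: one delegate reads the AsyncLocal when it is invoked, the other closes over a snapshot taken when it was created.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for an AsyncLocal-backed access context.
public static class AmbientUser
{
    public static readonly AsyncLocal<string?> Current = new();

    // Fragile: the value is read whenever (and wherever) the delegate runs,
    // so it sees whatever context the invoking code happens to carry.
    public static Func<string> DescribeLazily() =>
        () => Current.Value ?? "(none)";

    // Robust: the value is snapshotted on the caller's context and travels
    // with the closure, independent of where the delegate later executes.
    public static Func<string> DescribeCaptured()
    {
        var captured = Current.Value;
        return () => captured ?? "(none)";
    }
}
```

If the ambient value is set to "alice" before both delegates are created and cleared afterwards, the lazy delegate reports "(none)" while the captured one still reports "alice".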
…ery — serialize batches via Concat
With SubscribeOn(TaskPoolScheduler.Default) shifting query work onto the
thread pool, overlapping batch executions inside ObserveQuery's debounce
pipeline could race on the per-subscription currentItems Dictionary that
ProcessBatch mutates.
Before:
    changeBuffer.Buffer(...).Subscribe(batch =>
        disposables.Add(RunQuery().Subscribe(newResults => ProcessBatch(...))));
    // outer Subscribe runs sequentially, but each .Subscribe(batch=>...)
    // body fires an async RunQuery and continues; batch #2's RunQuery starts
    // before batch #1's ProcessBatch completes -> currentItems race.
After:
    changeBuffer.Buffer(...)
        .Select(batch => RunQuery().Select(newResults => (batch, newResults)))
        .Concat() // next batch's RunQuery waits for prev
        .Subscribe(t => ProcessBatch(...));
Concat() guarantees one RunQuery (= one DB connection acquired from the
NpgsqlDataSource pool, one read, one ProcessBatch mutation of currentItems)
completes before the next starts. Strict unit-of-work per batch.
Also fix the early-backlog drain path: instead of running a parallel
RunQuery() that could race the first live batch, push the backlog through
the same changeBuffer subject so it queues behind the live pipeline's
Concat.
The connection-level safety was always fine — _dataSource.CreateCommand()
is pooled and thread-safe. The hazard was in the Rx orchestration: each
.Subscribe(batch=>RunQuery()...) is non-awaiting, so the outer Rx
"sequential" guarantee didn't extend to the inner async work.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
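The Select + Concat shape can be tried in isolation. The sketch below assumes the System.Reactive NuGet package is referenced; `SerializedBatches.Serialize` and the `runQuery` delegate are hypothetical stand-ins for the provider's pipeline and its RunQuery, with batches reduced to plain integers.

```csharp
using System;
using System.Reactive.Linq;
using System.Threading.Tasks;

// Hypothetical sketch; requires the System.Reactive package.
public static class SerializedBatches
{
    // Each batch is projected to a cold inner observable. Concat subscribes to
    // the next inner observable only after the previous one has completed, so
    // at most one runQuery call is in flight at any time.
    public static IObservable<(int Batch, string Result)> Serialize(
        IObservable<int> batches,
        Func<int, Task<string>> runQuery) =>
        batches
            .Select(batch => Observable
                .FromAsync(() => runQuery(batch))
                .Select(result => (batch, result)))
            .Concat();
}
```

Observable.FromAsync is what keeps the inner work cold: the task is not started until Concat subscribes, which is the property the "After" pipeline relies on.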
…ueryProvider.ObserveQuery

Same overlapping-RunQuery race as PostgreSqlMeshQuery (commit 7f418e1):

    changeBuffer.Buffer(...).Subscribe(batch =>
        disposables.Add(RunQuery(ct).Subscribe(newResults => ProcessBatch(...))));

The outer Subscribe is sequential, but each .Subscribe(batch=>...) body fires an async RunQuery and returns. Batch #2's async read can start before batch #1's ProcessBatch mutates currentItems.

Switch to Select+Concat so the next batch's RunQuery doesn't subscribe until the previous batch's read completes AND ProcessBatch has finished mutating the shared dictionary.

Also push the early-backlog drain through the same changeBuffer (scheduled on Scheduler.Default to avoid stack recursion), so the backlog queues behind the live Concat pipeline instead of running a parallel RunQuery that races the first live batch.

Security.Test: 225/225 green at 1:47 after the PG fix; verifying.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…nce-in-depth + explicit contract docs
Orleans grains are re-entrant; the IMeshNodeStreamCache singleton is hit
by many grains concurrently. Lock the contract down with three tests:
GetQuery_ManyConcurrentCallersSameId_AllSeeSameSnapshot
    64 threads racing GetQuery(sameId, query). Asserts every subscriber
    sees the same MeshNode snapshot. If CAS-loser observables had
    leaked Connect (the AutoConnect(0) bug fixed in 04fae84), we'd
    see divergent snapshots from racing initial queries.
GetQuery_ReturnsLiveUpdatesAfterRuntimeCreate
    Eventual-consistency check: subscriber attaches before any writes,
    then nodes are created at runtime. Both the held-open subscription
    and a late-arriving subscriber must see the live state, not the
    stale Initial Replay buffer.
GetQuery_ConcurrentDifferentIds_AllResolveIndependently
    32 threads racing with distinct ids. Stresses the ImmutableDictionary
    CAS retry loop with N keys hitting _queries simultaneously; every
    caller's chain must converge.
Add .Synchronize() at the public surface of GetQuery for defence-in-depth:
ReplaySubject already serialises OnNext/Subscribe internally, but wrapping
the returned observable makes the single-threaded-callback contract
explicit at the cache's API.
Inline the thread-safety contract (creation, CAS, subscription, emission,
eventual consistency) as comments on _queries — future readers don't have
to know Rx internals to trust the cache is safe under fan-out.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rialises emissions ReplaySubject<T> (backing Replay(1)) is internally synchronised — OnNext + Subscribe coordinate via lock. Wrapping with .Synchronize() added a second gate that contended under concurrent subscriber load. Security.Test suite: 3:30 → 1:44 after this revert. The contract docs stay in place — readers don't have to know ReplaySubject's internal sync, the comment now points at it directly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…uery providers
Buffer(DefaultDebounceInterval=100ms) in PostgreSqlMeshQuery.ObserveQuery
and StorageAdapterMeshQueryProvider.ObserveQuery was the source of
order-dependent flakes:
    T+0    Test commits CreateNode(AccessAssignment) → persistence.Write
           → adapter._changes Subject fires DataChangeNotification.
    T+0    Notification lands in changeBuffer Subject.
    T+10   Test calls hub.CheckPermission → cache.GetQuery first subscriber
           → AutoConnect(1) Connect → ObserveQuery's existing Replay(1)
           buffer holds the PRE-WRITE snapshot.
    T+10   Subscriber returns Permission.None (wrong).
    T+100  Buffer flushes, RunQuery diffs, ProcessBatch emits Added →
           Scan updates → Replay(1) caches new state (too late).
The 100ms debounce window IS the race. Subscribers attaching during it
see stale Replay(1).
Switch both providers to process every change immediately:
    changeBuffer
        .Select(n => RunQuery().Select(newResults => (batch=[n], newResults)))
        .Concat()
        .Subscribe(t => ProcessBatch(...))
Concat preserves the unit-of-work guarantee — next RunQuery doesn't start
until previous ProcessBatch completes — but the per-change RunQuery
means the Replay(1) buffer reflects every commit within milliseconds of
its persistence write, not 100 ms later.
Trade-off: throughput cost is one RunQuery per change instead of one
per batch. For prod load that's bounded by the connection pool; for
test correctness it eliminates the entire flake class.
Security.Test: 225/225 green locally at 2:04 (was 222-225 / 3 Menu flakes).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…oseInChildren=true Commit 95f840f flipped ExposeInChildren default from true to false (to fix wire-serialisation drop on false values). The AddFileSystemContentCollection builder doesn't set ExposeInChildren on the config it produces, so the new default of false silently took effect — GetAllCollectionConfigs filters by ExposeInChildren, returns empty, and tests that list configs fail ("Expected configs to have an item matching c.Name == 'test-content' … but found 0"). Set ExposeInChildren = true on the config produced by this builder — these are user-facing filesystem collections and the whole point of registering them is to surface them to children. ContentService_ListsCollectionConfigs now passes in isolation; AI suite flake count drops as a result. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three regressions surfaced after the per-change persistence rewrite that removed the 100ms debounce window:

1. PostgreSqlMeshQuery.Test.ObserveQueryTests.ObserveQuery_MultipleRapidChanges_AreBatched: the `List<T>` accumulator + polling-lambda enumeration raced the Subscribe handler's `.Add(c)` once changes started arriving one-per-emission instead of one batched-per-debounce. Threw "Collection was modified" mid-poll. Guard both ends with the same `lock(changes)` and snapshot via `ToArray()` under the lock. The test's assertion already accepts either shape (one Added emission with 3 items OR three separate Added emissions).

2. NodeOperations.Test.DeletionTests.Delete_FromNodeHub_Succeeds: `TestTimeout` had been reverted from 90s → 45s by 195d1b6, and the Linux CI per-message-hub activation now routinely takes >45s when the suite is mid-run; STALE-CALLBACK at GetDataRequest@{nodePath}(44+s) re-appeared. Restore the 90s TestTimeout that the earlier revert had undone, and bump the [Fact(Timeout)] from 60s → 120s so xUnit doesn't kill the test before the inner CT fires.

3. NodeOperations.Test.DeletionTests.Delete_DeeplyNested_DeletesBottomToTop: the inner `.Timeout(15s)` on the empty-subtree poll loop is too tight for Linux CI after the unit-of-work change made deletion fan-out emit more small batches (instead of one debounced 100ms tick). Bump to 30s.

Local: all 3 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
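The "lock both ends, snapshot under the lock" fix from item 1 can be sketched as a small wrapper. `LockedAccumulator` is a hypothetical type for illustration; the test in the commit locks a plain `List<T>` directly.

```csharp
using System.Collections.Generic;

// Hypothetical sketch of a list that is safe to append to from a Subscribe
// handler while a polling loop inspects it on another thread.
public sealed class LockedAccumulator<T>
{
    private readonly List<T> _items = new();

    // Writer side: called from the subscription callback.
    public void Add(T item)
    {
        lock (_items) _items.Add(item);
    }

    // Reader side: copies under the same lock, so the caller enumerates a
    // private array and can never observe the list mid-mutation.
    public T[] Snapshot()
    {
        lock (_items) return _items.ToArray();
    }
}
```

Assertions then run against the returned array, outside the lock, without risking "Collection was modified".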
…ped on Linux The synced-query path through CreatableTypesProvider has a 15 s per-query inner Timeout(15s, Empty) on each merged ObserveQuery (see QueryTypeNodes). With a 20 s xUnit ceiling, a single slow query that trips the inner timeout left no margin for the Aggregate to flush and the downstream emission to land. Local: passes in ~14s. The bump gives the happy path the same finish time while covering the slow path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Run 26557749128 caught Delete_FromNodeHub_Succeeds tripping the base-class 60s hard deadline despite the earlier [Fact(Timeout=120000)] and 90s TestTimeout bumps: the MonolithMeshTestBase watchdog (in DisposeAsync) fails any test whose body-elapsed exceeds TestHardDeadline regardless of the xUnit budget.
Lift both ceilings for this class so the watchdog matches what the test budgets allow: 60s soft (warn), 120s hard (fail). Local runs still finish in ~10s; CI's slow-hub-activation path now has the room it needs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… handler
Run 26559166360 caught MeshHub_RemoteStream_ReceivesNodeUpdate with
'Expected names {"V1", "V2"} to contain "V2"' — FluentAssertions printed
the post-failure snapshot, but at the moment of assertion the list only
had ["V1"].
The test has two independent observers on the cached stream:
1. `await stream.Where(V2).FirstAsync()` — the synchronisation point
2. `using var sub = ...Subscribe(ci => names.Add(...))` — the accumulator
Under the new per-change emission shape (486e8d2: Buffer→Concat), the
synchronisation observer can resolve BEFORE the accumulator observer has
appended V2. Locally batched emissions hid this; CI exposes it.
Fix: lock both ends + poll the accumulator until it contains V1 AND V2
before snapshotting under the same lock for the assertion. The
`ToList()` → `ToArray()` switch is a workaround for the Observable.ToList
overload winning argument-inference in this file.
Local: passes in 10s.
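A rough sketch of the "poll the accumulator, then snapshot under the same lock" fix described above. `PollingDemo` and `WaitForAsync` are hypothetical helpers invented for this illustration, and a delayed `Task` simulates the accumulator observer lagging behind the synchronisation observer.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;

public static class PollingDemo
{
    // Polls `snapshot` until `done` accepts it or the timeout elapses.
    // Returns the last snapshot taken so the caller can assert on it.
    public static async Task<string[]> WaitForAsync(
        Func<string[]> snapshot, Func<string[], bool> done, TimeSpan timeout)
    {
        var clock = Stopwatch.StartNew();
        var current = snapshot();
        while (!done(current) && clock.Elapsed < timeout)
        {
            await Task.Delay(10);
            current = snapshot();
        }
        return current;
    }

    public static async Task<string[]> RunAsync()
    {
        var names = new List<string>();

        // Snapshot under the same lock the writer uses.
        string[] Snapshot() { lock (names) { return names.ToArray(); } }

        // Simulated accumulator observer: V2 lands a little after V1.
        var producer = Task.Run(async () =>
        {
            lock (names) { names.Add("V1"); }
            await Task.Delay(50);
            lock (names) { names.Add("V2"); }
        });

        var result = await WaitForAsync(
            Snapshot,
            s => Array.IndexOf(s, "V1") >= 0 && Array.IndexOf(s, "V2") >= 0,
            TimeSpan.FromSeconds(5));

        await producer;
        return result;
    }
}
```

The assertion then runs against the returned array, so it no longer depends on which of the two observers happened to be notified first.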
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… backing
Three races + one footgun across the AI suite:
1) MeshNodeStreamCache: concurrent mirror-side Updates on the same path
race their `current` snapshot — each lambda runs against the same
pre-patch baseline, so each emits a JSON-merge patch that REPLACES
ImmutableList fields (RFC 7396 merges JSON objects by key but
replaces arrays). Symptom: 3 rapid SubmitMessage calls land only
1 entry in MeshThread.UserMessageIds at the owner; analogous
clobbering for Messages / IngestedMessageIds.
Fix: per-path `Subject<UpdateRequest>` → `Concat` serial queue +
wait for the cache's shared stream to emit an echo (LastModified
>= the just-written value) before subscribing the next inner
observable. 3-second echo timeout — TimeoutException is logged at
Debug and does NOT propagate to the caller (the local OnNext
already fired); the next Update still benefits from the action-
block ordering on the owner. Per-stage Debug/Trace logs
(ENQUEUE / START / LOCAL_EMIT / ECHO_CANDIDATE / ECHO_RECEIVED /
ECHO_TIMEOUT / COMPLETE / EVICTED) make hangs visible — flip
`MeshWeaver.Hosting.MeshNodeStreamCache` to Trace to see them.
Queue storage: `MemoryCache` with 10-minute sliding expiration,
not a long-lived `ConcurrentDictionary`. Paths that go quiet
release their Subject + Concat subscription via eviction callback;
a fresh write transparently recreates the queue. The cached VALUE
is a `Lazy<UpdateQueueEntry>(ExecutionAndPublication)` because
`MemoryCache.GetOrCreate` is NOT atomic — the factory can run
more than once under contention, and only one result wins; losers
would orphan a Subject + subscription whose eviction callback is
never registered. Same pattern as the existing `_streams`
Lazy<Entry>.
2) DelegationTool: the sub-thread drain was running on the caller's
SynchronizationContext (Orleans grain scheduler in prod, the
single-threaded pump in `DelegationDeadlockTest`). Adding
`.SubscribeOn(TaskPoolScheduler.Default)` between
`executeAsync(...)` and `.Subscribe(...)` hops the Subscribe to
ThreadPool, so the `Observable.Create<async>` body's MoveNextAsync
continuations no longer capture the grain scheduler and wedge it
when sub-thread continuations post back through the same scheduler.
3) AgentPickerProjection.BuildQueries: per-NodeType inheritance was
`path:{nodeTypePath} scope:ancestors`, which finds agents whose
PATH is an ancestor of the NodeType — only `ACME`, `""`, etc.
TodoAgent.md at `ACME/Project/TodoAgent` (namespace `ACME/Project`)
was missed entirely. Correct semantic: agents inherit DOWN the
NAMESPACE hierarchy, so query is
`namespace:{nodeTypePath} scope:selfAndAncestors`. TodoAgent's
namespace equals the NodeType path = self match; agents at parent
namespaces (`ACME`, `""`) still inherit via the ancestor scope.
Fixes AgentChatClient_InitializeAsync_FindsTodoAgentFromNodeTypeNamespace.
4) QueryParser: `selfAndDescendants` was silently falling through to
`QueryScope.Exact` (only `selfAndAncestors`/`ancestorsAndSelf`
were aliased). Added the symmetric alias to `QueryScope.Subtree`
so the same footgun doesn't bite future callers — matches the
pattern documented in feedback_query_scope_children.md.
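To make the array-clobbering in item 1 concrete, here is a minimal RFC 7396 merge written against `System.Text.Json.Nodes`. It is an illustrative sketch, not the patch code MeshWeaver uses: `MergePatchDemo` is a hypothetical name, and the `UserMessageIds` documents are made-up sample data. Two writers compute their patch from the same baseline, and the second patch replaces the array the first one wrote.

```csharp
#nullable enable
using System;
using System.Linq;
using System.Text.Json.Nodes;

public static class MergePatchDemo
{
    // Detached copy of a node (round-trips through JSON text).
    private static JsonNode? Clone(JsonNode? node) =>
        node is null ? null : JsonNode.Parse(node.ToJsonString());

    // Minimal RFC 7396 merge: objects merge key by key, a null member deletes
    // the key, and any non-object patch (including arrays) replaces the target.
    public static JsonNode? Merge(JsonNode? target, JsonNode? patch)
    {
        if (patch is not JsonObject patchObject)
            return Clone(patch);

        var result = target is JsonObject ? (JsonObject)Clone(target)! : new JsonObject();
        foreach (var (key, value) in patchObject)
        {
            if (value is null) result.Remove(key);
            else result[key] = Merge(result[key], value);
        }
        return result;
    }

    public static string[] Run()
    {
        var baseline = JsonNode.Parse("{\"UserMessageIds\":[\"a\"]}");
        // Both patches were computed against the same pre-patch baseline.
        var patch1 = JsonNode.Parse("{\"UserMessageIds\":[\"a\",\"m1\"]}");
        var patch2 = JsonNode.Parse("{\"UserMessageIds\":[\"a\",\"m2\"]}");

        var merged = Merge(Merge(baseline, patch1), patch2);
        return merged!["UserMessageIds"]!.AsArray()
            .Select(n => n!.GetValue<string>())
            .ToArray();
    }
}
```

`Run()` ends with `["a", "m2"]`: the `m1` entry is lost, which is the same shape as the missing `UserMessageIds` entries described in item 1. Serialising the updates so each patch is computed against the previous result avoids this.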
Suite impact: AI 442/445 in ~7m (was 437/445 with 8 race failures);
Security 225/225; both stable on repeated runs. Remaining 3 AI
failures are pre-existing flakes unrelated to these races
(Submit_SingleSubmit watcher double-dispatch, NuGet feed test,
CodeNode lastExecution stamps).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
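The `Lazy<T>` wrapping mentioned in item 1 of the commit above can be illustrated with `ConcurrentDictionary.GetOrAdd`, whose value factory has the same "may run more than once under contention" property. This is an analogy under that assumption, not the `MemoryCache`-based code from the commit; `LazyCacheDemo` is a hypothetical name and `Create` stands in for building a Subject plus its subscription.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class LazyCacheDemo
{
    public static (int created, int distinct) Run()
    {
        var cache = new ConcurrentDictionary<string, Lazy<object>>();
        var created = 0;

        // Stand-in for an expensive entry that must exist exactly once.
        object Create()
        {
            Interlocked.Increment(ref created);
            Thread.Sleep(20);
            return new object();
        }

        var results = new object[16];
        Parallel.For(0, 16, i =>
        {
            // The factory below may run several times under contention, but it
            // only allocates a cheap Lazy wrapper. Every caller gets the single
            // stored wrapper back, and ExecutionAndPublication guarantees that
            // wrapper runs Create at most once.
            var lazy = cache.GetOrAdd(
                "some/path",
                _ => new Lazy<object>(Create, LazyThreadSafetyMode.ExecutionAndPublication));
            results[i] = lazy.Value;
        });

        return (Volatile.Read(ref created), results.Distinct().Count());
    }
}
```

Losing wrappers are simply discarded without ever being evaluated, so no orphaned entry is created for them.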
…ssContext
The 2026-05-22 revert made CarryAccessContext a pass-through "until we
have a leak-free design," and the docs (AsynchronousCalls.md:1120-1137 +
CqrsAndContentAccess.md:309) kept promising "AccessContext rides for
free on every framework primitive's cold observable." Those two
realities have been diverging ever since — and every Subscribe-callback
that lands on a non-caller scheduler (workspace emission thread,
TaskPool, the new per-path Concat queue inside MeshNodeStreamCache)
has been silently reading the wrong AsyncLocal.
This commit closes the gap. CarryAccessContext now:
1. Captures `AccessService.Context` by VALUE at invocation time
(NOT CircuitContext — PostPipeline picks that up itself; the wrap
deliberately doesn't synthesise identity from a Blazor session
value the caller didn't explicitly opt into).
2. Wraps the source observable so every OnNext / OnError /
OnCompleted callback is delivered inside an
AccessService.SwitchAccessContext(captured) `using` scope.
3. Disposes the scope as the callback returns — AsyncLocal is
touched ONLY for the duration of the subscriber's body, never
stamped into the surrounding logical execution context. This
closes the McpUpdate user1/user2 cross-contamination bug that
drove the 2026-05-22 revert (the previous impl called
access.SetContext(captured) without restoring, so the captured
value leaked into the caller's logical execution context
indefinitely).
Both IServiceProvider and AccessService overloads now use the same
per-callback RestoringObserver implementation; the AccessService
overload short-circuits the DI lookup when the caller already holds
a reference.
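The capture-by-value and per-callback restore behaviour can be sketched with a plain `AsyncLocal<T>` and an `IObserver<T>` wrapper. Everything here is hypothetical illustration code: `Ambient` stands in for AccessService, a string stands in for the access context, and this `RestoringObserver<T>` only mirrors the idea named in the commit, not its implementation.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical stand-in for AccessService; a string plays the access context.
public static class Ambient
{
    private static readonly AsyncLocal<string?> current = new();
    public static string? Context => current.Value;

    // Sets the ambient value and restores the previous one on Dispose.
    public static IDisposable Switch(string? value)
    {
        var previous = current.Value;
        current.Value = value;
        return new Restore(previous);
    }

    private sealed class Restore : IDisposable
    {
        private readonly string? previous;
        public Restore(string? previous) => this.previous = previous;
        public void Dispose() => current.Value = previous;
    }
}

// Delivers every callback inside a scope carrying the captured value.
public sealed class RestoringObserver<T> : IObserver<T>
{
    private readonly IObserver<T> inner;
    private readonly string? captured;

    public RestoringObserver(IObserver<T> inner, string? captured)
    {
        this.inner = inner;
        this.captured = captured;
    }

    public void OnNext(T value) { using (Ambient.Switch(captured)) inner.OnNext(value); }
    public void OnError(Exception error) { using (Ambient.Switch(captured)) inner.OnError(error); }
    public void OnCompleted() { using (Ambient.Switch(captured)) inner.OnCompleted(); }
}

public sealed class Recorder : IObserver<int>
{
    public List<string?> Seen { get; } = new();
    public void OnNext(int value) => Seen.Add(Ambient.Context);
    public void OnError(Exception error) { }
    public void OnCompleted() { }
}

public static class CarryDemo
{
    public static (string? during, string? after) Run()
    {
        var recorder = new Recorder();
        IObserver<int> wrapped;

        using (Ambient.Switch("user1"))
        {
            // Capture by value at wrap time.
            wrapped = new RestoringObserver<int>(recorder, Ambient.Context);
        }

        // The emission happens later, under a different ambient identity.
        using (Ambient.Switch("user2"))
        {
            wrapped.OnNext(1);
            return (recorder.Seen[0], Ambient.Context);
        }
    }
}
```

In this sketch the subscriber body observes `user1`, and once the callback returns the emitting side is back to `user2`, which is the no-leak property the rewritten test asserts.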
Tests:
- test/MeshWeaver.Messaging.Hub.Test/AccessContextSurvivesSubscribeTest.cs
Rewrites the old "PassThrough_Does_Not_Restore" test into
"Captured_Context_Restored_Per_Wrap_Even_After_AmbientCleared" —
asserts the new per-callback restore AND the no-leak contract
(after all callbacks return, the test thread's AsyncLocal must
be back to what it was before any emission).
- test/MeshWeaver.Security.Test/MeshNodeCacheIdentityTest.cs
Adds two new canaries for the cross-cutting boundary:
* CacheUpdate_Concat_PreservesCallerIdentity — the per-path
Concat queue added in 1787345 was the most acute gap; the
Subject → Concat → Subscribe chain runs the inner observable
on a ThreadPool thread, so without the wrap the OnNext
callback observes null/sync identity, never the caller.
* CacheUpdate_AfterCallerScopeDisposed_StillCarriesCapturedIdentity —
pins the capture-by-value semantic (Subscribing after the
caller's using-scope is disposed must still observe the
captured identity, NOT whatever ambient ended up on
AsyncLocal post-dispose).
Verification:
- All 6 AccessContextSurvivesSubscribeTest tests green (5 unchanged,
1 renamed + rewritten).
- All 227 Security.Test green locally (incl. the 2 new cache canaries).
- AI test suite 445/445 green at 8m14s — previously failing CI
canaries (MeshPluginTest.FullCrudWorkflow, ThreadStreamingIdentityTest.SubmitMessage_*,
LinkedInTelemetryImport, SubThreadHangRepro x2, LayoutAreaIdentityTest.AuthorizedUser_*)
all pass under this wrap.
Audit deliverables (referenced by C:\Users\RolandBuergi\.claude\plans\swift-tinkering-melody.md):
C:/tmp/claude/identity-audit/identity-boundary-audit.md
C:/tmp/claude/identity-audit/asynccalls-vs-impl.md
C:/tmp/claude/identity-audit/identity-test-coverage.md
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… Exec/Compile watcher identity
The recurring silent-overwrite bug behind AppendUserInput / CheckInbox /
ThreadStreamingIdentity flakes traces to the same shape:
workspace.GetMeshNodeStream().Update(node =>
{
var t = node.Content as MeshThread ?? new MeshThread(); // ← silent overwrite
return node with { Content = t with { ... } };
});
When `Content` arrives as a raw `JsonElement` (file-system / Postgres /
Cosmos all round-trip through JSON serialisation; only InMemory keeps the
typed instance), the `as MeshThread` cast returns null and the
`?? new MeshThread()` fallback overwrites every other field on the node
with defaults (Status=Idle, pending={}, etc.). The next stream.Update then
persists that default-valued thread — silent data corruption.
Fix: every emission and Update lambda passing through
`MeshNodeStreamHandle` is now round-tripped through the workspace's
`JsonSerializerOptions` at the boundary. Two pieces:
* Subscribe path: a `TypedContentObserver` between the underlying sync
stream and the subscriber deserialises any `JsonElement` Content to
its registered domain type via the workspace's polymorphic
`$type` discriminator. No-op for already-typed Content.
* Update path: the caller's lambda is wrapped so the input is typed
(deserialised if needed) before `update(node)` is called. The post-
update emission also goes through the typed converter so callers
chaining `.Select(node => node.Content as MyType)` get the same
typed shape as Subscribe. (No outbound serialisation: the downstream
cold pipeline runs `SerializeToNode` itself for cross-hub patches,
and OWN-path equality dedup in the data source breaks when we force
a serialise-deserialise round-trip on every write.)
Eliminates the `?? new TFoo()` antipattern across every callsite: when
Content is genuinely absent or wrong-shaped the cast fails cleanly and
the lambda can return `node` unchanged, no silent overwrite.
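A small self-contained sketch of the failure mode and the boundary fix, using only `System.Text.Json`. The types `DemoThread` and `DemoNode` and the helper `EnsureTyped` are hypothetical simplifications for illustration; the real helper is `MeshNodeStreamHandle.EnsureTypedContent`, whose implementation is not shown in this commit message.

```csharp
#nullable enable
using System;
using System.Text.Json;

// Hypothetical, simplified stand-ins for MeshThread and MeshNode.
public sealed record DemoThread(string Status, int Pending);
public sealed record DemoNode(string Path, object? Content);

public static class TypedContentDemo
{
    // Boundary step: turn JsonElement content into the domain type;
    // already-typed content passes through unchanged.
    public static DemoNode EnsureTyped(DemoNode node, JsonSerializerOptions options) =>
        node.Content is JsonElement json
            ? node with { Content = json.Deserialize<DemoThread>(options) }
            : node;

    // Antipattern: a failed cast silently becomes a default-valued thread.
    public static DemoNode UnsafeUpdate(DemoNode node)
    {
        var t = node.Content as DemoThread ?? new DemoThread("Idle", 0);
        return node with { Content = t with { Pending = t.Pending + 1 } };
    }

    // Safer shape: return the node unchanged when the cast fails.
    public static DemoNode SafeUpdate(DemoNode node) =>
        node.Content is DemoThread t
            ? node with { Content = t with { Pending = t.Pending + 1 } }
            : node;

    public static (DemoThread clobbered, DemoThread preserved) Run()
    {
        var options = new JsonSerializerOptions();
        // Content as it comes back from a JSON-backed store.
        var stored = JsonSerializer.SerializeToElement(new DemoThread("Executing", 3), options);
        var raw = new DemoNode("demo/thread", stored);

        var clobbered = (DemoThread)UnsafeUpdate(raw).Content!;
        var preserved = (DemoThread)SafeUpdate(EnsureTyped(raw, options)).Content!;
        return (clobbered, preserved);
    }
}
```

Here `clobbered` comes out as `DemoThread("Idle", 1)` because the stored status and pending count were discarded, while `preserved` is `DemoThread("Executing", 4)`.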
Two helpers exposed for reuse by other primitives needing the same shape
guarantee: `MeshNodeStreamHandle.EnsureTypedContent(node, options)` and
`MeshNodeStreamHandle.EnsureSerialisedContent(node, options)`.
Watchers — applying the AccessContext propagation rule:
* `ThreadExecution.InstallExecRoundWatcher` — DispatchAfterClaim
creates satellite cells and posts cross-hub messages, all of which
must be attributed to the thread owner (not the cache hub's emission
identity). Wraps in `using AccessContextScope.FromNode(node, ...)`
so every downstream write rides under thread.CreatedBy. The access
check that gates the dispatch already happened (user without thread
access can't flip Status to StartingExecution).
* `NodeTypeCompilationHelpers.InstallCompileWatcher` — compile runs
under SYSTEM identity, by design. Wraps in
`using AccessContextScope.AsSystem(accessService)` so the
DispatchCompileTrigger post lands at the handler with
delivery.AccessContext = system-security; every internal write
inside the activity (read source files across all users, write the
activity log, emit the assembly) then bypasses RLS. The access
check is upstream — the user has to be permitted to flip
RequestedReleaseAt on the NodeType's MeshNode.
* `ThreadSubmission.InstallServerWatcher` — claim flip is an OWN
update, no cross-hub, no RLS gate inside the action block.
No scope needed; comment added to clarify the rule.
New helper: `MeshWeaver.Mesh.Security.AccessContextScope` (Mesh.Contract)
with `FromNode(node, access)` and `AsSystem(access)` factories — the
two operation classes the codebase needs.
Docs updated:
* CqrsAndContentAccess.md — new section "Content is always typed at
the GetMeshNodeStream boundary" with the bad/good comparison.
* AsynchronousCalls.md — same rule cross-referenced from the cold-
write contract section.
Verification:
* AI suite: 444/445 (was 9 failures pre-fix). Remaining 1 is
CheckInbox_MultiplePending — a pre-existing rapid-OWN-update race
where 3 concurrent AppendUserInput calls collide on the data
source's action block. Not addressed in this commit (separate
concurrent-write design).
* Identity-canary tests still green: CacheUpdate_Concat_PreservesCallerIdentity
+ CacheUpdate_AfterCallerScopeDisposed_StillCarriesCapturedIdentity
+ the 6 AccessContextSurvivesSubscribeTest tests.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
77 commits of long-running work on `bug_fix`, grouped by theme:
- `MeshWeaver.Social` + LinkedIn publisher + scheduled publishing pipeline (engine/queue/stats), LinkedIn OAuth connect + past-post ingest in Memex portal, per-user linked-account menu items.
- `#r "nuget:Pkg, Version"` at the top of `_Source/*.cs` resolves via public NuGet.Protocol without an SDK on the container. Same resolver serves interactive markdown code cells.
- `FileSystemPersistenceService.MoveNodeAsync` runs per-descendant `WriteAsync`/`DeleteAsync` through `Task.WhenAll`; new `MeshOperationOptions` (default `Timeout = 30s`) + `WithMeshOperationTimeout(TimeSpan)` override; `HandleMoveNodeRequest` chains `.Timeout()` on the persistence Observable so a stuck adapter can't hang the caller. Prod repro: DAV2026 subtree move that took 240 s and killed the MCP session (now bounded).
- `CompilationCacheService`, `_Source/` edit re-invalidates owning NodeType, cross-silo broadcast via `MeshChangeFeed`, grain-dispose on node delete, live "Compiling … (Ns)" progress in `LayoutAreaView`.
- `Category` (falls back to `NodeType`), reactive Children catalog, self-as-default create location for non-NodeType nodes, sample orgs → `Markdown` for search visibility.
- `MeshChangeFeed` events, resubscribe on owner dispose, `DeleteLayoutArea` emits a placeholder immediately and times out slow streams.
- `IAsyncEnumerable` aggregator fixes (satellite-safe `GatherInputsAsync`), xunit methodTimeout 30 s → 60 s, Anthropic Opus bump, icon generator, etc.
New test suites (selected)
- `test/MeshWeaver.Persistence.Test/MoveNodeRecursiveTest.cs` (10 tests): recursion, parallelism, source missing / target exists / storage throws / cancellation (all must not hang), Rx `Timeout()` contract, default-30s config.
- `test/MeshWeaver.Social.Test/*`: `InMemoryPublishQueueTest`, `LinkedInPublisherEngagementTest`, `PostStatsRefresherTest`, `ScheduledPostPublisherTest`, `FakePublisher`.
- `test/MeshWeaver.Persistence.Test/WorkspaceCacheEvictionTest.cs`, `ResubscribeOnOwnerDisposeTest.cs`, `DeleteLayoutAreaIntegrationTest.cs`.
- `test/MeshWeaver.Markdown.Test/PathUtilsTest.cs`, `test/MeshWeaver.MathDemo.Test/MatrixViewsTest.cs`.
Contributors
dist/cleanup, fix: sample orgs invisible in search due to wrong NodeType #94 sample-org search-visibility fix
Upstream already merged into this branch
refactor: reactive persistence - IMeshStorage writes return IObservable (merged)
Test plan
- `dotnet build` succeeds
- `dotnet test test/MeshWeaver.Persistence.Test --filter MoveNodeRecursiveTest`: 10/10 green (~8 s)
- `dotnet test test/MeshWeaver.Hosting.Monolith.Test --filter MoveNodeAsync`: 5/5 green (regression guard)
- `dotnet test test/MeshWeaver.Social.Test`: publish queue / scheduling / stats green
- `_Source/*.cs` using `#r "nuget:MathNet.Numerics, 5.0.0"`: compiles & renders (cold + warm cache)
- `/social/connect/linkedin` → profile linked; menu shows connected account
- `ScheduledPostPublisher` → LinkedIn publisher posts; `PostStatsRefresher` pulls stats
🤖 Generated with Claude Code