Conversation
Expose the previously-hardcoded request concurrency cap as a flag on the `databricks sync` command. Defaults to the existing limit (20) so behavior is unchanged when the flag is omitted; values < 1 are rejected. Co-authored-by: Isaac
Approval status: pending
|
pietern
left a comment
There was a problem hiding this comment.
Thanks for the PR!
Small suggestion on naming the flag and vars.
| cmd.Flags().StringVar(&f.excludeFrom, "exclude-from", "", "file containing patterns to exclude from sync (one pattern per line)") | ||
| cmd.Flags().StringVar(&f.includeFrom, "include-from", "", "file containing patterns to include to sync (one pattern per line)") | ||
| cmd.Flags().BoolVar(&f.dryRun, "dry-run", false, "simulate sync execution without making actual changes") | ||
| cmd.Flags().IntVar(&f.maxConcurrentRequests, "max-concurrent-requests", sync.MaxRequestsInFlight, "maximum number of concurrent requests to the workspace") |
There was a problem hiding this comment.
There is precedent for configuring concurrency in fs cp in #4132.
Please mirror the flag name, handling, etc, for consistency.
|
|
||
| DryRun bool | ||
|
|
||
| MaxConcurrentRequests int |
Match the name of `fs cp --concurrency` (#4132) for consistency, and expose the same flag on `databricks bundle sync` so both sync entry points are tunable. Co-authored-by: Isaac
|
@igrekun I applied the suggestions to the PR, please take a look and confirm this matches what you need. Also includes the same flag to |
Co-authored-by: Isaac
|
My concern is the validation order for Right now go run . --profile __missing_profile_for_review__ sync --concurrency 0 /tmp /Workspace/tmpThis returns the missing-profile error instead of I think we should move |
Changes
Expose the previously-hardcoded request concurrency cap as a
--concurrencyflag ondatabricks syncanddatabricks bundle sync. Defaults to the existing limit (20) so behavior is unchanged when the flag is omitted; values < 1 are rejected.The flag name matches
databricks fs cp --concurrency(#4132).Why
When uploading multiple files ~5mb size, high concurrency can trigger stream timeouts.
Tests