feat(python-notebook-migration): add database tables for Notebook Migration tool#5055
feat(python-notebook-migration): add database tables for Notebook Migration tool#5055zyratlo wants to merge 10 commits into
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #5055 +/- ##
============================================
- Coverage 48.95% 48.90% -0.06%
+ Complexity 2377 2372 -5
============================================
Files 1048 1046 -2
Lines 40270 40185 -85
Branches 4272 4261 -11
============================================
- Hits 19714 19652 -62
+ Misses 19402 19380 -22
+ Partials 1154 1153 -1
*This pull request uses carry forward flags. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
Please check this pr: #4401, for database changes |
I have addressed this issue |
| CREATE TABLE IF NOT EXISTS notebook | ||
| ( | ||
| nid SERIAL NOT NULL PRIMARY KEY, | ||
| wid INT NOT NULL, | ||
| notebook JSONB NOT NULL, | ||
| FOREIGN KEY (wid) REFERENCES workflow(wid) ON DELETE CASCADE | ||
| ); |
There was a problem hiding this comment.
Cardinality question: nothing in the schema prevents inserting two notebook rows for the same wid, but the parent issue (#4301, demo #5 "when the user reopens a workflow that was generated from a notebook, it will also reopen the notebook") reads like a 1:1 relationship. If a workflow is supposed to have at most one notebook, would a UNIQUE (wid) on notebook (or making wid the PK in place of nid) be safer than relying on application code to enforce it?
There was a problem hiding this comment.
In the current prototype implementation, you are correct in that a workflow will only ever have one notebook (a 1:1 relationship). However, the reason the schema is designed this way is because we wanted to allow future work to make the notebook editable and savable, which would create the situation where multiple notebooks (or versions of the same notebook) are linked to the same workflow. This is why we didn't make wid UNIQUE here.
Alternatively, another option is that we can make wid UNIQUE in this PR, and when the aforementioned future work is done then we can make the schema change to allow multiple notebooks for a workflow. What do you think?
| CREATE TABLE IF NOT EXISTS notebook | ||
| ( | ||
| nid SERIAL NOT NULL PRIMARY KEY, | ||
| wid INT NOT NULL, | ||
| notebook JSONB NOT NULL, | ||
| FOREIGN KEY (wid) REFERENCES workflow(wid) ON DELETE CASCADE | ||
| ); |
There was a problem hiding this comment.
The main read pattern for notebook seems to be "given a workflow, find its notebook" (e.g., reopening a workflow → load its notebook). Postgres doesn't auto-create an index on FK columns, so this lookup would currently sequential-scan the table. Worth a CREATE INDEX ON notebook(wid)? (If a UNIQUE(wid) is added per the previous comment, that already creates an index and this is moot.)
There was a problem hiding this comment.
I agree that indexing wid would help here. Since this is tied to the discussion on whether to make wid UNIQUE, I will wait for our decision on that before making changes. If we decide to make wid UNIQUE, then no further work needs to be done here. If we keep wid non-UNIQUE, I will add the CREATE INDEX.
What changes were proposed in this PR?
This PR adds two new tables to the database,
notebookandworkflow_notebook_mapping. These tables will be used in the new Python Notebook Migration tool to store the user-uploaded notebook and the generated mapping between the notebook and workflow.Any related issues, documentation, discussions?
Closes #5054
The parent issue is #4301
Schema
How was this PR tested?
New tables were manually confirmed to be created successfully and usable.
Was this PR authored or co-authored using generative AI tooling?
No