feat(python-notebook-migration): add database tables for Notebook Migration tool #5055

Open
zyratlo wants to merge 10 commits into apache:main from zyratlo:migration-tool-database-tables

Contributor

@zyratlo zyratlo commented May 13, 2026

What changes were proposed in this PR?

This PR adds two new tables to the database: notebook and workflow_notebook_mapping. The new Python Notebook Migration tool will use them to store the user-uploaded notebook and the generated mapping between the notebook and the workflow.

Any related issues, documentation, discussions?

Closes #5054
The parent issue is #4301

Schema

[Image: schema]

How was this PR tested?

I manually confirmed that the new tables are created successfully and are usable.

Was this PR authored or co-authored using generative AI tooling?

No

github-actions bot added the ddl-change label (Changes to the TexeraDB DDL) May 13, 2026

codecov-commenter commented May 13, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 48.90%. Comparing base (d8c254c) to head (89015d3).

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #5055      +/-   ##
============================================
- Coverage     48.95%   48.90%   -0.06%     
+ Complexity     2377     2372       -5     
============================================
  Files          1048     1046       -2     
  Lines         40270    40185      -85     
  Branches       4272     4261      -11     
============================================
- Hits          19714    19652      -62     
+ Misses        19402    19380      -22     
+ Partials       1154     1153       -1     
Flag                              Coverage Δ               *Carryforward flag
access-control-service            39.53% <ø> (ø)
agent-service                     33.76% <ø> (ø)           Carriedforward from f967080
amber                             51.52% <ø> (-0.05%) ⬇️
computing-unit-managing-service    0.00% <ø> (ø)
config-service                     0.00% <ø> (ø)
file-service                      37.99% <ø> (ø)
frontend                          40.53% <ø> (-0.12%) ⬇️    Carriedforward from f967080
python                            90.79% <ø> (ø)           Carriedforward from f967080
workflow-compiling-service        56.81% <ø> (ø)

*This pull request uses carry forward flags.

mengw15 self-requested a review May 13, 2026 20:47

Contributor

mengw15 commented May 18, 2026

Please check PR #4401 for database changes.

Contributor Author

zyratlo commented May 18, 2026

> Please check PR #4401 for database changes.

I have addressed this issue.

Contributor

@mengw15 mengw15 left a comment

Left some comments

Comment thread sql/updates/23.sql Outdated
Comment thread sql/texera_ddl.sql
Comment on lines +441 to +447
CREATE TABLE IF NOT EXISTS notebook
(
    nid      SERIAL NOT NULL PRIMARY KEY,
    wid      INT    NOT NULL,
    notebook JSONB  NOT NULL,
    FOREIGN KEY (wid) REFERENCES workflow(wid) ON DELETE CASCADE
);
Contributor

Cardinality question: nothing in the schema prevents inserting two notebook rows for the same wid, but the parent issue (#4301, demo #5 "when the user reopens a workflow that was generated from a notebook, it will also reopen the notebook") reads like a 1:1 relationship. If a workflow is supposed to have at most one notebook, would a UNIQUE (wid) on notebook (or making wid the PK in place of nid) be safer than relying on application code to enforce it?
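
One way to express the two options raised here, as a minimal sketch rather than the PR's actual DDL. It assumes the column definitions quoted above stay unchanged. Option B additionally assumes that nothing else (for example workflow_notebook_mapping, whose definition is not shown in this thread) needs to reference nid.

```sql
-- Option A: keep nid as the surrogate key and enforce
-- at most one notebook row per workflow.
CREATE TABLE IF NOT EXISTS notebook
(
    nid      SERIAL NOT NULL PRIMARY KEY,
    wid      INT    NOT NULL UNIQUE,  -- a second row with the same wid is rejected
    notebook JSONB  NOT NULL,
    FOREIGN KEY (wid) REFERENCES workflow(wid) ON DELETE CASCADE
);

-- Option B (an alternative to Option A, not meant to run alongside it):
-- make wid itself the primary key and drop nid.
-- CREATE TABLE IF NOT EXISTS notebook
-- (
--     wid      INT   NOT NULL PRIMARY KEY,
--     notebook JSONB NOT NULL,
--     FOREIGN KEY (wid) REFERENCES workflow(wid) ON DELETE CASCADE
-- );
```

With either option, Postgres rejects a second INSERT for the same wid with a unique-violation error, so the 1:1 rule no longer depends on application code.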

Contributor Author

@zyratlo zyratlo May 26, 2026

In the current prototype implementation, you are correct that a workflow will only ever have one notebook (a 1:1 relationship). However, we designed the schema this way to allow future work that makes the notebook editable and savable, in which case multiple notebooks (or versions of the same notebook) could be linked to the same workflow. This is why we didn't make wid UNIQUE here.

Alternatively, we can make wid UNIQUE in this PR and, once that future work is done, change the schema to allow multiple notebooks per workflow. What do you think?
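
The "UNIQUE now, relax later" option could look roughly like the following sketch. The constraint name notebook_wid_unique is hypothetical, and the statements are written as standalone ALTER TABLE commands purely for illustration; in this PR the constraint would presumably be declared in the CREATE TABLE statement instead.

```sql
-- Now: enforce the 1:1 relationship with a named constraint.
-- The name is hypothetical; naming it makes it easy to drop later.
ALTER TABLE notebook
    ADD CONSTRAINT notebook_wid_unique UNIQUE (wid);

-- Later, once a workflow may have several notebooks or notebook versions:
ALTER TABLE notebook
    DROP CONSTRAINT notebook_wid_unique;
-- Note: dropping the constraint also drops the index that backed it,
-- so lookups by wid would then need an explicit index.
```

Relaxing a UNIQUE constraint later is a small migration, whereas adding one later would first require cleaning up any duplicate rows that had accumulated.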

Comment thread sql/texera_ddl.sql
Comment on lines +441 to +447
CREATE TABLE IF NOT EXISTS notebook
(
    nid      SERIAL NOT NULL PRIMARY KEY,
    wid      INT    NOT NULL,
    notebook JSONB  NOT NULL,
    FOREIGN KEY (wid) REFERENCES workflow(wid) ON DELETE CASCADE
);
Contributor

The main read pattern for notebook seems to be "given a workflow, find its notebook" (e.g., reopening a workflow → load its notebook). Postgres doesn't auto-create an index on FK columns, so this lookup would currently sequential-scan the table. Worth a CREATE INDEX ON notebook(wid)? (If a UNIQUE(wid) is added per the previous comment, that already creates an index and this is moot.)
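
A minimal sketch of the suggested index, assuming the notebook table as quoted above. The index name idx_notebook_wid and the literal 42 are placeholders, not names taken from the PR.

```sql
-- Plain (non-unique) index to support "given a workflow, find its notebook".
CREATE INDEX IF NOT EXISTS idx_notebook_wid ON notebook (wid);

-- Inspect the plan for the lookup this index is meant to serve.
EXPLAIN
SELECT n.notebook
FROM notebook AS n
WHERE n.wid = 42;
```

On a very small table the planner may still prefer a sequential scan; the index matters once the table grows.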

Contributor Author

I agree that indexing wid would help here. Since this is tied to the discussion on whether to make wid UNIQUE, I will wait for our decision on that before making changes. If we decide to make wid UNIQUE, then no further work needs to be done here. If we keep wid non-UNIQUE, I will add the CREATE INDEX.

Labels

ddl-change Changes to the TexeraDB DDL

Development

Successfully merging this pull request may close these issues.

[Notebook Migration] 1. Add Database Tables

3 participants