feat(python-notebook-migration): add database tables for Notebook Migration tool by zyratlo · Pull Request #5055 · apache/texera

zyratlo · 2026-05-13T19:15:59Z

What changes were proposed in this PR?

This PR adds two new tables to the database: notebook and workflow_notebook_mapping. The new Python Notebook Migration tool will use them to store the user-uploaded notebook and the generated mapping between the notebook and the workflow.

Any related issues, documentation, discussions?

Closes #5054
The parent issue is #4301

Schema

How was this PR tested?

The new tables were manually confirmed to be created successfully and to be usable.

Was this PR authored or co-authored using generative AI tooling?

No

codecov-commenter · 2026-05-13T20:21:01Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 48.90%. Comparing base (d8c254c) to head (89015d3).

Additional details and impacted files

@@             Coverage Diff              @@
##               main    #5055      +/-   ##
============================================
- Coverage     48.95%   48.90%   -0.06%     
+ Complexity     2377     2372       -5     
============================================
  Files          1048     1046       -2     
  Lines         40270    40185      -85     
  Branches       4272     4261      -11     
============================================
- Hits          19714    19652      -62     
+ Misses        19402    19380      -22     
+ Partials       1154     1153       -1

Flag	Coverage Δ		*Carryforward flag
access-control-service	`39.53% <ø> (ø)`
agent-service	`33.76% <ø> (ø)`		Carriedforward from f967080
amber	`51.52% <ø> (-0.05%)`	⬇️
computing-unit-managing-service	`0.00% <ø> (ø)`
config-service	`0.00% <ø> (ø)`
file-service	`37.99% <ø> (ø)`
frontend	`40.53% <ø> (-0.12%)`	⬇️	Carriedforward from f967080
python	`90.79% <ø> (ø)`		Carriedforward from f967080
workflow-compiling-service	`56.81% <ø> (ø)`

*This pull request uses carry forward flags.


mengw15 · 2026-05-18T04:38:42Z

Please check this PR: #4401, for database changes


zyratlo · 2026-05-18T21:28:01Z

Please check this PR: #4401, for database changes

I have addressed this issue.

mengw15

Left some comments

mengw15 · 2026-05-25T08:46:00Z

+CREATE TABLE IF NOT EXISTS notebook
+(
+    nid         SERIAL  NOT NULL PRIMARY KEY,
+    wid         INT     NOT NULL,
+    notebook    JSONB   NOT NULL,
+    FOREIGN KEY (wid) REFERENCES workflow(wid) ON DELETE CASCADE
+);


Cardinality question: nothing in the schema prevents inserting two notebook rows for the same wid, but the parent issue (#4301, demo #5 "when the user reopens a workflow that was generated from a notebook, it will also reopen the notebook") reads like a 1:1 relationship. If a workflow is supposed to have at most one notebook, would a UNIQUE (wid) on notebook (or making wid the PK in place of nid) be safer than relying on application code to enforce it?

zyratlo · 2026-05-26

You are correct that in the current prototype a workflow will only ever have one notebook (a 1:1 relationship). However, the schema is designed this way to leave room for future work that makes the notebook editable and savable, which would result in multiple notebooks (or multiple versions of the same notebook) being linked to the same workflow. This is why wid is not UNIQUE here.

Alternatively, we could make wid UNIQUE in this PR and then change the schema to allow multiple notebooks per workflow once that future work is done. What do you think?
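For comparison, a minimal PostgreSQL sketch of the "UNIQUE now, relax later" option, assuming the notebook table exactly as defined in this PR's diff. The constraint name notebook_wid_key is an assumption (it matches the name PostgreSQL would auto-generate for an inline UNIQUE (wid), but any name works), and the follow-up statements stand for a hypothetical future update script, not anything in this PR.

```sql
-- Option A (sketch): enforce at most one notebook per workflow now.
-- The UNIQUE constraint is backed by a unique B-tree index on wid,
-- so "find the notebook for workflow X" does not need a sequential scan.
CREATE TABLE IF NOT EXISTS notebook
(
    nid         SERIAL  NOT NULL PRIMARY KEY,
    wid         INT     NOT NULL,
    notebook    JSONB   NOT NULL,
    FOREIGN KEY (wid) REFERENCES workflow(wid) ON DELETE CASCADE,
    CONSTRAINT notebook_wid_key UNIQUE (wid)  -- assumed constraint name
);

-- Hypothetical later update script, once editable/versioned notebooks exist:
-- drop the 1:1 guarantee but keep a plain (non-unique) index on wid.
ALTER TABLE notebook DROP CONSTRAINT IF EXISTS notebook_wid_key;
CREATE INDEX IF NOT EXISTS idx_notebook_wid ON notebook (wid);
```

One caveat with this route: CREATE TABLE IF NOT EXISTS is a no-op when the table already exists, so adding the constraint inside the CREATE TABLE only takes effect in deployments where the table has not been created yet; an existing table would need an ALTER TABLE ... ADD CONSTRAINT instead.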

mengw15 · 2026-05-25T08:46:00Z

+CREATE TABLE IF NOT EXISTS notebook
+(
+    nid         SERIAL  NOT NULL PRIMARY KEY,
+    wid         INT     NOT NULL,
+    notebook    JSONB   NOT NULL,
+    FOREIGN KEY (wid) REFERENCES workflow(wid) ON DELETE CASCADE
+);


The main read pattern for notebook seems to be "given a workflow, find its notebook" (e.g., reopening a workflow → load its notebook). Postgres doesn't auto-create an index on FK columns, so this lookup would currently sequential-scan the table. Worth a CREATE INDEX ON notebook(wid)? (If a UNIQUE(wid) is added per the previous comment, that already creates an index and this is moot.)

zyratlo · 2026-05-26

I agree that indexing wid would help here. Since this is tied to the discussion on whether to make wid UNIQUE, I will wait for that decision before making changes. If we make wid UNIQUE, no further work is needed here; if we keep wid non-UNIQUE, I will add the CREATE INDEX.
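A minimal sketch of the non-UNIQUE alternative, assuming the notebook table from this PR's diff. The index name idx_notebook_wid and the workflow id 42 are placeholders, and treating the highest nid as the latest version is a hypothetical convention for the future editable-notebook work, not something this PR defines.

```sql
-- Option B (sketch): allow several notebooks per workflow, but index the FK column.
-- PostgreSQL does not automatically index referencing (foreign key) columns.
CREATE INDEX IF NOT EXISTS idx_notebook_wid ON notebook (wid);

-- Typical read path: reopening a workflow loads its notebook.
-- With the index, the planner can use an index scan instead of a sequential scan.
SELECT nid, notebook
FROM notebook
WHERE wid = 42          -- placeholder workflow id
ORDER BY nid DESC       -- hypothetical: treat the newest row as the current version
LIMIT 1;
```

If versioning does land later, a composite index on (wid, nid) would serve both the wid filter and the ORDER BY nid DESC in one index scan; whether that is worth it can be decided in that follow-up.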


added notebook and workflow_notebook_mapping tables

e60310e

mengw15 assigned zyratlo May 13, 2026

github-actions bot added the ddl-change label (Changes to the TexeraDB DDL) May 13, 2026

mengw15 self-requested a review May 13, 2026 20:47

Merge branch 'main' into migration-tool-database-tables

c8fa486

zyratlo and others added 2 commits May 18, 2026 14:13

Merge branch 'main' into migration-tool-database-tables

986bd38

removed "\c texera_db" in sql file and added changeSet element in accordance with PR apache#4401

9c63915

zyratlo and others added 3 commits May 18, 2026 14:50

added newline and changed changeSet id to match update SQL file

6deb604

Merge branch 'main' into migration-tool-database-tables

4387ad0

Merge branch 'main' into migration-tool-database-tables

914f6b5

mengw15 reviewed May 25, 2026


zyratlo and others added 3 commits May 26, 2026 09:59

Merge branch 'main' into migration-tool-database-tables

f4e019a

remove DROP TABLE statements from sql/updates/23.sql and switch to CREATE TABLE IF NOT EXISTS

bc8381c

Merge branch 'main' into migration-tool-database-tables

89015d3

zyratlo mentioned this pull request May 28, 2026

feat(python-notebook-migration): add notebook-migration-service microservice in backend #5258

Open

zyratlo wants to merge 10 commits into apache:main from zyratlo:migration-tool-database-tables
