Add Schur complement and its mat-free mode by zitongzhan · Pull Request #35 · pypose/bae

zitongzhan · 2026-05-24T03:07:51Z

This pull request introduces significant improvements to the optimizer infrastructure, focusing on enhanced memory profiling, a new Schur complement optimizer, and better support for matrix-free operations.

Optimizer Enhancements

Added a new Schur optimizer class in bae.optim.optimizer, implementing the Schur complement method with support for both standard and matrix-free normal equations, block Jacobi preconditioning, and efficient memory usage.
Updated the LM optimizer to support a matrix_free_normal mode, allowing for more efficient computation and memory usage in large-scale problems.
Added a custom TrustRegion class that supports Warp, intended mainly for use with the Schur optimizer.
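For readers unfamiliar with the matrix-free mode, here is a minimal sketch of the idea, not the code in bae.optim.optimizer. It assumes the usual bundle adjustment block notation (U for the camera block, V for the point block, W for the camera-point coupling, matching the `W` and `V_i` names used later in this thread) and applies S x = U x - W V^{-1} W^T x without ever forming S. Plain Python lists stand in for BSR tensors, V is taken as diagonal for brevity, and all helper names are hypothetical.

```python
def matvec(M, x):
    """Dense matrix-vector product on nested lists."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def transpose(M):
    """Transpose a nested-list matrix."""
    return [list(col) for col in zip(*M)]


# Toy blocks (assumed values, for illustration only).
U = [[4.0, 1.0], [1.0, 3.0]]            # camera-camera block (2x2)
W = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]  # camera-point coupling (2x3)
V_diag = [2.0, 4.0, 2.0]                # point-point block, diagonal here


def schur_matvec(x):
    """Matrix-free product S x = U x - W V^{-1} W^T x."""
    t = matvec(transpose(W), x)                 # W^T x
    t = [ti / vi for ti, vi in zip(t, V_diag)]  # V^{-1} (W^T x)
    t = matvec(W, t)                            # W V^{-1} W^T x
    Ux = matvec(U, x)
    return [u - s for u, s in zip(Ux, t)]


# Explicit Schur complement, built only to check the matrix-free result.
S = [[U[i][j] - sum(W[i][k] * W[j][k] / V_diag[k] for k in range(3))
      for j in range(2)] for i in range(2)]

x = [1.0, 2.0]
print(schur_matvec(x))  # [1.5, 4.5]
print(matvec(S, x))     # [1.5, 4.5]
```

The explicit path pays for forming S once and then for cheaper products; the matrix-free path skips the construction but runs three sparse products per CG iteration, which is the trade-off the profile later in this thread measures.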

Sparse Matrix and PyOps Improvements

Improved sparse matrix operations, including fixes to inv_op for correct tensor creation and a new test block in py_ops.py for diagonal operations on CUDA.

Added a section for future plans including a new backend for distributed solver.

gemini-code-assist

Code Review

This pull request introduces high-performance Triton kernels for sparse BSR operations, including matrix-vector multiplication, matrix-matrix multiplication, and transposition. It also implements a matrix-free NormalMatVec operator and a new Schur complement-based optimizer to improve the efficiency of bundle adjustment tasks. The bundle adjustment example was updated with CUDA memory snapshotting and Warp mempool reporting. Review feedback highlights a critical issue where in-place diagonal modifications in the LM and Schur optimizers cause damping factors to accumulate incorrectly during step rejections. Additionally, the reviewer recommends removing performance-hindering torch.cuda.empty_cache() calls, addressing potential divisions by zero in the Conjugate Gradient solver, and cleaning up redundant or commented-out code.

gemini-code-assist · 2026-05-24T03:16:21Z

+                    diag_scale *= 1.0 + pg['damping']
+                    A.set_damping(diag_scale - 1.0)
+                else:
+                    diagonal_op_(A, op=partial(torch.mul, other=1+pg['damping']))


The diagonal_op_ function performs an in-place multiplication on the matrix A. Since this is inside the while loop, if a step is rejected and the loop repeats, the damping will be applied cumulatively (e.g., $(1+\lambda_1)(1+\lambda_2)...$) instead of being applied to the original $J^T J$ diagonal. This deviates from the standard Levenberg-Marquardt algorithm and can lead to excessively aggressive damping. Consider cloning the matrix or resetting the diagonal before applying damping in each iteration.

Resolve in future PR
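The accumulation described in this thread can be reproduced with a small sketch. It is an illustration under stated assumptions, not the optimizer's code: a plain list stands in for the diagonal of $J^T J$, the two helpers are hypothetical, and the damping values are made up to represent two consecutive rejected steps.

```python
def damp_in_place(diag, lam):
    """Mimic an in-place diagonal multiply: diag[i] *= (1 + lam)."""
    for i in range(len(diag)):
        diag[i] *= 1.0 + lam


def damp_from_original(diag0, lam):
    """Return a damped copy computed from the undamped diagonal."""
    return [d * (1.0 + lam) for d in diag0]


diag0 = [2.0, 4.0]   # stand-in for diag(J^T J)
lams = [0.5, 1.0]    # damping values over two rejected steps (assumed)

# In-place path: each retry multiplies the already-damped diagonal.
cumulative = list(diag0)
for lam in lams:
    damp_in_place(cumulative, lam)

# Reset path: the latest damping is applied to the original diagonal.
reset = damp_from_original(diag0, lams[-1])

print(cumulative)  # [6.0, 12.0]  = diag0 * 1.5 * 2.0
print(reset)       # [4.0, 8.0]   = diag0 * 2.0
```

Keeping a copy of the undamped diagonal costs one vector per block, which is usually small next to the Jacobian itself.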

gemini-code-assist · 2026-05-24T03:16:21Z

+                R = R.tensor()
+            else:
+                R = R.detach()
+            torch.cuda.empty_cache()


Calling torch.cuda.empty_cache() inside the optimization step is generally discouraged as it triggers a GPU synchronization and can significantly degrade performance. If memory management is a concern, it's better to optimize tensor lifecycles or use a dedicated memory pool. If this was added for debugging memory usage, it should be removed before merging.

gemini-code-assist · 2026-05-24T03:16:21Z

+
+        Ap = matvec(p)
+        Ap_flat = Ap.reshape(-1)
+        alpha = (rz / torch.dot(p.reshape(-1), Ap_flat)).item()


Potential division by zero if torch.dot(p.reshape(-1), Ap_flat) is zero (e.g., if the matrix is singular or not positive definite). While $J^T J$ is positive semi-definite, numerical issues or zero curvature directions could cause this to be zero. Consider adding a small epsilon or a check for numerical stability.

gemini-code-assist · 2026-05-24T03:16:21Z

+
+        rz_new = torch.dot(r_flat, z_flat)
+        beta = (rz_new / rz).item()
+        p.mul_(beta).add_(z)


Potential division by zero if rz is zero. Although the convergence check at line 692 should ideally terminate the loop if the residual is zero, a safety check for rz before division is recommended to prevent NaN values in case of numerical instability.
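One way the two guards suggested above could look is sketched below. This is a simplified, unpreconditioned conjugate gradient on plain Python lists, not the solver in this PR: the function name, the `eps` threshold, and the choice to stop (rather than add an epsilon) on zero curvature are all assumptions made for illustration.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cg_guarded(matvec, b, tol=1e-10, max_iter=100, eps=1e-30):
    """Conjugate gradient with checks on both denominators."""
    x = [0.0] * len(b)
    r = list(b)          # residual b - A x with x = 0
    p = list(r)          # first search direction
    rz = dot(r, r)
    for _ in range(max_iter):
        if rz <= tol * tol:   # converged; also keeps rz > 0 for beta below
            break
        Ap = matvec(p)
        pAp = dot(p, Ap)
        if pAp <= eps:        # zero or negative curvature: stop, no division
            break
        alpha = rz / pAp
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rz_new = dot(r, r)
        beta = rz_new / rz    # safe: rz > tol**2 > 0 was checked above
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rz = rz_new
    return x


# Well-posed 2x2 SPD system: exact solution is [1/11, 7/11].
A = [[4.0, 1.0], [1.0, 3.0]]
x_spd = cg_guarded(lambda v: [dot(row, v) for row in A], [1.0, 2.0])

# Degenerate operator (all zeros): the guard returns the initial guess.
x_zero = cg_guarded(lambda v: [0.0, 0.0], [1.0, 1.0])
```

Returning the current iterate on zero curvature keeps NaN values out of the update; a caller inside a trust-region loop could treat that as a failed solve and increase damping.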

+from pathlib import Path
+import warp as wp
+from warp import sparse as wpsparse
+from datapipes.bal_loader import get_problem, read_bal_data


+import warp as wp
+from warp import sparse as wpsparse
+from datapipes.bal_loader import get_problem, read_bal_data
+from bae.sparse.py_ops import *


zitongzhan · 2026-05-28T00:27:02Z

-        "observes": dataset["points_2d"],
-        "cidx": dataset["camera_index_of_observations"],
-        "pidx": dataset["point_index_of_observations"],
+        "points_2d": dataset["points_2d"],


@SEOKWOOPARK Let's keep the diff minimal and avoid changes unrelated to the purpose of the PR

zitongzhan · 2026-05-28T01:15:31Z

Profile Summary
Profiled current ba_example.py on Venice problem-1778-993923-pre: 5,001,946 observations, 1,778 cameras, 993,923 points. I passed matrix_free_normal=True and False; current ba_example.py defaults to disabled.

| Mode | Steady wall time | Main slow operators |
| --- | --- | --- |
| `matrix_free_normal=True` | 1.53 s | Warp BSR MV kernels inside `linear.cg`: ~1.19 s CUDA, ~84% |
| `matrix_free_normal=False` | 1.83 s | Split between explicit Schur `warp_bsr_mm`: ~0.66 s, and CG BSR MV: ~0.68 s |

Enabled
With matrix-free enabled, the bottleneck is still BSR matvec, now inside Warp CG. The hottest kernels were:

| Kernel / scope | CUDA time |
| --- | --- |
| `bsr_mv_transpose_kernel...` | 801 ms |
| `bsr_mv_kernel_acf84b96...` | 230 ms |
| `bsr_mv_kernel_0d4f3dc9...` | 163 ms |
| `jacobian` | 123 ms |

This corresponds to the repeated matrix-free Schur matvec in optimizer.py, especially the sparse.bsr_mv chain at lines 145-152 and the CG calls at lines 180 and 200.
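As a quick consistency check on the numbers above, the three BSR matvec kernels can be summed and the two wall times compared. The snippet only redoes arithmetic on figures quoted in this comment; the variable names are illustrative.

```python
# CUDA times (ms) of the three BSR matvec kernels in the enabled run.
bsr_mv_ms = {
    "bsr_mv_transpose_kernel...": 801,
    "bsr_mv_kernel_acf84b96...": 230,
    "bsr_mv_kernel_0d4f3dc9...": 163,
}
bsr_mv_total_s = sum(bsr_mv_ms.values()) / 1000.0  # about 1.19 s

# Steady wall times (s) from the summary table.
wall_enabled, wall_disabled = 1.53, 1.83
wall_ratio = wall_disabled / wall_enabled  # disabled / enabled

print(f"BSR MV total: {bsr_mv_total_s:.3f} s")
print(f"wall-time ratio: {wall_ratio:.2f}x")
```

The kernel sum agrees with the ~1.19 s figure in the summary table, and the ratio puts the matrix-free run at roughly 1.2x faster in steady wall time on this problem.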

Disabled
With matrix-free disabled, the cost shifts: explicit Schur construction becomes about as expensive as CG matvecs.

| Kernel / scope | CUDA time |
| --- | --- |
| `warp_bsr_mm` scope | 658 ms |
| `_bsr_mm_compute_values...` | 611 ms |
| `bsr_mv_tiled_kernel...` | 645 ms |
| `jacobian` | 122 ms |

That maps to explicit Schur construction at optimizer.py: WV_i = sparse.bsr_mm(W, V_i) and WVi_Wt = sparse.bsr_mm(WV_i, Wt).

zitongzhan and others added 25 commits December 15, 2025 03:12

add normal matvec and memory profiler

9490ff8

print peak cuda allocation

9c90aca

add warp memory pool report

6256e79

use A._get_Jt when matrix_free_normal

3a5ce9b

add back schur by warp's matmul

0064146

safely import cudss

acd1b3c

Add future plans section to README

91c8ade

Added a section for future plans including a new backend for distributed solver.

add normal matvec and memory profiler

19774c3

print peak cuda allocation

4ca9c86

add warp memory pool report

b71f1a3

use A._get_Jt when matrix_free_normal

d678867

add back schur by warp's matmul

d127b88

Merge branch 'schur-matmul' of github.com:zitongzhan/bae_private into…

fa9ab70

… schur-matmul

Merge remote-tracking branch 'upstream/release' into schur-matmul

6619808

Preventing TrustRegion from accepting diverging steps

3e4761d

fix(optimizer/LM): Remove redundant solver calls so matrix_free_norma…

5d9e2b2

…l path runs

feat(optim/Schur): Add Matrix-Free path and matrix_free_normal branch

e34bea2

Resolving conflict with release branch in README

5f4f093

Version up to 0.2.1

f64d00b

Fix deprecated function in Warp

40798f1

Replace Warp with Triton kernels and adjust corresponding codes

165104d

Remove codes relevant to Chunk

b305f81

Merge branch 'release' into memory-issue-swp

3a97f9e

Remove ba_helpers.py

a0b4b8b

Fix a conflict in ba_example.py

f46fb74

github-code-quality Bot found potential problems May 24, 2026

Comment thread bae/sparse/warp_wrappers.py Fixed

Comment thread bae/optim/optimizer.py Fixed

Comment thread bae/sparse/py_ops.py Fixed

gemini-code-assist Bot reviewed May 24, 2026

zitongzhan and others added 3 commits May 23, 2026 20:35

Potential fix for pull request finding 'Variable defined multiple times'

48ad787

Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>

Potential fix for pull request finding 'Unused local variable'

8cc6eb3

Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>

minimize diff

074b931

zitongzhan and others added 6 commits May 24, 2026 05:04

restore pysolvers

4746522

revert import shuffle

7f3ea3d

restore LM

d3e24d9

fix import order ba example

04908d9

Revert a version with Triton to a version with Warp

fd5cc96

Remove Triton implementation file

35b9b79

github-code-quality Bot found potential problems May 27, 2026

Comment thread ba_example.py Fixed

Comment thread ba_example.py Fixed

Comment thread ba_example.py Fixed

Comment thread ba_example.py Fixed

Comment thread ba_example.py Fixed

Add 'final' and 'venice' dataset in ba_example.py and remove unnecess…

733c0a6

…ary import line

github-code-quality Bot found potential problems May 27, 2026

Potential fix for pull request finding 'Unused import'

c94f27a

Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
