Add sparse Lanczos SVD solver #3034

Open
Intron7 wants to merge 4 commits into NVIDIA:main from Intron7:feat/add-lanczos-svds

Intron7 commented May 21, 2026

Contributor

As discussed previously with @cjnolet, I'm also adding my Lanczos SVD solver for sparse CSR matrices.

This is the more precise sparse SVD path next to the existing randomized solver. The solver repeatedly applies A @ v and A.T @ u to build Krylov bases, computes the SVD of the small bidiagonal problem, uses the resulting Ritz vectors to identify converged singular triplets, locks those vectors, and restarts on the remaining unconverged part. It also uses full reorthogonalization and a final A @ V refinement step to improve singular values and left singular vectors.
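The bidiagonalization step described above can be sketched in NumPy as follows. This is a minimal dense illustration under stated assumptions, not the CUDA implementation in this PR: the function name `golub_kahan_bidiag` is hypothetical, the sketch omits locking, restarting, and the final A @ V refinement, and it does not guard against breakdown (a zero alpha or beta).

```python
import numpy as np

def golub_kahan_bidiag(A, k, seed=0):
    """Run k steps of Golub-Kahan bidiagonalization with full
    reorthogonalization. Returns U (m x k), B (k x k, upper
    bidiagonal) and V (n x k) such that A @ V is close to U @ B.
    Hypothetical helper for illustration only."""
    m, n = A.shape
    rng = np.random.default_rng(seed)
    U = np.zeros((m, k))
    V = np.zeros((n, k))
    alphas = np.zeros(k)       # diagonal of B
    betas = np.zeros(k - 1)    # superdiagonal of B

    v = rng.standard_normal(n)
    v /= np.linalg.norm(v)
    u_prev = np.zeros(m)
    beta = 0.0
    for j in range(k):
        V[:, j] = v
        # Left vector: apply A, subtract the previous left direction.
        u = A @ v - beta * u_prev
        # Full reorthogonalization against all earlier left vectors (two passes).
        for _ in range(2):
            u -= U[:, :j] @ (U[:, :j].T @ u)
        alpha = np.linalg.norm(u)
        u /= alpha
        U[:, j] = u
        alphas[j] = alpha
        if j == k - 1:
            break
        # Right vector: apply A.T, subtract the current right direction.
        r = A.T @ u - alpha * v
        for _ in range(2):
            r -= V[:, :j + 1] @ (V[:, :j + 1].T @ r)
        beta = np.linalg.norm(r)
        betas[j] = beta
        v = r / beta
        u_prev = u

    B = np.diag(alphas) + np.diag(betas, 1)
    return U, B, V
```

The singular values of the small matrix `B`, for example from `np.linalg.svd(B, compute_uv=False)`, are the Ritz approximations to the singular values of `A`.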

Compared with randomized SVD, this is aimed at quality: clustered spectra, slow singular-value decay, near-rank-deficient inputs, and PCA workloads where ARPACK-like accuracy matters.

I ran the #2999-style row sweep on the same single-cell dataset, with k=50, n_oversamples=10, n_power_iters=2, and best-of-3 GPU timings. GPU: RTX PRO 6000 Blackwell.



  ┌──────┬───────┬─────────────────┬──────────────┬─────────────────────────┬─────────────────────┬──────────────────┐
  │ rows │   nnz │ raft randomized │ raft Lanczos │          Lanczos / rand │ randomized residual │ Lanczos residual │
  ├──────┼───────┼─────────────────┼──────────────┼─────────────────────────┼─────────────────────┼──────────────────┤
  │  50k │  101M │          0.180s │       0.252s │            1.40x slower │            3.11e-02 │         1.55e-07 │
  │ 200k │  400M │          0.698s │       1.328s │            1.90x slower │            3.12e-02 │         1.57e-07 │
  │ 500k │ 1.02B │          1.776s │       1.763s │          basically tied │            3.06e-02 │         1.57e-07 │
  │ 982k │ 2.01B │          3.536s │       3.423s │ Lanczos slightly faster │            3.09e-02 │         1.56e-07 │
  └──────┴───────┴─────────────────┴──────────────┴─────────────────────────┴─────────────────────┴──────────────────┘

Using the #2999 CPU baselines for context:

  ┌──────┬─────────────┬──────────────┬─────────────────────────────┬──────────────────────────┐
  │ rows │ sklearn CPU │ scipy ARPACK │ randomized speedup vs scipy │ Lanczos speedup vs scipy │
  ├──────┼─────────────┼──────────────┼─────────────────────────────┼──────────────────────────┤
  │  50k │       7.38s │       18.69s │                        104x │                      74x │
  │ 200k │      25.61s │       62.49s │                         90x │                      47x │
  │ 500k │      64.97s │      153.40s │                         86x │                      87x │
  │ 982k │     126.16s │      307.04s │                         87x │                      90x │
  └──────┴─────────────┴──────────────┴─────────────────────────────┴──────────────────────────┘

copy-pr-bot Bot commented May 21, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

aamijar added the non-breaking (Non-breaking change) and feature request (New feature or request) labels May 26, 2026
aamijar moved this to In Progress in Unstructured Data Processing May 26, 2026

aamijar commented May 26, 2026

Contributor

Hi @Intron7, I haven't looked at this PR closely yet, but one thing to think about is that we should try to reuse parts of the existing Lanczos eigensolver as much as possible. They should have some things in common, right?

aamijar commented May 27, 2026

Contributor

/ok to test eabeaa4

Intron7 commented Jun 3, 2026

Contributor Author

@aamijar the algorithms are different even if the names are similar. Both paths are called Lanczos, but the kernels differ: the eigensolver builds a symmetric tridiagonal matrix via a one-vector recurrence and solves the Ritz problem with syevd; the SVD builds a bidiagonal matrix via Golub-Kahan with two coupled bases (A @ v and Aᵀ @ u) and solves the Ritz problem with gesvdj. The restart is also different: the SVD path locks converged singular triplets and restarts on the unconverged subspace.
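For contrast with the two-basis Golub-Kahan recurrence, the one-vector symmetric recurrence can be sketched like this. It is a minimal NumPy illustration, not the code in the existing lanczos.cuh: the name `symmetric_lanczos` is hypothetical, full reorthogonalization is added here for numerical safety, and `np.linalg.eigvalsh` on the small tridiagonal matrix stands in for the syevd call.

```python
import numpy as np

def symmetric_lanczos(S, k, seed=0):
    """Run k steps of symmetric Lanczos on a symmetric matrix S.
    Returns Q (n x k) with orthonormal columns and the k x k
    symmetric tridiagonal T, which approximates Q.T @ S @ Q.
    Hypothetical helper for illustration only."""
    n = S.shape[0]
    rng = np.random.default_rng(seed)
    Q = np.zeros((n, k))
    alphas = np.zeros(k)       # diagonal of T
    betas = np.zeros(k - 1)    # off-diagonal of T

    q = rng.standard_normal(n)
    q /= np.linalg.norm(q)
    q_prev = np.zeros(n)
    beta = 0.0
    for j in range(k):
        Q[:, j] = q
        # Single basis: one matrix-vector product per step.
        w = S @ q - beta * q_prev
        alphas[j] = q @ w
        w -= alphas[j] * q
        # Full reorthogonalization against the basis built so far (two passes).
        for _ in range(2):
            w -= Q[:, :j + 1] @ (Q[:, :j + 1].T @ w)
        if j == k - 1:
            break
        beta = np.linalg.norm(w)
        betas[j] = beta
        q_prev, q = q, w / beta

    T = np.diag(alphas) + np.diag(betas, 1) + np.diag(betas, -1)
    return Q, T
```

Here the small problem is a symmetric eigenproblem on `T`, whereas the Golub-Kahan path produces a non-symmetric bidiagonal matrix whose singular values are needed. This is why the two paths call different dense solvers.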
The realistic shared surface is the reorthogonalization helpers (CGS2/MGS2) and the cuBLAS wrapper calls. The existing lanczos.cuh does its reorthogonalization inline against its own single-vector layout, so factoring CGS2/MGS2 into a shared utility would require touching the existing eigensolver too. I'd rather land this PR as-is and do a separate refactor PR to extract a shared bidiag_reorth / lanczos_reorth utility, if you want.
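As a rough picture of what such a shared helper would compute, here is a minimal NumPy sketch of CGS2 (classical Gram-Schmidt applied twice). The name `cgs2_reorth` and its signature are hypothetical and are not taken from this PR or from lanczos.cuh. It assumes the columns of `Q` are already orthonormal.

```python
import numpy as np

def cgs2_reorth(Q, w):
    """Orthogonalize w against the orthonormal columns of Q using two
    classical Gram-Schmidt passes. Returns the orthogonalized vector
    and its norm. Hypothetical helper for illustration only."""
    for _ in range(2):
        # Each pass is two matrix-vector products: coefficients, then update.
        w = w - Q @ (Q.T @ w)
    return w, np.linalg.norm(w)
```

Each pass consists of two dense matrix-vector products, which is the kind of operation a GEMV routine covers. The same helper could in principle serve either basis as long as the caller passes the relevant block of vectors.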

cjnolet commented Jun 4, 2026

Contributor

/ok to test b80ca30

Comment thread cpp/include/raft/sparse/solver/solver_types.hpp
Comment thread cpp/include/raft/sparse/solver/lanczos_svds.cuh

cjnolet commented Jun 23, 2026

Contributor

/ok to test 93e9a19

3 participants