JavaScript: Don't extract obviously generated files#19680
Conversation
Fixes two things: - The basic test should no longer extract `tst.js` (as `tst.ts` is present) - The `AutoBuild` mock did not populate `extractedFiles` correctly, which broke the logic that looks for TypeScript files with the same basename.
This is needed for the logic that skips files inside the directory specified in the `tsconfig.json` `outDir` compiler option.
There was a problem hiding this comment.
Pull Request Overview
This PR extends the JavaScript extractor to automatically skip obviously generated files, specifically:
- Skips JS variants when the corresponding TypeScript source exists.
- Skips any files under the
outDirdefined intsconfig.json.
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| change-notes/2025-06-05-skip-obviously-generated-files.md | Documents new skip behavior for generated JS and tsconfig outDir |
| javascript/extractor/test/com/semmle/js/extractor/test/AutoBuildTests.java | Adds tests for skipping derived JS files and files under outDir |
| javascript/extractor/src/com/semmle/js/extractor/AutoBuild.java | Implements filters for tsconfig outDir and TS-derived JS files |
| javascript/extractor/src/com/semmle/js/dependencies/tsconfig/TsConfigJson.java | Introduces JSON binding for tsconfig root object |
| javascript/extractor/src/com/semmle/js/dependencies/tsconfig/CompilerOptions.java | Introduces JSON binding for compilerOptions.outDir |
Comments suppressed due to low confidence (2)
javascript/extractor/test/com/semmle/js/extractor/test/AutoBuildTests.java:179
- [nitpick] The test method
basicTestnow skips JS files by default—consider renaming the test or updating its comment to reflect this changed behavior.
addFile(false, LGTM_SRC, "tst.js");
javascript/extractor/src/com/semmle/js/dependencies/tsconfig/TsConfigJson.java:4
- [nitpick] Indentation here uses 2 spaces, while the rest of the project uses 4; align this class with the existing formatting conventions.
private CompilerOptions compilerOptions;
| // ignore malformed tsconfig or missing fields | ||
| } | ||
| } | ||
| // exclude files in output directories as configured in tsconfig.json |
There was a problem hiding this comment.
[nitpick] This comment duplicates the one above; consider removing it to reduce redundancy and keep the code concise.
| // exclude files in output directories as configured in tsconfig.json |
| outDirs.add(odir); | ||
| } | ||
| } catch (Exception e) { | ||
| // ignore malformed tsconfig or missing fields |
There was a problem hiding this comment.
[nitpick] Silently swallowing all exceptions during tsconfig parsing can obscure errors; consider logging a warning or including the exception message.
| // ignore malformed tsconfig or missing fields | |
| System.err.println("Warning: Failed to parse tsconfig.json at " + cfg + ": " + e.getMessage()); |
asgerf
left a comment
There was a problem hiding this comment.
Minor nitpick otherwise LGTM
Also make the indentation in `CompilerOptions.java` more consistent.
In #19680 we added support for automatically ignoring files in the `outDir` directory as specified in the TSconfig compiler options (as these files were likely duplicates of `.ts` file we were already scanning). However, in some cases people put `outDir: "."` or even `outDir: ".."` in their configuration, which had the side effect of excluding _all_ files, leading to a failed extraction. With the changes in this PR, we now ignore any `outDir`s that are not properly contained within the source root of the code being scanned. This should prevent the files from being extracted, while still allowing us to not double-scan files in, say, a `.github` directory, as seen in some Actions workflows.
In github#19680 we added support for automatically ignoring files in the `outDir` directory as specified in the TSconfig compiler options (as these files were likely duplicates of `.ts` file we were already scanning). However, in some cases people put `outDir: "."` or even `outDir: ".."` in their configuration, which had the side effect of excluding _all_ files, leading to a failed extraction. With the changes in this PR, we now ignore any `outDir`s that are not properly contained within the source root of the code being scanned. This should prevent the files from being extracted, while still allowing us to not double-scan files in, say, a `.github` directory, as seen in some Actions workflows.
Extends the JavaScript extractor so that certain files are skipped automatically during extraction.
Specifically, it will skip
tsconfig.jsonfile (if any).