Skip to content

Commit 9c85716

Browse files
committed
Avoid downloading test PDFs multiple times
It's somewhat common for multiple test-cases to use the same PDF, for example tests for both `eq` and `text` with one PDF. Currently the logic in `downloadManifestFiles` doesn't handle duplicate test PDFs, which leads to wasted time/resources by downloading the same PDF more than once. Naturally the effect of this is especially bad when downloading all linked PDFs. Total number of test PDFs downloaded: - 507, with `master`. - 447, with this patch.
1 parent ae70a5d commit 9c85716

1 file changed

Lines changed: 20 additions & 7 deletions

File tree

test/downloadutils.mjs

Lines changed: 20 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -40,14 +40,27 @@ async function downloadFile(file, url) {
4040
}
4141

4242
async function downloadManifestFiles(manifest) {
43-
const links = manifest
44-
.filter(item => item.link && !fs.existsSync(item.file))
45-
.map(item => {
46-
const url = fs.readFileSync(`${item.file}.link`).toString().trimEnd();
47-
return { file: item.file, url };
48-
});
43+
// Keep track of file identifiers to remove any duplicates,
44+
// since multiple test-cases may use the same PDF.
45+
const seenFiles = new Set();
4946

50-
for (const { file, url } of links) {
47+
const links = new Map(
48+
manifest
49+
.filter(({ link, file }) => {
50+
if (!link || seenFiles.has(file)) {
51+
return false;
52+
}
53+
seenFiles.add(file);
54+
return !fs.existsSync(file);
55+
})
56+
.map(({ file }) => {
57+
const url = fs.readFileSync(`${file}.link`).toString().trimEnd();
58+
return [file, url];
59+
})
60+
);
61+
seenFiles.clear();
62+
63+
for (const [file, url] of links) {
5164
console.log(`Downloading ${url} to ${file}...`);
5265
try {
5366
await downloadFile(file, url);

0 commit comments

Comments
 (0)