Draft: GPT2 training on MCQ med data by mina5rovic · Pull Request #1111 · epfml/disco

mina5rovic · 2026-04-16T11:51:28Z

No description provided.

JulienVig

The logic of the benchmark and the memorization looks good to me! Just had some performance comments

JulienVig · 2026-06-01T09:16:30Z

+    const promptTokens = tokenizer.tokenize(prompt).toArray();
+    const fullTokens = tokenizer.tokenize(prompt + continuation).toArray();
+
+    const inputTokens = fullTokens.slice(0, -1);
+    const inputTensor = tf.tensor2d([inputTokens], [1, inputTokens.length], "int32");
+
+    const logits = tfModel.predict(inputTensor) as tf.Tensor;
+    const logProbs = tf.logSoftmax(logits, -1);


You're running inference on the MCQ question once per option, but the question is always the same. You could run inference only once, then retrieve the logits of each option, which would save a lot of time. See the code snippet in the next comment.

Edit: this assumes that there is only one continuation token to evaluate. I see that there's a loop over multiple tokens on lines 79-83, so my comment may not apply.

JulienVig · 2026-06-01T09:29:27Z

+
+    const logits = tfModel.predict(inputTensor) as tf.Tensor;
+    const logProbs = tf.logSoftmax(logits, -1);
+    const arr = await logProbs.array() as number[][][];


You're computing the softmax for every position but you only need the last one, and line 74 materializes the whole array while you only need 4 values. You could rewrite this logic so that you only work with the last position, for example:

    const optionLogProbs = tf.tidy(() => {
      const logits = tfModel.predict(inputTensor) as tf.Tensor3D; // [1, seqLen, vocab]
      const lastLogits = logits
        .slice([0, promptTokens.length - 1, 0], [1, 1, -1]) // final position only
        .reshape([-1]); // [vocab]
      const logProbs = tf.logSoftmax(lastLogits); // [vocab]
      // continuationTokenIDs is an array of the 4 continuations' token IDs
      return tf.gather(logProbs, continuationTokenIDs);
    });
    const scores = await optionLogProbs.array(); // just 4 values
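For reference, here is a dependency-free sketch of what the suggestion computes for a single question: a log-softmax over the last position's logits, followed by a lookup of the option token IDs and an argmax. The names `logSoftmax` and `scoreOptions` are hypothetical and not part of the PR; the sketch assumes each option is a single token.

```typescript
// Numerically stable log-softmax over one logits vector (the last position).
function logSoftmax(logits: number[]): number[] {
  let max = -Infinity;
  for (const x of logits) if (x > max) max = x;
  let sumExp = 0;
  for (const x of logits) sumExp += Math.exp(x - max);
  const logSumExp = max + Math.log(sumExp);
  return logits.map((x) => x - logSumExp);
}

// Look up the log-probability of each option token and return the index
// (into optionTokenIds) of the most likely option.
// Assumption: every option is exactly one token.
function scoreOptions(
  lastLogits: number[],
  optionTokenIds: number[],
): { scores: number[]; best: number } {
  const logProbs = logSoftmax(lastLogits);
  const scores = optionTokenIds.map((id) => {
    const lp = logProbs[id];
    if (lp === undefined) throw new Error(`token id ${id} is outside the vocabulary`);
    return lp;
  });
  let best = 0;
  let bestScore = -Infinity;
  scores.forEach((s, i) => {
    if (s > bestScore) {
      bestScore = s;
      best = i;
    }
  });
  return { scores, best };
}
```

Since the log-softmax only shifts every logit by the same constant, the argmax over the options is the same as the argmax over their raw logits; the normalization matters only if the scores themselves are reported.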

JulienVig · 2026-06-01T09:33:53Z

+        if (predicted === answer) correct++;
+        total++;
+
+        if (confusion[answer]) {


You already checked on line 128 that answer is contained in options, so you can skip this check, or keep it only to throw an error if something unexpected happens :)
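A minimal sketch of the second variant, where the lookup fails loudly instead of being silently skipped. The `Confusion` type and the helpers `makeConfusion` and `recordPrediction` are hypothetical; the PR's actual data structure is not visible in this excerpt.

```typescript
// Nested counts: confusion[answer][predicted] (assumed layout).
type Confusion = Record<string, Record<string, number>>;

// Build an all-zero matrix over the known option labels.
function makeConfusion(options: readonly string[]): Confusion {
  const confusion: Confusion = {};
  for (const answer of options) {
    const row: Record<string, number> = {};
    for (const predicted of options) row[predicted] = 0;
    confusion[answer] = row;
  }
  return confusion;
}

// Increment one cell; throw if either label is unknown rather than
// dropping the sample without notice.
function recordPrediction(confusion: Confusion, answer: string, predicted: string): void {
  const row = confusion[answer];
  const current = row === undefined ? undefined : row[predicted];
  if (row === undefined || current === undefined) {
    throw new Error(`unexpected label: answer=${answer}, predicted=${predicted}`);
  }
  row[predicted] = current + 1;
}
```

With this shape, an answer that slipped past the earlier validation surfaces as an exception instead of skewing the totals.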

JulienVig · 2026-06-01T09:41:13Z

+    for (let targetPos = promptTokens.length; targetPos < fullTokens.length; targetPos++) {
+        const targetToken = fullTokens[targetPos];
+        score += arr[0][targetPos - 1][targetToken];
+        count++;
+    }


I think only the last token needs to be evaluated, right? Is this loop over multiple tokens necessary?
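To make the question concrete, here is a sketch of the multi-token scoring the loop performs, written against a plain array of log-probabilities. The function name `continuationScore` is hypothetical, and returning both the sum and the mean is an assumption: the excerpt shows `score` and `count` being accumulated but not how they are combined.

```typescript
// logProbs[pos][token] is the log-probability that `token` follows position `pos`.
// fullTokens is prompt + continuation; promptLength is the number of prompt tokens.
function continuationScore(
  logProbs: number[][],
  fullTokens: number[],
  promptLength: number,
): { sum: number; mean: number } {
  if (promptLength < 1 || promptLength >= fullTokens.length) {
    throw new Error("promptLength must leave at least one continuation token");
  }
  let sum = 0;
  let count = 0;
  for (let targetPos = promptLength; targetPos < fullTokens.length; targetPos++) {
    const targetToken = fullTokens[targetPos];
    const row = logProbs[targetPos - 1]; // prediction made one step earlier
    const lp = targetToken === undefined || row === undefined ? undefined : row[targetToken];
    if (lp === undefined) throw new Error(`missing log-prob at position ${targetPos}`);
    sum += lp;
    count++;
  }
  return { sum, mean: sum / count };
}
```

If every option tokenizes to exactly one token, the loop runs once and this reduces to the single last-position lookup; the loop only matters when some continuation spans several tokens.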

JulienVig · 2026-06-01T11:34:45Z

+      modelPath: { type: String, description: "Path to a saved Disco GPT model.json" },
+      dataPath: { type: String, description: "Path to records/canaries text file" },
+      maxRecords: { type: Number, description: "Maximum records to evaluate; -1 for all", defaultValue: 100 },
+      promptLengths: { type: String, description: "Comma-separated prompt lengths", defaultValue: "10,50,100,200,500" },


It's probably not useful to test prompt length 500 if the context length is 256 or 512
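One way to guard against this is to drop prompt lengths that cannot fit in the model's context. The sketch below is an assumption, not code from the PR: `parsePromptLengths` and its `reservedTokens` parameter (room left for the continuation being scored) are hypothetical names.

```typescript
// Parse a comma-separated list of prompt lengths and keep only those that
// fit in the context window together with the reserved continuation tokens.
function parsePromptLengths(
  raw: string,
  contextLength: number,
  reservedTokens: number,
): number[] {
  return raw
    .split(",")
    .map((s) => Number.parseInt(s.trim(), 10))
    .filter((n) => Number.isInteger(n) && n > 0 && n + reservedTokens <= contextLength);
}
```

For example, with the default "10,50,100,200,500", a context length of 512 and 50 reserved tokens, the length 500 is filtered out while the others are kept.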

JulienVig · 2026-06-01T12:06:00Z

+      return output as tf.Tensor;
+    });
+
+    console.log("logits shape:", logits.shape);


Make sure to remove the debug prints before merging the PR

mina5rovic requested a review from JulienVig April 16, 2026 11:51

mina5rovic changed the title ~~GPT2 training on MCQ med data~~ Draft: GPT2 training on MCQ med data Apr 16, 2026

mina5rovic added 2 commits April 17, 2026 19:52

Benchmark the mcq medeical dataset

c21d55b

lint error correction

3014cf5

mina5rovic force-pushed the gpt2-training branch from fd4b676 to 3014cf5 Compare April 17, 2026 17:53

mina5rovic and others added 25 commits April 19, 2026 13:57

add val dataset path param

e785cf6

add working local train

c93631c

add model saving to disk arg and more debug lines

2e34a37

add debug commands

4c9c84c

add cnahges to federated approach

23c733d

change round 0 payload null handling

096393c

chnage server max payload limit to higher number

1ef7d85

fix memory reads and wait for all clients to begin

417bfa5

add server debug logs to see why ws close session

77ce9a9

cover whole dataset and split data to clients

7aef628

add validation dataset loading changes

d65fb4a

change gpt config to use whole dadataset

fe7c51a

add arg for model saving location, cnahge save to saveLog, change lin…

0dc32aa

…k for model to contxt 512

add training optimizations

5c03f40

change onnx converter to be able to convert different context len

4ee6096

aggregate inside of an epoch for llms

5b5f88c

fix mem leak

8d409b9

change ligs

144b751

change model to 256

c046e9e

back to 512

e89b9e0

fix end of training

fbcca59

fix mem script and add logs

dfe388e

debug memorization script

04a4769

change benchmark

e2d2361

add client id on debug logs, and final aggregation wait between clients

25fcf28

Implement goldfish loss and make benchmark and memorizarion script ju…

3f6755c

…nk free

JulienVig reviewed Jun 1, 2026

mina5rovic added 6 commits June 1, 2026 17:49

add def task for local finetuning

d15da84

fix bug

05cf239

bug fix

c0c41a1

add validation before and after aggreagtion

95197f7

fix eval script

3e05567

add save-checkpoints flag

6374b54

mina5rovic wants to merge 34 commits into develop from gpt2-training

2 participants
