Draft: GPT2 training on MCQ med data #1111
fd4b676 to 3014cf5
…k for model to contxt 512
JulienVig left a comment
The logic of the benchmark and the memorization looks good to me! I just have some performance comments.
| const promptTokens = tokenizer.tokenize(prompt).toArray(); | ||
| const fullTokens = tokenizer.tokenize(prompt + continuation).toArray(); | ||
| const inputTokens = fullTokens.slice(0, -1); | ||
| const inputTensor = tf.tensor2d([inputTokens], [1, inputTokens.length], "int32"); | ||
| const logits = tfModel.predict(inputTensor) as tf.Tensor; | ||
| const logProbs = tf.logSoftmax(logits, -1); |
You're running inference on the MCQ question once per option, but the question is always the same. You could run inference only once, then retrieve the logits of each option, and save a lot of time. See the code snippet in the next comment.
Edit: this assumes that there is only one continuation token to evaluate. I see that there's a loop over multiple tokens (lines 79-83), so maybe my comment is not valid.
| const logits = tfModel.predict(inputTensor) as tf.Tensor; | ||
| const logProbs = tf.logSoftmax(logits, -1); | ||
| const arr = await logProbs.array() as number[][][]; |
You're computing the softmax for every position, but you only need the last one. Line 74 also materializes the whole array, while you only need 4 values. You could rewrite this logic so that you only work with the last position, for example:
const optionLogProbs = tf.tidy(() => {
  const logits = tfModel.predict(inputTensor) as tf.Tensor3D; // [1, seqLen, vocab]
  const lastLogits = logits
    .slice([0, promptTokens.length - 1, 0], [1, 1, -1]) // final position only
    .reshape([-1]); // [vocab]
  const logProbs = tf.logSoftmax(lastLogits); // [vocab]
  // continuationTokenIDs is an array holding the token ID of each of the 4 continuations
  return tf.gather(logProbs, continuationTokenIDs);
});
const scores = await optionLogProbs.array(); // just 4 values
optionLogProbs.dispose(); // the tensor returned by tf.tidy is not freed automatically
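The reason the last position is enough: log-softmax normalizes each position independently, so the values at the final position do not depend on the other rows. Here is a dependency-free sketch of the same idea using plain arrays instead of tf tensors. The helper names (`logSoftmax`, `scoreSingleTokenOptions`) and the toy logits are made up for illustration, and it assumes each option is a single token.

```typescript
// Numerically stable log-softmax over one vector of logits.
function logSoftmax(logits: number[]): number[] {
  const maxLogit = Math.max(...logits);
  const sumExp = logits.reduce((acc, x) => acc + Math.exp(x - maxLogit), 0);
  const logSumExp = maxLogit + Math.log(sumExp);
  return logits.map((x) => x - logSumExp);
}

// Scores single-token options from the final position only.
// `logitsPerPosition` is [seqLen][vocab]; `optionTokenIds` holds one token ID per option.
function scoreSingleTokenOptions(
  logitsPerPosition: number[][],
  optionTokenIds: number[],
): number[] {
  const lastLogits = logitsPerPosition[logitsPerPosition.length - 1];
  if (lastLogits === undefined) throw new Error("empty logits");
  const lastLogProbs = logSoftmax(lastLogits);
  return optionTokenIds.map((id) => {
    const value = lastLogProbs[id];
    if (value === undefined) throw new Error(`token ID ${id} is outside the vocabulary`);
    return value;
  });
}

// Toy data (hypothetical): 3 positions, vocabulary of 5 tokens.
const toyLogits = [
  [0.1, 0.2, 0.3, 0.4, 0.5],
  [1.0, 0.0, -1.0, 2.0, 0.5],
  [2.0, 1.0, 0.0, -1.0, 3.0],
];
const toyOptionIds = [0, 1, 3, 4]; // pretend token IDs of the 4 options
const toyScores = scoreSingleTokenOptions(toyLogits, toyOptionIds);
const toyPredicted = toyScores.indexOf(Math.max(...toyScores)); // index into toyOptionIds
```

Only one row is normalized and only 4 numbers are read back, which is the saving the tf snippet above is after.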
| if (predicted === answer) correct++; | ||
| total++; | ||
| if (confusion[answer]) { |
You already checked on line 128 that answer is contained in options, so you can skip this check, or keep it only to throw an error if something unexpected happens :)
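A minimal sketch of the "throw on the unexpected" variant, in case it helps. It uses a `Map` rather than the plain object in the PR, and `Confusion`, `makeConfusion` and `recordPrediction` are hypothetical names, not code from this branch.

```typescript
// Rows are true answers, columns are predicted answers.
type Confusion = Map<string, Map<string, number>>;

// Builds a confusion matrix with a zero count for every (answer, prediction) pair.
function makeConfusion(options: string[]): Confusion {
  const confusion: Confusion = new Map();
  for (const truth of options) {
    const row = new Map<string, number>();
    for (const pred of options) row.set(pred, 0);
    confusion.set(truth, row);
  }
  return confusion;
}

// Increments one cell; an unknown label is treated as a bug and fails loudly
// instead of being skipped silently.
function recordPrediction(confusion: Confusion, answer: string, predicted: string): void {
  const row = confusion.get(answer);
  const current = row?.get(predicted);
  if (row === undefined || current === undefined) {
    throw new Error(`Unexpected label: answer=${answer}, predicted=${predicted}`);
  }
  row.set(predicted, current + 1);
}

const demoConfusion = makeConfusion(["A", "B", "C", "D"]);
recordPrediction(demoConfusion, "A", "B"); // true answer A, model picked B
```

Since valid labels are guaranteed upstream, the throw should never fire in practice; it only guards against a later change breaking that guarantee.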
| for (let targetPos = promptTokens.length; targetPos < fullTokens.length; targetPos++) { | ||
| const targetToken = fullTokens[targetPos]; | ||
| score += arr[0][targetPos - 1][targetToken]; | ||
| count++; | ||
| } |
I think only the last token should need to be evaluated, right? Is this loop over multiple tokens necessary?
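Whether the loop is needed probably depends on how the tokenizer splits the continuation, which I have not checked. If every option encodes to a single token, the loop body runs exactly once; if a continuation spans several tokens, the per-token log-probabilities have to be summed. A plain-array sketch of both cases, where `scoreContinuation` and the toy values are hypothetical:

```typescript
// Sums log-probabilities over the continuation tokens.
// `logProbs[t][v]` is the log-probability that token v follows input tokens 0..t.
function scoreContinuation(
  logProbs: number[][],
  fullTokens: number[],
  promptLength: number,
): { sum: number; mean: number; count: number } {
  let sum = 0;
  let count = 0;
  for (let targetPos = promptLength; targetPos < fullTokens.length; targetPos++) {
    const row = logProbs[targetPos - 1]; // the prediction for targetPos is made one step earlier
    const targetToken = fullTokens[targetPos];
    const value = row === undefined || targetToken === undefined ? undefined : row[targetToken];
    if (value === undefined) throw new Error(`No log-prob for position ${targetPos}`);
    sum += value;
    count++;
  }
  return { sum, mean: count > 0 ? sum / count : Number.NaN, count };
}

// Toy log-probs (made up): 3 input positions, vocabulary of 3 tokens.
const demoLogProbs = [
  [Math.log(0.5), Math.log(0.25), Math.log(0.25)],
  [Math.log(0.1), Math.log(0.6), Math.log(0.3)],
  [Math.log(0.2), Math.log(0.2), Math.log(0.6)],
];
const demoFullTokens = [0, 2, 1, 2];

// Prompt [0, 2], two-token continuation [1, 2]: the loop runs twice.
const demoMulti = scoreContinuation(demoLogProbs, demoFullTokens, 2);
// Prompt [0, 2, 1], one-token continuation [2]: the loop runs once.
const demoSingle = scoreContinuation(demoLogProbs, demoFullTokens, 3);
```

In the single-token case the loop reduces to one lookup at the last position, so the simpler last-position scoring would give the same number.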
| modelPath: { type: String, description: "Path to a saved Disco GPT model.json" }, | ||
| dataPath: { type: String, description: "Path to records/canaries text file" }, | ||
| maxRecords: { type: Number, description: "Maximum records to evaluate; -1 for all", defaultValue: 100 }, | ||
| promptLengths: { type: String, description: "Comma-separated prompt lengths", defaultValue: "10,50,100,200,500" }, |
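It's probably not useful to test prompt length 500 if the context length is 256 or 512: a 500-token prompt does not fit in a 256-token context, and in a 512-token context it leaves at most 12 tokens for the continuation.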
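One possible way to handle this is to drop prompt lengths that cannot fit before running the evaluation. A small sketch, where `usablePromptLengths` and its `minContinuation` parameter are hypothetical and not part of the PR's CLI:

```typescript
// Parses a comma-separated list of prompt lengths and keeps only those that leave
// room for at least `minContinuation` tokens inside the model's context window.
function usablePromptLengths(
  spec: string,
  contextLength: number,
  minContinuation = 1,
): number[] {
  return spec
    .split(",")
    .map((s) => Number.parseInt(s.trim(), 10))
    .filter((n) => Number.isInteger(n) && n > 0 && n + minContinuation <= contextLength);
}

// With the default list from the PR:
const lengthsFor256 = usablePromptLengths("10,50,100,200,500", 256);
// Requiring 50 continuation tokens also rules out 500 for a 512-token context.
const lengthsFor512 = usablePromptLengths("10,50,100,200,500", 512, 50);
```

How many continuation tokens to require is a judgment call for the memorization test; 50 here is only an example value.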
| return output as tf.Tensor; | ||
| }); | ||
| console.log("logits shape:", logits.shape); |
Make sure to remove the debug prints before merging the PR.
No description provided.