Describe the bug
As I mentioned in this issue, the default values of `top_p` and `temperature` are not guaranteed to be 1. Therefore, the code below (LMFlow/src/lmflow/models/hf_decoder_model.py, lines 382 to 405 in 1b223f7) already receives modified scores, i.e., a distribution processed according to the model's `generation_config` on the Hugging Face side.
```python
if self.use_accelerator:
    outputs = self.backend_model.generate(
        input_ids=inputs,
        pad_token_id=self.tokenizer.pad_token_id,
        *args,
        **kwargs
    )
else:
    if self.device == "gpu":
        outputs = self.ds_engine.module.generate(
            input_ids=inputs,
            synced_gpus=True,
            pad_token_id=self.tokenizer.pad_token_id,
            *args,
            **kwargs
        )
    elif self.device == "cpu":
        outputs = self.backend_model.generate(
            input_ids=inputs,
            synced_gpus=True,
            pad_token_id=self.tokenizer.pad_token_id,
            *args,
            **kwargs
        )
```
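For reference, a minimal standalone sketch (not LMFlow code; the model name and arguments are illustrative) of how explicitly pinning the sampling parameters to neutral values keeps `generate()` from warping the per-step scores through the model's default `generation_config`:

```python
# Minimal sketch, not LMFlow code: passing neutral sampling parameters overrides
# whatever temperature / top_p / top_k the model's generation_config ships with,
# so the returned per-step scores are not warped before any post-processing.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # illustrative model
model = AutoModelForCausalLM.from_pretrained("gpt2")

input_ids = tokenizer("Hello", return_tensors="pt").input_ids

outputs = model.generate(
    input_ids=input_ids,
    do_sample=True,
    temperature=1.0,   # neutral: no temperature warper is added
    top_p=1.0,         # neutral: no nucleus (top_p) warper is added
    top_k=0,           # neutral: no top_k warper is added
    max_new_tokens=1,
    return_dict_in_generate=True,
    output_scores=True,
    pad_token_id=tokenizer.eos_token_id,
)
unwarped_scores = outputs.scores[0]  # safe to apply our own temperature/top_p exactly once
```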
Much worse, `top_p` and `temperature` are applied again in `score_to_prob` (LMFlow/src/lmflow/pipeline/inferencer.py, lines 435 to 440 in 1b223f7), resulting in an unexpected distribution:
```python
for _ in range(num_new_tokens):
    pred = self.predict_next_token(model=model, input_ids=sequence, num_new_tokens=1)  # predict next one token
    prob = self.score_to_prob(pred.scores[0], temperature=temperature)
    sampled = self.sample(prob=prob, num_samples=1)
    new_tokens.append(sampled)
    sequence = torch.cat([sequence, sampled['sampled_token']], dim=1)
```
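To make the impact concrete, here is a small self-contained illustration (hypothetical logits, not LMFlow code): if `generate()` has already divided the logits by the temperature from `generation_config`, dividing by `temperature` again in `score_to_prob` samples from a noticeably sharper distribution, equivalent to using `T**2` instead of `T`.

```python
# Hypothetical numbers, for illustration only.
import torch

logits = torch.tensor([2.0, 1.0, 0.5, -1.0])
T = 0.7

intended = torch.softmax(logits / T, dim=-1)                 # what the caller expects for temperature T
already_warped = logits / T                                  # what generate() may have done internally
double_applied = torch.softmax(already_warped / T, dim=-1)   # score_to_prob divides by T again

# double_applied == softmax(logits / T**2): sharper than intended, and the
# discrepancy grows the further T is from 1.0.
print(intended, double_applied)
```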