Why do we use `test()` to see the reward and `testing_sample_step`? Can I use `train()` to see how the reward changes during training? It seems that `last perf` is the reward. Because we want to compare with OpenAI,