Change Critic NN to Multi-NN
Fix wrong remain time
Fix wrong remain time, what a stupid mistake...
Also fix the doubled WANDB writer.
Deeper TargetNN
Make the target NN deeper; it now gets the target state along with the hidden layer's output.
Change Middle input
Feed everything except the raycast input to the target network.
Change Activation function to Tanh
Change activation function to Tanh; it works a little better than before.
Save training dataset by its target type.
When training the NN, use a single target type's training set for the backward pass.
This makes training at least 20 times faster than the last update!
At game over, add remaintime/15 to every step's reward to increase this round's training weight.
Fix bug where getting the target from states still used the one-hot decoder.
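
A minimal sketch of the two data-handling changes in this commit, in plain Python. The function names, the tuple layout `(target_type, state, reward)`, and the target type labels `"go"` and `"attack"` are assumptions made for illustration, not the project's actual code; only the `remaintime/15` bonus comes from the message above.

```python
from collections import defaultdict

# From the commit message: every step's reward gains remaintime / 15 at game over.
REMAIN_TIME_DIVISOR = 15


def add_remain_time_bonus(rewards, remain_time):
    """Return a new reward list with remain_time / 15 added to every step."""
    bonus = remain_time / REMAIN_TIME_DIVISOR
    return [r + bonus for r in rewards]


def group_by_target(transitions):
    """Split (target_type, state, reward) tuples into one dataset per target type."""
    datasets = defaultdict(list)
    for target_type, state, reward in transitions:
        datasets[target_type].append((state, reward))
    return dict(datasets)


# Hypothetical episode: three steps, 30 time units left at game over,
# so each step gains 30 / 15 = 2.0.
episode_rewards = add_remain_time_bonus([0.0, 1.0, -0.5], remain_time=30)

# Hypothetical target types; each dataset can then be trained on separately.
transitions = [
    ("go", [0.1], episode_rewards[0]),
    ("attack", [0.2], episode_rewards[1]),
    ("go", [0.3], episode_rewards[2]),
]
datasets = group_by_target(transitions)
```

Keeping one dataset per target type means each backward pass only touches samples for a single target, which is presumably where the speedup mentioned above comes from.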