Save the training dataset grouped by its target type.
While training the NN, use a single-target training set for each backward pass.
This makes training at least 20 times faster than the last update!
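
A minimal sketch of the grouping idea, not the code from this commit. The sample layout `(state, target_type, value)`, the function name `group_by_target`, and the target names `"move"` and `"attack"` are all assumptions made up for illustration; the actual backward pass is left as a comment because the network code is not shown here.

```python
from collections import defaultdict


def group_by_target(samples):
    """Split samples into one list per target type.

    Assumes each sample is a (state, target_type, value) tuple;
    this layout is hypothetical, not taken from the project.
    """
    buckets = defaultdict(list)
    for state, target_type, value in samples:
        buckets[target_type].append((state, value))
    return dict(buckets)


# Hypothetical samples with two made-up target types.
samples = [
    ([0.1, 0.2], "move", 1.0),
    ([0.3, 0.4], "attack", 0.5),
    ([0.5, 0.6], "move", -1.0),
]

grouped = group_by_target(samples)
for target_type, batch in grouped.items():
    # One backward pass per single-target batch would go here.
    print(target_type, len(batch))
```

Grouping once when the dataset is saved means each backward pass only touches samples that share a target, instead of filtering the full set on every step.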
When the game is over, add remaintime/15 to every step's reward, to increase this round's training weight.
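
The end-of-game bonus can be sketched roughly as below. The function name `add_time_bonus`, the list-of-floats reward layout, and the example numbers are assumptions for illustration; only the `remaintime/15` term comes from the note above.

```python
def add_time_bonus(rewards, remain_time, divisor=15):
    """Return a new reward list with remain_time / divisor added to every step.

    `rewards` is assumed to be a list of per-step floats for one finished game.
    """
    bonus = remain_time / divisor
    return [r + bonus for r in rewards]


# Hypothetical episode: three steps, 30 time units left at game over.
episode_rewards = [0.0, 1.0, -0.5]
boosted = add_time_bonus(episode_rewards, remain_time=30)
print(boosted)  # [2.0, 3.0, 1.5]
```

Because the same bonus is added to every step, rounds that finish with more time remaining contribute larger rewards across the whole episode.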
Fix a bug where getting the target from states was still using the onehot decoder.