When the game is over, add remaintime/15 to every step's reward to increase this round's training weight. Fix bug where getting the target from states was still using the one-hot decoder.
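
The end-of-game reward bonus described above might look roughly like the following. This is a minimal sketch, not the project's actual code: the function name `add_time_bonus`, the list-based reward buffer, and the `scale` parameter are all hypothetical; only the `remaintime / 15` bonus comes from the note above.

```python
def add_time_bonus(rewards, remain_time, scale=15.0):
    """Return a new list with remain_time / scale added to every step's reward.

    `rewards` is assumed to hold the per-step rewards of one finished episode.
    """
    bonus = remain_time / scale
    # Every step of the episode gets the same bonus, so an episode that ends
    # with more time remaining weighs more heavily in training.
    return [r + bonus for r in rewards]


# Hypothetical usage once the game is over.
episode_rewards = [0.0, 1.0, -0.5]
shaped_rewards = add_time_bonus(episode_rewards, remain_time=30)
```

The input list is left untouched, so the raw rewards remain available for logging.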
Add multiple neural networks in the output layer: use a different network depending on which target is being faced.
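
One way to picture the per-target output networks is sketched below in plain Python. It assumes a shared hidden layer feeding one output head per target; the shared layer, the class name `MultiHeadNet`, the layer sizes, and the integer `target` index are all illustrative assumptions, not taken from the project.

```python
import random


def linear(weights, bias, x):
    """Dense layer: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


class MultiHeadNet:
    """Shared hidden layer followed by one output head per target (assumed layout)."""

    def __init__(self, n_in, n_hidden, n_actions, n_targets, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                    for _ in range(rows)]

        self.w_shared = init(n_hidden, n_in)
        self.b_shared = [0.0] * n_hidden
        # One independent (weights, bias) pair per target.
        self.heads = [(init(n_actions, n_hidden), [0.0] * n_actions)
                      for _ in range(n_targets)]

    def forward(self, state, target):
        """Compute outputs for `state` using only the head for `target`."""
        hidden = relu(linear(self.w_shared, self.b_shared, state))
        w_head, b_head = self.heads[target]
        return linear(w_head, b_head, hidden)


# Hypothetical usage: 4 state features, 3 actions, 2 possible targets.
net = MultiHeadNet(n_in=4, n_hidden=8, n_actions=3, n_targets=2)
outputs = net.forward([1.0, 0.5, -0.5, 0.2], target=1)
```

Because each head has its own weights, training on one target would not overwrite what the output layer has learned for another.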