Change learning timing to the end of each episode.
Add load and save functions. Add a train flag so the model can be run in test mode. Add a separate action-selection function for test mode. Add a decision period to skip steps.
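A minimal sketch of how a train flag and a decision period might fit together. It is not the code from this commit. It assumes that test mode picks the most probable action while training samples from the policy, and that a new decision is made only every `decision_period` steps. The names `select_action` and `should_decide` are hypothetical.

```python
import random


def select_action(probs, train, rng=random):
    """Pick a discrete action index from a list of action probabilities.

    Assumption: training samples from the distribution (exploration),
    test mode takes the most probable action (greedy).
    """
    if train:
        # Sample an index with probability proportional to its weight.
        return rng.choices(range(len(probs)), weights=probs, k=1)[0]
    # Greedy choice: index of the largest probability.
    return max(range(len(probs)), key=probs.__getitem__)


def should_decide(step, decision_period):
    """Return True on steps where a new action is requested.

    Assumption: steps in between are skipped (the previous action is kept).
    """
    return step % decision_period == 0
```

With `decision_period=3`, decisions would be made at steps 0, 3, 6, and so on. Whether skipped steps repeat the last action or do nothing is not stated in the commit message.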
Add discrete and continuous actions to the same NN model. Add model save and load. Reward is increasing and convergence was observed. These two models seem good: Aimbot_9331_1667423213_hybrid_train2 and Aimbot_9331_1667389873_hybrid.
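A rough illustration of one common way to score a hybrid action for PPO. It is not necessarily what this model does. It assumes the discrete branch is a categorical distribution over logits, the continuous branch is an independent Gaussian per dimension, and the joint log-probability is the sum of the two. All function names are hypothetical, and plain Python stands in for the NN framework.

```python
import math


def categorical_log_prob(logits, index):
    """Log-probability of `index` under softmax(logits), computed stably."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return logits[index] - log_z


def gaussian_log_prob(mean, std, value):
    """Log-density of `value` under a 1-D normal distribution."""
    var = std * std
    return -((value - mean) ** 2) / (2 * var) - math.log(std) - 0.5 * math.log(2 * math.pi)


def hybrid_log_prob(logits, disc_action, means, stds, cont_action):
    """Joint log-probability of a hybrid action.

    Assumption: the discrete and continuous parts are independent given
    the state, so their log-probabilities add.
    """
    log_p = categorical_log_prob(logits, disc_action)
    log_p += sum(gaussian_log_prob(m, s, a) for m, s, a in zip(means, stds, cont_action))
    return log_p
```

Under this assumption the PPO probability ratio can be computed from a single summed log-probability, so both action types share one loss term.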
Add weight and bias sync.
Parallel-environment discrete PPO finished. Runnable.