Reinforcement Learning Tutorial Series
Introduction
Q-learning
Sarsa
Deep Q Network
Policy Gradient
Actor Critic
- 6.1 What is Actor Critic
- 6.2 Actor Critic (TensorFlow)
- 6.3 What is Deep Deterministic Policy Gradient (DDPG)
- 6.4 Deep Deterministic Policy Gradient (DDPG) (TensorFlow)
- 6.5 What is Asynchronous Advantage Actor-Critic (A3C)
- 6.6 Asynchronous Advantage Actor-Critic (A3C) (TensorFlow)
- 6.7 Distributed Proximal Policy Optimization (DPPO) (TensorFlow)
Model Based RL