RLpapers: List of Reinforcement Learning Papers and Codes
- Author: Tyler Kim (@tylertaewook)
RLpapers
This repo lists the books, papers, and code I have used on my own RL journey. I will keep updating it with my annotated papers [paper_annot] and my code implementations [code] as I go. While studying, I will also maintain an RL Jargon List as an organized place to record unfamiliar terms.
The papers that are numbered below are part of our curriculum at DIYA's MultiAgentRL team, where our goal is to read two papers and review them together every week. The other papers are ones I read to supplement my understanding whenever I have extra time.
First and foremost, the basics:
- Richard Sutton's Reinforcement Learning: An Introduction
- David Silver's RL Course
Value Based
- Playing Atari with Deep Reinforcement Learning, 2013. Algorithm:
DQN
[paper_annot] [code]
- Human-level control through deep reinforcement learning, 2015. Algorithm:
DQN
[paper]
- Multiagent Cooperation and Competition with Deep Reinforcement Learning [paper]
- Deep Recurrent Q-Learning for Partially Observable MDPs, Hausknecht and Stone, 2015. Algorithm:
Deep Recurrent Q-Learning (DRQN)
[paper_annot]
- Prioritized Experience Replay, Schaul et al, 2015. Algorithm:
Prioritized Experience Replay (PER)
[paper]
- Dueling [paper]
- Distributional [paper]
- Rainbow [paper]
- Quantile Regression [paper]
DPG
- Deterministic Policy Gradient Algorithms
DPG
[paper]
- Continuous Control With Deep Reinforcement Learning, Lillicrap et al, 2015. Algorithm:
DDPG
[paper]
- Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments, Lowe et al, 2017. Algorithm:
MADDPG
[paper]
- Addressing Function Approximation Error in Actor-Critic Methods [paper]
Policy Gradient
- Trust Region Policy Optimization, Schulman et al, 2015. Algorithm:
TRPO
[paper_annot]
- High-Dimensional Continuous Control Using Generalized Advantage Estimation, Schulman et al, 2015. Algorithm:
GAE
- Proximal Policy Optimization Algorithms, Schulman et al, 2017. Algorithm:
PPO-Clip, PPO-Penalty
[paper_annot]
- Emergent Complexity via Multi-Agent Competition, Bansal et al, 2017.
- Adversarial Policies: Attacking Deep Reinforcement Learning https://arxiv.org/pdf/1905.10615.pdf
MARL
- Counterfactual Multi-Agent Policy Gradients, Foerster et al. Algorithm:
COMA
- Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning, Rashid et al. 2018. Algorithm:
QMIX
- Autocurricula and the Emergence of Innovation from Social Interaction: A Manifesto for Multi-Agent Intelligence Research
- Deep Reinforcement Learning from Self-Play in Imperfect-Information Games. Algorithm:
NFSP (Neural Fictitious Self-Play)
- Human-level performance in 3D multiplayer games with population-based reinforcement learning, Jaderberg et al. https://science.sciencemag.org/content/sci/364/6443/859.full.pdf
- Asymmetric Self-Play for Automatic Goal Discovery in Robotic Manipulation. Topics: asymmetric self-play & HRL
- A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning. Algorithm:
PSRO/DCH
A2C/A3C
SAC
- Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, 2018 [paper]
- Soft Actor-Critic Algorithms and Applications, 2019 [paper]