RLpapers: List of Reinforcement Learning Papers and Codes



This repo lists all the books, papers, and code I have used on my own RL journey. I will keep updating it with my annotated papers [paper_annot] and implemented code [code] as I continue. As I study, I will also maintain an RL Jargons List as an organized place to record unfamiliar terms.

The numbered papers below are part of our curriculum at DIYA's MultiAgentRL team, where our goal is to read two papers each week and review them together. The other papers are included to supplement my understanding whenever I have extra time.

First and foremost, the basics:

Value Based

  1. Playing Atari with Deep Reinforcement Learning, 2013. Algorithm: DQN [paper_annot] [code]
  • Human-level control through deep reinforcement learning, 2015. Algorithm: DQN [paper]
  1. Multiagent Cooperation and Competition with Deep Reinforcement Learning [paper]
  2. Deep Recurrent Q-Learning for Partially Observable MDPs, Hausknecht and Stone, 2015. Algorithm: Deep Recurrent Q-Learning (DRQN) [paper_annot]
  • Prioritized Experience Replay, Schaul et al, 2015. Algorithm: Prioritized Experience Replay (PER) [paper]
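  • Dueling Network Architectures for Deep Reinforcement Learning, Wang et al, 2015. Algorithm: Dueling DQN [paper]
  • A Distributional Perspective on Reinforcement Learning, Bellemare et al, 2017. Algorithm: C51 [paper]
  • Rainbow: Combining Improvements in Deep Reinforcement Learning, Hessel et al, 2017. Algorithm: Rainbow [paper]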
  • Distributional Reinforcement Learning with Quantile Regression, Dabney et al, 2017. Algorithm: QR-DQN [paper]


  • Deterministic Policy Gradient Algorithms, Silver et al, 2014. Algorithm: DPG [paper]
  1. Continuous Control With Deep Reinforcement Learning, Lillicrap et al, 2015. Algorithm: DDPG [paper]
  2. Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments, Lowe et al. 2017. Algorithm: MADDPG [paper]
  • Addressing Function Approximation Error in Actor-Critic Methods, Fujimoto et al, 2018. Algorithm: TD3 [paper]

Policy Gradient

  1. Trust Region Policy Optimization, Schulman et al, 2015. Algorithm: TRPO [paper_annot]
  2. High-Dimensional Continuous Control Using Generalized Advantage Estimation, Schulman et al, 2015. Algorithm: GAE
  3. Proximal Policy Optimization Algorithms, Schulman et al, 2017. Algorithm: PPO-Clip, PPO-Penalty [paper_annot]
  4. Emergent Complexity via Multi-Agent Competition, Bansal et al, 2017.
  5. Adversarial Policies: Attacking Deep Reinforcement Learning


  1. Counterfactual Multi-Agent Policy Gradients, Foerster et al. Algorithm: COMA
  2. Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning, Rashid et al. 2018. Algorithm: QMIX
  3. Autocurricula and the Emergence of Innovation from Social Interaction: A Manifesto for Multi-Agent Intelligence Research
  4. Deep Reinforcement Learning from Self-Play in Imperfect-Information Games. Algorithm: Neural Fictitious Self-Play (NFSP)
  5. Human-level performance in 3D multiplayer games with population-based reinforcement learning, Jaderberg et al.
  7. A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning. Algorithm: PSRO/DCH


  • Asynchronous Methods for Deep Reinforcement Learning [paper] [code]


  • Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, 2018 [paper]
  • Soft Actor-Critic Algorithms and Applications, 2019 [paper]


  • Multi-Agent Reinforcement Learning: A Selective Overview of Theories and Algorithms [paper]
  • A Survey and Critique of Multiagent Deep Reinforcement Learning [paper]