RLpapers

This repo lists all the books, papers, and code I have used on my RL journey. I will keep updating it with my own annotated papers [paper_annot] and code implementations [code] as I go. While studying, I will also maintain an RL Jargon List as an organized place to collect unfamiliar terms.

The numbered papers below are part of our curriculum on DIYA's MultiAgentRL team, where our goal is to read two papers every week and review them together. The other papers supplement my understanding whenever I have extra time.

First and foremost, the basics:

Richard Sutton's Reinforcement Learning: An Introduction
David Silver's RL Course

Value-Based

Playing Atari with Deep Reinforcement Learning, Mnih et al., 2013. Algorithm: DQN [paper_annot] [code]

Human-level control through deep reinforcement learning, Mnih et al., 2015. Algorithm: DQN [paper]

Multiagent Cooperation and Competition with Deep Reinforcement Learning, Tampuu et al., 2015 [paper]
Deep Recurrent Q-Learning for Partially Observable MDPs, Hausknecht and Stone, 2015. Algorithm: Deep Recurrent Q-Learning (DRQN) [paper_annot]

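Prioritized Experience Replay, Schaul et al., 2015. Algorithm: Prioritized Experience Replay (PER) [paper]
Dueling Network Architectures for Deep Reinforcement Learning, Wang et al., 2015. Algorithm: Dueling DQN [paper]
A Distributional Perspective on Reinforcement Learning, Bellemare et al., 2017. Algorithm: C51 [paper]
Rainbow: Combining Improvements in Deep Reinforcement Learning, Hessel et al., 2017. Algorithm: Rainbow [paper]
Distributional Reinforcement Learning with Quantile Regression, Dabney et al., 2017. Algorithm: QR-DQN [paper]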
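To make the proportional variant of PER concrete, here is a minimal pure-Python sketch. It assumes priorities p_i = |δ_i| + ε built from each transition's TD error, sampling probabilities P(i) = p_i^α / Σ_k p_k^α, and importance-sampling weights w_i = (N·P(i))^(-β) normalized by the largest weight. The function names and default alpha/beta/eps values are my own illustrative choices, not taken from a reference implementation. This sketch also normalizes by the maximum over the whole buffer; some implementations use the batch maximum instead.

```python
import random


def per_priorities(td_errors, eps=1e-6):
    # Proportional prioritization: p_i = |delta_i| + eps (eps keeps every transition sampleable)
    return [abs(d) + eps for d in td_errors]


def per_probabilities(priorities, alpha=0.6):
    # P(i) = p_i^alpha / sum_k p_k^alpha; alpha = 0 recovers uniform replay
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]


def per_importance_weights(probs, beta=0.4):
    # w_i = (N * P(i))^(-beta), divided by the largest weight in the buffer
    # so the correction only ever scales gradient updates down
    n = len(probs)
    raw = [(n * p) ** (-beta) for p in probs]
    max_w = max(raw)
    return [w / max_w for w in raw]


def per_sample(probs, batch_size, rng=random):
    # Draw transition indices with replacement according to P(i)
    return rng.choices(range(len(probs)), weights=probs, k=batch_size)


if __name__ == "__main__":
    probs = per_probabilities(per_priorities([0.5, -2.0, 0.1, 1.0]))
    idx = per_sample(probs, batch_size=3, rng=random.Random(0))
    weights = per_importance_weights(probs)
    print(idx, [round(weights[i], 3) for i in idx])
```

In a full agent, the weights would multiply each sampled transition's TD loss, and β is usually annealed toward 1 over training.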

DPG

Deterministic Policy Gradient Algorithms, Silver et al., 2014. Algorithm: DPG [paper]

Continuous Control With Deep Reinforcement Learning, Lillicrap et al., 2015. Algorithm: DDPG [paper]
Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments, Lowe et al., 2017. Algorithm: MADDPG [paper]

Addressing Function Approximation Error in Actor-Critic Methods, Fujimoto et al., 2018. Algorithm: TD3 [paper]
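The TD3 paper's critic target combines clipped double-Q learning (use the smaller of two target critics) with target policy smoothing (add clipped noise to the target action). Below is a minimal scalar sketch, assuming a 1-D action bounded in [-1, 1] and plain Python callables standing in for the target actor and the two target critics. All names are hypothetical, and the defaults (γ = 0.99, noise σ = 0.2, clip 0.5) are commonly used values, assumed here for illustration.

```python
import random


def td3_target(reward, next_state, done, actor_target, q1_target, q2_target,
               gamma=0.99, policy_noise=0.2, noise_clip=0.5,
               action_low=-1.0, action_high=1.0, rng=random):
    # Target policy smoothing: perturb the target action with clipped Gaussian noise
    noise = rng.gauss(0.0, policy_noise)
    noise = max(-noise_clip, min(noise_clip, noise))
    next_action = actor_target(next_state) + noise
    # Keep the smoothed action inside the valid action range
    next_action = max(action_low, min(action_high, next_action))
    # Clipped double-Q: take the smaller of the two target critics to curb overestimation
    q_next = min(q1_target(next_state, next_action),
                 q2_target(next_state, next_action))
    # No bootstrapping past a terminal transition (done = 1.0)
    return reward + gamma * (1.0 - done) * q_next
```

Both critics are then regressed toward this single shared target, while the actor is updated less often than the critics (the paper's "delayed" policy updates).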

Policy Gradient

Trust Region Policy Optimization, Schulman et al., 2015. Algorithm: TRPO [paper_annot]
High-Dimensional Continuous Control Using Generalized Advantage Estimation, Schulman et al., 2015. Algorithm: GAE
Proximal Policy Optimization Algorithms, Schulman et al., 2017. Algorithm: PPO-Clip, PPO-Penalty [paper_annot]
Emergent Complexity via Multi-Agent Competition, Bansal et al., 2017.
Adversarial Policies: Attacking Deep Reinforcement Learning, Gleave et al., 2019. https://arxiv.org/pdf/1905.10615.pdf
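GAE estimates advantages as an exponentially weighted sum of TD residuals, δ_t = r_t + γV(s_{t+1}) - V(s_t) and Â_t = Σ_l (γλ)^l δ_{t+l}, which can be computed with one backward pass over a rollout. The following is a minimal sketch for a single rollout. It assumes dones[t] = 1 means the episode ended after step t and last_value is the critic's estimate for the state following the final step. The function name and the default γ and λ are illustrative assumptions.

```python
def compute_gae(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    """Backward-recursive GAE over one rollout; returns (advantages, returns)."""
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value
    for t in reversed(range(len(rewards))):
        not_done = 1.0 - dones[t]
        # TD residual; do not bootstrap across an episode boundary
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # A_t = delta_t + gamma * lambda * A_{t+1}, reset at episode ends
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
        next_value = values[t]
    # Advantage + baseline gives the lambda-return, a common value-function target
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns
```

Setting lam=0 reduces each advantage to the one-step TD residual (low variance, more bias), while lam=1 gives the Monte Carlo return minus the baseline (unbiased, higher variance). PPO implementations typically feed these advantages into the clipped surrogate objective.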

MARL

Counterfactual Multi-Agent Policy Gradients, Foerster et al., 2017. Algorithm: COMA
QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning, Rashid et al., 2018. Algorithm: QMIX
Autocurricula and the Emergence of Innovation from Social Interaction: A Manifesto for Multi-Agent Intelligence Research, Leibo et al., 2019.
Deep Reinforcement Learning from Self-Play in Imperfect-Information Games, Heinrich and Silver, 2016. Algorithm: NFSP (Neural Fictitious Self-Play)
Human-level performance in 3D multiplayer games with population-based reinforcement learning, Jaderberg et al., 2019. https://science.sciencemag.org/content/sci/364/6443/859.full.pdf
Asymmetric Self-Play for Automatic Goal Discovery in Robotic Manipulation, OpenAI et al., 2021. Topics: asymmetric self-play, HRL
A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning, Lanctot et al., 2017. Algorithm: PSRO/DCH
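COMA's central idea is a counterfactual baseline. For agent a, it marginalizes out that agent's own action under its current policy while holding the other agents' actions fixed: A^a(s, u) = Q(s, u) - Σ_{u'^a} π^a(u'^a | τ^a) Q(s, (u^{-a}, u'^a)). The sketch below handles one agent with discrete actions. It assumes the centralized critic has already produced q_values[k] = Q(s, (u^{-a}, k)) for every action k of that agent. The function and argument names are hypothetical.

```python
def coma_advantage(q_values, policy_probs, taken_action):
    """Counterfactual advantage for a single agent with discrete actions.

    q_values[k]     -- critic's Q(s, (u^{-a}, k)): joint action with only this agent's action set to k
    policy_probs[k] -- this agent's current policy pi^a(k | tau^a)
    taken_action    -- index of the action the agent actually took
    """
    if len(q_values) != len(policy_probs):
        raise ValueError("q_values and policy_probs must cover the same action set")
    # Baseline: expected Q when only this agent's action is resampled from its own policy
    baseline = sum(p * q for p, q in zip(policy_probs, q_values))
    return q_values[taken_action] - baseline
```

Because the baseline does not depend on the agent's chosen action, subtracting it leaves the policy gradient unbiased while crediting each agent only for the difference its own action made.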

A2C/A3C

Asynchronous Methods for Deep Reinforcement Learning, Mnih et al., 2016. Algorithm: A3C [paper] [code]

SAC

Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, Haarnoja et al., 2018. Algorithm: SAC [paper]
Soft Actor-Critic Algorithms and Applications, Haarnoja et al., 2019 [paper]
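In the "Algorithms and Applications" formulation, which drops the separate state-value network, the critic target is y = r + γ(1 - d)(min_i Q'_i(s', a') - α log π(a' | s')), with a' freshly sampled from the current policy. The policy is a tanh-squashed Gaussian, so its log-probability needs a change-of-variables correction. Below is a minimal scalar sketch of both pieces, assuming a 1-D action. The names and default values are illustrative, and the 1e-6 term is a numerical-stability assumption rather than something taken from the paper.

```python
import math
import random


def sample_tanh_gaussian(mean, log_std, rng=random):
    # Reparameterized sample: u = mean + std * eps, then squash with tanh into (-1, 1)
    std = math.exp(log_std)
    u = mean + std * rng.gauss(0.0, 1.0)
    action = math.tanh(u)
    # log N(u; mean, std) for a scalar Gaussian
    log_prob = -0.5 * ((u - mean) / std) ** 2 - log_std - 0.5 * math.log(2.0 * math.pi)
    # Change of variables for a = tanh(u): subtract log|da/du| = log(1 - tanh(u)^2)
    log_prob -= math.log(1.0 - action ** 2 + 1e-6)
    return action, log_prob


def sac_critic_target(reward, done, q1_next, q2_next, next_log_prob, alpha=0.2, gamma=0.99):
    # Soft value of the next state: pessimistic Q minus the entropy-weighted log-probability
    soft_value = min(q1_next, q2_next) - alpha * next_log_prob
    # No bootstrapping past a terminal transition (done = 1.0)
    return reward + gamma * (1.0 - done) * soft_value
```

The same paper also tunes the temperature α automatically toward a target entropy. This sketch keeps α fixed for simplicity.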

Survey

Multi-Agent Reinforcement Learning: A Selective Overview of Theories and Algorithms, Zhang et al., 2019 [paper]
A Survey and Critique of Multiagent Deep Reinforcement Learning, Hernandez-Leal et al., 2019 [paper]