Playing hard exploration games by watching YouTube

Authors: Yusuf Aytar, Tobias Pfaff, David Budden, Tom Le Paine, Ziyu Wang, Nando de Freitas (DeepMind)
Paper link: https://arxiv.org/pdf/1805.11592.pdf
YouTube link: https://youtu.be/Msy82sIfprI

A Natural Policy Gradient: Supplementary Material

CS294 also covers NPG, in lecture 13. The link is as follows.

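From the reinforcement learning papers accepted to ICLR 2018, I put together a list of the ones that interest me. There are many other papers, but I chose these according to my own taste. The ones I want to read first are (2), (10), and (12).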

(1) Universal Agent for Disentangling Environments and Tasks
- Why I want to read it: the approach of splitting the agent into two parts to learn a task looks interesting
- abstract: Recent state-of-the-art reinforcement learning algorithms are trained under the goal of excelling in one specific task. Hence, both environment and task specific knowledge are entangled into one framework. However, there are often scenarios where the environment (e.g. the physical world) is fixed while only the target...