MLSP-32.4
EFFICIENT AND STABLE INFORMATION DIRECTED EXPLORATION FOR CONTINUOUS REINFORCEMENT LEARNING
Mingzhe Chen, Xi Xiao, Wanpeng Zhang, Tsinghua University, China; Xiaotian Gao, Microsoft Research Asia, China
Session: Reinforcement Learning I
Track: Machine Learning for Signal Processing
Location: Gather Area H
Presentation Time:
Wed, 11 May, 20:00 - 20:45 China Time (UTC +8)
Wed, 11 May, 12:00 - 12:45 UTC
Session Chair: Hoi To Wai, The Chinese University of Hong Kong
Session MLSP-32
MLSP-32.1: POPO: PESSIMISTIC OFFLINE POLICY OPTIMIZATION
Qiang He, Xinwen Hou, Yu Liu, Institute of Automation, Chinese Academy of Sciences, China
MLSP-32.2: BYZANTINE-ROBUST FEDERATED DEEP DETERMINISTIC POLICY GRADIENT
Qifeng Lin, Qing Ling, School of Computer Science and Engineering, Sun Yat-Sen University, China
MLSP-32.3: IMPROVING ACTOR-CRITIC REINFORCEMENT LEARNING VIA HAMILTONIAN MONTE CARLO METHOD
Duo Xu, Faramarz Fekri, Georgia Institute of Technology, United States of America
MLSP-32.4: EFFICIENT AND STABLE INFORMATION DIRECTED EXPLORATION FOR CONTINUOUS REINFORCEMENT LEARNING
Mingzhe Chen, Xi Xiao, Wanpeng Zhang, Tsinghua University, China; Xiaotian Gao, Microsoft Research Asia, China
MLSP-32.5: HYPERGRAPH-BASED REINFORCEMENT LEARNING FOR STOCK PORTFOLIO SELECTION
Xiaojie Li, Chaoran Cui, Donglin Cao, Juan Du, Chunyun Zhang, Shandong University of Finance and Economics, China