YIN Chang-ming, WANG Han-xing, ZHAO Fei. Risk-Sensitive Reinforcement Learning Algorithms With Generalized Average Criterion[J]. Applied Mathematics and Mechanics, 2007, 28(3): 369-378.

Risk-Sensitive Reinforcement Learning Algorithms With Generalized Average Criterion

  • Received Date: 2006-02-20
  • Rev Recd Date: 2007-01-16
  • Publish Date: 2007-03-15
  • A new algorithm is proposed that sacrifices some optimality of control policies in exchange for more robust solutions. Robustness can be a crucial property of a learning system when there is a mismatch between the theoretical model and the practical physical system, when the practical system is non-stationary, or when the availability of a control action changes over time. The main contribution is a set of approximation algorithms together with their convergence results. By applying a generalized average operator in place of the usual optimal operator max (or min), an important class of learning algorithms, dynamic programming algorithms, is studied, and their convergence is discussed from a theoretical point of view. The purpose is to improve the robustness of reinforcement learning algorithms theoretically.
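The replacement of the max operator by a generalized average, as described in the abstract, can be sketched in a few lines. This is a minimal illustration, not the paper's algorithm: it assumes a Boltzmann-weighted average as one common instance of a generalized average operator, and the names `generalized_average`, `soft_value_iteration`, the arrays `P`/`R`, and the temperature parameter `beta` are all illustrative choices, not taken from the paper.

```python
import numpy as np

def generalized_average(q_values, beta=5.0):
    """Boltzmann-weighted average of action values.

    As beta -> infinity this approaches max(q_values); as beta -> 0 it
    approaches the plain mean, trading optimality for robustness.
    """
    w = np.exp(beta * (q_values - np.max(q_values)))  # shift for numerical stability
    return float(np.dot(w, q_values) / np.sum(w))

def soft_value_iteration(P, R, gamma=0.9, beta=5.0, tol=1e-8, max_iter=10_000):
    """Value iteration with the max operator replaced by a generalized average.

    P: transition probabilities, shape (S, A, S); R: rewards, shape (S, A).
    """
    S, A, _ = P.shape
    V = np.zeros(S)
    for _ in range(max_iter):
        Q = R + gamma * (P @ V)  # expected backed-up values, shape (S, A)
        V_new = np.array([generalized_average(Q[s], beta) for s in range(S)])
        if np.max(np.abs(V_new - V)) < tol:
            break
        V = V_new
    return V
```

With a large `beta` the iteration recovers ordinary value iteration; smaller values of `beta` smooth the operator, which is the kind of softening the generalized average criterion exploits to gain robustness at a small cost in optimality.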
