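Risk-Sensitive Reinforcement Learning Algorithms With Generalized Average Criterion

YIN Chang-ming (殷苌茗); WANG Han-xing (王汉兴); ZHAO Fei (赵飞)

1. College of Computer and Communicational Engineering, Changsha University of Science and Technology, Changsha 410076, P. R. China; 2. Department of Mathematics, College of Sciences, Shanghai University, Shanghai 200444, P. R. China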

Foundation item: Supported by the National Natural Science Foundation of China (10471088; 60572126)

Author biography:
YIN Chang-ming (1964- ), male, from Hunan, associate professor, Ph.D. (corresponding author. Tel: +86-731-5542939; E-mail: yinchm@csust.edu.cn).

CLC number: O23; TP182
Publication history
- Received: 2006-02-20
- Revised: 2007-01-16
- Published: 2007-03-15

Abstract: A new algorithm is proposed that potentially sacrifices the optimality of control policies in order to gain robustness. Robustness can become a very important issue when the theoretical model does not match the actual physical system, when the actual system is non-stationary, or when the availability of control actions changes over time. The main contribution is a set of approximation algorithms together with their convergence results. The generalized average operator is used in place of the usual optimal operator max (or min) to study dynamic programming algorithms, one of the most important classes of algorithms in reinforcement learning, and their convergence is discussed. The aim is to improve the robustness of reinforcement learning algorithms. A more general risk-sensitive performance criterion is also used, and it is found that the general conclusions for dynamic-programming-based learning algorithms do not hold completely under this criterion.

Keywords:
- reinforcement learning
- risk-sensitive
- generalized average
- algorithm
- convergence
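
The abstract's central idea, replacing the max (or min) operator in dynamic programming with a generalized average, can be illustrated with a minimal sketch. The abstract does not define the operator, so the log-mean-exp form below is an assumption chosen for illustration, not necessarily the one used in the paper. The sketch also uses an ordinary discounted criterion rather than the risk-sensitive one. The names `generalized_average`, `soft_value_iteration`, `beta`, and the two-state example are hypothetical.

```python
import math


def generalized_average(values, beta):
    """Log-mean-exp average of `values` (an assumed form, not from the paper).

    (1/beta) * log((1/n) * sum_i exp(beta * v_i))
    beta -> 0 gives the arithmetic mean, beta -> +inf approaches max,
    and beta -> -inf approaches min.
    """
    if beta == 0:
        return sum(values) / len(values)
    # Shift by the extreme value so every exponent is <= 0 (no overflow).
    m = max(values) if beta > 0 else min(values)
    s = sum(math.exp(beta * (v - m)) for v in values) / len(values)
    return m + math.log(s) / beta


def soft_value_iteration(P, R, gamma, beta, tol=1e-10, max_iter=10_000):
    """Value iteration with max over actions replaced by the average above.

    P[s][a] is a list of (next_state, probability) pairs.
    R[s][a] is the expected immediate reward.
    """
    V = [0.0] * len(P)
    for _ in range(max_iter):
        V_new = []
        for s in range(len(P)):
            # One-step lookahead value of each action in state s.
            q = [R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])
                 for a in range(len(P[s]))]
            V_new.append(generalized_average(q, beta))
        if max(abs(x - y) for x, y in zip(V, V_new)) < tol:
            return V_new
        V = V_new
    return V


# Hypothetical two-state, two-action deterministic example.
P = [
    [[(0, 1.0)], [(1, 1.0)]],  # state 0: stay, or move to state 1
    [[(1, 1.0)], [(0, 1.0)]],  # state 1: stay, or move to state 0
]
R = [[0.0, 1.0], [2.0, 0.0]]

V_mean = soft_value_iteration(P, R, gamma=0.9, beta=0.0)    # plain averaging
V_sharp = soft_value_iteration(P, R, gamma=0.9, beta=50.0)  # close to max
```

For this assumed operator the backup stays a contraction with modulus `gamma`, because the log-mean-exp average is monotone and shifts by a constant when its inputs do. A small `beta` spreads weight over all actions, which gives up some optimality in return for less dependence on any single action remaining available. A large `beta` brings the values close to those of standard value iteration.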

