| [1] | Sutton R S.Learning to predict by the method of temporal difference[J].Machine Learning,1988,3(1):9-44. | 
		
				| [2] | Sutton R S. Open the oretical questions in reinforcement learning[A].In:Proc of Euro COLT'99(Computational Learning Theory)[C].Cambridge, MA: MIT Press,1999,11-17. | 
		
				| [3] | Sutton R S,Barto A G.Reinforcement Learning: An Introduction[M].Massachusetts: MIT Press, 1998, 20-300. | 
		
				| [4] | Watkins C J C H,Dayan P.Q-learning[J].Machine Learning,1992,8(13):279-292. | 
		
				| [5] | Watkins C J C H. Learning from delayed rewards[D].England:University of Cambridge,1989. | 
		
				| [6] | Bertsekas D P,Tsitsiklis J N.Parallel and Distributed Computation: Numerical Methods[M].Englewood Cliffs, New Jersey: Prentice-Hall,1989,10-109. | 
		
				| [7] | YIN Chang-ming,CHEN Huan-wen,XIE Li-juan. A Relative Value Iteration Q-learning Algorithm and its Convergence Based-on Finite Samples[J].Journal of Computer Research and Development,2002,39(9):1064-1070. | 
		
				| [8] | YIN Chang-ming,CHEN Huan-wen,XIE Li-juan.Optimality cost relative value iteration Q-learning algorithm based on finite samples[J].Journal of Computer Engineering and Applications,2002,38(11):65-67. | 
		
				| [9] | Wiering M, Schmidhuber J.Speeding up Q-learning[A].In:Proc of the 10th European Conf on Machine Learning[C].Germany:Springer-Verlag,1998,352-363. | 
		
				| [10] | Singh S.Soft dynamic programming algorithms: convergence proofs[A].In:Proceedings of Workshop on Computational Learning and Natural Learning (CLNL)[C].Massachusetts:Town of Provinceton.University of Massachuetts,1993. | 
		
				| [11] | Cavazos-Cadena R,Montes-de-Oca R.The value iteration algorithm in risk-sensitive average Markov decision chains with finite state[J].Mathematics of Operations Research,2003,28(4):752-776. doi:  10.1287/moor.28.4.752.20515 | 
		
				| [12] | Peng J,Williams R.Incremental multi-step Q-learning[J].Machine Learning,1996,22(4):283-290. | 
		
				| [13] | Singh S. Reinforcement learning algorithm for average-payoff Markovian decision processes[A].Procedins of the 12th National Conference on Artificial Intelligence[C].Taho city:Ca Morgan Kaufmann,1994,1:700-705. |