Abstract
In this research, a stable biped walking pattern is generated using reinforcement learning. The walking pattern is chosen as a simple third-order polynomial, which requires four boundary conditions to be fully determined. The initial position and velocity and the final position and velocity of the joint are selected as these boundary conditions. The initial and final positions can be set to achieve a desired motion or posture. A reinforcement learning algorithm is used to find a proper boundary condition value, with the final velocity of the walking pattern chosen as the learning parameter. To test the algorithm, a simulator that takes into account the whole model of the robot and its environment is developed, and the algorithm is verified in simulation.
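As an illustration of how four boundary conditions determine a third-order polynomial, the following minimal sketch computes the coefficients of a cubic joint trajectory q(t) = a0 + a1 t + a2 t^2 + a3 t^3 on the interval [0, T]. The function names, the variable names (q0, v0, qf, vf, T), and the choice of a time interval starting at zero are assumptions made for this example and are not taken from the original work. The reinforcement learning step itself is not shown; in this sketch the final velocity vf is simply the argument that such a learner would adjust.

```python
def cubic_coefficients(q0, v0, qf, vf, T):
    """Return (a0, a1, a2, a3) of q(t) = a0 + a1*t + a2*t**2 + a3*t**3.

    Boundary conditions (assumed for this sketch):
        q(0) = q0, q'(0) = v0, q(T) = qf, q'(T) = vf
    """
    if T <= 0:
        raise ValueError("T must be positive")
    dq = qf - q0
    a0 = q0
    a1 = v0
    a2 = (3.0 * dq - (2.0 * v0 + vf) * T) / T**2
    a3 = (-2.0 * dq + (v0 + vf) * T) / T**3
    return a0, a1, a2, a3


def position(coeffs, t):
    """Evaluate the cubic at time t using Horner's scheme."""
    a0, a1, a2, a3 = coeffs
    return a0 + t * (a1 + t * (a2 + t * a3))


def velocity(coeffs, t):
    """Evaluate the first derivative of the cubic at time t."""
    _, a1, a2, a3 = coeffs
    return a1 + t * (2.0 * a2 + t * 3.0 * a3)
```

For example, with q0 = 0, qf = 1, zero initial and final velocities, and T = 1, `cubic_coefficients` gives (0, 0, 3, -2), and the position at t = 0.5 is 0.5. Changing only vf reshapes the trajectory while leaving the start and end positions fixed, which is why the final velocity can serve as a single tunable parameter.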