Overview
Techniques & Concepts
- Markov Decision Processes (MDPs): Modeled environments as states, actions, transition probabilities, and rewards.
- Value iteration: Computed optimal value functions and policies with dynamic programming (see the first sketch after this list).
- Q-learning: Implemented model-free learning that estimates action values (Q-values) directly from experience (see the second sketch after this list).
- Exploration vs. exploitation: Tuned ε-greedy strategies to balance trying unfamiliar actions against exploiting the best-known ones.
- Approximate Q-learning: Used feature-based (linear) representations to generalize across Pacman's large state space (see the last sketch after this list).
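
The value-iteration bullet refers to repeatedly applying the Bellman optimality update until state values stop changing. Below is a minimal sketch on a made-up two-state MDP; the states, transition probabilities, and rewards are illustrative placeholders, not the project's Gridworld.

```python
# Toy value iteration. Bellman update:
#   V(s) <- max_a sum_s' T(s,a,s') * [R(s,a,s') + gamma * V(s')]
# transitions[state][action] = list of (next_state, probability, reward)
transitions = {
    "A": {"stay": [("A", 1.0, 0.0)], "go": [("B", 0.8, 0.0), ("A", 0.2, 0.0)]},
    "B": {"stay": [("B", 1.0, 1.0)], "go": [("A", 1.0, 0.0)]},
}
gamma = 0.9  # discount factor

def q_value(V, outcomes):
    """Expected discounted return of one action under the current value estimates."""
    return sum(p * (r + gamma * V[s2]) for s2, p, r in outcomes)

V = {s: 0.0 for s in transitions}
for _ in range(100):  # fixed number of sweeps; plenty for this toy MDP to converge
    V = {s: max(q_value(V, o) for o in acts.values()) for s, acts in transitions.items()}

# Extract the greedy policy from the converged values
policy = {s: max(acts, key=lambda a: q_value(V, acts[a])) for s, acts in transitions.items()}
print(V)       # converged state values
print(policy)  # e.g. {'A': 'go', 'B': 'stay'}
```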
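The Q-learning and ε-greedy bullets boil down to the update Q(s,a) ← Q(s,a) + α[r + γ max_a' Q(s',a') − Q(s,a)], with ε controlling how often a random action is tried. The sketch below assumes a hypothetical `env` object exposing `reset()` and `step(action) -> (next_state, reward, done)` and a fixed action list; the hyperparameter values are placeholders, not the project's settings.

```python
import random
from collections import defaultdict

def epsilon_greedy(Q, state, actions, epsilon):
    """With probability epsilon explore a random action; otherwise exploit the best known."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def q_learning(env, actions, episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1):
    """Tabular Q-learning; `env` is an assumed interface, not the project's GridworldEnvironment."""
    Q = defaultdict(float)  # Q[(state, action)] defaults to 0.0
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            action = epsilon_greedy(Q, state, actions, epsilon)
            next_state, reward, done = env.step(action)
            # Temporal-difference update toward the sampled target
            best_next = max(Q[(next_state, a)] for a in actions)
            target = reward + gamma * best_next * (not done)
            Q[(state, action)] += alpha * (target - Q[(state, action)])
            state = next_state
    return Q
```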
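For approximate Q-learning, the Q-value is a weighted sum of features, Q(s,a) = Σ_i w_i · f_i(s,a), and each weight shifts in proportion to its feature value times the TD error. In the sketch below the `features` dictionary stands in for a feature extractor (not shown), and the learning rate and discount are illustrative defaults.

```python
def approx_q_value(weights, features):
    """Linear Q-value: sum of weight * feature value over the active features."""
    return sum(weights.get(f, 0.0) * v for f, v in features.items())

def approx_q_update(weights, features, reward, next_best_q, gamma=0.9, alpha=0.01):
    """Update each weight toward reducing the TD error for the (state, action) just taken.
    `features` maps feature names to values; the extractor producing it is assumed."""
    difference = (reward + gamma * next_best_q) - approx_q_value(weights, features)
    for f, value in features.items():
        weights[f] = weights.get(f, 0.0) + alpha * difference * value
    return weights
```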
Results
- Gridworld agent converges to optimal or near-optimal policies, depending on the reward structure.
- Crawler agent learns stable walking behavior through trial and error.
- Pacman agent learns to collect food and avoid ghosts without an explicit environment model.
What I Learned
- How reward design and discount factors shape agent behavior.
- The trade-offs between model-based (value iteration) and model-free (Q-learning) methods.
- How function approximation enables RL in larger, more complex state spaces.