Overview
Techniques & Concepts
- Markov Decision Processes (MDPs): Modeled environments as states, actions, transition probabilities, and rewards.
- Value iteration: Computed optimal value functions and policies with dynamic programming (see the first sketch after this list).
- Q-learning: Implemented model-free learning that estimates action values (Q-values) directly from experience (see the second sketch after this list).
- Exploration vs. exploitation: Tuned ε-greedy strategies to balance trying unfamiliar actions against exploiting the best-known ones.
- Approximate Q-learning: Used feature-based (linear) representations to generalize across Pacman's large state space (see the last sketch after this list).
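
The value-iteration bullet refers to repeatedly applying the Bellman optimality update until state values stop changing. Below is a minimal sketch on a made-up two-state MDP; the states, transition probabilities, and rewards are illustrative placeholders, not the project's Gridworld.

```python
# Toy value iteration. Bellman update:
#   V(s) <- max_a sum_s' T(s,a,s') * [R(s,a,s') + gamma * V(s')]
# transitions[state][action] = list of (next_state, probability, reward)
transitions = {
    "A": {"stay": [("A", 1.0, 0.0)], "go": [("B", 0.8, 0.0), ("A", 0.2, 0.0)]},
    "B": {"stay": [("B", 1.0, 1.0)], "go": [("A", 1.0, 0.0)]},
}
gamma = 0.9  # discount factor

def q_value(V, outcomes):
    """Expected discounted return of one action under the current value estimates."""
    return sum(p * (r + gamma * V[s2]) for s2, p, r in outcomes)

V = {s: 0.0 for s in transitions}
for _ in range(100):  # fixed number of sweeps; plenty for this toy MDP to converge
    V = {s: max(q_value(V, o) for o in acts.values()) for s, acts in transitions.items()}

# Extract the greedy policy from the converged values
policy = {s: max(acts, key=lambda a: q_value(V, acts[a])) for s, acts in transitions.items()}
print(V)       # converged state values
print(policy)  # e.g. {'A': 'go', 'B': 'stay'}
```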
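The Q-learning and ε-greedy bullets boil down to the update Q(s,a) ← Q(s,a) + α[r + γ max_a' Q(s',a') − Q(s,a)], with ε controlling how often a random action is tried. The sketch below assumes a hypothetical `env` object exposing `reset()` and `step(action) -> (next_state, reward, done)` and a fixed action list; the hyperparameter values are placeholders, not the project's settings.

```python
import random
from collections import defaultdict

def epsilon_greedy(Q, state, actions, epsilon):
    """With probability epsilon explore a random action; otherwise exploit the best known."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def q_learning(env, actions, episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1):
    """Tabular Q-learning; `env` is an assumed interface, not the project's GridworldEnvironment."""
    Q = defaultdict(float)  # Q[(state, action)] defaults to 0.0
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            action = epsilon_greedy(Q, state, actions, epsilon)
            next_state, reward, done = env.step(action)
            # Temporal-difference update toward the sampled target
            best_next = max(Q[(next_state, a)] for a in actions)
            target = reward + gamma * best_next * (not done)
            Q[(state, action)] += alpha * (target - Q[(state, action)])
            state = next_state
    return Q
```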
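For approximate Q-learning, the Q-value is a weighted sum of features, Q(s,a) = Σ_i w_i · f_i(s,a), and each weight shifts in proportion to its feature value times the TD error. In the sketch below the `features` dictionary stands in for a feature extractor (not shown), and the learning rate and discount are illustrative defaults.

```python
def approx_q_value(weights, features):
    """Linear Q-value: sum of weight * feature value over the active features."""
    return sum(weights.get(f, 0.0) * v for f, v in features.items())

def approx_q_update(weights, features, reward, next_best_q, gamma=0.9, alpha=0.01):
    """Update each weight toward reducing the TD error for the (state, action) just taken.
    `features` maps feature names to values; the extractor producing it is assumed."""
    difference = (reward + gamma * next_best_q) - approx_q_value(weights, features)
    for f, value in features.items():
        weights[f] = weights.get(f, 0.0) + alpha * difference * value
    return weights
```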
Results
- Gridworld agent converges to optimal or near-optimal policies, depending on the reward structure.
- Crawler agent learns stable walking behavior through trial and error.
- Pacman agent learns to collect food and avoid ghosts without an explicit environment model.
What I Learned
- How reward design and discount factors shape agent behavior.
- The trade-offs between model-based (value iteration) and model-free (Q-learning) methods.
- How function approximation enables RL in larger, more complex state spaces.