Q-Learning

This article describes Q-Learning, an off-policy algorithm for temporal-difference learning. It is a form of reinforcement learning in which the agent learns to assign values to state-action pairs. Because it is off-policy, Q-Learning can learn the optimal policy even when actions are selected according to a more exploratory, or even random, policy. It works by learning an action-value function that ultimately gives the expected utility of taking a given action in a given state and following the optimal policy thereafter. In noisy environments, Q-Learning can overestimate action values, because its update takes a maximum over noisy estimates, and this can slow learning.
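To make the idea concrete, here is a minimal sketch of tabular Q-Learning using the standard update Q(s, a) ← Q(s, a) + α [r + γ max over a' of Q(s', a') − Q(s, a)]. Everything specific in it is an assumption made for illustration and is not taken from the article: the five-state corridor environment, the function names step, q_learning and greedy_policy, and the values chosen for α (alpha), γ (gamma) and the number of episodes. The behaviour policy is deliberately uniformly random, to show the off-policy property described above.

```python
import random

def step(state, action, goal):
    """Hypothetical corridor: action 1 moves right, action 0 moves left.
    Reaching the goal state gives reward 1 and ends the episode."""
    next_state = min(state + 1, goal) if action == 1 else max(state - 1, 0)
    done = next_state == goal
    reward = 1.0 if done else 0.0
    return next_state, reward, done

def q_learning(n_states=5, n_actions=2, episodes=500,
               alpha=0.1, gamma=0.9, max_steps=200, seed=0):
    """Learn a Q-table while acting with a uniformly random policy."""
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    goal = n_states - 1
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            # Behaviour policy: purely random exploration.
            action = rng.randrange(n_actions)
            next_state, reward, done = step(state, action, goal)
            # Target uses the greedy (max) value of the next state,
            # regardless of which action the behaviour policy takes next.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q

def greedy_policy(q):
    """Read off the action with the highest learned value in each state."""
    return [max(range(len(row)), key=lambda a: row[a]) for row in q]

if __name__ == "__main__":
    q_table = q_learning()
    print(greedy_policy(q_table)[:-1])  # policy for the non-terminal states
```

With these settings, the greedy policy read from the table should select action 1 (move right) in every non-terminal state, even though the agent never followed that policy while learning. This small environment is deterministic, so it does not exhibit the overestimation issue mentioned above; that arises when rewards or transitions are noisy.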
