LIAO YONG Technology Space

Dive into Deep Reinforcement Learning

Background: the loss weights are uniform or manually tuned. GradNorm: Gradient Normalization for Adaptive Loss Balancing in Deep Multitask Networks. Task imbalances impede...

Overview of Industrial CTR Prediction

Hand-crafted Feature Engineering. 1. Feature Interaction Learning: no interaction, pairwise interaction (inner-product, outer-product, convolutional, attention, etc.), high-order interaction (explicit, implicit). No interaction: LR,...

Natural Policy Gradient

Natural gradient shares some common ideas with higher-level policy-based methods such as Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO). The basic...

Trust Region Policy Optimization (TRPO)

TRPO can be viewed as a combination of natural policy gradient, a line search strategy, and the monotonic improvement theorem. It updates policies by taking the largest step...

Prioritized Double DQN

This post is mainly a summary of the original PER paper, with more detailed and illustrative explanations. We will go through the key ideas and implementation of...

Deep Q-Learning Series (DQN)

The DQN series of reinforcement learning algorithms learn by using Deep Q Networks. These algorithms include Deep Q-learning (DQN for short), Double Deep Q-learning (Double...

Temporal Difference Learning from Scratch

This post highlights the basic temporal difference learning theory and algorithms that contribute much to more advanced topics like Deep Q-Learning (DQN), double DQN,...

Dynamic Programming in Reinforcement Learning

Dynamic Programming (DP) plays an important role in traditional approaches to solving reinforcement learning problems with known dynamics. This post focuses on a more classical view...

Asynchronous Advantage Actor Critic (A3C)

Policy-based methods directly learn a parameterized policy. Value-based methods learn the values of actions and then select actions based on their estimated action values. In...

Policy Gradient Methods Overview

Policy gradient methods offer better convergence properties, are effective in high-dimensional or continuous action spaces, and can learn stochastic policies. 1....

Twin Delayed DDPG (TD3) Walkthrough

In this post, we start with a brief introduction to the TD3 algorithm and why it was proposed, which leads to the problem of overestimation...

Deep Deterministic Policy Gradient (DDPG)

In this post we summarize the deep deterministic policy gradient (DDPG) algorithm to see how it works in continuous action spaces and how the Deep...
