2022-01-25
Background: the loss weights are uniform or manually tuned. GradNorm: "GradNorm: Gradient Normalization for Adaptive Loss Balancing in Deep Multitask Networks". Task imbalances impede...
2022-01-02
Hand-crafted Feature Engineering. 1. Feature Interaction Learning: no interaction, pair-wise interaction (inner-product, outer-product, convolutional, attention, etc.), high-order interaction (explicit, implicit). No interaction: LR,...
2020-02-22
Natural gradient shares some common ideas with higher-level policy-based methods such as Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO). The basic...
2020-02-04
TRPO can be viewed as a combination of the natural policy gradient, a line search strategy, and the monotonic improvement theorem. It updates policies by taking the largest step...
2020-01-20
This post is mainly a summary of the original PER paper, with more detailed and illustrative explanations. We will go through the key ideas and implementation of...
2020-01-07
The DQN series of reinforcement learning algorithms learn with Deep Q Networks. These algorithms include Deep Q-learning (DQN for short), Double Deep Q-learning (Double...
2019-12-22
This post highlights the basic temporal difference learning theory and algorithms that contribute much to more advanced topics like Deep Q Learning (DQN), double DQN,...
2019-12-05
Dynamic Programming (DP) plays an important role in traditional approaches to solving reinforcement learning problems with known dynamics. This post focuses on a more classical view...
2019-11-27
Policy-based methods directly learn a parameterized policy. Value-based methods learn the values of actions and then select actions based on their estimated action values. In...
2019-11-12
Policy gradient methods have the advantages of better convergence properties, effectiveness in high-dimensional or continuous action spaces, and the ability to learn stochastic policies. 1....
2019-10-31
In this post, we will start with a brief introduction to the TD3 algorithm and why it was proposed, thus leading to the problem of overestimation...
2019-10-14
In this post we will summarize the deep deterministic policy gradient (DDPG) algorithm, to see how it works in continuous action spaces and how the Deep...