Hacker News
currymj, 12 months ago, on: Mathematical Foundations of Reinforcement Learning
Look for materials on PPO. They should be easy to find, since it is the most popular RL algorithm. GRPO works on the same principles; it just makes certain estimates (the baseline used to compute advantages) from a group of samples rather than training an auxiliary neural network (the value function, or critic) to make them.
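To make the "estimates from samples" point concrete, here is a minimal sketch of a group-relative advantage computation. It assumes several completions have been sampled for the same prompt and each has a scalar reward. The function name `group_relative_advantages`, the `eps` constant, and the use of the population standard deviation are illustrative choices, not taken from any particular library.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the other samples in its group.

    PPO would subtract a learned value estimate V(s) here; this sketch
    instead uses the group mean as the baseline, so no critic is trained.
    Assumption: population std is used for scaling; implementations differ.
    """
    mu = mean(rewards)        # sample-based baseline
    sigma = pstdev(rewards)   # sample-based scale
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four completions of one prompt.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Completions that score above the group mean get a positive advantage and those below get a negative one. These advantages then feed into the same clipped policy-gradient objective that PPO uses.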