This post presents a summary of my PhD thesis. I explored how to learn from feedback given to model outputs when the collection of direct supervision signals...
Posts by Year
How can we give RL agents that learn from human feedback a possible advantage to succeed in this difficult learning scenario?
How can we train semantic parsers if neither question-parse nor question-answer pairs can be collected?
Discussing good, bad and ugly practices of reinforcement learning in neural machine translation.
This post explains the need for the score function gradient estimator trick and how it works.