This post presents a summary of my PhD thesis. I explored how to learn from feedback given to model outputs when the collection of direct supervision signals...
This is the blog of the Statistical NLP Group at the Department of Computational Linguistics, Heidelberg University. Our research addresses various aspects of the problem of the confusion of languages, by means of statistical learning for natural language processing.
We blog about pitfalls in methodologies, recent advances and important problems in this field of research.
How can we give RL agents that learn from human feedback a possible advantage to succeed in this difficult learning scenario?
How can we train semantic parsers if neither question-parse nor question-answer pairs can be collected?
Discussing good, bad and ugly practices of reinforcement learning in neural machine translation.
This post explains the need for the score function gradient estimator trick and how it works.