Hyperparameter Grid Search

author:

date: 30. April 2021

download: https://github.com/StatNLP/empirical_methods


read data

analyze weighting scheme as fixed parameter

Most of the variation is still due to differences between sentences (about 77%), and only a small amount (about 1%) is due to different learning rates.
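The percentages above are shares of the total variance attributed to each source. The source does not show how they were computed, so the following is only a minimal sketch of the idea, not the notebook's method: it swaps in classical ANOVA method-of-moments estimators for a balanced sentence-by-learning-rate table with one score per cell. The function name `variance_shares` and the synthetic data are assumptions made for illustration. In a random-effects setting, the share of one component is commonly read as an intraclass correlation (ICC) for that factor.

```python
import random


def variance_shares(scores):
    """Share of total variance per source for a balanced two-way table.

    scores[i][j] is the metric for sentence i under learning rate j
    (exactly one value per cell). Illustrative ANOVA method-of-moments
    estimate; hypothetical helper, not taken from the source notebook.
    """
    a, b = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (a * b)
    row_means = [sum(row) / b for row in scores]
    col_means = [sum(scores[i][j] for i in range(a)) / a for j in range(b)]

    # Mean squares for sentences, learning rates, and the residual.
    ms_sentence = b * sum((m - grand) ** 2 for m in row_means) / (a - 1)
    ms_rate = a * sum((m - grand) ** 2 for m in col_means) / (b - 1)
    ms_resid = sum(
        (scores[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(a)
        for j in range(b)
    ) / ((a - 1) * (b - 1))

    # Variance components; negative estimates are truncated at zero.
    components = {
        "sentence": max((ms_sentence - ms_resid) / b, 0.0),
        "learning_rate": max((ms_rate - ms_resid) / a, 0.0),
        "residual": ms_resid,
    }
    total = sum(components.values())
    return {name: value / total for name, value in components.items()}


# Synthetic data (assumed values): large sentence effects, small rate effects.
rng = random.Random(0)
sentence_effect = [rng.gauss(0, 3.0) for _ in range(200)]
rate_effect = [rng.gauss(0, 0.3) for _ in range(5)]
scores = [
    [30 + s + r + rng.gauss(0, 1.0) for r in rate_effect]
    for s in sentence_effect
]

shares = variance_shares(scores)
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

With real data, a mixed-effects model would usually be preferred over this estimator, since it also handles unbalanced designs.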

The partial grid considered by Kreutzer et al.

minimize data for model

show partial grid

variance decomposition partial grid

ICC for partial grid

load full grid data

show full grid

variance decomposition full grid

ICC for full grid

An extended grid, in which the grid in 2 is extended with new hyperparameter values
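As a rough illustration of what extending a grid means, the sketch below builds a base grid as the Cartesian product of its axes, adds one new value per axis, and lists the configurations that only the extended grid contains. The axis names echo the learning rate and weighting scheme mentioned earlier, but every value, along with the helper `build_grid`, is hypothetical and not taken from the source.

```python
from itertools import product


def build_grid(axes):
    """Return every combination of the given hyperparameter values as dicts."""
    names = list(axes)
    return [
        dict(zip(names, combo))
        for combo in product(*(axes[name] for name in names))
    ]


# Hypothetical axes: the values below are placeholders, not the study's grid.
base_axes = {"learning_rate": [1e-4, 1e-3], "weighting": ["a", "b"]}
extended_axes = {
    "learning_rate": [1e-4, 1e-3, 1e-2],
    "weighting": ["a", "b", "c"],
}

base_grid = build_grid(base_axes)
extended_grid = build_grid(extended_axes)

# Configurations that exist only because of the newly added values.
new_configs = [cfg for cfg in extended_grid if cfg not in base_grid]

print(len(base_grid), len(extended_grid), len(new_configs))
```

Only the entries in `new_configs` would need additional training runs; results for the base grid can be reused.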

load extended grid data

show extended grid

variance decomposition extended grid

ICC for extended grid

Quick Check

Partial grid

Full grid