Common reinforcement learning algorithms assume access to a numeric feedback signal. In practice, however, defining such a signal is often difficult, and badly chosen values may lead to unintended behavior. For humans, it is usually easier to give qualitative feedback, such as preferences, than quantitative feedback. This talk will present a state-of-the-art approach to solving reinforcement learning with preference-based feedback that minimizes the number of preferences required. We show how to solve the temporal credit assignment problem in a preference setting, how to efficiently learn a utility function, and how to derive a policy that maximizes it. We also consider the active learning problem of posing new preference queries to an expert.
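
As a rough illustration of one ingredient of this setting, the sketch below fits a linear utility function from pairwise trajectory preferences using a Bradley-Terry-style logistic model. The feature map, the synthetic preference data, and the plain gradient-ascent loop are illustrative assumptions for demonstration only, not the algorithm presented in the talk.

```python
import numpy as np

# Illustrative sketch (assumptions, not the talk's method): learn a linear utility
# u(tau) = w . phi(tau) from pairwise trajectory preferences, modeling
# P(tau_a preferred over tau_b) = sigmoid(u(tau_a) - u(tau_b)).

rng = np.random.default_rng(0)

d = 5                          # dimension of the trajectory feature vector phi(tau)
n_prefs = 200                  # number of expert preference queries
w_true = rng.normal(size=d)    # hidden "true" utility weights used to simulate the expert

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Simulate preference data: the expert prefers the trajectory with higher true utility.
phi_a = rng.normal(size=(n_prefs, d))   # features of the first trajectory in each query
phi_b = rng.normal(size=(n_prefs, d))   # features of the second trajectory
labels = (phi_a @ w_true > phi_b @ w_true).astype(float)  # 1 if tau_a is preferred

# Fit w by maximizing the Bradley-Terry log-likelihood with gradient ascent.
w = np.zeros(d)
lr = 0.1
for _ in range(500):
    diff = phi_a - phi_b                      # phi(tau_a) - phi(tau_b)
    p = sigmoid(diff @ w)                     # predicted P(tau_a preferred)
    grad = diff.T @ (labels - p) / n_prefs    # gradient of the average log-likelihood
    w += lr * grad

agreement = np.mean((phi_a @ w > phi_b @ w) == labels.astype(bool))
print(f"preference agreement on training queries: {agreement:.2f}")
```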