Dueling Bandits and Other Partial-Feedback Learning Problems

In supervised learning, we usually assume that for each instance we have access to a correct prediction to compare against. This "full-feedback" setting allows us to evaluate any prediction or action we may want to consider. In several real-world applications, however, the feedback is more restricted. In on-line advertising, for instance, we can only evaluate the ad we actually displayed on a web page. This constraint leads to a trade-off between "exploration" (a new ad needs to be displayed in order for us to learn its click-through rate) and "exploitation" (displaying the ad with the current best estimated click-through rate seems better in the short term). This is what we call "bandit feedback", by analogy with a gambler facing several unknown slot machines and wagering on the most rewarding ones. Another interesting example of partial feedback is ranked prediction, where only a short list of items is proposed for evaluation: the only feedback we receive is a preference among the proposed items. Shall we only propose items that we already consider relevant, or shall we also explore apparently irrelevant ones? Dueling Bandits and Cascading Bandits algorithms were recently proposed to deal with this problem. I will first survey the different aspects of on-line learning with partial feedback before focusing on ranked prediction and Dueling Bandits.
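To make the exploration/exploitation trade-off under bandit feedback concrete, here is a minimal epsilon-greedy sketch in Python. It is an illustration, not an algorithm from the survey: the names true_ctrs, epsilon, and horizon are hypothetical, and the hidden click-through rates are used only to simulate user clicks.

import random

def epsilon_greedy(true_ctrs, epsilon=0.1, horizon=10_000):
    # Ad selection under bandit feedback: true_ctrs (the ads' hidden
    # click-through rates) is unknown to the learner and is used only
    # to simulate whether a displayed ad gets clicked.
    n_ads = len(true_ctrs)
    clicks = [0] * n_ads   # observed clicks per ad
    shows = [0] * n_ads    # times each ad was displayed

    total_reward = 0
    for _ in range(horizon):
        if random.random() < epsilon:
            # Exploration: display a uniformly random ad.
            ad = random.randrange(n_ads)
        else:
            # Exploitation: display the ad with the best empirical CTR
            # (unseen ads get an optimistic score so they are tried once).
            ad = max(range(n_ads),
                     key=lambda a: clicks[a] / shows[a] if shows[a] else float("inf"))
        # Bandit feedback: we observe a click (or not) only for the ad displayed.
        reward = 1 if random.random() < true_ctrs[ad] else 0
        shows[ad] += 1
        clicks[ad] += reward
        total_reward += reward
    return total_reward

# Example: three ads with hidden click-through rates.
print(epsilon_greedy([0.02, 0.05, 0.03]))

With probability epsilon the learner displays a random ad to keep learning the click-through rates; otherwise it displays the empirically best one, trading short-term reward for information.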
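The dueling-bandit feedback model can be sketched the same way. The following is a naive epsilon-greedy loop over pairwise duels, shown only to illustrate the protocol (the learner sees which of two items won, nothing else); it is not one of the Dueling Bandits algorithms from the literature, and the names pref, duel, and score are hypothetical.

import random

def duel(i, j, pref):
    # Simulate one duel: True if item i beats item j.
    # pref[i][j] is the hidden probability that i wins against j.
    return random.random() < pref[i][j]

def dueling_bandit(pref, epsilon=0.1, horizon=5_000):
    n = len(pref)
    wins = [[0] * n for _ in range(n)]   # wins[i][j]: times i beat j
    plays = [[0] * n for _ in range(n)]  # plays[i][j]: duels between i and j

    def score(i):
        # Empirical average win rate of item i over the items it has faced
        # (unseen items get an optimistic score so they are tried).
        rates = [wins[i][j] / plays[i][j] for j in range(n) if plays[i][j] > 0]
        return sum(rates) / len(rates) if rates else float("inf")

    for _ in range(horizon):
        if random.random() < epsilon:
            i, j = random.sample(range(n), 2)            # explore a random pair
        else:
            i = max(range(n), key=score)                 # exploit current best
            j = random.choice([k for k in range(n) if k != i])
        # Preference feedback only: we learn which item won the duel.
        if duel(i, j, pref):
            wins[i][j] += 1
        else:
            wins[j][i] += 1
        plays[i][j] += 1
        plays[j][i] += 1
    return max(range(n), key=score)

# Hidden preference matrix: item 1 beats both other items.
pref = [[0.5, 0.3, 0.6],
        [0.7, 0.5, 0.8],
        [0.4, 0.2, 0.5]]
print(dueling_bandit(pref))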