Grants
2015-2017
Research Grant: "Weakly Supervised Learning of Cross-Lingual Systems" (co-Principal Investigator)
Summary:
Cross-lingual rankings for information retrieval can be learned directly from
data that are not strictly parallel but are weakly supervised by relevance
indicators, such as citations in patents or hyperlinks in Wikipedia pages. We
intend to turn this idea on its head by applying the techniques that have been
successful in learning-to-rank for cross-lingual retrieval to the discriminative
training of machine translation on massive non-parallel data, and, in the
process, to further improve methods for cross-lingual retrieval. The key
ingredients of our proposed techniques will be the combination of learning
from weakly supervised data with techniques that make the best use of the weak
supervision signals: fine-grained sparse features and learning from both
positive and negative examples. We motivate our research with an application
to translation and cross-lingual retrieval in the medical domain, where
massive amounts of quasi-parallel training data are available on the
Internet, in research publications, and in patents.