Archives

How difficult is matching images and text? – An investigation

Siting Liang

Ordinarily, we call the channels of communication and sensation modalities. We experience the world through multiple modalities, such as vision, hearing or touch. A system that handles a dataset comprising multiple modalities is called a multimodal model. For example, the MSCOCO dataset contains not only images but also language captions for each image. Using such data, a model can learn to bridge the language and vision modalities.

(Multimodal) Commonsense Reasoning: Where are we and where could we go?

Ines Pisetta

This blog post aims to give a broad overview of the development of Commonsense Reasoning (CR) in the field of Natural Language Processing (NLP) and its multimodal intersection with Computer Vision (CV). What makes CR so important in the age of Deep Learning? How did the approaches to it and the datasets for it change over time? What are the main challenges? How can modalities other than language contribute to CR in Machine Learning, and why is Commonsense Reasoning still a hot topic?

A Guide to State of the Art Object Detection for Multimodal NLP

Tai Mai

If you’re reading this, chances are you’re a computational linguist and chances are you have not had a lot of contact with computer vision. You might even think to yourself “Well yeah, why would I? It has nothing to do with language, does it?” But what if a language model could also rely on visual signals and ground language? This would definitely help in many situations: take, for example, ambiguous formulations where textual context alone cannot decide whether “Help me into the car!” refers to an automobile or a train car. As it turns out, people are working very hard on exactly that: combining computer vision with natural language processing (NLP). This is a case of so-called Multimodal Learning.

Let’s evaluate with Macro F1: what can go wrong?

Juri Opitz, Sebastian Burst

Earlier this year we got slightly puzzled about how to best calculate the “macro F1” score to measure the performance of a classifier.

Adversarial Training

Sebastian Burst

In the past two years, machine learning, particularly neural computer vision and NLP, has seen a tremendous rise in popularity of all things adversarial. In this blog post I will give an overview of the two most popular training methods that are commonly referred to as adversarial: injecting adversarial examples (1) and min-max optimization (2). After showcasing how they are applied in NLP, I will compare them and examine ways to combine them (3).

Dimensionality Reduction

Michael Staniek

In this blog post we want to learn how to perform dimensionality reduction on datasets.
This can be used to visualise word embeddings or other data with more than 2 or 3 dimensions.

Combining Neural Networks for Review Generation

Max Bacher

In the summer term of 2018 the ICL Heidelberg offered an advanced course on Neural Networks for Natural Language Processing. During this course we presented and discussed two papers on neural language generation.

Example blog post

Nils Trost

Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum.