Ruprecht-Karls-Universität Heidelberg
Institut für Computerlinguistik


NLP in Industry: challenges and best practices

Module Description

Course                Module Abbreviation   Credit Points
BA-2010[100%|75%]     CS-CL                 6 LP
BA-2010[50%]          BS-CL                 6 LP
BA-2010               AS-CL                 8 LP
Master                SS-CL, SS-TAC         8 LP

Lecturer: Daniel Dahlmeier
Module Type: Proseminar / Hauptseminar
Language: English
First Session: 13.11.2020
Time and Place: Fridays, 11:15–12:45, online via heiCONF
Commitment Period: tbd.

Prerequisite for Participation

  • Mathematical Foundations of CL (or a comparable introductory course in linear algebra and probability theory)
  • Programming I (Python)
  • Statistical Methods for CL (or a comparable introductory course in machine learning)

Assessment

Content

This seminar focuses on common challenges and best practices for natural language processing (NLP) in industry. NLP in industry comes with its own challenges, such as data sparsity, data privacy regulations, and cost-benefit trade-offs. In this seminar, we discuss these challenges and the technical approaches to overcome them. The seminar includes a practical project in which participants propose a project, implement experiments, and present the results to the class.

Module Overview

Agenda

13.11.2020, 11:15–12:45
In this session, we will discuss the general differences between academic research and industry and the current state of AI and NLP, and introduce the course project.
1. Seminar logistics
2. AI hype
3. Research and innovation
4. NLP in industry
5. Course project
Materials: Lecture slides will be made available on heiBOX before the seminar.

20.11.2020, 11:15–12:45
In this session, we will discuss the role of data and data protection regulations, and look at examples of industry use cases.
1. Data
2. Industry use cases
Materials: Lecture slides will be made available on heiBOX before the seminar.

27.11.2020, 11:15–12:45
In this session, we will discuss text classification and entity recognition in more detail, as well as advanced NLP topics that address common requirements of industry applications.
1. Text classification
2. Entity recognition
3. Advanced NLP topics
Materials: Lecture slides will be made available on heiBOX before the seminar.

04.12.2020, 11:15–12:45
In this session, we will talk about the business value of NLP and common non-functional requirements, and give an overview of software engineering methodologies.
1. The business case for NLP
2. Non-functional requirements
3. Scrum, DevOps, and all that
Materials: Lecture slides will be made available on heiBOX before the seminar.

22.–26.02.2021, 9:00–16:00
Block seminar for project presentations. The final schedule is still to be decided.

Literature

• Daniel Jurafsky and James H. Martin. 2008. Speech and Language Processing (2nd Edition).
• Alfred Spector, Peter Norvig, and Slav Petrov. 2012. Google's hybrid approach to research. Communications of the ACM 55(7).
• Daniel Dahlmeier. 2017. On the challenges of translating NLP research into commercial products. In Proceedings of ACL 2017.
• D. Sculley, Gary Holt, Daniel Golovin, Eugene Davydov, Todd Phillips, Dietmar Ebner, Vinay Chaudhary, and Michael Young. 2014. Machine learning: The high interest credit card of technical debt. In Proceedings of SE4ML: Software Engineering for Machine Learning (NIPS 2014 Workshop).
