Current (Practical) State of Language Technologies for “Low-Resourced” Languages.

Date:

Gave a talk at the CARE Speaker Series at NYU Abu Dhabi, hosted by Prof. Tuka Alhanai.

Description: Many of the world’s languages, and by extension their communities, remain understudied and under-served. A common reason for this gap is the lack of data available for low-resourced languages. In this talk, I will cover (1) what direct harm results from the lack of research and tools for these languages, (2) what challenges people face when trying to create data in these languages, and (3) how we can investigate and improve language technologies, particularly Machine Translation, for these languages from a Human-Computer Interaction perspective. In the first part of the talk, we will cover the experiences of YouTube users with policy-violating content. Then, we will look at the challenges that Wikipedia contributors face when trying to create content in these languages, including how they use Machine Translation tools and where these tools fail in this context. Finally, we will look at a few avenues of research in Machine Translation, including investigating gender bias and improving Machine Translation for health care.