Learning to Read, Ground, and Reason in Multimodal Text

Web data, news, and textbooks offer informative but unstructured multimodal text. Translating such text into a semantic representation that supports further reasoning is a fundamental problem in modern AI. In this project, we design systems that understand and use multimodal text through four interconnected components: semantic interpretation, multimodal alignment, knowledge acquisition, and reasoning.