The Multimodal Learning project develops machine intelligence technologies for analyzing data across multiple modalities, where a modality may be a media format, a style, or a device type. Enterprise data typically spans several modalities whose information is interconnected rather than isolated. Joint analysis can therefore recover information missing from one modality by drawing on another; examples include adding textual explanations to images and matching relevant content that is represented in different styles or reproduced on different devices. It can also help verify critical facts in applications that demand precise predictions, such as medical diagnosis and copyright verification. The vision of the Multimodal Learning project is to bridge the gaps between the multimodal data that people use to communicate in everyday life.