A Compositional Framework for Grounding Language Inference, Generation, and Acquisition in Video