Cartinoe
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks