Cartinoe
VL-BERT: Pre-training of Generic Visual-Linguistic Representations (Paper Review)