Abstract: Humans are capable of inferring dynamic context from a still image and, with the provision of additional commonsense knowledge, can accurately complete visual commonsense reasoning tasks.