MMCOMET: A Large-Scale Multimodal Commonsense Knowledge Graph for Contextual Reasoning

arXiv:2603.01055v1

Abstract: We present MMCOMET, the first multimodal commonsense knowledge graph (MMKG) that integrates physical, social, and eventive knowledge. MMCOMET extends the ATOMIC2020 knowledge graph with a visual dimension through an efficient image retrieval process, yielding over 900K multimodal triples. This new resource addresses a major limitation of existing MMKGs: their limited support for complex reasoning tasks such as image captioning and storytelling. Through a standard visual storytelling experiment, we show that our holistic approach generates richer, more coherent, and more contextually grounded stories than those produced using text-only knowledge. This resource establishes a new foundation for multimodal commonsense reasoning and narrative generation.
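The abstract does not say how images are retrieved and attached to ATOMIC2020 triples. As a rough illustration only, the sketch below assumes that each (head, relation, tail) triple is grounded by ranking precomputed image embeddings by cosine similarity against an embedding of the triple's text and keeping the top k. The names `MultimodalTriple`, `retrieve_images`, and `ground_triple`, the toy three-dimensional embeddings, and the example triple are all hypothetical and are not MMCOMET's actual pipeline.

```python
import math
from dataclasses import dataclass


# Hypothetical representation: an ATOMIC2020-style triple plus attached image IDs.
@dataclass(frozen=True)
class MultimodalTriple:
    head: str
    relation: str
    tail: str
    image_ids: tuple[str, ...] = ()


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def retrieve_images(query: list[float], index: dict[str, list[float]], k: int = 2) -> list[str]:
    """Assumed retrieval step: rank every indexed image by similarity to the query, keep top k."""
    ranked = sorted(index.items(), key=lambda item: cosine(query, item[1]), reverse=True)
    return [image_id for image_id, _ in ranked[:k]]


def ground_triple(triple: MultimodalTriple, text_embedding: list[float],
                  index: dict[str, list[float]], k: int = 2) -> MultimodalTriple:
    """Return a copy of the triple with the top-k retrieved image IDs attached."""
    images = tuple(retrieve_images(text_embedding, index, k))
    return MultimodalTriple(triple.head, triple.relation, triple.tail, images)


# Toy usage: made-up image embeddings and a made-up text embedding for the triple.
index = {
    "img_cake": [0.9, 0.1, 0.0],
    "img_beach": [0.0, 0.2, 0.9],
    "img_party": [0.7, 0.6, 0.1],
}
t = MultimodalTriple("PersonX bakes a cake", "xIntent", "to celebrate a birthday")
grounded = ground_triple(t, [0.8, 0.4, 0.0], index, k=2)
print(grounded.image_ids)  # ('img_party', 'img_cake')
```

A brute-force scan like this is fine for a toy index. At the scale of hundreds of thousands of triples, a real system would more likely use an approximate nearest-neighbor index, though the abstract does not say what MMCOMET uses.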

Related recommendations

CollabEval: Enhancing LLM-as-a-Judge via Multi-Agent Collaboration

HiMAC: Hierarchical Macro-Micro Learning for Long-Horizon LLM Agents

Alien Science: Sampling Coherent but Cognitively Unavailable Research Directions from Idea Atoms

Tracking Capabilities for Safer Agents