Automated Data Enrichment using Confidence-Aware Fine-Gra...

Automated Data Enrichment using Confidence-Aware Fine-Grained Debate among Open-Source LLMs for Mental Health and Online Safety

arXiv:2512.06227v2 Announce Type: replace Abstract: Real-world indicators play an important role in many natural language processing (NLP) applications, such as life-event for mental health analysis and risky behaviour for online safety, yet labelling such information in training datasets is often costly and/or difficult due to their dynamic nature. Large language models (LLMs) show promising potential for automated annotation, yet multi-label prediction remains challenging. In this work, we propose a Confidence-Aware Fine-Grained Debate (CFD) framework that simulates collaborative annotation using fine-grained information to better support automated multi-label enrichment. We introduce two new expert-annotated resources: A mental health Reddit well-being dataset and an online safety Facebook sharenting risk dataset. Experiments show that CFD achieves the most robust enrichment performance compared to a range of baseline approaches. We further evaluate various training-free enrichment incorporation strategies and demonstrate that LLM-enriched indicators consistently improves our downstream tasks. Enriched features incorporated via debate transcripts yield the largest gains, outperforming the non-enriched baseline by 9.9\% on the online safety task.

相关推荐

From Passive to Persuasive: Steering Emotional Nuance in Human-AI Negotiation

TENG-BC: Unified Time-Evolving Natural Gradient for Neural PDE Solvers with General Boundary Conditions

MemeIntel: Explainable Detection of Propagandistic and Hateful Memes

HiFi-Inpaint: Towards High-Fidelity Reference-Based Inpainting for Generating Detail-Preserving Human-Product Images

MemeIntel: Explainable Detection of Propagandistic and Hateful Memes

From Passive to Persuasive: Steering Emotional Nuance in Human-AI Negotiation