Key Achievement: Bidirectional reasoning supervision enables 3B parameter models to surpass label-only fine-tuned 70B models on multilingual financial sustainability classification — published at EMNLP 2025 Industry Track, Suzhou, China.
Large Language Models have demonstrated remarkable capabilities across NLP tasks, yet their effectiveness in the multilingual financial domain remains underexplored. This research tackles financial sustainability classification across four diverse languages — English, Hindi, Bengali, and Telugu — with Bengali and Telugu representing low-resource settings where annotated data is scarce. A novel bidirectional reasoning fine-tuning approach is introduced that integrates both positive and negative rationales alongside classification labels, consistently outperforming all baseline methods while enabling smaller models to match significantly larger ones.
The Challenge of Multilingual Financial NLP
Financial markets are inherently global, yet most financial NLP research focuses on high-resource languages like English. Stakeholders in multilingual regions such as South Asia face delays and inaccuracies when analyzing financial reports in local languages, leading to missed risks and suboptimal investment decisions. Extending sustainability classification to low-resource languages like Bengali and Telugu is critical for equitable global financial access and risk assessment.
Three Fine-Tuning Strategies Compared
Labels Only (Baseline)
Traditional fine-tuning trains LLMs solely on classification labels using cross-entropy loss, offering no explanatory reasoning for decisions.
Unidirectional Reasoning
Extends label-only training by adding a positive rationale explaining why a statement is classified as sustainable or unsustainable.
Bidirectional Reasoning (Ours)
Trains with both positive reasons (why the label applies) and negative reasons (why the opposite does not apply), creating a contrastive supervision framework.
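The contrastive supervision above amounts to training the model to emit the label together with both rationales as one target sequence. A minimal sketch of that target construction — the template wording and field names are illustrative assumptions, not the paper's exact format:

```python
def build_target(label: str, pos_reason: str, neg_reason: str) -> str:
    """Compose the supervised output string for bidirectional fine-tuning.

    The model learns to emit the label, a positive rationale (why the
    label applies), and a negative rationale (why the opposite label
    does not). The template here is an illustrative assumption.
    """
    return (
        f"Label: {label}\n"
        f"Positive reason: {pos_reason}\n"
        f"Negative reason: {neg_reason}"
    )

example = build_target(
    "sustainable",
    "The statement commits to a measurable emissions-reduction target.",
    "The claim is tied to audited figures, so it is not unsubstantiated.",
)
```

The label-only baseline would keep just the first line of this target; the unidirectional variant keeps the first two.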
Performance Highlights
Benchmark Results — English Language
The bidirectional reasoning approach was evaluated across LLaMA-3.2 (3B), LLaMA-3.1 (8B), LLaMA-3.1 (70B), and the Qwen-2.5 family. Accuracy and F1 on the English financial sustainability dataset:
| Model | Fine-Tuning Method | Accuracy (%) | F1 (%) |
|---|---|---|---|
| LLaMA-3.2 (3B) | Labels Only | 94.71 | 95.16 |
| LLaMA-3.2 (3B) | Unidirectional Reason | 94.71 | 95.12 |
| LLaMA-3.2 (3B) | Bidirectional Reasons (Ours) | 96.92 | 97.17 |
| LLaMA-3.1 (70B) | Labels Only | 93.83 | 94.26 |
| LLaMA-3.1 (70B) | Bidirectional Reasons (Ours) | 96.48 | 96.80 |
The 3B bidirectional model (F1: 97.17%) surpasses the label-only 70B model (F1: 94.26%) — demonstrating that structured reasoning supervision can substitute for a more than 20× difference in parameter count.
Three-Stage Pipeline
Automated Reason Generation
GPT-4o automatically generates both positive and negative rationales for each training statement across all four languages, eliminating the need for costly human annotation.
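A minimal sketch of this rationale-generation step as a prompt builder. The prompt wording and the `rationale_prompt` helper are assumptions for illustration; the actual GPT-4o call (e.g. via the OpenAI chat completions API) and response parsing are left as a comment:

```python
def rationale_prompt(statement: str, label: str, language: str) -> str:
    """Build a prompt requesting both rationales for a labeled statement.

    Prompt wording is an illustrative assumption, not the paper's template.
    """
    opposite = "unsustainable" if label == "sustainable" else "sustainable"
    return (
        f"The following financial statement (in {language}) is labeled "
        f"'{label}':\n\n{statement}\n\n"
        f"1. Positive reason: explain why the label '{label}' applies.\n"
        f"2. Negative reason: explain why the label '{opposite}' does not apply.\n"
        f"Answer in {language}."
    )

# In the pipeline, this prompt would be sent to GPT-4o and the two
# rationales parsed from the reply for every training statement.
prompt = rationale_prompt(
    "The firm pledges net-zero operations by 2035.", "sustainable", "English"
)
```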
PEFT Fine-Tuning with LoRA
Models are fine-tuned using LoRA (rank 64, alpha 16) with bidirectional supervision, minimizing cross-entropy loss jointly over the classification label, the positive reason R+, and the negative reason R-.
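The joint objective can be illustrated with a toy token-level computation. Assuming standard causal-LM fine-tuning — mean negative log-likelihood over all target tokens, with the label, R+, and R- concatenated into one target sequence — a sketch looks like:

```python
import math


def joint_nll(token_probs: dict) -> float:
    """Mean negative log-likelihood over label, R+, and R- tokens.

    `token_probs` maps each target segment to the model's probabilities
    for its gold tokens; segments are concatenated so a single
    cross-entropy objective supervises all three (toy illustration).
    """
    all_probs = (
        token_probs["label"] + token_probs["positive"] + token_probs["negative"]
    )
    return -sum(math.log(p) for p in all_probs) / len(all_probs)


loss = joint_nll({
    "label": [0.9],          # P(gold label token)
    "positive": [0.8, 0.7],  # P(gold tokens of R+)
    "negative": [0.6, 0.5],  # P(gold tokens of R-)
})
```

In practice this loss is minimized over the LoRA adapter parameters only, leaving the base model weights frozen.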
Multilingual Evaluation
Models are assessed across English, Hindi, Bengali, and Telugu on financial sustainability classification, covering high- to low-resource language settings.
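Per-language evaluation reduces to accuracy and F1 over binary predictions. A self-contained sketch — the label strings are assumptions:

```python
def accuracy_f1(gold: list, pred: list, positive: str = "sustainable"):
    """Accuracy and F1 for binary sustainability classification."""
    assert len(gold) == len(pred)
    correct = sum(g == p for g, p in zip(gold, pred))
    tp = sum(g == positive and p == positive for g, p in zip(gold, pred))
    fp = sum(g != positive and p == positive for g, p in zip(gold, pred))
    fn = sum(g == positive and p != positive for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return correct / len(gold), f1


# Toy example: one false negative out of four predictions.
acc, f1 = accuracy_f1(
    ["sustainable", "unsustainable", "sustainable", "unsustainable"],
    ["sustainable", "unsustainable", "unsustainable", "unsustainable"],
)
```

Running this function once per language yields the per-language accuracy and F1 figures reported in the benchmark table.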
Generalization Beyond Finance
To validate robustness across domains, experiments were conducted on hate speech (ETHOS dataset) and ethics classification (DFAR dataset). The bidirectional reasoning approach consistently outperformed alternatives in both accuracy and F1 score, confirming its generalizability beyond the financial context.
Key Contributions
- Multilingual Coverage: Advances financial sustainability classification to include low-resource languages Bengali and Telugu alongside Hindi and English.
- Bidirectional Reasoning Framework: A novel contrastive fine-tuning method supervising LLMs with both positive and negative rationales, improving classification performance and decision interpretability.
- Efficient Deployment: Combined with PEFT and LoRA, the approach enables small 3B models to match or outperform 70B models fine-tuned with conventional label-only methods.