A Comprehensive Survey on Abstractive Text Summarization with a Focus on Low Resource Languages and Challenges in Kannada Language

Puneeth R.; J. Somasekar

doi:10.61453/joit.v2026_0207

Authors

Puneeth R., Jain (Deemed to be University), Bangalore, Karnataka, India
J. Somasekar, Jain (Deemed to be University), Bangalore, Karnataka, India

Abstract

The exponential growth of digital textual data has made automated text summarization a critical technology. Abstractive summarization, which generates novel, human-like summaries using transformer-based models such as BERT, BART, and T5, has achieved strong results on high-resource languages, yet low-resource languages such as Kannada remain severely underexplored. This survey reviews abstractive summarization methodologies from early neural architectures to advanced transformer frameworks, with special attention to knowledge-augmented models and multilingual pretraining. We critically examine challenges in Indic language summarization, including data scarcity, factual inconsistency, and evaluation limitations, and identify concrete research directions for Kannada: benchmark dataset creation, culturally informed modeling, cross-lingual transfer learning, and knowledge-aware summarization pipelines.
