Research

Focused on NLP and Computer Vision for low-resource languages — particularly Bangla, spoken by 230M+ people yet severely underserved by modern AI.

Publications

Tonmoy Talukder, G M Shahariar

PDF · Code & Dataset · Presentation
Citation (BibTeX):
@article{talukder2026bangla,
  title={Bangla Key2Text: Text Generation from Keywords for a Low Resource Language},
  author={Talukder, Tonmoy and Shahariar, GM},
  journal={arXiv preprint arXiv:2604.19508},
  year={2026}
}

This paper introduces Bangla Key2Text, a large-scale dataset of 2.6 million Bangla keyword–text pairs designed for keyword-driven text generation in a low-resource language. The dataset is constructed using a BERT-based keyword extraction pipeline applied to millions of Bangla news texts, transforming raw articles into structured keyword–text pairs suitable for supervised learning. To establish baseline performance on this new benchmark, we fine-tune two sequence-to-sequence models, mT5 and BanglaT5, and evaluate them using multiple automatic metrics and human judgments. Experimental results show that task-specific fine-tuning substantially improves keyword-conditioned text generation in Bangla compared to zero-shot large language models. The dataset, trained models, and code are publicly released to support future research in Bangla natural language generation and keyword-to-text generation tasks.
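The dataset-construction step described above can be sketched roughly as follows. This is an illustrative toy, not the paper's pipeline: the real system applies a BERT-based extractor to millions of Bangla news texts, whereas here a character-trigram embedding stands in for BERT, and the names `embed`, `extract_keywords`, and `make_pair` are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    # Toy bag-of-character-trigram embedding. This is only a stand-in
    # for the BERT sentence embeddings used in the actual pipeline.
    grams = Counter(text[i:i + 3] for i in range(len(text) - 2))
    norm = math.sqrt(sum(v * v for v in grams.values())) or 1.0
    return {g: v / norm for g, v in grams.items()}


def cosine(a, b):
    # Cosine similarity between two sparse unit vectors.
    return sum(v * b.get(g, 0.0) for g, v in a.items())


def extract_keywords(document, top_k=3):
    # Rank candidate words by similarity of their embedding to the
    # whole-document embedding (the KeyBERT-style idea behind
    # BERT-based keyword extraction).
    doc_vec = embed(document)
    candidates = sorted(set(re.findall(r"\w+", document.lower())))
    scored = sorted(candidates,
                    key=lambda w: cosine(embed(w), doc_vec),
                    reverse=True)
    return scored[:top_k]


def make_pair(document, top_k=3):
    # One supervised keyword -> text training pair, the unit the
    # dataset is built from.
    return {"keywords": extract_keywords(document, top_k),
            "text": document}
```

In the paper's setting, millions of such pairs then serve as supervision for fine-tuning sequence-to-sequence models (mT5, BanglaT5) that map the keyword list back to fluent text.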

Transfer learning for under-resourced language processing · Resources for low-resource languages · Low-resource methods for NLP

G M Shahariar *, Tonmoy Talukder *, Rafin Alam Khan Sotez, Md Tanvir Rouf Shawon

* denotes equal contribution; names are listed in alphabetical order.

PDF · Code & Dataset · Presentation
Citation (BibTeX):
@article{shahariar2023rank,
  title={Rank Your Summaries: Enhancing Bengali Text Summarization via Ranking-based Approach},
  author={Shahariar, GM and Talukder, Tonmoy and Sotez, Rafin Alam Khan and Shawon, Md Tanvir Rouf},
  journal={arXiv preprint arXiv:2307.07392},
  year={2023}
}

With the increasing need for text summarization techniques that are both efficient and accurate, it becomes crucial to explore avenues that enhance the quality and precision of pre-trained models specifically tailored for summarizing Bengali texts. When it comes to text summarization tasks, there are numerous pre-trained transformer models at one's disposal. Consequently, it becomes quite a challenge to discern the most informative and relevant summary for a given text among the various options generated by these pre-trained summarization models. This paper aims to identify the most accurate and informative summary for a given text by utilizing a simple but effective ranking-based approach that compares the outputs of four different pre-trained Bengali text summarization models. The process begins with preprocessing of the input text, which involves eliminating unnecessary elements such as special characters and punctuation marks. Next, we use the four pre-trained summarization models to generate candidate summaries, then apply a text ranking algorithm to identify the most suitable one. Ultimately, the summary with the highest ranking score is chosen as the final summary. To evaluate the effectiveness of this approach, the generated summaries are compared against human-annotated summaries using standard NLG metrics such as BLEU, ROUGE, BERTScore, WIL, WER, and METEOR. Experimental results suggest that by leveraging the strengths of each pre-trained transformer model and combining them using a ranking-based approach, our methodology significantly improves the accuracy and effectiveness of Bengali text summarization.
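The generate-then-rank step can be sketched as below. Note the hedge: the paper applies a proper text ranking algorithm to summaries from four pre-trained models, while this sketch scores candidates with a simple token-overlap heuristic as a stand-in, and `overlap_score` and `rank_summaries` are illustrative names.

```python
import re


def _tokens(text):
    # Lowercased word tokens; real pipelines would use a proper
    # Bengali tokenizer here.
    return re.findall(r"\w+", text.lower())


def overlap_score(summary, source):
    # Stand-in scoring function: fraction of summary tokens that also
    # appear in the source document. The paper uses a text ranking
    # algorithm instead of this heuristic.
    src = set(_tokens(source))
    toks = _tokens(summary)
    if not toks:
        return 0.0
    return sum(t in src for t in toks) / len(toks)


def rank_summaries(source, candidates):
    # Score each model's candidate summary against the source and
    # return the highest-ranked one as the final summary.
    return max(candidates, key=lambda s: overlap_score(s, source))
```

In use, `candidates` would hold one summary per pre-trained model (four in the paper), and the winner is what gets compared against the human-annotated reference during evaluation.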

Bengali · Text Summarization · Summary · TextRank · Transformers · Ranking · BERT · mT5

Research Quest

I am deeply passionate about research in natural language processing, computer vision, and multimodal learning, all in the context of human interaction. My particular focus is on using these technologies to improve the understanding and use of low-resource languages.

Recently, I have been actively engaged in multiple projects centered on text summarization, text generation, text classification, question answering, and image captioning within this domain. Additionally, my curiosity extends to understanding how machine learning models interact with data representations during training and how this interaction can be used to improve their performance.

What truly captivates me is exploring how computers can learn from both language and images simultaneously, mirroring the multifaceted nature of human learning: listening, watching, and even feeling. This exploration fuels my enthusiasm, and I am eager to see where it leads.

Research Interest. My research interests lie in Natural Language Processing, Natural Language Generation, Low-Resource Languages, Multimodal Deep Learning, Computer Vision, and Human-Computer Interaction.

Research Interests

Natural Language Processing & Generation

Text generation, summarization, classification, and question answering — with a focus on Bangla. Includes keyword-to-text generation (Key2Text) and ranking-based summarization across multiple pre-trained transformer models.

Low-Resource Language AI

Building datasets, benchmarks, and fine-tuned models for under-resourced languages like Bangla. Tackling the gap where modern AI performs unevenly due to limited high-quality training data and tooling.

Multimodal Deep Learning & Computer Vision

Exploring how models learn from language and images simultaneously — including image captioning for low-resource languages. Motivated by how human learning spans listening, watching, and feeling.

Human-Computer Interaction

Interested in how NLP and CV systems are experienced by real users — designing AI that is not only accurate in research settings but also usable and accessible in real-world products.

Future Directions

Bangla Text Generation & Summarization

Expanding keyword-to-text generation beyond news into scientific and conversational domains; applying instruction-tuning and better ranking strategies to push Bangla NLG quality further.

Low-Resource Multimodal Learning

Building image captioning and cross-modal representation systems for Bangla — studying how models can jointly learn from text and vision when labeled multimodal data is scarce.

Reproducible Low-Resource NLP

Developing infrastructure for reproducible research: versioned Bangla dataset releases, deterministic preprocessing pipelines, and standardized evaluation across BLEU, ROUGE, BERTScore, and human judgment.
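Two of the pieces named above, deterministic preprocessing and versioned dataset releases, could look roughly like this minimal sketch. The function names `preprocess` and `dataset_version` are hypothetical, and the normalization steps shown (Unicode NFC, whitespace collapsing) are assumptions about what such a pipeline might include:

```python
import hashlib
import unicodedata


def preprocess(text):
    # Deterministic normalization: same input always yields the same
    # output, so downstream results are reproducible.
    text = unicodedata.normalize("NFC", text)
    return " ".join(text.split())


def dataset_version(records):
    # Content hash over the preprocessed, order-independent records.
    # The short digest can serve as a release identifier, letting
    # others verify they hold exactly the published dataset.
    h = hashlib.sha256()
    for rec in sorted(records):
        h.update(preprocess(rec).encode("utf-8"))
        h.update(b"\x00")  # record separator to avoid boundary collisions
    return h.hexdigest()[:12]
```

Because the hash ignores record order and the preprocessing is deterministic, two researchers building the dataset independently should arrive at the same version identifier.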

Human-Centered AI for Low-Resource Languages

Bridging NLP and Computer Vision research with real-world usability — building AI systems accessible to Bangla-speaking communities, with attention to how users interact with and benefit from these systems.