Tonmoy Talukder

Backend Engineer
NLP Researcher
Dhaka, Bangladesh

I'm a backend-focused full-stack engineer and NLP researcher building scalable infrastructure, distributed systems, and low-resource AI solutions. My engineering work emphasizes correctness, reliability, observability, and scale — from job queues and database design to APIs powering production workloads, including a project serving 30K–50K requests per day.

Alongside engineering, I research low-resource language NLP and Computer Vision, with publications at LREC 2026 and BIM 2023. My long-term goal is to build reliable, scalable infrastructure that bridges advanced AI research with real-world products.

About me · View Resume · Open to Backend / Infra roles · 2026
30K–50K
daily API requests
2
research publications
500+
LeetCode solved
1y 3m
production systems

Flagship Project

Full Case Study
Distributed System

KaryaFlow — Distributed Job Queue

Designed a multi-tenant async job processing system with exactly-once semantics, priority queues, and horizontal scaling. Handles 40K+ daily jobs with P99 latency under 200ms.

40K+
Daily Jobs
<200ms
P99 Latency
99.99%
Exactly-Once
Go · Redis · Kubernetes · PostgreSQL · OpenTelemetry

Skills & Tech Stack

Core Engineering
Go · TypeScript · Python · SQL
Backend Systems & APIs
Microservices · REST · gRPC · Kafka · RabbitMQ · Redis · Chi · Ent ORM
Data Stores & Analytics
PostgreSQL · MongoDB · ClickHouse · MinIO
Cloud, DevOps & Observability
GCP · Docker · Kubernetes · CI/CD · Prometheus · Grafana · OpenTelemetry · Git
ML & AI
PyTorch · Hugging Face · OpenCV · Scikit-learn · CLIP
Frontend Engineering
React · Next.js · Tailwind CSS · Frontend Architecture

Bangla Key2Text: Text Generation from Keywords for a Low Resource Language

Tonmoy Talukder, G M Shahariar

Abstract

This paper introduces Bangla Key2Text, a large-scale dataset of 2.6 million Bangla keyword–text pairs designed for keyword-driven text generation in a low-resource language. The dataset is constructed using a BERT-based keyword extraction pipeline applied to millions of Bangla news texts, transforming raw articles into structured keyword–text pairs suitable for supervised learning. To establish baseline performance on this new benchmark, we fine-tune two sequence-to-sequence models, mT5 and BanglaT5, and evaluate them using multiple automatic metrics and human judgments. Experimental results show that task-specific fine-tuning substantially improves keyword-conditioned text generation in Bangla compared to zero-shot large language models. The dataset, trained models, and code are publicly released to support future research in Bangla natural language generation and keyword-to-text generation tasks.

PDF · Code & Dataset · Presentation
Citation (BibTeX)
@article{talukder2026bangla,
  title={Bangla Key2Text: Text Generation from Keywords for a Low Resource Language},
  author={Talukder, Tonmoy and Shahariar, GM},
  journal={arXiv preprint arXiv:2604.19508},
  year={2026}
}
Transfer learning for under-resourced language processing · Resources for low-resource language · Low-resource methods for NLP

Rank Your Summaries: Enhancing Bengali Text Summarization via Ranking-based Approach

G M Shahariar *, Tonmoy Talukder *, Rafin Alam Khan Sotez, Md Tanvir Rouf Shawon

* denotes equal contribution; names are listed in alphabetical order.

Abstract

With the increasing need for text summarization techniques that are both efficient and accurate, it becomes crucial to explore avenues that enhance the quality and precision of pre-trained models specifically tailored for summarizing Bengali texts. When it comes to text summarization tasks, there are numerous pre-trained transformer models at one's disposal. Consequently, it becomes quite a challenge to discern the most informative and relevant summary for a given text among the various options generated by these pre-trained summarization models. This paper aims to identify the most accurate and informative summary for a given text by utilizing a simple but effective ranking-based approach that compares the output of four different pre-trained Bengali text summarization models. The process begins by carrying out preprocessing of the input text that involves eliminating unnecessary elements such as special characters and punctuation marks. Next, we utilize four pre-trained summarization models to generate summaries, followed by applying a text ranking algorithm to identify the most suitable summary. Ultimately, the summary with the highest ranking score is chosen as the final one. To evaluate the effectiveness of this approach, the generated summaries are compared against human-annotated summaries using standard NLG metrics such as BLEU, ROUGE, BERTScore, WIL, WER, and METEOR. Experimental results suggest that by leveraging the strengths of each pre-trained transformer model and combining them using a ranking-based approach, our methodology significantly improves the accuracy and effectiveness of the Bengali text summarization.

PDF · Code & Dataset · Presentation
Citation (BibTeX)
@article{shahariar2023rank,
  title={Rank Your Summaries: Enhancing Bengali Text Summarization via Ranking-based Approach},
  author={Shahariar, GM and Talukder, Tonmoy and Sotez, Rafin Alam Khan and Shawon, Md Tanvir Rouf},
  journal={arXiv preprint arXiv:2307.07392},
  year={2023}
}
Bengali · Text Summarization · TextRank · Transformers · Ranking · BERT · mT5

Recent News

May 2026
Conference

Presenting at LREC 2026: Bangla Key2Text — keyword-driven text generation for a low-resource language.

Feb 2026
Publication

Paper accepted at LREC 2026: Bangla Key2Text — text generation from keywords for a low-resource language.

Jan 2026
Leadership

Appointed as project lead for BSFIC Store, a government-sector multi-mill inventory management platform for sugar mills.

Dec 2025
Role

Promoted to Software Engineer at WorldTech. Developing backend infrastructure for the core API platform.

Latest Writing

Featured · Apr 2026 · 10 min read

Rate Limiting in Distributed Systems: Token Bucket vs Sliding Window

A deep dive into the two most common rate limiting algorithms — when to use each, how they behave under burst traffic, and how to implement them with Redis.

Go · Redis · Distributed Systems
Available for Backend / Infra roles · 2026

Let's build something great together.

Whether you're looking for a backend engineer who thinks in systems, or a researcher who builds production-grade tools — I'd love to connect.