Abidin, Zaenal and Junaidi, Akmal and Wamiliana, Wamiliana and Lumbanraja, Favorisen R and Kurniasari, Dian and Borman, Rohmat Indra (2025) Rule-based Dialect of Tulang Bawang Stemmer. In: 2025 International Conference on Advancement in Data Science, E-learning and Information System (ICADEIS), 03-04 February 2025, Bandung.

Official URL: https://ieeexplore.ieee.org/document/10933405

Abstract

Stemming, an essential procedure in natural language processing (NLP), reduces words to their base forms, facilitating tasks such as information retrieval and sentiment analysis. Although stemming techniques for high-resource languages are well developed, many low-resource languages, including the Tulang Bawang dialect, lack adequate solutions owing to a scarcity of linguistic data and resources. Existing systems, including rule-based stemmers, have proven effective for low-resource languages such as Indonesian and Javanese by applying established morphological rules. Nonetheless, these methods face considerable obstacles: limited adaptability, an inability to accommodate unusual root structures, and heavy dependence on fixed rules, which can lead to over- or under-stemming. Rule-based approaches frequently misidentify roots when faced with complex affixes or unconventional word forms. We introduce an improved rule-based Tulang Bawang stemmer that aims to overcome these constraints by refining the existing linguistic rules and adding new patterns specific to the dialect's morphology. Assessed against a gold standard on 500 test samples and 200 independent test samples, the improved stemmer attained scores of 96.2% and 94%, respectively, surpassing prior implementations in both precision and generalization. The findings demonstrate the potential of enhanced rule-based techniques to improve NLP for low-resource languages. Improved stemming performance enables better downstream applications, promotes more efficient text analysis, and advances research in underrepresented languages.
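As a rough illustration of the general technique the abstract describes, the following minimal sketch shows how a rule-based stemmer might strip affixes and how its output could be scored against a gold standard. It is not the authors' implementation: the affix lists are a few Indonesian-style affixes chosen purely for illustration (not the Tulang Bawang rules), and the root lexicon, the function names `stem` and `accuracy`, and the sample words are all assumptions.

```python
# Illustrative affixes only (Indonesian-style); NOT the rule set from the paper.
PREFIXES = ("me", "di", "ber", "ter")
SUFFIXES = ("kan", "an", "i")


def stem(word, roots):
    """Return the first affix-stripped candidate found in `roots`.

    `roots` is a hypothetical lexicon of known base forms. Checking
    candidates against it guards against over-stemming; if no candidate
    matches, the word is returned unchanged.
    """
    word = word.lower()
    if word in roots:
        return word

    # Candidates: the word itself, then the word with one prefix removed.
    candidates = [word]
    candidates += [word[len(p):] for p in PREFIXES if word.startswith(p)]

    # For every candidate so far, also try removing one suffix.
    for cand in list(candidates):
        candidates += [cand[:-len(s)] for s in SUFFIXES if cand.endswith(s)]

    for cand in candidates:
        if cand in roots:
            return cand
    return word


def accuracy(gold_pairs, roots):
    """Fraction of (word, gold_stem) pairs that the stemmer gets right."""
    correct = sum(1 for word, gold in gold_pairs if stem(word, roots) == gold)
    return correct / len(gold_pairs)


if __name__ == "__main__":
    roots = {"makan", "main"}  # toy lexicon; "minum" is deliberately missing
    gold_pairs = [
        ("dimakan", "makan"),
        ("makanan", "makan"),
        ("bermain", "main"),
        ("minuman", "minum"),  # left unstemmed because its root is not in the lexicon
    ]
    for word, gold in gold_pairs:
        print(word, "->", stem(word, roots), "(gold:", gold + ")")
    print("accuracy:", accuracy(gold_pairs, roots))
```

In this toy setup the last pair is under-stemmed because its root is absent from the lexicon, which is the kind of failure the abstract attributes to rigid rule sets; the accuracy function simply mirrors the idea of comparing stemmer output with manually verified stems.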

Item Type: Conference or Workshop Item (Paper)
Subjects: Q Science > QA Mathematics
Q Science > QA Mathematics > QA76 Computer software
Divisions: Fakultas Matematika dan Ilmu Pengetahuan Alam (FMIPA) > Prodi Matematika
Depositing User: DIAN KURNIASARI
Date Deposited: 17 Apr 2026 02:42
Last Modified: 17 Apr 2026 02:42
URI: http://repository.lppm.unila.ac.id/id/eprint/54835
