Junaidi, Akmal. Statistical Modeling of the Relation Between Characters and Diacritics in Lampung Script. 12th International Conference on Document Analysis and Recognition.




Lampung script is a non-cursive script in which a rich set of diacritics is used to modify the syllable denoted by a character symbol. Consequently, analyzing the relation between characters and their associated diacritic marks plays an important role in the recognition process. Because diacritics can appear in three different positions relative to a character (top, bottom, and right), associating them correctly with a character is a challenging problem. In this paper, we propose a novel approach for modeling the relations between characters and diacritics in handwritten Lampung documents. First, a document is segmented into characters and diacritic marks. Then every character defines a normalized coordinate system into which nearby diacritics can be mapped. The relation between a diacritic mark and its associated character can then be described by a statistical model. In a writer-independent experimental evaluation, we investigate models with different degrees of specialization with respect to their capability of predicting the correct character-to-diacritic associations. We achieve significant error rate reductions compared with a naive association model that uses a nearest-neighbor criterion.
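The idea of mapping diacritics into a character-defined coordinate system can be illustrated with a minimal sketch. It is not the paper's implementation: the bounding-box representation, the choice of an axis-aligned Gaussian as the statistical model, and all function names (`normalize`, `fit_gaussian`, `associate_statistical`, `associate_nearest`) are assumptions made for illustration only.

```python
import math

# Hypothetical representation: a character is a bounding box
# (x_min, y_min, x_max, y_max) in image coordinates (y grows downward),
# and a diacritic is reduced to its centre point (x, y).

def normalize(char_box, point):
    """Map a point into the coordinate system of a character box.

    The box centre maps to (0, 0) and the box edges to -1 and +1,
    so positions become independent of character size and location.
    """
    x0, y0, x1, y1 = char_box
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    half_w, half_h = (x1 - x0) / 2, (y1 - y0) / 2
    return ((point[0] - cx) / half_w, (point[1] - cy) / half_h)

def fit_gaussian(samples, floor=1e-6):
    """Fit an axis-aligned 2-D Gaussian (assumed model) to normalized positions."""
    n = len(samples)
    mx = sum(s[0] for s in samples) / n
    my = sum(s[1] for s in samples) / n
    vx = sum((s[0] - mx) ** 2 for s in samples) / n + floor
    vy = sum((s[1] - my) ** 2 for s in samples) / n + floor
    return (mx, my, vx, vy)

def log_density(model, p):
    """Log density of the axis-aligned Gaussian at point p."""
    mx, my, vx, vy = model
    quad = (p[0] - mx) ** 2 / vx + (p[1] - my) ** 2 / vy
    return -0.5 * quad - 0.5 * math.log(4 * math.pi ** 2 * vx * vy)

def associate_statistical(char_boxes, diacritic, model):
    """Pick the character under which the diacritic position is most likely."""
    scores = [log_density(model, normalize(b, diacritic)) for b in char_boxes]
    return max(range(len(char_boxes)), key=scores.__getitem__)

def associate_nearest(char_boxes, diacritic):
    """Naive baseline: pick the character whose box centre is closest."""
    def dist(b):
        cx, cy = (b[0] + b[2]) / 2, (b[1] + b[3]) / 2
        return math.hypot(diacritic[0] - cx, diacritic[1] - cy)
    return min(range(len(char_boxes)), key=lambda i: dist(char_boxes[i]))

# Made-up training data: normalized positions of "top" diacritics.
top_model = fit_gaussian([(0.0, -1.4), (0.1, -1.2), (-0.1, -1.3), (0.05, -1.35)])

# A tall character with a small neighbour; the diacritic sits above the tall one.
chars = [(0, 10, 20, 40), (22, 2, 30, 10)]
diacritic = (10, 5)

stat_choice = associate_statistical(chars, diacritic, top_model)
nn_choice = associate_nearest(chars, diacritic)
```

In this constructed case the diacritic lies directly above the tall character but closer to the centre of its small neighbour, so the two criteria disagree. A more specialized variant, in the spirit of the abstract's "different degrees of specialization", might fit separate models per position (top, bottom, right) or per diacritic class; that extension is likewise only an assumption here.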

Item Type: Article
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Fakultas Matematika dan Ilmu Pengetahuan Alam (FMIPA) > Prodi Ilmu Komputer
Depositing User: Mr. Akmal Junaidi
Date Deposited: 09 Dec 2021 11:10
Last Modified: 09 Dec 2021 11:10
URI: http://repository.lppm.unila.ac.id/id/eprint/37230
