Ningsih, Febi Siti Sutria and Khotimah, Purnomo Husnul and Arisal, Andria and Rozie, Andri Fachrur and Munandar, Devi and Riswantini, Dianadewi and Nugraheni, Ekasari and Suwarningsih, Wiwin and Kurniasari, Dian (2023) Synonym-based Text Generation in Restructuring Imbalanced Dataset for Deep Learning Models. Proceedings of the 5th International Conference on Networking, Information Systems & Security: Envisage Intelligent Systems in 5G/6G-based Interconnected Digital Worlds NISS 2022. ISBN 978-1-6654-5363-9




One of the data processing problems in machine learning is class imbalance. Imbalanced classes can bias a model towards the majority classes, because machine learning algorithms generally presume that the classes contain a similar number of objects. Oversampling, or generating new objects in the minority class, is a common approach to balancing a dataset. In text oversampling, loss of semantic meaning often occurs when deep learning algorithms are used. We propose synonym-based text generation for restructuring an imbalanced COVID-19 online-news dataset. Three deep learning models (MLP, CNN, and LSTM) using TF-IDF and word embedding (WE) features are tested on the original and the balanced dataset. The results indicate that the balance of the dataset and the use of representative text features affect the performance of the deep learning models. Using balanced data and deep learning models with WE yields significantly higher classification performance, with improvements as high as 4%, 5%, and 6% in accuracy, precision, recall, and F1-score, respectively.
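
The general idea of synonym-based oversampling can be illustrated with a minimal sketch. This is not the authors' implementation: the synonym table `SYNONYMS`, the helper names `synonym_variant` and `balance_with_synonyms`, and the sample texts are all hypothetical, and a real system would presumably draw synonyms from a thesaurus or lexical database rather than a hand-written dictionary.

```python
import random
from collections import Counter

# Hypothetical toy synonym table (an assumption for illustration only).
SYNONYMS = {
    "rise": ["increase", "climb"],
    "cases": ["infections"],
    "new": ["fresh"],
}


def synonym_variant(text, rng, synonyms=SYNONYMS):
    """Return a copy of `text` in which each word found in the synonym
    table is replaced by one of its synonyms, chosen at random."""
    words = text.split()
    return " ".join(
        rng.choice(synonyms[w]) if w in synonyms else w for w in words
    )


def balance_with_synonyms(texts, labels, seed=0, synonyms=SYNONYMS):
    """Oversample every minority class up to the size of the largest class
    by appending synonym-substituted variants of its existing texts.

    Note: a text with no words in the synonym table yields an exact
    duplicate, so coverage of the table matters in practice.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_texts, out_labels = list(texts), list(labels)
    for label, n in counts.items():
        pool = [t for t, l in zip(texts, labels) if l == label]
        for _ in range(target - n):
            out_texts.append(synonym_variant(rng.choice(pool), rng, synonyms))
            out_labels.append(label)
    return out_texts, out_labels


if __name__ == "__main__":
    # Made-up example data: one minority text ("a") against three majority texts ("b").
    texts = ["cases rise again", "vaccine arrives", "hospital opens", "new clinic opens"]
    labels = ["a", "b", "b", "b"]
    new_texts, new_labels = balance_with_synonyms(texts, labels)
    print(Counter(new_labels))
    print(new_texts[len(texts):])
```

Because the generated texts keep the sentence structure and only swap words for near-equivalents, the minority class grows without repeating the original documents verbatim, which is the property the abstract relies on to limit semantic loss.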

Item Type: Article
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Fakultas Matematika dan Ilmu Pengetahuan Alam (FMIPA) > Prodi Matematika
Depositing User: DIAN KURNIASARI
Date Deposited: 06 Apr 2023 00:53
Last Modified: 06 Apr 2023 00:53
