Muludi, Kurnia and Widyantoro, Dwi H and Kuspriyanto, Kuspriyanto and Santoto, Oerip (2011) Multi-Inductive Learning Approach for Information Extraction. Proceeding 2011 International Conference on Electrical Engineering and Informatics. G1.
Abstract
The vast amount of information on the Internet is not easy to find and use. Information Extraction technology is one alternative that can address this problem. The conventional Natural Language Processing approach is hampered by limited portability, scalability, and adaptability. Introducing Machine Learning into Information Extraction is one solution, since Inductive Learning needs only annotated training examples. The problem is that no single algorithm performs consistently across information domains. Automatic, intelligent selection of a classifier from among various machine learning algorithms is one of the best ways to handle this problem. The goal of this paper is to propose a well-performing method for Information Extraction systems based on Inductive Learning and Meta Learning. To that end, Multi-Inductive Learning is developed. Multi-Inductive Learning consists of several Inductive Learning algorithms whose mechanisms differ significantly, which ensures a variety of inductive biases within the method. Through k-fold cross validation on the training documents, the Multi-Inductive Learning algorithm chooses the best classifier for each slot in a given domain. These best classifiers are then employed to perform full extraction on the testing documents. The experiments show that Multi-Inductive Learning performs better than Information Extraction systems based on a single Inductive Learning algorithm. On the Reuters Corporate Acquisition dataset, Multi-Inductive Learning achieves a score of 46.3%, the best among the state-of-the-art Information Extraction systems compared; of the nine slots to be extracted, six are extracted with the best performance. Multi-Inductive Learning also performs better on the Job Posting dataset, with an average performance of 82.1%, again the best among state-of-the-art Information Extraction systems; of the 17 slots tested, nine are extracted with the best performance.
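The per-slot selection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the two toy learners (`majority_learner`, `one_nn_learner`), the slot names, the numeric toy data, and the use of plain accuracy as the score are all assumptions made for illustration, since the abstract does not name the actual algorithms or the evaluation metric.

```python
from collections import Counter
from statistics import mean


def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) for k contiguous folds; assumes n >= k."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size


def cross_val_accuracy(learner, examples, k):
    """Mean accuracy over k folds. `learner(train)` returns a predict function."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(examples), k):
        predict = learner([examples[i] for i in train_idx])
        hits = sum(predict(examples[i][0]) == examples[i][1] for i in test_idx)
        scores.append(hits / len(test_idx))
    return mean(scores)


def majority_learner(train):
    """Hypothetical learner: always predicts the most common training label."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label


def one_nn_learner(train):
    """Hypothetical learner: copies the label of the nearest training point."""
    return lambda x: min(train, key=lambda ex: abs(ex[0] - x))[1]


def select_best_per_slot(learners, data_by_slot, k=5):
    """For each slot, keep the learner with the best cross-validated score."""
    best = {}
    for slot, examples in data_by_slot.items():
        scores = {name: cross_val_accuracy(fn, examples, k)
                  for name, fn in learners.items()}
        best[slot] = max(scores, key=scores.get)
    return best


# Toy data (assumed): each example is (numeric feature, label).
LEARNERS = {"majority": majority_learner, "one_nn": one_nn_learner}
DATA = {
    # Clean threshold: a nearest-neighbour learner suits this slot.
    "slot_a": [(x, x >= 5) for x in range(10)],
    # Mostly positive with two isolated negatives: the majority rule suits this slot.
    "slot_b": [(x, x not in (3, 6)) for x in range(10)],
}

best_by_slot = select_best_per_slot(LEARNERS, DATA, k=5)
```

On this toy data the two slots end up with different learners, which is the point of selecting per slot rather than fixing one algorithm for the whole domain. In a full system the selected learner for each slot would then be retrained on all training documents before extraction on the test documents.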
Item Type: | Article |
---|---|
Subjects: | Q Science > QA Mathematics > QA76 Computer software |
Divisions: | Fakultas Matematika dan Ilmu Pengetahuan Alam (FMIPA) > Prodi Ilmu Komputer |
Depositing User: | Kurnia Muludi |
Date Deposited: | 25 Apr 2018 08:25 |
Last Modified: | 25 Apr 2018 08:25 |
URI: | http://repository.lppm.unila.ac.id/id/eprint/6590 |