An application of Malay short-form word conversion using Levenshtein distance / Azilawati Azizan, NurAine Saidin, Nurkhairizan Khairudin & Rohana Ismail

Formerly, short-form words were widely used in the field of journalism. Nowadays, however, they are widely used by many people, especially in online communication. These short-form words cause problems in the field of data mining, especially in online text processing. It...

Bibliographic Details
Main Authors: Azilawati Azizan (Author), NurAine Saidin (Author), Nurkhairizan Khairudin (Author), Rohana Ismail (Author)
Format: Book
Published: UiTM Press, 2020-11.
Subjects: Malaysia; Algorithms
Online Access: Link Metadata

MARC

LEADER 00000 am a22000003u 4500
001 repouitm_38191
042 |a dc 
100 1 0 |a Azilawati Azizan, Azilawati Azizan  |e author 
700 1 0 |a NurAine Saidin, NurAine Saidin  |e author 
700 1 0 |a Nurkhairizan Khairudin, Nurkhairizan Khairudin  |e author 
700 1 0 |a Rohana Ismail, Rohana Ismail  |e author 
245 0 0 |a An application of Malay short-form word conversion using Levenshtein distance / Azilawati Azizan, NurAine Saidin, Nurkhairizan Khairudin & Rohana Ismail 
260 |b UiTM Press,   |c 2020-11. 
500 |a https://ir.uitm.edu.my/id/eprint/38191/2/38191.pdf 
520 |a Formerly, short-form words were widely used in the field of journalism. Nowadays, however, they are widely used by many people, especially in online communication. These short-form words cause problems in the field of data mining, especially in online text processing, and lead to inaccurate text mining results. On the other hand, only a few works have investigated Malay short-form word identification and conversion. Therefore, this work aims to develop an application that can identify Malay short-form words and convert them into their full words. To develop this application, the short-form rules need to be carefully examined. The formal rules from Dewan Bahasa & Pustaka (DBP) are used as the primary reference for generating the short-form word identification algorithm. For the conversion algorithm, Levenshtein Distance (LD) is used to measure similarity, with a rule-based technique as a complement to LD. As a result, 70.27% of the Malay short-form words were correctly converted into their full words. The conversion rate is quite promising, and this work can be further strengthened by incorporating more rules into the algorithm. 
546 |a en 
690 |a Malaysia 
690 |a Algorithms 
655 7 |a Article  |2 local 
655 7 |a PeerReviewed  |2 local 
787 0 |n https://ir.uitm.edu.my/id/eprint/38191/ 
787 0 |n https://mijuitm.com.my/view-articles/ 
856 4 1 |u https://ir.uitm.edu.my/id/eprint/38191/  |z Link Metadata
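
Illustrative note: the abstract says that Levenshtein Distance (LD) is used to measure similarity when converting a short-form word into its full word. The sketch below shows one way such a lookup could work. It is not the authors' implementation: the function names, the tiny word list, and the alphabetical tie-breaking rule are assumptions made for illustration, and the DBP-based identification rules and the complementary rule-based step are not modelled.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def closest_full_word(short_form: str, lexicon: list[str]) -> str:
    """Pick the lexicon entry with the smallest LD to short_form.
    Ties are broken alphabetically (an assumption of this sketch)."""
    return min(lexicon, key=lambda word: (levenshtein(short_form, word), word))


# Hypothetical three-word lexicon, for illustration only.
lexicon = ["yang", "dengan", "tidak"]
for short_form in ("yg", "tdk"):
    print(short_form, "->", closest_full_word(short_form, lexicon))
```

With this small lexicon, "yg" is closest to "yang" and "tdk" to "tidak". Distance alone can produce ties: "dgn" is three edits away from both "dengan" and "yang", which suggests why a rule-based step is a useful complement to LD.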