Feature and Instance Selection in One versus All and One versus One Multi-class Classification methods

Please use this permanent URL to cite or link to this item: http://ir.lib.ncu.edu.tw/handle/987654321/95432

Title:	Feature and Instance Selection in One versus All and One versus One Multi-class Classification methods
Author:	YAN, LIU CHENG (柳成彥)
Contributor:	Department of Information Management
Keywords:	Data pre-processing; feature selection; instance selection; multi-class classification; multi-class dataset; data mining
Date:	2024-07-03
Uploaded:	2024-10-09 16:51:21 (UTC+8)
Publisher:	National Central University
Abstract:	With the rapid advancement of information technology, data volume in every field has grown explosively. Without proper preprocessing, excess data introduces noise into model building and lowers classifier performance. Many studies have shown that feature selection and instance selection, which retain the important features and samples of a dataset, can effectively improve classifier performance and model accuracy. However, few studies have asked whether multi-class datasets call for different data processing methods. Multi-class processing methods have been studied for instance selection, but not yet for feature selection. This study therefore investigates how applying a multi-class processing method to a multi-class dataset first, and feature selection afterwards, affects model building. We use the One-versus-All (OvA) and One-versus-One (OvO) techniques to split the data, and combine them with feature selection methods from the three major categories (filter, wrapper, and embedded). Support Vector Machine (SVM) and K-Nearest Neighbors (KNN) serve as the classifiers in the search for the best experimental combination.
In the second stage of the study, we add instance selection to examine how the order in which instance selection and feature selection (after multi-class processing) are applied affects classifier performance. The experiments use 15 multi-class datasets from UCI and Feature Selection @ ASU. The results show that the XGBoost feature selection algorithm, combined with an OvO union of the selected features, achieves the best average results under the KNN classifier: compared with the baseline without feature selection, AUC improves by 2.9%.
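The decompose-then-select idea described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the thesis reports XGBoost as the best-performing selector, whereas this sketch swaps in a simple Fisher-style filter score so that it runs on the Python standard library alone. All function names, the toy data, and the choice of `k` are hypothetical.

```python
from itertools import combinations
from statistics import mean, pvariance


def fisher_score(column, labels):
    """Filter-style score of one feature on a binary subproblem:
    squared gap between the class means over the summed class variances."""
    pos = [v for v, t in zip(column, labels) if t == 1]
    neg = [v for v, t in zip(column, labels) if t == 0]
    spread = pvariance(pos) + pvariance(neg)
    return (mean(pos) - mean(neg)) ** 2 / (spread + 1e-12)


def top_k_features(X, labels, k):
    """Indices of the k highest-scoring features on a binary subproblem."""
    n_features = len(X[0])
    scores = [fisher_score([row[j] for row in X], labels) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return set(ranked[:k])


def ovo_subproblems(X, y):
    """OvO: one binary subproblem per pair of classes, using only their samples."""
    for a, b in combinations(sorted(set(y)), 2):
        pairs = [(row, 1 if label == a else 0) for row, label in zip(X, y) if label in (a, b)]
        yield [row for row, _ in pairs], [t for _, t in pairs]


def ova_subproblems(X, y):
    """OvA: one binary subproblem per class, that class versus all the others."""
    for a in sorted(set(y)):
        yield X, [1 if label == a else 0 for label in y]


def union_selected_features(X, y, k, decompose):
    """Select k features per binary subproblem and return the sorted union."""
    selected = set()
    for X_bin, y_bin in decompose(X, y):
        selected |= top_k_features(X_bin, y_bin, k)
    return sorted(selected)


# Toy data (made up): feature 0 separates class 0 from the rest,
# feature 1 separates class 1 from class 2, feature 2 carries no class signal.
X = [
    [0.0, 5.0, 1.0], [1.0, 6.0, 2.0],      # class 0
    [10.0, 0.0, 1.0], [11.0, 1.0, 2.0],    # class 1
    [10.0, 10.0, 1.0], [11.0, 11.0, 2.0],  # class 2
]
y = [0, 0, 1, 1, 2, 2]

print("OvO union:", union_selected_features(X, y, 1, ovo_subproblems))
print("OvA union:", union_selected_features(X, y, 1, ova_subproblems))
```

On this toy data the uninformative third feature is dropped under both decompositions. In a setup closer to the one the abstract describes, the reduced feature set would then be passed to a KNN or SVM classifier and compared with the unreduced baseline by AUC.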
Appears in Collections:	[Graduate Institute of Information Management] Doctoral and Master's Theses

All items in NCUIR are protected by original copyright.
