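1. School of Information and Electromechanical Engineering, Jiangsu Open University, Nanjing, Jiangsu 210017
2. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, Jiangsu 210016
XU Xiaoyuan (1980−), female, is an associate professor at the School of Information and Electromechanical Engineering, Jiangsu Open University. Her main research interests are computer software applications, pattern recognition, and algorithms.
HUANG Li (1982−), female, is a PhD candidate at the College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, and a lecturer at the School of Information and Electromechanical Engineering, Jiangsu Open University. Her main research interest is data mining.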
Online publication date: 2017-06
Print publication date: 2017-06-20
XU Xiaoyuan, HUANG Li. A feature selection method based on instance learning and cooperative subset search[J]. Telecommunications Science, 2017, 33(6): 105-113. DOI: 10.11959/j.issn.1000-0801.2017122.
Feature subset selection is a key problem in data mining classification tasks. Commonly used filter methods ignore the correlations between genes, which are prevalent in gene expression data. In addition, existing methods are not specifically designed to handle small sample sizes, which is one of the main causes of feature selection instability. To address these issues, a new hybrid filter-wrapper algorithm based on instance learning is proposed, together with cooperative subset search (CSS), which uses a classifier algorithm as the evaluation system of the wrapper. The method was tested experimentally on several high-dimensional, low-sample-size cancer data sets and compared with state-of-the-art algorithms. The results show that the proposed approach outperforms the other methods in terms of accuracy and stability of the selected subset.
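To make the filter-wrapper idea concrete, the following is a minimal generic sketch and not the CSS algorithm of the paper, whose details the abstract does not give. It assumes a binary task, a signal-to-noise filter score, and a leave-one-out 1-nearest-neighbour classifier (an instance-based learner) as the wrapper's evaluator. All function names and the `keep` and `max_size` parameters are hypothetical.

```python
import math


def snr_score(column, labels):
    """Filter score (assumed): |mean difference| / (sum of class std devs)."""
    a = [v for v, y in zip(column, labels) if y == 0]
    b = [v for v, y in zip(column, labels) if y == 1]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    std_a = math.sqrt(sum((v - mean_a) ** 2 for v in a) / len(a))
    std_b = math.sqrt(sum((v - mean_b) ** 2 for v in b) / len(b))
    return abs(mean_a - mean_b) / (std_a + std_b + 1e-12)


def loo_1nn_accuracy(X, labels, features):
    """Wrapper score (assumed): leave-one-out accuracy of a 1-NN classifier."""
    correct = 0
    for i, row in enumerate(X):
        best_j, best_d = None, float("inf")
        for j, other in enumerate(X):
            if i == j:
                continue
            d = sum((row[f] - other[f]) ** 2 for f in features)
            if d < best_d:
                best_j, best_d = j, d
        correct += labels[best_j] == labels[i]
    return correct / len(X)


def hybrid_select(X, labels, keep=10, max_size=5):
    """Filter stage keeps the top `keep` features; wrapper stage adds them greedily."""
    n_features = len(X[0])
    columns = [[row[f] for row in X] for f in range(n_features)]
    ranked = sorted(range(n_features),
                    key=lambda f: snr_score(columns[f], labels), reverse=True)
    candidates = ranked[:keep]
    selected, best_acc = [], 0.0
    while len(selected) < max_size:
        trials = [(loo_1nn_accuracy(X, labels, selected + [f]), f)
                  for f in candidates if f not in selected]
        if not trials:
            break
        acc, f = max(trials, key=lambda t: t[0])
        if acc <= best_acc:
            break  # stop when no candidate improves the wrapper score
        selected.append(f)
        best_acc = acc
    return selected, best_acc
```

Calling `hybrid_select(X, labels)` on a list of sample rows `X` with 0/1 `labels` returns the chosen feature indices and their leave-one-out accuracy. Because the wrapper evaluates subsets rather than single features, a feature that is redundant with one already selected does not improve the score and is not added, which is the weakness of pure filter ranking that the abstract points to.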