An improved algorithm, CLTree-R, was proposed to compensate for the shortcomings of the CLTree algorithm, namely its low accuracy and inefficiency. CLTree-R was then applied to the cluster analysis of UCI data sets. To improve efficiency, the data sets were processed in parallel on the Spark platform. Experimental results show that the algorithm produces a reasonable customer classification when performing cluster analysis on the official data set.