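WANG Zheng (1973-), male, engineer at the Shanghai Research Institute of China Telecom Co., Ltd. His main research interests are big data platforms, applications, and service networks.
REN Hua (1977-), female, engineer at the Shanghai Research Institute of China Telecom Co., Ltd. Her main research interests are big data platforms and service platforms.
FANG Yanping (1981-), female, engineer at the Shanghai Research Institute of China Telecom Co., Ltd. Her main research interests are big data and the mobile Internet.
Online publication date: 2016-12
Print publication date: 2016-12-20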
Zheng WANG, Hua REN, Yanping FANG. Application of random forest in big data completion[J]. Telecommunications Science, 2016, 32(12): 7-12. DOI: 10.11959/j.issn.1000-0801.2016317.
Telecom operators hold large volumes of data, but for a variety of reasons its quality is less than ideal: much of it is incomplete or missing altogether. Mining existing data is only worthwhile when the data meets quality requirements and reaches a sufficient sampling ratio. Building on the existing national log retention system, a template sample library of complete data was designed and data failing the quality requirements was identified. The random forest algorithm was then used to find the best-matching identical or related data, so that the data could be completed and its quality improved, and a backtracking feedback method was used to optimize and expand the template sample library. A data completion subsystem was built within the national log retention system to provide end-to-end data quality assurance and improvement. It completes and improves the quality of historical data and even real-time data, so that the data ultimately meets the requirements of processing and mining, raising the quality and value of operator data.
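The abstract gives no implementation details, so the following is only a minimal sketch of the general idea: train a small random forest on a library of complete records, use it to fill a missing field in an incomplete record, and feed confident completions back into the library. The field names (`domain`, `port`, `period`, `service_type`), the sample data, and the 0.8 feedback threshold are all hypothetical and are not taken from the paper.

```python
import random
from collections import Counter

TARGET = "service_type"                   # hypothetical field to be completed
FEATURES = ["domain", "port", "period"]   # hypothetical log fields

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(rows, rng, depth=0, max_depth=4):
    """Grow one tree using equality splits on a random subset of features."""
    labels = [r[TARGET] for r in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or len(set(labels)) == 1:
        return {"leaf": majority}
    k = max(1, round(len(FEATURES) ** 0.5))   # features tried at this node
    best = None
    for f in rng.sample(FEATURES, k):
        for v in sorted({r[f] for r in rows}, key=str):
            left = [r for r in rows if r[f] == v]
            right = [r for r in rows if r[f] != v]
            if not left or not right:
                continue
            score = (len(left) * gini([r[TARGET] for r in left])
                     + len(right) * gini([r[TARGET] for r in right])) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, v, left, right)
    if best is None:                          # no usable split was sampled
        return {"leaf": majority}
    _, f, v, left, right = best
    return {"feature": f, "value": v,
            "yes": build_tree(left, rng, depth + 1, max_depth),
            "no": build_tree(right, rng, depth + 1, max_depth)}

def predict(tree, row):
    """Walk one tree down to a leaf and return its label."""
    while "leaf" not in tree:
        tree = tree["yes"] if row[tree["feature"]] == tree["value"] else tree["no"]
    return tree["leaf"]

def train_forest(templates, n_trees=25, seed=0):
    """Train each tree on a bootstrap sample of the template library."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(templates) for _ in templates]
        forest.append(build_tree(sample, rng))
    return forest

def complete(forest, record):
    """Fill the missing target by majority vote; also return the vote share."""
    votes = Counter(predict(t, record) for t in forest)
    label, count = votes.most_common(1)[0]
    return dict(record, **{TARGET: label}), count / len(forest)

# Hypothetical template library of complete records (12 rows, 3 services).
SERVICES = [("video.example", 554, "video"),
            ("mail.example", 25, "mail"),
            ("web.example", 80, "web")]
templates = [{"domain": d, "port": p, "period": period, TARGET: s}
             for d, p, s in SERVICES
             for period in ("day", "night") for _ in range(2)]

forest = train_forest(templates)
incomplete = {"domain": "mail.example", "port": 25, "period": "night", TARGET: None}
filled, share = complete(forest, incomplete)
print(filled[TARGET], share)

# Feedback step: add confident completions to the library (threshold assumed).
if share >= 0.8:
    templates.append(filled)
```

The trees are kept deliberately small and use only the standard library so the voting and bootstrap steps stay visible. A production system would more likely rely on an established random forest implementation and on real log fields.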