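Data Sources and References for 《白话机器学习算法》 (Machine Learning Algorithms in Plain Language)
k-Means Clustering: Personality Traits of Facebook Users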
Stillwell, D., & Kosinski, M. (2012). myPersonality Project: .
Kosinski, M., Matz, S., Gosling, S., Popov, V., & Stillwell, D. (2015). Facebook as a Social Science Research Tool: Opportunities, Challenges, Ethical Considerations and Practical Guidelines. American Psychologist.
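Principal Component Analysis: Nutritional Composition of Foods
United States Department of Agriculture (2015). USDA Food Composition Databases: .
Association Rules: Grocery Store Data
The dataset is included in the following R package: Hahsler, M., Buchta, C., Gruen, B., & Hornik, K. (2016). arules: Mining Association Rules and Frequent Itemsets. R package version 1.5-0.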
Hahsler, M., Hornik, K., & Reutterer, T. (2006). Implications of Probabilistic Data Modeling for Mining Association Rules. In Spiliopoulou, M., Kruse, R., Borgelt, C., Nürnberger, A., & Gaul, W. (Eds.), From Data and Information Analysis to Knowledge Engineering, Studies in Classification, Data Analysis, and Knowledge Organization, pp. 598-605. Berlin, Germany: Springer-Verlag.
Hahsler, M., & Chelluboina, S. (2011). Visualizing Association Rules: Introduction to the R-extension Package arulesViz. R Project Module, 223-238.
Regression Analysis: Predicting House Prices
Harrison, D., & Rubinfeld, D. (1978). Hedonic Prices and the Demand for Clean Air. Journal of Environmental Economics and Management, 5, 81-102.
k-Nearest Neighbors: Chemical Composition of Wine
Forina, M., et al. (1998). Wine Recognition: .
Cortez, P., Cerdeira, A., Almeida, F., Matos, T., & Reis, J. (2009). Modeling Wine Preferences by Data Mining from Physicochemical Properties. Decision Support Systems, 47(4), 547-553.
Support Vector Machines: Predicting Heart Disease
Robert Detrano (M.D., Ph.D.), V.A. Medical Center, Long Beach and Cleveland Clinic Foundation (1988). Heart Disease Database (Cleveland) [Data file and description]: +Disease.
Detrano, R., et al. (1989). International Application of a New Probability Algorithm for the Diagnosis of Coronary Artery Disease. The American Journal of Cardiology, 64(5), 304-310.
Decision Trees: Titanic Passenger Data
Report on the Loss of the 'Titanic' (S.S.) (1990). British Board of Trade Inquiry Report (reprint), Gloucester, UK: Allan Sutton Publishing. The data are discussed in Dawson, R. J. M. (1995). The 'Unusual Episode' Data Revisited. Journal of Statistics Education, 3(3).
Random Forests: San Francisco Crime Incident Data
SF OpenData, City and County of San Francisco (2016). Crime Incidents.
Random Forests: San Francisco Weather
National Oceanic and Atmospheric Administration, National Centers for Environmental Information (2016). Quality Controlled Local Climatological Data (QCLCD).
Neural Networks: Handwritten Digits
LeCun, Y., & Cortes, C. (1998). The MNIST Database of Handwritten Digits [Data file and description]: .
LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based Learning Applied to Document Recognition. Proceedings of the IEEE, 86(11), 2278-2324.
For more open datasets, please visit: .