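Insurance Studies (《保险研究》) 20190406: Research on the Application of the Bagging Ensemble Method in Insurance Fraud Identification (LI Xiu-fang, HUANG Zhi-guo, CHEN Xiao-wei)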

[CLC number] F84; G623 [Document code] A [Article ID] 1004-3306(2019)04-0066-19 DOI: 10.13497/j.cnki.is.2019.04.006

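[Abstract] Insurance fraud not only threatens the normal operation of insurance companies and increases the burden on policyholders, but may even affect national financial stability. With the arrival of the big data era, insurance anti-fraud work urgently needs revolutionary technology. Because its model structure is adjustable, it is easy to deploy, its parameter space is controllable, and it supports parallel computation, the Bagging ensemble method has become a good choice for insurance companies' anti-fraud work. Bagging methods mainly comprise the Bagging, Random Subspace and Random Patches algorithms, each of which can be combined with different base learners to form new branch algorithms and special cases. Using these algorithms, this paper empirically tests the insurance fraud problem, analyzes how well each algorithm pairs with different base learners, and examines how the number of base learners affects algorithm performance. The analysis finds that, for insurance fraud identification, Random Patches performs best among Bagging, Random Subspace and Random Patches, while Bagging has the shortest running time; different algorithms suit different base learners, but overall the decision tree is the base learner best suited to Bagging ensemble methods; all decision-tree-based methods consistently select whether a lawyer was retained as the most important feature; and the number of base learners does not affect the performance of the different Bagging algorithms in a consistent way.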
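The three algorithms differ only in how the training data for each base learner is drawn. The sketch below illustrates this under stated assumptions and is not the authors' implementation: the function `draw_subsets` and its `max_samples`/`max_features` fractions are hypothetical (the naming follows scikit-learn's `BaggingClassifier`, which exposes the same choices through its `max_samples`, `max_features`, `bootstrap` and `bootstrap_features` parameters), and Random Patches is assumed here to draw both rows and features without replacement.

```python
import random


def draw_subsets(n_samples, n_features, scheme, max_samples=1.0, max_features=1.0, seed=None):
    """Return (row_indices, feature_indices) used to train ONE base learner.

    Hypothetical helper for illustration:
      'bagging'         -> bootstrap rows (with replacement), all features
      'random_subspace' -> all rows, random feature subset (no replacement)
      'random_patches'  -> random row subset AND random feature subset (no replacement)
    """
    rng = random.Random(seed)
    k_rows = max(1, int(max_samples * n_samples))
    k_feats = max(1, int(max_features * n_features))
    all_rows = list(range(n_samples))
    all_feats = list(range(n_features))

    if scheme == "bagging":
        rows = [rng.randrange(n_samples) for _ in range(k_rows)]  # duplicates allowed
        feats = all_feats
    elif scheme == "random_subspace":
        rows = all_rows
        feats = sorted(rng.sample(all_feats, k_feats))
    elif scheme == "random_patches":
        rows = sorted(rng.sample(all_rows, k_rows))
        feats = sorted(rng.sample(all_feats, k_feats))
    else:
        raise ValueError(f"unknown scheme: {scheme!r}")
    return rows, feats


rows, feats = draw_subsets(1000, 20, "random_patches", max_samples=0.5, max_features=0.25, seed=0)
print(len(rows), len(feats))
```

Training one base learner per call on the selected rows and columns, and then combining their votes, yields the ensemble; the choice of base learner (decision tree, logistic regression, and so on) is independent of the sampling scheme, which is why the same three schemes can be paired with different learners.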

[Keywords] Bagging; insurance fraud; extremely randomized trees; random forest

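[Funding] This paper was supported by the National Natural Science Foundation of China General Program projects "Research on Economic Capital Forecasting and Optimal Allocation for Insurance Companies" (No. 71573143) and "Research on Supply Chain Risk Modeling and Optimization under an Uncertain Comprehensive Risk Analysis Framework" (No. 61673225), and by the Fundamental Research Funds for the Central Universities project "Interdisciplinary Research on Stochastic Optimal Control and Financial and Insurance Management" (No. 63185019).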

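[About the authors] LI Xiu-fang, Professor and doctoral supervisor, School of Finance, Nankai University; research interests: actuarial science and insurance. HUANG Zhi-guo, PhD candidate, School of Finance, Nankai University; research interests: risk management, dynamic economics and machine learning. CHEN Xiao-wei, Associate Professor and master's supervisor, School of Finance, Nankai University; research interests: uncertainty theory, capital asset pricing and dynamic economics.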


Insurance Fraud Detection Based on Bagging Ensemble Learning

LI Xiu-fang,HUANG Zhi-guo,CHEN Xiao-wei

Abstract: Insurance fraud not only jeopardizes the normal operation of insurance companies but also increases the burden on policyholders, and it may even affect China's financial stability. With the advent of the big data era, insurance fraud detection needs revolutionary technology. The Bagging ensemble method has become a good choice because its model structure can be adjusted to the amount of data, it is easy to deploy, its parameter space is controllable, and it supports parallel computing. The Bagging methodology mainly comprises the Bagging, Random Subspace and Random Patches algorithms, which can be combined with different base learners to form new branch algorithms and special cases. Based on these algorithms, the paper empirically tested insurance fraud detection, examined the suitability of each algorithm and base learner, and analyzed how the number of base learners affects algorithm performance. It was found that, for insurance fraud detection, among Bagging, Random Subspace and Random Patches, the Random Patches algorithm performed best and Bagging had the shortest running time. Different algorithms suit different base learners, but overall the decision tree was the best base learner for the Bagging ensemble method. All decision-tree-based methods consistently identified whether a lawyer was retained as the most important feature. The number of base learners affected the performance of different algorithms in different ways.
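Because each base learner is fitted on its own bootstrap sample with no dependence on the others, training parallelizes naturally, and the number of base learners is a single tunable knob. The standard-library sketch below assumes decision stumps (one-split decision trees) as the base learner and simple majority voting; the function names and the toy data are hypothetical and are not the paper's setup or data. `ThreadPoolExecutor` is used for brevity; in CPython, CPU-bound fitting would need a `ProcessPoolExecutor` (or a compiled library) to gain a real speedup.

```python
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def fit_stump(X, y, rows, features):
    """Fit a one-split decision tree (stump) on the given rows and features.

    Returns (feature, threshold, left_label, right_label); the split with the
    fewest training errors wins, and ties go to the first split found.
    """
    best = None
    for f in features:
        for t in sorted({X[i][f] for i in rows}):
            left = [y[i] for i in rows if X[i][f] <= t]
            right = [y[i] for i in rows if X[i][f] > t]
            left_label = Counter(left).most_common(1)[0][0]  # never empty: t is an observed value
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = sum(v != left_label for v in left) + sum(v != right_label for v in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    return best[1:]


def predict_stump(stump, x):
    f, t, left_label, right_label = stump
    return left_label if x[f] <= t else right_label


def fit_bagging(X, y, n_estimators, seed=0, max_workers=4):
    """Plain Bagging: each stump sees a bootstrap sample of rows and all features."""
    rng = random.Random(seed)
    n, features = len(X), list(range(len(X[0])))
    # Draw every bootstrap sample up front so the result is reproducible
    # regardless of how the executor schedules the fitting jobs.
    samples = [[rng.randrange(n) for _ in range(n)] for _ in range(n_estimators)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda rows: fit_stump(X, y, rows, features), samples))


def predict_bagging(stumps, x):
    """Aggregate the base learners by majority vote."""
    return Counter(predict_stump(s, x) for s in stumps).most_common(1)[0][0]


# Hypothetical toy data: feature 0 determines the label, feature 1 is noise.
X = [[0, 1], [0, 2], [0, 3], [1, 1], [1, 2], [1, 3]] * 2
y = [row[0] for row in X]
stumps = fit_bagging(X, y, n_estimators=25)
print([predict_bagging(stumps, x) for x in X])
```

Refitting with several `n_estimators` values and scoring each ensemble on held-out data is one simple way to run the kind of base-learner-count comparison the abstract describes.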

Key words: Bagging; insurance fraud; Extremely Randomized Trees; Random Forest