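DataScience: Building a Financial Credit Scorecard Model for Risk Control on the GiveMeSomeCredit Dataset with Feature Engineering and the Logistic Regression (LoR) Algorithm
Contents
Building a Financial Credit Scorecard Model for Risk Control on the GiveMeSomeCredit Dataset with Feature Engineering and the Logistic Regression (LoR) Algorithm
Analyzing the label field: counting the SeriousDlqin2yrs classes and the size of each class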
Analyzing three similar fields: NumberOfTimes90DaysLate, NumberOfTime60-89DaysPastDueNotWorse, NumberOfTime30-59DaysPastDueNotWorse
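Analyzing a single field: DebtRatio and its relationship with MonthlyIncome and SeriousDlqin2yrs
Analyzing a single field: NumberOfOpenCreditLinesAndLoans
Analyzing a single field: NumberRealEstateLoansOrLines
2.5.1. Converting the bins of the selected features into WOE values with the WOE function
3.2. Model evaluation: computing the AUC, plotting the ROC curve, and printing the confusion matrix
4.1.1. Computing the two scale parameters A and B: deriving the formulas for the scorecard scale parameters A and B from two assumptions
4.1.2. Designing the scorecard rule table: building score_card_rule from scale B, the WOE encoding of each bin, and the model coefficients
4.2.1. Randomly selecting 12 samples (6 good and 6 bad), computing each sample's total score, and comparing it with the label to check the model
Related articles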
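Outline items 2.5.1, 4.1.1 and 4.1.2 rest on a few standard scorecard formulas. The sketch below illustrates them and is not the author's code. It makes three assumptions. First, WOE is defined as ln(bad share / good share). Second, the score is scaled as score = A - B*ln(odds), with odds = P(bad)/P(good). Third, the example parameters (600 points at odds 1:20, PDO = 20) are hypothetical.

The two assumptions in 4.1.2 are score(odds0) = P0 and score(2*odds0) = P0 - PDO. Subtracting one from the other gives B = PDO/ln 2, and then A = P0 + B*ln(odds0). A logistic regression fitted on the WOE columns gives ln(odds) = beta0 + sum(beta_i * WOE_i). So each bin contributes -B*beta_i*WOE_ij points, and every applicant starts from A - B*beta0.

```python
import numpy as np
import pandas as pd


def woe_table(bins: pd.Series, target: pd.Series) -> pd.DataFrame:
    """Per-bin bad/good counts and WOE = ln(bad share / good share).

    target is 1 for a bad (defaulted) sample. A bin with zero bad or zero
    good samples gets an infinite WOE, so merge or smooth such bins first.
    """
    frame = pd.DataFrame({"bin": bins, "bad": target})
    table = frame.groupby("bin", observed=True)["bad"].agg(["sum", "count"])
    table = table.rename(columns={"sum": "bad"})
    table["good"] = table["count"] - table["bad"]
    bad_share = table["bad"] / table["bad"].sum()
    good_share = table["good"] / table["good"].sum()
    table["woe"] = np.log(bad_share / good_share)
    return table


def scale_params(p0: float, odds0: float, pdo: float) -> tuple:
    """Solve score = A - B*ln(odds) from score(odds0) = p0 and
    score(2*odds0) = p0 - pdo."""
    b = pdo / np.log(2)
    a = p0 + b * np.log(odds0)
    return a, b


def base_points(a: float, b: float, intercept: float) -> float:
    """Points every applicant starts from: A - B*beta0."""
    return a - b * intercept


def bin_points(woe: pd.Series, coef: float, b: float) -> pd.Series:
    """Points for each bin of one feature: -B * coef * WOE."""
    return -b * coef * woe
```

For example, `scale_params(600, 1/20, 20)` gives B of about 28.85. An applicant's total score is then `base_points(...)` plus the `bin_points` of the bin they fall into for each selected feature.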
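DataScience: Building a Financial Credit Scorecard Model for Risk Control on the GiveMeSomeCredit Dataset with Feature Engineering and the Logistic Regression (LoR) Algorithm (Implementation)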
- Unnamed: 0 ... NumberOfDependents
- 0 1 ... 2.0
- 1 2 ... 1.0
- 2 3 ... 0.0
- 3 4 ... 0.0
- 4 5 ... 0.0
-
- [5 rows x 12 columns]
- <class 'pandas.core.frame.DataFrame'>
- RangeIndex: 150000 entries, 0 to 149999
- Data columns (total 12 columns):
- Column Non-Null Count Dtype
- --- ------ -------------- -----
- 0 Unnamed: 0 150000 non-null int64
- 1 SeriousDlqin2yrs 150000 non-null int64
- 2 RevolvingUtilizationOfUnsecuredLines 150000 non-null float64
- 3 age 150000 non-null int64
- 4 NumberOfTime30-59DaysPastDueNotWorse 150000 non-null int64
- 5 DebtRatio 150000 non-null float64
- 6 MonthlyIncome 120269 non-null float64
- 7 NumberOfOpenCreditLinesAndLoans 150000 non-null int64
- 8 NumberOfTimes90DaysLate 150000 non-null int64
- 9 NumberRealEstateLoansOrLines 150000 non-null int64
- 10 NumberOfTime60-89DaysPastDueNotWorse 150000 non-null int64
- 11 NumberOfDependents 146076 non-null float64
- dtypes: float64(4), int64(8)
- memory usage: 13.7 MB
- None
- Unnamed: 0 ... NumberOfDependents
- count 150000.000000 ... 146076.000000
- mean 75000.500000 ... 0.757222
- std 43301.414527 ... 1.115086
- min 1.000000 ... 0.000000
- 25% 37500.750000 ... 0.000000
- 50% 75000.500000 ... 0.000000
- 75% 112500.250000 ... 1.000000
- max 150000.000000 ... 20.000000
- [8 rows x 12 columns]
- Column Number_of_Null_Values Proportion
- 0 Unnamed: 0 0 0.000000
- 1 SeriousDlqin2yrs 0 0.000000
- 2 RevolvingUtilizationOfUnsecuredLines 0 0.000000
- 3 age 0 0.000000
- 4 NumberOfTime30-59DaysPastDueNotWorse 0 0.000000
- 5 DebtRatio 0 0.000000
- 6 MonthlyIncome 29731 0.198207
- 7 NumberOfOpenCreditLinesAndLoans 0 0.000000
- 8 NumberOfTimes90DaysLate 0 0.000000
- 9 NumberRealEstateLoansOrLines 0 0.000000
- 10 NumberOfTime60-89DaysPastDueNotWorse 0 0.000000
- 11 NumberOfDependents 3924 0.026160
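The missing-value table above could be produced by a small helper like the one below. The function name `null_summary` is hypothetical. Loading the data from Kaggle's `cs-training.csv` is also an assumption; its unnamed first column is what pandas reports as "Unnamed: 0". As a check on the table, 29731 / 150000 is about 0.198207 and 3924 / 150000 = 0.02616.

```python
import pandas as pd


def null_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column count and share of missing values."""
    counts = df.isnull().sum()
    return pd.DataFrame({
        "Column": counts.index,
        "Number_of_Null_Values": counts.to_numpy(),
        "Proportion": (counts / len(df)).to_numpy(),
    })
```

Usage, assuming the Kaggle file name: `null_summary(pd.read_csv("cs-training.csv"))`.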
- Unnamed: 0 0
- SeriousDlqin2yrs 0
- RevolvingUtilizationOfUnsecuredLines 0
- age 0
- NumberOfTime30-59DaysPastDueNotWorse 0
- DebtRatio 0
- MonthlyIncome 0
- NumberOfOpenCreditLinesAndLoans 0
- NumberOfTimes90DaysLate 0
- NumberRealEstateLoansOrLines 0
- NumberOfTime60-89DaysPastDueNotWorse 0
- NumberOfDependents 0
- Default Rate: 0.06684
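The output above shows that no column has missing values after imputation, and that 6.684% of the samples are labeled 1. The imputation method does not appear in this part of the post. This sketch therefore assumes a simple median fill of the numeric columns, and the function name is hypothetical. Because the label is 0/1, the default rate is simply the label's mean.

```python
import pandas as pd


def fill_and_default_rate(df: pd.DataFrame, target: str = "SeriousDlqin2yrs"):
    """Median-fill numeric NaNs (an assumed strategy), then return the filled
    frame, the per-column null counts left over, and the default rate."""
    filled = df.fillna(df.median(numeric_only=True))
    remaining = filled.isnull().sum()        # should be all zeros
    default_rate = filled[target].mean()     # share of samples labeled 1
    return filled, remaining, default_rate
```

`filled[target].value_counts()` gives the per-class counts that the label analysis in the outline refers to.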
- count 150000.000000
- mean 6.048438
- std 249.755371
- min 0.000000
- 25% 0.029867
- 50% 0.154181
- 75% 0.559046
- max 50708.000000
- Name: RevolvingUtilizationOfUnsecuredLines, dtype: float64
- [[0, 0.06684], [1, 0.37177950868783705], [2, 0.14555256064690028], [3, 0.09931506849315068], [4, 0.08679245283018867], [5, 0.07874015748031496], [6, 0.07692307692307693], [7, 0.0778688524590164], [8, 0.07407407407407407], [9, 0.07053941908713693], [10, 0.07053941908713693], [11, 0.07053941908713693], [12, 0.06666666666666667], [13, 0.058823529411764705], [14, 0.058823529411764705], [15, 0.05531914893617021], [16, 0.05531914893617021], [17, 0.05531914893617021], [18, 0.05531914893617021], [19, 0.05555555555555555]]
- Proportion of Defaulters with Total Amount of Money Owed Not Exceeding Total Credit Limit: 0.05991996127598361
- Proportion of Defaulters with Total Amount of Money Owed Not Exceeding 13 Times the Total Credit Limit: 0.06685273968029273
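The list of pairs for RevolvingUtilizationOfUnsecuredLines looks like [threshold, default rate] for thresholds 0 to 19. The threshold-0 entry equals the overall default rate of 0.06684, so this sketch assumes each rate covers the rows with utilization >= threshold. The two "not exceeding" proportions are read as default rates among the rows with utilization <= 1 and <= 13. Both function names are hypothetical.

```python
import pandas as pd

COL = "RevolvingUtilizationOfUnsecuredLines"
TARGET = "SeriousDlqin2yrs"


def default_rate_at_or_above(df: pd.DataFrame, max_threshold: int = 19) -> list:
    """[threshold, default rate among rows with COL >= threshold] pairs.
    The >= rule is an assumption; an empty subset yields NaN."""
    return [[t, df.loc[df[COL] >= t, TARGET].mean()]
            for t in range(max_threshold + 1)]


def default_rate_at_or_below(df: pd.DataFrame, cap: float) -> float:
    """Default rate among rows whose utilization does not exceed cap."""
    return df.loc[df[COL] <= cap, TARGET].mean()
```

Usage: `default_rate_at_or_below(df_train, 1)` and `default_rate_at_or_below(df_train, 13)` correspond to the two printed proportions, under the reading stated above.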
- count 150000.000000
- mean 52.295207
- std 14.771866
- min 0.000000
- 25% 41.000000
- 50% 52.000000
- 75% 63.000000
- max 109.000000
- Name: age, dtype: float64
Analyzing three similar fields: NumberOfTimes90DaysLate, NumberOfTime60-89DaysPastDueNotWorse, NumberOfTime30-59DaysPastDueNotWorse
- 0 141662
- 1 5243
- 2 1555
- 3 667
- 4 291
- 5 131
- 6 80
- 7 38
- 8 21
- 9 19
- 10 8
- 11 5
- 12 2
- 13 4
- 14 2
- 15 2
- 17 1
- 96 5
- 98 264
- Name: NumberOfTimes90DaysLate, dtype: int64
- 0 142396
- 1 5731
- 2 1118
- 3 318
- 4 105
- 5 34
- 6 16
- 7 9
- 8 2
- 9 1
- 11 1
- 96 5
- 98 264
- Name: NumberOfTime60-89DaysPastDueNotWorse, dtype: int64
- 0 126018
- 1 16033
- 2 4598
- 3 1754
- 4 747
- 5 342
- 6 140
- 7 54
- 8 25
- 9 12
- 10 4
- 11 1
- 12 2
- 13 1
- 96 5
- 98 264
- Name: NumberOfTime30-59DaysPastDueNotWorse, dtype: int64
- NumberOfTimes90DaysLate ... NumberOfTime30-59DaysPastDueNotWorse
- count 269.000000 ... 269.000000
- mean 97.962825 ... 97.962825
- std 0.270628 ... 0.270628
- min 96.000000 ... 96.000000
- 25% 98.000000 ... 98.000000
- 50% 98.000000 ... 98.000000
- 75% 98.000000 ... 98.000000
- max 98.000000 ... 98.000000
-
- [8 rows x 3 columns]
- {'98,98,98': 263, '96,96,96': 4}
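Each of the three late-payment columns contains the codes 96 (5 rows) and 98 (264 rows). These look like placeholder values rather than real counts. The sketch below counts the joint "a,b,c" value patterns of the affected rows, in the same shape as the dictionary printed above. The cutoff of 90 for a "special code" and the function name are assumptions.

```python
from collections import Counter

import pandas as pd

LATE_COLS = [
    "NumberOfTimes90DaysLate",
    "NumberOfTime60-89DaysPastDueNotWorse",
    "NumberOfTime30-59DaysPastDueNotWorse",
]


def special_code_patterns(df: pd.DataFrame, code_floor: int = 90) -> dict:
    """Count joint 'a,b,c' value strings over rows where any late-payment
    column holds a suspicious code (>= code_floor, e.g. 96 or 98)."""
    mask = (df[LATE_COLS] >= code_floor).any(axis=1)
    if not mask.any():
        return {}
    # Join the three values of each affected row into one key such as "98,98,98".
    keys = df.loc[mask, LATE_COLS].astype(str).apply(",".join, axis=1)
    return dict(Counter(keys))
```

If the columns are stored as floats, the keys will read "98.0,98.0,98.0". Cast to int first if integer keys are wanted.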
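- # Keep rows whose DebtRatio (df_DR) exceeds df_DR95 (presumably its 95th percentile)
- # and whose label equals MonthlyIncome, then save them for manual inspection
- temp = df_train[(df_DR > df_DR95) & (df_train['SeriousDlqin2yrs'] == df_train['MonthlyIncome'])]
- temp.to_csv('20220314temp.csv')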
- count 150000.000000
- mean 353.005076
- std 2037.818523
- min 0.000000
- 25% 0.175074
- 50% 0.366508
- 75% 0.868254
- max 329664.000000
- Name: DebtRatio, dtype: float64
- 2449.0
- DebtRatio MonthlyIncome SeriousDlqin2yrs
- count 7494.000000 7494.000000 7494.000000
- mean 4417.958367 5126.905791 0.055111
- std 7875.314649 1183.339377 0.228212
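The DebtRatio filter in the snippet above can be written as one self-contained helper. This sketch assumes that df_DR is `df_train['DebtRatio']` and df_DR95 is its 0.95 quantile. The printed 2449.0 may be that value, and a subset of 7494 rows is close to 5% of 150000, but neither is confirmed here. The function name is hypothetical.

```python
import pandas as pd


def high_debt_ratio_summary(df: pd.DataFrame, q: float = 0.95):
    """Return the DebtRatio quantile, describe() of the rows above it, and
    how many of those rows have SeriousDlqin2yrs equal to MonthlyIncome."""
    dr = df["DebtRatio"]
    threshold = dr.quantile(q)                  # linear interpolation by default
    high = df[dr > threshold]
    summary = high[["DebtRatio", "MonthlyIncome", "SeriousDlqin2yrs"]].describe()
    same_as_label = int((high["SeriousDlqin2yrs"] == high["MonthlyIncome"]).sum())
    return threshold, summary, same_as_label
```

A large `same_as_label` count among very high DebtRatio rows would suggest that MonthlyIncome holds placeholder values such as 0 or 1 there, rather than real incomes.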