Overview

Dataset statistics

Number of variables: 11
Number of observations: 10000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 1.9 MiB
Average record size in memory: 196.9 B

Variable types

NUM: 5
CAT: 3
BOOL: 3

Reproduction

Analysis started: 2020-05-20 04:34:45.540283
Analysis finished: 2020-05-20 04:34:58.955192
Duration: 13.41 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

Tenure has 413 (4.1%) zeros
Balance has 3617 (36.2%) zeros

Variables

CreditScore
Real number (ℝ≥0)

Distinct count: 460
Unique (%): 4.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 650.5288
Minimum: 350
Maximum: 850
Zeros: 0
Zeros (%): 0.0%
Memory size: 78.2 KiB

Quantile statistics

Minimum: 350
5-th percentile: 489
Q1: 584
Median: 652
Q3: 718
95-th percentile: 812
Maximum: 850
Range: 500
Interquartile range (IQR): 134

Descriptive statistics

Standard deviation: 96.65329874
Coefficient of variation (CV): 0.14857651
Kurtosis: -0.4257256848
Mean: 650.5288
Median Absolute Deviation (MAD): 67
Skewness: -0.0716066082
Sum: 6505288
Variance: 9341.860157
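
As a quick consistency check, the derived figures above can be recomputed from the reported mean, standard deviation, quartiles and extremes. The following is a minimal sketch that uses only the CreditScore values printed in this section; the variable names are illustrative and not part of the report.

```python
# Values copied from the CreditScore statistics above.
mean = 650.5288
std = 96.65329874
q1, q3 = 584, 718
minimum, maximum = 350, 850

cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance is the squared standard deviation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range

print(f"CV={cv:.8f} variance={variance:.3f} IQR={iqr} range={value_range}")
```

The recomputed values agree with the reported CV, variance, IQR and range up to rounding.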
Histogram with fixed size bins (bins=10)
Common values

Value               Count  Frequency (%)
850                 233    2.3%
678                 63     0.6%
655                 54     0.5%
705                 53     0.5%
667                 53     0.5%
684                 52     0.5%
670                 50     0.5%
651                 50     0.5%
683                 48     0.5%
660                 48     0.5%
652                 48     0.5%
648                 48     0.5%
682                 47     0.5%
640                 47     0.5%
663                 47     0.5%
637                 46     0.5%
679                 45     0.4%
714                 45     0.4%
710                 45     0.4%
645                 45     0.4%
686                 45     0.4%
687                 45     0.4%
633                 45     0.4%
646                 44     0.4%
619                 44     0.4%
Other values (435)  8610   86.1%

Minimum 10 values

Value  Count  Frequency (%)
350    5      0.1%
351    1      < 0.1%
358    1      < 0.1%
359    1      < 0.1%
363    1      < 0.1%
365    1      < 0.1%
367    1      < 0.1%
373    1      < 0.1%
376    2      < 0.1%
382    1      < 0.1%

Maximum 10 values

Value  Count  Frequency (%)
850    233    2.3%
849    8      0.1%
848    5      0.1%
847    6      0.1%
846    5      0.1%
845    6      0.1%
844    7      0.1%
843    2      < 0.1%
842    7      0.1%
841    12     0.1%

Geography
Categorical

Distinct count: 3
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 78.2 KiB

Value    Count  Frequency (%)
France   5014   50.1%
Germany  2509   25.1%
Spain    2477   24.8%

Length

Max length: 7
Median length: 6
Mean length: 6.0032
Min length: 5

Overview of Unicode Properties

Unique Unicode characters: 12
Unique Unicode categories: 2
Unique Unicode scripts: 1
Unique Unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

Value  Count  Frequency (%)
a      10000  16.7%
n      10000  16.7%
r      7523   12.5%
e      7523   12.5%
F      5014   8.4%
c      5014   8.4%
G      2509   4.2%
m      2509   4.2%
y      2509   4.2%
S      2477   4.1%
p      2477   4.1%
i      2477   4.1%

Most occurring categories

Value             Count  Frequency (%)
Lowercase Letter  50032  83.3%
Uppercase Letter  10000  16.7%

Most frequent Uppercase Letter characters

Value  Count  Frequency (%)
F      5014   50.1%
G      2509   25.1%
S      2477   24.8%

Most frequent Lowercase Letter characters

Value  Count  Frequency (%)
a      10000  20.0%
n      10000  20.0%
r      7523   15.0%
e      7523   15.0%
c      5014   10.0%
m      2509   5.0%
y      2509   5.0%
p      2477   5.0%
i      2477   5.0%

Most occurring scripts

Value  Count  Frequency (%)
Latin  60032  100.0%

Most frequent Latin characters

Value  Count  Frequency (%)
a      10000  16.7%
n      10000  16.7%
r      7523   12.5%
e      7523   12.5%
F      5014   8.4%
c      5014   8.4%
G      2509   4.2%
m      2509   4.2%
y      2509   4.2%
S      2477   4.1%
p      2477   4.1%
i      2477   4.1%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  60032  100.0%

Most frequent ASCII characters

Value  Count  Frequency (%)
a      10000  16.7%
n      10000  16.7%
r      7523   12.5%
e      7523   12.5%
F      5014   8.4%
c      5014   8.4%
G      2509   4.2%
m      2509   4.2%
y      2509   4.2%
S      2477   4.1%
p      2477   4.1%
i      2477   4.1%

Gender
Categorical

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 78.2 KiB

Value   Count  Frequency (%)
Male    5457   54.6%
Female  4543   45.4%

Length

Max length: 6
Median length: 4
Mean length: 4.9086
Min length: 4

Overview of Unicode Properties

Unique Unicode characters: 6
Unique Unicode categories: 2
Unique Unicode scripts: 1
Unique Unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

Value  Count  Frequency (%)
e      14543  29.6%
a      10000  20.4%
l      10000  20.4%
M      5457   11.1%
F      4543   9.3%
m      4543   9.3%

Most occurring categories

Value             Count  Frequency (%)
Lowercase Letter  39086  79.6%
Uppercase Letter  10000  20.4%

Most frequent Uppercase Letter characters

Value  Count  Frequency (%)
M      5457   54.6%
F      4543   45.4%

Most frequent Lowercase Letter characters

Value  Count  Frequency (%)
e      14543  37.2%
a      10000  25.6%
l      10000  25.6%
m      4543   11.6%

Most occurring scripts

Value  Count  Frequency (%)
Latin  49086  100.0%

Most frequent Latin characters

Value  Count  Frequency (%)
e      14543  29.6%
a      10000  20.4%
l      10000  20.4%
M      5457   11.1%
F      4543   9.3%
m      4543   9.3%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  49086  100.0%

Most frequent ASCII characters

Value  Count  Frequency (%)
e      14543  29.6%
a      10000  20.4%
l      10000  20.4%
M      5457   11.1%
F      4543   9.3%
m      4543   9.3%

Age
Real number (ℝ≥0)

Distinct count: 70
Unique (%): 0.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 38.9218
Minimum: 18
Maximum: 92
Zeros: 0
Zeros (%): 0.0%
Memory size: 78.2 KiB

Quantile statistics

Minimum: 18
5-th percentile: 25
Q1: 32
Median: 37
Q3: 44
95-th percentile: 60
Maximum: 92
Range: 74
Interquartile range (IQR): 12

Descriptive statistics

Standard deviation: 10.48780645
Coefficient of variation (CV): 0.2694584128
Kurtosis: 1.395347062
Mean: 38.9218
Median Absolute Deviation (MAD): 6
Skewness: 1.011320263
Sum: 389218
Variance: 109.9940842
Histogram with fixed size bins (bins=10)
Common values

Value              Count  Frequency (%)
37                 478    4.8%
38                 477    4.8%
35                 474    4.7%
36                 456    4.6%
34                 447    4.5%
33                 442    4.4%
40                 432    4.3%
39                 423    4.2%
32                 418    4.2%
31                 404    4.0%
41                 366    3.7%
29                 348    3.5%
30                 327    3.3%
42                 321    3.2%
43                 297    3.0%
28                 273    2.7%
44                 257    2.6%
45                 229    2.3%
46                 226    2.3%
27                 209    2.1%
26                 200    2.0%
47                 175    1.8%
48                 168    1.7%
25                 154    1.5%
49                 147    1.5%
Other values (45)  1852   18.5%

Minimum 10 values

Value  Count  Frequency (%)
18     22     0.2%
19     27     0.3%
20     40     0.4%
21     53     0.5%
22     84     0.8%
23     99     1.0%
24     132    1.3%
25     154    1.5%
26     200    2.0%
27     209    2.1%

Maximum 10 values

Value  Count  Frequency (%)
92     2      < 0.1%
88     1      < 0.1%
85     1      < 0.1%
84     2      < 0.1%
83     1      < 0.1%
82     1      < 0.1%
81     4      < 0.1%
80     3      < 0.1%
79     4      < 0.1%
78     5      0.1%

Tenure
Real number (ℝ≥0)

ZEROS

Distinct count: 11
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.0128
Minimum: 0
Maximum: 10
Zeros: 413
Zeros (%): 4.1%
Memory size: 78.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 3
Median: 5
Q3: 7
95-th percentile: 9
Maximum: 10
Range: 10
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 2.892174377
Coefficient of variation (CV): 0.5769578633
Kurtosis: -1.165225227
Mean: 5.0128
Median Absolute Deviation (MAD): 2
Skewness: 0.01099145798
Sum: 50128
Variance: 8.364672627
Histogram with fixed size bins (bins=10)
Common values

Value  Count  Frequency (%)
2      1048   10.5%
1      1035   10.3%
7      1028   10.3%
8      1025   10.2%
5      1012   10.1%
3      1009   10.1%
4      989    9.9%
9      984    9.8%
6      967    9.7%
10     490    4.9%
0      413    4.1%

Minimum 10 values

Value  Count  Frequency (%)
0      413    4.1%
1      1035   10.3%
2      1048   10.5%
3      1009   10.1%
4      989    9.9%
5      1012   10.1%
6      967    9.7%
7      1028   10.3%
8      1025   10.2%
9      984    9.8%

Maximum 10 values

Value  Count  Frequency (%)
10     490    4.9%
9      984    9.8%
8      1025   10.2%
7      1028   10.3%
6      967    9.7%
5      1012   10.1%
4      989    9.9%
3      1009   10.1%
2      1048   10.5%
1      1035   10.3%

Balance
Real number (ℝ≥0)

ZEROS

Distinct count: 6382
Unique (%): 63.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 76485.889288
Minimum: 0.0
Maximum: 250898.09
Zeros: 3617
Zeros (%): 36.2%
Memory size: 78.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 97198.54
Q3: 127644.24
95-th percentile: 162711.669
Maximum: 250898.09
Range: 250898.09
Interquartile range (IQR): 127644.24

Descriptive statistics

Standard deviation: 62397.4052
Coefficient of variation (CV): 0.8158028335
Kurtosis: -1.489411768
Mean: 76485.88929
Median Absolute Deviation (MAD): 46766.79
Skewness: -0.1411087109
Sum: 764858892.9
Variance: 3893436176
Histogram with fixed size bins (bins=10)
Common values

Value                Count  Frequency (%)
0                    3617   36.2%
105473.74            2      < 0.1%
130170.82            2      < 0.1%
113063.83            1      < 0.1%
80242.37             1      < 0.1%
134320.23            1      < 0.1%
90218.9              1      < 0.1%
155196.17            1      < 0.1%
95386.82             1      < 0.1%
125961.74            1      < 0.1%
126606.63            1      < 0.1%
82794.18             1      < 0.1%
120782.7             1      < 0.1%
167557.12            1      < 0.1%
122338.43            1      < 0.1%
128504.76            1      < 0.1%
102016.38            1      < 0.1%
190479.48            1      < 0.1%
182065.85            1      < 0.1%
124547.13            1      < 0.1%
151933.63            1      < 0.1%
118546.71            1      < 0.1%
141806.46            1      < 0.1%
98807.45             1      < 0.1%
119703.1             1      < 0.1%
Other values (6357)  6357   63.6%

Minimum 10 values

Value     Count  Frequency (%)
0         3617   36.2%
3768.69   1      < 0.1%
12459.19  1      < 0.1%
14262.8   1      < 0.1%
16893.59  1      < 0.1%
23503.31  1      < 0.1%
24043.45  1      < 0.1%
27288.43  1      < 0.1%
27517.15  1      < 0.1%
27755.97  1      < 0.1%

Maximum 10 values

Value      Count  Frequency (%)
250898.09  1      < 0.1%
238387.56  1      < 0.1%
222267.63  1      < 0.1%
221532.8   1      < 0.1%
216109.88  1      < 0.1%
214346.96  1      < 0.1%
213146.2   1      < 0.1%
212778.2   1      < 0.1%
212696.32  1      < 0.1%
212692.97  1      < 0.1%

NumOfProducts
Categorical

Distinct count: 4
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 78.2 KiB

Value  Count  Frequency (%)
1      5084   50.8%
2      4590   45.9%
3      266    2.7%
4      60     0.6%

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Overview of Unicode Properties

Unique Unicode characters: 4
Unique Unicode categories: 1
Unique Unicode scripts: 1
Unique Unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

Value  Count  Frequency (%)
1      5084   50.8%
2      4590   45.9%
3      266    2.7%
4      60     0.6%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  10000  100.0%

Most frequent Decimal Number characters

Value  Count  Frequency (%)
1      5084   50.8%
2      4590   45.9%
3      266    2.7%
4      60     0.6%

Most occurring scripts

Value   Count  Frequency (%)
Common  10000  100.0%

Most frequent Common characters

Value  Count  Frequency (%)
1      5084   50.8%
2      4590   45.9%
3      266    2.7%
4      60     0.6%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  10000  100.0%

Most frequent ASCII characters

Value  Count  Frequency (%)
1      5084   50.8%
2      4590   45.9%
3      266    2.7%
4      60     0.6%

HasCrCard
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 78.2 KiB

Value  Count  Frequency (%)
1      7055   70.5%
0      2945   29.4%
 
IsActiveMember
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 78.2 KiB

Value  Count  Frequency (%)
1      5151   51.5%
0      4849   48.5%

EstimatedSalary
Real number (ℝ≥0)

Distinct count: 9999
Unique (%): > 99.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 100090.239881
Minimum: 11.58
Maximum: 199992.48
Zeros: 0
Zeros (%): 0.0%
Memory size: 78.2 KiB

Quantile statistics

Minimum: 11.58
5-th percentile: 9851.8185
Q1: 51002.11
Median: 100193.915
Q3: 149388.2475
95-th percentile: 190155.3755
Maximum: 199992.48
Range: 199980.9
Interquartile range (IQR): 98386.1375

Descriptive statistics

Standard deviation: 57510.49282
Coefficient of variation (CV): 0.5745864221
Kurtosis: -1.181518447
Mean: 100090.2399
Median Absolute Deviation (MAD): 49198.15
Skewness: 0.002085357662
Sum: 1000902399
Variance: 3307456784
Histogram with fixed size bins (bins=10)
Common values

Value                Count  Frequency (%)
24924.92             2      < 0.1%
109029.72            1      < 0.1%
182025.95            1      < 0.1%
82820.85             1      < 0.1%
30314.04             1      < 0.1%
143265.65            1      < 0.1%
148305.82            1      < 0.1%
21254.06             1      < 0.1%
56297.85             1      < 0.1%
113481.02            1      < 0.1%
185992.36            1      < 0.1%
69370.05             1      < 0.1%
76679.6              1      < 0.1%
77469.38             1      < 0.1%
179291.85            1      < 0.1%
133172.48            1      < 0.1%
59374.82             1      < 0.1%
194700.81            1      < 0.1%
168023.6             1      < 0.1%
180456.8             1      < 0.1%
68367.18             1      < 0.1%
52581.96             1      < 0.1%
22762.23             1      < 0.1%
75888.65             1      < 0.1%
21215.67             1      < 0.1%
Other values (9974)  9974   99.7%

Minimum 10 values

Value   Count  Frequency (%)
11.58   1      < 0.1%
90.07   1      < 0.1%
91.75   1      < 0.1%
96.27   1      < 0.1%
106.67  1      < 0.1%
123.07  1      < 0.1%
142.81  1      < 0.1%
143.34  1      < 0.1%
178.19  1      < 0.1%
216.27  1      < 0.1%

Maximum 10 values

Value      Count  Frequency (%)
199992.48  1      < 0.1%
199970.74  1      < 0.1%
199953.33  1      < 0.1%
199929.17  1      < 0.1%
199909.32  1      < 0.1%
199862.75  1      < 0.1%
199857.47  1      < 0.1%
199841.32  1      < 0.1%
199808.1   1      < 0.1%
199805.63  1      < 0.1%

Exited
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 78.2 KiB

Value  Count  Frequency (%)
0      7963   79.6%
1      2037   20.4%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
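
The calculation described above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the code the report itself runs; the function name `pearson_r` and the toy data are assumptions made for the example.

```python
import math
from statistics import mean


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    mx, my = mean(x), mean(y)
    # The 1/n factors in the covariance and in both standard deviations cancel,
    # so plain sums of deviations are enough.
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Toy data (not taken from the profiled dataset).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(x, y)
# Shifting y and rescaling it by a positive factor leaves r unchanged.
r_shifted = pearson_r(x, [3.0 * v + 7.0 for v in y])
print(r, r_shifted)
```

The second call illustrates the invariance under changes in location and scale mentioned above.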

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
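
A minimal sketch of that recipe: replace each variable by its ranks (ties receive the average of the positions they occupy, a common convention assumed here), then apply the Pearson formula to the ranks. The helper names `average_ranks`, `pearson_r` and `spearman_rho` are illustrative.

```python
import math
from statistics import mean


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Pearson correlation of the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Toy data: y is a monotonic but nonlinear function of x.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
rho = spearman_rho(x, y)
r = pearson_r(x, y)
print(rho, r)
```

On this toy data the relationship is perfectly monotonic, so ρ reaches its maximum while Pearson's r stays below it.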

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
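
The pair-counting definition above can be sketched directly. Note that this follows the description literally (often called tau-a, with no tie correction); statistical libraries frequently default to a tie-corrected variant, so results may differ when the data contain ties. The function name `kendall_tau_a` is an assumption for illustration.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs, without tie correction."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both variables move in the same direction
        elif sign < 0:
            discordant += 1   # the variables move in opposite directions
        # sign == 0 means a tie in x or y: counted in neither group
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)


# Toy data: six pairs in total, one of them discordant.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
tau = kendall_tau_a(x, y)
print(tau)
```

The quadratic loop over all pairs is fine for a small illustration; for 10000 observations a library implementation would be the practical choice.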

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
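
A minimal sketch of a bias-corrected Cramér's V, assuming the correction takes the commonly cited form attributed to Bergsma (2013): the chi-squared-based φ² and the table dimensions are each adjusted by a term in 1/(n - 1). The function name and the toy contingency tables are illustrative, and the report's own implementation may differ in detail.

```python
import math


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as a list of rows."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    r, k = len(row_totals), len(col_totals)

    # Pearson chi-squared statistic from observed and expected counts.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias correction (assumed form): shrink phi^2 and the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    return math.sqrt(phi2_corr / denom) if denom > 0 else 0.0


# Toy tables: perfect association and exact independence.
v_perfect = cramers_v_corrected([[50, 0], [0, 50]])
v_independent = cramers_v_corrected([[10, 10], [10, 10]])
print(v_perfect, v_independent)
```

The two toy tables exercise the endpoints of the 0 to 1 range described above.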

Missing values

Sample

First rows

   CreditScore  Geography  Gender  Age  Tenure  Balance    NumOfProducts  HasCrCard  IsActiveMember  EstimatedSalary  Exited
0  619          France     Female  42   2       0.00       1              1          1               101348.88        1
1  608          Spain      Female  41   1       83807.86   1              0          1               112542.58        0
2  502          France     Female  42   8       159660.80  3              1          0               113931.57        1
3  699          France     Female  39   1       0.00       2              0          0               93826.63         0
4  850          Spain      Female  43   2       125510.82  1              1          1               79084.10         0
5  645          Spain      Male    44   8       113755.78  2              1          0               149756.71        1
6  822          France     Male    50   7       0.00       2              1          1               10062.80         0
7  376          Germany    Female  29   4       115046.74  4              1          0               119346.88        1
8  501          France     Male    44   4       142051.07  2              0          1               74940.50         0
9  684          France     Male    27   2       134603.88  1              1          1               71725.73         0

Last rows

      CreditScore  Geography  Gender  Age  Tenure  Balance    NumOfProducts  HasCrCard  IsActiveMember  EstimatedSalary  Exited
9990  714          Germany    Male    33   3       35016.60   1              1          0               53667.08         0
9991  597          France     Female  53   4       88381.21   1              1          0               69384.71         1
9992  726          Spain      Male    36   2       0.00       1              1          0               195192.40        0
9993  644          France     Male    28   7       155060.41  1              1          0               29179.52         0
9994  800          France     Female  29   2       0.00       2              0          0               167773.55        0
9995  771          France     Male    39   5       0.00       2              1          0               96270.64         0
9996  516          France     Male    35   10      57369.61   1              1          1               101699.77        0
9997  709          France     Female  36   7       0.00       1              0          1               42085.58         1
9998  772          Germany    Male    42   3       75075.31   2              1          0               92888.52         1
9999  792          France     Female  28   4       130142.79  1              1          0               38190.78         0