Overview
Dataset statistics
| Number of variables | 5 |
|---|---|
| Number of observations | 230 |
| Missing cells | 97 |
| Missing cells (%) | 8.4% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 74.1 KiB |
| Average record size in memory | 329.7 B |
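The missing-cell percentage can be re-derived from the other figures in this table. A minimal sketch, assuming the percentage is missing cells divided by total cells (variables times observations); the variable names are illustrative only:

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 5
n_observations = 230
missing_cells = 97

# Every observation contributes one cell per variable.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
```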
Variable types
| Text | 3 |
|---|---|
| Categorical | 2 |
Alerts
| Status is highly imbalanced (88.7%) | Imbalance |
|---|---|
| Status has 97 (42.2%) missing values | Missing |
| StationId has unique values | Unique |
| StationName has unique values | Unique |
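The 88.7% imbalance figure for Status is consistent with a score of one minus the normalized Shannon entropy of the class counts (131 Active, 2 Inactive). The report does not state the formula, so the sketch below is an assumption that happens to reproduce the reported value; `imbalance_score` is a hypothetical helper name.

```python
import math

def imbalance_score(counts):
    """Return 1 minus the Shannon entropy normalized by log2(number of classes)."""
    total = sum(counts)
    n_classes = len(counts)
    if n_classes < 2:
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(n_classes)

# Non-missing Status counts from the report: Active, Inactive.
status_counts = [131, 2]
score = imbalance_score(status_counts)
print(f"imbalance: {score:.1%}")
```

A perfectly balanced column scores 0 under this definition, and a column dominated by one class approaches 1.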
Reproduction
| Analysis started | 2024-11-14 23:59:02.947662 |
|---|---|
| Analysis finished | 2024-11-14 23:59:03.739076 |
| Duration | 0.79 seconds |
| Software version | ydata-profiling v4.12.0 |
| Download configuration | config.json |
Variables
StationId
Text
Unique 
| Distinct | 230 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 14.1 KiB |
Length
| Max length | 5 |
|---|---|
| Median length | 5 |
| Mean length | 5 |
| Min length | 5 |
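The length statistics summarize the character count of each value. A small sketch using only the five sample IDs shown in this report (not the full 230 rows), with Python's standard `statistics` module:

```python
from statistics import mean, median

# The five sample StationId values listed in the report.
sample_ids = ["AP001", "AP002", "AP003", "AP004", "AP005"]

lengths = [len(s) for s in sample_ids]
length_stats = {
    "max": max(lengths),
    "median": median(lengths),
    "mean": mean(lengths),
    "min": min(lengths),
}
print(length_stats)
```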
Unique
| Unique | 230 |
|---|---|
| Unique (%) | 100.0% |
Sample
| 1st row | AP001 |
|---|---|
| 2nd row | AP002 |
| 3rd row | AP003 |
| 4th row | AP004 |
| 5th row | AP005 |
Words
| Value | Count | Frequency (%) |
|---|---|---|
| ap001 | 1 | 0.4% |
| dl012 | 1 | 0.4% |
| dl011 | 1 | 0.4% |
| ap003 | 1 | 0.4% |
| ap004 | 1 | 0.4% |
| ap005 | 1 | 0.4% |
| as001 | 1 | 0.4% |
| br001 | 1 | 0.4% |
| br002 | 1 | 0.4% |
| br003 | 1 | 0.4% |
| Other values (220) | 220 | 95.7% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 371 | 32.3% |
| 1 | 97 | 8.4% |
| 2 | 59 | 5.1% |
| P | 55 | 4.8% |
| H | 53 | 4.6% |
| R | 49 | 4.3% |
| L | 47 | 4.1% |
| M | 40 | 3.5% |
| D | 40 | 3.5% |
| 3 | 35 | 3.0% |
| Other values (19) | 304 | 26.4% |
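A character-frequency table of this kind can be produced by counting every character across all values of the column. A minimal sketch over the five sample IDs only, so the counts differ from the full-column figures above:

```python
from collections import Counter

# The five sample StationId values listed in the report.
sample_ids = ["AP001", "AP002", "AP003", "AP004", "AP005"]

# Count each character over the concatenation of all values.
char_counts = Counter("".join(sample_ids))
total_chars = sum(char_counts.values())

for char, count in char_counts.most_common(3):
    print(f"{char!r}: {count} ({100 * count / total_chars:.1f}%)")
```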
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 1150 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 1150 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 1150 |
StationName
Text
Unique 
| Distinct | 230 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 19.7 KiB |
Length
| Max length | 53 |
|---|---|
| Median length | 44 |
| Mean length | 30.091304 |
| Min length | 17 |
Unique
| Unique | 230 |
|---|---|
| Unique (%) | 100.0% |
Sample
| 1st row | Secretariat, Amaravati - APPCB |
|---|---|
| 2nd row | Anand Kala Kshetram, Rajamahendravaram - APPCB |
| 3rd row | Tirumala, Tirupati - APPCB |
| 4th row | PWD Grounds, Vijayawada - APPCB |
| 5th row | GVM Corporation, Visakhapatnam - APPCB |
Words
| Value | Count | Frequency (%) |
|---|---|---|
| | 234 | 20.3% |
| delhi | 38 | 3.3% |
| hspcb | 28 | 2.4% |
| nagar | 27 | 2.3% |
| dpcc | 24 | 2.1% |
| mpcb | 22 | 1.9% |
| uppcb | 22 | 1.9% |
| kspcb | 17 | 1.5% |
| cpcb | 16 | 1.4% |
| wbpcb | 14 | 1.2% |
| Other values (470) | 712 | 61.7% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| (space) | 924 | 13.4% |
| a | 755 | 10.9% |
| r | 367 | 5.3% |
| C | 318 | 4.6% |
| P | 311 | 4.5% |
| i | 302 | 4.4% |
| B | 263 | 3.8% |
| - | 254 | 3.7% |
| e | 235 | 3.4% |
| l | 233 | 3.4% |
| Other values (54) | 2959 | 42.8% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 6921 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 6921 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 6921 |
City
Text
| Distinct | 127 |
|---|---|
| Distinct (%) | 55.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 14.6 KiB |
Length
| Max length | 18 |
|---|---|
| Median length | 14 |
| Mean length | 7.2913043 |
| Min length | 4 |
Unique
| Unique | 106 |
|---|---|
| Unique (%) | 46.1% |
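City has 127 distinct values but only 106 unique ones. The gap comes from the two definitions: "distinct" counts how many different values appear at all, while "unique" counts only the values that occur exactly once. A sketch using the City values from the first ten sample rows of this report:

```python
from collections import Counter

# City values from the first ten sample rows of the report.
cities = [
    "Amaravati", "Rajamahendravaram", "Tirupati", "Vijayawada",
    "Visakhapatnam", "Guwahati", "Gaya", "Gaya", "Hajipur", "Muzaffarpur",
]

counts = Counter(cities)
n_distinct = len(counts)                              # different values present
n_unique = sum(1 for c in counts.values() if c == 1)  # values seen exactly once

print(f"distinct: {n_distinct}, unique: {n_unique}")
```

Here "Gaya" appears twice, so it counts as distinct but not as unique.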
Sample
| 1st row | Amaravati |
|---|---|
| 2nd row | Rajamahendravaram |
| 3rd row | Tirupati |
| 4th row | Vijayawada |
| 5th row | Visakhapatnam |
Words
| Value | Count | Frequency (%) |
|---|---|---|
| delhi | 38 | 16.1% |
| mumbai | 13 | 5.5% |
| bengaluru | 10 | 4.2% |
| kolkata | 7 | 3.0% |
| patna | 6 | 2.5% |
| hyderabad | 6 | 2.5% |
| noida | 6 | 2.5% |
| lucknow | 5 | 2.1% |
| gurugram | 4 | 1.7% |
| chennai | 4 | 1.7% |
| Other values (118) | 137 | 58.1% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| a | 312 | 18.6% |
| i | 132 | 7.9% |
| r | 124 | 7.4% |
| h | 99 | 5.9% |
| u | 98 | 5.8% |
| l | 96 | 5.7% |
| e | 87 | 5.2% |
| n | 82 | 4.9% |
| d | 60 | 3.6% |
| o | 46 | 2.7% |
| Other values (36) | 541 | 32.3% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 1677 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 1677 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 1677 |
State
Categorical
| Distinct | 21 |
|---|---|
| Distinct (%) | 9.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 14.9 KiB |
| Delhi | 38 |
|---|---|
| Haryana | 29 |
| Uttar Pradesh | 26 |
| Maharashtra | 22 |
| Karnataka | 20 |
| Other values (16) | 95 |
Length
| Max length | 14 |
|---|---|
| Median length | 11 |
| Mean length | 8.8478261 |
| Min length | 5 |
Unique
| Unique | 5 |
|---|---|
| Unique (%) | 2.2% |
Sample
| 1st row | Andhra Pradesh |
|---|---|
| 2nd row | Andhra Pradesh |
| 3rd row | Andhra Pradesh |
| 4th row | Andhra Pradesh |
| 5th row | Andhra Pradesh |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| Delhi | 38 | 16.5% |
| Haryana | 29 | 12.6% |
| Uttar Pradesh | 26 | 11.3% |
| Maharashtra | 22 | 9.6% |
| Karnataka | 20 | 8.7% |
| Madhya Pradesh | 16 | 7.0% |
| West Bengal | 14 | 6.1% |
| Rajasthan | 10 | 4.3% |
| Bihar | 10 | 4.3% |
| Punjab | 8 | 3.5% |
| Other values (11) | 37 | 16.1% |
Length
Histogram of lengths of the category
Words
| Value | Count | Frequency (%) |
|---|---|---|
| pradesh | 47 | 15.9% |
| delhi | 38 | 12.8% |
| haryana | 29 | 9.8% |
| uttar | 26 | 8.8% |
| maharashtra | 22 | 7.4% |
| karnataka | 20 | 6.8% |
| madhya | 16 | 5.4% |
| west | 14 | 4.7% |
| bengal | 14 | 4.7% |
| rajasthan | 10 | 3.4% |
| Other values (14) | 60 | 20.3% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| a | 494 | 24.3% |
| r | 198 | 9.7% |
| h | 177 | 8.7% |
| e | 128 | 6.3% |
| t | 124 | 6.1% |
| n | 100 | 4.9% |
| s | 97 | 4.8% |
| d | 77 | 3.8% |
| l | 72 | 3.5% |
| (space) | 66 | 3.2% |
| Other values (26) | 502 | 24.7% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 2035 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 2035 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 2035 |
Status
Categorical
Imbalance  Missing 
| Distinct | 2 |
|---|---|
| Distinct (%) | 1.5% |
| Missing | 97 |
| Missing (%) | 42.2% |
| Memory size | 12.1 KiB |
| Active | 131 |
|---|---|
| Inactive | 2 |
Length
| Max length | 8 |
|---|---|
| Median length | 6 |
| Mean length | 6.0300752 |
| Min length | 6 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Active |
|---|---|
| 2nd row | Active |
| 3rd row | Active |
| 4th row | Active |
| 5th row | Active |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| Active | 131 | 57.0% |
| Inactive | 2 | 0.9% |
| (Missing) | 97 | 42.2% |
Length
Histogram of lengths of the category
Common Values (Plot)
Words
| Value | Count | Frequency (%) |
|---|---|---|
| active | 131 | 98.5% |
| inactive | 2 | 1.5% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| c | 133 | 16.6% |
| t | 133 | 16.6% |
| i | 133 | 16.6% |
| v | 133 | 16.6% |
| e | 133 | 16.6% |
| A | 131 | 16.3% |
| I | 2 | 0.2% |
| n | 2 | 0.2% |
| a | 2 | 0.2% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 802 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 802 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 802 |
Correlations
| | State | Status |
|---|---|---|
| State | 1.000 | 0.000 |
| Status | 0.000 | 1.000 |
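The report does not name the association measure behind this matrix. For two categorical columns, one common choice is Cramér's V, which rescales the chi-squared statistic to the range 0 to 1. The sketch below is an uncorrected Cramér's V on toy data, offered as an assumption about the kind of measure involved rather than the tool's exact computation; `cramers_v` and the toy labels are hypothetical.

```python
import math
from collections import Counter

def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equal-length categorical sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    x_totals = Counter(xs)
    y_totals = Counter(ys)
    # Chi-squared statistic over every cell of the contingency table.
    chi2 = 0.0
    for x, nx in x_totals.items():
        for y, ny in y_totals.items():
            expected = nx * ny / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(x_totals), len(y_totals)) - 1
    if k == 0:
        return 0.0  # a constant column carries no association
    return math.sqrt(chi2 / (n * k))

# Toy data (not from the report): independent, then perfectly associated.
v_independent = cramers_v(["a", "a", "b", "b"], ["x", "y", "x", "y"])
v_dependent = cramers_v(["a", "a", "b", "b"], ["x", "x", "y", "y"])
print(v_independent, v_dependent)
```

A value of 0.000 between State and Status, as in the matrix, would indicate no measurable association under such a measure.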
Missing values
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
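The idea behind both views can be sketched in plain Python: count the missing entries per column, then draw one line per row with a mark for each present or missing cell. The rows below are three columns from the first ten sample rows of this report, with `None` standing in for NaN; the text rendering is an illustration, not the report's actual plot.

```python
# StationId, State and Status from the first ten sample rows; None stands in for NaN.
rows = [
    {"StationId": "AP001", "State": "Andhra Pradesh", "Status": "Active"},
    {"StationId": "AP002", "State": "Andhra Pradesh", "Status": None},
    {"StationId": "AP003", "State": "Andhra Pradesh", "Status": None},
    {"StationId": "AP004", "State": "Andhra Pradesh", "Status": None},
    {"StationId": "AP005", "State": "Andhra Pradesh", "Status": "Active"},
    {"StationId": "AS001", "State": "Assam", "Status": "Active"},
    {"StationId": "BR001", "State": "Bihar", "Status": None},
    {"StationId": "BR002", "State": "Bihar", "Status": None},
    {"StationId": "BR003", "State": "Bihar", "Status": None},
    {"StationId": "BR004", "State": "Bihar", "Status": None},
]
columns = ["StationId", "State", "Status"]

# Nullity by column: number of missing entries in each column.
null_counts = {col: sum(row[col] is None for row in rows) for col in columns}

# Nullity matrix: one line per row, '#' for a present value, '.' for a missing one.
matrix_lines = ["".join("." if row[col] is None else "#" for col in columns) for row in rows]

print(null_counts)
print("\n".join(matrix_lines))
```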
Sample
First rows
| | StationId | StationName | City | State | Status |
|---|---|---|---|---|---|
| 0 | AP001 | Secretariat, Amaravati - APPCB | Amaravati | Andhra Pradesh | Active |
| 1 | AP002 | Anand Kala Kshetram, Rajamahendravaram - APPCB | Rajamahendravaram | Andhra Pradesh | NaN |
| 2 | AP003 | Tirumala, Tirupati - APPCB | Tirupati | Andhra Pradesh | NaN |
| 3 | AP004 | PWD Grounds, Vijayawada - APPCB | Vijayawada | Andhra Pradesh | NaN |
| 4 | AP005 | GVM Corporation, Visakhapatnam - APPCB | Visakhapatnam | Andhra Pradesh | Active |
| 5 | AS001 | Railway Colony, Guwahati - APCB | Guwahati | Assam | Active |
| 6 | BR001 | Collectorate, Gaya - BSPCB | Gaya | Bihar | NaN |
| 7 | BR002 | SFTI Kusdihra, Gaya - BSPCB | Gaya | Bihar | NaN |
| 8 | BR003 | Industrial Area, Hajipur - BSPCB | Hajipur | Bihar | NaN |
| 9 | BR004 | Muzaffarpur Collectorate, Muzaffarpur - BSPCB | Muzaffarpur | Bihar | NaN |
Last rows
| | StationId | StationName | City | State | Status |
|---|---|---|---|---|---|
| 220 | WB005 | Ghusuri, Howrah - WBPCB | Howrah | West Bengal | NaN |
| 221 | WB006 | Padmapukur, Howrah - WBPCB | Howrah | West Bengal | NaN |
| 222 | WB007 | Ballygunge, Kolkata - WBPCB | Kolkata | West Bengal | Active |
| 223 | WB008 | Bidhannagar, Kolkata - WBPCB | Kolkata | West Bengal | Active |
| 224 | WB009 | Fort William, Kolkata - WBPCB | Kolkata | West Bengal | Active |
| 225 | WB010 | Jadavpur, Kolkata - WBPCB | Kolkata | West Bengal | Active |
| 226 | WB011 | Rabindra Bharati University, Kolkata - WBPCB | Kolkata | West Bengal | Active |
| 227 | WB012 | Rabindra Sarobar, Kolkata - WBPCB | Kolkata | West Bengal | Active |
| 228 | WB013 | Victoria, Kolkata - WBPCB | Kolkata | West Bengal | Active |
| 229 | WB014 | Ward-32 Bapupara, Siliguri - WBPCB | Siliguri | West Bengal | NaN |