Posts

Featured post

6. Exploratory Data Analysis [ EDA ] _ Part4 [ Deleting Missing Values ]

When should we delete missing values in a given data set in machine learning?

Handling missing values is an important step in preprocessing data for machine learning models. The decision to delete missing values depends on the extent of the missing data, the nature of the data, and the impact of the missing values on the performance of your model. Here are some considerations:

Percentage of Missing Values: If only a small percentage of your data has missing values (e.g., less than 5%), you may choose to simply remove the rows with missing values, especially if the missing values are randomly distributed and unlikely to introduce bias. If a large percentage of your data has missing values, removing those rows might lead to a significant loss of information. In such cases, other strategies, such as imputation, might be more appropriate.

Reason for Missing Values: Understanding why the values are missing can help in deciding on the appropriate strategy. If values are missing completely at random, d...
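The percentage rule in the excerpt above can be sketched in a few lines. This is a minimal illustration in plain Python, not the post's own code: the toy dataset, the function names, and the use of `None` as the missing-value marker are all assumptions, and the 5% cutoff is only the example figure quoted in the text.

```python
def is_complete(row):
    """True when no value in the row is missing (None is the assumed marker)."""
    return all(value is not None for value in row.values())


def missing_row_fraction(rows):
    """Fraction of rows that contain at least one missing value."""
    if not rows:
        return 0.0
    incomplete = sum(1 for row in rows if not is_complete(row))
    return incomplete / len(rows)


def drop_if_few_missing(rows, threshold=0.05):
    """Drop incomplete rows only when they are less than `threshold` of the data.

    Returns the (possibly filtered) rows and a flag saying whether rows
    were dropped. Above the threshold the data is returned untouched so
    another strategy, such as imputation, can be considered instead.
    """
    if missing_row_fraction(rows) < threshold:
        return [row for row in rows if is_complete(row)], True
    return rows, False


# Hypothetical dataset: 40 rows, exactly one of them with a missing value.
rows = [{"age": 20 + i, "income": 1000 * i} for i in range(40)]
rows[3]["age"] = None

cleaned, dropped = drop_if_few_missing(rows)
print(len(rows), len(cleaned), dropped)  # 40 39 True
```

With pandas, the same check is usually written with `df.isna().any(axis=1).mean()` for the fraction and `df.dropna()` for the deletion. The threshold decides only whether deletion is cheap; it does not tell you whether the values are missing at random, which still has to be judged separately.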
Recent posts

5. Exploratory Data Analysis [ EDA ] _Part 3 [ Identifying missing values ]

Note: Please read the previous article, Checking for Duplicate Values, for better understanding.

b. Identifying Missing Values in Dependent and Independent Variables

Checking for missing values is a crucial step in data analysis and preprocessing for several important reasons:

Data Quality Assurance: Identifying missing values helps ensure the quality and integrity of the dataset. It allows for a thorough examination of data completeness and accuracy.

Avoiding Bias in Analysis: Missing values can introduce bias into statistical analyses and machine learning models. Detecting and addressing these gaps is essential to obtain accurate and unbiased results.

Preventing Misleading Conclusions: Ignoring missing values may lead to incorrect conclusions and interpretations. It's important to be aware of the extent of missing data to avoid drawing misleading or inaccurate insights.

Ensuring Validity of Results: Many statistical tests and analyses assume the availa...
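As a rough illustration of what "identifying missing values" means in practice, the sketch below counts missing entries per column. It is plain Python under stated assumptions: the sample records and the names `is_missing` and `missing_counts` are hypothetical, and both `None` and floating-point NaN are treated as missing.

```python
import math


def is_missing(value):
    """Treat None and floating-point NaN as missing (an assumption of this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def missing_counts(rows):
    """Count missing values per column across a list of row dictionaries."""
    counts = {}
    for row in rows:
        for column, value in row.items():
            counts[column] = counts.get(column, 0) + (1 if is_missing(value) else 0)
    return counts


# Hypothetical records: "score" has one NaN, "grade" has one None.
records = [
    {"score": 1.0, "grade": "a"},
    {"score": float("nan"), "grade": None},
    {"score": 3.0, "grade": "c"},
]

print(missing_counts(records))  # {'score': 1, 'grade': 1}
```

In pandas, the equivalent per-column count is `df.isna().sum()`. Running a count like this on both the dependent and the independent variables shows how much data is incomplete before any analysis is attempted.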

4. Exploratory Data Analysis [ EDA ] _ Part 2 [ Checking for Duplicate Values ]

Note: Before going forward, please read the previous article, 3. Exploratory Data Analysis_ Part_1 [How to Load data & Lowering the Data], for better understanding.

In this section we are going to discuss:

1. Data Cleaning
2. Checking for Duplicate Values in a Dataset

What is data cleaning in EDA?

Data cleaning in Exploratory Data Analysis (EDA) is the process of identifying and addressing issues or anomalies in the raw data to ensure its accuracy, consistency, and reliability. The purpose of data cleaning is to prepare the data for analysis by removing errors, inconsistencies, and irrelevant information that could distort the results of the analysis. Key aspects of data cleaning in EDA include:

Handling Missing Values: Identifying and addressing missing values in the dataset. This may involve imputing missing values using statistical methods, removing rows or columns with missing values, or making informed decisions about ...
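Since this post is about checking for duplicate values, here is a minimal sketch of the idea in plain Python. The sample records and the name `find_duplicates` are hypothetical, and the sketch assumes every value is hashable and that a duplicate means a row whose columns all match an earlier row.

```python
def find_duplicates(rows):
    """Return the indexes of rows that exactly repeat an earlier row."""
    seen = set()
    duplicate_indexes = []
    for index, row in enumerate(rows):
        # Sort by column name so that key order inside the dict does not matter.
        key = tuple(sorted(row.items()))
        if key in seen:
            duplicate_indexes.append(index)
        else:
            seen.add(key)
    return duplicate_indexes


# Hypothetical records: the third row repeats the first one.
records = [
    {"id": 1, "city": "Pune"},
    {"id": 2, "city": "Delhi"},
    {"id": 1, "city": "Pune"},
]

duplicates = find_duplicates(records)
deduplicated = [row for i, row in enumerate(records) if i not in duplicates]
print(duplicates, len(deduplicated))  # [2] 2
```

In pandas, `df.duplicated()` flags the same later repeats by default, and `df.drop_duplicates()` removes them. Normalizing text first (for example, lowercasing it) matters here, because "Pune" and "pune" would otherwise count as different values.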