Pre-processing, in simple terms, refers to the transformations applied to data before it is fed to an algorithm. Noisy data and redundant information can adversely affect the training phase of an ML project. Data pre-processing therefore facilitates training and testing by appropriately transforming and scaling the dataset. It includes cleaning, normalization, transformation, and similar steps.
In practice, data pre-processing can be very tedious, but libraries like Pandas make it much easier. In Python, Pandas is said to be the best tool for importing and managing datasets, pre-processing data, and data analysis. In this video, we will explore how it aids the whole process and helps us pre-process our data effectively.
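As a rough illustration of importing and inspecting a dataset with Pandas, here is a minimal sketch. The inline CSV and its column names (`name`, `age`, `city`) are made up for this example and are not taken from the video; in a real project you would pass a file path to `pd.read_csv`.

```python
import io

import pandas as pd

# Hypothetical inline CSV standing in for a real file on disk.
raw_csv = io.StringIO(
    "name,age,city\n"
    "Asha,29,Pune\n"
    "Ben,,Delhi\n"
    "Asha,29,Pune\n"
)

# Load the data into a DataFrame.
df = pd.read_csv(raw_csv)

# Quick checks before any cleaning: size, column types, missing values.
print(df.shape)
print(df.dtypes)
print(df.isna().sum())
```

Checks like these usually show where the cleaning effort should go, for example which columns contain missing values.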
Removal of unwanted data
The very first step in data pre-processing is to remove redundant data and unnecessary noise from the dataset. Here we can use Pandas-specific functions to deal with null values and handle redundant data.
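The sketch below shows one possible way to handle duplicates and null values with `drop_duplicates`, `fillna`, and `dropna`. The sample data is invented, and the choice to fill missing ages with the median while dropping rows with a missing city is only an assumption for illustration; the right strategy depends on the dataset.

```python
import numpy as np
import pandas as pd

# Made-up sample data with one duplicate row and two missing values.
df = pd.DataFrame({
    "name": ["Asha", "Ben", "Asha", "Chen"],
    "age": [29, np.nan, 29, 41],
    "city": ["Pune", "Delhi", "Pune", None],
})

# Remove exact duplicate rows (the second "Asha" row).
deduped = df.drop_duplicates()

# Fill missing numeric values with the column median (an illustrative choice).
cleaned = deduped.copy()
cleaned["age"] = cleaned["age"].fillna(cleaned["age"].median())

# Drop rows that still lack a value in a required column.
cleaned = cleaned.dropna(subset=["city"]).reset_index(drop=True)

print(cleaned)
```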
Feature transformation & selection
We then select only the essential features and eliminate unwanted columns, and apply data-specific transformation rules to get the best out of the data.
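Here is a minimal sketch of feature selection and transformation, assuming a hypothetical customer table. The column names, the text clean-up rule, and the min-max scaling are illustrative choices, not the specific rules used in the video.

```python
import pandas as pd

# Made-up sample data; "customer_id" and "notes" are treated as unwanted columns.
df = pd.DataFrame({
    "customer_id": [101, 102, 103],
    "city": [" Pune", "delhi ", "PUNE"],
    "income": [30000.0, 50000.0, 70000.0],
    "notes": ["n/a", "call back", "n/a"],
})

# Keep only the features we need (df.drop(columns=[...]) would also work).
features = df[["city", "income"]].copy()

# Text transformation: trim whitespace and normalize case.
features["city"] = features["city"].str.strip().str.lower()

# Numeric transformation: min-max scaling to the range [0, 1].
income = features["income"]
features["income_scaled"] = (income - income.min()) / (income.max() - income.min())

print(features)
```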
A few more steps
Depending on the data and the use case, we can apply many other built-in but very effective Pandas functions to achieve the required format and eventually make the data model-ready, i.e. ready to be fed to the selected model for training.
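Which further steps apply depends on the data, so the following is only a sketch of a few commonly used built-in functions: `pd.to_datetime`, `pd.to_numeric`, and `pd.get_dummies`. The sample columns are hypothetical and not taken from the video.

```python
import pandas as pd

# Made-up sample data where every column arrives as text.
df = pd.DataFrame({
    "signup_date": ["2021-01-15", "2021-03-02", "2021-03-20"],
    "plan": ["basic", "pro", "basic"],
    "monthly_spend": ["10.5", "42", "11"],
})

# Convert text columns to proper datetime and numeric types.
df["signup_date"] = pd.to_datetime(df["signup_date"])
df["monthly_spend"] = pd.to_numeric(df["monthly_spend"])

# Derive a simple numeric feature from the date.
df["signup_month"] = df["signup_date"].dt.month

# One-hot encode the categorical column and drop the raw date,
# leaving only numeric/boolean columns that a model can consume.
model_ready = pd.get_dummies(df.drop(columns=["signup_date"]), columns=["plan"])

print(model_ready)
```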
~~~~~~~~~~
Link to the code used in the video - [ Link ]
~~~~~~~~~~
Connect with us on our social media channels to get daily updates on Data Science and Artificial Intelligence.
~~~~~~~~~~
LinkedIn - [ Link ]
Twitter - [ Link ]
Facebook - [ Link ]
Instagram - [ Link ]
Data Pre-processing using Python | Pandas for Data Science
Tags
data, datahacks, datatricks, code, codetricks, codehacks, coding, keepcoding, python, pythonhacks, pythontricks, pandas, pandastricks, pandashacks, datascience, datasciencetricks, datasciencehacks, machinelearning, ml, machinelearningtricks, machinelearninghacks, deeplearning, dl, artificialintelligence, ai, aihacks, aitricks, aiconcepts, naturallanguageprocessing, nlp, textprocessing, computerscience, cs, tricks, hacks, keeplearning, datasciencewizards, dsw, eda, preprocessing, missing values