Classification in Machine Learning
What is classification?
A classification algorithm is a supervised learning approach that uses training data to identify the category of new observations. A program learns from a labeled dataset and then assigns each new observation to one of several categories, such as Yes or No, 0 or 1, Spam or Not Spam. Because classification is a supervised technique, it needs labeled input data: every input comes with its corresponding output.
The main goal of a classification algorithm is to identify the category of a given observation, so these algorithms are mainly used when the output to predict is categorical.
There are two types of classifiers:
- Binary Classifier: the classification problem has only two possible outputs.
Examples: SPAM or NOT SPAM, CAT or DOG, etc.
- Multi-class Classifier: the classification problem has more than two possible outputs.
Example: classifying types of dog breeds.
There are two types of learners:
- Lazy Learners: A lazy learner first stores the training dataset and waits until it receives the test dataset. Classification is then done on the basis of the most closely related data stored in the training dataset. It takes less time in training but more time for predictions.
Example: K-NN algorithm
- Eager Learners: Eager learners build a classification model from the training dataset before receiving any test data. An eager learner takes more time in learning and less time in prediction.
Example: Decision Trees, Naïve Bayes, ANN.
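To make the lazy/eager difference concrete, here is a minimal pure-Python sketch. `LazyKNN` and `EagerCentroid` are hypothetical names made up for this illustration, and the nearest-centroid classifier stands in for an eager learner only because it is short (it is not one of the examples listed above).

```python
import math
from collections import Counter


class LazyKNN:
    """Lazy learner: fit() only stores the data; predict() does the work."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, points, labels):
        self.points = list(points)
        self.labels = list(labels)
        return self

    def predict(self, query):
        # Rank every stored point by distance to the query at prediction time.
        ranked = sorted(
            zip(self.points, self.labels),
            key=lambda pair: math.dist(pair[0], query),
        )
        votes = Counter(label for _, label in ranked[: self.k])
        return votes.most_common(1)[0][0]


class EagerCentroid:
    """Eager learner: fit() builds a compact model (one centroid per class)."""

    def fit(self, points, labels):
        grouped = {}
        for point, label in zip(points, labels):
            grouped.setdefault(label, []).append(point)
        # Average each coordinate over the members of a class.
        self.centroids = {
            label: tuple(sum(axis) / len(axis) for axis in zip(*members))
            for label, members in grouped.items()
        }
        return self

    def predict(self, query):
        # Prediction only compares the query against one point per class.
        return min(
            self.centroids,
            key=lambda label: math.dist(self.centroids[label], query),
        )


train_points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
train_labels = ["cat", "cat", "cat", "dog", "dog", "dog"]

knn = LazyKNN(k=3).fit(train_points, train_labels)
centroid = EagerCentroid().fit(train_points, train_labels)
```

Calling `knn.predict((1.2, 1.4))` scans all six stored points, while `centroid.predict((8.8, 8.6))` only compares against the two centroids computed during `fit()`.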
How to evaluate classification models:
1. Cross-Entropy Loss (log loss): $-(y\log(p) + (1-y)\log(1-p))$, where $y$ is the actual label (0 or 1) and $p$ is the predicted probability that the label is 1.
It is used to evaluate the performance of a classifier whose output is a probability value between 0 and 1. For a good binary classification model, the log loss should be close to 0. A lower log loss means the predicted probabilities are closer to the actual labels.
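Here is a small sketch of the log loss averaged over several predictions. The function name `binary_cross_entropy` and the `eps` clipping value are my own choices for the illustration, not something a specific library prescribes.

```python
import math


def binary_cross_entropy(y_true, p_pred, eps=1e-15):
    """Mean binary cross-entropy (log loss) over paired labels and probabilities."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        # Clip p away from 0 and 1 so log() never receives 0.
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


labels = [1, 0, 1, 0]
# Same labels, two sets of predicted probabilities for the positive class.
confident = binary_cross_entropy(labels, [0.9, 0.1, 0.8, 0.2])
hesitant = binary_cross_entropy(labels, [0.6, 0.4, 0.55, 0.45])
```

Both sets of predictions land on the right side of 0.5, so their accuracy is identical, yet `confident` comes out lower than `hesitant`. Log loss rewards probabilities that are close to the true label, not just correct decisions.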
2. Confusion Matrix:
The confusion matrix is a table that describes the performance of the model by counting, for each actual class, how many observations were predicted as each class. It is also known as the error matrix.
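A confusion matrix is easy to build by hand. The sketch below uses made-up labels and a hypothetical helper called `confusion_matrix`; rows are the actual class and columns are the predicted class.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels):
    """Return nested lists: rows = actual class, columns = predicted class."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


# Made-up example data, only to show the shape of the result.
actual = ["spam", "spam", "not spam", "not spam", "not spam", "spam"]
predicted = ["spam", "not spam", "not spam", "not spam", "spam", "spam"]
matrix = confusion_matrix(actual, predicted, labels=["spam", "not spam"])

# Correct predictions sit on the diagonal.
accuracy = sum(matrix[i][i] for i in range(len(matrix))) / len(actual)
```

The off-diagonal cells tell you which kind of mistake the model makes: spam that slipped through, or good mail flagged as spam.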
Sentiment analysis in classification
A sentiment analysis task is usually modeled as a classification problem, whereby a classifier is fed a text and returns a category, e.g. positive or negative.
The process of detecting positive or negative sentiment in text is known as sentiment analysis. Businesses frequently utilize it to detect sentiment in social data, assess brand reputation, and gain a better understanding of their customers.
In social media monitoring, sentiment analysis is used to obtain insights into how consumers feel about specific subjects and to discover important concerns in real time before they spiral out of control.
Sounds interesting, right? So now let's learn it with a real-world example.
I'll be providing you an SFrame to try it with. Link:
https://drive.google.com/drive/folders/1zdH21IciINxhj_6yIMM82PIKrddyqJkU?usp=sharing
I'll have the code on GitHub, link down below.
😋😋😋😋😋😋😋😋😋😋😋😋😋😋😋😋
Let's start
Next
Open Google Colab or use a Jupyter (.ipynb) notebook
Next
!pip install turicreate
Next
Read the file and explore
Next
Adding the word count
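Turi Create has a `text_analytics.count_words` helper for this step. The plain-Python sketch below only shows the idea behind a word-count feature; the tokenizing rule (lowercase, letters and apostrophes) and the sample review are my own assumptions, so the library's counts may differ in details.

```python
import re
from collections import Counter


def count_words(text):
    """Lowercase the text, split it into word tokens, and count each one."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return dict(Counter(tokens))


# Hypothetical review text, not taken from the dataset.
review = "Great product. Works well, really great value!"
word_count = count_words(review)
```

Each review becomes a dictionary of word to count, and that dictionary is what the classifier later uses as its input feature.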
Next
Get the best-selling item
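One simple way to do this is to count how many rows each product has and take the largest. The sketch below assumes one row per review and a column called `name`; both are assumptions for illustration, and the rows are made up.

```python
from collections import Counter

# Hypothetical rows; the real dataset's column names and contents may differ.
rows = [
    {"name": "Teething toy", "rating": 5},
    {"name": "Stroller", "rating": 4},
    {"name": "Teething toy", "rating": 4},
    {"name": "Bottle warmer", "rating": 2},
    {"name": "Teething toy", "rating": 1},
    {"name": "Stroller", "rating": 5},
]

# Count rows per product, then take the product with the most rows.
item_counts = Counter(row["name"] for row in rows)
top_item, top_count = item_counts.most_common(1)[0]
```

Keep in mind that a row count measures how often a product was reviewed, which is only a proxy for how much it sold.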
Next
Visualize the data, then build the classifier
Feature Engineering
Train/test splitting
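These two steps can be sketched in plain Python as follows. The labeling rule is an assumption on my part (ratings of 4 and above count as positive, 3-star ratings are dropped as neutral), and so are the column names `rating` and `sentiment` and the 80/20 split. In Turi Create the split itself is usually a single `SFrame.random_split` call.

```python
import random


def add_sentiment(rows, positive_from=4, neutral=3):
    """Drop neutral ratings, then label the rest 1 (positive) or 0 (negative)."""
    labeled = []
    for row in rows:
        if row["rating"] == neutral:
            continue
        labeled.append({**row, "sentiment": int(row["rating"] >= positive_from)})
    return labeled


def train_test_split(rows, train_fraction=0.8, seed=0):
    """Shuffle a copy of the rows with a fixed seed and cut it in two."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical rows with ratings cycling through 1..5.
rows = [{"review": f"review {i}", "rating": 1 + i % 5} for i in range(20)]
labeled = add_sentiment(rows)
train, test = train_test_split(labeled)
```

Fixing the seed makes the split reproducible, so two models trained later can be compared on exactly the same test rows.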
Next
Evaluation
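Whatever tool produces the predictions, the evaluation boils down to comparing them with the true labels of the test set. Here is a minimal sketch with made-up labels; the `evaluate` helper and the choice of accuracy, precision and recall are mine, not necessarily the metrics used in the original notebook.

```python
def evaluate(actual, predicted, positive=1):
    """Accuracy, precision and recall for one positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    correct = sum(a == p for a, p in pairs)
    return {
        "accuracy": correct / len(pairs),
        # Guard against division by zero when a class is never predicted/seen.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Made-up test labels (1 = positive review) and model predictions.
actual = [1, 1, 1, 0, 0, 1, 0, 1]
predicted = [1, 1, 0, 0, 1, 1, 0, 1]
metrics = evaluate(actual, predicted)
```

Precision and recall matter here because review datasets are often dominated by positive reviews, and accuracy alone can look good for a model that rarely predicts "negative".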
Next
Testing the model
Next
Let's analyze the model by giving it the words we want
Next
Now create a new model for sentiment analysis using only the selected words
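The feature for this second model is the same word count as before, restricted to a short list of words. In the sketch below, `SELECTED_WORDS` is a hypothetical list I picked for illustration; swap in the words you actually want the model to look at.

```python
import re
from collections import Counter

# Hypothetical word list, chosen only for this example.
SELECTED_WORDS = ["great", "love", "awesome", "bad", "terrible", "awful"]


def selected_word_counts(text, selected=SELECTED_WORDS):
    """Count only the selected words; every other word is ignored."""
    tokens = Counter(re.findall(r"[a-z']+", text.lower()))
    return {word: tokens[word] for word in selected}


features = selected_word_counts(
    "Awesome toy, great price. Great gift, nothing bad to say."
)
```

Notice that "nothing bad" still counts one occurrence of "bad". A model built on a handful of words is simpler to interpret, but it cannot see this kind of context.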
Next
Compare both models
M1: old model with word count
M2: new model with selected words
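A fair comparison scores both models on the same test rows and also checks them against a trivial baseline that always predicts the most common label. The labels and predictions below are invented purely to show the mechanics; they are not results from the real models.

```python
from collections import Counter


def accuracy(actual, predicted):
    """Fraction of predictions that match the actual labels."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical labels and predictions, made up to show the comparison only.
actual = [1, 1, 1, 0, 1, 0, 1, 1, 0, 1]
m1_predicted = [1, 1, 1, 0, 1, 1, 1, 1, 0, 1]  # M1: all word counts
m2_predicted = [1, 1, 1, 1, 1, 1, 1, 1, 0, 1]  # M2: selected words only

# Baseline: always predict the most frequent label in the data.
majority_label = Counter(actual).most_common(1)[0][0]
report = {
    "M1": accuracy(actual, m1_predicted),
    "M2": accuracy(actual, m2_predicted),
    "majority baseline": accuracy(actual, [majority_label] * len(actual)),
}
```

If a model barely beats the majority baseline, its features are adding little, no matter how high the raw accuracy looks.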
Next
Take a product of your own choice; I took this one.
Now we can see a side-by-side comparison of the two models.
Hope the tutorial was helpful. If there is anything we missed out, do let us know through comments.😇
❤️❤️❤️❤️❤️❤️❤️Thanks for reading❤️❤️❤️❤️❤️❤️❤️❤️