MACHINE INTELLIGENCE LABORATORY - PESU UE19CS305
Teaching Assistants: vighneshkamath43@gmail.com, sarasharish2000@gmail.com, abaksy@gmail.com
In this week’s experiment, you will implement a Support Vector Machine classifier using one of the most popular machine learning frameworks in Python, called scikit-learn.
Apart from a wide variety of machine learning models, scikit-learn also offers many classes for pre-processing data.
You are expected to use the pre-processing steps of your choice along with the SVM classifier to create a pipeline (explained later) which will be used to automate the entire process of training and evaluating the model you build.
You are provided with the following files:
1. week6.py
2. SampleTest.py
3. train.csv
4. test.csv
Dataset Format
The dataset consists of 20 features (labelled as x1 to x20) and a target column that is named ‘targets’.
The features are all numeric and continuous, so no encoding of any kind is needed.
The target column ‘targets’ holds the output class corresponding to the point X. It is an integer value. You do not need to handle missing values in the dataset.
Figure 1: Dataset Format
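To make the format concrete, here is a minimal sketch of loading such a file with pandas and separating the features from the target. The two-row in-memory CSV is a made-up stand-in so the snippet runs on its own; with the supplied data you would read train.csv instead. The variable names are illustrative only.

```python
import io

import pandas as pd

# Hypothetical two-row stand-in for train.csv; the real file has the same
# column names (x1..x20 and 'targets') but many more rows.
feature_names = [f"x{i}" for i in range(1, 21)]
header = ",".join(feature_names + ["targets"])
rows = [
    ",".join(["0.5"] * 20 + ["1"]),
    ",".join(["-1.25"] * 20 + ["0"]),
]
sample_csv = io.StringIO("\n".join([header] + rows))

# With the supplied data this would be: df = pd.read_csv("train.csv")
df = pd.read_csv(sample_csv)

# Select columns by name: X is the feature matrix, y the integer class labels.
X = df[feature_names]
y = df["targets"]

print(X.shape)  # (2, 20)
```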
The entire dataset is split into three parts. Two of these parts are supplied to you:
1) train.csv is for training the model.
2) test.csv is for evaluating the accuracy of the trained model.
We will be measuring the performance of your model on a third split called eval.csv. You will be scored based on the accuracy of the model on this unseen validation split of the data.
The scoring will be done as follows:
• Score 10: accuracy >= 85%
• Score 9: 75% <= accuracy < 85%
• Score 8: 70% <= accuracy < 75%
• Score 7: 65% <= accuracy < 70%
• Score 6: 60% <= accuracy < 65%
• Score 5: 55% <= accuracy < 60%
• Score 4: 50% <= accuracy < 55%
• Score 3: 45% <= accuracy < 50%
• Score 2: 40% <= accuracy < 45%
• Score 1: 35% <= accuracy < 40%
• Score 0: accuracy < 35%
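The bands above can be written as a small lookup, which is handy for estimating your score from the accuracy you measure on test.csv. This is only a sketch of the table; lab_score is a hypothetical helper name, not part of the evaluation script.

```python
def lab_score(accuracy_percent: float) -> int:
    """Map an accuracy given in percent to the 0-10 score from the bands above."""
    # (lower bound in percent, score), checked from the highest band down.
    bands = [
        (85, 10), (75, 9), (70, 8), (65, 7), (60, 6),
        (55, 5), (50, 4), (45, 3), (40, 2), (35, 1),
    ]
    for lower_bound, score in bands:
        if accuracy_percent >= lower_bound:
            return score
    return 0  # accuracy below 35%
```

Note that scikit-learn reports accuracy as a fraction between 0 and 1, so multiply it by 100 before passing it to a helper like this.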
Basics of scikit-learn
The bedrock of scikit-learn is the estimator.
Estimators meant for classifying data (such as the SVC class you will use here) implement the following methods, among many more:
1) The fit(X, y) method fits the model based on the training data X and y supplied. Since SVM is a supervised algorithm, it requires both the input X and the output labels y as input.
2) The predict(Xtest) method takes the test dataset Xtest and returns a NumPy array of the predicted class labels, one for each point in the test data.
3) The score(Xtest, Ytest) method takes the test dataset and the true labels and returns the mean accuracy of the model on that data, as a fraction between 0 and 1.
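The three methods above can be tried on a toy problem. The sketch below uses synthetic, well-separated two-dimensional data (not the lab dataset) and an SVC with default settings; the data and variable names are assumptions made purely for illustration.

```python
import numpy as np
from sklearn.svm import SVC

# Synthetic toy data: two well-separated clusters, 20 points each.
rng = np.random.default_rng(0)
cluster_a = rng.normal(loc=-3.0, scale=0.5, size=(20, 2))
cluster_b = rng.normal(loc=3.0, scale=0.5, size=(20, 2))
X_demo = np.vstack([cluster_a, cluster_b])
y_demo = np.array([0] * 20 + [1] * 20)

clf = SVC()                       # default settings
clf.fit(X_demo, y_demo)           # learn from inputs and labels
preds = clf.predict(X_demo)       # NumPy array of predicted labels
acc = clf.score(X_demo, y_demo)   # mean accuracy, between 0 and 1
```

Here the model is scored on the same points it was trained on only to keep the sketch short; in the lab you would fit on train.csv and score on test.csv.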
Transformers are used to transform the input dataset into a pre-processed form. They expose a fit() method for learning any parameters from the training data and a transform() method for transforming the input data.
Transformers for pre-processing the input dataset can be found under the sklearn.preprocessing module. Use the appropriate pre-processing methods to improve the accuracy of your model.
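As an illustration of how a transformer is used, the sketch below applies StandardScaler from sklearn.preprocessing to a tiny made-up array. StandardScaler is just one possible choice, not a required one; the point is that the scaler is fitted on the training data only and then applied to both splits.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up data: two features on very different scales.
X_train = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
X_test = np.array([[2.0, 400.0]])

scaler = StandardScaler()
scaler.fit(X_train)                         # learn per-column mean and std
X_train_scaled = scaler.transform(X_train)  # each column: mean 0, std 1
X_test_scaled = scaler.transform(X_test)    # reuses the training statistics
```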
Scikit-Learn Pipeline
The Pipeline class in the sklearn.pipeline library allows one to sequentially apply a list of transforms and a final estimator.
Intermediate steps of the pipeline must be ‘transforms’, that is, they must implement fit and transform methods. The final estimator only needs to implement the fit method.
The sequence of steps is given to the constructor of the Pipeline class as a list of 2-tuples. The first element of each tuple is the name of the pipeline stage (a string) and the second is the estimator or transform applied in that stage. Note that the second element must be an instance of the class (an object), not just the class name.
The pipeline is run by calling the fit() method on the pipeline object.
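A minimal sketch of such a pipeline is shown below, assuming a StandardScaler stage followed by an SVC with an RBF kernel. The stage names ("scaler", "svc"), the choice of scaler and the synthetic 20-feature data are all illustrative assumptions, not the expected solution.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data with 20 features and two classes.
rng = np.random.default_rng(42)
X_toy = np.vstack([
    rng.normal(-2.0, 1.0, size=(30, 20)),
    rng.normal(2.0, 1.0, size=(30, 20)),
])
y_toy = np.array([0] * 30 + [1] * 30)

# Each step is a (name, object) tuple; note the parentheses that
# create instances rather than passing the classes themselves.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("svc", SVC(kernel="rbf")),
])

pipe.fit(X_toy, y_toy)                # fits the scaler, then the SVC
train_acc = pipe.score(X_toy, y_toy)  # scales the input, then scores the SVC
```

Calling predict() or score() on the fitted pipeline applies every transform to the input before it reaches the final estimator, so the test data never has to be pre-processed by hand.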
Check out this tutorial on scikit-learn pipelines.
Important Points:
1. Please do not change the function definitions provided to you or the functions that have already been implemented. Use the skeleton as it has been given. Also, do not change the sample test file provided to you.
2. You are free to write any helper functions that can be called in any of these predefined functions given to you. Helper functions must be only in the file named ‘YOUR_SRN.py’.
3. Your code will be auto evaluated by our testing script and our dataset and test cases will not be revealed. Please ensure you take care of all edge cases!
6. Hidden test cases will not be revealed post evaluation.
7. Only the SVC model available in the sklearn.svm library is allowed for this experiment (documentation: here). You must not use any other classifiers in the scikit-learn package.
8. You can use any kernel available in the scikit-learn package to improve the test accuracy of your model.
9. You are only allowed to use the Pipeline available in the sklearn.pipeline library and no other pipelines.
10. Make sure to use a Pipeline to stack all the pre-processing and estimator stages in the right order. Return the Pipeline object from the solve() method.
week6.py
1. You are provided with the structure of the class SVM.
2. The class SVM contains one constructor and one method. Your task is to write code for the solve() method.
3. You cannot change the skeleton of the code.
4. Note that the target value is an int.
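The skeleton itself is not reproduced in this document, so the sketch below uses a hypothetical stand-in for it: the constructor signature, the attribute names and the pre-processing choice are all assumptions, and the real week6.py may differ. It only illustrates the shape of a solve() method that builds a Pipeline, fits it and returns it.

```python
import io

import numpy as np
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


class SVM:
    # Hypothetical stand-in for the provided skeleton; the constructor
    # in the actual week6.py may look different.
    def __init__(self, dataset_path):
        data = pd.read_csv(dataset_path)
        self.X = data.drop(columns=["targets"])
        self.y = data["targets"]

    def solve(self):
        # Assumed choices: standard scaling followed by an RBF-kernel SVC.
        pipeline = Pipeline([
            ("scaler", StandardScaler()),
            ("svc", SVC(kernel="rbf")),
        ])
        pipeline.fit(self.X, self.y)
        return pipeline  # the fitted Pipeline object


# Usage on a synthetic in-memory CSV (a stand-in for train.csv).
rng = np.random.default_rng(1)
features = np.vstack([rng.normal(-2, 1, (20, 20)), rng.normal(2, 1, (20, 20))])
frame = pd.DataFrame(features, columns=[f"x{i}" for i in range(1, 21)])
frame["targets"] = [0] * 20 + [1] * 20
model = SVM(io.StringIO(frame.to_csv(index=False))).solve()
```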
SampleTest.py
1. This will help you check your code.
2. Passing the cases in this file does not ensure full marks. You will need to take care of edge cases.
3. Name your code file as YOUR_SRN.py
python3.7 SampleTest.py --SRN YOUR_SRN
You are required to complete the code in week6.py and rename it to SRN.py.
Failing to submit in the right format will lead to zero marks without a chance for correction.
Example: If your SRN is PES1201801819 your file should be PES1201801819.py
• Delete all print statements used for debugging before submission.
• Ensure you run your file against the sample test before submitting.
• Check for syntax and indentation errors in your file, and make sure that tabs and spaces are used uniformly for indentation (i.e., if tabs are used then only tabs must be used throughout the file, and similarly for spaces).
• All the helper functions should be in the same file (SRN.py). Make sure you run the sample test cases as indicated above before submitting.
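The syntax and indentation checks in the list above can be partly automated. The sketch below is a rough heuristic, with check_source as a hypothetical helper name: it compiles the source text to catch syntax errors and flags files whose indentation uses both tabs and spaces. It may also flag indented lines inside multi-line strings, so treat its output as a hint only.

```python
def check_source(source: str) -> list:
    """Return a list of problems found in Python source text (heuristic)."""
    problems = []

    # compile() raises SyntaxError (including IndentationError and TabError)
    # without executing the code.
    try:
        compile(source, "<submission>", "exec")
    except SyntaxError as err:
        problems.append(f"syntax error on line {err.lineno}: {err.msg}")

    # Look at the leading whitespace of every non-blank line.
    uses_tabs = False
    uses_spaces = False
    for line in source.splitlines():
        if not line.strip():
            continue
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        uses_tabs = uses_tabs or "\t" in indent
        uses_spaces = uses_spaces or " " in indent
    if uses_tabs and uses_spaces:
        problems.append("indentation mixes tabs and spaces")

    return problems
```

To check a submission, read the file and pass its contents, for example check_source(open("YOUR_SRN.py").read()).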
Where to submit: Edmodo