
In this programming assignment, we investigate the accuracy of our ID3 decision tree implementation compared with the decision tree implemented in the scikit-learn machine learning library. We will train both models to classify whether passengers on the RMS Titanic survived the shipwreck.

For this assignment, we are going to implement the following:

- Write code to read a dataset in from a file
  - The dataset file name will be specified as a command line argument
  - Use a pandas DataFrame to read in and store the data
- Re-write the ID3 decision tree code from the class notes to use an object-oriented implementation of a tree
  - Instead of the True, False, tuple tree representation used by Joel Grus, implement an object-oriented tree
- Write code to implement K-fold cross validation
  - The value of K will be specified as a command line argument
- Write code to compute classifier evaluation metrics
  - Accuracy
  - Precision
  - Recall
  - F1 score
- Adapt the scikit-learn example code provided in this document to compare the ID3 decision tree implementation to scikit-learn’s implementation of a decision tree
- Write code to plot the evaluation metric results of code executions with different values of K
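As a rough illustration of the object-oriented tree called for in the list above, the sketch below shows one possible node class. The class name `TreeNode`, its fields, and the toy attribute names are assumptions made for illustration; they do not come from the class notes, and the ID3 training routine that would build such a tree is not shown.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class TreeNode:
    """One node of a decision tree (hypothetical design, not from the class notes)."""

    # Attribute tested at this node; None marks a leaf.
    attribute: Optional[str] = None
    # Prediction: the answer at a leaf, or a fallback at an internal node.
    label: Any = None
    # Maps each attribute value to the subtree that handles it.
    children: Dict[Any, "TreeNode"] = field(default_factory=dict)

    def is_leaf(self) -> bool:
        return self.attribute is None

    def classify(self, sample: Dict[str, Any]) -> Any:
        """Walk down the tree until a leaf (or an unseen value) is reached."""
        if self.is_leaf():
            return self.label
        child = self.children.get(sample.get(self.attribute))
        if child is None:
            # Unseen or missing attribute value: fall back to this node's label.
            return self.label
        return child.classify(sample)


# Hand-built toy tree for demonstration only; it is not learned from the data.
toy_tree = TreeNode(
    attribute="sex",
    label=False,
    children={
        "female": TreeNode(label=True),
        "male": TreeNode(
            attribute="pclass",
            label=False,
            children={1: TreeNode(label=True), 3: TreeNode(label=False)},
        ),
    },
)
```

Compared with a nested tuple, each node here carries its own behavior, so classification becomes a method call such as `toy_tree.classify({"sex": "male", "pclass": 3})`.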

The following sections describe each of these steps in more detail.
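For orientation, the K-fold cross validation step might be sketched as follows. The function name `k_fold_indices`, the `seed` parameter, and the choice to shuffle before splitting are assumptions for illustration, not requirements stated in the assignment.

```python
import random
from typing import List, Tuple


def k_fold_indices(n_rows: int, k: int, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_indices, test_indices) pairs covering rows 0..n_rows-1.

    Hypothetical helper: each row lands in exactly one test fold.
    """
    if not 2 <= k <= n_rows:
        raise ValueError("k must be between 2 and the number of rows")

    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # fixed seed keeps runs repeatable

    # Deal the shuffled indices into k folds of nearly equal size.
    folds = [indices[i::k] for i in range(k)]

    splits = []
    for i, test_idx in enumerate(folds):
        # Training set is every fold except the one held out for testing.
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train_idx, test_idx))
    return splits
```

With the data held in a pandas DataFrame `df`, each pair can select rows by position, for example `df.iloc[train_idx]` and `df.iloc[test_idx]`. The metrics computed on each test fold are then typically averaged over the K folds.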

Note: for this assignment, create one Jupyter Notebook that tells the story of your data science endeavor and presents its results. All Python code (e.g. decision tree classes and functions, classification evaluation metrics, and the comparison with scikit-learn’s DecisionTreeClassifier) is to be implemented in .py files, and the results of the classification are to be included in the Jupyter Notebook.
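Since the evaluation metrics are meant to live in a .py file, one possible layout is a small module that the notebook imports. The sketch below assumes boolean labels (True meaning "survived") and a hypothetical function name `evaluation_metrics`; returning 0.0 when a denominator is zero is also an assumption, not something the assignment specifies.

```python
from typing import Dict, Sequence


def evaluation_metrics(actual: Sequence[bool], predicted: Sequence[bool]) -> Dict[str, float]:
    """Compute accuracy, precision, recall and F1 for binary labels."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")

    # Confusion-matrix counts, treating True as the positive class.
    tp = sum(1 for a, p in zip(actual, predicted) if a and p)
    fp = sum(1 for a, p in zip(actual, predicted) if not a and p)
    fn = sum(1 for a, p in zip(actual, predicted) if a and not p)
    tn = sum(1 for a, p in zip(actual, predicted) if not a and not p)

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

If this were saved as, say, `metrics.py` (a hypothetical file name), the notebook could call `from metrics import evaluation_metrics` and report the returned dictionary for each value of K.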
