programming assignment, we are going to compare the accuracy of our ID3 decision tree implementation with that of the decision tree implemented in the scikit-learn machine learning library. We are going to train both models to classify whether or not passengers on the RMS Titanic survived the shipwreck.
For this assignment, we are going to implement the following:
Write code to read a dataset in from a file
Dataset file name will be specified as a command line argument
Use a pandas DataFrame to read in and store the data
Rewrite the ID3 decision tree code from the class notes to use an object-oriented implementation of a tree
Instead of the True, False, tuple tree representation used by Joel Grus, implement an object-oriented tree
Write code to implement K-fold cross-validation
Value of K will be specified as a command line argument
Write code to compute classifier evaluation metrics
Adapt the scikit-learn example code provided in this document to compare the ID3 decision tree implementation to scikit-learn's implementation of a decision tree
Write code to plot the evaluation metric results of code executions with different values of K
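As a rough illustration of the object-oriented tree item in the list above, one possible shape is a pair of small node classes, each of which knows how to classify a sample. This is only a sketch under our own assumptions: the class names `Leaf` and `DecisionNode`, the `default` fallback label, and the `"Sex"` attribute in the usage lines are hypothetical and are not taken from the class notes.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Union


@dataclass
class Leaf:
    """Terminal node: replaces the bare True/False values of the tuple-based tree."""
    label: Any

    def classify(self, sample: Dict[str, Any]) -> Any:
        # A leaf always predicts its stored label.
        return self.label


@dataclass
class DecisionNode:
    """Internal node: replaces the (attribute, subtree_dict) tuple."""
    attribute: str                      # attribute this node splits on
    default: Any                        # assumed fallback label for unseen values
    children: Dict[Any, "Node"] = field(default_factory=dict)

    def classify(self, sample: Dict[str, Any]) -> Any:
        # Follow the branch that matches the sample's value for this attribute.
        child = self.children.get(sample.get(self.attribute))
        if child is None:
            # Missing or previously unseen value: fall back to the default label.
            return self.default
        return child.classify(sample)


Node = Union[Leaf, DecisionNode]

# Hypothetical hand-built tree; a real one would be produced by the ID3 builder.
tree = DecisionNode(
    attribute="Sex",
    default=False,
    children={"female": Leaf(True), "male": Leaf(False)},
)
prediction = tree.classify({"Sex": "female"})
```

Putting `classify` on the nodes themselves means the calling code never has to inspect whether it is holding a leaf or a subtree, which is the main thing the tuple representation forces it to do.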
The following sections describe each of these steps in more detail.
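To make the command line and cross-validation items concrete, here is a minimal sketch of how the two arguments might be parsed and how row indices might be split into K folds. It assumes the dataset is a CSV file and that the file name comes first and K second; the function names `parse_args`, `k_fold_indices`, and `main`, and the `seed` parameter, are hypothetical choices for illustration.

```python
import argparse
import random
from typing import List, Optional, Sequence, Tuple


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Read the dataset file name and K from the command line."""
    parser = argparse.ArgumentParser(description="ID3 vs. scikit-learn decision tree")
    parser.add_argument("dataset", help="path to the dataset file (assumed to be CSV)")
    parser.add_argument("k", type=int, help="number of cross-validation folds")
    return parser.parse_args(argv)


def k_fold_indices(n_rows: int, k: int, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_indices, test_indices) pairs covering rows 0..n_rows-1."""
    if not 2 <= k <= n_rows:
        raise ValueError("k must be between 2 and the number of rows")
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)        # shuffle once, reproducibly
    folds = [indices[i::k] for i in range(k)]   # k nearly equal-sized folds
    splits = []
    for i, test in enumerate(folds):
        # Every fold except the i-th one goes into the training set.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


def main(argv: Optional[Sequence[str]] = None) -> None:
    import pandas as pd  # imported here so the fold helper has no pandas dependency

    args = parse_args(argv)
    data = pd.read_csv(args.dataset)            # store the data in a DataFrame
    for train_idx, test_idx in k_fold_indices(len(data), args.k):
        train, test = data.iloc[train_idx], data.iloc[test_idx]
        print(f"train rows: {len(train)}, test rows: {len(test)}")
```

In a script, `main()` would be called under an `if __name__ == "__main__":` guard. Each row lands in exactly one test fold, so every passenger is used for evaluation once and for training K - 1 times.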
Note: for this assignment, create one Jupyter Notebook that tells the story of your data science endeavor with results. All Python code (e.g. decision tree classes and functions, classification evaluation metrics, and the comparison with scikit-learn's DecisionTreeClassifier) is to be implemented in .py files, and the results of the classification are to be included in the Jupyter Notebook.
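As an example of the split between .py files and the notebook, the evaluation metrics might live in a small module like the sketch below. The choice of metrics (accuracy, precision, recall, and F1) is our assumption, since the list above does not name them, and the function names `confusion_counts` and `evaluate` are hypothetical.

```python
from typing import Dict, Sequence


def confusion_counts(actual: Sequence[bool], predicted: Sequence[bool]) -> Dict[str, int]:
    """Count true/false positives and negatives for a binary classifier."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for a, p in zip(actual, predicted):
        if a and p:
            counts["tp"] += 1
        elif p:
            counts["fp"] += 1
        elif a:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


def evaluate(actual: Sequence[bool], predicted: Sequence[bool]) -> Dict[str, float]:
    """Compute accuracy, precision, recall, and F1 (0.0 when undefined)."""
    c = confusion_counts(actual, predicted)
    total = sum(c.values())
    accuracy = (c["tp"] + c["tn"]) / total if total else 0.0
    precision = c["tp"] / (c["tp"] + c["fp"]) if c["tp"] + c["fp"] else 0.0
    recall = c["tp"] / (c["tp"] + c["fn"]) if c["tp"] + c["fn"] else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Tiny made-up example: one of each outcome.
scores = evaluate([True, True, False, False], [True, False, True, False])
```

If this were saved as, say, `metrics.py` (a hypothetical file name), the notebook would only need `from metrics import evaluate` and could then focus on presenting the per-fold results.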