My last decision tree post ended with successfully building a tree model using the sklearn library. But checking how well the (supervised) tree model predicts the known classification isn’t all I’m interested in. I also want to see the tree and look at its structure. Fortunately, that’s not very difficult. What you need are two Python libraries: graphviz and pydotplus.
If they’re not already installed, installing them is easy enough, although you might run into some trouble, as I did, if you don’t install them in the correct order. So I recommend the following:
    (base) C:\WINDOWS\system32>conda install graphviz
    Solving environment: done
    [...]
    The following NEW packages will be INSTALLED:
        graphviz: 2.38.0-4

    (base) C:\WINDOWS\system32>pip install graphviz
    Successfully installed graphviz-0.8.2

    (base) C:\WINDOWS\system32>conda install pydotplus
    Solving environment: done
    [...]
    The following NEW packages will be INSTALLED:
        pydotplus: 2.0.2-py36_0

    (base) C:\WINDOWS\system32>pip install pydotplus
    Requirement already satisfied: pyparsing>=2.0.1 in c:\programdata\anaconda3\lib\site-packages (from pydotplus)
If you get an error like the following when trying “import graphviz” in Python…
InvocationException: GraphViz's executables not found
Then you need to add Graphviz to your Windows Path (environment variables). Once that’s done, you should be able to do…
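A quick way to confirm the Path change took effect is to ask Python where it finds the Graphviz `dot` executable. This is a minimal sketch using only the standard library; the helper name `graphviz_on_path` is made up for illustration.

```python
import shutil

def graphviz_on_path(executable="dot"):
    """Return the full path of the Graphviz executable, or None if it is not on PATH."""
    return shutil.which(executable)

dot_path = graphviz_on_path()
if dot_path is None:
    print("Graphviz 'dot' not found on PATH; add its directory and restart Python.")
else:
    print("Found Graphviz at", dot_path)
```

If this reports a location, the export step that follows should be able to reach the executables.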
    # tree is: from sklearn import tree - see last post
    with open("c:/tree.dot", 'w') as dotfile:
        tree.export_graphviz(clf, out_file=dotfile, feature_names=X.columns)
This writes the tree to a ‘dot’ file, an established format for this kind of information: DOT is a graph description language. To display the tree, you can use the code below. There are various ways to do this, and you don’t have to create a file first the way I did.
    import pydotplus
    from IPython.display import Image
    import graphviz

    graph = pydotplus.graphviz.graph_from_dot_file("c:\\tree.dot")
    Image(graph.create_png())
This plots the following tree diagram to the screen. It’s a bit overwhelming in fact.
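On the point that the intermediate file is optional, here is a minimal sketch. It assumes a small tree fitted on scikit-learn's bundled iris data as a stand-in for the `clf` and `X` from the last post. Passing `out_file=None` makes `export_graphviz` return the DOT source as a string instead of writing a file.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_graphviz

# Stand-in data and model (assumption: not the clf / X from the last post)
iris = load_iris()
demo_clf = DecisionTreeClassifier(max_depth=2, random_state=0)
demo_clf.fit(iris.data, iris.target)

# out_file=None returns the DOT description as a string, no file needed
dot_data = export_graphviz(demo_clf, out_file=None,
                           feature_names=iris.feature_names)

# Peek at the start of the DOT source
print(dot_data[:60])
```

The string can then go to `pydotplus.graph_from_dot_data(dot_data)` in place of `graph_from_dot_file`, followed by the same `Image(graph.create_png())` call.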
If you want to save the file, you could do…
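The snippet for this step is not shown here, so what follows is only a sketch of one way to do it, not necessarily the code originally used. It relies on the `write_png` method of a pydotplus graph. The helper name `save_dot_as_png`, the toy DOT string, and the temporary output folder are all assumptions made for illustration, and the function returns False if pydotplus or the Graphviz executables are missing.

```python
import os
import shutil
import tempfile

def save_dot_as_png(dot_source, png_path):
    """Render DOT source to png_path; return True if the file was written."""
    try:
        import pydotplus  # installed as described earlier in the post
    except ImportError:
        return False
    if shutil.which("dot") is None:
        # Graphviz executables are not on PATH, so rendering would fail
        return False
    graph = pydotplus.graph_from_dot_data(dot_source)
    graph.write_png(png_path)
    return os.path.exists(png_path)

# Placeholder DOT source (assumption); in practice this is the exported tree
toy_dot = "digraph Tree { 0 -> 1; 0 -> 2; }"
png_path = os.path.join(tempfile.mkdtemp(), "tree.png")
saved = save_dot_as_png(toy_dot, png_path)
print("PNG written:", saved)
```

With the `graph` object created earlier, the core of this is a single `graph.write_png(...)` call with a path of your choice.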
And Python drops a new PNG in your directory of choice:
There are all kinds of configuration options for fine-tuning your tree and its visualization. If you do something like the below, inspired by statinfer.com, then you add some color and rounded boxes…
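As a hedged illustration of the color and rounded boxes (this is not the statinfer.com code, and it uses a stand-in tree fitted on the bundled iris data rather than the model from the last post), `export_graphviz` accepts `filled=True` and `rounded=True`:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_graphviz

# Stand-in data and model (assumption: not the clf / X from the last post)
iris = load_iris()
demo_clf = DecisionTreeClassifier(max_depth=2, random_state=0)
demo_clf.fit(iris.data, iris.target)

styled_dot = export_graphviz(
    demo_clf,
    out_file=None,                        # return DOT source as a string
    feature_names=iris.feature_names,
    class_names=list(iris.target_names),  # label each node with its majority class
    filled=True,                          # color nodes by majority class
    rounded=True,                         # draw boxes with rounded corners
)

# The second line of the DOT source holds the node style settings
print(styled_dot.splitlines()[1])
```

The resulting DOT source can be rendered the same way as before, for example through pydotplus.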
The Graphviz/Pydotplus reference is here – http://pydotplus.readthedocs.io/reference.html.
There are lots of links about building Decision Trees with Scikit-learn on the web. Here is just a selection of those I found helpful:
Entire walkthrough, not unlike mine, plus random forests… http://adataanalyst.com/scikit-learn/decision-trees-scikit-learn/
Walkthrough that also discusses GINI – http://dataaspirant.com/2017/02/01/decision-tree-algorithm-python-with-scikit-learn/
This article talks about information gain and impurity, likely a topic for one of my next tree-related posts – http://benalexkeen.com/decision-tree-classifier-in-python-using-scikit-learn/
Another example from hackernoon – https://hackernoon.com/a-brief-look-at-sklearn-tree-decisiontreeclassifier-c2ee262eab9a
Finally, a much more comprehensive discussion worth reading twice – http://stackabuse.com/decision-trees-in-python-with-scikit-learn/