Data Analytic Matplotlib (Guide)



Data visualization is an important part of data analysis. Matplotlib is one of the most widely used data visualization modules in Python, and tutorials for other libraries such as TensorFlow commonly use it for visualization. Creating a visualization means turning data into a graphical form such as a chart or plot. Matplotlib is designed to be as usable as MATLAB, with the added ability to use Python and the advantage of being free and open source. It follows the philosophy that you should be able to create simple plots with just a few commands, or just one. If you want to see a histogram of your data, you shouldn't need to instantiate objects, call methods, set properties, and so on; it should just work.

History of Matplotlib:
Matplotlib was originally written by John D. Hunter and first released in 2003. It has an active development community and is distributed under a BSD-style license. Michael Droettboom was nominated as Matplotlib's lead developer shortly before John Hunter's death in August 2012, and was later joined by Thomas Caswell.

Need of Matplotlib:
Matplotlib is a Python library used to create 2D graphs and plots from Python scripts. It has a module named pyplot, which makes plotting easy by providing features to control line styles, font properties, axis formatting, and so on. It supports a wide variety of graphs and plots, including histograms, bar charts, power spectra, and error charts. Used along with NumPy, it provides an environment that is an effective open-source alternative to MATLAB. It can also be used with graphics toolkits like PyQt and wxPython.

The beauty of art lies in the message it conveys. At times, reality is not what we see or perceive. Artists such as da Vinci and Picasso tried to bring people closer to reality through their exceptional artworks.
Data scientists are no less than artists. They paint with digital visualizations of data in order to reveal the hidden patterns and insights in it. It is even more interesting that people perceive, understand, and communicate content more effectively when it is presented in visual form.

Installation and Importing:
Python provides a flexible framework for importing modules and specific members of a module. 
Installation of matplotlib with pip:
pip install matplotlib
Once installed, Matplotlib is available in a Jupyter notebook and can be imported using:
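A minimal sketch of the conventional import. The alias plt is a widely used convention, not a requirement, and the version print is only an assumed way to confirm that the import worked:

```python
import matplotlib
import matplotlib.pyplot as plt  # plt is the conventional alias

# Print the installed version to confirm the import worked
print(matplotlib.__version__)
```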



Matplotlib, pyplot and pylab:
Everything in Matplotlib is organized hierarchically. At the top of the hierarchy is the Matplotlib "state machine environment" provided by the matplotlib.pyplot module. The pyplot state machine environment behaves similarly to MATLAB.
matplotlib.pyplot is a state-based interface to Matplotlib, and pylab is a module that gets installed alongside Matplotlib and pulls pyplot and NumPy into a single namespace (current Matplotlib documentation discourages its use). Pyplot is mainly intended for interactive plots and simple cases of plot generation.
The Matplotlib code is conceptually divided into three parts. The pylab interface is the set of functions provided by matplotlib.pylab, which allow the user to create plots with code quite similar to MATLAB figure-generating code (Pyplot tutorial). The Matplotlib frontend, or Matplotlib API, is the set of classes that do the heavy lifting, creating and managing figures, text, lines, plots and so on (Artist tutorial). This is an abstract interface that knows nothing about output. The backends are device-dependent drawing devices, which transform the frontend representation into hardcopy or a display device.
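The difference between the state-based interface and the underlying classes can be sketched as follows. This is an illustrative comparison with made-up sample values, not code from the original article:

```python
import matplotlib
import matplotlib.pyplot as plt

# Hypothetical sample data
x = [0, 1, 2, 3]
y = [0, 1, 4, 9]

# State-based (pyplot) style: each call acts on the "current" figure and axes
plt.figure()
plt.plot(x, y)
plt.title("pyplot interface")
state_ax = plt.gca()  # the axes that pyplot has been drawing on

# Object-oriented style: hold explicit Figure and Axes objects
fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_title("object-oriented interface")

# The active backend decides how figures are rendered (window, inline image, file)
print(matplotlib.get_backend())
```

Both halves draw the same line. The second form tends to be easier to manage once a script contains more than one figure.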
A simple plot using matplotlib:
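A minimal sketch of such a plot, assuming made-up sample data (the squares of 0 to 9) in place of the article's original data:

```python
import matplotlib.pyplot as plt

# Hypothetical sample data: y is x squared
x = list(range(10))
y = [value ** 2 for value in x]

plt.plot(x, y)
plt.xlabel("x")
plt.ylabel("x squared")
plt.title("A simple line plot")

# In a script, call plt.show() to open the figure window;
# in a notebook with the inline backend the figure appears below the cell.
```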





In a Jupyter notebook, you can include %matplotlib inline to set Matplotlib's backend to the inline backend.





With this backend, the output of plotting commands is displayed inline within frontends like the Jupyter notebook, directly below the code cell that produced it. The resulting plots will then also be stored in the notebook document.
Sometimes the notebook also prints a distracting text message, the representation of the object returned by the last plotting call, along with the output of the code.





To avoid these messages, end the last line of the code with a semicolon.
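A short sketch of where the message comes from, using made-up sample values. plt.plot returns a list of Line2D objects, and a notebook echoes the value of the last expression in a cell:

```python
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D

# Hypothetical sample data
x = [0, 1, 2, 3]
y = [0, 1, 4, 9]

# plt.plot returns a list of Line2D objects; a notebook cell that ends with
# this call echoes that list as text next to the figure
lines = plt.plot(x, y)
print(lines)

# In a notebook, ending the last line with a semicolon suppresses the echo.
# In a plain script the semicolon has no visible effect.
plt.plot(x, y);
```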

Passing the alpha parameter to a plot changes its transparency.
Alpha can be set from 0 to 1, where 0 is completely transparent and 1 is fully opaque.
For the above plot, passing the alpha parameter results in:
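A minimal sketch, again with made-up sample data. The alpha value of 0.3 and the thicker line are arbitrary choices that make the effect easier to see:

```python
import matplotlib.pyplot as plt

# Hypothetical sample data: y is x squared
x = list(range(10))
y = [value ** 2 for value in x]

fig, ax = plt.subplots()
# alpha=0.3 draws the line at 30% opacity
(line,) = ax.plot(x, y, linewidth=4, alpha=0.3)
```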



Customizing Matplotlib:
Properties of matplotlib can be customized using style sheets and rcParams.

Using style sheets:
The style package adds support for easy-to-switch plotting styles. Matplotlib provides numerous predefined styles, which can be listed using:
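The list of bundled style names is exposed as plt.style.available; the exact contents depend on the installed Matplotlib version:

```python
import matplotlib.pyplot as plt

# plt.style.available is a list of the style names bundled with this installation
print(plt.style.available)
```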


To use any of these styles, call plt.style.use() with the style name. The above plot using a style sheet:
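A minimal sketch, assuming the bundled "ggplot" style (which style the original article used is not known) and made-up sample data:

```python
import matplotlib.pyplot as plt

# "ggplot" is one of the bundled styles; any name from plt.style.available works.
# This changes the defaults for every figure created afterwards.
plt.style.use("ggplot")

# Hypothetical sample data: y is x squared
x = list(range(10))
y = [value ** 2 for value in x]

fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_title("Plot drawn with the ggplot style")
```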

Temporary styling can be done using plt.style.context(). Unlike plt.style.use(), this does not change the global styling; it styles only a specific block. The context manager limits the changes to that scope.
The above plot using plt.style.context():
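A sketch of the scoped behaviour, assuming the bundled "ggplot" style and made-up sample data. The axes.grid setting is read before, inside, and after the block only to show that the change does not leak out:

```python
import matplotlib.pyplot as plt

# Hypothetical sample data: y is x squared
x = list(range(10))
y = [value ** 2 for value in x]

grid_before = plt.rcParams["axes.grid"]

# The style applies only to figures created inside the with-block
with plt.style.context("ggplot"):
    fig, ax = plt.subplots()
    ax.plot(x, y)
    grid_inside = plt.rcParams["axes.grid"]  # ggplot switches the grid on

# Outside the block the previous settings are restored
grid_after = plt.rcParams["axes.grid"]
```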

Using Matplotlib rcParams:
matplotlib.rcParams is an instance of RcParams that holds Matplotlib's default settings. Using rcParams with the previous plot:
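A minimal sketch with made-up sample data. The three settings changed here are arbitrary examples; the values used in the original article are not known:

```python
import matplotlib.pyplot as plt

# Change defaults for every plot created after this point
plt.rcParams["lines.linewidth"] = 3
plt.rcParams["lines.linestyle"] = "--"
plt.rcParams["figure.figsize"] = (8, 4)  # width and height in inches

# Hypothetical sample data: y is x squared
x = list(range(10))
y = [value ** 2 for value in x]

fig, ax = plt.subplots()
(line,) = ax.plot(x, y)  # picks up the new line width and dash style

# plt.rcdefaults() restores Matplotlib's built-in defaults when needed
```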

Data: Easily understandable and more presentable
Presentation of data is crucial in any field. You cannot make the best use of data that is not presented clearly and appropriately. One of the most convincing and appealing ways to present statistical data is through diagrams and graphs. Evidence of this can be found in newspapers, magazines, journals, advertisements, etc.
Matplotlib tries to make easy things easy and hard things possible. You can generate plots, histograms, power spectra, bar charts, error charts, scatter plots, etc., with just a few lines of code.
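As an illustration of the "few lines of code" claim, here is a histogram sketch. The data are randomly generated placeholders, and the bin count of 20 is an arbitrary choice:

```python
import random

import matplotlib.pyplot as plt

# Placeholder data: 1000 draws from a standard normal distribution
random.seed(0)
values = [random.gauss(0, 1) for _ in range(1000)]

fig, ax = plt.subplots()
# hist returns the per-bin counts, the bin edges, and the drawn bars
counts, bin_edges, patches = ax.hist(values, bins=20)
ax.set_xlabel("value")
ax.set_ylabel("count")
```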

Reading the CSV file iris.csv and representing it graphically:

Counting the number of elements in each species:
Viewing basic statistical details of the data:
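The reading, counting, and summary steps above can be sketched with pandas. Since iris.csv itself is not included here, the block reads a six-row stand-in from a string. The column names other than sepal_length, sepal_width, and species are assumptions about the file layout:

```python
import io

import pandas as pd

# Stand-in for iris.csv: a few rows in the layout the file is assumed to have
SAMPLE_CSV = """sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
7.0,3.2,4.7,1.4,versicolor
6.4,3.2,4.5,1.5,versicolor
6.3,3.3,6.0,2.5,virginica
5.8,2.7,5.1,1.9,virginica
"""

# With the real file this would be: iris = pd.read_csv("iris.csv")
iris = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Number of rows for each species
species_counts = iris["species"].value_counts()
print(species_counts)

# Count, mean, standard deviation, min, quartiles, and max of each numeric column
summary = iris.describe()
print(summary)
```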

Scatter plot with color coding for each flower species:
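One way to sketch such a plot is to draw one scatter call per species, so that each group takes the next color in the default cycle. The data are again a six-row stand-in for iris.csv with an assumed column layout:

```python
import io

import matplotlib.pyplot as plt
import pandas as pd

# Stand-in for iris.csv: a few rows in the layout the file is assumed to have
SAMPLE_CSV = """sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
7.0,3.2,4.7,1.4,versicolor
6.4,3.2,4.5,1.5,versicolor
6.3,3.3,6.0,2.5,virginica
5.8,2.7,5.1,1.9,virginica
"""
iris = pd.read_csv(io.StringIO(SAMPLE_CSV))

fig, ax = plt.subplots()
# One scatter call per species so each gets its own color and legend entry
for species, group in iris.groupby("species"):
    ax.scatter(group["sepal_length"], group["sepal_width"], label=species)

ax.set_xlabel("sepal_length")
ax.set_ylabel("sepal_width")
ax.legend(title="species")
```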


Using the above plot, sepal_width and sepal_length can be easily compared for each species.
