
Intro to Pyplot

matplotlib.pyplot is a collection of functions that make matplotlib work like MATLAB. Each pyplot function makes some change to a figure. For example, a function may create a figure, create a plotting area in a figure, plot some lines in a plotting area, or decorate the plot with labels.

In matplotlib.pyplot various states are preserved across function calls, so that it keeps track of things like the current figure and plotting area. The plotting functions are directed to the current axes (please note that "axes" here and in most places in the documentation refers to the axes part of a figure and not the strict mathematical term for more than one axis).
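As a minimal sketch of this state tracking (the data and the label are illustrative assumptions), the calls below never name a figure or an axes, yet pyplot directs each one to the current axes, which can be retrieved afterwards with plt.gcf() and plt.gca():

```python
import matplotlib.pyplot as plt

# Each call acts on the "current" figure and axes that pyplot keeps track of
plt.figure()                  # creates a figure and makes it current
plt.plot([1, 2, 3, 4])        # creates an axes in that figure and draws a line
plt.ylabel("some numbers")    # labels the y axis of the current axes

# Retrieve the objects pyplot has been tracking behind the scenes
fig = plt.gcf()               # current figure
ax = plt.gca()                # current axes
```

The rest of this notebook uses the explicit fig, ax objects instead, which makes it clearer which axes each call modifies.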

It allows you to create and customize the most common types of charts, including:

  • Bar charts.
  • Histograms.
  • Pie charts.
  • Violin plots.
  • Scatter plots.
  • Line charts.
  • Area charts.

Installing matplotlib

To work with matplotlib locally, install it with one of the following commands:

pip install matplotlib

or

conda install matplotlib

In our case, 4Geeks has prepared the whole environment so that you can work comfortably.

Import the matplotlib package and the pyplot module in two different ways and call it "plt" (★☆☆)

Which one is "correct"? Check PEP 8.

In [ ]:

Chart creation with matplotlib

To create a graph with matplotlib, the usual steps are:

  • Import the pyplot module.
  • Define the figure that will contain the graph, which is the region (window or page) where it will be drawn, and the axes on which the data will be drawn. The subplots() function is used for this.
  • Plot the data on the axes. Different functions are used depending on the type of graph you want.
  • Customize the chart. Many functions are available to add a title, a legend or a grid, to change colors, or to customize the axes.
  • Save the chart. The savefig() function is used for this.
  • Show the graph. The show() function is used for this.
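The steps above can be sketched as follows. The data, the labels and the file name squares.png are illustrative assumptions, not part of the lesson:

```python
import matplotlib.pyplot as plt

# Illustrative data (an assumption, not from the lesson)
x = [0, 1, 2, 3, 4]
y = [0, 1, 4, 9, 16]

# Define the figure and the axes
fig, ax = plt.subplots()

# Plot the data on the axes
line, = ax.plot(x, y, label="y = x^2")

# Customize the chart: title, axis labels, legend and grid
ax.set_title("Squares")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()
ax.grid(True)

# Save the chart (hypothetical file name)
fig.savefig("squares.png")

# Show the graph
plt.show()
```

Saving before showing is a common precaution, because some backends discard the figure once the window opened by show() is closed.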

An "empty" plot will be created if you run the following lines:

# empty plot
fig, ax = plt.subplots()

Create a scatter plot of the following vector list (★☆☆)

x = [1, 2, 3, 4], y = [1, 2, 0, 0.5]

Use ax.scatter

In [ ]:

Create a line plot of the following vector list (★☆☆)

x = [1, 2, 3, 4], y = [1, 2, 0, 0.5]

Use ax.plot

In [ ]:

Create an area plot of the following vector list (★☆☆)

x = [1, 2, 3, 4], y = [1, 2, 0, 0.5]

Use ax.fill_between

In [ ]:

Create a bar plot of the following vector list (★☆☆)

[1, 2, 3], [3, 2, 1]

Use ax.bar

In [ ]:

Create a horizontal bar plot of the following vector list (★☆☆)

[1, 2, 3], [3, 2, 1]

Use ax.barh

In [ ]:

Create a histogram plot of a vector array with a normal distribution like $X \sim N(2, 1.5)$ (★★☆)

Note: Remember np.random and use ax.hist.

In [ ]:

Draw a pie chart of the following vector list (★☆☆)

[5, 4, 3, 2, 1]

Use ax.pie

In [ ]:


The next image is a boxplot. A boxplot is a standardized way of displaying the distribution of data based on a five-number summary (“minimum”, first quartile (Q1), median, third quartile (Q3), and “maximum”). It can tell you about your outliers and what their values are. It can also tell you whether your data is symmetrical, how tightly it is grouped, and whether and how it is skewed.

[Image: boxplot]

Boxplots have the following characteristics:

  • Median (Q2/50th percentile): the middle value of the dataset.
  • First quartile (Q1/25th percentile): the middle value between the smallest number (not the “minimum”) and the median of the dataset.
  • Third quartile (Q3/75th percentile): the middle value between the median and the highest value (not the “maximum”) of the dataset.
  • Interquartile range (IQR): the range from the 25th to the 75th percentile, that is, Q3 - Q1.
  • Whiskers (shown in blue).
  • Outliers (shown as green circles).
  • “Maximum”: Q3 + 1.5 * IQR.
  • “Minimum”: Q1 - 1.5 * IQR.
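The quantities in the list above can be computed by hand with NumPy. This is a minimal sketch on made-up data (the array is an assumption, chosen so that one value is clearly an outlier); it applies the Q1 - 1.5 * IQR and Q3 + 1.5 * IQR limits directly:

```python
import numpy as np

# Illustrative data (an assumption, not from the lesson); 30 is far from the rest
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])

# Quartiles, using NumPy's default linear interpolation
q1, median, q3 = np.percentile(data, [25, 50, 75])

# Interquartile range and the "minimum" / "maximum" limits
iqr = q3 - q1
lower_limit = q1 - 1.5 * iqr   # the "minimum"
upper_limit = q3 + 1.5 * iqr   # the "maximum"

# Points outside the limits are treated as outliers
outliers = data[(data < lower_limit) | (data > upper_limit)]

print(q1, median, q3)            # 3.25 5.5 7.75
print(lower_limit, upper_limit)  # -3.5 14.5
print(outliers)                  # [30]
```

In the drawn boxplot, the whiskers usually stop at the most extreme data points that still fall inside these limits, so they may be shorter than the limits themselves.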

What defines an outlier, “minimum” or “maximum” may not be clear yet. We will explain it in the bootcamp.

Create a boxplot of the following vector list (★☆☆)

Use ax.boxplot

In [ ]:

Pick a random image from Google, read it with OpenCV and then make a plot (★★☆)

Use cv2.imread

Use ax.imshow

Do you see the image exactly as it is? What do you think happened?

In [ ]:

Change the appearance of charts

The graphics created with matplotlib are customizable and the appearance of almost all its elements can be changed. The elements that are most often modified are:

  • Colors.
  • Point markers.
  • Line styles.
  • Titles.
  • Axes.
  • Legend.
  • Grid.

To change the color of an object, pass the parameter color = color-name, where color-name is a string naming one of the available colors.
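As a small sketch of the color parameter (the data and the chosen color names are illustrative assumptions), the same parameter works across plotting functions and accepts named colors as well as "tab:" palette names:

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [1, 2, 0, 0.5]

fig, ax = plt.subplots()

# A color from the default "tab:" palette
line, = ax.plot(x, y, color="tab:green")

# A named CSS color on a different kind of plot
points = ax.scatter(x, y, color="crimson")
```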

Check the full list here:

Create a line plot with the following vector lists and customize the colors, markers, line style and titles (★★☆)

In [ ]:

Multiple charts

It is possible to draw several graphs on different axes in the same figure, arranged as a grid. To do this, pass the number of rows and columns of the grid to the subplots function when the figure and the axes are created. The axes are then returned in an array, and each of them can be accessed through its indexes. If you want the different axes to share the same limits, pass sharex = True for the x axis or sharey = True for the y axis.
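A minimal sketch of how the returned array of axes is indexed (the grid sizes and the data are illustrative assumptions). Note that with a single row or a single column the array is one-dimensional:

```python
import matplotlib.pyplot as plt

# A grid with 2 rows and 3 columns; all axes share x and y limits
fig, axs = plt.subplots(2, 3, sharex=True, sharey=True)
print(axs.shape)  # (2, 3)

# Index as [row, column]
axs[0, 2].plot([0, 1, 2], [0, 1, 4])   # first row, third column
axs[1, 0].bar([0, 1, 2], [3, 2, 1])    # second row, first column

# With one row (or one column) the array has a single index
fig2, axs2 = plt.subplots(1, 2)
print(axs2.shape)  # (2,)
axs2[1].plot([0, 1], [1, 0])
```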

In [ ]:
## Input
s = pd.Series({'Math': 6.0,  'Economy': 4.5, 'Programming': 8.5})

Run the following code and check what happens

In [ ]:
import matplotlib.pyplot as plt
fig, ax = plt.subplots(2, 2, sharey = True)
days = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
temperature = {'Madrid':[28.5, 30.5, 31, 30, 28, 27.5, 30.5], 'Barcelona':[24.5, 25.5, 26.5, 25, 26.5, 24.5, 25]}
ax[0, 0].plot(days, temperature['Madrid'])
ax[0, 1].plot(days, temperature['Barcelona'], color = 'tab:orange')
ax[1, 0].bar(days, temperature['Madrid'])
ax[1, 1].bar(days, temperature['Barcelona'], color = 'tab:orange')

Repeat four of the plots you have already made, this time in four subplots (★★☆)

In [ ]:

Integration with Pandas

Matplotlib integrates seamlessly with the Pandas library, allowing graphs to be drawn from data from Pandas series and DataFrames.

Check the following code, which plots the same temperature data as above, this time from a DataFrame:

In [ ]:
import pandas as pd 
import matplotlib.pyplot as plt

df = pd.DataFrame({'Days':['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun'], 
                   'Madrid':[28.5, 30.5, 31, 30, 28, 27.5, 30.5], 
                   'Barcelona':[24.5, 25.5, 26.5, 25, 26.5, 24.5, 25]})
fig, ax = plt.subplots()
df.plot(x = 'Days', y = 'Madrid', ax = ax)
df.plot(x = 'Days', y = 'Barcelona', ax = ax)

Read the file titanic_train.csv located in this folder and plot the mean of Age by the group given by Sex (★★★)

In [ ]:
import pandas as pd
import matplotlib.pyplot as plt

pop = pd.read_csv("pop_hist.csv")

# Keep the total-population rows for Spain and the USA.
# Boolean conditions on columns are combined with & and |, not with and/or.
mask = (pop["AGE"] == "TOTAL") & pop["LOCATION"].isin(["ESP", "USA"])
pop = pop.loc[mask, ["LOCATION", "TIME", "Value"]].copy()
pop["Value"] = pop["Value"].astype(int)

# Draw one line per country on the same axes
fig, ax = plt.subplots()
for location, group in pop.groupby("LOCATION"):
    group.plot(x = 'TIME', y = 'Value', ax = ax, label = location)

What is 3D Data Visualization?

Three-dimensional (3D) data conveys depth, width, and height, and it can be viewed from any angle. Three-dimensional visualizations were developed to provide both qualitative and quantitative information about an object. They are produced in a three-phase process of scene, geometry, and rendering. As datasets increase in size, analysis and visualization tools for the data become essential.

Analysis operations, like visualization operations, may be either scene-based or object-based and deal with methods of quantifying object information.

Some examples of 3D shapes are prisms, pyramids, spheres, cones, cubes, or even figures!!!! 😲😲.


Why Does 3D Visualization Matter?

Three-dimensional visualizations let you view a scene from any angle simply by turning the camera. Two-dimensional formats limit how much information a visualization can carry when it is used for making decisions, planning and targeting customers. A three-dimensional visualization can show which features of the scene have changed, and it communicates internal features easily. Applications include GIS (Geographic Information Systems), where geographic visualizations in a three-dimensional view provide more interaction, which is essential for understanding. They give a sense of immersion in the environment, so the user can appreciate the scale of change and visualize the impact of building design on the external environment and its inhabitants. GIS examples include city planning, building information planning, coastal analysis and modeling, and wind farm assessment.

However, we will see in the following lessons that we can even make visualizations in 4D, although these surfaces are not very common in data analysis.

In [ ]:
# We enable three-dimensional plots by importing the mplot3d toolkit
from mpl_toolkits import mplot3d

# Once this submodule is imported, we can create a three-dimensional axes by passing the keyword projection='3d' to any of the normal axes creation routines

%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt

fig = plt.figure()
ax = plt.axes(projection='3d')

Change the name of columns of the above DataFrame using two different methods (★★☆)

Check the function rename.

In [ ]:
#Three-Dimensional Contour Plots

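def f(x, y):
    return np.sin(np.sqrt(x**2 + y**2))

x = np.linspace(-6, 6, 30)
y = np.linspace(-6, 6, 30)

# Build the 2D coordinate grids and evaluate the function on them
X, Y = np.meshgrid(x, y)
Z = f(X, Y)

ax = plt.axes(projection='3d')
ax.contour3D(X, Y, Z, 50, cmap='binary')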

#Sometimes the default viewing angle is not optimal, in which case we can use the view_init method to set the elevation and azimuthal angles.
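
As a sketch of the view_init call mentioned above, the block below redraws the same contour plot and then sets the viewing angle. The angles of 60 and 35 degrees are illustrative assumptions; any values can be used:

```python
import numpy as np
import matplotlib.pyplot as plt

def f(x, y):
    return np.sin(np.sqrt(x**2 + y**2))

x = np.linspace(-6, 6, 30)
y = np.linspace(-6, 6, 30)
X, Y = np.meshgrid(x, y)
Z = f(X, Y)

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.contour3D(X, Y, Z, 50, cmap="binary")

# Elevation of 60 degrees above the x-y plane,
# azimuth rotated 35 degrees about the z axis
ax.view_init(elev=60, azim=35)
```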