Python Lesson 2
Lesson outline
Working with pics.
More about NumPy.
Introduction to data representation with matplotlib.
Exercises.
Working with pics
Import NumPy as in the previous lesson, together with the pyplot and image modules from matplotlib.
import numpy as np
from matplotlib import pyplot as plt # import the pyplot plotting module from matplotlib
from matplotlib import image # import the image module (reading and writing images) from matplotlib
Read a PNG figure into an array and show it
imgarray=image.imread("iberian-lynx.png")
imgplot=plt.imshow(imgarray)
Let's examine the array shape
imgarray.shape
This is an RGB array with three values assigned per pixel (a PNG file with transparency has a fourth, alpha, channel). Let's now perform some basic manipulation of this array. We select the red, green, and blue channels, which are 2D arrays, and show each of them as a gray-scale heat map.
red_imgarray, gr_imgarray, bl_imgarray = imgarray[:,:,0], imgarray[:,:,1], imgarray[:,:,2]
plt.imshow(red_imgarray, cmap='gray')
plt.show()
plt.imshow(gr_imgarray, cmap='gray')
plt.show()
plt.imshow(bl_imgarray, cmap='gray')
Change to a color map
imgplot=plt.imshow(red_imgarray,cmap="hot")
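If iberian-lynx.png is not at hand, the channel slicing above can be tried on a synthetic array. This is a minimal sketch: the random 4x5 "image" is hypothetical and only mimics the layout of the array returned by image.imread for a PNG (height x width x channels, float values between 0 and 1). Keeping only the first three channels makes the code work whether the file is RGB or RGBA.

```python
import numpy as np

# Hypothetical 4x5 image with 4 channels (RGBA), standing in for image.imread output
height, width = 4, 5
rng = np.random.default_rng(0)
rgba = rng.random((height, width, 4))

# Keep only the R, G and B channels, whether or not an alpha channel is present
rgb = rgba[..., :3]

# Each channel is a 2D array with the image height and width
red, green, blue = rgb[:, :, 0], rgb[:, :, 1], rgb[:, :, 2]
print(rgb.shape, red.shape)  # (4, 5, 3) (4, 5)
```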
More about NumPy
Apart from reading data from files, or transforming native Python structures into NumPy ndarrays with np.array (as we will see in the next lesson), NumPy provides a set of functions for creating arrays:
ones: Given array dimensions, it outputs an array with the given shape filled with the value 1.
ones_like: Given an array, it outputs an array with the same shape, filled with the value 1.
zeros: Given array dimensions, it outputs an array with the given shape filled with the value 0.
zeros_like: Given an array, it outputs an array with the same shape, filled with the value 0.
empty: Given array dimensions, it outputs an array with the given shape and uninitialized values (be careful: the contents are whatever was in memory, this is the wild side…).
empty_like: Given an array, it outputs an array with the same shape and uninitialized values.
full: Given array dimensions and a value, it outputs an array with the given shape and all elements equal to that value.
full_like: Given an array and a value, it outputs an array with the same shape and all elements equal to that value.
eye, identity: Given a dimension N, they output the N x N identity matrix (ones on the diagonal, zeros elsewhere); eye also accepts a number of columns and a diagonal offset.
arange: Given start, stop [and step] values, it creates a 1D ndarray of evenly spaced values with start as its first element, start + step the second, start + 2*step the third, and so on; stop itself is excluded.
linspace: Given start, stop [and N] values, it creates a 1D ndarray of exactly N evenly spaced values (50 by default) with start as its first element and stop as the last one.
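A short sketch of some of these creation functions; the shapes and values are arbitrary choices for illustration. Note the difference between arange, which excludes stop, and linspace, which includes it.

```python
import numpy as np

a = np.ones((2, 3))            # 2x3 array of 1.0
b = np.zeros_like(a)           # same shape as a, filled with 0.0
c = np.full((2, 2), 7.5)       # 2x2 array, every element equal to 7.5
d = np.empty((2, 2))           # uninitialized: contents are arbitrary, do not rely on them
i3 = np.eye(3)                 # 3x3 identity matrix, same as np.identity(3)
r = np.arange(0, 10, 2.5)      # stop excluded: [0.0, 2.5, 5.0, 7.5]
lin = np.linspace(0, 1, 5)     # stop included: [0.0, 0.25, 0.5, 0.75, 1.0]
print(a, b, c, i3, r, lin, sep="\n\n")
```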
NumPy offers many data types, identified by their dtype, for storing data in arrays. We are mainly interested in numerical data types, indicated by the prefix float (floating-point numbers) or int (exact integer numbers) followed by the number of bits per element. The standard double-precision floating-point type is float64 (8 bytes per element), and the default integer type is int64 on most platforms. NumPy also supports complex values (for example, complex128).
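The dtype of an array can be inspected with the dtype attribute and changed with the astype method, as in this minimal sketch (the values are arbitrary). Converting floats to integers truncates toward zero.

```python
import numpy as np

x = np.array([1.5, 2.7, -3.2])
print(x.dtype)                    # float64, the default for Python floats

n = x.astype(np.int32)            # float -> int conversion truncates toward zero
print(n, n.dtype)                 # [ 1  2 -3] int32

z = np.array([1 + 2j, 3 - 1j])
print(z.dtype)                    # complex128

f32 = np.zeros(3, dtype=np.float32)
print(f32.itemsize, x.itemsize)   # 4 and 8 bytes per element
```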
One of the main advantages of NumPy is vectorization: the possibility of performing batches of operations on whole arrays without explicit loops. For example, we define a couple of arrays of random numbers and perform some operations with them:
array_a = np.random.randn(3,3)
array_b = np.random.randn(3,3)
print(array_a, "\n\n", array_b, "\n\n", 10.0/(array_a + array_b))
#
print("\n\n")
#
print(array_a, "\n\n",array_b,"\n\n", np.sqrt(array_a**2 + array_b**2))
The function np.sqrt is an example of a universal function (ufunc), a function that performs element-wise operations on data arrays. You can find a list of such NumPy functions in https://docs.scipy.org/doc/numpy-1.14.0/reference/ufuncs.html. NumPy also provides the mathematical constants np.pi and np.e, and in Python the imaginary unit is written as 1j.
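One needs to be very aware that, when working with NumPy arrays (and other mutable data structures), assignment does not copy data: Python binds the name on the left-hand side to the object on the right-hand side, and basic slicing of an ndarray returns a view that shares memory with the original array. This is often loosely described as pass by reference, in contrast with the pass by value strategy of other programming languages. With scalar data the behavior looks completely different, because numbers are immutable and assigning a new value to a name simply rebinds it. If we execute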
scalar_c = 8.5
scalar_c_2 = scalar_c
array_c = array_b[:2,:2]
#
print("array_b = ", array_b, "\n\n","array_c = ", array_c)
print("scalar_c = ", scalar_c, "\n\n","scalar_c_2 = ", scalar_c_2)
#
print("\n\n")
#
array_c[:] = 100.0
scalar_c_2 = 100.0
#
print("array_b = ", array_b, "\n\n","array_c = ", array_c)
print("scalar_c = ", scalar_c, "\n\n","scalar_c_2 = ", scalar_c_2)
Therefore array_c is a view of array_b: both refer to the same underlying data in memory, so modifying the elements of one modifies the other. This avoids unnecessary copies when working with large matrices. Note also that you cannot assign values to elements of an array that has not been created beforehand (the function np.zeros is often used to preallocate one). If you want an independent copy of the original matrix, you can use the copy method:
array_d = array_a[:2,:2].copy()
#
print(array_a, "\n\n", array_d)
#
print("\n\n")
#
array_d[:] = 1000.0
#
print(array_a, "\n\n", array_d)
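The function np.shares_memory offers a direct way to check whether two arrays use the same data. A minimal sketch on a small arbitrary array, contrasting a slice (a view) with an explicit copy:

```python
import numpy as np

base = np.arange(12.0).reshape(3, 4)
view = base[:2, :2]            # basic slicing: a view, no data copied
copy = base[:2, :2].copy()     # explicit, independent copy

print(np.shares_memory(base, view), np.shares_memory(base, copy))  # True False

view[0, 0] = -1.0              # also changes base
copy[0, 1] = -2.0              # leaves base untouched
print(base[0, :2])             # first two elements are now -1.0 and 1.0
```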
NumPy also allows indexing with integer arrays, something called fancy indexing. In this case the result is a copy of the data, not a view of the original array. In the following example we select some of the rows and a subset of the columns, in a chosen order and with repetitions:
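array_e = np.empty((10,10))
# Negative indices count from the end, so each column is written twice;
# the last pass (value = 0..9) leaves column j filled with the value j
for value in range(-10,10):
    array_e[:, value] = value
print(array_e)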
array_f = array_e[2:5,[-1,5,2,3,2]] # Selecting a subset of columns and slicing the rows
print(array_f)
print(array_e)
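To see that the result of fancy indexing is a copy, one can modify it and check that the original array is unchanged. A small sketch with an arbitrary 4x5 array:

```python
import numpy as np

m = np.arange(20).reshape(4, 5)
picked = m[1:3, [4, 0, 0]]     # rows 1-2; columns 4, 0 and 0 again (order and repeats allowed)
print(picked)                  # [[ 9  5  5]
                               #  [14 10 10]]

picked[:] = -1                 # modifies only the copy
print(m[1:3])                  # rows 1-2 of m are unchanged
print(np.shares_memory(m, picked))  # False
```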
NumPy arrays can be transposed using the transpose method or the special T attribute:
print(array_a)
print(array_a.transpose())
print()
print(array_f)
print(array_f.T)
This is useful, for example, when computing the matrix product of the transpose of an array with the array itself using np.dot:
print(np.dot(array_a.T, array_a))
print("")
print(np.dot(array_f.T, array_f))
However, for matrix multiplication the preferred options are np.matmul or the a @ b operator.
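For 2D arrays, np.dot, np.matmul and the @ operator give the same matrix product, while * is an element-wise product. A minimal sketch with random matrices of compatible (arbitrary) shapes:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 2))
B = rng.standard_normal((2, 4))

p_dot = np.dot(A, B)
p_matmul = np.matmul(A, B)
p_at = A @ B                       # operator form of np.matmul
print(p_at.shape)                  # (3, 4)
print(np.allclose(p_dot, p_matmul), np.allclose(p_matmul, p_at))  # True True

# The * operator is element-wise, not a matrix product
print((A * A).shape)               # (3, 2)
```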
Two or more NumPy arrays can also be concatenated, building up a large array from smaller ones. This can be done with the hstack and vstack functions:
arr_a = np.random.randn(2,4)
arr_b = np.random.randn(2,4)
arr_horizontal=np.hstack((arr_a,arr_b))
print(arr_horizontal)
arr_vertical=np.vstack([arr_a,arr_b])
print(arr_vertical)
Notice that in the hstack (vstack) case the arrays combined must have the same number of rows (columns). These two are convenience functions, wrappers of the more general function concatenate, which joins arrays along axis 0 (vertically) by default, or along the axis given by its axis argument:
arr_v = np.concatenate([arr_a,arr_b])
arr_h = np.concatenate([arr_a,arr_b],axis=1)
print(arr_h)
print(arr_v)
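A sketch of the shape rule above, with arbitrary shapes: stacking vertically requires the same number of columns, stacking horizontally the same number of rows, and a mismatch raises a ValueError.

```python
import numpy as np

a = np.ones((2, 4))
b = np.zeros((3, 4))                # different number of rows, same number of columns

v = np.concatenate([a, b])          # axis=0 (default), like np.vstack
print(v.shape)                      # (5, 4)

c = np.zeros((2, 1))
h = np.concatenate([a, c], axis=1)  # axis=1, like np.hstack for 2D arrays
print(h.shape)                      # (2, 5)

raised = False
try:
    np.hstack([a, b])               # row counts differ: 2 vs 3
except ValueError as err:
    raised = True
    print("ValueError:", err)
```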
Data in an array can also be flattened, transforming the array into a vector (a one-dimensional array). This can be done with ravel or flatten: ravel is available both as a NumPy function (np.ravel) and as an array method, while flatten is only an array method.
arr_c = np.random.randn(4,4)
vec_c_0 = arr_c.ravel() # Equivalent to np.ravel(arr_c)
vec_c_1 = arr_c.flatten() # Array method only: there is no np.flatten function
if np.array_equal(vec_c_0, vec_c_1): # Comparing two arrays
    print(vec_c_0)
Note how we check whether the two vectors created are equal. The NumPy function np.array_equal checks if two arrays have identical shapes and elements. You cannot check if two arrays are equal using the usual == operator (try it): it compares the arrays element by element and returns a Boolean array, whose truth value in an if statement is ambiguous. Both methods leave arr_c unchanged, but ravel returns a vector that accesses the original data (a view, whenever possible), while flatten copies the data and creates an independent object.
print(arr_c)
vec_c_0[0] = 1000.0
print(arr_c)
vec_c_1[0] = 10.0
print(arr_c)
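A small sketch contrasting the element-wise == with np.array_equal, showing the error raised when a multi-element Boolean array is used in an if statement. The last line compares floating-point results with np.allclose, which uses a tolerance; that function is an addition to the text above, mentioned because exact equality is fragile for floats.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])

elementwise = x == y          # Boolean array: [ True  True  True]
same = np.array_equal(x, y)   # True: same shape and same elements
all_equal = (x == y).all()    # True: reduce the Boolean array explicitly
print(elementwise, same, all_equal)

raised = False
try:
    if x == y:                # truth value of a multi-element array is ambiguous
        pass
except ValueError:
    raised = True
print("ValueError raised:", raised)   # True

# Floating-point results are better compared with a tolerance
print(np.allclose(0.1 + 0.2, 0.3), np.array_equal(0.1 + 0.2, 0.3))  # True False
```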
The comparison between arrays yields Boolean arrays
print(array_a, "\n\n",array_b, "\n\n",array_a > array_b)
And you can use these Boolean arrays for indexing. In the example that follows we define a new matrix in which the positive elements of array_a are replaced by zero, so that only the negative elements remain nonzero.
boolean = array_a > 0
print(boolean)
array_e = array_a.copy()
array_e[boolean] = 0
print(array_a, "\n\n",array_e)
This is an example of vectorized computation, one of the greatest advantages of NumPy. For instance, if you want to create a new array with the same shape as arr_c, with 0 in the negative elements and 1 in the positive elements, you can easily do it in vectorized form, without loops (see Lesson 3):
arr_e = np.copy(arr_c)
arr_e[arr_c>0]=1
arr_e[arr_c<0]=0
print(arr_e)
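An alternative sketch for the same 0/1 array, converting the Boolean array returned by the comparison directly with astype (True becomes 1.0, False becomes 0.0). The small input array is arbitrary; as in the version above, zero elements end up as 0.

```python
import numpy as np

arr = np.array([[-1.5, 0.0], [2.0, -0.3]])
# The comparison yields a Boolean array; astype maps True/False to 1.0/0.0
indicator = (arr > 0).astype(float)
print(indicator)   # [[0. 0.]
                   #  [1. 0.]]
```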
Be aware that, in older NumPy versions, Boolean selection did NOT fail if the Boolean array had the wrong shape, which is error prone; recent versions raise an IndexError instead. We will learn a better way of doing this in Lesson 5, using the np.where function.
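When working with arrays you can build complex conditions by combining simpler expressions with the logical operators & (and) and | (or); the keywords and and or do not work in this context. Each comparison must be enclosed in parentheses, because & and | have higher precedence than the comparison operators. For example: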
arr_f = np.copy(arr_c)
bool_mask = (arr_c > 1) | (arr_c < -1)
arr_f[bool_mask] = 2.0
print(arr_c,"\n", arr_f)
Selecting data with Boolean arrays always creates a copy of the original data, even if the data are unchanged.
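A minimal sketch of both points: combining conditions with | and negating a mask with ~ (the element-wise "not"), and checking that the selected data are a copy. The data values are arbitrary.

```python
import numpy as np

data = np.array([-2.0, -0.5, 0.5, 3.0])
# Parentheses are required: & and | bind more tightly than > and <
outside = (data > 1) | (data < -1)
inside = ~outside               # ~ negates a Boolean array element-wise
print(outside, inside)          # [ True False False  True] [False  True  True False]

selected = data[inside]         # Boolean selection returns a new array (a copy)
selected[:] = 0.0
print(data)                     # unchanged
print(np.shares_memory(data, selected))  # False
```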
Basic Data Plotting
We repeat what we did in the first lesson, reading one of the files with monthly temperatures and removing the year column from the array.
metdata_orig = np.loadtxt(fname='files/TData/T_Alicante_EM.csv', delimiter=',', skiprows=1)
metdata = metdata_orig[:,1:]
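If the CSV file is not available, the same reading and slicing can be tried on an in-memory file. This sketch assumes the layout suggested by the code above: one header row, then a year column followed by twelve monthly columns. The two rows of numbers are made up for illustration and are not taken from the real dataset.

```python
import io
import numpy as np

# Made-up CSV content: header row, then year + 12 monthly values per row
csv_text = """Year,Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec
1961,11.0,12.1,13.5,15.2,18.3,22.0,25.1,25.6,23.0,19.1,14.8,12.0
1962,10.5,11.8,13.9,15.0,18.9,22.4,25.3,25.9,22.7,18.8,14.5,11.2
"""
orig = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1)
years = orig[:, 0]       # first column: the year
monthly = orig[:, 1:]    # remaining 12 columns: monthly temperatures
print(orig.shape, monthly.shape)   # (2, 13) (2, 12)
```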
We can plot the array directly as a heat map
plt.imshow(metdata)
Using a different color map
plt.imshow(metdata, cmap="BrBG")
This is of limited utility. Let's compute and plot the mean monthly temperatures
ave_monthly = np.mean(metdata, axis=0)
ave_monthly_plot = plt.plot(ave_monthly)
and the average annual temperatures
ave_annual = np.mean(metdata, axis=1)
ave_annual_plot = plt.plot(ave_annual)
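The axis argument selects the dimension that is collapsed. With years as rows and months as columns, axis=0 gives one value per month and axis=1 one value per year. A sketch on a hypothetical 3x12 table of consecutive numbers:

```python
import numpy as np

# Hypothetical table: 3 years (rows) x 12 months (columns)
temps = np.arange(36.0).reshape(3, 12)
per_month = np.mean(temps, axis=0)   # collapse the rows: one value per month
per_year = np.mean(temps, axis=1)    # collapse the columns: one value per year
print(per_month.shape, per_year.shape)   # (12,) (3,)
print(per_year)                          # [ 5.5 17.5 29.5]
```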
In the same fashion we can also plot the maximum and minimum monthly temperatures
max_monthly = np.max(metdata, axis=0)
min_monthly = np.min(metdata, axis=0)
max_monthly_plot = plt.plot(max_monthly)
min_monthly_plot = plt.plot(min_monthly)
And the annual maximum and minimum temperatures
max_annual = np.max(metdata, axis=1)
min_annual = np.min(metdata, axis=1)
max_annual_plot = plt.plot(max_annual)
min_annual_plot = plt.plot(min_annual)
This is the most basic kind of plotting in pyplot. You can improve the figure appearance as follows
fig, ax = plt.subplots()
ax.plot(max_monthly)
ax.plot(min_monthly)
ax.set_title("Alicante Temperature Dataset")
ax.set_xlabel("Month (0-11)")
ax.set_ylabel("Max and Min average T (°C)")
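A possible further refinement, sketched under assumptions: plotting against month numbers 1-12 instead of array indices, labeling each curve, and adding a legend. The curves below are hypothetical stand-ins for max_monthly and min_monthly, and the Agg backend line is only there so the sketch runs without a display; remove it in a notebook and call plt.show() to see the figure.

```python
import matplotlib
matplotlib.use("Agg")            # non-interactive backend; remove in a notebook
import numpy as np
from matplotlib import pyplot as plt

# Hypothetical stand-ins for max_monthly and min_monthly
months = np.arange(1, 13)
max_monthly = 20 + 8 * np.sin((months - 4) * np.pi / 6)
min_monthly = max_monthly - 10

fig, ax = plt.subplots()
ax.plot(months, max_monthly, label="Max")   # x values given explicitly
ax.plot(months, min_monthly, label="Min")
ax.set_xticks(months)            # one tick per month, labeled 1-12
ax.set_xlabel("Month (1-12)")
ax.set_ylabel("T (°C)")
ax.legend()                      # uses the label of each curve
```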
You can now solve exercises 2.1 and 2.2
We can combine several plots in a multi-panel figure
fig, ax = plt.subplots(nrows=2, ncols=2)
fig.tight_layout(pad=3.0)
ax[0,0].plot(max_monthly)
ax[0,1].plot(min_monthly)
ax[1,0].plot(max_annual)
ax[1,1].plot(min_annual)
You can now solve exercise 2.3.
Exercises
Exercise 2.1: Plot the monthly and annual difference between max and min temperatures as a function of the month (1-12) and the year (1961-2096), respectively. In this case, try to combine the plt.plot and plt.scatter functions. Hint: the plot function accepts the syntax plt.plot(x, y).
Exercise 2.2: Plot the standard deviation of the monthly and annual temperatures as a function of the month (1-12) and the year (1961-2096), respectively. Hint: check the std function in NumPy.
Exercise 2.3: Prepare a plot with two panels (arranged as you wish) which depicts the annual dependence of the average Spring and Fall temperatures for meteorological seasons: Spring (Mar, Apr, May) and Fall (Sep, Oct, Nov).