> For the complete documentation index, see [llms.txt](https://curropb.gitbook.io/python-notes/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://curropb.gitbook.io/python-notes/python-lesson-5.md).

# Python Lesson 5

## Lesson outline

1. Input and Output: dealing with files in Python and with NumPy
2. More on NumPy
3. More on Graphics
4. Exercises

## Writing and Reading data from files

We first explain the basics of file reading and writing in native Python and then move into the more specialized NumPy functions, better for dealing with data arrays.

### File access in native Python

Using the native Python function `open` (with an absolute or a relative trajectory with respect to the notebook active directory) and the method `close` allows to set a filehandle. By default files are opened in the `r` mode (readonly).

```
path = "~/.bashrc"
fileh = open(path) # fileh is the filehandle
```

You can treat the filehandle as an iterator and build a loop. In this case the number of characters in each line is printed

```
for line in fileh:
    print(len(line))
```

It is important to close the filehandle once finished with it to return resources to the processor.

```
fileh.close
```

The usual form to open files makes unnecessary the last step, and once the file is dealt with it is automatically closed

```
with open(path) as fileh:
    bashrc_lines = [line.rstrip() for line in fileh]
```

We are reading each line, removing the trailing spaces and carriage return into a list using a list comprehension.

The complete syntax for open would be `open(path, mode)` and the major modes are:

* **mode `r`:** Default mode. Read only.
* **mode `w`:** Write only. Beware, it deletes an existing file and overwrites it.
* **mode `x`:** Write only. It exits if the file exists.
* **mode `a`:** Write only. Append to the file if it exists.
* **mode `r+`:** Read and Write.
* **mode `b`:** add to the mode for binary files. E.g. "rb", "ab"…

The `write` and `writelines` methods allow to write into a filehandle. If, for example, we want to read a file and save it stripping the first and last lines we can run

```
with open("test_write.txt", "w") as tw_handle:
    tw_handle.writelines([line for line in open(path)][1:-2])
```

The most commonly used native Python file methods are

* **read(\[size]):** Read data from file, with optional argument of the number of bytes to read. The method returns the specified number of bytes from the file. Default is -1 which means the whole file.
* **readlines(\[size]):** Return list of lines from file, with optional argument of the number of bytes to read.
* **write(str):** Write string to file.
* **writelines(strings):** Write list of lines to file.
* **close():** Close the filehandle.
* **flush():** Flush the internal buffered I/O to the disk.
* **seek(pos):** Move to position `pos` in the file.
* **tell():** Tell the current position in the file.
* **closed:** `True` if the filehandle is closed.

### File access with NumPy

We have already used the NumPy function `loadtxt` that, together with `savetxt`, allows to load and save data to text files. For example

```
array_a = np.random.randn(3,3)
np.savetxt("array_a_saved.txt", array_a, delimiter=":")
!cat array_a_saved.txt
array_b = np.loadtxt("array_a_saved.txt", delimiter=":")
np.array_equal(array_a, array_b)
```

These functions are specially handy for reading `csv` and `tsv` data files

But using the `load` and `save` functions you can also save an array in binary format -saving cpu, time, and precision at the cost of not being able to open/edit the file- with standard suffix `npy`.

```
array_a = np.random.randn(5,5)
np.save("array_a_saved", array_a)
!ls -l array_a_saved.*
array_b = np.load("array_a_saved.npy")
np.array_equal(array_a, array_b)
```

With the function `savez` you can also save multiple arrays in an `npz` file that is not a compressed file. If you want to compress the data use the function `savez_compressed`

```
np.savez("savez_example", a=array_a, b=array_b, c=array_a)
np.savez_compressed("savez_example_compressed", a=array_a, b=array_b, c=array_a)
!ls -l savez*
```

Once you read them they are loaded into a hash-like NumPy object with the array argument names as keys

```
arr_hash = np.load("savez_example_compressed.npz")
print(type(arr_hash))
print(arr_hash["a"])
```

To improve portability, and especially when you read several files in your code, and probably in different code sections, it is considered a good practice to define variables for your home and data folders and then use the variables to define the path when needed. This improves the readability of your code and helps its maintenance. A possible example, using the dataset provided at the beginning of the course is the following:

```
home_dir = "/home/curro/"
data_dir = "files/TData/"
##
##
metdata_orig = np.loadtxt(fname=home_dir + data_dir + 'T_Alicante_EM.csv', delimiter=',', skiprows=1) 
```

### Saving Python objects with `pickle`

Pickling is used to save Python objects into a filesystem; lists, dictionaries, class objects, and more can be saved as a *pickle file*. Strictly speaking, pickling is the serializing and de-serializing of python objects to a byte stream. The opposite is *unpickling*.

This methodology is called *serialization*, *marshalling*, or *flattening* in other programming languages. Pickling is most useful when performing routine tasks on massive datasets or when you want to store, after sometimes massive calculations, a given Python object.

Once you import the `pickle` library, to save and load objects one makes use of the `pickle.dump` and `pickle.load` combined with the right filehandles. You can make this with the following two functions

```
import pickle

def save_obj(obj, name ):
    with open('./'+ name + '.pkl', 'wb') as f:
	pickle.dump(obj, f, pickle.HIGHEST_PROTOCOL)

def load_obj(name ):
    with open('./' + name + '.pkl', 'rb') as f:
	return pickle.load(f)
```

We can try saving a hash

```
test_hash = {0:"a", 1:"E",2:"I", 3: "J", 4: "H"}
save_obj(test_hash, "arr_hash_saving_test")

test_read = load_obj( "arr_hash_saving_test")
```

## More on NumPy

The vectorization provided by NumPy is a great advantage when dealing with large datasets and the computing times required are at least an order of magnitude less than the equivalent times for interpreted code alternatives.

We illustrate this with a simple example, plotting the 3D function `f(x,y) = np.exp(-xs ** 2 - ys ** 2)*xs**2*ys` as a heatmap

```
points = np.linspace(-2, 2, 200)
xs, ys = np.meshgrid(points, points) # Build a grid of x and a grid of y values
z = np.exp(-xs ** 2 - ys ** 2)*xs**2*ys
```

We plot the figure as a heatmap in the XY plane

```
plt.title("Image plot of $ x^2 y e^{-x^2 - y^2}$ for a grid of values")
plt.imshow(z, cmap=plt.cm.autumn); plt.colorbar()
```

Another powerful NumPy feature, already presented in Lesson 2, is the possibility of Boolean indexing. This is specially adequate when combined with the NumPy function `np.where`, a vectorized version of the standard Python ternary expression. The syntax for this function is `np.where(condition, Array_A, Array_B)`. When condition is `True` the corresponding element of `Array_A` is selected, the one of `Array_B` otherwise. In a previous example we replaced positive and negative values of a random array by one and minus one, respectively. This can be very conveniently done using `np.where`

```
arr_rand = np.random.randn(6,6)
np.where(arr_rand>0,1,-1)
```

You could also replace only negative values by -1

```
arr_rand = np.random.randn(6,6)
np.where(arr_rand>0,arr_rand,-1)
```

As you can see we can replace `arr_A` or `arr_B` by scalars. Avoiding the use of scalars, if we have two arrays and depending on the sign of a third array we can choose elements from one or the other.

```
arr_rand_A = np.random.randn(3,3)
arr_rand_B = np.random.randn(3,3)
arr_rand_C = np.random.randn(3,3)
print(arr_rand_A)
print(arr_rand_B)
print(arr_rand_C > 0)
np.where(arr_rand_C>0,arr_rand_A,arr_rand_B)
```

There are a set of statistical and mathematical functions available in NumPy, we quickly outline the main ones, applying them to a GOE matrix (Gaussian Orthogonal Ensamble, real and symmetric random matrices). We first define the matrix as a matrix of random normally distributed numbers with zero mean and unity variance, and then add it to its transpose, to get a symmetric matrix

```
GOE_arr = np.random.randn(20, 20)
GOE_arr += GOE_arr.T
```

We can now compute the mean and standard deviation of the full array or for the diagonal

```
GOE_arr.mean()
GOE_arr.std()
#
# Extract the diagonal of the array
np.diag(GOE_arr).mean()
np.diag(GOE_arr).std()
```

We already know that the option `axis` allows for limiting the calculation to rows or columns (or a given index in a multi index array)

```
print(GOE_arr.mean(axis=0))
print(GOE_arr.std(axis=0))
print(GOE_arr.mean(axis=1))
print(GOE_arr.std(axis=1))
```

Other useful functions are `cumsum` and `cumprod` for cumulative addition and product

```
print(GOE_arr.cumsum())
print(GOE_arr.cumprod())
```

Finally another useful NumPy methods are `min` (`argmin`) and `max` (`argmax`)

```
# min and argmin
print(GOE_arr.min())
print(GOE_arr.argmin())
# max and argmax
print(GOE_arr.max())
print(GOE_arr.argmax())
```

The indices are provided for a flattened array, if you need the *row* and *column* index values this can be obtained using the `np.unravel_index` function

```
print(np.unravel_index(GOE_arr.argmax(), GOE_arr.shape))
print(np.unravel_index(GOE_arr.argmin(), GOE_arr.shape))
```

You can sort values using the `np.sort` function or the `sort` method. There is a basic difference between these two. The function provides a sorted copy of the original array, while the method performs an in-place sort, without data copying.

```
# Sliced array copy 
GOE_arr_copy = GOE_arr[:4,:4].copy()

# Notice that array is sorted by row (axis = 1) by default
sorted_array = np.sort(GOE_arr_copy)
print(sorted_array)

# True if there are at least one non-zero element
print(np.any(sorted_array-GOE_arr_copy))

# In-place sorting
GOE_arr_copy.sort()
print(np.any(sorted_array-GOE_arr_copy))
```

Numpy also provides basic *linear algebra* functions, e.g. you can compute the trace of a matrix

```
print(np.trace(GOE_arr))
```

You can also calculate the product of two matrices with the `dot` or `matmul` method or function. Both provide the same result in the case of 2D array multiplication

```
dot_result = np.dot(GOE_arr,GOE_arr)
matmul_result=np.matmul(GOE_arr,GOE_arr)
#
np.all(np.equal(dot_result,matmul_result))
```

There are two main differences between these two functions. On the first hand multiplication by scalars is not allowed with `matmul` (use `*` instead). On the second hand, when dimension is larger than two, in the `matmul` case the operation is broadcasted together as if the matrices were elements residing in the last two indexes, respecting the signature `(n,k),(k,m)->(n,m)`. In the `dot` case it is a sum product over the last axis of first matrix and the second-to-last of the second matrix

```
a = np.random.rand(2,3,4)
b = np.random.rand(2,4,2)
c = np.matmul(a,b)
print(c)
print(c.shape)
d = np.dot(a,b)
print(d)
print(d.shape)
```

In the case of two vectors, `matmul` provides the inner product (without taking complex conjugate)

```
print(np.matmul(GOE_arr[0,:], GOE_arr[0,:]))
```

In Python 3.5 the `@` symbol works as an infix operator for matrix multiplication

```
print(np.any(np.matmul(GOE_arr, GOE_arr)-GOE_arr @ GOE_arr))
```

In `np.linalg` several functions of linear algebra are found.

* **`det`:** Matrix determinant
* **`eigvals`:** Eigenvalues of a square matrix
* **`eig`:** Eigenvalues and eigenstates of a square matrix
* **`eigvalsh`:** Eigenvalues of a symmetric or Hermitian square matrix
* **`eigh`:** Eigenvalues and eigenstates of a symmetric or Hermitian matrix
* **`inv`:** Inverse of a square matrix
* **`pinv`:** Compute the Moore-Penrose pseudo-inverse of a matrix.
* **`qr`:** Compute the QR decomposition.
* **`svd`:** Compute the singular value decomposition (SVD).
* **`solve`:** Solve the linear system `Ax = b` for `x`, where `A` is a square matrix.
* **`lstsq`:** Compute the least-squares solution to `Ax = b`.

For example

```
print(np.linalg.trace(GOE_arr))
print(np.linalg.det(GOE_arr))
print(np.linalg.eigvalsh(GOE_arr))
```

NumPy is also very efficient in pseudorandom number generation as it also applies vectorization techniques. The function `np.random.randn` generates arrays of random numbers from a standard normal distribution (zero mean and unity standard deviation) but you can draw numbers from a Gaussian with any other mean or standard deviation making use of the `np.random.normal` function

```
gaussian_array = np.random.normal(loc=10.0, scale=2.0, size=(3,3))
print(gaussian_array)
```

Instead of the NumPy option, you could have used the `normalvariate` function in `random` library, but the lack of vectorization makes this approach far slower for large datasets. We can benchmark them

```
from random import normalvariate
N = 1000000
# non-vectorized
%timeit gaussian_samples = [normalvariate(0,2) for _ in range(N)]
# vectorized
%timeit gaussian_samples_vec = np.random.normal(scale = 2, size = (N))
```

Notice that, depending on the system, the gain can be as large as one order of magnitude. Notice also that in the list comprehension we have used the conventional name for the loop variable in case the variable is not used in the loop body: `_`.

These numbers are called pseudorandom as they are not true random numbers and are derived through a deterministic algorithm that makes use of a *seed* value. To check or replicate your results you can fix the seed value

```
np.random.seed(123454321)
```

This is a global random seed value, you can also create different pseudorandom number generators isolated from each other

```
rng1 = np.random.RandomState(1234)
rng2 = np.random.RandomState(43211234)
print(rng1.randn(10))
print(rng2.randn(10))
```

Other available functions provided in `np.random` are

* **`permutation`:** Return a random permutation of a sequence, or return a permuted range
* **`shuffle`:** Randomly permute a sequence in-place
* **`rand`:** Draw samples from a uniform distribution
* **`randint`:** Draw random integers from a given low-to-high range
* **`randn`:** Draw samples from a normal distribution with mean 0 and standard deviation 1 (MATLAB-like interface)
* **`binomial`:** Draw samples from a binomial distribution
* **`normal`:** Draw samples from a normal (Gaussian) distribution
* **`beta`:** Draw samples from a beta distribution
* **`chisquare`:** Draw samples from a chi-square distribution
* **`gamma`:** Draw samples from a gamma distribution
* **`uniform`:** Draw samples from a uniform \[0, 1) distribution

## More on Graphics

We are going to elaborate on a seemingly neverending topic. You can find the complete Matplotlib documentation in <https://matplotlib.org/3.1.1/index.html>. We will cover some basic aspects with examples. Now, let's load the library and create some data to display and learn some useful techniques

```
import numpy as np
import matplotlib.pyplot as plt
#
x_data = np.linspace(0,4*np.pi,500)
y_data = np.cos(x_data**2)*np.cosh(x_data)
```

Let's first create a plot with a single panel. The recommended way is using `subplot` without arguments, as follows

```
import numpy as np
import matplotlib.pyplot as plt
#
x_data = np.linspace(0,4*np.pi,500)
y_data = np.cos(x_data**2)/np.cosh(x_data/5)

fig, ax = plt.subplots()
ax.plot(x_data, y_data, label="Some data")
ax.set_title('Single plot', fontsize = 20) # Set plot title and fontsize
ax.set_xlabel("Angle $\\theta$ (rad)", fontsize = 16) # Set x axis label and fontsize
ax.set_ylabel("F(w) (a.u.)", fontsize = 16) # idem for y axis
ax.legend() # Display labels
```

We have customized the `x` and `y` axis labels, the plot title and included a legend. There are several ways of plotting various curves in the same panel. The easiest one is to run one plot instance for each curve, as in the example that follows

```
x_data = np.linspace(0,4*np.pi,500)
y1_data = np.cos(x_data**2)/np.cosh(x_data/5)
y2_data = np.cos(x_data**3-5*x_data**2)/np.cosh(x_data/2)

fig, ax = plt.subplots()
ax.plot(x_data, y1_data, label="Data 1")
ax.plot(x_data, y2_data, label="Data 2")
ax.set_title('Single plot, several curves', fontsize = 20) # Set plot title and fontsize
ax.set_xlabel("Angle $\\theta$ (rad)", fontsize = 16) # Set x axis label and fontsize
ax.set_ylabel("F(w) (a.u.)", fontsize = 16) # idem for y axis
ax.legend()
```

We will later see other ways of plotting several curves. We can customize the style of the line and include symbols in the points in the plot command using one of the set of [Line 2D](https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.lines.Line2D.html#matplotlib.lines.Line2D) properties. For example

* **`c` or&#x20;**~~**color**~~**=/color/:** Control line color. Possible color abbreviations: `{'b', 'g', 'r', 'c', 'm', 'y', 'k', 'w'}`. You can also use colors from the [xkcd color name survey](https://xkcd.com/color/rgb/) with the prefix `xkcd:` or an RGB or RGBA (red, green, blue, alpha) tuple of float values or hex string.
* ~~**alpha**~~**=/float/:** Set the alpha value used for blending. This is not supported on all backends.
* **`ls` or `linestyle`:** Control line style. Possible options: `{'-', '--', '-.', ':', '', (offset, on-off-seq), ...}`.
* **`lw` or `linewidth`** ***float*****:** Control linewidth.
* **`marker`** ***markerstyle*****:** Control marker style. For possible options check [Matplotlib Markers](https://matplotlib.org/3.1.1/api/markers_api.html#module-matplotlib.markers).
* **`markersize` or `ms`** ***float*****:** Control marker size in points.
* **`markevery`** ***None or int or (int, int) or slice or List\[int] or float or (float, float)*****:** Control markers display

In this example we use some of these parameters

```
x_data = np.linspace(0,4*np.pi,500)
y1_data = np.cos(x_data**2)/np.cosh(x_data/5)
y2_data = np.cos(x_data**3-5*x_data**2)/np.cosh(x_data/2)

fig, ax = plt.subplots()
ax.plot(x_data, y1_data, label="Data 1", c="b", ls="-.", alpha = 0.6, marker="o", markersize=4)
ax.plot(x_data, y2_data, label="Data 2", linestyle="-", color="xkcd:olive", lw=2.0, marker = "+", markevery=3)
ax.set_title('Single plot, several curves', fontsize = 20) # Set plot title and fontsize
ax.set_xlabel("Angle $\\theta$ (rad)", fontsize = 16) # Set x axis label and fontsize
ax.set_ylabel("F(w) (a.u.)", fontsize = 16) # idem for y axis
ax.legend()
```

As already mentioned in the exercises of Lesson 2, you can also depict data using the `pyplot.scatter` function and another useful tool is the `pyplot.bar`. Let's try these two. We can try `scatter` with the previous function

```
x_data = np.linspace(0,4*np.pi,500)
y1_data = np.cos(x_data**2)/np.cosh(x_data/5)

fig, ax = plt.subplots()
ax.scatter(x_data, y1_data, s=10, label="Data 1", c="b", alpha = 0.6)
ax.set_title('Single scatter plot', fontsize = 20) # Set plot title and fontsize
ax.set_xlabel("Angle $\\theta$ (rad)", fontsize = 16) # Set x axis label and fontsize
ax.set_ylabel("$F(\theta)$ (a.u.)", fontsize = 16) # idem for y axis
ax.legend() 
```

And we can plot the squared components of two GOE arrays eigenvectors as a bar plot. In this case we fix the figure width and height using the `figsize` option in `subplots` (default units are inches, you can change to `cm` using a conversion factor `cm = 1/2.54`).

```
x = np.diag(GOE_arr)  # Using the diagonal GOE_arr values as an index
width = 0.11  # bars width

fig, ax = plt.subplots(figsize=(9,5))
rects1 = ax.bar(x - width/2, avec_GOE[:,0]**2, width, label="G.S.")
rects2 = ax.bar(x + width/2, avec_GOE[:,6]**2, width, label='6th exc. state')

# Add text for labels, title etc.
ax.set_ylabel('Squared Eigenvector Components')
ax.set_xlabel('GOE diagonal value')
ax.set_title('Diagonal value of the basis state in the GOE matrix')
ax.legend()
```

You can also include insets into the plot. A simple way is declaring a new axis in the figure, you can provide in unitless percentages of the figure size the position of the new axis -(0,0 is bottom left)- and the width and height of the inset.

```
x_data = np.linspace(0,3*np.pi,700)
y_data = ((np.sin(6*x_data**2-x_data**0.5)+2)*np.exp(-x_data/2))**2

fig, ax = plt.subplots(figsize=(9,9))
ax.plot(x_data, y_data, label="$[sin{(6\\theta^2-\sqrt{\\theta})}+2]^2e^{-\\theta}$")
ax.set_xlabel("Angle $\\theta$ (rad)", fontsize = 20) # Set x axis label and fontsize
ax.set_ylabel("$F(\\theta)$ (a.u.)", fontsize = 20)   # idem for y axis
ax.tick_params(axis='x', labelsize=16) # size of x axis tick labels
ax.tick_params(axis='y', labelsize=16) # idem for y axis
# 
left, bottom, width, height = [0.55, 0.35, 0.25, 0.3]
ax_new = fig.add_axes([left, bottom, width, height])

ax_new.plot(x_data[:200], (np.sin(6*x_data[:200]**2-x_data[:200]**0.5)+2)**2 , color='green')
ax_new.tick_params(axis='x', labelsize=14) # size of x axis ticks
ax_new.tick_params(axis='y', labelsize=14) # idem for y axis

ax.legend(fontsize = 22, loc="upper right")
```

You can get a finer control using the `inset_axes` function

```
from mpl_toolkits.axes_grid1.inset_locator import inset_axes
#
x_data = np.linspace(0,3*np.pi,700)
y_data = ((np.sin(6*x_data**2-x_data**0.5)+2)*np.exp(-x_data/2))**2

fig, ax = plt.subplots(figsize=(9,9))
ax.plot(x_data, y_data, label="$[sin{(6\\theta^2-\sqrt{\\theta})}+2]^2e^{-\\theta}$")
#ax.set_title('Single plot, several curves', fontsize = 20) # Set plot title and fontsize
ax.set_xlabel("Angle $\\theta$ (rad)", fontsize = 20) # Set x axis label and fontsize
ax.set_ylabel("$F(\\theta)$ (a.u.)", fontsize = 20)   # idem for y axis
ax.tick_params(axis='x', labelsize=16) # size of x axis ticks
ax.tick_params(axis='y', labelsize=16) # idem for y axis
axins = inset_axes(ax, width="30%", height="40%", loc=1, borderpad = 2)
axins.plot(x_data, np.log(y_data))
axins.set_xlim(0,3)  # Setting the x coordinate range in the inset
axins.set_ylim(-3,2) # Setting the y coordinate range in the inset
axins.set_ylabel("$\\log(F(\\theta))$ (a.u.)",fontsize = 12)
axins.tick_params(axis='x', labelsize=14) # size of x axis ticks
axins.tick_params(axis='y', labelsize=14) # idem for y axis
ax.legend(fontsize = 22, bbox_to_anchor=(0.3,0.43)) # another way of locating the legend
```

### Subplots

If we need to add several subplots we can do it as we did in Lesson 2, but we can also make use of loops to automatize the task. Imagine that we need to plot the previous function `np.cos(x_data**A)/np.cosh(x_data/B)` for `(A,B) = {(1,2),(1,5),(2,2),(2,5),(3,2),(3,5)}`. We can use subplots as follows

```
x_data = np.linspace(0,2*np.pi,500)
#
A_parameter = [1,2,3]
A_index = list(range(0,3))
#
B_parameter = [2,5]
B_index = list(range(0,2))
#
fig, ax = plt.subplots(3,2,figsize=(9,7))
fig.suptitle("Title centered above subplots", fontsize=18)
#
for (A_value, A_i) in zip(A_parameter,A_index):
    for (B_value, B_j) in zip(B_parameter, B_index):
	#
	y_data = np.cos(x_data**A_value)/np.cosh(x_data/B_value)
	ax[A_i,B_j].plot(x_data, y_data, label="A = {0}, B = {1}".format(A_value, B_value))
	ax[A_i,B_j].set_xlabel("Angle $\\theta$ (rad)", fontsize = 14) # Set x axis label and fontsize
	ax[A_i,B_j].set_ylabel("F(w) (a.u.)", fontsize = 14) # idem for y axis
	ax[A_i,B_j].legend()
#
fig.tight_layout(pad=3.0) # Control the extra padding around the figure border and between subplots. The pads are specified in fraction of fontsize.
```

You can also plot several lines in each subplot

```
x_data = np.linspace(0,2*np.pi,500)
#
A_parameter = [1,2,3]
A_index = list(range(0,3))
#
B_parameter = [2,5]
B_index = list(range(0,2))
#
fig, ax = plt.subplots(3,2,figsize=(9,7))
#
for (A_value, A_i) in zip(A_parameter,A_index):
    for (B_value, B_j) in zip(B_parameter, B_index):
	y_data = np.cos(x_data**A_value)/np.cosh(x_data/B_value)
	y_data = np.vstack((y_data,np.sin(x_data**A_value)*np.tanh(x_data/B_value)))
	ax[A_i,B_j].plot(x_data, y_data.T, label="A = {0}, B = {1}".format(A_value, B_value))
	ax[A_i,B_j].set_xlabel("Angle $\\theta$ (rad)", fontsize = 14) # Set x axis label and fontsize
	ax[A_i,B_j].set_ylabel("F(w) (a.u.)", fontsize = 14) # idem for y axis
	ax[A_i,B_j].legend()
#
fig.tight_layout(pad=3.0)
fig.suptitle("Title centered above subplots", fontsize=18)
```

As all the panels share the same axis scaling you can only add ticks to the outer panels

```
x_data = np.linspace(0,2*np.pi,500)
#
A_parameter = [1,2,3]
A_index = list(range(0,3))
#
B_parameter = [1,2]
B_index = list(range(0,2))
#
fig, ax = plt.subplots(3,2,figsize=(9,7),sharex=True, sharey=True)
#
for (A_value, A_i) in zip(A_parameter,A_index):
    for (B_value, B_j) in zip(B_parameter, B_index):
	y_data = np.cos(x_data**A_value)/np.cosh(x_data/B_value)
	ax[A_i,B_j].plot(x_data, y_data, label="A = {0}, B = {1}".format(A_value, B_value))
	if A_i == 2:
	    ax[A_i,B_j].set_xlabel("Angle $\\theta$ (rad)", fontsize = 14) # Set x axis label and fontsize
	if B_j == 0:
	    ax[A_i,B_j].set_ylabel("F(w) (a.u.)", fontsize = 14) # idem for y axis
	ax[A_i,B_j].legend()
#
fig.tight_layout(pad=1)
fig.suptitle("Title centered above subplots", fontsize=18, y=1.020)
```

### Histograms

To generate 1D histograms a data vector is needed. We create one with the eigenvalues of a GOE array. We can create the array using a `randn`, a convenience function for users porting `Matlab` code

```
np.random.seed(123454321)
GOE_dim  = 2000
GOE_arr_old = np.random.randn(GOE_dim, GOE_dim)
GOE_arr_old += GOE_arr.T
aval_GOE_old = np.linalg.eigvalsh(GOE_arr)
```

But it is preferred to use the `default_rng` to get a new instance of a `Generator`, and then call its methods to obtain different distribution samples. You can find an quick reference to random numbers in NumPy in [this link](https://numpy.org/doc/stable/reference/random/index.html#random-quick-start). In the present case, for normally distributed random data

```
from numpy.random import default_rng
rng = default_rng()
GOE_arr = rng.standard_normal((GOE_dim, GOE_dim))
GOE_arr += GOE_arr.T
aval_GOE = np.linalg.eigvalsh(GOE_arr)
```

We can then plot the histogram, taking into account that the `hist` method returns as an output the number associated to each bin, the bins, and a set of patches that allows to operate on the histogram. In the left panel we display the absolute histogram while in the right one we normalize by the total number of counts and display a percentage.

```
n_bins = 51
fig, ax = plt.subplots(tight_layout=True)
bins_result_old = ax.hist(aval_GOE_old, bins=n_bins)
bins_result = ax.hist(aval_GOE, bins=n_bins)
```

You can combine a histogram with a plot. Let's plot the value of the theoretical GOE level density (a semicircle). Let's first compute it

```
sigma_nd = 2     # variance of non-diagonal GOE matrix elements
A_value = 1/(4*sigma_nd)
sqrd_a_value = GOE_dim/A_value
```

We now make use of the information about the bins in the output of `hist`.

```
n_bins = 71
fig, ax = plt.subplots(tight_layout=True)
bins_result = ax.hist(aval_GOE, bins=n_bins, density=True)
#
GOE_density = np.sqrt(sqrd_a_value-bins_result[1][1:]**2)/(np.pi*sqrd_a_value/2)
#
ax.plot(bins_result[1][1:], GOE_density)
```

To plot a 2D histogram, one only needs two data vectors of the same length, corresponding to each axis of the histogram.

```
x = np.random.normal(3, 1, 200)
y = np.random.normal(4, 0.1, 200)
```

It is not mandatory, but one can also define the plot bins

```
### Bin Edges
xedges = [0, 1, 3, 5, 7, 9, 11]
yedges = [0, 2, 3, 4, 6, 8]
```

The histogram is computed as

```
H, xedges, yedges = np.histogram2d(x, y, bins=(xedges, yedges))
# Histogram does not follow Cartesian convention (see Matplotlib Notes),
H = H.T
```

And it can be depicted using `imshow` or `meshgrid`

```
fig = plt.figure(figsize=(7, 3))
ax = fig.add_subplot(121, title='imshow: square bins')
plt.imshow(H, interpolation='nearest', origin='lower',
	extent=[xedges[0], xedges[-1], yedges[0], yedges[-1]])
##
ax = fig.add_subplot(122, title='pcolormesh: actual edges',
	aspect='equal')
X, Y = np.meshgrid(xedges, yedges)
ax.pcolormesh(X, Y, H)

plt.show()
```

### Saving your figures

You should save a script to easily recreate any of your figures, but you can also save them in a graphic format. The available formats depend on your Python distribution. To know them run

```
fig = plt.gcf()
fig.canvas.get_supported_filetypes()
```

To save a figure you can issue the command

```
plt.savefig("figure_name.extension")
```

If you want to further modify the figure in a program as *inkscape* or *xfig* you can save the figure as `svg` file -vector graphics- but in this case, when you import the `matplotlib` library, you should run the following statement

```
from matplotlib import rcParams 
rcParams['svg.fonttype'] = 'none'
```

And now you can save the figure of your interest

```
rcParams['svg.fonttype'] = 'none'
#
x_data = np.linspace(0,2*np.pi,500)
#
A_parameter = [1,2,3]
A_index = list(range(0,3))
#
B_parameter = [1,2]
B_index = list(range(0,2))
#
fig, ax = plt.subplots(3,2,figsize=(10,7),sharex=True, sharey=True, linewidth=3.0)
#
for (A_value, A_i) in zip(A_parameter,A_index):
    for (B_value, B_j) in zip(B_parameter, B_index):
	y_data = np.cos(x_data**A_value)/np.cosh(x_data/B_value)
	ax[A_i,B_j].plot(x_data, y_data, label="A = {0}, B = {1}".format(A_value, B_value))
	if A_i == 2:
	    ax[A_i,B_j].set_xlabel("Angle $\\theta$ (rad)", fontsize = 14) # Set x axis label and fontsize
	if B_j == 0:
	    ax[A_i,B_j].set_ylabel("F(w) (a.u.)", fontsize = 14) # idem for y axis
	ax[A_i,B_j].legend()
#
fig.tight_layout()
fig.suptitle("Title centered above subplots", fontsize=18, y=1.020)
plt.savefig("test.svg")
plt.savefig("test.pdf",facecolor="r",edgecolor="g",bbox_inches="tight")
```

The `savefig` command accepts some options that are quite useful:

* **dpi:** Figure resolution in dots-per-inch. Default value is 100.
* **facecolor:** Color of the background. Default value `w`, white.
* **edgecolor:** Color of the figure edge line. Default value `w`, white. Notice that in this case the figure `linewidth` option needs to be set to a value different from the null default one.
* **bboxinches:** Portion of the figure saved. The value `trim` attempts to trim empty white space around the figure.

Some of these options are used in the second `savefig` command in the previous example.

## Optimizing and function fitting with `SciPy`.

The `ScyPy` library provides many user-friendly and efficient numerical routines, such as routines for numerical integration, interpolation, optimization, linear algebra, and statistics. In this course we will only very briefly explain how to use it to perform linear and non-ĺinear data fittings. You can find more information about this useful library in its [documentation](https://docs.scipy.org/doc/).

In a linear fit case you should import the `Scipy` module `stats`.

```
from scipy import stats
```

We consider that we have a data set that we suspect it follows the functional law *f(N) = a\*N\*\*b*. Let's first prepare our dataset and add to it a normal error.

```
# Prepare Data assumint a power law f(N) = a*N**b
a_theor = 5.05
b_theor = -1.25
N_values = np.array(range(500,10600,500))
f_values = a_theor*N_values**b_theor
f_errors = np.array(list(map(lambda x:np.random.normal(x,np.abs(x)/20), f_values)))
```

The dependence is not linear but we can transform it into a linear dependence taking logarithm of both abscyssa and ordinate values

```
x_data = np.log10(N_values)
y_data = np.log10(f_errors)
result = stats.linregress(x_data, y=y_data)
result
```

The obtained result provides the straight line slope and intercept values, the correlation coefficient (`rvalue`), the *p*-value of the fit, and the coefficient errors. We can depict the data plus errors and the best fit line

```
fig, ax = plt.subplots()
ax.plot(x_data, y_data, label="Reference values", linestyle="-", color="xkcd:olive", lw=1.0, marker = "o", markevery=1, markersize=2)
ax.plot(x_data, result.slope*x_data + result.intercept, label="log-log linear fit", linestyle="-.", color="xkcd:red", lw=1.0, alpha = 0.76)
ax.set_xlabel("$log(N)$", fontsize = 16) # Set x axis label and fontsize
ax.set_ylabel("$log[f(N)]$", fontsize = 16) # idem for y axis
ax.legend()
```

In case you cannot linearize your function dependence, as in the next example, you import the `optimize` module.

```
from scipy import optimize
```

We now prepare our dataset and add again a random normal error.

```
# Prepare Data assumint a function g(x) = a + b*x + c/x**d
a_theor = 1.05
b_theor = 1.25
c_theor = -2.5
d_theor = 2
x_values = np.arange(0.5,12,0.25)
f_values = a_theor+ b_theor * x_values + c_theor/(x_values**d_theor)
f_errors = np.array(list(map(lambda x:np.random.normal(x,np.abs(x)/20), f_values)))
```

You now define the function that will be used to perform the nonlinear fit

```
def f_fit(x, a, b, c, d):
    '''    
    a + b*x + c*x**-d 
    '''
    return a + b*x + c/x**d
```

And the fit can be performed now as follows

```
params, pcov = optimize.curve_fit(f_fit,x_values,f_errors, p0=(0.5,0.7,-0.9,3)) #, method='trf')
parerr = np.sqrt(np.diag(pcov))
params, parerr
```

The array `pcov` is the covariance matrix and therefore the `parerr` vector is the vector that contains the error of the obtained parameters. We can plot the original values and the resulting optimized function.

```
fig, ax = plt.subplots()
ax.plot(x_values, f_errors, label="Reference values", linestyle="-", color="xkcd:olive", lw=1.0, marker = "o", markevery=1, markersize=2)
ax.plot(x_values, f_fit(x_values, *params) , label="nonlinear fit", linestyle="-.", color="xkcd:red", lw=1.0, alpha = 0.76)
ax.set_xlabel("$x$", fontsize = 16) # Set x axis label and fontsize
ax.set_ylabel("$f(x)$", fontsize = 16) # idem for y axis
ax.legend()
```

## Exercises

* **Exercise 5.1:** The Caesar's cipher is a historically relevant way of encrypting messages, already used by Julius Caesar, that is a substitution cipher. Given an integer value *m*, any character is replaced by the character found shifting the original character by *m* positions in the alphabet. Prepare a function that encrypts or decrypts a given message (assuming we only use the 26 capital letters of the English alphaber and replacing other symbols that are not letters by themselves, e.g. spaces and punctuation symbols) using a given value of *m*. Hints: (1) You can transform a string in uppercase characters using the `upper()` method (`up_strin = orig_string.upper()`). (2) You can obtain the English ascii uppercase letters using the module `string` (`import string`) with the command `ABC = string.ascii_uppercase`.
* **Exercise 5.2:** The [NIST Digital Library of Mathematical Functions](https://dlmf.nist.gov/) (DLMF) is a very useful site, where you can find an updated and expanded version of the well-known reference *Handbook of Mathematical Functions* compiled by Abramowitz and Stegun. Define a function to compute the Bessel function of the first kind of integer index from the series 10.2.2 in the DLMF, add a docscript and plot the functions of order 0, 1, and 2 in the interval of x between 0 and 10.
* **Exercise 5.3:** Define and test a function that estimates the value of the special constant pi by generating N pairs of random numbers in the interval -1 and 1 and checking how many of the generated number fall into a circumference of radius 1 centered in the origin. Improve the function showing in a graphical output the square, the circumference, and the points inside and outside the circumference with different colors.
* **Exercise 5.4:** The aim of this exercise is to generate a set of two-dimensional random walks, plot their trajectories and look and the end point distribution. The random walks considered always begin at the origin and take *Nstep* random steps of unit or zero size in both directions in the x and y axis. For a total number of *Nw* walks:
  1. Compute the trajectories and save the final point of all of them.
  2. Plot a sample of these random walks in the plane.
  3. Plot all the final points together.
  4. Compute the average distance of the final points from the origin.
  5. Plot a histogram with the values of the distance to the origin.
* **Exercise 5.5:** The Julia set is an important concept in fractal theory. Given a complex number *a*, a point *z* in the complex plane is said to be in the filled-in Julia set of a function *f(z) = z² + a* if the iteration of the function over the point does not finish with the point going to infinity. It can be proved that, if at some iterate of a point under *f(z)* the result has a module larger than 2 and larger than the module of *a*, this point will finish going to infinity. Build and plot the filled-in Julia sets for *f(z)* with *a = (-0.5,0),(0.25,-0.52), (-1,0), (-0.2, 0.66)* in the interval of *-1 < Re(z), Im(z) < 1* and consider that the point belongs to the set once the previous condition has not been accomplished after *Niter = 100*. Hint: You can make use of the NumPy `meshgrid` and the PyPlot `pplot` functions for displaying the filled-in Julia sets.


---

# Agent Instructions
This documentation is published with GitBook. GitBook is the documentation platform designed so that both humans and AI agents can read, navigate, and reason over technical content effectively. Learn more at gitbook.com.

## Querying This Documentation
If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter, and the optional `goal` query parameter:

```
GET https://curropb.gitbook.io/python-notes/python-lesson-5.md?ask=<question>&goal=<endgoal>
```

`ask` is the immediate question: it should be specific, self-contained, and written in natural language.
`goal` is optional and describes the broader end goal you are ultimately trying to accomplish on behalf of the user. GitBook uses it to tailor the answer towards what is most useful for that goal.

The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
