Python Lesson 4

Lesson outline

Native Python data structures: tuples, dicts and sets
List, set, and dict comprehensions. Sequence built-ins.
Python Functions
Some (hopefully good) advice…
Exercises

Native Python data structures: tuples, hashes and sets

Tuples

A tuple is a sequence of Python objects similar to a list: values are accessed with square brackets and tuples can be sliced. Tuples are created with a simple comma-separated list of values (parentheses are optional).

# this is a tuple
tup_0 = 1,2,3,4,5
# and this is also a tuple
tup_1 = (2,3,4,5,6)
# and this is a tuple of tuples...
tup_2 = ((2,3,-1), (0,1,2.4), 3, (-33,-22))
###
print(tup_0)
print(tup_1[3])
print(tup_2[1])

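The major difference is that tuples are immutable: once created, their elements cannot be reassigned.

# Item assignment is not allowed: this raises a TypeError
tup_0[3] = 4

# A tuple can hold mutable objects, such as the list in position 2
tup_3 = 4, (2,3,-1), [4,4,5,5], True
# The list itself can be modified in place
tup_3[2].append(6)
#
print(tup_3)
# But a tuple slot cannot be reassigned: this also raises a TypeError
tup_3[3] = False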

With the + operator you can join tuples, and with * you can concatenate several copies of the same tuple.

# 
print(tup_0 + tup_3)
print(4*tup_0)

A common use of tuples is variable assignment. Whenever you provide a tuple-like expression of variables on the left-hand side of an assignment, Python unpacks the values on the right-hand side.

# 
a, b, c, d = tup_2
print(a)
print(b)
print(c)
print(d)

This makes it especially easy to swap variable values.

# 
print("a = ", a)
print("b = ", b)
a,b = b,a
print("a = ", a)
print("b = ", b)

This feature can also be used in loops to assign several variables at once.

# 
tup_loop = (2,3,-1), (4,4,5), (5,6,7), (5,-1,0)
# 
for var_1, var_2, var_3 in tup_loop:
    print("var_1 = {0}, var_2 = {1}, var_3 = {2}".format(var_1,var_2,var_3))

Dicts

Dicts, also called hashes or associative arrays, can be thought of as lists whose index is not constrained to be a number but can be another kind of object. The index is called the key, so hashes are mutable collections of key-value pairs of Python objects. The values of a hash can be any Python object, but keys must be hashable, which in practice means immutable objects such as numbers, strings, or tuples whose elements are themselves hashable.

They can be created using curly braces, with a colon separating each key from its value and commas separating the pairs. You can access or set element values as in lists, using square brackets.

#
hash_0 = {"Guerras Médicas": ["Termópilas", "Artemisio", "Salamina", "Platea"], "Even integers": (0,2,4,6,8)}
# 
print(hash_0)
#
print(hash_0["Guerras Médicas"])
#
hash_0["Guerras Médicas"].append("Micala")
#
hash_0["Fantastic Sea Creatures"] = ("Moby Dick", "The Kraken", "Mermaids")
#
print(hash_0)
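
The restriction on keys can be checked directly. The following is a minimal sketch; the names grid_labels and list_key_accepted are illustrative and not part of the lesson. It shows that strings, numbers, and tuples are accepted as keys, while a list is rejected with a TypeError.

```python
# Keys must be hashable: strings, numbers and tuples of hashable items all work.
grid_labels = {(0, 0): "origin", (1, 0): "unit x", "name": "grid", 42: "answer"}
print(grid_labels[(1, 0)])

# A list is mutable and therefore not hashable, so it is rejected as a key.
try:
    grid_labels[[0, 1]] = "unit y"
    list_key_accepted = True
except TypeError as error:
    list_key_accepted = False
    print("TypeError:", error)
```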

You can also create a hash from a sequence of two-element tuples using the dict function.

# 
seq =(1,3),(2,6),(3,9),(4,12)
dict_example = dict(seq)
print(dict_example)
#

Once a hash is created you can extract its keys and the corresponding values with the keys and values methods. Since Python 3.7 both follow the insertion order of the dict, so the correspondence between keys and values is kept.
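
As a short sketch of these two methods, assuming Python 3.7 or later (where dicts keep insertion order), the hash below is rebuilt from the same pairs used above under the illustrative name multiples. The items method, which returns the key-value pairs together, is included for comparison.

```python
multiples = dict([(1, 3), (2, 6), (3, 9), (4, 12)])

keys = list(multiples.keys())
values = list(multiples.values())
print(keys)    # [1, 2, 3, 4]
print(values)  # [3, 6, 9, 12]

# The i-th key corresponds to the i-th value, so zipping them
# reproduces the pairs returned by the items method.
pairs = list(zip(keys, values))
print(pairs == list(multiples.items()))  # True
```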

You can extract values from a dict using the get method, extract and remove a value using the pop method, and delete entries using del hash[key_value].

# 
print(dict_example)
one_get = dict_example.get(1)
two_pop = dict_example.pop(2)
del(dict_example[3])
print(one_get)
print(two_pop)
print(dict_example)
#

Default values (*)

The following situation is very common: you need to read a hash key and, if the key exists, use its value, while if it does not exist you use a default value instead. This can be achieved with an if block

# 
if (key_value in a_hash):
   value = a_hash[key_value]
else:
   value = default_value
#

Both the get and pop methods accept a default value as a second argument, which is returned when the given key is not in the hash. Keep in mind that pop also removes the key when it is present, so get is the direct replacement for the if block above.

# 
value = a_hash.get(key_value, default_value)
#
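
The difference between the two methods can be seen in a small sketch. The dict settings and its keys are made up for illustration: get leaves the hash untouched, while pop removes the key it finds.

```python
settings = {"units": "K", "verbose": True}

units = settings.get("units", "C")        # key present: stored value is returned
precision = settings.get("precision", 3)  # key absent: default is returned
verbose = settings.pop("verbose", False)  # key present: returned and removed
colour = settings.pop("colour", "black")  # key absent: default, nothing removed

print(units, precision, verbose, colour)
print(settings)  # only the "units" entry is left
```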

When setting values, you may also need to set a default value. Imagine you are reading a list of numbers and you want to separate them by their last digit as a dict of lists

# 
import numpy as np
random_nums = np.random.randint(0,5000,[30])
last_digit_hash = {}
for number in random_nums:
    #
    last_digit = number % 10
    #
    if last_digit in last_digit_hash:
        last_digit_hash[last_digit].append(number)
    else:
        last_digit_hash[last_digit] = []
        last_digit_hash[last_digit].append(number)
        #
print(last_digit_hash)
#

The setdefault method greatly simplifies this task: it returns the value stored under the key and, if the key is missing, first inserts the given default.

# 
random_nums = np.random.randint(0,5000,[30])
last_digit_hash = {}
for number in random_nums:
    last_digit = number % 10
    #
    last_digit_hash.setdefault(last_digit, []).append(number)
    #
print(last_digit_hash)
#

Sets

A set is a collection of unique elements with no particular order. A set can be thought of as the keys of a hash without the corresponding values. Sets can be created with the set function or with a curly-brace literal, and they support the usual union, intersection, and difference operations.

# 
set_0 = {"a", 0, 1, "bc", 0.33, 0, 1}
set_1 = set(["a", "b", "c", "a", "a"])
print(set_0)
print(set_1)
#

# Union
print(set_0.union(set_1))
print(set_0|set_1)
#
# Intersection
print(set_0.intersection(set_1))
print(set_0 & set_1)
#
# Difference
print(set_0.difference(set_1))
print(set_0 - set_1)

Comprehensions and built-in sequence functions

List, dict, and set comprehensions are a terse and neat "Pythonic" way to define new structures in your program. In the list comprehensions case they have the syntax

list_0 = [expr for value in collection if condition]

which is equivalent to the loop

list_0 = []
#
for value in collection:
    #
    if (condition):
        list_0.append(expr)

The filter condition is not mandatory and may not be present.
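
As a minimal illustration of a comprehension without the filter, the sketch below builds the squares of the first six non-negative integers and compares the result with the equivalent loop. The names squares and squares_loop are illustrative.

```python
# Comprehension without a filter condition
squares = [number**2 for number in range(6)]

# Equivalent explicit loop
squares_loop = []
for number in range(6):
    squares_loop.append(number**2)

print(squares)                  # [0, 1, 4, 9, 16, 25]
print(squares == squares_loop)  # True
```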

For example, using a loop we can create a list, called list_mults, with the non-negative integers less than 4000 that are exactly divisible by both 7 and 13.

list_mults = []
total = 0
for number in range(4000):
    if (number % 7 == 0 and number % 13 == 0): 
        list_mults.append(number)
        total+=1
print(total, list_mults)

We can repeat the same task using a comprehension in a more Pythonic way.

list_mults2 = [number for number in range(4000) if (number % 7 == 0 and number % 13 == 0)]
#
# Checking if both lists are equal.
list_mults2 == list_mults

The extension to sets and dicts is direct.

# Dicts
{key_expr(iter): value_expr(iter) for iter in collection if condition}
#
#
# Sets
{set_expr(iter) for iter in collection if condition}
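
A concrete instance of both templates may help. This is only a sketch with a made-up list of words: the dict comprehension maps each word longer than three characters to its length, and the set comprehension collects the distinct lengths.

```python
words = ["tuple", "dict", "set", "list", "zip"]

# Dict comprehension with a filter: word -> length, only for words longer than 3
word_lengths = {word: len(word) for word in words if len(word) > 3}
print(word_lengths)  # {'tuple': 5, 'dict': 4, 'list': 4}

# Set comprehension: repeated lengths are stored only once
distinct_lengths = {len(word) for word in words}
print(sorted(distinct_lengths))  # [3, 4, 5]
```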

Apart from comprehensions there are several quite useful built-in sequence functions to work with lists and other structures. One of them is enumerate, which we have already covered. Other useful built-ins are sorted, reversed, and zip.

The built-in sorted returns a new sorted list and leaves its argument unchanged. You can pass sorted a key, a function that is applied to each element to produce the value used for the comparison.

random_nums = np.random.randint(0,1000,[30])
print(sorted(random_nums))
print(sorted(random_nums, key=str))
print(random_nums)

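Lists can also be sorted with their sort method, which sorts in place and returns None. NumPy arrays, such as random_nums here, have a sort method that behaves in the same way

print(random_nums)
print(random_nums.sort()) # prints None: the sorting is done in place
print(random_nums)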

The reversed built-in returns an iterator that traverses a sequence in reverse order.

for number in reversed(range(10)):
    print(number)

The zip built-in associates the elements of two or more given sequences. In Python 3 the output is an iterator of tuples, which can be turned into a list with list.

names = ["Lisa", "Auxi", "Julia", "Lisanna", "Curro"]
random_nums = np.random.randint(0,20,[5])
zipped = list(zip(names, random_nums, sorted(random_nums)))
print(list(zipped))

This comes in quite handy for the definition of hashes from two sequences

# 
hash_example = dict(zip(names, random_nums))
print(hash_example)
#

It is also used to iterate in a loop over the elements of several sequences at once

# 
for (var_1, var_2, var_3) in zip(seq_1, seq_2, seq_3):
    #
    # Code block
#

For example

for name, value_1, value_2 in zipped:
    print("Name {0}: ({1}, {2})".format(name, value_1, value_2))

names = ["Lisa", "Auxi", "Julia", "Lisanna", "Curro"]
random_nums = np.random.randint(0,20,[5])
zipped = zip(names, random_nums, sorted(random_nums))
for name, value_1, value_2 in zipped:
    print('Name {0}: ({1}, {2})'.format(name, value_1, value_2))

You can also transform a Python native structure into a Numpy ndarray structure using the np.array command

print(type(names))
npnames = np.array(names)
print(type(npnames))
print(npnames.dtype)
print(npnames.shape)

Numpy makes an educated guess to assign the best fitting type to the data.

l1 = [1,2,3,4,5,6]
l2 = [1, 2., 3.3, 0, 4, -1]
npl1 = np.array(l1)
npl2 = np.array(l2)
print(type(npl1), type(npl2))
print(npl1.dtype, npl2.dtype)
print(npl1.shape, npl2.shape)

You can apply np.array to a Numpy ndarray and in this way you obtain a copy of the initial set of data and not a reference to them. A similar command is np.asarray but in this case if the array is already a Numpy ndarray it does not perform the copying.
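
The difference between the two commands can be checked with a short sketch, which assumes NumPy is installed. The array names original, copied, and aliased are illustrative.

```python
import numpy as np

original = np.array([1.0, 2.0, 3.0])
copied = np.array(original)     # new array with its own copy of the data
aliased = np.asarray(original)  # already an ndarray: no copy is made

original[0] = -99.0             # modify the original in place

print(copied[0])            # 1.0, the copy is unaffected
print(aliased[0])           # -99.0, it shares the data with original
print(aliased is original)  # True
```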

Python Functions

Basic concepts

Defining functions lets you wrap code for later reuse, which makes life simpler and greatly helps with code organization and optimization. Let's start with a very simple function that transforms a temperature from Kelvin to degrees Celsius. A function definition starts with the def keyword and returns its result(s) with the return keyword. If there is no return statement the returned value is None.

def Kelvin_2_Celsius(T):
    return T - 273.15
#
Temp = 273.16 # Water triple point
print("{0} K are {1} ºC".format(Temp, Kelvin_2_Celsius(Temp)))

Another simple function, transforming from degrees Fahrenheit to Kelvin, and adding a docstring with the info about the function

def Fahren_2_Kelvin(Temp):
    ''' 
    Function to transform from degrees Fahrenheit to Kelvin.

    Input: 

    Temp   ::   Temperature expressed in degrees Fahrenheit.
    '''     
    return ((Temp - 32.) * (5./9.)) + 273.15 # In Python 2.7, 5/9 is an integer division and gives 0, hence 5./9.


######################################
print('Water triple point: ', Kelvin_2_Celsius(273.16), 'ºC')

#
print('Water freezing point: ', Fahren_2_Kelvin(32), 'K')
print('Water boiling point: ', Fahren_2_Kelvin(212), 'K')

# 
print('Water freezing point: ', Kelvin_2_Celsius(Fahren_2_Kelvin(32)), 'ºC')
print('Water boiling point: ',  Kelvin_2_Celsius(Fahren_2_Kelvin(212.)), 'ºC')

The docstring is the multiline string just after the function definition that contains relevant information about the function for the end user. It can be accessed with the function attribute __doc__.
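
As a brief sketch of how the docstring is read back, the function celsius_2_kelvin below is a made-up companion to the conversions above. The built-in help function displays the same text in a formatted way.

```python
def celsius_2_kelvin(temp):
    '''Transform a temperature from degrees Celsius to Kelvin.'''
    return temp + 273.15

# The docstring is stored in the __doc__ attribute of the function object
print(celsius_2_kelvin.__doc__)
```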

We can define a function inside another function. This is shown in the next example, which computes the body mass index that we used as an example when explaining conditionals

def bmi_range(weight, height):
    '''
    Body mass index

    Input: 
        weight (kg)
        height (m)
    '''
    def bmi_val(weight, height):
        return weight/height**2
    #
    bmi_value = bmi_val(weight, height)
    #
    if bmi_value < 15:
        bmi_r = "Very severely underweight"
    elif bmi_value < 16:
        bmi_r = "Severely underweight"
    elif bmi_value < 18.5:
        bmi_r = "Underweight"
    elif bmi_value < 25:
        bmi_r = "Normal(healthy weight)"
    elif bmi_value < 30:
        bmi_r = "Overweight"
    elif bmi_value < 35:
        bmi_r = "Obese Class I (Moderately obese)"
    elif bmi_value < 40:
        bmi_r = "Obese Class II (Severely obese)"
    else:
        bmi_r = "Obese Class III (Very severely obese)"
    #
    return bmi_r
## 
bmi_range(70,1.80)

We can also use functions to benchmark loops versus list comprehensions. We define a function that doubles each of the first N non-negative integers with a loop and compare it with the equivalent list comprehension using the magic function %timeit.

def f_loop(number):
    twice = []
    for num in range(number):
        twice.append(num*2)
    return twice
####
# loop
%timeit f_loop(10000)

# list comprehension
%timeit [num*2 for num in range(10000)]

Note that range and np.arange are not equivalent. The first produces its values lazily, one at a time, while the second builds the full array in memory, with an extra burden for the system. Notice the difference with

def f_loop_arange(number):
    import numpy as np
    twice = []
    for num in np.arange(number):
        twice.append(num*2)
    return twice
####

%timeit f_loop_arange(10000)

Of course, the vectorized calculation with Numpy is much faster than the previous approaches

%timeit np.arange(10000)*2

We can define these functions in an external file and read the file from the notebook using the magic function %run.

A function can have multiple arguments as well as multiple return statements (only one of them is executed in a given invocation). There may also be no explicit return, which makes the function return None.
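
Both cases can be sketched with two small functions. The names water_phase and report, as well as the simplified thresholds of 0 and 100 ºC, are assumptions made for the example.

```python
def water_phase(temp_celsius):
    '''Return the phase of water at 1 atm (simplified thresholds at 0 and 100 ºC).'''
    if temp_celsius < 0:
        return "ice"      # executed only for temperatures below 0 ºC
    if temp_celsius < 100:
        return "liquid"   # executed only between 0 and 100 ºC
    return "vapor"        # executed otherwise

def report(temp_celsius):
    # No return statement: a call to this function evaluates to None
    print(temp_celsius, "ºC ->", water_phase(temp_celsius))

result = report(25)
print(result)  # None
```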

Positional and keyword arguments

With regard to arguments, there are two argument types: positional and keyword arguments. Both can be found in the following example where we define a function that computes the saturation vapor pressure of water vapor over liquid water or ice for a given temperature.

def Magnus(Temp, ice = False):
    '''Function to compute the saturation vapor pressure E(T) in hPa units 
    for water vapor on liquid water or ice according to Magnus formula. 

    Ref. Alduchov and Eskridge, J. Appl. Met. 35 (1996) 601

    Arguments:

    Temp :: Temperature expressed in degrees Celsius.
    ice  :: If True compute E(T) over ice.


    Example:

    Magnus(35.0)
    56.17569318925043    
    '''
    #
    import numpy as np
    #
    # AERKi and AERK parameters
    (A, B, C) = (22.587, 273.86, 6.1121) if ice else (17.625, 243.04, 6.1094)
    #
    E_value = C*np.exp((A*Temp)/(B+Temp))
    #
    return E_value

The Temp argument is a positional one and the ice argument is a keyword argument. In a function definition, keyword arguments always follow positional ones and they are optional: whenever they are not provided, their default value is used. In a call, keyword arguments can be given in any order. Positional arguments can also be passed by name in the call to increase code readability.

print(Magnus(35))
print(Magnus(35, ice = True))
print(Magnus(Temp = 35, ice = False))

Frequently, None is used as the default value of keyword arguments. This helps prevent unforeseen side effects that arise whenever a mutable object is used as the default value of a parameter. Such side effects stem from the fact that the default value is evaluated only once, when the def statement that defines the function is executed, and not each time the function is called.

Therefore, when a function is defined, the function object gets an attribute called __defaults__ that holds references to the default values of its keyword arguments. These objects are not recreated when the function is called, which can give rise to unexpected situations. Let's see an example of this

def ftest(keyw_arg_0 = [], keyw_arg_1 = ["2222"]):
    keyw_arg_0.append("0000")
    keyw_arg_1.append("1111")
    return keyw_arg_0, keyw_arg_1

print(ftest.__defaults__)
#
print(ftest())
for i in range(5):
    print(ftest())
#
print(ftest.__defaults__)

As mentioned above, this can be solved by using None as the default value and creating the mutable object at run time, inside the function

def ftest(keyw_arg_0 = None, keyw_arg_1 = None):
    if keyw_arg_0 is None:
        keyw_arg_0 = []
    if keyw_arg_1 is None:
        keyw_arg_1 = ["2222"]
    keyw_arg_0.append("0000")
    keyw_arg_1.append("1111")
    return keyw_arg_0, keyw_arg_1

print(ftest.__defaults__)
#
print(ftest())
for i in range(5):
    print(ftest())
#
print(ftest.__defaults__)

There are situations in programming where we do not know in advance the number of positional arguments a function will receive. Python solves this by adding an asterisk (*) in front of the last positional parameter name, which then collects the remaining positional arguments in a tuple. For example, we can compute the arithmetic and geometric means of a set of values as follows

def argeo_mean(first_value, *values):
    '''Compute the arithmetic and geometric mean of a set of values'''
    gmean = first_value
    amean = first_value
    n_terms = 1
    for value in values:
        amean += value
        gmean *= value
        n_terms += 1
    return amean/n_terms, gmean**(1/n_terms)
#
print(argeo_mean(1), argeo_mean(1, 2), argeo_mean(1, 2, 4))

You can also use the star operator in a function invocation. In a call, it unpacks the list so that each element is passed as a separate positional argument. Therefore, if you need to run the previously defined function over a list you can do it as follows

arguments = [1,2,4,32]
print(argeo_mean(*arguments))
# which is equivalent to
print(argeo_mean(arguments[0],arguments[1],arguments[2],arguments[3]))

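The star operator can be used together with zip to easily restructure lists, for example to recover the separate sequences that were previously zipped. Since the zip iterator defined above is exhausted once the loop has run, we first rebuild it as a list.

zipped = list(zip(names, random_nums, sorted(random_nums)))
list(zip(*zipped))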

We can also have an indeterminate number of keyword arguments in a function. They are collected in a hash by a parameter preceded by a double asterisk, **.

def f(a, b = 0, **kwargs):
    print(kwargs)
    return a+b
#
print(f(1))
print(f(1,b=2))
print(f(1,b=2, c=34, d="test", e="My dog", f=None))
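
The double asterisk also works in the other direction: in a call it unpacks a hash into keyword arguments, in the same way the single asterisk unpacks a list. The sketch below uses a hypothetical helper, describe, and a made-up options dict.

```python
def describe(name, units="K", **extra):
    '''Hypothetical helper: build a label from a name, its units and extra options.'''
    label = "{0} [{1}]".format(name, units)
    for key in sorted(extra):
        label += " {0}={1}".format(key, extra[key])
    return label

options = {"units": "hPa", "source": "Magnus", "ice": False}

# Unpacking the dict in the call...
label_unpacked = describe("E(T)", **options)
# ...is equivalent to writing the keyword arguments one by one
label_explicit = describe("E(T)", units="hPa", source="Magnus", ice=False)

print(label_unpacked)
print(label_unpacked == label_explicit)  # True
```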

Returning multiple values

A function can return several values and not only one. The values are returned as a tuple and can be assigned to different variables or to a data structure.

def E_T_WI(Temp):
    return Magnus(Temp, ice = False), Magnus(Temp, ice = True)
##
##
E_water, E_ice = E_T_WI(22)
print(E_water, E_ice)
#
E_wice = E_T_WI(22)
print(E_wice)

But you can also return values as a dictionary

def E_T_WI_hash(Temp):
    return {"E_Water": Magnus(Temp, ice = False), "E_Ice": Magnus(Temp, ice = True)}
##
##
E_wice = E_T_WI_hash(22)
print(E_wice)
print(E_wice["E_Water"])
print(E_wice["E_Ice"])

Variables scope

Another aspect of interest is that any variable defined in a function belongs by default to a local namespace, which is destroyed once the function returns.

s = 10
t = 20
print("0: ",s, t)
def function_t():
    s = 5 # local variable
    print("1: ", s, t)
    return s
function_t()
print("2: ", s, t)

Note that once a variable is assigned anywhere in a function it is local throughout the whole function, so we cannot refer to it before its assignment

s = 10
t = 20
print("0: ",s, t)
def function_t():
    print(s) # ERROR! UnboundLocalError: s is local but has not been assigned yet
    s = 5 # local variable
    print("1: ", s, t)
    return s
function_t()
print("2: ", s, t)

A variable can be declared inside the function with the global statement, which solves the previous error, but one should be careful with this. Often, the use of global variables increases the code complexity without offering much in return.

s = 10
print("0: ",s)
def function_t():
    global s
    print("1 :", s)
    s = 5
    print("2: ", s)
    return s
function_t()
print("3: ", s)

Note that the function has changed the value of the global variable s.

Functions are references

A function name is a reference for the function. Therefore, we can assign multiple names to the same function, and if some of these names are deleted we can still access the function through the rest of them.

bmi_result = bmi_range
bmi_result(45,1.55)
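
The effect of deleting one of the names can be sketched with a toy function; double and twice are illustrative names. The del statement removes only the name, while the function object remains reachable through the other one.

```python
def double(value):
    return 2 * value

twice = double            # a second name bound to the same function object
same_object = twice is double
print(same_object)        # True

del double                # removes only the name, not the function
print(twice(21))          # 42
print(twice.__name__)     # the function still remembers its original name
```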

We can therefore pass function names, that is references, as arguments to other functions. Let's have a look at a simple example

def call_function(f, temp):
    print("I'm going to call function f on temperature ", temp)
    return f(temp)
################################
print(call_function(Fahren_2_Kelvin, 44))
print()
print(call_function(Kelvin_2_Celsius, 44))

By the way, if you print the f argument itself you will get the representation of the function object, which includes its memory address. You can access the function name using the __name__ attribute as follows

def call_function(f, temp):
    print("I'm going to call function",  f.__name__," on temperature ", temp)
    return f(temp)
################################
print(call_function(Fahren_2_Kelvin, 44))
print()
print(call_function(Kelvin_2_Celsius, 44))

Another example

def apply_trig(trig_func, exponent, angle):
    return trig_func(angle)**exponent
#########
print(apply_trig(np.sin, 2, np.linspace(0,2*np.pi,20)))
print(apply_trig(np.cos, 2, np.linspace(0,2*np.pi,20)))

A function can also output a reference to a new function. A simple example of this is as follows

def f_0(a_value):
    def f_1(x):
        return a_value*x*(-a_value + x) # computes a*x**2 - x*a**2
    return f_1
####################
g_1 = f_0(1)
g_2 = f_0(2)
print(g_1(20), g_2(10))

We can use several arguments too

import numpy as np
import matplotlib.pyplot as plt

def ellipse(a_value, b_value):
    def f_ell(x):
        return b_value*(1-(x/a_value)**2)**0.5 
    return f_ell
#####################################
cal_ell = ellipse(2,1)
x_val = np.linspace(-2,2,220)
upper_ellipse = cal_ell(x_val)
####################################
plt.plot(x_val,upper_ellipse)
plt.plot(x_val,-upper_ellipse)
plt.axis('equal')

Using the asterisk notation we can also deal with an unknown number of parameters. In the next example we are given a certain number of coefficients of the Taylor expansion of a function, the sine function in this case

def taylor_f(x0, *coef_values):
    def f_t(x):
        res = 0
        for index, coef in enumerate(coef_values):
            res += coef*(x-x0)**index
        return res
    return f_t
#######################################
# sin(x) = x - x**3/3! + x**5/5! - x**7/7!
sin_0 = taylor_f(0, 0, 1)
sin_1 = taylor_f(0, 0, 1, 0, -1/6)
sin_2 = taylor_f(0, 0, 1, 0, -1/6, 0, 1/120)
########################################
x_val = np.linspace(0,np.pi,220)
s0 = sin_0(x_val)
s1 = sin_1(x_val)
s2 = sin_2(x_val)
plt.plot(x_val,np.sin(x_val))
plt.plot(x_val, s0, label = "Order 1")
plt.plot(x_val, s1, label = "Order 3")
plt.plot(x_val, s2, label = "Order 5")
plt.legend()

Some advice for future programming

Document your code generously. Include docstrings explaining what a function does, what its arguments are, and what the output format is, and provide an example in the docstring so that the function can be tested. Keep in mind the well-known quote: "Documentation is like sex; when it's good, it's very, very good, and when it's bad, it's better than nothing."
Also use comments in your code to explain what you are doing (see the previous item). Include the expected physical units in your comments.
Use clear variable names that indicate their purpose. If you are debugging code written some time ago, it is a great help to find a variable named valenceneutrons rather than vn or, worse, n or, even worse, x.
Follow the motto "Don't duplicate, reuse often". This can be applied in different contexts. For example, if there is a constant in your program whose value is 34, define it at the beginning of the code (irrep_label = 34) and then use the variable name in the code. When the day arrives that 34 needs to be changed to 30, you do not have to find and replace every instance of 34 in your code, a bug-prone task, and you only need to change the initial variable assignment.
Again the motto "Don't duplicate, reuse often". If you find yourself repeating lines of code in different functions, move them into a function and call it. This is similar to the previous item, but even more important, as the bugs in this case are more difficult to trap.
Before coding, stop for a while, think carefully about the task you are trying to solve and, if it is a complex one, break it into simpler steps and deal with each one of them. Check your code using simple cases, ideally ones whose solution you know.
When an error happens, read the error message and your code carefully.
Insert diagnostics in your code that are printed only for a given value of a keyword argument (e.g. verbose = False).
Practice RDD (Rubber Duck Debugging). Ask your personal guru. Warning: gurus can be hot-tempered. You can also ask in a forum like Stack Overflow. Be polite and read (and follow) the forum policy. Sometimes forum gurus can also be hot-tempered.
If your code is complex enough, you may want to learn about breakpoints.
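
Several of the items above can be combined in one short sketch: a constant defined once, a docstring with an example, and diagnostics controlled by a keyword argument. The names ZERO_CELSIUS_IN_KELVIN and kelvin_to_celsius are chosen for this illustration and are not part of the lesson code.

```python
ZERO_CELSIUS_IN_KELVIN = 273.15  # K, defined once and reused everywhere

def kelvin_to_celsius(temp_kelvin, verbose=False):
    '''Transform a temperature from Kelvin to degrees Celsius.

    Arguments:

    temp_kelvin :: Temperature expressed in Kelvin (must not be negative).
    verbose     :: If True print a diagnostic message.

    Example:

    kelvin_to_celsius(273.15)
    0.0
    '''
    if temp_kelvin < 0:
        raise ValueError("A temperature in Kelvin cannot be negative")
    temp_celsius = temp_kelvin - ZERO_CELSIUS_IN_KELVIN
    if verbose:
        # Diagnostic output, printed only on request
        print("kelvin_to_celsius: {0} K -> {1} ºC".format(temp_kelvin, temp_celsius))
    return temp_celsius

print(kelvin_to_celsius(273.15))
print(kelvin_to_celsius(300.0, verbose=True))
```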

Exercises

Exercise 4.1: Prepare a file with a function called fence such that given two strings as arguments, string_1 = "aaa" and string_2 = "bbb", the output is aaa_bbb_aaa. In the same file define a second function called outer that, given a string, returns another string made up of just the first and last characters of its input. Therefore, if the input is Betis the function output should be Bs. Include in both cases a docstring with a brief function description and an example. Load the functions from the file and check the output of this statement: print(outer(fence('carbon', '+'))).
Exercise 4.2: Write a function that generates a random password. The password should have a random length between 10 and 12 characters, chosen at random from positions 33 to 122 in the ASCII table. Your function will not take any parameters and will return the password as its only result. Write another function that checks whether a password has at least two lowercase, two uppercase, and two digit characters. The output of this second function will be a compliant password and the number of times the original function has been run to obtain it.
Exercise 4.3: Gaussian distributed data are frequently normalized to have a mean value equal to zero and a standard deviation equal to one by subtracting the actual mean value and dividing by the standard deviation of the dataset. Making use of the mean and std NumPy methods, define a function that takes as arguments a data vector, a new mean value, and a new standard deviation value, and transforms the original set of data into a new set with the new mean as its average value and with a dispersion given by the new standard deviation value. By default the function should standardize the data to mean = 0 and sdev = 1.
Exercise 4.4: You can approximate the cubic root of a number a as x(n+1) = 2 x(n)/3 + a/(3 x(n)²) with x(0) = a/3. Prepare a function that computes the cubic root of a given number until the difference between successive computed values is less than a given threshold (e.g. 1E-8) and compare the obtained value with the value of a**(1/3).
Exercise 4.5: Define a function that reads temperature data from the sample Cyprus dataset and prepares graphics. Prepare a function with a helpful docstring and comments that, for a given list of file names, prepares a plot with three columns for each data file: the first including the max, min, and mean monthly temperatures, the second the max, min, and mean annual temperatures, and the third depicting the monthly temperatures for all years.
Exercise 4.6: The sieve of Eratosthenes is an ancient algorithm (approx. 3rd century BCE) to find all prime numbers up to a given limit. The implementation of this algorithm in pseudocode is as follows:
```
algorithm Sieve of Eratosthenes is
    input: an integer n > 1.
    output: all prime numbers from 2 through n.

    let A be an array of Boolean values, indexed by integers 2 to n, initially all set to true.

    for i = 2, 3, 4, ..., not exceeding n**0.5 do
        if A[i] is true
            for j = i**2, i**2+i, i**2+2i, i**2+3i, ..., not exceeding n do
                A[j] := false

    return all i such that A[i] is true.
```

Implement this algorithm in a Python function.


