# Lesson outline

1. Native Python data structures: tuples, dicts and sets

2. List, set, and dict comprehensions. Sequence built-ins.

3. Python Functions (I)

4. Python Functions (II)

5. Exercises

# Native Python data structures: tuples, dicts and sets

## Tuples

A tuple is a sequence of Python objects similar to a list: values are accessed with square brackets and tuples can be sliced. Tuples are created with a simple comma-separated list of values (parentheses are optional).

```python
# this is a tuple
tup_0 = 1,2,3,4,5
# and this is also a tuple
tup_1 = (2,3,4,5,6)
# and this is a tuple of tuples...
tup_2 = ((2,3,-1), (0,1,2.4), 3, (-33,-22))
###
print(tup_0)
print(tup_1)
print(tup_2)
```

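The major difference from a list is that a tuple is immutable: once it is created, its elements cannot be reassigned, added, or removed. A mutable object stored in a tuple, such as a list, can still be modified in place.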

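```python
# Item assignment is not allowed: uncommenting the next line raises a TypeError
# tup_0[0] = 4
#
tup_3 = 4, (2,3,-1), [4,4,5,5], True
# Tuples have no append method: the next line raises an AttributeError
# tup_3.append(6)
#
print(tup_3)
# Replacing an element also raises a TypeError
# tup_3[3] = False
```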

With the `+` operator you can join tuples, and with `*` you can concatenate several copies of the same tuple.

```python
#
print(tup_0 + tup_3)
print(4*tup_0)
```

A common use of tuples is variable assignment. Whenever you provide a tuple-like expression of variables on the left-hand side of an assignment, Python unpacks the values on the right-hand side.

```python
#
a, b, c, d = tup_2
print(a)
print(b)
print(c)
print(d)
```

This makes it especially easy to swap variable values.

```python
#
print("a = ", a)
print("b = ", b)
a,b = b,a
print("a = ", a)
print("b = ", b)
```

This feature can also be used in loops to assign several variables at once.

```python
#
tup_loop = (2,3,-1), (4,4,5), (5,6,7), (5,-1,0)
#
for var_1, var_2, var_3 in tup_loop:
    print("var_1 = {0}, var_2 = {1}, var_3 = {2}".format(var_1,var_2,var_3))
```

## Dicts

Dicts, also called hashes or associative arrays, can be considered as lists whose index is not constrained to be a number but can be another kind of object. The index in this case is called a key, and therefore hashes are mutable collections of key-value pairs of Python objects. The values of a hash can be any Python object, but keys are required to be hashable, which in practice means immutable objects such as numbers, strings, or tuples of immutable objects.

They can be created using curly braces, with the colon as the separator between keys and values. You can access or set element values as in lists.

```python
#
hash_0 = {"Guerras Médicas" : ["Termópilas", "Artemisio", "Salamina", "Platea"], "Even integers": (0,2,4,6,8)}
#
print(hash_0)
#
print(hash_0["Guerras Médicas"])
#
hash_0["Guerras Médicas"].append("Micala")
#
hash_0["Fantastic Sea Creatures"] = ("Moby Dick", "The Kraken", "Mermaids")
#
print(hash_0)
```

You can also create a hash from a sequence of two-element tuples using the `dict` function.

```python
#
seq =(1,3),(2,6),(3,9),(4,12)
dict_example = dict(seq)
print(dict_example)
#
```

Once a hash is created you can extract from it the keys and the corresponding values with the `keys` and `values` methods. Both methods list the elements in the same order (insertion order since Python 3.7), so they keep the correspondence between keys and values.

You can extract values from a dict using the `get` method, extract and remove a value using the `pop` method, and delete an entry using `del hash[key_value]`.

```python
#
print(dict_example)
one_get = dict_example.get(1)
two_pop = dict_example.pop(2)
# Delete a single entry by key
del dict_example[3]
print(one_get)
print(two_pop)
print(dict_example)
#
```

### Default values (*)

The following situation is very common: you need to read the value for a given key and, if the key does not exist, use a default value instead. This can be achieved with an `if` block.

```python
#
if (key_value in a_hash):
    value = a_hash[key_value]
else:
    value = default_value
#
```

Both the `get` and `pop` methods accept a default value as a second argument, which is returned when the given key is not in the hash.

```python
#
value = a_hash.pop(key_value, default_value)
#
```
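As a small illustration of how the two methods differ, the sketch below uses a made-up dict (the names `a_hash` and `default_value` follow the text; the keys and numbers are only illustrative). `get` leaves the dict untouched, while `pop` removes the key when it is present.

```python
# Hypothetical data, used only for illustration
a_hash = {"alpha": 1.5, "beta": 2.5}
default_value = 0.0

# get: returns the value (or the default) and leaves the dict unchanged
value_get = a_hash.get("alpha", default_value)
missing_get = a_hash.get("gamma", default_value)

# pop: returns the value (or the default) and removes the key if it exists
value_pop = a_hash.pop("beta", default_value)
missing_pop = a_hash.pop("gamma", default_value)

print(value_get, missing_get, value_pop, missing_pop)
print(a_hash)   # "beta" has been removed, "alpha" is still there
```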

When setting values, you may also need to set a default value. Imagine you are reading a list of numbers and you want to separate them by their last digit into a dict of lists.

```python
import numpy as np
#
# The sample size (20) is an arbitrary choice
random_nums = np.random.randint(0, 5000, size=20)
last_digit_hash = {}
for number in random_nums:
    #
    last_digit = number % 10
    #
    if (last_digit in last_digit_hash):
        last_digit_hash[last_digit].append(number)
    else:
        last_digit_hash[last_digit] = []
        last_digit_hash[last_digit].append(number)
    #
print(last_digit_hash)
#
```

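The `setdefault` method greatly simplifies this task: it returns the value stored for the key and, if the key is missing, first inserts it with the given default.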

```python
import numpy as np
#
# The sample size (20) is an arbitrary choice
random_nums = np.random.randint(0, 5000, size=20)
last_digit_hash = {}
for number in random_nums:
    last_digit = number % 10
    #
    last_digit_hash.setdefault(last_digit, []).append(number)
    #
print(last_digit_hash)
#
```

## Sets

A set is a collection of unique elements with no particular order. Sets can be considered as the keys of a hash without the corresponding values. They can be created with the `set` function or with curly braces.

```python
#
set_0 = {"a", 0, 1, "bc", 0.33, 0, 1}
set_1 = set(["a", "b", "c", "a", "a"])
print(set_0)
print(set_1)
#
```

As could be expected, the set data structure supports the mathematical set operations: intersection, union or difference, among others (you can find a complete list of Python set operations in Real Python Sets).

```python
# Union
print(set_0.union(set_1))
print(set_0|set_1)
#
# Intersection
print(set_0.intersection(set_1))
print(set_0 & set_1)
#
# Difference
print(set_0.difference(set_1))
print(set_0 - set_1)
```

## Comprehensions and built-in sequence functions

List, dict, and set comprehensions are a terse and neat "Pythonic" way to define new structures in your program. In the list comprehensions case they have the syntax

`list_0 = [expr for value in collection if condition]`

which is equivalent to the loop

```python
list_0 = []
#
for value in collection:
    #
    if (condition):
        list_0.append(expr)
```

The filter condition is optional.

For example, we can use a loop to create a list, called `list_mults`, that includes the non-negative integers less than 4000 that are exactly divisible by both 7 and 13.

```python
list_mults = []
total = 0
for number in range(4000):
    if (number % 7 == 0 and number % 13 == 0):
        list_mults.append(number)
        total+=1
print(total, list_mults)
```

We can repeat the same task using a comprehension in a more Pythonic way.

```python
list_mults2 = [number for number in range(4000) if (number % 7 == 0 and number % 13 == 0)]
#
# Checking if both lists are equal.
list_mults2 == list_mults
```

The extension to sets and dicts is direct.

```python
# Dicts
{key_expr(iter): value_expr(iter) for iter in collection if condition}
#
#
# Sets
{set_expr(iter) for iter in collection if condition}
```
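To make the dict and set syntax above more concrete, here is a minimal sketch with a made-up list of words (the names `words`, `word_lengths`, and `first_letters` are only illustrative).

```python
# Hypothetical input data
words = ["tuple", "dict", "set", "list", "dict"]

# Dict comprehension: map each word longer than 3 characters to its length
word_lengths = {word: len(word) for word in words if len(word) > 3}

# Set comprehension: the distinct first letters of the words
first_letters = {word[0] for word in words}

print(word_lengths)
print(first_letters)
```

Repeated items collapse automatically: `"dict"` appears twice in the input but only once as a key, and the set keeps a single copy of each letter.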

Apart from comprehensions there are several built-in sequence functions to work with lists and other structures that are quite useful. One of them is `enumerate` that we have already covered. Other useful built-ins are `sorted`, `reversed`, and `zip`.

The built-in `sorted` returns a new, sorted list. You can pass to `sorted` a `key`: a function that is applied to each element and returns the value used for the sorting.

```python
import numpy as np
# The sample size (10) is an arbitrary choice
random_nums = np.random.randint(0, 1000, size=10)
print(sorted(random_nums))
print(sorted(random_nums, key=str))
print(random_nums)
```
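Since random numbers change on every run, a fixed list may make the effect of `key` easier to follow. The sketch below uses made-up values and the built-ins `str` and `abs` as key functions.

```python
# Hypothetical fixed data, so the result is reproducible
numbers = [100, -9, 25, -3000, 41]

by_value = sorted(numbers)              # plain numerical order
by_string = sorted(numbers, key=str)    # compared as "100", "-9", "25", ...
by_abs = sorted(numbers, key=abs)       # compared by absolute value

print(by_value)
print(by_string)
print(by_abs)
print(numbers)   # the original list is left untouched
```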

In the particular case of lists, you can sort them using the `sort` method, which sorts in place and returns `None` (NumPy arrays offer a method with the same name and behavior).

```python
print(random_nums)
print(random_nums.sort())
print(random_nums)
```

The `reversed` built-in returns an iterator over a sequence in reverse order.

```python
for number in reversed(range(10)):
    print(number)
```

The `zip` built-in associates the elements of two or more given sequences. The output is an iterator of tuples, which can be turned into a list with `list`.

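```python
import numpy as np
names = ["Lisa", "Auxi", "Julia", "Lisanna", "Curro"]
# One random number per name (the sample size is an assumed choice)
random_nums = np.random.randint(0, 20, size=len(names))
zipped = zip(names, random_nums, sorted(random_nums))
print(list(zipped))
```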

This comes in quite handy for defining hashes from two sequences.

```python
#
hash_example = dict(zip(names, random_nums))
print(hash_example)
#
```

It is also used to iterate in a loop over the elements of several sequences at once.

```python
#
for (var_1, var_2, var_3) in zip(seq_1, seq_2, seq_3):
    #
    # Code block
#
```

For example

```python
import numpy as np
# A zip object can be consumed only once, so it is rebuilt before the loop
names = ["Lisa", "Auxi", "Julia", "Lisanna", "Curro"]
# One random number per name (the sample size is an assumed choice)
random_nums = np.random.randint(0, 20, size=len(names))
zipped = zip(names, random_nums, sorted(random_nums))
for name, value_1, value_2 in zipped:
    print('Name {0}: ({1}, {2})'.format(name, value_1, value_2))
```

You can also transform a native Python structure into a NumPy ndarray using the `np.array` function.

```python
print(type(names))
npnames = np.array(names)
print(type(npnames))
print(npnames.dtype)
print(npnames.shape)
```

NumPy makes an educated guess to assign the best-fitting type to the data.

```python
l1 = [1,2,3,4,5,6]
l2 = [1, 2., 3.3, 0, 4, -1]
npl1 = np.array(l1)
npl2 = np.array(l2)
print(type(npl1), type(npl2))
print(npl1.dtype, npl2.dtype)
print(npl1.shape, npl2.shape)
```

You can apply `np.array` to a NumPy ndarray, and in this way you obtain a copy of the initial data and not a reference to them. A similar function is `np.asarray`, but if its argument is already a NumPy ndarray it does not copy the data.
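The difference between copying and referencing can be checked with a short sketch (the array contents and the names `original`, `copied`, and `same` are only illustrative).

```python
import numpy as np

original = np.array([1, 2, 3])

copied = np.array(original)    # new array with its own data
same = np.asarray(original)    # already an ndarray: no copy is made

original[0] = 99               # modify the original in place

print(copied)                  # keeps the old values
print(same)                    # sees the change
print(same is original, copied is original)
```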

# Python Functions (I)

Function definition allows code to be wrapped for reuse and makes life simpler (and greatly helps with organization). Let's start with a very simple function transforming from kelvin to degrees Celsius. Functions start with the `def` keyword and return their result(s) with the `return` keyword. If there is no `return` statement the returned value is `None`.

```python
def Kelvin_2_Celsius(T):
    return T - 273.15
#
Temp = 273.16 # Water triple point
print("{0} K are {1} ºC".format(Temp, Kelvin_2_Celsius(Temp)))
```

Here is another simple function, transforming from degrees Fahrenheit to kelvin, with a docstring that documents the function.

```python
def Fahren_2_Kelvin(Temp):
    '''
    Function to transform from degrees Fahrenheit to degrees Kelvin.

    Input:

      Temp   ::   Temperature expressed in degrees Fahrenheit.
    '''
    return ((Temp - 32.) * (5./9.)) + 273.15 # Notice that 5/9 and 5./9. are not necessarily equal... (Python 2.7)

######################################
print('Water triple point: ', Kelvin_2_Celsius(273.16), 'ºC')
#
print('Water freezing point: ', Fahren_2_Kelvin(32), 'K')
print('Water boiling point: ', Fahren_2_Kelvin(212), 'K')
#
print('Water freezing point: ', Kelvin_2_Celsius(Fahren_2_Kelvin(32)), 'ºC')
print('Water boiling point: ',  Kelvin_2_Celsius(Fahren_2_Kelvin(212.)), 'ºC')
```

The docstring is the multiline string just after the function definition that contains relevant information about the function for the end user.

We can also use a function to benchmark loops versus list comprehensions. We define a function that doubles the first `N` non-negative integers with a loop and benchmark it against the equivalent list comprehension using the magic function `%timeit`.

```python
def f_loop(number):
    twice = []
    for num in range(number):
        twice.append(num*2)
    return twice
####

%timeit f_loop(10000)

%timeit [num*2 for num in range(10000)]
```

It should be noticed that `range` and `np.arange` are not equivalent. The first produces its values lazily, one at a time, while the second builds the full array in memory, with an extra burden for the system. Notice the difference with

```python
def f_loop_arange(number):
    import numpy as np
    twice = []
    for num in np.arange(number):
        twice.append(num*2)
    return twice
####

%timeit f_loop_arange(10000)
```

Of course, the vectorized calculation with NumPy should be faster than the previous two.

`%timeit np.arange(10000)*2 `

We can define these functions in an external file and read the file using the magic function `%run`.

A function can have multiple arguments as well as multiple `return` statements (only one of them is executed in a given invocation). It may also have no explicit `return`, in which case the function returns `None`.
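Both situations can be seen in the following minimal sketch; the functions `sign_label` and `report` are hypothetical and serve only as an illustration.

```python
def sign_label(value):
    # Several return statements; only one is reached in each call
    if value > 0:
        return "positive"
    if value < 0:
        return "negative"
    return "zero"

def report(value):
    # No explicit return: the call evaluates to None
    print("The value is", sign_label(value))

print(sign_label(-2.5))
result = report(0)
print(result)
```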

With regard to arguments, there are two argument types: positional and keyword arguments. Both can be found in the following example

```python
def Magnus(Temp, ice = False):
    '''Function to compute the saturation vapor pressure E(T) in hPa units
    for water vapor on liquid water or ice according to Magnus formula.

    Ref. Alduchov and Eskridge, J. Appl. Met. 35 (1996) 601

    Arguments:

    Temp :: Temperature expressed in degrees Celsius.
    ice  :: If True compute E(T) over ice.

    Example:

    Magnus(35.0)
    56.17569318925043
    '''
    #
    import numpy as np
    #
    # AERKi and AERK parameters
    (A, B, C) = (22.587, 273.86, 6.1121) if ice else (17.625, 243.04, 6.1094)
    #
    E_value = C*np.exp((A*Temp)/(B+Temp))
    #
    return E_value
```

The `Temp` argument is positional and the `ice` argument is of keyword type. Keyword arguments always follow positional ones and are not mandatory: whenever they are not provided the default value is assumed. When passed by name, their order is not relevant. Positional arguments can also be passed by name to increase code readability.

```python
print(Magnus(35))
print(Magnus(35, ice = True))
print(Magnus(Temp = 35, ice = False))
```

Another aspect of interest is that any variable defined in a function belongs by default to a local namespace, which is destroyed once the function returns. Variables can be declared with the `global` keyword, but one should be careful with them: often they increase the code complexity without offering much in return.
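A minimal sketch of both behaviors follows; the names `scale`, `rescale`, `n_calls`, and `count_call` are made up for the illustration.

```python
scale = 10          # name defined in the global namespace
n_calls = 0

def rescale(value):
    scale = 2       # local name: it shadows the global one inside the function
    return value * scale

def count_call():
    global n_calls  # explicitly refers to the module-level name
    n_calls += 1

print(rescale(5))   # computed with the local scale
print(scale)        # the global scale is untouched
count_call()
count_call()
print(n_calls)      # modified from inside the function
```

The counter works, but the function now depends on state defined elsewhere, which is the kind of hidden coupling that makes `global` hard to maintain.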

A function can return several values and not only one. The values are returned as a tuple and can be assigned to different variables or to a data structure.

```python
def E_T_WI(Temp):
    return Magnus(Temp, ice = False), Magnus(Temp, ice = True)
####
E_water, E_ice = E_T_WI(22)
print(E_water, E_ice)
#
E_wice = E_T_WI(22)
print(E_wice)
```

You can also return the values as a dictionary.

```python
def E_T_WI_hash(Temp):
    return {"E_Water": Magnus(Temp, ice = False), "E_Ice": Magnus(Temp, ice = True)}
####
E_wice = E_T_WI_hash(22)
print(E_wice)
print(E_wice["E_Water"])
print(E_wice["E_Ice"])
```

# Some advice for future programming

1. Document your code generously. Include docstrings explaining what a function does, what its arguments are, and what the output format is, and provide an example in the docstring so that the function can be tested. Bear in mind the well-known quote: "Documentation is like sex; when it's good, it's very, very good, and when it's bad, it's better than nothing."

2. Use comments in your code to explain what you are doing (see previous item). Include expected physical units in your comments.

3. Use clear variable names that indicate their purpose. If you are debugging code some time after you wrote it, it is of great help to find a variable named `valenceneutrons` instead of `vn`, or worse, `n`, or even worse, `x`.

4. Follow the motto "Don't duplicate, reuse often". This can be applied in different contexts. For example, if there is a constant in your program whose value is `34`, define it at the beginning of the code (`irrep_label = 34`) and then use the variable name in the code. When the day arrives that `34` needs to be changed to `30`, you do not have to find and replace every `34` instance in your code (a bug-prone task); you only need to change the initial variable assignment.

5. Again the motto "Don't duplicate, reuse often". If you find yourself repeating lines of code in different functions, create a function and call it. This is similar to the previous item, but even more important, as the bugs in this case are more difficult to trap.

6. Before coding, stop for a while and think carefully about the task you are trying to solve. If it is a complex one, break it into simpler steps and deal with each one of them. Check your code using simple cases, ideally ones whose solution you know.

7. When an error happens read the error and your code carefully.

8. Insert diagnostics in your code that are printed only when a keyword argument (e.g. `verbose = False`) takes a given value.

9. Practice RDD (Rubber Duck Debugging). Ask your personal guru. Warning: gurus can be hot-tempered. You can also ask in a forum like `stackoverflow`. Be polite and read (and follow) the forum policy. Sometimes forum gurus can also be hot-tempered.

10. If your code is complex enough, you may want to learn about breakpoints.
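The diagnostics advice in item 8 could look like the following sketch. The function `mean_temperature` and its arguments are hypothetical; the point is only that the extra output appears when `verbose` is set to `True`.

```python
def mean_temperature(temps_celsius, verbose=False):
    '''Return the mean of a list of temperatures given in degrees Celsius.

    verbose :: If True print intermediate diagnostics.

    Example:

    mean_temperature([20.0, 22.0, 24.0])
    22.0
    '''
    total = sum(temps_celsius)       # ºC
    n_values = len(temps_celsius)
    if verbose:
        print("DEBUG: total = {0} ºC, n_values = {1}".format(total, n_values))
    return total / n_values

print(mean_temperature([20.0, 22.0, 24.0]))
print(mean_temperature([20.0, 22.0, 24.0], verbose=True))
```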

# Python Functions (II)

## Lambda functions

Python encourages a functional programming approach: it treats functions as objects and facilitates the use of functions as arguments. For example, three different functions can be passed as arguments to a fourth one that composes them.

```python
def f1(x):
    return x**2
def f2(x):
    import numpy as np
    return np.log10(x)
def f3(x):
    return x**2/(x**2 + 1.0)
def ffunc(x, fa = f1, fb = f2, fc = f3):
    return fc(fb(fa(x)))
############
print(ffunc(2.0))
print(ffunc(2.0, fa=f3, fb=f1, fc=f2))
##
```

In this case we can introduce the so-called Python anonymous or lambda functions, to avoid defining functions `f1`, `f2`, and `f3`. These are one-liners consisting of a single expression whose result is the value returned. They are defined using the `lambda` keyword and are called anonymous because, lacking the `def` keyword, they have no name. We can use the previous example and pass anonymous functions as arguments.

```python
import numpy as np   # the lambdas below use np.log10 and np.log
#
print(ffunc(2.0, fa=lambda z: z**2/(z**2 + 1.0) , fb= lambda z: z**2, fc=lambda z: np.log10(z)))
print(ffunc(2.0, fa=lambda z: z**2/(z**2 + 1.0) , fb= lambda z: z**4, fc=lambda z: np.log(z)))
##
```
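Lambda functions are also a natural fit for the `key` argument of `sorted` discussed earlier. The sketch below, with made-up data, sorts a list of tuples by their second element and then by the length of the name.

```python
# Hypothetical (name, value) pairs
pairs = [("Lisa", 12), ("Auxi", 3), ("Julia", 7)]

# Sort by the second element of each tuple
by_value = sorted(pairs, key=lambda pair: pair[1])

# Sort by the length of the name; ties keep their original order
by_name_length = sorted(pairs, key=lambda pair: len(pair[0]))

print(by_value)
print(by_name_length)
```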

## Currying arguments

Currying (named after the mathematician Haskell Curry) a function means redefining it by partial argument application. Suppose we have a function that computes the distance from a point (x,y,z), given in Cartesian coordinates, to the origin

```python
def dist_3D(x,y,z):
    import numpy as np
    return np.sqrt(x**2+y**2+z**2)
##
```

If we limit our work to two dimensions, in the z = 0 plane, we can fix `z = 0` with `partial` from the `functools` module instead of writing an anonymous function.

```python
from functools import partial
dist_2D = partial(dist_3D, z=0)
##
```

You can also include new default values for existing keyword arguments.
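As a sketch of that last point, the hypothetical function `power_sum` below has a keyword argument `exponent`; `partial` gives it a new default that can still be overridden in the call.

```python
from functools import partial

def power_sum(x, y, exponent=2, scale=1.0):
    # Hypothetical function, used only for this illustration
    return scale * (x**exponent + y**exponent)

# New default value for the existing keyword argument exponent
cubic_sum = partial(power_sum, exponent=3)

print(power_sum(1, 2))               # 1**2 + 2**2
print(cubic_sum(1, 2))               # 1**3 + 2**3
print(cubic_sum(1, 2, exponent=1))   # the new default can be overridden
```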

## Errors and exception handling

A possible tool to avoid errors in your programs, making them more foolproof, is `assert`. The syntax of this statement is `assert (condition), "Warning message string"`. When the condition evaluates to `True` the program continues; if it is `False`, the program stops with an `AssertionError` that carries the warning message. The following line of code checks whether the `time` variable is positive or zero before the program continues running

`assert (time >= 0), "Negative time value. Not allowed."`

This allows for an easy check of your program input to test if the values are sound.
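A minimal sketch of such an input check inside a function is shown next. The function `free_fall_distance` and its default `g` value are assumptions made for the illustration, and the `try/except` block used to display the error is the construct discussed in the rest of this section.

```python
def free_fall_distance(time, g=9.81):
    # Hypothetical example: distance (m) fallen after `time` seconds
    assert (time >= 0), "Negative time value. Not allowed."
    return 0.5 * g * time**2

print(free_fall_distance(2.0))

try:
    free_fall_distance(-1.0)
except AssertionError as error:
    # The assertion message is available in the exception object
    print("AssertionError:", error)
```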

Sometimes, especially when user input is involved, the input may not be what the program expects, and you can make the program handle it instead of dying miserably. Imagine you expect the user to provide a float or integer as `time` value. Notice that a value read with `input` is stored as a string.

```python
# The prompt is passed positionally: the built-in input takes no keyword arguments
time_string = input("time parameter value = ")
time = float(time_string)
print("time = {0}".format(time))
```

If the user provides a non-numerical value the code crashes with a `ValueError: could not convert string to float`. If we want to avoid this we can use a `try/except` block

```python
def try_float(string_value):
    try:
        return float(string_value)
    except:
        return string_value
#
time_string = input("time parameter value = ")
#
time = try_float(time_string)
print("time = {0}".format(time))
```

You can specify which kind of exception you are trapping with the syntax `except ValueError`. You can also trap several exception types by including them in a tuple.
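Both options are combined in the sketch below. The function `try_float_strict` is a hypothetical variant of `try_float` that traps only the two exception types expected from `float` and returns `None` otherwise.

```python
def try_float_strict(value):
    # Hypothetical variant of try_float trapping only the expected errors
    try:
        return float(value)
    except (ValueError, TypeError):
        # ValueError: a string such as "abc"
        # TypeError: an object that cannot be converted at all, such as None
        return None

print(try_float_strict("3.5"))
print(try_float_strict("abc"))
print(try_float_strict(None))
```

Naming the exception types keeps unrelated problems (for example a `KeyboardInterrupt`) from being silently swallowed, which a bare `except` would do.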

You can make use of this to keep asking for a value until it is of the correct type.

```python
while (1):
    #
    time_string = input("time parameter value = ")
    #
    try:
        time = float(time_string)
        break
    except:
        print("Not a number. Try again.")
#
print("time = {0}".format(time))
```

You can also have a code block that runs regardless of the success or failure of the `try` block using `finally`, or code that runs only if the `try` block finishes without raising an exception using `else`.

```python
trials = 0
while (1):
    #
    time_string = input("time parameter value = ")
    #
    try:
        time = float(time_string)
    except:
        print("Not a number. Try again.")
    else:
        # The break goes here: leaving the try block with break would skip else
        print("Okay. That was a valid number.")
        break
    finally:
        # Runs on every pass, even when the loop is left with break
        trials += 1
#
print("time = {0}. You needed {1} trials.".format(time, trials))
```

# Exercises

• Exercise 4.1: Prepare a file with a function called `fence` such that, given two strings as arguments, `string_1 = "aaa"` and `string_2 = "bbb"`, the output is `aaa_bbb_aaa`. In the same file define a second function called `outer` that, given a string, returns another string made up of just the first and last characters of its input. Therefore, if the input is `Betis` the function output should be `Bs`. Include in both cases a docstring with a brief function description and an example. Load the functions from the file and check the output of this statement: `print(outer(fence('carbon', '+')))`.

• Exercise 4.2: Gaussian distributed data are frequently normalized to have a mean value equal to zero and a standard deviation equal to one by subtracting the actual mean value and dividing by the standard deviation of the dataset. Making use of the `mean` and `std` NumPy methods, define a function that takes as arguments a data vector, a new mean value, and a new standard deviation value, and transforms the original set of data into a new set with the new mean as its average value and with a dispersion given by the new standard deviation value. By default the function should standardize the data to `mean = 0` and `sdev = 1`.

• Exercise 4.3: Define a function that reads temperature data from the sample Cyprus dataset and prepares graphics. The function, with a helpful docstring and comments, should take a list of file names and prepare a plot with three columns for each data file: the first including the max, min, and mean monthly temperatures, the second the max, min, and mean annual temperatures, and the third depicting the monthly temperatures for all years.

• Exercise 4.4: Write a function that generates a random password. The password should have a random length of between 10 and 12 random characters from positions 33 to 122 in the ASCII table. Your function will not take any parameters and will return the password as its only result. Write another function that checks whether the password has at least two lowercase, two uppercase, and two digit characters, and check how many times you need to run the original function to obtain a compliant password.