Python Lesson 4

Lesson outline

  1. Native Python data structures: tuples, dicts and sets

  2. List, set, and dict comprehensions. Sequence built-ins.

  3. Python Functions (I)

  4. Some (hopefully good) advice…

  5. Python Functions (II)

  6. Exercises

Native Python data structures: tuples, dicts and sets

Tuples

A tuple is a sequence of Python objects similar to a list: values are accessed with square brackets and tuples can be sliced. Tuples are created as a comma-separated list of values (parentheses are optional).

# this is a tuple
tup_0 = 1,2,3,4,5
# and this is also a tuple
tup_1 = (2,3,4,5,6)
# and this is a tuple of tuples...
tup_2 = ((2,3,-1), (0,1,2.4), 3, (-33,-22))
###
print(tup_0)
print(tup_1[3])
print(tup_2[1])

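The major difference from a list is that a tuple is immutable: once created, its elements cannot be reassigned.

# Item assignment fails: this line raises a TypeError
tup_0[3] = 4
#
tup_3 = 4, (2,3,-1), [4,4,5,5], True
# A mutable element stored inside a tuple can still be modified in place
tup_3[2].append(6)
#
print(tup_3)
# Rebinding an element of the tuple raises a TypeError again
tup_3[3] = False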
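
Since the lines above stop at the first TypeError when run together, the same behaviour can be checked in one go by trapping the error. This is only a minimal sketch; the name point and its values are made up for illustration.

```python
# Hypothetical example values, not taken from the lesson
point = (1.0, 2.0, [10, 20])

try:
    point[0] = 5.0            # item assignment is not allowed on a tuple
except TypeError as err:
    print("TypeError:", err)

# The list stored inside the tuple is a mutable object and can still change
point[2].append(30)
print(point)
```

The tuple keeps referring to the same three objects; only the contents of the list it holds have changed.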

With the + operator you can concatenate tuples, and with * you can repeat a tuple several times.

#
print(tup_0 + tup_3)
print(4*tup_0)

A common use of tuples is variable assignment. Whenever you provide a tuple-like expression of variables on the left-hand side of an assignment, Python unpacks the values on the right-hand side into them.

#
a, b, c, d = tup_2
print(a)
print(b)
print(c)
print(d)

This makes it especially easy to swap variable values.

#
print("a = ", a)
print("b = ", b)
a,b = b,a
print("a = ", a)
print("b = ", b)

This feature can also be used in loops for variable assignment.

#
tup_loop = (2,3,-1), (4,4,5), (5,6,7), (5,-1,0)
#
for var_1, var_2, var_3 in tup_loop:
    print("var_1 = {0}, var_2 = {1}, var_3 = {2}".format(var_1,var_2,var_3))

Dicts

Dicts, also called hashes, are associative arrays. They can be thought of as lists whose index is not constrained to be a number but can be another kind of object. The index in this case is called the key, and therefore hashes are mutable collections of key-value pairs of Python objects. The values of a hash can be any Python object, but the keys must be hashable, which in practice means immutable objects such as numbers, strings, or tuples of immutable objects.

They can be created using curly braces, with a colon separating each key from its value and commas separating the pairs. You can access or set element values as in lists.

#
hash_0 = {"Guerras Médicas" : ["Termópilas", "Artemisio", "Salamina", "Platea"], "Even integers" : (0,2,4,6,8)}
#
print(hash_0)
#
print(hash_0["Guerras Médicas"])
#
hash_0["Guerras Médicas"].append("Micala")
#
hash_0["Fantastic Sea Creatures"] = ("Moby Dick", "The Kraken", "Mermaids")
#
print(hash_0)

You can also create a hash from a sequence of two-element tuples using the dict function.

#
seq =(1,3),(2,6),(3,9),(4,12)
dict_example = dict(seq)
print(dict_example)
#

Once a hash is created you can extract from it the keys and the corresponding values with the keys and values methods. Since Python 3.7 both follow the insertion order of the hash (earlier versions gave no ordering guarantee), and in all cases the two outputs keep the correspondence between keys and values.
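
As a small illustration of these methods, the sketch below uses a made-up dict called capitals. It also uses the items method, which yields the key-value pairs as tuples and combines well with the unpacking shown earlier.

```python
# Hypothetical dict used only for illustration
capitals = {"Spain": "Madrid", "France": "Paris", "Italy": "Rome"}

keys = list(capitals.keys())
values = list(capitals.values())
print(keys)
print(values)

# items() yields (key, value) tuples, convenient for loops with unpacking
for country, city in capitals.items():
    print("{0}: {1}".format(country, city))
```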

You can extract values from a dict using the get method, extract and remove a value using the pop method, and delete an entry using del(hash[key_value]).

#
print(dict_example)
one_get = dict_example.get(1)
two_pop = dict_example.pop(2)
del(dict_example[3])
print(one_get)
print(two_pop)
print(dict_example)
#
  1. Default values (*)

    The following situation is very common: you need to read a hash key and, if the key exists, take the corresponding hash value as input; if it does not exist, take a default value instead. This can be achieved with an if block.

    #
    if key_value in a_hash:
        value = a_hash[key_value]
    else:
        value = default_value
    #

    Both the get and pop methods accept a default value as a second argument, which is returned when the given key is not in the hash. Note that get leaves the hash unchanged, as the if block above does, while pop also removes the key when it is present.

    #
    value = a_hash.get(key_value, default_value)
    #

    When setting values, you may also need a default value. Imagine you are reading a list of numbers and you want to group them by their last digit in a dict of lists.

    #
    import numpy as np
    #
    random_nums = np.random.randint(0,5000,[30])
    last_digit_hash = {}
    for number in random_nums:
        #
        last_digit = number % 10
        #
        if last_digit in last_digit_hash:
            last_digit_hash[last_digit].append(number)
        else:
            last_digit_hash[last_digit] = []
            last_digit_hash[last_digit].append(number)
    #
    print(last_digit_hash)
    #

    The setdefault method greatly simplifies this task: it returns the value stored for the key, inserting the given default first if the key is missing.

    #
    random_nums = np.random.randint(0,5000,[30])
    last_digit_hash = {}
    for number in random_nums:
        last_digit = number % 10
        #
        last_digit_hash.setdefault(last_digit, []).append(number)
    #
    print(last_digit_hash)
    #
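
The different behaviour of get, pop, and setdefault with respect to default values can be compared side by side. The following is a minimal sketch with a made-up dict called stock; no random data are involved, so the output is the same in every run.

```python
# Hypothetical dict used only for illustration
stock = {"apples": 3, "pears": 0}

# get returns the default for a missing key and leaves the dict untouched
n_kiwis = stock.get("kiwis", 0)
# pop returns the value and removes the key; the default avoids a KeyError
n_apples = stock.pop("apples", 0)
n_plums = stock.pop("plums", 0)
# setdefault inserts the default only if the key is missing
stock.setdefault("kiwis", 10)
stock.setdefault("pears", 99)    # the key exists: its value is not replaced

print(n_kiwis, n_apples, n_plums)
print(stock)
```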

Sets

A set is an unordered collection of unique elements. Sets can be considered as the keys of a hash without the corresponding values. They can be created with the set function or with curly braces.

#
set_0 = {"a", 0, 1, "bc", 0.33, 0, 1}
set_1 = set(["a", "b", "c", "a", "a"])
print(set_0)
print(set_1)
#

As could be expected, the set data structure supports the mathematical set operations: intersection, union or difference, among others (you can find a complete list of Python set operations in Real Python Sets).

# Union
print(set_0.union(set_1))
print(set_0|set_1)
#
# Intersection
print(set_0.intersection(set_1))
print(set_0 & set_1)
#
# Difference
print(set_0.difference(set_1))
print(set_0 - set_1)
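
Among the other operations, the symmetric difference and the subset and disjointness tests are probably the most frequently used. A short sketch follows; the sets evens and primes are made up for this example.

```python
# Hypothetical sets used only for illustration
evens = {0, 2, 4, 6, 8}
primes = {2, 3, 5, 7}

# Symmetric difference: elements that belong to exactly one of the two sets
sym_diff = evens ^ primes
print(sym_diff)
print(sym_diff == evens.symmetric_difference(primes))

# Subset and disjointness tests return booleans
print({2, 4} <= evens)
print(evens.isdisjoint({1, 3}))

# Building a set from a sequence removes the duplicates
unique_letters = set("mississippi")
print(unique_letters)
```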

Comprehensions and built-in sequence functions

List, dict, and set comprehensions are a terse and neat "Pythonic" way to define new structures in your program. A list comprehension has the syntax

list_0 = [expr for value in collection if condition]

which is equivalent to the loop

list_0 = []
#
for value in collection:
    #
    if (condition):
        list_0.append(expr)

The filter condition is optional.

For example, using a loop we can create a list, called list_mults, with the non-negative integers less than 4000 that are exactly divisible by both 7 and 13.

list_mults = []
total = 0
for number in range(4000):
    if (number % 7 == 0 and number % 13 == 0):
        list_mults.append(number)
        total+=1
print(total, list_mults)

We can repeat the same task using a comprehension in a more Pythonic way.

list_mults2 = [number for number in range(4000) if (number % 7 == 0 and number % 13 == 0)]
#
# Checking if both lists are equal.
list_mults2 == list_mults

The extension to sets and dicts is direct.

# Dicts
{key_expr(iter): value_expr(iter) for iter in collection if condition}
#
#
# Sets
{set_expr(iter) for iter in collection if condition}
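
A concrete instance of both templates may help. This is just a sketch; the list words is made up for the example.

```python
# Hypothetical list used only for illustration
words = ["tuple", "dict", "set", "list", "array"]

# Dict comprehension: word -> length, only for words longer than three letters
lengths = {word: len(word) for word in words if len(word) > 3}
# Set comprehension: the distinct word lengths
distinct_lengths = {len(word) for word in words}

print(lengths)
print(distinct_lengths)
```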

Apart from comprehensions there are several built-in sequence functions to work with lists and other structures that are quite useful. One of them is enumerate that we have already covered. Other useful built-ins are sorted, reversed, and zip.

The built-in sorted returns a new sorted list, leaving the original sequence unchanged. You can pass sorted a key, a function that is applied to each element and whose return value is used for the sorting.

random_nums = np.random.randint(0,1000,[30])
print(sorted(random_nums))
print(sorted(random_nums, key=str))
print(random_nums)
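
Because the numbers above are random, the effect of the key is easier to follow with fixed data. The sketch below uses a made-up list of names and three common choices: a built-in function as key, a string method as key, and the reverse option.

```python
# Hypothetical list used only for illustration
names = ["Lisanna", "Auxi", "julia", "Curro"]

by_length = sorted(names, key=len)               # shortest first
case_insensitive = sorted(names, key=str.lower)  # ignore upper/lower case
descending = sorted(names, reverse=True)         # default order, reversed

print(by_length)
print(case_insensitive)
print(descending)
print(names)    # the original list is unchanged
```

The sort is stable, so elements with equal keys (here julia and Curro, both of length five) keep their original relative order.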

Lists have a sort method that sorts in place and returns None (NumPy arrays, such as random_nums here, provide a sort method that behaves in the same way).

print(random_nums)
print(random_nums.sort())  # prints None: the sorting is done in place
print(random_nums)

The reversed built-in provides an iterator over a sequence in reverse order.

for number in reversed(range(10)):
    print(number)

The zip built-in pairs up the elements of two or more given sequences. In Python 3 the output is an iterator of tuples, which can be converted into a list with list.

names = ["Lisa", "Auxi", "Julia", "Lisanna", "Curro"]
random_nums = np.random.randint(0,20,[5])
zipped = zip(names, random_nums, sorted(random_nums))
print(list(zipped))

This comes in quite handy for the definition of hashes from two sequences.

#
hash_example = dict(zip(names, random_nums))
print(hash_example)
#

It is also used to iterate in a loop over the elements of several sequences at once.

#
for (var_1, var_2, var_3) in zip(seq_1, seq_2, seq_3):
    #
    # Code block
    #

For example

names = ["Lisa", "Auxi", "Julia", "Lisanna", "Curro"]
random_nums = np.random.randint(0,20,[5])
# A zip object can be traversed only once, so it is created again before the loop
zipped = zip(names, random_nums, sorted(random_nums))
for name, value_1, value_2 in zipped:
    print('Name {0}: ({1}, {2})'.format(name, value_1, value_2))

You can also transform a native Python structure into a NumPy ndarray using the np.array function.

print(type(names))
npnames = np.array(names)
print(type(npnames))
print(npnames.dtype)
print(npnames.shape)

Numpy makes an educated guess to assign the best fitting type to the data.

l1 = [1,2,3,4,5,6]
l2 = [1, 2., 3.3, 0, 4, -1]
npl1 = np.array(l1)
npl2 = np.array(l2)
print(type(npl1), type(npl2))
print(npl1.dtype, npl2.dtype)
print(npl1.shape, npl2.shape)

You can also apply np.array to a NumPy ndarray; in this way you obtain a copy of the initial data and not a reference to them. A similar function is np.asarray, but if its argument is already a NumPy ndarray of the requested type it does not copy the data.
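
The difference between copying and not copying can be checked by modifying the original array afterwards. A minimal sketch, with made-up variable names, assuming NumPy is installed:

```python
import numpy as np

original = np.array([1, 2, 3])
copied = np.array(original)     # new array with its own data
same = np.asarray(original)     # already an ndarray: no copy is made

original[0] = 100               # modify the original data in place

print(copied[0], same[0])
print(same is original, copied is original)
```

Only the array obtained with np.asarray sees the change, because it is the very same object as the original.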

Python Functions (I)

Defining functions allows you to wrap code for reuse, which makes life simpler and greatly helps with code organization. Let's start with a very simple function transforming from Kelvin to degrees Celsius. Function definitions start with the def keyword and return their result(s) with the return keyword. If there is no return statement the returned value is None.

def Kelvin_2_Celsius(T):
    return T - 273.15
#
Temp = 273.16 # Water triple point
print("{0} K are {1} ºC".format(Temp, Kelvin_2_Celsius(Temp)))

Another simple function, transforming from degrees Fahrenheit to Kelvin, with a docstring that documents the function.

def Fahren_2_Kelvin(Temp):
    '''
    Function to transform from degrees Fahrenheit to Kelvin.
    Input:
    Temp :: Temperature expressed in degrees Fahrenheit.
    '''
    return ((Temp - 32.) * (5./9.)) + 273.15 # Notice that 5/9 and 5./9. are not necessarily equal... (Python 2.7)
######################################
print('Water triple point: ', Kelvin_2_Celsius(273.16), 'ºC')
#
print('Water freezing point: ', Fahren_2_Kelvin(32), 'K')
print('Water boiling point: ', Fahren_2_Kelvin(212), 'K')
#
print('Water freezing point: ', Kelvin_2_Celsius(Fahren_2_Kelvin(32)), 'ºC')
print('Water boiling point: ', Kelvin_2_Celsius(Fahren_2_Kelvin(212.)), 'ºC')

The docstring is the multiline string just after the function definition that contains relevant information about the function for the end user.
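
The docstring is stored in the __doc__ attribute of the function and is what the built-in help shows. A small sketch follows; the function celsius_2_fahren is a made-up example, not one of the functions of this lesson.

```python
def celsius_2_fahren(temp):
    '''
    Function to transform from degrees Celsius to degrees Fahrenheit.
    Input:
    temp :: Temperature expressed in degrees Celsius.
    Example:
    celsius_2_fahren(100.0) returns 212.0
    '''
    return temp * 9.0 / 5.0 + 32.0

# The docstring is available as an attribute of the function object;
# help(celsius_2_fahren) displays the same text in an interactive session
print(celsius_2_fahren.__doc__)
print(celsius_2_fahren(100.0))
```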

We can also use a function to benchmark loops versus list comprehensions. We define a function that doubles each of the first N non-negative integers using a loop and compare it with the equivalent list comprehension, using the IPython magic function %timeit.

def f_loop(number):
    twice = []
    for num in range(number):
        twice.append(num*2)
    return twice
####
%timeit f_loop(10000)
%timeit [num*2 for num in range(10000)]

Note that range and np.arange are not equivalent. The first yields its values lazily, one at a time, while the second builds the full array in memory before the loop starts, an extra burden for the system. Notice the difference with

def f_loop_arange(number):
    import numpy as np
    twice = []
    for num in np.arange(number):
        twice.append(num*2)
    return twice
####
%timeit f_loop_arange(10000)

Of course, the vectorized calculation with NumPy should be faster than the previous two.

%timeit np.arange(10000)*2

We can define these functions in an external file and run the file using the magic function %run.

A function can have multiple arguments as well as multiple return statements (only one of them will be executed in a given invocation). It may also have no explicit return, in which case the function returns None.
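
Both cases can be seen in a short sketch. The functions water_phase and report are made up for this example, and the thresholds assume water at a pressure of 1 atm.

```python
def water_phase(temp_celsius):
    """Return the phase of water at 1 atm for a temperature in degrees Celsius."""
    if temp_celsius < 0.0:
        return "ice"
    if temp_celsius < 100.0:
        return "liquid"
    return "vapor"

def report(temp_celsius):
    """Print the phase of water; this function has no return statement."""
    print("{0} C: {1}".format(temp_celsius, water_phase(temp_celsius)))

result = report(25.0)
print(result)    # None, since report has no explicit return
```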

With regard to arguments, there are two argument types: positional and keyword arguments. Both can be found in the following example

def Magnus(Temp, ice = False):
    '''Function to compute the saturation vapor pressure E(T) in hPa units
    for water vapor over liquid water or ice according to the Magnus formula.
    Ref. Alduchov and Eskridge, J. Appl. Met. 35 (1996) 601
    Arguments:
    Temp :: Temperature expressed in degrees Celsius.
    ice :: If True compute E(T) over ice.
    Example:
    Magnus(35.0)
    56.17569318925043
    '''
    #
    import numpy as np
    #
    # AERKi and AERK parameters
    (A, B, C) = (22.587, 273.86, 6.1121) if ice else (17.625, 243.04, 6.1094)
    #
    E_value = C*np.exp((A*Temp)/(B+Temp))
    #
    return E_value

The Temp argument is positional and the ice argument is a keyword argument. In a function definition, arguments with default values always follow those without, and they are optional in a call: whenever they are not provided the default value is assumed. When arguments are passed by keyword, their order is not relevant. Positional arguments can also be passed by keyword to increase code readability.

print(Magnus(35))
print(Magnus(35, ice = True))
print(Magnus(Temp = 35, ice = False))

Another aspect of interest is that any variable defined in a function belongs by default to a local namespace, which is destroyed once the function returns. Variables can be declared with the global keyword so that a function can rebind a module-level name, but one should be careful with them. Often, they increase the code complexity without offering much in return.
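
The difference between a local variable and one declared as global can be sketched as follows. All names are made up for illustration, and the code is assumed to run at module level (a script or a notebook cell).

```python
counter = 0           # module-level (global) variable

def local_update():
    counter = 10      # creates a new local variable; the global one is untouched
    return counter

def global_update():
    global counter    # the assignment below rebinds the module-level variable
    counter += 1

local_result = local_update()
print(local_result, counter)

global_update()
print(counter)
```

After the first call the module-level counter is still 0; only the function using the global keyword changes it.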

A function can return several values and not only one. The values are returned as a tuple and can be assigned to different variables or to a data structure.

def E_T_WI(Temp):
    return Magnus(Temp, ice = False), Magnus(Temp, ice = True)
##
##
E_water, E_ice = E_T_WI(22)
print(E_water, E_ice)
#
E_wice = E_T_WI(22)
print(E_wice)

But you can also return values as a dictionary

def E_T_WI_hash(Temp):
    return {"E_Water": Magnus(Temp, ice = False), "E_Ice": Magnus(Temp, ice = True)}
##
##
E_wice = E_T_WI_hash(22)
print(E_wice)
print(E_wice["E_Water"])
print(E_wice["E_Ice"])

Some advice for future programming

  1. Document your code generously. Include docstrings explaining what a function does, what its arguments are, and what the output format is, and provide an example in the docstring that can be used to test the function. Keep in mind the well-known quote: "Documentation is like sex; when it's good, it's very, very good, and when it's bad, it's better than nothing."

  2. Use comments in your code as well to explain what you are doing (see the previous item). Include the expected physical units in your comments.

  3. Use clear variable names that indicate their purpose. If you are debugging code some time after you wrote it, it is of great help to find a variable named valenceneutrons rather than vn or, worse, n or, even worse, x.

  4. Follow the motto Don't duplicate, reuse often. This can be applied in different contexts. For example, if there is a constant in your program whose value is 34, define it at the beginning of the code (irrep_label = 34) and then use the variable name in the code. When the day arrives that 34 needs to be changed to 30, you do not have to find and replace every instance of 34 in your code (a bug-prone task); you only need to change the initial variable assignment.

  5. Again the motto Don't duplicate, reuse often. If you find yourself repeating lines of code in different functions, create a function and call it. This is similar to the previous item, but even more important, as the bugs in this case are more difficult to trap.

  6. Before coding, stop for a while and think carefully about the task you are trying to solve. If it is a complex one, break it into simpler steps and deal with each one of them. Check your code using simple cases, ideally ones whose solution you know.

  7. When an error happens, read the error message and your code carefully.

  8. Insert diagnostics in your code that are controlled by a keyword argument (e.g. verbose = False) and are printed only for a given value of that argument.

  9. Practice RDD (Rubber Duck Debugging). Ask your personal guru. Warning: gurus can be hot-tempered. You can also ask in a forum like Stack Overflow. Be polite and read (and follow) the forum policy. Sometimes forum gurus can also be hot-tempered.

  10. If your code is complex enough, it may be worth learning how to use breakpoints.
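
Several of these items (docstring with example, units in comments, a named constant, and diagnostics controlled by a keyword argument) can be combined in a few lines. This is only a sketch; the function name and the constant name are made up for illustration.

```python
ABS_ZERO_CELSIUS = -273.15    # absolute zero in degrees Celsius, defined only once

def celsius_2_kelvin(temp, verbose=False):
    '''
    Function to transform from degrees Celsius to Kelvin.
    Arguments:
    temp :: Temperature expressed in degrees Celsius.
    verbose :: If True print diagnostic information.
    Example:
    celsius_2_kelvin(0.0) returns 273.15
    '''
    kelvin = temp - ABS_ZERO_CELSIUS    # result in Kelvin
    if verbose:
        # Diagnostics are printed only on request
        print("celsius_2_kelvin: input = {0} C, output = {1} K".format(temp, kelvin))
    return kelvin

print(celsius_2_kelvin(0.0))
print(celsius_2_kelvin(25.0, verbose=True))
```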

Python Functions (II)

Lambda functions

Python encourages a functional programming approach: it treats functions as objects and facilitates the use of functions as arguments. For example, given three different functions f1, f2, and f3, a fourth function can take them as arguments and compose them

def f1(x):
    return x**2
def f2(x):
    import numpy as np
    return np.log10(x)
def f3(x):
    return x**2/(x**2 + 1.0)
def ffunc(x, fa = f1, fb = f2, fc = f3):
    return fc(fb(fa(x)))
######
######
print(ffunc(2.0))
print(ffunc(2.0, fa=f3, fb=f1, fc=f2))
##

In this case we can introduce the so-called Python anonymous or lambda functions, to avoid defining functions f1, f2, and f3. These are one-liners consisting of a single expression whose result is the value returned. They are defined using the lambda keyword. They are called anonymous because, lacking the def keyword, they are not bound to a name. We can use the previous example and pass anonymous functions as arguments

print(ffunc(2.0, fa=lambda z: z**2/(z**2 + 1.0) , fb= lambda z: z**2, fc=lambda z: np.log10(z)))
print(ffunc(2.0, fa=lambda z: z**2/(z**2 + 1.0) , fb= lambda z: z**4, fc=lambda z: np.log(z)))
##
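
Lambda functions are also frequently used as the key argument of sorted. A short sketch, with a made-up list of (x, y) points:

```python
# Hypothetical list of (x, y) points used only for illustration
points = [(3, -1), (0, 4), (-2, 2), (1, 0)]

# Sort by the y coordinate
by_y = sorted(points, key=lambda p: p[1])
# Sort by the squared distance to the origin
by_distance = sorted(points, key=lambda p: p[0]**2 + p[1]**2)

print(by_y)
print(by_distance)
```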

Currying arguments

Currying a function (the term is named after the mathematician Haskell Curry) means redefining it by partial argument application. Suppose we have a function that computes the distance from a point (x,y,z), given in Cartesian coordinates, to the origin

def dist_3D(x,y,z):
    import numpy as np
    return np.sqrt(x**2+y**2+z**2)
##

If we are limiting our work to two dimensions in the z = 0 plane, we can fix z = 0 using partial from the functools module

from functools import partial
dist_2D = partial(dist_3D, z=0)
##

You can also include new default values for existing keyword arguments.
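
As a sketch of this last possibility, the function below has a keyword argument, called scale here, that is given a new default through partial. The function and its arguments are made up for this example; math.sqrt is used instead of NumPy to keep it self-contained.

```python
from functools import partial
import math

def dist(x, y, z=0.0, scale=1.0):
    """Distance from the point (x, y, z) to the origin, multiplied by scale."""
    return scale * math.sqrt(x**2 + y**2 + z**2)

# New function with a different default for the existing keyword argument
dist_double = partial(dist, scale=2.0)

print(dist(3.0, 4.0))                     # 5.0
print(dist_double(3.0, 4.0))              # 10.0
print(dist_double(3.0, 4.0, scale=1.0))   # the new default can still be overridden: 5.0
```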

Generators and itertools (*)

Errors and exception handling

A possible tool to avoid errors in your programs, making them more foolproof, is assert. The syntax of this statement is assert (condition), "Warning message string". When the condition evaluates to True the program continues; if it is False, the program stops with an AssertionError that shows the warning message. The following line of code checks whether the time variable is positive or zero before the program continues running

assert (time >= 0), "Negative time value. Not allowed."

This allows for an easy check of your program input to test if the values are sound.
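
Inside a function, the check might look like the sketch below. The function free_fall_distance is a made-up example; also keep in mind that assert statements are skipped when Python runs with the -O option, so they should not be the only protection against bad input.

```python
def free_fall_distance(time, g=9.81):
    """Distance in metres fallen from rest after time seconds (g in m/s^2)."""
    assert (time >= 0), "Negative time value. Not allowed."
    return 0.5 * g * time**2

print(free_fall_distance(2.0))

# Trap the error here only to show the message without stopping the program
try:
    free_fall_distance(-1.0)
except AssertionError as err:
    assert_message = str(err)
    print("AssertionError:", assert_message)
```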

Sometimes, especially when user input is involved, the input may not be what the program expects, and you can make the program digest the input and not die miserably. Imagine you expect the user to provide a float or integer as time value. Notice that a value read with input is stored as a string.

# The prompt is passed positionally: the built-in input does not accept keyword arguments
time_string = input("time parameter value = ")
time = float(time_string)
print("time = {0}".format(time))

If the user provides a non-numerical value the code crashes with a ValueError: could not convert string to float. If we want to avoid this we can use a try/except block

def try_float(string_value):
    try:
        return float(string_value)
    except:    # a bare except traps any kind of exception
        return string_value
#
time_string = input("time parameter value = ")
#
time = try_float(time_string)
print("time = {0}".format(time))

You can specify which kind of exception you are trapping with the syntax except ValueError. You can also trap several exception types by including them in a tuple.
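
A sketch of the tuple syntax, using a made-up function safe_inverse that may fail in three different ways:

```python
def safe_inverse(value):
    """Return 1/float(value), or None if value cannot be inverted."""
    try:
        return 1.0 / float(value)
    except (ValueError, TypeError, ZeroDivisionError) as err:
        # type(err).__name__ tells which of the three exceptions was trapped
        print("Cannot invert {0!r}: {1}".format(value, type(err).__name__))
        return None

print(safe_inverse("4"))     # valid input
print(safe_inverse("abc"))   # ValueError: not a number
print(safe_inverse(None))    # TypeError: float() does not accept None
print(safe_inverse("0"))     # ZeroDivisionError
```

Any exception type not listed in the tuple is not trapped and stops the program as usual.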

You can make use of this to keep asking for a value until it is of the correct type.

while True:
    #
    time_string = input("time parameter value = ")
    #
    try:
        time = float(time_string)
        break
    except ValueError:
        print("Not a number. Try again.")
#
print("time = {0}".format(time))

You can also have a code block that runs regardless of the success or failure of the try block using finally, or code that runs only if the try block completes without raising an exception using else. Note that a break inside the try block would skip the else clause, so here the break is placed in the else block; the finally block still runs before the loop is left.

trials = 0
while True:
    #
    time_string = input("time parameter value = ")
    #
    try:
        time = float(time_string)
    except ValueError:
        print("Not a number. Try again.")
    else:
        print("Okay. That was a valid number.")
        break
    finally:
        trials += 1
#
print("time = {0}. You needed {1} trials.".format(time, trials))
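
The order in which the different blocks run can also be traced without user input. In the following sketch a made-up function parse_all records in a list which clauses are visited for each string.

```python
def parse_all(strings):
    """Convert strings to floats, logging the try-block clauses visited."""
    values, log = [], []
    for text in strings:
        try:
            number = float(text)
        except ValueError:
            log.append("except")      # runs only if the conversion fails
        else:
            values.append(number)     # runs only if the conversion succeeds
            log.append("else")
        finally:
            log.append("finally")     # runs in both cases
    return values, log

values, log = parse_all(["1.5", "abc", "3"])
print(values)
print(log)
```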

Exercises

  • Exercise 4.1: Prepare a file with a function called fence such that given two strings as arguments, string_1 = "aaa" and string_2 = "bbb", the output is aaa_bbb_aaa. In the same file define a second function called outer such that given a string it returns another string made up of just the first and last characters of its input. Therefore if the input is Betis the function output should be Bs. Include in both cases a docstring with a brief function description and an example. Load the functions from the file and check what the output of this statement is: print(outer(fence('carbon', '+'))).

  • Exercise 4.2: Gaussian distributed data are frequently normalized to have a mean value equal to zero and a standard deviation equal to one by subtracting the actual mean value and dividing by the standard deviation of the dataset. Making use of the mean and std NumPy methods, define a function that takes as arguments a data vector, a new mean value, and a new standard deviation value, and transforms the original set of data into a new set with the new mean as its average value and with a dispersion given by the new standard deviation value. By default the function should standardize the data to mean = 0 and sdev = 1.

  • Exercise 4.3: Define a function that reads temperature data from the sample Cyprus dataset and prepares plots. Prepare a function with a helpful docstring and comments that, for a given list of file names, prepares a plot with three columns for each data file: the first including the max, min, and mean monthly temperatures, the second the max, min, and mean annual temperatures, and the third depicting the monthly temperatures for all years.

  • Exercise 4.4: Write a function that generates a random password. The password should have a random length of between 10 and 12 random characters from positions 33 to 122 in the ASCII table. Your function will not take any parameters and will return the password as its only result. Write another function that checks whether the password has at least two lowercase, two uppercase, and two digit characters, and check how many times you need to run the original function to obtain a compliant password.