Python Lesson 4
Lesson outline
Native Python data structures: tuples, dicts and sets
List, set, and dict comprehensions. Sequence built-ins.
Python Functions
Some (hopefully good) advice…
Exercises
Native Python data structures: tuples, hashes and sets
Tuples
A tuple is a sequence of Python objects similar to a list: values are accessed with square brackets and tuples can be sliced. Tuples are created with a comma-separated list of values (parentheses are optional).
The major difference from a list is that a tuple is immutable.
With the + operator you can join tuples, and with the * operator you can concatenate several copies of a tuple.
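A minimal sketch of the tuple features described above. The variable names are illustrative and not taken from the original lesson.

```python
# Parentheses are optional when creating a tuple
point = 1.5, 2.0, "label"
same_point = (1.5, 2.0, "label")

first = point[0]      # indexing works as in lists
tail = point[1:]      # slicing returns a new tuple

# Tuples are immutable: item assignment raises TypeError
try:
    point[0] = 3.0
    mutated = True
except TypeError:
    mutated = False

joined = (1, 2) + (3, 4)      # + joins two tuples
repeated = (1, 2) * 3         # * concatenates several copies
```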
A common use of tuples is variable assignment. Whenever you provide a tuple-like expression of variables on the left-hand side of an assignment, Python unpacks the values on the right-hand side.
This makes it especially easy to swap variable values.
This feature can also be used for variable assignment in loops.
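The unpacking behaviour can be sketched as follows, using made-up values: a plain multiple assignment, a swap, and unpacking inside a loop.

```python
# Unpacking: each name on the left receives one value from the right
a, b, c = 1, 2, 3

# Swapping two variables without a temporary one
a, b = b, a

# Unpacking in a loop over a list of two-element tuples
pairs = [(1, "one"), (2, "two"), (3, "three")]
labels = []
for number, name in pairs:
    labels.append(f"{number}:{name}")
```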
Dicts
Dicts, also called hashes, are associative arrays. A dict can be thought of as a list whose index is not constrained to be a number but can be another kind of object. The index in this case is called a key, so hashes are mutable collections of key-value pairs of Python objects. The values of a hash can be any Python object, but keys must be hashable, which in practice means immutable objects such as numbers, strings, or tuples of immutable objects.
They can be created using curly braces, with a colon as the separator between each key and its value. You can access or set element values with square brackets, as in lists.
You can also create a hash from a list of two-element tuples using the dict function.
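As an illustration of the creation and access syntax, here is a short sketch. The keys and values are hypothetical examples.

```python
# Curly braces with key: value pairs
masses = {"H": 1.008, "He": 4.0026}

masses["Li"] = 6.94        # set a new element
helium = masses["He"]      # access an existing one

# Tuples are valid keys because they are immutable
grid = {(0, 0): "origin", (1, 0): "east"}

# Building a dict from a list of two-element tuples
from_pairs = dict([("a", 1), ("b", 2)])
```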
Once a hash is created you can extract its keys and the corresponding values with the keys and values methods. Since Python 3.7 both follow the insertion order of the hash, so the correspondence between keys and values is kept.
You can extract a value from a dict using the get method, extract and remove a value using the pop method, and delete an entry using del hash[key].
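The methods mentioned above can be tried on a small, made-up dict:

```python
capitals = {"Spain": "Madrid", "France": "Paris", "Italy": "Rome"}

# keys() and values() correspond position by position
countries = list(capitals.keys())
cities = list(capitals.values())

madrid = capitals.get("Spain")       # read a value
missing = capitals.get("Portugal")   # absent key: get returns None

rome = capitals.pop("Italy")         # read the value and remove the entry
del capitals["France"]               # remove an entry without reading it
```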
Default values (*)
The following situation is very common: you need to read a hash key and, if the key exists, take its value as input; if it does not exist, take a default value instead. This can be achieved with an if block.
Both the get and pop methods accept a default value as a second argument, which is returned when the given key is not defined in the hash.
When setting values, you may also need to set a default value. Imagine you are reading a list of numbers and you want to separate them by their last digit as a dict of lists.
The setdefault method greatly simplifies this task.
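The three approaches to default values can be compared in a short sketch. The dict `settings` and the list `numbers` are hypothetical data chosen for illustration.

```python
settings = {"verbose": True}

# Reading with an explicit if block
if "depth" in settings:
    depth = settings["depth"]
else:
    depth = 3

# Same result with get and a default value as second argument
depth_get = settings.get("depth", 3)

# Setting with a default: group numbers by their last digit
numbers = [11, 21, 32, 45, 55, 62]
by_last_digit = {}
for n in numbers:
    # setdefault returns the existing list, or stores and returns a new one
    by_last_digit.setdefault(n % 10, []).append(n)
```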
Sets
A set is a collection of unique elements with no particular order. Sets can be considered as the keys of a hash without the corresponding values. They can be created with the set function or with curly braces (empty braces create a dict, so use set() for an empty set).
As could be expected, the set data structure supports the mathematical set operations: intersection, union or difference, among others (you can find a complete list of Python set operations in Real Python Sets).
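A brief sketch of set creation and of the main set operations, with illustrative values:

```python
evens = {0, 2, 4, 6, 8}                # curly braces
squares = set([0, 1, 4, 9, 4, 1])      # set function; duplicates are dropped

both = evens & squares                 # intersection
either = evens | squares               # union
only_evens = evens - squares           # difference
one_side = evens ^ squares             # symmetric difference
```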
Comprehensions and built-in sequence functions
List, dict, and set comprehensions are a terse and neat "Pythonic" way to define new structures in your program. In the list comprehensions case they have the syntax
which is equivalent to the loop
The filter condition is optional.
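As a sketch of the general pattern, a list comprehension has the form `[expression for item in collection if condition]`. The names below are placeholders, and the loop that follows builds the same list step by step.

```python
collection = range(10)

# Comprehension: expression, loop, and filter in one line
squares_of_odds = [x**2 for x in collection if x % 2 == 1]

# Equivalent loop
result = []
for x in collection:
    if x % 2 == 1:
        result.append(x**2)
```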
For example, using a loop we can create a list, called list_mults, that includes the integers that are less than 4000 and exactly divisible by both 7 and 13.
We can repeat the same task using a comprehension in a more Pythonic way.
The extension to sets and dicts is direct.
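A possible version of this example is sketched below, assuming that only positive integers are wanted (zero is excluded). The names `list_mults_comp`, `last_digits`, and `index_of` are illustrative additions.

```python
# Loop version
list_mults = []
for n in range(1, 4000):
    if n % 7 == 0 and n % 13 == 0:
        list_mults.append(n)

# Comprehension version
list_mults_comp = [n for n in range(1, 4000) if n % 7 == 0 and n % 13 == 0]

# Set comprehension: curly braces, unique values only
last_digits = {n % 10 for n in list_mults}

# Dict comprehension: key: value pairs inside curly braces
index_of = {n: n // 91 for n in list_mults}
```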
Apart from comprehensions, there are several useful built-in sequence functions for working with lists and other structures. One of them is enumerate, which we have already covered. Other useful built-ins are sorted, reversed, and zip.
The built-in sorted returns a new sorted list. You can provide sorted with a key: a function that, applied to each element, returns the value used for sorting.
In the particular case of lists, you can sort them using the sort method, which sorts the list in place.
The reversed built-in provides an iterator that traverses a sequence in reverse order.
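The sorting and reversing tools described above can be sketched with a small, made-up list of words:

```python
words = ["pear", "fig", "banana", "kiwi"]

alphabetical = sorted(words)            # new list; words is unchanged
by_length = sorted(words, key=len)      # key function decides the order
                                        # (ties keep their original order)

words.sort(reverse=True)                # in place; the method returns None

# reversed gives an iterator, so wrap it in list() to see the values
countdown = list(reversed([1, 2, 3]))
```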
The zip built-in pairs up the elements of two or more given sequences. In Python 3 the output is an iterator of tuples, which can be turned into a list with list.
This comes in quite handy for defining hashes from two sequences.
It is also used to iterate in a loop over the elements of several sequences.
For example
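A minimal sketch of these uses of zip, with hypothetical data:

```python
symbols = ["H", "He", "Li"]
numbers = [1, 2, 3]

# zip returns an iterator of tuples; list() materializes it
pairs = list(zip(symbols, numbers))

# Defining a hash from two sequences
atomic_number = dict(zip(symbols, numbers))

# Looping over several sequences at once
lines = []
for symbol, z in zip(symbols, numbers):
    lines.append(f"{symbol} has Z = {z}")
```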
You can also transform a native Python structure into a NumPy ndarray using the np.array function.
NumPy makes an educated guess to assign the best-fitting type to the data.
You can also apply np.array to a NumPy ndarray; in this way you obtain a copy of the initial data and not a reference to them. A similar function is np.asarray, but if its input is already a NumPy ndarray it does not copy it.
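The difference between the two functions can be checked with a short sketch. It assumes NumPy is installed, and the array names are illustrative.

```python
import numpy as np

a = np.array([1, 2, 3])        # all integers: an integer dtype is chosen
b = np.array([1, 2.5, 3])      # one float is enough to get float64

copy_of_b = np.array(b)        # new array with its own data
same_as_b = np.asarray(b)      # already an ndarray: no copy is made

copy_of_b[0] = 100.0           # does not affect b
```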
Python Functions
Basic concepts
Defining functions lets you wrap code for later reuse, which makes life simpler and greatly helps with organization and optimization. Let's start with a very simple function that converts a temperature from kelvin to degrees Celsius. A function definition starts with the def keyword, and the function returns its result(s) with the return keyword. If there is no return statement, the returned value is None.
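A minimal sketch of such a function follows. The function names are assumptions, since the original code is not shown.

```python
def kelvin_to_celsius(temp_k):
    # Convert a temperature from kelvin to degrees Celsius
    return temp_k - 273.15

def no_return(temp_k):
    # The value is computed but never returned, so the call yields None
    temp_k - 273.15

freezing = kelvin_to_celsius(273.15)
nothing = no_return(273.15)
```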
Another simple function converts degrees Fahrenheit to kelvin and adds a docstring with information about the function.
The docstring is the (possibly multiline) string placed just after the function definition line that contains relevant information about the function for the end user. It can be accessed with the function attribute __doc__.
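One way to write this function with a docstring is sketched here. The function name and the docstring wording are illustrative.

```python
def fahrenheit_to_kelvin(temp_f):
    """Convert a temperature from degrees Fahrenheit to kelvin.

    Example: fahrenheit_to_kelvin(32.0) returns 273.15
    """
    return (temp_f - 32.0) * 5.0 / 9.0 + 273.15

# The docstring is stored in the __doc__ attribute
doc = fahrenheit_to_kelvin.__doc__
```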
We can define a function inside another function. This is shown in the next example, which computes the body mass index that we used as an example when we explained conditionals.
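A sketch of a nested function for the body mass index is given below. The function names and the category thresholds (the commonly quoted 18.5, 25, and 30 kg/m² limits) are assumptions, since the original example from the conditionals lesson is not reproduced here.

```python
def bmi_report(weight_kg, height_m):
    """Return the body mass index (kg/m**2) and a category label."""

    def classify(bmi):
        # Inner function: only visible inside bmi_report
        if bmi < 18.5:
            return "underweight"
        elif bmi < 25.0:
            return "normal"
        elif bmi < 30.0:
            return "overweight"
        return "obese"

    bmi = weight_kg / height_m**2
    return bmi, classify(bmi)

value, label = bmi_report(70.0, 1.75)
```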
We can also use functions to benchmark loops versus list comprehensions. We define two functions that compute the squares of the first N integers and benchmark them using the magic function %timeit.
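The comparison can be sketched as follows. Because %timeit only works inside IPython or a notebook, this sketch swaps in the standard-library timeit module so that it runs as a plain script. The function names and the values of N and the repetition count are arbitrary choices.

```python
import timeit

def squares_loop(n):
    # Build the list with an explicit loop
    result = []
    for i in range(n):
        result.append(i**2)
    return result

def squares_comprehension(n):
    # Build the same list with a comprehension
    return [i**2 for i in range(n)]

# Total time in seconds for 200 calls of each function
t_loop = timeit.timeit(lambda: squares_loop(1000), number=200)
t_comp = timeit.timeit(lambda: squares_comprehension(1000), number=200)
print(f"loop: {t_loop:.4f} s, comprehension: {t_comp:.4f} s")
```

In a notebook the equivalent calls would be `%timeit squares_loop(1000)` and `%timeit squares_comprehension(1000)`.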
Note that range and np.arange are not equivalent. The first provides a lazy range object that generates its values on demand, while the second builds the full array in memory, with an extra burden for the system. Notice the difference with
Of course, the vectorized calculation with Numpy is way faster than the previous two
We can define these functions in an external file and run the file from the notebook using the magic function %run.
A function can have multiple arguments as well as multiple return statements (only one of them will be executed in a given invocation). There may also be no explicit return, in which case the function returns None.
Positional and keyword arguments
With regard to arguments, there are two argument types: positional and keyword arguments. Both can be found in the following example where we define a function that computes the saturation vapor pressure of water vapor over liquid water or ice for a given temperature.
The Temp argument is a positional one and the ice argument is a keyword argument. Keyword arguments always follow positional ones and are not mandatory. Whenever they are not provided, the default value is assumed, and when several are passed by name their order is not relevant. Positional arguments can also be passed by name in the invocation to increase code readability.
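A sketch of such a function is given below. The original formula is not shown, so this version assumes a Magnus-type approximation with commonly quoted coefficients, a temperature in degrees Celsius, and a result in hPa. Only the argument names Temp and ice come from the text.

```python
import math

def saturation_vapor_pressure(Temp, ice=False):
    """Saturation vapor pressure (hPa) for Temp in degrees Celsius.

    Assumed Magnus-type approximation; coefficients are illustrative.
    """
    if ice:
        a, b = 22.46, 272.62    # over ice
    else:
        a, b = 17.62, 243.12    # over liquid water
    return 6.112 * math.exp(a * Temp / (b + Temp))

over_water = saturation_vapor_pressure(-10.0)               # default ice=False
over_ice = saturation_vapor_pressure(-10.0, ice=True)       # keyword given
by_name = saturation_vapor_pressure(ice=True, Temp=-10.0)   # any order by name
```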
Frequently, None is used as the default value of keyword arguments. This helps prevent unforeseen side effects that can arise whenever a mutable object is used as the default value of a parameter. Such side effects stem from the fact that the default value is evaluated only once, when the def statement is executed and the function is created, and not each time the function is called.
Therefore, when a function is defined, the function object gets an attribute called __defaults__ holding references to the default values of its keyword arguments. These are not recreated when the function is called, which can give rise to unexpected situations. Let's see an example of this.
As mentioned above, this can be solved by using None as the default and creating the mutable object at run time.
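Both the pitfall and the fix can be reproduced with a short sketch. The function names are hypothetical.

```python
def append_bad(item, bucket=[]):
    # The same list object is reused on every call without bucket
    bucket.append(item)
    return bucket

first = append_bad(1)
second = append_bad(2)          # the list from the first call grows
shared = append_bad.__defaults__[0]

def append_good(item, bucket=None):
    # A fresh list is created at run time, on each call that needs it
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket

third = append_good(1)
fourth = append_good(2)
```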
There are situations in programming where we do not know the precise number of positional parameters of a function. This is solved in Python by adding an asterisk (*) in front of the last positional parameter name, which then collects the extra arguments in a tuple. For example, we can compute the geometric mean of a set of values as follows.
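A minimal sketch of such a function, assuming positive input values and at least one argument (math.prod requires Python 3.8 or later):

```python
import math

def geometric_mean(*values):
    """Geometric mean of any number of positive values."""
    # values is a tuple holding all the positional arguments
    return math.prod(values) ** (1.0 / len(values))

two = geometric_mean(2.0, 8.0)
three = geometric_mean(1.0, 3.0, 9.0)
```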
You can also use the star operator in a function invocation. In that case the operator unpacks the list, passing each element as a separate argument. Therefore, if you need to run the previously defined function over a list you can do it as follows.
The star operator can be used together with zip to easily restructure lists, for example to extract the different lists that have been previously zipped.
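Both uses of the star operator are sketched below. The geometric mean function is redefined so that the block runs on its own, and the data are made up.

```python
import math

def geometric_mean(*values):
    return math.prod(values) ** (1.0 / len(values))

# Unpacking a list in the call: same as geometric_mean(2.0, 8.0)
data = [2.0, 8.0]
gm = geometric_mean(*data)

# Undoing a zip: each original sequence comes back as a tuple
pairs = [("H", 1), ("He", 2), ("Li", 3)]
symbols, numbers = zip(*pairs)
```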
We can also have an undetermined number of keyword parameters in a function. It is possible to pass them as a hash using the double asterisk, **.
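A short sketch of the double asterisk, both in the definition and in the call. The function and its arguments are hypothetical.

```python
def describe(name, **properties):
    # properties is a dict with all the extra keyword arguments
    parts = [f"{key}={value}" for key, value in properties.items()]
    return f"{name}: " + ", ".join(parts)

line = describe("Fe", Z=26, mass=55.845)

# A dict can be unpacked into keyword arguments with ** in the call
options = {"Z": 8, "mass": 15.999}
line_from_dict = describe("O", **options)
```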
Returning multiple values
A function can return several values and not only one. The values are returned as a tuple and can be assigned to different variables or to a data structure.
But you can also return values as a dictionary
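The two ways of returning several values can be sketched as follows, with illustrative function names:

```python
def min_max(values):
    # Several values after return are packed into a tuple
    return min(values), max(values)

lowest, highest = min_max([3, 1, 4, 1, 5])   # unpack into two variables
both = min_max([3, 1, 4, 1, 5])              # or keep the tuple

def min_max_dict(values):
    # Returning a dict gives each value a name
    return {"min": min(values), "max": max(values)}

stats = min_max_dict([3, 1, 4, 1, 5])
```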
Variables scope
Another aspect of interest is that any variable defined in a function belongs by default to a local namespace, which is destroyed once the function returns.
Note that once a variable is assigned anywhere in a function it is treated as local throughout that function, so we cannot refer to it before the assignment.
Variables can be declared with the global keyword, which solves the previous error, but one should be careful with this. Often, global variables increase code complexity without offering much in return.
Note that we have changed the value of the variable inside the function.
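The three situations described above can be sketched in a single script, assuming it is run at module level. The variable `counter` and the function names are hypothetical.

```python
counter = 0  # module-level (global) variable

def local_only():
    counter = 10      # new local variable; the global one is untouched
    return counter

def read_before_assignment():
    try:
        # counter is local here because it is assigned further down
        value = counter + 1
    except UnboundLocalError:
        value = None
    counter = 5
    return value

def bump_global():
    global counter    # refer to the module-level variable
    counter += 1
    return counter

local_result = local_only()
after_local = counter                    # still 0
failed_read = read_before_assignment()   # None: the read raised an error
after_bump = bump_global()               # the global variable was changed
```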
Functions are references
A function name is a reference to the function. Therefore, we can assign multiple names to the same function, and if some of these names are deleted we can still access the function through the rest of them.
Then, we can pass function names (references) as arguments to functions. Let's have a look at a simple example.
By the way, if you try to print the function using the f argument you will obtain the representation of the function object rather than its name. You can access the function name using the __name__ attribute as follows.
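These ideas can be sketched with two small, hypothetical functions:

```python
def double(x):
    return 2 * x

# A second name for the same function object
twice = double
del double                 # the function survives through the other name
still_works = twice(21)

def apply(f, x):
    # f is a reference to a function; f.__name__ is the name it was defined with
    return f.__name__, f(x)

name, value = apply(twice, 5)
```

Note that `name` holds the name given in the def statement, not the alias used in the call.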
Another example
A function can also output a reference to a new function. A simple example of this is as follows
We can use several arguments too
Using the asterisk notation we can also deal with an unknown number of parameters, as in this case, where we are given a certain number of terms in the Taylor expansion of a given function (here, the sine function).
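A sketch of functions that return new functions is given below. The factory names and the way the Taylor coefficients are passed are assumptions made for illustration.

```python
import math

def make_power(exponent):
    # Returns a new function that remembers exponent
    def power(x):
        return x ** exponent
    return power

cube = make_power(3)

def make_polynomial(*coefficients):
    """Return p(x) = c0 + c1*x + c2*x**2 + ... for the given coefficients."""
    def polynomial(x):
        return sum(c * x**k for k, c in enumerate(coefficients))
    return polynomial

# Taylor coefficients of sin(x) around 0, up to the x**7 term
sine_coeffs = [0.0, 1.0, 0.0, -1.0 / math.factorial(3),
               0.0, 1.0 / math.factorial(5), 0.0, -1.0 / math.factorial(7)]
approx_sin = make_polynomial(*sine_coeffs)

error = abs(approx_sin(0.5) - math.sin(0.5))
```

Adding more coefficients to the call improves the approximation without changing `make_polynomial`.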
Some advice for future programming
Document your code generously. Include docstrings explaining what a function does, what its arguments are, and what the output format is, and provide an example in the docstring so that the function can be tested. Keep in mind the well-known quote: Documentation is like sex; when it's good, it's very, very good, and when it's bad, it's better than nothing.
Also use comments in your code to explain what you are doing (see the previous item). Include expected physical units in your comments.
Use clear variable names that indicate their purpose. If you are debugging code written some time ago, it is a great help to find a variable named valenceneutrons rather than vn or, worse, n or, even worse, x.
Follow the motto Don't duplicate, reuse often. This can be applied in different contexts. For example, if there is a constant in your program whose value is 34, define it at the beginning of the code (irrep_label = 34) and then use the variable name in the code. When the day arrives that 34 needs to be changed to 30, you do not have to find and replace every instance of 34 in your code, a bug-prone task, and you only need to change the initial variable assignment.
Again the motto Don't duplicate, reuse often. If you find yourself repeating lines of code in different functions, create a function and call it. This is similar to the previous item, but even more important, as the bugs in this case are more difficult to trap.
Before coding, stop for a while, think carefully about the task you are trying to solve and, if it is a complex one, break it into simpler steps and deal with each one of them. Check your code using simple cases, ideally ones whose solution you know.
When an error happens, read the error message and your code carefully.
Insert diagnostics in your code that may depend on a keyword argument (e.g. verbose = False), printing them for a given argument value.
Practice RDD (Rubber Duck Debugging). Ask your personal guru. Warning: gurus can be hot-tempered. You can also ask in a forum like stackoverflow. Be polite and read (and follow) the forum policy. Sometimes forum gurus can also be hot-tempered.
If your code is complex enough, you might want to learn about breakpoints.
Exercises
Exercise 4.1: Prepare a file with a function called fence such that, given two strings as arguments, string_1 = "aaa" and string_2 = "bbb", the output is aaa_bbb_aaa. In the same file define a second function called outer that, given a string, returns another string made up of just the first and last characters of its input. Therefore, if the input is Betis the function output should be Bs. Include in both cases a docstring with a brief function description and an example. Load the functions from the file and check the output of this statement: print(outer(fence('carbon', '+'))).
Exercise 4.2: Write a function that generates a random password. The password should have a random length of between 10 and 12 random characters from positions 33 to 122 in the ASCII table. Your function will not take any parameters and will return the password as its only result. Write another function that checks whether the password has at least two lowercase, two uppercase, and two digit characters. The output of this second function will be a compliant password and the number of times the original function has been run to obtain it.
Exercise 4.3: Gaussian distributed data are frequently normalized to have a mean value equal to zero and a standard deviation equal to one by subtracting the actual mean value and dividing by the standard deviation of the dataset. Making use of the mean and std NumPy methods, define a function that takes as arguments a data vector, a new mean value, and a new standard deviation value and transforms the original set of data into a new set with the new mean as its average value and with a dispersion given by the new standard deviation value. By default the function should standardize the data to mean = 0 and sdev = 1.
Exercise 4.4: You can approximate the cubic root of a number a as x(n+1) = 2 x(n)/3 + a/(3 x(n)²) with x(0) = a/3. Prepare a function that computes the cubic root of a given number, iterating until the difference between successive computed values is less than a given threshold (e.g. 1E-8), and compare the obtained value with the value of a**(1/3).
Exercise 4.5: Define a function that reads temperature data from the sample Cyprus dataset and prepares graphics. Prepare a function, with a helpful docstring and comments, that for a given list of file names prepares a plot with three columns for each data file: the first including the max, min, and mean monthly temperatures, the second the max, min, and mean annual temperatures, and the third depicting the monthly temperatures for all years.
Exercise 4.6: The sieve of Eratosthenes is an ancient algorithm (approx. 3rd century BCE) to find all prime numbers up to a given limit. The implementation of this algorithm in pseudocode is as follows:
algorithm Sieve of Eratosthenes is
    input: an integer n > 1.
    output: all prime numbers from 2 through n.
Implement this algorithm in a Python function.