Contents | Previous (3.1 Scripting) | Next (3.3 Error Checking)

3.2 More on Functions

Although functions were introduced earlier, very few details were provided on how they actually work at a deeper level. This section aims to fill in some gaps and discuss matters such as calling conventions, scoping rules, and more.

Calling a Function

Consider this function:

def read_prices(filename, debug):
    ...

You can call the function with positional arguments:

prices = read_prices('prices.csv', True)

Or you can call the function with keyword arguments:

prices = read_prices(filename='prices.csv', debug=True)

Default Arguments

Sometimes you want an argument to be optional. If so, assign a default value in the function definition.

def read_prices(filename, debug=False):
    ...

If a default value is assigned, the argument is optional in function calls.

d = read_prices('prices.csv')
e = read_prices('prices.dat', True)

Note: Arguments with defaults must appear at the end of the arguments list (all non-optional arguments go first).
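As a rough sketch of the rule above, the following uses a placeholder body for read_prices() (invented here for illustration, not the course's real function) and shows that Python rejects a definition in which a non-default argument follows a default one. The bad definition is passed through compile() only so that the error can be caught and printed.

```python
def read_prices(filename, debug=False):
    # Placeholder body (an assumption for this sketch): echo the arguments.
    return (filename, debug)

# The optional argument can be omitted, passed by position, or by keyword.
print(read_prices('prices.csv'))               # ('prices.csv', False)
print(read_prices('prices.dat', True))         # ('prices.dat', True)
print(read_prices('prices.dat', debug=True))   # ('prices.dat', True)

# A non-default argument after a default one is a syntax error.
bad_source = "def read_prices(debug=False, filename): pass"
try:
    compile(bad_source, '<example>', 'exec')
    rejected = False
except SyntaxError as e:
    rejected = True
    print('SyntaxError:', e.msg)
```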

Prefer keyword arguments for optional arguments

Compare and contrast these two different calling styles:

parse_data(data, False, True) # ?????

parse_data(data, ignore_errors=True)
parse_data(data, debug=True)
parse_data(data, debug=True, ignore_errors=True)

In most cases, keyword arguments improve code clarity, especially for arguments that serve as flags or that relate to optional features.
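To make the comparison concrete, here is a minimal sketch. The definition of parse_data() is hypothetical: the text does not show it, so the parameter order (ignore_errors first, then debug) and the body are assumptions.

```python
def parse_data(data, ignore_errors=False, debug=False):
    # Hypothetical stand-in: return the flags so the calls can be compared.
    return {'ignore_errors': ignore_errors, 'debug': debug}

data = [1, 2, 3]

# Positional flags: the reader must look up the definition to decode them.
a = parse_data(data, False, True)

# Keyword flags: the same call, but self-describing.
b = parse_data(data, debug=True)

print(a == b)    # True
```

Under the assumed parameter order both calls do the same thing, but only the second one says so at the call site.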

Design Best Practices

Always give short but meaningful names to function arguments.

Someone using a function may want to use the keyword calling style.

d = read_prices('prices.csv', debug=True)

Python development tools will show the names in help features and documentation.

Returning Values

The return statement returns a value.

def square(x):
    return x * x

If no return value is given or return is missing, None is returned.

def bar(x):
    statements
    return

a = bar(4)      # a = None

# OR
def foo(x):
    statements  # No `return`

b = foo(4)      # b = None

Multiple Return Values

Functions can only return one value. However, a function may return multiple values by returning them in a tuple.

def divide(a,b):
    q = a // b      # Quotient
    r = a % b       # Remainder
    return q, r     # Return a tuple

Usage example:

x, y = divide(37,5) # x = 7, y = 2

x = divide(37, 5)   # x = (7, 2)
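
The following sketch repeats divide() from above to show that the result really is a single tuple object, and that unpacking requires the number of names to match the number of items.

```python
def divide(a, b):
    q = a // b      # Quotient
    r = a % b       # Remainder
    return q, r     # Return a tuple

result = divide(37, 5)
print(type(result).__name__)    # tuple
print(result[0], result[1])     # 7 2

# Unpacking into the wrong number of names raises ValueError.
try:
    x, y, z = divide(37, 5)
    unpack_failed = False
except ValueError as e:
    unpack_failed = True
    print('ValueError:', e)
```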

Variable Scope

Programs assign values to variables.

x = value # Global variable

def foo():
    y = value # Local variable

Variable assignments occur outside and inside function definitions. Variables defined outside are “global”. Variables inside a function are “local”.

Local Variables

Variables assigned inside functions are private.

def read_portfolio(filename):
    portfolio = []
    for line in open(filename):
        fields = line.split(',')
        s = (fields[0], int(fields[1]), float(fields[2]))
        portfolio.append(s)
    return portfolio

In this example, filename, portfolio, line, fields and s are local variables. Those variables are not retained or accessible after the function call.

>>> stocks = read_portfolio('portfolio.csv')
>>> fields
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'fields' is not defined
>>>

Locals also can’t conflict with variables found elsewhere.
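A small sketch of that last point, using a made-up parse_line() function: a local named fields inside the function leaves a global with the same name untouched.

```python
fields = 'global value'

def parse_line(line):
    fields = line.split(',')    # Local: unrelated to the global `fields`
    return len(fields)

n = parse_line('AA,100,32.20')
print(n)         # 3
print(fields)    # global value
```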

Global Variables

Functions can freely access the values of globals defined in the same file.

name = 'Dave'

def greeting():
    print('Hello', name)  # Using `name` global variable

However, a plain assignment inside a function doesn’t modify a global; it creates a new local variable instead:

name = 'Dave'

def spam():
    name = 'Guido'

spam()
print(name) # prints 'Dave'

Remember: All assignments in functions are local.

Modifying Globals

If you must modify a global variable you must declare it as such.

name = 'Dave'

def spam():
    global name
    name = 'Guido' # Changes the global name above

The global declaration must appear before its use and the corresponding variable must exist in the same file as the function. Having seen this, know that it is considered poor form. In fact, try to avoid global entirely if you can. If you need a function to modify some kind of state outside of the function, it’s better to use a class instead (more on this later).
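As a preview of the class-based alternative, here is one possible sketch. The Greeter class and its method names are invented for illustration; classes are covered properly later in the course.

```python
class Greeter:
    def __init__(self, name):
        self.name = name            # State lives on the object, not in a global

    def rename(self, new_name):
        self.name = new_name        # Modifies the object's own state

    def greeting(self):
        return 'Hello ' + self.name

g = Greeter('Dave')
g.rename('Guido')
print(g.greeting())    # Hello Guido
```

The state that changes is attached to g, so no global declaration is needed and several independent Greeter objects can coexist.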

Argument Passing

When you call a function, the argument variables are names that refer to the passed values. These values are NOT copies (see section 2.7). If mutable data types are passed (e.g. lists, dicts), they can be modified in-place.

def foo(items):
    items.append(42)    # Modifies the input object

a = [1, 2, 3]
foo(a)
print(a)                # [1, 2, 3, 42]

Key point: Functions don’t receive a copy of the input arguments.
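If the caller needs its own data left alone, one option is to pass a copy explicitly. The sketch below assumes a shallow copy made with list() is sufficient, which holds for a flat list of numbers; add_marker() is a made-up name.

```python
def add_marker(items):
    items.append(42)        # Modifies whatever list object was passed in

a = [1, 2, 3]

add_marker(list(a))         # Pass a shallow copy; `a` is untouched
print(a)                    # [1, 2, 3]

add_marker(a)               # Pass the list itself; `a` is modified
print(a)                    # [1, 2, 3, 42]
```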

Reassignment vs Modifying

Make sure you understand the subtle difference between modifying a value and reassigning a variable name.

def foo(items):
    items.append(42)    # Modifies the input object

a = [1, 2, 3]
foo(a)
print(a)                # [1, 2, 3, 42]

# VS
def bar(items):
    items = [4,5,6]    # Changes local `items` variable to point to a different object

b = [1, 2, 3]
bar(b)
print(b)                # [1, 2, 3]

Reminder: Variable assignment never overwrites memory. The name is merely bound to a new value.
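When the intent is to give the caller a different object, a common approach is to return it and let the caller rebind its own name. A minimal sketch, with replace_items() as a made-up name:

```python
def replace_items(items):
    items = [4, 5, 6]       # Rebinds the local name only
    return items            # Hand the new object back to the caller

b = [1, 2, 3]
original = b                # Second name for the same list object

b = replace_items(b)        # The caller rebinds its own name
print(b)                    # [4, 5, 6]
print(original)             # [1, 2, 3]
```

The original list was never overwritten; the name b simply refers to a different object afterwards.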

Exercises

This set of exercises has you implement what is, perhaps, the most powerful and difficult part of the course. There are a lot of steps and many concepts from past exercises are put together all at once. The final solution is only about 25 lines of code, but take your time and make sure you understand each part.

A central part of your report.py program focuses on the reading of CSV files. For example, the function read_portfolio() reads a file containing rows of portfolio data and the function read_prices() reads a file containing rows of price data. In both of those functions, there are a lot of low-level “fiddly” bits and similar features. For example, they both open a file and wrap it with the csv module and they both convert various fields into new types.

If you were doing a lot of file parsing for real, you’d probably want to clean some of this up and make it more general purpose. That’s our goal.

Start this exercise by opening the file called Work/fileparse.py. This is where we will be doing our work.

Exercise 3.3: Reading CSV Files

To start, let’s just focus on the problem of reading a CSV file into a list of dictionaries. In the file fileparse.py, define a function that looks like this:

# fileparse.py
import csv

def parse_csv(filename):
    '''
    Parse a CSV file into a list of records
    '''
    with open(filename) as f:
        rows = csv.reader(f)

        # Read the file headers
        headers = next(rows)
        records = []
        for row in rows:
            if not row:    # Skip rows with no data
                continue
            record = dict(zip(headers, row))
            records.append(record)

    return records

This function reads a CSV file into a list of dictionaries while hiding the details of opening the file, wrapping it with the csv module, ignoring blank lines, and so forth.

Try it out:

Hint: python3 -i fileparse.py.

>>> portfolio = parse_csv('Data/portfolio.csv')
>>> portfolio
[{'price': '32.20', 'name': 'AA', 'shares': '100'}, {'price': '91.10', 'name': 'IBM', 'shares': '50'}, {'price': '83.44', 'name': 'CAT', 'shares': '150'}, {'price': '51.23', 'name': 'MSFT', 'shares': '200'}, {'price': '40.37', 'name': 'GE', 'shares': '95'}, {'price': '65.10', 'name': 'MSFT', 'shares': '50'}, {'price': '70.44', 'name': 'IBM', 'shares': '100'}]
>>>

This is good except that you can’t do any kind of useful calculation with the data because everything is represented as a string. We’ll fix this shortly, but let’s keep building on it.

Exercise 3.4: Building a Column Selector

In many cases, you’re only interested in selected columns from a CSV file, not all of the data. Modify the parse_csv() function so that it optionally allows user-specified columns to be picked out as follows:

>>> # Read all of the data
>>> portfolio = parse_csv('Data/portfolio.csv')
>>> portfolio
[{'price': '32.20', 'name': 'AA', 'shares': '100'}, {'price': '91.10', 'name': 'IBM', 'shares': '50'}, {'price': '83.44', 'name': 'CAT', 'shares': '150'}, {'price': '51.23', 'name': 'MSFT', 'shares': '200'}, {'price': '40.37', 'name': 'GE', 'shares': '95'}, {'price': '65.10', 'name': 'MSFT', 'shares': '50'}, {'price': '70.44', 'name': 'IBM', 'shares': '100'}]

>>> # Read only some of the data
>>> shares_held = parse_csv('Data/portfolio.csv', select=['name','shares'])
>>> shares_held
[{'name': 'AA', 'shares': '100'}, {'name': 'IBM', 'shares': '50'}, {'name': 'CAT', 'shares': '150'}, {'name': 'MSFT', 'shares': '200'}, {'name': 'GE', 'shares': '95'}, {'name': 'MSFT', 'shares': '50'}, {'name': 'IBM', 'shares': '100'}]
>>>

An example of a column selector was given in Exercise 2.23. However, here’s one way to do it:

# fileparse.py
import csv

def parse_csv(filename, select=None):
    '''
    Parse a CSV file into a list of records
    '''
    with open(filename) as f:
        rows = csv.reader(f)

        # Read the file headers
        headers = next(rows)

        # If a column selector was given, find indices of the specified columns.
        # Also narrow the set of headers used for resulting dictionaries
        if select:
            indices = [headers.index(colname) for colname in select]
            headers = select
        else:
            indices = []

        records = []
        for row in rows:
            if not row:    # Skip rows with no data
                continue
            # Filter the row if specific columns were selected
            if indices:
                row = [ row[index] for index in indices ]

            # Make a dictionary
            record = dict(zip(headers, row))
            records.append(record)

    return records

There are a number of tricky bits to this part. Probably the most important one is the mapping of the column selections to row indices. For example, suppose the input file had the following headers:

>>> headers = ['name', 'date', 'time', 'shares', 'price']
>>>

Now, suppose the selected columns were as follows:

>>> select = ['name', 'shares']
>>>

To perform the proper selection, you have to map the selected column names to column indices in the file. That’s what this step is doing:

>>> indices = [headers.index(colname) for colname in select ]
>>> indices
[0, 3]
>>>

In other words, “name” is column 0 and “shares” is column 3. When you read a row of data from the file, the indices are used to filter it:

>>> row = ['AA', '6/11/2007', '9:50am', '100', '32.20' ]
>>> row = [ row[index] for index in indices ]
>>> row
['AA', '100']
>>>

Exercise 3.5: Performing Type Conversion

Modify the parse_csv() function so that it optionally allows type-conversions to be applied to the returned data. For example:

>>> portfolio = parse_csv('Data/portfolio.csv', types=[str, int, float])
>>> portfolio
[{'price': 32.2, 'name': 'AA', 'shares': 100}, {'price': 91.1, 'name': 'IBM', 'shares': 50}, {'price': 83.44, 'name': 'CAT', 'shares': 150}, {'price': 51.23, 'name': 'MSFT', 'shares': 200}, {'price': 40.37, 'name': 'GE', 'shares': 95}, {'price': 65.1, 'name': 'MSFT', 'shares': 50}, {'price': 70.44, 'name': 'IBM', 'shares': 100}]

>>> shares_held = parse_csv('Data/portfolio.csv', select=['name', 'shares'], types=[str, int])
>>> shares_held
[{'name': 'AA', 'shares': 100}, {'name': 'IBM', 'shares': 50}, {'name': 'CAT', 'shares': 150}, {'name': 'MSFT', 'shares': 200}, {'name': 'GE', 'shares': 95}, {'name': 'MSFT', 'shares': 50}, {'name': 'IBM', 'shares': 100}]
>>>

You already explored this in Exercise 2.24. You’ll need to insert the following fragment of code into your solution:

...
if types:
    row = [func(val) for func, val in zip(types, row) ]
...
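
To see what that fragment does on its own, here is a small standalone sketch that applies it to a single row. The sample row and headers are taken from the portfolio data shown earlier.

```python
headers = ['name', 'shares', 'price']
types = [str, int, float]
row = ['AA', '100', '32.20']

# Pair each conversion function with the value in the same column and call it.
converted = [func(val) for func, val in zip(types, row)]
print(converted)    # ['AA', 100, 32.2]

record = dict(zip(headers, converted))
print(record)       # {'name': 'AA', 'shares': 100, 'price': 32.2}
```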

Exercise 3.6: Working without Headers

Some CSV files don’t include any header information. For example, the file prices.csv looks like this:

"AA",9.22
"AXP",24.85
"BA",44.85
"BAC",11.27
...

Modify the parse_csv() function so that it can work with such files by creating a list of tuples instead. For example:

>>> prices = parse_csv('Data/prices.csv', types=[str,float], has_headers=False)
>>> prices
[('AA', 9.22), ('AXP', 24.85), ('BA', 44.85), ('BAC', 11.27), ('C', 3.72), ('CAT', 35.46), ('CVX', 66.67), ('DD', 28.47), ('DIS', 24.22), ('GE', 13.48), ('GM', 0.75), ('HD', 23.16), ('HPQ', 34.35), ('IBM', 106.28), ('INTC', 15.72), ('JNJ', 55.16), ('JPM', 36.9), ('KFT', 26.11), ('KO', 49.16), ('MCD', 58.99), ('MMM', 57.1), ('MRK', 27.58), ('MSFT', 20.89), ('PFE', 15.19), ('PG', 51.94), ('T', 24.79), ('UTX', 52.61), ('VZ', 29.26), ('WMT', 49.74), ('XOM', 69.35)]
>>>

To make this change, you’ll need to modify the code so that the first line of data isn’t interpreted as a header line. Also, you’ll need to make sure you don’t create dictionaries as there are no longer any column names to use for keys.
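The core of the headerless case can be sketched in isolation as follows. This is not the full parse_csv() solution: it reads from an in-memory string via io.StringIO instead of a file, and the sample text is just the first two lines of prices.csv shown above.

```python
import csv
import io

text = '"AA",9.22\n"AXP",24.85\n'
types = [str, float]

rows = csv.reader(io.StringIO(text))
records = []
for row in rows:                # No next(rows): the first line is data
    if not row:                 # Skip rows with no data
        continue
    row = [func(val) for func, val in zip(types, row)]
    records.append(tuple(row))  # No column names, so build a tuple

print(records)    # [('AA', 9.22), ('AXP', 24.85)]
```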

Exercise 3.7: Picking a Different Column Delimiter

Although CSV files are pretty common, it’s also possible that you could encounter a file that uses a different column separator such as a tab or space. For example, the file Data/portfolio.dat looks like this:

name shares price
"AA" 100 32.20
"IBM" 50 91.10
"CAT" 150 83.44
"MSFT" 200 51.23
"GE" 95 40.37
"MSFT" 50 65.10
"IBM" 100 70.44

The csv.reader() function allows a different column delimiter to be given as follows:

rows = csv.reader(f, delimiter=' ')

Modify your parse_csv() function so that it also allows the delimiter to be changed.

For example:

>>> portfolio = parse_csv('Data/portfolio.dat', types=[str, int, float], delimiter=' ')
>>> portfolio
[{'price': 32.2, 'name': 'AA', 'shares': 100}, {'price': 91.1, 'name': 'IBM', 'shares': 50}, {'price': 83.44, 'name': 'CAT', 'shares': 150}, {'price': 51.23, 'name': 'MSFT', 'shares': 200}, {'price': 40.37, 'name': 'GE', 'shares': 95}, {'price': 65.1, 'name': 'MSFT', 'shares': 50}, {'price': 70.44, 'name': 'IBM', 'shares': 100}]
>>>
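
The effect of the delimiter argument can be checked without touching parse_csv() at all. The sketch below feeds csv.reader() an in-memory copy of the first lines of Data/portfolio.dat (via io.StringIO, an assumption made so the example needs no file).

```python
import csv
import io

text = 'name shares price\n"AA" 100 32.20\n"IBM" 50 91.10\n'

rows = csv.reader(io.StringIO(text), delimiter=' ')
headers = next(rows)
first = next(rows)

print(headers)    # ['name', 'shares', 'price']
print(first)      # ['AA', '100', '32.20']
```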

Commentary

If you’ve made it this far, you’ve created a nice library function that’s genuinely useful. You can use it to parse arbitrary CSV files, select columns of interest, and perform type conversions without having to worry too much about the inner workings of files or the csv module.
