A postdoc at the University of Wollongong, working on applied spatio-temporal statistical problems with Noel Cressie and Andrew Zammit-Mangion.
Before that, I was a PhD student at the University of Western Australia.
Before that, I was a software engineer for 10 years.
Assumed programming knowledge
You have programmed before.
You can read basic R or Python.
If you’re at an earlier stage and want to learn more about these, see Danielle Navarro’s great textbook [1] or the R for Data Science book [2].
⚠️ Warning ⚠️
This tutorial is rather informal and unscholarly. It is full of opinions and observations from my own experience.
The hope is to provoke discussion on good coding practices within this community.
Please share your thoughts, opinions, rants, etc. in Slack at any time!
We will stop for questions and discussion a few times.
Motivation
There are several things that can (should?) keep a statistician from falling asleep at night.
One of those is Simpson’s paradox.
Another is having bugs 🐞 in your code.
Story time 📊 🐞
Reinhart, C. M.; Rogoff, K. S. (2010). Growth in a Time of Debt. American Economic Review, 100(2): 573–578. doi:10.1257/aer.100.2.573.
Their conclusions led several governments to implement financial austerity programs.
A student reproducing the work found that they’d made an error in a formula in their Excel spreadsheet, invalidating some of their empirical claims.
Story time
My first research project…
Your stories and a poll
If you’ve had a similar experience, feel free to share it in the Slack chat and we can discuss it a bit later.
🗳️ Now let’s just run a quick poll… 🗳️
Peer review: the crown jewel of science 👑
We peer review each other’s work prior to publication.
Keeps us honest
Encourages rigour
Ensures the work is useful
Catches errors
But it generally only applies to the written part of the work. If the numbers reported look plausible, there’s (usually) an assumption that they are computed correctly.
Code is a blind spot for peer review.
How big is the blind spot?
It’s hard to be objective about this, but take two examples:
One (short) chapter of my PhD thesis:
Lines of LaTeX: ~300
Lines of code: ~1,400 (~4.5x more)
Paper submitted based on a longer chapter:
Lines of LaTeX: ~1,000
Lines of code: ~13,000 (~13x more)
Bugs aren’t just a problem in academic work. Professional software developers invest huge effort into avoiding and fixing bugs.
Some gold standard strategies the best teams use:
Peer review: software is not released until the code has been read by multiple people
You can tweak the guidelines to suit you, but not following some style guide is like writing a paper with inconsistent grammar, or using two different citation styles in the same document (Bertolacci, 2020).
R style example
Non-compliant:
y <- x*2
if(y > 2) {
  z <- y + 2
}
sqrt_z <- sqrt (z)
Compliant:
y <- x * 2
if (y > 2) {
  z <- y + 2
}
sqrt_z <- sqrt(z)
The R package lintr [5] can check your code automatically:
example.R:1:7: style: Put spaces around all infix operators.
y <- x*2
~^~
example.R:2:3: style: Place a space before left parenthesis, except in a function call.
if(y > 2) {
^
example.R:5:11: style: Remove spaces before the left parenthesis in a function call.
sqrt_z <- sqrt (z)
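To run the check yourself, save the snippet as example.R and point lint() at it; output like the above is what lintr prints for the non-compliant version.

# Check a single file; lintr prints one message per style issue it finds
lintr::lint('example.R')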
Python style example
Non-compliant:
y = x*2
if y>2:
   z = y + 2
sqrt_z = sqrt (z)
Compliant:
y = x * 2
if y > 2:
    z = y + 2
sqrt_z = sqrt(z)
The program pycodestyle [6] checks if you are following the rules:
example.py:2:5: E225 missing whitespace around operator
example.py:3:4: E111 indentation is not a multiple of four
example.py:4:14: E211 whitespace before '('
Sometimes you knew what the code does when you wrote it, but a few months later…
MARS_RADIUS <- 3389.5  # in kilometres

hav_deg <- function(x) sin(x * pi / 360) ^ 2
cos_deg <- function(x) cos(x * pi / 180)

mars_haversine_dist <- function(locations, origin) {
  with(locations, {
    2 * MARS_RADIUS * asin(sqrt(
      hav_deg(latitude - origin['latitude'])
      + cos_deg(latitude) * cos_deg(origin['latitude'])
        * hav_deg(longitude - origin['longitude'])
    ))
  })
}

curiosity_location <- c(longitude = 137.5, latitude = -4.7)
perseverance_location <- c(longitude = 77.5, latitude = 18.1)

locations <- read.csv('locations.csv')
curiousity_dist <- mars_haversine_dist(locations, curiosity_location)
perseverance_dist <- mars_haversine_dist(locations, perseverance_location)

locations$nearest_rover <- 'none'
for (i in 1:nrow(locations)) {
  if (curiousity_dist[i] < perseverance_dist[i]) {
    locations$nearest_rover[i] <- 'curiousity'
  } else {
    locations$nearest_rover[i] <- 'perseverance'
  }
}
Comments
A rule of thumb: Comments in code should say what your code does, not explain how.
Unhelpful:
mars_haversine_dist <- function(
  locations, origin
) {
  # Take two times the radius times
  # the arcsine of the square root
  # of ...
  ...
}
Helpful:
# Calculate the distance in
# kilometres between two coordinates
# on Mars
mars_haversine_dist <- function(
  locations, origin
) {
  ...
}
The code should be clean enough that the how is obvious.
(I break this rule around 10% of the time.)
Long functions
A rule of thumb: functions should fit on one or two screenfuls.
The maximum length of a function is inversely proportional to the complexity and indentation level of that function. So, if you have a conceptually simple function that is just one long (but simple) case-statement, where you have to do lots of small things for a lot of different cases, it’s OK to have a longer function. - Linus Torvalds [7]
(I break this rule around 10% of the time.)
Long files
You should also organise your code into multiple well-named files.
A rule of thumb: code files should not be too long, say no longer than 1000-2000 lines.
(I break this rule less often.)
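As a rough sketch of what this can look like (the file names here are made up), reusable functions live in their own well-named files and the analysis script just source()s them:

# analysis.R: stays short because the functions live elsewhere
source('haversine.R')     # mars_haversine_dist() and its helpers
source('covariance.R')    # cov_exponential() and other covariance functions

locations <- read.csv('locations.csv')
distances <- mars_haversine_dist(locations, c(longitude = 137.5, latitude = -4.7))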
More clean code tips
Code for correctness and readability first.
Shorter isn’t necessarily better—short programs don’t run any faster!
Tab completion in your editor means you don’t have to type longer function and variable names every time.
Picking good names for things is hard, but it’s worth the effort.
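A tiny illustration of that last point, with made-up data: the two subsets below are identical, but only one of the names tells you what it holds.

locations <- data.frame(site = c('a', 'b', 'c'), year = c(2010, 2013, 2015))

d2 <- locations[locations$year > 2012, ]                # what is d2?
recent_locations <- locations[locations$year > 2012, ]  # self-explanatory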
Discussion on clean code 🗣️
Questions? Comments? Disagreements?
Feel free to add your own clean code tips to Slack!
n <- length(y)
sigma <- 1
log_likelihood <- (
  - n * log(2 * pi) / 2
  - n * log(sigma)
  - sum(y ^ 2 / sigma ^ 2) / 2
)
bic <- log(n) - 2 * log_likelihood
print(bic)
## [1] NA
No explicit error, but you probably don’t want that value to be NA. What’s the root cause?
dat <- data.frame(x = 1:10)
dat$y <- sin(df$x)
fit <- lm(y ~ x, data = dat)
print(coef(fit))
plot(dat$x, fitted(dat$y))
## Error in df$x: object of type
## 'closure' is not subsettable
Explicit error; annoying, but the offending line is obvious.
Dead programs tell no lies
Defensive programming can be used when you’re aware you’re making an assumption. The idea is to intentionally throw an error when an assumption is found to be false.
Most languages have functions to help you do this.
This way, you don’t have to search for the offending line.
# Given a vector or matrix x, return the covariance matrix based on the
# exponential covariance function
cov_exponential <- function(x, length_scale) {
  stopifnot(length_scale > 0)
  exp(-fields::rdist(x) / length_scale)
}

cov_exponential(y, -1)
## Error in cov_exponential(y, -1): length_scale > 0 is not TRUE
In Python, you can use the assert keyword:
import numpy as np
def cov_exponential(x, length_scale):
    '''Given a vector or matrix x, return the covariance matrix based on the exponential covariance function'''
    assert length_scale > 0, 'length_scale must be positive'
    return np.exp(-np.linalg.norm(x) / length_scale)

cov_exponential(y, -1)
Traceback (most recent call last):
File "example.py", line 7, in <module>
cov_exponential(y, -1)
File "example.py", line 4, in cov_exponential
assert length_scale > 0, 'length_scale must be positive'
AssertionError: length_scale must be positive
Back to the tale of two errors
stopifnot(all(!is.na(y)))

n <- length(y)
sigma <- 1
log_likelihood <- (
  - n * log(2 * pi) / 2
  - n * log(sigma)
  - sum(y ^ 2 / sigma ^ 2) / 2
)
bic <- log(n) - 2 * log_likelihood
print(bic)
## Error: all(!is.na(y)) is not TRUE
More defensive programming tips
You won’t always know immediately what assumptions you’re implicitly making.
But! when your code breaks, add a stopifnot or assert to catch it next time.
Don’t feel the need to check everything, just the most important parts.
Feel free to add your own defensive programming tips to Slack!
Or raise your hand in the Zoom chat. ✋
(But note that we will talk about a related topic, automated testing, very soon.)
Strategy 3: Automated testing
Defensive programming actively tests your assumptions when the code is running.
Automated testing goes one step further: you write code that calls your code, actively probing for problems.
add_one.py:
def add_one(x):
    '''Returns the argument, plus one'''
    return x + 1.1
test_add_one.py:
from add_one import add_one

def test_add_one():
    assert add_one(1) == 2
Running the tests:
======================= test session starts ========================
collected 1 item
test_add_one.py F [100%]
============================= FAILURES =============================
___________________________ test_add_one ___________________________
def test_add_one():
> assert add_one(1) == 2
E assert 2.1 == 2
E + where 2.1 = add_one(1)
test_add_one.py:4: AssertionError
===================== short test summary info ======================
FAILED test_add_one.py::test_add_one - assert 2.1 == 2
=================== 1 failed, 0 passed in 0.15s ====================
What makes for a good set of tests?
Each test tests one thing at a time
Tests are short and simple; don’t test too many things
Boundary conditions are tested
All code lines are explored
def max_one(x):
    '''Return the maximum between the argument and one'''
    if x < 1:
        return 1
    else:
        return x

def test_max_one():
    assert max_one(0) == 1  # First code path
    assert max_one(1) == 1  # Boundary condition
    assert max_one(2) == 2  # Second code path
covariance.py:
import numpy as np

def cov_exponential(x, length_scale):
    '''Given a vector or matrix x, return the covariance matrix based on the exponential covariance function'''
    assert length_scale > 0, 'length_scale must be positive'
    return np.exp(-np.linalg.norm(x) / length_scale)
For R code, a guide to setting up automated tests with the testthat package is in the R packages book [13].
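As a sketch of what the same idea looks like in R, using testthat and assuming the R version of cov_exponential() from the defensive programming example is loaded:

library(testthat)

test_that('cov_exponential rejects non-positive length scales', {
  expect_error(cov_exponential(1:5, length_scale = -1))
})

test_that('the covariance of a point with itself is one', {
  expect_equal(diag(cov_exponential(1:5, length_scale = 2)), rep(1, 5))
})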
Testing and version control
You may have seen badges on GitHub repositories that report a project’s build/test status and code coverage.
Tests are automatically run when changes to the software are published. The code coverage metric counts how many lines of code in the software are tested.
This is beyond what most small research projects need, but it’s nice to have!
How to make the most of automated testing
A few thoughts on automated testing:
Automated testing is helpful when your code is already organised into functions (or packages).
If a function is hard to test, it might be too complicated anyway. Break it up into smaller functions.
If you find a bug, write a test for the bug that fails, THEN fix the bug
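A hypothetical sketch of that last tip in R, using testthat and a made-up helper standardise(): suppose you discover it returns NaN whenever its input is constant. First write a test that reproduces the bug and fails, then fix the function, and keep the test so the bug can’t quietly return.

library(testthat)

# Buggy version: sd(x) is zero for constant input, so this divides by zero
standardise <- function(x) (x - mean(x)) / sd(x)

# This test fails until the bug is fixed (e.g. by returning zeros when sd(x) == 0)
test_that('standardise handles constant input', {
  expect_false(any(is.nan(standardise(rep(1, 5)))))
})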