CALCULATION ENGINES BEYOND EXCEL: PART 4

In Part 3 we added some repeatable tests to our project. We did not build a comprehensive testbed but hopefully the examples provided are enough to allow you to build additional tests for the remaining functions as you see fit. Next we will make large strides in scalability by reading rates from source files we define instead of hardcoding them, move the remainder of our projection logic to a function for better execution, and introduce command-line functionality to quickly run different combinations interactively.

By the end of this part, your code should look like this.

OUR PROJECT SO FAR

At this point, your project should look something like the project here. Your local repository may also include a venv directory and a __pycache__ but those are ignored by Git so they do not show up in the stored repository.

Currently, our main script for execution is simple.py and within it the parameters for a specific case are hardcoded and so are the rates. This is not ideal as the user would need to manually edit this file over and over which would be an error-prone process. Ideally, we would a process that accepts our case parameters, gather rates, then return the final result. Given the limited variability in our underlying rates, our case parameters would be gender, risk class, issue age, face amount, and annual premium.

GENERALIZE PROJECTIONS

Let’s create a function at_issue_projection that has these parameters and move our projection logic inside it. Include a docstring as well. Afterwards, our code would look something like this:

simple.py
import functions def at_issue_projection(gender: str, risk_class: str, issue_age: int, face_amount: int, annual_premium: float) -> float: """ Account value rollforward for new policy Parameters ---------- gender: str Gender for policy, "M" or "F" risk_class: str Risk class for policy, "NS" or "SM" issue_age: int Issue age for policy face_amount: int Policy face amount annual_premium: float Annual premium for projection Returns ------- float Value at end of projection """ maturity_age = 121 projection_years = maturity_age - issue_age interest_rate = 0.03 ... end_value = 0 for i in range(12*projection_years): policy_year = calculate_policy_year(i+1) ... end_value = calculate_end_value(value_for_interest, interest) return end_value if __name__ == '__main__': end_value = at_issue_projection("M", "NS", 35, 100000, 1255.03) print(end_value)

If you run your script you should still get the same result as before. Right now the gender and risk class parameters do not do much but we will need them to select rates. The issue age only controls the length of the projection at this point as well. We will want to modify our code so that the interest rate, coi rate, and other product rates are determined at runtime as needed. To achieve that, we will start by storing our product rates in csv rate files and retrieve them as part of processing.

RATE FILES

Before we dive in, we should define what our code will expect so we can build around that. We can think of our product rates in this way:

  • Premium load rate: does not vary

  • Policy fee: does not vary

  • NAAR discount rate: does not vary

  • Interest rate: does not vary

  • Unit load rate: varies by issue age and policy year

  • COI rate: varies by gender, risk class, issue age, and policy year

For this project, we will utilize a tall rate structure where our files will have a single column for the rates and additional columns that define the applicable variabilities. Let’s also assume that our program will always know what the underlying shape of the data will be (i.e., when we ask for the premium load it will already know that the rate does not vary). We will not require complete tables (e.g., all combinations present) and whatever tries to retrieve rates will provide a default value (presumably 0) if it is not found. With this in mind, we can define three expected structures.

  • Flat: has a single column with header “Rate” and a single row of data

  • IA and PY: has “Issue_Age” and “Policy_Year” columns for identification and “Rate”, has many rows of data

  • Gen and RC and IA and PY: has “Gender”, “Risk_Class”, “Issue_Age”, and Policy_Year” columns for identification and “Rate”, has many rows of data

And our first four rate files might looks something like this:

Rate xxxx.xx

Our unit load rate would look like this:

Issue_Age, Policy_Year, Rate aa, bb, cc.cccc dd, ee, ff.ffff ...

And our COI rates would have another two columns: Gender and Risk_Class.

For this project we will actually store the data with the code, albeit in a separate folder. Ideally we would not litter our codebase with rate files, but that will make things simpler for demonstration purposes.

Within your top-level illustrator directory create another folder called data. We will create our rate files and store them here. As a reminder, our rates should be the following:

Item File name Rate description
Premium load premium_load.csv 6% of premium
Per policy fee policy_fee.csv $120 annually (charged monthly)
NAAR discount rate naar_discount.csv Uses 1% (e.g., 1.01^(-1/12))
Interest rate interest_rate.csv 3%
Unit load unit_load.csv (Issue Age / 10) per $1,000 for 10 years then zero (e.g., 35-year-old has $3.5 per $1,000)
COI rates coi.csv 2015 VBT RR100 S&U ALB gender and smoker distinct

Feel free copy the rate files in this repository if you want to quickly move forward and not generate them yourself.

Now, let’s get to work on creating the code to retrieve the files.

In the illustrator directory with simple.py add another file data_functions.py and add the following code:

data_functions.py
import csv def get_premium_load() -> float: path = './data/premium_load.csv' rate = 0 with open(path, newline='') as f: reader = csv.DictReader(f) for row in reader: rate = float(row["Rate"]) break return rate

This function will initialize our rate output, open the file in the specified location, read it row by row into a dictionary using the headers as keys, and output the result.

Let’s quickly test to make sure this behaves as expected. Open a terminal, launch the Python REPL and run the function as shown below to make sure you get the same result.

>>> from illustrator.data_functions import get_premium_load >>> get_premium_load() 0.06

If we were to consider how this might change for the policy fee, interest rate, or NAAR discount files the only thing that would change is the path that is opened and the initialized rate. Generalizing the functionality we rename our function to read_flat_csv and allow a path and default parameters.

data_functions.py
import csv def read_flat_csv(path: str, default: float = 0.0) -> float: """ Retrieve rate from csv assuming "Flat" rate structure Parameters ---------- path: str Path to the csv Returns ------- float Rate from file """ rate = default with open(path, newline='') as f: reader = csv.DictReader(f) for row in reader: rate = float(row["Rate"]) break return rate

We can quickly test our new function interactively using the REPL in a terminal like we did the premium load:

>>> from illustrator.data_functions import read_flat_csv >>> read_flat_csv('./data/policy_fee.csv') 120.0

Awesome. We can use these concepts to build similar functions for reading the other structures, however those will need to return multiple rates. For this demonstration we are choosing the return them in a list where the first element (index 0) is for policy year 1, the next element is for policy year 2, and so on. We should not need more than 120 years of data so we will plan to return a list with 120 elements. Add the following code to data_functions.py after the read_flat_csv function.

data_functions.py
... def read_ia_py_csv(path: str, issue_age: int, default: float = 0.0) -> list[float]: rates = [default for _ in range(120)] with open(path, newline='') as f: reader = csv.DictReader(f) for row in reader: if row['Issue_Age'] == str(issue_age): policy_year = int(row['Policy_Year']) rate = float(row["Rate"]) rates[policy_year-1] = rate return rates

This returns rates which by default will be a list of 120 zeroes unless data is found in the specified file. When the file is being read line by line we check to see if the line corresponds to the provided issue age and if it does we retrieve the rate and put it in the output list in the applicable spot. CAUTION: THIS LOGIC DOES NOT WARN / ERROR IF THE SAME ISSUE AGE AND POLICY YEAR COMBINATION EXISTS MULTIPLE TIMES.

A similar function should be added after the above for the COI rate file:

data_functions.py
... def read_gen_rc_ia_py_csv(path: str, gender: str, risk_class: str, issue_age: int, default: float = 0.0) -> list[float]: rates = [default for _ in range(120)] with open(path, newline='') as f: reader = csv.DictReader(f) for row in reader: if row['Gender'] == gender and row['Risk_Class'] == risk_class and row['Issue_Age'] == str(issue_age): policy_year = int(row['Policy_Year']) rate = float(row["Rate"]) rates[policy_year-1] = rate return rates

Looking back to our function for reading the flat file, it might be a good idea to also return a list of 120 elements following the same structure. Let’s update our first function for consistency and the signature for the functions using it while adding the functions for unit loads and coi rates:

data_functions.py
... def read_flat_csv(path: str, default: float = 0.0) -> list[float]: """ Retrieve rate from csv assuming "Flat" rate structure Parameters ---------- path: str Path to the csv default: float Optional parameter for default return value if file empty Returns ------- list[float] Rates by policy year (index 0 = year 1); all values of list should be identical """ rate = default with open(path, newline='') as f: reader = csv.DictReader(f) for row in reader: rate = float(row["Rate"]) break rates = [rate for _ in range(120)] return rates ...

Now we have functions that will read the requisite rate files and return a list of rates by policy year. Go update simple.py to use the new functionality instead of the hardcoded rates.

BETTER COMMAND-LINE SUPPORT

While our project now reads rate files on the fly, we would still need to modify the code each time we want to run a different case. Let’s change that using the argparse module. Here is how we can update simple.py to make things a lot more flexible.

simple.py
import argparse ... if __name__ == '__main__': parser = argparse.ArgumentParser(description="Arguments for simple.py") parser.add_argument("-g", "--gender", default="M", choices=["M","F"], help="The gender for projection, default is M for male") parser.add_argument("-r", "--risk_class", default="NS", choices=["NS","SM"], help="The risk class for the projection, default is NS") parser.add_argument("-i", "--issue_age", default=35, type=int, choices=range(18,81), help="The issue age for the projection, default is 35") parser.add_argument("-f", "--face_amount", default=100000, type=int, help="The face amount for the projection, default is 100,000") parser.add_argument("-p", "--premium", default=1255.03, type=float, help="The annual premium for the projection, default is 1,255.03") args = parser.parse_args() end_value = at_issue_projection(args.gender, args.risk_class, args.issue_age, args.face_amount, args.premium) print(end_value)

We added optional command-line arguments that default to our original case and restricted choices for some of the inputs. We added help text so users can easily get help in the command-line when adding a “-h” to the execution:

(venv) .../illustrator> python illustrator/simple.py -h usage: simple.py [-h] [-g {M,F}] [-r {NS, SM}] [-i {18, 19, ..., 80}] [-f FACE_AMOUNT] [-p PREMIUM] Arguments for simple.py options: -h, --help show this help message and exit -g, --gender {M, F} The gender for projection, default is M for male -r, --risk_class {NS, SM} The risk class for the projection, default is NS -i, --issue_age {18, 19, ..., 80} The issue age for the projection, default is 35 -f, --face_amount FACE_AMOUNT The face amount for the projection, default is 100,000 -p, --premium PREMIUM The annual premium for the projection, default is 1,255.03

Now we can quickly change inputs without changing the underlying code. How? We add the additional command-line arguments when we run the script. While we know our current case is the defaults, we could run it explicitly this way:

(venv) .../illustrator> python illustrator/simple.py -g M -r NS -i 35 -f 100000 -p 1255.03 132184.0426761172

Running other cases is as simple as changing one of the input parameters. Support you wanted to run female “F” instead? It would look like this:

(venv) .../illustrator> python illustrator/simple.py -g F -r NS -i 35 -f 100000 -p 1255.03 321494.3318263291

TESTS

Let’s add two new files to our tests folder: test_data_functions.py and test_simple.py.

In the code below for testing the data functions we provide an example of how to generate temporary data for testing and use that to test the functions. We also include a test for confirming the policy fee retrieval function.

test_data_functions.py
from tempfile import TemporaryDirectory import unittest from illustrator import data_functions class Test_Data_Functions(unittest.TestCase): def test_read_flat_csv(self): # add a temporary directory temp_dir = self.enterContext(TemporaryDirectory(dir='.')) # create a temporary file and write dummy data to it temp_file_name = temp_dir + '/temp.csv' with open(temp_file_name, mode='w+') as temp_file: temp_file.writelines(['Rate\n','0.123']) # confirm the _read_flat_csv function returns a list of length 120 with desired data self.assertEqual(data_functions.read_flat_csv(temp_file_name), [0.123]*120) def test_read_flat_csv_incorrect_data_fail(self): temp_dir = self.enterContext(TemporaryDirectory(dir='.')) temp_file_name = temp_dir + '/temp.csv' with open(temp_file_name, mode='w+') as temp_file: temp_file.writelines(['Rate\n','0.123']) self.assertNotEqual(data_functions.read_flat_csv(temp_file_name), [0.0]*120) def test_get_policy_fee(self): self.assertEqual(data_functions.read_flat_csv("./data/policy_fee.csv"),[120]*120)

You should be able to run all tests and see that they pass.

For a test on simple.py we test our newly isolated projection function. We set up our current default use case and make sure the ending value matches our expectations. Now as we continue to make changes we should be able to not only check very small and specific pieces of the code but also that the whole process appears to be working as intended.

test_simple.py
import unittest from illustrator import simple class Test_Simple(unittest.TestCase): def test_at_issue_projection(self): gender = 'M' risk_class = 'NS' issue_age = 35 face_amount = 100000 annual_premium = 1255.03 self.assertAlmostEqual(simple.at_issue_projection(gender, risk_class, issue_age, face_amount, annual_premium), 132184.0427, places=4)

After creating this file, try running the tests again. You will likely see the following error or something fairly close:

..............E ====================================================================== ERROR: tests.test_simple (unittest.loader._FailedTest.tests.test_simple) ---------------------------------------------------------------------- ImportError: Failed to import test module: tests.test_simple Traceback (most recent call last): File "...\unittest\loader.py", line 396, in _find_test_path module = self._get_module_from_name(name) File "...\unittest\loader.py", line 339, in _get_module_from_name __import__(name) ~~~~~~~~~~^^^^^^ File "...\tests\test_simple.py", line 3, in from illustrator import simple File "...\illustrator\simple.py", line 24, in import data_functions as df ModuleNotFoundError: No module named 'data_functions'

The traceback information is a bit complicated but reading bottom help we see that the ‘data_functions’ module could not be found when trying to import in the simple.py file. This seems strange because when we execute our code normally things appear to work.

Without getting into the details of how Python imports work (there are plenty of materials out there that go over this) let’s focus on the solution. We are not giving the Python interpreter the best of instructions with our import statements and how we are executing our code. We need to make two changes in simple.py:

simple.py
... import illustrator.data_functions as df import illustrator.functions as functions ...

Now if we try to run our tests, we should see successful completion:

(venv) .../illustrator> python -m unittest ................ ---------------------------------------------------- Ran 16 tests in 0.026s OK

And when actually running the code we need to tweak the command a little bit too:

(venv) .../illustrator> python -m illustrator.simple 132184.0426761172

Running simple.py as a module (-m) with the explicit imports allows us to execute our code AND leverage unittest the way we have. We intentionally manufactured this error to emerge because it is something you may experience as your codebase grows in complexity for your own projects. We could have avoided this by always using explicit imports and running simple.py as a module (i.e. with -m). There is no change to the additional command-line arguments we added, those can still be used the same as before.

Update your README.md to reflect the new execution instructions and some information on the command-line arguments as well.

Don’t forget to save your files, commit to your Git repo, and push to your online repository!

If you are having difficulty with the code please refer to the v0.4 tagged version of this repository.

Have questions or feedback? Reach out to let us know!

Previous
Previous

CALCULATION ENGINES BEYOND EXCEL: PART 5

Next
Next

CALCULATION ENGINES BEYOND EXCEL: PART 3