CALCULATION ENGINES BEYOND EXCEL: PART 4
In Part 3 we added some repeatable tests to our project. We did not build a comprehensive testbed but hopefully the examples provided are enough to allow you to build additional tests for the remaining functions as you see fit. Next we will make large strides in scalability by reading rates from source files we define instead of hardcoding them, move the remainder of our projection logic to a function for better execution, and introduce command-line functionality to quickly run different combinations interactively.
By the end of this part, your code should look like this.
OUR PROJECT SO FAR
At this point, your project should look something like the project here. Your local repository may also include a venv directory and a __pycache__ but those are ignored by Git so they do not show up in the stored repository.
Currently, our main script for execution is simple.py and within it the parameters for a specific case are hardcoded and so are the rates. This is not ideal as the user would need to manually edit this file over and over which would be an error-prone process. Ideally, we would a process that accepts our case parameters, gather rates, then return the final result. Given the limited variability in our underlying rates, our case parameters would be gender, risk class, issue age, face amount, and annual premium.
GENERALIZE PROJECTIONS
Let’s create a function at_issue_projection that has these parameters and move our projection logic inside it. Include a docstring as well. Afterwards, our code would look something like this:
import functions
def at_issue_projection(gender: str, risk_class: str, issue_age: int, face_amount: int, annual_premium: float) -> float:
"""
Account value rollforward for new policy
Parameters
----------
gender: str
Gender for policy, "M" or "F"
risk_class: str
Risk class for policy, "NS" or "SM"
issue_age: int
Issue age for policy
face_amount: int
Policy face amount
annual_premium: float
Annual premium for projection
Returns
-------
float
Value at end of projection
"""
maturity_age = 121
projection_years = maturity_age - issue_age
interest_rate = 0.03
...
end_value = 0
for i in range(12*projection_years):
policy_year = calculate_policy_year(i+1)
...
end_value = calculate_end_value(value_for_interest, interest)
return end_value
if __name__ == '__main__':
end_value = at_issue_projection("M", "NS", 35, 100000, 1255.03)
print(end_value)
If you run your script you should still get the same result as before. Right now the gender and risk class parameters do not do much but we will need them to select rates. The issue age only controls the length of the projection at this point as well. We will want to modify our code so that the interest rate, coi rate, and other product rates are determined at runtime as needed. To achieve that, we will start by storing our product rates in csv rate files and retrieve them as part of processing.
RATE FILES
Before we dive in, we should define what our code will expect so we can build around that. We can think of our product rates in this way:
Premium load rate: does not vary
Policy fee: does not vary
NAAR discount rate: does not vary
Interest rate: does not vary
Unit load rate: varies by issue age and policy year
COI rate: varies by gender, risk class, issue age, and policy year
For this project, we will utilize a tall rate structure where our files will have a single column for the rates and additional columns that define the applicable variabilities. Let’s also assume that our program will always know what the underlying shape of the data will be (i.e., when we ask for the premium load it will already know that the rate does not vary). We will not require complete tables (e.g., all combinations present) and whatever tries to retrieve rates will provide a default value (presumably 0) if it is not found. With this in mind, we can define three expected structures.
Flat: has a single column with header “Rate” and a single row of data
IA and PY: has “Issue_Age” and “Policy_Year” columns for identification and “Rate”, has many rows of data
Gen and RC and IA and PY: has “Gender”, “Risk_Class”, “Issue_Age”, and “Policy_Year” columns for identification and “Rate”, has many rows of data
And our first four rate files might looks something like this:
Rate
xxxx.xx
Our unit load rate would look like this:
Issue_Age, Policy_Year, Rate
aa, bb, cc.cccc
dd, ee, ff.ffff
...
And our COI rates would have another two columns: Gender and Risk_Class.
For this project we will actually store the data with the code, albeit in a separate folder. Ideally we would not litter our codebase with rate files, but that will make things simpler for demonstration purposes.
Within your top-level illustrator directory create another folder called data. We will create our rate files and store them here. As a reminder, our rates should be the following:
Item | File name | Rate description |
---|---|---|
Premium load | premium_load.csv | 6% of premium |
Per policy fee | policy_fee.csv | $120 annually (charged monthly) |
NAAR discount rate | naar_discount.csv | Uses 1% (e.g., 1.01^(-1/12)) |
Interest rate | interest_rate.csv | 3% |
Unit load | unit_load.csv | (Issue Age / 10) per $1,000 for 10 years then zero (e.g., 35-year-old has $3.5 per $1,000) |
COI rates | coi.csv | 2015 VBT RR100 S&U ALB gender and smoker distinct |
Feel free copy the rate files in this repository if you want to quickly move forward and not generate them yourself.
Now, let’s get to work on creating the code to retrieve the files.
In the illustrator directory with simple.py add another file data_functions.py and add the following code:
import csv
def get_premium_load() -> float:
path = './data/premium_load.csv'
rate = 0
with open(path, newline='') as f:
reader = csv.DictReader(f)
for row in reader:
rate = float(row["Rate"])
break
return rate
This function will initialize our rate output, open the file in the specified location, read it row by row into a dictionary using the headers as keys, and output the result.
Let’s quickly test to make sure this behaves as expected. Open a terminal, launch the Python REPL and run the function as shown below to make sure you get the same result.
>>> from illustrator.data_functions import get_premium_load
>>> get_premium_load()
0.06
If we were to consider how this might change for the policy fee, interest rate, or NAAR discount files the only thing that would change is the path that is opened and the initialized rate. Generalizing the functionality we rename our function to read_flat_csv and allow a path and default parameters.
import csv
def read_flat_csv(path: str, default: float = 0.0) -> float:
"""
Retrieve rate from csv assuming "Flat" rate structure
Parameters
----------
path: str
Path to the csv
Returns
-------
float
Rate from file
"""
rate = default
with open(path, newline='') as f:
reader = csv.DictReader(f)
for row in reader:
rate = float(row["Rate"])
break
return rate
We can quickly test our new function interactively using the REPL in a terminal like we did the premium load:
>>> from illustrator.data_functions import read_flat_csv
>>> read_flat_csv('./data/policy_fee.csv')
120.0
Awesome. We can use these concepts to build similar functions for reading the other structures, however those will need to return multiple rates. For this demonstration we are choosing the return them in a list where the first element (index 0) is for policy year 1, the next element is for policy year 2, and so on. We should not need more than 120 years of data so we will plan to return a list with 120 elements. Add the following code to data_functions.py after the read_flat_csv function.
...
def read_ia_py_csv(path: str, issue_age: int, default: float = 0.0) -> list[float]:
rates = [default for _ in range(120)]
with open(path, newline='') as f:
reader = csv.DictReader(f)
for row in reader:
if row['Issue_Age'] == str(issue_age):
policy_year = int(row['Policy_Year'])
rate = float(row["Rate"])
rates[policy_year-1] = rate
return rates
This returns rates which by default will be a list of 120 zeroes unless data is found in the specified file. When the file is being read line by line we check to see if the line corresponds to the provided issue age and if it does we retrieve the rate and put it in the output list in the applicable spot. CAUTION: THIS LOGIC DOES NOT WARN / ERROR IF THE SAME ISSUE AGE AND POLICY YEAR COMBINATION EXISTS MULTIPLE TIMES.
A similar function should be added after the above for the COI rate file:
...
def read_gen_rc_ia_py_csv(path: str, gender: str, risk_class: str, issue_age: int, default: float = 0.0) -> list[float]:
rates = [default for _ in range(120)]
with open(path, newline='') as f:
reader = csv.DictReader(f)
for row in reader:
if row['Gender'] == gender and row['Risk_Class'] == risk_class and row['Issue_Age'] == str(issue_age):
policy_year = int(row['Policy_Year'])
rate = float(row["Rate"])
rates[policy_year-1] = rate
return rates
Looking back to our function for reading the flat file, it might be a good idea to also return a list of 120 elements following the same structure. Let’s update our first function for consistency and the signature for the functions using it while adding the functions for unit loads and coi rates:
...
def read_flat_csv(path: str, default: float = 0.0) -> list[float]:
"""
Retrieve rate from csv assuming "Flat" rate structure
Parameters
----------
path: str
Path to the csv
default: float
Optional parameter for default return value if file empty
Returns
-------
list[float]
Rates by policy year (index 0 = year 1); all values of list should be identical
"""
rate = default
with open(path, newline='') as f:
reader = csv.DictReader(f)
for row in reader:
rate = float(row["Rate"])
break
rates = [rate for _ in range(120)]
return rates
...
Now we have functions that will read the requisite rate files and return a list of rates by policy year. Go update simple.py to use the new functionality instead of the hardcoded rates.
BETTER COMMAND-LINE SUPPORT
While our project now reads rate files on the fly, we would still need to modify the code each time we want to run a different case. Let’s change that using the argparse module. Here is how we can update simple.py to make things a lot more flexible.
import argparse
...
if __name__ == '__main__':
parser = argparse.ArgumentParser(description="Arguments for simple.py")
parser.add_argument("-g", "--gender", default="M", choices=["M","F"], help="The gender for projection, default is M for male")
parser.add_argument("-r", "--risk_class", default="NS", choices=["NS","SM"], help="The risk class for the projection, default is NS")
parser.add_argument("-i", "--issue_age", default=35, type=int, choices=range(18,81), help="The issue age for the projection, default is 35")
parser.add_argument("-f", "--face_amount", default=100000, type=int, help="The face amount for the projection, default is 100,000")
parser.add_argument("-p", "--premium", default=1255.03, type=float, help="The annual premium for the projection, default is 1,255.03")
args = parser.parse_args()
end_value = at_issue_projection(args.gender, args.risk_class, args.issue_age, args.face_amount, args.premium)
print(end_value)
We added optional command-line arguments that default to our original case and restricted choices for some of the inputs. We added help text so users can easily get help in the command-line when adding a “-h” to the execution:
(venv) .../illustrator> python illustrator/simple.py -h
usage: simple.py [-h] [-g {M,F}] [-r {NS, SM}]
[-i {18, 19, ..., 80}]
[-f FACE_AMOUNT] [-p PREMIUM]
Arguments for simple.py
options:
-h, --help show this help message and exit
-g, --gender {M, F} The gender for projection, default is M for male
-r, --risk_class {NS, SM}
The risk class for the projection, default is NS
-i, --issue_age {18, 19, ..., 80}
The issue age for the projection, default is 35
-f, --face_amount FACE_AMOUNT
The face amount for the projection, default is 100,000
-p, --premium PREMIUM The annual premium for the projection, default is 1,255.03
Now we can quickly change inputs without changing the underlying code. How? We add the additional command-line arguments when we run the script. While we know our current case is the defaults, we could run it explicitly this way:
(venv) .../illustrator> python illustrator/simple.py -g M -r NS -i 35 -f 100000 -p 1255.03
132184.0426761172
Running other cases is as simple as changing one of the input parameters. Support you wanted to run female “F” instead? It would look like this:
(venv) .../illustrator> python illustrator/simple.py -g F -r NS -i 35 -f 100000 -p 1255.03
321494.3318263291
TESTS
Let’s add two new files to our tests folder: test_data_functions.py and test_simple.py.
In the code below for testing the data functions we provide an example of how to generate temporary data for testing and use that to test the functions. We also include a test for confirming the policy fee retrieval function.
from tempfile import TemporaryDirectory
import unittest
from illustrator import data_functions
class Test_Data_Functions(unittest.TestCase):
def test_read_flat_csv(self):
# add a temporary directory
temp_dir = self.enterContext(TemporaryDirectory(dir='.'))
# create a temporary file and write dummy data to it
temp_file_name = temp_dir + '/temp.csv'
with open(temp_file_name, mode='w+') as temp_file:
temp_file.writelines(['Rate\n','0.123'])
# confirm the _read_flat_csv function returns a list of length 120 with desired data
self.assertEqual(data_functions.read_flat_csv(temp_file_name), [0.123]*120)
def test_read_flat_csv_incorrect_data_fail(self):
temp_dir = self.enterContext(TemporaryDirectory(dir='.'))
temp_file_name = temp_dir + '/temp.csv'
with open(temp_file_name, mode='w+') as temp_file:
temp_file.writelines(['Rate\n','0.123'])
self.assertNotEqual(data_functions.read_flat_csv(temp_file_name), [0.0]*120)
def test_get_policy_fee(self):
self.assertEqual(data_functions.read_flat_csv("./data/policy_fee.csv"),[120]*120)
You should be able to run all tests and see that they pass.
For a test on simple.py we test our newly isolated projection function. We set up our current default use case and make sure the ending value matches our expectations. Now as we continue to make changes we should be able to not only check very small and specific pieces of the code but also that the whole process appears to be working as intended.
import unittest
from illustrator import simple
class Test_Simple(unittest.TestCase):
def test_at_issue_projection(self):
gender = 'M'
risk_class = 'NS'
issue_age = 35
face_amount = 100000
annual_premium = 1255.03
self.assertAlmostEqual(simple.at_issue_projection(gender, risk_class, issue_age, face_amount, annual_premium), 132184.0427, places=4)
After creating this file, try running the tests again. You will likely see the following error or something fairly close:
..............E
======================================================================
ERROR: tests.test_simple (unittest.loader._FailedTest.tests.test_simple)
----------------------------------------------------------------------
ImportError: Failed to import test module: tests.test_simple
Traceback (most recent call last):
File "...\unittest\loader.py", line 396, in _find_test_path
module = self._get_module_from_name(name)
File "...\unittest\loader.py", line 339, in _get_module_from_name
__import__(name)
~~~~~~~~~~^^^^^^
File "...\tests\test_simple.py", line 3, in
from illustrator import simple
File "...\illustrator\simple.py", line 24, in
import data_functions as df
ModuleNotFoundError: No module named 'data_functions'
The traceback information is a bit complicated but reading bottom help we see that the ‘data_functions’ module could not be found when trying to import in the simple.py file. This seems strange because when we execute our code normally things appear to work.
Without getting into the details of how Python imports work (there are plenty of materials out there that go over this) let’s focus on the solution. We are not giving the Python interpreter the best of instructions with our import statements and how we are executing our code. We need to make two changes in simple.py:
...
import illustrator.data_functions as df
import illustrator.functions as functions
...
Now if we try to run our tests, we should see successful completion:
(venv) .../illustrator> python -m unittest
................
----------------------------------------------------
Ran 16 tests in 0.026s
OK
And when actually running the code we need to tweak the command a little bit too:
(venv) .../illustrator> python -m illustrator.simple
132184.0426761172
Running simple.py as a module (-m) with the explicit imports allows us to execute our code AND leverage unittest the way we have. We intentionally manufactured this error to emerge because it is something you may experience as your codebase grows in complexity for your own projects. We could have avoided this by always using explicit imports and running simple.py as a module (i.e. with -m). There is no change to the additional command-line arguments we added, those can still be used the same as before.
Update your README.md to reflect the new execution instructions and some information on the command-line arguments as well.
Don’t forget to save your files, commit to your Git repo, and push to your online repository!
If you are having difficulty with the code please refer to the v0.4 tagged version of this repository.
Have questions or feedback? Reach out to let us know!