Skip to content

Method summary#852

Closed
henrydingliu wants to merge 17 commits into
mainfrom
method_summary
Closed

Method summary#852
henrydingliu wants to merge 17 commits into
mainfrom
method_summary

Conversation

@henrydingliu
Copy link
Copy Markdown
Collaborator

@henrydingliu henrydingliu commented May 26, 2026

Summary of Changes

adding a summary property/attribute to ibnr methods,

a private function to assemble the exhibit in methodbase
a property in methodbase that points to the function above
an attribute to the predicted triangle (only needed for chainladder and benktander), again using the function above
added summary to the attribute section of the docstring of various methods
added a test across the various methods

Related GitHub Issue(s)

#839

Additional Context for Reviewers

i went through various approaches:

  • directly assembling the underlying values in numpy. didn't want to deal complications with the other backends
  • using concat to bring together various triangles (ldf, ultimate, etc). main issue is that ultimate is not in a diagonal.
  • ultimately used join on outputs of .to_frame(). .to_frame() eliminates rows where the value is null. this is deeply embedded in the implementation of .to_frame() via the sparse conversion. had to resort to join (which is very sparingly used in the codebase) to work around no null rows.

also streamlined some tests while i was in there, so i didn't have to write a bunch of identical tests for each method

  • I passed tests locally for both code (uv run pytest) and documentation changes (uv run jb build docs --builder=custom --custom-builder=doctest)

Note

Medium Risk
Renaming MackChainladder’s summary_ to mack_summary_ is a breaking API change for existing callers; the new join/melt summary logic affects all listed IBNR methods.

Overview
Adds a shared summary_ exhibit on IBNR estimators: a pandas table of latest, ldf, cdf, and ultimate per origin/column, built in MethodBase._get_summary from melted triangle frames and left joins (so ultimates without a reported amount still appear).

predict on Chainladder and Benktander attaches summary_ on the returned triangle; docstrings for Cape Cod, BF, Expected Loss, etc. document the attribute. Benktander now requires sample_weight on predict (raises ValueError if missing).

MackChainladder: the old origin table (Latest / IBNR / Ultimate / Mack Std Err) is renamed to mack_summary_; summary_ is the new generic exhibit. Tests and notebooks/gallery updated accordingly.

Predict tests are parametrized across several estimators to assert summary_ matches latest_diagonal / ultimate_ and enforces sample_weight where required.

Reviewed by Cursor Bugbot for commit 1c3db5c. Bugbot is set up for automated code reviews on this repo. Configure here.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 26, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 87.22%. Comparing base (04e95f0) to head (1c3db5c).
⚠️ Report is 3 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #852      +/-   ##
==========================================
+ Coverage   86.94%   87.22%   +0.28%     
==========================================
  Files          86       86              
  Lines        4994     5097     +103     
  Branches      644      649       +5     
==========================================
+ Hits         4342     4446     +104     
  Misses        462      462              
+ Partials      190      189       -1     
Flag Coverage Δ
unittests 87.22% <100.00%> (+0.28%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Comment thread chainladder/methods/base.py Outdated
return self.X_.sum('development')

@property
def summary(self):
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be summary_ with the _ to denote that the summary includes estimated values.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do

Comment on lines +85 to +88
ldf = X.ldf_.to_frame(implicit_axis=True,keepdims=True).reset_index().melt(id_vars=dev_ids,var_name = 'column',value_name='ldf')
dev_factor_index = X.ldf_.key_labels + ['development','column']
ldf = ldf[dev_factor_index + ['ldf']]
ldf.set_index(dev_factor_index,inplace=True)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the LDFs useful? I think the CDFs are fine.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't have to be useful. we are trying to save the user extra work. manipulating ldf separately is not trivial.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When would the user ever need LDFs in the summary?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

easy example, if they want to calculate the expected emergence over the next calendar period

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think it's not needed since in my mind summary_ is a summary, it's the end of the workflow and it should tell me everything I want from the model/method. If I want the expected emergence it shouldn't come out of the summary. But we can keep it if you think it's useful.

cdf = cdf[dev_factor_index + ['cdf']]
cdf.set_index(dev_factor_index,inplace=True)
#assemble full summary. start from ultimate, as some methods (e.g. BF) return an ultimate without any actual amount
output = ultimate.drop(columns=['development','valuation'])
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also add the ibnr_ in there.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this actually came up a couple of weeks earlier. bugbot saw my friedland code and made the comment that ult - paid isn't IBNR. i think that's a valid point. ibnr_ isn't guaranteed to be IBNR.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh ya that's another thing that bugs me, a better name would be unmerged or something like that. But this will be deprecate and move to a new name.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, unlike ldf, "ibnr" can just be df["ultimate"] - df["latest"], whenever the user wants.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It depends, because most actuaries care about the "ibnr", or the "unemerged".

I do think we should rename the column from ibnr_ to 'unemerged_or something like that. Similar in the Triangle class,developmentshould bevaluation, because development` are actual ages - but this is a separate topic.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that actuaries care about IBNR. I'm only not including it because I don't have a good name that generalizes between IBNR and unpaid. my reservation with unemerged is that no one uses it.

https://www.casact.org/search?query=unemerged

Comment thread chainladder/methods/base.py Outdated
#columns for melt
ids = X.key_labels + ['origin','development','valuation']
#create dataframe for amount
amount = X.incr_to_cum().latest_diagonal.to_frame(implicit_axis=True,keepdims=True).reset_index().melt(id_vars=ids,var_name = 'column',value_name='actual')
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we call "Actual" "Latest" instead? Let's match the MackChainladder method. Here's an example.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do

the known data
summary_:
summary of the model
mack_summary_:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the intend here, but I don't think we should change this right away as it will break the workflow. If we want to go this route, we need to throw a warning and deprecate appropriately. I think it's best to just leave as summary_.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i need clearer instructions then. these methods are inherited. i can't make mack have a different name than chainladder. we either go back to summary for this new dataframe, or change the old summary_ from mack to something else.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if i add st err to the summary_ just for mack, essentially merging the old summary_

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya that might be good...

So we have summary_ for everything, and std_err_ with actual numbers from Mack, and NaNs for deterministic models.

@kennethshsu
Copy link
Copy Markdown
Collaborator

@henrydingliu I am good with the summary_ implementation as is, after you respond to my final round of feedback?

I don't like change of mack_summary_, can you consider reverting back?

@henrydingliu henrydingliu marked this pull request as draft May 28, 2026 02:57
@henrydingliu
Copy link
Copy Markdown
Collaborator Author

henrydingliu commented May 29, 2026

closing due to finding this. will rework and resubmit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants