As somebody who has worked as an accountant, I've seen several cases where automated reconciliation solutions devastated books with repeated mistakes and missing audit logs. I've seen interns do the same, too.
An automated solution to reconcile statements based on LLM matches removes transparency on how your books are prepared and might create a false sense of trust in the preparation of your books. In case of an audit, people will be in great trouble when their answer is ‘yeah, the AI booked everything.’
I think there is an opportunity here, but I don’t see it ending well until we can pin down who is accountable for your accounts.
Sean here, author and EM of the recon team at Modern Treasury. I completely agree: we use AI to surface suggestions to users during manual review, so there's full transparency and human confirmation at every step. Our automatic recon doesn't use LLMs for the reasons you outlined.
From a human-computer interaction standpoint you do raise a fair point.
Even if in reality the AI misses _considerably more_ than that, there's a good argument to be made that such hints may steer the human from their initial, correct assumptions, or even reasoning.
I agree with you. When faced with books that have large numbers of transactions with similar dollar amounts, you need to take an extra reconciliation step to make sure you are matching the correct transaction and confirming the other side of it, rather than just matching the bank balance. You can't just use machine learning to try to match these transactions; you need a specific process for them.
I think this is a niche issue related to potential accounts receivable errors, as most companies I work with don't have the problem of constantly repeating transaction amounts.
I'm head of processing development at PayProp[^1] where we've been automatically reconciling rental payments for two decades using the techniques described in the article - we just don't call it an AI or a LLM. Our tech saves letting agents huge amounts of time.
We look at the data we have and if it's sufficient we can "automatically" reconcile it - i.e. suggest a match with 100% certainty that the user(s) can then confirm. Otherwise we make informed suggestions based on all of the likely data from the transaction(s), and sometimes the suggestion is a list of possible matches.
IME the biggest problems in recon are the edge cases around failures in the banking system or the flow that are very difficult to code around and require manual intervention:
* failures of payments X days after they have been reconciled, now you have to pull things apart again
* bank reverses transactions but then puts them back and this appears in their intra day statements (MT942 files for example) but doesn't show on their online portals, leading to "duplicates" in one system that aren't really
* statement and reference data is incomplete or just wrong (who knew that free text fields can be problematic?)
* amounts simply don't match because you invoiced for X and were paid Y - payments are split up to get around constraints, amounts are rounded up, etc.
We deal with these every single day, and we are automating what we can - but you're always going to need a human to confirm the final step in these cases. Perhaps an LLM can improve suggestions, but when the data is just wrong or missing then I'm not so sure.
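To make that concrete, here's a minimal sketch of that suggest-then-confirm flow in Python. The field names, weights, and thresholds are invented for illustration; they are not our production rules.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class BankLine:
    amount: float
    reference: str
    booked: date

@dataclass
class LedgerEntry:
    amount: float
    invoice_ref: str
    due: date

def score(line: BankLine, entry: LedgerEntry) -> float:
    """Crude confidence score for a candidate match."""
    s = 0.0
    if abs(line.amount - entry.amount) < 0.005:
        s += 0.6                                   # amounts agree to the cent
    if entry.invoice_ref and entry.invoice_ref in line.reference:
        s += 0.3                                   # statement text mentions the invoice number
    s += max(0.0, 0.1 - 0.01 * abs((line.booked - entry.due).days))  # closer dates score higher
    return s

def suggest(line: BankLine, entries: list[LedgerEntry]):
    if not entries:
        return ("suggest", [])
    ranked = sorted(entries, key=lambda e: score(line, e), reverse=True)
    if score(line, ranked[0]) >= 0.9:
        return ("auto", ranked[0])     # confident match, still confirmed by the user
    return ("suggest", ranked[:3])     # otherwise offer a short list of candidates
```

When the data is wrong or missing, no scoring function saves you; that's exactly where the human steps in.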
Reconciliation in finance is taking your records of sales and matching them to your bank records. This identifies sales that didn't result in bank deposits, bank deposits that didn't come from sales, situations where you were paid too much, and situations where you were paid too little. These all happen, disturbingly frequently, so reconciliation is a necessary admin burden in businesses. It's not a critical differentiator for retailers so it's ripe for automation.
Similarly, matching invoices to purchase orders and authorising payments. This catches fraud and avoids paying for goods you didn't receive ... but it's another necessary evil rather than a value-adding differentiator for the business. So companies exist to take your PDF invoices and your ERP's "we recorded that we got x, y, z" and match them up and authorise payment for unexceptional invoices (we wanted 5, you said you sent 5, we said we got 5, let's pay you for 5).
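Concretely, that three-way check amounts to something like this tiny sketch (the argument names and the price tolerance are illustrative assumptions, not any particular ERP's schema):

```python
def three_way_match(po_qty, received_qty, invoiced_qty,
                    po_unit_price, invoiced_unit_price,
                    price_tolerance=0.01):
    """Approve payment only when the PO, the goods receipt and the invoice agree."""
    if not (po_qty == received_qty == invoiced_qty):
        return "hold: quantity mismatch"
    if abs(po_unit_price - invoiced_unit_price) > price_tolerance:
        return "hold: price mismatch"
    return "approve payment"

# We wanted 5, they said they sent 5, we recorded that we got 5 -> pay for 5.
print(three_way_match(5, 5, 5, 10.00, 10.00))  # approve payment
```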
I profoundly disagree that ML models are a good fit for transaction reconciliation.
It’s at least arguable that this task is the oldest documented use of writing, and from double-entry accounting to price/time precedence in modern market microstructure, we have algorithms that align very well with human intuition.
I can think of few cases where gratuitous application of even simple statistical methods would cause more harm than this one.
With all respect, the conversation becomes stupider with every post like this one.
I have 15 debtors, a few dozen creditors, and 5 employee credit cards. There are enough transactions that for the 3 years I’ve been reconciling the accounts I have wanted to either write an if/elseif/else-based reconciler assistant, or hire someone, or pay my accountant to do the job.
A few weeks ago, I decided, what the hell, and I spent two days writing a ChatGPT-powered reconciler assistant.
It’s so damn accurate. By feeding it relevant examples, it suggests the right journal entry for each bank transaction nearly every time, including saying “no matching entry” for when the corresponding journal entry hasn’t been posted yet.
It would have taken me a lot longer to write an if/elseif/else-based reconciler, and it would have required a lot of manual attention… and the constant internal debate over whether the rules are code that should go in Git or data that should go in the DB.
I think ML models are a great fit for transaction reconciliation because they give good-enough results really fast at a reasonable price. I’d prefer that over continuing to spend my own time, or having to learn the more advanced algorithms you mentioned.
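To give a flavour of it, the core is little more than a few-shot prompt along these lines. This is a simplified sketch using the OpenAI Python SDK; the account codes, prompt format, and model choice are illustrative assumptions, not my actual setup.

```python
import json
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

# Chart-of-accounts codes and bank lines below are invented example data.
EXAMPLES = """\
Bank line: 2024-03-01  -1,250.00  "AWS EMEA SARL"       -> debit 6400 Cloud hosting, credit 1200 Bank
Bank line: 2024-03-03  +4,800.00  "INV-1042 ACME GMBH"  -> debit 1200 Bank, credit 1100 Accounts receivable (INV-1042)
Bank line: 2024-03-05     -89.99  "GITHUB INC"          -> debit 6410 Software subscriptions, credit 1200 Bank
"""

def suggest_entry(bank_line: str) -> dict:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",                       # any chat model works here
        response_format={"type": "json_object"},   # ask for machine-readable JSON back
        messages=[
            {"role": "system",
             "content": "You suggest double-entry journal entries for bank statement lines. "
                        'Reply with JSON like {"debit": "...", "credit": "...", "memo": "..."}, '
                        'or {"match": "none"} if the corresponding entry has not been posted yet. '
                        "Follow the style of these examples:\n" + EXAMPLES},
            {"role": "user", "content": bank_line},
        ],
    )
    return json.loads(resp.choices[0].message.content)

print(suggest_entry('2024-03-07  -54.00  "HETZNER ONLINE"'))
```

Every suggestion still gets reviewed before it's posted; the model just saves me the lookup.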
You can still reconcile the wrong things and have the books balance. e.g. you sell a subscription for $1. User A paid on the 1st, user B paid on the 3rd. You mistakenly reconcile user A's payment to user B's bill and vice versa. Your books still balance, but you better hope you didn't charge user A any late fees.
Would you be willing to share your approach? I'd love to see it. Disclosure: I'm in the early stages of an open-source ERP: https://github.com/barbinbrad/carbon
This brings me right back to June of 2009, as the markets started to spoil and the derivatives markets fell to shambles. A few hapless souls in Baltimore would be tasked with manning two standing 12-hour shifts for as many weeks as it would take to properly reconcile what could only be described as a tsunami of unreconciled data for a single large custody player in the sub-prime markets. It took about 4 months, and went through several phases of rebalancing, but we got it done. As I look across the landscape of LLMs, the word "context" screams back at me. I think about the processing power involved in making all the poor choices necessary to find the right choices, and I wonder how that could've been done without teams of people thinking about the same problem from many perspectives.
I wish I had better skills at searching academic papers for problems I'm trying to solve or that I'm just thinking about. I think just as there are some people that google better than others, I imagine a similar skill applies to academic papers. Anyone encounter this? How do I get better at it?
I thought at first it was an accessibility problem, and perhaps it still is. In that, I didn't have access to a library of academic papers. But, arxiv.org does make available a lot of content for free. The content seems to be growing too.
Another question I'm exploring is how to decide which journals to subscribe to. I have a limited budget, so I have to pick wisely. What makes things difficult is that the papers I have found interesting in the past, even when seemingly in related fields, are scattered across various journals.
One more random comment. I really can't wait until LLMs are applied to academic papers. Academic papers build on top of each other, and there are concepts that are considered "common knowledge" to experts and may require consuming a long history of papers to build a foundation of concepts and vocabulary. The difficulty is that those papers recursively introduce the same problem. A lot of the time the concepts are not that difficult, and it would be wonderful if an LLM could be used to fill the gaps as if I were talking to an expert.
I guess there are sort of expository papers that act as a checkpoint for a particular topic. I'm not sure how to find these.
Unfortunately this only comes with practice and it only applies to a specific domain. A physicist cannot easily weed out computer science literature and vice versa.
In fact, there are identical problems that are solved by different communities and you would not know because they use completely different lingo. Math optimization/dynamic programming/reinforcement learning is one of these.
Many accomplished scientists just read papers from other domains and adapt them to their own, making huge progress.
So yes I see tremendous value to what you describe. A Google translate for academic work that can translate between domain specific lingos and common language.
just use google scholar to start (which pulls from almost all journals + arxiv), and consider searching through citations as well. you can search for survey papers and look at their bibliographies too.
do NOT subscribe to any academic journals. not worth it for an individual. find a way to get access for free, such as through a local library or institution, or by other means. also note that often google scholar gives a link to a PDF over on the right, or in alternate versions of the article.
there are services like perplexity.ai that can search arxiv, pull articles, and feed them through an LLM for you -- it's pretty much what you want. some of the LLM chat interfaces let you upload PDFs too. none of this actually works that well yet but sometimes useful.
I'm extremely skeptical of LLMs solving problems outside of "reproduce text that a human wrote in the past" problems. Which, to be fair, a lot of problems can be surprisingly reduced to, but I'm still much more skeptical than it seems most HNers are.
That said:
> lol at ai for solving deterministic knapsacks.
> Just get yourself a solver.
I don't think these are necessarily in conflict. "Write me some Z3 code to solve this knapsack problem, then run it and tell me what the output means" seems like it might actually be in the right realm. The LLM isn't doing the solving, which makes sense because I agree there's no mechanism by which an LLM would be better at it than a solver, but as a UX to the solver it seems like it'd do okay. That's genuinely value added; I don't expect most accountants or even programmers to be familiar with Z3.
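For instance, the sort of thing you'd hope the LLM hands you is a small Z3 program like this sketch. The item values and weights are made-up example data; the solver, not the LLM, does the actual optimization.

```python
from z3 import Bool, If, Optimize, Sum, is_true, sat

values   = [60, 100, 120]   # hypothetical item values
weights  = [10, 20, 30]     # hypothetical item weights
capacity = 50

picks = [Bool(f"pick_{i}") for i in range(len(values))]
opt = Optimize()
opt.add(Sum([If(p, w, 0) for p, w in zip(picks, weights)]) <= capacity)
opt.maximize(Sum([If(p, v, 0) for p, v in zip(picks, values)]))

if opt.check() == sat:
    m = opt.model()
    chosen = [i for i, p in enumerate(picks) if is_true(m.evaluate(p))]
    print("take items:", chosen)  # expected: items 1 and 2 (value 220, weight 50)
```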
I don't understand the problem this solves. At least in my experience each transaction on the bank statement has a reference to a business transaction attached (usually an invoice number). The amount of money that just lands on the account without a reference is negligible in comparison and usually easily manually associated.
Often banks will batch up transactions into a single one, and especially in a real-time market, that may not match what you expect.
For example, let's say I ask you to sell 10 shares of Google if the price goes over $140 (this is typically called a limit order). Now your bank comes back and says they sold 2 shares at $140.02, 7 shares at $140.03, and 1 share at $139.77. Did they satisfy their obligation?
The answer is yes, but it's difficult to determine that, and you can't use exact math to do it easily. You expected $1400 from that sale, but you got $1400.02. Now do it again, but you have half a dozen orders at different prices. That's where it turns into the knapsack problem.
The problem is severely compounded when you look at why you're reconciling (it's to make sure your assets changed the way you expected, and to fix things when they didn't). Often banks will drop a transaction, or add an extra one (these systems are annoyingly manual, and subject to error). How do you find the exact error and track it down? Especially when the trade happened, but you don't have the actual record of it, and your records show that it didn't.
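To make the knapsack connection concrete, here's a toy brute-force matcher for a single batched deposit. The amounts are invented; the point is the 2^n blow-up, not the code.

```python
from itertools import combinations

expected = [1400.02, 279.54, 700.15, 420.31]   # what our records say we're owed
deposit  = 2100.17                             # one line on the bank statement
TOLERANCE = 0.02

matches = [
    combo
    for r in range(1, len(expected) + 1)
    for combo in combinations(expected, r)
    if abs(sum(combo) - deposit) <= TOLERANCE
]
print(matches)   # [(1400.02, 700.15)] for this data

# Exhaustive search is fine for a handful of items, but it checks 2^n subsets,
# which is exactly why naive matching blows up at real statement volumes.
```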
Can someone help me understand the premise of this article? I think the goal here is to map the internal books of a business to their bank account, but I have never seen the kind of "grouping" the article seems to assume as given. In which scenarios do these groupings happen? If there are several customers, there will be different invoices and therefore separate payments. Why would the bank just throw them together, thereby creating the problem this article is trying to solve? I'm asking from Germany, in case this is one of these Europe-US kinds of differences.
Apple, for example, won't bill you immediately if you buy something for $0.99 today. If you buy another $0.99 item tomorrow, you'll get a consolidated charge for $1.98. You would need to do something like this to link the $1.98 back to your app/song purchases.
The reason for this is that credit card processing often costs a flat service charge + a percentage of the bill: Stripe is 30 cents + 2.9% right now. The flat portion dominates for small charges, so you'd want to combine them if at all possible. (Apple certainly gets a better rate...but also has a scale where small savings add up).
That’s how it happens, and let me add the problem occurs _between systems_.
In this example we have Apple’s charge (receipts) and the consumer’s bank withdrawals (statements). This example gives you an idea of a consumer’s purchases, which are simple to reconcile.
The post is about the Business to Business situation, which deals with greater volume and therefore more complex problems. The OP uses a toy example. If you’ve done financial reconciliation, then you will recognize the problem behind the trivial example.
Yes but you don't have access to Apple's database, and therein lies the problem. Apple might send you a receipt that itemizes these things, or it might not. If you're a business you're dealing with 1000 different Apples all of whom have different policies about how granular their receipts will be, each with a different mechanism for you to access those receipts, and each making receipts available at different times.
Meanwhile you need to do a reconciliation for your company at the end of every period (e.g. the end of every day, every week, etc.) and you don't have time to wait around for all those receipts to be collected.
This is something I'm familiar with! I refined the above algorithm to reconcile statements according to the statement end balance. It works well, and yes it's knapsack.
I've written code for some reconciliation systems and it's nice to have confirmation that they are indeed hard nuts to crack. The language matching isn't really the hard part IMO. What's tricky is that you're often matching batches "up to N", and the matching is not always just on equal values or dates. For example, 300 credit card swipes at the point of sale might match 3 deposits in the bank account a few days later with a 3% fee deducted. They never clear on the _same_ day, but need to match nearest day first. To brute-force it, you need to test all batches of up to 300 point-of-sale transactions against all batches of up to 3 deposits.
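As a rough sketch of that fee-and-lag matching (the fee band and settlement window here are assumptions for illustration, not real processor terms):

```python
from datetime import date, timedelta

def could_settle(swipes_total: float, deposit: float,
                 swipe_day: date, deposit_day: date,
                 fee_lo=0.025, fee_hi=0.035, max_lag_days=5) -> bool:
    """Could this deposit be the settlement of these swipes, after the processor's fee?"""
    lag_ok = timedelta(0) < (deposit_day - swipe_day) <= timedelta(days=max_lag_days)
    implied_fee = 1 - deposit / swipes_total
    return lag_ok and fee_lo <= implied_fee <= fee_hi

# Monday's 300 swipes total $12,000; Thursday's deposit is $11,640 (3% deducted).
print(could_settle(12_000.00, 11_640.00, date(2024, 3, 4), date(2024, 3, 7)))  # True
```

The hard part is still choosing *which* swipes go into `swipes_total`: that's the batch-enumeration step that explodes combinatorially.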
Maybe I'm misunderstanding the problem, but you don't match payments to transactions. You would have a ledger that debits the transaction and credits the payment, resulting in a current balance.
The reconciliation system that I work on tries to match data from different systems to make sure they agree, like matching the Visa transaction file against what our system has internally recorded in its ledger. There is a unique key to match both records, so no math is involved.
Can anyone recommend a reconciliation application they've had a good experience with? My current work place rolled their own recon service. Is this the norm?
I was thinking that they were talking about netting/clearing and how to minimize the number of transactions that need to happen as actual wire transfers.
Actually no. This post is about doing something akin to first in first out accounting to match the payments to the invoices.
I wonder why they aren't simply using virtual IBANs...
Well, reconciliation is certainly not a knapsack problem, but I can't get over how many video game RPGs, e.g. Skyrim, ultimately get a lot of their fun out of the knapsack problem.
I really want an automatic inventory pickup system based on it
Even better is no reconciliation at all. Just keep all the data where it belongs so you don't have to reconstruct the data and inevitably make up data.
This feels like a marketing piece without any good content.
The way it's described, it's not a knapsack problem at all. The knapsack problem is to maximize the total value of the items you fit into the container.
In reconciliation, you presumably want to get the best matching between transactions, which is not defined here, and in any case is a completely different problem.
Ignoring the knapsack comparison, the article doesn't describe why you'd want to check each possible combination. Assuming the individual amounts are correct, you can do each batch separately - no need to check each combination within one batch with each combination of a different batch. (And if you drop that assumption, that still won't be a sensible thing to do).
I can imagine you can have a "scoring" algorithm that gives a confidence score for a match - then if you check every combination, you can pick the combination with the best overall score. But the article doesn't actually describe anything like that.
It also doesn't describe any alternatives to "AI". For example, what about a greedy algorithm? What about alternative methods to do address comparisons? I'm sure there are issues with those, but none of that is described here.
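For example, a greedy matcher is only a few lines. This sketch assumes some scoring function (the fuzzier address or reference comparisons would plug in there); it's illustrative, not a claim about what the article's product does.

```python
def greedy_match(bank_lines, ledger_items, score, min_score=0.8):
    """Pair each bank line with the best-scoring open ledger item, or leave it for a human."""
    matched, unmatched = [], []
    open_items = list(ledger_items)
    for line in bank_lines:
        if not open_items:
            unmatched.append(line)
            continue
        best = max(open_items, key=lambda item: score(line, item))
        if score(line, best) >= min_score:
            matched.append((line, best))
            open_items.remove(best)      # each ledger item is used at most once
        else:
            unmatched.append(line)       # ambiguous lines go to manual review
    return matched, unmatched
```

Greedy matching can obviously commit to a locally good pairing that blocks a globally better one, but the article never tells us why that trade-off rules it out.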
To your point, bank statement reconciliation (not the only kind of, or even the most important type of, financial reconciliation) is typically audited by scoring transactions against house financials, which then buckets them into a few categories such as matched, no match, "FRAUD!", etc. Solutions to this use case (bank reco) often look more like recommendation engines, as human auditors end up manually reviewing anything larger than a defined variance threshold. One area where ML could make sense is providing a high-confidence "FRAUD!" suggestion against a real-time stream of transaction data, which I personally worked on at a large bank; it was a long time ago (20 years) and the ROC curve wasn't impressive.
Disclaimer: OpenEnvoy provides real-time auditing & reconciliation solutions but in front of the ERP so we don't directly compete with MT.
Very good points! I was a little confused about what this article was trying to show after taking an algorithms class, but I'm glad to know that I can somewhat justify my confusion (:
If you’re in an algorithms class addressing either partial or 0/1 knapsack (or even a computation theory class getting into if and when to break out a SAT solver), listen to the professor and not anyone talking about AI.