Hacker News new | past | comments | ask | show | jobs | submit login
Building a Scalable Accounting Ledger (scratchdata.com)
53 points by memset 9 months ago | hide | past | favorite | 36 comments



This claims 1 and -1 is confusing, then proceeds to use the confusion. Why not use the language of the problem domain?

The rules are unambiguous. An increase in assets are a DR, increase in liabilities a CR, in income a CR, in expenses a DR, in owners equity a CR.

Decreases the opposite.

Using 1 and -1 is nonsensical because their meaning in mathematics is fixed, while as shown by the rules above, in accounting DR/CR meanings change.

So a couple of example journal entries…

  Product sale for cash
  Sales CR 100
  Bank  DR 100

  Payment of rent
  Rent DR 50
  Bank CR 50

  Sale for cash where tax collected and owed
  Income      CR 70
  Tax Payable CR 30
  Bank        DR 100

  Monthly petty cash tin entry
  Postage             DR 5
  Accounts receivable DR 20 (loan to John)
  Transport           DR 15 (taxi Paul)
  Bank                CR 40
  
Note that your bank statement is from the banks perspective. When you put money into the bank it is a CR because it is increase in the banks liability.

Is this too hard to model? I suggest not?


I think you think it's confusing only because you're not used to it. The two models are mathematically equivalent.

The rules are unambiguous. Values which indicate capital are positive. Values which indicate ownership are negative. An increase in capital corresponds to an increase in ownership of said capital, which add up to zero. Perfectly balanced, as bookkeeping should be.

I find CR DR to be more confusing than +/-, because the latter works out naturally with basic arithmetic, whereas the former requires this arbitrary DR CR crud, which are made up terms that literally mean nothing and you have to rote memorize how they apply to different accounts (and to add insult to injury, the terms often have the exact opposite meaning to consumers due to how banks present the terminology, which means the terms literally can mean either thing depending on the context).


I think where this would get confusing is that a positive number would have to decrease some accounts, but increase others, right?


It does not because if you use + and - liabilities are negative and assets positive and if you add all balances the sum is zero. Actually the commercial accounting software I use works like that, it uses negative balances for liability accounts.


Exactly, it's completely unnecessary extra information that doesn't need to appear in the core data model that the user never directly sees. Extracting credits, debits, reversals, returns, etc. is all higher-level information.


Also, money is usually kept as integer of cents. It is clear, inambiguous and allows for other rounding rules that sometimes pop up in legislation.


Money exists outside us dollar and should be stored ideally a a composite of currency and decimal or just a decimal.

Then use currency specific rounding rules and always defer to CLDR [0] when displaying.

1. https://cldr.unicode.org/translation/number-currency-formats...


There is always a main currency for any entity, so any foreign currency dealings or having a reporting currency adds some fun in the form of changing exchange rates, realized/unrealized exchange rate differences, rounding errors and other shenanigans.


> Is this too hard to model? I suggest not?

I'm sure I could put my mind to following those rules, but what's to be gained?

If 'assets' and 'income' are opposites, and 'income' and 'liabilities' are the same, then I'm just not going to bother. What if my income is paid in assets in the form of $1 bills? CR or DR?

And 'equity', Jesus is that a nebulous concept.


Equity is the most important of all - it’s simply what the business is worth to the owners if you sold everything


Then you get to corrective transactions, where a normal return is

  Sales DR 100
  Bank CR 100
while an annulled sales invoice is

  Sales CR -100
  Bank DR -100
The point of using debit/credit vs +/- is just that, the ability to have correct account turnovers with corrections, and maybe some conventional understanding of what's an expense account in debit and what's a vendor account in credit.


A return and an annulled sale are still credits and debits, respectively, when it comes to the mechanical sum being performed. The specific category of the action is higher-level information that doesn't need to appear at this level of the model.


In my scheme, 1 always means DR and -1 always means CR. I use this, rather than the strings DR/CR, so the database can more easily do arithmetic to calculate balances. In order to do that arithmetic, we multiply the account's normal (again, 1 or -1) by the direction (1, -1) to determine if the amount represents an increase or decrease.


Did not expect this to hit the front page! A few notes:

- 1 always represents debit and -1 always represents credit. Regardless of whether it is an asset or liability account.

- Each account has what is called a "normal balance". This tells us whether you increase the value of the account using debits or credits. This is also modeled as 1 or -1.

- Amounts are always positive

- To find the actual dollar value for a transaction, you can multiply: amount * direction * normal. If the direction and normal are the same sign, then that means we increase the value. Otherwise we decrease. Your code doesn't need to know which accounts are assets vs liability vs equity in this scheme.

- Of course, you still need to know whether to insert transactions as debit or credit, and you need to know the normal balance of your accounts. But once you figure that out (by googling, or talking to your finance team) then the data model takes care of the rest.


Thank you for the post. One question: do you think that ClickHouse is a good database for solving this problem? ClickHouse does not guarantee ACID properties like isolation or consistency at levels above input blocks. It also uses eventually consistent replication between table replicas. So it's quite possible the same query may return different results depending on which server it runs.

I work on ClickHouse and am a big fan. That said, this consistency issue has arisen from time to time in customer cases when dealing with financial data. I'm curious if you have seen it in this case.


Tangentially related, I know little about double-entry accounting and was wondering what CR and DR meant. If CR was credit, then what was DR? It's somewhat obviously debit, but why DR and not DB or some other abbreviation that uses letters in "debit"? I found a beautiful answer here [0] in the Accounting Historians Journal.

[0]: https://egrove.olemiss.edu/aah_journal/vol13/iss2/12/

Edit: I should add that CR and DR at one point in time were related to creditor and debitor. I referenced that article, because it's the journey of the author to find an answer that I enjoyed about it.


I think the transactions table is misnamed. This is a table of posting legs. Which all must balance for a single transaction. I don't think you should get hung up on +ve-ve versus cr and Dr. When double entry was formalised negative numbers were not common and then thought errant. It is only in the eighteenth century it became normal. I enjoyed the entry. Note that you might you use different minimum fractions eg we use 0.00001p as a discrete value.


Legs might be better name. Line items?

I think a lot of people are glossing over the fact that you not only need 1/-1 for DR/CR, but also the "normal" balance of the account to tabulate balances.


How many legs are there in a transaction?


> How many legs are there in a transaction?

Almost always multiple, depending on what you consider a transaction, and the PoV of the accounts involved, etc.

I cannot think of a transaction that has a single leg only.

Here's a possibility: You buy something online. You pay $100 by credit card ($80 cost plus $20 shipping).

The accounting system needs to record your purchase. Maybe dr 'Mr ClientName' with $100, cr 'Clearing' with $100.

At some point bank recon will happen, the payment will reflect and the amount will be considered 'cleared'. At this point dr 'Clearing' with $100, cr 'depot' with $100 and delivery details.

The depot will cr 'Stock' for $80 worth of stock, dr 'Courier' for $20[1] using the order number and the companys client number at the courier for folio. Payment request fired off to people with authority to approve payments (PWATAP).

PWATAP will place order at supplier for $80 of stock (to replenish), dr 'PurchaseInProgress' with $80, pay courier service and cr 'Courier' with $20.

In reality, all of those things either happen in a more streamlined way (a holding account is used and replenishments are purchased periodically in batches, not individually), or behind the scenes and invisible to the end-user (dispatcher simply puts "fulfilled order" into the system, and the system will split it up however it needs to).

[1] The courier will go through the same process, for their services rendered


1 for a zero amount more realistically commonly 2. End of year may have one for each account do many.


Feels more as a way to introduce a product (scratch data), rather than a technical article. If you have only one entity to manage accounts for - why would you need to build an accounting engine? In other case, when you manage accounts for multiple entities, such as different "clients" or "branches" - you have millions of transactions, and a query such as "get accounts for this client" will take a long time - and you still need to introduce some kind of materialized views.


Two perspectives on this:

- I believe this is a good application of columnar stores (ie, Clickhouse) rather than traditional Postgres, where handling 1M transactions is really fast.

- You might still choose to use materialized views! This article is a suggestion for a table structure that powers those views.


This confirms what I’ve heard repeated before: it’s easier to teach programming to an accountant than accounting to a programmer.


Accounting is actually not hard, what's hard is the fact that accounting as a field didn't adapt to the existence of negative numbers so they still talk about the effects of "credits" and "debits" based on underlying account type.

Physics may have gotten the direction of current flow wrong, but at least they were _consistent_ and didn't start talking about "source voltage" and "load voltage" which would have muddled KVL into `sum{delta_sources} = sum{delta_loads}` compared to `sum{delta_voltage} = 0`.

The real issue is that besides the famous Kleppmann article, it's hard to find material on accounting that sheds this legacy baggage.


From my experience as a programmer who did accounting for a while: Accounting is the easy part, but the borders to taxes are nonexistent (at least here in Germany). So when you do accounting you always do taxes to a certain degree. And this is not something that can be understood with logic, it’s completely arbitrary.

On software that was written by accountants: look no further than DATEV. It’s an abomination I would not inflict on my worst enemy.


Plenty of articles of programmers talking about accounting to form one side of that opinion, where are the articles from accountants talking about programming to inform the other side?


How is this any difference with using the sign of the amount column to signify debit or credit? Using that approach at least you save a column. Otherwise have explicit debit and credit columns if you want to be "consistent with how accounting is done".


I don't really see how what is described is "scalable".

With hundred thousandth of transactions, performing the sums will be expensive.

And nothing ensure the safeness of the order of entries...


I think they were suggesting they chose "a column-oriented database like Clickhouse" to solve the summing problem for them, with some magic under the hood, maybe doing aggregates, as was mentioned separately as a solution if you wanted to roll your own.


I didn't get the date as text and the amount as real in table transactions.


SQLite, as used in the article, has a very simple data model: 64 bit ints, 64 bit floats, and textual data. If you used a different database engine, you would probably use the datatypes it offers instead.


Agree with this! I just used SQLite as kind of a pseudocode for demonstrating the concept.


Ah, thank you. I understood that it was a Clickhouae schema, it's not.


Why would you want to build your own accounting ledger from scratch? Accounting is a completely new domain for most engineers, and TigerBeetle (https://tigerbeetle.com/) already solves this problem.


I think our definition of «solved» might be slightly different hehe.

«The production release of TigerBeetle is imminent.»

Not putting money into this system any time soon in other words.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: