An oral history of Bank Python (2021)

161 points by tosh a day ago

I've worked at a bank and several large hedge funds.

Some additional interesting tech stories I would add:

- in 2010, the bank had retail Good Till Cancel orders from 1997. I think one was "Buy INTC at $6"

- There is a mix of "I didn't know technology could do this" in the good sense and "I'm amazed this code a. works at all and b. hasn't had an outage in 6 years"

- There is a strong desire, I chose this word carefully, to migrate off of legacy systems. That being said, there are several big issues: 1. it's a GIGANTIC amount of effort with often unclear ROI to the business, 2. upside is capped (maybe you get a promotion) but downside risk is huge (you could tank the business with an outage), 3. slow, gradual refactors are generally better here but some things can only be "big bang" for various reasons

- You tend to see old but performant and battle tested systems get retired in favor of shiny, new systems with lots of bugs. Why? It looks better on a resume to say "I retired old, crufty legacy system and rolled out a new system" instead of "I refactored old system to be better"

- The complexities are wild e.g. Korean trading requires: a. traders to be licensed in Korea (even if they are working in NYC), b. servers to be in Korea c. tagging orders with not only their executions but also the exchange rate at the time

- There are entire SYSTEMS built to track trade breaks (e.g. Bank A doesn't agree with Bank B on fill 1248383). Some of these trade breaks are open for YEARS due to litigation, companies going out of business, etc.

I could go on and on about this.

If anyone is ever interested in having me on a podcast to talk more about it, I would totally be up for it.

elemeno 20 hours ago

To the best of my knowledge much of this originated with SecDB/Slang at Goldman - SecDB (securities db I believe) being the object store and slang the somewhat quirky C like language that ran with it (also the only language I’ve used professionally that let you have spaces in the variable names).

Some of the folk that built that (or worked on it) ended up at JPM and Merrill where they built the Python centric version - Alpha and Quartz respectively. Barclays Capital has/had a similar system as well I think, but it’s not one I know about offhand - they did though, memorably, have a system that was pretty much Haskell-in-Excel.

Thrymr 19 hours ago
JPM's version was Athena (not Alpha) [0]
[0] https://www.slideshare.net/slideshow/managing-python-at-scal...
- scelerat an hour ago
  Ha when I read “I won’t name the system; let’s just call it Minerva,” I thought, “I’ll bet it’s called Athena irl.”
- kayo_20211030 6 hours ago
  Aaah. I get it now. Athena -> Minerva.
Maven911 6 hours ago
A lot of them retired too, having made bank producing a hot mess that the higher-ups did not fully appreciate. They collected their multi-million packages and left.
simonh 4 hours ago
I worked on Quartz at BAML for 4 years, it was great. I met Kirat Singh once when he visited the UK, he took the basic concept from Goldman to JPM, then to BAML.
- benjaminwootton 4 hours ago
  I was lucky enough to start my career sat on the next desk across from Kirat. Genius programmer and nice guy! He later went on to found Beacon platform which was the same again, as a cloud hosted service.
  - bostik an hour ago
    Yeah, and Beacon was acquired a year ago. The acquiring company in turn went private. Yesterday.
    Genius coder, yes. Nice guy, most definitely yes.

amyjess 3 hours ago

I remember reading this article when it was first posted here five years ago, and I've been fascinated by Bank Python since. It actually reminds me of a number of systems I've come up with in my head but never told anyone about or wrote down in any way.

manoDev 2 hours ago

If you've ever worked with a mainframe, you'll see a lot of similarities:

- Unified interface for object stores

- Source code stored with data files

- Job runner

I also see some similarities with Lisp machines: Python also has a REPL, and the system is able to dump/restore image state (but in this case discrete objects are serialized, not the entire memory).

This might sound crazy for people used to having 90% glue code / 10% business codebases, but to me seems like a very efficient way to have users directly drive what is effectively a large computer, and more like how things used to be.

The drawback is that it seems to be a monolith, and maybe hard to reimplement on top of more modern foundations. But as a general API, it seems to make sense.

tsukikage 19 hours ago

When first encountering these ecosystems and looking at the various pieces they contain, one may repeatedly ask: "why didn't they just use <off-the-shelf solution> for this problem instead of writing this component/subsystem from scratch"?

The answer is often that the battle-hardened mature off-the-shelf solution did not exist at the time the code was written. You're doing software archaeology.

gorgoiler 12 hours ago
In my experience it’s extremely difficult for a highly resourced corporate engineering team to get married to an open source project run by volunteers, consensus, or both. It is possible but you need to have a first class relationship with an upstream who will take your patches.
Every patch delay puts more pressure on you and your team to fork the codebase and go it alone. You and your team sit down and promise you’ll rebase over upstream releases and everyone nods wisely. Then you skip a release, and another, and presto: you now have Bank Redis or Bank Selenium or Bank Hadoop trapped on the last version of upstream before the fork but to which you can patch changes as fast as you like. I’d liken this to crossing an event horizon except the astronaut sees the universe freeze and fade away instead of the outside observer.
It’s possible to make it work if the upstream project either gives you a majority vote (or at least a substantial share of the vote) on project direction, or you’re working on a project large enough to have lots of corporate (ie funded, high velocity) stakeholders already.
- CamouflagedKiwi 4 hours ago
  Yes, agreed. And it's not just delays - when the upstream decides that they actively don't want to take a patch that you have a burning need for (and maybe you already have systems in production depending on it), that can accelerate the process to Bank Redis a lot.
  I've rolled with the "we'll keep a local patch against upstream" for small changes before, which helps keep on track with upgrades, but depends how feasible that is.
lmm 18 hours ago
That's only half the answer. These large investment banks' value-add is partly that they can integrate everything they know into these closed-world environments (kind of like a Smalltalk image), which is something that simply isn't done in the wider world because you can't accrete it out of smaller pieces and it doesn't make sense at all for smaller entities.
simonh 5 hours ago
Very much so, I worked on Quartz at BAML for a few years.
The whole idea was actually to use as much existing Open Source technology as possible. Hence Python and its rich library ecosystem, instead of something home grown like SecDB/Slang. This was supplemented with proprietary infrastructure and libraries only where there was a clear need. For example, a Directed Acyclic Graph library to ease migrations from the Excel sheets used by Quants. The distributed object store was pretty neat.
You could code up a basic web service with minimal functionality and have it running in nonprod in an afternoon, and then production the day after. All that boilerplate stuff was super low friction, so you could spend much more of your time on solving the actual problem.

axus 21 hours ago

What a well-written account of "how things are done".

> Time to drop a bit of a bombshell: the [Barbara] source code is in Barbara too, not on disk. Remain composed. It's kept in a special Barbara ring called sourcecode.

rbanffy 10 hours ago
This makes it feel like a gigantic Smalltalk instance.
- calpaterson 8 hours ago
  Interestingly, after writing this (some years ago) I spoke to some of the original authors. They had never used Smalltalk. So I suppose they invented this stuff independently
  - rbanffy 7 hours ago
    Could be. I got the same vibes with Zope, which is Python and has an object database underneath it. At the time I had the impression the idea was popular in finance.
    Recursing 7 hours ago
    Zope was strongly influenced by Smalltalk
    > The ZODB is an (almost) transparent python object persistence system, heavily influenced by Smalltalk.
    https://zodb.org/en/latest/articles/ZODB-overview.html#compa...
    I think Jim Fulton and other authors of Zope originally came from Smalltalk
    kayo_20211030 6 hours ago
    I loved Zope (and ZODB). No matter what I said, I could never convince the higher-ups it was the way to go. In retrospect, they were probably right. But those technologies were magnificent.
    rbanffy an hour ago
    Thanks to the Plone crowd, Zope now runs on Python 3 and still deserves some work. I’m not sure how to shoehorn Git-based workflows on top of it transparently, but I would still love to play with it.

nxobject 4 hours ago

I would give my left leg to learn how the permissions system worked – do end users (and PHBs) get to edit the rules directly? I fully expect some HR ass to go:

  can_view(Person) :- didPITAOnlineTraining(Person), ...

skissane 19 hours ago

I think it is a pity they’ll likely never open source any of this stuff

Of course, financial institutions have a lot of “secret sauce” - such as financial models - you’d never expect them to release.

But this kind of underlying infrastructure isn’t really “secret sauce”

lmm 18 hours ago
Morgan Stanley's version is open-source at https://github.com/morganstanley/optimus-cirrus , although I don't know how practical it is to actually run yourself. (They don't go quite as far as having the code itself be bitemporal and kept in the datastore, but most of the stuff in the article exists there)
- lesam 8 hours ago
  Poetic that the most recent update is 6 months ago, a one line change adding a 'Lifecycle: Active' emoji to the Readme.
Jianghong94 17 hours ago
Well I doubt these solutions are very useful outside; from what I read what they have is a universal data store (that isn't hard to implement using current off-the-shelf OSS), something for financial instruments that has a compositional nature (you won't encounter much of that in the outside world), plus some other quirky features.
veqq 17 hours ago
This infrastructure is more secret sauce than the financial models, which change rapidly.
jgalt212 18 hours ago
> I think it is a pity they’ll likely never open source any of this stuff
The more they use cloud-hosted LLMs, the more likely it will get leaked into training data.

tomrod 3 hours ago

A few of the scientific computing companies from the early 2010s got their traction due to Dodd Frank-required scenario and stress testing. SAS was not up to the challenge and R could not multithread.

These initiatives were independent of Minerva and Athena, which were good but not very useful to the more mundane parts of the bank that everyone off the financial trading floors cares about.

mhh__ 19 hours ago

People turning up in hedge funds (i.e. much smaller) and trying to rewrite the bit of a bank they used to work in's equivalent of this article is so annoying.

valzam 9 hours ago

> One of the great drawbacks of "Cloud Native Computing" as it now exists is that it's really, really complicated. It is often more complicated than the old, non-cloud, sort of computing. In order to deploy your app outside of Minerva you now need to know something about k8s, or Cloud Formation, or Terraform.

Highly agree with this. I think it's very underappreciated in startups that if you want people to deploy a lot of small services you have to make that really super easy. I always thought that the value of things like Spark is that you can run "things" without having to worry about how they run. K8s is similar but much more complex. AWS Lambda is nice but also comes with a lot of baggage at scale. I always wanted to try something like Dapr, which seems to provide a very opinionated happy path for application development.

horticulturist 17 hours ago

> This is because clients generally do not ring up about pennies.

I’ve had clients ring up about pennies… it can be crazy what some people are motivated by

AdamN 9 hours ago
The penny doesn't matter but being off by a cent can mean there are serious problems in the workflow.
TZubiri 17 hours ago
Precision?
Quite common in accounting, the accounting equation must balance, it's like a checksum
- nxobject 4 hours ago
  Yes, OP started the sentence with that. But, you do make me wonder if floating-point imprecision would eventually lead to material issues...
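
A minimal sketch of that floating-point worry, assuming a hypothetical two-line ledger (the amounts and the `ledger_balances` helper are made up for illustration, not taken from the article). It treats the accounting equation as an exact equality check and compares binary floats against `decimal.Decimal`:

```python
from decimal import Decimal


def total(amounts, zero):
    """Add the amounts one by one, starting from a zero of the same type."""
    result = zero
    for amount in amounts:
        result = result + amount
    return result


def ledger_balances(debits, credits, zero):
    """Return True only when debits and credits sum to exactly the same value."""
    return total(debits, zero) == total(credits, zero)


# Hypothetical ledger: debits of 0.10 and 0.20 against a single credit of 0.30.
float_ok = ledger_balances([0.10, 0.20], [0.30], 0.0)
decimal_ok = ledger_balances(
    [Decimal("0.10"), Decimal("0.20")], [Decimal("0.30")], Decimal("0")
)

print(float_ok, decimal_ok)
```

With binary floats the "checksum" fails even though the books are right on paper, because 0.10 + 0.20 is not exactly 0.30 in binary floating point. With decimal (or integer cents) it balances.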

piinbinary 21 hours ago

Prior discussion: https://news.ycombinator.com/item?id=29104047

Havoc 19 hours ago

I've seen similar inside large financial orgs - what struck me was how there are these huge amounts of people that spend their entire working life inside this alternate IT reality. It's not unlike SAP consultants where their skillset is tied to one company.

Also...these things tend to have fuckin terrible documentation. Good luck figuring any of this out. And you can't google it and your AI is just as lost as you

janosch_123 9 hours ago
I was reading the article and got SAP/ABAP flashbacks.
- source code in database: yes
- own IDE for questionable reasons: you bet!
- custom table objects: we got your back.
- strange forks of common python libraries: would you like warnings with that?
nxobject 4 hours ago
> Also...these things tend to have fuckin terrible documentation. Good luck figuring any of this out. And you can't google it and your AI is just as lost as you
I convinced my boss to hire an intern for the summer to do this. They said: "wouldn't internship projects that involved actual coding be more attractive?"
I replied: "Well, they'll be having to do a lot of experimenting to figure things out..."
ForHackernews 7 hours ago
Isn't Google like this, too? They have their own source control system, their own IDE, their own databases.
It seems like any giant organization eventually develops its own software center of gravity.
- CamouflagedKiwi 4 hours ago
  Yes, it is, there's a translation table for xooglers too: https://github.com/jhuangtw/xg2xg
  The only real difference there (although it is a significant one) is that most of those internal Google tools tended to be very good, often ahead of the external state-of-the-art. That's a very different feeling to a baroque old stack inside a bank somewhere. Maybe the external world has caught up on a bunch of them more recently though which would start to change that.
  - ForHackernews 4 hours ago
    I suppose it depends what you're optimizing for, but this Bank Python looks like it would be great for enabling productivity in a consistent shared environment with minimal setup required for devs. This looks far better than setting up jupyter notebooks, installing all kinds of pypi dependencies, etc. whatever the equivalent on the outside would be.
Jianghong94 17 hours ago
lolol. Actually, I find AI has a reasonable chance to figure it out, as long as you point to the right source code. BTW to me these quirks actually can be used as some kind of job security. If it takes a year to onboard someone to do meaningful work, it sure raises the cost of firing.

coredog64 21 hours ago

And I thought rewriting 3rd party packages to work with AFS was crazy

debamitro 4 hours ago

Exhilarating article, but this has been shared on HN countless times

frays 13 hours ago

The previous discussion was fascinating: https://news.ycombinator.com/item?id=29104047

Does anyone working at one of these banks or similar know if this information still holds true?

And have any of the banks started using uv yet? Or will they forever be using pip?

roryirvine 5 hours ago
Back then, I was working on a project for a mid-market investment bank which aimed to build a self-service platform to host more "standard" python apps whilst still allowing them some of the benefits of what TFA refers to as Bank Python (speed of deployment, ability to spin up short-lived experiments with minimal hoop-jumping, structured data model, etc).
There was certainly a widespread understanding that doing things the "Bank Way" made recruitment difficult, and they hypothesised that it was also a significant drag on their ability to turn around new projects. The main goal of the new platform was to provide an alternative way of doing things which would allow them to quantify that drag.
I know that the pilot was completed, and it went on to a more widespread deployment - but my involvement with it had already ended so I can't say if it actually proved their hypothesis / provided the quantitative data they wanted.
neongreen 8 hours ago
I worked at Standard Chartered and it's a bit similar, but it's hard for me to judge how much.
SC has its own Haskell compiler that produces bytecode that you can run locally, serialize, send to be executed somewhere else, etc. Most of the code still lived in a monorepo, though.
We did have a global data store (well, several) that any code could access. I was working on a more "normal" application that was still written in the SC haskell dialect but otherwise mainstream architecture -- postgres, deploying to a boring linux server, etc.
A colleague once described our dialect as "Python that looks like Haskell". This is an exaggeration, but a) we did use a lot of untyped dicts and everything-is-a-giant-relational-table structures, and b) my understanding is that the actual financial modelling was done in C++ and the SC Haskell was glueing things together. Idk.
About uv -- I did try to convert ppl to uv but it probably didn't spread further than my few colleagues at the Warsaw office.. well and also I merged a monorepo-wide documentation system that used sphinx and uv, but idk if it's still alive after I left.
calpaterson 12 hours ago
They don't use pip, you just import the module and it is pulled from barbara
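
For anyone wondering how "import pulls from the object store" could work at all: a rough sketch using Python's standard import hooks. This is not how Barbara does it (I have no idea). `SOURCE_STORE`, `StoreFinder` and `pricing_demo` are hypothetical names, and a plain dict stands in for the sourcecode ring:

```python
import importlib.abc
import importlib.util
import sys

# Stand-in for the source-code ring: module name -> source text.
SOURCE_STORE = {
    "pricing_demo": (
        "def notional(quantity, price_cents):\n"
        "    return quantity * price_cents\n"
    ),
}


class StoreFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Finds and loads modules whose source lives in a key-value store."""

    def __init__(self, store):
        self.store = store

    def find_spec(self, fullname, path, target=None):
        # Claim only the modules that exist in the store.
        if fullname in self.store:
            return importlib.util.spec_from_loader(fullname, self)
        return None

    def create_module(self, spec):
        return None  # fall back to the default module object

    def exec_module(self, module):
        # Compile the stored source and run it in the new module's namespace.
        source = self.store[module.__name__]
        code = compile(source, f"<store:{module.__name__}>", "exec")
        exec(code, module.__dict__)


# Appended last, so normal on-disk modules still win.
sys.meta_path.append(StoreFinder(SOURCE_STORE))

import pricing_demo  # resolved from SOURCE_STORE, not from disk

result = pricing_demo.notional(3, 250)
print(result)
```

Swap the dict for a network lookup and you get imports with no files on disk and no pip step, which is roughly the experience described.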

wirthjason 4 hours ago

  s/barbara/sandra/g

roywiggins 20 hours ago

Weirdly not dissimilar from MUMPS systems.

matt_daemon 6 hours ago

This is one of my fav blog posts

TZubiri a day ago

>Applications also commonly store their internal state in Barbara - writing dataclasses straight in and out with only very simple locking and transactions (if any).

Right out of the gates, it's crazy how this contrasts with Mercury's Haskell infra

https://blog.haskell.org/a-couple-million-lines-of-haskell/

lmm 19 hours ago
It sounds pretty similar actually. Barbara fills the same role that Temporal is doing at Mercury.
- TZubiri 17 hours ago
  I may be reading between lines, but temporal seems to be a virtual machine like the evm, it handles computation.
  Barbara is a company wide database, it handles data storage.
  When I read about internal app state being stored in Barbara I'm interpreting that the policy is for the data to be centralized for more vertical control.
  While the Temporal thing sounds like if something is written, it's done so in a containerized like manner, and other processes can't just read it.
  - lmm 16 hours ago
    > temporal seems to be a virtual machine like the evm, it handles computation.
    It stores the app's work-in-progress state as well (probably as a blob full of serialised internal datastructures, at least in some cases):
    > You write your workflow as ordinary sequential code, and the platform records every step in an event history. If a worker crashes mid-workflow, another worker replays the deterministic prefix to reconstruct the state, then continues from where it left off.
    > When I read about internal app state being stored in Barbara I'm interpreting that the policy is for the data to be centralized for more vertical control.
    That wasn't the way I experienced it, if anything it was the opposite: app developers would push to use Barbara for their internal state because it was easy: the app is already accessing it, the APIs are simple, and since it's just pickled objects you can just store your state without having to worry about serialisation (much) or ORM. Whereas policy and leadership would if anything prefer you to use a separate traditional database. The point of Barbara is to provide a unified interface onto "everything the bank knows", it's primarily for data that multiple teams use, not internal state owned by a single team.
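
A toy sketch of the "record every step, replay the prefix" idea from the quote above. This is not Temporal's API. The `Workflow` class, the step names and the numbers are all invented, and the history is an in-memory list rather than anything durable:

```python
class Workflow:
    """Runs steps in order, recording each result in an event history."""

    def __init__(self, history=None):
        self.history = list(history or [])  # results recorded by earlier runs
        self.position = 0                   # index of the next step
        self.executed = []                  # steps actually run by this worker

    def step(self, name, func):
        if self.position < len(self.history):
            # Replay: reuse the recorded result instead of re-running the step.
            recorded_name, result = self.history[self.position]
            assert recorded_name == name, "workflow code must be deterministic"
        else:
            result = func()
            self.history.append((name, result))
            self.executed.append(name)
        self.position += 1
        return result


def settle_trade(wf):
    """Ordinary sequential code; every side effect goes through wf.step."""
    price = wf.step("fetch_price", lambda: 250)
    quantity = wf.step("fetch_quantity", lambda: 3)
    return wf.step("compute_notional", lambda: price * quantity)


# First worker "crashes" after two steps; only the history survives.
first = Workflow()
first.step("fetch_price", lambda: 250)
first.step("fetch_quantity", lambda: 3)

# A second worker replays the recorded prefix, then continues.
second = Workflow(history=first.history)
notional = settle_trade(second)
print(notional, second.executed)
```

The second worker only executes the last step; the first two results come out of the history. That history is the work-in-progress state I mean.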
devin 21 hours ago
Eh, to be fair, this post is about a _bank_, and the one you've linked is about _fintech_. They are not even close to the same space, even though they both deal with money.
But also I suppose you may be saying exactly this?
- miki123211 10 hours ago
  "Bank" in English is an extremely overloaded term.
  There's a very big difference between the kind of bank you walk into to get a checking account, versus one that has no (individual) customers and whose job it is to assist with IPOs or whatever.
- monknomo 21 hours ago
  to be fair, it is a fintech that wants to become a bank
  - TZubiri 17 hours ago
    It's a bank, that is, a bank in all but name for regulatory purposes.
    monknomo 2 hours ago
    I think they applied for an actual charter, so that will probably change
    TZubiri 9 hours ago
    *neobank

jgalt212 18 hours ago

> but the default ring is more or less a single, global, object database for the entire bank.

Is this really the case? I'm sure there are plenty of transactions that for umpteen different reasons must not be exposed on a global level.

lmm 11 hours ago
There are some things that are hidden, but they're the exception not the rule; but by default most things need to be global. You need to know how much capital the whole bank has at risk if <XYZ company> were to go bankrupt, or interest rates moved by x%, for example.
calpaterson 12 hours ago
Well, I say "more or less" :). But the fact that there is a single global database doesn't mean that you can read every key or value (but generally, yes, you can _read_ everything). I mention in passing prolog-style permission systems for evaluating perms.
But anyway, specific trades are rarely private to one part of the bank for many reasons. For example regulatory: these days you have to notify the regulator about every trade.
Jianghong94 17 hours ago
What I infer from the description is that the default ring contains some shared global public data (e.g. a cache of Bloomberg information), and each individual team will have their own rings. After all, there's not that much you can fit into 16mb
- calpaterson 12 hours ago
  No, no, there is a single world ring. the 16mb limit is for values
