If you like a discovered manuscript story, you should see "In the Hand of Dante", great movie.
https://www.imdb.com/title/tt1333644/
This review doesn't spoil the movie https://www.theguardian.com/film/2026/jun/19/in-the-hand-of-...
Side note, imdb's per country rating histograms are mesmerizing https://www.imdb.com/title/tt1333644/ratings/ how different the Iranian ratings are vs the UK.
That looks pretty engaging - all the right people hate it.
I do jump directly to the 1-star reviews, so there is that.
First step for a great feature: movies hated by people you despise
"It is a sobering thought that when Mozart was my age, he had been dead for two years."
Tom Lehrer.
Mozart lived for 35 years
Lehrer did 97
> Lehrer did [sic] 97
FYI, most people speak the vast majority of their quotes before the day they die.
Unfortunately for Lehrer he embarrassed himself in his final words by misremembering how long Mozart lived
He’ll never live it down.
No he was correctly factoring in afterlife time dilation
Classic old guy
The reports of my death are greatly exaggerated.
I sure hope they speak all of them before they die. Bit hard to understand a corpse.
It is possible Lehrer said that before his last day on earth. Sometime around age 37 would make sense.
In fact, I had the original album from the 1960s and, yes, that's where I heard the line.
(Lehrer was a mathematician) he did the maths! Well.. arithmetic.
Was that the new math or the old?
There is not a single citation in this article, even though it uses quotations.
Here is a more reputable article for this news story: https://www.nytimes.com/2026/06/22/arts/music/mozart-music-f...
At least they didn't use quotation marks for "emphasis".
Turns out "technical debt" also applies to national archives.
More than you can possibly imagine. There are warehouses full of unread papers. Any one of which could contain a reference to somebody or something important.
There was a recently discovered letter, possibly to Shakespeare's wife, which would completely change our understanding of their marriage, and even the way his plays depict women. The only way to find such things is by hordes of grad students trudging their way through fragile paper and messy handwriting.
I hate to say it, but might LLMs transform archival work? Not by replacing researchers, but by inputting everything (or orders of magnitude more than we could previously) and outputting to the researcher a prioritized list of documents / etc to examine?
The bottleneck is physical work, as I understand it. And primarily delicate physical work that does not destroy the already disintegrating materials that are piled up in boxes for miles.
https://www.aaa.si.edu/documentation/digitizing-entire-colle...
If you could automate transcription, it would be an enormous boon to researchers.
Reading the handwriting would be really hard, and it would be a massive effort to move all that paper. Just handling it is hard; it's not like flipping through mass-manufactured books.
But I suspect that you could spend a few million dollars to revolutionize the field.
>automate transcription
This also means trusting the LLM to decide what things mean. But there is very likely a great middle ground of having LLMs take their best guesses and then verifying the output on significant finds. The risk is the LLM understating something important: false negatives that put material at the bottom of the pile because it appears mundane but isn't.
That's why I suggest the output would be a prioritized list of documents for the researchers to review; the LLM doesn't get the final say, it just makes recommendations. Yes, things would be missed, but the researchers might in theory find much more value than with their current search method.
This is already the case with genealogical sites that have ML OCR creating searchable indices of handwritten documents.
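A minimal sketch of the prioritized-list idea discussed above, with one hedge against the false-negative worry: alongside the top-ranked documents, pull a random audit sample from the low-scoring pile so a human still looks at some of what the model considered mundane. Everything here is hypothetical (the `ScoredDoc` type, the `build_review_queue` function, the scores themselves); it assumes some model has already assigned each transcribed document an interest score between 0 and 1.

```python
import random
from dataclasses import dataclass


@dataclass
class ScoredDoc:
    doc_id: str
    score: float  # hypothetical model-assigned interest score in [0, 1]


def build_review_queue(docs, top_k, audit_k, seed=0):
    """Return (top, audit): the top_k highest-scoring documents, plus a
    random audit sample drawn from everything that did not make the cut."""
    ranked = sorted(docs, key=lambda d: d.score, reverse=True)
    top, rest = ranked[:top_k], ranked[top_k:]
    rng = random.Random(seed)  # seeded so the audit sample is reproducible
    audit = rng.sample(rest, min(audit_k, len(rest)))
    return top, audit


# Toy data: ten documents with scores 0.0, 0.1, ..., 0.9.
docs = [ScoredDoc(f"box-{i:03d}", score=i / 10) for i in range(10)]
top, audit = build_review_queue(docs, top_k=3, audit_k=2)
print([d.doc_id for d in top])    # the three highest-scoring documents
print([d.doc_id for d in audit])  # two random picks from the remainder
```

If researchers keep finding important material in the audit sample, that is a signal the scoring is burying things and should not be trusted for that collection.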
Assuming they have been transcribed, yes. The key idea that makes LLMs special is the attention mechanism. Maintaining attention over volumes of data is boring for most humans.
Also, to be pedantic, just talking about LLMs in this context is a tad reductive. There are many deep learning models involved in archival work that aren't language models.
I encourage you to read into this post for more context on what I mean: https://news.ycombinator.com/item?id=48675179
I had ChatGPT translate some old, handwritten French legal documents for family history purposes. It was far more accurate than I expected.
At scale, with better models, we might have a way to clear out the old archives. Not only could you translate, you could ask it to triage the discoveries. "Would the average person find this noteworthy?"
I have a ton of handwritten German stuff from the 19th century. My grandmother could make a fair stab at it, but nobody left can read it. I've shown modern Germans and they are at a loss. Thanks for your idea, I will give it a look. Any tips on model/method/training?
Try both Gemini Pro and ChatGPT. They are both outstanding at reading almost-unreadable documents. Use the highest thinking level your account supports.
(If you want to post a sample or two here, I'll try it. I like to collect difficult out-of-distribution test materials.)
Oh, wow, that is actually an interesting application of AI.
I found a copy of the oldest film ever shot in China in the School of Oriental and African Studies (SOAS) library in London. The camera had been personally loaned to the French administrator in question by the Lumiere Brothers. The film had been entered into the catalogue, but nobody had looked at it in decades and they didn't have the equipment to do so. The university wound up digitizing it with funds donated by the alumni, and I was invited on my return from the US to address the alumni association on my research.
> the Duke failed to pay Mozart for his work
You stiffed Mozart!? A curse on your ghost!
Worse could happen.
The Duke could have thrown him in the castle jail.
Apparently this was an exercise book he made for a Parisian tutee, who later fled the French Revolution, leading to the confiscation of the notebook by the revolutionaries.
That's exactly what the article says... so yes apparently that's what it is
I have it on good authority that it is a handwritten notebook.
Note-book, as in "book containing musical notes". I expected a regular notebook (for the other kind of notes, that people like you and me might write)...
> I have it on good authority that it is a handwritten notebook.
I'm suspicious. Didn't Mozart use a word processor?
I mean, not a PC program, that would be ridiculous, but one of those dedicated stand-alone word processor systems (like Smith-Corona made) that they used in ancient times.
One of my pet peeves is what seems to be an overwhelming desire in writers to always put an adjective in front of every noun. You can never just let it be a "notebook", it has to be some kind of notebook.
It's even worse in product naming and advertising. Nothing can be just "vanilla", you have to even put an adjective in front of your adjectives, like "Mexican vanilla".
EDIT: s/verb/noun/
But Mexican vanilla is different from Tahitian vanilla which is different from Madagascar. IYKYK and only ignorant people think all vanilla is the same. Saying "just vanilla" shows that ignorance just as someone that says "dirt cheap" has clearly never bought dirt.
Rich Corinthian leather! My dude!
The podcast is in French, but you can hear the first public performance of the works here:
> Discovery of an unpublished manuscript by Mozart at the BnF: behind the scenes of an extraordinary operation
https://www.radiofrance.fr/francemusique/podcasts/l-invite-e...
https://en.wikipedia.org/wiki/Leck_mir_den_Arsch_fein_recht_...
Hear perhaps here:
https://youtu.be/wk-sIeh7BcI?si=188fGFMD_f3DrkXP
Perhaps?
I hope we get to hear his new/old music. That would be amazing
French radio "France Musique" aired it the other day. I don't know if it's available outside of France, though.
It was also played live for Fête de la musique in Paris last Sunday.
Perhaps here:
https://youtu.be/wk-sIeh7BcI?si=188fGFMD_f3DrkXP
The library where the discovery was made:
https://www.bnf.fr/en/actualitesEN/discovery-unpublished-aut...
I’m hoping that a full scan appears in the archive linked at the bottom of the page. I’m a composer and still hand-notate in a notebook. It’s so cool to see the penmanship of someone writing in notebooks so quickly yet cleanly. In case you didn’t read, the contents are primarily exercises in composition where Mozart began a passage, the student continued, and Mozart corrected / guided the student’s work where needed. So there’s a higher percentage of Mozart in the pieces here than not. Like Brundlefly.
While interesting, is it a 'major discovery'?
Mozart is among the most famous Western composers, and, like others of his stature, all his extant manuscripts have been cataloged and studied extensively. To find a previously unknown manuscript is a major event in that scholarship.
They aren’t making more Mozart notebooks so probably.
Why wouldn’t it be? Heck, how could it not be?
I love his handwriting style. I wonder if it was the first draft or a copy [1]
[1] https://www.youtube.com/watch?v=gkqfpkTTy2w
Composers were also handwriting masters. Bach also had incredible handwriting; there's a YouTube channel about it.
Schools used to spend a lot of time on penmanship. I visited a high school where they had a wall of notes left by each senior class. In the notes from the 1950s the writing was quite refined and looked very practiced, and notes left by kids in the 2020s looked like 2nd grade printing by comparison. I don't think cursive handwriting is really even taught/required anymore.
I can imagine that in the time of Bach or Mozart, writing was a big point of emphasis in schools.
Cursive is only useful for fountain pens. It was a sign of its times and is totally pointless and even counterproductive today. I get really sick of people exclaiming how important cursive is today when everyone types and everything is printed.
"Back in my day we taught the kids cursive!!" How many of them used fountain pens? I'm guessing zero. You just wasted their time instead of teaching them something valuable.
> I'm guessing zero.
There’s your problem: guessing.
Knowing how to write cursive is a useful skill even if underutilised, heck, it’s a useful skill even if just for the artistry of it. It might be niche, it might be unnecessary, but saying it is a “waste of time” is just ridiculous.
> Knowing how to write cursive is a useful skill
No, it's not. At all
Spoken like a true child of the internet. Well done.
It actually is. Very much so.
Cursive was and still is a fast way to write by hand. If you want to be illiterate without a computer, then you don't want to learn handwriting. But if you don't want to be illiterate without a computer, learning cursive is a fast and efficient way to do it.
Literacy and cursive have nothing to do with each other.
It’s a bit disingenuous of you to try and claim this when you have basically no understanding of the use of cursive nowadays. Not sure if you’re just a kid out of high school proclaiming rebel knowledge of domains you know nothing about, or just a really disingenuous person with a terrible case of arrogance, either way, you really ought to stop and reevaluate some of your positions—they are depressingly wrong.
They spent more time in penmanship class than an individual grad student spent learning LaTeX in the pre-LLM time, for reference/scale.
In the town where I live there are buildings from a 19th century pipe organ factory. My wife used to work in one of the buildings. The employees had scribbled various names and dates and witticisms on the walls in pencil. Their handwriting was beautiful. I was gratified that no one had thought to beautify the walls after the factory closed. In the loft above there were ancient mechanical drawings of organ parts rolled up and stored on racks, and at the end of the loft was a designer's desk still waiting for him to come back and make more drawings.
Beethoven certainly wasn't.
You've named one composer who is. I don't see where the inductive step applies.
The composers who didn't have neat handwriting are forgotten today because nobody could read their (musical) notes...
This is simply not true. Look at Beethoven's manuscripts for instance.
https://guides.loc.gov/beethoven/manuscripts
That's one of the reasons why he spent several years writing a single symphony.
What? What does one thing have to do with the other? Heck, what is up with everyone here just making up bullshit arguments about classical composers? The heck is this nonsense!?
Wow. Can we even be sure we're listening to the right thing? Is it actually possible to read this unambiguously or is there an element of context when reading music, similar to how if you're reading prose the next word is probably grammatically correct and makes sense?
Exactly. The context makes it all pretty clear. Music has its own grammar, and particularly music of the common practice era from about 1650-1930.
The publisher was generally familiar with Beethoven’s writing and conventions. He’d prepare galleys that Beethoven would proof (and frequently edit). A substantial part of Beethoven’s known correspondence concerns corrections to galleys (and managing payments).
Such as, famously, the forgotten composer Ludwig van Beethoven.
Please, let’s leave the made up arguments to the LLMs.
You can check all this out for yourself at IMSLP. Tons of holograph copies there for lots of composers. https://imslp.org/wiki/Main_Page
I see you've never worked your way through a manuscript by Donizetti.
Let's hope it is more authentic than the Hitler Diaries[1]
[1] https://en.wikipedia.org/wiki/Hitler_Diaries
Any time something of popular historical interest like this pops up I think about that.
If you've not read it then Robert Harris's (factual) book about the affair is entertaining, not least because such a broad sweep of dislikeable characters were undone by greed and folly!
The whole affair was bizarre. At one point Kujau, the author of the fake diaries, ran out of ideas and let Hitler complain about his flatulence.
There is also a very funny German movie about it (https://en.wikipedia.org/wiki/Schtonk!). The director later said that he intentionally omitted some facts about the real scandal because the audience would find them too far-fetched.
I think my favourite aspect of the tale (at least as Harris tells it) is that Kujau was such a bad forger, and the recipients wanted it all to be true so badly that they skipped several opportunities to actually check!
I shall see if I can find Schtonk! with subtitles, sounds up my alley.
Yes! Like when Kujau couldn't get the letter A, so he went with "FH" instead of "AH" for the cover initials. Heidemann convinced the people at Stern that it surely stands for "Führer Hitler" :-D
Schtonk! does a really great job at satirizing the Führerkult that was still very much present in large parts of German society.
Confiscated during the revolution, kept by the national library. That's a bit different to "forged on schoolbooks with a Bic pen" provenance-wise.
Even inside the tiny niche of the classical music history world, a book of daily exercises - written for some now-obscure student, and owned by a national library - is actually a pretty minor thing.
Very few counterfeiters bother doing nickels and dimes.
BTW the metal in a nickel is worth about 7 cents.
> By coincidence, Goy had been looking at other documents Mozart had written for teaching just weeks earlier
Color me sceptical
He was a niche-specialty career archivist, sorting through his library's collection of stuff from the right era and area. That is the discovery story behind a rather large fraction of such documents.
So not much of a coincidence, I’d say. Very much by design.
parallel construction
Anyone remember the Hitler diaries?
seems like more of a minor discovery to me
Seven previously unknown compositions for flute and harp is not minor
don't fret over dark keys