A Person Trained an LLM on Texts from 19th-Century London

Someone trained an LLM exclusively on texts from 19th-century London. Because it saw only data up to 1875, the model does not know what a telephone is, but it is well-versed in the religious debates of the time. Granted, it is a base model in every sense: a very small dataset, no fine-tuning on top, and only a billion parameters. But with even a simple chat fine-tune on top, you would get a wonderful excursion into the mindset of people from that era. Regular large LLMs are a poor fit for this, because they contain so much data about the modern world that they constantly break character.

More broadly, I would like to see LLMs like this for various historical periods. It reminds me of the scrolls being deciphered as part of the Vesuvius Challenge: one could try to train an LLM on those ancient texts as well, although a dataset of a few hundred scrolls is tiny.