Franz Laage

2023 / Feb 17 / 10:08 CET

I’ve been looking into extracting structured data from recipes recently, especially parsing ingredient phrases into something useful like ingredient name, quantity and unit. It’s a rabbit hole of NLP and dubious SaaS offerings. The NYT posted about how they used a linear-chain conditional random field model to extract data from their recipe archive. There’s even some code on GitHub. Sadly, but understandably, they don’t provide training data or a pretrained model. Time to come up with my own!
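
To make the goal concrete, here is a rough sketch of the kind of output I’m after. It is not the CRF approach from the NYT post, just a naive baseline that peels a leading number and a known unit off the front of the phrase. The `Ingredient` class, the `UNITS` set and `parse_ingredient` are names made up for illustration, and the unit list is deliberately tiny.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical, deliberately tiny unit list; a real parser needs far more.
UNITS = {"cup", "cups", "tbsp", "tsp", "g", "kg", "ml", "l", "pinch"}


@dataclass
class Ingredient:
    quantity: Optional[str]  # kept as text, e.g. "2", "1/2", "1.5"
    unit: Optional[str]
    name: str


def parse_ingredient(phrase: str) -> Ingredient:
    """Naive baseline: optional leading number, optional unit, rest is the name."""
    tokens = phrase.strip().split()
    quantity = None
    unit = None

    # A leading token such as "2", "1/2" or "1.5" is treated as the quantity.
    if tokens and re.fullmatch(r"\d+(?:[./]\d+)?", tokens[0]):
        quantity = tokens.pop(0)

    # The next token is treated as the unit only if it is in the known list.
    if tokens and tokens[0].lower() in UNITS:
        unit = tokens.pop(0).lower()

    return Ingredient(quantity, unit, " ".join(tokens))


if __name__ == "__main__":
    for phrase in ["2 cups flour", "1/2 tsp salt", "3 eggs", "salt"]:
        print(parse_ingredient(phrase))
```

Something like "2 large eggs, beaten" already trips this up, since "large" and "beaten" end up in the name. That is the kind of case where a sequence model that labels each token should do better than hand-written rules.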
