2023 / Mar 19 / 15:35 CET
My NLP experiment for parsing recipe ingredients is going so-so. I’ve used both TensorFlow and scikit to implement a CRF model for labelling. TensorFlow has been a mess; it’s complicated to approach if you don’t have ML expertise. I still haven’t figured out how to pass multiple features for a single token (“word”) into the model. The crfsuite wrapper of scikit makes that really easy: throw in a Python dict and off you go. It really shows how a specialised API can help beginners get started. The Keras/TF API basically has to work for all things ML, which makes it extremely generic (and thus flexible). On the upside, both experiments had me look much more closely at the data itself. I now have a good grasp of what needs to be cleaned up, normalised or removed. That paved the way for a hand-rolled approach: regex all the way down. A big thanks goes out to Tom Strange and his model guide.
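To make the "throw in a Python dict" part concrete, here is a minimal sketch of what per-token feature dicts could look like. It is not the code from my experiment: the feature names, the unit list and the sample line are made up for illustration, and I'm assuming the wrapper in question is sklearn-crfsuite, whose `CRF.fit(X, y)` takes one list of dicts per sequence.

```python
import re

# Hypothetical unit list, just for illustration.
UNITS = {"g", "kg", "ml", "l", "tbsp", "tsp", "cup", "cups"}


def token_features(tokens, i):
    """Build a dict of features for the token at position i."""
    token = tokens[i]
    features = {
        "lower": token.lower(),
        # Matches "200", "1.5", "1,5" or "1/2".
        "is_number": bool(re.fullmatch(r"\d+(?:[.,/]\d+)?", token)),
        "is_unit": token.lower() in UNITS,
        "is_first": i == 0,
    }
    if i > 0:
        # Context feature: the previous token, lowercased.
        features["prev_lower"] = tokens[i - 1].lower()
    return features


def line_features(line):
    """Turn one ingredient line into a list of feature dicts, one per token."""
    tokens = line.split()
    return [token_features(tokens, i) for i in range(len(tokens))]


# One sequence (ingredient line) with three tokens.
X = [line_features("200 g flour")]

# Assumption: with sklearn-crfsuite installed and a matching label list y,
# something like sklearn_crfsuite.CRF().fit(X, y) would train on this shape.
```

Every token gets as many features as you like, simply by adding keys to its dict. That is the part I couldn't work out how to express on the TensorFlow side.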