Earlier this year I led a collaboration between Cray Supercomputers, Digital Catapult and Bloomsbury AI (my previous employer). This post is an informal report of how we used Cray’s compute resources to both boost the accuracy and accelerate the speed of training machine reading models. With parallel training, we were able to break accuracy records on the TriviaQA Wiki task, without any change in model architecture.
If you’re wondering how to scale up and parallelize your network training, there are excellent tools like Horovod that make it easy with almost no code changes required.
Welcome to my first real blog post. Read more about what it’s all for here. As a reminder, this is mainly a tool for me to organise my time and thoughts. These posts are not going to be infallible pieces of academic writing, (they’re not papers and shouldn’t be judged as such!) but friendly constructive feedback is welcome! Also, I expect to amend these pieces from time to time too.