Experimental Methods | Patrick Lewis

Setting the TriviaQA SoTA with Contextualized Word Embeddings and Horovod

Earlier this year I led a collaboration between Cray Supercomputers, Digital Catapult and Bloomsbury AI (my previous employer). This post is an informal report of how we used Cray’s compute resources to both boost the accuracy and accelerate the speed of training machine reading models. With parallel training, we were able to break accuracy records on the TriviaQA Wiki task, without any change in model architecture. If you’re wondering how to scale up and parallelize your network training, there are excellent tools like Horovod that make it easy with almost no code changes required.