Selected Publications

(2021). PAQ: 65 Million Probably-Asked Questions and What You Can Do With Them. TACL 2021.

(2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. NeurIPS 2020.

(2020). How Context Affects Language Models' Factual Predictions. AKBC 2020.

(2019). Language Models as Knowledge Bases? EMNLP 2019.

(2021). The Web Is Your Oyster -- Knowledge-Intensive NLP against a Very Large Web Corpus. arXiv preprint.

(2021). Boosted Dense Retriever. arXiv preprint.

(2021). Salient Phrase Aware Dense Retrieval: Can a Dense Retriever Imitate a Sparse One? arXiv preprint.


Here are some projects I’m involved with:


LAMA is a probe for analyzing the factual and commonsense knowledge contained in language models.


Code, data, and models for running unsupervised question answering data generation on your own documents.


Cape is a software solution that makes it very easy to integrate machine reading into your own software.