I wish I’d gotten to work more with Steve Seitz at Google, as I’ve long admired his wide-ranging work (from Photosynth to Face Movies to the company’s new 3D video collaboration tech). Here he provides a pretty accessible overview of how large language models (e.g. those behind DALL·E & similar systems) actually work: