
What functions do self-attention blocks prefer to represent?

Date: December 13, 2021 | 4:00 pm – 6:00 pm
Speaker: Surbhi Goel, Microsoft
Location: Zoom Link: https://istaustria.zoom.us/j/94397239114?pwd=Q0JDSTg1bkpVUDc5TXlZWG1paWpUdz09 Meeting ID: 943 9723 9114 Passcode: 621023
Language: English

Self-attention, an architectural motif designed to model long-range interactions in sequential data, has driven numerous recent breakthroughs in natural language processing and beyond. In this talk, we will focus on studying the inductive bias of self-attention blocks by rigorously establishing which functions and long-range dependencies they statistically represent. Our main result shows that bounded-norm Transformer layers can represent sparse functions of the input sequence, with sample complexity scaling only logarithmically with the context length. Furthermore, we propose new experimental protocols to support this analysis, built around the large body of work on provably learning sparse Boolean functions.
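As a concrete illustration of the kind of sparse target referred to above, the sketch below generates a k-sparse parity over a long Boolean sequence: the label depends on only k of the T input positions. The context length T, the sparsity k, and the choice of parity are assumptions made here for illustration, not necessarily the exact experimental protocol of the talk.

```python
import random

# Illustrative sketch (assumed parameters, not the talk's exact setup):
# a k-sparse Boolean function over a length-T binary sequence, here the
# parity (XOR) of k fixed hidden positions. The label depends on only
# k << T coordinates of the input.

T = 128                                # context length (illustrative)
k = 3                                  # sparsity of the target function
support = random.sample(range(T), k)   # hidden relevant positions

def sparse_parity(x):
    """Return the parity of the k support bits of the binary sequence x."""
    return sum(x[i] for i in support) % 2

# Sample a small synthetic dataset of (sequence, label) pairs.
dataset = []
for _ in range(10):
    x = [random.randint(0, 1) for _ in range(T)]
    dataset.append((x, sparse_parity(x)))

print("support positions:", sorted(support))
print("first example label:", dataset[0][1])
```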

Based on joint work with Benjamin L. Edelman, Sham Kakade and Cyril Zhang.

More Information:

Contact:
Ksenja Harpprecht
