Shared functional specialization in transformer-based language models and the human brain

Sreejan Kumar, Theodore R. Sumers, Samuel A. Nastase · 2024

Evidence (4)
Representational Structure
Embeddings act as a residual stream accumulating context; headwise self-attention transformations inject contextual information that is nonlinearly fused into embeddings.
"Embeddings represent the contextualized semantic content, with information accumulating across successive layers as the Transformer blocks extract increasingly nuanced relationships between tokens... The second set of features we extract are the headwise “transformations” (Eq. 1), which capture the contextual information introduced by a particular head into the residual stream prior to the feedforward layer (MLP)."
Transformer-based features, p. 12
This clarifies the model's internal representational structure: a residual embedding stream that accumulates context, and per-head transformations that inject context-specific updates at each layer. This division is an architecture-level analogue of the representational subspaces and feature geometry examined in consciousness-inspired analyses of AI and the brain.
"Although the transformations at a given layer are “cued” by the embedding arriving from the previous layer, they are not derived from this embedding; similarly, the transformations are nonlinearly fused with the content of the output embedding... Embeddings are sometimes referred to as the “residual stream”: the transformations at one layer are added to the embedding from the previous layer, so the embeddings accumulate previous computations that subsequent computations may access."
Transformer-based features, p. 12
The paper distinguishes two representational objects, embeddings and headwise transformations, and describes how they are integrated; this maps cleanly onto analyses of representational subspaces, superposition, and low-rank integration in current AI-meets-neuroscience frameworks.
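A minimal numpy sketch of this decomposition (an illustration under simplifying assumptions, not the authors' code: layer normalization and the MLP are omitted, and weight shapes are schematic) shows how each head's transformation is computed from the incoming embedding and then added back into the residual stream:

```python
import numpy as np

def softmax(a, axis=-1):
    e = np.exp(a - a.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_sublayer(x, Wq, Wk, Wv, Wo, n_heads):
    """One attention sub-layer, decomposed into headwise transformations.

    x: (seq_len, d_model) embeddings arriving from the previous layer
       (the "residual stream").
    Wq, Wk, Wv, Wo: (d_model, d_model) projection weights, sliced per head.
    """
    seq_len, d_model = x.shape
    d_head = d_model // n_heads
    transformations = []
    for h in range(n_heads):
        cols = slice(h * d_head, (h + 1) * d_head)
        q, k, v = x @ Wq[:, cols], x @ Wk[:, cols], x @ Wv[:, cols]
        weights = softmax(q @ k.T / np.sqrt(d_head))   # token-to-token attention
        z_h = (weights @ v) @ Wo[cols, :]              # head h's transformation
        transformations.append(z_h)
    # Residual stream: the headwise transformations are summed and added to the
    # incoming embedding; the MLP (omitted) then nonlinearly fuses this content.
    x_out = x + sum(transformations)
    return transformations, x_out
```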
Limitations: This evidence is model-internal and descriptive; it does not by itself establish which of these representational components are necessary for conscious access or report in humans.
Information Integration
Contextual Transformer features (embeddings and headwise transformations) predict fMRI responses across cortical language areas during naturalistic story listening, outperforming classical linguistic features.
"First, we confirmed that Transformer embeddings and transformations outperform classical linguistic features in most language ROIs (p < 0.005 in HG, PostTemp, AntTemp, AngG, IFG, IFGorb, vmPFC, dmPFC, and PMC for both embeddings and transformations; permutation test; FDR corrected; Table S1)."
Comparing three classes of language models across cortical language areas, p. 4
This shows broad, system-wide integration: context-enriched model features explain variance across a distributed fronto-temporal language network, consistent with integrated representations spanning multiple regions in the human brain during language comprehension.
"We adopted a model-based encoding framework... to map Transformer features onto brain activity measured using fMRI while subjects listened to naturalistic spoken stories."
Functional anatomy of a Transformer model, p. 2
Naturalistic, continuous narratives induce distributed brain activity that can be predicted by integrated, context-sensitive model features, linking distributed elements to unified cortical responses during ongoing cognition.
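As a rough illustration of the encoding framework, the sketch below regresses Transformer features onto voxelwise BOLD with cross-validated ridge regression and scores held-out predictions by correlation. It assumes the features have already been aligned to fMRI TRs and convolved with a hemodynamic response function; the specific regression and cross-validation choices are illustrative, not the authors' exact pipeline.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import KFold

def encoding_performance(features, bold, alphas=np.logspace(0, 4, 9), n_splits=5):
    """Predict each voxel's time series from model features, then score
    held-out prediction accuracy with Pearson correlation.

    features: (n_TRs, n_features) TR-aligned, HRF-convolved model features.
    bold:     (n_TRs, n_voxels) measured fMRI responses.
    """
    scores = np.zeros(bold.shape[1])
    for train, test in KFold(n_splits=n_splits).split(features):
        model = RidgeCV(alphas=alphas).fit(features[train], bold[train])
        pred = model.predict(features[test])
        for v in range(bold.shape[1]):
            scores[v] += np.corrcoef(pred[:, v], bold[test, v])[0, 1] / n_splits
    return scores  # mean cross-validated encoding performance per voxel
```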
Figures
Fig. 2 (p. 4): Contextual Transformer features robustly predict responses across language ROIs, supporting system-wide integration across the cortical language network via context-sensitive representations.
Limitations: Encoding-model predictivity under naturalistic listening is correlational; it demonstrates shared information but not causal necessity for conscious access.
Selective Routing
Headwise (per-attention-head) transformations show structured correspondences with cortical parcels and syntactic dependencies; shuffling features across heads abolishes this structure.
"Headwise correspondence between dependencies and ROIs indicates that attention heads containing information about a given dependency also tend to contain information about brain activity for a given ROI—thus linking that ROI to the computation of that dependency."
Interpreting transformations via headwise analysis, p. 8
Different attention heads (selective routes) preferentially map to different cortical computations, suggesting a gating-like correspondence between head-specific routing and regional processing in the human language network.
"After this perturbation, the first two PCs accounted for only 17% of variance... and look-back distance... reduced... This control analysis indicates that the structure observed in Fig. 4 does not arise trivially, and results from the grouping of transformation features into functionally specialized heads; transformation features map onto brain activity in a way that systematically varies head by head, and shuffling features across heads (even within layers) disrupts this structure."
Interpreting transformations via headwise analysis, p. 6
Ablating head identity by shuffling disrupts the mapping, implying that selective routing by heads is functionally meaningful for the brain–model correspondence rather than an artifact of pooling.
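A minimal sketch of this kind of control (the paper's exact permutation scheme is paraphrased here as an assumption): feature dimensions are permuted across heads so that each pseudo-head mixes dimensions from many true heads, preserving the full feature set while destroying the headwise grouping.

```python
import numpy as np

def shuffle_features_across_heads(transformations, n_heads, d_head, seed=0):
    """Permute transformation dimensions across heads (within a layer), so each
    pseudo-head becomes a mixture of dimensions drawn from all heads.

    transformations: (n_TRs, n_heads * d_head) stacked headwise features
                     for one layer.
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_heads * d_head)   # reassign feature columns at random
    shuffled = transformations[:, perm]
    # Regroup into pseudo-heads of the original size; headwise analyses run on
    # these should lose the structure observed with the true head grouping.
    return shuffled.reshape(-1, n_heads, d_head)
```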
Figures
Fig. 4 (p. 7): Organized headwise structure (layer and look-back gradients) disappears when head assignments are shuffled, consistent with selective routing by attention heads mapping onto cortical organization.
Fig. 5 (p. 8): Specific syntactic dependencies are linked to particular ROIs via headwise correspondences, supporting targeted, selective information flow between model heads and cortical regions.
Limitations: Head–ROI correspondences are correlative and depend on model training; causal routing in the brain is not manipulated.
Temporal Coordination
Headwise ‘backward attention distance’ quantifies how far back in the token stream heads integrate information; cortical mappings show gradients tied to look-back distance.
"To generate the “backward attention” metric (Fig. 4)... for each TR selected the 128 tokens preceding the end of the TR... We extracted each head’s matrix of token-to-token attention weights... multiplied each token-to-token attention weight by the distance between the two tokens... averaged this metric over all TRs... to obtain the headwise attention distances."
Transformer-based features, p. 11
The paper defines a timing-sensitive measure of how far back attention heads integrate information across tokens, enabling temporal coordination analyses of routing scales and their cortical correlates.
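A sketch of the metric, assuming pre-extracted token-to-token attention matrices for the tokens preceding each TR (normalization details and the treatment of forward-looking attention in the bidirectional model are assumptions):

```python
import numpy as np

def headwise_attention_distance(attention_per_tr):
    """Backward-attention sketch: weight each token-to-token attention weight
    by how far back the attended token lies, then average over tokens and TRs.

    attention_per_tr: iterable of (n_heads, n_tokens, n_tokens) attention
    matrices, one per TR, for the tokens preceding the end of that TR.
    Returns (n_heads,) mean look-back distances in tokens.
    """
    per_tr = []
    for attn in attention_per_tr:
        n_heads, n_tokens, _ = attn.shape
        idx = np.arange(n_tokens)
        # distance[i, j]: how many tokens back position j lies from query i
        distance = np.maximum(idx[:, None] - idx[None, :], 0)
        per_tr.append((attn * distance).sum(axis=-1).mean(axis=-1))
    return np.mean(per_tr, axis=0)
```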
"We observed a strong gradient of look-back distance increasing along PC2 (Fig. 4E)... the upper quartile of headwise attention distances exceeds 30 tokens, corresponding to look-back distances on the scale of multiple sentences."
Interpreting transformations via headwise analysis, p. 6
Temporal coordination emerges as graded headwise look-back ranges that map systematically across cortical parcels, suggesting organized timing windows linking model attention dynamics to brain organization.
Figures
Fig. 4 (p. 7): Panel E visualizes headwise timing preferences (look-back distances), tying temporal coordination in model attention to spatial organization across language parcels.
Limitations: Look-back distance is defined in token space rather than in neural time (milliseconds); linking these scales requires data with finer temporal resolution (e.g., MEG/iEEG).