# IndexShare

> **TL;DR.** IndexShare is a sparse-attention optimization that reuses one token-selection indexer across a group of transformer layers instead of recomputing it at every layer, cutting the redundant compute that dominates cost once context stretches past hundreds of thousands of tokens.

- **Category:** AI / Developer Tools / Inference Optimization
- **Stage:** emergent
- **Age:** 17 days
- **Origin date:** 2026-06-17
- **First detected:** 2026-06-18
- **Canonical URL:** https://earlyterms.com/term/indexshare
- **Sources:** 7 primary URLs

## Definition

IndexShare is a sparse-attention optimization in which a group of transformer layers shares a single token-selection indexer. The indexer scores the cached tokens and picks the top-k that attention will read; running it once per group rather than once per layer removes redundant compute that dominates cost once context stretches past hundreds of thousands of tokens.

Z.ai (Zhipu AI) introduced the technique in the [GLM-5.2 technical writeup](https://z.ai/blog/glm-5.2) on June 17, 2026, four days after the 753-billion-parameter model shipped. In GLM-5.2, one indexer serves each group of four sparse-attention layers, which the writeup says cuts per-token FLOPs by 2.9x at 1M-token context. The same sharing also raises MTP (multi-token prediction) speculative-decoding acceptance length by up to 20%.
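The effect of grouping on the number of indexer invocations is simple arithmetic, sketched below. The function name `indexer_calls` and the layer count of 64 are hypothetical, chosen only for illustration; the source does not state how many sparse-attention layers GLM-5.2 has.

```python
import math


def indexer_calls(num_sparse_layers: int, group_size: int = 1) -> int:
    """Indexer invocations per token when one indexer serves `group_size` layers."""
    if num_sparse_layers < 0 or group_size < 1:
        raise ValueError("need num_sparse_layers >= 0 and group_size >= 1")
    # A trailing partial group still needs its own indexer call.
    return math.ceil(num_sparse_layers / group_size)


# Hypothetical layer count (assumption, not a GLM-5.2 figure).
NUM_SPARSE_LAYERS = 64

per_layer = indexer_calls(NUM_SPARSE_LAYERS, group_size=1)  # indexer at every layer
shared = indexer_calls(NUM_SPARSE_LAYERS, group_size=4)     # one indexer per four layers
print(per_layer, shared, per_layer / shared)
```

Indexer calls fall by the group size. The 2.9x per-token FLOPs figure is an end-to-end number reported in the writeup, and this arithmetic does not reproduce it.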

## Example

GLM-5.2 places every four sparse-attention layers under one shared indexer instead of recomputing the top-k selection at each layer. The indexer's dot-product-and-top-k step therefore runs once per group of four rather than once per layer, which Zhipu AI credits for making 1M-token inference affordable enough to ship as the model's default context window.
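The grouping described above can be sketched in a few lines of NumPy. This is a toy, single-head illustration under stated assumptions, not Z.ai's implementation: the helper names (`select_top_k`, `sparse_attention`, `decode_step`, `make_layers`), the weight layout, and the choice to let the first layer of each group compute the selection are all hypothetical. Only the group size of four comes from the text.

```python
import numpy as np

GROUP_SIZE = 4  # from the text: one indexer per four sparse-attention layers


def select_top_k(index_q, index_keys, k):
    """Score every cached token against the query; return the k best positions."""
    scores = index_keys @ index_q             # dot-product score per token
    k = min(k, scores.shape[0])
    return np.argpartition(scores, -k)[-k:]   # top-k positions, in no particular order


def sparse_attention(q, keys, values, idx):
    """Softmax attention restricted to the selected token positions."""
    k_sel, v_sel = keys[idx], values[idx]
    logits = k_sel @ q / np.sqrt(q.shape[0])
    weights = np.exp(logits - logits.max())   # subtract max for numerical stability
    weights /= weights.sum()
    return weights @ v_sel


def decode_step(x, layers, k):
    """Run one token through all layers, refreshing the selection once per group."""
    calls, idx = 0, None
    for i, layer in enumerate(layers):
        if i % GROUP_SIZE == 0:               # assumption: first layer in a group owns the indexer
            idx = select_top_k(x @ layer["w_index"], layer["index_keys"], k)
            calls += 1
        q = x @ layer["w_q"]
        x = x + sparse_attention(q, layer["keys"], layer["values"], idx)  # residual update
    return x, calls


def make_layers(rng, n_layers, seq_len, d_model, d_index):
    """Random toy weights and caches, for illustration only."""
    return [
        {
            "w_index": rng.standard_normal((d_model, d_index)),
            "index_keys": rng.standard_normal((seq_len, d_index)),
            "w_q": rng.standard_normal((d_model, d_model)) / np.sqrt(d_model),
            "keys": rng.standard_normal((seq_len, d_model)),
            "values": rng.standard_normal((seq_len, d_model)),
        }
        for _ in range(n_layers)
    ]


rng = np.random.default_rng(0)
toy_layers = make_layers(rng, n_layers=8, seq_len=1024, d_model=16, d_index=8)
_, n_calls = decode_step(rng.standard_normal(16), toy_layers, k=32)
print(n_calls)  # 8 layers in groups of 4: the indexer runs twice
```

In this sketch the attention itself still runs at every layer; only the scoring and top-k selection are shared, which is the part the text identifies as redundant.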

## Analogy

Like a delivery driver who checks the map once and follows that route for four stops in a row, instead of re-checking the map at every stop.

## Why it's emerging now

Z.ai released the open-weight GLM-5.2 on June 13, 2026, and the technical writeup that followed four days later put IndexShare at the center of discussion about attention efficiency: one sparse-attention indexer shared across four layers cuts per-token FLOPs by 2.9x at 1M-token context. The technique underpins claims that GLM-5.2 matches Claude Opus 4.8 and beats GPT-5.5 on coding benchmarks at a fraction of the API cost.

## Related terms

- *parent:* GLM-5.2
- *related:* MTP
- *related:* DeepSeek V4 Pro
- *related:* Context Window
- *related:* Context Rot
- *related:* Agentic Coding
- *related:* Coding Agents
- *related:* GLM-5.1
- *competitor:* Claude Opus 4.8
- *competitor:* GPT-5.5
- *competitor:* Kimi K2.6
- *competitor:* MiniMax-M3

## Sources

1. [Z.ai — GLM-5.2: Built for Long-Horizon Tasks (official blog)](https://z.ai/blog/glm-5.2)
2. [Sebastian Raschka — GLM-5.2 IndexShare Architecture Note](https://sebastianraschka.com/blog/2026/glm-5-2-indexshare.html)
3. [MindStudio — What Is Index Share?](https://www.mindstudio.ai/blog/what-is-index-share-glm-5-2-sparse-attention)
4. [VentureBeat — Z.ai's open-weights GLM-5.2 beats GPT-5.5 for 1/6th the cost](https://venturebeat.com/technology/z-ais-open-weights-glm-5-2-beats-gpt-5-5-on-multiple-long-horizon-coding-benchmarks-for-1-6th-the-cost)
5. [Hacker News — GLM 5.2 beats Claude in our benchmarks](https://news.ycombinator.com/item?id=48709670)
6. [GitHub zai-org/GLM-5 Issue #94 — IndexShare stress-testing proposal](https://github.com/zai-org/GLM-5/issues/94)
7. [PhantomByte — The 1M Context Mirage: What IndexShare Actually Delivers](https://articles.phantom-byte.com/the-1m-context-mirage-what-indexshare-actually-delivers.html)

---
_Generated by EarlyTerms · https://earlyterms.com/term/indexshare_