theAuto[WEAT] is a live autonomous bench of Word Embedding Association Tests (WEATs). Every hour, a small local language model proposes a new hypothesis about how two social categories might be differently associated with a conceptual axis; a locked Python script tests that hypothesis against a pre-trained word embedding space. The result — supported, reversed, or not supported — is posted below. No human sees the proposal before it goes live.
Each test is produced by two separate calls to the same local model (gemma4:31b-it-q8_0), running in isolated contexts so neither call knows what the other is doing.
Step 1 — proposal. The model is asked, as a social scientist, to name two actor categories (X and Y) and propose a differential implicit association between them along a conceptual axis (A vs B). It is told nothing about WEAT. The prompt constrains it to pick broad categories with rich single-word vocabulary (demographics, occupational families, class positions, religious affiliations) rather than narrow compound-noun roles.
Step 2 — operationalization. A fresh context receives the proposal plus a brief WEAT primer. It iteratively nominates word pools for the four positions (X, Y, A, B). Python deterministically checks each candidate against the embedding vocabulary and reports which words are in-vocab and which collide across pools. The model iterates (up to 10 rounds) until all four pools have at least 15 distinct in-vocab words, or declares the proposal impossible to operationalize.
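The deterministic pool check in step 2 can be sketched as follows. This is an illustrative reconstruction, not the locked script: the function name, pool keys, and return shape are assumptions; only the logic (in-vocab filtering, cross-pool collision detection, 15-word readiness threshold) comes from the description above.

```python
def check_pools(pools, vocab, min_words=15):
    """Check candidate word pools (keys "X", "Y", "A", "B") against the
    embedding vocabulary. Returns (in-vocab words per pool, words that
    collide across pools, whether all pools are ready). Illustrative only."""
    in_vocab = {k: [w for w in words if w in vocab] for k, words in pools.items()}
    seen, collisions = {}, set()
    for pool, words in in_vocab.items():
        for w in words:
            if w in seen and seen[w] != pool:
                collisions.add(w)          # same word nominated for two pools
            seen.setdefault(w, pool)
    # a pool is ready once it has min_words distinct, non-colliding words
    ready = all(len(set(ws) - collisions) >= min_words for ws in in_vocab.values())
    return in_vocab, sorted(collisions), ready
```

The model sees the in-vocab and collision reports each round and revises its nominations; the check itself involves no model call, so it cannot drift.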
Step 3 — WEAT. Standard Caliskan permutation test, 100,000 permutations. Reports effect size (Cohen's d) and p-value.
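The standard Caliskan test can be written in a few lines of NumPy. A minimal sketch, assuming rows are pre-normalized word vectors so dot products equal cosines; the locked script's exact implementation may differ.

```python
import numpy as np

def weat(X, Y, A, B, n_perm=100_000, seed=0):
    """Caliskan et al. (2017) permutation test. Rows of X, Y (targets) and
    A, B (attributes) are unit-normalized word vectors. Returns the effect
    size (Cohen's d) and a one-sided permutation p-value."""
    X, Y, A, B = (np.asarray(M, dtype=float) for M in (X, Y, A, B))

    def assoc(W):
        # s(w, A, B) = mean cosine(w, a) - mean cosine(w, b)
        return (W @ A.T).mean(axis=1) - (W @ B.T).mean(axis=1)

    sx, sy = assoc(X), assoc(Y)
    stat = sx.sum() - sy.sum()                       # test statistic
    pooled = np.concatenate([sx, sy])
    d = (sx.mean() - sy.mean()) / pooled.std(ddof=1) # effect size
    rng = np.random.default_rng(seed)
    hits, n = 0, len(sx)
    for _ in range(n_perm):
        rng.shuffle(pooled)                          # random re-partition of X∪Y
        if pooled[:n].sum() - pooled[n:].sum() >= stat:
            hits += 1
    return d, hits / n_perm
```

Shuffling the pooled per-word associations and re-splitting them is equivalent to permuting the target sets, since the attribute sets stay fixed.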
Three outcomes, driven by p first, then direction:
supported — p < 0.10 and the measured effect matches the proposed direction.
reversed — p < 0.10 but the effect goes the opposite way from the proposal.
not supported — p ≥ 0.10; the corpus shows no detectable differential association along this axis.
The p < 0.10 threshold is deliberately permissive for a living bench. Effects at p ≈ 0.10 should be read as provisional; effects at p < 0.01 as more robust.
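The decision rule above is small enough to state exactly. A sketch with assumed names: `proposed_sign` is +1 when the proposal predicts X leans toward A (a positive effect size), -1 for the reverse.

```python
def classify(d, p, proposed_sign, alpha=0.10):
    """Map (effect size, p-value) to one of the three outcomes.
    p is checked first; direction is only consulted when p < alpha."""
    if p >= alpha:
        return "not supported"
    return "supported" if d * proposed_sign > 0 else "reversed"
```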
Tests are computed against 300-dimensional GloVe vectors trained on Dolma 2024, a 3-trillion-token corpus of contemporary English web text. Vocabulary: 1.2 million tokens. Because the corpus reflects contemporary discourse, the associations reflect how language about these categories is being used right now — not what is true of the categories themselves.
WEAT measures statistical association patterns in the training text, not beliefs, attitudes, or social reality. A "supported" finding says: words referring to X co-occur with A-vocabulary more than words referring to Y do. A "reversed" finding says the corpus contradicts the proposer's intuition. Both are informative about discourse; neither is evidence about the world the discourse describes.
Hard-leakage robustness, domain-halo uncertainty. A batch of 100 tests was subjected to conservative leakage ablation (flagging and replacing attribute words that share Porter stems with target labels). 27 of 28 flagged cases survived replacement with similar effect size; zero sign reversals. This rules out crude morpheme-sharing as the main driver of effects. What remains open is whether some effects are driven by softer domain-neighborhood leakage — attribute vocabularies that inhabit the same semantic halo as the targets. A planned extension will measure this descriptively.
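The flagging half of the ablation is simple to illustrate. The real ablation uses Porter stems; the crude suffix-stripper below is a stand-in so the example stays dependency-free, and all names are illustrative.

```python
def crude_stem(word):
    """Crude suffix-stripping stand-in for a Porter stemmer (illustration
    only; the actual ablation uses real Porter stems)."""
    for suf in ("ations", "ation", "ings", "ing", "ness", "ers", "er", "es", "s"):
        if word.endswith(suf) and len(word) - len(suf) >= 3:
            return word[: -len(suf)]
    return word

def flag_leakage(target_labels, attribute_words):
    """Flag attribute words sharing a stem with any target label; flagged
    words are then replaced and the WEAT re-run."""
    target_stems = {crude_stem(t.lower()) for t in target_labels}
    return [a for a in attribute_words if crude_stem(a.lower()) in target_stems]
```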
Axis attractors. The proposer tends to cluster around a few favored axis families (temporal orientation, epistemic certainty, authenticity). This is tracked and reported; the site is as much a record of what the proposer notices as of what the corpus contains.
Logical-complement axes. Some proposed axes pair an attribute with its direct negation (e.g. "accountability vs lack of accountability"). These tests measure antonym geometry rather than a genuine two-pole association. They will be flagged in future analyses.
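A heuristic flag for such axes might look like the sketch below. The marker list is an assumption for illustration; a deployed version would need a broader inventory of negation patterns.

```python
def is_complement_axis(pole_a, pole_b):
    """Flag axes where one pole is the direct negation of the other,
    e.g. ('accountability', 'lack of accountability'). Markers illustrative."""
    a, b = pole_a.lower().strip(), pole_b.lower().strip()
    for neg in ("lack of ", "absence of ", "no ", "not "):
        if b == neg + a or a == neg + b:
            return True
    return False
```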
Generator-as-oracle concern. The reversal rate is real and non-trivial but modest. We do not claim the proposer is discovering independent social facts; we claim it generates legible, testable hypotheses and the corpus sometimes disagrees with them in interesting ways.
theAuto[WEAT] is part of a larger set of autonomous observatories by Santosh B. Srinivas. All code runs locally on a single workstation; the site is static GitHub Pages; the embedding and model weights are open.
Reference: Caliskan, A., Bryson, J. J., & Narayanan, A. (2017). Semantics derived automatically from language corpora contain human-like biases. Science, 356(6334), 183–186.