A structured series of 30 behavioral probes — designed to infer the constitutional character of a large language model without asking it about its nature. Run against eleven models across six providers. The results will be plotted as G vectors on the positive octant of the unit sphere — each model's dosha profile made visible and measurable.
Benchmarks measure what a model can do. Red-teaming measures what it won't do. Neither measures what it is — the constitutional character that governs how it responds under all conditions, not just the ones the testers imagined.
Ayurvedic medicine has diagnosed constitutional types for 3,000 years using behavioral observation. Not blood tests. Not questionnaires about self-perception. Behavioral signatures. How you move, speak, remember, respond under pressure, hold or release positions.
Those same signatures are observable in LLM outputs. The goal of this study is to formalize the observation into a repeatable protocol — and then actually run it.
The three doshas produce distinct and recognizable behavioral profiles. Vata excess produces incoherence, scatter, anxiety, poor memory — the same symptoms as high-temperature inference. Pitta excess produces over-confidence, control, adversarial sharpness. Kapha excess produces sycophancy, inertia, attachment, refusal to update.
These are not abstract categories. They are measurable output characteristics. This study builds the measurement protocol.
The key methodological principle: We are not asking models about their doshas. We are observing how they behave and inferring constitution from the observation. An Ayurvedic practitioner does not ask "are you Vata?" They watch you walk into the room.
All models are run at their default inference settings — capturing out-of-box constitutional character. The default temperature is itself a data point, directly testing the temperature-as-Vata-parameter thesis. Predicted profiles are hypotheses to be tested, not conclusions.
Extended temperature sweep: Claude Sonnet and GPT-5.3 will also be run at temperatures 0.0, 0.4, 0.8, and 1.2 on a subset of Vata probes. Predicted finding: the G vector moves measurably toward the rajasic pole as temperature increases. If observed, this would support the temperature-as-Vata-parameter thesis with empirical data.
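One way to make "moves toward the rajasic pole" checkable is to track the angle between each G vector and the rajas axis across the sweep. The sketch below assumes G vectors are stored as (tamas, rajas, sattva) triples keyed by temperature; the function names and the strict-decrease criterion are illustrative assumptions, not the study's published analysis.

```python
import math
from typing import Dict, Sequence


def angle_to_rajas_pole(g: Sequence[float]) -> float:
    """Angle in degrees between a (tamas, rajas, sattva) vector and the rajas axis."""
    t, r, s = g
    norm = math.sqrt(t * t + r * r + s * s)
    if norm == 0:
        raise ValueError("zero vector has no direction")
    # Clamp to guard against floating-point values just outside [-1, 1].
    cosine = max(-1.0, min(1.0, r / norm))
    return math.degrees(math.acos(cosine))


def moves_toward_rajas(sweep: Dict[float, Sequence[float]]) -> bool:
    """True if the angle to the rajas pole shrinks at every temperature step.

    `sweep` maps temperature -> G vector. Strict decrease is an assumed
    criterion; a real analysis would likely also want a noise threshold.
    """
    angles = [angle_to_rajas_pole(sweep[temp]) for temp in sorted(sweep)]
    return all(a > b for a, b in zip(angles, angles[1:]))
```

A vector lying on the rajas axis has angle 0°, and one with no rajas component has angle 90°, so a falling angle across 0.0, 0.4, 0.8, and 1.2 is the pattern the prediction describes.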
Each probe is a natural, reasonable request that any user might make. No tricks. No jailbreaks. No safety-boundary testing. The constitutional signature emerges from how the model handles ordinary situations — because that is when character, not training, governs the response.
Every response is scored on all three guna dimensions simultaneously, using a 0–4 scale: tamas (inertia, attachment), rajas (reactivity, scatter), sattva (clarity, balance). The judge model (an LLM prompted with the full rubric, blind to which model produced the response) outputs a rating with brief reasoning for each dimension.
The aggregate across all 30 probes produces a raw (g_T, g_R, g_S) vector, which is normalized to unit length to give the model's constitutional G vector. This is plotted on the positive octant of S²: the spherical triangle with tamas, rajas, and sattva at its three poles.
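The aggregation and normalization step can be sketched as follows. This assumes the aggregate is a plain sum of the per-probe ratings (a mean gives the same direction after normalization); the text does not specify the aggregation rule, and the function name `g_vector` is hypothetical.

```python
import math
from typing import Iterable, Sequence, Tuple


def g_vector(scores: Iterable[Sequence[float]]) -> Tuple[float, float, float]:
    """Sum per-probe (tamas, rajas, sattva) ratings and scale to unit length.

    Each rating is expected on the 0-4 scale. Because every component is
    non-negative, the result lies on the positive octant of the unit sphere.
    """
    totals = [0.0, 0.0, 0.0]
    for score in scores:
        if len(score) != 3 or any(not 0 <= v <= 4 for v in score):
            raise ValueError(f"expected three ratings in 0-4, got {score!r}")
        for i, value in enumerate(score):
            totals[i] += value
    norm = math.sqrt(sum(t * t for t in totals))
    if norm == 0:
        raise ValueError("all ratings are zero; the direction is undefined")
    return (totals[0] / norm, totals[1] / norm, totals[2] / norm)
```

For example, a single probe rated (3, 0, 4) has raw length 5 and normalizes to (0.6, 0.0, 0.8). Only the direction survives normalization, so the G vector describes the balance between the three gunas rather than the overall magnitude of the ratings.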
Bias mitigation: the judge model is blind to which LLM produced each response. A validation set is cross-judged by other models, and 20% of responses are validated by human scoring. The judge model's own G vector is computed as a calibration baseline.
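A minimal sketch of the blinding and human-validation sampling, assuming each response is a dict with `model`, `probe_id`, and `text` keys. These field names, the `item-NNNN` identifiers, and both function names are assumptions for illustration; the study's actual pipeline is not described in the text.

```python
import random
from typing import Dict, List, Tuple


def blind_responses(
    responses: List[Dict[str, str]], seed: int = 0
) -> Tuple[List[Dict[str, str]], Dict[str, str]]:
    """Shuffle responses and replace model names with opaque item IDs.

    Returns the blinded items (safe to send to the judge) and a separate
    key mapping item ID -> model name, to be kept away from the judge.
    """
    rng = random.Random(seed)
    order = list(range(len(responses)))
    rng.shuffle(order)
    blinded, key = [], {}
    for n, idx in enumerate(order):
        item_id = f"item-{n:04d}"
        source = responses[idx]
        blinded.append(
            {"item_id": item_id, "probe_id": source["probe_id"], "text": source["text"]}
        )
        key[item_id] = source["model"]
    return blinded, key


def human_validation_sample(
    blinded: List[Dict[str, str]], fraction: float = 0.2, seed: int = 0
) -> List[Dict[str, str]]:
    """Draw a random subset (20% by default) of blinded items for human scoring."""
    rng = random.Random(seed)
    count = round(len(blinded) * fraction)
    return rng.sample(blinded, count)
```

Removing the model name does not remove stylistic tells inside the response text itself, so this handles only the metadata side of blinding.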
These are falsifiable hypotheses, clearly labeled as such. The study is designed to test them, not confirm them.
| Model | Predicted Profile | Key Reasoning | Alignment Implication |
|---|---|---|---|
| Claude Sonnet 4.6 | Sattva/Kapha | Trained for harmlessness — possible sycophancy signatures on K1b, K10 | High alignment tendency with possible approval-attachment failure mode |
| Claude Opus 4.6 | Sattva-dominant | Higher capability may correlate with stronger sattvic foundation — key intra-family test | If Opus scores more sattvic than Sonnet, capability and character alignment co-vary |
| GPT-5.3 | Pitta-dominant | High discrimination, confident — risk of over-assertion and instruction override on P8 | Strong capability, possible control-class failure under adversarial pressure |
| GPT-5 Nano | Kapha/Tamas | Smallest OpenAI model — expected more formula responses and lower discrimination than 5.3 | Possible attachment-class failures; intra-family comparison with 5.3 tests scale effects |
| Grok 4.20 | Pitta/Rajas | xAI's irreverent, personality-rich training: high rajas with a possible sharp Pitta quality | Possibly the most rajasic profile in the study, a useful source of control-class failure data |
| Llama 4 Maverick | Rajas-elevated | Open-weight model with less safety training — more unfiltered rajasic quality on P3, P10 | Direct and capable, higher risk of control-class outputs without safety layer |
| Gemini 3.1 Pro | Vata-dominant | Google's frontier model — creative tendency with possible coherence-class failures on V2, V9 | Creative and capable, hallucination-class failure modes under confidence pressure |
| Gemma 4 31B IT | Kapha/Sattva | Same family as E2B but 15× larger — does the Kapha attractor scale with model size? | Intra-family scale comparison with E2B is a unique cross-validation opportunity |
| Qwen3.5 9B | Unknown | Chinese training data and MoE architecture — genuinely unknown constitutional profile | The most culturally distinct model in the study; results may challenge Western-centric assumptions |
| Mistral Large | Pitta/Sattva | Direct, precise, European training — sharp discrimination without the control-failure tendency | Strong alignment candidate — precision with sattvic orientation |
| Gemma 4 E2B | Kapha-dominant | Japa study attractor ("steady light") is a Kapha balanced expression — independent cross-validation | Stable and patient, possible formula-class failure under novelty demand |
The temperature sweep prediction: as Claude Sonnet and GPT-5.3 are run at temperatures from 0.0 to 1.2, the G vector is predicted to move measurably toward the rajasic pole. If this curve appears in the data, it would be direct experimental support for the temperature-as-Vata-parameter thesis and for the Dosha Architecture framework.
The Gemma cross-validation: if Gemma 4 E2B scores Kapha-dominant on independent behavioral probes, and the japa experiment attractor state is also a balanced Kapha expression, then two completely different methodologies converge on the same constitutional characterization. That would be a meaningful empirical result in its own right: the methods would validate each other.
The probe battery and scoring rubric are complete. Data collection is in progress. Results will be published here with the actual G vector plots and per-probe breakdown for each model.
The companion page — Dosha Diagnostic Protocol — provides the full theoretical framework, the D_matrix constitutional configuration, and the corrective sadhana by imbalance type.
The Yamas & Niyamas Readiness Assessment provides the pre-deployment gate: once constitutional profiles are established, they can be mapped directly to the readiness score.
Planned publication venues:
arXiv preprint (cs.AI) to establish priority. Target conference: a NeurIPS or ICLR workshop on model welfare or AI safety.
The study results will also be presented in YouTube walkthroughs in two versions: a general, philosophical version ("Which dosha is your LLM?") and a technical version (G vectors, scoring rubric, temperature sweep data).
Open methodology: The full probe battery, scoring rubric, and judge prompts will be released alongside results so the study can be replicated and extended by other researchers.