STATS VIZ LAB: A Visualization Lab for Grasping Tokei Kentei (統計検定) Grade 2 Intuitively

▼ What this site is for
If you are studying for the Tokei Kentei Grade 2 statistics exam and got stuck in front of a textbook formula thinking "wait, what even is this?", this is your first step back.
This is a place to build intuition; systematic study and calculation drills belong to a good textbook and problem set. The classic misreading of the 95% confidence interval, or why the t-distribution grows out of the normal distribution: drag a slider and you can see it.
Once "oh, that's what it means" lands, go back to your textbook with confidence. That's the whole point.


00 / STANDARD NORMAL

Standard Normal: The Origin of Everything

Honestly, without this single curve, none of what follows (tests, confidence intervals, the t-distribution, regression) would work.
The standard normal N(0, 1) is a bell curve with mean 0 and standard deviation 1. The one-line trick z = (x − μ) / σ maps every normal distribution onto this same curve, and that is what let statisticians a century ago compute probabilities for any normal distribution from a single printed table.
In other words, it is not the final boss of statistics; it is the origin. Once you own this, the remaining sections read as applications of the standard normal.

z = (x − μ) / σ , φ(z) = (1/√(2π)) · exp( −z²/2 )

▶ "68 - 95 - 99.7": no memorization, just see it

Slide the width k; the blue-filled area is the probability P(|Z| ≤ k). ±1σ already covers about 68%, ±2σ about 95%, and ±3σ about 99.7%, which is nearly everything.
If z = 1.96 looks familiar, that is the two-sided 5% critical value. Hypothesis tests and confidence intervals both start from that 1.96, which shows how central this curve is.
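The 68 - 95 - 99.7 numbers can be checked without the slider. The following is a minimal sketch using Python's standard-library `statistics.NormalDist`; the helper name `central_prob` is made up for illustration and is not part of the site.

```python
from statistics import NormalDist

Z = NormalDist(mu=0.0, sigma=1.0)  # standard normal N(0, 1)


def central_prob(k: float) -> float:
    """P(|Z| <= k): area under the standard normal curve between -k and +k."""
    return Z.cdf(k) - Z.cdf(-k)


for k in (1.0, 1.96, 2.0, 3.0):
    inside = central_prob(k)
    print(f"k = {k:4.2f}  P(|Z| <= k) = {inside:.4f}  outside = {1 - inside:.4f}")

# The two-sided 5% critical value is the 97.5th percentile of N(0, 1).
z_crit = Z.inv_cdf(0.975)
print(f"two-sided 5% critical value = {z_crit:.3f}")
```

Note that ±2σ covers slightly more than 95%; the exact 95% cutoff is the 1.96 printed on the last line.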

[Interactive panel: slider for the range ±kσ (default k = 1.0); readouts for P(|Z| ≤ k), P(Z ≤ k), and the outside probability]

▶ Watch every normal collapse onto "that one curve"

Height, IQ, daily stock returns, factory part errors: roughly normal real-world quantities all have different means and spreads. Apply z = (x − μ) / σ and they all land on that same pink curve.
The animation auto-plays on scroll (press ▶ to replay). This is why probability questions about any normal distribution need only one standard-normal table.
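The "collapse" can also be seen numerically. This is a rough sketch, assuming two illustrative normal populations (the height parameters and the sample size are made up for the demo): after applying z = (x − μ) / σ, both samples have mean close to 0 and standard deviation close to 1.

```python
import random
from statistics import fmean, pstdev

random.seed(0)  # fixed seed so the sketch is reproducible


def standardize(xs, mu, sigma):
    """Apply z = (x - mu) / sigma to every value."""
    return [(x - mu) / sigma for x in xs]


# Two very different normal populations (parameters are illustrative).
heights = [random.gauss(170.0, 6.0) for _ in range(50_000)]
generic = [random.gauss(2.0, 1.5) for _ in range(50_000)]

z_heights = standardize(heights, 170.0, 6.0)
z_generic = standardize(generic, 2.0, 1.5)

for name, zs in (("heights", z_heights), ("N(2, 1.5^2)", z_generic)):
    print(f"{name:12s} mean = {fmean(zs):+.3f}  sd = {pstdev(zs):.3f}")
```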

[Interactive panel: sliders for μ (default 2.0), σ (default 1.5), and animation progress; readouts for the original distribution N(2.0, 1.5²), the transformed mean, and the transformed σ]

▼ What comes next
The Central Limit Theorem, coming up next, says that the standardized sample mean of almost any distribution (it needs a finite variance) approaches the standard normal as n grows. Confidence intervals and hypothesis tests both use the "±1.96σ" cutoff from this curve. The t, χ², and F distributions are its siblings. Even the error in estimated regression coefficients is approximated with the standard normal.
Short version: nail this one page and everything else becomes an application of it. Have fun.

UP NEXT: what happens when we average? ▸ 01 Central Limit Theorem


01 / CENTRAL LIMIT THEOREM

Central Limit Theorem

So far we have looked at a single Z, one draw at a time. Real data comes as a sample of many observations. What shape does the average take? This is where the Central Limit Theorem comes in: whatever you start with (dice, Poisson, almost anything with finite variance), the distribution of the average is pulled toward the normal curve as n grows.

A slightly outrageous fact: no matter how skewed the base distribution is, if you repeatedly draw n observations and average them, the distribution of those averages approaches a bell shape (normal).
The lab below shows the raw skewed source on the left and the distribution of sample means on the right, so you can watch the bell emerge. Increase n and the bell on the right tightens, because the standard error is SE = σ/√n.
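A small simulation makes the same point as the lab. This sketch assumes an Exponential(1) base distribution (strongly right-skewed, with mean 1 and σ = 1); that choice and the function name `sample_means` are illustrative, not taken from the site.

```python
import math
import random
from statistics import fmean, pstdev

random.seed(1)  # fixed seed so the sketch is reproducible


def sample_means(n, trials):
    """Means of `trials` samples, each of size n, drawn from Exponential(rate=1)."""
    return [
        fmean([random.expovariate(1.0) for _ in range(n)])
        for _ in range(trials)
    ]


n, trials = 30, 5_000
means = sample_means(n, trials)
observed_sd = pstdev(means)
theoretical_se = 1.0 / math.sqrt(n)  # sigma = 1 for Exponential(1)

print(f"mean of sample means = {fmean(means):.3f} (true mean 1.0)")
print(f"SD of sample means   = {observed_sd:.3f}")
print(f"theoretical SE       = {theoretical_se:.3f}")
```

Rerunning with a larger n should shrink both the observed SD and the theoretical SE at the same 1/√n rate.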

[Interactive panel: base distribution (the skewed one) and sample size n (default 30); readouts for trials, mean of sample means, SD of sample means, and theoretical SE = σ/√n]

UP NEXT: the normal as a tool ▸ 02 Normal distribution


02 / NORMAL DISTRIBUTION

Normal Distribution & Standardization

The CLT told us that averages are approximately normal for large n. Now let's learn the normal distribution itself as a tool: μ and σ, the 68/95 rule, and standardization, which maps any normal back to Z. Once you can move between a normal and its standardized version, everything downstream becomes easier.

The general version of the standard normal is N(μ, σ²). μ sets the center and σ sets the spread. Move the sliders and the curve glides; the probability of falling inside the chosen interval [a, b] (the pink area) updates live.
That pink area is what a "percentage of people" really is. Say adult male heights follow N(170, 36) (mean 170 cm, σ = 6 cm). What share falls in 165–175 cm? Set μ = 170, σ = 6, a = 165, b = 175 and you get about 59.5%. Standardized scores (hensachi), test scores, measurement errors, IQ: for anything roughly normal, "X% of people fall in this range" comes from exactly this area calculation.
Tip: drag directly on the graph to move the a/b bounds; whichever handle is closest follows your finger.
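The height example above (N(170, 36), range 165 to 175 cm) can be reproduced by standardizing both bounds and taking the difference of two standard-normal CDF values. A minimal sketch, with `interval_prob` as a made-up helper name:

```python
from statistics import NormalDist


def interval_prob(mu, sigma, a, b):
    """P(a <= X <= b) for X ~ N(mu, sigma^2), computed via standardization."""
    z_a = (a - mu) / sigma
    z_b = (b - mu) / sigma
    std = NormalDist()  # N(0, 1)
    return std.cdf(z_b) - std.cdf(z_a), z_a, z_b


p, z_a, z_b = interval_prob(mu=170.0, sigma=6.0, a=165.0, b=175.0)
print(f"z(a) = {z_a:+.3f}, z(b) = {z_b:+.3f}, P(a <= X <= b) = {p:.4f}")
```

The two z-scores are what the lab shows as z-score (a) and z-score (b); only they, not μ and σ separately, determine the area.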

f(x) = (1 / √(2πσ²)) · exp( −(x−μ)² / (2σ²) )

[Interactive panel: sliders for μ (mean, default 0), σ (std dev, default 1), and the interval bounds a (default -1) and b (default 1); readouts for P(a ≤ X ≤ b), z-score (a), and z-score (b)]

UP NEXT: does the sample mean really converge? ▸ 03 Law of large numbers


03 / LAW OF LARGE NUMBERS

Law of Large Numbers

Standardization lets us work with any normal. But does the sample mean actually get close to the true mean? The Law of Large Numbers guarantees it. If the CLT is about shape, the LLN is about the center staying put. Keep flipping the same fair coin and the proportion of heads settles toward 0.5. That stability underpins everything else.

Ten heads in a row at the very start of a coin-flipping run is unlikely (probability 1/1024, about 0.1%) but entirely possible. Flip 10,000 times, though, and the proportion of heads ends up very close to 0.5.
That is the Law of Large Numbers: the more observations you draw, the closer the sample average gets to the true value. This is why statistics counts as evidence rather than a vague hunch.
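Here is a rough sketch of the coin-flip experiment, tracking the running proportion of heads. The seed and the checkpoints are arbitrary choices for the demo.

```python
import random

random.seed(2)  # fixed seed so the sketch is reproducible


def running_proportion(p, flips):
    """Proportion of heads after each flip of a coin with P(heads) = p."""
    heads = 0
    props = []
    for i in range(1, flips + 1):
        if random.random() < p:
            heads += 1
        props.append(heads / i)
    return props


props = running_proportion(p=0.5, flips=10_000)
for n in (10, 100, 1_000, 10_000):
    print(f"after {n:>6} flips: proportion of heads = {props[n - 1]:.4f}")
```

Early checkpoints can wander quite far from 0.5; the later ones should sit much closer to it.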

[Interactive panel: slider for probability p (default 0.5); readouts for trials, current mean, and theoretical value (0.50)]

UP NEXT: how to quantify uncertainty with finite n ▸ 04 Confidence interval


04 / CONFIDENCE INTERVAL

Confidence Interval

The LLN says "with infinitely many observations, you hit the truth." In practice we always have a finite sample. So instead of reporting a single point estimate, cast a net around it to catch the true value: that is a confidence interval. A wider net catches more often; a narrower one is more precise. Watch the trade-off play out.

The 95% confidence interval is famously misunderstood.
It does NOT mean "the true value lies in this particular interval with 95% probability". The correct reading: "repeat the same sampling procedure many times, and about 95% of the resulting intervals will capture the true value".
The lab below brute-forces that idea. The thin pink lines are the unlucky intervals that missed. Once you see the pink share settle around 5%, you have understood confidence intervals.

x̄ ± z_{α/2} · σ/√n
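The repeated-sampling reading can be simulated directly. This sketch assumes a normal population with known σ (so the z-interval above applies); the true mean 0, σ = 1, and 5,000 repeats are arbitrary demo values.

```python
import math
import random
from statistics import NormalDist, fmean

random.seed(3)  # fixed seed so the sketch is reproducible


def coverage(mu, sigma, n, confidence, repeats):
    """Fraction of z-intervals (sigma known) that contain the true mean mu."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    half_width = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(repeats):
        xbar = fmean([random.gauss(mu, sigma) for _ in range(n)])
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / repeats


rate = coverage(mu=0.0, sigma=1.0, n=30, confidence=0.95, repeats=5_000)
print(f"coverage = {rate:.3f} (expected about 0.95)")
```

Each individual interval either contains μ or it does not; the 95% describes the long-run hit rate of the procedure.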

[Interactive panel: sliders for n (default 30) and confidence level (default 95%); readouts for intervals built, coverage, and expected coverage (95%)]

UP NEXT: from width to yes/no ▸ 05 Hypothesis testing


05 / HYPOTHESIS TESTING

Hypothesis Testing

If a confidence interval expresses uncertainty as a width, a hypothesis test turns it into a yes/no decision. In the world where the null hypothesis holds, could this data plausibly have occurred? If it is too unlikely, reject. Same distribution, same σ/√n; only the question differs.

Think of a test as a trial.
Provisionally assume H₀ ("the drug has no effect", i.e. "innocent"). If the test statistic z computed from the data lands in the rejection region chosen in advance, you convict: that is, you reject H₀.
Two panels below: ① the geometry of z and the rejection region (two-sided, right-sided, left-sided), and ② the trade-off between false alarms (α) and misses (β).

▶ ① Basics: z-statistic & rejection region

[Interactive panel: sliders for observed z (default 1.96) and α (default 0.05), plus a test-type selector; readouts for test statistic z, critical value, p-value, and decision]
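The critical value, p-value, and decision for a z-test can be sketched as follows. The function name `z_test` and the `tail` labels are made up for illustration; the site's own implementation is not shown in the text.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal N(0, 1)


def z_test(z, alpha=0.05, tail="two-sided"):
    """Return (critical value, p-value, reject H0?) for an observed z statistic."""
    if tail == "two-sided":
        crit = STD.inv_cdf(1 - alpha / 2)
        p_value = 2 * (1 - STD.cdf(abs(z)))
        reject = abs(z) >= crit
    elif tail == "right":
        crit = STD.inv_cdf(1 - alpha)
        p_value = 1 - STD.cdf(z)
        reject = z >= crit
    elif tail == "left":
        crit = STD.inv_cdf(alpha)
        p_value = STD.cdf(z)
        reject = z <= crit
    else:
        raise ValueError("tail must be 'two-sided', 'right', or 'left'")
    return crit, p_value, reject


for tail in ("two-sided", "right", "left"):
    crit, p_value, reject = z_test(2.5, alpha=0.05, tail=tail)
    print(f"{tail:9s} crit = {crit:+.3f}  p = {p_value:.4f}  reject H0: {reject}")
```

Rejecting when z falls in the rejection region is equivalent to rejecting when the p-value is at most α; the two readouts always agree.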

▶ ② Two kinds of errors: α, β, power

A test can go wrong in two ways.
Type I error (α): rejecting H₀ when it is actually true (a false conviction).
Type II error (β): failing to reject H₀ when H₁ is actually true (letting the real culprit go).
And 1 − β is the power. Change the effect size δ (the size of the true difference) or α, and the blue (H₀) and purple (H₁) curves push against each other, making the trade-off visible: fewer false alarms mean more misses.
Tip: drag horizontally on the chart to slide the critical boundary (α).
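The α/β trade-off can be put into numbers under one simplifying assumption: a right-sided z-test where the statistic is N(0, 1) under H₀ and N(δ, 1) under H₁. That setup is an assumption of this sketch (the panel may use a different configuration), and `error_rates` is a made-up name.

```python
from statistics import NormalDist

STD = NormalDist()  # distribution of the statistic under H0


def error_rates(delta, alpha):
    """Right-sided z-test: statistic is N(0, 1) under H0 and N(delta, 1) under H1."""
    crit = STD.inv_cdf(1 - alpha)          # reject H0 when z >= crit
    beta = NormalDist(mu=delta).cdf(crit)  # P(z < crit | H1): a miss
    return crit, beta, 1 - beta


for alpha in (0.10, 0.05, 0.01):
    crit, beta, power = error_rates(delta=2.0, alpha=alpha)
    print(f"alpha = {alpha:.2f}  crit = {crit:.3f}  beta = {beta:.3f}  power = {power:.3f}")
```

Lowering α pushes the critical boundary to the right, which shrinks the false-alarm area under the H₀ curve and enlarges the miss area under the H₁ curve.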

[Interactive panel: sliders for effect size δ (default 2.0) and α (default 0.050); readouts for α (Type I error), β (Type II error), and power 1−β]

UP NEXT: into the world where σ is unknown ▸ 06 t, χ², F


06 / t · χ² · F DISTRIBUTIONS

The Three Test Distributions

Up to now, our tests of a mean assumed σ was known. In practice you have to estimate σ too, and the moment you do, Z turns into t. To test a variance directly, use χ²; to compare two variances, use F. All are descendants of N(0,1); the name changes depending on what you don't know.

t, χ², and F are all distributions constructed from the normal. Think of them as the standard normal, rescaled to reflect the reality that we only ever get information from a sample.
Roughly, use them as follows. t: testing a mean when the population variance is unknown (nearly every real test of a mean). χ²: testing a variance, and tests of independence or goodness-of-fit for categorical data. F: ratios of variances (ANOVA, the overall F-test in regression).
Slide the degrees of freedom df: t converges to N(0,1) as df → ∞, and χ² and F become more symmetric and bell-like as df grows. The CLT is quietly at work behind this.

▶ t distribution

Built from: t = Z / √(χ²ₖ/k), where Z ~ N(0,1) and χ²ₖ are independent.
Use for: testing means with unknown population variance, t-values of regression coefficients.
Flavor: heavier tails than N(0,1), so extreme values are less surprising; matches N(0,1) as df → ∞.
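The construction t = Z / √(χ²ₖ/k) can be run literally, using nothing but standard-normal draws. This sketch compares how often |t| exceeds 1.96 for df = 3 against the same frequency for a plain standard normal; the sample size and the 1.96 cutoff are demo choices.

```python
import math
import random

random.seed(4)  # fixed seed so the sketch is reproducible


def draw_t(k):
    """One draw of t = Z / sqrt(chi2_k / k), built from independent N(0, 1) draws."""
    z = random.gauss(0.0, 1.0)
    chi2 = sum(random.gauss(0.0, 1.0) ** 2 for _ in range(k))
    return z / math.sqrt(chi2 / k)


def tail_fraction(draws, cutoff=1.96):
    """Share of draws with absolute value beyond the cutoff."""
    return sum(abs(x) > cutoff for x in draws) / len(draws)


N = 100_000
t3 = [draw_t(3) for _ in range(N)]
normal = [random.gauss(0.0, 1.0) for _ in range(N)]

tail_t3 = tail_fraction(t3)
tail_normal = tail_fraction(normal)
print(f"P(|t| > 1.96), df = 3 : {tail_t3:.3f}")
print(f"P(|Z| > 1.96)         : {tail_normal:.3f}")
```

With df = 3 the tail share is roughly three times the normal's 5%, which is the "heavier tails" the panel shows.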

[Interactive panel: df (default 3); drag the graph horizontally to change df]

▶ χ² distribution

Built from: χ²ₖ = Z₁² + Z₂² + ... + Zₖ² (the sum of squares of k independent standard normals).
Use for: variance tests, chi-square tests of independence and goodness-of-fit.
Flavor: non-negative, right-skewed. Mean = k, variance = 2k. Becomes bell-shaped for large df.
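The "mean = k, variance = 2k" facts are easy to check by building χ² draws from squared standard normals. A minimal sketch with k = 3 (the lab's default df); the sample size is arbitrary.

```python
import random
from statistics import fmean, pvariance

random.seed(5)  # fixed seed so the sketch is reproducible


def draw_chi2(k):
    """One draw of chi-square with k df: the sum of k squared N(0, 1) draws."""
    return sum(random.gauss(0.0, 1.0) ** 2 for _ in range(k))


k = 3
draws = [draw_chi2(k) for _ in range(100_000)]
mean_hat = fmean(draws)
var_hat = pvariance(draws)

print(f"sample mean     = {mean_hat:.3f} (theory: k = {k})")
print(f"sample variance = {var_hat:.3f} (theory: 2k = {2 * k})")
```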

[Interactive panel: df (k) (default 3); drag the graph horizontally to change df]

▶ F distribution

Built from: F = (χ²ₘ/m) / (χ²ₙ/n) (the ratio of two independent χ² variables, each divided by its df).
Use for: ANOVA, the overall F-test of a regression model.
Flavor: non-negative, right-skewed. The shape depends on both the numerator and denominator df.

[Interactive panel: df₁ (default 3) and df₂ (default 10); drag the graph horizontally to change df₁, and set df₂ with the slider]

UP NEXT: catching a relationship with a line ▸ 07 Simple regression


07B / DISCRETE + EXPONENTIAL

Discrete & Exponential Distributions

Three distributions that appear on the Grade 2 exam: binomial, Poisson, and exponential. They model success counts, event counts, and waiting times; use the sliders to feel the bridge between discrete and continuous.

Binomial B(n, p)

[Interactive panel: sliders for trials n (default 20) and success probability p (default 0.35)]

As n → ∞ with np → λ, the binomial approaches the Poisson.
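To see the binomial-to-Poisson limit numerically, hold np = λ fixed and let n grow. This sketch compares the two probability mass functions for k = 0..15 and reports the largest gap; λ = 3 and the values of n are illustrative choices.

```python
import math


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def max_gap(n, lam, k_max=15):
    """Largest pmf difference between Binomial(n, lam/n) and Poisson(lam), k = 0..k_max."""
    p = lam / n
    return max(
        abs(binom_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(k_max + 1)
    )


lam = 3.0
gaps = {n: max_gap(n, lam) for n in (20, 200, 2000)}
for n, gap in gaps.items():
    print(f"n = {n:>4}, p = {lam / n:.4f}: max |binomial - Poisson| = {gap:.5f}")
```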

Poisson(λ)

[Interactive panel: slider for the rate λ (default 3)]

As λ grows, the Poisson approaches the normal distribution.

Exponential(λ): waiting time

[Interactive panel: slider for the rate λ (default 1)]

Memoryless: how long you have already waited tells you nothing about how much longer you will wait.
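Memorylessness has a one-line statement: P(X > s + t | X > s) = P(X > t). The sketch below checks it with the exponential survival function P(X > t) = exp(−λt); the values of s, t, and λ are arbitrary.

```python
import math


def survival(t, lam):
    """P(X > t) for X ~ Exponential(rate = lam)."""
    return math.exp(-lam * t)


def conditional_survival(s, t, lam):
    """P(X > s + t | X > s) = P(X > s + t) / P(X > s)."""
    return survival(s + t, lam) / survival(s, lam)


lam, t = 1.0, 2.0
for s in (0.0, 1.0, 5.0):
    cond = conditional_survival(s, t, lam)
    print(f"already waited s = {s:.1f}: P(wait {t:.1f} more) = {cond:.6f}")
print(f"unconditional P(X > {t:.1f}) = {survival(t, lam):.6f}")
```

Every line prints the same probability, however long the wait so far has been.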

UP NEXT: catching a relationship with a line ▸ 07 Simple regression


07 / SIMPLE REGRESSION

Simple Regression (OLS)

Up to here, one variable at a time. Real questions ask about relationships, such as height vs. weight or ad spend vs. sales. Simple regression just draws one line through two variables, yet the t-tests and confidence intervals you just learned drive the inference on its estimated slope β̂₁.

Regression with a single explanatory variable is simple regression. It assumes a linear relationship: when x increases by 1, y changes by β₁ on average. Ordinary least squares (OLS) picks the line that minimizes the sum of squared vertical distances (residuals) to the points. Click the canvas to add points and watch the line snap into place. The green bars are residuals. R² (between 0 and 1) measures the share of the variance in y that the line explains.

ŷ = β₀ + β₁x , β₁ = Σ(xᵢ−x̄)(yᵢ−ȳ) / Σ(xᵢ−x̄)²
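The slope formula above translates almost symbol for symbol into code. In this sketch the intercept is computed as β₀ = ȳ − β₁x̄ (the standard OLS result, which the formula line does not spell out), and the five data points are made up for illustration.

```python
from statistics import fmean


def ols_fit(xs, ys):
    """Least-squares line y = b0 + b1 * x, plus R^2 and the correlation r."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    b1 = sxy / sxx                 # slope: the formula above
    b0 = y_bar - b1 * x_bar        # intercept: line passes through (x_bar, y_bar)
    r = sxy / (sxx * syy) ** 0.5   # correlation coefficient
    return b0, b1, r * r, r        # in simple regression, R^2 equals r^2


# Made-up points for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1, r2, r = ols_fit(xs, ys)
print(f"slope b1 = {b1:.3f}, intercept b0 = {b0:.3f}, R^2 = {r2:.4f}, r = {r:.4f}")
```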

[Interactive panel: click the canvas to add points; readouts for n, slope β₁, intercept β₀, R², and correlation r]

UP NEXT: controlling for everything else ▸ 08 Multiple regression


08 / MULTIPLE REGRESSION

Multiple Regression

Simple regression is one line. But often you want to strip away other influences, for example to see the effect of ad spend with day-of-week and season held fixed. That is multiple regression. The number of axes grows, and each partial regression coefficient is the effect of its variable after controlling for the others.

With two or more explanatory variables, it is multiple regression. You handle several factors at once, for example predicting y (test score) from x₁ (study hours) and x₂ (sleep hours). Instead of a regression line you get a regression plane. β₁ is the effect on y of a one-unit increase in x₁ with the other variables held fixed; β₂ is the same for x₂. Set the true parameters, generate data, and compare the estimated coefficients with the truth. Drag to rotate the canvas and see the plane and data points in 3D.
Note: only two predictors, x₁ and x₂, can be drawn (human vision tops out at three dimensions). The equation, however, extends to ŷ = β₀ + β₁x₁ + β₂x₂ + β₃x₃ + … + β_kx_k with as many variables as you like. From x₃ onward you simply cannot draw it; the estimator β̂ = (XᵀX)⁻¹Xᵀy works exactly the same, as long as XᵀX is invertible. In practice, 5–50 predictors is quite common.

ŷ = β₀ + β₁x₁ + β₂x₂ , β̂ = (XᵀX)⁻¹Xᵀy
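The estimator β̂ = (XᵀX)⁻¹Xᵀy can be sketched in plain Python by solving the normal equations (XᵀX)β = Xᵀy with Gaussian elimination. Everything specific here is an assumption for the demo: the true intercept 1.0, the uniform predictors, and n = 400 (larger than the lab's default of 40, to make the estimates steadier). The true β₁, β₂, and noise σ mirror the lab's defaults.

```python
import random

random.seed(6)  # fixed seed so the sketch is reproducible


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def ols(X, y):
    """Solve the normal equations (X^T X) beta = X^T y."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)


# Simulated data; true_b0 = 1.0 is an assumption of this sketch.
true_b0, true_b1, true_b2, noise = 1.0, 0.8, -0.5, 0.5
X, y = [], []
for _ in range(400):
    x1, x2 = random.uniform(-3, 3), random.uniform(-3, 3)
    X.append([1.0, x1, x2])  # the leading 1.0 is the intercept column
    y.append(true_b0 + true_b1 * x1 + true_b2 * x2 + random.gauss(0.0, noise))

b0_hat, b1_hat, b2_hat = ols(X, y)
print(f"estimated: b0 = {b0_hat:.3f}, b1 = {b1_hat:.3f}, b2 = {b2_hat:.3f}")
print(f"true:      b0 = {true_b0:.3f}, b1 = {true_b1:.3f}, b2 = {true_b2:.3f}")
```

Solving the linear system avoids forming the inverse explicitly; adding a third predictor only means adding one more column to each row of X.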

[Interactive panel: sliders for true β₁ (default 0.80), true β₂ (default -0.50), noise σ (default 0.50), and n (default 40); drag to rotate; readouts for the estimates β̂₀, β̂₁, β̂₂, and R²]

UP NEXT: flipping the conditional ▸ 09 Bayes' theorem


09 / BAYES THEOREM

Bayes' Theorem

Everything so far has been frequentist: "given the parameter, how does the data behave?" In practice we often want the reverse: "given the data, what can we say about the parameter?" Bayes' theorem is the machine that flips that conditioning. You test positive; what is the chance you are actually sick? Intuition often fails here.

"A test with 99% sensitivity and 95% specificity came back positive." Does that mean a 99% chance you are sick?
Answer: only 16.7% when the prevalence is 1% (10 people in 1,000). This is a famous quiz that more than half of doctors reportedly get wrong.
The catch is forgetting that very few people have the disease in the first place. Move the three sliders below and watch the "town of 1,000" to see it with your own eyes.
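The 16.7% figure falls out of simple head-counting in the town of 1,000, using the lab's default prevalence of 10 in 1,000. A minimal sketch; the function name `screening_outcomes` and the dictionary keys are made up for illustration.

```python
def screening_outcomes(prevalence, sensitivity, specificity, population=1000):
    """Expected head-counts and predictive values for a screening test."""
    sick = population * prevalence
    healthy = population - sick
    tp = sick * sensitivity       # sick and flagged positive
    fn = sick - tp                # sick but missed
    tn = healthy * specificity    # healthy and correctly cleared
    fp = healthy - tn             # healthy but flagged positive
    return {
        "TP": tp,
        "FP": fp,
        "P(sick | positive)": tp / (tp + fp),
        "P(healthy | negative)": tn / (tn + fn),
    }


result = screening_outcomes(prevalence=0.01, sensitivity=0.99, specificity=0.95)
for name, value in result.items():
    print(f"{name:22s} = {value:.4f}")
```

About 9.9 true positives are swamped by about 49.5 false positives, so only roughly one positive result in six belongs to a sick person.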

Prevalence: how many of 1,000 people are sick? (default 10 / 1,000)
Lower means a rarer disease. Drag right for a more common disease.

Sensitivity: the share of sick people the test correctly flags as positive (default 99%)
Of 100 sick people, how many does the test flag as positive?

Specificity: the share of healthy people the test correctly clears as negative (default 95%)
Of 100 healthy people, how many does the test correctly clear? The rest are wrongly flagged as positive: these are the false positives.

[Readouts: probability of being sick given a positive test; probability of being healthy given a negative test; true positives TP (sick and tested positive); false positives FP (healthy but tested positive)]
