Probability Distributions
Why do distributions matter so much?
Almost all statistical inference, hypothesis testing, and machine learning models are built on probability distributions. Being able to quickly pick the right distribution in an interview — and explain why you picked it — is a strong signal.
Probability Basics
Before discussing distributions, make sure the fundamentals are solid:
Random Variable
A random variable is a function that maps outcomes of a random experiment to numbers.
- Discrete: possible values are finite or countable — a die roll (1-6), a user's click count (0, 1, 2, ...)
- Continuous: possible values form an interval — waiting times, heights, stock prices
Probability Rules
Interviews often test these basic rules through compound-event problems:
Addition rule (mutually exclusive events): $P(A \cup B) = P(A) + P(B)$
General addition rule: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
Multiplication rule (independent events): $P(A \cap B) = P(A)\,P(B)$
Conditional probability: $P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$
Interview classic: Independent vs Mutually Exclusive
Mutually exclusive: if A occurs, B cannot occur → $P(A \cap B) = 0$
Independent: whether A occurs has no effect on B's probability → $P(A \cap B) = P(A)\,P(B)$, i.e. $P(B \mid A) = P(B)$
Trap: mutually exclusive events (with nonzero probabilities) are not independent! If A occurs, B definitely does not occur → A's occurrence changed B's probability.
Law of Total Probability
Decompose an event into mutually exclusive and exhaustive scenarios:
$P(A) = \sum_i P(A \mid B_i)\,P(B_i)$
This is the denominator of Bayes' theorem, and the key tool for solving conditional probability questions in interviews.
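A quick numeric sketch (the numbers are made up for illustration): suppose 20% of email is spam, a keyword appears in 60% of spam and in 5% of legitimate mail. The law of total probability gives the overall keyword rate, which is exactly the Bayes denominator:

```python
# Hypothetical numbers for illustration only
p_spam = 0.20            # P(spam)
p_kw_given_spam = 0.60   # P(keyword | spam)
p_kw_given_ham = 0.05    # P(keyword | not spam)

# Law of total probability over the exhaustive scenarios {spam, not spam}:
# P(kw) = P(kw | spam) P(spam) + P(kw | ham) P(ham)
p_kw = p_kw_given_spam * p_spam + p_kw_given_ham * (1 - p_spam)
print(p_kw)  # ≈ 0.16

# The same quantity is the denominator of Bayes' theorem:
p_spam_given_kw = p_kw_given_spam * p_spam / p_kw
print(p_spam_given_kw)  # ≈ 0.12 / 0.16 = 0.75
```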
PDF, PMF, and CDF
PDF vs PMF
| | PMF (Probability Mass Function) | PDF (Probability Density Function) |
|---|---|---|
| Applies to | Discrete random variables | Continuous random variables |
| Gives probability directly | Yes — $p(x) = P(X = x)$ | No — $P(X = x)$ always equals 0 |
| Value range | $0 \le p(x) \le 1$ | $f(x) \ge 0$, can exceed 1 |
| Compute probability | Summation | Integration |
PDF — for continuous random variables, describes the relative likelihood of $X$ near $x$. The probability over an interval is the area under the curve: $P(a \le X \le b) = \int_a^b f(x)\,dx$
PDF Value ≠ Probability
Interviewers often ask: "Does the value of the PDF at a point represent a probability?" The answer is no. A PDF value can exceed 1 (for example, the PDF of Uniform(0, 0.5) equals 2), and only its integral is a probability. For a continuous distribution, $P(X = x) = 0$ at any single point.
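A two-line scipy check of this claim, using the Uniform(0, 0.5) example from above:

```python
from scipy import stats

# Uniform on [0, 0.5]: the density is 1 / 0.5 = 2 everywhere on the support
u = stats.uniform(loc=0, scale=0.5)

print(u.pdf(0.25))              # 2.0 — a density value, not a probability
print(u.cdf(0.5) - u.cdf(0.0))  # 1.0 — only the integral (area) is a probability
```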
CDF
Cumulative Distribution Function — $F(x) = P(X \le x)$, the accumulated probability up to $x$.
Three properties of the CDF (commonly tested in interviews):
- $\lim_{x \to -\infty} F(x) = 0$ and $\lim_{x \to \infty} F(x) = 1$
- $F(x)$ is non-decreasing
- $F(x)$ is right-continuous
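These properties are easy to sanity-check numerically, for example on the standard Normal:

```python
from scipy import stats

F = stats.norm(0, 1).cdf

# Limits: F approaches 0 on the far left and 1 on the far right
print(F(-10), F(10))  # ≈ 0.0, ≈ 1.0

# Non-decreasing, and interval probabilities come from differences of the CDF:
print(F(1) - F(-1))   # ≈ 0.683 — the "68" of the 68-95-99.7 rule
```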
Expected Value and Variance
Expected Value
Expected value (mean) — the theoretical long-run average: $E[X] = \sum_x x\,p(x)$ (discrete) or $E[X] = \int x\,f(x)\,dx$ (continuous)
Linearity of expectation (no independence required!): $E[aX + bY] = aE[X] + bE[Y]$
This is a powerful problem-solving tool in interviews. For example: "10 people are randomly handed hats — how many are expected to get their own?" Use linearity to split the count into 10 indicator variables; each has expectation $1/10$, so the answer is $1$.
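The hat-check answer is easy to confirm by simulation — note that the indicators are strongly dependent, yet linearity still gives exactly 1:

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 10, 50_000

# One trial = one random assignment of 10 hats to 10 people;
# a "match" is a fixed point of the permutation (someone gets their own hat).
matches = [int(np.sum(rng.permutation(n) == np.arange(n))) for _ in range(trials)]

# Linearity of expectation predicts E = n * (1/n) = 1,
# even though the indicator variables are far from independent.
print(np.mean(matches))  # ≈ 1.0
```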
Variance and Standard Deviation
Variance — measures the spread of data around the mean: $\mathrm{Var}(X) = E[(X - \mu)^2] = E[X^2] - (E[X])^2$
The shortcut formula $E[X^2] - (E[X])^2$ comes up constantly in derivations and interviews.
Variance properties: $\mathrm{Var}(aX + b) = a^2\,\mathrm{Var}(X)$
If $X$ and $Y$ are independent, $\mathrm{Cov}(X, Y) = 0$, so $\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$.
Covariance and Correlation
Covariance — measures how two variables move together: $\mathrm{Cov}(X, Y) = E\big[(X - E[X])(Y - E[Y])\big]$
Correlation — standardized covariance, bounded in $[-1, 1]$: $\rho = \dfrac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}$
Correlation ≠ Independence
$\rho = 0$ only means there is no linear relationship; it does not imply independence. For example, take $X \sim \mathrm{Uniform}(-1, 1)$ and $Y = X^2$: $\mathrm{Cov}(X, Y) = 0$ (no linear correlation), but $Y$ is completely determined by $X$ (a strong nonlinear relationship).
The converse does hold: independence always implies $\rho = 0$.
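The $Y = X^2$ example above, verified numerically:

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.uniform(-1, 1, 100_000)
y = x**2  # completely determined by x — maximally dependent

# Sample correlation: Cov(X, X^2) = E[X^3] = 0 by symmetry, so rho ≈ 0
r = np.corrcoef(x, y)[0, 1]
print(r)  # ≈ 0 — no linear relationship, yet clearly not independent
```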
Chebyshev's Inequality
For any random variable with mean $\mu$ and finite variance $\sigma^2$: $P(|X - \mu| \ge k\sigma) \le \dfrac{1}{k^2}$
No knowledge of the distribution's shape is required — the mean and variance alone are enough to bound the probability.
| $k$ | Chebyshev: at most this fraction outside $k\sigma$ | Normal: actual fraction outside |
|---|---|---|
| 1 | 100% (trivial bound) | 31.7% |
| 2 | 25% | 4.6% |
| 3 | 11.1% | 0.3% |
| 4 | 6.25% | 0.006% |
Chebyshev's bounds are much looser than the Normal 68-95-99.7 rule — precisely because they apply to any distribution. If you know the distribution is Normal, use 68-95-99.7 for a tighter bound.
ML Connections
| Application | How Chebyshev Is Used |
|---|---|
| Outlier detection | A data point beyond $3\sigma$ → by Chebyshev, at most 11.1% of the data can lie there → likely an outlier |
| Convergence bounds | Error bound for the sample mean: $P(|\bar{X} - \mu| \ge \epsilon) \le \dfrac{\sigma^2}{n\epsilon^2}$ (a weak version of the Law of Large Numbers) |
| Feature engineering | Clip features at $\mu \pm 3\sigma$ — Chebyshev guarantees at most 11.1% get clipped |
| Distribution-free inference | When the distribution is unknown, Chebyshev provides a conservative bound |
import numpy as np
# Verify Chebyshev on different distributions
for dist_name, samples in [
("Normal", np.random.normal(0, 1, 100000)),
("Exponential", np.random.exponential(1, 100000)),
("Uniform", np.random.uniform(-1, 1, 100000)),
]:
mu, sigma = samples.mean(), samples.std()
for k in [2, 3]:
actual = np.mean(np.abs(samples - mu) >= k * sigma)
chebyshev_bound = 1 / k**2
print(f"{dist_name}: P(|X-μ|≥{k}σ) = {actual:.4f} ≤ {chebyshev_bound:.4f} (Chebyshev)")
# All satisfy the bound, but Normal is much tighter than the bound
Interview link: Why the 3-Sigma Rule?
"Why use $3\sigma$ as the outlier threshold?" — If the data is Normal → only 0.3% lies outside (68-95-99.7 rule). If the distribution is unknown → Chebyshev guarantees at most 11.1% lies outside. So $3\sigma$ is a reasonable distribution-free threshold.
Key Distributions
Bernoulli Distribution
The simplest distribution: a single trial with two outcomes (success/failure).
- $P(X = 1) = p$, $P(X = 0) = 1 - p$, with $0 \le p \le 1$
- $E[X] = p$, $\mathrm{Var}(X) = p(1 - p)$
Use case: whether a single click converts, whether an email gets opened, whether a transaction is fraudulent.
Binomial Distribution
The number of successes in $n$ independent Bernoulli trials:
$P(X = k) = \dbinom{n}{k} p^k (1-p)^{n-k}$
- $E[X] = np$, $\mathrm{Var}(X) = np(1-p)$
Use case: how many of 100 users convert, how many products in a batch of 1000 are defective.
Normal approximation: when $np$ and $n(1-p)$ are both large (a common rule of thumb is both $\ge 10$), $\mathrm{Binomial}(n, p) \approx \mathrm{Normal}(np,\ np(1-p))$. This is the theoretical basis of the z-test for proportions in A/B testing.
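A sketch of this approximation with scipy — the parameters are arbitrary but satisfy the rule of thumb:

```python
from scipy import stats

n, p = 100, 0.3  # np = 30 and n(1-p) = 70, both comfortably large

exact = stats.binom(n, p)
approx = stats.norm(loc=n * p, scale=(n * p * (1 - p)) ** 0.5)

# Tail probability P(X <= 25), exact vs normal approximation
# (the +0.5 is the usual continuity correction for a discrete variable)
print(exact.cdf(25))
print(approx.cdf(25.5))  # the two values agree closely
```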
Geometric Distribution
The number of trials needed to get the first success:
$P(X = k) = (1-p)^{k-1}\,p, \quad k = 1, 2, 3, \ldots$
- $E[X] = \dfrac{1}{p}$, $\mathrm{Var}(X) = \dfrac{1-p}{p^2}$
- Also has the memoryless property — the discrete counterpart of Exponential
Use case: how many ads a user must see on average before clicking, how many support calls it takes before an issue is resolved.
Negative Binomial Distribution
The number of failures before the $r$-th success:
$P(X = k) = \dbinom{k + r - 1}{k} p^r (1-p)^k, \quad k = 0, 1, 2, \ldots$
- Geometric is the special case when $r = 1$
Use case: commonly used to model overdispersed count data (counts with variance > mean), such as page clicks or insurance claims. When a dataset's variance is clearly larger than its mean, Poisson is a poor fit and Negative Binomial is the alternative.
Poisson Distribution
Models the number of events in a fixed interval of time or space, when events occur independently at a constant average rate $\lambda$:
$P(X = k) = \dfrac{\lambda^k e^{-\lambda}}{k!}, \quad k = 0, 1, 2, \ldots$
- $E[X] = \lambda$, $\mathrm{Var}(X) = \lambda$
- Mean equals variance — this is a defining property
Use case: complaints received per hour, server errors per day, site clicks per minute. Mean equal to variance is the most recognizable fingerprint of the Poisson distribution.
Three assumptions (commonly tested in interviews):
- Events in non-overlapping intervals are independent
- The average rate is constant over time
- At most one event can occur in an infinitesimally small interval
Poisson vs Binomial
When $n$ is large and $p$ is small, $\mathrm{Binomial}(n, p) \approx \mathrm{Poisson}(np)$. In an interview, if you see "rare events over many trials", think Poisson first.
Rule of thumb: the approximation is considered safe when $n \ge 20$ and $p \le 0.05$ (or, more conservatively, $n \ge 100$ and $np \le 10$).
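A quick comparison of the two pmfs under the rare-event regime (illustrative parameters):

```python
from scipy import stats

n, p = 1000, 0.001  # many trials, rare event → lambda = np = 1
binom = stats.binom(n, p)
pois = stats.poisson(mu=n * p)

for k in range(4):
    # The two pmfs agree to roughly 3-4 decimal places
    print(k, round(binom.pmf(k), 4), round(pois.pmf(k), 4))
```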
Normal (Gaussian) Distribution
The most important distribution in statistics, defined by mean $\mu$ and variance $\sigma^2$:
$f(x) = \dfrac{1}{\sigma\sqrt{2\pi}} \exp\left(-\dfrac{(x-\mu)^2}{2\sigma^2}\right)$
- $E[X] = \mu$, $\mathrm{Var}(X) = \sigma^2$
- The 68-95-99.7 rule: approximately 68% of data falls within $\mu \pm \sigma$, 95% within $\mu \pm 2\sigma$, and 99.7% within $\mu \pm 3\sigma$
Standard Normal: $Z \sim N(0, 1)$. Any normal can be standardized via $Z = \dfrac{X - \mu}{\sigma}$.
Why it appears everywhere: Central Limit Theorem — whatever the population distribution, the sample mean approaches Normal as $n$ grows. This is why both the z-test and the t-test work with large samples.
Key properties (interview bonus points):
- Symmetric: mean = median = mode
- Linear combination of normals is also normal: $aX + bY \sim \mathrm{Normal}(a\mu_X + b\mu_Y,\ a^2\sigma_X^2 + b^2\sigma_Y^2)$ (if independent)
- Skewness = 0, Kurtosis = 3 (excess kurtosis = 0)
Log-Normal Distribution
If $\ln Y \sim \mathrm{Normal}(\mu, \sigma^2)$, then $Y$ follows a Log-Normal distribution.
- Right-skewed, positive-only, arises from multiplicative processes
Use case: income, stock prices, city populations, file sizes — any data generated by "many independent multiplicative factors" is a good candidate for Log-Normal modeling.
Interview judgment: Normal vs Log-Normal
If your data:
- has negative values → not Log-Normal
- is right-skewed and positive-only → consider Log-Normal
- looks symmetric after taking the log → very likely Log-Normal
Many ML targets (house prices, transaction amounts) regress better after a log transform, precisely because the raw data is closer to Log-Normal.
Exponential Distribution
Models the time between events in a Poisson process:
$f(x) = \lambda e^{-\lambda x}, \quad x \ge 0$
- $E[X] = \dfrac{1}{\lambda}$, $\mathrm{Var}(X) = \dfrac{1}{\lambda^2}$
- Memoryless property: $P(X > s + t \mid X > s) = P(X > t)$ — the remaining waiting time doesn't depend on how long you've already waited
- Exponential is the only continuous distribution with the memoryless property
Use case: time until the next customer arrives, intervals between server failures, radioactive decay.
Poisson-Exponential duality: if event counts follow Poisson($\lambda$), then the waiting times between events follow Exponential($\lambda$). These are two views of the same Poisson process: one counts "how many", the other measures "how long between".
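The duality can be demonstrated by building a Poisson process out of Exponential gaps and then counting events per unit interval ($\lambda = 3$ is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(7)
lam, t_max = 3.0, 10_000

# "Interval" view: simulate Exponential(lam) inter-arrival times...
gaps = rng.exponential(scale=1 / lam, size=int(2 * lam * t_max))
arrivals = np.cumsum(gaps)
arrivals = arrivals[arrivals < t_max]

# ..."count" view: events per unit-length interval should then be Poisson(lam)
counts, _ = np.histogram(arrivals, bins=t_max, range=(0, t_max))
print(counts.mean(), counts.var())  # both ≈ 3.0 — the Poisson signature
```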
Gamma Distribution
A generalization of the Exponential — the waiting time until the $k$-th event:
$f(x) = \dfrac{\lambda^k x^{k-1} e^{-\lambda x}}{\Gamma(k)}, \quad x \ge 0$
- $E[X] = \dfrac{k}{\lambda}$, $\mathrm{Var}(X) = \dfrac{k}{\lambda^2}$
- Exponential is the special case when $k = 1$
- Chi-squared with $\nu$ degrees of freedom is Gamma($k = \nu/2$, $\lambda = 1/2$)
Use case: the total time until the 5th customer arrives; in Bayesian analysis, the conjugate prior for the Poisson rate.
Beta Distribution
Defined on $[0, 1]$, with shape controlled by $\alpha$ and $\beta$:
$f(x) = \dfrac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha, \beta)}$
- $E[X] = \dfrac{\alpha}{\alpha + \beta}$, $\mathrm{Var}(X) = \dfrac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$
- Extremely flexible shape: $\alpha = \beta = 1$ is Uniform, $\alpha = \beta > 1$ is a symmetric bell, $\alpha \ne \beta$ is skewed
Use case: modeling probabilities or proportions (CTR, conversion rate). In Bayesian inference it is the conjugate prior for the Binomial likelihood.
Intuition: Beta($\alpha$, $\beta$) can be read as "your belief about the success rate after having seen $\alpha$ successes and $\beta$ failures".
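A minimal sketch of that intuition as a conjugate update (the prior counts and data below are made up):

```python
from scipy import stats

# Prior belief about a CTR: Beta(2, 8) — like having already seen
# 2 successes and 8 failures; prior mean = 2 / (2 + 8) = 0.2
a, b = 2, 8

# Observe 30 clicks out of 100 impressions.
# Conjugacy: posterior = Beta(a + successes, b + failures)
clicks, n = 30, 100
posterior = stats.beta(a + clicks, b + (n - clicks))

print(posterior.mean())          # (2 + 30) / (2 + 8 + 100) ≈ 0.291
print(posterior.interval(0.95))  # 95% credible interval for the CTR
```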
Uniform Distribution
All values equally likely over an interval $[a, b]$:
$f(x) = \dfrac{1}{b - a}, \quad a \le x \le b$
- $E[X] = \dfrac{a + b}{2}$, $\mathrm{Var}(X) = \dfrac{(b-a)^2}{12}$
- Maximum entropy distribution for bounded support — the least informative distribution when only the bounds are known
Use case: random number generation, modeling complete uncertainty, the ideal output of a hash function.
Relationships Between Distributions
Understanding the relationships between distributions is a strong bonus in interviews. Click any distribution to see its formula, its properties, and its relationships to the others:
Real-World Use Cases
Case 1: Credit Card Fraud Detection
Data profile: 99.9% legitimate transactions, 0.1% fraud → extremely imbalanced
- Fraud occurrences can be modeled as Poisson (average frauds per hour)
- Gaps between frauds → Exponential
- Whether a single transaction is fraudulent → Bernoulli(p = 0.001)
- Number of frauds among 100 transactions → Binomial(n=100, p=0.001) ≈ Poisson(0.1)
- Transaction amounts are usually right-skewed → Log-Normal (take logs for feature engineering)
Case 2: House Price Prediction
- Prices themselves are right-skewed → Log-Normal (log-transform the target)
- Regression assumes the residuals are Normal
- Features like floor area are roughly Normal (possibly with some skew)
- Number of sales per month → Poisson
Case 3: Customer Segmentation
- Monthly purchase counts per customer → Poisson, or Negative Binomial if overdispersed
- Prior on conversion rate → Beta
- Spending amounts → Log-Normal
- GMM (Gaussian Mixture Model) assumes each cluster is Normal
Quick Reference Table
| Distribution | Type | Parameters | Mean | Variance | Key Property |
|---|---|---|---|---|---|
| Bernoulli | Discrete | $p$ | $p$ | $p(1-p)$ | Single binary trial |
| Binomial | Discrete | $n, p$ | $np$ | $np(1-p)$ | Sum of Bernoulli trials |
| Geometric | Discrete | $p$ | $1/p$ | $(1-p)/p^2$ | Memoryless (discrete) |
| Neg. Binomial | Discrete | $r, p$ | $r(1-p)/p$ | $r(1-p)/p^2$ | Overdispersed counts |
| Poisson | Discrete | $\lambda$ | $\lambda$ | $\lambda$ | Mean = Variance |
| Normal | Continuous | $\mu, \sigma^2$ | $\mu$ | $\sigma^2$ | CLT makes it universal |
| Log-Normal | Continuous | $\mu, \sigma^2$ | $e^{\mu + \sigma^2/2}$ | $(e^{\sigma^2}-1)e^{2\mu+\sigma^2}$ | Multiplicative processes |
| Exponential | Continuous | $\lambda$ | $1/\lambda$ | $1/\lambda^2$ | Memoryless (continuous) |
| Gamma | Continuous | $k, \lambda$ | $k/\lambda$ | $k/\lambda^2$ | Sum of Exponentials |
| Beta | Continuous | $\alpha, \beta$ | $\alpha/(\alpha+\beta)$ | $\frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$ | Conjugate prior for proportions |
| Uniform | Continuous | $a, b$ | $(a+b)/2$ | $(b-a)^2/12$ | Maximum entropy (bounded) |
Hands-on: Distributions in Python
Sampling & Verifying Distribution Properties
import numpy as np
from scipy import stats
# Verify Poisson: mean ≈ variance (defining property)
samples = np.random.poisson(lam=5, size=10000)
# sample mean ≈ 5.0, sample variance ≈ 5.0 → Mean = Variance ✓
# Verify Exponential: memoryless property
# P(X > s+t | X > s) should equal P(X > t)
exp_samples = np.random.exponential(scale=0.5, size=100000)
p_conditional = np.mean(exp_samples[exp_samples > 1] > 3) # P(X>3 | X>1)
p_marginal = np.mean(exp_samples > 2) # P(X>2)
# Both ≈ 0.018 → memoryless ✓
# Verify CLT: sample means lose the skew of the source distribution
population = np.random.exponential(1, size=100000)  # heavily skewed (skewness ≈ 2)
sample_means = [np.random.choice(population, 50).mean() for _ in range(5000)]
# Skewness shrinks by ≈ 1/sqrt(n): from ≈ 2 to ≈ 0.28, and keeps falling as n grows.
# (A formal normality test on 5000 means would still detect this small residual
# skew, so compare skewness directly instead of relying on a p-value.)
print(stats.skew(population), stats.skew(sample_means))
Fitting Distributions to Data
from scipy import stats
import numpy as np
# Transaction amounts (right-skewed, positive-only → try Log-Normal)
amounts = np.random.lognormal(mean=3, sigma=1, size=1000)
# Fit log-normal
shape, loc, scale = stats.lognorm.fit(amounts, floc=0)
# KS test: does the data fit the distribution?
ks_stat, p_value = stats.kstest(amounts, "lognorm", args=(shape, loc, scale))
# p > 0.05 → cannot reject that data follows log-normal ✓
# Compare: if you take log, it should look normal
log_amounts = np.log(amounts)
_, p_normal = stats.normaltest(log_amounts)
# p > 0.05 → log(amounts) is normally distributed ✓
Explore Distributions
Use this interactive tool to build intuition about how distribution shapes change with parameters. Try adjusting $\mu$ and $\sigma$ for the Normal, or $\lambda$ for the Exponential, and toggle between PDF and CDF:
Distribution Explorer
Interview Signals
What interviewers listen for:
- You choose the right distribution from the problem scenario, instead of blindly applying formulas
- You understand each distribution's assumptions (e.g. Poisson requires independent events and a constant rate)
- You can explain the relationships between distributions (Bernoulli → Binomial → Normal; Poisson ↔ Exponential)
- You know when to use approximations (Binomial → Poisson, Binomial → Normal)
- You can distinguish correlation from independence
- You know that a PDF value is not a probability
Practice
Flashcards
What special property makes the Poisson distribution instantly recognizable?
Mean equals variance (mean = variance = λ). If a dataset's sample mean and sample variance are close, Poisson is a strong candidate. If the variance is clearly larger than the mean (overdispersion), consider Negative Binomial instead.
Quiz
A website receives an average of 5 complaints per hour. You want the probability of receiving 0 complaints in a given hour — which distribution should you use?