Recommendation Systems

Interview Context

Recommendation systems are a high-frequency system design question in FAANG interviews. Interviewers don't just ask about the algorithm; they ask how you handle cold start, how you evaluate, and how you get from offline metrics to online impact. You need to think from a product angle, not just a technical one.

What You Should Understand

  • Compare where collaborative filtering, content-based, and hybrid approaches each apply
  • Understand the math of matrix factorization and how implicit feedback differs
  • Know concrete solutions to the cold start problem
  • Design an offline + online evaluation strategy
  • Understand production issues such as feedback loops, position bias, and popularity bias

Overview

The Three Paradigms

| Paradigm | Input | Idea | Example |
| --- | --- | --- | --- |
| Collaborative Filtering (CF) | User-item interactions | Things liked by people similar to you, you may like too | "Users who bought X also bought Y" |
| Content-Based | Item/user features | Recommend items whose features resemble items you liked before | "Because you watched sci-fi movies..." |
| Hybrid | Both | Combine the strengths of both | Netflix, Spotify, Amazon |

Collaborative Filtering

User-Based CF

Find users similar to you, recommend items they liked:

  1. Compute similarity between users (based on their rating patterns)
  2. Find the top-k most similar users to the target user
  3. Recommend items those similar users rated highly but the target user hasn't seen

\hat{r}_{ui} = \bar{r}_u + \frac{\sum_{v \in N(u)} \text{sim}(u, v) \cdot (r_{vi} - \bar{r}_v)}{\sum_{v \in N(u)} |\text{sim}(u, v)|}
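The prediction formula above can be sketched in a few lines of numpy. This is a toy, dense-matrix version (0 means "unrated", cosine similarity on raw rating vectors; function and variable names are illustrative):

```python
import numpy as np

def predict_user_based(R, u, i, k=2):
    """Mean-centered user-based CF prediction of r_ui.

    R: dense user-item rating matrix, 0 = unrated.
    """
    rated = R > 0
    # Per-user mean over rated items only
    means = R.sum(axis=1) / np.maximum(rated.sum(axis=1), 1)

    # Cosine similarity between user u and every user
    norms = np.linalg.norm(R, axis=1)
    sims = R @ R[u] / np.maximum(norms * norms[u], 1e-12)

    # Neighborhood N(u): the top-k most similar users who rated item i
    cands = [v for v in range(R.shape[0]) if v != u and R[v, i] > 0]
    neigh = sorted(cands, key=lambda v: -sims[v])[:k]

    num = sum(sims[v] * (R[v, i] - means[v]) for v in neigh)
    den = sum(abs(sims[v]) for v in neigh)
    return means[u] + (num / den if den > 0 else 0.0)
```

A real system would compute similarities on co-rated items only and cache the neighborhoods; this sketch just mirrors the formula term by term, falling back to the user's mean when no neighbor rated the item.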

Item-Based CF

Find items similar to items the user already liked:

  1. Compute similarity between items (based on who rated them similarly)
  2. For each item the user liked, find similar items
  3. Recommend the most similar items the user hasn't seen

\hat{r}_{ui} = \frac{\sum_{j \in N(i)} \text{sim}(i, j) \cdot r_{uj}}{\sum_{j \in N(i)} |\text{sim}(i, j)|}

User-Based vs Item-Based

| Aspect | User-Based | Item-Based |
| --- | --- | --- |
| Similarity matrix | n_users × n_users | n_items × n_items |
| Stability | Users change preferences → unstable | Item similarities are more stable |
| Cold start | New user problem (no history) | New item problem (no ratings) |
| Scalability | Usually users >> items → larger matrix | Fewer items → smaller, more cacheable matrix |
| Industry preference | Rarely used (early Amazon) | More common (Amazon's "item-to-item CF") |

Item-based CF is more common in practice: item similarity is more stable (a movie's characteristics don't change, but a user's preferences do), the computation is cheaper (there are usually fewer items than users), and the similarity matrix can be pre-computed.

Similarity Metrics

Cosine Similarity: \cos(\mathbf{u}, \mathbf{v}), the normalized dot product, range [-1, 1]. Insensitive to vector magnitude (the overall rating scale).

Pearson Correlation: mean-centered cosine. Subtract each user's mean rating first, which removes rating bias (some users rate everything high, others are strict).

Jaccard Similarity: J(A, B) = |A ∩ B| / |A ∪ B|. Looks only at whether an interaction happened, not at the rating value. Well suited to binary implicit feedback.

Cosine vs Pearson

Cosine does not center the data, so on non-negative ratings it is dominated by overall rating levels: a user who rates everything 4-5 and one who rates everything 1-2 can look similar even when their preference patterns differ. Pearson subtracts each user's mean first, capturing the pattern rather than the absolute level. Rule of thumb: Pearson for explicit ratings, cosine for implicit feedback.
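A small numeric sketch of why centering matters: two users whose preference patterns are opposite but whose rating levels differ (the numbers are illustrative):

```python
import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def pearson(a, b):
    # Pearson correlation = cosine similarity after mean-centering
    return cosine(a - a.mean(), b - b.mean())

# A rates generously, B harshly, and their preference patterns are opposite
a = np.array([5.0, 4.0, 5.0, 4.0])
b = np.array([1.0, 2.0, 1.0, 2.0])

print(cosine(a, b))   # ~0.91: high, dominated by the positive rating levels
print(pearson(a, b))  # -1.0: the preference patterns are perfectly opposite
```

Uncentered cosine calls these users highly similar; Pearson recovers the fact that they disagree on every item.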

Content-Based Filtering

Recommend items similar to what the user has liked, based on item features:

\text{score}(u, i) = \text{sim}(\text{user\_profile}(u), \text{item\_features}(i))

Item Features

| Domain | Features |
| --- | --- |
| Movies | Genre, director, actors, keywords, release year |
| E-commerce | Category, brand, price range, description (TF-IDF) |
| News | Topic, entities, source, recency |
| Music | Genre, tempo, energy, artist, audio features |

User Profile

Aggregate features of items the user has interacted with:

\text{user\_profile}(u) = \frac{\sum_{i \in \text{liked}(u)} \text{features}(i)}{|\text{liked}(u)|}

Or use a weighted average (items with higher ratings get more weight).
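The profile construction above, as a short numpy sketch (the rating-weighted variant; all names are illustrative):

```python
import numpy as np

def user_profile(item_features, ratings):
    """Rating-weighted average of the feature vectors of items the user rated.

    item_features: (n_items, d) matrix; ratings: {item_id: rating}.
    """
    ids = np.array(list(ratings))
    w = np.array([ratings[i] for i in ids], dtype=float)
    return (w[:, None] * item_features[ids]).sum(axis=0) / w.sum()

def score(profile, item_vec):
    # Cosine similarity between the user profile and a candidate item
    return profile @ item_vec / (
        np.linalg.norm(profile) * np.linalg.norm(item_vec) + 1e-12)

features = np.array([[1.0, 0.0],   # item 0: pure genre A
                     [0.0, 1.0],   # item 1: pure genre B
                     [1.0, 1.0]])  # item 2: both genres
profile = user_profile(features, {0: 5.0, 1: 1.0})  # loves A, lukewarm on B
```

Scoring the candidate items against this profile ranks genre-A content first, which is exactly the (and only the) behavior content-based filtering can express.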

Content-Based vs CF

| Aspect | Content-Based | Collaborative Filtering |
| --- | --- | --- |
| Needs user history | Yes (past items) | Yes (ratings/interactions) |
| Needs item features | Yes | No |
| Cold start (new item) | Handles it well (features suffice to recommend) | Can't recommend (no ratings yet) |
| Cold start (new user) | Can't recommend (no history) | Can't recommend (no ratings) |
| Serendipity | Low (only similar items → filter bubble) | High (similar users may have diverse tastes) |
| Domain knowledge | Required (need good features) | Not required |

Filter Bubble Problem

A content-based system only recommends items similar to what the user already liked, so the user gets trapped in an echo chamber. Fixes: add a diversity penalty, exploration (random recommendations), or hybridize with CF (to introduce serendipity).
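One standard form of diversity penalty is maximal marginal relevance (MMR) re-ranking: greedily pick the item that best trades off relevance against similarity to the items already selected. A sketch (the names and the `lam` weighting are illustrative):

```python
import numpy as np

def mmr_rerank(scores, item_sim, k, lam=0.7):
    """Greedy MMR: lam * relevance - (1 - lam) * max similarity to the
    already-selected items. lam=1 reduces to pure relevance ranking."""
    selected = []
    candidates = list(range(len(scores)))
    while candidates and len(selected) < k:
        def mmr(i):
            penalty = max((item_sim[i, j] for j in selected), default=0.0)
            return lam * scores[i] - (1 - lam) * penalty
        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)
    return selected

scores = np.array([1.0, 0.95, 0.5])        # items 0 and 1 are near-duplicates
item_sim = np.array([[1.0, 0.99, 0.1],
                     [0.99, 1.0, 0.1],
                     [0.1, 0.1, 1.0]])
reranked = mmr_rerank(scores, item_sim, k=2, lam=0.5)
```

With `lam=0.5` the near-duplicate item 1 is penalized and the dissimilar item 2 is surfaced instead; with `lam=1.0` the ranking is by relevance alone.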

Matrix Factorization

The Idea

Decompose the sparse user-item rating matrix \mathbf{R} (m \times n) into two low-rank matrices:

\mathbf{R} \approx \mathbf{P} \mathbf{Q}^\top

where \mathbf{P} (m \times k) = user latent factors, \mathbf{Q} (n \times k) = item latent factors, k \ll \min(m, n).

Predicted rating:

\hat{r}_{ui} = \mathbf{p}_u^\top \mathbf{q}_i = \sum_{f=1}^{k} p_{uf} \cdot q_{if}

Each latent factor represents a hidden dimension (e.g. factor 1 = "amount of action", factor 2 = "depth"), but in practice the factors are usually not directly interpretable.

SVD-based Methods

Vanilla SVD: \mathbf{R} = \mathbf{U} \boldsymbol{\Sigma} \mathbf{V}^\top

Problem: \mathbf{R} has many missing values, and standard SVD cannot be applied directly to an incomplete matrix.

Regularized SVD (Funk SVD): Only minimize loss on observed ratings:

\min_{\mathbf{P}, \mathbf{Q}} \sum_{(u,i) \in \text{observed}} (r_{ui} - \mathbf{p}_u^\top \mathbf{q}_i)^2 + \lambda(\|\mathbf{p}_u\|^2 + \|\mathbf{q}_i\|^2)

Optimize with SGD or ALS (Alternating Least Squares).

Adding Biases

Users and items have inherent biases (some users rate everything high; some items are simply better):

\hat{r}_{ui} = \mu + b_u + b_i + \mathbf{p}_u^\top \mathbf{q}_i
  • \mu: global average rating
  • b_u: user bias (relative to the global average)
  • b_i: item bias
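To make the objective concrete, here is a toy SGD loop for Funk SVD with biases. All hyperparameters and names are illustrative; real systems use tuned, vectorized or distributed implementations:

```python
import numpy as np

def funk_svd(ratings, n_users, n_items, k=8, lr=0.02, reg=0.05,
             epochs=200, seed=0):
    """SGD on observed (u, i, r) triples: r_hat = mu + b_u + b_i + p_u . q_i."""
    rng = np.random.default_rng(seed)
    P = rng.normal(0, 0.1, (n_users, k))
    Q = rng.normal(0, 0.1, (n_items, k))
    bu = np.zeros(n_users)
    bi = np.zeros(n_items)
    mu = np.mean([r for _, _, r in ratings])  # global average rating
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - (mu + bu[u] + bi[i] + P[u] @ Q[i])
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            # Both factor updates read the pre-update values
            P[u], Q[i] = (P[u] + lr * (err * Q[i] - reg * P[u]),
                          Q[i] + lr * (err * P[u] - reg * Q[i]))
    return mu, bu, bi, P, Q

def predict(params, u, i):
    mu, bu, bi, P, Q = params
    return mu + bu[u] + bi[i] + P[u] @ Q[i]
```

Each update is the gradient step of the regularized squared error above, with the bias terms learned jointly with the factors.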

ALS (Alternating Least Squares)

ALS algorithm:

  1. Fix \mathbf{Q}, solve for \mathbf{P} (closed-form linear regression per user)
  2. Fix \mathbf{P}, solve for \mathbf{Q} (closed-form per item)
  3. Repeat until convergence

ALS's advantage: every step is closed-form, so it parallelizes well (Spark MLlib uses ALS). SGD is sequential and harder to parallelize.
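A dense toy sketch of the two alternating steps: each one is an independent ridge-regression solve per user or per item, which is exactly what makes ALS easy to parallelize (names are illustrative):

```python
import numpy as np

def als_explicit(R, k=4, reg=0.1, iters=20, seed=0):
    """ALS on the observed entries of R (0 = unrated): alternate
    closed-form ridge solves for user rows P and item rows Q."""
    rng = np.random.default_rng(seed)
    m, n = R.shape
    P = rng.normal(0, 0.1, (m, k))
    Q = rng.normal(0, 0.1, (n, k))
    for _ in range(iters):
        for u in range(m):                      # fix Q, solve each user
            idx = np.nonzero(R[u])[0]
            if idx.size:
                A = Q[idx].T @ Q[idx] + reg * np.eye(k)
                P[u] = np.linalg.solve(A, Q[idx].T @ R[u, idx])
        for i in range(n):                      # fix P, solve each item
            idx = np.nonzero(R[:, i])[0]
            if idx.size:
                A = P[idx].T @ P[idx] + reg * np.eye(k)
                Q[i] = np.linalg.solve(A, P[idx].T @ R[idx, i])
    return P, Q
```

The per-user and per-item loops have no dependencies on each other within a half-step, so a distributed implementation can solve all of them concurrently.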

Implicit Feedback

Most recommendation settings have no explicit ratings, only implicit signals: clicks, views, purchases, time spent.

Key differences from explicit feedback:

| Aspect | Explicit (ratings) | Implicit (clicks/views) |
| --- | --- | --- |
| Signal | Clear preference (1-5 stars) | Ambiguous (a click ≠ a like) |
| Missing data | Unknown preference | Could mean "not interested" or "not seen" |
| Negative signal | Low rating | Absence of interaction (but the item may simply never have been seen) |
| Volume | Sparse | More abundant |

Weighted ALS for implicit feedback (Hu et al., 2008):

\min_{\mathbf{P}, \mathbf{Q}} \sum_{u,i} c_{ui}(p_{ui} - \mathbf{p}_u^\top \mathbf{q}_i)^2 + \lambda(\|\mathbf{P}\|^2 + \|\mathbf{Q}\|^2)
  • p_{ui} = 1 if the user interacted with the item, 0 otherwise (binary preference)
  • c_{ui} = 1 + \alpha \cdot f(r_{ui}) (confidence: more interactions mean higher confidence)
from implicit.als import AlternatingLeastSquares

# CSR matrix of user-item interactions (implicit)
model = AlternatingLeastSquares(factors=50, regularization=0.1, iterations=20)
model.fit(user_item_matrix)

# Recommend for user 42
ids, scores = model.recommend(42, user_item_matrix[42], N=10)

Deep Learning for Recommendations

Two-Tower Model

Modern large-scale recommendation systems commonly use a two-tower architecture:

User tower: Encode user features → user embedding u\mathbf{u}

Item tower: Encode item features → item embedding v\mathbf{v}

\text{score}(u, i) = \mathbf{u}^\top \mathbf{v} \quad \text{or} \quad \sigma(\mathbf{u}^\top \mathbf{v})

Because the two towers only interact at the final dot product, item embeddings can be pre-computed and indexed, and ANN (Approximate Nearest Neighbor) search then gives millisecond-level retrieval.
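The serving pattern can be sketched as follows. Here a brute-force dot-product scan stands in for the ANN index (FAISS, ScaNN, etc. would replace it at scale), and the embeddings are random placeholders for the tower outputs:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for item-tower output, computed offline over the whole catalog
item_embs = rng.normal(size=(10_000, 64)).astype(np.float32)

def retrieve(user_emb, item_embs, n=100):
    """Top-n items by dot-product score: a brute-force scan standing in
    for the ANN index used at production scale."""
    scores = item_embs @ user_emb
    top = np.argpartition(-scores, n)[:n]       # unordered top-n
    return top[np.argsort(-scores[top])]        # ...sorted best-first

# Stand-in for user-tower output, computed at request time
user_emb = rng.normal(size=64).astype(np.float32)
candidates = retrieve(user_emb, item_embs, n=100)
```

The key property is that only the user embedding is computed per request; the expensive item side is entirely precomputed, which is what makes candidate generation over millions of items feasible.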

Multi-Stage Architecture

Production recommendation systems are usually a multi-stage pipeline:

| Stage | Purpose | Candidates | Model complexity |
| --- | --- | --- | --- |
| Candidate generation | Quickly narrow millions of items down to ~1000 candidates | 1M → 1K | Simple (two-tower, ANN) |
| Ranking | Precisely rank the ~1000 candidates | 1K → 100 | Complex (deep model, many features) |
| Re-ranking | Business rules, diversity, freshness | 100 → 10 | Rules + optimization |

RecSys Design in Interviews

When the interviewer asks you to "design a recommendation system", don't just talk about the algorithm. Describe the multi-stage pipeline: candidate generation (fast, recall-oriented) → ranking (accurate, precision-oriented) → re-ranking (diversity, business rules). Then discuss which model to use at each stage.

Cold Start Problem

Types

| Type | Problem | Solutions |
| --- | --- | --- |
| New user | No interaction history, so CF cannot work | Content-based (user demographics), popularity-based, onboarding quiz |
| New item | No ratings, so CF cannot recommend it | Content-based (item features), editorial curation, explore-exploit |
| New system | Both user and item data are sparse | Start with content-based + popular items, gradually transition to CF |

Solutions in Practice

For new users:

  • Show popular / trending items (non-personalized, a safe baseline)
  • Onboarding: ask for preferences or import them from other platforms
  • Content-based on demographic features (age, location → similar users' preferences)
  • Contextual bandits: explore user preferences while serving recommendations

For new items:

  • Inject new items into a small % of recommendations (exploration)
  • Use item features + content-based similarity to existing items
  • Editorial / curated placement
  • Time-based boost(new items get temporary ranking lift)

Evaluation

Offline Metrics

| Metric | What it measures | Note |
| --- | --- | --- |
| RMSE | Rating prediction accuracy | Only for explicit feedback |
| Precision@K | Relevant items in top-K / K | How many of the top-K recommendations are relevant |
| Recall@K | Relevant items in top-K / total relevant | How many of all relevant items were recommended |
| NDCG@K | Ranking quality with position discount | Accounts for both relevance level and position |
| Hit Rate@K | % of users with ≥1 relevant item in top-K | The simplest recommendation success metric |
| MAP | Average precision across users | Averages the precision at each relevant item |
| Coverage | % of items ever recommended | Low = only popular items get recommended |
| Diversity | Intra-list dissimilarity | High = recommended items differ from one another |
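Minimal binary-relevance implementations of three of these metrics, assuming a ranked list of item ids and a set of relevant ids (names are illustrative):

```python
import numpy as np

def precision_at_k(ranked, relevant, k):
    return len(set(ranked[:k]) & relevant) / k

def recall_at_k(ranked, relevant, k):
    return len(set(ranked[:k]) & relevant) / len(relevant)

def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG: DCG with a log2 position discount,
    normalized by the ideal DCG (all relevant items ranked first)."""
    dcg = sum(1.0 / np.log2(pos + 2)
              for pos, item in enumerate(ranked[:k]) if item in relevant)
    ideal = sum(1.0 / np.log2(pos + 2)
                for pos in range(min(len(relevant), k)))
    return dcg / ideal

ranked = [3, 1, 7, 5]   # model's ranking, best first
relevant = {1, 5}       # ground-truth relevant items
```

Note how NDCG distinguishes *where* a relevant item lands while precision/recall only count membership in the top-K.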

Offline vs Online Gap

Good offline metrics do not guarantee good online metrics:

| Reason | Example |
| --- | --- |
| Selection bias | Offline evaluation only covers items that were actually recommended (missing not at random) |
| Feedback loop | Items the model recommends get more clicks, reinforcing themselves |
| Position bias | Users click higher-ranked items regardless of relevance |
| Context | Offline evaluation cannot capture a user's real-time intent |

Online metrics (A/B test): CTR, conversion rate, revenue per session, time spent, user retention.

Offline NDCG ↑ ≠ Online CTR ↑

Always use an A/B test to verify that an offline improvement translates online. A common failure mode: offline NDCG improves 5% but online CTR is flat, because the offline evaluation data has selection bias, or the model only improved tail items (positions users rarely see).

Production Challenges

Popularity Bias

The model tends to recommend already-popular items (popular items have more interactions in the training data), which makes them even more popular, while long-tail items never get a chance.

Solutions: Inverse propensity scoring, exploration budget, diversity penalty in re-ranking.

Position Bias

Users click items partly because they're ranked higher, not because they're more relevant.

Items at higher positions in the training data receive more clicks, so the model learns that high position = good, which is self-reinforcing.

Solutions: Inverse propensity weighting on position, random shuffle experiments to collect unbiased data.
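A minimal sketch of inverse propensity weighting: each click is reweighted by the inverse of the estimated probability that its position was examined at all. The propensities below are made-up numbers; in practice they are estimated from randomized-ranking experiments:

```python
import numpy as np

# Estimated probability that a user examines each position
# (illustrative numbers; estimate these from randomization experiments)
propensity = np.array([0.9, 0.6, 0.4, 0.25, 0.15])

def ipw_click_weights(click_positions):
    """Weight for each click = 1 / examination propensity of its position,
    so clicks at low positions count more and the position bias cancels
    in expectation when these weights are used in the training loss."""
    return 1.0 / propensity[np.asarray(click_positions)]

weights = ipw_click_weights([0, 4])  # a click at rank 1 vs a click at rank 5
```

A click at rank 5 (rarely examined) becomes strong evidence of relevance, while a click at rank 1 is discounted accordingly.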

Feedback Loop

Model predictions → affect what users see → affect user behavior → affect training data → affect model → ...

This loop makes the model's mistakes self-reinforcing.

Solutions: regular exploration (serve a small random fraction of items), counterfactual evaluation, periodic retraining on unbiased data.

Real-World Use Cases

Case 1: E-commerce Recommendations

| Component | Choice | Reasoning |
| --- | --- | --- |
| Candidate gen | Item-based CF + popularity | Quickly select ~1000 from millions of products |
| Ranking | GBM with user × item features | Purchase probability + revenue optimization |
| Features | Browsing history, purchase history, price sensitivity, category affinity | |
| Cold start (new user) | Popular items + category-based | Collect preferences during onboarding |
| Cold start (new item) | Content-based (category, brand) + explore | Boost new products at launch + exploration budget |
| Online metric | Revenue per session, conversion rate | Not just CTR: clicks without purchases have no business value |

Case 2: Video/Music Streaming

| Component | Choice | Reasoning |
| --- | --- | --- |
| Candidate gen | Two-tower (user embed + item embed) | ANN retrieval over millions of content items |
| Ranking | Deep model with sequence features | Watch history → predict the next item |
| Features | Watch time, completion rate, genre, audio features, time of day | |
| Cold start | Content-based (metadata) + editorial | New content relies on metadata and editorial picks |
| Key challenge | Engagement vs satisfaction: finishing a video ≠ liking it (binge-watch guilt) | |
| Online metric | Watch time, retention, survey satisfaction | |

Case 3: Social Media Feed

| Component | Choice | Reasoning |
| --- | --- | --- |
| Candidate gen | Friends' activities + interest graph | Social signal + topic affinity |
| Ranking | Multi-objective: engagement + quality + diversity | |
| Features | Post features, author features, social graph, user engagement history | |
| Key challenge | Filter bubble: users only see content that matches their existing views | |
| Guardrails | Content diversity, quality score, misinformation filter | |
| Online metric | DAU, time spent, meaningful social interactions | |

Hands-on: Recommendation in Python

Item-Based CF with Cosine Similarity

from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

# user_item_matrix: rows=users, cols=items, values=ratings
item_sim = cosine_similarity(user_item_matrix.T)  # items × items

def recommend(user_id, user_item_matrix, item_sim, top_k=10):
    user_ratings = user_item_matrix[user_id]
    scores = item_sim.dot(user_ratings)  # weighted sum of similarities
    # Zero out already-rated items
    scores[user_ratings > 0] = 0
    return np.argsort(scores)[::-1][:top_k]

Matrix Factorization with ALS

from implicit.als import AlternatingLeastSquares
from scipy.sparse import csr_matrix

# Implicit feedback: user_item_matrix (sparse, values = interaction counts)
sparse_matrix = csr_matrix(user_item_matrix)

model = AlternatingLeastSquares(
    factors=64,
    regularization=0.05,
    iterations=30,
)
model.fit(sparse_matrix)

# Recommend for user 0
recommended_ids, scores = model.recommend(0, sparse_matrix[0], N=10)

# Similar items to item 42
similar_ids, scores = model.similar_items(42, N=5)

Two-Tower with PyTorch

import torch
import torch.nn as nn

class TwoTowerModel(nn.Module):
    def __init__(self, n_users, n_items, embed_dim=64):
        super().__init__()
        self.user_tower = nn.Sequential(
            nn.Embedding(n_users, embed_dim),
            nn.Linear(embed_dim, embed_dim),
            nn.ReLU(),
            nn.Linear(embed_dim, embed_dim),
        )
        self.item_tower = nn.Sequential(
            nn.Embedding(n_items, embed_dim),
            nn.Linear(embed_dim, embed_dim),
            nn.ReLU(),
            nn.Linear(embed_dim, embed_dim),
        )

    def forward(self, user_ids, item_ids):
        user_emb = self.user_tower(user_ids)    # [B, embed_dim]
        item_emb = self.item_tower(item_ids)    # [B, embed_dim]
        return (user_emb * item_emb).sum(dim=1) # dot product → score

Interview Signals

What interviewers listen for:

  • You can describe the multi-stage pipeline (candidate gen → ranking → re-ranking)
  • You know the tradeoffs between CF and content-based methods, and each one's cold start problem
  • You can explain the difference between implicit and explicit feedback and how to handle each
  • You know that offline NDCG ≠ online CTR, plus production issues such as feedback loops and position bias
  • You can design an evaluation strategy that fits the business context

Practice

Flashcards

Q: Which is more common in production, user-based or item-based CF? Why?

A: Item-based CF. Reasons: (1) item similarity is more stable than user similarity (an item's characteristics don't change, a user's preferences do), (2) there are usually fewer items than users, so the similarity matrix is smaller and more cacheable, (3) it can be pre-computed. Amazon's "item-to-item CF" is the classic example.

Quiz

What is the biggest advantage of content-based filtering?
