CUHK-STAT3009 Quiz 2

Name (Print): ______________               Student ID: ________________________



Problem 1 (Baseline Methods)

Given the glb_mean and user_mean methods below (exactly the same as the code on the course GitHub):

```python
import numpy as np

class glb_mean(object):
    def __init__(self):
        self.glb_mean = 0

    def fit(self, train_rating):
        self.glb_mean = np.mean(train_rating)

    def predict(self, test_pair):
        pred = np.ones(len(test_pair))
        pred = pred * self.glb_mean
        return pred

class user_mean(object):
    def __init__(self, n_user):
        self.n_user = n_user
        self.glb_mean = 0.
        self.user_mean = np.zeros(n_user)

    def fit(self, train_pair, train_rating):
        self.glb_mean = train_rating.mean()
        for u in range(self.n_user):
            ind_train = np.where(train_pair[:, 0] == u)[0]
            if len(ind_train) == 0:
                # unseen user: fall back to the global mean
                self.user_mean[u] = self.glb_mean
            else:
                self.user_mean[u] = train_rating[ind_train].mean()

    def predict(self, test_pair):
        pred = np.ones(len(test_pair)) * self.glb_mean
        j = 0
        for row in test_pair:
            user_tmp, item_tmp = row[0], row[1]
            pred[j] = self.user_mean[user_tmp]
            j = j + 1
        return pred
```
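For concreteness, here is a minimal usage sketch of the two classes on toy data (the arrays are illustrative; columns of each pair array are (user_id, item_id)):

```python
import numpy as np

train_pair = np.array([[0, 0], [0, 1], [1, 0], [2, 1]])
train_rating = np.array([4.0, 5.0, 3.0, 2.0])
test_pair = np.array([[0, 1], [1, 1], [2, 0]])

glb_ave = glb_mean()
glb_ave.fit(train_rating)
print(glb_ave.predict(test_pair))   # [3.5 3.5 3.5]: the global mean

user_ave = user_mean(n_user=3)
user_ave.fit(train_pair=train_pair, train_rating=train_rating)
print(user_ave.predict(test_pair))  # [4.5 3. 2.]: per-user means
```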

Given a training dataset, suppose we consider three baseline methods:

```python
### Method A: user mean
user_ave = user_mean(n_user=n_user)
user_ave.fit(train_pair=train_pair, train_rating=train_rating)
pred_A = user_ave.predict(test_pair)

### Method B: glb mean + user mean
glb_ave = glb_mean()
glb_ave.fit(train_rating)
pred = glb_ave.predict(test_pair)
train_rating_cm = train_rating - glb_ave.predict(train_pair)
user_ave = user_mean(n_user=n_user)
user_ave.fit(train_pair=train_pair, train_rating=train_rating_cm)
pred_B = pred + user_ave.predict(test_pair)

### Method C: user mean + glb mean
user_ave = user_mean(n_user=n_user)
user_ave.fit(train_pair=train_pair, train_rating=train_rating)
pred = user_ave.predict(test_pair)
train_rating_cm = train_rating - user_ave.predict(train_pair)
glb_ave = glb_mean()
glb_ave.fit(train_rating_cm)
pred_C = pred + glb_ave.predict(test_pair)
```
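Assuming the Method A/B/C code above has been run, the three prediction vectors can be compared elementwise; a minimal numpy sketch:

```python
import numpy as np

# Elementwise comparison (up to floating-point tolerance) of the
# predictions produced by Methods A, B, and C above.
print("A == B:", np.allclose(pred_A, pred_B))
print("A == C:", np.allclose(pred_A, pred_C))
print("B == C:", np.allclose(pred_B, pred_C))
```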

1.1. In general, which of the following statements is correct?

- 🔲 pred_A == pred_B ≠ pred_C

- 🔲 pred_A == pred_B == pred_C

- 🔲 pred_A ≠ pred_B == pred_C

- 🔲 pred_A ≠ pred_B ≠ pred_C

- 🔲 pred_A == pred_C ≠ pred_B


Problem 2 (LFM)

2.1. Consider an LFM including all users $(1, \cdots, n)$ and items $(1, \cdots, m)$:

$$(\widehat{\mathbf{P}}, \widehat{\mathbf{Q}}) = \text{argmin}_{P, Q} \ \frac{1}{|\Omega|} \sum_{(u,i) \in \Omega} \big( r_{ui} - \mathbf{p}_u^T \mathbf{q}_i \big)^2 + \lambda \sum_{u=1}^n \| \mathbf{p}_u \|_2^2 + \lambda \sum_{i=1}^m \| \mathbf{q}_i \|_2^2$$
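For reference, this objective is straightforward to evaluate in numpy; a minimal sketch with illustrative names, where P is an n × K matrix, Q is an m × K matrix, and pairs is an |Ω| × 2 integer array of (user, item) indices:

```python
import numpy as np

def lfm_objective(P, Q, pairs, ratings, lam):
    # Squared error over observed (u, i) pairs: r_ui - p_u^T q_i.
    resid = ratings - np.sum(P[pairs[:, 0]] * Q[pairs[:, 1]], axis=1)
    mse = np.mean(resid ** 2)
    # l2 penalties on all user and item factors.
    penalty = lam * (np.sum(P ** 2) + np.sum(Q ** 2))
    return mse + penalty
```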

What are the predicted ratings for cold-start users/items based on $(\widehat{\mathbf{P}}, \widehat{\mathbf{Q}})$?

- 🔲 Random initialization

- 🔲 Zero

- 🔲 User mean

- 🔲 Item mean

- 🔲 Global mean

- 🔲 Infinity


Consider an LFM:

$$(1) \quad (\widehat{\mathbf{P}}, \widehat{\mathbf{Q}}) = \text{argmin}_{P, Q} \frac{1}{|\Omega|} \sum_{(u,i) \in \Omega} \big( r_{ui} - \mathbf{p}_u^T \mathbf{q}_i \big)^2$$

In Step 1 (???), by fixing $\mathbf{P}$, we solve $\mathbf{Q}$ as

$$(2) \quad \min_{Q} \frac{1}{|\Omega|} \sum_{(u,i) \in \Omega} \big( r_{ui} - \mathbf{p}_u^T \mathbf{q}_i \big)^2$$

Then, by fixing $\mathbf{Q}$, we solve $\mathbf{P}$ as

$$(3) \quad \min_{P} \frac{1}{|\Omega|} \sum_{(u,i) \in \Omega} \big( r_{ui} - \mathbf{p}_u^T \mathbf{q}_i \big)^2.$$

In Step 2 (???), (2) and (3) reduce to the item-wise and user-wise problems

$$(4) \quad \widehat{\mathbf{q}}_i = \text{argmin}_{\mathbf{q}_i} \frac{1}{|\Omega|} \sum_{u \in \mathcal{U}_i} \big( r_{ui} - \mathbf{p}_u^T \mathbf{q}_i \big)^2, \quad \text{for } i = 1, \cdots, m;$$

$$(5) \quad \widehat{\mathbf{p}}_u = \text{argmin}_{\mathbf{p}_u} \frac{1}{|\Omega|} \sum_{i \in \mathcal{I}_u} \big( r_{ui} - \mathbf{p}_u^T \mathbf{q}_i \big)^2, \quad \text{for } u = 1, \cdots, n;$$

where $\mathcal{U}_i$ denotes the users who rated item $i$, and $\mathcal{I}_u$ the items rated by user $u$.
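For reference, each sub-problem in (4) is an ordinary least-squares fit in $\mathbf{q}_i$ with the user factors held fixed; a minimal numpy sketch (names illustrative):

```python
import numpy as np

def update_item_factors(P, Q, pairs, ratings):
    # One sweep of (4): for each item i, fit q_i by least squares
    # against the fixed user factors of the users who rated i.
    for i in range(Q.shape[0]):
        idx = np.where(pairs[:, 1] == i)[0]   # observations for item i
        if len(idx) == 0:
            continue                          # cold-start item: leave q_i unchanged
        X = P[pairs[idx, 0]]                  # |U_i| x K matrix of fixed p_u's
        Q[i] = np.linalg.lstsq(X, ratings[idx], rcond=None)[0]
    return Q
```

The per-user update (5) is symmetric, with the roles of $\mathbf{P}$ and $\mathbf{Q}$ (and of the pair columns) swapped.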

2.2. In Step 1, (???) stands for

- 🔲 Linear regression

- 🔲 Alternating Least Squares (ALS)

- 🔲 Separability of the objective function

- 🔲 (Stochastic) Gradient Descent


2.3. In Step 2, (???) stands for

- 🔲 Linear regression

- 🔲 Alternating Least Squares (ALS)

- 🔲 Separability of the objective function

- 🔲 (Stochastic) Gradient Descent


Problem 3 (Cross Validation)

3.1. Which of the following cases indicates over-fitting?

- 🔲 Low Testing Error & Low Training Error

- 🔲 Low Testing Error & High Training Error

- 🔲 High Testing Error & High Training Error

- 🔲 High Testing Error & Low Training Error


3.2. If you get the following feedback from an LFM on a dataset, you may

```
Fitting Reg-LFM: K: 3, lam: 0.00010
Reg-LFM: ite: 0; diff: 0.527 RMSE: 0.939
Reg-LFM: ite: 1; diff: 0.050 RMSE: 0.892
Reg-LFM: ite: 2; diff: 0.042 RMSE: 0.854
Reg-LFM: ite: 3; diff: 0.021 RMSE: 0.836
Reg-LFM: ite: 4; diff: 0.010 RMSE: 0.828
Reg-LFM: ite: 5; diff: 0.006 RMSE: 0.823
Reg-LFM: ite: 6; diff: 0.003 RMSE: 0.820
Reg-LFM: ite: 7; diff: 0.002 RMSE: 0.819
Reg-LFM: ite: 8; diff: 0.001 RMSE: 0.818
Reg-LFM: ite: 9; diff: 0.001 RMSE: 0.817
Validation RMSE for LFM: 1.975
```

- 🔲 Increase K and reduce lam

- 🔲 Reduce K and increase lam
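For reference, the RMSE reported in the log above is the usual root-mean-squared error between observed and predicted ratings; a one-line numpy helper:

```python
import numpy as np

def rmse(y_true, y_pred):
    # Root-mean-squared error between observed and predicted ratings.
    return np.sqrt(np.mean((y_true - y_pred) ** 2))
```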


Problem 4 (Neural Network)

Given a SideNCF network as follows,

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

class SideNCF(keras.Model):
    def __init__(self, numA, numB, numC, embedding_size, **kwargs):
        super(SideNCF, self).__init__(**kwargs)
        self.numA = numA
        self.numB = numB
        self.numC = numC
        self.embedding_size = embedding_size
        self.embeddingA = layers.Embedding(
            numA,
            embedding_size,
            embeddings_initializer="he_normal",
            embeddings_regularizer=keras.regularizers.l2(1e-2),
        )
        self.embeddingB = layers.Embedding(
            numB,
            embedding_size,
            embeddings_initializer="he_normal",
            embeddings_regularizer=keras.regularizers.l2(1e-2),
        )
        self.embeddingC = layers.Embedding(
            numC,
            embedding_size,
            embeddings_initializer="he_normal",
            embeddings_regularizer=keras.regularizers.l2(1e-2),
        )
        self.concatenate = layers.Concatenate()  # defined but not used in call

    def call(self, inputs):
        A_vector = self.embeddingA(inputs[:, 0])
        B_vector = self.embeddingB(inputs[:, 1])
        C_vector = self.embeddingC(inputs[:, 2])
        D_vector = self.embeddingC(inputs[:, 3])
        dot_ = (
            tf.tensordot(A_vector, B_vector, 2)
            + tf.tensordot(A_vector, C_vector, 2)
            + tf.tensordot(A_vector, D_vector, 2)
        )
        return dot_
```
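For reference, the model is called on integer id batches of shape (batch_size, 4), one row per (user_id, item_id, tag1_id, tag2_id); a minimal usage sketch with illustrative sizes:

```python
import numpy as np

# Illustrative vocabulary sizes: numA users, numB items, numC tags.
model = SideNCF(numA=100, numB=50, numC=20, embedding_size=8)

# One batch of (user_id, item_id, tag1_id, tag2_id) rows.
batch = np.array([[3, 7, 1, 4],
                  [9, 2, 5, 5]])
out = model(batch)
```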

4.1. Given a dataset with side information, with inputs = (user_id (u), item_id (i), tag1_id (t1), tag2_id (t2)), which of the following models corresponds to the given SideNCF?

- 🔲 $\widehat{r}_{ui} = \mathbf{a}^T_u \mathbf{b}_i + \mathbf{a}^T_u \mathbf{c}_{t1} + \mathbf{a}^T_u \mathbf{d}_{t2} + \mu_u$

- 🔲 $\widehat{r}_{ui} = \mathbf{a}^T_u \mathbf{b}_i + \mathbf{a}^T_u \mathbf{c}_{t1} + \mathbf{a}^T_u \mathbf{d}_{t2} + \mu_i$

- 🔲 $\widehat{r}_{ui} = \mathbf{a}^T_u \mathbf{b}_i + \mathbf{a}^T_u \mathbf{c}_{t1} + \mathbf{a}^T_u \mathbf{d}_{t2}$

- 🔲 $\widehat{r}_{ui} = \mathbf{a}^T_u \mathbf{b}_i + \mathbf{a}^T_u \mathbf{c}_{t1} + \mathbf{a}^T_u \mathbf{c}_{t2}$