CUHK-STAT3009 Quiz 2

Name (Print): ______               Student ID: ________


  • This exam contains 4 Problems: Problem 1 (1.1); Problem 2 (2.1, 2.2, 2.3); Problem 3 (3.1, 3.2); Problem 4 (4.1).

  • You have 45 minutes to complete this exam. NO LATE SUBMISSION!


Problem 1 (Baseline Methods)

Given the glb_mean and user_mean methods as follows (exactly the same as the code on GitHub).

import numpy as np

class glb_mean(object):
	# Baseline: predict the global mean rating for every (user, item) pair.
	def __init__(self):
		self.glb_mean = 0
	
	def fit(self, train_rating):
		self.glb_mean = np.mean(train_rating)
	
	def predict(self, test_pair):
		pred = np.ones(len(test_pair))
		pred = pred*self.glb_mean
		return pred

class user_mean(object):
	# Baseline: predict each user's mean rating; fall back to the global mean for unseen users.
	def __init__(self, n_user):
		self.n_user = n_user
		self.glb_mean = 0.
		self.user_mean = np.zeros(n_user)
	
	def fit(self, train_pair, train_rating):
		self.glb_mean = train_rating.mean()
		for u in range(self.n_user):
			ind_train = np.where(train_pair[:,0] == u)[0]
			if len(ind_train) == 0:
				self.user_mean[u] = self.glb_mean
			else:
				self.user_mean[u] = train_rating[ind_train].mean()
	
	def predict(self, test_pair):
		pred = np.ones(len(test_pair))*self.glb_mean
		j = 0
		for row in test_pair:
			user_tmp, item_tmp = row[0], row[1]
			pred[j] = self.user_mean[user_tmp]
			j = j + 1
		return pred

Given a training dataset, suppose we consider three baseline methods:

### Method A: user-mean
user_ave = user_mean(n_user=n_user)
user_ave.fit(train_pair=train_pair, train_rating=train_rating)
pred_A = user_ave.predict(test_pair)

### Method B: glb mean + user mean
glb_ave = glb_mean()
glb_ave.fit(train_rating)
pred = glb_ave.predict(test_pair)

train_rating_cm = train_rating - glb_ave.predict(train_pair)
user_ave = user_mean(n_user=n_user)
user_ave.fit(train_pair=train_pair, train_rating=train_rating_cm)
pred_B = pred + user_ave.predict(test_pair)

### Method C: user mean + glb mean
user_ave = user_mean(n_user=n_user)
user_ave.fit(train_pair=train_pair, train_rating=train_rating)
pred = user_ave.predict(test_pair)

train_rating_cm = train_rating - user_ave.predict(train_pair)
glb_ave = glb_mean()
glb_ave.fit(train_rating_cm)
pred_C = pred + glb_ave.predict(test_pair)
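
For reference, here is a tiny hypothetical dataset on which the three snippets above can be run; the names (n_user, train_pair, train_rating, test_pair) match the code, but the values are made up.

import numpy as np

n_user = 3
# Each row of a pair array is (user_id, item_id); train_rating aligns with the rows of train_pair.
train_pair = np.array([[0, 0], [0, 1], [1, 2], [2, 3], [2, 0]])
train_rating = np.array([4.0, 3.0, 5.0, 2.0, 1.0])
test_pair = np.array([[0, 2], [1, 0], [2, 1]])

Running Methods A, B, and C on such data and comparing the resulting vectors entry-wise (e.g., with np.allclose) is one way to check your answer to 1.1.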

1.1. In general, which of the following statements is correct?

- 🔲 pred_A $=$ pred_B $\neq$ pred_C

- 🔲 pred_A $=$ pred_B $=$ pred_C

- 🔲 pred_A $\neq$ pred_B $=$ pred_C

- 🔲 pred_A $\neq$ pred_B $\neq$ pred_C

- 🔲 pred_A $=$ pred_C $\neq$ pred_B


Problem 2 (LFM)

2.1. Consider an LFM including all users (1, …, n) and items (1, …, m):

\((\widehat{\mathbf P}, \widehat{\mathbf{Q}}) = \text{argmin}_{P, Q} \frac{1}{|\Omega|} \sum_{(u,i) \in \Omega} \big( r_{ui} - \mathbf{p}_u^T \mathbf{q}_i \big)^2 + \lambda \sum_{u=1}^n \| \mathbf{p}_u \|_2^2 + \lambda \sum_{i=1}^m \| \mathbf{q}_i \|_2^2\)
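
For concreteness, a minimal NumPy sketch of this regularized objective, assuming `pairs` holds the observed $(u,i)$ indices in $\Omega$ and `ratings` the corresponding $r_{ui}$ (hypothetical names):

import numpy as np

def lfm_objective(P, Q, pairs, ratings, lam):
    # Mean squared error over the observed pairs plus the l2 penalties on P and Q.
    preds = np.sum(P[pairs[:, 0]] * Q[pairs[:, 1]], axis=1)   # p_u^T q_i for each (u, i) in Omega
    mse = np.mean((ratings - preds) ** 2)
    reg = lam * (np.sum(P ** 2) + np.sum(Q ** 2))
    return mse + reg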

What are the predicted ratings for cold-start users/items based on $(\widehat{\mathbf P}, \widehat{\mathbf{Q}})$?

- 🔲 Random initialization

- 🔲 Zero

- 🔲 User mean

- 🔲 Item mean

- 🔲 Global mean

- 🔲 Infinity


Consider an LFM:

\((1) \quad (\widehat{\mathbf P}, \widehat{\mathbf{Q}}) = \text{argmin}_{P, Q} \frac{1}{|\Omega|} \sum_{(u,i) \in \Omega} \big( r_{ui} - \mathbf{p}_u^T \mathbf{q}_i \big)^2\)

  • Step 1. Based on (???), we can sequentially update $\mathbf{P}$ and $\mathbf{Q}$; that is, by fixing $\mathbf{P}$, we solve for $\mathbf{Q}$ via \((2) \quad \min_{Q} \frac{1}{|\Omega|} \sum_{(u,i) \in \Omega} \big( r_{ui} - \mathbf{p}_u^T \mathbf{q}_i \big)^2.\) Then, by fixing $\mathbf{Q}$, we solve for $\mathbf{P}$ via \((3) \quad \min_{P} \frac{1}{|\Omega|} \sum_{(u,i) \in \Omega} \big( r_{ui} - \mathbf{p}_u^T \mathbf{q}_i \big)^2.\)
  • Step 2. Based on (???), (2)-(3) can be reduced to the item-wise/user-wise parallel updates (4)-(5), that is, \((4) \quad \widehat{\mathbf{q}}_i = \text{argmin}_{\mathbf{q}_i} \frac{1}{|\Omega|} \sum_{u \in \mathcal{U}_i} \big( r_{ui} - \mathbf{p}_u^T \mathbf{q}_i \big)^2, \quad \text{for } i = 1, \cdots, m; \\ (5) \quad \widehat{\mathbf{p}}_u = \text{argmin}_{\mathbf{p}_u} \frac{1}{|\Omega|} \sum_{i \in \mathcal{I}_u} \big( r_{ui} - \mathbf{p}_u^T \mathbf{q}_i \big)^2, \quad \text{for } u = 1, \cdots, n.\) A NumPy sketch of the item-wise update (4) is given below.
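
A minimal NumPy sketch of the item-wise update (4), assuming `P`, `Q` are the factor matrices and `pairs`, `ratings` hold the observed $(u, i)$ indices and ratings (hypothetical names; the user-wise update (5) is symmetric):

import numpy as np

def update_items(P, Q, pairs, ratings):
    # For each item i, refit q_i on the ratings from users in U_i, with P held fixed.
    for i in range(Q.shape[0]):
        obs = np.where(pairs[:, 1] == i)[0]   # observed ratings on item i
        if len(obs) == 0:
            continue                          # cold-start item: leave q_i unchanged
        P_i = P[pairs[obs, 0]]                # stacked p_u^T for u in U_i
        Q[i] = np.linalg.lstsq(P_i, ratings[obs], rcond=None)[0]
    return Q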

2.2. In Step 1, (???) stands for

- 🔲 Linear regression

- 🔲 Alternating Least Squares (ALS)

- 🔲 Separability of the objective function

- 🔲 (Stochastic) Gradient Descent


2.3. In Step 2, (???) stands for

- 🔲 Linear regression

- 🔲 Alternating Least Squares (ALS)

- 🔲 Separability of the objective function

- 🔲 (Stochastic) Gradient Descent


Problem 3 (Cross Validation)

3.1. Which of the following cases indicates over-fitting?

- 🔲 Low Testing Error & Low Training Error

- 🔲 Low Testing Error & High Training Error

- 🔲 High Testing Error & High Training Error

- 🔲 High Testing Error & Low Training Error


3.2. If you get the following feedback from an LFM on a dataset, you may

	Fitting Reg-LFM: K: 3, lam: 0.00010
	Reg-LFM: ite: 0; diff: 0.527 RMSE: 0.939
	Reg-LFM: ite: 1; diff: 0.050 RMSE: 0.892
	Reg-LFM: ite: 2; diff: 0.042 RMSE: 0.854
	Reg-LFM: ite: 3; diff: 0.021 RMSE: 0.836
	Reg-LFM: ite: 4; diff: 0.010 RMSE: 0.828
	Reg-LFM: ite: 5; diff: 0.006 RMSE: 0.823
	Reg-LFM: ite: 6; diff: 0.003 RMSE: 0.820
	Reg-LFM: ite: 7; diff: 0.002 RMSE: 0.819
	Reg-LFM: ite: 8; diff: 0.001 RMSE: 0.818
	Reg-LFM: ite: 9; diff: 0.001 RMSE: 0.817
	Validation RMSE for LFM: 1.975

- 🔲 Increase K and reduce lam

- 🔲 Reduce K and increase lam


Problem 4 (Neural Network)

Given a SideNCF network as follows,

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

class SideNCF(keras.Model):
    def __init__(self, numA, numB, numC, embedding_size, **kwargs):
        super(SideNCF, self).__init__(**kwargs)
        self.numA = numA
        self.numB = numB
        self.numC = numC
        self.embedding_size = embedding_size

        self.embeddingA = layers.Embedding(
            numA,
            embedding_size,
            embeddings_initializer="he_normal",
            embeddings_regularizer=keras.regularizers.l2(1e-2),
        )
        self.embeddingB = layers.Embedding(
            numB,
            embedding_size,
            embeddings_initializer="he_normal",
            embeddings_regularizer=keras.regularizers.l2(1e-2),
        )
        self.embeddingC = layers.Embedding(
            numC,
            embedding_size,
            embeddings_initializer="he_normal",
            embeddings_regularizer=keras.regularizers.l2(1e-2),
        )
        self.concatenate = layers.Concatenate()

    def call(self, inputs):
        A_vector = self.embeddingA(inputs[:,0])
        B_vector = self.embeddingB(inputs[:,1])
        C_vector = self.embeddingC(inputs[:,2])
        D_vector = self.embeddingC(inputs[:,3])
        # tensordot with axes=2 contracts over both the batch and embedding axes.
        dot_ = tf.tensordot(A_vector, B_vector, 2) + tf.tensordot(A_vector, C_vector, 2) + tf.tensordot(A_vector, D_vector, 2)
        return dot_
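
A minimal usage sketch for the network above, with hypothetical sizes and ids:

import numpy as np

# 5 users, 4 items, 3 tag values (tag1 and tag2 share the embedding table C), embedding size 8.
model = SideNCF(numA=5, numB=4, numC=3, embedding_size=8)

# A toy batch of (user_id, item_id, tag1_id, tag2_id) rows.
batch = np.array([[0, 1, 2, 0],
                  [3, 2, 1, 1]])
out = model(batch)

Note that tf.tensordot with axes=2 contracts over both the batch and embedding axes, so out is a single scalar for this batch.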

4.1. Given a dataset with side information, with inputs = (user_id (u), item_id (i), tag1_id (t1), tag2_id (t2)), which of the following models corresponds to the given SideNCF?

- 🔲 $\widehat{r}_{ui} = \mathbf{a}^T_u \mathbf{b}_i + \mathbf{a}^T_u \mathbf{c}_{t1} + \mathbf{a}^T_u \mathbf{d}_{t2} + \mu_u$

- 🔲 $\widehat{r}_{ui} = \mathbf{a}^T_u \mathbf{b}_i + \mathbf{a}^T_u \mathbf{c}_{t1} + \mathbf{a}^T_u \mathbf{d}_{t2} + \mu_i$

- 🔲 $\widehat{r}_{ui} = \mathbf{a}^T_u \mathbf{b}_i + \mathbf{a}^T_u \mathbf{c}_{t1} + \mathbf{a}^T_u \mathbf{d}_{t2}$

- 🔲 $\widehat{r}_{ui} = \mathbf{a}^T_u \mathbf{b}_i + \mathbf{a}^T_u \mathbf{c}_{t1} + \mathbf{a}^T_u \mathbf{c}_{t2}$