mrec.mf.model Package

warp Module

class mrec.mf.model.warp.WARP(d, gamma, C, max_iters, validation_iters, batch_size=10, positive_thresh=1e-05, max_trials=50)

Bases: object

Learn low-dimensional embedding optimizing the WARP loss.

Parameters:

d : int
    Embedding dimension.
gamma : float
    Learning rate.
C : float
    Regularization constant.
max_iters : int
    Maximum number of SGD updates.
validation_iters : int
    Number of SGD updates between checks for the stopping condition.
batch_size : int
    Mini-batch size for SGD updates.
positive_thresh : float
    Training entries below this value are treated as zero.
max_trials : int
    Number of attempts allowed to find a violating negative example during training updates. In practice this means we optimize for ranks 1 to max_trials - 1.
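The role of max_trials can be illustrated with a small standalone sketch (the helper name and margin handling here are illustrative, not mrec's actual code): random negative columns are drawn until one scores within a margin of 1.0 of the positive column, giving up after max_trials attempts.

```python
import numpy as np

def sample_violating_negative(scores, pos, candidates, max_trials, rng):
    """Draw random negatives until one violates the margin, i.e.
    scores[j] > scores[pos] - 1.0.  Returns (j, num_trials), or
    (None, max_trials) if no violator is found."""
    for trial in range(1, max_trials + 1):
        j = rng.choice(candidates)
        if scores[j] > scores[pos] - 1.0:  # margin of 1.0, as in standard WARP
            return j, trial
    return None, max_trials

# toy example: the positive column 0 scores highest, so violations are rare
rng = np.random.default_rng(0)
scores = np.array([5.0, 0.1, 0.2, 4.5])  # only column 3 can violate: 4.5 > 5.0 - 1
j, trials = sample_violating_negative(scores, pos=0, candidates=[1, 2, 3],
                                      max_trials=50, rng=rng)
```

The number of trials taken doubles as a rank estimate: the harder it is to find a violator, the better ranked the positive column already is, which is why ranks beyond max_trials - 1 are effectively not optimized.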

Attributes

U_ : numpy.ndarray
    Row factors.
V_ : numpy.ndarray
    Column factors.

Methods

compute_updates(train, decomposition, updates)
estimate_precision(decomposition, train, ...)
    Compute prec@k for a sample of training rows.
estimate_warp_loss(train, u, N)
fit(train[, validation])
    Learn factors from training set.
precompute_warp_loss(num_cols)
    Precompute WARP loss for each possible rank.
sample(train, decomposition)
estimate_precision(decomposition, train, validation, k=30)

Compute prec@k for a sample of training rows.

Parameters:

decomposition : WARPDecomposition
    The current decomposition.
train : scipy.sparse.csr_matrix
    The training data.
k : int
    Measure precision@k.
validation : dict or int
    Validation set over which we compute precision. Either supply a dict of row -> list of hidden cols, or an integer n, in which case we simply evaluate against the training data for the first n rows.

Returns:

prec : float
    Precision@k computed over a sample of the training rows.

Notes

At the moment this will underestimate the precision of real recommendations, because training cols with zero ratings are not excluded from the evaluated top-k predictions.
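The metric itself is simple to state; here is a small standalone sketch of precision@k (illustrative only, the function name is not mrec's), including the caveat from the Notes that already-rated training columns are not filtered out:

```python
import numpy as np

def precision_at_k(scores, hidden_cols, k):
    """Fraction of the top-k scored columns that appear in hidden_cols.
    Note: already-rated training columns are NOT excluded from the top-k,
    so this can underestimate the precision of real recommendations."""
    top_k = np.argsort(-scores)[:k]
    return len(set(top_k) & set(hidden_cols)) / k

scores = np.array([0.9, 0.1, 0.8, 0.3, 0.7])   # columns 0, 2, 4 rank top-3
print(precision_at_k(scores, hidden_cols={0, 2}, k=3))  # prints 0.6666666666666666
```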

fit(train, validation=None)

Learn factors from the training set. The dot product of the factors approximately reconstructs the training matrix, minimizing the WARP ranking loss relative to the original data.

Parameters:

train : scipy.sparse.csr_matrix
    Training matrix to be factorized.
validation : dict or int
    Validation set to control early stopping, based on precision@30. The dict should have the form row -> [cols], where cols are the columns we expect to be highly ranked in the reconstruction of row. If an int is supplied, we instead evaluate precision against the training data for the first validation rows.

Returns:

self : object
    This model itself.

precompute_warp_loss(num_cols)

Precompute the WARP loss for each possible rank:

    L(i) = sum_{k=0}^{i} 1/(k+1)
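This precomputation is just a cumulative sum of reciprocals (the harmonic-series rank weights used by WARP); a standalone numpy sketch, not mrec's internal code:

```python
import numpy as np

def precompute_warp_loss(num_cols):
    # L[i] = 1 + 1/2 + ... + 1/(i+1): the WARP weight for estimated rank i
    return np.cumsum(1.0 / np.arange(1, num_cols + 1))

L = precompute_warp_loss(4)  # L ~ [1.0, 1.5, 1.8333..., 2.0833...]
```

Updates for positives ranked far down the list get a larger weight, which is what makes WARP focus on the top of the ranking.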

class mrec.mf.model.warp.WARPBatchUpdate(batch_size, d)

Bases: object

Collection of arrays holding a batch of WARP SGD updates.

Methods

clear()
set_update(ix, update)
class mrec.mf.model.warp.WARPDecomposition(num_rows, num_cols, d)

Bases: object

Matrix embedding optimizing the WARP loss.

Parameters:

num_rows : int
    Number of rows in the full matrix.
num_cols : int
    Number of columns in the full matrix.
d : int
    The embedding dimension for the decomposition.

Methods

apply_updates(updates, gamma, C)
compute_gradient_step(u, i, j, L)
    Compute a gradient step from results of sampling.
reconstruct(rows)
compute_gradient_step(u, i, j, L)

Compute a gradient step from results of sampling.

Parameters:

u : int
    The sampled row.
i : int
    The sampled positive column.
j : int
    The sampled violating negative column, i.e. U[u].V[j] is currently too large compared to U[u].V[i].
L : int
    The number of trials required to find a violating negative column.

Returns:

u : int
    As input.
i : int
    As input.
j : int
    As input.
dU : numpy.ndarray
    Gradient step for U[u].
dV_pos : numpy.ndarray
    Gradient step for V[i].
dV_neg : numpy.ndarray
    Gradient step for V[j].
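Given the sampled triple and the precomputed loss weight, the returned steps follow the standard WARP gradient for a margin violation. A sketch under that assumption (mrec's actual code may differ in scaling and regularization handling):

```python
import numpy as np

def warp_gradient_step(U, V, u, i, j, loss_weight):
    """Standard WARP gradient for one (row, positive, violating-negative) triple."""
    dU = loss_weight * (V[i] - V[j])   # move the row factor toward V[i], away from V[j]
    dV_pos = loss_weight * U[u]        # increase the positive column's score
    dV_neg = -loss_weight * U[u]       # decrease the violating negative's score
    return dU, dV_pos, dV_neg

rng = np.random.default_rng(1)
U, V = rng.normal(size=(3, 2)), rng.normal(size=(4, 2))
dU, dV_pos, dV_neg = warp_gradient_step(U, V, u=0, i=1, j=2, loss_weight=1.5)
```

Note that dV_pos and dV_neg are equal and opposite: the update pushes the positive and negative columns apart along the direction of the row factor.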


warp2 Module

class mrec.mf.model.warp2.WARP2(d, gamma, C, max_iters, validation_iters, batch_size=10, positive_thresh=1e-05, max_trials=50)

Bases: mrec.mf.model.warp.WARP

Learn low-dimensional embedding optimizing the WARP loss.

Parameters:

d : int
    Embedding dimension.
gamma : float
    Learning rate.
C : float
    Regularization constant.
max_iters : int
    Maximum number of SGD updates.
validation_iters : int
    Number of SGD updates between checks for the stopping condition.
batch_size : int
    Mini-batch size for SGD updates.
positive_thresh : float
    Training entries below this value are treated as zero.
max_trials : int
    Number of attempts allowed to find a violating negative example during training updates. In practice this means we optimize for ranks 1 to max_trials - 1.

Attributes

U_ : numpy.ndarray
    Row factors.
V_ : numpy.ndarray
    Column factors.
W_ : numpy.ndarray
    Item feature factors.

Methods

compute_updates(train, decomposition, updates)
estimate_precision(decomposition, train, ...)
    Compute prec@k for a sample of training rows.
estimate_warp_loss(train, u, N)
fit(train, X[, validation])
    Learn embedding from training set.
precompute_warp_loss(num_cols)
    Precompute WARP loss for each possible rank.
sample(train, decomposition)
fit(train, X, validation=None)

Learn the embedding from the training set. A suitable dot product of the factors approximately reconstructs the training matrix, minimizing the WARP ranking loss relative to the original data.

Parameters:

train : scipy.sparse.csr_matrix
    Training matrix to be factorized.
X : array_like, shape = [num_cols, num_features]
    Item features.
validation : dict or int
    Validation set to control early stopping, based on precision@30. The dict should have the form row -> [cols], where cols are the columns we expect to be highly ranked in the reconstruction of row. If an int is supplied, we instead evaluate precision against the training data for the first validation rows.

Returns:

self : object
    This model itself.
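The "suitable dot product" combines the column factors with a projection of the item features. A sketch under the assumption score(u, c) = U[u] . (V[c] + X[c] @ W), with W of shape (num_features, d) (hedged: check WARP2Decomposition.reconstruct for the exact form mrec uses):

```python
import numpy as np

def reconstruct(U, V, W, X, rows):
    """Hypothetical feature-augmented reconstruction:
    score(u, c) = U[u] . (V[c] + X[c] @ W)."""
    return U[rows] @ (V + X @ W).T

rng = np.random.default_rng(2)
num_rows, num_cols, num_features, d = 3, 5, 4, 2
U = rng.normal(size=(num_rows, d))
V = rng.normal(size=(num_cols, d))
W = rng.normal(size=(num_features, d))
X = rng.normal(size=(num_cols, num_features))
scores = reconstruct(U, V, W, X, rows=[0, 1])  # one score per (row, column) pair
```

Under this form, items with similar features share part of their representation through X @ W, which is what lets WARP2 generalize to columns with few training ratings.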

class mrec.mf.model.warp2.WARP2BatchUpdate(batch_size, num_features, d)

Bases: mrec.mf.model.warp.WARPBatchUpdate

Collection of arrays holding a batch of SGD updates.

Methods

clear()
set_update(ix, update)
class mrec.mf.model.warp2.WARP2Decomposition(num_rows, num_cols, X, d)

Bases: mrec.mf.model.warp.WARPDecomposition

Joint matrix and feature embedding optimizing the WARP loss.

Parameters:

num_rows : int
    Number of rows in the full matrix.
num_cols : int
    Number of columns in the full matrix.
X : array_like, shape = [num_cols, num_features]
    Features describing each column in the matrix.
d : int
    The embedding dimension.

Methods

apply_matrix_update(W, dW, gamma, C)
apply_updates(updates, gamma, C)
compute_gradient_step(u, i, j, L)
    Compute a gradient step from results of sampling.
reconstruct(rows)
compute_gradient_step(u, i, j, L)

Compute a gradient step from results of sampling.

Parameters:

u : int
    The sampled row.
i : int
    The sampled positive column.
j : int
    The sampled violating negative column, i.e. U[u].V[j] is currently too large compared to U[u].V[i].
L : int
    The number of trials required to find a violating negative column.

Returns:

u : int
    As input.
i : int
    As input.
j : int
    As input.
dU : numpy.ndarray
    Gradient step for U[u].
dV_pos : numpy.ndarray
    Gradient step for V[i].
dV_neg : numpy.ndarray
    Gradient step for V[j].
dW : numpy.ndarray
    Gradient step for W.
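The extra dW step can be sketched under the same assumption as above, that the score is U[u] . (V[c] + X[c] @ W): only the feature difference X[i] - X[j] of the sampled pair reaches W. This is a hypothetical derivation, not necessarily mrec's exact code:

```python
import numpy as np

def warp2_feature_gradient(U, X, u, i, j, loss_weight):
    """Gradient for the feature factor W, assuming
    score(u, c) = U[u] . (V[c] + X[c] @ W):
    d(score_i - score_j)/dW = outer(X[i] - X[j], U[u])."""
    return loss_weight * np.outer(X[i] - X[j], U[u])  # shape (num_features, d)

rng = np.random.default_rng(3)
U = rng.normal(size=(2, 3))   # d = 3
X = rng.normal(size=(4, 5))   # num_features = 5
dW = warp2_feature_gradient(U, X, u=0, i=1, j=2, loss_weight=2.0)
```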
