mrec Package

sparse Module

Sparse data structures and convenience methods to load sparse matrices from file.

class mrec.sparse.fast_sparse_matrix(X, col_view=None)

Bases: object

Adds fast columnar reads and updates to a scipy.sparse.csr_matrix, at the cost of keeping a csc_matrix of equal size as a column-wise index into the same raw data. It is updateable in the sense that you can change the values of the existing non-zero entries in a given column; trying to set any other entries will result in an error.

For other functionality you are expected to call methods on the underlying csr_matrix:

>>> fsm = fast_sparse_matrix(data) # data is a csr_matrix
>>> col = fsm.fast_get_col(2)      # get a column quickly
>>> row = fsm.X[1]                 # get a row as usual

Methods

ensure_sparse_cols(max_density[, remove_lowest]) Ensure that no column of the matrix exceeds the specified density, setting excess entries to zero where necessary.
fast_get_col(j) Return column j of the underlying matrix.
fast_update_col(j, vals) Update values of existing non-zeros in column of the underlying matrix.
load(filepath) Load a fast_sparse_matrix from file written by fast_sparse_matrix.save().
loadmm(filepath) Create a fast_sparse_matrix from matrixmarket data.
loadtxt(filepath[, comments, delimiter, ...]) Create a fast_sparse_matrix from simply formatted data such as TSV; handles similar input to numpy.loadtxt().
save(filepath) Save to file as arrays in numpy binary format.
ensure_sparse_cols(max_density, remove_lowest=True)

Ensure that no column of the matrix exceeds the specified density, setting excess entries to zero where necessary.

This can be useful to avoid popularity bias in collaborative filtering, by pruning the number of users for popular items:

>>> num_users,num_items = train.shape
>>> f = fast_sparse_matrix(train)
>>> f.ensure_sparse_cols(max_density=0.01)

Now any item in train has non-zero ratings from at most 1% of users.

Parameters :

max_density : float

The highest allowable column-wise density. A value of one or more is treated as an absolute limit on the number of non-zero entries in a column, while a value of less than one is treated as a density, i.e. a proportion of the overall number of rows.

remove_lowest : boolean (default: True)

If True, the excess entries set to zero in each column are chosen lowest-valued first; otherwise they are selected at random.

fast_get_col(j)

Return column j of the underlying matrix.

Parameters :

j : int

Index of column to get.

Returns :

col : scipy.sparse.csc_matrix

Copy of column j of the matrix.

fast_update_col(j, vals)

Update values of existing non-zeros in column of the underlying matrix.

Parameters :

j : int

Index of the column to update.

vals : array like

The new values to be assigned; must satisfy len(vals) == X[:,j].nnz, i.e. this method can only change the values of the existing non-zero entries of column j, it cannot add new ones.
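
For example, to overwrite the existing non-zero entries of a column (a minimal sketch; fsm is a fast_sparse_matrix built as in the class example above):

>>> import numpy as np
>>> nnz = fsm.X[:,2].nnz                    # number of existing non-zeros in column 2
>>> fsm.fast_update_col(2, np.arange(nnz))  # assign new values to exactly those entries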

static load(filepath)

Load a fast_sparse_matrix from file written by fast_sparse_matrix.save().

Parameters :

filepath : str

The filepath to load.

static loadmm(filepath)

Create a fast_sparse_matrix from matrixmarket data.

Parameters :

filepath : file or str

The matrixmarket file to read.

Returns :

mat : mrec.sparse.fast_sparse_matrix

A fast_sparse_matrix holding the data in the file.

static loadtxt(filepath, comments='#', delimiter=None, skiprows=0, usecols=None, index_offset=1)

Create a fast_sparse_matrix from simply formatted data such as TSV; handles similar input to numpy.loadtxt().

Parameters :

filepath : file or str

File containing simply formatted row,col,val sparse matrix data.

comments : str, optional

The character used to indicate the start of a comment (default: #).

delimiter : str, optional

The string used to separate values. By default, this is any whitespace.

skiprows : int, optional

Skip the first skiprows lines; default: 0.

usecols : sequence, optional

Which columns to read, with 0 being the first. For example, usecols = (1,4,5) will extract the 2nd, 5th and 6th columns. The default, None, results in all columns being read.

index_offset : int, optional

Offset applied to the row and col indices in the input data (default: 1). The default offset is chosen so that 1-indexed data on file results in a fast_sparse_matrix holding 0-indexed matrices.

Returns :

mat : mrec.sparse.fast_sparse_matrix

A fast_sparse_matrix holding the data in the file.
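
For example, loading a whitespace-delimited file of 1-indexed row,col,val triples (a sketch; ratings.tsv is a hypothetical file):

>>> from mrec.sparse import fast_sparse_matrix
>>> fsm = fast_sparse_matrix.loadtxt('ratings.tsv')
>>> col = fsm.fast_get_col(0)   # indices in the loaded matrix are 0-based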

save(filepath)

Save to file as arrays in numpy binary format.

Parameters :

filepath : str

The filepath to write to.
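
A typical save and reload round trip (the filepath here is hypothetical):

>>> fsm.save('interactions.npz')
>>> fsm2 = fast_sparse_matrix.load('interactions.npz')   # recovers the same matrix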

shape

Return the shape of the underlying matrix.

mrec.sparse.loadtxt(filepath, comments='#', delimiter=None, skiprows=0, usecols=None, index_offset=1)

Load a scipy sparse matrix from simply formatted data such as TSV; handles similar input to numpy.loadtxt().

Parameters :

filepath : file or str

File containing simply formatted row,col,val sparse matrix data.

comments : str, optional

The character used to indicate the start of a comment (default: #).

delimiter : str, optional

The string used to separate values. By default, this is any whitespace.

skiprows : int, optional

Skip the first skiprows lines; default: 0.

usecols : sequence, optional

Which columns to read, with 0 being the first. For example, usecols = (1,4,5) will extract the 2nd, 5th and 6th columns. The default, None, results in all columns being read.

index_offset : int, optional

Offset applied to the row and col indices in the input data (default: 1). The default offset is chosen so that 1-indexed data on file results in a 0-indexed sparse matrix.

Returns :

mat : scipy.sparse.csr_matrix

The sparse matrix.

mrec.sparse.loadz(file)

Load a sparse matrix saved to file with savez.

Parameters :

file : str

The open file or filepath to read from.

Returns :

mat : scipy.sparse.coo_matrix

The sparse matrix.

mrec.sparse.savez(d, file)

Save a sparse matrix to file in numpy binary format.

Parameters :

d : scipy.sparse.coo_matrix

The sparse matrix to save.

file : str or file

Either the file name (string) or an open file (file-like object) where the matrix will be saved. If file is a string, the .npz extension will be appended to the file name if it is not already there.
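
A minimal round trip with these module-level helpers (the filename is hypothetical):

>>> from scipy.sparse import coo_matrix
>>> from mrec.sparse import savez, loadz
>>> m = coo_matrix(([1.0, 2.0], ([0, 1], [1, 0])), shape=(2, 2))
>>> savez(m, 'm.npz')
>>> m2 = loadz('m.npz')   # returns a scipy.sparse.coo_matrix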

popularity Module

Trivial unpersonalized item popularity recommender intended to provide a baseline for evaluations.

class mrec.popularity.ItemPopularityRecommender(method='count', thresh=0)

Bases: mrec.base_recommender.BaseRecommender

Create an unpersonalized item popularity recommender, useful to provide a baseline for comparison with a “real” one.

Parameters :

method : ‘count’, ‘sum’, ‘avg’ or ‘thresh’ (default: ‘count’)

How to calculate the popularity of an item based on its ratings from all users:

count - popularity is its total number of ratings of any value
sum - popularity is the sum of its ratings
avg - popularity is its mean rating
thresh - popularity is its number of ratings higher than thresh

thresh : float, optional

The threshold used by the ‘thresh’ method of calculating item popularity.

Methods

batch_recommend_items(dataset[, max_items, ...]) Recommend new items for all users in the training dataset.
fit(dataset[, item_features]) Compute the most popular items using the method specified in the constructor.
load(filepath) Load a recommender model from file after it has been serialized with save().
range_recommend_items(dataset, user_start, ...) Recommend new items for a range of users in the training dataset.
read_recommender_description(filepath) Read a recommender model description from file after it has been saved by save(), without loading any additional associated data into memory.
recommend_items(dataset, u[, max_items, ...]) Recommend new items for a user.
save(filepath) Serialize model to file.
fit(dataset, item_features=None)

Compute the most popular items using the method specified in the constructor.

Parameters :

dataset : scipy sparse matrix or mrec.sparse.fast_sparse_matrix

The user-item matrix.

item_features : array_like, shape = [num_items, num_features]

Features for items in training set, ignored here.

recommend_items(dataset, u, max_items=10, return_scores=True, item_features=None)

Recommend new items for a user. Assumes you’ve already called fit().

Parameters :

dataset : scipy.sparse.csr_matrix

User-item matrix containing known items.

u : int

Index of user for which to make recommendations (for compatibility with other recommenders).

max_items : int

Maximum number of recommended items to return.

return_scores : bool

If true return a score along with each recommended item.

item_features : array_like, shape = [num_items, num_features]

Features for items in training set, ignored here.

Returns :

recs : list

List of (idx,score) pairs if return_scores is True, else just a list of idxs.
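
A minimal usage sketch (train is assumed to be a scipy.sparse.csr_matrix of user-item ratings):

>>> from mrec.popularity import ItemPopularityRecommender
>>> model = ItemPopularityRecommender(method='count')
>>> model.fit(train)
>>> recs = model.recommend_items(train, u=0, max_items=5)   # [(item_idx, score), ...]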

reranking_recommender Module

Recommender that gets candidates using an item similarity model and then reranks them using a matrix factorization model.

class mrec.reranking_recommender.RerankingRecommender(item_similarity_recommender, mf_recommender, num_candidates=100)

Bases: mrec.base_recommender.BaseRecommender

A secondary recommender that combines an item similarity model and a matrix factorization one. The item similarity model is used to select candidate items for each user which are then reranked based on their latent factors.

Parameters :

item_similarity_recommender : mrec.item_similarity.recommender.ItemSimilarityRecommender

The model used to select candidates.

mf_recommender : mrec.mf.recommender.MatrixFactorizationRecommender

The model used to rerank them.

num_candidates : int (default: 100)

The number of candidate items drawn from the first model for each user.

Methods

batch_recommend_items(dataset[, max_items, ...]) Recommend new items for all users in the training dataset.
fit(train[, item_features]) Fit both models to the training data.
load(filepath) Load a recommender model from file after it has been serialized with save().
range_recommend_items(dataset, user_start, ...) Recommend new items for a range of users in the training dataset.
read_recommender_description(filepath) Read a recommender model description from file after it has been saved by save(), without loading any additional associated data into memory.
recommend_items(dataset, u[, max_items, ...]) Recommend new items for a user.
rerank(u, candidates, max_items, return_scores) Use latent factors to rerank candidate recommended items for a user and return the highest scoring.
save(filepath) Serialize model to file.
batch_recommend_items(dataset, max_items=10, return_scores=True, item_features=None)

Recommend new items for all users in the training dataset. Assumes you’ve already called fit() to learn the similarity matrix.

Parameters :

dataset : scipy.sparse.csr_matrix

User-item matrix containing known items.

max_items : int

Maximum number of recommended items to return.

return_scores : bool

If true return a score along with each recommended item.

show_progress : bool

If True, print progress information to stdout.

item_features : array_like, shape = [num_items, num_features]

Features for items in training set, required by some recommenders.

Returns :

recs : list of lists

Each entry is a list of (idx,score) pairs if return_scores is True, else just a list of idxs.

fit(train, item_features=None)

Fit both models to the training data.

Parameters :

train : scipy.sparse.csr_matrix, shape = [num_users, num_items]

The training user-item matrix.

item_features : array_like, shape = [num_items, num_features]

Features for items in training set, required by some recommenders.

Notes

You are not obliged to call this, alternatively you can pass ready trained models to the RerankingRecommender constructor.
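
For example, with two models that have already been trained elsewhere (a sketch; item_sim_model and mf_model are assumed to be a trained ItemSimilarityRecommender and MatrixFactorizationRecommender respectively, and train a user-item csr_matrix):

>>> from mrec.reranking_recommender import RerankingRecommender
>>> recommender = RerankingRecommender(item_sim_model, mf_model, num_candidates=100)
>>> recs = recommender.recommend_items(train, u=0, max_items=10)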

range_recommend_items(dataset, user_start, user_end, max_items=10, return_scores=True, item_features=None)

Recommend new items for a range of users in the training dataset. Assumes you’ve already called fit() to learn the similarity matrix.

Parameters :

dataset : scipy.sparse.csr_matrix

User-item matrix containing known items.

user_start : int

Index of first user in the range to recommend.

user_end : int

Index one beyond last user in the range to recommend.

max_items : int

Maximum number of recommended items to return.

return_scores : bool

If true return a score along with each recommended item.

item_features : array_like, shape = [num_items, num_features]

Features for items in training set, required by some recommenders.

Returns :

recs : list of lists

Each entry is a list of (idx,score) pairs if return_scores is True, else just a list of idxs.

recommend_items(dataset, u, max_items=10, return_scores=True, item_features=None)

Recommend new items for a user.

Parameters :

dataset : scipy.sparse.csr_matrix

User-item matrix containing known items.

u : int

Index of user for which to make recommendations.

max_items : int

Maximum number of recommended items to return.

return_scores : bool

If true return a score along with each recommended item.

item_features : array_like, shape = [num_items, num_features]

Features for items in training set, required by some recommenders.

Returns :

recs : list

List of (idx,score) pairs if return_scores is True, else just a list of idxs.

rerank(u, candidates, max_items, return_scores)

Use latent factors to rerank candidate recommended items for a user and return the highest scoring.

Parameters :

u : int

Index of user for which to make recommendations.

candidates : array like

List of candidate item indices.

max_items : int

Maximum number of recommended items to return.

return_scores : bool

If true return a score along with each recommended item.

Returns :

recs : list

List of (idx,score) pairs if return_scores is True, else just a list of idxs.

mrec.reranking_recommender.main()

base_recommender Module

class mrec.base_recommender.BaseRecommender

Bases: object

Minimal interface to be implemented by recommenders, along with some helper methods. A concrete recommender must implement the recommend_items() method and should provide its own implementation of __str__() so that it can be identified when printing results.

Notes

In most cases you should inherit from either mrec.mf.recommender.MatrixFactorizationRecommender or mrec.item_similarity.recommender.ItemSimilarityRecommender and not directly from this class.

These provide more efficient implementations of save(), load() and the batch methods for recommending items.
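
A minimal sketch of a concrete subclass (the random scoring here is purely illustrative and not part of mrec):

>>> import random
>>> from mrec.base_recommender import BaseRecommender
>>> class RandomRecommender(BaseRecommender):
...     """Recommend unseen items uniformly at random (illustrative only)."""
...     def __str__(self):
...         return 'RandomRecommender'
...     def recommend_items(self, dataset, u, max_items=10, return_scores=True, item_features=None):
...         known = set(dataset[u].indices)   # items the user already has
...         unseen = [i for i in range(dataset.shape[1]) if i not in known]
...         chosen = random.sample(unseen, min(max_items, len(unseen)))
...         return [(i, 1.0) for i in chosen] if return_scores else chosen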

Methods

batch_recommend_items(dataset[, max_items, ...]) Recommend new items for all users in the training dataset.
fit(train[, item_features]) Train on supplied data.
load(filepath) Load a recommender model from file after it has been serialized with save().
range_recommend_items(dataset, user_start, ...) Recommend new items for a range of users in the training dataset.
read_recommender_description(filepath) Read a recommender model description from file after it has been saved by save(), without loading any additional associated data into memory.
recommend_items(dataset, u[, max_items, ...]) Recommend new items for a user.
save(filepath) Serialize model to file.
batch_recommend_items(dataset, max_items=10, return_scores=True, show_progress=False, item_features=None)

Recommend new items for all users in the training dataset.

Parameters :

dataset : scipy.sparse.csr_matrix

User-item matrix containing known items.

max_items : int

Maximum number of recommended items to return.

return_scores : bool

If true return a score along with each recommended item.

show_progress : bool

If True, print progress information to stdout.

item_features : array_like, shape = [num_items, num_features]

Optionally supply features for each item in the dataset.

Returns :

recs : list of lists

Each entry is a list of (idx,score) pairs if return_scores is True, else just a list of idxs.

Notes

This provides a default implementation; for most recommenders you will be able to replace it with something more efficient.

fit(train, item_features=None)

Train on supplied data. In general you will want to implement this rather than computing recommendations on the fly.

Parameters :

train : scipy.sparse.csr_matrix or mrec.sparse.fast_sparse_matrix, shape = [num_users, num_items]

User-item matrix.

item_features : array_like, shape = [num_items, num_features]

Features for items in training set, required by some recommenders.

static load(filepath)

Load a recommender model from file after it has been serialized with save().

Parameters :

filepath : str

The filepath to read from.

range_recommend_items(dataset, user_start, user_end, max_items=10, return_scores=True, item_features=None)

Recommend new items for a range of users in the training dataset.

Parameters :

dataset : scipy.sparse.csr_matrix

User-item matrix containing known items.

user_start : int

Index of first user in the range to recommend.

user_end : int

Index one beyond last user in the range to recommend.

max_items : int

Maximum number of recommended items to return.

return_scores : bool

If true return a score along with each recommended item.

item_features : array_like, shape = [num_items, num_features]

Optionally supply features for each item in the dataset.

Returns :

recs : list of lists

Each entry is a list of (idx,score) pairs if return_scores is True, else just a list of idxs.

Notes

This provides a default implementation; for most recommenders you will be able to replace it with something more efficient.

static read_recommender_description(filepath)

Read a recommender model description from file after it has been saved by save(), without loading any additional associated data into memory.

Parameters :

filepath : str

The filepath to read from.

recommend_items(dataset, u, max_items=10, return_scores=True, item_features=None)

Recommend new items for a user.

Parameters :

dataset : scipy.sparse.csr_matrix

User-item matrix containing known items.

u : int

Index of user for which to make recommendations.

max_items : int

Maximum number of recommended items to return.

return_scores : bool

If true return a score along with each recommended item.

item_features : array_like, shape = [num_items, num_features]

Optionally supply features for each item in the dataset.

Returns :

recs : list

List of (idx,score) pairs if return_scores is True, else just a list of idxs.

save(filepath)

Serialize model to file.

Parameters :

filepath : str

Filepath to write to, which must have the ‘.npz’ suffix.

Notes

Internally numpy.savez may be used to serialize the model; it appends the ‘.npz’ suffix to the supplied filepath if it is not already present, so a filepath without that suffix would end up renamed on disk, which would most likely cause errors in client code.
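
For example (model and filepath here are hypothetical; model is an already-fitted recommender):

>>> model.save('recommender.npz')                     # filepath must end in '.npz'
>>> model2 = BaseRecommender.load('recommender.npz')  # reload it later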