Sparse data structures and convenience methods to load sparse matrices from file.
Bases: object
Adds fast columnar reads and updates to a scipy.sparse.csr_matrix, at the cost of keeping a csc_matrix of equal size as a column-wise index into the same raw data. It is updatable in the sense that you can change the values of all the existing non-zero entries in a given column. Trying to set other entries will result in an error.
For other functionality you are expected to call methods on the underlying csr_matrix:
>>> fsm = fast_sparse_matrix(data) # data is a csr_matrix
>>> col = fsm.fast_get_col(2) # get a column quickly
>>> row = fsm.X[1] # get a row as usual
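The trade-off described above can be sketched in a few lines. This is not mrec's implementation: the class name `ColumnIndexedCSR` and its private attributes are hypothetical, and the sketch assumes the column index simply records, for each column, where that column's entries live in the CSR `data` array.

```python
import numpy as np
from scipy import sparse


class ColumnIndexedCSR:
    """Hypothetical stand-in illustrating the idea behind fast_sparse_matrix."""

    def __init__(self, X):
        self.X = sparse.csr_matrix(X)
        num_rows, num_cols = self.X.shape
        # Row and column of every stored entry, in the order of X.data.
        rows = np.repeat(np.arange(num_rows), np.diff(self.X.indptr))
        cols = self.X.indices.astype(np.int64)
        # Sort stored entries by column, then by row within each column.
        order = np.lexsort((rows, cols))
        self._positions = order      # indices into X.data, grouped by column
        self._rows = rows[order]     # row of each entry, grouped by column
        counts = np.bincount(cols, minlength=num_cols)
        self._indptr = np.concatenate(([0], np.cumsum(counts)))

    def fast_get_col(self, j):
        start, end = self._indptr[j], self._indptr[j + 1]
        rows = self._rows[start:end]
        vals = self.X.data[self._positions[start:end]]
        return sparse.csc_matrix(
            (vals, (rows, np.zeros_like(rows))), shape=(self.X.shape[0], 1)
        )

    def fast_update_col(self, j, vals):
        start, end = self._indptr[j], self._indptr[j + 1]
        vals = np.asarray(vals)
        if vals.shape != (int(end - start),):
            raise ValueError("can only update the existing non-zero entries")
        # Writes go straight into the CSR data array, so X stays in sync.
        self.X.data[self._positions[start:end]] = vals


dense = np.array([[1.0, 0.0, 2.0],
                  [0.0, 3.0, 0.0],
                  [4.0, 0.0, 5.0]])
fsm = ColumnIndexedCSR(dense)
col = fsm.fast_get_col(2)             # column read without scanning every row
fsm.fast_update_col(2, [20.0, 50.0])  # one value per existing non-zero
```

The extra index costs memory proportional to the number of stored entries, which is the price the description above refers to.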
Methods
ensure_sparse_cols(max_density[, remove_lowest]) | Ensure that no column of the matrix exceeds the specified density, setting excess entries to zero where necessary. |
fast_get_col(j) | Return column j of the underlying matrix. |
fast_update_col(j, vals) | Update values of existing non-zeros in column j of the underlying matrix. |
load(filepath) | Load a fast_sparse_matrix from file written by fast_sparse_matrix.save(). |
loadmm(filepath) | Create a fast_sparse_matrix from matrixmarket data. |
loadtxt(filepath[, comments, delimiter, ...]) | Create a fast_sparse_matrix from simply formatted data such as TSV; handles similar input to numpy.loadtxt(). |
save(filepath) | Save to file as arrays in numpy binary format. |
Ensure that no column of the matrix exceeds the specified density, setting excess entries to zero where necessary.
This can be useful to avoid popularity bias in collaborative filtering, by pruning the number of users for popular items:
>>> num_users,num_items = train.shape
>>> f = fast_sparse_matrix(train)
>>> f.ensure_sparse_cols(max_density=0.01)
Now any item in train has non-zero ratings from at most 1% of users.
Parameters : | max_density : float
remove_lowest : boolean (default: True)
|
---|
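The pruning idea can be sketched with plain scipy. This is only an illustration under stated assumptions: the function name `cap_column_density` is hypothetical, density is taken to mean the fraction of rows with a non-zero entry in a column, and the excess entries removed are the lowest-valued ones (one plausible reading of `remove_lowest`).

```python
import numpy as np
from scipy import sparse


def cap_column_density(X, max_density):
    """Return a CSR copy of X in which no column exceeds max_density."""
    csc = sparse.csc_matrix(X, dtype=float, copy=True)
    max_entries = int(max_density * csc.shape[0])
    for j in range(csc.shape[1]):
        start, end = csc.indptr[j], csc.indptr[j + 1]
        excess = (end - start) - max_entries
        if excess > 0:
            # Zero out the lowest-valued entries of this column.
            lowest = np.argsort(csc.data[start:end])[:excess]
            csc.data[start + lowest] = 0.0
    csc.eliminate_zeros()
    return csc.tocsr()


train = np.array([[1.0, 0.0],
                  [2.0, 0.0],
                  [3.0, 5.0],
                  [4.0, 0.0]])
pruned = cap_column_density(train, max_density=0.5)
```

Here the first column starts with four ratings and keeps only its two highest, while the second column is already sparse enough and is left alone.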
Return column j of the underlying matrix.
Parameters : | j : int
|
---|---|
Returns : | col : scipy.sparse.csc_matrix
|
Update values of existing non-zeros in column j of the underlying matrix.
Parameters : | j : int
vals : array like
|
---|
Load a fast_sparse_matrix from file written by fast_sparse_matrix.save().
Parameters : | filepath : str
|
---|
Create a fast_sparse_matrix from matrixmarket data.
Parameters : | filepath : file or str
|
---|---|
Returns : | mat : mrec.sparse.fast_sparse_matrix
|
Create a fast_sparse_matrix from simply formatted data such as TSV; handles similar input to numpy.loadtxt().
Parameters : | filepath : file or str
comments : str, optional
delimiter : str, optional
skiprows : int, optional
usecols : sequence, optional
index_offset : int, optional
|
---|---|
Returns : | mat : mrec.sparse.fast_sparse_matrix
|
Save to file as arrays in numpy binary format.
Parameters : | filepath : str
|
---|
Return the shape of the underlying matrix.
Load a scipy sparse matrix from simply formatted data such as TSV; handles similar input to numpy.loadtxt().
Parameters : | filepath : file or str
comments : str, optional
delimiter : str, optional
skiprows : int, optional
usecols : sequence, optional
index_offset : int, optional
|
---|---|
Returns : | mat : scipy.sparse.csr_matrix
|
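A loader of this kind can be sketched as follows. The function name `load_triples` is hypothetical, the input is assumed to hold one `row, column, value` triple per line, and `index_offset` is assumed to be subtracted from the indices read from file (so 1-indexed data would use `index_offset=1`); none of this is confirmed as mrec's behaviour.

```python
import io

from scipy import sparse


def load_triples(fileobj, delimiter="\t", comments="#", index_offset=0):
    """Build a csr_matrix from lines of `row<delim>col<delim>value`."""
    rows, cols, vals = [], [], []
    for line in fileobj:
        # Drop anything after the comment marker, then skip blank lines.
        line = line.split(comments, 1)[0].strip()
        if not line:
            continue
        row, col, value = line.split(delimiter)[:3]
        rows.append(int(row) - index_offset)
        cols.append(int(col) - index_offset)
        vals.append(float(value))
    # The shape is inferred from the largest indices seen.
    shape = (max(rows) + 1, max(cols) + 1)
    return sparse.coo_matrix((vals, (rows, cols)), shape=shape).tocsr()


tsv = io.StringIO("# user\titem\trating\n1\t1\t5.0\n1\t3\t2.0\n2\t2\t4.0\n")
mat = load_triples(tsv, index_offset=1)
```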
Load a sparse matrix saved to file with savez.
Parameters : | file : str
|
---|---|
Returns : | mat : scipy.sparse.coo_matrix
|
Save a sparse matrix to file in numpy binary format.
Parameters : | d : scipy.sparse.coo_matrix
file : str or file
|
---|
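One way to store a COO matrix in numpy binary format is to save its three defining arrays plus its shape. The sketch below assumes that layout; the function names `save_coo` and `load_coo` and the array keys are hypothetical and need not match the files written by this module.

```python
import io

import numpy as np
from scipy import sparse


def save_coo(d, file):
    """Write the arrays that define a COO matrix with numpy.savez."""
    d = sparse.coo_matrix(d)
    np.savez(file, data=d.data, row=d.row, col=d.col, shape=np.array(d.shape))


def load_coo(file):
    """Rebuild a COO matrix from arrays written by save_coo()."""
    with np.load(file) as archive:
        shape = tuple(int(n) for n in archive["shape"])
        return sparse.coo_matrix(
            (archive["data"], (archive["row"], archive["col"])), shape=shape
        )


original = sparse.coo_matrix(np.array([[0.0, 1.5, 0.0],
                                       [2.5, 0.0, 0.0]]))
buffer = io.BytesIO()       # an in-memory file object stands in for a path
save_coo(original, buffer)
buffer.seek(0)
restored = load_coo(buffer)
```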
Trivial unpersonalized item popularity recommender intended to provide a baseline for evaluations.
Bases: mrec.base_recommender.BaseRecommender
Create an unpersonalized item popularity recommender, useful to provide a baseline for comparison with a “real” one.
Parameters : | method : ‘count’, ‘sum’, ‘avg’ or ‘thresh’ (default: ‘count’)
thresh : float, optional
|
---|
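The four method names suggest simple per-item statistics. The sketch below shows one plausible interpretation on a small dense array, for brevity; the function name `item_popularity` is hypothetical, and the meaning assigned to each method (in particular that 'thresh' counts ratings at or above the threshold) is an assumption rather than documented behaviour.

```python
import numpy as np


def item_popularity(ratings, method="count", thresh=0.0):
    """Score each item (column) of a users x items rating array."""
    ratings = np.asarray(ratings, dtype=float)
    rated = ratings != 0
    counts = rated.sum(axis=0)
    if method == "count":    # number of users who rated the item
        return counts.astype(float)
    if method == "sum":      # total of all ratings for the item
        return ratings.sum(axis=0)
    if method == "avg":      # mean rating over users who rated the item
        return ratings.sum(axis=0) / np.maximum(counts, 1)
    if method == "thresh":   # number of ratings at or above thresh
        return (rated & (ratings >= thresh)).sum(axis=0).astype(float)
    raise ValueError("unknown method: %r" % (method,))


ratings = [[5, 0, 1],
           [4, 2, 0],
           [0, 2, 1]]
by_sum = item_popularity(ratings, "sum")
ranking = np.argsort(-by_sum)  # most popular item first
```

Every user receives the same ranking, which is what makes this kind of recommender useful only as a baseline.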
Methods
batch_recommend_items(dataset[, max_items, ...]) | Recommend new items for all users in the training dataset. |
fit(dataset[, item_features]) | Compute the most popular items using the method specified in the constructor. |
load(filepath) | Load a recommender model from file after it has been serialized with save(). |
range_recommend_items(dataset, user_start, ...) | Recommend new items for a range of users in the training dataset. |
read_recommender_description(filepath) | Read a recommender model description from file after it has been saved by save(), without loading any additional associated data into memory. |
recommend_items(dataset, u[, max_items, ...]) | Recommend new items for a user. |
save(filepath) | Serialize model to file. |
Compute the most popular items using the method specified in the constructor.
Parameters : | dataset : scipy sparse matrix or mrec.sparse.fast_sparse_matrix
item_features : array_like, shape = [num_items, num_features]
|
---|
Recommend new items for a user. Assumes you’ve already called fit().
Parameters : | dataset : scipy.sparse.csr_matrix
u : int
max_items : int
return_scores : bool
item_features : array_like, shape = [num_items, num_features]
|
---|---|
Returns : | recs : list
|
Recommender that gets candidates using an item similarity model and then reranks them using a matrix factorization model.
Bases: mrec.base_recommender.BaseRecommender
A secondary recommender that combines an item similarity model and a matrix factorization one. The item similarity model is used to select candidate items for each user which are then reranked based on their latent factors.
Parameters : | item_similarity_recommender : mrec.item_similarity.recommender.ItemSimilarityRecommender
mf_recommender : mrec.mf.recommender.MatrixFactorizationRecommender
num_candidates : int (default: 100)
|
---|
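The two-stage idea (select candidates by item similarity, then rerank them by latent factors) can be sketched with numpy alone. The function names, the dense arrays, and the choice to exclude items the user has already rated are assumptions made for illustration, not the class's actual code.

```python
import numpy as np


def select_candidates(user_row, item_similarity, num_candidates):
    """Pick the unseen items most similar to the items the user has rated."""
    scores = user_row @ item_similarity
    scores = np.where(user_row != 0, -np.inf, scores)  # skip known items
    order = np.argsort(-scores)[:num_candidates]
    return order[np.isfinite(scores[order])]


def rerank_candidates(user_factors, item_factors, candidates, max_items):
    """Order candidates by the dot product of user and item latent factors."""
    scores = item_factors[candidates] @ user_factors
    order = np.argsort(-scores)[:max_items]
    return [(int(candidates[i]), float(scores[i])) for i in order]


user_row = np.array([1.0, 0.0, 0.0, 0.0])           # the user rated item 0
item_similarity = np.array([[0.0, 0.9, 0.5, 0.1],
                            [0.9, 0.0, 0.2, 0.3],
                            [0.5, 0.2, 0.0, 0.4],
                            [0.1, 0.3, 0.4, 0.0]])
item_factors = np.array([[0.5, 0.5],
                         [0.1, 0.0],
                         [1.0, 0.0],
                         [0.0, 1.0]])
user_factors = np.array([1.0, 0.0])

candidates = select_candidates(user_row, item_similarity, num_candidates=2)
recs = rerank_candidates(user_factors, item_factors, candidates, max_items=2)
```

In this toy data the similarity model proposes items 1 and 2 in that order, and the latent factors then promote item 2 above item 1.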
Methods
batch_recommend_items(dataset[, max_items, ...]) | Recommend new items for all users in the training dataset. |
fit(train[, item_features]) | Fit both models to the training data. |
load(filepath) | Load a recommender model from file after it has been serialized with save(). |
range_recommend_items(dataset, user_start, ...) | Recommend new items for a range of users in the training dataset. |
read_recommender_description(filepath) | Read a recommender model description from file after it has been saved by save(), without loading any additional associated data into memory. |
recommend_items(dataset, u[, max_items, ...]) | Recommend new items for a user. |
rerank(u, candidates, max_items, return_scores) | Use latent factors to rerank candidate recommended items for a user and return the highest scoring. |
save(filepath) | Serialize model to file. |
Recommend new items for all users in the training dataset. Assumes you’ve already called fit() to learn the similarity matrix.
Parameters : | dataset : scipy.sparse.csr_matrix
max_items : int
return_scores : bool
show_progress : bool
item_features : array_like, shape = [num_items, num_features]
|
---|---|
Returns : | recs : list of lists
|
Fit both models to the training data.
Parameters : | train : scipy.sparse.csr_matrix, shape = [num_users, num_items]
item_features : array_like, shape = [num_items, num_features]
|
---|
Notes
You are not obliged to call this; alternatively, you can pass already trained models to the RerankingRecommender constructor.
Recommend new items for a range of users in the training dataset. Assumes you’ve already called fit() to learn the similarity matrix.
Parameters : | dataset : scipy.sparse.csr_matrix
user_start : int
user_end : int
max_items : int
return_scores : bool
item_features : array_like, shape = [num_items, num_features]
|
---|---|
Returns : | recs : list of lists
|
Recommend new items for a user.
Parameters : | dataset : scipy.sparse.csr_matrix
u : int
max_items : int
return_scores : bool
item_features : array_like, shape = [num_items, num_features]
|
---|---|
Returns : | recs : list
|
Use latent factors to rerank candidate recommended items for a user and return the highest scoring.
Parameters : | u : int
candidates : array like
max_items : int
return_scores : bool
|
---|---|
Returns : | recs : list
|
Bases: object
Minimal interface to be implemented by recommenders, along with some helper methods. A concrete recommender must implement the recommend_items() method and should provide its own implementation of __str__() so that it can be identified when printing results.
Notes
In most cases you should inherit from either mrec.mf.recommender.MatrixFactorizationRecommender or mrec.item_similarity.recommender.ItemSimilarityRecommender and not directly from this class.
These provide more efficient implementations of save(), load() and the batch methods to recommend items.
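The shape of the interface can be sketched without mrec itself. Both classes below are hypothetical stand-ins: the default argument values, the half-open user range, and the use of a list of lists as the dataset are assumptions chosen to keep the example self-contained.

```python
class SketchRecommender:
    """Hypothetical stand-in for the minimal recommender interface."""

    def recommend_items(self, dataset, u, max_items=10, return_scores=True):
        raise NotImplementedError("concrete recommenders must implement this")

    def range_recommend_items(self, dataset, user_start, user_end,
                              max_items=10, return_scores=True):
        # Default implementation: one recommend_items() call per user.
        return [self.recommend_items(dataset, u, max_items, return_scores)
                for u in range(user_start, user_end)]

    def batch_recommend_items(self, dataset, max_items=10, return_scores=True):
        return self.range_recommend_items(dataset, 0, len(dataset),
                                          max_items, return_scores)


class FirstUnseenRecommender(SketchRecommender):
    """Toy recommender: suggest the first items the user has not rated."""

    def __str__(self):
        return "FirstUnseenRecommender"

    def recommend_items(self, dataset, u, max_items=10, return_scores=True):
        unseen = [i for i, r in enumerate(dataset[u]) if r == 0][:max_items]
        if return_scores:
            return [(i, 0.0) for i in unseen]
        return unseen


dataset = [[1, 0, 0],
           [0, 0, 1]]
model = FirstUnseenRecommender()
recs = model.batch_recommend_items(dataset, max_items=1, return_scores=False)
```

A subclass only has to supply `recommend_items()` and `__str__()`; the batch methods then work unchanged, although a real recommender can usually replace them with something faster.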
Methods
batch_recommend_items(dataset[, max_items, ...]) | Recommend new items for all users in the training dataset. |
fit(train[, item_features]) | Train on supplied data. |
load(filepath) | Load a recommender model from file after it has been serialized with save(). |
range_recommend_items(dataset, user_start, ...) | Recommend new items for a range of users in the training dataset. |
read_recommender_description(filepath) | Read a recommender model description from file after it has been saved by save(), without loading any additional associated data into memory. |
recommend_items(dataset, u[, max_items, ...]) | Recommend new items for a user. |
save(filepath) | Serialize model to file. |
Recommend new items for all users in the training dataset.
Parameters : | dataset : scipy.sparse.csr_matrix
max_items : int
return_scores : bool
show_progress : bool
item_features : array_like, shape = [num_items, num_features]
|
---|---|
Returns : | recs : list of lists
|
Notes
This provides a default implementation; you will be able to optimize it for most recommenders.
Train on supplied data. In general you will want to implement this rather than computing recommendations on the fly.
Parameters : | train : scipy.sparse.csr_matrix or mrec.sparse.fast_sparse_matrix, shape = [num_users, num_items]
item_features : array_like, shape = [num_items, num_features]
|
---|
Load a recommender model from file after it has been serialized with save().
Parameters : | filepath : str
|
---|
Recommend new items for a range of users in the training dataset.
Parameters : | dataset : scipy.sparse.csr_matrix
user_start : int
user_end : int
max_items : int
return_scores : bool
item_features : array_like, shape = [num_items, num_features]
|
---|---|
Returns : | recs : list of lists
|
Notes
This provides a default implementation; you will be able to optimize it for most recommenders.
Read a recommender model description from file after it has been saved by save(), without loading any additional associated data into memory.
Parameters : | filepath : str
|
---|
Recommend new items for a user.
Parameters : | dataset : scipy.sparse.csr_matrix
u : int
max_items : int
return_scores : bool
item_features : array_like, shape = [num_items, num_features]
|
---|---|
Returns : | recs : list
|
Serialize model to file.
Parameters : | filepath : str
|
---|
Notes
Internally numpy.savez may be used to serialize the model. It adds the ‘.npz’ suffix to the supplied filepath if the suffix is not already present, which would most likely cause errors in client code.
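The suffix behaviour of numpy.savez is easy to check directly. In the sketch below the helper `npz_path` is hypothetical; it only shows how client code could compute the name that will actually appear on disk.

```python
import os
import tempfile

import numpy as np


def npz_path(filepath):
    """Return the path numpy.savez will actually write for a string path."""
    return filepath if filepath.endswith(".npz") else filepath + ".npz"


with tempfile.TemporaryDirectory() as tmpdir:
    requested = os.path.join(tmpdir, "model")
    np.savez(requested, weights=np.arange(3))
    # The file is created with the suffix appended, not at the given path.
    requested_exists = os.path.exists(requested)
    written_exists = os.path.exists(npz_path(requested))
```

Passing a filepath that already ends in ‘.npz’ avoids the mismatch between the name supplied and the name written.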