A systematic summary of personalized recommendation systems

Most people have heard of personalized recommendation, the idea that the same product shows a different face to each of its thousands of users. But what exactly is a personalized recommendation system? I recently put together a short summary.

Today people face ever-growing information overload. Good personalized recommendations improve the user experience, help users complete tasks in the product more efficiently, retain users better, and ultimately increase the product's profitability.

For e-commerce products, personalized recommendation can also counter the Matthew effect and take advantage of the long tail, so that more of the catalog gets sold and profitability rises.

【Note】

Matthew effect: popular items in a product are seen by more people, so popular items become even more popular and unpopular items become even less popular.

Long tail theory: under certain conditions, the combined market share of the many products with low demand and low sales can rival the market share of the mainstream products.

This summary of recommendation systems is divided into four parts.


First, principles of common recommendation algorithms (including the impact of time and location)

The most common recommendation approaches are:

Content-based recommendation: analyze what the user has already consumed (viewing history and similar content) and recommend similar content.

User-based collaborative filtering (UserCF): recommend items liked by other users whose interests are similar to the target user's.

Item-based collaborative filtering (ItemCF): recommend items similar to the items the user liked before.

Label-based recommendation: items carry labels (tags), and users can be labeled according to their behavior; items are recommended by matching user labels with item labels.

Latent factor model recommendation (LFM, also called the implicit semantic model): connect users and items through latent (implicit) features and recommend items whose features match the user's interests.

Social recommendation: recommend items to a user based on what their friends like.

Time-context recommendation: use the time at which the user accesses the product to refine the recommendations, or follow seasonal changes (for example, recommending Spring Festival items around the Spring Festival).

Location-based recommendation (LARS): recommend based on the user's geographic location.

The first four are the most common. The last two, time-context and location-based recommendation, essentially add a layer of time- or location-based weighting and filtering on top of a basic recommendation algorithm.

The algorithms can also be combined: by weighting the results of different algorithms and tuning those weights, the system can give each user more precise and intelligent recommendations.

(1) Content-based recommendations

Content-based recommendation is the most basic strategy: if you have browsed or purchased content of a certain type, you are recommended other content of the same type.
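As a minimal sketch of this strategy, the Python snippet below assumes every item carries a single category label; the catalogue, item ids and the `content_based_recommend` helper are hypothetical. It ranks unseen items by how often the user interacted with their category, and it skips items the user already has.

```python
from collections import Counter


def content_based_recommend(history, item_category, top_n=3):
    """Recommend unseen items from the categories the user engaged with most.

    history: list of item ids the user browsed or bought (repeats allowed).
    item_category: dict mapping item id -> category (a hypothetical catalogue).
    """
    # Count how often the user interacted with each category.
    category_counts = Counter(item_category[i] for i in history if i in item_category)
    seen = set(history)
    # Candidates: items in those categories that the user has not seen or bought yet.
    candidates = [
        (category_counts[cat], item)
        for item, cat in item_category.items()
        if item not in seen and cat in category_counts
    ]
    # Rank by category affinity, then by item id so the output is deterministic.
    candidates.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item for _, item in candidates[:top_n]]


catalogue = {
    "slr_1": "camera", "slr_2": "camera", "lens_1": "camera",
    "novel_1": "book", "novel_2": "book", "mug_1": "kitchen",
}
print(content_based_recommend(["slr_1", "novel_1", "slr_1"], catalogue))
# ['lens_1', 'slr_2', 'novel_2']
```

Real systems usually compare richer content features (tags, keywords, text); the single-category lookup only illustrates the principle.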

Its advantage is that it is easy to understand; its drawbacks are that it is not very smart and lacks diversity and novelty.

For example, suppose a user decides to buy an SLR camera one day. Buying an SLR, especially a high-end one, is not a frequent behavior, so if the system keeps recommending high-end SLRs to this user afterwards, the conversion rate of those recommendations will be much lower.


Similarly, if recommendations are driven only by browsing history, the system may keep recommending an item I have already bought, and the chance of a repeat purchase is much lower.


(2) User-based collaborative filtering algorithm:

The user-based collaborative filtering (UserCF) algorithm measures the similarity between users from their behavior on different content, and makes recommendations based on that similarity. In essence, it recommends to a user the things that people similar to him are interested in.

For example, suppose the movies you have liked (watched several times) are sci-fi movies such as Aliens, Terminator and Star Wars. Through data analysis, I find another user who has also watched Aliens, Terminator and Star Wars, and who often watches the Avengers movies. You will then most likely enjoy the Avengers too, so I can recommend The Avengers to you.

The following describes UserCF in detail; the other algorithms work in a similar way:

Let N(u) denote the set of items on which user u has given positive feedback.

Let N(v) denote the set of items on which user v has given positive feedback.

The interest similarity between u and v can be measured with the Jaccard formula:

W_{uv} = \frac{|N(u) \cap N(v)|}{|N(u) \cup N(v)|}

or with cosine similarity:

W_{uv} = \frac{|N(u) \cap N(v)|}{\sqrt{|N(u)|\,|N(v)|}}
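For illustration, both similarity measures can be written directly over Python sets; `jaccard` and `cosine` are hypothetical helper names, and an empty set is assumed to give similarity 0.

```python
import math


def jaccard(n_u: set, n_v: set) -> float:
    """|N(u) ∩ N(v)| / |N(u) ∪ N(v)|"""
    union = n_u | n_v
    return len(n_u & n_v) / len(union) if union else 0.0


def cosine(n_u: set, n_v: set) -> float:
    """|N(u) ∩ N(v)| / sqrt(|N(u)| * |N(v)|)"""
    if not n_u or not n_v:
        return 0.0
    return len(n_u & n_v) / math.sqrt(len(n_u) * len(n_v))


A = {"a", "b", "c"}
B = {"a", "c"}
print(jaccard(A, B))  # 2/3
print(cosine(A, B))   # 2/sqrt(6)
```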

For example, suppose user A has acted on (and is therefore interested in) items {a, b, c}, and user B is interested in items {a, c}.


Using the cosine formula, the interest similarity between user A and user B is:

W_{AB} = \frac{|\{a,b,c\} \cap \{a,c\}|}{\sqrt{|\{a,b,c\}|\,|\{a,c\}|}} = \frac{2}{\sqrt{6}}

In practice, many pairs of users have no items in common, i.e. |N(u) ∩ N(v)| = 0. To avoid wasting computation on such pairs, we can first find the user pairs (u, v) with |N(u) ∩ N(v)| ≠ 0, and only divide those by the denominator √(|N(u)||N(v)|).

First, build an inverted list from items to users: for each item, store the list of users who have acted on it. Let the sparse matrix C[u][v] = |N(u) ∩ N(v)|; if users u and v both appear in the user lists of K items in the inverted list, then C[u][v] = K.

So we scan the user list of each item in the inverted list and add 1 to C[u][v] for every pair of users in that list. At the end we obtain C[u][v] for every user pair whose value is non-zero.


For example, with four users we build a 4×4 user similarity matrix. For item a, add 1 to W[A][B] and W[B][A]; for item b, add 1 to W[A][C] and W[C][A]; and so on. After all items have been scanned, W holds the numerator of the cosine similarity formula.

Dividing it by the denominator √(|N(u)||N(v)|) gives the final user interest similarity.
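A sketch of the inverted-list procedure in Python, assuming each user's positive feedback is available as a set of item ids; the toy data for users A to D is invented for illustration. Only user pairs that share at least one item ever receive an entry, which is what makes the sparse C[u][v] cheaper than comparing every pair of users.

```python
import math
from collections import defaultdict
from itertools import combinations


def user_similarity(user_items):
    """Cosine user similarity computed via an item -> users inverted list.

    user_items: dict user -> set of items with positive feedback, i.e. N(u).
    Returns a sparse dict-of-dicts W; pairs with no common item are absent.
    """
    # 1. Inverted list: item -> users who acted on it.
    item_users = defaultdict(set)
    for user, items in user_items.items():
        for item in items:
            item_users[item].add(user)

    # 2. C[u][v] = |N(u) ∩ N(v)|: add 1 for every pair of users sharing an item.
    co_counts = defaultdict(lambda: defaultdict(int))
    for users in item_users.values():
        for u, v in combinations(users, 2):
            co_counts[u][v] += 1
            co_counts[v][u] += 1

    # 3. Divide each non-zero count by sqrt(|N(u)| * |N(v)|).
    return {
        u: {v: c / math.sqrt(len(user_items[u]) * len(user_items[v]))
            for v, c in related.items()}
        for u, related in co_counts.items()
    }


data = {
    "A": {"a", "b", "c"},
    "B": {"a", "c"},
    "C": {"b", "e"},
    "D": {"c", "d", "e"},
}
W = user_similarity(data)
print(W["A"]["B"])  # 2 / sqrt(6)
```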

Once the user similarities are available, UserCF recommends to a user the items liked by the K users whose interests are most similar to his. The degree of interest of user u in item i is computed as:

p(u,i) = \sum_{v \in S(u,K) \cap N(i)} W_{uv} \, r_{vi}

S(u,K): the K users whose interests are closest to user u

N(i): the set of users who have acted on item i

W(uv): interest similarity between user u and user v

r_vi: user v's interest in item i; because implicit feedback from a single type of behavior is used, all r_vi = 1.
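The interest formula can be sketched as follows, assuming a precomputed similarity dictionary `W` (for instance from a cosine computation like the one described above) and implicit feedback, so r_vi = 1. The function name and the parameters `k` and `top_n` are illustrative, not taken from any particular library.

```python
import heapq
from collections import defaultdict


def usercf_recommend(user, user_items, W, k=3, top_n=5):
    """Score p(u, i) = sum over v in S(u, K) ∩ N(i) of W[u][v] * r_vi, with r_vi = 1.

    user_items: dict user -> set of items (implicit positive feedback).
    W: precomputed user similarity, dict user -> dict user -> float.
    """
    interacted = user_items[user]
    # S(u, K): the K users most similar to `user`.
    neighbours = heapq.nlargest(k, W.get(user, {}).items(), key=lambda kv: kv[1])
    scores = defaultdict(float)
    for v, w_uv in neighbours:
        for item in user_items[v]:
            if item in interacted:
                continue  # do not recommend what the user already has
            scores[item] += w_uv * 1.0  # r_vi = 1 for single-behaviour implicit feedback
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


users = {"A": {"a", "b"}, "B": {"a", "c"}, "C": {"b", "c", "d"}}
W = {"A": {"B": 0.5, "C": 0.4}}  # hypothetical precomputed similarities
print(usercf_recommend("A", users, W, k=2))
```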

This formula is still fairly crude: two people buying the same item does not necessarily mean that their interests are the same, so the algorithm can be refined to perform better.

W_{uv} = \frac{\sum_{i \in N(u) \cap N(v)} \frac{1}{\log(1 + |N(i)|)}}{\sqrt{|N(u)|\,|N(v)|}}

The new formula down-weights popular items in the common interest list of users u and v, reducing their influence on the similarity between the two users.
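A sketch of one commonly used form of this penalty, under our own assumption that each shared item i contributes 1/log(1 + |N(i)|) instead of 1 (natural logarithm). The function name and the toy data are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations


def user_similarity_iif(user_items):
    """Cosine user similarity where each shared item contributes 1/log(1 + |N(i)|)
    instead of 1, so very popular items count for less (natural log assumed)."""
    item_users = defaultdict(set)
    for user, items in user_items.items():
        for item in items:
            item_users[item].add(user)

    co = defaultdict(lambda: defaultdict(float))
    for users in item_users.values():
        penalty = 1.0 / math.log(1 + len(users))  # larger |N(i)| -> smaller contribution
        for u, v in combinations(users, 2):
            co[u][v] += penalty
            co[v][u] += penalty

    return {
        u: {v: s / math.sqrt(len(user_items[u]) * len(user_items[v])) for v, s in rel.items()}
        for u, rel in co.items()
    }


sim = user_similarity_iif({"A": {"hot", "niche"}, "B": {"hot", "niche"}, "C": {"hot"}})
print(sim["A"])
```

Here "hot" is shared by three users, so it contributes 1/log(4) to each pair, while "niche" (two users) contributes the larger 1/log(3).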

Each algorithm has its own effects, limitations and shortcomings. In practice, it should be tuned and optimized continuously against the specific product to achieve the best results.

Limitations of UserCF: as the number of users grows, the user similarity matrix grows rapidly, and with it the computation time, complexity and overall cost of the system.

(3) Item-based collaborative filtering algorithm

The item-based collaborative filtering (ItemCF) algorithm computes the similarity between items by analyzing users' behavior records. For example, items A and B are considered highly similar because most users who like A also like B.

For example, after I once searched for a desktop lucky cat figurine, the system recommended me a motorcycle model, which is also a desktop decoration.


ItemCF works in two steps:

1. Compute the similarity between items.

2. Generate a recommendation list based on the item similarities and the user's historical behavior.

W_{ij} = \frac{|N(i) \cap N(j)|}{\sqrt{|N(i)|\,|N(j)|}}

Here N(i) and N(j) are the sets of users who like item i and item j respectively. The structure of ItemCF is basically the same as that of UserCF, so it is not explained further here.
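A compact sketch of both ItemCF steps, assuming implicit feedback stored as a set of items per user. The toy items echo the lucky-cat example, but the data, the function names and the neighbour limit `k` are hypothetical.

```python
import heapq
import math
from collections import defaultdict
from itertools import combinations


def item_similarity(user_items):
    """W(i, j) = |N(i) ∩ N(j)| / sqrt(|N(i)| * |N(j)|), where N(i) is the set of users who like i."""
    item_count = defaultdict(int)               # |N(i)|
    co = defaultdict(lambda: defaultdict(int))  # |N(i) ∩ N(j)|
    for items in user_items.values():
        for i in items:
            item_count[i] += 1
        for i, j in combinations(items, 2):
            co[i][j] += 1
            co[j][i] += 1
    return {
        i: {j: c / math.sqrt(item_count[i] * item_count[j]) for j, c in rel.items()}
        for i, rel in co.items()
    }


def itemcf_recommend(user, user_items, sim, k=10, top_n=5):
    """Score unseen items j by summing W(i, j) over the user's items i (r_ui = 1),
    using only the k most similar neighbours of each item i."""
    history = user_items[user]
    scores = defaultdict(float)
    for i in history:
        for j, w_ij in heapq.nlargest(k, sim.get(i, {}).items(), key=lambda kv: kv[1]):
            if j not in history:
                scores[j] += w_ij
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


purchases = {
    "u1": {"lucky_cat", "moto_model"},
    "u2": {"lucky_cat", "moto_model", "desk_lamp"},
    "u3": {"lucky_cat"},
    "me": {"lucky_cat"},
}
sim = item_similarity(purchases)
print(itemcf_recommend("me", purchases, sim))
```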

No algorithm works for every case: it needs to be continuously tuned and optimized, or simplified to fit the situation.

UserCF's recommendations are more social: they reflect, relatively quickly, how popular items are within the small interest group the user belongs to.

ItemCF's recommendations are more personalized: they reflect the user's own interests, which works best when those interests are stable and long-lasting.

UserCF

Performance: suitable when there are relatively few users; with many users, computing the user similarity matrix is expensive.

Domain: fields where timeliness matters and users' individual interests are not very pronounced.

Real-time: a user's new behavior does not necessarily change the recommendation results immediately.

Cold start: a new user cannot receive personalized recommendations until they have acted on a few items. Once a new item goes online and any user acts on it, the item can be recommended to other users whose interests are similar to that user's.

ItemCF

Performance: suitable when the number of items is much smaller than the number of users; with many items, computing the item similarity matrix is expensive.

Domain: fields with an abundance of long-tail items and strong individual user needs.

Real-time: a user's new behavior always changes the recommendation results in real time.

Cold start: as soon as a new user acts on one item, other items related to it can be recommended. However, a new item cannot be recommended to users until the item similarity table has been updated offline.

Some limitations shared by both algorithms:

If an item is extremely popular, it may appear in everyone's recommendations, so popular items need to be penalized (penalty formula: xxxx).

The most popular items in different domains often end up highly similar to each other, and this cannot be solved with user behavior data alone.

(4) Label-based recommendations

Label-based recommendations generally fall into two types: one labels the user with certain features, the other lets users label items themselves. This section mainly covers users labeling items (UGC labels).

UGC-based label recommendation uses users' labeling behavior as the basis for recommendation; when a user labels an item, the system can also suggest labels suitable for that item. The labels a user chooses to describe items are an important data source reflecting the user's interests.

A user behavior dataset is generally represented as a set of triples, where the record {u, i, b} means that user u has tagged item i with label b (real datasets are more complex and also include user attributes, item attributes, etc.).

(The specific algorithm is omitted here; the aim is to understand the principle.)

There are several ways to suggest labels to users:

1. Recommend the most popular labels in the whole system.

2. Recommend the most popular labels on item i.

3. Recommend the labels the user uses most often.

4. Combine methods 2 and 3: linearly weight their results with a coefficient to produce the final recommendation.
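Method 4 can be sketched like this, assuming the behavior data is a list of (user, item, label) triples. Normalizing each source by its largest count before mixing, and the default coefficient `alpha=0.5`, are assumptions made for the example; the data is hypothetical.

```python
from collections import defaultdict


def recommend_tags(records, user, item, alpha=0.5, top_n=3):
    """Suggest labels for (user, item) by mixing method 2 (popular labels on the item)
    and method 3 (the user's favourite labels). records: iterable of (u, i, b) triples."""
    item_tags = defaultdict(int)  # method 2: label counts on this item
    user_tags = defaultdict(int)  # method 3: label counts used by this user
    for u, i, b in records:
        if i == item:
            item_tags[b] += 1
        if u == user:
            user_tags[b] += 1

    def normalise(counts):
        # Scale by the largest count so both sources lie in [0, 1] before mixing.
        top = max(counts.values(), default=0)
        return {t: c / top for t, c in counts.items()} if top else {}

    it, ut = normalise(item_tags), normalise(user_tags)
    scores = {t: alpha * it.get(t, 0.0) + (1 - alpha) * ut.get(t, 0.0)
              for t in it.keys() | ut.keys()}
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_n]


records = [
    ("u1", "movie1", "sci-fi"), ("u2", "movie1", "sci-fi"), ("u2", "movie1", "classic"),
    ("u1", "movie2", "comedy"), ("u1", "movie3", "comedy"),
]
print(recommend_tags(records, "u1", "movie1"))
```

Setting `alpha=1.0` reduces it to method 2, and `alpha=0.0` to method 3.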

Douban is a common example of label-based (UGC) recommendation.


(5) Latent factor model (LFM)

The core idea of LFM: connect user interests and items through latent (implicit) features.

For a user, first determine his interest categories, then select the items he might like from those categories. To do this, three questions must be answered:

1. How to classify items?

The simplest approach at present is to classify items manually, using different classification schemes for different kinds of items.

Alternatively, latent semantic analysis techniques cluster items automatically based on statistics of user behavior. Well-known models and methods include pLSA, LDA, latent class models, latent topic models and matrix factorization.

2. How do we determine which items a user is interested in, and how strongly?

User behavior in a recommendation system is either implicit feedback or explicit feedback. Here we mainly discuss implicit feedback datasets, which contain only positive samples (what the user likes) and no negative samples (items the user is not interested in). To apply LFM to recommendation on an implicit feedback dataset, negative samples must be generated for each user. There are several ways to do this:

For a user, treat all items he has not acted on as negative samples.

For a user, uniformly sample some of the items he has not acted on as negative samples.

For a user, sample some of the items he has not acted on as negative samples, keeping the numbers of positive and negative samples roughly equal for each user.

For a user, sample some of the items he has not acted on as negative samples, biasing the sampling toward unpopular items.

Some principles for choosing negative samples:

For each user, keep positive and negative samples balanced during sampling.

When sampling negatives for a user, choose items that are popular but that the user has not acted on.

Combining these methods with the frequency of user behavior determines which items the user is interested in, and how strongly.
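A sketch of the sampling principles above for a single user: negatives are drawn without replacement from the items the user has not acted on, with probability proportional to item popularity, and their number is tied to the number of positives by `ratio`. The popularity counts are assumed to be positive, and the function name is hypothetical.

```python
import random


def sample_negatives(positives, item_popularity, ratio=1.0, rng=None):
    """Build a labelled sample set {item: 1 or 0} for one user.

    positives: items the user acted on (label 1).
    item_popularity: dict item -> positive popularity count (e.g. number of users).
    ratio: negatives per positive; 1.0 keeps the two classes balanced.
    Negatives are drawn without replacement from items the user has NOT acted on,
    with probability proportional to popularity, so popular-but-ignored items are favoured.
    """
    rng = rng or random.Random()
    candidates = [i for i in item_popularity if i not in positives]
    weights = [item_popularity[i] for i in candidates]
    n_neg = min(len(candidates), round(len(positives) * ratio))
    negatives = []
    for _ in range(n_neg):
        idx = rng.choices(range(len(candidates)), weights=weights, k=1)[0]
        negatives.append(candidates.pop(idx))
        weights.pop(idx)
    samples = {i: 1 for i in positives}
    samples.update({i: 0 for i in negatives})
    return samples


popularity = {"a": 5, "b": 3, "c": 10, "d": 1, "e": 7}
print(sample_negatives({"a", "b"}, popularity, rng=random.Random(0)))
```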

3. For a given category, which of its items should be recommended to the user, and how should the weight of each item within the category be determined?

This is mainly solved by building on the results of questions 1 and 2: compute and adjust the weights of different items with the algorithm, and keep optimizing the algorithm's parameters through iteration.

The important parameters of LFM are listed below (fully understanding them requires the algorithm's formulas):

Number of latent features F

Learning rate

Regularization parameter

Ratio of negative samples to positive samples

LFM learns its parameters from data, so the model can keep improving itself through repeated training.
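To make the parameter list concrete, here is a minimal stochastic-gradient-descent sketch of LFM on 0/1 samples such as those produced by negative sampling. It assumes a squared-error loss with L2 regularization, which is one common way to train the model and not necessarily the exact formulation intended here; `F`, `lr` and `reg` correspond to the number of latent features, the learning rate and the regularization parameter, and all names are hypothetical.

```python
import math
import random


def train_lfm(samples, F=2, n_iter=100, lr=0.05, reg=0.01, seed=0):
    """Learn latent factors P[u] and Q[i] so that P[u]·Q[i] approximates the 0/1 labels.

    samples: dict user -> dict item -> label (1 = positive, 0 = sampled negative).
    F: number of latent features; lr: learning rate; reg: regularisation parameter.
    """
    rng = random.Random(seed)
    P, Q = {}, {}
    for u, items in samples.items():
        P[u] = [rng.random() / math.sqrt(F) for _ in range(F)]
        for i in items:
            Q.setdefault(i, [rng.random() / math.sqrt(F) for _ in range(F)])

    for _ in range(n_iter):
        for u, items in samples.items():
            for i, label in items.items():
                pu, qi = P[u], Q[i]
                err = label - sum(pu[f] * qi[f] for f in range(F))
                for f in range(F):
                    pu_f = pu[f]  # keep the old value for Q's update
                    pu[f] += lr * (err * qi[f] - reg * pu_f)
                    qi[f] += lr * (err * pu_f - reg * qi[f])
    return P, Q


def predict(P, Q, user, item):
    """Predicted interest of `user` in `item`: the dot product of their latent vectors."""
    return sum(p * q for p, q in zip(P[user], Q[item]))


samples = {"u1": {"a": 1, "b": 1, "c": 0}, "u2": {"a": 1, "c": 0}}
P, Q = train_lfm(samples, F=2, n_iter=300)
print(predict(P, Q, "u1", "a"), predict(P, Q, "u1", "c"))
```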

(6) Social recommendation

According to one survey, when buying items about 90% of users trust their friends' recommendations, and 70% trust other Internet users' reviews of the products.

The most obvious form of social recommendation on the Internet uses social network data. Such data can generally be obtained from the following sources:

Social relationships derived from email

User registration information

User location data (IP address on the web, GPS on mobile phones)

Discussion groups and forums

Friend lists in chat tools

Friend relationships on social networking sites

Social recommendation based on this information can use friend relationships to solve part of the cold start problem.

Situation 1: you arrive through a link shared by a friend who has used the site for a long time and already has recommendation data. Since you have no data on the site yet, the system can recommend items you might like based on your friend's recommendation list.

Situation 2: you have just arrived at a site and have no friends there. To give you social recommendations, the system can first recommend friends to you based on your registration information, location and common interests, and then recommend items through those friends.
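Situation 1 can be sketched as follows: the newcomer's candidate items are simply those their friends have acted on, optionally weighted per friend. The `friend_weight` mapping (for example, a familiarity score) and all names and data are hypothetical.

```python
from collections import defaultdict


def social_recommend(user, friends, user_items, friend_weight=None, top_n=5):
    """Recommend items liked by the user's friends that the user has not acted on.

    friends: dict user -> list of friend ids.
    user_items: dict user -> set of items.
    friend_weight: optional dict friend -> weight (e.g. familiarity); defaults to 1.0.
    """
    seen = user_items.get(user, set())
    scores = defaultdict(float)
    for friend in friends.get(user, []):
        weight = 1.0 if friend_weight is None else friend_weight.get(friend, 1.0)
        for item in user_items.get(friend, set()):
            if item not in seen:
                scores[item] += weight
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


friends = {"newcomer": ["f1", "f2"]}
items = {"f1": {"x", "y"}, "f2": {"y", "z"}}
print(social_recommend("newcomer", friends, items))
```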

(7) Time-context recommendation

Context includes the time, location, mood, etc. at which a user accesses the recommendation system. Time-context recommendation aims to accurately predict the user's interest at a specific moment.

For example, an e-commerce product selling clothes recommends different clothes in winter than in summer, as Taobao's winter recommendations show.


The influence of time on user interest shows mainly in the following ways:

User interests change over time

Items have a life cycle

Seasonal effects

Once time information is taken into account, the recommendation system changes from a static system into a time-varying one, and user behavior data becomes a time series.

Given a dataset, the temporal characteristics of the recommendation system can be studied by computing the following statistics:

How the number of daily unique users grows over time (stable phase, growth phase, decline phase, etc.)

How the items in the system change

How users visit the system

Real-time recommendation

User interests keep changing, and the change shows up as new user behavior. A real-time recommendation system must respond to new behavior in real time, so that the recommendation list keeps changing along with the user's interests.

A real-time recommendation system should satisfy the following:

User behavior is collected in real time (recommendations are computed at the moment the user accesses the system).

The recommendation algorithm itself is real-time (it takes both the user's recent and long-term behavior into account).

Temporal diversity: the degree to which the system's recommendations change from day to day. In some recommendation systems, users often see different recommendations on each visit.

Time-context recommendation algorithms

1. Recommend the latest and most popular items.

2. Time-aware ItemCF: compute item similarities offline from user behavior, then make online personalized recommendations from the user's historical behavior and the item similarity matrix. Time is used in two places:

Item similarity: items a user likes within a short period of time are more similar to each other.

Online recommendation: the user's recent behavior reflects current interests better than long-term behavior does.

3. Time-aware UserCF:

User interest similarity: if two users like the same item at around the same time, their interests are more similar.

Recommendation: favor the items recently liked by users with similar interests.
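The two time effects of time-aware ItemCF can be sketched with a simple decay function. The form 1/(1 + alpha·Δt) and the parameters `alpha`, `beta` and `now` are assumptions chosen for illustration, and timestamps are assumed to be numbers such as days. Time-aware UserCF would apply the same kind of decay to user-user similarity and to neighbours' recent items.

```python
import math
from collections import defaultdict
from itertools import combinations


def time_item_similarity(events, alpha=1.0):
    """Time-aware ItemCF similarity.

    events: dict user -> dict item -> time of the behaviour (e.g. in days).
    Each co-occurrence contributes 1 / (1 + alpha * |t_ui - t_uj|) instead of 1,
    so items liked within a short period count as more similar.
    """
    item_count = defaultdict(int)
    co = defaultdict(lambda: defaultdict(float))
    for times in events.values():
        for i in times:
            item_count[i] += 1
        for i, j in combinations(times, 2):
            decay = 1.0 / (1.0 + alpha * abs(times[i] - times[j]))
            co[i][j] += decay
            co[j][i] += decay
    return {
        i: {j: s / math.sqrt(item_count[i] * item_count[j]) for j, s in rel.items()}
        for i, rel in co.items()
    }


def time_itemcf_recommend(user, events, sim, now, beta=1.0, top_n=5):
    """Weight each historical item j by 1 / (1 + beta * (now - t_uj)) so that
    recent behaviour dominates the online score."""
    history = events[user]
    scores = defaultdict(float)
    for j, t_uj in history.items():
        recency = 1.0 / (1.0 + beta * (now - t_uj))
        for i, w_ji in sim.get(j, {}).items():
            if i not in history:
                scores[i] += w_ji * recency
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


events = {
    "u1": {"coat": 0, "scarf": 1, "sandals": 100},
    "u2": {"coat": 5, "scarf": 5},
    "u3": {"sandals": 50, "sunglasses": 50},
    "me": {"coat": 99, "sandals": 10},  # the coat was bought recently, the sandals long ago
}
sim = time_item_similarity(events)
print(time_itemcf_recommend("me", events, sim, now=100))
```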

(8) Location-based recommendation

The location-based recommendation algorithm (LARS) makes recommendations according to the user's country, city and street, and captures features that relate location to interest, including interest localization and activity localization.

The basic idea of LARS is to split the dataset into subsets according to user location. Locations form a tree (country, province, city, district, county), so the dataset is also split into a tree.

Each user is assigned to a leaf node according to his location; the leaf node contains the behavior data of all users in the same location.

LARS then uses the user behavior data at that leaf node to make recommendations with ItemCF or UserCF.

Each record in the dataset has the form (user, user location, item, item location, item rating).
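A sketch of the partitioning idea, assuming locations are stored as paths in the tree (country, city, district). For brevity, the final step sums the ratings of unseen items within the leaf as a stand-in for running ItemCF or UserCF on that subset; the names and data are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Record(NamedTuple):
    user: str
    user_location: tuple  # path in the location tree, e.g. ("China", "Beijing", "Haidian")
    item: str
    item_location: tuple
    rating: float


def partition_by_location(records, depth):
    """Split the dataset into subsets keyed by the first `depth` levels of the user's location."""
    nodes = defaultdict(list)
    for rec in records:
        nodes[rec.user_location[:depth]].append(rec)
    return nodes


def lars_recommend(user, user_location, records, depth=3, top_n=3):
    """Assign the user to a leaf node and recommend only from behaviour data in that node.

    A plain rating sum stands in here for the ItemCF/UserCF step run on the leaf's data.
    """
    leaf = partition_by_location(records, depth).get(tuple(user_location[:depth]), [])
    seen = {r.item for r in leaf if r.user == user}
    scores = defaultdict(float)
    for r in leaf:
        if r.item not in seen:
            scores[r.item] += r.rating
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


here = ("China", "Beijing", "Haidian")
elsewhere = ("China", "Shanghai", "Xuhui")
records = [
    Record("u1", here, "hotpot_A", here, 5.0),
    Record("u2", here, "noodles_B", here, 4.0),
    Record("u3", elsewhere, "dumplings_C", elsewhere, 5.0),
    Record("me", here, "noodles_B", here, 3.0),
]
print(lars_recommend("me", here, records))
```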

Dianping's recommendations are one example of this approach.


Second, the cold start problem of recommendation systems

The cold start problem arises when a recommendation system has just been deployed and has no user behavior or item data yet, so it cannot recommend items to users based on their behavior. It is generally divided into user cold start, item cold start and system cold start.

Common ways to alleviate the cold start problem:

1. Recommend using registration information: obtain the user's registration information, classify the user, and recommend the items that users in his category are likely to be interested in. The results for the different pieces of information are combined with weights; the more user information is used, the more accurately the user's interests can be matched.

2. Let the user pick from some content to elicit his interests: select popular, representative, distinctive and diverse items and show them to the user.

3. Recommend using item content information: for example, the items to recommend to users can be selected manually.
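Method 1 can be sketched as follows, assuming registration data is a small dict of categorical features (for example gender and age band) and that each feature gets a hand-picked weight; all names, features and weights are hypothetical.

```python
from collections import Counter, defaultdict


def registration_recommend(profile, user_profiles, user_items, feature_weights, top_n=3):
    """Cold-start recommendation from registration data.

    For each registration feature of the new user (e.g. gender, age band), find existing
    users in the same category, compute the share of them who acted on each item, and
    add these shares with a per-feature weight.
    """
    scores = defaultdict(float)
    for feature, value in profile.items():
        peers = [u for u, p in user_profiles.items() if p.get(feature) == value]
        if not peers:
            continue
        counts = Counter(item for u in peers for item in user_items.get(u, ()))
        weight = feature_weights.get(feature, 1.0)
        for item, c in counts.items():
            scores[item] += weight * c / len(peers)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


user_profiles = {
    "u1": {"gender": "F", "age": "20-30"},
    "u2": {"gender": "F", "age": "30-40"},
    "u3": {"gender": "M", "age": "20-30"},
}
user_items = {"u1": {"yoga_mat", "novel"}, "u2": {"yoga_mat"}, "u3": {"game"}}
new_user = {"gender": "F", "age": "20-30"}
print(registration_recommend(new_user, user_profiles, user_items, {"gender": 1.0, "age": 0.5}))
```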

Third, recommendation system architecture


If all kinds of user behaviors, features and tasks were handled in one system, it would become very complex and hard to configure. Therefore, a recommendation system is composed of multiple recommendation engines, each responsible for one kind of feature and one task; the recommendation system itself only merges, sorts and returns the engines' results according to certain weights or priorities.

The advantage is that each engine represents one recommendation strategy, which can be optimized by adjusting that engine alone.
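The merge step can be sketched as a weighted sum of per-engine scores. The engine names, the weights, and the assumption that the engines' scores are on comparable scales are illustrative only.

```python
from collections import defaultdict


def merge_engines(engine_results, engine_weights, top_n=10):
    """Merge per-engine results into one ranked list.

    engine_results: dict engine name -> list of (item, score) from that engine.
    engine_weights: dict engine name -> weight; engines without a weight are ignored.
    """
    merged = defaultdict(float)
    for engine, results in engine_results.items():
        w = engine_weights.get(engine, 0.0)
        for item, score in results:
            merged[item] += w * score
    return sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


results = {
    "itemcf": [("a", 0.9), ("b", 0.5)],
    "hot_items": [("b", 1.0), ("c", 0.8)],
}
weights = {"itemcf": 0.7, "hot_items": 0.3}
print(merge_engines(results, weights))
```

Adjusting one engine's weight changes its influence on the final list without touching the other engines.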

How to design a recommendation engine is therefore the core of recommendation system design.


Module A: fetches user behavior data from the database or cache and generates the current user's feature vector by analyzing the different behaviors.

Module B: converts the user's feature vector into an initial list of recommended items through a feature-item correlation matrix.

Module C: filters and ranks the initial recommendation list to produce the final recommendation result.

Generating user feature vectors: user feature vectors generally come from two sources:

Registration information, including the user's demographic characteristics; this data is stored in advance, so it can be read directly to build the feature vector at recommendation time.

User behavior, from which features are computed.

When generating features from user behavior, the following need to be considered:

The type of behavior (users can act on items in many different ways).

When the behavior occurred (recent behavior is more important).

How often the behavior occurred (items with more behaviors get a higher weight).

The popularity of the item (users often act on very popular items, so when generating user features the recommendation engine increases the weight of the features that correspond to less popular items).
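The four considerations can be combined into one weight per behavior, as in this sketch. The specific forms (a per-type weight, a 1/(1 + decay·Δt) recency factor, accumulation over repeated behaviors, and division by log(2 + popularity)) are assumptions chosen for illustration, as are the names and data.

```python
import math
from collections import defaultdict


def user_feature_vector(behaviours, item_features, item_popularity, now,
                        type_weights, decay=0.1):
    """Build a sparse user feature vector from behaviour records.

    behaviours: list of (item, behaviour_type, timestamp) tuples.
    item_features: dict item -> list of feature ids (e.g. category, tags).
    item_popularity: dict item -> number of users who acted on the item.
    type_weights: dict behaviour_type -> weight (e.g. a purchase > a view).
    """
    vector = defaultdict(float)
    for item, btype, ts in behaviours:
        weight = type_weights.get(btype, 0.0)                    # behaviour type
        weight *= 1.0 / (1.0 + decay * (now - ts))               # recent behaviour counts more
        weight /= math.log(2.0 + item_popularity.get(item, 0))   # damp very popular items
        for feature in item_features.get(item, ()):
            vector[feature] += weight                            # repeated behaviour accumulates
    return dict(vector)


behaviours = [("camera_1", "buy", 9), ("phone_1", "view", 9), ("camera_1", "view", 5)]
features = {"camera_1": ["camera"], "phone_1": ["phone"]}
popularity = {"camera_1": 10, "phone_1": 10}
print(user_feature_vector(behaviours, features, popularity, now=10,
                          type_weights={"buy": 3.0, "view": 1.0}))
```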
