Open Access
The identifying hidden data features problem solution
Author(s) - S. Yu. Petrova, M. A. Boikova
Publication year - 2019
Publication title - Journal of Physics: Conference Series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.21
H-Index - 85
eISSN - 1742-6596
pISSN - 1742-6588
DOI - 10.1088/1742-6596/1352/1/012039
In this article, we consider recommender models based on matrix factorization, which demonstrate excellent performance in collaborative filtering. The standard matrix factorization approach in MLlib works with explicit ratings; to work with implicit data, we used the trainImplicit method. To simulate the processing of real-time data streams, we used the Spark Streaming library, which receives data from the input source and converts the raw data into a discretized stream (DStream), a sequence of Spark RDDs. The rank parameter determines the number of hidden (latent) features in the low-rank approximation matrices. As a rule, more factors give better results, but for a large number of users or items the rank directly affects the memory usage of the computing system and the amount of training data required. The rank chosen for our problem was therefore a compromise.
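As a rough, self-contained illustration of what trainImplicit optimizes, the following plain-Python sketch implements implicit-feedback alternating least squares in the formulation of Hu, Koren and Volinsky (2008), which the Spark MLlib documentation cites as the basis of its implicit-feedback support. It is not the authors' code and not Spark's distributed implementation. Each interaction count r becomes a binary preference p (1 if r > 0, else 0) with confidence c = 1 + alpha·r, and user and item factor vectors of length rank are solved alternately by regularized weighted least squares. The function names, hyperparameter values and toy count matrix are hypothetical; in PySpark the corresponding entry point is pyspark.mllib.recommendation.ALS.trainImplicit.

```python
import random

# Sketch of implicit-feedback ALS (Hu, Koren & Volinsky 2008) on a tiny dense
# matrix. Hypothetical helper names; not Spark's implementation.


def solve(a, b):
    """Solve the small dense system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def solve_factor(fixed, counts, rank, reg, alpha):
    """One weighted least-squares step: (F^T C F + reg*I) x = F^T C p."""
    a = [[reg if k == l else 0.0 for l in range(rank)] for k in range(rank)]
    b = [0.0] * rank
    for f, r in zip(fixed, counts):
        c = 1.0 + alpha * r          # confidence grows with the interaction count
        p = 1.0 if r > 0 else 0.0    # binary preference
        for k in range(rank):
            b[k] += c * p * f[k]
            for l in range(rank):
                a[k][l] += c * f[k] * f[l]
    return solve(a, b)


def implicit_als(counts, rank=2, reg=0.1, alpha=10.0, iterations=15, seed=0):
    """Alternate user and item solves; counts[u][i] is the implicit count for user u, item i."""
    rng = random.Random(seed)
    n_items = len(counts[0])
    by_item = [list(col) for col in zip(*counts)]
    # Random item factors; user factors are solved from them on the first pass.
    items = [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(n_items)]
    users = []
    for _ in range(iterations):
        users = [solve_factor(items, row, rank, reg, alpha) for row in counts]
        items = [solve_factor(users, col, rank, reg, alpha) for col in by_item]
    return users, items


def score(users, items, u, i):
    """Predicted preference of user u for item i (dot product of the factor vectors)."""
    return sum(x * y for x, y in zip(users[u], items[i]))


# Hypothetical toy data: users 0-1 use items 0-1, users 2-3 use items 2-3.
counts = [
    [4, 2, 0, 0],
    [3, 5, 0, 0],
    [0, 0, 2, 6],
    [0, 0, 4, 1],
]
users, items = implicit_als(counts, rank=2)
```

Because the toy matrix has two disjoint blocks, a rank-2 model has enough latent features to separate them, so the fitted scores should be high within each block and close to zero across blocks. With rank=1 the same data cannot be represented well, which is a small-scale view of why rank matters.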
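Internally, a DStream is a sequence of RDDs, one per batch interval. The plain-Python sketch below imitates only that discretization step, so it runs without a Spark installation: timestamped (user, item) events are cut into consecutive micro-batches of a chosen batch_interval, and running interaction counts per (user, item) pair are kept as the kind of implicit ratings a model such as trainImplicit consumes. The functions discretize and running_totals, the event format and the sample events are hypothetical; in PySpark the batch interval is the batchDuration argument of pyspark.streaming.StreamingContext, and stateful accumulation across batches is typically done with DStream.updateStateByKey.

```python
from collections import Counter, defaultdict

# Hypothetical imitation of DStream discretization; not the Spark Streaming API.


def discretize(events, batch_interval):
    """Cut (timestamp, user, item) events into consecutive micro-batches.

    Each batch holds the (user, item) pairs whose timestamp falls in one
    interval of length batch_interval; empty intervals yield empty batches.
    """
    buckets = defaultdict(list)
    for ts, user, item in events:
        buckets[int(ts // batch_interval)].append((user, item))
    if not buckets:
        return []
    first, last = min(buckets), max(buckets)
    return [buckets.get(k, []) for k in range(first, last + 1)]


def running_totals(batches):
    """After each batch, return the cumulative count per (user, item) pair."""
    totals = Counter()
    snapshots = []
    for batch in batches:
        totals.update(batch)  # each occurrence adds 1 to its (user, item) pair
        snapshots.append(dict(totals))
    return snapshots


# Hypothetical sample stream: (seconds since start, user, item).
events = [
    (0.5, "u1", "i1"),
    (1.2, "u1", "i1"),
    (1.7, "u2", "i3"),
    (3.1, "u1", "i2"),
]
batches = discretize(events, batch_interval=1.0)
snapshots = running_totals(batches)
```

Each snapshot plays the role of the state a streaming job would hold after a batch; a model could then be retrained or updated from it at whatever cadence the system can afford.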
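The memory side of the rank trade-off can be made concrete with a back-of-the-envelope calculation: a rank-k model keeps one k-dimensional factor vector per user and per item, so its factor matrices hold (n_users + n_items) × k numbers, and each ALS least-squares solve forms a k × k matrix. The sketch assumes 8-byte double-precision values; the helper names and the example sizes (one million users, 100,000 items) are hypothetical and are not figures from the paper.

```python
# Hypothetical sizing helpers; assumes 8-byte floating-point values.


def factor_storage(n_users, n_items, rank, bytes_per_value=8):
    """Return (number of values, bytes) held by the user and item factor matrices."""
    values = (n_users + n_items) * rank
    return values, values * bytes_per_value


def gram_matrix_entries(rank):
    """Entries in the rank x rank matrix formed for each user or item solve in ALS."""
    return rank * rank


for rank in (10, 50, 100):
    values, size = factor_storage(1_000_000, 100_000, rank)
    print(f"rank={rank:>3}: {values:,} values, {size / 1e6:.0f} MB, "
          f"{gram_matrix_entries(rank):,} entries per solve")
```

Factor storage grows linearly with rank, while the per-solve matrix grows quadratically (and solving it roughly cubically), which is why a larger rank quickly becomes expensive for many users or items.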
