An online-updating algorithm on probabilistic matrix factorization with active learning for task recommendation in crowdsourcing systems

Dr. YUEN Man-Ching, Connie; King Irwin; Prof. LEUNG Kwong Sak

Please use this identifier to cite or link to this item: http://hdl.handle.net/20.500.11861/9019

Title:	An online-updating algorithm on probabilistic matrix factorization with active learning for task recommendation in crowdsourcing systems
Authors:	Dr. YUEN Man-Ching, Connie; King Irwin; Prof. LEUNG Kwong Sak
Issue Date:	2016
Source:	Big Data Analytics, 2016, vol. 1, article no. 14.
Journal:	Big Data Analytics
Abstract:	Background: To ensure output quality, current crowdsourcing systems rely heavily on redundant answers provided by multiple workers with varying expertise; however, massive redundancy is very expensive and time-consuming. Task recommendation can help requesters receive good-quality output more quickly and help workers find suitable tasks faster. To reduce the cost, a number of previous works adopted active learning in crowdsourcing systems for quality assurance. Active learning is a learning approach that aims to achieve a given accuracy at very low cost. However, previous works do not consider the varying expertise of workers across task categories in real crowdsourcing scenarios, nor do they consider new workers who are unwilling to work on a large number of tasks before receiving a list of recommended preferred tasks. In this paper, we propose ActivePMFv2, Probabilistic Matrix Factorization with Active Learning (version 2), on a task recommendation framework called TaskRec to recommend tasks to workers in crowdsourcing systems for quality assurance. By assigning the most uncertain task for new workers to work on, this paper identifies a flaw in our previous ActivePMFv1, Probabilistic Matrix Factorization with Active Learning (version 1). Therefore, ActivePMFv2 can give new workers a recommended list of preferred tasks faster than ActivePMFv1. Our factor analysis model considers not only worker task selection preference but also worker performance history. It actively selects the most uncertain task for the most reliable workers to work on in order to retrain the classification model. Moreover, we propose a generic online-updating method for learning the ActivePMFv2 model. The larger the profile of a worker (or task), the less important it is to retrain that profile on each new work done. When a worker (or task) has a large profile, our online-updating algorithm retrains the whole feature vector of the worker (or task) and keeps all other entries in the matrix fixed. The online-updating algorithm runs batch updates to reduce the running time of model updates.
Results: Complexity analysis shows that our model is efficient and scalable to large datasets. Experiments on real-world datasets show that the MAE and RMSE of the proposed ActivePMFv2 improve by up to 29% and 35%, respectively, compared with ActivePMFv1, which was itself shown in previous work to significantly outperform PMF with other active learning approaches. Experimental results also show that our online-updating algorithm accurately approximates a full retrain of the learning model while reducing the average runtime of a model update for each work done by more than 80% (from a few minutes to several seconds).
Conclusions: To the best of our knowledge, we are the first to use PMF, active learning and dynamic model update to recommend tasks for quality assurance in crowdsourcing systems in real scenarios.
Type:	Peer Reviewed Journal Article
URI:	http://hdl.handle.net/20.500.11861/9019
ISSN:	2058-6345
DOI:	10.1186/s41044-016-0012-2
Appears in Collections:	Applied Data Science - Publication
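
As a rough illustration of the online-updating idea described in the abstract (retrain one worker's feature vector while keeping all other entries in the matrix fixed), the following Python sketch may help. It is not the authors' algorithm: it assumes a plain linear PMF objective with L2 regularization and ordinary gradient descent, and the function names, hyperparameters and toy data are all hypothetical. It also reports MAE and RMSE, the two metrics named in the abstract.

```python
import math


def predict(u, v):
    """Predicted score: dot product of a worker vector and a task vector."""
    return sum(a * b for a, b in zip(u, v))


def retrain_worker_vector(u, task_vectors, ratings, lam=0.05, lr=0.05, steps=200):
    """Refit ONE worker's latent vector, holding every task vector fixed.

    Minimizes 0.5 * sum_j (u.v_j - r_j)^2 + 0.5 * lam * ||u||^2 by gradient
    descent (an assumed, simplified objective, not taken from the paper).
    ratings maps task id -> observed score for this worker.
    """
    u = list(u)  # copy so the caller's vector is not mutated
    for _ in range(steps):
        grad = [lam * x for x in u]  # gradient of the L2 penalty
        for j, r in ratings.items():
            err = predict(u, task_vectors[j]) - r
            for k, vk in enumerate(task_vectors[j]):
                grad[k] += err * vk  # gradient of the squared error term
        u = [x - lr * g for x, g in zip(u, grad)]
    return u


def mae_rmse(u, task_vectors, ratings):
    """Mean absolute error and root mean squared error on observed ratings."""
    errs = [predict(u, task_vectors[j]) - r for j, r in ratings.items()]
    mae = sum(abs(e) for e in errs) / len(errs)
    rmse = math.sqrt(sum(e * e for e in errs) / len(errs))
    return mae, rmse


# Hypothetical toy data: three tasks with fixed 2-dimensional latent vectors.
task_vectors = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
worker = [0.1, 0.1]            # stale worker vector
ratings = {0: 0.8, 1: 0.2}     # scores already observed for this worker
ratings[2] = 1.0               # a new piece of work arrives

before = mae_rmse(worker, task_vectors, ratings)
updated = retrain_worker_vector(worker, task_vectors, ratings)
after = mae_rmse(updated, task_vectors, ratings)
```

Only the single worker vector changes here, so the cost of an update depends on that worker's number of observed scores rather than on the size of the whole matrix; a full retrain would instead revisit every worker and task vector.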
