TY - JOUR
T1 - Systematically Landing Machine Learning onto Market-Scale Mobile Malware Detection
AU - Gong, Liangyi
AU - Lin, Hao
AU - Li, Zhenhua
AU - Qian, Feng
AU - Li, Yang
AU - Ma, Xiaobo
AU - Liu, Yunhao
N1 - Publisher Copyright: © 1990-2012 IEEE.
PY - 2021/7/1
Y1 - 2021/7/1
N2 - Despite being crucial to today's mobile ecosystem, app markets have meanwhile become a natural, convenient malware delivery channel as they actually 'lend credibility' to malicious apps. In the past few years, machine learning (ML) techniques have been widely explored for automated, robust malware detection, but till now we have not seen an ML-based malware detection solution applied at market scales. To systematically understand the real-world challenges, we conduct a collaborative study with T-Market, a popular Android app market that offers us large-scale ground-truth data. Our study illustrates that the key to successfully developing such systems is multifold, including feature selection and encoding, feature engineering and exposure, app analysis speed and efficacy, developer and user engagement, as well as ML model evolution. Failure in any of the above aspects could lead to the 'wooden barrel effect' of the whole system. This article presents our judicious design choices and first-hand deployment experiences in building a practical ML-powered malware detection system. It has been operational at T-Market, using a single commodity server to check ∼12K apps every day, and has achieved an overall precision of 98.9 percent and recall of 98.1 percent with an average per-app scan time of 0.9 minutes.
AB - Despite being crucial to today's mobile ecosystem, app markets have meanwhile become a natural, convenient malware delivery channel as they actually 'lend credibility' to malicious apps. In the past few years, machine learning (ML) techniques have been widely explored for automated, robust malware detection, but till now we have not seen an ML-based malware detection solution applied at market scales. To systematically understand the real-world challenges, we conduct a collaborative study with T-Market, a popular Android app market that offers us large-scale ground-truth data. Our study illustrates that the key to successfully developing such systems is multifold, including feature selection and encoding, feature engineering and exposure, app analysis speed and efficacy, developer and user engagement, as well as ML model evolution. Failure in any of the above aspects could lead to the 'wooden barrel effect' of the whole system. This article presents our judicious design choices and first-hand deployment experiences in building a practical ML-powered malware detection system. It has been operational at T-Market, using a single commodity server to check ∼12K apps every day, and has achieved an overall precision of 98.9 percent and recall of 98.1 percent with an average per-app scan time of 0.9 minutes.
KW - Android emulation
KW - Machine learning
KW - app market
KW - dynamic analysis
KW - mobile malware detection
UR - http://www.scopus.com/inward/record.url?scp=85098801035&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85098801035&partnerID=8YFLogxK
U2 - 10.1109/TPDS.2020.3046092
DO - 10.1109/TPDS.2020.3046092
M3 - Article
AN - SCOPUS:85098801035
SN - 1045-9219
VL - 32
SP - 1615
EP - 1628
JO - IEEE Transactions on Parallel and Distributed Systems
JF - IEEE Transactions on Parallel and Distributed Systems
IS - 7
M1 - 9301262
ER -