Systematically Landing Machine Learning onto Market-Scale Mobile Malware Detection

Liangyi Gong; Hao Lin; Zhenhua Li; Feng Qian; Yang Li; Xiaobo Ma; Yunhao Liu

doi:10.1109/TPDS.2020.3046092

Computer Science and Engineering

Research output: Contribution to journal › Article › peer-review

9 Scopus citations

Abstract

Despite being crucial to today's mobile ecosystem, app markets have meanwhile become a natural, convenient malware delivery channel as they actually 'lend credibility' to malicious apps. In the past few years, machine learning (ML) techniques have been widely explored for automated, robust malware detection, but till now we have not seen an ML-based malware detection solution applied at market scales. To systematically understand the real-world challenges, we conduct a collaborative study with T-Market, a popular Android app market that offers us large-scale ground-truth data. Our study illustrates that the key to successfully developing such systems is multifold, including feature selection and encoding, feature engineering and exposure, app analysis speed and efficacy, developer and user engagement, as well as ML model evolution. Failure in any of the above aspects could lead to the 'wooden barrel effect' of the whole system. This article presents our judicious design choices and first-hand deployment experiences in building a practical ML-powered malware detection system. It has been operational at T-Market, using a single commodity server to check ~12K apps every day, and has achieved an overall precision of 98.9 percent and recall of 98.1 percent with an average per-app scan time of 0.9 minutes.

Original language	English (US)
Article number	9301262
Pages (from-to)	1615-1628
Number of pages	14
Journal	IEEE Transactions on Parallel and Distributed Systems
Volume	32
Issue number	7
DOIs	https://doi.org/10.1109/TPDS.2020.3046092
State	Published - Jul 1 2021

Bibliographical note

Publisher Copyright:
© 1990-2012 IEEE.

Keywords

Android emulation
Machine learning
app market
dynamic analysis
mobile malware detection

Cite this

@article{314423b4b9134f8a99657350da0e3faf,

title = "Systematically Landing Machine Learning onto Market-Scale Mobile Malware Detection",

abstract = "Despite being crucial to today's mobile ecosystem, app markets have meanwhile become a natural, convenient malware delivery channel as they actually 'lend credibility' to malicious apps. In the past few years, machine learning (ML) techniques have been widely explored for automated, robust malware detection, but till now we have not seen an ML-based malware detection solution applied at market scales. To systematically understand the real-world challenges, we conduct a collaborative study with T-Market, a popular Android app market that offers us large-scale ground-truth data. Our study illustrates that the key to successfully developing such systems is multifold, including feature selection and encoding, feature engineering and exposure, app analysis speed and efficacy, developer and user engagement, as well as ML model evolution. Failure in any of the above aspects could lead to the 'wooden barrel effect' of the whole system. This article presents our judicious design choices and first-hand deployment experiences in building a practical ML-powered malware detection system. It has been operational at T-Market, using a single commodity server to check $\sim$12K apps every day, and has achieved an overall precision of 98.9 percent and recall of 98.1 percent with an average per-app scan time of 0.9 minutes.",

keywords = "Android emulation, Machine learning, app market, dynamic analysis, mobile malware detection",

author = "Liangyi Gong and Hao Lin and Zhenhua Li and Feng Qian and Yang Li and Xiaobo Ma and Yunhao Liu",

note = "Publisher Copyright: {\textcopyright} 1990-2012 IEEE.",

year = "2021",

month = jul,

day = "1",

doi = "10.1109/TPDS.2020.3046092",

language = "English (US)",

volume = "32",

pages = "1615--1628",

journal = "IEEE Transactions on Parallel and Distributed Systems",

issn = "1045-9219",

publisher = "IEEE Computer Society",

number = "7",

}

TY - JOUR

T1 - Systematically Landing Machine Learning onto Market-Scale Mobile Malware Detection

AU - Gong, Liangyi

AU - Lin, Hao

AU - Li, Zhenhua

AU - Qian, Feng

AU - Li, Yang

AU - Ma, Xiaobo

AU - Liu, Yunhao

PY - 2021/7/1

Y1 - 2021/7/1

N2 - Despite being crucial to today's mobile ecosystem, app markets have meanwhile become a natural, convenient malware delivery channel as they actually 'lend credibility' to malicious apps. In the past few years, machine learning (ML) techniques have been widely explored for automated, robust malware detection, but till now we have not seen an ML-based malware detection solution applied at market scales. To systematically understand the real-world challenges, we conduct a collaborative study with T-Market, a popular Android app market that offers us large-scale ground-truth data. Our study illustrates that the key to successfully developing such systems is multifold, including feature selection and encoding, feature engineering and exposure, app analysis speed and efficacy, developer and user engagement, as well as ML model evolution. Failure in any of the above aspects could lead to the 'wooden barrel effect' of the whole system. This article presents our judicious design choices and first-hand deployment experiences in building a practical ML-powered malware detection system. It has been operational at T-Market, using a single commodity server to check ~12K apps every day, and has achieved an overall precision of 98.9 percent and recall of 98.1 percent with an average per-app scan time of 0.9 minutes.

AB - Despite being crucial to today's mobile ecosystem, app markets have meanwhile become a natural, convenient malware delivery channel as they actually 'lend credibility' to malicious apps. In the past few years, machine learning (ML) techniques have been widely explored for automated, robust malware detection, but till now we have not seen an ML-based malware detection solution applied at market scales. To systematically understand the real-world challenges, we conduct a collaborative study with T-Market, a popular Android app market that offers us large-scale ground-truth data. Our study illustrates that the key to successfully developing such systems is multifold, including feature selection and encoding, feature engineering and exposure, app analysis speed and efficacy, developer and user engagement, as well as ML model evolution. Failure in any of the above aspects could lead to the 'wooden barrel effect' of the whole system. This article presents our judicious design choices and first-hand deployment experiences in building a practical ML-powered malware detection system. It has been operational at T-Market, using a single commodity server to check ~12K apps every day, and has achieved an overall precision of 98.9 percent and recall of 98.1 percent with an average per-app scan time of 0.9 minutes.

KW - Android emulation

KW - Machine learning

KW - app market

KW - dynamic analysis

KW - mobile malware detection

UR - http://www.scopus.com/inward/record.url?scp=85098801035&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85098801035&partnerID=8YFLogxK

U2 - 10.1109/TPDS.2020.3046092

DO - 10.1109/TPDS.2020.3046092

M3 - Article

AN - SCOPUS:85098801035

SN - 1045-9219

VL - 32

SP - 1615

EP - 1628

JO - IEEE Transactions on Parallel and Distributed Systems

JF - IEEE Transactions on Parallel and Distributed Systems

IS - 7

M1 - 9301262

ER -
