Applying Deep Learning to Android Malware Defense

2023-04-14

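These are reading notes (still unfinished) on the paper Deep Learning for Android Malware Defenses - A Systematic Literature Review, which surveys research trends, research focuses, challenges, and future research directions in DL-based Android malware defense. Original paper: https://dl.acm.org/doi/full/10.1145/3544968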

Abstract

Malicious applications (particularly those targeting the Android platform) pose a serious threat to developers and end-users. Numerous research efforts have been devoted to developing effective approaches to defend against Android malware. However, given the explosive growth of Android malware and the continuous advancement of malicious evasion technologies like obfuscation and reflection, Android malware defense approaches based on manual rules or traditional machine learning may not be effective. In recent years, a dominant research field called deep learning (DL), which provides a powerful feature abstraction ability, has demonstrated a compelling and promising performance in a variety of areas, like natural language processing and computer vision. To this end, employing DL techniques to thwart Android malware attacks has recently garnered considerable research attention. Yet, no systematic literature review focusing on DL approaches for Android malware defenses exists. In this article, we conducted a systematic literature review to search and analyze how DL approaches have been applied in the context of malware defenses in the Android environment. As a result, a total of 132 studies covering the period 2014–2021 were identified. Our investigation reveals that, while the majority of these sources mainly consider DL-based Android malware detection, 53 primary studies (40.1%) design defense approaches based on other scenarios. This review also discusses research trends, research focuses, challenges, and future research directions in DL-based Android malware defenses.

翻译：

恶意应用程序（尤其是针对 Android 平台的应用程序）对开发者和终端用户构成严重威胁。许多研究工作致力于开发有效的 Android 恶意软件防御方法。然而，鉴于 Android 恶意软件的爆炸式增长以及类似混淆和反射的恶意逃避技术的不断进步，基于人工规则或传统机器学习的 Android 恶意软件防御方法可能无效。

近年来，一个名为深度学习（DL）的主导研究领域，具有强大的特征抽象能力，在自然语言处理和计算机视觉等多个领域展示出引人注目且有前景的性能。为此，采用 DL 技术来阻止 Android 恶意软件攻击最近引起了相当多的研究关注。然而，尚未有针对 Android 恶意软件防御的 DL 方法的系统性文献综述。在本文中，我们进行了一项系统性文献综述，以搜索和分析 DL 方法如何在 Android 环境中的恶意软件防御背景下得到应用。结果共识别出覆盖 2014-2021 年期间的 132 项研究。

我们的调查发现，尽管这些来源中的大部分主要考虑基于 DL 的 Android 恶意软件检测，但 53 篇主要研究（占 40.1%）基于其他场景设计防御方法。该综述还讨论了基于 DL 的 Android 恶意软件防御的研究趋势、研究重点、挑战和未来研究方向。

1 INTRODUCTION

Android is one of the most popular smartphone operating systems (OSs), having dominated more than 70% of the mobile OS market share since October 2016, according to a Statista report [138]. Due to its openness and popularity, Android has become one of the primary targets of cyber-attacks [38]. Developers may take advantage of crafted malicious applications to divulge mobile user privacy or perform other dangerous operations on users’ mobiles, which is extremely harmful to mobile users. On the other hand, there is a large scale of Android apps in the real world, with over 3 million Android apps available through the official store, Google Play. Although Google is constantly upgrading its protection against malicious attacks and has developed Google Play Protection (GPP) [52], it is not reliable and researchers have proved that crafted dangerous apps can easily bypass the GPP’s detection [33, 68, 93, 100]. Apart from the official market, there are hundreds of unofficial and third-party markets, where the security of Android apps is highly unpredictable [96, 97, 186, 189]. Therefore, it is a pressing demand to propose an available and reliable approach to defend against malware attacks on the Android platform.

翻译：

Android 是最受欢迎的智能手机操作系统之一，自 2016 年 10 月以来，其在移动操作系统市场份额已占据超过 70%。由于其开放性和流行度，Android 成为了网络攻击的主要目标之一。开发者可能利用精心制作的恶意应用程序泄露手机用户的隐私或在用户手机上执行其他危险操作，这对手机用户来说极为有害。

另一方面，现实世界中有大量的 Android 应用程序，通过官方商店 Google Play 可以获取超过 300 万个 Android 应用。尽管谷歌一直在不断升级其防范恶意攻击的能力，并开发了 Google Play 保护（GPP），但它并不可靠，研究人员已经证明，精心制作的危险应用程序可以轻易绕过 GPP 的检测。除了官方市场外，还有数百个非官方和第三方市场，这些市场上 Android 应用的安全性高度不可预测。因此，提出一种可用且可靠的方法来防御 Android 平台上的恶意软件攻击是迫切的需求。

Android malware defenses are a critical research topic in computer security. Manually analyzing malware, by formulating corresponding rules and inspecting the behaviors and source code of suspicious Android apps, is a time-consuming process—i.e., it does not scale to a large amount of Android software. Besides, with malware techniques constantly evolving, manual malware analysis couldn’t keep pace with the evolving attack strategies. In recent years, a large volume of research related to automatic Android malware analysis has been proposed, utilizing data mining and machine learning approaches to achieve acceptable malware detection performance. These approaches employ a series of machine learning algorithms (e.g., support vector machine, random forest) to build a prediction model based on feature vectors extracted from the Android application package (APK) [13, 133, 168]. However, traditional machine learning algorithms are limited in their ability to learn complicated representations in high-dimensional spaces [83]. In addition, the performance of machine learning models heavily relies on the training data, and these trained models are likely to become obsolete as the Android apps evolve and software engineering advances. What’s more, attackers continue to update their fraud techniques to bypass protection software as well as well-trained machine learning models in order to victimize users and businesses. In front of the increasing difficulty of Android malware defenses, it is non-trivial to construct a robust and transparent defense model or system only by traditional machine learning techniques [191].

翻译：

Android 恶意软件防御是计算机安全领域的一个关键研究课题。通过制定相应的规则并检查可疑 Android 应用的行为和源代码来手动分析恶意软件是一个耗时的过程，即它无法应对大量的 Android 软件。此外，随着恶意软件技术不断发展，手动恶意软件分析无法跟上不断变化的攻击策略。

近年来，大量关于自动 Android 恶意软件分析的研究已经提出，利用数据挖掘和机器学习方法实现可接受的恶意软件检测性能。这些方法采用了一系列机器学习算法（例如，支持向量机、随机森林）来基于从 Android 应用程序包（APK）中提取的特征向量构建预测模型。

然而，传统机器学习算法在学习高维空间中复杂表示的能力上受到限制。此外，机器学习模型的性能在很大程度上依赖于训练数据，而这些训练过的模型很可能随着 Android 应用的演变和软件工程的进步而过时。

更重要的是，攻击者继续更新他们的欺诈技术以绕过保护软件以及经过良好训练的机器学习模型，以便使用户和企业受害。面对 Android 恶意软件防御的日益困难，仅通过传统机器学习技术构建一个强大且透明的防御模型或系统是不易的。

Deep learning has emerged as the dominant research field of machine learning over the last decade, with notable achievements in many domains like speech recognition [8, 60] and image processing [142, 173]. In contrast to conventional machine learning techniques, feature extraction can be performed automatically when deep learning methods are fed with raw data. Deep learning can learn feature representation from the inputted raw data with little prior knowledge, which is the key advantage of deep learning. In 2014, deep learning tools were applied to Android malware defenses and demonstrated superior performance [184]. Subsequently, an increasing number of researchers have developed Android malware defense models or frameworks based on a variety of deep learning techniques. As a result, an up-to-date comprehensive survey of DL-based Android malware defenses is urgently required.

翻译：

在过去的十年里，深度学习已成为机器学习领域的主导研究领域，在诸如语音识别和图像处理等众多领域取得了显著成就。与传统的机器学习技术相比，当深度学习方法输入原始数据时，特征提取可以自动完成。深度学习可以从输入的原始数据中学习特征表示，几乎不需要先验知识，这是深度学习的关键优势。

2014 年，深度学习工具应用于 Android 恶意软件防御，表现出卓越的性能。随后，越来越多的研究人员基于各种深度学习技术开发了 Android 恶意软件防御模型或框架。因此，迫切需要对基于深度学习的 Android 恶意软件防御进行最新的全面调查。

The domain of Android malware defenses has been widely researched in recent years, and we present related contributions of other researchers in Table 1. Several early studies [38, 95, 145] have comprehensively reviewed Android malware techniques and traditional defensive approaches. With the wide use of advanced machine learning techniques, many researchers have reviewed relevant studies on Android malware defenses with machine learning or deep learning [3, 110, 127, 137, 162, 169]. However, these previous works couldn’t provide a complete picture of current research interests and trends on DL-based Android malware defenses though they analyze all possible available methods. First, these previous studies focus only on one aspect of Android malware defenses, using machine learning/deep learning techniques to detect Android malware (ML/DL-based malware detection), but neglect other critical aspects of using DL to prevent/defend against malicious behaviors (e.g., malware evolution, adversarial malware detection, deployment, malware families). While distinguishing malware from benign apps is critical, enhancing Android software security is not a straightforward binary classification task. Indeed, it requires not only locating malicious applications but also comprehending malicious behaviors, to which many researchers have contributed. However, these research studies are overlooked from previous review work, making it difficult for future researchers to comprehend the state of the art of this research field. More importantly, these early surveys are not based on completed systematic approaches, and thus they could not provide a comprehensive overview of the research trends and open issues in this domain. Thus, a number of unanswered questions remain regarding the development of DL-based Android malware defenses. 
For example, the prior works still could not answer what are the state-of-the-art DL-based malware defense approaches (e.g., DL models and feature processing approaches) and what aspects require more research efforts in the future. Furthermore, most previous works focused on relevant studies published before 2019. However, DL-based Android malware defenses have attracted significant research attention in recent 2 years, which means it is necessary to conclude the significant recent research achievements. As a result, this article fills the research gap in this field by conducting a systematic and organized literature review, summarizing previous research and presenting research trends on Android malware defenses related to deep learning.

翻译：

近年来，Android恶意软件防御领域得到了广泛的研究，我们在表1中呈现了其他研究者的相关贡献。

一些早期研究[38, 95, 145]全面回顾了Android恶意软件技术和传统防御方法。随着先进机器学习技术的广泛应用，许多研究人员回顾了与Android恶意软件防御相关的机器学习或深度学习研究[3, 110, 127, 137, 162, 169]。然而，尽管这些先前的工作分析了所有可能的可用方法，但它们无法提供关于基于深度学习的Android恶意软件防御的当前研究兴趣和趋势的完整画像。

首先，这些先前研究仅关注Android恶意软件防御的一个方面，使用机器学习/深度学习技术检测Android恶意软件（基于ML/DL的恶意软件检测），但忽略了使用深度学习防止/抵御恶意行为的其他关键方面（例如，恶意软件演变、对抗性恶意软件检测、部署、恶意软件家族）。

尽管区分恶意软件和良性应用至关重要，但加强Android软件安全并非一个简单的二分类任务。实际上，它不仅需要定位恶意应用，还需要理解恶意行为，许多研究人员对此做出了贡献。然而，这些研究从先前的回顾工作中被忽视，使得未来的研究人员难以理解这个研究领域的最新状况。

更重要的是，这些早期调查并未基于完整的系统方法，因此无法全面概述该领域的研究趋势和未解决的问题。因此，在基于深度学习的Android恶意软件防御发展方面仍存在许多未回答的问题。例如，先前的工作仍无法回答什么是最先进的基于深度学习的恶意软件防御方法（例如，深度学习模型和特征处理方法），以及未来需要在哪些方面投入更多研究努力。

此外，大多数先前的工作关注的是2019年之前发表的相关研究。然而，近两年来，基于深度学习的Android恶意软件防御吸引了大量研究关注，这意味着有必要总结近期显著的研究成果。因此，本文通过进行系统和有组织的文献综述，总结以前的研究并呈现与深度学习相关的Android恶意软件防御的研究趋势，填补了这一领域的研究空白。

Table 1. Summary of Related Work

This survey aims to shape the research area of using DL techniques to defend against Android malware, and position existing works and current progress. Specifically, this article makes the following contributions:

We systematically collect and review 132 primary studies published between 2014 and 2021 on DL-based Android malware defenses.
We present a comprehensive qualitative and quantitative synthesis based on the collected studies. Our synthesis covers the following themes: research objectives, APK characterization, DL techniques, deployment, and model evaluation.
We further enumerate current issues of the existing works from different aspects and provide recommendations based on findings to support further research in this domain.
We provide trend analysis to identify potential future trends for the research community.

翻译：

这份调查旨在为使用深度学习（DL）技术抵御安卓恶意软件的研究领域定位，并将现有成果和当前进展进行梳理。具体来说，本文做出了以下贡献：

我们系统地收集和回顾了2014年至2021年间关于基于深度学习的安卓恶意软件防御的132项主要研究。

我们基于收集的研究提供了全面的定性和定量综合分析。我们的综合分析涵盖以下主题：研究目标、APK特征描述、深度学习技术、部署和模型评估。

我们进一步从不同角度列举现有工作的当前问题，并根据发现提供建议，以支持在该领域的进一步研究。

我们提供趋势分析，以便为研究社区确定潜在的未来趋势。

The remainder of this article is structured as follows: Section 2 presents the review methodology used in this article. Section 3 discusses the reviewed results and open issues for the proposed research questions. Sections 4 and 5 discuss potential implications and possible threats to the validity of this study, respectively. Finally, Section 6 concludes the article.

翻译：

本文的其余部分结构如下：第2节介绍了本文使用的评述方法。第3节讨论了针对所提出的研究问题的回顾结果和待解决的问题。第4节和第5节分别讨论了潜在的启示和可能对本研究有效性构成威胁的因素。最后，第6节总结了本文。

2 REVIEW METHODOLOGY

In this article, we followed the methodology suggested by Kitchenham [79] to conduct a systematic review. The main steps of the Systematic Literature Review (SLR) can be summarized as follows: (1) planning the review and developing a review protocol, (2) identifying research questions, (3) designing search strategies, proposing exclusion criteria, (4) data extraction, and (5) data synthesis. The following subsections discuss the review protocol used in this article. Due to page limitations, we detailed the systematic review process and results online as supplementary materials.

翻译：

在本文中，我们遵循了Kitchenham [79]建议的方法进行了系统评述。系统文献综述（SLR）的主要步骤可以总结如下：（1）规划评审并制定评审协议，（2）确定研究问题，（3）设计检索策略，提出排除标准，（4）数据提取，以及（5）数据综合。以下小节讨论了本文使用的评审协议。由于篇幅限制，我们将系统评审过程和结果详细描述为在线补充材料。

2.1 Research Question

In this article, we seek to investigate the following research questions:

RQ1: What are the research objectives of the DL-based Android malware defense solutions?
RQ2: What approaches have been developed for malware defenses?
- RQ2.1: How are features processed for model training?
- RQ2.2: What DL architectures are used?
- RQ2.3: How are DL-based Android malware defense approaches deployed in practice?
- RQ2.4: How are DL-based Android malware defense approaches evaluated?
RQ3: What are the emerging and potential research trends for DL-based Android malware defenses?

翻译：

在本文中，我们试图探讨以下研究问题：

RQ1：基于深度学习的Android恶意软件防御解决方案的研究目标是什么？

RQ2：已经开发了哪些恶意软件防御方法？

RQ2.1：如何处理特征以进行模型训练？

RQ2.2：使用了哪些深度学习架构？

RQ2.3：如何在实践中部署基于深度学习的Android恶意软件防御方法？

RQ2.4：如何评估基于深度学习的Android恶意软件防御方法？

RQ3：基于深度学习的Android恶意软件防御的新兴和潜在研究趋势是什么？

2.2 Search Strategy

After identifying the research questions, the next step is searching for relevant primary studies. To this end, five popular digital libraries, including IEEE, ACM Digital Library, Springer, Science Direct, and Wiley Online Library, are identified and the search string is constructed based on the search keywords listed in Table 2. To ensure that we did not overlook any significant relevant work, we conducted further searches on two of the most popular research citation engines, Web of Knowledge and Google Scholar. In addition, we also performed a lightweight backward snowballing [80], which means that we only carried out snowballing once, before we identified the final review list.

Group	Keywords
1	Android; Mobile; Smartphone; Phone
2	Malware; Malicious; Malice
3	“Deep learning”; “Deep neural network”; DNN; “Convolutional neural network”; CNN; “Deep belief network”; DBN; “Recurrent neural network”; RNN; “Long short-term memory”; LSTM

Note: * means the plural form. For example, “Phone” refers to “Phone” or “Phones.”

Table 2. Search Keywords
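
As a rough illustration of how the keyword groups in Table 2 could be turned into a search string, the sketch below joins the terms inside each group with OR and then joins the groups with AND. That AND/OR structure is an assumption (the notes do not show the exact string), and each digital library has its own query syntax, so the output would need adapting. The names KEYWORD_GROUPS and build_query are hypothetical.

```python
# Keyword groups copied from Table 2. The AND/OR combination is an assumption,
# not the exact search string used in the paper.
KEYWORD_GROUPS = [
    ["Android", "Mobile", "Smartphone", "Phone"],
    ["Malware", "Malicious", "Malice"],
    [
        '"Deep learning"', '"Deep neural network"', "DNN",
        '"Convolutional neural network"', "CNN",
        '"Deep belief network"', "DBN",
        '"Recurrent neural network"', "RNN",
        '"Long short-term memory"', "LSTM",
    ],
]


def build_query(groups):
    """OR the terms inside each group, then AND the groups together."""
    clauses = ["(" + " OR ".join(terms) + ")" for terms in groups]
    return " AND ".join(clauses)


query = build_query(KEYWORD_GROUPS)
print(query)
```

A study matches this query only if it mentions at least one term from every group, which is what keeps the result set focused on DL applied to Android malware.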

2.3 Data Selection Process

Only those studies related to DL-based Android malware defenses should be considered for further review; therefore, any primary studies that meet any of the proposed exclusion criteria would be deemed irrelevant and would be excluded from the preliminary result set. On the other hand, obtaining all relevant studies doesn’t guarantee that we are able to identify the final list of papers, as it is impossible that the quality of all selected studies is desirable. For this reason, we defined a quality appraisal criterion and evaluated the quality of each paper by reading its full text. The complete list of exclusion criteria and quality appraisal criterion is available at our online supplementary materials. After these steps, we finally obtained 132 primary studies. Table 3 and Figure 1 provide a summary for our examined papers.

Fig. 1. Summary of the examined primary studies.

Table 3. Summary of the Process of Data Search and Selection

Figure 1(a) shows the distribution of the amount of chosen studies over time. Intuitively, the number of publications related to DL-based Android malware defenses has seen a continued increase since 2014. Although we only included the public articles before November 30, 2021, in this review, the number of selected publications in 2021 is still large. These facts demonstrate that the field of Android malware defenses using DL is attracting growing attention, illustrating the critical need for systematic and comprehensive review work to summarize the prior work and current research trends.

On the other hand, we examined the distribution of venue domain and type for these 132 articles, respectively. The results showed that over 35% of primary studies are from Security (SEC) venues, accounting for the most proportion. Both the proportion of Artificial Intelligence (AI) and Software Engineering/Programming Languages (SE/PL) is more than 10%. As for the type of venues, we found the percentage of collected studies published in conferences and journals is quite close, at about 50%. In addition, we counted the frequency of all major venues where our selected studies were published (see Figure 1(b)). The results indicated that these primary studies were mainly collected at top venues, especially the venues in a SEC domain (e.g., CCS, USENIX Security, TIFS) and more and more relevant studies have started to be presented in top venues in a SE domain recently (e.g., ICSE, ASE, and FSE).

3 RESULTS ANALYSIS

In order to answer the research questions presented in Section 2.1, we conducted a detailed review of the selected primary studies.

Section 3.1 discusses the analysis results for RQ1; Sections 3.2, 3.3, 3.4, and 3.5 present the results for RQ2.1, RQ2.2, RQ2.3, and RQ2.4, respectively; while Section 3.6 presents the results of RQ3.

To help our fellow researchers better understand the details for each primary study, we uploaded a detailed table in our online supplementary materials.

3.1 Malware Defenses Objectives

DL techniques have been applied to various aspects of malware defenses to protect mobile users from severe malware attacks. After discussing among all authors and drawing on the classification scheme used in previous surveys by Faruki et al. [38] and Ucci et al. [149], we classify reviewed studies into the following categories: malware detection (binary classification), malware family attribution, repackaged/fake app detection, adversarial learning attacks and protections, malware evolution detection and defense, and malicious behavior analysis. Figure 2 depicts the statistical trends of research objectives for the sources.

翻译：

深度学习技术已应用于恶意软件防御的各个方面，以保护移动用户免受严重的恶意软件攻击。在所有作者之间讨论并借鉴Faruki等人和Ucci等人以前调查中使用的分类方案后，我们将审查过的研究分为以下几类：恶意软件检测（二分类）、恶意软件家族归因、重新打包/伪造应用检测、对抗学习攻击和保护、恶意软件演变检测和防御以及恶意行为分析。

图2描绘了源研究目标的统计趋势。

Fig. 2. Summary of the primary studies by research objectives. Some primary papers contain multiple research objectives, making the sum of percentages more than 100%.

Malware Detection (Binary Classification). As shown in Figure 2, malware detection (binary classification), which determines whether a given application is malicious or benign, receives the most research attention (68%) and the increasing trend is expected to continue. This result is not surprising given that the most urgent task at the moment is to protect mobile users from malicious attacks by automatically distinguishing malware from goodware, which is why many previous surveys have primarily focused on this research topic. Droid-Sec [184] is the first attempt to detect Android malware using DL-based methods. The methodology of Droid-Sec can be summarized as three steps: (1) Android applications collection and labeling, (2) feature extraction and characterization, and (3) DL models training and evaluation. The empirical results of Droid-Sec have demonstrated that DL techniques are much more effective for malware detection compared with traditional machine learning techniques like Support Vector Machine (SVM). In fact, most primary studies related to DL-based malware detection usually follow a similar methodology with Droid-Sec but explore the applicability and effectiveness of different state-of-the-art DL techniques in more complex scenarios, which is consistent with previous literature [127].

翻译：

恶意软件检测（二分类）

如图2所示，恶意软件检测（二分类），即确定给定应用程序是恶意还是良性，受到最多的研究关注（68%），预计这一上升趋势将继续。

鉴于目前最紧迫的任务是通过自动区分恶意软件和良性软件来保护移动用户免受恶意攻击，因此这一结果并不令人意外，这也是为什么许多以前的调查主要关注这一研究主题。

Droid-Sec是第一个尝试使用基于深度学习的方法检测Android恶意软件的项目。Droid-Sec的方法可以总结为三个步骤：（1）Android应用程序的收集和标记；（2）特征提取和表征；（3）深度学习模型的训练和评估。Droid-Sec的实证结果表明，与传统的机器学习技术（如支持向量机（SVM））相比，深度学习技术在恶意软件检测方面要有效得多。

实际上，与基于深度学习的恶意软件检测相关的大多数初级研究通常都遵循类似于Droid-Sec的方法，但在更复杂的场景中探讨不同最先进深度学习技术的适用性和有效性，这与以前的文献一致。

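Droid-Sec is a deep-learning-based approach to detecting Android malware. Its methodology can be summarized in the following three steps:

Android application collection and labeling: Researchers collect a large number of Android applications, both malware and benign software, and label each one as malicious or benign so that the labels can be used when training and evaluating the DL model.
Feature extraction and characterization: Researchers extract features from the applications. These can include static features (such as code structure and resource usage) and dynamic features (such as runtime behavior and network communication). The extracted features characterize an application's behavior and properties; the goal is to find the key features that effectively separate malware from benign software.
DL model training and evaluation: Researchers train a DL model on the extracted features. Candidate models include convolutional neural networks (CNNs), recurrent neural networks (RNNs), and other DL architectures. During training, the model learns to predict from the input features whether an application is malicious or benign. After training, the model is evaluated on applications it has not seen, using metrics such as accuracy, recall, precision, and F1 score.

Droid-Sec's empirical results show that DL techniques are far more effective at malware detection than traditional machine learning techniques such as Support Vector Machine (SVM), which suggests that deep learning has considerable potential for identifying and defending against Android malware.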
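
The evaluation step in the third stage can be made concrete with a small sketch. It computes accuracy, precision, recall, and F1 from binary labels, where 1 means malware and 0 means benign. The function name classification_metrics and the sample labels are made up for illustration; they are not taken from Droid-Sec.

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, recall and F1 for binary labels (1 = malware)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # malware caught
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # benign passed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # benign flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # malware missed

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical ground truth and predictions for eight held-out apps.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Recall matters most when a missed malware sample is costly, while precision matters when false alarms on benign apps are costly; F1 balances the two.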

Malware Family Attribution. Another important aspect of Android malware defenses is malware family attribution. Figure 2 shows that 20 reviewed articles (15%) are specialized for identifying Android malware families. Given the growing number of malware variants, malware can be categorized into certain categories that are associated with different malicious objectives and behaviors, like the Adware family that displays unwanted advertisements to mobile users. In contrast to malware detection (binary classification), malware family attribution identifies which family a malware sample belongs to. Most primary studies like [192] and [141] employ multi-class classification approaches to identify existing or old malware families. As a large number of new malware variants are created, Qiu et al. [126] proposed DL-based approaches to detect zero-day malware families.

翻译：

恶意软件家族归属

Android恶意软件防御的另一个重要方面是恶意软件家族归属。图2显示，20篇审阅过的文章（占15%）专门用于识别Android恶意软件家族。鉴于恶意软件变种数量的增长，恶意软件可以划分为与不同恶意目标和行为相关的某些类别，例如向移动用户显示不必要广告的广告软件家族。

与恶意软件检测（二分类）相比，恶意软件家族归属识别恶意软件样本属于哪个家族。大多数初级研究，例如，采用多类分类方法来识别现有或旧的恶意软件家族。随着大量新的恶意软件变种的产生，邱等人提出了基于深度学习的方法来检测0day恶意软件家族。

Repackaged/Fake App Detection. In 5% of sources, DL-based repackaged/fake app detection is investigated. Attackers can unpack an existing malicious/benign application, modify its contents and repackage it, depriving app developers of revenue and contributing to the spread of malware on mobile devices [94]. For this reason, identifying repackaged or fake applications and analyzing the behaviors of variants is also critical. For example, in order to locate counterfeit mobile applications in application markets, Ullah et al. [150] and Karunanayake et al. [74] propose DL-based Fake app detectors to prevent the publishing of fake apps in app stores.

翻译：

重新打包/伪造应用检测

在5%的资源中，研究了基于深度学习的重新打包/伪造应用检测。攻击者可以解包现有的恶意/良性应用程序，修改其内容并重新打包，剥夺应用程序开发者的收入并促使恶意软件在移动设备上传播。

因此，识别重新打包或伪造的应用程序并分析变种的行为也至关重要。例如，为了在应用程序市场中定位伪造的移动应用程序，Ullah等人和Karunanayake等人提出了基于深度学习的伪造应用检测器，以防止在应用商店中发布伪造应用。

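Repackaging is a common attack technique against Android apps. The attacker implants malicious code into an original application, then repackages it and publishes it to an app store or third-party market. The process can be broken into the following steps:

Obtain the original application: The attacker first obtains the APK (Android application package) file of the original app from an app store or another source.
Unpack the APK: The attacker unpacks the APK with a tool such as apktool to access the app's resource files, manifest (AndroidManifest.xml), and bytecode (usually DEX files, which hold the Dalvik bytecode compiled from the app's Java or Kotlin source).
Implant malicious code: The attacker adds malicious code to the unpacked app. This may involve modifying existing classes or adding new ones to implement malicious functionality (for example, stealing user data, sending SMS messages, or spreading malware). The attacker may also modify the manifest to request additional permissions or change the app's behavior.
Repackage the application: After implanting the code, the attacker rebuilds the modified app into an APK with a tool such as apktool and signs it with their own key, because the original developer's signing key is not available to them. The new APK looks very similar to the original but contains malicious code.
Publish and spread: The attacker publishes the repackaged app to an app store or another channel. Because the original and repackaged apps are hard to tell apart, users may mistakenly download and install the app containing malicious code.

The danger of repackaging is that malicious code can easily be implanted into popular Android apps and thus reach more users. To counter it, researchers and security practitioners need effective repackaged-app detection methods that identify and block these apps. Such methods may analyze app features with deep learning and other machine learning techniques to separate original apps from maliciously repackaged ones.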
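
Since an APK is a zip archive, one naive way to see what repackaging changes is to compare per-entry hashes between an original and a suspect archive. The sketch below does this with the standard library only. It is not a DL-based detector and not a method from the paper; the helper names and the placeholder archive contents are hypothetical, and real repackaging detection has to cope with obfuscation and resource changes that simple hashing cannot handle.

```python
import hashlib
import io
import zipfile


def entry_digests(apk_bytes):
    """Map each archive entry name to the SHA-256 hex digest of its content."""
    with zipfile.ZipFile(io.BytesIO(apk_bytes)) as archive:
        return {name: hashlib.sha256(archive.read(name)).hexdigest()
                for name in archive.namelist()}


def changed_entries(original_apk, suspect_apk):
    """Return entries that were added, removed, or modified in the suspect APK."""
    before = entry_digests(original_apk)
    after = entry_digests(suspect_apk)
    return sorted(name for name in before.keys() | after.keys()
                  if before.get(name) != after.get(name))


def make_fake_apk(entries):
    """Build an in-memory zip standing in for an APK (placeholder bytes only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, data in entries.items():
            archive.writestr(name, data)
    return buffer.getvalue()


original = make_fake_apk({
    "AndroidManifest.xml": b"<manifest/>",
    "classes.dex": b"original bytecode",
    "res/icon.png": b"icon",
})
repackaged = make_fake_apk({
    "AndroidManifest.xml": b"<manifest/>",
    "classes.dex": b"original bytecode + payload",  # implanted code
    "res/icon.png": b"icon",
    "META-INF/CERT.RSA": b"attacker signature",      # re-signed by attacker
})
differences = changed_entries(original, repackaged)
print(differences)  # ['META-INF/CERT.RSA', 'classes.dex']
```

The unchanged manifest and icon are why a repackaged app looks the same to a user, while the differing bytecode and signing material are the kind of signals a detector can pick up.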

Adversarial Learning Attacks and Protections. Figure 2 shows that 16 primary studies (12%) focus on adversarial learning attacks and protections on DL-based malware defenses. Despite the fact that numerous research studies have demonstrated that DL models provide promisingly high performance to identify malware, these models have been shown to be particularly vulnerable to well-designed adversarial attacks [82, 183]. Adversarial attackers could inject a small but intentional perturbation to create adversarial examples, causing the trained models to misclassify adversarial examples. For example, Chen et al. [25] performed adversarial attacks on DNN-based malware detection models, decreasing the accuracy from over 90% to 0%. Consequently, there is a corresponding increase in the attention dedicated to adversarial attacks against malware defense models, as shown in Figure 2. Depending on when the attacks occur, adversarial attacks are split into two main categories: evasion attacks for testing samples and poisoning attacks for training samples. With respect to the two types of adversarial attacks, the majority of sources (14 studies, 87%) discuss evasion attacks and protections for DL-based Android malware defense models, and conversely, only two recent studies focus on poisoning attacks [86, 135]. We discuss more details about this topic in Section 3.6.2.

翻译：

对抗学习攻击与防护

图2显示，16个主要研究（占12%）关注基于深度学习的恶意软件防御中的对抗学习攻击和防护。尽管大量研究已经证明，深度学习模型在识别恶意软件方面具有非常高的性能，但这些模型被证明对精心设计的对抗性攻击特别脆弱。

对抗攻击者可以注入微小但有意的扰动，创建对抗性示例，导致训练过的模型对对抗性示例进行错误分类。例如，Chen等人对基于DNN的恶意软件检测模型进行了对抗性攻击，将准确率从90%以上降低到0%。因此，正如图2所示，针对恶意软件防御模型的对抗性攻击的关注度相应增加。

根据攻击发生的时间，对抗性攻击分为两个主要类别：针对测试样本的逃逸攻击和针对训练样本的投毒攻击。关于这两种类型的对抗性攻击，大部分资源（14项研究，占87%）讨论了基于深度学习的Android恶意软件防御模型的逃逸攻击和防护，相反，仅有两项最近的研究关注投毒攻击。我们在第3.6.2节中详细讨论了这个主题。

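Evasion attacks on testing samples and poisoning attacks on training samples are two different adversarial attack strategies that target machine learning and deep learning models. Both are explained in more detail below.

Evasion Attack

An evasion attack targets a model that has already been trained. At test time the attacker crafts adversarial examples that look no different from normal samples to a human observer but cause the machine learning model to misclassify them.

The attacker exploits weaknesses in the model by adding a small perturbation to the input sample so that the model assigns the adversarial example to another class. The goal of an evasion attack is to slip past the model's detection, so that a malicious sample is judged to be benign.

Evasion attacks are especially important in malware detection: they let malware avoid being recognized by a DL model, so the malware can spread effectively.

Poisoning Attack

A poisoning attack targets the training phase of a model. The attacker implants malicious samples into the training set or tampers with existing samples, which distorts the training process. The model learns the wrong knowledge during training, and its performance and accuracy in real use drop as a result.

The key to a poisoning attack is that the attacker must control the malicious samples or data perturbations precisely enough to influence the model's parameter updates during training. This usually requires some knowledge of the model's training process and algorithm. Poisoning attacks are highly destructive in malware detection because they can cause malware to be judged benign in future detections.

In short, evasion attacks and poisoning attacks target the testing phase and the training phase of a model, respectively. Both aim to degrade the model's performance and accuracy so that malware can escape detection. To guard against them, researchers and security practitioners need effective defenses against adversarial attacks.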
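
To make the evasion idea concrete, here is a toy sketch against a hypothetical linear detector over binary features. The attacker only adds features (so the malicious functionality stays intact) and greedily picks the ones the detector weighs as most benign until the sample is no longer flagged. The weights, feature names, and the evade function are all invented for illustration; the attacks in the cited studies target real DL models and are considerably more involved.

```python
# Hypothetical detector: a positive weight pushes a sample toward "malicious".
WEIGHTS = {
    "SEND_SMS": 2.0,
    "READ_CONTACTS": 1.5,
    "INTERNET": 0.2,
    "VIBRATE": -0.8,
    "SET_WALLPAPER": -1.2,
    "BLUETOOTH": -0.9,
}
BIAS = -1.0


def score(features):
    """Linear score of a sample given the set of features it contains."""
    return BIAS + sum(WEIGHTS[name] for name in features)


def is_flagged(features):
    """The toy detector flags a sample as malware when its score is positive."""
    return score(features) > 0


def evade(features, budget):
    """Add up to `budget` absent, benign-looking features until not flagged."""
    adversarial = set(features)
    candidates = sorted(
        (name for name in WEIGHTS
         if name not in adversarial and WEIGHTS[name] < 0),
        key=WEIGHTS.get,  # most negative weight first
    )
    for name in candidates[:budget]:
        if not is_flagged(adversarial):
            break
        adversarial.add(name)
    return adversarial


malware = {"SEND_SMS", "READ_CONTACTS", "INTERNET"}
adversarial = evade(malware, budget=3)
print(is_flagged(malware), is_flagged(adversarial))
```

The budget stands in for the "small perturbation" constraint: with too small a budget the sample stays flagged, which is one reason defenses try to make the required perturbation large.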

Malware Evolution Detection and Defense. With regard to the malware evolution problem, Figure 2 indicates that only seven papers (5%) attempt to develop solutions for malware evolution, but it is remarkable that all seven papers were published within the last 3 years. Due to the rapid evolution of mobile malware and the emergence of new variants and families, the performance of DL-based malware defenses models decays significantly over time. Pendlebury et al. [119] revealed that the detection performance of DL-based classifiers decreases drastically from almost 90% to below 30% for future malware samples. Thus, model retraining and active learning are applied to reverse and improve aged models by Pendlebury et al. [119]. However, the underlying models are still incapable of distinguishing evolved malware in this manner, as they still rely on humans to determine when models should be retrained. In the light of this issue, recent studies [37, 85, 89, 174, 178, 187] introduce a variety of approaches to slow down the aging of malware defense models, which are further discussed in Section 3.6.3.

翻译：

恶意软件演变检测与防御

关于恶意软件演变问题，图2表明只有七篇论文（5%）试图为恶意软件演变开发解决方案，但值得注意的是，所有七篇论文都是在过去3年内发表的。由于移动恶意软件的快速演变和新变种和家族的出现，基于深度学习的恶意软件防御模型的性能会随着时间的推移显著下降。

Pendlebury等人[119]发现，基于深度学习的分类器对未来恶意软件样本的检测性能从接近90%急剧下降到低于30%。因此，Pendlebury等人[119]应用模型重训练和主动学习来逆转和改善陈旧的模型。然而，由于这些基础模型仍依赖于人类来确定何时应该对模型进行重训练，它们仍无法以这种方式区分演变的恶意软件。鉴于这个问题，最近的研究[37, 85, 89, 174, 178, 187]提出了各种方法来减缓恶意软件防御模型的老化，这些方法将在第3.6.3节进一步讨论。

This is essentially the concept drift problem for models.
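
A time-aware evaluation makes this decay visible: train only on samples up to some year, then measure detection on each later year separately. The sketch below uses a deliberately trivial "model" that memorizes tokens seen in malicious training samples, so newer malware that uses unseen tokens is missed. The samples, token names, and functions are made up for illustration; this is not the procedure of Pendlebury et al. [119], only a toy showing why per-period evaluation exposes aging.

```python
from collections import defaultdict

# (year, observed API tokens, label); all values are invented for illustration.
SAMPLES = [
    (2019, {"sendTextMessage"}, 1),
    (2019, {"getDeviceId"}, 1),
    (2019, {"setText"}, 0),
    (2020, {"sendTextMessage"}, 1),
    (2020, {"loadLibrary"}, 1),
    (2020, {"setText"}, 0),
    (2021, {"loadLibrary"}, 1),
    (2021, {"DexClassLoader"}, 1),
    (2021, {"setText"}, 0),
]


def train(samples):
    """Toy detector: tokens seen in malicious but not in benign training samples."""
    malicious, benign = set(), set()
    for _, tokens, label in samples:
        (malicious if label == 1 else benign).update(tokens)
    return malicious - benign


def predict(model, tokens):
    """Flag a sample (return 1) if it shares any token with the learned set."""
    return 1 if tokens & model else 0


def recall_by_year(samples, train_until):
    """Train on samples up to `train_until`, then report recall per later year."""
    model = train([s for s in samples if s[0] <= train_until])
    hits, totals = defaultdict(int), defaultdict(int)
    for year, tokens, label in samples:
        if year > train_until and label == 1:
            totals[year] += 1
            hits[year] += predict(model, tokens)
    return {year: hits[year] / totals[year] for year in sorted(totals)}


decay = recall_by_year(SAMPLES, train_until=2019)
print(decay)  # {2020: 0.5, 2021: 0.0}
```

A random train/test split over all years would hide this effect, because the model would see future tokens during training.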

Malicious Behavior Analysis. There are six primary studies (5%) related to malicious behavior analysis in collected studies. Malicious behavior analysis aims at identifying or assessing risk behaviors in unknown applications. As for Android malware, malicious behaviors have diverse types, and a malicious application often performs more than one malicious behavior, increasing the difficulty of analysis. In addition, malicious applications may utilize code obfuscation and dynamic payload to conceal malicious behaviors. Hence, it is a relatively challenging research topic to investigate. In order to prevent malicious activities while apps are running, Gronát et al. [53] and Lorenzo et al. [30] employ recurrent neural networks to visualize potential risks for Android malware samples. For Android malware, performing malicious behaviors requires using dangerous semantic features such as permissions and API calls related to users’ privacy. To assist mobile users in determining the security risk before installing unknown applications or granting permissions, some researchers examine the consistency between risk permissions and metadata-based features of apps, like descriptions [39, 42] or icon widgets [170].

翻译：

恶意行为分析

在收集的研究中，有六个主要研究（5%）与恶意行为分析相关。恶意行为分析旨在识别或评估未知应用程序中的风险行为。对于Android恶意软件，恶意行为具有多种类型，恶意应用程序通常会执行多种恶意行为，增加了分析的难度。

此外，恶意应用程序可能利用代码混淆和动态有效载荷来隐藏恶意行为。因此，这是一个相对具有挑战性的研究课题。为了在应用程序运行时防止恶意活动，Gronát等人[53]和Lorenzo等人[30]采用循环神经网络对Android恶意软件样本的潜在风险进行可视化。

对于Android恶意软件，执行恶意行为需要使用与用户隐私相关的危险语义特征，如权限和API调用。为了帮助移动用户在安装未知应用程序或授权权限之前确定安全风险，一些研究人员检查了风险权限与基于元数据的应用程序特征之间的一致性，如描述（descriptions）[39, 42]或图标小部件（icon widgets）[170]。

Discussion. Despite the rapidly growing number of research studies on DL for Android malware defenses, it appears that previous research studies focus on relatively simple application scenarios. More than half of the sources focus on malware detection through various DL strategies. Additionally, most of these existing studies focus on improving malware detection performance through the use of various advanced DL techniques and demonstrate that the newly proposed models outperform prior models on their own experimental datasets. It is noteworthy that an increasing number of recent studies have started to address specific issues to better apply DL-based malware detection models in practice (e.g., on-device malware detection [40, 41], explainable malware detection [167, 197], malware detection on imbalanced data [16, 112]). However, the number of relevant studies remains small. How to improve the robustness, effectiveness, stability, and reliability of malware detectors with the help of DL is an open issue for future researchers.

翻译：

尽管关于Android恶意软件防御的深度学习研究数量迅速增长，但似乎以前的研究主要集中在相对简单的应用场景上。超过一半的文献关注通过各种深度学习策略进行恶意软件检测。此外，这些现有研究中的大部分都集中在通过使用各种先进的深度学习技术来提高恶意软件检测性能，并证明新提出的模型在它们自己的实验数据集上胜过先前的模型。

值得注意的是，越来越多的最近研究开始解决特定问题，以便更好地将基于深度学习的恶意软件检测模型应用于实践（例如，在设备上的恶意软件检测[40, 41]，可解释的恶意软件检测[167, 197]，在不平衡数据上的恶意软件检测[16, 112]）。

然而，相关研究的数量仍然很少。如何在深度学习的帮助下提高恶意软件检测器的鲁棒性、有效性、稳定性和可靠性是未来研究者需要探讨的问题。

Compared with Android malware detection, the number of literature focusing on other research objectives is relatively small, requiring further in-depth research. Taking malware behavior analysis as an example, defining specific malicious behaviors and associating them with the raw code of Android APKs remain challenging issues. Thus, these research objectives require more works to integrate domain knowledge and provide fundamental theoretical construction. Except that, while this review categorizes the existing literature’s research objectives into six categories, the scope of Android malware defenses is actually much broader. Therefore, future research should not be limited to these six categories, but should instead propose Android malware defense approaches that leverage advanced DL techniques in more new application scenarios.

翻译：

与Android恶意软件检测相比，关注其他研究目标的文献数量相对较少，需要进一步深入研究。以恶意软件行为分析为例，定义特定的恶意行为并将它们与Android APKs的原始代码关联仍然是具有挑战性的问题。

因此，这些研究目标需要更多的工作来整合领域知识并提供基本的理论构建。此外，尽管本文综述将现有文献的研究目标归类为六大类别，但Android恶意软件防御的范围实际上要广泛得多。因此，未来的研究不应局限于这六个类别，而应该提出利用先进的深度学习技术在更多新的应用场景中应对Android恶意软件的防御方法。

RQ1 What are the research objectives of the DL-based Android malware defense solutions?

The main objective is still malware detection (binary classification) using DL techniques.
On the whole, 53 primary studies focus on other research topics like malware family attribution and adversarial attacks, and the number is not small and cannot be neglected.
At the beginning, researchers only focused on the field of malware detection, but in recent years, an increasing number of primary studies have applied DL to analyze Android malware in more complex scenarios.

翻译：

RQ1 基于深度学习的Android恶意软件防御解决方案的研究目标是什么？

主要目标仍然是使用深度学习技术进行恶意软件检测（二分类）。

总的来说，有53项主要研究关注其他研究主题，如恶意软件家族归属和对抗性攻击，这个数量不小，不能被忽略。

一开始，研究人员只关注恶意软件检测领域，但近年来，越来越多的主要研究将深度学习应用于分析Android恶意软件的更复杂场景。

3.2 APK Characterization

As a response to RQ2.1, this section discusses the APK feature processing approaches used in the collected studies. Each Android application is packaged as an APK file, a zip archive that primarily contains the app’s manifest and bytecode. Before being fed into DL models, the collected Android APK data needs to be transformed into a formalized representation compatible with DL models. These research studies usually process APK files using reverse-engineering tools (program analysis approaches) and then various raw characteristics (feature categories) are extracted. After that, feature encoding approaches are utilized to perform further feature embedding operations on the raw information extracted from applications. To gain a better understanding of APK characterization mechanisms in DL-based Android malware defenses, we discuss the reviewed results from three perspectives, including program analysis approaches, feature categories, and feature encoding approaches.

翻译：

作为对 RQ2.1 的回应，本节讨论了收集到的研究中使用的 APK 特征处理方法。每个 Android 应用程序都被打包成一个 APK 文件，这是一个主要包含应用程序的manifest和字节码的 zip 压缩包。在将收集到的 Android APK 数据输入到 DL 模型之前，需要将其转换为与 DL 模型兼容的规范化表示。

这些研究通常使用逆向工程工具（程序分析方法）处理 APK 文件，然后提取各种原始特征（特征类别）。之后，利用特征编码方法对从应用程序中提取的原始信息进行进一步的特征嵌入操作。为了更好地了解基于 DL 的 Android 恶意软件防御中的 APK 描述机制，我们从三个角度讨论了审查的结果，包括程序分析方法、特征类别和特征编码方法。
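
To make the "an APK is a zip archive" point concrete, here is a minimal Python sketch that lists the entries of an archive with the standard `zipfile` module. The archive is a toy stand-in built in memory: `build_toy_apk`, `list_apk_entries`, and the placeholder file contents are assumptions for illustration, not a real APK. The reviewed studies rely on tools such as Androguard or APKtool for the actual decoding and disassembly.

```python
import io
import zipfile
from typing import List


def build_toy_apk() -> bytes:
    """Build an in-memory zip that mimics the layout of an APK (placeholder contents)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as apk:
        apk.writestr("AndroidManifest.xml", b"<manifest/>")  # real APKs store binary XML
        apk.writestr("classes.dex", b"dex\n035\x00")         # placeholder, not valid bytecode
    return buffer.getvalue()


def list_apk_entries(apk_bytes: bytes) -> List[str]:
    """Return the sorted names of the files stored in the archive."""
    with zipfile.ZipFile(io.BytesIO(apk_bytes)) as apk:
        return sorted(apk.namelist())


entries = list_apk_entries(build_toy_apk())
print(entries)
```

The manifest and the bytecode are the two entries that the static analysis approaches discussed in this section start from.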

3.2.1 Program Analysis Approaches.

As shown in Figure 3, program analysis approaches to extract raw features from Android APKs can be categorized into three types: static analysis, dynamic analysis, and hybrid analysis.

Static Analysis. Figure 3 presents that the majority of sources (73%) extract raw features using static analysis approaches. Reverse-engineering tools such as Androguard [21] and APKtool [11] are required to disassemble and/or decompile Android APK. The raw information extracted from the APK files is used for further analysis of malicious applications. The extracted information is diverse. Raw binary code and opcode sequence can be fed directly to DL models [59, 66, 141, 196]. Aside from that, high-level semantic features like API calls and permissions are also widely used [54, 77, 140, 188].

翻译：

静态分析

图3显示，大多数来源（73%）使用静态分析方法提取原始特征。需要使用诸如 Androguard 和 APKtool 等逆向工程工具来反汇编和/或反编译 Android APK。从 APK 文件中提取的原始信息用于进一步分析恶意应用程序。提取的信息多样化。原始二进制代码和操作码序列可以直接输入到 DL 模型中。此外，高级语义特征，如 API 调用和权限，也被广泛使用。

Dynamic Analysis. Only 17% of primary studies use dynamic analysis approaches to collect raw features from Android APK files. This finding is not surprising given that dynamic analysis requires executing apps in a protected environment and dynamic analysis can only provide a partial picture of applications (i.e., it is challenging to cover all code) [38, 95]. However, dynamic analysis works by running samples to examine the runtime behaviors and system metrics of Android applications, which is more resilient to malware evasion techniques like obfuscation [63]. Representative dynamic analysis tools include TaintDroid [35], CopperDroid [146], and so on. Dynamic features are obtained by dynamically executing collected app samples in a controlled environment, such as an Android emulator or a real mobile device. Thirteen primary studies employ emulators such as Genymotion to monitor the application’s dynamic behaviors. However, various anti-emulator techniques are developed to conceal malicious activities. Thus, we also discovered that there are seven primary studies focusing on dynamic analysis on real mobile devices. For example, Alzaylaee et al. [5] demonstrated that on-device dynamic analysis performed much better than on-simulator dynamic analysis concerning stability and detecting ability.

翻译：

动态分析

仅有 17% 的主要研究使用动态分析方法从 Android APK 文件中收集原始特征。考虑到动态分析需要在受保护的环境中执行应用程序，而且动态分析只能提供应用程序的部分概况（即覆盖所有代码具有挑战性），这一发现并不令人惊讶。

然而，动态分析通过运行样本来检查 Android 应用程序的运行时行为和系统指标，这对抗恶意软件逃避技术（如混淆）具有更强的适应性。具有代表性的动态分析工具包括 TaintDroid、CopperDroid 等。动态特征是通过在受控环境中动态执行收集的应用程序样本获得的，例如 Android 模拟器或真实移动设备。有 13 项主要研究采用诸如 Genymotion 之类的模拟器来监视应用程序的动态行为。然而，各种反模拟器技术被开发出来以掩盖恶意活动。此外，我们还发现有七项主要研究关注在真实移动设备上进行动态分析。例如，Alzaylaee 等人证明，在设备上的动态分析在稳定性和检测能力方面比在模拟器上的动态分析表现得更好。

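Here, a "protected environment" means a safe, controlled environment in which an application can be run and observed without harming other applications or the system. For dynamic analysis, researchers typically use a sandbox, an emulator, or a dedicated test device to provide the operating system and hardware environment. Even if the application contains malicious code, it cannot reach real user data or other applications, so its runtime behavior and system metrics can be observed safely to uncover potentially malicious activity.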

Hybrid Analysis. Figure 3 presents that 10% of primary studies involve hybrid program analysis (which combines static and dynamic analysis). Static program analysis has the advantage of providing full code coverage at a lower computational cost but it is vulnerable to evasion techniques like obfuscation, while dynamic program analysis allows for the analysis of runtime behaviors in a controlled environment but the code coverage may be limited [23, 154]. Although hybrid analysis leverages the complementary strengths of both types of program analyses, it is still computationally intensive, which may explain why the number of related studies is small.

翻译：

混合分析

图3显示，10% 的主要研究涉及混合程序分析（将静态分析和动态分析结合在一起）。静态程序分析具有在较低计算成本下提供完整代码覆盖率的优势，但容易受到混淆等逃避技术的影响；而动态程序分析允许在受控环境中分析运行时行为，但代码覆盖率可能有限。尽管混合分析利用了这两种类型程序分析的互补优势，但它仍然具有较高的计算量，这也许可以解释为什么相关研究的数量较少。

3.2.2 Feature Categories. ⭐

As illustrated in Figure 4, extracted features can be summarized into 13 categories, indicating the diversity of raw feature types. Note that many studies may combine multiple types of features in order to accurately represent a malicious application.

翻译：

如图4所示，提取的特征可以归纳为 13 个类别，表明原始特征类型的多样性。请注意，许多研究可能会结合多种类型的特征，以便准确表示恶意应用程序。

Fig. 4. Summary of the primary studies by feature categories.

As can be observed from Figure 4, semantic features are the most common. API calls (55.3%) and permissions (51.5%) have been the most frequently used feature types, accounting for well over half of primary studies. A possible explanation for this might be that API calls and permissions carry sufficient semantics and that the risk API calls and permissions usually result in dangerous or malicious behavior. Other types of semantic information extracted from the decompiled code such as filtered intents and app components are also used by a large number of primary studies. There are also 13 primary studies (10%) using program graphs like Control Flow Graph (CFG) and Dataflow Graph (DFG) to represent an application when analyzing Android malware. Apart from the semantic information extracted from decompressed APK, we find eight recent studies leverage app metadata such as icons and app descriptions for the subsequent analysis.

翻译：

正如图4所示，语义特征是最常见的。API 调用（55.3%）和权限（51.5%）是使用频率最高的特征类型，占主要研究的一半以上。可能的解释是 API 调用和权限具有足够的语义，而且 API 调用和权限通常会导致危险或恶意行为。从反编译代码中提取的其他类型的语义信息，如过滤意图（filtered intents）和应用程序组件，也被大量的主要研究所使用。还有13项主要研究（10%）使用诸如控制流图（CFG）和数据流图（DFG）之类的程序图来表示在分析 Android 恶意软件时的应用程序。除了从解压缩的 APK 中提取的语义信息外，我们发现了8项最近的研究利用应用元数据（如图标和应用描述）进行后续分析。

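API calls: an API (application programming interface) call is an invocation, made while the application runs, of an interface provided by the system or by a third party. These interfaces let the application interact with the operating system, the hardware, and other software components. When analyzing malware, researchers usually focus on API calls that can be abused for malicious behavior, such as accessing user data, sending SMS messages, or launching other applications.

Some examples of API calls that malware may use to carry out malicious behavior:
- Runtime.exec(): lets an application execute system commands. Malware may use it to carry out attacks such as installing further malicious apps, wiping data, or running commands received remotely.
- TelephonyManager.getDeviceId(): returns the device's IMEI. Malware may upload it to the attacker's server as a unique device identifier for tracking. (The method is deprecated and restricted on recent Android versions, so this mainly applies to older devices.)
- SmsManager.sendTextMessage(): sends an SMS message. Malware may use it to send premium-rate messages to a chosen number as a form of fraud.

Permissions: in Android, a permission grants an application access to a device capability or to data. To access sensitive resources (such as the user's contacts, location, or camera), an application must declare the required permissions in its manifest file (AndroidManifest.xml). When analyzing malware, researchers focus on permissions that can be used for malicious behavior, such as reading user data, sending SMS messages, or accessing location information.

Some examples of permissions that malware may use to carry out malicious behavior:
- android.permission.READ_PHONE_STATE: gives access to the phone state (such as the phone number and current network state, and on older Android versions the IMEI). Malware may use it to collect private information and upload it to the attacker's server.
- android.permission.ACCESS_FINE_LOCATION: gives access to the device's precise location. Malware may use it to track the user for targeted attacks or fraud.
- android.permission.SEND_SMS: lets the application send SMS messages. Malware may use it to send premium-rate messages to a chosen number as a form of fraud.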
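
Intent filters: in Android, an intent (Intent) is a message object that describes an interaction between application components. An intent filter declares which kinds of intents a component (an activity, service, or broadcast receiver) is able to handle. By analyzing intent filters, researchers can see which intents a malware sample responds to, which helps reveal its attack behavior.

The following simplified example shows how intent filters can be used in Android malware detection.

Suppose we have an Android application named ExampleMalware, installed from an external source (not the Google Play store). Its AndroidManifest.xml contains the following suspicious intent filter:
```
<uses-permission android:name="android.permission.RECEIVE_BOOT_COMPLETED" />

<receiver android:name="com.example.malware.MaliciousReceiver">
    <intent-filter>
        <action android:name="android.intent.action.BOOT_COMPLETED" />
    </intent-filter>
</receiver>
```
In this example, the intent filter of the MaliciousReceiver component listens for the BOOT_COMPLETED action. BOOT_COMPLETED is delivered as a broadcast, so it is a broadcast receiver (not an activity) that receives it, and the app must also hold the RECEIVE_BOOT_COMPLETED permission. The receiver is therefore invoked automatically once the device has finished booting. Malware often uses this trick for persistence, so that it can keep running its malicious behavior after a reboot. Legitimate apps listen for this action too, so on its own it is a signal rather than proof.

To learn more about the application's potentially malicious behavior, we can statically analyze its source code or use reverse-engineering tools (such as jadx or apktool) to inspect the decompiled code. This can reveal more about what the application actually does.

If suspicious code is found in MaliciousReceiver (for example stealing user data, sending SMS messages, or performing other malicious operations), we can conclude that the application is malware.

By analyzing an application's intent filters, security researchers and anti-malware tools can spot suspicious or malicious behavior and help protect user devices from malware. This is only one way to detect and analyze Android malware; others include dynamic analysis and API call tracing.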
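
As a rough illustration of how intent-filter features could be pulled out of a manifest automatically, the Python sketch below parses a decoded AndroidManifest.xml with the standard `xml.etree.ElementTree` module and maps each component to the actions it declares. The manifest text, the component names, and the `intent_actions` helper are hypothetical. Inside an APK the manifest is stored as binary XML, so the sketch assumes a tool such as apktool has already decoded it to text.

```python
import xml.etree.ElementTree as ET

ANDROID_NS = "http://schemas.android.com/apk/res/android"
NAME_ATTR = "{%s}name" % ANDROID_NS  # ElementTree spells android:name this way

# Hypothetical, already-decoded manifest used only for illustration.
MANIFEST = """\
<manifest xmlns:android="http://schemas.android.com/apk/res/android"
          package="com.example.malware">
  <uses-permission android:name="android.permission.RECEIVE_BOOT_COMPLETED" />
  <application>
    <receiver android:name="com.example.malware.BootReceiver">
      <intent-filter>
        <action android:name="android.intent.action.BOOT_COMPLETED" />
      </intent-filter>
    </receiver>
    <activity android:name="com.example.malware.MainActivity">
      <intent-filter>
        <action android:name="android.intent.action.MAIN" />
      </intent-filter>
    </activity>
  </application>
</manifest>
"""


def intent_actions(manifest_xml):
    """Map each activity/service/receiver name to the intent actions it declares."""
    root = ET.fromstring(manifest_xml)
    result = {}
    for component in root.iter():
        if component.tag not in {"activity", "service", "receiver"}:
            continue
        actions = [a.get(NAME_ATTR) for a in component.findall("./intent-filter/action")]
        if actions:
            result[component.get(NAME_ATTR)] = actions
    return result


actions = intent_actions(MANIFEST)
# Components that react to device boot are candidates for a closer look.
boot_listeners = [name for name, acts in actions.items()
                  if "android.intent.action.BOOT_COMPLETED" in acts]
print(boot_listeners)
```

The resulting action names are categorical features, so they can be fed into the same presence-vector encodings that are used for permissions and API calls.
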
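
Application components: an Android application is built from several kinds of components, namely activities (Activity), services (Service), broadcast receivers (BroadcastReceiver), and content providers (ContentProvider). Between them they handle the user interface, background work, and system events. When analyzing malware, researchers usually focus on components that are likely to be tied to malicious behavior, such as hidden activities, malicious services, or broadcast receivers.

The following simplified example shows how application components can be used in Android malware detection.

Suppose we have an Android application named ExampleMalware, installed from an external source (not the Google Play store). Its AndroidManifest.xml declares the following suspicious Service component: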
```
<service android:name="com.example.malware.MaliciousService" android:exported="true">
    <intent-filter>
        <action android:name="com.example.malware.START_MALICIOUS_SERVICE" />
    </intent-filter>
</service>
```
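In this example, MaliciousService is declared as exported, which means that other applications can start it and interact with it. Malware may use this to run background tasks on the device.

To learn more about the application's potentially malicious behavior, we can statically analyze its source code or use reverse-engineering tools (such as jadx or apktool) to inspect the decompiled code. Suppose that a closer look at MaliciousService turns up the following suspicious snippet: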
```
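import android.app.Service;
import android.content.Intent;
import android.os.IBinder;

public class MaliciousService extends Service {
    @Override
    public int onStartCommand(Intent intent, int flags, int startId) {
        // Run the payload on a background thread so the main thread is not blocked.
        new Thread(new Runnable() {
            @Override
            public void run() {
                while (true) {
                    stealContacts();
                    sendSMS();
                    try {
                        Thread.sleep(60 * 60 * 1000); // Sleep for 60 minutes
                    } catch (InterruptedException e) {
                        // Restore the interrupt flag and stop the loop.
                        Thread.currentThread().interrupt();
                        return;
                    }
                }
            }
        }).start();

        // START_STICKY asks the system to recreate the service after it is killed.
        return START_STICKY;
    }

    @Override
    public IBinder onBind(Intent intent) {
        // Not a bound service; the override is required because Service.onBind is abstract.
        return null;
    }

    private void stealContacts() {
        // Code to steal contacts from the device
    }

    private void sendSMS() {
        // Code to send SMS messages without user's consent
    }
}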
```
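Once started, MaliciousService keeps stealing the user's contact data and sending SMS messages. This behavior is clearly malicious, so we can conclude that the application is malware.

Control flow graph (CFG): a control flow graph is a graph representation of the structure of control flow in program code, such as conditional branches and loops. When analyzing malware, researchers can use the CFG to understand its internal logic and attack techniques.

Data flow graph (DFG): a data flow graph is a graph representation of how data values are passed between operations in program code. When analyzing malware, researchers can use the DFG to trace where data comes from and where it propagates, which can reveal potential attack behavior.

App metadata: app metadata is information that describes the application, such as its icon, name, developer information, and description. Some of it (for example the icon and app name) is packaged in the APK file, while items such as the description are typically shown to users on the app store page. When analyzing malware, researchers can use this metadata to identify how malicious apps are distributed and how they disguise themselves.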

Although the aforementioned features are usually extracted via static analysis, we discover two distinct dynamic features. Eighteen primary studies employ Linux kernel system calls as extracted features to capture malicious behaviors. Unlike API calls, Linux kernel system calls are not dependent on the Android OS version, making them more resilient to malware evasion strategies [63]. Additionally, 14 primary studies examine characteristics associated with dynamic activities such as network access and memory dump. These observations from Figure 4 corroborate those from Figure 3, indicating that static analysis is the most frequently occurring approach for program analysis.

翻译：

尽管上述特征通常是通过静态分析提取的，但我们发现两种不同的动态特征。有18项主要研究使用 Linux 内核系统调用作为提取特征以捕获恶意行为。与 API 调用不同，Linux 内核系统调用不依赖于 Android 操作系统版本，使它们更能抵御恶意软件的规避策略。此外，14项主要研究检查与动态活动（dynamic activities）相关的特征，如网络访问和内存转储。图4的观察结果与图3的观察结果相互印证，表明静态分析是程序分析中最常用的方法。

Although high-level semantic features such as API calls remain the most commonly used, there is an increasing number of primary studies using raw code sequences to construct feature vectors. Figure 4 indicates that the most frequently occurring raw code feature is raw opcode sequences from disassembled Android apps (with 22 studies). The raw opcode sequences are fed into deep neural networks to learn high-level semantic feature representation automatically [127]. Notice that four primary sources convert disassembled code to Java source code to construct feature vectors. On the other hand, we find that 13 primary studies fed the deep neural models with the raw classes.dex bytecode. For example, R2-D2 [67] converts bytecode into a color image by mapping the bytecode’s hexadecimal value to the RGB color code.

翻译：

尽管 API 调用等高级语义特征仍然是最常用的，但使用原始代码序列构建特征向量的主要研究数量正在增加。图4表明，最常出现的原始代码特征是从反汇编的 Android 应用程序中提取的原始操作码序列（22项研究）。

原始操作码序列（The raw opcode sequences）被输入到深度神经网络中，自动学习高级语义特征表示。注意到有四个主要来源将反汇编代码转换为 Java 源代码来构建特征向量。另一方面，我们发现有13项主要研究将原始的 classes.dex 字节码输入到深度神经模型中。例如，R2-D2 将字节码转换为彩色图像，通过将字节码的十六进制值映射到 RGB 颜色代码。

3.2.3 Feature Encoding Approaches.

Figure 5 provides a summary of examined sources based on feature encoding approaches. Following program analysis, the extracted information is further encoded into feature vectors and then fed into DL models. There are numerous ways to represent extracted features in primary studies, as extracted data from Android applications take on a variety of categories. Thus, we classify feature encoding approaches into the following five categories.

翻译：

图5总结了基于特征编码方法的研究来源。在程序分析之后，提取的信息进一步被编码为特征向量，然后输入到深度学习模型中。在主要研究中，有许多方法可以表示从Android应用程序中提取的特征，因为从Android应用程序中提取的数据涉及多种类别。因此，我们将特征编码方法分为以下五个类别。

Categorical Encoding. Figure 5 indicates that categorical encoding approaches are most frequently occurring at 47% of sources (62 primary studies). This result appears to be consistent with Section 3.2.2 which indicates that categorical semantic features like API calls and permissions are the most frequently used. Typically, a numerical vector is constructed to indicate the presence of each categorical feature. It is noteworthy that we discovered that 55 out of 62 primary studies adopt one-hot encoding to record the information of the presence of each possible feature value for applications. For instance, DroidDetector [185] considers a total of 192 features through hybrid analysis, and constructs a 192-dimensional vector for each app where each feature is assigned a value of 1 if it occurs in the app; otherwise, it is assigned a value of 0. In addition, we find that seven sources assign each feature a discriminative integer and store the used features in a numerical vector. Although categorical encoding is the most prevalent strategy because of its simplicity, it has two significant drawbacks: (1) high-dimensional generation and (2) embedding in isolation between distinct patterns [98].

翻译：

类别编码

图5显示，类别编码方法在47%的来源（62个主要研究）中出现频率最高。这个结果似乎与第3.2.2节一致，该节表明类别语义特征（如API调用和权限）是最常用的。通常，会构建一个数值向量来表示每个类别特征的存在。值得注意的是，我们发现62个主要研究中有55个采用独热编码来记录应用程序中每个可能特征值的存在信息。

例如，DroidDetector[185]通过混合分析总共考虑了192个特征，并为每个应用构建了一个192维向量，其中如果某个特征出现在应用中，则该特征被赋予1的值；否则，赋予0的值。此外，我们发现七个来源为每个特征分配一个有区别的整数，并将使用的特征存储在一个数值向量中。尽管类别编码是最普遍的策略，因为其简单性，但它有两个显著缺点：（1）产生高维度和（2）在不同模式之间的隔离嵌入[98]。
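
The presence-vector idea described above can be sketched in a few lines of Python. The five-entry look-up table `FEATURES` and the helper `encode_presence` are hypothetical and only for illustration; DroidDetector, as quoted above, uses 192 features, and this sketch does not reproduce its actual feature list.

```python
from typing import Iterable, List

# Hypothetical look-up table of categorical features (permissions and API calls).
FEATURES = [
    "android.permission.SEND_SMS",
    "android.permission.READ_PHONE_STATE",
    "android.permission.ACCESS_FINE_LOCATION",
    "SmsManager.sendTextMessage",
    "TelephonyManager.getDeviceId",
]
INDEX = {name: position for position, name in enumerate(FEATURES)}


def encode_presence(app_features: Iterable[str]) -> List[int]:
    """Return a fixed-size 0/1 vector: 1 if the feature occurs in the app, else 0."""
    vector = [0] * len(FEATURES)
    for feature in app_features:
        position = INDEX.get(feature)
        if position is not None:  # features outside the table are silently dropped
            vector[position] = 1
    return vector


app = {"android.permission.SEND_SMS", "SmsManager.sendTextMessage", "Runtime.exec"}
vector = encode_presence(app)
print(vector)
```

Note that `Runtime.exec` leaves no trace in the vector because it is not in the table. The vector length grows with the table, and each position is independent of the others, which matches the two drawbacks listed above (high dimensionality and features embedded in isolation).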

Text-Based Encoding. It is quite common to employ approaches from natural language processing to encode sequential features. Figure 5 indicates that 26 primary studies (20%) attempt to utilize text-based feature encoding approaches. Numerous state-of-the-art text encoding approaches have been introduced to process sequential data. In fact, one-hot encoding is the simplest method of text encoding but one of its disadvantages is the high-dimensional problem that we discussed before. In addition, some researchers employ discrete encoding approaches like Bag of Words (BOW), Term Frequency–Inverse Document Frequency (TF-IDF), and N-Gram [9, 122, 126, 147, 150, 153, 176, 181]. These methods, however, are still limited by data sparsity and high-dimensionality issues [160]. Therefore, many primary studies further investigate the effectiveness of pre-trained word embedding models, such as Continuous Word2vec [18, 23, 42, 73, 158, 188, 198] and GloVe [73].

翻译：

基于文本的编码

采用自然语言处理方法对顺序特征进行编码非常普遍。图5显示，26个主要研究（占20%）尝试使用基于文本的特征编码方法。许多最先进的文本编码方法被引入处理顺序数据。实际上，独热编码是文本编码的最简单方法，但它的一个缺点是我们之前讨论过的高维问题。

此外，一些研究人员采用离散编码方法，如单词包（BOW）、词频-逆文档频率（TF-IDF）和N-Gram [9, 122, 126, 147, 150, 153, 176, 181]。然而，这些方法仍然受到数据稀疏性和高维度问题的限制[160]。因此，许多主要研究进一步研究预训练词嵌入模型的有效性，如持续Word2vec [18, 23, 42, 73, 158, 188, 198]和GloVe [73]。
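
As a small illustration of the discrete text encodings mentioned above, the Python sketch below treats an opcode sequence as a "sentence", extracts N-grams, counts them as a bag of N-grams, and weights them with TF-IDF. The opcode sequences and function names are made up for illustration, and the TF-IDF formula is the plain textbook variant (term frequency times log of N over document frequency, no smoothing), which is an assumption rather than the formula of any particular reviewed study.

```python
import math
from collections import Counter


def ngrams(sequence, n):
    """Return all contiguous n-grams of a token sequence as tuples."""
    return [tuple(sequence[i:i + n]) for i in range(len(sequence) - n + 1)]


def tf_idf(documents, n):
    """Return one {n-gram: weight} dict per document (unsmoothed TF-IDF)."""
    bags = [Counter(ngrams(doc, n)) for doc in documents]
    document_frequency = Counter()
    for bag in bags:
        document_frequency.update(bag.keys())  # count each n-gram once per document
    total = len(documents)
    weighted = []
    for bag in bags:
        length = sum(bag.values())
        weighted.append({
            gram: (count / length) * math.log(total / document_frequency[gram])
            for gram, count in bag.items()
        })
    return weighted


# Hypothetical opcode sequences of two apps.
app_a = ["const", "invoke-virtual", "move-result", "invoke-virtual"]
app_b = ["const", "invoke-virtual", "return-void"]
weights = tf_idf([app_a, app_b], n=2)
print(weights[1])
```

The bigram shared by both apps gets weight 0, while bigrams unique to one app get a positive weight. The number of distinct N-grams grows quickly with N and with the corpus, which is the sparsity and dimensionality problem noted above.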

Graph-Based Encoding. We find that 15 primary studies (11%) employ graph-based representation approaches. Deep4MalDroid [63] obtains system calls through dynamic analysis tools to construct a weighted directed graph, and graph structure information including weights of each edge and in-degree and out-degree of each node is stored in vectors as inputs. Xu et al. [192] encode CFG and DFG into adjacency matrices, respectively, and combine them into a single matrix in embedding layers. In [117], the authors investigate several state-of-the-art graph embedding approaches to encode API call graphs, including DeepWalk [121], Node2vec [55], HOPE [114], and so on.

翻译：

基于图的编码

我们发现15个主要研究（占11%）采用基于图的表示方法。Deep4MalDroid[63]通过动态分析工具获取系统调用以构建加权有向图，图结构信息（包括每条边的权重以及每个节点的入度和出度）作为输入存储在向量中。Xu等人[192]分别将CFG和DFG编码为邻接矩阵，并将它们合并为嵌入层中的单个度量。在[117]中，作者研究了几种先进的图嵌入方法对API调用图进行编码，包括DeepWalk[121]、Node2vec[55]、HOPE[114]等。
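
A weighted directed graph of the kind described above can be sketched as follows. The construction rule used here (one node per system call, edge weight equal to the number of times one call is immediately followed by another in the trace) is an assumption for illustration, as are the trace and the function names; it is not taken from Deep4MalDroid itself.

```python
from collections import Counter


def build_graph(trace):
    """Return (nodes, adjacency matrix) for a system call trace.

    matrix[i][j] is the number of times nodes[i] is immediately followed by nodes[j].
    """
    nodes = sorted(set(trace))
    index = {name: i for i, name in enumerate(nodes)}
    transitions = Counter(zip(trace, trace[1:]))
    matrix = [[0] * len(nodes) for _ in nodes]
    for (source, target), weight in transitions.items():
        matrix[index[source]][index[target]] = weight
    return nodes, matrix


def degrees(matrix):
    """Return (in-degree, out-degree) lists, counting distinct edges per node."""
    out_degree = [sum(1 for weight in row if weight) for row in matrix]
    in_degree = [sum(1 for row in matrix if row[j]) for j in range(len(matrix))]
    return in_degree, out_degree


# Hypothetical system call trace recorded during dynamic analysis.
trace = ["open", "read", "write", "read", "write", "close"]
nodes, matrix = build_graph(trace)
in_degree, out_degree = degrees(matrix)
print(nodes)
print(matrix)
```

Flattening the edge weights together with the degree lists gives a fixed-order numeric vector that a DL model can consume. The same adjacency-matrix representation applies to CFGs and DFGs.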

Image-Based Encoding. Image-based representation, employed in 16 primary studies (12%), usually transforms extracted features into a grayscale or color image. The most common scenario is directly transforming bytecode into images. For instance, IMCFN [151] reads an Android binary as a vector of 8-bit unsigned integers and then converts it into a two-dimensional array. Following that, the Android bytecode is visualized as a color image based on the RGB color map. Numerous research studies used similar approaches to encode Android bytecode [19, 59, 66, 107, 130, 141, 171].

翻译：

基于图像的编码

图像表示方法在16个主要研究中占比12%，通常将提取的特征转换为灰度图或彩色图像。最常见的情况是直接将字节码转换为图像。例如，IMCFN[151]将Android二进制文件读取为8位无符号整数向量，然后将其转换为二维数组。接下来，根据RGB颜色映射，将Android字节码可视化为彩色图像。许多研究使用类似的方法对Android字节码进行编码[19, 59, 66, 107, 130, 141, 171]。
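
The byte-to-image step can be sketched in plain Python. The version below reads a byte string as 8-bit unsigned integers and reshapes it into rows of a fixed width, either as a grayscale image or as an RGB image in which three consecutive bytes form one pixel. The zero padding, the chosen widths, and the "three consecutive bytes per pixel" rule are simplifying assumptions; IMCFN and R2-D2 use their own color mappings, which this sketch does not reproduce.

```python
def _pad(data: bytes, row_bytes: int) -> bytes:
    """Zero-pad data so that its length is a multiple of row_bytes."""
    if row_bytes <= 0:
        raise ValueError("row size must be positive")
    padded_length = -(-len(data) // row_bytes) * row_bytes  # ceiling division
    return data.ljust(padded_length, b"\x00")


def bytes_to_grayscale(data: bytes, width: int):
    """Return rows of `width` pixel intensities (0-255), one byte per pixel."""
    padded = _pad(data, width)
    return [list(padded[start:start + width]) for start in range(0, len(padded), width)]


def bytes_to_rgb(data: bytes, width: int):
    """Return rows of `width` (R, G, B) tuples, three consecutive bytes per pixel."""
    row_bytes = 3 * width
    padded = _pad(data, row_bytes)
    return [
        [tuple(padded[i:i + 3]) for i in range(start, start + row_bytes, 3)]
        for start in range(0, len(padded), row_bytes)
    ]


# Placeholder bytes standing in for the contents of classes.dex.
sample = bytes([0, 64, 128, 255, 16])
gray = bytes_to_grayscale(sample, width=2)
rgb = bytes_to_rgb(sample, width=1)
print(gray)
print(rgb)
```

Once the bytes are arranged as an image, standard convolutional models from computer vision can be applied to them without any hand-crafted feature list.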

Hybrid Encoding. Combining distinct feature encoding approaches to process richer features is also common in collected research (6%). Take Kim et al. [77] as an example. The authors construct one-hot vectors to record the existence of categorical features like permission, string, and app components. At the same time, in order to alleviate the impacts of obfuscation techniques, a similarity-based feature vector generation process is introduced to encode sequential features like opcode and API calls. In [74, 116, 170], since these studies also consider icons or pictures of Android applications, both image embedding approaches and text embedding algorithms are used to encode features.

翻译：

混合编码

结合不同特征编码方法处理更丰富特征在收集到的研究中也很常见（占6%）。以Kim等人[77]为例。作者构建独热向量来记录类别特征（如权限、字符串和应用组件）的存在。同时，为了减轻混淆技术的影响，引入了基于相似性的特征向量生成过程来编码顺序特征，如操作码和API调用。在[74, 116, 170]中，因为这些研究还考虑了Android应用的图标或图片，因此同时使用图像嵌入方法和文本嵌入算法对特征进行编码。

Discussion. According to our reviewed results, the majority of research constructs feature vectors by recording the existence of various categorical features of Android applications. Many studies create a look-up table to list all the potential features based on prior knowledge or feature selection approaches, and then build a fixed-size one-hot feature vector to represent each application [16, 17, 36, 40, 41, 43, 50, 53, 64, 65, 91, 92, 105, 112, 129, 157, 159, 161, 167, 184, 185]. For instance, Wu et al. [167] identified 158 high-risk features to construct feature vectors (including 97 API calls and 61 permissions). However, there are several issues to process features in this way. One of these is that it is pretty difficult to define a robust malicious feature list using either humans’ experience or traditional feature selection approaches. The built feature lists can’t encompass all potential malicious characteristics, resulting in poor performance in the practical application. Even when all features in the training data are used, concept drift caused by Android malware evolution is a serious problem that cannot be ignored [187]. Android malware continues to evolve with similar functionality but a completely different implementation, easily evading detection by Android malware defense models. As a result, how to design effective and practical feature lists is a challenging issue.

翻译：

讨论：根据我们的研究结果，大多数研究通过记录Android应用程序各种类别特征的存在来构建特征向量。许多研究基于先验知识或特征选择方法创建查找表以列出所有潜在特征，然后为每个应用程序构建固定大小的独热特征向量[16, 17, 36, 40, 41, 43, 50, 53, 64, 65, 91, 92, 105, 112, 129, 157, 159, 161, 167, 184, 185]。

例如，Wu等人[167]确定了158个高风险特征以构建特征向量（包括97个API调用和61个权限）。然而，以这种方式处理特征存在几个问题。其中之一是，使用人类经验或传统特征选择方法定义稳定的恶意特征列表非常困难。构建的特征列表无法涵盖所有潜在的恶意特征，导致在实际应用中性能不佳。

即使使用训练数据中的所有特征，由于Android恶意软件的演变而引起的概念漂移也是一个不能忽视的严重问题[187]。Android恶意软件继续演变，功能相似但实现完全不同，轻松规避Android恶意软件防御模型的检测。因此，如何设计有效且实用的特征列表是一个具有挑战性的问题。

As shown in Figure 3, static program analysis is the most common approach (73%). Furthermore, our results in Section 3.2.2 show that the majority of reviewed studies extract static semantic features from disassembled files. A significant drawback of this approach is its weak ability to handle obfuscation problems. Obfuscation techniques (e.g., polymorphic code, encryption) transform malware binaries into self-compressed and uniquely structured binary files that are resistant to reverse-engineering approaches [47, 113]. Obfuscation techniques improve code protection for Android apps, but create significant barriers to malware analysis. For example, code reordering aims to modify the order of instructions in smali code but preserve the original runtime execution trace, thereby evading detection by malware defense tools [15]. By using a variety of obfuscation techniques, malware attackers can produce multiple variants of a single malicious sample, complicating malware defenses. Although some studies have shown that the proposed DL-based approaches are slightly affected by some simple obfuscation approaches [77, 84, 108, 175], we cannot ignore the fact that the real-world obfuscation techniques constantly update and evolve against anti-malware approaches [127]. Investigating obfuscated apps using DL techniques is a potential future research topic, and we outline some potential research trends: (1) using DL techniques to detect and analyze obfuscation approaches and (2) analyzing malware based on bytecode level rather than capturing semantic features.

翻译：

如图3所示，静态程序分析是最常见的方法（占73%）。此外，我们在第3.2.2节的结果表明，大多数审阅研究从反汇编文件中提取静态语义特征。这种方法的一个显著缺点是处理混淆问题的能力较弱。混淆技术（例如，多态代码、加密）将恶意软件二进制文件转换为具有抗逆向工程特性的自我压缩且结构独特的二进制文件[47, 113]。混淆技术提高了Android应用的代码保护，但为恶意软件分析创造了重大障碍。

例如，代码重排序旨在修改smali代码中指令的顺序，但保留原始运行时执行跟踪，从而规避恶意软件防御工具的检测[15]。通过使用多种混淆技术，恶意软件攻击者可以生成单个恶意样本的多个变体，使恶意软件防御变得复杂。

尽管一些研究表明，所提出的基于DL的方法受到一些简单混淆方法的轻微影响[77, 84, 108, 175]，但我们不能忽视现实世界中的混淆技术不断更新并针对反恶意软件方法演变的事实[127]。使用DL技术研究混淆应用程序是一个潜在的未来研究课题，我们概述了一些可能的研究趋势：

（1）使用DL技术检测和分析混淆方法；

（2）基于字节码级别分析恶意软件，而不是捕获语义特征。

RQ2.1 How are features processed for model training?

Static analysis is mostly used to obtain features, and static semantic features like API calls and permissions remain the most frequently utilized.
The number of primary studies devoted to dynamic analysis is rising and many generally applicable methodologies/frameworks are proposed.
One-hot encoding and text encoding are mostly used to represent features.
Thirteen primary studies encode raw bytecode into feature vectors.

3.3 DL Techniques

Responding to RQ2.2, this section provides a detailed review of the primary studies according to DL techniques. To comprehend this section, readers are expected to be relatively familiar with DL. For more details on the patterns described, readers are referred to the DL textbook by Goodfellow et al. [51].