A System-Level Dynamic Binary Translator Using Automatically-Learned Translation Rules

Jinhu Jiang, Chaoyi Liang, Rongchao Dong, Zhaohui Yang, Zhongjun Zhou, Wenwen Wang, Pen Chung Yew, Weihua Zhang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

System-level emulators have been used extensively for the design, debugging and evaluation of system software. They work by providing a system-level virtual machine that can support a guest operating system (OS) running on a platform with the same or a different native OS, using the same or a different instruction-set architecture. For such system-level emulation, dynamic binary translation (DBT) is one of the core technologies. A recently proposed learning-based approach using automatically-learned translation rules has been shown to improve DBT performance significantly by generating much higher-quality translated code. However, it has only been used for user-level emulation, not system-level emulation. When applying this approach directly to QEMU for system-level emulation, we find that it actually causes an unexpected performance degradation of 5% on average. Analyzing the main culprits in more detail, we find that the learning-based approach by default uses host registers to maintain the guest CPU states, including the condition-code registers (or FLAG registers). In cases where QEMU needs to be involved (and QEMU itself also needs the host registers), maintaining system states in the host registers for the guest, the host and QEMU during and between context switches can cause undue overheads if not handled carefully. Such cases include emulating system-level instructions, address translation and interrupts, all of which require QEMU's helper functions. To achieve the intended performance improvement from the better-quality code generated by the learning-based approach, we propose several optimization techniques that reduce the overhead incurred in each context switch, reduce the number of needed context switches, and schedule code better to eliminate context switches.
Our experimental results show that these optimizations achieve an average speedup of 1.36X over QEMU 6.1 on SPEC CINT2006 and 1.15X on real-world applications in system emulation mode.
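The cost the abstract describes, syncing host-register-cached guest state on every switch into a QEMU helper, can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the authors' implementation: the register set `MAPPED`, the `Op` and `switch_cost` names, and the accounting (one "move" per register stored or reloaded, with every cached register reloaded after a helper because it may change any of them) are all hypothetical. It shows two of the three directions named above: storing only modified registers lowers the cost of each switch, and letting adjacent helper calls share one switch lowers the number of switches. Reordering code so that helper calls become adjacent (the scheduling idea) is not modeled.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

# Hypothetical guest registers that translated code keeps cached in host
# registers, including the FLAGS register as the abstract describes.
MAPPED: FrozenSet[str] = frozenset({"rax", "rbx", "rcx", "rdx", "flags"})


@dataclass
class Op:
    kind: str  # "native" (translated host code) or "helper" (emulator helper call)
    writes: FrozenSet[str] = frozenset()  # guest registers a native op modifies


def switch_cost(block: List[Op], only_dirty: bool = False,
                coalesce: bool = False) -> Tuple[int, int]:
    """Return (context_switches, register_moves) for one pass over a block.

    Switching out to a helper stores cached guest registers to memory;
    switching back reloads all of them, since the helper may change any.
    """
    switches = 0
    moves = 0
    dirty: set = set()  # cached registers modified since the last sync
    in_helper = False   # True while guest state lives in memory, not host regs
    for op in block:
        if op.kind == "native":
            if in_helper:
                moves += len(MAPPED)  # reload cached state into host registers
                in_helper = False
            dirty |= op.writes & MAPPED
        else:
            if in_helper and coalesce:
                continue  # reuse the switch already made for the previous helper
            if in_helper:
                moves += len(MAPPED)  # return to translated code between helpers
            switches += 1
            # store either every cached register or only the modified ones
            moves += len(dirty) if only_dirty else len(MAPPED)
            dirty = set()
            in_helper = True
    if in_helper:
        moves += len(MAPPED)  # simplification: reload before the block exits
    return switches, moves


example_block = [
    Op("native", frozenset({"rax", "flags"})),
    Op("helper"),  # e.g. a system instruction emulated by a helper
    Op("helper"),  # an adjacent helper call
    Op("native", frozenset({"rbx"})),
    Op("helper"),
]

for only_dirty, coalesce in [(False, False), (True, False), (True, True)]:
    print(only_dirty, coalesce, switch_cost(example_block, only_dirty, coalesce))
# False False (3, 30)
# True False (3, 18)
# True True (2, 13)
```

Under these assumptions, storing only modified registers cuts the example block's register traffic from 30 to 18 moves, and coalescing adjacent helper calls further reduces it to 13 moves while dropping the switch count from 3 to 2.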

Original language: English (US)
Title of host publication: CGO 2024 - Proceedings of the 2024 IEEE/ACM International Symposium on Code Generation and Optimization
Editors: Tobias Grosser, Christophe Dubach, Michel Steuwer, Jingling Xue, Guilherme Ottoni, Fernando Magno Quintao Pereira
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 423-434
Number of pages: 12
ISBN (Electronic): 9798350395099
State: Published - 2024
Externally published: Yes
Event: 22nd IEEE/ACM International Symposium on Code Generation and Optimization, CGO 2024 - Edinburgh, United Kingdom
Duration: Mar 2 2024 → Mar 6 2024

Publication series

Name: CGO 2024 - Proceedings of the 2024 IEEE/ACM International Symposium on Code Generation and Optimization

Conference

Conference: 22nd IEEE/ACM International Symposium on Code Generation and Optimization, CGO 2024
Country/Territory: United Kingdom
City: Edinburgh
Period: 3/2/24 → 3/6/24

Bibliographical note

Publisher Copyright:
© 2024 IEEE.
