Binary Code Extraction and Interface Identification for Security Applications

Juan Caballero; Noah M. Johnson; Stephen McCamant; Dawn Song

Research output: Contribution to conference › Paper › peer-review

Abstract

Binary code reuse is the process of automatically identifying the interface and extracting the instructions and data dependencies of a code fragment from an executable program, so that the fragment is self-contained and can be reused by external code. Binary code reuse is useful for a number of security applications, including reusing the proprietary cryptographic or unpacking functions from a malware sample and rewriting a network dialog. In this paper we conduct the first systematic study of automated binary code reuse and its security applications. The main challenge in binary code reuse is understanding the code fragment’s interface. We propose a novel technique to identify the prototype of an undocumented code fragment directly from the program’s binary, without access to source code or symbol information. Further, we must also extract the code itself from the binary so that it is self-contained and can be easily reused in another program. We design and implement a tool that uses a combination of dynamic and static analysis to automatically identify the prototype and extract the instructions of an assembly function into a form that can be reused by other C code. The extracted function can be run independently of the rest of the program’s functionality and shared with other users. We apply our approach to scenarios that include extracting the encryption and decryption routines from malware samples, and show that these routines can be reused by a network proxy to decrypt encrypted traffic on the network. This allows the network proxy to rewrite the malware’s encrypted traffic by combining the extracted encryption and decryption functions with the session keys and the protocol grammar. We also show that we can reuse a code fragment from an unpacking function in the unpacking routine of a different sample of the same family, even if the code fragment is not a complete function.
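
The abstract does not spell out how a prototype is recovered from a binary. As a rough illustration only, one common dynamic-analysis heuristic labels each memory location a function touches by the order of its reads and writes in an execution trace: a location read before the function writes it carries an input, and a location the function writes may carry an output. The Python sketch below is a minimal version of that heuristic, assuming a hypothetical trace of (op, address) pairs already filtered to locations outside the function's own stack frame. The name classify_parameters, the trace format, and the example addresses are all hypothetical, and this is not presented as the authors' tool or algorithm.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical trace format: (op, address), where op is "R" (read) or "W" (write),
# listed in execution order. Assumes the trace was already filtered to locations
# outside the function's own stack frame (arguments, memory behind pointer args).
Access = Tuple[str, int]


def classify_parameters(trace: Iterable[Access]) -> Dict[int, str]:
    """Label each location as "in", "out", or "in/out" (simplified heuristic)."""
    first_op: Dict[int, str] = {}
    written: Set[int] = set()
    for op, addr in trace:
        if op not in ("R", "W"):
            raise ValueError(f"unknown access type: {op!r}")
        # Remember only the first access to each location.
        first_op.setdefault(addr, op)
        if op == "W":
            written.add(addr)

    labels: Dict[int, str] = {}
    for addr, op in first_op.items():
        is_input = op == "R"          # value was read before the function overwrote it
        is_output = addr in written   # the function may leave a result here
        if is_input and is_output:
            labels[addr] = "in/out"
        elif is_input:
            labels[addr] = "in"
        else:
            labels[addr] = "out"
    return labels


if __name__ == "__main__":
    # Hypothetical trace of a routine shaped like decrypt(src, dst, len, &state).
    trace = [
        ("R", 0x10),    # reads the length argument
        ("R", 0x1000),  # reads the ciphertext buffer
        ("W", 0x2000),  # writes the plaintext buffer
        ("R", 0x3000),  # reads the cipher state...
        ("W", 0x3000),  # ...and updates it
    ]
    for addr, label in sorted(classify_parameters(trace).items()):
        print(hex(addr), label)
```

Running the example prints 0x10 in, 0x1000 in, 0x2000 out, and 0x3000 in/out. A single trace only covers the paths that were actually exercised, which is one reason to pair dynamic observations like these with static analysis, as the abstract describes.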

Original language	English (US)
State	Published - 2010
Externally published	Yes
Event	17th Symposium on Network and Distributed System Security, NDSS 2010, San Diego, United States. Duration: Feb 28 2010 → Mar 3 2010

Bibliographical note

Publisher Copyright:
© 2010 Proceedings of the Symposium on Network and Distributed System Security, NDSS 2010. All Rights Reserved.

Cite this

@conference{7909a093d168417087fb9cf73384250f,

title = "Binary Code Extraction and Interface Identification for Security Applications",

author = "Juan Caballero and Johnson, {Noah M.} and Stephen McCamant and Dawn Song",

note = "Publisher Copyright: {\textcopyright} 2010 Proceedings of the Symposium on Network and Distributed System Security, NDSS 2010. All Rights Reserved.; 17th Symposium on Network and Distributed System Security, NDSS 2010; Conference date: 28-02-2010 through 03-03-2010",

year = "2010",

language = "English (US)",

}

TY  - CONF

T1  - Binary Code Extraction and Interface Identification for Security Applications

AU  - Caballero, Juan

AU  - Johnson, Noah M.

AU  - McCamant, Stephen

AU  - Song, Dawn

PY  - 2010

Y1  - 2010

UR  - http://www.scopus.com/inward/record.url?scp=85025141365&partnerID=8YFLogxK

UR  - http://www.scopus.com/inward/citedby.url?scp=85025141365&partnerID=8YFLogxK

M3  - Paper

AN  - SCOPUS:85025141365

T2  - 17th Symposium on Network and Distributed System Security, NDSS 2010

Y2  - 28 February 2010 through 3 March 2010

ER  -