Category: February 2025
Posted: 2024.10.14
Modified: 2025.01.17
Author: 황선진
Views: 55

Effective Deep Learning Primitives Design for Binary Vulnerability and Malware Detection

As deep learning continues to succeed in various areas, it has also attracted significant attention in cybersecurity. While deep learning is highly effective at detecting cybersecurity threats, existing approaches often rely on off-the-shelf deep learning primitives. Here, “deep learning primitives” refer to functor objects designed to encapsulate particular computations, including data pre-processing, model architectures, and data augmentation. Although well-designed primitives could improve binary vulnerability and malware detection, their design remains largely unexplored. In this dissertation, we introduce effective deep learning primitives optimized for the cybersecurity domain and apply them to two critical areas: smart contract vulnerability detection and malware detection. Smart contracts, an emerging technology in decentralized finance, need robust vulnerability detection, while malware remains an evolving and resilient threat. By treating both as binary analysis tasks, deep learning can effectively identify smart contract vulnerabilities and malware in traditional binaries. Specifically, this dissertation makes three contributions. First, we propose a code-targeted architecture and data pre-processing strategy for binary vulnerability and malware detection. Second, we propose a de-obfuscation technique that counters obfuscation, which commonly degrades malware detection performance. Third, we propose a data augmentation method tailored to smart contracts, a relatively new kind of program that still lacks extensive datasets. These contributions are summarized as follows:

Deep Learning Architecture for Effective Binary Vulnerability and Malware Detection: Deep learning models, particularly those based on convolutional neural networks (CNNs), have demonstrated high detection accuracy and speed in binary code classification. However, when code is converted into an image-like representation, the meaning and context of the binary code are often lost, which leads to false detections. To address this problem, we introduce CodeNet, an efficient code-targeted CNN architecture that detects smart contract vulnerabilities and malware while preserving code semantics. In addition, to improve learning efficiency, we propose a data pre-processing method that converts smart contract code and malware into efficient features. Experimental results indicate that CodeNet achieves high detection accuracy with competitive processing times.
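
The image-like conversion criticized above can be illustrated with a minimal sketch. This is our own illustration under stated assumptions and is not CodeNet or its pre-processing method. The raw byte stream is zero-padded and cut into rows of a hypothetical `width`, and we list which stream offsets a 3x3 convolution window touches. With 256-byte rows, a single window mixes bytes that are 256 and 512 positions apart, and such bytes usually belong to unrelated instructions. A 1-D window sliding along the stream covers only adjacent bytes.

```python
from typing import List


def bytes_to_image(data: bytes, width: int) -> List[List[int]]:
    """Image-like conversion: zero-pad the byte stream and split it into rows of `width` bytes."""
    padded = data + bytes((-len(data)) % width)
    return [list(padded[i:i + width]) for i in range(0, len(padded), width)]


def window_offsets_2d(width: int, k: int = 3) -> List[int]:
    """Stream offsets (relative to the window's top-left byte) covered by a k x k kernel on the image."""
    return sorted(row * width + col for row in range(k) for col in range(k))


def window_offsets_1d(k: int = 3) -> List[int]:
    """Stream offsets covered by a size-k kernel sliding along the raw byte sequence."""
    return list(range(k))


if __name__ == "__main__":
    print(bytes_to_image(b"\x01\x02\x03\x04\x05", width=4))  # [[1, 2, 3, 4], [5, 0, 0, 0]]
    print(window_offsets_2d(width=256))  # [0, 1, 2, 256, 257, 258, 512, 513, 514]
    print(window_offsets_1d())           # [0, 1, 2]
```

The row width is an arbitrary layout choice. Changing it changes which distant bytes end up as spatial neighbors, which shows that 2-D adjacency in such images reflects the layout rather than the code.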

De-obfuscation as Data Pre-Processing for Binary Malware Detection: A simple approach to detecting malware with a deep learning model is to operate directly on its binary form. However, obfuscation, which is commonly used to hide malware, significantly alters critical malware features and causes the detection model to misclassify samples. To solve this problem, we propose GUARD, a generic API de-obfuscation and unpacking method. GUARD combines emulation-based detection of obfuscated calls with an analysis algorithm and a scattered import address table (sIAT) to effectively restore the original API calls in packed files. Evaluations against advanced commercial packers, including Themida and VMProtect, demonstrate that GUARD restores obfuscated APIs and unpacks files, improving malware detection rates by up to 24%.
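
The restoration idea can be pictured with a deliberately simplified toy model. Everything in it is a hypothetical assumption and does not reflect GUARD's actual algorithm or sIAT layout. Each obfuscated call site points to a stub, and each stub is modeled as a plain jump chain that stands in for what emulating the stub's code would reveal. The chain is followed until it reaches an address listed in a known export table. The result maps each call site to the API it ultimately reaches, which is the information needed to rewrite those calls against a rebuilt import table. Real packers such as Themida and VMProtect use junk code and virtualized handlers that require genuine emulation.

```python
from typing import Dict, Optional

# Hypothetical toy model: a stub is a chain of jumps (address -> next address),
# standing in for what emulating the obfuscated stub code would reveal.


def resolve_target(start: int, jumps: Dict[int, int], exports: Dict[int, str],
                   max_steps: int = 64) -> Optional[str]:
    """Follow a stub's jump chain until it lands on a known API export address."""
    addr = start
    for _ in range(max_steps):
        if addr in exports:
            return exports[addr]
        if addr not in jumps:
            return None  # chain leaves the modeled region: unresolved
        addr = jumps[addr]
    return None  # step budget exhausted (loop or overly long chain)


def restore_calls(call_sites: Dict[int, int], jumps: Dict[int, int],
                  exports: Dict[int, str]) -> Dict[int, str]:
    """Map each obfuscated call site to the API it ultimately reaches; drop unresolved sites."""
    restored = {}
    for site, stub in call_sites.items():
        name = resolve_target(stub, jumps, exports)
        if name is not None:
            restored[site] = name
    return restored


if __name__ == "__main__":
    exports = {0x7000: "kernel32!CreateFileW", 0x7100: "kernel32!WriteFile"}
    jumps = {0x5000: 0x5010, 0x5010: 0x7000,                  # two-hop stub
             0x5100: 0x5120, 0x5120: 0x5140, 0x5140: 0x7100,  # three-hop stub
             0x5200: 0x5200}                                  # self-loop: never resolves
    call_sites = {0x401000: 0x5000, 0x401020: 0x5100, 0x401040: 0x5200}
    for site, api in sorted(restore_calls(call_sites, jumps, exports).items()):
        print(hex(site), api)
```

In this example the two resolvable call sites are restored. The self-looping stub at 0x5200 exhausts its step budget and is dropped as unresolved.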

Data Augmentation to Improve Smart Contract Vulnerability Detection: Smart contract programming inherits numerous software development challenges, including security, reliability, and data efficiency. Recently, deep learning-based solutions have attracted attention in software development as a way to address these challenges. A deep learning model learns patterns from data, and it is widely known that larger training sets lead to higher accuracy and better performance. However, it is difficult to obtain a public dataset that perfectly suits a specific task. Compared with the large datasets available for conventional programming languages such as C/C++, Java, and Python, the amount of smart contract data is relatively small. This scarcity is especially problematic because smart contracts are a recent innovation compared with traditional programming languages. To address this problem, we propose the Compiler-Guided Generation Network (CGGNet) to augment smart contract datasets. Unlike traditional text generation methods such as SeqGAN, CGGNet uses a compiler as an oracle within its generative network to guarantee the validity of the generated smart contracts. By incorporating Monte Carlo tree search, CGGNet improves the diversity and validity of the generated contracts and surpasses GAN-based models in producing syntactically correct augmentations. This approach generates millions of unique, valid smart contracts from thousands of inputs, effectively addressing underfitting in deep learning applications.
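
The role of a compiler as an oracle during generation can be sketched on a small scale. Several substitutions here are our own assumptions and not part of CGGNet:

- Python's built-in `compile()` stands in for the smart contract compiler.
- A tiny hypothetical token vocabulary replaces a learned generator.
- A flat Monte Carlo rollout search replaces full Monte Carlo tree search. Each candidate next token is scored by how often random completions compile.

A final oracle check keeps only syntactically valid outputs, and a set removes duplicates.

```python
import random
import warnings
from typing import List, Optional

# Hypothetical toy vocabulary; a real generator would emit smart contract tokens.
VOCAB = ["x", "=", "1", "+", "(", ")"]


def compiles(tokens: List[str]) -> bool:
    """Oracle: ask Python's compiler whether the token sequence is syntactically valid."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")  # silence SyntaxWarnings; only errors matter here
        try:
            compile(" ".join(tokens), "<candidate>", "exec")
            return True
        except SyntaxError:
            return False


def rollout_score(prefix: List[str], length: int, rng: random.Random, n_rollouts: int) -> float:
    """Fraction of uniformly random completions of `prefix` (to `length` tokens) that compile."""
    hits = 0
    for _ in range(n_rollouts):
        completion = prefix + [rng.choice(VOCAB) for _ in range(length - len(prefix))]
        hits += compiles(completion)
    return hits / n_rollouts


def generate(length: int, rng: random.Random, n_rollouts: int = 16) -> Optional[List[str]]:
    """Build a sequence token by token, sampling each token in proportion to its rollout score."""
    tokens: List[str] = []
    while len(tokens) < length:
        weights = [rollout_score(tokens + [t], length, rng, n_rollouts) for t in VOCAB]
        if sum(weights) == 0:
            weights = [1.0] * len(VOCAB)  # no signal from the oracle: sample uniformly
        tokens.append(rng.choices(VOCAB, weights=weights)[0])
    return tokens if compiles(tokens) else None  # final oracle check guarantees validity


def augment(n_samples: int, length: int, seed: int = 0, max_attempts: int = 30) -> List[str]:
    """Collect up to `n_samples` unique, compiler-validated sequences."""
    rng = random.Random(seed)
    unique = set()
    for _ in range(max_attempts):
        if len(unique) >= n_samples:
            break
        candidate = generate(length, rng)
        if candidate is not None:
            unique.add(" ".join(candidate))
    return sorted(unique)


if __name__ == "__main__":
    for sample in augment(n_samples=5, length=5, seed=1):
        print(sample)
```

The exact samples depend on the seed. Every returned sample is guaranteed to compile, but fewer than `n_samples` may be returned if the attempt budget runs out.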

Together, the three deep learning primitives presented in this dissertation improve the analysis of smart contract binaries and malware. These primitives can therefore be used to improve the accuracy of binary vulnerability and malware detection.
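
The abstract defines a deep learning primitive as a functor object that encapsulates one computation, so primitives can be chained into a detection pipeline. The sketch below is a toy under our own assumptions. `ByteHistogram` and `LinearScorer` are hypothetical stand-ins for a pre-processing primitive and a model primitive; they are not CodeNet, GUARD, or CGGNet. A de-obfuscation or augmentation primitive would fit into the same `Compose` chain.

```python
from typing import Any, Callable, List, Sequence


class ByteHistogram:
    """Hypothetical pre-processing primitive: binary -> normalized 256-bin byte histogram."""

    def __call__(self, data: bytes) -> List[float]:
        counts = [0] * 256
        for b in data:
            counts[b] += 1
        total = max(len(data), 1)  # avoid dividing by zero on empty input
        return [c / total for c in counts]


class LinearScorer:
    """Hypothetical model primitive: a linear score over a fixed-length feature vector."""

    def __init__(self, weights: Sequence[float], bias: float = 0.0) -> None:
        self.weights = list(weights)
        self.bias = bias

    def __call__(self, features: Sequence[float]) -> float:
        if len(features) != len(self.weights):
            raise ValueError("feature length does not match weight length")
        return sum(w * f for w, f in zip(self.weights, features)) + self.bias


class Compose:
    """Chain primitives so that the output of each stage feeds the next."""

    def __init__(self, *stages: Callable[[Any], Any]) -> None:
        self.stages = stages

    def __call__(self, x: Any) -> Any:
        for stage in self.stages:
            x = stage(x)
        return x


if __name__ == "__main__":
    # Toy weights: the score is the share of 0x00 bytes in the input.
    pipeline = Compose(ByteHistogram(), LinearScorer([1.0] + [0.0] * 255))
    print(pipeline(b"\x00\x01"))  # 0.5
```

Keeping each computation behind the same call interface lets one stage be replaced without touching the others. For example, a different pre-processing primitive can be swapped in while the model stays fixed.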

Degree conferred: February 2025
Advisor: 최윤호
Keywords: Deep Learning Primitives Design, Vulnerability Detection, Malware Detection
Web page: https://sites.google.com/view/phdhwangsj/
Attachments: none