HiGraph: A Large-Scale Hierarchical Graph Dataset
Hierarchical Graph Dataset for Malware Analysis with Function Call Graphs and Control Flow Graphs
Interactive Graph Visualization
Explore the hierarchical structure of malware samples through our interactive visualization tool.
Abstract
Graph-based methods have shown great promise in malware analysis, yet the lack of large-scale, hierarchical graph datasets limits further advances in this field. To bridge this gap, we introduce HiGraph, a novel, large-scale dataset that models each application as a hierarchical graph: local Control Flow Graphs (CFGs) capture intra-function logic, and a global Function Call Graph (FCG) captures inter-function interactions.
This hierarchical design facilitates the development of robust detection models that are more resilient to obfuscation, model aging, and malware evolution. HiGraph contains over 200M control flow graphs and 595K function call graphs, preserving rich semantic and structural information crucial for analyzing sophisticated malware behaviors. We provide an in-depth analysis of HiGraph and highlight its potential as a benchmark dataset for advancing hierarchical graph learning in cybersecurity.
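Resilience to model aging is commonly assessed with a time-aware split: train on older samples and test on newer ones, then watch how scores change as the gap grows. Since HiGraph spans 11 years of samples, a split of this kind is one plausible way to use it. The page does not describe the dataset's metadata fields, so `Sample`, `year`, `label`, and the helper functions below are hypothetical, and the toy years are not HiGraph data.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    sample_id: str  # hypothetical sample identifier
    year: int       # hypothetical: year the sample was first seen
    label: int      # 1 = malware, 0 = benign


def temporal_split(samples: list[Sample], cutoff_year: int) -> tuple[list[Sample], list[Sample]]:
    """Train on everything before cutoff_year; test on cutoff_year and later."""
    train = [s for s in samples if s.year < cutoff_year]
    test = [s for s in samples if s.year >= cutoff_year]
    return train, test


def group_by_year(samples: list[Sample]) -> dict[int, list[Sample]]:
    """Bucket samples by year (ascending) so per-year scores can be compared."""
    buckets: dict[int, list[Sample]] = {}
    for s in samples:
        buckets.setdefault(s.year, []).append(s)
    return dict(sorted(buckets.items()))


# Toy data (not from HiGraph): five samples spread over four years.
samples = [
    Sample("a", 2015, 0),
    Sample("b", 2016, 1),
    Sample("c", 2017, 1),
    Sample("d", 2018, 0),
    Sample("e", 2018, 1),
]
train, test = temporal_split(samples, cutoff_year=2017)
print([s.sample_id for s in train])  # ['a', 'b']
print(sorted(group_by_year(test)))   # [2017, 2018]
```

Unlike a random split, this never lets a model train on samples newer than the ones it is tested on, which is what makes drift over time visible.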
Dataset Overview
Hierarchical Graph Structure
HiGraph models each application as a hierarchical graph: per-function CFGs preserve local structure, and the FCG preserves global structure.
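The page does not specify HiGraph's on-disk format, so the following is only a minimal in-memory sketch of the two-level structure described above. The class names `CFG` and `AppGraph`, the field layout, and the block-count features are hypothetical. The sketch shows how information from each function's CFG (local level) can be lifted to the FCG (global level) by aggregating over call edges.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CFG:
    """Local level: control flow graph of a single function."""
    blocks: list[str]             # basic-block identifiers (hypothetical)
    edges: list[tuple[int, int]]  # (source block index, target block index)


@dataclass
class AppGraph:
    """Global level: function call graph; each function node may carry a CFG."""
    functions: list[str]                                # FCG nodes
    calls: list[tuple[int, int]]                        # FCG edges as (caller, callee)
    cfgs: dict[int, CFG] = field(default_factory=dict)  # function index -> CFG

    def local_features(self) -> list[tuple[int, int]]:
        """Per function: (basic blocks, CFG edges); (0, 0) if it has no CFG."""
        feats = []
        for i in range(len(self.functions)):
            cfg = self.cfgs.get(i)
            feats.append((len(cfg.blocks), len(cfg.edges)) if cfg is not None else (0, 0))
        return feats

    def callee_mean_blocks(self) -> list[float]:
        """Per function: mean block count over its direct callees (0.0 if none)."""
        local = self.local_features()
        means = []
        for i in range(len(self.functions)):
            callees = [dst for src, dst in self.calls if src == i]
            means.append(sum(local[j][0] for j in callees) / len(callees) if callees else 0.0)
        return means


# Toy application with three functions; main calls the other two.
app = AppGraph(
    functions=["main", "decrypt", "send"],
    calls=[(0, 1), (0, 2)],
    cfgs={
        0: CFG(["b0", "b1", "b2"], [(0, 1), (0, 2)]),
        1: CFG(["b0", "b1"], [(0, 1), (1, 0)]),
        2: CFG(["b0"], []),
    },
)
print(app.local_features())      # [(3, 2), (2, 2), (1, 0)]
print(app.callee_mean_blocks())  # [1.5, 0.0, 0.0]
```

A learned hierarchical model would typically replace these hand-picked counts with CFG embeddings, but the overall shape is similar: summarize each CFG, then propagate the summaries along FCG edges.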

Download Dataset
Access the complete HiGraph dataset through Hugging Face.
The samples span 11 years, and the dataset is released under a Creative Commons license.
Updates
Changelog
Latest updates and improvements to the HiGraph dataset.
May 16, 2025
Initial release of the HiGraph dataset.
HiGraph, a novel, large-scale dataset that models each application as a hierarchical graph, is now publicly available. This initial version includes over 200 million Control Flow Graphs (CFGs) and over 595,000 Function Call Graphs (FCGs).
Future Plans
Continued development and expansion of the HiGraph dataset.
- Regular updates with new samples and features.
- Integration of more advanced graph analysis tools.
- Community contributions and collaborations.