AI for Software Engineering


I explore how AI techniques — from graph neural networks to large language models — can be integrated with program analysis to automate software engineering tasks such as bug detection, code understanding, and program repair.

LLM-Driven Program Repair and Specification Generation

Repository-level memory error repair — We proposed LTFix, the first system combining LLMs with typestate-guided program analysis for codebase-level memory error repair. It fixes 37 out of 49 real memory errors (94.7% more than SWE-agent) using ~1/42 of the tokens, and successfully repaired 3 zero-day vulnerabilities. [C13: FSE '26]
API specification generation — SpecGuru uses hierarchical LLM inference with self-validation to automatically generate points-to specifications for C library APIs, enabling effective alias and taint analysis without library source code. [C11: ICSE '26]
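To make the typestate idea concrete, here is a minimal sketch of a typestate check for heap memory over a linear trace of operations. It is not LTFix's implementation: the `State` enum, the `check_trace` function, and the `(op, var)` trace format are hypothetical names chosen for illustration, and a real analysis would track states across all program paths and aliases rather than a single trace.

```python
from enum import Enum


class State(Enum):
    """Hypothetical typestates for a heap pointer."""
    ALLOCATED = "allocated"
    FREED = "freed"


def check_trace(trace):
    """Report typestate violations in a list of (op, var) pairs.

    op is one of "malloc", "use", "free". Returns a list of
    (index, var, message) tuples; leaks are reported at len(trace).
    """
    states = {}
    errors = []
    for i, (op, var) in enumerate(trace):
        current = states.get(var)
        if op == "malloc":
            states[var] = State.ALLOCATED
        elif op == "use":
            # Using a pointer after it was freed violates the typestate.
            if current is State.FREED:
                errors.append((i, var, "use-after-free"))
        elif op == "free":
            if current is State.FREED:
                errors.append((i, var, "double free"))
            elif current is State.ALLOCATED:
                states[var] = State.FREED
        else:
            raise ValueError(f"unknown operation: {op}")
    # Anything still allocated at the end of the trace is a leak.
    for var, state in states.items():
        if state is State.ALLOCATED:
            errors.append((len(trace), var, "memory leak"))
    return errors


trace = [("malloc", "p"), ("free", "p"), ("use", "p"),
         ("free", "p"), ("malloc", "q")]
for error in check_trace(trace):
    print(error)
```

A repair tool built on this kind of check could use the violated transition (for example, a second `free` in the `FREED` state) to tell an LLM which statement to change, which is one plausible reason typestate guidance reduces token usage.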

Code Embedding and Vulnerability Detection

Value-flow-based code embedding — We proposed Flow2Vec, which preserves interprocedural alias-aware value-flow transitivity via matrix multiplication and CFL-reachability, improving code classification by 21% F1 over code2vec/code2seq. This work received the ACM SIGPLAN Distinguished Paper Award. [C2: OOPSLA '20]
Path-sensitive code embedding — We proposed ContraFlow, a contrastive learning approach on value-flow paths that achieves 83% F1 for vulnerability detection with up to 450% improvement in vulnerability localization metrics. [C3: ISSTA '22]
GNN-based vulnerability detection — We proposed DeepWukong, using graph neural networks on program dependence graph slices, achieving 97% accuracy and 96% F1 across 10 CWE types. [J1: TOSEM '21]
Evaluating learning-based detectors — We proposed bug-triggering path (BTP) metrics, revealing an 85% IoU gap between learning methods and traditional static analyzers in vulnerability localization. [J2: TDSC]
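As a rough illustration of the IoU measure mentioned in the last item, the sketch below computes intersection over union between the statements a detector flags and the statements on the actual bug-triggering path. This is the standard Jaccard formulation over sets of line numbers; the exact BTP metric definitions in the paper may differ, and the function name `line_iou` and the sample line numbers are assumptions made for this example.

```python
def line_iou(predicted_lines, actual_lines):
    """Intersection over union of two collections of line numbers."""
    predicted = set(predicted_lines)
    actual = set(actual_lines)
    union = predicted | actual
    if not union:
        # Both sets empty: define the overlap as 0.0 to avoid dividing by zero.
        return 0.0
    return len(predicted & actual) / len(union)


# Hypothetical example: the detector flags lines 10-13,
# while the bug-triggering path covers lines 12-14.
score = line_iou([10, 11, 12, 13], [12, 13, 14])
print(score)  # 2 shared lines out of 5 distinct lines -> 0.4
```

A score of 1.0 means the flagged statements coincide exactly with the bug-triggering path, and a score near 0 means the detector points at largely unrelated code even when its file-level or function-level prediction is correct.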
