AI for Software Engineering
I explore how AI techniques — from graph neural networks to large language models — can be integrated with program analysis to automate software engineering tasks such as bug detection, code understanding, and program repair.
LLM-Driven Program Repair and Specification Generation
- Repository-level memory error repair — We proposed LTFix, the first system combining LLMs with typestate-guided program analysis for codebase-level memory error repair. It fixes 37 out of 49 real memory errors (94.7% more than SWE-agent) using ~1/42 of the tokens, and successfully repaired 3 zero-day vulnerabilities.
[C13: FSE '26]
- API specification generation — SpecGuru uses hierarchical LLM inference with self-validation to automatically generate points-to specifications for C library APIs, enabling effective alias and taint analysis without library source code.
[C11: ICSE '26]
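As a rough illustration of what a typestate property for memory safety looks like, the sketch below tracks one heap object through a small finite-state machine. It is a minimal sketch under stated assumptions, not LTFix's implementation: the state names, the event names (`malloc`, `use`, `free`), and the `check_trace` helper are all hypothetical and chosen only for this example.

```python
from enum import Enum


class State(Enum):
    UNALLOCATED = "unallocated"
    ALLOCATED = "allocated"
    FREED = "freed"
    ERROR = "error"


# Hypothetical transition table: (current state, event) -> next state.
# Any pair that is not listed is treated as a violation.
TRANSITIONS = {
    (State.UNALLOCATED, "malloc"): State.ALLOCATED,
    (State.ALLOCATED, "use"): State.ALLOCATED,
    (State.ALLOCATED, "free"): State.FREED,
    (State.FREED, "use"): State.ERROR,   # use-after-free
    (State.FREED, "free"): State.ERROR,  # double free
}


def check_trace(events):
    """Replay a list of events for one object and report the first violation."""
    state = State.UNALLOCATED
    for index, event in enumerate(events):
        state = TRANSITIONS.get((state, event), State.ERROR)
        if state is State.ERROR:
            return f"violation at event {index} ({event})"
    if state is State.ALLOCATED:
        return "leak: object still allocated at end of trace"
    return "ok"
```

For example, `check_trace(["malloc", "use", "free"])` returns `"ok"`, while a trace that frees the same object twice is reported as a violation at the second `free`.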
Code Embedding and Vulnerability Detection
- Value-flow-based code embedding — We proposed Flow2Vec, which preserves interprocedural alias-aware value-flow transitivity via matrix multiplication and CFL-reachability, improving code classification by 21% F1 over code2vec/code2seq. This work received the ACM SIGPLAN Distinguished Paper Award.
[C2: OOPSLA '20]
- Path-sensitive code embedding — We proposed ContraFlow, a contrastive learning approach on value-flow paths that achieves 83% F1 for vulnerability detection with up to 450% improvement in vulnerability localization metrics.
[C3: ISSTA '22]
- GNN-based vulnerability detection — We proposed DeepWukong, using graph neural networks on program dependence graph slices, achieving 97% accuracy and 96% F1 across 10 CWE types.
[J1: TOSEM '21]
- Evaluating learning-based detectors — We proposed bug-triggering path (BTP) metrics, revealing an 85% IoU gap between learning methods and traditional static analyzers in vulnerability localization.
[J2: TDSC]
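To make the IoU figure above concrete, here is a minimal sketch of intersection-over-union between the statements a detector flags and the statements on a bug-triggering path. It assumes statements are identified by line number and that IoU is the plain set-based ratio; the function name `path_iou` and the sample line numbers are hypothetical, and the exact metric definitions in the paper may differ.

```python
def path_iou(predicted_lines, btp_lines):
    """Set-based intersection-over-union of two collections of line numbers."""
    predicted = set(predicted_lines)
    truth = set(btp_lines)
    union = predicted | truth
    if not union:
        # Both sets are empty: define IoU as 0.0 to avoid dividing by zero.
        return 0.0
    return len(predicted & truth) / len(union)


# Hypothetical example: a detector flags four lines, three lines form the path.
score = path_iou([10, 11, 12, 20], [11, 12, 13])
```

Here two lines overlap out of five distinct lines in total, so `score` is 0.4. A detector that flags the whole function would cover the path but still score low, which is why a localization metric of this kind is stricter than function-level accuracy.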