Does Coding Style Really Survive Compilation? Stylometry of Executable Code Revisited

Authors: Muaz Ali (University of Arizona), Tugay Bilgis (University of Arizona), Nimet Beyza Bozdag (University of Arizona), Saumya Debray (University of Arizona), Sazzadur Rahaman (University of Arizona)

Volume: 2025
Issue: 3
Pages: 349–360
DOI: https://doi.org/10.56553/popets-2025-0102


Abstract: This paper describes a replication study of influential recent work on binary-level code stylometry by Caliskan et al. [8]. Using the Google Code Jam (GCJ) dataset that the original work used, but with possible differences in authors and tasks, we obtain accuracy results that are significantly lower than those originally reported. An analysis of the features that contribute most to author classification decisions indicates that such features may, in many cases, be accidental artifacts (e.g., due to erroneous disassembly of data bytes embedded in the binary) and have little to do with programming style. Our results suggest that binary-level code stylometry (1) is more sensitive to code characteristics than previously suspected; (2) can be significantly less accurate than previously reported (for 100 authors, we achieved approximately 63% accuracy, compared to the 96% reported in the original work); and (3) deserves careful attention to accidental artifacts arising from the compilation and stylometry toolchains. We found that 29 of the 33 top ndisasm-based features resulted from erroneous disassembly. Our analysis revealed that this might cause the model to pick up spurious features such as the original file name, because the g++ compiler embeds the filename of the source CPP file into the binary, which might inadvertently inflate the results.


Copyright in PoPETs articles is held by their authors. This article is published under a Creative Commons Attribution 4.0 license.