Usability Study: Extension

This section presents more details about our usability analysis. First, we explain our criteria for filtering out security-irrelevant repositories. Then, we provide case studies to illustrate the unique in-the-wild cryptographic misuses found by LLMs.

1) Filtering Criteria

We initially select GitHub repositories based on their popularity. However, upon manual inspection, we discover that many of these repositories are not directly related to security issues. Consequently, we exclude repositories identified as follows:

A representative example is the Chaos repository (chaos), a social coding experiment that automatically merges all public updates. Maintainers of such repositories are typically indifferent to the security of their projects. Scanning these repositories is therefore an inefficient use of both time and cost.

2) Results of SATs

For the 47 confirmed cryptographic misuses, we further investigate how well traditional detectors identify them. Specifically, we employ CryptoGuard for Java targets and LICMA for Python targets. CryptoGuard and LICMA detect only 3 and 2 of these confirmed misuses, respectively, far fewer than GPT-4. LICMA's poor performance can be attributed to its limited rule set, which comprises only six detection rules. CryptoGuard's shortcomings, by contrast, align with the detection capability gaps described in Section II.
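To make the rule-coverage argument concrete, the following is a minimal toy sketch of how a detector with a small, fixed rule set can miss variants of the same misuse. It is an illustration under stated assumptions only: the rule set `RULES`, the helper names, and the three samples are hypothetical, and they reproduce neither LICMA's actual rules nor any of the 47 confirmed misuses.

```python
import ast

# Hypothetical rule set: dotted call names treated as misuse.
# Illustrative only; these are NOT LICMA's actual rules.
RULES = {"hashlib.md5", "hashlib.sha1"}


def dotted_name(node):
    """Return 'a.b.c' for a Name/Attribute chain, otherwise None."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None


def detect(source):
    """Return sorted line numbers of calls whose name matches a rule."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and dotted_name(node.func) in RULES:
            hits.append(node.lineno)
    return sorted(hits)


# Three hypothetical samples that all hash with MD5.
SAMPLES = {
    "direct_md5": "import hashlib\nh = hashlib.md5(b'pw')\n",
    "md5_via_new": "import hashlib\nh = hashlib.new('md5', b'pw')\n",
    "aliased_md5": "from hashlib import md5 as digest\nh = digest(b'pw')\n",
}


def coverage(samples):
    """Return (names of detected samples, detected count, total count)."""
    detected = [name for name, src in samples.items() if detect(src)]
    return detected, len(detected), len(samples)


if __name__ == "__main__":
    names, found, total = coverage(SAMPLES)
    print(f"detected {found}/{total}: {names}")
```

In this sketch only the direct call matches a rule; the `hashlib.new('md5', ...)` form and the aliased import go unflagged because the rules match syntactic call names rather than the semantics of the code.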

3) Discussion on Real-world Cryptographic Misuses

Table 1 presents a comprehensive list of the detected cryptographic misuses. Note that we obfuscate the descriptions of misuse cases that have not yet been repaired. Several critical observations emerged from our interactions with developers.