Usability Study: Extension
This section presents more details about our usability analysis. First, we explain our criteria for excluding security-irrelevant repositories. Then, we report how traditional static analysis tools (SATs) perform on the confirmed misuses and discuss the in-the-wild cryptographic misuses found by LLMs.
1) Filtering Criteria
We initially select GitHub repositories based on their popularity. However, manual inspection shows that many of these repositories are not directly related to security. Consequently, we exclude repositories that fall into the following categories:
- Tutorials and Textbooks: These repositories contain numerous legacy standards and educational content (e.g., tutorial).
- Local Projects: Repositories explicitly designated for use in local environments, where security implications might be less critical (e.g., localstack).
- Exploitation Toolkits: Repositories designed for offensive cryptographic operations (e.g., exploit).
- Experimental Projects: Other experimental projects that do not prioritize security.
A representative example is the Chaos repository (chaos), a social coding experiment that automatically merges all public updates. Maintainers of such repositories are typically indifferent to the security of their projects, so scanning them is an inefficient use of both time and cost.
2) Results of SATs
For the 47 confirmed cryptographic misuses, we further investigate whether traditional detectors can identify them. Specifically, we employ CryptoGuard for Java targets and LICMA for Python targets. CryptoGuard and LICMA detect only 3 and 2 of the actual misuses, respectively, far fewer than GPT-4. The poor performance of LICMA can be attributed to its limited rule set, which contains only six detection rules. The shortcomings of CryptoGuard, on the other hand, align with the detection capability gaps described in Section II.
3) Discussion on Real-world Cryptographic Misuses
We present a comprehensive list of detected cryptographic misuses in Table 1. Note that we obfuscate the descriptions of misuse cases that have not yet been repaired. Several critical observations emerge from our interactions with developers.
- A significant root cause of real-world cryptographic issues is the gap between developers' experience and the evolving standards in cryptography. Many misuses stem from legacy components that are not adequately maintained over time. For example, the widely recognized project Elasticsearch, which has 68.2k stars, had already handled a deprecated, vulnerable TLS protocol version in 2019. However, it did not extend the same treatment to TLSv1 when that version was deprecated in March 2021 by IETF RFC 8996, until we reported the oversight. Maintainers often tell us that rapidly evolving security standards are a burden, particularly in the absence of a dedicated professional security team. This situation underscores the need for researchers to focus not only on detecting misuses but also on developing automated solutions for repairing them.
- The majority of developers respond positively to reports of cryptographic misuse when the reports are supported by authoritative references. For instance, the developers behind Twisted initially contended that 1024-bit DSA keys were acceptable. However, they quickly revised their stance once we presented the explicit deprecation guidelines from NIST SP 800-57. Further discussions revealed that even OpenSSH, a leading SSH implementation, continues to employ some outdated configurations, which are then replicated by downstream applications.
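To make the two observations above concrete, the following is a minimal Python sketch of the kind of check they imply. It is not the fix adopted by Elasticsearch or Twisted. The helper names (`hardened_client_context`, `allows_deprecated_tls`, `is_weak_dsa_key`) are hypothetical, and the 2048-bit DSA threshold is our assumption based on the NIST SP 800-57 guidance mentioned above. Only the standard-library `ssl` module is used.

```python
import ssl

# Assumption: 1024-bit DSA keys are treated as deprecated and 2048 bits
# is taken as the smallest acceptable size (per NIST SP 800-57 guidance).
MIN_DSA_KEY_BITS = 2048


def hardened_client_context() -> ssl.SSLContext:
    """Build a client context that refuses TLS 1.0 and TLS 1.1,
    the versions deprecated by IETF RFC 8996."""
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def allows_deprecated_tls(ctx: ssl.SSLContext) -> bool:
    """Return True if the context may negotiate a version below TLS 1.2.

    ssl.TLSVersion is an IntEnum, so older versions (and the
    MINIMUM_SUPPORTED placeholder) compare lower than TLSv1_2.
    """
    return ctx.minimum_version < ssl.TLSVersion.TLSv1_2


def is_weak_dsa_key(key_bits: int) -> bool:
    """Return True if a DSA key of this size falls below the assumed minimum."""
    return key_bits < MIN_DSA_KEY_BITS


if __name__ == "__main__":
    ctx = hardened_client_context()
    print("deprecated TLS allowed:", allows_deprecated_tls(ctx))
    print("1024-bit DSA weak:", is_weak_dsa_key(1024))
```

Pinning the minimum version explicitly, rather than relying on library defaults, keeps the configuration stable when a project is built against older runtimes, which is the legacy-maintenance scenario described above.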