The Power of Open-Source Software in Cyber Security

Gustavo Silva
Security Engineer @ Paddy Power Betfair

The beauty of open-source software is that it allows you to create, experiment with and transform code, and even give it a higher purpose. After discovering a new and exciting security scanning tool and diving deep into it with the help of our engineering team, we began turning it into something more. What could initially have been used for red-teaming, bug bounty hunting or hacking in general was transformed into a tool that helps blue teams better defend against the bad guys.

A bit over a year ago, Paddy Power Betfair's Application Security Engineering team set out to adapt a trending secrets scanner called shhgit. On a daily basis, our team creates and implements the tools needed to ensure applications are developed and delivered to the highest quality standards. As soon as we learned more about this tool, we all immediately saw its potential to help us raise awareness and proactively reduce the possibility of leaking sensitive tokens and secrets into source code.

What is Shhgit?

Shhgit is an open-source tool that finds committed secrets and sensitive files across GitHub and its Gists in real time. Simply put, it makes heavy use of GitHub's API to find public code repositories containing leaked secrets or files. It was developed to raise awareness of, and bring to light, the prevalence of this issue.

Underneath this high-level definition, shhgit uses a simple regular expression engine to match user-definable patterns against every line of code in a repository. Once a user pushes code to a public repository, the application is triggered to perform a full scan of that repository and emit alerts on its findings. The default configuration contains over 130 signatures for things like AWS keys, Google Cloud keys or SSH keys, and it can detect many, many more. Some of these secrets follow specific formats (like the ones mentioned), and that can be used to our advantage to create tools that detect them.
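
To make the idea concrete, here is a minimal sketch of line-by-line signature matching in Python. It is not shhgit's code: the two signatures are illustrative stand-ins based on widely known token formats rather than entries copied from shhgit's signature list, and the names SIGNATURES, Finding and scan_text are made up for this example.

```python
import re
from typing import Dict, List, NamedTuple

# Illustrative signatures only (assumption): these are NOT taken from
# shhgit's configuration, they just mimic well-known secret formats.
SIGNATURES: Dict[str, "re.Pattern[str]"] = {
    "AWS access key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "Private key header": re.compile(
        r"-----BEGIN (?:RSA |EC |DSA |OPENSSH )?PRIVATE KEY-----"
    ),
}


class Finding(NamedTuple):
    line_number: int
    signature: str
    match: str


def scan_text(text: str, signatures: Dict[str, "re.Pattern[str]"] = SIGNATURES) -> List[Finding]:
    """Match every signature against every line and collect the hits."""
    findings: List[Finding] = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for name, pattern in signatures.items():
            for hit in pattern.finditer(line):
                findings.append(Finding(line_number, name, hit.group(0)))
    return findings
```

Calling scan_text on the contents of a file returns one Finding per match, carrying the line number and the name of the signature that fired, which is roughly the information an alert needs.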

Why use a secrets scanner?

Data leakage is one of the most common threats faced by companies and, for that matter, by any software development project. Leaking a secret token into the public might not necessarily mean a breach, but it surely makes it easier for malicious attackers to leverage that knowledge when choosing vectors of attack. The matter is so relevant that lots of tools focus on detecting and alerting when sensitive data is exposed. Amazon's AWS GuardDuty is probably one of the most well-known examples out there.

Highs and lows of scanning tools - how we got to develop the blue team version

While this tool provided great visibility of sensitive data made public, it only sent alerts once the secret had already been pushed into the public. Furthermore, it delivered alerts through a web application and would not easily integrate with our existing notification system. On top of that, the tool could not be configured per repository. We felt this was a key requirement for reducing the number of false positives we could potentially get when using this tool. If each development team could specify the patterns that are relevant to them, or the files that should be ignored, the tool would only notify us when trouble came about. This level of customization was not possible in shhgit, and it is typically hard to find in many other existing solutions.

While these tools provide information about potentially sensitive data made public, they are reactive. Most often, they cannot be customized, which makes them difficult to use. They fail to adapt to the specifics of a project or a company.

While the concept was good, these limitations gave us the push to understand the tool's core engine, which turned out to be really simple: regex matching and string comparisons, driven by a single configuration file that allows the customization of all options relevant to the scanner. Beyond that, we wanted repository-level configuration, so that the tool could adjust itself to the specific needs of each project.

Therefore, we took the idea further and created a blue-team-focused tool from scratch, written in another language, reusing most concepts from shhgit and integrating closely with our systems and development practices.

How it works

As it currently works, the tool allows developers to set up a secrets scanner directly in their GitLab repositories, where it scans the code additions in new merge requests. It can be seen as a step in our continuous integration (CI) pipeline and is an attempt to proactively stop developers from accidentally leaking database credentials, email addresses, AWS keys or other sensitive data into source code. The base configuration holds formats for pretty much all the tokens you can think of, but here comes the main plus of the tool: you can customize it to your needs at repository level, including files and paths to ignore, such as test directories and test files. This ensures the tool adapts to your project, and not the other way around.
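
Scanning only the additions of a merge request means looking at the lines a change introduces rather than at the whole repository. The Python sketch below shows one way to pull those lines out of a unified diff. It rests on an assumption: the article does not describe how our tool obtains the changes (it could just as well use the GitLab API), and the added_lines function is a hypothetical name for this example.

```python
import re
from typing import List, Optional, Tuple

# Hunk header of a unified diff, e.g. "@@ -1,2 +1,3 @@".
# A missing count means the hunk covers exactly one line.
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def added_lines(diff_text: str) -> List[Tuple[Optional[str], int, str]]:
    """Return (path, line_number, text) for every line added by the diff."""
    results: List[Tuple[Optional[str], int, str]] = []
    path: Optional[str] = None
    old_left = new_left = 0  # lines still expected in the current hunk
    new_number = 0           # line number in the new version of the file
    for raw in diff_text.splitlines():
        if old_left > 0 or new_left > 0:
            # Inside a hunk: the first character tells us the line type.
            if raw.startswith("+"):
                results.append((path, new_number, raw[1:]))
                new_number += 1
                new_left -= 1
            elif raw.startswith("-"):
                old_left -= 1
            elif raw.startswith("\\"):
                pass  # "\ No newline at end of file" marker
            else:
                # Context line, present in both versions.
                old_left -= 1
                new_left -= 1
                new_number += 1
            continue
        if raw.startswith("+++ "):
            # "+++ b/src/app.py" names the new version of the file.
            target = raw[4:].split("\t")[0]
            path = target[2:] if target.startswith("b/") else target
            continue
        header = HUNK_HEADER.match(raw)
        if header:
            old_left = int(header.group(1)) if header.group(1) is not None else 1
            new_number = int(header.group(2))
            new_left = int(header.group(3)) if header.group(3) is not None else 1
    return results
```

Each returned tuple can then be handed to the pattern matcher, so a finding points at the exact file and line the merge request introduced.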

When the tool starts a new scan (i.e., when a new merge request is opened), it searches the repository for a configuration file and, if one exists, uses it instead of the default one. Do you want to ensure no company email is leaked into your source code? Add a regular expression with your company's domain and there you have it. Do you use a tool that has a predictable token format that is not in the default configuration file? Add it to yours! Don't want to run the scans on your test files? No problem, add them to the ignore lists.
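
The behaviour described above (repository configuration replaces the default, ignore lists skip files) can be sketched in a few lines of Python. Everything named here is an assumption made for illustration: the file name .secret-scanner.json, the JSON format, the signatures and ignore_paths keys, and the helper functions are hypothetical and do not describe the real tool's configuration.

```python
import json
import re
from fnmatch import fnmatchcase
from pathlib import Path
from typing import List, Tuple

# Hypothetical file name and schema, chosen only for this sketch.
CONFIG_FILENAME = ".secret-scanner.json"

DEFAULT_CONFIG = {
    "signatures": {"AWS access key ID": r"\bAKIA[0-9A-Z]{16}\b"},
    "ignore_paths": [],
}


def load_config(repo_root: str) -> dict:
    """Use the repository's own configuration if present, else the default."""
    config_path = Path(repo_root) / CONFIG_FILENAME
    if config_path.is_file():
        return json.loads(config_path.read_text(encoding="utf-8"))
    return DEFAULT_CONFIG


def is_ignored(path: str, config: dict) -> bool:
    """True when the path matches any glob in the ignore list."""
    return any(fnmatchcase(path, glob) for glob in config.get("ignore_paths", []))


def scan_file(path: str, content: str, config: dict) -> List[Tuple[str, int, str]]:
    """Return (path, line_number, signature_name) for each matching line."""
    if is_ignored(path, config):
        return []
    findings = []
    for line_number, line in enumerate(content.splitlines(), start=1):
        for name, pattern in config.get("signatures", {}).items():
            if re.search(pattern, line):
                findings.append((path, line_number, name))
    return findings
```

With a repository file containing a "Company email" signature such as [A-Za-z0-9._%+-]+@example\.com and an ignore entry of tests/*, an address committed under src/ is reported while the same line under tests/ is skipped.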

For the time being, the project has a small roadmap of features we would like to implement in the near future. These are extensions of the base idea, in the hope that more people can benefit from this project, and also to improve the tool's efficiency.

The first step is, naturally, to bring the scanner back to GitHub. The platform hosts numerous projects, both private and public, and the tech community has a huge presence there, so we think integration there is a must. On the efficiency and portability side, we want to work on key entropy detection (trying to discover secrets from the entropy of a string, particularly relevant for really sensitive or critical projects), a customizable notification level (as in, blocking merge requests or simply alerting developers) and, finally, officially publishing a Docker image of this project to facilitate its adoption by development teams.
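
Entropy detection is still a roadmap item, so the following Python sketch only illustrates one common approach and says nothing about how the feature will eventually be built. It computes the Shannon entropy of long token-like strings and flags those above a threshold. The minimum token length of 20 and the threshold of 4.0 bits per character are assumptions picked for the example, as are the function names.

```python
import math
import re
from collections import Counter
from typing import List

# Candidate tokens: runs of 20+ characters typical of keys and base64 data.
# The minimum length is an assumption for this sketch.
TOKEN_PATTERN = re.compile(r"[A-Za-z0-9+/=_\-]{20,}")


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the string, in bits per character."""
    if not text:
        return 0.0
    length = len(text)
    counts = Counter(text)
    return -sum((n / length) * math.log2(n / length) for n in counts.values())


def high_entropy_tokens(line: str, threshold: float = 4.0) -> List[str]:
    """Return the tokens in a line whose entropy reaches the threshold."""
    return [
        token
        for token in TOKEN_PATTERN.findall(line)
        if shannon_entropy(token) >= threshold
    ]
```

Random-looking keys spread their characters evenly and score high, while ordinary identifiers and repeated characters score low. The threshold trades missed secrets against false positives, which is why this kind of check is usually kept for the most sensitive projects.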

At Paddy Power Betfair, we always work on improving our tooling to ensure a safer software development life cycle. With all our knowledge, research and hands-on experience, we believe this tool really fills an existing gap. Luckily, Paddy Power Betfair encourages and allows teams to be openly curious, build new things and grow, both as people and as a company. For this I am wholeheartedly thankful. Not everywhere would we find such support to work on this tool and publish it as open-source software.

Since the tool's origins are open source, we wanted to keep it that way. We saw this as an opportunity to give something back to a community that gives so much to developers and teams.

We would love to see input and contributions from the tech community, so feel free to contribute to this project. If you are interested, please check the repository for further information on how to get started.