How secure is The Code Registry?

We understand how important security is when it comes to code. It’s critical that our users feel assured that when they sync their IP’s code to The Code Registry, it’s as secure as can be.

This article may get a bit technical, but we want to be very transparent about how our infrastructure works and what steps are implemented to make sure it’s secure.

There are a few moving parts in our platform, and they are all hosted on Azure. We’ll cover them all here!

Azure

First, Azure itself. One of the primary reasons we chose Azure as our cloud platform was security, because we planned for good security practices right from the start.

Anything that is stored on a disk is encrypted at rest. This means that if anyone were somehow able to get access to a storage disk holding files related to our platform, those files would be encrypted and unreadable without the secret decryption keys.

https://learn.microsoft.com/en-us/azure/security/fundamentals/encryption-atrest

All of our analytics functions and the main web app run on Azure App Services (more details below) and Azure comply with the following security requirements for these:

  • Our app resources are isolated from other customers’ Azure resources.
  • All server software is regularly updated to address newly discovered vulnerabilities.
  • Any important data (such as database connection details or app settings) is stored encrypted, and is only ever transferred within the secure Azure network.
  • Any connections we make to the apps (for debugging, deployments, maintenance, etc.) are encrypted.
  • 24-hour threat management protects the infrastructure and platform against malware, distributed denial-of-service (DDoS), man-in-the-middle (MITM), and other threats.

More information about Azure’s security:

The web app (the bit you log in to)

This is built using the Laravel PHP framework. Because all of the heavy analysis is done in private backend Python functions, the web app acts as a thin client that retrieves the results and displays them to you (the user), which keeps its risk low.

On top of this, there are the following relevant points to show what additional steps are being taken:

  • The entire web app and all internal communications (e.g. to talk to the chatbot) are secured with HTTPS.
  • Our platform is entirely “passwordless”, which was part of our plan from day one. We wanted to be at the forefront of web app security and this is the way to do it.
    • You can only log in to our system with either a passcode emailed to your account email, or a secure passkey tied to the device you’re logging in on (which uses your device PIN or biometrics to authenticate you).
    • This removes an entire layer of potential password-based vulnerabilities.
  • Our entire web app is regularly scanned by AppCheck, who are a “Complete Enterprise Security Testing Solution”. We regularly review findings from those scans to see what needs to be actioned.
  • All internal identifiers (e.g. project IDs, code vault IDs, user/team IDs, etc.) are completely random, using UUIDs instead of incremental database IDs.
  • All private secure code repositories (see below) have randomised names, with no obvious links to the teams that they belong to.
  • We intentionally store only minimal identifying data for each user. The only detail we require is each user’s email address, which is needed to log them in. The user’s name can be changed to anything, and the team name can also be anything.
  • We never publicise (without permission) the clients we work with, so nobody will know to try to access Company A’s IP by getting into our platform.
  • The data we store in the database about each code vault is only the results of the analysis, definitely not the full codebase or anything about the file or folder structure within it.

The above means that if someone somehow got access to the web app’s system, and somehow accessed the encrypted web app secrets to be able to access the private database (see below), there would be nothing useful in the database.
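Two of the points above, random identifiers and emailed passcodes, can be illustrated with a minimal sketch. This is not the platform's actual code: the function names, the six-digit passcode format, and the ten-minute expiry are all assumptions made for illustration, using only Python's standard library.

```python
import hashlib
import hmac
import secrets
import time
import uuid


def new_identifier() -> str:
    """Return a random (version 4) UUID to use instead of an incremental ID."""
    return str(uuid.uuid4())


def issue_passcode(ttl_seconds: int = 600) -> tuple[str, dict]:
    """Create a one-time passcode plus the record that would be stored for it.

    The passcode itself would be emailed to the user. Only its hash and an
    expiry time are kept, so the passcode is never stored in plain text.
    (Format and expiry are illustrative assumptions.)
    """
    passcode = f"{secrets.randbelow(10**6):06d}"  # six random digits
    record = {
        "hash": hashlib.sha256(passcode.encode()).hexdigest(),
        "expires_at": time.time() + ttl_seconds,
    }
    return passcode, record


def verify_passcode(submitted: str, record: dict) -> bool:
    """Check a submitted passcode against the stored record."""
    if time.time() > record["expires_at"]:
        return False  # expired passcodes are always rejected
    submitted_hash = hashlib.sha256(submitted.encode()).hexdigest()
    # Constant-time comparison avoids leaking how much of the value matched.
    return hmac.compare_digest(submitted_hash, record["hash"])
```

Because identifiers are random rather than sequential, knowing one project or team ID gives no hint about any other.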

Analytics functions

We have various analytical functions written in Python, which are deployed onto Azure using their Azure Function App service.

These functions take care of all of the core code analysis that is needed to provide insights through the app.

As well as Azure’s assurances which you can see above, by their very nature these Python functions are low-risk, for the following reasons:

  • They are only ever triggered by queues, not from external sources or the general public.
    • For example, when we need to analyse a codebase’s complexity, we securely add a message to a queue, and the “complexity” function reads this message and does its analysis. The queue message simply contains the code vault’s ID, which is randomised (UUIDs, not incremental IDs, as described above).
  • Any keys or secrets they need are stored encrypted (using App Service Environment Variables).
  • Most of the functions are “consumption” based, meaning that they are not always running. They are only run when needed, based on demand, so they’re not sat there ready to be attacked by a hacker.
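The queue-driven flow described above can be sketched roughly as follows. This is an illustration only: Python's in-memory `queue.Queue` stands in for a managed cloud queue, and the function names and the `code_vault_id` message field are hypothetical.

```python
import json
import queue
import uuid

# In-memory stand-in for a managed message queue (an assumption for this sketch).
analysis_queue: "queue.Queue[str]" = queue.Queue()


def enqueue_complexity_job(code_vault_id: str) -> None:
    """Add a message that contains nothing but the code vault's UUID."""
    uuid.UUID(code_vault_id)  # raises ValueError if the ID is not a valid UUID
    analysis_queue.put(json.dumps({"code_vault_id": code_vault_id}))


def handle_complexity_message(message: str) -> str:
    """Read a queue message and return the vault ID the analysis would run on."""
    payload = json.loads(message)
    vault_id = str(uuid.UUID(payload["code_vault_id"]))
    # The actual complexity analysis would start here, using only the vault ID
    # to look up the private mirrored repository.
    return vault_id
```

The message carries no code, credentials, or customer names, so the queue itself holds nothing sensitive.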

Data storage

We have a central data store for account information, team information, project / code vault information and the results of our analysis.

We don’t store anything relating to your actual codebase in this data store – e.g. the folder structure – only the results of our analysis.

We also store our AI insights in this data store, but we never pass your full codebase to our AI engine, only the results of our analysis. For example, we’ll only give our AI engine the detected languages, or the top 3 most urgent security issues.
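As a rough sketch of that idea, the function below reduces a set of analysis results to the small summary that would be passed on. The field names (`languages`, `security_issues`, `urgency`, `title`) are assumptions for illustration, not the platform's real schema.

```python
def build_ai_payload(analysis: dict, max_issues: int = 3) -> dict:
    """Keep only summary-level fields; anything else in `analysis` is dropped."""
    issues = sorted(
        analysis.get("security_issues", []),
        key=lambda issue: issue["urgency"],
        reverse=True,  # most urgent first
    )
    return {
        "languages": list(analysis.get("languages", [])),
        "top_security_issues": [issue["title"] for issue in issues[:max_issues]],
    }
```

Building the payload from an explicit list of allowed fields means that new fields added to the analysis later are excluded by default.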

The data storage is in a MySQL database secured and managed by Azure with regular backups and disaster recovery solutions in place.

And as it’s Azure, all of the data is encrypted at rest, meaning nothing is stored on disk in plain text.

How we securely sync your code

When you set up a code vault with us, we securely mirror your code from your original code source to a private Azure Git repository. We then do all of our analysis from that secure repository.

We do this for a few core reasons:

  • It means we are interacting with your original code source as few times as possible
  • We can make use of Azure’s scalable infrastructure to analyse the code as fast as possible, without putting strain on any of your systems

The goal is that we don’t get in the way of any existing development processes or development teams but can still provide our full suite of analytics data and insights.

When you enter your file archive or Git credentials into our platform, they are securely encrypted using a completely separate method to all of the above. The encryption key is generated at random, is unique to the production web app, and is then stored encrypted using Azure’s app settings.

So if anyone somehow got access to our web app’s database, none of your code source credentials would be readable.
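The key handling described above might look something like the sketch below, which covers only generating a random key and reading it back from application settings. The setting name `CREDENTIALS_ENCRYPTION_KEY` and the 32-byte key length are assumptions, and the encryption itself (which should use a vetted cryptography library) is deliberately left out.

```python
import base64
import os
import secrets

KEY_SETTING = "CREDENTIALS_ENCRYPTION_KEY"  # hypothetical setting name
KEY_LENGTH = 32  # bytes; an assumed key size for this sketch


def generate_key() -> str:
    """Generate a random key, encoded as text so it can be saved as an app setting."""
    return base64.urlsafe_b64encode(secrets.token_bytes(KEY_LENGTH)).decode("ascii")


def load_key(settings=None) -> bytes:
    """Read the key from app settings (environment variables by default)."""
    settings = os.environ if settings is None else settings
    raw = settings.get(KEY_SETTING)
    if not raw:
        raise RuntimeError(f"{KEY_SETTING} is not configured")
    key = base64.urlsafe_b64decode(raw)
    if len(key) != KEY_LENGTH:
        raise ValueError("Encryption key has an unexpected length")
    return key
```

Keeping the key in app settings rather than in the database is what makes a database-only breach insufficient to read the stored credentials.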

We have no code within our platform that writes to your code source, we only ever read from it.

And we only access the code source at these points:

  • At initial code vault creation, to securely mirror your code and analyse the Git history and code contributors.
  • When a manual or automatic code replication is triggered (e.g. weekly, monthly or clicking the “Update code” button).

Outside of these points the original code source is never looked at.
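The access rule above can be expressed as a small decision function. This is a sketch under stated assumptions: the schedule names mirror the examples in the text, "monthly" is approximated as 30 days, and the function and parameter names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional

# Assumed intervals; "monthly" is approximated as 30 days for this sketch.
INTERVALS = {"weekly": timedelta(days=7), "monthly": timedelta(days=30)}


def should_sync(
    last_synced: Optional[datetime],
    schedule: str,
    now: datetime,
    manual_trigger: bool = False,
) -> bool:
    """Return True only at the points where the code source may be read."""
    if last_synced is None:
        return True  # initial code vault creation
    if manual_trigger:
        return True  # the user clicked "Update code"
    interval = INTERVALS.get(schedule)
    if interval is None:
        return False  # no automatic replication configured
    return now - last_synced >= interval
```

In every other case the function returns False, matching the rule that the original code source is otherwise left alone.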

Azure DevOps repositories

The Azure DevOps platform is secure by design. The repositories we create are named randomly, and nobody but our Azure-hosted apps can access them.

Azure have lots of nice information about the DevOps security here:

Our AI integration (Ada)

Under the hood, our AI engine uses an OpenAI GPT integration.

But we assure you that we never send OpenAI the full codebase or the full details of your codebase.

We currently use our AI integration to do two things:

  • Generate AI insights from the data we’ve analysed from your codebase
    • For this we only send the AI engine the minimal amount of data to be summarised, enough to generate insights that can provide value to our users.
  • Provide a chat interface for our users to be able to discuss their projects
    • For this we define lots of custom functions and tools for the AI integration to call, but it can’t access your entire codebase. For example, when you ask about file types or languages, we pass the AI integration the data we’ve already analysed (a list of languages or file types).
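The tool-based approach in the last point can be sketched as an allow-list of functions that only ever return already-analysed data. Everything here is illustrative: the tool names, the sample results, and the `call_tool` dispatcher are assumptions, not the platform's real interface.

```python
from typing import Callable, Dict, List

# Hypothetical pre-computed analysis results, keyed by code vault ID.
ANALYSIS_RESULTS: Dict[str, Dict[str, List[str]]] = {
    "example-vault": {
        "languages": ["PHP", "Python"],
        "file_types": [".php", ".py", ".json"],
    }
}


def list_languages(vault_id: str) -> List[str]:
    """Return the languages already detected for a vault."""
    return ANALYSIS_RESULTS[vault_id]["languages"]


def list_file_types(vault_id: str) -> List[str]:
    """Return the file types already detected for a vault."""
    return ANALYSIS_RESULTS[vault_id]["file_types"]


# Only functions registered here can be called on behalf of the AI integration.
TOOLS: Dict[str, Callable[[str], List[str]]] = {
    "list_languages": list_languages,
    "list_file_types": list_file_types,
}


def call_tool(name: str, vault_id: str) -> List[str]:
    """Dispatch a tool request, rejecting anything outside the allow-list."""
    if name not in TOOLS:
        raise PermissionError(f"Tool not allowed: {name}")
    return TOOLS[name](vault_id)
```

Since no registered tool reads source files, a request such as `call_tool("read_file", ...)` is rejected rather than served.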

Ready to get started?

Our simple sign-up process takes less than 5 minutes. Once we’ve replicated your code and created your dedicated IP Code Vault, you’ll be able to start understanding more about your code immediately!