We did it – one billion lines of code scanned! Our AI-powered code intelligence and insights platform recently passed the 1,000,000,000 lines milestone, and we’re equal parts excited and amazed. That’s a mind-boggling amount of code – if those lines were printed out and laid end-to-end, they’d stretch over 15,000 miles (more than halfway around the Earth!).
Beyond the fun statistics, this milestone gives us a treasure trove of data about the state of modern code. In this post, we’ll share some light-hearted yet insightful findings from analyzing a billion lines – from millions of code issues and tens of thousands of security vulnerabilities, to surprising facts about programming languages and file types. We’ll also dive into why these insights matter for developers and businesses, and what trends they reveal about the software we write every day.
So grab your favorite beverage (no, not Java – unless we’re talking about the programming kind), and let’s celebrate a billion lines of code with some key insights!
AI-Generated Insights: 23,000+ Helpful Suggestions
One highlight of reaching this scale is seeing how much our AI assistant Ada has chimed in. Ada has been busy and has generated over 23,000 AI-driven insights on that mountain of code. These insights are like automated code review comments or tips – spanning everything from potential bugs and performance improvements to stylistic inconsistencies and refactoring ideas. It’s as if we had tireless robot code reviewers working 24/7, dropping 23k+ pieces of advice into our customers’ dashboards.
What’s remarkable is how helpful and varied these AI-generated insights are. For example, the AI might spot that a function can be simplified, identify duplicate logic that could be DRYed up (“Don’t Repeat Yourself”), or flag a suspicious hard-coded credential. Many insights echo what a human reviewer would say – just scaled up to billions of lines. This shows the promise of AI in assisting developers: when you’re dealing with such a gigantic codebase, having AI point out the low-hanging fruit (and sometimes the not-so-obvious issues) is a game-changer.
Over a Million Code Quality Issues (Yes, 1,000,000+)
Scanning a billion lines of code, you’re bound to find some warts. In fact, our platform uncovered over 1,000,000 code quality issues across all that code. Now, before anyone panics at that number, remember it’s spread over countless projects – and catching these issues is the first step to improving them. The silver lining is we found them; they’re no longer lurking unbeknownst to developers.
What kind of code quality problems showed up the most? Here are the most common offenders our analysis flagged:
- Outdated dependencies – These are external libraries or packages in use that have newer versions available. Using out-of-date libraries is risky, because you might be missing important fixes. (In fact, security experts note that vulnerabilities in outdated libraries dominate many security reports – iflockconsulting.com.) We saw tons of instances where a project was stuck on an old version of, say, a web framework or utility library, in need of an update.
- Code duplication – Lots of copy-pasted or repeated code. It turns out many projects have the same chunk of code in multiple places. This isn’t just a stylistic concern; duplicate code can make maintenance harder (fix a bug in one place, miss it in the copy) and bloat the codebase. Our AI was quick to suggest “Hey, you’ve seen this before – maybe refactor it into a common function.”
- Insecure URLs – We found many instances of URLs in code using plain `http://` where they should likely use `https://`. These “insecure URL” issues are important to flag – an unsecured connection can expose data in transit. Often it’s as simple as an API endpoint or CDN URL that has an HTTPS option available. This was a surprisingly frequent find (apparently, old habits – or old URLs – die hard).
- Overly complex nested logic – Some functions were basically logic mazes: deeply nested `if/else` statements or loops within loops within loops. Code like that is hard to read and even harder to test. It’s no wonder our platform raised alerts for functions that looked like a plate of spaghetti. Simplifying complex logic can reduce bugs and make life easier for the next developer who has to understand that code (the sketch just after this list shows what untangling it can look like).
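To make that last point concrete, here’s a small Python sketch of the kind of before-and-after our refactoring insights nudge developers toward. The `apply_discount` function and its fields are hypothetical; the pattern – replacing pyramid-shaped nesting with guard clauses – is the general technique.

```python
from dataclasses import dataclass

@dataclass
class User:
    is_active: bool

@dataclass
class Order:
    total: float

# Before: the kind of "logic maze" our analysis flags.
def apply_discount(order, user):
    if order is not None:
        if user is not None:
            if user.is_active:
                if order.total > 100:
                    return order.total * 0.9
                else:
                    return order.total
            else:
                return order.total
        else:
            return order.total
    else:
        return 0

# After: guard clauses keep the happy path flat and readable.
def apply_discount_refactored(order, user):
    if order is None:
        return 0
    if user is None or not user.is_active:
        return order.total
    if order.total > 100:
        return order.total * 0.9  # 10% discount on large orders
    return order.total

# Same behavior, far fewer places for a bug to hide.
assert apply_discount(Order(150), User(True)) == apply_discount_refactored(Order(150), User(True))
```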
Seeing these patterns at such scale is actually reassuring – it’s not your project alone that has tech debt; it’s a widespread challenge. The most common issues tell us that many teams struggle with keeping dependencies up to date, avoiding duplicate code, using secure practices, and managing complexity. These are fundamentally solvable problems: for example, introducing routine dependency updates, using linters or formatters to catch insecure links, and refactoring gnarly functions could drastically cut down that million-plus count.
165,000+ Security Vulnerabilities (53k High, 107k Medium Severity)
Now onto something more sobering: we detected over 165,000 security vulnerabilities in the codebases we analyzed. Yes, six digits. These include everything from minor informational warnings to critical security flaws. To break it down, roughly 53,000 were classified as high severity issues and about 107,000 were medium severity, with the remainder lower priority. In other words, tens of thousands of serious security bugs were lying in wait in that code – things like SQL injection flaws, risky use of system commands, hard-coded secrets, buffer overflows, and all the other nasties that keep security engineers up at night.
If that number sounds large, it is – but it’s also reflective of the industry at large. Studies have found that the vast majority of applications have known vulnerabilities in their code or components. For instance, a recent report found 86% of codebases contained open-source vulnerabilities (itpro.com). So our findings align with what others are seeing: most software has at least a few skeletons in the closet. The fact we found 53k high-severity issues is a reminder that many apps are one unpatched bug away from a potential breach.
On the bright side, knowledge is power. Each of those 165k vulnerabilities is something that can now be addressed – a hole that can be patched before an attacker exploits it. Our platform not only identifies these issues but often provides guidance on remediation (for example, pointing to safer functions or configuration changes). We’ve noticed trends in these vulnerabilities too: many originate from the aforementioned outdated dependencies (more on that soon), and others from overly complex code where developers perhaps didn’t realize a subtle security issue was introduced. It underscores a big takeaway: secure code and clean, quality code go hand-in-hand. When you keep your code simple and your libraries updated, you also reduce the chances of severe vulnerabilities.
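To picture what one of those high-severity findings looks like in practice, here’s a self-contained Python illustration of a classic SQL injection and its fix. The table and attacker input are invented for the example; the pattern – string-built queries versus parameterized ones – is exactly the kind of flaw scanners flag and the kind of remediation they point to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'admin')")

user_input = "nobody' OR '1'='1"  # attacker-controlled value

# Vulnerable: string formatting lets the input rewrite the query.
rows = conn.execute(
    f"SELECT * FROM users WHERE name = '{user_input}'"
).fetchall()
print(rows)  # returns every row, despite the bogus name

# Fixed: a parameterized query treats the input as data, not SQL.
rows = conn.execute(
    "SELECT * FROM users WHERE name = ?", (user_input,)
).fetchall()
print(rows)  # returns nothing, as it should
```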
Speaking of outdated components… let’s talk about open source.
Open Source Dependencies Everywhere (62,000 of them from ~4,000 Vendors)
One billion lines of code isn’t written from scratch – a huge portion of it comes from open source packages and libraries that developers pull in to build applications. Our analysis quantified just how much: we identified over 62,000 open-source dependencies in use, coming from nearly 4,000 unique vendors or communities. (“Vendors” here could mean the package maintainers or ecosystems – think npm packages, PyPI modules, Packagist packages, Maven artifacts, you name it.) That is an astonishing diversity of open-source software being used under the hood of various applications.
This finding drives home the point that modern software development relies heavily on open source. In fact, industry research shows that about 96% of codebases contain open source components, and around 70–90% of the code in a given application is open source (intel.com). Our numbers back that up – essentially every project we scanned had a long list of third-party libraries attached, and often the majority of the “code” came from those external sources. We truly are standing on the shoulders of open-source giants.
Why does this matter? For one, it highlights a major supply-chain management challenge: every one of those 62k components is a piece of someone else’s code that could have bugs or need updates. Keeping track of them is non-trivial. (No wonder the average application has around 900 dependencies according to one study – itpro.com.) It also shows how innovation is shared – developers aren’t reinventing the wheel for things like web frameworks, authentication, data parsing, etc., which is great for productivity. But it means developers and businesses must be vigilant in tracking what open source they use. With thousands of distinct packages (from thousands of different maintainers), you need good visibility to ensure you’re aware of updates, security advisories, and license terms for each one.
On the positive side, this rich open-source ecosystem is a strength of the tech world – it enables rapid development and collaboration. The data just reminds us that using open source is the norm, not the exception. The key takeaway: open source is everywhere, so managing those dependencies (and appreciating the work of those maintainers!) should be a priority in any development team.
Outdated Components & License Risks (37,500 Flags, 30,000 with Licenses)
Alongside identifying dependencies, our platform flagged 37,500+ components as outdated – meaning an update or patch was available but not applied in the codebase. Even more striking, about 30,000 of those outdated components carried at least one software license (typically open-source licenses), raising potential licensing risks on top of security risks.
Let’s unpack that. Outdated components are a big deal: running old versions of libraries or frameworks can leave known vulnerabilities in your app. It’s like knowing there’s a recall on your car and not taking it to the shop – eventually, that faulty part might cause an accident. Our data shows this is extremely common (tens of thousands of instances). In fact, a large-scale study by Synopsys found 91% of codebases contained outdated open-source components, with 90% of those being over ten versions behind the latest release (itpro.com). That statistic is staggering but not surprising – many teams delay updates due to fear of breaking changes or simply lack of time/visibility to do it regularly (iflockconsulting.com).
Why do outdated libraries matter so much? Because attackers prey on them. When a security flaw is disclosed in a popular library, cybercriminals know that many applications will take months (or years) to update that library – leaving a window of opportunity to exploit it. As one security consultant bluntly put it, unpatched dependencies “can become the weakest link in your security chain”, effectively an open door for attackers (iflockconsulting.com). By flagging 37k outdated components, our platform is essentially shining a light on thousands of those potential weak links so developers can address them before they’re exploited.
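As a flavor of how an outdated-component check can work, here’s a minimal Python sketch that compares installed package versions against the latest release on PyPI. It’s deliberately naive – it only checks recency with a plain string comparison, and it says nothing about vulnerability feeds – but it shows the basic mechanics.

```python
# A minimal sketch, assuming the packages in question live on PyPI.
import json
import urllib.request
from importlib.metadata import distributions

for dist in distributions():
    name = dist.metadata["Name"]
    try:
        with urllib.request.urlopen(
            f"https://pypi.org/pypi/{name}/json", timeout=5
        ) as resp:
            latest = json.load(resp)["info"]["version"]
    except Exception:
        continue  # not on PyPI, network hiccup, etc.
    # A real tool would parse and compare versions properly.
    if latest != dist.version:
        print(f"{name}: installed {dist.version}, latest {latest}")
```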
Now, about the license aspect: 30k of those outdated components had identifiable licenses. This implies they’re open-source (since proprietary components typically don’t have the kind of license you’d include in an inventory). The licensing risk comes when you have components with licenses that may conflict or impose requirements. Our analysis isn’t just about code health – it also helps companies ensure they comply with open source licenses (avoiding unpleasant surprises like accidentally violating a copyleft license, for example). We found numerous cases of license conflicts – e.g., two libraries in the same app with incompatible licenses, or usage of components with restrictive licenses that might require source disclosure. Industry-wide, this is a common issue: over half of audited codebases (56%) contain open source license conflicts (itpro.com). Moreover, about one third of codebases contain components with no license or a custom license, which is a wildcard that likely needs legal review (itpro.com).
In plain terms: not only do you need to update your libraries for security, you also need to track licenses to avoid legal pitfalls. An outdated component might be easy to ignore (“it still works, why upgrade?”), but if it has a known vulnerability and it’s under a license you weren’t aware of, it can bite you twice – first through a security breach, and second through a compliance violation. The good news is that by identifying those 30k licensed components, our platform gives teams a chance to review and remediate any license issues proactively. As software supply chain transparency (think SBOMs – Software Bill of Materials) becomes more important, having this level of insight is crucial.
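For the license side, here’s a toy Python illustration of how a conflict check over a dependency inventory might look. The inventory, package names, and two-entry rule table are invented for the example; real compliance tooling relies on curated, legally vetted compatibility data, not a snippet like this.

```python
# A toy license-conflict check over a hypothetical dependency inventory.
KNOWN_CONFLICTS = {
    # The FSF considers Apache-2.0 incompatible with GPL-2.0-only.
    frozenset({"GPL-2.0-only", "Apache-2.0"}),
}
COPYLEFT = {"GPL-2.0-only", "GPL-3.0-only", "AGPL-3.0-only"}

deps = {  # hypothetical inventory: package -> SPDX license identifier
    "legacy-orm": "GPL-2.0-only",
    "http-client": "Apache-2.0",
    "report-gen": None,  # no license detected
}

found = {lic for lic in deps.values() if lic}
for pair in KNOWN_CONFLICTS:
    if pair <= found:  # both licenses present in the same app
        print(f"Potential conflict: {' vs '.join(sorted(pair))}")
for pkg, lic in deps.items():
    if lic is None:
        print(f"{pkg}: no license detected, flag for legal review")
    elif lic in COPYLEFT:
        print(f"{pkg}: copyleft license ({lic}), check distribution obligations")
```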
The trend revealed: keeping software current is a widespread struggle. But with automated tools and proper policies, teams can chip away at those 37k outdated pieces. Start by prioritizing updates for high-risk components (especially those with known security issues) and establish a regular cadence for dependency updates. In the long run, it’s far easier to keep up than to play catch-up on years of tech debt.
Code Complexity: Taming the Tangled Code (JavaScript vs. PHP vs. C)
Another fascinating insight from our billion-line journey involves code complexity. We measure various complexity metrics for code (like cyclomatic complexity, which counts how convoluted the logic paths are). The takeaway: there’s a lot of complex code out there – and some languages showed higher complexity on average than others.
In fact, our analysis found that average complexity per file was quite high in some tech stacks. Notably, JavaScript files had the highest complexity scores on average. At first, this sounds surprising – JavaScript is often used for simple web scripting – but the culprit is likely obfuscation and minification. Many codebases include minified JS files (basically all the code smooshed into one line or devoid of whitespace), which skyrockets the complexity metrics. A minified file might look like one giant function with hundreds of branches – essentially a worst-case scenario for cyclomatic complexity. So, JavaScript taking the complexity crown in our findings is a bit of a data quirk: it’s less that JS developers inherently write crazier logic, and more that the analysis picked up a lot of processed/minified JS that is, by nature, hard to read.
That said, other languages weren’t far behind. Some of the PHP codebases we scanned showed very high complexity in many files – in some cases comparable to C programs. This might raise eyebrows, since C has a reputation for low-level, intricate code (pointer arithmetic, anyone?), whereas PHP is a higher-level web language. But it goes to show that any language can accumulate complexity when projects grow and evolve over years. We saw PHP files with huge classes and deeply nested logic, likely the result of long-lived projects accreting features. C code, often used in embedded or system software, also exhibited high complexity as expected (critical systems tend to have a lot of conditional handling). The complexity insight here is that complex code is everywhere, and it’s something to keep an eye on regardless of language.
Why does this matter? Because complex code is harder to maintain and more bug-prone. It’s not just us saying that – studies have shown that code complexity correlates with the presence of defects (arxiv.org). When a single function does too much or has umpteen branches, it’s easy for a developer (or tester) to miss a scenario. As an article on code metrics noted, high-complexity areas of code are “difficult to understand and maintain, increasing the risk of bugs and errors.” (blueoptima.com). In our billion-line scan, we’ve flagged thousands of files as having too high complexity, meaning they likely should be refactored into smaller, simpler pieces. Each of those flags is an opportunity for a team to improve the code’s structure – making it easier to debug and less risky to change in the future.
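For the curious, cyclomatic complexity is simple to approximate. Here’s a rough Python sketch in the spirit of McCabe’s metric: start every function at 1 and add one for each branch point. Real analyzers cover more node types and many languages; this is just to show the idea.

```python
import ast

# Node types that add a decision path (an assumed, simplified subset).
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler,
                ast.BoolOp, ast.IfExp, ast.comprehension)

def complexity(func_source: str) -> int:
    tree = ast.parse(func_source)
    score = 1  # one path through straight-line code
    for node in ast.walk(tree):
        score += isinstance(node, BRANCH_NODES)
    return score

print(complexity("def f(x):\n    return x"))  # 1: no branches
print(complexity(
    "def g(x):\n"
    "    if x > 0:\n"
    "        for i in range(x):\n"
    "            if i % 2 and i % 3:\n"
    "                print(i)\n"
))  # noticeably higher: nested branching adds up fast
```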
We also generated code complexity insights that compare complexity across modules and repositories. This can help teams identify the “hot spots” in their code where the tangles are worst. By untangling those, developers can reduce cognitive load (no more 500-line functions, please!) and thereby reduce bugs. Our finding that JavaScript’s complexity metric was off the charts (due to minification) is also instructive: when doing code analysis, it’s important to separate true complexity from artificial complexity. One practical tip is to avoid analyzing minified or auto-generated code with the same rules as source code, or at least interpret those results differently.
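A practical way to do that separation is a cheap “is this minified?” pre-check before applying complexity rules. Here’s one possible heuristic in Python – the thresholds are assumptions, not gospel, but minified bundles tend toward very long lines and almost no whitespace.

```python
def looks_minified(source: str) -> bool:
    lines = source.splitlines() or [""]
    avg_len = sum(len(line) for line in lines) / len(lines)
    ws_ratio = sum(c.isspace() for c in source) / max(len(source), 1)
    # Assumed thresholds: tune against your own corpus.
    return avg_len > 500 or (avg_len > 200 and ws_ratio < 0.05)

readable = "function add(a, b) {\n  return a + b;\n}\n"
smooshed = "function add(a,b){return a+b}" * 40  # one giant line

print(looks_minified(readable))  # False
print(looks_minified(smooshed))  # True
```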
In summary, code complexity is not an abstract metric – it has real impacts. Our journey through a billion lines reinforced that managing complexity is an ongoing challenge. It’s a reminder to all developers: keep functions and classes focused, break down big logic, and refactor when things get unruly. Your future self (and your teammates) will thank you, and you’ll likely have fewer bugs as a result.
Language Trends: PHP Most Common, with JavaScript a Close Second
We also took a high-level look at the programming languages present in those billion lines. The results might surprise some folks: PHP was the most common language in our analysis, with JavaScript a close second. That’s right – the top two spots went not to Python or Java, but to JavaScript and, above it, the language that powers a huge portion of the web’s back-end: PHP.
Why is this noteworthy? Well, for one thing, it reminds us that PHP is alive and well in the coding world. In an age where we hear a lot about Python, Go, Rust, etc., it turns out a ton of code out there is still written in PHP. And this aligns with broader industry data: PHP is used by roughly 75–77% of all websites on the internet (server-side) (techjury.net). Think about it – content management systems (like WordPress, which is PHP-based), many web apps, and APIs are in PHP, and those contribute a huge number of lines. So our finding reflects that global usage – PHP has been around for decades and is entrenched in many systems.
JavaScript coming in second makes sense too. Almost every web project has some JS, and with the rise of Node.js, JavaScript is both a front-end and back-end staple. The lines of JavaScript we scanned ranged from front-end code, to Node services, to embedded scripting in other applications. Given JavaScript’s ubiquity (virtually every web page uses it), seeing it near the top was expected. (In fact, one could argue every modern application, even if primarily written in another language, includes some JavaScript, if only for the web UI.)
What about other languages? We did of course see plenty of Python, Java, C#, C++, Ruby, Go, etc. – you name it, it was probably in that 1 billion lines somewhere. But none of those single languages outnumbered PHP or JS in raw line count in our dataset. This could be because many enterprise codebases (and open-source projects) we scanned have a mix of languages, but web technologies are the common thread among them. It’s also a hint that a lot of the code being analyzed might be from web applications and platforms where PHP and JavaScript dominate.
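For a sense of how such a tally works under the hood, here’s a minimal Python sketch that counts lines per language by file extension. The extension map is a tiny assumed subset; real analyzers use much larger detection tables (and smarter heuristics than extensions alone).

```python
from collections import Counter
from pathlib import Path

# Assumed, deliberately small extension-to-language map.
EXT_TO_LANG = {".php": "PHP", ".js": "JavaScript", ".py": "Python",
               ".java": "Java", ".cs": "C#", ".go": "Go"}

def tally(repo_root: str) -> Counter:
    counts = Counter()
    for path in Path(repo_root).rglob("*"):
        lang = EXT_TO_LANG.get(path.suffix.lower())
        if lang and path.is_file():
            try:
                counts[lang] += sum(1 for _ in path.open(errors="ignore"))
            except OSError:
                pass  # unreadable file: skip it
    return counts

# Example: print(tally(".").most_common())
```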
Implications: If you’re a business or dev team, this trend underlines the importance of not neglecting older or “less glamorous” languages. For example, there’s often talk of whether PHP is “legacy” – but clearly, it’s still a workhorse for many organizations. Ensuring you have proper tools, security scans, and quality checks for PHP code is crucial (as much as for trendy newer languages). The same goes for JavaScript – which often flies under the radar in security reviews (people focus on server-side and forget their client-side code can be vulnerable too). Our platform’s ability to handle all these languages means we could catch issues across the board, whether it was a PHP SQL injection flaw or a JavaScript unsafe DOM manipulation.
On a lighthearted note, seeing PHP at #1 was a bit like discovering an old friend is still the life of the party. PHP has been around since the 90s, and yet here it is, leading the pack. As the saying (almost) goes: “PHP is dead, long live PHP!” 🎉
Files of All Kinds: 3,400+ File Types (Yes, JSON Included)
It wasn’t just source code files like `.java` or `.php` that we analyzed. Over the course of a billion lines, our platform encountered more than 3,400 different file types or extensions! We’re talking configuration files, data files, scripts, markup, you name it. This was a bit of a shock even to us – we expected a variety, but seeing thousands of distinct file formats reinforced just how diverse software projects can be.
Some of the surprising file types included things like JSON files, which aren’t source code in the traditional sense but often contain configuration or test data. We found huge JSON files in some codebases that define settings or dataset fixtures – and yes, our platform scanned those too (for example, checking if JSON files contained any suspicious URLs or keys, or if they were perhaps misformatted). We also saw XML, YAML, Markdown documentation files, shell scripts, batch files, makefiles, `.properties` files, Excel files, images, you name it. In modern repositories, it’s not just code – it’s a whole ecosystem of files that support the application.
One might ask, why analyze all those file types? Because issues and insights can hide in non-code files as well. For example, an `.env` file (which we counted among those 3,400 types) might contain secrets that shouldn’t be committed. A JSON or YAML config might reference an insecure URL or an outdated version number. Part of reaching one billion “lines” included scanning these ancillary files for patterns of interest. It paints a more complete picture of a software project’s health.
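As an illustration, here’s a minimal Python sketch of the kind of pattern scan that can run over non-code files. The regexes and file contents are simplified stand-ins; production scanners use far richer rule sets.

```python
import json
import re

INSECURE_URL = re.compile(r"\bhttp://[^\s\"']+")
SECRET_HINT = re.compile(r"(secret|token|password|api_key)", re.IGNORECASE)

def scan_text(name, text):
    for match in INSECURE_URL.finditer(text):
        print(f"{name}: insecure URL -> {match.group()}")
    for lineno, line in enumerate(text.splitlines(), start=1):
        if SECRET_HINT.search(line):
            print(f"{name}:{lineno}: possible credential -> {line.strip()}")

# Hypothetical file contents for the demo.
env_file = "API_KEY=sk_live_123abc\nCDN_URL=http://cdn.example.com/app.js\n"
config = json.dumps({"endpoint": "http://api.example.com", "debug": True})

scan_text(".env", env_file)
scan_text("config.json", config)
```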
We also noted the total size of files was enormous. In terms of sheer data, we processed tens of billions of characters of text. The total bytes of all files ran into the tens of gigabytes – truly a big data problem for code. The fact that our SaaS platform handled it is a proud point for us (shameless plug: scalability was a design goal!). It’s also a testament to why automated tools are needed – no human could read that much in a lifetime, let alone keep track of issues across them.
A fun perspective: we’ve likely scanned every file extension under the sun. From `.ABAP` to `.ZIG`, if it was in the codebase, it got looked at. (Yes, even a few COBOL copybooks sneaked in – legacy financial systems, perhaps?) The breadth of file types highlights that software projects are multifaceted. It’s not just one language or one kind of file. So any robust analysis solution needs to cope with that variety – something we’ve strived to do.
For readers, the key point is that our insights and stats aren’t narrowly focused on one language or file type. This was a comprehensive sweep of real-world projects, with all their weird and wonderful files included. So when we say “a billion lines of code,” it truly encompasses everything that lives in a code repo – not only the .cpp and .py files, but also the JSON configs, the documentation, the build scripts, etc.
(And yes, if you were wondering, we did exclude binary files from the “lines” count – we weren’t about to count every pixel in an image as a line of code!) 😉
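For the technically curious, the usual way to make that text-versus-binary call is the null-byte probe (similar in spirit to how Git detects binary files): text files essentially never contain NUL bytes, while images and executables almost always do. A quick Python sketch – an illustration, not necessarily our exact pipeline:

```python
def is_binary(path: str, probe_size: int = 8192) -> bool:
    # Peek at the first few KB; a NUL byte almost always means binary.
    with open(path, "rb") as f:
        return b"\x00" in f.read(probe_size)

def count_lines(path: str) -> int:
    if is_binary(path):
        return 0  # pixels are not lines of code
    with open(path, "rb") as f:
        return sum(1 for _ in f)
```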
Why These Insights Matter: Implications for Developers and Businesses
After partying over the big numbers, it’s time to ask: so what? What do these findings mean for developers, teams, and businesses at large? Here are a few key implications and trends revealed by our analysis:
- Nearly every codebase has room for improvement. Over a million code quality issues and 165k vulnerabilities might sound discouraging, but it’s actually empowering knowledge. It tells us that it’s normal for software to have imperfections – and that with the right tools, we can systematically find and fix them. For a developer, knowing that things like outdated dependencies or code duplication are common across the industry can be reassuring. The important part is taking action. Businesses should foster a culture of continuous improvement: use automated scans, then regularly dedicate time to address the issues identified (before they pile up to scary numbers).
- Security can’t be ignored – and it’s interconnected with code quality. The high volume of vulnerabilities, especially high-severity ones, is a wake-up call. Secure coding practices and regular security scanning must be integral to the dev process. One trend is clear: many vulnerabilities come from known issues (e.g., a library that has a CVE, or a misconfiguration that’s well-understood). These are low-hanging fruit to fix if you know about them. And many of those tie back to code quality habits – e.g., keeping dependencies updated, avoiding overly complex code that’s hard to audit, etc. Businesses that prioritize code quality are in effect also improving security. Conversely, letting code quality slip (lots of duplicate, legacy, unmaintained code) often means security holes slip in too. The takeaway: invest in code quality and you inherently bolster security. It’s all part of maintaining healthy software.
- Managing open source at scale is a business necessity. With tens of thousands of open-source components in play, companies need a solid strategy for open source management. That includes inventorying dependencies (perhaps via automated SBOM generation – a minimal sketch follows this list), tracking updates and vulnerability feeds for those components, and ensuring license compliance. The data shows open source is ubiquitous – nobody can stick their head in the sand and pretend otherwise. If ~96% of codebases are open source (intel.com) and the average app has ~900 libraries (itpro.com), you either get a handle on it or risk security breaches and license lawsuits. We’re also seeing trends like regulatory pushes for software supply chain transparency, meaning businesses might soon be required to disclose and manage these dependencies formally. Getting ahead of that curve is wise.
- Technical debt is real, but quantifiable. Those outdated components, the complex functions, the million minor issues – they’re all forms of technical debt. What our analysis demonstrates is that you can measure technical debt in various ways (number of outdated libraries, complexity score, etc.). That’s powerful for businesses to prioritize refactoring and upgrades. Instead of going by gut feeling, they can say “Look, we have 200 outdated dependencies, including 5 critical ones – let’s allocate a sprint to upgrade them.” Or “Module X has twice the complexity of Module Y and five times the bug density; maybe we need to re-engineer it.” In short, these insights give actionable metrics. The trend here is toward data-driven engineering decisions. Forward-looking organizations are using such data to drive down risk proactively rather than reacting to fire drills when something breaks.
- Legacy code and languages stick around. PHP topping our language count and the myriad of file types tells a story: old technologies never truly die; they become legacy that we must maintain. Businesses often have a mix of legacy and modern in their codebases. Understanding that and investing in tools that can handle both is key. It’s tempting to focus only on new projects with the latest tech, but the reality (as our billion-line snapshot shows) is that a lot of value sits in older code that still runs critical systems. Those need love and attention too – whether it’s security patches or refactoring or just monitoring.
- Automation (with a human touch) is the way forward. One simply could not achieve what we did (analyzing a billion lines, producing millions of findings) manually. The scale of software today demands automated analysis. Static analysis tools, AI-driven code review, dependency checkers, etc., are becoming essential parts of the developer toolkit. The trend is that AI and automation handle the grunt work of scanning and initial triage, and developers then use those results to guide their work. In our case, 23k AI-generated insights augmented the developers’ own reviews – that’s like turbo-charging the dev team with extra brainpower. Businesses that adopt these tools will have a huge edge in code quality and security because they can catch issues early and continuously. The key, however, is to integrate these tools in a developer-friendly way – making the insights accessible and actionable, not overwhelming. The feedback from our users has been that having these scans run in the background and then surfacing the most relevant issues (with explanations) is a productivity boon, not a burden.
- Trends in complexity and duplication hint at the need for refactoring cycles. It was very telling how much duplicate code and high-complexity code we found. This indicates that many teams likely build features fast (copy-paste to meet a deadline, write a quick fix with a dozen nested ifs), and over time the codebase accumulates bloat and complexity. This is normal – but it reinforces the importance of scheduling refactoring and clean-up in your development process. The organizations that succeed long-term are those that periodically pay off this debt. Whether it’s through dedicated “refactoring sprints” or including refactoring tasks alongside new features, the data shows it’s necessary. Otherwise, you end up with a big ball of mud that slows down progress and increases bug counts. The trend we see is more awareness of this – thanks to tools highlighting “hot spots,” teams can no longer ignore the tangled mess in that one corner of the codebase; it’s quantified and visible.
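Here’s the minimal inventory sketch promised above: a few lines of Python listing each installed package in an environment with its version and declared license. Real SBOMs use standard formats like SPDX or CycloneDX; the point here is how little code basic visibility takes.

```python
from importlib.metadata import distributions

inventory = []
for dist in distributions():
    meta = dist.metadata
    inventory.append({
        "name": meta["Name"],
        "version": dist.version,
        "license": meta.get("License", "UNKNOWN"),
    })

for item in sorted(inventory, key=lambda d: str(d["name"]).lower()):
    print(f'{item["name"]}=={item["version"]}  [{item["license"]}]')
```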
In summary, these billion-line insights matter because they turn anecdotal suspicions (“I bet we have a lot of duplicate code” or “I feel like our dependencies are out of date”) into concrete facts and figures that can drive improvement. For developers, it’s almost comforting to see that everyone struggles with similar issues – and that there are solutions. For businesses, it’s a call to action to invest in code quality, security, and maintainability as first-class priorities, not afterthoughts.
Try It on Your Code – Join Us for the Next Billion Lines!
Reaching one billion analyzed lines is a big milestone for us, but we’re not stopping here. In fact, we’re excited about what the next billion lines will teach us (we have a feeling trends like the rise of AI-generated code, even more open-source usage, and new languages will start showing up – stay tuned!).
More importantly, we’re excited to help you gain these kinds of insights on your own code. The findings we shared are anonymized and aggregated, but every single data point came from someone’s real project – and that someone benefitted from uncovering those issues and patterns. If you’re curious what hidden issues or opportunities for improvement lie in your codebase, why not give our platform a try?
Ready to level up your code? We invite you to start a free trial of our platform. In just a few minutes, you can onboard your repositories and start receiving detailed reports on security vulnerabilities, code quality hotspots, dependency health, and much more – all powered by the same engine that scanned a billion lines (it might not be a billion lines in your case, but we’ll treat it with the same thoroughness!).
Our platform provides an easy-to-digest dashboard, the ability to dig deep into the underlying data, and of course those handy AI-generated insights to guide you. It’s like having a team of expert reviewers and security analysts work through your code and hand you the findings.
Get in touch with us if you’d like a personalized walkthrough or have questions about specific needs – our team is always here to help make sense of the data and plan next steps. Whether you’re a startup with a few thousand lines or an enterprise with millions, actionable insights await you in your code.
References: (Supporting studies and sources for the statistics and claims in this post)
- Synopsys OSSRA Report 2024 – Nearly three-quarters of codebases contained high-risk vulnerabilities (investor.synopsys.com) and 91% contained components 10+ versions out-of-date (investor.synopsys.com). Open source software made up 77% of code in audited codebases (intel.com). Over half had license conflicts (itpro.com), and one-third had components with no or custom licenses (itpro.com).
- IT Pro (Feb 2025) – Black Duck audit findings: 86% of codebases contained open source vulnerabilities (itpro.com); the average application has 911 dependencies (itpro.com); 91% of codebases had outdated OSS components (90% were 10+ versions behind) (itpro.com).
- iFlock Security Blog (Jan 2025) – Outdated dependencies are extremely common and often the weakest link in security if left unpatched (iflockconsulting.com). Teams often delay updates out of fear of breaking changes or lack of visibility (iflockconsulting.com).
- BlueOptima Blog (Apr 2023) – High cyclomatic complexity in code “may be difficult to understand and maintain, increasing the risk of bugs and errors” (blueoptima.com). In other words, more complex code can hide more bugs. Research confirms code complexity correlates with defect rates (arxiv.org).
- Tech Jury (Jan 2025) – PHP remains the most used server-side language, powering 77% of active websites (techjury.net). This aligns with our finding of PHP’s prevalence in codebases.
- W3Techs Survey – PHP is used by about 75% of all websites for which the server-side language is known (w3techs.com), reflecting its enduring dominance in web development.