
Understanding open source databases

An open source database is just a regular database that’s distributed with its source code.

Users can read, revise, and extend the software freely, although few take advantage of these opportunities. The most attractive feature for many may be the right to run it anywhere, on any hardware, at any time. The source code is a common resource for all programmers to use as they see fit.

It’s not that there’s anything different about the architecture, the language, or the feature set that defines open source databases. Indeed, many of the open source options speak a version of SQL just like their proprietary cousins.

The license has always been attractive to managers who have gone through relicensing negotiations with proprietary software vendors. If the source code is not shared, their only option for fighting a big price increase is shifting to another product, which often requires plenty of rewriting.

But there’s no free lunch. The gift of the source code can come with several catches, some explicit and some implicit. There are a number of open source licenses. Some place hardly any restrictions on the user, and some insist that a user share any enhancements in return, essentially ensuring that the common code remains open for all.

Another obligation that’s not explicitly stated but is painfully obvious to users is that someone must pay the developers. Some companies that use open source databases hire people to contribute to the code base. Instead of buying a proprietary license, they pay through salaries. The companies that choose this path tend to praise the control over the code base that they gain.

Many open source databases are released under a hybrid model. Some companies create two versions: the simpler, more general code may be called something like "community edition" and released to circulate freely. Developers exploring the technology and building new prototypes can download it at no cost.

The bills for the corporation that supports the development are usually paid by customers that grow to adopt the "commercial version," which typically offers extra features for working with larger data sets or providing better security. These are often features that new developers don't need but that offer long-term value to groups running production code.

Setting up the multiple versions and ensuring that the right features are in each is a bit of an art. Keep too many features in the commercial version, and no one will experiment. Leave too many in the community version, and no one will feel the need to upgrade and pay.

Open source databases fall into a wide range of categories, and these are largely defined by their era of development. The earliest tools, like MySQL or PostgreSQL, emulated the commercial leaders. They spoke SQL and stored data in relational tables that could be indexed for fast lookups and combined with JOIN operations. Sometimes they didn't offer the same complete selection of features, but they gradually evolved to support the same style of data storage.
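To make the relational model concrete, here is a minimal sketch of two tables, an index, and a JOIN. It uses SQLite only because it ships with Python's standard library; the same SQL should run on MySQL or PostgreSQL with little or no change. The table names, columns, and data are hypothetical.

```python
import sqlite3

# In-memory database; the schema below is a hypothetical example.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Two relational tables, each with a fixed schema.
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
cur.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customers(id),"
    " total REAL NOT NULL)"
)
# An index speeds up finding the orders that belong to one customer.
cur.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")

cur.executemany("INSERT INTO customers (id, name) VALUES (?, ?)",
                [(1, "Ada"), (2, "Grace")])
cur.executemany("INSERT INTO orders (customer_id, total) VALUES (?, ?)",
                [(1, 19.5), (1, 5.0), (2, 42.0)])
conn.commit()

# The JOIN links the two tables through the customer_id column,
# then totals each customer's orders.
rows = cur.execute(
    "SELECT c.name, SUM(o.total) AS spent "
    "FROM customers AS c JOIN orders AS o ON o.customer_id = c.id "
    "GROUP BY c.name ORDER BY c.name"
).fetchall()

for name, spent in rows:
    print(name, spent)
```

Each fact lives in exactly one table, and the query reassembles related rows at read time. That is the style of storage the early open source databases copied from their commercial rivals.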

Later NoSQL databases, like MongoDB and Cassandra, are known for their flexible schemas and for storing data as documents or key-value pairs instead of in fixed relational tables. This class of database largely evolved as open source.
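The idea of a flexible schema can be sketched in plain Python. This is not MongoDB's or Cassandra's API; the `collection` list and `find` helper are hypothetical stand-ins that show documents in one collection carrying different fields, with no migration needed to add a new one.

```python
import json

# A hypothetical "collection": each document is a JSON-style object, and
# documents in the same collection need not share the same fields.
collection = [
    {"_id": 1, "name": "Ada", "email": "ada@example.com"},
    {"_id": 2, "name": "Grace", "tags": ["navy", "cobol"]},
    {"_id": 3, "name": "Linus", "address": {"city": "Portland"}},
]

def find(docs, **criteria):
    """Return the documents whose top-level fields equal every criterion.
    A document missing a field is compared as if that field were None."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]

# Adding a document with a brand-new field requires no schema change.
collection.append({"_id": 4, "name": "Barbara", "email": "b@example.com", "awards": 1})

with_email = find(collection, email="b@example.com")
print(json.dumps(with_email))
```

A real document database adds indexes, persistence, and a query language on top, but the trade-off is the same: the application, not the schema, decides which fields a record has.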

Some of the newest databases, like the ones that support ledgers or geographic data, often come as pairs of closely evolving products. One is a fully functional community edition that is freely available. The other is often called an "enterprise edition" because it adds features for larger data sets and for deployments that need greater stability and reliability. These extra features are usually available only for a price.

How the major vendors have embraced open source

Oracle purchased MySQL in the process of acquiring Sun Microsystems in 2009, effectively recognizing the power of the open source model. It continues to both develop and support the database. Users can choose either the free community edition or more advanced editions that include extra features desirable for larger companies. Backups, extra security, and cluster management are available for a fee.

Oracle also purchased Berkeley DB, a set of key-value database libraries that are often compiled directly into programs. They let developers offload the job of maintaining data structures.
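An embedded key-value library runs inside the application's own process instead of as a separate server. The sketch below uses Python's standard-library `dbm` module as a stand-in for that style of library; it is not Berkeley DB's own API, and the file name and keys are hypothetical.

```python
import dbm
import os
import tempfile

# The store is an ordinary file managed by a library inside this process;
# there is no database server to install, start, or connect to.
path = os.path.join(tempfile.mkdtemp(), "settings")

with dbm.open(path, "c") as db:   # "c": create the file if it does not exist
    db["theme"] = "dark"           # str keys and values are stored as bytes
    db["font_size"] = "14"

# Reopen read-only to show that the data persisted to disk.
with dbm.open(path, "r") as db:
    theme = db["theme"].decode()
    keys = sorted(k.decode() for k in db.keys())

print(theme, keys)
```

The application works with something that looks like a dictionary, while the library handles the on-disk layout, which is the kind of bookkeeping the paragraph above describes developers offloading.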

Microsoft has chosen to host some of the major open source databases on its Azure cloud. Teams that want to rely on PostgreSQL or MySQL can start up instances managed by Microsoft, saving them the trouble of configuring and maintaining the server. The cost of the hardware and the management is bundled into one price.

Other clouds are following a similar path. Amazon, Google, DigitalOcean, Rackspace, and several others offer options for renting fully configured servers with running versions of the major open source databases. Amazon alone offers managed versions of most of the major open source databases.

The rise of these managed instances has rankled some developers. Many of the newer product announcements from cloud computing providers offer to install and maintain the open source packages. These aren’t insubstantial tasks, but the work can be automated. This has led to some friction between the developers of the tools and the cloud companies, and these conflicts are far from settled.

The upstarts

Many of the new databases begin as open source projects. There are dozens of new companies that have released new databases under a community open source license. Most are also trying to support themselves by selling some mixture of support and extra proprietary features.

Some of the projects build on earlier ones. MariaDB is a fork of MySQL started by Monty Widenius, one of the founders of MySQL. He began the new project after selling MySQL AB to Sun Microsystems, which Oracle later acquired. The early versions began with the original codebase, but newer versions have added features that speed up working with extremely large data sets. Because of their common heritage, many basic features and the core parts of the SQL syntax are identical, so many developers move freely between them. In the future, the differences will probably grow. MariaDB, for instance, has added support for storage engines and integrations like Cassandra, TokuDB, and SphinxSE.

SequoiaDB is a large, distributed database that supports SQL, key-value document storage, and direct JSON storage. The database links together a variety of nodes, and each node may be a different storage engine like MySQL or PostgreSQL. The database routes queries to the appropriate nodes while ensuring that transactions keep their ACID guarantees. The core is released under the AGPL, while some of the connectors are governed by the Apache license.

While many new databases are open source, not all companies are embracing the model. Fauna, for instance, chose a commercial license for its distributed database. The enterprise-friendly features target managers who must juggle data retention policies and scale quickly. New developers may not get access to the source code, but they can work with a free tier of the hosted service that places hard limits on the number of elements that can be read or written each month.

Governance issues

The control of the software is what attracts many users to open source databases. They’re willing to pay in time and salaries for what the proprietary software companies sell, often to avoid the pain that might come from vendor lock-in. Open source licenses explicitly make users full partners in control of the code.

The nature and limits of this partnership, though, continue to be questioned. Lately, several database companies have rebelled against the way that some cloud companies are bundling together the hardware and the maintenance. First MongoDB, and lately Elastic, have questioned whether this process is fair, in part because the cloud companies don’t directly share the revenues with the original company. The cloud companies aren’t violating the letter of the open source licenses, but some feel they’re violating the spirit by keeping the lion’s share of payments.

Finding a solution is not simple. Recently, Elastic's CEO, Shay Banon, announced that the company was shifting all new development to a more restrictive set of licenses, designed to stop large cloud providers from freely reselling its tools while not constraining end users.

“We’ve tried every avenue available including going through the courts,” Banon said in the announcement in January 2021, “but with AWS’s ongoing behavior, we have decided to change our license so that we can focus on building products and innovating rather than litigating.”

Amazon responded by announcing that it was going to "fork" Elastic's code. That is, it would take the last openly licensed version and continue to maintain it while reselling it in the cloud.

“Today, we offer 18 versions of Elasticsearch on Amazon ES, and none of these are affected by the license change,” Amazon’s Carl Meadows, Jules Graybill, Kyle Davis, and Mehul Shah wrote in the announcement. “In the future, Amazon ES will be powered by the new fork of Elasticsearch and Kibana. We will continue to deliver new features, fixes, and enhancements.”

There will be two paths that may or may not evolve in the same direction. The core features will probably remain the same, but users may need to align themselves with one or the other. Their code may work smoothly with both, or there may be incompatibilities. We can't yet know which decisions the development teams will make.

Forks like this have developed in the past. Oracle’s version of MySQL remains very similar to MariaDB, and it seems like both companies feel it’s important to maintain close compatibility, at least in the core features and syntax.

MongoDB was one of the first to adopt a more restrictive license, the Server Side Public License (SSPL), which restricts cloud providers that aren't its partners. It has released its product under this license since 2018, balancing the needs of the company and its users.

“We wanted to provide developers an easy way to access our product so they could use, modify, and redistribute it, all in a frictionless way. That is no different under SSPL,” MongoDB CEO and president Dev Ittycheria said in an interview.

But he also noted that the company has invested $700 million in R&D that its partners paid for. “We wanted to counter the threat of hyperscale cloud vendors taking our free product and offering it as a service without giving anything back,” he explained. “It’s been more than two years since we changed our licensing to SSPL, and it’s had no negative impact on user adoption or our success as a company.”

In other words, the company has gone out of its way to support the original vision of freedom to read, use, and modify the code, while curtailing the one class of companies that has not formed a successful commercial partnership with it. As Richard Stallman, founder of the free software movement, likes to put it: free software is free as in freedom, not as in beer.

This article is part of a series on enterprise database technology trends.


VentureBeat’s mission is to be a digital town square for technical decision-makers to gain knowledge about transformative technology and transact.
