The Swyx Mixtape | [Tech] The Origin of MongoDB

[Tech] The Origin of MongoDB - Dwight Merriman

January 18, 2023 / 18:45 / E502

Dwight Merriman is CSO and Cofounder of MongoDB

https://podcasts.mongodb.com/public/115/The-MongoDB-Podcast-b02cf624/f96bd55f

Transcript

Michael Lynn: Welcome to the show. My name is Michael Lynn and this is the MongoDB Podcast. Thanks for joining us. Today on the show, Lena Smart, Chief Security Officer of MongoDB, and I team up to interview Dwight Merriman, co-founder and key contributor to MongoDB. Dwight Merriman is a true tech legend. In addition to co-founding and co-creating the MongoDB database and 10gen, the company now called MongoDB, he also co-founded and led several other well-known, successful companies, including Business Insider, DoubleClick and Gilt Groupe. In today's interview, Dwight shares openly and honestly the motivations behind creating the database, which now claims nearly half of the entire NoSQL market. He talks about the decision to build the database rather than use something that existed at the time. Dwight is friendly, easy to talk to, knowledgeable, and probably one of the smartest individuals I've had the pleasure of chatting with. Without further ado, let's get to the interview. If you enjoy the content, please consider visiting Apple Podcasts or Spotify. Leave a rating and a comment if you're able, and let us know what you think. Stay tuned. Hey, did you know that MongoDB University has been completely redesigned? That's right. Hands-on labs, quizzes, study guides and materials, bite-sized video lectures, programming-language-specific courses. You can learn MongoDB in the programming language of your choice: Node.js, Python, C#, Java, and so many more. You can earn that MongoDB certification by validating your skills, and level up your career. Visit learn.mongodb.com today.

Lena Smart: So it is my absolute pleasure, and I'm so glad that you could make it in person today, to introduce Dwight Merriman. He was the first CEO of MongoDB, and I understand you were still coding at the time. You're also co-founder and, as of today, a director of MongoDB. Are you still coding?

Dwight Merriman: I'm still coding, or tinkering a bit, myself, but not on the database anymore. To really dive in and work on it, there's a certain minimum number of hours a week you have to put in just to keep up with the code base and the state of everything, because it's not a small program anymore.

Lena Smart: Amazing. And also in the room we have Mike Lynn, who's our developer advocate, and I know that you'll likely have some questions.

Michael Lynn: Yeah, for sure.

Lena Smart: And just fire ahead, because this will probably be the most interesting person I'll speak to in a [inaudible] too.

Michael Lynn: Well I'm fascinated already and I've got so many questions for Dwight, but I'm going to let you go ahead and ask away.

Lena Smart: Cool. So the first question I have, and this has been a burning question of mine since I joined three and a half years ago, is how did you start the company? How did you start MongoDB?

Dwight Merriman: Right. So when we started, the name of the company was actually 10gen, and this was around 2008, or maybe a couple of months before that; I can't remember the exact date. What we were really looking at, at the time: myself and our other co-founders, like Eliot and Kevin, had been working on various entrepreneurial projects, and we were seeing the same pattern over and over. New product idea, you start building the system. At this point I'd been doing that for quite a long time, so I knew what the best practices were. But around that timeframe, January 2008 or whenever it was, it always seemed a bit awkward. It was awkward and unaesthetic, and it seemed like there was a lot of duct tape and rubber bands, even though those were the best practices. You would talk to CTOs at the time, and they would say things like, "Putting memcached in front of databases is okay, and rolling your own sharding in front of MySQL or Postgres is okay." But it isn't. It was just because there wasn't a better way at the time. And that was really when cloud computing, EC2, was taking off. So it was very clear to us that cloud computing was the future, and a lot of the traditional products weren't very cloud-friendly. If you have a database that scales vertically, I can make it bigger, but then it's a mainframe, or a Sun 6500 or something like that, and that's the opposite of a cloud principle, which is horizontal scalability and elasticity. And if you tried to do it the other way, horizontally, you were usually rolling your own when it came to operational databases. On top of that, agile, iterative development was the way to go then. But a lot of the old tools, and this isn't just databases but languages, everything, weren't really designed for that, because they were invented earlier. So it's not their fault.
So we were just saying, "Gee, there's got to be a better way to develop applications," and this is both on how to develop them, how to code them, and also on how to scale them and how to run them in the cloud painlessly. So our first concept was that we were going to do platform as a service. We were going to try to do a fresh take on the developer stack, versus LAMP and whatever else was common then, and see what we could come up with. So we started building a platform-as-a-service system. It was open source, and this was very early. I think when we went to beta, it was almost exactly the same time as Google's, was it Google App Engine?

Lena Smart: Yeah.

Dwight Merriman: It came out to beta at the same time. So our timing was, it was like when they came out with it, and I was like, "Oh, okay, somebody there is thinking similar thoughts." And so that was fine. But a few months later, as we got a little further into it, I was thinking about it, and I was looking at things like AWS, where they have all these microservices. And they're like, "I'm not going to give you a full cloud platform. I'm going to give you some building blocks for your toolbox, and over time I'll give you more." Because the scope is large. Today they have a lot of services, but we're 15 years later-ish. If I give you a platform, though, to really give you everything you need, it's a big scope, and it's going to take quite a while to build it. So I think platform as a service makes sense, but as we got further into it, and we had something working analogous to Google App Engine, or, I guess, Heroku was around back then, it just felt like, "Boy, to get this to true maturity, there are so many pieces that you would want in it. It's going to take a long time, a decade or something." And for a startup you only have so much runway. Even today, platform as a service, I think, is a valid notion and concept, but it's certainly not mature yet. The more AWS-style or microservices-style approach, which you could do on all the big cloud platforms today, and I say AWS just because I'm contrasting it with the PaaS vendors back in the day, is still the dominant approach. So we'd been building this, and what were we really building? We were trying to build something where you'd write some code, put it in [inaudible], and then just click Deploy. It would deploy your app into our system in the cloud and try to handle scaling for you, including things like the app server layer, the app tier: how many app servers there should be, and load balancing for that. All of this just happens automatically.
You don't have to think about it at all. So it's really trying to eliminate a lot of the operational overhead. It just gives you a platform. It's like, "Here's my app, I've written all the code, deploy it." And it just happens, and you don't think about machines at all. So that was the aspiration. Obviously, in what we built there's a little bit about machines, if we look at MongoDB today and sharding and things like that. We do have things like Serverless, but we also have things like sharding where, as the person developing a system, you decide how many shards you have, and you can change it, but it's not completely opaque in that sense. And likewise, in your replica sets, you have control over how many copies of things there are. But conceptually, that was the path. We were looking at completely elastic, serverless too. But as we looked at it, we were also thinking about what we would want if we were building a new app or system. There were certain features I wanted from the data layer, and if you really went to something that was just 100% [inaudible], infinitely scalable and so forth, you were getting into things more like the early Amazon Dynamo stuff, which, at least back then, was more of a key-value store, or key-document store, if you will. You didn't have the rich database functionality. We didn't want to throw out tons and tons of data layer functionality. So our approach had some traditional elements to it, but then we tried to innovate on those. Yes, it's sharded, but it's auto-sharded: it'll do it for you, you don't have to write it yourself. And the replication is still replication, but it's a lot more sophisticated than the traditional primary-secondary model, and push-button for a lot of these things. So we'd been building this platform, we had the app layer and the data layer, and then it was just like, "Gee, this is such a large scope for a startup."
We didn't have many people at the time, and it was like, maybe we should just do one or the other: the app layer of the platform, or the data layer. If we look back at Heroku, their data layer was Postgres, right? That's how they reduced the scope. And in the end we decided to focus on the data layer, because we were in beta with the platform.

Michael Lynn: What was the platform called by the way?

Dwight Merriman: 10gen.

Michael Lynn: 10gen? Okay.

Dwight Merriman: And then we called the data layer MongoDB. Since it was sort of a module or a component, we didn't mind using a slightly cheeky name, because it wasn't the name of the whole product at the time. The background on the name is that "Mongo" is the middle of the word "humongous," and half of the point was the horizontal scalability, or easy scalability, of the product. The other half was developer productivity and agility. That's where the name came from. So it was the name of the subsystem. And then it was like, "Okay, that's all we're going to do now, instead of the whole platform." So there was a pivot, if you will, which we made very early. Things were going fine, and we were getting very good feedback on the beta of the platform. But I was thinking ahead about how this plays out, and it was like, "This is a lot to do," and there was also the rate of adoption of that model. Then, thinking about whether to do the app layer or the data layer to cut the scope: we were getting really good feedback on the data layer of the platform from the beta testers. They were like, "Hey, I really like this." So that helped us feel like, "Okay, maybe let's just take the data layer, unbundle it from this platform-as-a-service thing, and just make it a database, an open source database you could run anywhere." So we pulled it out of the code base so it was its own thing. And then it was like, "Well, I guess we need to write some drivers." So we spent a month or two writing drivers, and then we released version 0.9. From then on, MongoDB was all we were working on, and that was the company.

Michael Lynn: What drove the decision to go open source?

Lena Smart: Mm-hmm. That was going to be my question. Thank you.

Michael Lynn: Sorry.

Dwight Merriman: It seemed pretty clear to us that the traditional enterprise model was changing. Obviously there were a lot of things that were open source at the time, a lot of things that were SaaS, and some things that were freemium; those three seemed to be the options people were using for new stuff. They weren't classic enterprise software. They were maybe free. For example, and I hope I don't get this wrong, I think Splunk was free for a small amount of data, and then it turned into more enterprise software. Then of course you had things that were SaaS, or maybe you'd call it infrastructure as a service, where you pay for what you use, and then there was just the open source stuff. So we felt like, "Okay, we are a startup; how do we get awareness, branding, adoption?" There are very big companies, some of the biggest companies in the world, that have databases, and how do we compete with them? How do we compete with Oracle, how do we compete with Amazon? Things like this. And it seemed like open source was the asymmetry that lets you compete with them. At the same time, it was clear that things were moving into the cloud. So when we were thinking about open source licenses, obviously you could go all the way down to the BSD license, which is just free, and that's great, especially for a community project. But we had investors and things like that, so we needed a way to have revenue eventually, and we wanted a license that was more like a copyleft, like GPL. But with everything moving into the cloud, the traditional GPL copyleft doesn't really work. This was clear enough to us even in 2008. So we made the license AGPL. I think it was one of the first projects that was AGPL, and it seemed like the right way to go at the time. I was CEO then, so I was pretty involved in the decision.
So it seemed like, "Well, if it's a problem, we can always just dual license it with another license that's more flexible." You can't go from a very-

Michael Lynn: Permissive?

Dwight Merriman: Yeah, from a permissive license to a less permissive license. But you can go the other way, because you could still keep the other license available if you liked it and didn't even want to go read the new one, but then dual license and have something more permissive. So I thought, we can always go more permissive; we can't really go less permissive. Then three years ago, we actually switched the license from AGPL to a new license called SSPL, the Server Side Public License, which is super similar to AGPL; if you did a [inaudible] on it, only a couple of sentences are different, I think. This was a big decision we didn't take lightly, and obviously all the old releases are still available under AGPL. So it was just on a forward basis: "Let's use this SSPL thing we came up with." It basically says that if what you're building is purely a database, like a general-purpose database, then you're subject to the copyleft. This came out of some analysis of AGPL, where it was not totally clear that it did what the original intent was, that it totally worked legally. So we thought we needed to do that. That did push the product and the license into a slightly gray area, because there's a classic definition of open source software, which is that there are no restrictions on how you can use it. With GPL, you trigger the copyleft by distribution, not by how you're using it in your application. With this, well, it sort of triggers on how you use it. So if you're doing something like Amazon RDS with the MongoDB source code, it would trigger.

Michael Lynn: So it's offering it, offering your software as a service?

Dwight Merriman: Yeah. Basically Mongo as a service. If you offer that, you can do it with SSPL, but then you trigger the copyleft, and you have to release your code, just like you did with GPL. So you could still do something like a [inaudible] version of Mongo as a service if you wanted. So it was really a response to things where the cloud providers, not just Amazon, I'm not trying to pick on them, but with RDS, they're just taking every open source database and making a nice wrapped management layer on it. But then it's like, we don't have any direct customers anymore, and they wouldn't be paying us, I think. So that was the notion. So it gets gray then, and a purist might say, "Well, that's not open source." But I think in practice it's completely practical. If you're building applications, you can definitely use it for free and without any encumbrances. I think the whole notion of how we define open source, and the licenses [inaudible], and the definition thereof, is in a transitional stage right now, where it needs to be iterated on. Because I love open source, but given these cloud models, if you wanted anything that had a copyleft, the old licenses just don't work anymore. Since we did that, we've seen many other projects do similar things. And from some of the standards bodies, I predict we're going to see some new things that are in the spirit of that. But those were definitely not available when we thought we needed them, because we talked to them, and the speed of movement wasn't working for us. So I think in practice, basically nothing changes. You're making an app, you want to use MongoDB, you know you can use it for free. Your code is your code; you don't have to release it or anything. You haven't triggered a copyleft there. In practice, I think it works great. But if you're an open source specialist or theorist, and you write licenses and stuff, you might quibble.

Lena Smart: That was fascinating.
