[Tech] Separation of Control vs Data Planes - Steve Yegge

Yegge explains Service Meshes and Control vs Data Planes
Listen to Stevey's podcast: https://youtubetranscript.com/?v=Wi8SL-Tot-8&t=1212

Transcript


so let me tell you about service meshes
kind of like the terminology just to get
everybody up to speed because i know
some of you haven't looked at this space
or haven't looked at it recently
you're going to hear two terms control
plane and data plane bandied about a lot
and it's very confusing at first okay
because first of all they are sort of
poorly named and second of all there is
actually a fair amount of overlap
between the two in the in the service
offerings that we have today all right
and in the tech stacks that we have
available so let me walk you through
them all right
so starting at the uh at the service
level so you have a bunch of services
maybe they're on vms maybe they're in
kubernetes maybe they're in nomad or
fargate or whatever right but you've got
services vms or containers and you want
to have them communicate with each other
all right
well having rather than having them all
communicate with each other
which obviously means you're going to
have to build like service discovery
logic into the service itself
so if i have a player service
let's say i have a game server and it
wants to go call the player service and
say is this player real okay if so give
me their give me their information give
me their credentials okay typical
service to service uh you know function
call rpc
all right well you could have the game
server say well i'm going to call the
service registry service to see uh where
the player service lives and then i'll
make a call to the player service right
but now you're building that i'm going
to call the service registry service
which is this other service right that
you would have to build or whatever or
use etcd like grab did or whatever
and then it has to call and get the
address of the player service and then
and then it makes the call and it's like
you've built
routing logic
and discovery logic into your actual
application logic which you do not want
you do not want that okay
so
almost immediately people started moving
to proxies
you have a proxy that's your local proxy
they call it a sidecar proxy in
kubernetes land because it actually runs
in your little cluster as another
service along alongside all of your
other services
and it handles all
network uh ingress and egress for you
so you the idea is that your application
only knows about the sidecar proxy right
so to your application the proxy is the
outside world if if you you know it
knows about the service locations and it
also knows about circuit breakers and
traffic splitting and load balancing and
scaling and everything else that we'll
talk about in a bit
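
A minimal sketch of the contrast described above, assuming a hypothetical in-memory registry and a made-up local proxy address (neither comes from the talk). The first function bakes discovery into the application; the second only knows about its sidecar.

```python
# Hypothetical stand-in for a service registry (for example, one backed by etcd).
REGISTRY = {"player-service": "10.0.0.7:8080"}

# Assumed address of the local sidecar proxy; the port is illustrative only.
SIDECAR = "127.0.0.1:15001"


def call(address, request):
    """Pretend network call: just reports where the request was sent."""
    return f"{request} -> {address}"


def game_server_with_embedded_discovery(player_id):
    # The application looks up the address itself, so routing and
    # discovery logic leak into application code.
    address = REGISTRY["player-service"]
    return call(address, f"get_player({player_id})")


def game_server_with_sidecar(player_id):
    # The application only talks to its local proxy; the proxy is
    # responsible for finding the player service.
    return call(SIDECAR, f"get_player({player_id})")
```

In the second version the game server never sees the registry, so the registry can change or disappear without touching application code.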
and that proxy becomes the thing that
other people use to talk to your service
as well because your service may be a
cluster right and so people if people
want to send something to the player
service and there's a bunch of instances
of it your proxy is the one to choose
which one maybe maybe it interacts with
an external load balancer or maybe it
does the load balancing itself the proxy
does okay by doing the health checks on
its local service instances yeah
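
One way to picture the proxy choosing an instance is round-robin over the instances that currently pass health checks. This is a toy sketch; the class and method names are hypothetical and do not reflect how any particular proxy implements it.

```python
import itertools


class SidecarProxy:
    """Toy proxy that load balances across healthy local instances."""

    def __init__(self, instances):
        self.instances = list(instances)
        self.healthy = set(self.instances)
        self._cycle = itertools.cycle(self.instances)

    def mark_unhealthy(self, instance):
        # Called when a health check fails.
        self.healthy.discard(instance)

    def mark_healthy(self, instance):
        # Called when a health check succeeds again.
        if instance in self.instances:
            self.healthy.add(instance)

    def pick(self):
        # Walk the round-robin cycle, skipping instances that failed
        # their last health check.
        for _ in range(len(self.instances)):
            candidate = next(self._cycle)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy instances")
```

Callers only ask the proxy for "the player service"; which instance answers is the proxy's decision.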
does this model make sense so as soon as
you get this basic model of the of the
sidecar proxy you've got a helper
service that goes along with every
cluster
and it knows about the services in that
cluster and it knows about the outside
world
and your cluster talks to the outside
world through the proxy and the outside
world talks to your cluster through the
proxy okay you can use nginx for that
and that's what dropbox is doing right
but these days people always almost
always use envoy or linkerd there are
a couple of other options in addition to
those and nginx but i mean those are the
really popular ones okay
envoy is the the super industrial
strength
does everything swiss army knife amazing
data plane okay by the way those sidecar
proxies i'm going to introduce you now
to the to the second term you hear data
plane the other one being control plane
the data plane is just all of your sidecar
proxies in aggregate because if you if
you've got a whole bunch of clusters
right uh or even a whole bunch of
services and you want proxies for each
of them then
that mesh of proxies
that are all talking to each other
to work out the service discovery and
the routing and everything on behalf of
the application services now you've
extracted all of that you know who who's
talking to who what where and how much
and all that you've extracted it into
your
sidecar proxies
that's your data plane
it's because the network data is going
through that and i think it's a terrible
name it should have been called the
network plane or the proxy plane proxy
plane would have been an absolutely great
name for it right
proxy plane but no they call it data
plane so it's completely confusing
because you'd think the data plane would
be either your application logic or it
would be the data layer behind your
application logic but no
so stupid name really stupid shame on
whoever chose that name really you just
you did a huge disservice to the
industry so if you're patting yourself on
the back because you came up with a name
data plane like seriously like punch
yourself in the mouth okay it just it
was a bad name
naming you know naming stuff matters man
you don't want to confuse everybody for
the rest of their lives
whatever but the name is stuck and the
name is the name now and in fact there
are well we're getting ahead of ourselves
here but they're even becoming universal
standards now for data plane uh
interfaces
so the data plane i mean like you're
just going to have to learn what data
plane means it means it's the proxy
layer okay the proxies that can uh could
load balance and they can they they
handle the network for you it's software
load balancing they actually in envoy
they actually communicate through a
protocol called a gossip protocol which
is a family of protocols where they're sort
of like udp multicast where
everybody just kind of like spits out
the state and consumes the state and it
sort of floods the network
and it's eventually consistent
so that's one thing to know about envoy
is they chose an eventually consistent
model
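
To make "everybody spits out state and consumes state" concrete, here is a toy gossip simulation. It is a sketch of the general technique only: the node names, addresses, versioning scheme and push-pull exchange are all assumptions, and it is not a description of Envoy's actual protocol.

```python
import random


def merge(a, b):
    """Keep the highest-versioned entry for every key seen by either side."""
    merged = dict(a)
    for key, (version, value) in b.items():
        if key not in merged or version > merged[key][0]:
            merged[key] = (version, value)
    return merged


def gossip_round(states, rng):
    """Each node exchanges state with one randomly chosen peer (push-pull)."""
    names = list(states)
    for name in names:
        peer = rng.choice([n for n in names if n != name])
        combined = merge(states[name], states[peer])
        states[name] = combined
        states[peer] = dict(combined)


def converged(states):
    """True once every node holds the same view."""
    views = list(states.values())
    return all(view == views[0] for view in views)


def run_until_converged(states, rng, max_rounds=1000):
    """Return the number of rounds needed, or None if the cap is hit."""
    for round_number in range(1, max_rounds + 1):
        gossip_round(states, rng)
        if converged(states):
            return round_number
    return None
```

Between rounds, different nodes can disagree about where a service lives; that window of disagreement is what "eventually consistent" means here.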
if you'll recall
i said that etcd and technologies like
it like google's chubby or uh
zookeeper or uh even hashicorp consul
they're all they're all key value stores
that are
um transactional highly available and
strongly consistent okay
uh and that actually makes them uh sort
of a pain to operate
uh in practice
all of the ones that i just mentioned
chubby is an interesting one google's
chubby it was probably the first uh mike
burrows i think uh did chubby and if you
haven't heard the name mike burrows
uh you really should know his name
because you know he's easily one of the
top 10 people who've had
the most impact at google right
uh he's you know i don't know he's a de
or whatever
and uh and he he came up with chubby as
far as i know among other things and
chubby is um
chubby is distinguished as
having something like seven nines of
availability it was down for 30 seconds
in 10 years something like that
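
A quick arithmetic check of that figure, taking the quoted "30 seconds in 10 years" at face value and assuming a 365.25-day year:

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumed average year length


def unavailability(downtime_seconds, years):
    """Fraction of the period during which the service was down."""
    return downtime_seconds / (years * SECONDS_PER_YEAR)


def nines(unavailable_fraction):
    """Number of nines: 1e-7 unavailable corresponds to seven nines."""
    return -math.log10(unavailable_fraction)


def yearly_downtime_budget(number_of_nines):
    """Seconds of downtime per year allowed at a given number of nines."""
    return 10 ** (-number_of_nines) * SECONDS_PER_YEAR
```

With these definitions, `nines(unavailability(30, 10))` comes out just above 7, and seven nines allows roughly 3 seconds of downtime per year, so the two numbers quoted are consistent with each other.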
so um
so yeah and it's because google has a
core competency of operating chubby at
scale
right because it's the it's the central
you know key value service for service
uh discovery and information exchange
for all of google right so chubby could
cause global outages so seven nines of
availability there it's pretty
amazing you're not gonna get seven nines
of availability out of your etcd cluster
i'll tell you that i think that i might
have been talking into the back of my
microphone this whole time as part of my
um my new setup uh so that's that's kind
of a bummer i hope i hope that i wasn't
and it just rolled over my goodness
um
all right so yeah this is still work in
progress apologies folks okay so we were
talking about data planes you guys i
think understand now why data planes
exist data planes exist to abstract away
the network and the service topology and
security groups and circuit breaking and
all of the other things that are stacked
up on top of communication the service
proxies also handle a lot of heavy
lifting of
you know managing tcp proxying or they
can do udp tcp http http/2 http/3 they
can do grpc they can do all sorts of
protocols envoy has filter chains where
you can implement a lot of these things
it's very very very powerful envoy is on
look everybody agrees that envoy is like
the data plane to use
with one exception which is if you are
using kubernetes my understanding is
that linkerd is custom fit
uh has more or less the same protocol
much fewer features but it's also much
higher performance and i think easier to
operate so some people use linkerd and
the control planes which are basically
just the configuration stores so it
should really be called the
configuration plane but whatever the
control planes for these service meshes
usually can use envoy or linkerd
okay but if not if they only have one
that they support it's usually envoy
because it does everything all right
okay let's build on what we've learned
so far all right we've learned about
data planes we've learned that they do a
lot of stuff
envoy you know out of the box does l7
load balancing and it does l3 l4
and they also do so what else does envoy
do for you so you can just use envoy and
by the way i started by talking about
dropbox dropbox's article remember is
they they moved their data plane from
nginx to envoy and you can build an
entire service mesh of your own on top
of envoy although you're pro probably
going to need something like etcd right
or zookeeper depending on how you've set
it up but but you don't absolutely you
absolutely don't have to have it if you
think about it etcd is a little confusing
because i mean depending on your needs
right like
it might be okay envoy is eventually
consistent right and etcd and all these
other paxos based key value stores are
strongly consistent right so which one
do you need right well envoy argues that
hey it's service discovery it's okay for
it to be eventually consistent meaning
look if we accidentally route somebody
to a service that's going down and they
wind up getting an error and have to
retry as long as it's it's a tail case
and it doesn't happen very often then uh
it's probably okay because of retries
right and so envoy you know pushes some
of that that retry logic that you you
don't need in a strongly consistent
system where as long as etcd is up
you're gonna get an accurate up-to-date
service instance right but i mean envoy
takes the approach that it's like well
what if you call etcd or zookeeper and
you get yourself a service instance and
then it immediately dies right like
strongly consistent doesn't necessarily
mean that the service is going to be
available for the duration of your call
to it so why not go with eventual
consistency which dramatically
simplifies things and speeds things up
and it's an interesting i don't know
it's an interesting take everything that
i've seen built on top of it winds up
using strong consistency so i don't i
don't know who's right here but it's
it's an interesting thing to know right
is that envoy is generally attuned for
eventual consistency in that gossip
protocol
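
The retry argument can be sketched as follows: if the caller's view of the service is slightly stale, the first endpoint may already be gone, and a retry against the next candidate papers over it. The endpoint addresses and the `fake_send` helper are hypothetical and exist only for illustration.

```python
# Hypothetical set of endpoints that are actually alive right now.
ALIVE = {"10.0.0.8"}


def fake_send(endpoint):
    """Stand-in for a network call; fails if the endpoint has gone away."""
    if endpoint not in ALIVE:
        raise ConnectionError(f"{endpoint} is down")
    return f"ok from {endpoint}"


def call_with_retries(candidates, send, max_attempts=3):
    """Try candidates in order; return (response, attempts_used)."""
    last_error = None
    for attempt, endpoint in enumerate(candidates[:max_attempts], start=1):
        try:
            return send(endpoint), attempt
        except ConnectionError as error:
            # A stale view routed us to a dead instance; try the next one.
            last_error = error
    raise RuntimeError("all attempts failed") from last_error
```

Note that the same retry path is needed even with a strongly consistent registry, because an instance can die right after it is looked up.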
envoy is written in c++ i believe it
was created by lyft
envoy is it's its own thing now you know
it's it's a it's a huge huge system with
a massive community contributing to it
and it really does everything it also
does things like so it does load
balancing like like i said and it does
traffic splitting and it does uh you
know filter chains the filter chains are
amazingly powerful and can do all sorts
of um important stuff like um you know
calling your cert authority to you know
validate ssl certs and um
transformations between protocols and
all sorts of stuff well hell let me pull
up the list
yeah grpc proxying
it can do health checks it can do staged
rollouts that's why i looked over here i
thought it could do red blue or staged
rollouts with traffic splitting
percentage-based traffic splitting
what's that for
like are all these words do you guys
know what all these things are you know
what load balancing and dynamic service
discovery and auto scaling are tls is
the new ssl and tls termination is you
know you have to do it somewhere to
actually integrate with the cert
authority and whatnot so you can do all
that in the proxy circuit breaking is a
relatively new concept where you rather
than overloading a service causing its
performance to degrade and all sorts of
alarms going off and
potentially scary
things happening like data corruption or
whatever
with circuit breaking you basically
configure the circuit breaker to say i'm
not going to take more than
n qps and then i'm just going to like
open the circuit and we're just going to
stop stop sending stuff through right
and so you get an immediate alarm and of
course a cascade of circuit breakers
upstream and
and they can be a little tricky to
manage especially since in a lot of
situations the client of the service the
person calling the service is expected
to configure the circuit breaker and
they don't really know have the
information to configure it properly so
circuit breaking is a bit of an art to
say the least
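
A minimal sketch of the "no more than n qps, then open the circuit" idea, assuming a one-second counting window and a fixed cooldown before the circuit closes again. The class, its parameters and the injectable clock are illustrative choices, not how Envoy configures or implements circuit breaking.

```python
import time


class CircuitBreaker:
    """Rejects calls once more than max_qps arrive within one second."""

    def __init__(self, max_qps, cooldown_seconds, clock=time.monotonic):
        self.max_qps = max_qps
        self.cooldown_seconds = cooldown_seconds
        self.clock = clock
        self.window_start = clock()
        self.count = 0
        self.open_until = None  # None means the circuit is closed

    def allow(self):
        now = self.clock()
        if self.open_until is not None:
            if now < self.open_until:
                return False  # circuit is open: stop sending traffic
            # Cooldown elapsed: close the circuit and start a fresh window.
            self.open_until = None
            self.window_start = now
            self.count = 0
        if now - self.window_start >= 1.0:
            self.window_start = now
            self.count = 0
        self.count += 1
        if self.count > self.max_qps:
            # Over the limit: open the circuit for the cooldown period.
            self.open_until = now + self.cooldown_seconds
            return False
        return True
```

Picking `max_qps` is the hard part mentioned above: the caller configuring the breaker often does not know what the callee can really sustain.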
but it does seem to be preferable to
not circuit breaking which just allows
services is an arrangement where
services will just fail arbitrarily
under heavy enough load you definitely
don't want that okay so envoy can handle
circuit breaking it can handle fault
injection so that you can do things like
chaos testing and what is netflix's
chaos monkey right you can actually do
that in the proxy plane the proxy plane
you know if i start calling it the proxy
plane will you and you guys start
calling it the proxy plane maybe we'll
actually be able to like eventually kill
the term data plane for the proxy plane
whatever
can't have it all i guess
so and it'll also do logging access
logging you know all the things you
expect out of things like nginx um
so it's you know it really is a pretty
uh pretty robust
mesh on its own envoy is envoy's really
cool and linkerd does some of that but
as you can as you can imagine like you
you know you sure load balancing you
know some of these things you know
protocol transformations filter chains
things like that those make sense in the
proxy
right maybe but if your proxy is only
for kubernetes
and not for redis then obviously linkerd
doesn't need all that stuff right
linkerd probably has you know access
logging and observability and and maybe
maybe uh tls maybe tls termination
but uh uh it doesn't have a lot of the
features of envoy but it performs much
much better it's a much much smaller
binary and so it's it's fairly bespoke
for kubernetes again it's also a very
good piece of technology
and there are some other ones out there
but honestly like if you're a cio or cto
or just a team lead even and you just
want to like um you want to decide that
you're going to as a team lead you
probably shouldn't be making this
decision you should you know your
company your organization should not use
like multiple service meshes and
multiple data planes and so on you
should really probably try to
standardize on one but if you're a team
lead who's responsible for maybe proving
one out before you roll it out more
broadly to the rest of the company then
sure you could make this decision too so
i'm telling you unless you're like a
kubernetes only shop and you you know
you're basically being backed into using
linkerd by the stack on top of it use
envoy like that's that's just it's a
it's a no-brainer envoy has basically
replaced all of the other like proxy
technology out there there's no reason
to use anything other than envoy unless
again you really really need a very
lightweight high performance kubernetes
only installation and then you can use
linkerd all right linkerd plus
whatever things that you're going to
need to use because they're not in
linkerd
so you with me so far so everybody
agrees that envoy is the cat's meow
everybody does envoy is it
okay and the and the the the guy that
invented envoy i'm sorry i forgot your
name man but amazing job but i i read
his his blog posts periodically and his
his updates and he he talks about you
know the decision process you know for
how envoy started and how it evolved and
he talks about other data planes and the
fact that they really all ought to be
pluggable and they shouldn't all just
assume envoy and so he has been driving
along with some others a universal data
plane interface or api
udp it's called
i think it's udp universal data plane
which has an unfortunate collision with
udp
but whatever
so again this is probably the guy that
came up with another reason not to call
it the data plane folks it could be
called upp if it was the universal proxy
plane and believe me everybody wants upp
instead of udp because
uh all the collisions and names here all
right universal data planes data planes
proxies we went through the list of all
things envoy does look if you're a cto
cio or team lead trying to prove
something out absolutely you're going to
want to use envoy and if you don't
believe me believe
all the service meshes because most of
them use envoy under the covers as their
data plane
consul has envoy integration i don't
think it requires envoy but since so
many people use envoy consul happily
integrates with it okay
istio uses envoy and it's not pluggable
it just assumes envoy all right and
there were a couple of others maybe kong
i think kong's service mesh may be built
on envoy as well but don't quote me on
that
yeah i double checked and kong in fact
is deeply committed to the success of
envoy and so even kong which is also a
fantastic offering that we'll talk about
in a little bit