Okteto: Cloud-Native Applications on Kubernetes with Ramiro Berreleza

Software Engineering Daily

1:02:10 hr | 2 months ago

Kubernetes is an open source container orchestration system. It makes managing container clusters possible, as well as deploying code changes to those containers. Microservice architecture is widely used today in large part because of Kubernetes. However, using it can require a large time commitment due to its learning curve. The company Okteto empowers developers to innovate and deliver cloud-native applications faster than ever. The Okteto CLI lets developers deploy realistic replicas of their stack on Kubernetes and updates it for continuous deployments. It also manages different code environments, self-service access, and container scaling automatically. In this episode, we talk with Ramiro Berreleza, founder and CEO of Okteto, about managing Kubernetes clusters with Okteto.

G2i is a marketplace for pre-vetted JavaScript developers. Hire React, React Native, and Node.js developers you can trust, on a contract or full-time basis. G2i will match you with pre-vetted developers within three days of your onboarding call. You'll be able to review their technical profiles and set up interviews with candidates that you like. You get a detailed technical profile that provides the developer's assessment scores in each category, a copy of their code challenge, and a recording of their technical interview. I love G2i and I use it for all of my companies. I just think it's a great way to get started. It's a really fast way to find a great front-end developer, which is most of what we need these days. React developers are in such high demand, and G2i is the place to find the best React developers. You can test a working relationship with no risk. The first week is free if you decide the developer isn't a good fit. If you don't like it, you won't have to pay for it. G2i's litmus test is simple: can this developer make an impact in your code base within their first week? Impact is the thing that matters. Go to softwareengineeringdaily.com/g2i to get started. That's softwareengineeringdaily.com/g2i. I really don't have any problem promoting G2i, because I have been a power user of their services, and I've just got to tell you, it will accelerate your development.

Ramiro, welcome to the show.

Thanks for having me. Very thrilled to be here as a longtime listener, first-time participant.

Wonderful. Well, as we spoke about, the place to start this conversation is with the famous container orchestration wars. To jog people's memories, or if they have not studied the container orchestration wars: this is a period of time, probably 2015 through 2017 roughly, when there were rivaling container orchestration systems and businesses built on top of those container orchestration systems. It was a messy time. It came out with the result of Kubernetes essentially being the de facto container orchestrator. Ramiro, what are your reflections on the container orchestration wars?

As you say, it was a super interesting time. Back then I was working for a DevOps-oriented company, and I remember pretty closely the mess it was back then. It was interesting because, as I remember, there were three kind of big things: Kubernetes, there was Mesos, there was Docker Swarm.

Nomad.

Nomad, yeah, you're right. There were a few of those, and then even at some point Puppet and Chef were trying to do something around containers. It was a very interesting time. It's funny, looking back, how we're still struggling with a lot of the same things. Kubernetes clearly won. It became the standard; now everyone is using Kubernetes. But it's funny how the same concerns we had back then, around how developers use this, speed, operability, whether you want something to be declarative or more program-oriented, are still the same challenges the community is debating to this day.
So it's fun. Now we've solved some big issues, like running at scale; people now know how to do that. But the tooling around Kubernetes is still something that we keep debating. I like it, because I feel like we are finding more challenges as we adopt more of these technologies, and it's a fun space to be involved with. I've been around containers from around 2015 until today, and it's definitely a fun space to be in, even though Kubernetes kind of, quote unquote, won. There's so much work to do with Kubernetes that it keeps us busy every day.

Is the correct way to think about this like Linux? Linux won, but there's not really a Linux company. There are a lot of companies. Is that how we should be thinking about this, where this is not a domain where one company is going to win, but a gigantic ecosystem with a lot of companies doing different things?

I see it the same way. I think Kubernetes is really on track to become the new Linux. One of my mentors always talks about Kubernetes as the cloud OS, the way Linux was for VMs when clouds became a thing. Back when it was all starting, Linux clearly became the easiest way to run a cloud server, but now Kubernetes is that. I think as this ecosystem matures, we're going to see less about Kubernetes and more about companies building things on top of it. What VMware did for VMs, I think we're going to see for Kubernetes: nobody is selling you Kubernetes itself. That's going to become invisible, kind of like electricity, something that you just get from someone. You'll care much more about what vendors are giving you as value added on top of it, with vendors solving the difficult problems we've always had with policies, storage, and access control, and now all these new things, like, in our case, developer experience, but many other things too. I think that's how this settles. That's where we're going to see a lot of interesting companies coming up.

All right. So, Kubernetes platform companies. I've seen a number of these.
Perhaps the most successful outcome, or at least the most successful measurable outcome, I would say, is Heptio. Heptio, started by Joe Beda and Craig McLuckie, who were early Kubernetes founders slash contributors, was sold to VMware for something like four hundred million dollars, I believe. If I recall, the Heptio product was built heavily around support, and they were sort of searching for what the software play was. They had a great support story, and they were able to sell to VMware sort of under the aegis of "we're just going to define VMware's container and Kubernetes strategy." Since then, I guess Rancher is still going pretty well. I like Sheng Liang. But I look at Okteto, and it seems like you've done something at least a little bit novel. What are you doing that is novel in the Kubernetes platform space?

The main place where we're trying to really innovate and do things differently is in building a platform that's very, very focused on developers. Okteto is not a platform for production; it's not a platform where you run your services at scale. Our main focus is on how we help developers who are building these services during the development life cycle. There's a lot of tooling and a lot of platforms focused on things like software supply chain, security, progressive delivery, all those things, and there's some really cool stuff in that space. But when we started Okteto, there was no one building tooling for what we now call cloud-native developers, which is all these teams who are building microservices, who are building applications that run on Kubernetes. So from day one, my co-founders and I, all of us developers, were like: this is a tool we've always needed, let's go build it. We're very focused on all those things that developers do, from the moment they get a feature assigned to the point where their PR is reviewed, ready to be merged, and then taken into production.

Who uses this? Say I'm using Kubernetes.
Chances are I'm using it through AWS or Google Cloud or Azure, and they're going to give me lots of robust Kubernetes things. Why would I want to use Okteto?

Right, that's a great question. The way we see it, Okteto is the next layer above those things. You go to AWS, Google, or any of these providers, and you get your infrastructure. That's step one. You have the compute, you have the engine. The question now is how your developers are going to consume it. What we've seen in the market is that you have two choices. One is that you train your entire engineering team on Kubernetes, and they have to use things like minikube, or everyone has to have their own cluster. Or, what we see more and more in the industry, developers kind of ignore Kubernetes and run things locally, whichever way they see fit, and then they only encounter Kubernetes maybe on a preview environment, typically on a pre-production staging environment. That's what we've seen at companies. So Okteto kind of fills the gap, giving you the software you need so your developers can access Kubernetes while they're developing. This is everything from how you give them credentials to clusters to how they do things like create a namespace and deploy their application. One of the things we built early on with Okteto was this concept of a namespace as a service, which is that you log in to a web UI, click a button, and that creates a namespace for you; then you click another button and your application is deployed. Then, as you write code, you can sync your changes between your local machine and this dev environment running on Kubernetes, so you're constantly testing, you're already integrated, and you're interacting with Kubernetes from day one. That's something that some of the early adopters of Kubernetes kind of built on their own. Spotify, for example, has a platform called Backstage.
Something like that, and there are more like it, but most companies don't have anything like this. So they're using Okteto to accelerate their developers, to give them access to Kubernetes without having to train them on every single nook and cranny. As you know, Kubernetes can be complex, and most developers don't need to get into that. Most developers are happy to focus on the business logic and not become experts on containers, on networks, on policies. They just want all that automated. So that's where Okteto really helps teams ship software faster, through that layer on top.

So you're talking about a layer on top of essentially GKE, EKS or ECS, and what's the Azure one? AKS?

AKS, I think. Azure AKS.

All right. Whatever you choose as your acronym, you're a layer on top of them. You give those platforms richer solutions to problems, and those problems are things like credentials, secure namespaces, autoscaling, and essentially garbage collection. The whole garbage collection idea is really interesting to me, because there's obviously this trend in cloud cost management, where everybody's spending too much on cloud. You have a startup, your startup starts growing really quickly, you scale up your infrastructure. It's nothing; you're serving traffic, you've got a high-margin business, who cares about the COGS? Then over time you realize your COGS have gotten way higher than they should be, because you're not cleaning up your Kubernetes clusters. How do you build a garbage collector for Kubernetes?

That's one of the features our users like the most. The context where we think about it is the other way around: you use Okteto and you deploy your dev environment. That's the part you really care about: a copy of your application, any other services you might need, your code, all those things. Right, one click, and it's up and running.
And then you're done. What we're starting to see, as you said, is that as developers have access to more infrastructure, these dev environments just keep running all the time. As you do things like attaching a dev environment or a preview environment to every PR, each with a copy of the application, or as you start creating more branches, these environments start to add up, and developers don't like to think about these things. I never remember that I have to delete this stuff. More than once I've been surprised with a huge cloud bill because I forgot to erase things, and every developer does that. So what we wanted to do was give you a wonderful experience having a dev environment, but really what we want to give you is a platform where you don't have to think about these things. You deploy without thinking about them, and when you're done, it's going to go away. So what we did is build this workflow engine based on metrics like requests, when it was deployed, how long it's been running, and when was the last time you accessed your dev environment. After an environment reaches a threshold, the engine will just run, and the first thing we do is scale everything down to zero. One of the really cool things about building this platform on Kubernetes is that Kubernetes already has all these mechanics in place, like scaling a deployment to zero replicas. It's fairly easy: change the manifest and it's just there. Then, when the developer comes back, they can simply access it. We built an integration with the ingress controller, so as soon as they hit an endpoint, we rehydrate the dev environment, restore all the settings, and they're ready to go in seconds. That was something that was really well received by our customers, because, as you said, it's a very real problem: we're spending too much on cloud, but we do see the benefit of giving developers access to the cloud. For some less sophisticated companies, the first reaction will be, "Well, don't give them access to the cloud. They have computers. Let them use those."
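
As an editorial aside, the idle-detection step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Okteto's implementation: the `DevEnvironment` record, the function names, and the 24-hour threshold are all hypothetical, and a real system would read these values from cluster metrics and patch the actual Deployments.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class DevEnvironment:
    """Hypothetical record of one developer environment."""
    name: str
    last_accessed: datetime  # last time an endpoint of this environment was hit
    replicas: int            # replica count currently requested


def plan_scale_down(envs, now, idle_after=timedelta(hours=24)):
    """Return {name: replicas_to_restore} for environments idle for too long.

    The 24-hour default is an assumption made for this sketch.
    """
    plan = {}
    for env in envs:
        if env.replicas > 0 and now - env.last_accessed >= idle_after:
            plan[env.name] = env.replicas
    return plan


def scale_to_zero(envs, plan):
    """Apply the plan: idle environments drop to zero replicas."""
    for env in envs:
        if env.name in plan:
            env.replicas = 0


def wake(env, plan, now):
    """Restore the saved replica count when a request arrives again."""
    env.replicas = plan.pop(env.name, env.replicas)
    env.last_accessed = now
```

Keeping the saved replica counts in the plan is what makes the "rehydrate on first request" step possible: waking an environment only needs to look up how many replicas it had before it was put to sleep.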
But then they start to pay for it in production, with issues and lack of expertise in their teams and all those things. So as they mature in their adoption of Kubernetes, they see that things like this automatic deletion of namespaces are what allows them to do this while consuming less compute. We even wrote a node autoscaler for Kubernetes, so you can scale the nodes following developers' usage patterns, which don't necessarily match what you see in production. In production, traffic is fairly constant; you're getting hits, and it goes up and down. With developers, you mostly see this huge rush of deployments at nine in the morning, and then everyone goes home at six and no one's consuming CPU anymore. So we built a specialized garbage collector slash autoscaler to deal with the patterns we see developers have. It has to be able to adapt, because you cannot just say "start at nine, shut down at six." In a remote workforce, people work at different times and on different schedules, so we had to build in some smarts. But it's definitely something that large companies especially really like: giving developers access, but with automatic cost control, rather than having to chase people and tell them, "Hey, don't forget to stop this thing." So yeah, that's kind of how it works.

My impression of garbage collection is that it's essentially an unsolvable problem. It never ends. Actually, let me zoom out. When I look at the suite of things that you offer, I'm kind of understanding what your gist is now. You're basically saying, look, there's a lot of stuff that you want on top of your Kubernetes. Kubernetes is so far from being a cloud. Think about it this way.
On one side you've got Kubernetes as it is when Amazon gives it to you, and then you have Kubernetes as you want to use it as a developer, and the gulf there is really, really big, which is why you have to write a lot of custom code on top of Kubernetes to get your infrastructure to work the way you want it to. With Okteto, you're basically saying, look, there is this margin between what Amazon gives you for Kubernetes and what you actually want out of Kubernetes, and we're just going to fill in as much of that margin as we can.

That's a good way of seeing it. The way we talk to our users and customers about this is that Kubernetes is a great container orchestrator. It gives you resources, access policies; it runs your things. But when you're building software, you need a lot more than that. The namespace would be an example: you need somewhere to deploy your code, so we give you one-click namespace creation. But companies also need policies and security, so we configure those things for you, so that as a developer you don't have to care. From there, Okteto bundles a lot of other things that a developer always needs. A good example is that when you install Okteto on your Kubernetes cluster, we install a remote build service for you based on BuildKit, and we also install a registry based on Docker's container registry. We put them there because we know that any developer who's building a cloud-native application needs to build images and needs to push them somewhere. So instead of you having to push things to GitHub, to the cloud, and then do a round trip to your cluster, we just put them in the cluster, so your deployments are way faster than they would be otherwise. And we build all the tooling to automate all this for you, so you run one command, okteto build, and that builds your container just as if it were with Docker, but the image is already in your cluster, so when you deploy, deployment is way faster.
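
As an editorial aside, the "one-click namespace with policies already configured" idea can be sketched as a function that produces the Kubernetes objects such a button might create. This is a minimal sketch under assumptions, not Okteto's actual behavior: the `dev-` naming scheme, the quota values, and the choice to bind the built-in `edit` ClusterRole are all hypothetical. The object kinds and API versions are standard Kubernetes ones.

```python
import json


def namespace_manifests(developer, cpu_limit="4", memory_limit="8Gi"):
    """Build the manifests for one developer namespace (illustrative only).

    Returns a Namespace, a ResourceQuota that caps resource usage, and a
    RoleBinding that gives the developer edit rights inside the namespace.
    """
    ns = f"dev-{developer}"  # hypothetical naming convention
    return [
        {
            "apiVersion": "v1",
            "kind": "Namespace",
            "metadata": {"name": ns, "labels": {"owner": developer}},
        },
        {
            "apiVersion": "v1",
            "kind": "ResourceQuota",
            "metadata": {"name": "dev-quota", "namespace": ns},
            "spec": {"hard": {"limits.cpu": cpu_limit, "limits.memory": memory_limit}},
        },
        {
            "apiVersion": "rbac.authorization.k8s.io/v1",
            "kind": "RoleBinding",
            "metadata": {"name": "owner-edit", "namespace": ns},
            "subjects": [
                {"kind": "User", "name": developer, "apiGroup": "rbac.authorization.k8s.io"}
            ],
            "roleRef": {
                "kind": "ClusterRole",
                "name": "edit",
                "apiGroup": "rbac.authorization.k8s.io",
            },
        },
    ]


if __name__ == "__main__":
    # Print the manifests as JSON, which kubectl also accepts as input.
    for manifest in namespace_manifests("alice"):
        print(json.dumps(manifest, indent=2))
```

The point of the sketch is that the developer supplies only a name; the quota and access rules travel with the namespace, which is what lets the platform team set policy once instead of per request.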
So we're filling the gap for these workflows and tools that developers typically have to either assemble themselves or just don't use, and we give them to them. Another good example is that we provision certificates for your endpoints. These are things that everyone needs to do in order to test their application, but if you have to do them by hand, it's time-consuming, it's boring, and it isn't productive. Instead, Okteto does it for you: deploying your application, building the containers, pushing to the registry, and so on. Then you're ready to just test it, write some code, and iterate. That is where we're seeing that people really like what we're building, because we give them those extra things that otherwise they'd have to build themselves, which sounds fun but is not really that productive. If you're a company building, say, a recipe app, spending a lot of time building a namespace manager is not in the best interest of your business. Instead, we believe it's our business, and our customers use our stuff to build their own business and go faster, more easily reach more people, all those things they need to make a healthy business.

You can learn as much fancy theory as you want, but at the end of the day machine learning is still ninety percent data cleaning and infrastructure work, and doing it all manually is exhausting. It's not likely to make its way to production, especially when your data, your models, and your code are constantly changing. Pachyderm is an easy-to-use MLOps platform that empowers anyone to build scalable end-to-end machine learning workflows, regardless of language or framework. Pachyderm provides Git-like data versioning and lineage to automatically track every data change and final output result, meaning you'll always know exactly what data was used to build the latest model, automatically. Right now, SE Daily listeners can get over four hundred dollars in credits on
Pachyderm Hub. Sign up today and build production-grade data science workflows in minutes without ever having to configure a single piece of infrastructure. Imagine being able to automate your entire data science workflow and still reproduce any result from any point in seconds with complete confidence. Head over to pachyderm.com/sedaily to get over four hundred dollars in free credits. But you'll want to hurry, because this offer only lasts for a limited time. That's pachyderm.com/sedaily, P-A-C-H-Y-D-E-R-M dot com slash sedaily.

Here's a puzzle: what do products like Dropbox, Slack, Zoom, and Asana all have in common? The answer is they were all successful because they became enterprise ready. Becoming enterprise ready means adding the security and compliance features required by enterprise IT admins. When you add these features, enterprise users can buy your product, and they'll buy a lot. These features unlock larger deals and faster growth. But enterprise features are super complex to build. They have lots of weird edge cases, and they typically require months or years of precious engineering time. Thankfully, there's now a better solution. WorkOS is a developer platform to make your app enterprise ready. With a few simple APIs, you can immediately add common enterprise features like single sign-on, SAML, SCIM user provisioning, and more. Developers will find beautiful docs and SDKs that make integration a breeze. WorkOS is trying to be like Stripe for enterprise features. WorkOS powers apps like Webflow, Hopin, Vercel, and more than a hundred others. The platform is rock solid, fully SOC 2 compliant, and ready for even the largest enterprise environments. So what are you waiting for? Integrate WorkOS today and make your app enterprise ready. To learn more and get started, go to softwareengineeringdaily.com/workos. That's softwareengineeringdaily.com/workos.

So your customer base is pretty interesting. You've got some real infrastructure players.
It looks like you've got people from VMware, Cisco, Microsoft, Intel, IBM, and Dell using Okteto. Is that right?

Yes. On the product side, we have an open source CLI that's really focused on anybody who's building applications. Then we have a public cloud that's for developers who don't want to think about Kubernetes for themselves. And we also have a self-hosted offering, which is a version that you install in your own clusters, and that's what most of our customers use at a professional level. One thing we've seen is that the problem we solve, which revolves around helping you get ready to do your work, is pretty universal. Anyone who's building a microservice-based application has to do this prep work, right? You have to start your containers, you've got to run your application and figure out your dependencies, and then when you write code you have to figure out how to use Docker and how to use kubectl. So there's really a big market of people building cloud-native applications who really benefit from the automation we're building. As for our community, everything started with an open source project, so in our community we have everything from developers in large corporations to startups to more traditional enterprises. It's really cool to see, and I think it's a testament to the fact that as we have more developers and as more companies build internal software, they need developers and they need platforms like Okteto. I think that's why Kubernetes has grown as it has, because every day more and more companies need their own software teams.

And the kinds of experiences that these sort of lower-level companies need, a Dell EMC or Intel or IBM or Microsoft, go beyond web apps, right? A lot of the companies I talk to are sort of Airbnb, Uber, a gaming company, things that are higher level. We were talking about the infrastructure players. It strikes me that they need something different out of Kubernetes as a platform. They perhaps need something that is a little lower level.
Can you tell me about the difference between those two customer types: the hard infrastructure customer type, the Cisco or the Microsoft kind of customer, versus the Airbnb or Postmates kind of customer that plays at a much higher level?

Right. One thing we see there is that there's adoption up and down the stack. We have people using Okteto to build cloud infrastructure; there are teams in all the major clouds using some of our lower-level components, our CLI, some of our open source stuff. What we see with those customers on the infrastructure side is that what they need is tooling to help them accelerate their work. When they're building components, they don't care too much about namespaces as a service or automated deployments, but they do like one of the core features of our CLI, which is the ability to hot reload a Go process on Kubernetes. You know the typical workflow: you write some code, you have to build a container, push it, redeploy your application, and see the results. Instead of that, when you use the CLI, you just run one command, and Okteto keeps your code synchronized back and forth and gives you a terminal into your container. So you write code and you see the results. This is super useful for teams who are building something really close to Kubernetes, like a controller or an operator, because it allows you to test directly in Kubernetes from day one. You don't have to spend time building mocks or trying to replicate what a cluster looks like on your local machine; you're just directly in a cluster. Actually, that's how we started the company, with this in mind. Then, as we talked to our customers and watched the Kubernetes market evolve, you have, as you mentioned, the Airbnbs, the Stripes, all these companies who are building application components, and what these companies value, especially as they scale, is making developer teams more efficient.
The Stripe founders have mentioned a lot of times that they see every developer as contributing to the GDP of Stripe, and they know that investing in tools that make their developers go faster is good for the bottom line. That's where we see a lot of adoption, especially of our hosted version of Okteto: companies who want their teams to go faster. They want their teams to not spend time trying to figure out how things work together, or having to learn Kubernetes. If you're a front-end developer, learning Kubernetes is not productive. But if you have something like Okteto, you can just log in, click a button, deploy the app, and then you can actually work against the API, and then you're being more productive. You're taking advantage of Kubernetes even though you're not using it directly. That is one of the core missions for us: enabling developers to take advantage of all this cloud technology without them having to be experts in every single thing. It's a common topic: as technology matures, as it gets adopted, it has to become invisible, right? You have to benefit from Kubernetes without even having kubectl installed. That is, for me, the gold standard of any technology. I think VMs got to that point. I think VMware did it, and Docker did it for containers, where you just run a Docker command and that gets you a container. We hope to do the same thing for Kubernetes.

Can I ask what the ideation process was for this? How did you stumble upon this? This is a very specific layer of abstraction that you're working in, so I'd love to know how you got there.

It's funny, because when I look back at my career, this is a problem that I've been facing for a very long time. I started my career at Microsoft. I was actually on a team that was building Azure back then.
One of the biggest pain points of my team, the Service Bus team, was the lack of realistic dev environments. We were building these things for Azure, but developers had to write all this .NET code locally and then spend some time waiting for it to get deployed and then run the tests. Then you found something that broke, and you had to start all over again. I saw that at Azure, then moved to startups and kept seeing the same problem. Right before Okteto, I was part of the HipChat team, and we saw the same thing. HipChat was doing well, at some point we were hiring a lot of engineers, and we started to run into this problem where you're hiring engineers and your team is growing, but your engineering practices are not growing, and you start to hit all these problems around quality and velocity. So that was my experience. At the same time, I talked to my co-founders, Pablo and Ramon. Pablo was at Docker at the time; Ramon was at Google at the time. When we chatted, we kept hearing the same problems: what I had in my teams was the same as what Ramon had on his team, he was part of Gmail, and Pablo was part of what was Docker Cloud at that point. We kept hearing that everyone has problems around: how do I get started? How do I get a realistic dev environment? How do I run my tests? All of that gave us the initial idea that, hey, this is a problem everybody has. And in those days, I'm talking maybe three years ago, we were really deep into Kubernetes, so we said, hey, everybody has this problem, and especially as you move to Kubernetes it becomes harder, because now you're going from everything running on one machine to everything needing to run on this orchestration software that's spread across tens or hundreds of nodes. So we saw that and said, hey, all these problems we're seeing are going to get worse as people adopt Kubernetes. Let's do something about it. So we quit our jobs and started building this open source project.
We got some really promising early traction, and through the community, through talking to prospective customers, and through our own work, we ended up building what became Okteto. As you can imagine, we started with something fairly different. At the beginning we were building a more traditional platform, something closer to a Heroku for Kubernetes, but slowly, as we got feedback and tested our ideas, we ended up saying, hey, what we really need is not a runtime platform; it's a platform for developers, for the development life cycle. And that's where we are today.

All right, you got my attention with Heroku. I love Heroku. I just love it. I love the platform. It's an undying love. I've used Heroku competitors; I love those as well. But Heroku holds a special place in my heart as basically the first place that I deployed infrastructure to without going completely insane. Before that, I tried to use Amazon Elastic Beanstalk one time. I spent all day trying to use Amazon Elastic Beanstalk, and then I tried Heroku, and it was very much an experience of love at first sight. I think the "Heroku for X" analogy will never go away. It doesn't matter how many Firebases there are. It doesn't matter how many high-level things there are. It's always going to be Heroku. Honestly, it's almost like an Apple-level brand. I think Salesforce made one of the most legendary acquisitions. How much do you think Heroku makes for Salesforce? It's so hard to know, right?

It's hard to know. I agree with you one hundred percent. I remember this thread on Twitter the other day where someone asked, what's the first time you felt like magic using a tool? A lot of people, myself included, were like, yeah, a Heroku push was like, wow: you have a repo, you push it, it just runs. I don't know if it was a good or bad acquisition, to be honest. That's a topic.
It's still up for debate at this point, because I feel they could've done a lot more as an independent company.

They shouldn't have sold. I wish they wouldn't have sold. Of course, Heroku on its own could have been, man, who knows, it could have been a huge thing. I think it could've been as big as Apple. It could've been as big as Microsoft.

One hundred percent. I think so sometimes. I know some people that work at Salesforce, and the plans they have for Heroku, I think, are interesting.

I completely agree. I haven't counted Heroku out in the slightest.

Oh no, no. Honestly, from a developer experience perspective, ten years into this, I still see them as the gold standard.

I so agree with you. Man, we could go so deep on this. Continue, I want to hear your thoughts a little bit deeper on this.

I feel like they really nailed the shape of the CLI, the way the commands flow. For me, what's really valuable is that it gets you to work really fast. Deploying your application is not a big deal. As you said, you don't have to get a VM, you don't have to get EC2. For me, Heroku was able to show what serverless is trying to do today, and I think that's where we'll eventually get to, which is: I have code, run it. I don't care how, I don't care where; just run it and give me an endpoint for my application. And Heroku really did that. I remember when I was at HipChat, we were using Heroku for a lot of our chatbots because it was so simple. You had one thing, a manifest, and that was it. And they really nailed the marketplace: it was super easy to onboard other services, like using Keen for analytics, and everything just flowed seamlessly. It was very well thought out, especially for small teams.
I think it probably hasn't scaled as well for enterprises, where you have to get into procurement, you have to get into approvals for everything, compliance. But for small teams who don't have to go through those workflows, I feel that way of deploying software was great. Actually, one of our investors [inaudible] was part of the team, and we've talked about it. It's super cool to hear what they had in mind when they were building this stuff. It definitely feels like they were ahead of their time, right? I feel like most of us are just catching up to all these concepts around developer experience that they had many years ago, and they have retained it. I think it's very impressive how they've managed that acquisition. I've done enough interviews with people from Heroku to know that that product is so hard to build at scale. It's hard enough to build in basically any context, but at scale it's just insane. And have you used Firebase? Did you ever use Firebase? Yes, I have. [inaudible] Wow, okay. Tell me what you use Firebase for. So in Okteto Cloud we use Firebase for some internal systems. We use mostly the store part of it, which is one of the things that I really like. One of the first things we decided was: no database; Kubernetes is the database. So if you go to Okteto Cloud, everything is in the database in Kubernetes. But at some point we did need to store certain things outside of Kubernetes: logging information, analytics, secrets, configuration values. And we were looking around, and it was like, okay, going back to our topic of how can we be more efficient, running a database is not efficient for us, because that's not our business. So one of our engineers ended up saying, hey, you know, Firebase is a pretty good fit for this. You put the values there, it scales, you put it in, you read it back, it's so simple, and it just works. It has worked well at our scale.
For me, it's the same way we decided to run our stuff on something like GKE versus running our own Kubernetes clusters. Firebase just made sense, like all the stores that just give you endpoints to things. [inaudible] I do like Firebase. Supabase is into the same thing but open source. They are great ways to just make developers more efficient. [inaudible] No, here's my take. I think that Heroku versus Firebase is one of these classic X versus Y things that we can always point to, to try to express an analogy. It's like Microsoft versus Apple, open versus closed, iPhone versus Android. There's something different between the two. I don't know exactly what it is. I think the difference is, they're both sort of a backend-as-a-service thing, but Firebase is like: our backend as a service, the first-class citizen that we're building everything around, is our crazy do-everything database. And then Heroku just says: you're deploying servers, and we help you deploy servers. I agree. I think going back to those days, it's funny how you can see that the initial targets they had, like their targeted user, really molded how those things work. Heroku was really focused on Ruby apps and the twelve-factor app. [inaudible] In the early days of Heroku, even deploying a multi-service application was not easy, because it was very clearly built for: you have a Ruby app, you have a Rails app, run it. So a lot of their logic came from there. I think at the same time Firebase was mainly focused on the backend you need when you're building a mobile app. You need this database
that does everything, and it just worked, because back then, when we had less sophisticated mobile apps, the logic was: hey, I have data, I need to store it somewhere and retrieve it later. And I feel like, even though they seem similar, like they're both a backend-as-a-service kind of thing, once you start to use them you start to see how a lot of their experience is very much influenced by these early use cases. I think Netlify is the same thing. Netlify really focuses on front ends and static pages, and everything they do makes that easy. I think that's where these platforms win: by taking these very specific use cases and just being better than everybody else. At least, that's my take on Heroku versus Firebase. TeamCity Cloud is a new continuous integration service that is hosted by JetBrains. Last year we invited listeners of Software Engineering Daily to take part in the TeamCity Cloud beta. Now the service is officially released and ready to be used in production environments. TeamCity Cloud is based on the original on-premises version of TeamCity. It's the same great CI/CD, but managed by JetBrains. The best thing about TeamCity Cloud is that it doesn't tie you to any particular technology or workflow. It integrates with all popular version control systems, build and test frameworks, issue trackers, IDEs, and cloud providers, and it supports them all equally well. And you don't have to deal with updating your tools or installing security patches. That's all done by JetBrains. TeamCity Cloud lets you run your pipelines on cloud agents provided by JetBrains or connect build agents from your own network. To get started, go to teamcity.com, create an account, and get twenty hours of build time for free. Once again, that's teamcity.com. And as a bonus,
if you want a personal introduction to TeamCity Cloud, just use the contact us form on teamcity.com and write that you came from Software Engineering Daily, and the guys from JetBrains will get in touch with you and help you through your CI/CD path for free. Thanks for listening, and thanks to TeamCity Cloud for being a sponsor of Software Engineering Daily. Are you building cloud applications with a distributed team? Check out Teleport, an open source identity-aware access proxy for cloud resources. Teleport provides secure access to anything running somewhere behind NAT: SSH servers, Kubernetes clusters, internal web apps, and databases. Teleport gives engineers superpowers. Get access to everything with single sign-on and multi-factor. List and see all SSH servers, Kubernetes clusters, and databases available to you. Get instant access to them all using the tools you already have. Teleport ensures best security practices like role-based access, preventing data exfiltration, providing visibility, and ensuring compliance. Best of all, Teleport doesn't get in the way. Download Teleport at softwareengineeringdaily.com/teleport. That's softwareengineeringdaily.com/teleport. Well, now that we've teed up this conversation: so you are trying to be the Heroku for Kubernetes. It's an admirable goal. Thinking about the Kubernetes platform things, I mean, the closest thing really seems to be Rancher. I think the biggest problem with Rancher, to my mind, and I like Rancher, I really like Rancher as a company, but my biggest problem with Rancher is I think they had to re-platform. I think they were originally on their own thing and they re-platformed to Kubernetes, and that to me is a very difficult challenge. I would assume that they're still trying to get that to work. They used to have their own, I think they did at some point, their own orchestrator. Yeah, they were around during this time and they saw the writing on the wall
and they pivoted to Kubernetes. So I've seen a few tries. I think for you it was what, Rancher? Who was the other one? There was the other one that Microsoft acquired. Deis. That also was not Kubernetes at first. That was just a Heroku-like experience. There was also... anyway. Yeah, go ahead, go ahead. Railyard was, well, no, not Railyard. Anyway, sorry, go ahead. There's many of them. But I think it's still a problem. One of my first jobs was working at this company called Tutum, then bought by Docker, during the container wars, and it was kind of the same thing: I have a container, run it, don't make me go through infra choices, just run it. I think it's still a goal. I think the challenge with Kubernetes, and one of the main reasons why we decided to not get into the Heroku-like thing and focus on what we're doing, is that Kubernetes is a lot broader than other platforms, so requiring you to have an application that looks very specific doesn't work for Kubernetes. With Heroku it was: your app has to look this way, you need to have this structure, and it just works. And I think that worked really well for the Rails community because they were used to this kind of enforcement. With Kubernetes, people are running everything. You have people building sophisticated ML workflows, batch jobs, containers, APIs, everything. You have Helm, you have manifests, Kustomize, all these things. So I think for anyone who is trying to be the platform right now, it's a big challenge, because you have to push both the platform and the application format. Maybe in five years, if we all as an industry agree on an application format, it might be easier. For us, it was very important as we were building Okteto to build it in a way that you can bring in any application and use it. So Okteto supports everything: Helm, kubectl, Docker Compose, single containers. We don't care about the format of your application.
You can bring everything, and you still benefit from all of these workflows we've been talking about. And I think that's what really sets us apart from the Heroku-likes of the world. I hope someone nails that format, because I think it's needed. Every time I go back to writing Kubernetes manifests, I'm like: why are we doing this? It's so complicated. I do not see the purpose of the world doing this every day. So hopefully someone will come up with that format. But until then, I think platforms like ours, which are more liberal in what they accept, have a much better chance of success than the more forcing, you-need-this-shape kind of platforms. So again, if I'm looking at the productivity features, actually, let me ask you this: what do you think of the term GitOps? I like it. I'm a big fan of doing things with Git. It's funny that it took this long to get this broader adoption, because when I first heard of it I thought, yeah, but this is not new. It's been going on for a very long time. Go back to Heroku, and there were other kinds of Git-based deployments back in the day. So it's interesting that it is now becoming such a big thing. But I think one of the main reasons for the emergence of GitOps is that Kubernetes manifests are so hard to manage that Git makes it a lot easier. Having your Helm release file in a repo, and a merge that just triggers a deployment, gives you this very nice separation between development and production. It gives you a built-in audit log, because you can see Git's history. I think it's definitely a step forward in DevOps and this whole journey with infrastructure as code and all that. And I feel like anything that makes it more accessible to everyone is great, because I think one of the downsides of the DevOps evolution we saw over the last ten years or so was that it was still very much gated to anyone who had an operations background. Using Chef or Puppet was not for everyone. Then
Docker came, and now GitOps makes it more accessible: you commit and it gets deployed. That's something that any developer who uses Git understands, and I like that a lot. Going back to the topic of more and more developers joining the workforce every month, every year: the more accessible our tools are, the more everyone can be productive and the more they can build stuff. I don't want people to waste their time trying to figure out a [inaudible]. I want them to build cool software and solve problems. How do you go from this place that we talked about a little bit earlier? I see you basically as providing a lot of extra sauce on top of these hosted Kubernetes platforms on, name your cloud provider. How do you go from that to the Heroku of Kubernetes? Or are you already the Heroku? I would like to say we are, but we're not there yet. I think for us the path forward is really making sure that what we're building enables developers. What's very important to me is that we want to make it easier for those who have no experience with operations or Kubernetes to build cloud native apps. I think that is the path forward: taking the spirit of Heroku, which is, we're going to give you tools that are very well integrated, that are very easy to understand and very easy to use, but at the end they are just tools. Success is going to be if people are using those tools to build software. I think that's the dilemma with developer tools, which is that dev tools are a means to an end, right? You want people to use your dev tools not because of the tools but because of what they can accomplish with those tools. So for me, if we can ever claim, hey, we're the Heroku of Kubernetes, it would be because we have all these users who are building amazing stuff on top of Kubernetes and don't even care that it's Kubernetes, just that: you have great tools, they enable me to go fast,
to iterate, to be productive, and to really be able to express my thoughts in software. For me, that is really the path forward. And today we really focus on building these blocks and giving you deployments and dev environments, automating as much as we can, and as our platform matures we will be getting more into other parts of the software development life cycle: how you run your tests, how you set up your environment, and all this other stuff. How do you get people using this? Is that hard? I'm guessing... Now, it was hard at the beginning. We were part of the Docker and Kubernetes community, so our early days were very focused on talking to the community and trying to explain why the problems we were solving were worth solving and why it was good for them to adopt those tools. Now, as more and more people have started building microservices, containers, Kubernetes, that pitch becomes easier. I think right now the biggest challenge for companies like ours is that there are a lot of tools, a lot of companies building things for Kubernetes, so from a developer perspective it can be hard to tell which one you should be using. Every tool is good for some things and not so good for other things. So definitely the challenge for us is: how do we explain to our users and to our community, hey, we are really good at these five things; if you have this kind of problem, you should use Okteto, it will make your life easier. And that is the challenge, I think: condensing. I was talking to somebody earlier today, and we were talking about how hard it is for companies to really condense what you do into a landing page or a single paragraph. It's a challenge
we have, but it's something where we're always talking to our community, talking to more developers as well. And that's what I love about this new class of companies that are building communities, that are building on open source. It just changes the conversation compared to the old-school sales approach, so that makes it easier. What is the biggest problem for the business today? Is it expanding sales? Is it going deeper with your existing customers? Is it engineering challenges? Is it marketing challenges? What's the most difficult part of your job today? It's a two-fold challenge. One is from an engineering perspective. One of the biggest challenges as you build a platform like ours is that we have all these ideas we could build in twenty years, but knowing which features to build and how they fit together is a big challenge. You can have a CLI with thirty-five commands, but if they don't make sense together, or a dashboard that has all these panes that don't have anything to do with each other, you're doing a disservice to your user. So one big challenge for us is how we enable all these scenarios while keeping a useful experience. And the other big challenge today is marketing. It's how do we get people to understand the value of Okteto by visiting our website. Once we have developers using our tools, they get the value. They run through our gallery of samples, they try their own apps, they see the benefits. But the challenge is, when you have a blog post, I only have five seconds to convince you to try it out. So how do I do that? And that's a super interesting challenge. I come from an engineering background, so I never focused on this part of the business until now, and we have a great content designer on our team who handles all this so much better. But it's still a challenge, and it's something that I feel will continue to be a challenge as the company matures. Let's step back.
You have been in the business for a while. You've got a great resume, a lot of interesting stuff. So I just want to get your unique perspective: when you look at the world of backend infrastructure today, what is exciting to you, and what do you see around the corner? One of the things that I'm super excited about is the whole Wasm thing, computing at the edge. I'm still a big believer in serverless. I think the current version of serverless is still kind of not where it should be, but you can now run these very complex computations in places that are super close to where I need to consume them, in a way that's easy to manage and easy to support. It's super interesting stuff, where you can run business logic in a secure way on the edge of the CDN. It has to change how we see computing, because now you're moving from having everything in this huge data center to: hey, I'm going to split my application into these super small pieces spread all over the world, and that's where my users are going to consume it. It's super interesting from an academic perspective and from an operations perspective. Just imagine how you do a rollout when you have functions in, I don't know, one hundred, two hundred points all over the world. How do you ensure versions are compatible? It's a super interesting challenge, and I think it's an unsolved problem, so that's something I'm super interested to see evolve. And then there's the theme of, going back to: I have code, I want it to run efficiently and fast without me having to do anything. That for me is a very interesting place to be. I think of the work [inaudible] is doing, Knative, Lambda. It's definitely going in the right direction. I can't wait to see what the next generation of those looks like. Awesome. Okay, now, in the last five minutes, I get to ask you the stupid question. I was at a Kubernetes conference three or four years ago
when I realized something: the Kubernetes people and the blockchain people don't talk to each other. Why don't they talk to each other? They don't seem to even have respect for each other. They might even have disdain for each other. What's going on there? That's a really good question. I never thought about it, but you are completely right. I don't know. I feel like it's just so funny. It just feels like the blockchain people think that the Kubernetes people are squares, and the Kubernetes people think the blockchain people are crazy losers. That's what's going on, I think. I think there is something there. I feel like Kubernetes became enterprise so fast, right? Yeah. I remember the same thing going to KubeCon. I went to the first two KubeCons, and [inaudible] it was so good. I know, it was really cool. And then from one KubeCon to the next, I think it was maybe Austin to Seattle, it was like every major vendor was sponsoring. You had this huge hall, every well-known brand trying to do things that didn't even make sense, but they were doing it with Kubernetes. And it was like, wow, it just crossed the chasm super fast. So I think the blockchain people can see themselves as countering this, almost like, you know, going back to the Apple analogies, kind of the pirates versus the navy. I think they see Kubernetes as this enterprise, stuffy thing, and they see blockchain as this crazy wild west of things with no rules. But I honestly feel like they could benefit from each other, because Kubernetes gives you compute, which is what you need for blockchain. So it feels almost like a match for each other. But you're right that they don't like each other. Anyway, Ramiro, this has been a great conversation. Anything else you want to add, anything else you want to talk about? This has been great.
I just wanted to invite everyone who's listening to try out Okteto. We're a developer tools company built by developers. A sweet spot: what's the sweet spot? Like, what's the biggest pain point that you hear people say, oh, Okteto saved my life because of X? What do they say? Right now, and this sounds kind of funny, but actually we have a lot of customers doing this: if you have a Docker Compose that's too big to run, five, six services, a couple of databases, give Okteto a shot, because just moving your dev environment from running Compose locally to running the same thing but on our platform makes you go faster and makes your battery last twice as long. It's a really cool use case. So if you're struggling with Docker Compose locally and your machine kind of slows down to a snail's pace when you run docker compose up, try Okteto and you'll see some really quick benefits. Well, Ramiro, thank you so much for coming on. It's been a real pleasure talking to you. It was a great, fun conversation. I love talking about the container wars, Heroku, all good stuff. Do you think there's going to be another container wars? Are we ever going to have that again? I really hope not. Oh, I'm sure we'll have whatever the next container looks like. What is that, WebAssembly? I was thinking about WebAssembly. Maybe something in data engineering. The Airflow orchestrator; look at Airflow. You know, maybe that could be another one. There's a lot of things going on in data orchestration. I know we had container wars, we had VM wars, we had [inaudible] wars at some point, so I'm sure there'll be something. As engineers, we love to argue with each other and love to pick winners. I think the data orchestrators are going to go to war. I think Dagster, and not to amp it up... I'm going to amp up the cage match, because the cage match is going to be aired on Software Daily. Yes, it's going to be bloody: Dagster versus Prefect.
Airflow's like a passive observer, winning, having already won the game. That'd be interesting. I'm looking forward to hearing that episode. Because I don't think Airflow... Airflow has proliferated. There are businesses around Airflow. It's never going to go away, I don't think. Anyway. Okay, well, great to talk to you. Have a great day, Ramiro. Talk to you soon. Thanks for having me here. It was fun.

Kubernetes Best Practices

Arrested DevOps

40:00 min | 2 years ago


"It's a powerful tool, but it's also... It's time for Arrested DevOps, the podcast that helps you achieve understanding, develop good practices, and operate your team for maximum DevOps awesomeness. I'm Bridget Kromhout, and I'll introduce our guests after a word from our sponsors. The worst thing about the Arrested DevOps podcast is when it ends. You're left wondering what to do next. What are you going to listen to on your commute home? How do you occupy your time when walking the dog? What are you going to listen to during the quarterly all-hands meeting? But fear not, dear listener. There is a solution: you need to subscribe to Software Defined Talk right now. It's a weekly podcast that recaps all the news in cloud computing, DevOps, and enterprise software. The hosts, Coté, Matt Ray, and Brandon Whichard, will keep you up to date on all things cloud while offering tips on how to optimize your Costco haul and how to PowerPoint. It's a fun, free-flowing conversation that will keep you entertained and informed. What are you waiting for? Subscribe to the podcast today by visiting softwaredefinedtalk.com or by searching for Software Defined Talk in your favorite podcast app. Okay, it is super exciting to be chatting with all four authors of the forthcoming Kubernetes Best Practices book from O'Reilly Media. I'll have them introduce themselves in, say, byline order. Like, who are you, and what makes you want to write books about Kubernetes? Let's start with Brendan, if we haven't lost Brendan. You know, I think that I've written a few books. I guess it's a legacy of once upon a time being a professor. I really like to teach people. I guess maybe it's a byproduct of the fact that I'm excited about people using an empowering technology, and if you don't teach them how to use it, it won't happen. So, you know, I write because I like to teach. Awesome, love it. Okay, great. Eddie?
Hey, I'm Eddie. I've been at Microsoft for ten years, actually, as of last week. Being able to really understand what customers are looking for, what people are looking for, and my knowledge of the product has really driven me to spread that wealth, that knowledge, to others. Dave and I have both been in the field for quite some time, seeing what can go wrong, and epically wrong, and how we can possibly get that out there, even for small startups that don't have the luxury of getting help from companies like Microsoft, so they can really look at the space of Kubernetes and cloud native and get the knowledge that the big folks learn really quickly, from bad experiences and good experiences. That's why I'd love to get this written. My name's Dave Strebel. I help customers daily be successful with Kubernetes. I won't lie, I never had aspirations to really write a book. But when the opportunity presented itself... One of the things I do like to do is help break down complex technologies and make them simpler, to help people really understand those technologies. So I was really glad that I did help write a book. It really changed my mind on how to really help people with technology. And Lachie? Hello, my name's Lachlan Evenson. I wanted to write the book specifically because I wanted a chance to give back to the community. I have learned so much from the community and its people. I remember in the early days approaching Brendan, in like late twenty fourteen, early twenty fifteen, and he stood in the hallway and answered all my questions. And that was kind of a really... you know, the community steps up and answers these questions. So one is, I wanted a chance to give back. The other one was, I wanted an opportunity to write a book that I wanted.
So I've been through the journey of growing up in the ecosystem, and I would've loved in twenty fifteen to have this kind of almanac of all the things that you should do when looking at Kubernetes, short-cutting the decision points you might have to make when building and operating Kubernetes. So I think this provides that level of, you know, getting from the problem to the answer really quickly. So that was my excitement with having the opportunity to write. Very nice. So for any of our listeners: if you acquire a time machine, instead of bringing a sports almanac to the past, you should bring this book and give it to Lachie in twenty fifteen. Action item for listeners, who of course all have to go get the book and read the book and then get it put in your time machine. Okay, so I think a lot of our listeners may be using Kubernetes, which is awesome, and some of them are like: that is all I hear about, but why? What even is this Kubernetes? Where did it come from? Brendan, you're probably responsible for something. Why don't you give us the five-second elevator pitch? What's going on with this Kubernetes thing, and why do we need best practices for it today? I think that it has moved on from being something that people heard about somewhere to something that everybody is looking to implement. But I think that our experience building managed services is that, while people are convinced by the value that the tech can bring, they sometimes struggle in figuring out how to accomplish the particular task that they want to do. And, you know, as Dave said, we work with a bunch of people hands-on, but that's not super scalable. And so I think that writing down the best practices that we've seen out there can up-level the knowledge of the entire community. We've seen lots of people shoot themselves in the foot.
It's a powerful tool, but it's also kind of a [inaudible], so that's why we write these things down. And I think, in contrast to other books that other people have written at times, which explain things in general, what's great about this book is it's really focused around specific topics. We don't expect you necessarily to read it cover to cover, but rather to dip in on a topic when you're working on, say, machine learning, or you're working on another aspect of the project and you want to read a quick summary. Like, what should I do? What should I be thinking about as I approach doing machine learning on Kubernetes? Or what should I think about as I approach setting up a cluster for a bunch of developers? So it's maybe more a series of short essays rather than a whole, held-together book. [inaudible] So people shouldn't expect narrative flow or narrative structure necessarily, but this is something where they can go directly to what interests them. So I'm curious, and maybe this is something that Lachie and I were talking about a little bit: what makes now the right time to do a best practices book? For me, it was that now there's more adoption of Kubernetes out there, and the ecosystem has become more complex as a result of everybody using it. So there are a bunch of tools; Kubernetes itself has quite a sprawling variety of different [inaudible]. So when it comes to, how do I solve policy with Kubernetes, for example, it's nice to have: here is a way to start, here is how the community is approaching something like policy, and to give you all the pieces that constitute that specific topic. So I think we've covered in the book all those touch points as you go on your journey with Kubernetes, from deploying a hello-world service to doing rolling upgrades, looking at policy, looking at governance, to looking at security.
We've covered these pieces in detail in each of the chapters, and I think that's what the community needs right now: what are the aspects I need to worry about when using Kubernetes throughout my journey? So whether it's your first time using Kubernetes or you're in a large enterprise using Kubernetes and it's like, I have this new thing, as Brendan said, it should be that reference on your desk, and you can get value out of it at any point. But I think short-circuiting the time it takes to make a reasonable decision about a specific topic is why best practices are out there now. Also, we've got some miles on Kubernetes in the real world. Dave, Eddie, myself, and Brendan were out helping customers, and we're seeing a wide variety of questions come in, so we've used that to guide how we wrote the book, to give people that reference material: here are the problems I'm going to need to solve, here are reasonable ways to solve them. So I just think that maturity in the ecosystem has led to this being the right time to write the book. I want to echo Lachie: the maturity is there as well, in the sense that there are a lot of organizations in the community at large that are already down the path of these projects, and they don't want the step-by-step walkthrough. That's all over the place, right? You see it on the Internet everywhere. They need: hey, if I'm looking at this and I'm going to my project manager or going into a scrum meeting, these are the few things we need to make sure we look at while we're focusing on a specific topic. There hasn't really been a place for that yet, and I think this book really hits those points. It's just a great little book to say, hey, I'm looking at networking: open it up and it gets you to the spot right there. Or policy, same thing. So definitely the velocity of both the community and also the velocity at which Kubernetes is changing.
I think what we did a really good job of as well was to make sure that we covered topics based on what we know about the project and what we know about how it's moving in the future, and tried not to make it a snapshot in time, this version or that version of Kubernetes, but best practices that should follow through based upon the mission and the focus of what Kubernetes is going through. Being part of the community and being part of the contributors helps us keep that mindset. And to Lachie's point on the ecosystem really growing and becoming a lot more complex, I think there's an understanding that users need to have, to focus on those basic core concepts within Kubernetes. They tend to skip over those and kind of over-engineer their environments and layer on more complex technologies, but really focus on just understanding those core Kubernetes concepts. You need to learn and understand those things before layering on other technologies. Just in hearing what everybody else said, I really like the mix of philosophy in this, because people are signing on to: hey, we've gone to DevOps, we've gone cloud native. This is a continuum, a journey. So the book is a great mix of philosophy, so why you would want to do this and the problems it's aiming to solve, and kind of tactical, how you would solve them in Kubernetes. It's that mix that makes it a good read. So it's not just how to do policy on Kubernetes; for example, it's why would I want policy and what am I actually trying to solve as part of this, in that continuum of the cloud native ecosystem and moving workloads to the cloud. That's what matters.
Okay, so I feel like we've talked about your policy chapter a couple of times, and I do want to go into detail about a few of your favorite chapters, or at least some of the chapters, because you each wrote several chapters in this book and we can't, of course, give our listeners a thorough overview of every single topic. I have a PDF in front of me; it's two hundred fifty-eight pages. So let's talk about your policy chapter specifically. What stands out there? What would you tell people that might make them say, oh, that's for me? Well, for starters, it's chapter eleven, and everybody loves hearing chapter eleven for anything, so that's great stuff. But I think in the ecosystem, as a lot more enterprises are coming on and looking at Kubernetes as a platform to move their workloads to, they have this outstanding question of: how do I deal with policy and governance? What this really is about is: how do I make sure that the workloads that I deploy are conformant to some policy that we've set up, whether you're in a heavily regulated environment or you just want to understand how things are configured? It's such a new concept in the ecosystem, because obviously policy isn't the first thing you address when building something like Kubernetes, but now we just see a real uptick in people asking: how do I deal with policy? So it was fun to write that chapter from my perspective, to give people an overview of where the ecosystem is at, where the tooling is at, and the things that you might be able to achieve with policy. So that was my favorite chapter to write, although it was fun to write them all. But I think this is a really, really salient question in the community right now, which is: how do I actually get control of my workloads and make sure that they are compliant? So I hope everybody appreciates the context.
It's given in that chapter. And I think that's fantastic too, because if you think about it, if there is anything that enterprises that have actual customers and money care about, it's making sure that everything that they're trying to control does get controlled in the expected way. So it's nice to see the Kubernetes community actually focusing on that. I know there is the open source project Gatekeeper in that space, and I think that's one of the examples that you go through in the chapter, if I'm not incorrect. Yeah, that's correct. And the great thing about something like Gatekeeper is it's built on top of OPA, which is Open Policy Agent in the cloud native ecosystem. So we have a generic policy controller, or a policy engine, that people can use, and things like Gatekeeper make a Kubernetes-native implementation of OPA. So there are a lot of ways policy can be expressed, but I think the high-level philosophy of how you might want to achieve policy on Kubernetes is illustrated, and then there's this kind of tactical: how could I actually do this today using some open source tooling in the ecosystem? All right. And so, other chapters that people were interested in highlighting. Dave, tell us about your favorite. My favorite was resource management. It doesn't sound really exciting at all, but it's something that I see users struggle with a lot, a lot of the users I work with, and it has a lot of impact on other things besides just running the actual workload. It also has a lot of impact on things like scaling within Kubernetes. So I really liked the chapter, because I think I learned a lot myself from it, things I didn't know around technologies that are in Kubernetes that you should really think about more when deploying workloads. So I'm excited about that chapter, even though it doesn't sound exciting, but it really is a core thing that you have to really understand for running workloads.
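To make the resource management point concrete, here is a minimal sketch (not from the episode or the book) of why requests matter for capacity: Kubernetes schedules pods against the sum of resource requests on a node, so a node can be "full" long before its actual usage is. The `Resources` type, the `fits` helper, and all the numbers are hypothetical and only illustrate the arithmetic; the real scheduler considers much more.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Resources:
    """CPU in millicores and memory in MiB (illustrative units)."""
    cpu_millicores: int
    memory_mib: int


def remaining(allocatable: Resources, requested: Sequence[Resources]) -> Resources:
    """Capacity left on a node after subtracting the requests already placed on it."""
    used_cpu = sum(r.cpu_millicores for r in requested)
    used_mem = sum(r.memory_mib for r in requested)
    return Resources(allocatable.cpu_millicores - used_cpu,
                     allocatable.memory_mib - used_mem)


def fits(allocatable: Resources, requested: Sequence[Resources], pod: Resources) -> bool:
    """True if the pod's requests fit in what is left; usage is deliberately ignored."""
    free = remaining(allocatable, requested)
    return (pod.cpu_millicores <= free.cpu_millicores
            and pod.memory_mib <= free.memory_mib)


# Hypothetical node and workloads, chosen only to show the arithmetic.
node = Resources(cpu_millicores=4000, memory_mib=16384)
running = [Resources(1500, 4096), Resources(2000, 8192)]

print(fits(node, running, Resources(250, 512)))   # small pod still fits
print(fits(node, running, Resources(1000, 512)))  # CPU request exceeds what is left
```

In this sketch the second pod is rejected on CPU alone, even though plenty of memory remains, which is the kind of behavior that is easier to reason about before a cluster runs hot.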
It actually sounds sort of exciting from the perspective of your on-call for production, because resource overruns are what's going to page you. So people love the idea of having things happy and well-controlled, so they don't end up with: gosh, wait, we had to save memory for the control plane. Interesting. I feel like Kubernetes has some guardrails in there for that, right? Or is that kind of up to how you set it up? Yeah, there are different guardrails, and the book kind of hits on those best practices around setting up things like requests and limits, which have a huge impact on how you run workloads and how your workloads are going to behave when you run out of capacity. Wait, you're saying that the cloud is not infinite? I wish I had this chapter back in 2015, because I would say that most of the outages I was paged for in the early days were around this. I think most people trip across resource management in the early days, as that cluster gets to eighty, ninety, one hundred percent, and it's hard to deterministically understand what's going to happen when things start running hot. It's really great context for you to be able to put guardrails in place very early, rather than having that 3 a.m. page in which the cluster is going into cascading failure. That's something that I've personally seen happen, and resource management would have been a great chapter for me to get started with back in my early days with Kubernetes. Eddie, tell us about the chapter you found the most interesting to write. Not only interesting but the most challenging, because I had to fit a lot of stuff into a concise format, is chapter nine, which covers networking, network security, and the new buzzword, service meshes. And I think the reason I like it a lot is because it is the.
It's the foundation, right? Without getting this right, without getting networking correct, your architecture begins to fail right away, especially when we start looking at hybrid platforms, when we start looking at cloud-based platforms and integrating them with very complex enterprise systems. In complex enterprise systems, little tiny things, DNS, the move from kube-dns and SkyDNS to CoreDNS and how that's configured, you know, that blew people's minds the first day, like: oh, this worked before and now it's not working. Little tiny things, especially in the enterprise space. We saw Kubernetes had this kind of huge phase of, like, we're just playing with it, and now we've got to see it in production, right? And in some cases, like a customer I'm working with, a lot of divisions just decided to put it in and not even tell anybody, and then later security comes back like: what is going on here, and what did you do? All my controls are gone. What do you mean? Why is this not being seen by our central security stack, and how do I get those? Some enterprises come full steam ahead and say we need to treat this as if it's just another node on our network, but when it comes to the Kubernetes space, there are these things, a new paradigm that we have to understand, right? North-south traffic is now handled potentially inside, not on a device. We talk specifically around network policy agents, integrating with your CNI, some best practices around choosing those tools, and then once you have that foundation, everything looks hunky-dory. Now, immediately the next conversation is: well, someone told me I need a service mesh. Because it's a floor wax and a dessert topping. Exactly. You know, what's the office supply company that had that little easy button, right? I don't want to have to rewrite my app, but I want observability.
I want security and I want policy. I just want to press this little button. And what they're finding out, unfortunately, is it's not so easy. There are thousands of little buttons that you have to press in the right combination. But the reality is it's not that hard when you start to break it down. And again, because service mesh is new and I did not want to make this a point-in-time consideration, I talked more about generalities: everybody agrees that service meshes should do certain things correctly, right? And if we break those down into what those things are, and decide why you need those things and what the priority is for your application stack, you pick your vendor from there. And then I talk a little bit about the SMI spec, which kind of helps level that, right? It's that idea of: let's get a common agreement on those common things that all service meshes should do. And as service mesh creators start writing modules, they say: hey, if I meet this, I know that I'll have this available to everybody at this level, and then I can add my value-adds from that, right? So I kind of cover that, and I want to make sure people are aware of what to look for when they're trying to decide on something as critical as a service mesh for their technologies. So, Brendan, I know you don't want to pick a favorite child, but if you want to give us your thoughts now that you've completed this book and the parts of it that you guided and led, what stands out for you? What should people definitely flip to and read? The first chapter is sort of an intro, like: how do I even just lay out a service? And so I think some people may come at the book having already done a bunch of the basics. Like, if you come at it and you've learned all these Kubernetes objects and then you're like: wait, okay, how do I actually put it all together?
That chapter is a great one to make sure people start on the right foot. I actually like the one that follows that probably the best, which is about how you set up developer workflows on the cluster. Because I think sometimes a technology comes in and the operators love it, it's great for delivery, but we've made the developers lose control. But I think there's a lot that you could actually use Kubernetes for, not just for operations and running the application better, but to actually make developing applications easier as well. It talks about how you could partition with namespaces, and takes a look at the ways in which people get started using it. There's the case where you've just hired a new developer, and let's onboard them and set up the environment they use. There are things that people can accidentally destroy, which is handled with RBAC and things like that. There is providing sort of cluster-level services, so that people don't have to learn about logging, logging is just there, or learn about monitoring, monitoring is just there, and things around testing and debugging that are critical flows that change a little bit. When you're not just doing your development locally on your machine but you're using this cloud-based resource, or a cluster-based resource, testing and debugging are different, and it's important that we make sure that that's easy for people too. I think especially with testing, because if it's not easy people will do less of it, and then you ship buggier software. So a really critical component of success is actually getting the environment set up before you ship software, the place where you build the software. And I think it can be sort of an overlooked or underappreciated part of the thing. We think a lot about production, but what goes into production comes from a place where we develop and a place where we test. So that's a chapter that I like.
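As a rough illustration of the onboarding idea above (partitioning a shared cluster with namespaces so a new developer gets their own space), the following sketch builds the two manifests one might apply for a new hire. It is an assumption-laden example, not the book's recipe: the `dev-` naming scheme, the `developer_namespace` helper, and the choice to bind the built-in `edit` ClusterRole are all hypothetical.

```python
import json


def developer_namespace(username: str) -> list:
    """Build a Namespace and a RoleBinding manifest for one developer.

    Assumes `username` is already a valid DNS label (lowercase letters,
    digits, and hyphens); no validation is done in this sketch.
    """
    namespace = f"dev-{username}"  # hypothetical naming convention
    return [
        {
            "apiVersion": "v1",
            "kind": "Namespace",
            "metadata": {"name": namespace, "labels": {"owner": username}},
        },
        {
            # Grants the user edit rights inside their own namespace only.
            "apiVersion": "rbac.authorization.k8s.io/v1",
            "kind": "RoleBinding",
            "metadata": {"name": f"{username}-edit", "namespace": namespace},
            "subjects": [
                {
                    "kind": "User",
                    "name": username,
                    "apiGroup": "rbac.authorization.k8s.io",
                }
            ],
            "roleRef": {
                "kind": "ClusterRole",
                "name": "edit",
                "apiGroup": "rbac.authorization.k8s.io",
            },
        },
    ]


manifests = developer_namespace("alice")
print(json.dumps(manifests, indent=2))
```

Because the binding is scoped to the developer's namespace, mistakes stay contained there, which is one way to read the point about people accidentally breaking shared resources.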
Well, every chapter, you know. That's awesome. That's a really good point, actually, because of course my mentality is focused on production, but if the people who are creating the production experiences don't have good onboarding and don't have good reproducible environments, then how are they going to produce good production experiences? I like that, the idea of making Kubernetes workflows for developers, whether it's shared clusters or instantiating at will or whatever it is that they're doing; making that more reproducible, making that easier, is really valuable. I remember reading that section when I was originally reading through the book and thinking: oh, there are a lot of good ideas here, and maybe the hard question is this: how do you get organizations to use them? For those of you who are out there dealing with people in the field, do we mostly see people doing the stuff that you're recommending here? Well, actually, I think we wrote the book because I think people want to, right? I don't think that they're like: no, we really want to provide a horrible developer experience, or we want to slow down our developers, or we want to do a bad job with service mesh. I think there's a lot of desire, but the recipes just aren't there necessarily. I agree one hundred percent with Brendan on that. I mean, the customer I'm focused on right now is going through that journey, and it's twofold. It's like: hey, we get to reset some of this bad stuff that we did before because of institutional processes that were put in place long before ninety-eight percent of the developers that are working there now were there. And they get to kind of do a quick reset and say: now we're going to this new cloud native world, and how do we have this reproducible
process, right? They even created their own little new division; basically they're looking at patterns and practices, it's a dedicated patterns division, and their whole job is just to say: how do we put patterns together for our developers and our operators, to say our new world is all Kubernetes, it's all cloud native, how do we make it so it's real easy to onboard these folks, so they don't have to change the way they do things, and we automate? That's really the beauty of DevOps, right? How do we automate everything and get it working? And that's the goal of this whole new team within the company itself. There's a really great point here too, because those workflows have to change, and one of the things I found interesting working with a lot of customers is that there's a big cultural impact to all that. A lot has to change to be successful with Kubernetes. I've just found that really interesting, how the culture, how you work, needs to change too when you start adopting Kubernetes. And how is that different from any of the other patterns that they were using with their config management or their VMs or whatever? Yeah, I think the biggest thing is now you leave a lot of control up to Kubernetes, things that you had very tight control over before. It's no different than kind of adopting DevOps culture. A lot of those things are going to have to change, just how you operate and the controls you hand over to developers to kind of empower them. I think the only thing I'd like to add is there's no right answer about the best way to do this, but there's a set of tools and techniques that we've all seen that have worked in different kinds of ecosystems and environments. So this book goes into providing that set of blueprints, which help shape the decisions you need to make when adopting this kind of technology or making cultural changes, which I think is incredibly valuable. Without this, you're left scrounging around.
What is the best way to push an app, and deploy and develop on things like Kubernetes? Or do network policy? So this scopes down the touch points you would have to consider, based on things that we've actually seen out there in the wild, working with different customers or community members. Okay, so we're a little short on time, so what I would love to do is get each of you to give us your best advice for best practices. Like, you wrote a book all about best practices, and I don't know if you want to touch on it depending on context, or how people evaluate and make the decision. But I'll just take you in reverse byline order. Let's start with Lachie and say: what's your best advice to people who are trying to have their Kubernetes best practices and eat it too? Yes. For me, when I was actually responsible for a platform that was built on Kubernetes, it was coming to the understanding that this is a journey that will keep going. It's not a destination. There will never be a point where I look at the whole system and say it is perfect, I no longer need to touch it. So you're always adjusting and reevaluating the decisions you make. So best practices, for me, is around guiding you to make a decision, because I also see a lot of people out there who are paralyzed by indecisiveness, because there are so many complex things that they can't grok all at the same time. So taking these best practices and actually making a decision and moving forward is, I think, one thing that people that read this book would be able to get out of it. The other thing is just you should be able to, no matter
what stage of your Kubernetes journey you're at, reference this book throughout time and get something else out of it, and continually reevaluate the choices that you've made and make constant adjustments. Best practices is something that you can always refer to, and it always gives you those nuggets of wisdom that you can take to course-correct and change things out there in your running environments. Excellent. And put a paper copy in your time machine DeLorean. Gonna put it in your DeLorean and make sure Biff doesn't steal it from you. Yeah, that's right. Can you imagine? We thought the sports almanac was valuable, but if somebody had this book... Right. Dave? Yes. The biggest thing I always stress is you really have to walk before you run with Kubernetes, meaning that you really need to focus on those core concepts in Kubernetes and the core constructs that are available to you in Kubernetes before layering on a lot of complex technologies. Because I think we overcomplicate our environments. I don't think Kubernetes has to be complicated. I think a lot of times we make it over-engineered and complicated because we have great tools like Helm, where I can give one command and have a service mesh up and running, but there are implications to these new complex technologies. So really focus on those core capabilities that are built into Kubernetes, get really good at those, and then iterate and add some of these really useful technologies like service meshes. Oh, Dave, I hear you killing my service mesh dreams, or at least saying: then you can't start with that. Maybe start with a little bit more basic before you jump into really advanced. It may not be a great starting point for day one. Get good at those things, like network security, resource management, policy. All those things are really important to be successful with Kubernetes. Eddie, what do you got for us?
I think obviously echoing Lachie's and Dave's sentiments, definitely. But I think another one is, especially around cloud native and Kubernetes, if you're coming from a long background in other ways of doing things, especially on premises, the old kind of server and VM processes that you did before, keep an open mind. The best practices you had there may no longer fit today. And really try to keep that open mind about how things work, especially when we are, I think Dave said it perfectly, giving a lot of control over to this new thing that's now kind of this data center operating system that's managing our systems. Be open to just a new way of doing things, because I think the biggest challenge that Dave and I have in the field is the bias: no, that's not how I did things, that's not how we do things. It's like: okay, but this is how you're going to have to. And then start layering, like: let's start from the simple, let's work our way up to the complex, as was mentioned. All right, Brendan, tell us what you got. A lot of it's been covered. I think my biggest thing, just to echo, is to make sure that you understand why you're making every decision and why you're adopting every technology, even Kubernetes itself. There should be a reason why you're doing it, not just because it was in the CIO or CEO magazine that everybody needs a Kubernetes strategy. You need to understand the system well enough, before you even start, to understand why you think it's valuable to you. What is the pain you're trying to improve? And I would say that for every single technology. I mean, a lot of times people say: well, in order to do Kubernetes I need to have X, Y, Z also. And you're like: well, actually, why? I see people who are hosting simple websites behind an ingress and a service and a service
mesh. Are you sure that you needed all that stuff? In the end, remember that all of that adds operational complexity, especially because I think it makes things real easy, makes it super easy to deploy. But it actually doesn't help you understand the system. I think it used to be that systems were sort of hard to deploy, and so you'd learn the long way. And I think one of the dangers with Kubernetes is that some things that are actually very hard to manage in production over time are very easy to get started with, and people assume that that ease is going to continue throughout the life cycle of the tech, and it's just not the case. Kubernetes is alive. It's a living thing and not a static thing. So it goes to Lachie's point about it being a journey: it could turn on you at any moment, right? Just because it's working today doesn't mean that you're not going to have to investigate something and change something tomorrow. I think that's a little different than something that's been statically deployed to a VM under somebody's desk and running for years. So I think that's an adjustment for people to make. It almost seems like we have the Admiral Ackbar principle there: it's a trap if you think that it's going to be easy, but at least with this best practices book it'll be easier. Okay, so let's bring it on home. Community events and stuff: where can our listeners catch up with you folks? So I will be at All Things Open talking about non-code contributions to open source, and also KubeCon North America, again talking about non-code contributions to Kubernetes. I'm going to be at the Kubernetes meetup in Heidelberg, Germany, coming up in a couple of weeks. I'll also be at Microsoft Ignite in Orlando, if you want to go to Harry Potter world or Disneyland or Disney World.
Yeah, no doubt, right? And I'll be at KubeCon as well, KubeCon North America, and you can also obviously always hit me up on Twitter, @brendandburns. My next event, I guess, is a meetup here in Austin, the Austin Kubernetes meetup, so we have our October twenty-fourth meetup. If you're in the Austin area, please come by. And I will also be at the KubeCon contributor summit in San Diego in November. I may be at Ignite; I'm still deciding. I'm on paternity leave, so I don't know what's going on with my schedule just yet, but I'm back at work on the fourteenth, so I'll know more then. I will be at KubeCon North America in San Diego in November as well, so come and say hi. I'd love to chat about all things best practices or whatever you're doing around Kubernetes. Come up and make yourself known. I also have a YouTube channel where I do OSS unboxings to help people understand all the tools out there and what they might do, so that's a fun way to keep up to date, and you can always ping me on Twitter as well. So I look forward to seeing everybody at KubeCon North America. Awesome. And I will be, let's see, I'll be at Twin Cities Startup Week in a couple of weeks, and then devopsdays Philly, devopsdays Ghent, Velocity Berlin, and KubeCon, of course. And in terms of something fun to check out that totally isn't tech: theoretically today they're out for delivery, Joe and I are getting some VanMoof bikes, so I am very, very excited about those. And apparently the correct number of bicycles to have in your life is n plus one, so this may not be the end to everything, but we now have six bicycles with those, two for me, four for him. So yeah. We also have links in the show notes to all the things people talked about, and the open CFPs for devopsdays and Velocity Berlin are there as well. For many devopsdays, the discount code Eighty Twenty nineteen
will give you a discount, twenty percent off in many cases. If you head over to arresteddevops.com/kubernetes-best-practices, you'll have this episode's show notes. Visit arresteddevops.com/itunes, where you can help other people find the podcast; that is apparently a thing that exists in the world, and I have no idea how that works. We're also apparently on Spotify and iHeartRadio now, if you're into those systems. Thanks so much to Brendan, Dave, and Lachie for joining today. Thanks, good to be here. I'm Bridget Kromhout, and this is Arrested DevOps. Remember, there's always DevOps in the banana stand. Yeah.

Brigade With Kent Rancourt

Arrested DevOps

34:41 min | Last month

Brigade With Kent Rancourt

"We weren't going to be talking about Kubernetes, and yet we've said Kubernetes so many times. It's time for Arrested DevOps, the podcast that helps you achieve understanding, develop good practices, and operate your team and organization for maximum DevOps awesomeness. I'm Bridget Kromhout, with a great show for you today, but first a word from our sponsors. Rootly helps engineers manage incidents directly from Slack without ever needing to leave the tool. They handle all the boring and tedious manual work during incidents, like creating channels, looping in the right people, and acting as your scribe to document that ever-important timeline. Companies from twenty to two thousand manage hundreds of incidents daily on Rootly. It's super simple and easy to use; you can install it in five minutes or less. Visit rootly.io to learn more, and mention Arrested DevOps for one thousand dollars off when you book a demo. Bridgecrew is the all-in-one cloud security platform for developers that automates and embeds security throughout the entire development life cycle. You can streamline your DevOps toolchain into one solution by integrating infrastructure-as-code security and compliance into your version control systems and CI/CD pipelines. Bridgecrew empowers you to find, fix, and prevent cloud misconfigurations faster. Get started with Bridgecrew for free at arresteddevops.com/bridgecrew. The role of a developer or engineer has evolved into a security-first mindset. The ability to confidently build and deliver your software assets across the globe while also avoiding supply chain threats is a priority for organizations to remain successful. Cloudsmith is software supply chain management for modern DevOps practices. They provide a single source of truth for all software assets while integrating with the package formats your team is used to. With a focus on securing your software supply chain, Cloudsmith is truly at the heart of your DevOps ecosystem. To learn more and receive a firsthand
look at their solution, please visit arresteddevops.com/cloudsmith. For once, I'm not bringing you an episode on Kubernetes, or at least not just Kubernetes. I'd actually love for you to tell our listeners a little bit about yourself and what this not-just-Kubernetes story is about. Well, sure, Bridget, thank you for having me. A little bit about myself: I'm Kent Rancourt, a senior engineer at Microsoft, and some other things that I do: I'm a dad, a martial arts instructor, a comic book nerd, and a Lego maniac. There's a lot there that we're going to need to unpack. Like, for example, why is Lego not pluralized as Legos? Is this some sort of European thing or what? I wanna talk about Lego TM. I'm doing it wrong. Yeah, I feel like anytime I talk to somebody about Lego, they offer me that factoid as a: hey, did you know that the plural of Lego is Lego? And I've heard it so many times now. I just don't understand why. Is this like a collective noun thing, or what is it? Is it, like, Swedish? I can't remember now. Anyway, Finland, maybe. They have a big headquarters right near me here in Connecticut. Okay, so yes, you're Connecticut-based and you have opinions about the Kubernetes-adjacent ecosystem. And what is this Brigade project that we're going to talk about here? Yes, so we're going to talk about Brigade, and I'll give you a little background on Brigade. The tagline for the project is event-driven scripting for Kubernetes. So we said that we weren't talking about Kubernetes today, and we'll get into why this is not just Kubernetes, but first a little background on how the project started. I think it's actually a very interesting story. I came from a startup called Deis, which some of your listeners may have heard of, and we were acquired by Microsoft back in, I don't know, it must have been twenty seventeen. And one thing that we used to do at Deis:
we had our annual offsite, because we were largely remote and very distributed, and once a year we would get everybody together. And something we did a couple of times, two consecutive years, was this kind of Shark Tank exercise, where it was like a hackathon, but you didn't really have to produce anything more than an idea and maybe a very, very simple, low-fidelity proof of concept. So two consecutive years we did this exercise, and the first year we did it, the winner was Helm, and the second year that we did it, the winner was Brigade. So Helm and Brigade both came out of the same process at Deis, and the process was kind of making this comparison of Kubernetes to an operating system, which is kind of a popular-ish comparison to make. Because if you stop and ask what an operating system is, it's the program that, once it's loaded into memory, manages all the other programs, all the other processes. And Kubernetes is very much the same thing: it's what you put on your cluster to manage the rest of the cluster. So with this comparison between an operating system and Kubernetes, we started asking ourselves: in a traditional system that you're used to, macOS or Linux or Windows, what are some features or amenities that you have there that don't exist yet in Kubernetes? How can we close that gap? So with Helm, the gap they were closing was: well, Kubernetes doesn't have a package manager yet. And with Brigade, the gap that we were closing was that Kubernetes did not yet have some kind of scripting environment. Quick sidebar, because you mentioned the Helm origin story, and I happen to know that it had a very different name at the outset. It was called k8s place, and then it ended up getting the Helm name. So I'm curious if you ever had a different name for Brigade, or alternately, maybe at the end of this I'll ask you what would be the quirky alternate name
maybe at the end of this pod I'll ask you what would be the quirkiest alternate name you could think of. I'm not aware of there ever having been an alternate name for it. There was no name attached to it when it was pitched. But with Kubernetes having, you know, nautical-themed names for everything, I've always thought that Armada would have been more appropriate than Brigade. But, you know, we've got enough name recognition now that I think we don't want to go mess with it at all. Yeah, totally, but it's just kind of funny to think about. I think our listeners who are familiar with Kubernetes are very familiar with this control loop structure. It's like: hey, get the desired state, we're going to get to the desired state. Great, we've defined everything so much that we're drowning in YAML. Now we have everything we want, also in YAML. How is the Brigade experience different from what people would have already been doing? You've given us the concept of what it is, but how does it look? What is it like? The thing to emphasize with Brigade is that it's event-driven. So it's not just a scripting platform, it's an event-driven scripting platform. And going back to the operating system comparison, probably the best comparison that could be made, I think, and I don't personally use it, is AppleScript or ActionScript or something like that [unclear], because that's very event-driven. It's like, you know, when you make this gesture, run this script, that sort of thing. So that's the thing to emphasize with Brigade: it's very event-driven in nature. It's a little bit different from what people may be used to with Kubernetes, because it's not simply declaratively saying, hey, do this and go reconcile it. It's a very different model. It's a model where something happened.
Now I've got to do something. An event occurred, and now I have to handle that event. And I think that definitely meets a need that people who have started using Kubernetes might be seeing themselves: oh, I was trying to hack something together to accomplish that, I need to do that. And of course, the Kubernetes ecosystem being as it is, there are plenty of projects out there that have some overlap with this sort of thing. If somebody is evaluating Brigade, or just figuring out they need to do something with events, can you give us a few of the highlights or decision points that might help someone understand if Brigade is right for them? What I find it's very useful for is just, you know, doing work in the background. They are your minions, right? There is a little bit of a misconception that Brigade is a CI/CD platform. It's not, but it happens to do CI/CD very well. So I'm going to use that as a convenient example, because that's actually the way that we, the Brigade team, use it quite extensively. We actually have it tied into GitHub, so whenever we open, for instance, a pull request, that's an event. That's something that we can respond to. And so when somebody opens a pull request, we actually run our build and all of our tests and everything, and send results back to GitHub. So that's one example of the sorts of things that you can do with it. That's really interesting, because one of the reasons for chatting is that you have Brigade v2 coming up. So I'm immediately thinking: oh, cool, all right, you're drinking your own champagne. We'll go with that instead of the dog food one. If you're testing Brigade with Brigade, but you're also in the process of building a v2, does that mean you're testing Brigade original recipe with Brigade original recipe, or are you testing everything for v2? Kind of give us a sense of what's going on here with your step into v2, and even why there is a v2 that is different.
So I'm going to answer the last question first: why is there a v2? Now, I guess, is the point where we address the fact that we said we weren't going to just keep talking about Kubernetes, and yet we've said Kubernetes so many times in this conversation. So the reason there is a v2 that we're working on is that v1 was very good, but I would still call it a minimum viable product, and it was very, very lightweight. Lightweight is good, but there were some consequences that came along with how lightweight it was. So there was, for instance, no API. All of the persistence, anything that got stored anywhere, was just a Kubernetes secret. Now, how were Kubernetes secrets created? Well, and when I say everything was a Kubernetes secret, I mean every event that Brigade responded to was represented as a Kubernetes secret. How did those secrets get into your cluster? Well, either it was created by a user typing from the command line, hey, here's an event, go do something, or it came in through a gateway. A gateway kind of bridges the gap between external systems like GitHub and Brigade. So how did they get their secrets into your Kubernetes cluster? Well, they were just talking directly to Kubernetes. There was no Brigade API. Now, the consequences of that are that you, Bridget, if you were using Brigade v1, you had to have credentials for the Kubernetes cluster. You had to have direct access to the Kubernetes cluster. Now, if I'm the operator who owns that cluster, I'm not giving you that access unless I know that you're a competent Kubernetes user. So there are barriers there that don't necessarily need to be there. Yeah, it was a very high bar, a very high barrier to entry, because you kind of had to be competent with Kubernetes to get some value out of Brigade. But we knew that it didn't have to be that way.
We knew that people who really didn't know much about Kubernetes, or maybe knew about Kubernetes but nobody was forking over the credentials to a cluster, you know, we knew that those people could get value out of Brigade. And so something we set out to do with v2 was to abstract Kubernetes away from the end user to the greatest extent possible, so that we lowered that barrier to entry. And what I like to say is that we have gone through the subtle transformation from being event-driven scripting for Kubernetes to being event-driven scripting (for Kubernetes). So Kubernetes is an implementation detail now, and we use it to get the work done, and that's it. So that's why you teased at the beginning that we were not just going to talk about Kubernetes today: because we've tried our hardest to make it fade into the background and make it not something that the end user has to deal with. And this, of course, opens up so many questions, like: if Kubernetes is a detail, does that mean that Brigade can be used, or maybe in the future will be used, on different orchestrators, or in a Kubernetes-less environment, a Kube-free environment, if that's how you feel? That's a great question. There are no plans to do that at the moment, but we are architected such that that is a possibility in the future if it becomes relevant. Don't get me wrong, I love Kubernetes, but I don't want to make the assumption that Kubernetes is going to be around forever. I don't want to make the assumption that something better will never, ever, ever come along. And I would like for the project, Brigade that is, to be able to continue on into the future, even if five years from now there's a much more popular orchestrator. There are also, of course, spaces like functions, etc.
that people may be trying to figure out: how do the traditional, traditional by now, orchestrators work versus, you know, functions, versus any other thing that people come up with? People come up with a lot of things. And so I'm curious, as you're making Brigade as decoupled and future-proof as you can to make sure it meets its goals: that's some big changes, and I want to kind of dive into what surprised you in this. You're doing a major version with kind of a bit of, I wouldn't say pivot, but at least a rev of the approach, and I feel like a lot of our audience might have either lived through the fallout of, or tried to make, some of those decisions. Can you talk a little bit about how you even decide to do that, and what challenges it brings? Yeah, so that's an interesting question. There's actually a very, very lengthy, I think it's probably twenty pages or so, proposal that we put before the Brigade community before we actually started to undertake the v2 development effort, and it articulated, you know, in nauseating detail, all of the things that were not optimal about v1 that we couldn't fix without breaking changes. So there's basically a twenty-page document justifying why we were going to do all of the things that we were going to do. And abstracting Kubernetes was certainly not the only thing on that list, but a lot of the critiques can be traced back to how tightly coupled we were to Kubernetes previously. I'm curious: in the Kubernetes world they call it a KEP. In Brigade, is it a BAP? I feel like you would need a cute cat mascot if that were the case. Oh, yeah, no, we are pretty light on process at this stage compared to Kubernetes or Helm. Helm does those now, right? It's a HIP. Actually, yeah, the HIP. Yeah, we're pretty light on process. And that's not to say that we
aren't, you know, very diligent and disciplined about how we do things. We certainly are. But I don't think we are at the point as a project where we need to do that for everything. But, you know, it was worthwhile to do it when we were talking about such major changes as we were, because really, v2 was a complete rewrite. We started from scratch. You mentioned that you're not at the place where you're going to institute a lot of heavyweight process, but at the same time you of course do want to grow the community. Can you talk a little bit about that? Like, how does community involvement work in an open source project that people might stumble upon, realize that it solves problems for them, and start using? What does that leap to then become involved look like? You did ask previously about what were some of the things that surprised me, and community things surprised me, to be honest. And some of it is, I would say, unfortunate. I think that we had a relatively vibrant community going into v2. And, you know, certainly not to the extent that Kubernetes itself has. You know, if we have BrigadeCon, we're not going to have fifty thousand attendees. But, you know, we had a fairly vibrant community. And my observation, I think something that happened, is that people became aware that we were making this major shift, that we were doing v2, and it seems to me that community members have been content to kind of step back and let the maintainers handle it. And that kind of, you know, gives me the sads a little bit. You know, I want to bring our community
back together, you know, and start rebuilding on top of this better platform that we have now. And I think the project has reached a point, you know, we are in beta right now, and we have a lot of different integrations that we've built as well, which are also in beta, or some are in alpha, but we have reached a point where things are pretty stable, and we really think it's safe for people who might have been intimidated by the large pivot that was going on. We think it's safe now for them to come back and get involved. And I really want people to get involved, because we have something we're very proud of. We've made it much easier to build integrations with Brigade. It was very tough to write new gateways and things, you know, sources of events. It was tough to write those for v1. For v2, we have such a rich and powerful API, and language bindings for that API. So we currently support Go and JavaScript/TypeScript, and we have a Rust SDK that's currently in the works. So we have made it easy for people to come along and, with just a small amount of code, write interesting things that will integrate Brigade into different sorts of workflows. It's a great time right now for, you know, community members to get involved, or get involved again, and start building some cool stuff. I'll say that we have some awesome new swag that we're working on. We would be really happy to set that aside for anybody who wants to help us kick the tires, or anybody who wants to contribute some interesting integration. Now that is exciting, because I've got to tell you, in this several-year period of our apocalypse, the swag count has been way down. So I'm curious. I know that Brigade will have, at least I think Brigade will have, a virtual office hour during the upcoming KubeCon North America. And I think that you have project meetings, maybe, maybe not. So I would love to hear, if people want to get involved,
which avenues are there as an on-ramp to get to know where in the code base, or where in the project? Like, where should someone start if they are like: maybe I did kick the tires on this, I want to use it, now I'm confused, I don't know if I should open a pull request, because I guess there's a different v2 branch somewhere? Where would you recommend people get involved? I'll highlight some of the ways that people can get involved. We have a Slack channel on the Kubernetes Slack. So there's a Brigade channel over there. I hang out there, and so do other maintainers, so it would be great for people to show up there and ask us questions or show us something cool that they did. And of course, find us on GitHub. And I know you'll put all these links in your show notes. The one thing that I do want people to know about finding us on GitHub, and you kind of said it already, is that the v2 work is happening in a v2 branch. So if you do come and find us on GitHub, just know that v2 work is progressing in the v2 branch. So just make sure you're looking at the right thing. From the point of view of, you're a CNCF project, and you're not on any specific release schedule, but maybe I can have breaking news on the podcast: do you have any word for us about when v2 comes out, when you have exciting CNCF next steps, like, what else is coming down? Yeah, we don't have a specific release date yet. I feel like it's like all my favorite TV shows. You know, periodically I go on Google: when is season four of XYZ coming out? And I always see the disappointing news that no date is set yet. Yeah, we don't have an exact date yet, but it's definitely happening in Q4 of this year. We're pretty stable at this point, and we are just continuing to work on stability and integrations. Feature development is pretty much done. We are avoiding breaking changes at all costs at this stage.
It's kind of like Kubernetes: when something becomes beta, we start being concerned with backwards compatibility at that point. But you're not going to make anyone fill out a production readiness review. And also, I think it helps, you know, because I did say that we're light on process for the time being. We are a CNCF sandbox project at this point. We would, of course, like to graduate to the incubator at some point. But, you know, besides that being contingent on v2 going GA, I think we do need to do a lot more community building to make that happen. And something else I'll mention, because I am making a plea here for people to come and play with us, right? One thing I'll also mention is that we're looking to diversify the ranks of our maintainers. You know, not just in terms of gender or race or anything like that, I mean, certainly those, but also just in terms of what companies our maintainers represent. So at present, most of our maintainers are employed by Microsoft. We have a few that are not, but they're also not particularly active currently. So we are looking to grow the roster of maintainers. So getting involved right now is a really good way to get your foot in the door, if, you know, you want to get that onto your CV, that you're the maintainer of a CNCF project. So, you know, like I said, come play with us. We would love to have people come and contribute, ask questions, make feature requests. And it's not just coding, either. You know, anybody who can help us grow the community, that's a skill set that, you know, frankly, I don't have. I'm not very good at things like that. My PM, Karen, is very good at things like that. And there are a lot of ways that people can contribute to a project without writing code. So whatever people can contribute, we would love to have them. I suspect that if I ask you, hey, how can people learn
more about Brigade, it's going to come right back to that. Yeah, absolutely. I mean, besides our blog and besides our documentation, and, you know, I already mentioned GitHub and Slack, but yeah, just come find us. Come talk to us. We would love that. We would love to talk about Brigade. Just to make sure we cover it, because I know we talked about it a little bit before: if there's a pattern that you mentioned earlier that people might find interesting or want to learn from, about using Brigade to build Brigade, can you talk a little bit about where people would look in your GitHub to see how that works? Really, if you look, I think in most of our projects at this point, most of the repositories under the brigadecore GitHub organization, you would find a .brigade folder, and in there you will see project definitions and scripts that we have used for the sake of building those projects using Brigade. You mentioned that, and of course we'll put links in the show notes to the GitHub organization and the repos. Is there a specific repo? I know there are a lot of repos under the brigadecore organization. Is there a specific one that you'd highlight, that people should look at as the best place to jump in? Oh, sure. You know, probably just plain brigade. That's the main repo. That's where, you know, most of the magic happens. But we do have a lot of integrations that we've developed, mostly in the form of gateways. So again, the role of a gateway is to bridge, you know, the gap, to bridge the divide between Brigade and some upstream source of events. So we've created gateways for GitHub and Bitbucket and Docker Hub and Azure Container Registry and CloudEvents, and I'm working on one for Slack
currently, then one for Teams. So we have a repository for each of those as well, and each of those is in and of itself an interesting project to look at. Ah, yes. So if somebody already has an interest in, or is trying to do their own integration with, one of those, then diving into the code base for one of these specific integrations would be interesting and valuable. Yeah, most of them are pretty simple, too. So, you know, by all means, if you want, go clone one of our gateways and slice and dice and do what you want and make it work with another system. You know, we would love to have people do stuff like that. All right, so that's a roadmap for people: become a contributor, take over the world, possibly help with the process of taking Brigade to incubating in the CNCF. Anything else? I don't think so. You know, I'm just really super excited about the work that we've been doing, and I'm really super excited to, you know, start growing this community and having, you know, more people come out to play and have fun with this platform. Yeah, I'm just super excited for all of this. It's a really good time to be working on this project. Where else would you recommend people go to follow your work or catch up on the project, if they want to dive in now that they're all fired up and excited? Yeah, definitely GitHub and Slack are the best places to find us. I would love it if people wanted to go star and watch our repositories. That would be great. So like and subscribe, smash that button. But yeah, really, this is an open invitation to come find myself and the other maintainers wherever we may be. You know, find us, talk to us, because, you know, we would love to hear from you. Well, that does it for our time. So head over to arresteddevops.com/brigade for this episode's show notes and contact info for Kent. Visit arresteddevops.com/itunes to leave us a review in the iTunes Store
if you want to help other people find the podcast through some sort of algorithmic magic. We're also on Spotify and iHeartRadio, if you're into those systems. Thank you so much for joining the podcast today. Thank you, Bridget. This has been wonderful. I'm Bridget Kromhout. This is Arrested DevOps, and remember, there's always DevOps in the banana stand.
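
The event-driven model described in the episode (a gateway turns something that happened in an external system into an event, and a script subscribed to that event does the work) can be sketched roughly as follows. This is only an illustration of the general pattern, not Brigade's actual API: the names Event, on, gateway, and dispatch, the "github" source label, and the webhook fields are all hypothetical and chosen for this sketch.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Event:
    """A hypothetical internal event; not Brigade's real event schema."""
    source: str                      # which upstream system produced it
    type: str                        # what happened
    payload: dict = field(default_factory=dict)


Handler = Callable[[Event], str]

# Registry mapping (source, type) to the handlers subscribed to it.
_handlers: Dict[Tuple[str, str], List[Handler]] = {}


def on(source: str, type_: str) -> Callable[[Handler], Handler]:
    """Decorator that subscribes a handler to one (source, type) pair."""
    def register(fn: Handler) -> Handler:
        _handlers.setdefault((source, type_), []).append(fn)
        return fn
    return register


def gateway(webhook: dict) -> Event:
    """Translate an external webhook body (assumed shape) into an Event."""
    return Event(source="github", type=webhook["action"], payload=webhook)


def dispatch(event: Event) -> List[str]:
    """Run every handler subscribed to this event; unmatched events do nothing."""
    return [fn(event) for fn in _handlers.get((event.source, event.type), [])]


@on("github", "pull_request_opened")
def run_tests(event: Event) -> str:
    # Stand-in for "run our build and all of our tests and send results back".
    return f"ran build and tests for PR #{event.payload['number']}"
```

Calling dispatch(gateway({"action": "pull_request_opened", "number": 7})) runs the one subscribed handler, while an event nobody subscribed to is simply ignored. The point of the sketch is the contrast drawn in the conversation: nothing is declared and reconciled, work only happens in response to an event.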

Kubernetes & the Future

Arrested DevOps

44:50 min | 2 years ago

Kubernetes & the Future

"Aw man, all over the world, using Kubernetes, and look, I can deploy right in front of you right now. And you're like, oh man, maybe I need some Kubernetes. It's time for Arrested DevOps, the podcast where we help you achieve understanding, develop good practices, and operate your team and organization for maximum DevOps awesomeness. I'm Bridget Kromhout, and I'll introduce today's super dope guest afterward. The worst thing about the Arrested DevOps podcast is when it ends and you're left wondering what to do next. What are you going to listen to on your commute home? How do you occupy your time when walking the dog? What are you going to listen to during the quarterly all-hands meeting? But fear not, dear listener. There is a solution. You need to subscribe to Software Defined Talk right now. It's a weekly podcast that recaps all the news in cloud computing, DevOps, and enterprise software. The hosts, Coté, Matt Ray, and Brandon Whichard, will keep you up to date on all things cloud while offering tips on how to optimize your Costco haul and how to PowerPoint. It's a fun, free-flowing conversation that will keep you entertained and informed. What are you waiting for? Subscribe to the podcast today by visiting softwaredefinedtalk.com or by searching for Software Defined Talk in your favorite podcast app. I'm so excited to be chatting with Kelsey Hightower. Kelsey, you've been on the show before, but for new listeners, can you introduce yourself? Hey, I'm Kelsey Hightower. I refer to myself as a minimalist, so I'll keep it real short. I'm the type of person who enjoys learning in public and helping other people do the same. I love it. And when people are wanting to have conversations with you, I feel like Twitter is someplace that sees you having opinions, and you recently had what I thought was a pretty interesting opinion, just about the future of Kubernetes. And just to set the stage, the reason why we're having this podcast today
is that you told Twitter you wanted to come on a podcast to talk about where Kubernetes is going, or what comes after it, or, you know, the future. And I love this prognostication. Like, lay it on us, Kelsey. What's happening? Now, it's funny, as it is not the future, it's the now, right? Because if I could predict the future here, then they need to add another zero to the paycheck. I think Kubernetes is in that sweet spot. Remember the 56K modem? Right, you get the dial-up, you hear the modem. Actually, some people liked that modem sound. You know you're getting on the Internet, right? You hear that sound. And I think Kubernetes is in the same boat, where people say, oh man, look at all my nodes, I got my cluster up to date, and it makes them feel like they're doing computering, right? Like, oh, my cluster's up, I see my stuff in it. The Internet got real interesting when that went away. I remember when I got DSL for the first time. You never had to dial up, you were always connected, and you didn't think about it anymore. I didn't think about how close my computer was to the phone jack. Wireless routers came out around the same time, and then you could liberate yourself from that hook in the wall. And when that happened, that's when I thought the Internet got super interesting. Fast forward to 2019, where we are now: I'm streaming Netflix, streaming everything to the wall with no wires showing. The Internet is now just a thing. So things tend to get better when they disappear. So I'm a big fan of Kubernetes, have always been, but I know that we're just in that 56K modem era of Kubernetes. When we hide it, then I think more people can leverage it without learning how to manage it all. And that's where, yeah, I think Kubernetes has to go, whether we take it there or someone else does. I think that is the key to long-term success.
That's a really interesting point, because I will say that the exposure I've had to Kubernetes has included things like running workshops with Jérôme Petazzoni, who I know you're going to be podcasting with as well. I look forward to listening to that one. But definitely, running workshops with something that is constantly changing, it's like you've got this moving target. And there was a recent discussion just today over on the Kubernetes Slack about the number of releases that a cloud provider will support versus how quickly things come out from the release team. And I feel like this is a very fast-moving space if you are paying attention to the bleeding edge, and at the same time the entire world out there is not able to, you know, install exactly what's in the release candidate and run it in production tomorrow. What do you see, when you're saying the future is now and this is happening, but somebody's listening to you and they say, Kelsey, that sounds wonderful, and we're stuck on 1.12 and we aren't getting anything new anytime soon, or, Kelsey, that's wonderful, we don't even have Kubernetes yet? What do you say to them? I think that's a good place to be, because if you don't have this problem, you don't really need the solution quite yet. I like to think about how Linux went through the same transition. When Linux came out, people were rolling their own distros, Linux From Scratch, Slackware, and then we got Red Hat and Canonical, and then we stopped thinking so much about building Linux or using Linux from scratch. We got more into which vendor or distro we identify with, because of what they will provide, and we just get to leverage it. If we fast forward on that Linux analogy just a little bit, think about people that are running Android phones. Do you think they care about Linux? But they get to harvest the benefits of Linux.
They get to use a mobile phone that they just install apps to, as a mobile device, and they never really had to touch the Linux kernel. So I think there's going to be a lot of people who will miss the entire cluster management phase of Kubernetes, but one day Kubernetes just may be a thing that's baked into the things they're using, and they'll just be using Kubernetes. So I think for a lot of people, it's early. For me, it's early: six years into this thing, it's not, like, ten years old. This stuff is so early that VMs still work. Some people will skip the whole container thing and go straight to serverless for some of their workloads, and that's fine. But one day, I would say, Kubernetes-style APIs are starting to resonate with more and more people. And just like we saw from the web, with web pages and web servers and browsers, we saw RESTful interfaces come out of that, where we took HTTP, we took the verbs, and now we're making CRUD APIs. So some people will start to just leverage some parts of Kubernetes without ever being cluster administrators themselves. Neat, that's a really good way to put it, because a lot of people are familiar with this idea of, okay, you have APIs, cool, that means you have endpoints. You have something that you can interact with. You have hooks you can hook into. They hopefully don't have breaking changes in point releases constantly. And is that how you think most people are going to consume Kubernetes in the future? Because you mentioned serverless, and there's a lot we can talk about in that direction. But there's also this idea of, if Kubernetes sounds like a big thing to you, maybe all you need to think about is that you're going to have an API to make your stuff work, and you don't have to think about the substrate as much. Yeah, because Linux is ever evolving, man. Linux has releases all the time.
The kernel, Bash, all these userspace components, they're nonstop releasing. At this point, what we're getting are these checkpoints. GKE represents a checkpoint, AKS represents a checkpoint, EKS represents a checkpoint. k3s, which is just like this miniature version of Kubernetes where they swap out a lot of the components to run on embedded or smaller platforms: these represent checkpoints in this ever-moving project. Honestly, most people aren't even ready to adopt all the stuff we currently have, right? So I think there's going to be plenty of time for you to catch up. So I like to think of this as: I am very happy that the innovation is still going on. People are still excited. New ideas are always showing up. But for the average person, and I put myself in that boat now, I am a consumer of Kubernetes these days, and I go to the checkpoints and I just use it as is. I'm fairly happy. And that's a lot coming from you, because we'll put a link in the show notes to your Kubernetes the Hard Way. Now, you are, you know, a Kubernetes OG. I mean, this goes way back to the 2015 era. Can you talk a little bit about what people can learn from Kubernetes the Hard Way, either yours or anything like that? Like, what can they learn from that now, and what are your recommendations for them if they're looking at actually productionizing any such thing? Yeah, that's a good question, because early on in Kubernetes there were no docs, really. So when I learned Kubernetes, before I wrote my first line of code, before my first PR to the project, I had to learn how to install it, right? Like, what pieces go where, what's the scheduler thing all about. Once I had the whole thing assembled, I was then able to identify issues that I had with the entire system once it was put together. I was at CoreOS at the time, so I was building extensions to make Kubernetes work well with CoreOS nodes, so we could register them, that kind of thing.
So when I think about Kubernetes the Hard Way: you can't really fix a system that you don't know how it works, right? That's just really shooting in the dark. And if you're going to take something to production, you probably want to know how all the pieces kind of fit together. And I think for a lot of people, and I have an ops background as well, the thing I always want to know is, well, how is it supposed to work, right? [unclear] see it in production with some idea of how you'll know if it is working right. And I think a lot of us can appreciate the happy path. Kubernetes the Hard Way is about going through step by step in a very tedious fashion, no scripts, so that people can say: oh, this is what the kubelet does, it connects to the API server in this way. Wow, that's where the SSL certificate goes. I think once people go through that once, then they have that foundational knowledge to improve the system, troubleshoot the system, or debug the system. That tells me that people still get value out of looking at some of those details, and I think that the container.training material that Jérôme [unclear] and I have taught people a lot of stuff from is relevant again. But at the same time, we definitely aren't showing people something that they should use in production. Would you say that people can start with something like Kubernetes the Hard Way and then get to where they should be? Or what would you recommend when they're actually, right now, taking this stuff to prod? I think production is very tricky, because there are so many layers to production: there's system performance, tuning, security. Most clusters come out of the box with flexibility in mind, not necessarily security. So you can run random images from the Internet. You can run things as root. Some people aren't using any security policies, and some people don't even know what policies are. So you kind of run into security versus convenience. I remember when SELinux came out.
No one knew how I'm going to bet most people still don't know how to use So every since that men even though they wanNA admit it in take the first step turn. AC- Lennox there. You turn off so like people will act like this is a reoccurring thing so a lot of times. Most systems have been optimized for ease of use. And we've been making the security trial for a long time coordinates. Hardaway has to make some of those trails because if I did every possible security security step the guy will be ten times longer right and I think there's ways that you should probably automated way those things so I think the cloud providers the various dish shows that are out there the various tools that help you get a cluster running. They tend to try to automate a lot of those very tedious security things that you should do to lock down your cluster. That's what you need to start thinking about production security probably as a number one thing you have to consider when it's time to go to production 'cause you can always add performance you can always tweak we can tune the Lennox destroy image that you're using but what's that security whole is too big a little too late to go rewind. The clock on a breach is And somebody says to. You know this super nutty thing you decided you wanted to do turns out it's a security problem and you like it doesn't have to be in there like especially if we I believe you at this point but the thing that I've seen in the production is in front is now when you go and watch a talk at could Kahn. Or you read someone security guidance or the docks. They're starting to say the most common things do exactly these four five things and then we raise the security profile for a lot of people at the same time which which I think is really a big game changer. Yeah I think I in cold water. Talked about that. 
Some in their Their keynote at Kube Khan and I think even just some the work that a snake security has been doing like Gareth rush rush curve has been doing a lot of stuff in the space that ties in with of party agents and other things to just make it easier for people to do the right thing. Thirty percent of carbonates hardaway is because of the security early feedback that I received from people over time. The reason why I take the time to generate SSL certificates for each component and for each note. The reason why I encrypt the secrets the FBI database. And Show you how to do that and verified that it's encrypted ally straight up comes from the community saying hey teach people abo-about are back. Teach people about the encryption teach people about SSL certificate management so while. I'm not probably going all the way that I could go. I am definitely getting you. Pass the just do nothing stage right and it's it's Kinda funny too because I have taught the we like hey. Let's take a look at the dashboard. We're just gonNA open everything up and take a look at it and then there's like you have to put the giant disclaimers of like do not. Actually we do this if you do this. We are not responsible for your terrible life choices. While that's why the dashboard out of carbonates. The hard way UH-HUH I don't want to send someone thinking that that's how you do it. I'm like well I won't be the person to do that. I mean like we show the Dash Birthday. We say unless you want to mind crypto currency for someone on the Internet. You really should not do this. And if somebody is mining cryptocurrency on this raid the second. 
Were shut down right after workshops so yes yes Okay so when I was was was thinking about a staff that you've attacking about lately it's really interesting You have pointed out kind of you know trade off is the the right word but kind of there is complexity we we have to acknowledge this complexity choosing the right layers to interact with or choosing. You know exactly what you should abstract away in what you should do. Obviously it a workshop. This is going to be different. If if the goal is learning versus production. This is going to be different but what I would be curious to know. What are you seeing in terms of the translate? Now in what you recommend in terms of which complexity should you incur. Kerr is incidental is necessary. Like I would love to hear your thoughts on that incongruity space specifically this is why I love looking at other systems. uh-huh and how they evolved over time like CNN's. I remember when we used to glorify FTP servers. And then as you remember that. Although although not only do I remember that. But her member horrifyingly it was like twenty twelve and a major customer. I'm not even GonNa say what what are what vertical but I'll say a major customer for the startup. I worked at a customer heard of probably given money to you listeners listeners in the world they really didn't WanNA use. SMTP wanted to use FTP. And I had to talk them into using SF GP as like the lowest common denominator of them giving us more data sets there. You go so so people still use these kinds of protocols to transfer files than I remember. It was such a big task of of getting those files delivered as fast as possible to people in the world and some people thought they were just going to evolve. FTP to become more robust right right like some people are there's still a s FTP conference somewhere from like eight all gathering doing their thing but then CNN's came out and said he took this very complex problem. 
How do we get these images and video files close to people in the most efficient way Possible and be cost effective over time and maintain the security so now we don't really think about that problem anymore because it's gone away in many ways and that more powerful while most of disappear because the complexity grew at a number of people who can actually understand maintain that complexity shrunk but we can all benefit from it. Now when I look at the compute shoot world computerworld a little slower to doing that because there's so many more people who believe they understand the compute problem so every couple of years a new person comes around and events the new platform at four again. We see it over and over again right you and I have been part of some of those startups in the past. Now when I look around the service world is saying a lot there about there's eighty percent of the compete use cases. We understand no one ever has to build that again. So if you WANNA use that you can use that. But that doesn't mean mainframes go away because they're still really good at certain workloads in certain tasks doesn't mean. VM's go away. Because same reasons. So I think people look for this all or nothing that the next thing will replace all the other things no. I really look at this as we're going to have multiple things in parallel and you'd get to sit back and choose which one meet your needs if it was me these days. If I have to start from scratch I think I would probably try to go as high as I could and focus on building my APP in the product before going to play infrastructure again yeah that makes so much sense to because the nuance that you're pointing well here if I can try to summarize you can tell me if this is right but the new ones you're pointing out is if you're starting something new you probably probably don't want to incur a lot of overhead. 
You don't need however at the same time it's not like will be a good use of time or your money for your bankers airliner. Whatever to redevelop everything that they have and make it all service functions -ality they are not going to do that? The Dow and trust element hair. And I get it because I have a lot of empathy for people who say well if I let if I leveraged one platform I'll give blocked into that platform. I think we're trying to do a good job of various efforts like the season and various other organizations trying to create open standards for a lot of this stuff. I think what we want is the Internet is so complex that you probably aren't going to dig up wired to your house and connect directly clean to the backbone of the Internet. You're not doing so so you trust at a provider is going to give you open interface an Ip address so that you can jump onto the Internet an participate like everyone else. I think that's going to happen to compute. We need the providers to gain our trust so that way we can use open interfaces to participate in this big global compute set of offerings. That actually takes us to a really interesting place of Lake Security and compliance. When I mean I work at a club provider? You're a cloud provider. I'm sure you have these conversations from time to time where someone says. Gosh we would love to put workload X. Six in the cloud but you know our security slash compliance slash auditors. Don't like that idea. And I'm thinking like because the data center leasing down the road is safer or you know the basement of your building is safer really. And that's when that's when the human elements meet technology rice because if vital understand how would they can be secured. I might just be a little bit skeptical. We haven't earned that trust for a lot of workloads. Now there are some companies there are banks are one hundred percents online online. 
I'll absolutely basie being too much of a risk to have a single building with the door on it that I can walk in and take the server right. So there's there's different degrees of understanding the risk now. Now it's it's just it's a really interesting space because all right so you've got your mainframe workloads in you've got your VM's Em's and you've got you know maybe stuff that is in a container and you've got your your Con from Europe Stream Vendor that. You can't really do anything about. And you certainly can't re factor them down to their functional components or whatever and then you have maybe the new development people are doing and I think that our industry loves to focus focus on the exciting new development. People are doing because Shurmur seems simple. It doesn't come with any legacy. It doesn't come with any earn some customers who don't want you to change things or move their cheese she's Realistically the customers who pay money for cloud services or anything else are going to probably by definition have money and have customers customers and have workloads that they care about and they still might be looking at Cooper Netease and thinking I would love to solve some of the multitude of problems. I have with Cooper Netease. How do I get there? And am I going to be burdened by a too many megabytes of Yam will do it. What is when you're talking to people about about that transition? Like how do you make Cooper Netease or what comes after Trooper Netease which is to say like using could just as creative. How do you make that accessible the people? Yes when I when I go to engineers with handsome keyboard or maybe the leadership senior leadership you bet these companies one thing I do is I asked him to stop using the word legacy just like. Don't don't use it because right now. It kind of has this derogatory. Oh context behind it lasted this. Classic infrastructure is the stuff that actually worked cutting everyone's paychecks. 
It was the thing that actually works. I like the a place where all the customers in money are. There you go so if we if we start from there we say look. You did a good job building out those systems now. If you look at the current systems you have what problems they will so well on these platforms. We have a problem with things that you describe service discovery. We have a problem with failing over in this. Part of the data center goes down. We've written all of these scripts in behind all these people to bring it back up on these other notes. We have a big uncle rotation to do that so when I look at that. That's the pain mass the opportunity for value. So when Kuban as comes along and I look at what they can do having those previous jobs before I can easily say you know what there's a part of Rinaldi's that does solve that problem. Like very specifically that problem doesn't mean we have to a bill that part ourselves. Maybe we don't need all of carbonates we don't necessarily to replant form everything we're doing too but I can tell you that there's a much better check point these days for incorporating some of these fundamentals into our infrastructure and if we can get that by leveraging cover netease than let's pragmatically adequately show the values always ask people always have a good reason why Burnett's because we were maintained our own scheduler in something looked like horrendous. But you know what we have fifteen people that are doing that. I rather have those fifteen people working on this other problem. You're yet to tackle or or half of them. Contributing the Cooper News to make sure that we can leverage it going for and love that too because this is you having a very human centered. Conversations nations of like if someone thinks that their entire Value is the thing they're doing right now. They might be resistant to something exciting fighting in new. 
That could make their life way better because they don't see themselves there and if you can help them see what they can be doing that adds a ton of value there than they'll be way less resisting. This human factors spoke huge when it comes to getting people to adopt something seems different and therefore perhaps threatening us. Why like spitting time on site with people in their home turf? Where they're comfortable and I'll just ask him? Listen for a little bit. Just say what are all the things that you're not doing. What the things on your backlog? And a Lotta Times see observability style is the security stuff. He's just the things that no one ever gets around to doing properly because they don't have time I'm so nf carbonate is quote unquote. Automated you out of a job which it won't all. It's probably time back so you can start doing this other stuff spoke. I love that. 'cause I gotta say like Hashtag ops life is you have. Your hair is on fire or whatever rate the second but then there is the step you would love to get to in you know. You probably won't anytime soon in the luxury of being able to do those things that will make some everything so much better. That's huge but I do think that the community's ecosystem if you're approaching it from first principles or if you're approaching it from I don't know what's going on here but now that I have a little bit of understanding of pods a nodes so much Amal and everything else like. How do we get people to I? Don't I don't want to go into like the you know. Long tail or the future. Being being very unevenly distributed but I do feel like there's there are the maybe they are a bank but they jump on the stuff early and then there's the people who are thinking. Is that stable enough. Can we use it yet. What do you think can people actually use Cabrini National People? They're like they're not touching the Internet. 
You're not going to touch it, and that's okay. Like, if you don't want to touch the Internet, that's cool. But the thing is, these things will progress without you giving the green light. And I think that's where people have to be a little bit pragmatic. The world is going to continue to move forward with or without you. If you look up one day and you see a tool that might be useful, only you can make the decision on whether it's actually useful for you, and the best way of doing that is being a bit more informed. Like, what is the tool actually doing? Does it even solve all of my problem? And that's the engineering piece where I don't think people give themselves enough time to say: let's bring it in, let's put it through its paces, and let's either say yes or no. If we say no, document internally that the reason why we're not using tool X is because of these reasons. If those reasons get solved before we find something better, then we'll revisit that decision. That's just engineering. And I think that helps with the whole fashion and vogue of all this, like, oh, Kubernetes, I gotta be on it, I gotta be on it. I think that's where people run into that brick wall. Now there's one more thing, on fundamentals. For a long time I worked at Puppet Labs. I've used Ansible before, contributed to Ansible. I contributed to Terraform. And that whole era, that last ten to fifteen years, was attempting to treat infrastructure as code. We wrote a lot of code. We have these very advanced domain-specific languages, DSLs, to allow us to describe infrastructure, and then we begin to get into for loops, and then we inherit the problem of dealing with any code base: code reviews, testing, and so forth. And now with Kubernetes we're trying something slightly different. I'm not saying it's radically different, I'm saying it's slightly different. We'll force ourselves to no longer do that. No more infrastructure as code. Now we're doing infrastructure as data.
Meaning if you want to add some new capability, you have to do it in that control loop, the controller. Some people call them operators; it doesn't matter. All the if statements and for loops, in any programming language you want, you have to implement there. The state machine lives there. Then on the front end we restrict ourselves to data. We call that YAML, even though Kubernetes doesn't actually support YAML; it only supports JSON and protocol buffers. YAML is how humans found a way to write data out in a way that we can interchange with some API. So now we're in this world where we're starting to treat infrastructure as data. Why is this powerful? There are a lot of benefits, just like we saw with assembly language. You can have any programming language you want compile down to a common format that can be executed on a large generation of CPUs. Now we have this, aka the assembly language. We can now build tools in any programming language we want, and they will all boil down into this data model. Then guess what you can do. You can take Helm, which is implemented in a real programming language, adds a ton of great features, gives you key values, all of the templating. I can run Helm as a preprocessor. It spits out some data. I can take that data to Kustomize to patch it, and then when it goes to Kubernetes, it can go through an admission controller that can add even more to the data model before it lands in Kubernetes. This is like the dream come true, that we can finally describe infrastructure with a type system and interchange the tools without throwing everything away and starting over every time. I think that's really powerful. Just this idea that it is declarative, not imperative, which I think is maybe a mental hurdle that people have to get past.
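The pipeline described here (a templating step, then a patching step, then an admission step, each taking data in and handing data out) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the functions render, patch, admit and pipeline are hypothetical stand-ins and are not the APIs of Helm, Kustomize or Kubernetes, and the "team" label is an invented example.

```python
import copy
import json


def render(values):
    """Hypothetical stand-in for a templating preprocessor: values in, plain data out."""
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": values["name"], "labels": {}},
        "spec": {"replicas": values["replicas"]},
    }


def patch(manifest, overlay):
    """Hypothetical stand-in for a patching tool: recursively merge overlay into a copy."""
    result = copy.deepcopy(manifest)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = patch(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


def admit(manifest):
    """Hypothetical stand-in for a mutating admission step: fill in a default label."""
    result = copy.deepcopy(manifest)
    result["metadata"]["labels"].setdefault("team", "unassigned")
    return result


def pipeline(values, overlay):
    """Chain the three steps; every stage only ever sees and returns data."""
    return admit(patch(render(values), overlay))


if __name__ == "__main__":
    final = pipeline({"name": "web", "replicas": 1}, {"spec": {"replicas": 3}})
    print(json.dumps(final, indent=2))
```

Because each stage consumes and produces the same kind of plain data, any one of them could be swapped for a different tool without touching the others, which is the interchangeability the speaker is pointing at.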
But then the idea that your tools are imposible because and that's one of the great strengths of the open-source ecosystem and why I definitely want to work in that space base and resist anything that says we're going to go down this specific rabbit hole. That's not cross compatible with anything else. 'cause like speaking as someone who does his work at a vendor. I'm very well aware that not. Everyone uses my employers services for everything but they may want to use it for something so I would love to be a cross compatible so that everyone who wants to use save cloud or whatever can do you. Do you think that we have. I don't WANNA I don't WanNa go down a giant Sierra de shaved rabbit hole but I think that there is a lot complexity in this space and so does the does the flexibility mean that the complexity becomes unapproachable for people depot. Isn't that's kind of what I worry about. It depends on who the user is right so we know that if we had no computers at all to a wall to this his point were required this kind of complexity total amount of complexity measures. Put it in a box. Wet Parts of the system deal with that complexity exiting so without these tools didn't they humans hold it all in our heads and spreadsheets remind spreadsheets we take all that complexity and then we try to articulated articulated would be a bash scripts right. That's that was the world we came from. So I'm familiar with the world that says dump Out What exists in your provider and check that and to get hub. Maybe have Jenkins great the bill if someone change changes something that they didn't check in because like yeah. We cobble together a lot of stuff like that before. 
We had a better solution. So I think what Kubernetes is doing is formalizing complexity. So now we can critique complexity, we can talk about where it belongs, but at least now we can see more of that complexity in one place. So now that we look at that complexity, different people deal with it at different levels. If you are really great at your job, and you're the person managing that Kubernetes environment, and you're creating all these CRDs, your consumers, the people you work with, will know nothing about Kubernetes. They will just give you a CR that says, you know what, deploy my app across fifty different countries. You create a CR that just says what zones you want, and my controller will do all of this heavy lifting so you don't have to. So I think that's a place where we can transfer the complexity from the edge of the company, the developers, the other operators, and move it into the system. But in order to do that, the system has to be capable of encapsulating that much complexity. So I think that's where you see this tradeoff. Kubernetes is a very powerful tool. Its extension model can be very simple or it can be very complex. The nice thing is, when it does get more complex, you don't have to abandon ship on it and find a whole other tool to start over with. So that kind of brings me to: if people are trying to figure out approaching Kubernetes now, let's say they don't need Kubernetes 101, they've been using it a little, but they want to implement everything that they would love to do, that they hear about, what should people be jumping into? CRDs, which, for people who are just now tuning in to Kubernetes, are custom resource definitions? Should people be looking at operators? Or something else entirely? What do you think people should be looking at if they're like, okay, I am ready to really understand this space?
So here's the thing: if you just want to deploy some containers, writing CRDs and operators is the equivalent of writing kernel modules. Like, oh, I want to understand Linux, let me write my own drivers. Hey, you can do it on your employer's time, but there's a group of people who should know how to do those things and will need to do those things. For the majority of people, that's not the job. So I think you've got to ask yourself: am I looking to use Kubernetes as a tool to run my applications, one that has a very convenient API that lets me declaratively say I need a load balancer, an SSL certificate, a DNS name, and this container? Then you want your hands off of it. Now, if you're managing the cluster, you learn how to use it as is, out of the box, and you adopt the best of breed that the community provides, and you just provide that to your developers. Now, sometimes you're going to have to go beyond that. Kubernetes is still early, meaning every integration in the world does not exist, and that's where you're going to have to scratch your own itch. If you want to be able to scratch your own itch, then yeah, CRDs are how you do things in that world. And like we said earlier, CRDs are backed by control loops, where all the complexity of the logic lives. So those two things go hand in hand, but that's only if you need to extend the system. And so that kind of comes back to the conversation that you and I, and probably a lot of people, have with customers when they say, is technology X right? And you're like, let's talk about the problem you're trying to solve, because picking the technology first is like classic resume-driven development, is kind of what I'm thinking. Somebody is suggesting it is so hard now because, you know, things are the best they can be with your current stack, and maybe you're caught in the middle. I don't know, so I don't even know what question to ask.
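To make "CRDs are backed by control loops" concrete, here is a minimal sketch of one reconcile pass, written in Python rather than against any real Kubernetes client. Everything in it is an assumption made for illustration: desired state and observed state are plain dictionaries mapping an app name to a replica count, and reconcile and apply are hypothetical helper names.

```python
def reconcile(desired, observed):
    """Compare desired state with observed state and return the actions needed.

    Both arguments map an app name to a replica count (a simplification
    chosen for this sketch, not a real resource schema).
    """
    actions = []
    for name, replicas in sorted(desired.items()):
        if name not in observed:
            actions.append(("create", name, replicas))
        elif observed[name] != replicas:
            actions.append(("scale", name, replicas))
    for name in sorted(observed):
        if name not in desired:
            actions.append(("delete", name, 0))
    return actions


def apply(actions, observed):
    """Return the state that results from carrying out the actions."""
    new_state = dict(observed)
    for verb, name, replicas in actions:
        if verb == "delete":
            new_state.pop(name, None)
        else:
            new_state[name] = replicas
    return new_state


if __name__ == "__main__":
    desired = {"web": 3, "api": 2}
    observed = {"web": 1, "old": 1}
    actions = reconcile(desired, observed)
    print(actions)
    print(apply(actions, observed))
```

The person writing the custom resource only states the desired dictionary; the if statements and loops live in the controller. A real controller would run this comparison repeatedly, which is why a second pass over an already converged state should produce no actions.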
And I may not be able to even articulate the problem. All I know is every time we try to deploy away the APP. Something goes wrong ching years in the room and it's not always the same something. Yeah so then you could Khan and people like Oh man. I'm the plan I and all over the world using cover ladies and look I can deploy right in front of you right now. And you're like Oh man maybe I need some Kuba Nannies and I think I'm partly responsible for some of this right so I think people say Dan all I know is that there's this new check point or some people people don't seem to have the exact same problems I have. They have a different set of problems. But maybe not the exact ones I have so all they know how to do is say Kuban eighties right for me. Ask Your doctor if Cooper not exactly okay so like when people are figuring out of Cooper Netease is right for them realistically. We're human some of US are going to make decisions based on what is interesting to us. Or what's right for a career or not. Everyone is going to start with like you know Landscape and you go. Oh my got way too many and then you go to maybe the the cloud native trail map or whatever you say okay. We'll get some CIC D that's great. We'll get some a source control. That's great hey orchestration containers if people are to the place where they're moving beyond that. What are some of the exciting things that they I should be looking at that? Maybe will help them and also help their career become little bit more what they want it to be like. What's what's the the new cooper? Nettie is what that you have in front of you. So here's the thing I was teaching my daughter How To make a web page? I mean by ill Geo cities not you know we got the male tags. Got A little bit of job with scripts. Um Cea says she's she's got the Blinky tag thing going on and this is like with scratchers they're no no just straight up text editor. 
Oh Asian male tags and in the Nice thing about deployment processes it just Right she goes through browser as she opens up the file she sees something game so we're making progress. She's like how do I put a pitcher. Okay she got a Web page there. She asked a question. There's like hey I text my friend. One two seven zero zero one and they can't see anything right so I'm like. Oh Wow now we get to talk about the actual Internet and networking and infrastructure alabi aback parent if I taught about docker containers detainers and Like that's not what she wanted to do she wants to show her webpage up. Doing is just using firebase or something and just said hey we installed firebase she said firebase deploy in a spit out a euro as she put her phone. Oh my God I can see my website. They gave it to her friend. She could see it to to me. I was like that's the end game like that. That's it the goal of the people like us we're building these platforms to enable that for as many workloads as we can possibly imagine to a point where if you wanna run your own cluster. You will always be able to do that but I want people to have the ability to say I have an idea have the code done and I wanna see it come to life. Why can't that be possible? Oh like that legitimately should be possible. And I think that's where I think the the excitement so between that that wow moment and everything below that that's all the opportunity we have so any platform that it is closer to that moment for people. That's the exciting thing so burnett he's will evolve toward that way from the ground up. The service STEPH will backtrack its way down to support different kinds of workloads outside of functions than in no surplus or stateless containers. Everyone knows that's the in game for compute. So that's the thing that I'm really focused on. So that's where I find. 
Excitement it across the board is when people push me towards those moments and people love that people love it just works vessel especially I think. That's that's a good thing for us to remember as technologists to is like as fun and interesting as all of these toys Azzaro play with like we are also hopefully building things that produce actual value something that people would pay for or would vote hoped for or would be interested in that. That's our engagement. We should never lose sight of that. Even though I'm not saying that if you work on those things in the middle very important in thing I'm not saying that it doesn't matter I'm saying is is not the end game. It's not why we do that. Yeah that's that's that's pretty important in. That's probably where we should leave it because we've been talking for a good long while here. I really appreciate your time today. Kelsey on and where our listeners catch up with you if if they're thinking that that Kelsey Guy I want to hear a lot more from him where can they do that aggregated one place at this point online on twitter. DM's are open even promised to close them. But I actually find it very valuable. Learning from other people and direct messages is one way that still works for me. So twitter's where I met excellent so people can find you on twitter or if they if they want to see you speaker interact somewhere where where would they get a chance for that. Random meet UPS across the world and they're just random but we always announce them like two weeks before they happen and there's lots of content on youtube right like you have been doing this for a long time so if you're just missing Kelsey your room you can totally pull me up on Youtube and I will talk to you about all kinds of stuff. I love it I feel like that's that's pretty much where I'm at too. It's like you you know I run. A local meat appeared Minneapolis. A- probably be at Kube. 
Connie you though I won't be speaking but like it's mostly just you really really want people to be able to connect with their local community connect with people in their organization and hopefully some of the stuff we talked to what today will spark some ideas for them. Thank you so much for for coming and talking to us about that I will. I will tell people that I cannot over to arrested. devops dot com slash Cooper Nettie Stash future for the substance channel. Send leave a review in the I tunes store or if they want to help other people find the podcast note that I have no idea how that works but apparently reviews We're on spotify an Iheartradio if you're to those and And Yeah I'm just really grateful that you're able to make the time where it can come share with us. Today's Thinki- Kelsey awesome. Thanks for having me hate I'm Bridgette bridgette crowd. This is arrested devops and remember. There's always defaults in the bananas.

101. Cloud Native Applications

Code[ish]

30:38 min | 11 months ago

101. Cloud Native Applications

"Hello and welcome to Code[ish], an exploration of the lives of modern developers. Join us as we dive into topics like languages and frameworks, data and event-driven architectures, and individual and team productivity, all tailored to developers and engineering leaders. This episode is part of our Deeply Technical series. Hello and welcome to the podcast. I'm Joe Kutner, your host for today. I'm an architect working on the Heroku and Salesforce platforms, and with me is Cornelia Davis. Today we're going to talk about cloud native, the cloud, cloud patterns, and hopefully today I'll learn if I am a cloud native developer. So Cornelia, welcome to the podcast. Can you tell us a little bit about yourself? Yeah, my name is Cornelia Davis. I am currently the CTO at Weaveworks. Just for those listeners who may not know it, you might actually know Weaveworks through a number of open source projects like Flux and Flagger and Weave Net and some of those things. But we are really in the cloud native operations space. Our CEO Alexis has coined the term GitOps. That's the space that I work in now. But I've spent the last decade or so working in cloud application platforms. Prior to joining Weaveworks at the beginning of this year, I was at Pivotal, where I worked on Cloud Foundry initially, helped bring that product to market, and worked with a lot of customers to make them successful in the cloud using a cloud native application platform. Then later on, more in the last four years or so, I have really focused on Kubernetes, which ironically I wouldn't call an application platform. I call it more of a cloud native infrastructure platform. And so kind of bringing those two worlds together. Very cool stuff. Can you define GitOps? That's a term you mentioned. Can you maybe give a formal definition and talk about what some of the implications are? Yeah, so I think that the simplest formal definition actually doesn't involve the word Git at all. It is cloud native.
That is the way that I think of it now. Let me draw an analogy there. One of the things I didn't mention in my intro is that I'm also the author of a book called Cloud Native Patterns, and that book is targeted at software developers and architects who are building these highly distributed, microservice-based applications, helping them understand all the patterns that you have to put in place to make these microservices-based apps work in the ever-changing environment that they run in. And I think as an industry we have really come to understand those cloud-native patterns from a software architecture perspective quite well. If you remember, a number of years ago Martin Fowler said that if you're going to do microservices, you have to be this tall to ride the ride, right? And so we figured out what those patterns were, too: circuit breakers and retry patterns and things like that. We have not, as an industry, figured out what it really means to do operations in this cloud setting, where everything is highly distributed and constantly changing. And that's really what GitOps is focused on. It's focused on a new paradigm for doing operations. Now, the fact that Git is in there... it's a snazzy name, and I like to say that GitOps is the central square on the buzzword bingo card these days. So it's kind of a snazzy term, but I'd like to emphasize the Ops part more than the Git part. Git does play a role, in that a couple of the key patterns are declarative configuration and having a versioned, immutable history of those declarative configurations, and Git happens to be a really good tool to do that. So the Git is really kind of hinting at one of these cloud-native operational patterns. But I think of GitOps as the whole broader spectrum of patterns that we use to do operations in the cloud-native setting.
Yeah, that makes sense. In your book you have a statement that I think is related to this, and I really like it: the cloud is where we are doing computing, and cloud native is how we're doing it. And so you consider GitOps, or those operational patterns, as part of that "how," right? Exactly. And it's more, again, on the how of how we keep these things running in production. How do we upgrade them? One of the patterns in the book actually starts to talk about different deployment approaches, like blue-green deployments or canary deployments. I talk about those things in the book at the concept level, but then I talk about the architectural patterns that need to be in place in your application to support those operational patterns. Progressive delivery is what I consider the operational patterns to be. And I've actually heard somebody once refer to the software architecture patterns that we have as designing for operations, and so it's designing for cloud-native operations. The day I read that statement, I had a doctor's appointment, and I was talking to my doctor and he asked, you know, "What do you do for a living? What kind of software?" And so I told him I worked in the cloud, and that led to him talking about where he stores his photos, and I sort of just let him go, like, "Yeah, sure." Not really the same thing. Yep. But I feel like that's going to help me articulate what I do on a day-to-day basis better, so thank you for that. My pleasure. There's such a thing as running in the cloud and not being cloud native. Are there anti-patterns that we're maybe falling into as traps? Yeah, I mean, that's absolutely a big part of what we've been doing as an industry: helping people understand that going to the cloud, going to the place of the cloud, doesn't mean that you're doing things in the cloud-native way. In fact, the first four words in chapter one of my book are:
"It's not Amazon's fault," and I start off by talking about an actual outage that Amazon had, and how there were a whole bunch of online properties, like IMDb and even Nest, that were down for longer than the AWS outage was. The outage was something like five hours, but by the time they recovered after the outage was resolved, they had been offline for six, seven, eight hours. But Netflix... I actually have a quote in there, where in a blog post they said, "Yeah, we suffered a brief availability blip." I mean, for them it was literally a shoulder shrug. And that's the difference between being cloud native and being in the cloud. Going back to the operations and the software architecture patterns: if we don't follow the software architecture patterns, there's a whole host of things that can go wrong, but even if we do that and don't adopt these new operational practices, then again we're going to be in a world of hurt. Because things change. Amazon is going to have that outage. They never promised you that they'd never have an outage or a region outage. That's why they give you multiple availability zones and multiple regions, and it's up to you to embrace those. And so, yeah, in the book I say, please don't use sticky sessions. When you do sticky sessions, what that does is bind your user experience to a particular node, and now you can't even apply some of these operational patterns. You can't do a rolling upgrade; you have to take a maintenance window, and you have to wait for things to drain away, and all of that stuff. And how do you know when a sticky session can be drained away? Because it's no longer an open socket; it's something else. There are lots of mistakes that we can make, and again, I think we're getting better at it with the architectural patterns, but we still have a ways to go on the operations side. Yeah.
I think sticky sessions are a great example of those anti-patterns, and I think we're definitely seeing the industry start to move away from them. In my mind that relates very much to the twelve-factor app, because when we would talk about the twelve-factor app and stateless processes, that was one of those things that we had to, as a community and as an industry, move ourselves forward from. I think there are some other examples that you talk about in your book, different patterns that are very much related. You have a great chapter talking about configuration and environment variables. Can you talk a little bit about the different types of configuration, and really how some of the ideas in your book have gone further than was originally stated in the twelve-factor app? So, on application configuration, what I think is super interesting is the relationship to life cycle, and that's a big part of what I did in that chapter. I think application life cycle and configuration were two adjoining chapters, if I remember correctly, and I did that on purpose because of the relationship between them. Now, even today, I still find what I consider application configuration deeply embedded inside a code base. You know, there's some value that's in there. We've gotten better at that; we've pulled it out and at least put it into a properties file. But then there's the question of whether the properties file is compiled into the binary that's distributed, or whether it's something that is added later. When you're using something like the Spring Framework, you can have other ways of doing that. Now, the twelve-factor app actually suggested: well, let's use something completely neutral. Let's not worry about whether it is a .properties file in Java, or something else for Ruby, or something else for C#, or something like that.
There's something that's uniform across all of these different environments, and that's environment variables. So let's use environment variables. What's cool about environment variables is that you can draw them from a number of different sources. You can use some of these operational patterns, something like GitOps for example, to deliver environment variables, or you could use some type of configuration service; there are a number of different ways that you can inject those things into environment variables. So environment variables are nice from that perspective, in that they allow you to say from within your code, "Look, I don't have to worry about the mechanism. I know that there's something that's ubiquitous across all of these environments." The relationship to application life cycle is really interesting, though. Let's say somebody comes along and wants to do credential rotation or something like that, and you change a value in an environment variable. What is the life cycle of picking up that environment variable from within the code? This is where the relationship to operational patterns is really interesting as well, and it ties back to the whole notion of twelve-factor apps and statelessness. Let's take a container-based way, something like Kubernetes. Say my thing is running in a container and I want to deliver new application configuration. This is actually orthogonal to whether it's an environment variable or not. I want to refresh the configuration. One of the ways that I can do it is I can just say, "You know what, I'll just get rid of that pod and I'll stand up a new one." It's stateless, and that allows us some flexibility. It allows us to have some application code that maybe isn't designed in such a way that you can kick it when there's new configuration and have it reinitialize itself.
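[Editor's illustration, not part of the conversation.] The configuration pattern described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions: the variable names `DB_URL`, `DB_PASSWORD`, and `POOL_SIZE` and the `AppConfig` type are hypothetical and do not come from the episode or the book. The point it shows is that configuration is read once, at start-up, from environment variables, so picking up a rotated credential means replacing the stateless process (or pod) rather than patching it in place.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class AppConfig:
    """Immutable snapshot of configuration taken at process start-up."""
    db_url: str
    db_password: str
    pool_size: int


def load_config(env=None):
    """Build the config from environment variables (hypothetical names).

    `env` defaults to os.environ; passing a plain dict makes the function
    easy to exercise without touching the real environment.
    """
    env = os.environ if env is None else env
    missing = [name for name in ("DB_URL", "DB_PASSWORD") if name not in env]
    if missing:
        # Fail fast so a misconfigured instance never starts serving.
        raise RuntimeError(f"missing required environment variables: {missing}")
    return AppConfig(
        db_url=env["DB_URL"],
        db_password=env["DB_PASSWORD"],
        pool_size=int(env.get("POOL_SIZE", "5")),
    )
```

Because `AppConfig` is frozen and built only at start-up, the code itself never has to know whether the values were injected by a GitOps pipeline, a configuration service, or a plain shell export.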
Maybe it can only do that on initialization. But now you've applied some of these other patterns, like statelessness, that allow me to have an operational pattern that I can use to refresh the configuration. I love that those ideas hearken back to the principles of the twelve-factor app. I think you framed it as the neutrality of environment variables, but that still leaves space for accommodating some of these new concerns. For example, as we talk about containers, and I think we'll start to see this on the Heroku platform in the future too, there are other mechanisms for providing secrets and things like that in a neutral way, which give you the flexibility to, for example, roll credentials without restarting the process. Which of course means that the process has to embrace that; it puts an onus on the process to be able to refresh without having to be rebooted. You have these three categories in your book, cloud-native apps, cloud-native data, and cloud-native interactions, and between apps and interactions I see a potential boundary where a platform like Heroku is able to handle certain interaction patterns for you, but I'm not sure if it's as clear-cut as all of the interaction patterns. So I'm curious if you have a better framework or definition for how to decompose that. Yeah, and what's interesting is that I have this very firm belief... like I said, I have about thirty or so patterns that I cover, and they're listed on the inside covers of the book, and I cover all of them. But honestly, I think that the ones the developer is responsible for implementing themselves are probably only a third of those, certainly not more than half, because they can leverage a number of other things for those implementations. Now, the platform, I think, is one of the places where you can have that, and you're right that there are a number of patterns there. But even before we go there: it's not only the platforms.
It's also the frameworks. One of the things that I did in the book was use the Spring Framework, and the Spring Framework implements many of those patterns, or provides an implementation. Now, it's up to you as the developer to know that the implementation exists in the framework, to include it, to configure it the right way, and all of that stuff. It's also, by the way, up to you as a developer to know exactly what the platform offers. That's why I say it's important for the developer to understand those thirty patterns, but they don't necessarily have to implement them. One of the things I find really interesting here is that there are some examples in the book where I use the Spring Framework to show the pattern in a concrete example. So I talk about the pattern, explain it, and then I use it in a concrete example that my readers can actually follow along with; they can pull it out of the GitHub repository. For some of those patterns, if I was writing that chapter today, as opposed to two or three years ago, I would probably have the platform do it. For example, some of those patterns are now implemented in Istio, or implemented in a service mesh. And so there's that kind of transition to even less responsibility on the developer to do the implementation and then test it themselves; they're actually now using a platform primitive to leverage that pattern, to make their software resilient and have certain characteristics in the way that it runs in the cloud. Does that reflect a sort of maturity in the industry, that these things are becoming more integral to the technologies we choose as part of our platform? I absolutely think so. I think it's an indication of the maturity of platforms in general.
I spend a lot of time with Kubernetes nowadays, and I think that the reason Kubernetes took off wasn't because it was the best container orchestration engine out there, but because it was a platform for building platforms. It's designed in such a way that you can add something like a service mesh, add something like Istio and Envoy, to the mix. With Cloud Foundry, for example, we had a number of services that we created for you that would tie in. That was where we were actually starting to take some of the things that Netflix had done. They had created Eureka, for example, and we offered that as a service on the platform, but it integrated best with one framework in particular, which is the Spring Framework, so there was still a tie there. And you're right. I hadn't even thought of that, but thinking about it, what we're seeing is this next level of maturity, where in fact I don't need to do anything on the code side. I take all of that that was in the framework and implement it in the sidecar, for example. But that does introduce some new patterns. Now your local code can just talk to localhost, and it doesn't have to worry about doing a service discovery protocol, because you've taken client-side load balancing and said, "Okay, I'm going to let the sidecar deal with that. I'm going to let Envoy deal with that," for example. So, going along with that, I think there is one thing in your book that I might disagree with, or at least in terms of its certainty. You made a statement that writing cloud-native software is complex, just flat-out complex. So I'm curious whether it has to be that way as these platforms mature and as we kick those old habits down the road.
Is it possible that a software developer who is not necessarily solving complex problems, but just trying to build something that's valuable to their business, can still have all the cloud-native characteristics, like resiliency and adapting to change, without the software and the apps being complex? Is that possible, or is it just inherently complex? Yeah, no, and you know what, we're not actually going to disagree on this. I guess it's a scoping question. As a whole, when you look at something like the Netflix application or the Facebook application, that is absolutely a complex system. It's got lots of moving parts, and the only way that we've been able to manage things like that, at the scale that Facebook and Google and those properties have reached, is because they are really good at understanding not only the software architecture patterns but the operational patterns. You know, the unicorns have figured out those cloud-native operational patterns. So as a whole, the complexity is there, but I think that we have achieved the right thing if somebody who's working on a component within that system isn't burdened with that complexity. And I think that is absolutely achievable. Have we achieved that completely? I would say probably not. I think in places we have. But again, going back to the Kubernetes space: today, developers are increasingly asked to not only worry about the code and the processes and the multithreading that they have within their single microservice, but also to understand what it looks like from a deployment perspective, and those types of things. I think that we haven't nailed that substrate yet, the one that makes it easy for services to be consumed into the complex whole. Maybe we're sixty percent of the way there in making the microservice developer's life a little bit easier.
So you mentioned microservices, and a lot of these patterns are directly related to microservices, whether it's service discovery or circuit breakers. Maybe this is a restating of the question I just asked, but is there room for monoliths in a cloud-native architecture and these cloud-native patterns? Yeah, I mean, that's a really great question. I think that monoliths in general break so many of the patterns that, right out of the gate, you're going to be burdened with problems, because you're now running on, let's say, a cloud-native platform, or in an environment like that, that assumes those patterns and does things based on the assumption of those patterns being there. Beyond that, there are definitely monoliths where the internal architecture is componentized. You've done a good job creating separate services, and maybe you're even doing statelessness. If I'm following these patterns, why am I not calling that a microservices architecture? Well, it has to do with the way that you bundle those things together. So there's this idea of a monolithic bundle: I've done microservices on the inside. That type of monolith will probably work reasonably well. It depends, again, on how those components are deployed and those types of things. But it isn't going to solve some of the other things that these more loosely coupled microservices architectures do, things like having independent release cycles, being able to do independent blue-green deployments, and being able to create bulkheads between the different microservices. And so I would say that the answer is... I mean, I am not a purist. I'm a scientist. I want to be able to take things that don't follow twelve factors, maybe they only follow four factors, and bring some of those things over. And whether they get this negative label of monolith or not, I'm very, very interested.
In bringing those into the fold and getting there incrementally. So in short, I think that there is room for some level of monolithic architectures, monolithic applications, in the cloud and starting to make their way into cloud native. And the reality is, I work a lot with large enterprises, and we can't just rebuild everything greenfield. We need to get there incrementally. Okay, so I'd like to come back to the twelve-factor app, because I just love it so much and I talk about it so much. One of the things that I think you do really well in your book is cover some topics that may have been missed in the twelve-factor app. As I often say, if we were going to write it again from scratch today, there'd be a whole bunch of things that we would add, and two of those are related: visibility, and then logging and metrics. In the visibility discussion you talk about health endpoints. Can you talk about why that's important and how it fits into the cloud-native patterns? Yes, so I love that you talked about visibility with respect to these probes. I'm going to use the term that Kubernetes uses: they have these health probes and health endpoints. Although you, as the developer of these services, are responsible for implementing them, what Kubernetes does is allow you to tap into those and do some interesting things with them. In fact, I think that's really the answer to your question. I've got a bunch of different components, and we talked about the fact that those components come together into a relatively complex whole. So there's a larger digital offering that is composed of all these little microservices, and it's one thing to know whether one piece of that is working fine. That is something where, from a monitoring perspective, you could just say, "Okay, well, I'm going to monitor these components." The health check goes further.
It allows us to actually start to get some behaviors where we stitch those things together. For example, let's say you want to define some behavior in a client microservice that is consuming another, downstream service. Going back to platforms, you have a system like Kubernetes that is constantly watching those health endpoints and can actually help the upstream service do something different based on the way that the downstream service is appearing. You used the term visibility, and it's not just visibility in terms of a dashboard. I think a lot of times when we hear visibility, we think, "Okay, well, I'm just going to bubble things up to a dashboard." But visibility is super important for being able to orchestrate things and automate things as well, and that's a big part of what I see these probes and health endpoints being about. I can tell you that ten years ago I didn't really think about programming health endpoints so that some automated robot could do something on my behalf depending on what that value is. But that's exactly what Kubernetes does. That's how Kubernetes knows: there's a health probe, and if I'm not getting back a response from that health probe, I'm going to throw away this pod and I'm going to create a new pod, a new container instance, for you. So that's the mind shift that we need to make as developers: to think about robots as the consumers, not just a dashboard somewhere. So the subtitle of your book is "Designing Change-Tolerant Software," and I really like how several times throughout the book you characterize cloud-native apps as being able to adjust to new conditions and adapt to change. I'm wondering if you can give some examples of the kind of change that you're talking about, maybe something that you've experienced.
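[Editor's illustration, not part of the conversation.] The health-endpoint idea discussed just before the question above can be sketched with nothing but the Python standard library. Everything named here is an assumption made for illustration: the `/healthz` path, the `HealthHandler` class, and the `probe` helper that stands in for the "robot" consumer. A real Kubernetes probe is configured on the pod rather than written in application code; its HTTP checks treat status codes from 200 up to, but not including, 400 as success.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    # Flipped to False by the application when, say, a dependency check fails.
    healthy = True

    def do_GET(self):
        if self.path == "/healthz":
            status = 200 if HealthHandler.healthy else 503
            body = b"ok" if status == 200 else b"unhealthy"
        else:
            status, body = 404, b"not found"
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep frequent probe traffic out of the request log


def start_health_server(port=0):
    """Serve the health endpoint on a background thread (port 0 = any free port)."""
    server = HTTPServer(("127.0.0.1", port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def probe(port):
    """Return the HTTP status code an automated prober would observe."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=2)
    try:
        conn.request("GET", "/healthz")
        return conn.getresponse().status
    finally:
        conn.close()
```

The consumer of the status code is an automated system rather than a person reading a dashboard, which is why the endpoint returns a bare status and a tiny body instead of a rich report.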
So, I mean, that is really what characterizes the cloud, more than just about anything else. It's highly distributed, and I've talked about that a few times: services all over the place, network latency, networks going in and out. And in fact, that network latency and networks going in and out kind of hints at that change. That's one of the many, many things that can change in the environment, and so change tolerance is partly there to say, "Hey, stop assuming that your infrastructure is going to be stable, because your infrastructure is going to change." We talked about the example of "It's not Amazon's fault." One of the things that I always say is, if, as an administrator or an operations person, you ever catch yourself thinking, "Okay, I'm going to run the script and then I'll be done," that "done" word is a bad word in the cloud, because you're never done. There's always some change that's happening. But there's also change that is intentional, change that you want to be able to enable. I'll give you a concrete example here. Ten or fifteen years ago, we used to think that the way we achieved security in our software systems was to go through a rigorous process of analyzing everything. We had the security office, and we had the change control people. This is part of what slowed us down significantly in being able to do deployments, but we did it.
It was something we were willing to put up with, because we felt that the way to be secure was to get everything set up in a secure way and then not allow it to change. As we've gotten better and better at the cloud, we've started to realize that there are some really interesting patterns, and that in fact security comes through constant change. For example, the way that we have historically tried to protect ourselves from malware is to have malware scanners, but the malware scanners depend on recognizing the signatures of malware: signatures that are coming across networks, or signatures that show up in, I don't know, compute profiles that start seeing spikes on some regular basis. Those are the things that we tried to look for. But what if instead we said, "You know what, we recognize the fact that we're never going to be able to see all those signatures. There's going to be some clever hacker who gets malware in there that is undetectable. What if we just throw away that container instance every single day?" So now the malware maybe gets in there, but it doesn't live there for six months collecting credit card numbers, right? That intentional change allows us a completely different security posture, but it requires that we build our software in such a way that it can tolerate that changing condition, the changing condition being, "Hey, I'm just going to reboot you." Well, thank you for joining us today, Cornelia. It's really been a pleasure to talk. If you are interested in learning more about cloud-native patterns and cloud-native applications, we'll include a link to your book, Cloud Native Patterns, from Manning, in the show notes. Thanks again. Thanks so much for having me. It's been a real delight. Thanks for joining us for this episode of the Code[ish] podcast. Code[ish] is produced by Heroku, the easiest way to deploy, manage, and scale your applications in the cloud.
If you'd like to learn more about Code[ish] or any of Heroku's podcasts, please visit heroku.com/podcast.

Espionage phishing in unfamiliar places. OT vulnerabilities. LemonDuck's rising fortunes. Data exposure. Kubernetes advice from NSA and CISA. Meng Wanzhou's extradition.

The CyberWire

30:30 min | 2 months ago

"Now a message from our sponsor cyber reason cybersecurity defenders. Don't fear ransomware. They end it with cyber reason defenders detect and stop ransomware that even others miss promise backed by their one million dollar breach warranty. At cyber reason. They don't fear rent somewhere. They end it. Learn more at cyber reason. Dot com abt thirty-one cast its net into some waters. That aren't yet fished out vulnerabilities in the niece stacked. Tcp stack are reported. Lemon duck may be outgrowing. Its beginnings as crypto. Jacking botnets a large marketing. Database is found exposed an essay and cisa offer advice on securing cooper netease clusters adam derra from zero fox checks in from the floor at black hat. Our guests are nick filling ham and natalia. Delia from microsoft's security unlocked podcast. David differ from web route on the hidden costs ransomware and while way. Cfo returns to court as her extradition hearings. Enter their endgame from cyber wire studios at data. Try by dave bittner with your cyber wires summary for wednesday august fourth twenty twenty one positive technologies the moscow based security company with operations in multiple countries late yesterday reported widespread activity by a pt thirty-one also known as zirconium judgment panned up and hurricane panda a chinese cyber espionage group usually associated with collection against governments in pursuit of beijing's strategic goals between january and july of this year the campaign used phishing emails to prospect targets. In mongolia canada bellarusse the united states and unusually russia positive technologies close to the russian government and participate in the gossip information sharing system that russia's cert- overseas intends to keep rushing organizations in particular apprised of abt thirty-one activities. The company believes this marks hurricane panda's first significant effort against russian targets. 
It also expects the activity to continue, at least over the near term. Since Hurricane Panda's typical approach has been through phishing emails, the usual cautions about proper suspicion and skepticism with respect to the stuff that shows up in your inbox would apply. Security firm Forescout and security research shop JFrog this morning disclosed their discovery of fourteen vulnerabilities in the NicheStack TCP/IP stack, widely used in OT and industrial IoT environments. The vulnerabilities could be exploited for remote code execution, denial of service, information theft, TCP spoofing, or DNS cache poisoning. Recommended mitigations include prompt application of patches when they're available, network segmentation, and blocking unused protocols. Forescout sensibly acknowledges the difficulty of patching operational systems, with their mission criticality and multiple dependencies, and offers a range of things organizations can do until they're able to apply available fixes. The LemonDuck botnet, once known as a small-potatoes cryptojacking operation, has outgrown its origins. The Record reports it's become massive and is showing signs of expanding its capabilities to include hands-on-keyboard intrusions into hacked networks. This suggests a possible move into ransomware or destructive attacks in the near future. Researchers at security firm Guardicore first described LemonDuck in 2019, and Microsoft within the past two weeks devoted a two-part series to LemonDuck and LemonCat. As is usually the case, the bad actors go by many names; it would be convenient to simply call them Legion. LemonDuck is now a cross-platform threat, infesting both Windows and Linux systems, and it also operates as a loader. We disclose again that Microsoft is a CyberWire partner. The Guardicore malware analyst
Ophir Harpaz, who first noticed LemonDuck back in the day, told The Record that it began as a classic spray-and-pray cryptojacker, but even in its early stages LemonDuck, while small, seemed to be serious about its business and determined to build for the future. They showed strong technical chops. For one thing, quote, "their multi-stage PowerShell scripts were more complex and obfuscated than others, and they already made extensive use of open source tools for code execution and infection," end quote. And some of the features Microsoft called out were there from the get-go: credential theft, removal of security controls, and lateral movement were all there from the very start. So for now, while LemonDuck remains a mining operation, we may be seeing an incipient entrant into the criminal ransomware-as-a-service sector. The annual Black Hat conference is officially underway in Las Vegas, albeit with lighter crowds, as many have chosen to sit this one out thanks to COVID. I checked in with ZeroFox's Adam Darrah from the Black Hat show floor to get his sense for how it's going. We anticipated the same thing everybody else was anticipating. You know, we were watching the news closely on what Black Hat had in mind as far as rules, regulations, best practices. And I will say that they're doing a great job so far. You know, people are being courteous. People are being kind, respectful of, you know, maybe not wanting to be close, shake hands and stuff. But in the run-up to it all, at the end of the day, we expected a lot fewer people to show up. I mean, some vendors, pretty major vendors, we had heard, pulled out, and judging by the floor right now, you can definitely tell. It's definitely been tamed.
a bit as far as vendor participation and even, like, user participation. But, you know, we just decided that it would still be worth our time and our efforts to be safe, to be reasonable, and to give people opportunities to meet with us in person, you know, because those relationships matter. And I think people are excited to meet with each other face to face in as reasonably safe a manner as possible. So we just went for it.

I've heard folks say that when you have a year like this, where attendance is down, it might not actually be such a bad thing, because you get to spend more time with the folks who are interested in having a substantive conversation. You can actually step aside and have the time you need to make those things happen.

Yeah, so I happen to agree with that. You definitely don't want people to get the impression that you're not caring, you're not attentive to what they're doing. Like, you're sitting in a booth or walking down the hallway, you see somebody that you know, you definitely want to give them the time they deserve. So this year definitely will afford us that opportunity. However, in the opening hours we are still seeing quite a rush, so we will see if that dies down as the days continue. But I happen to agree with you. I really love, I prefer taking the time one on one to be thoughtful with my answers, to be substantively accurate with my answers, and make sure we're resolving the concern or seeing things through to the end. So yeah, that's definitely the vibe this year, I think, so far.

What about beyond the show itself? A big part of events like this is being able to get together with friends and colleagues you don't get to see very often. Are those sorts of things still happening?

Wow, that's a very, yes. So those things are happening. Based on just my personal preferences, I find it quite therapeutic to be back in, talking to people, shaking hands, giving hugs, high fives, elbow high fives.
Whatever people are comfortable with. And it definitely provides an added layer of trust. In the security business, I think trust is paramount. Mutual trust and respect are paramount, and being able to reestablish that in person, face to face, talking, just all those things are great, and they are happening outside of the venue itself, which is really refreshing to see.

That's Adam Darrah from ZeroFox.

vpnMentor reports finding an unsecured database maintained by business-to-business marketing firm OneMoreLead. The database included personal data on between 63 million and 126 million people in the US. OneMoreLead secured the data when vpnMentor contacted them. How the data were collected in the first place remains unclear, and vpnMentor speculates about possible connections to earlier incidents involving other marketing outfits.

NSA and CISA issued joint guidance on Kubernetes configurations, intended to help organizations build and maintain secure Kubernetes clusters. The two agencies explain, quote: Kubernetes is an open-source system that automates the deployment, scaling, and management of applications run in containers. Kubernetes clusters are often hosted in a cloud environment and provide increased flexibility from traditional software platforms. The report details recommendations to harden Kubernetes systems. Primary actions include the scanning of containers and pods for vulnerabilities or misconfigurations, running containers and pods with the least privileges possible, and using network separation, firewalls, strong authentication, and log auditing. End quote. The advisory also details the reasons threat actors are interested in Kubernetes. Quote: Kubernetes is commonly targeted for three reasons:
data theft, computational power theft, or denial of service. Data theft is traditionally the primary motivation; however, cyber actors may attempt to use Kubernetes to harness a network's underlying infrastructure for computational power for purposes such as cryptocurrency mining. End quote.

And finally, the extradition hearing for Huawei CFO Meng Wanzhou is entering its final stages out in Vancouver, where Canadian authorities are considering whether to honor the US request that she be extradited to face charges related to alleged illegal Huawei trade with Iran. She's been in Vancouver since she was detained on a US request in December of 2018. Bloomberg says that, if you bet on form, the odds of Canada sending her south to the US are about one hundred to one in favor of extradition. The case involves some murky financing Huawei is said to have arranged with bankers at HSBC involving a subsidiary or partner, their relationship was obscure, Skycom. Skycom is said to have tried to sell HP equipment to a service provider in Iran, which would constitute a violation of US sanctions on Tehran. Meng is alleged to have lied about Skycom's true relationship to Huawei. An essay in Light Reading, while not particularly friendly to Huawei or blind to the questionable aspects of the company's operations that have brought it hostile US security regulation, thinks the prosecution of Meng looks at this point vindictive, especially since she's been stuck in Vancouver, effectively under house arrest, for more than two years. And given the reach and effectiveness of US sanctions on Huawei, if Meng's prosecution is intended as a further measure against the company, it seems to amount to making the rubble jump. In any event, the case is nearing its conclusion and should be decided soon. The latest round of hearings began today.

And now a word from our sponsor, ExtraHop: stopping advanced threats with network detection and response. Let's face it, cyber attackers have the advantage.
ExtraHop is on a mission to help you take back, regain the upper hand with security that can't be undermined, outsmarted, or compromised. With complete visibility from ExtraHop, enterprises can detect malicious behavior, hunt advanced threats, and forensically investigate any incident with confidence. When you don't have to choose between protecting your business and moving it forward, that's security uncompromised. See how it works in the full product demo, free online at extrahop.com/cyber. That's extra H-O-P dot com slash cyber. And we thank ExtraHop for sponsoring our show.

We here at the CyberWire are very pleased to announce that another Microsoft cybersecurity podcast is joining the CyberWire podcast network. The show is called Microsoft Security Unlocked, and it's hosted by Natalia Godyla and Nic Fillingham, who join me with a preview of what to expect.

We're very fortunate. You know, there's literally thousands of people at Microsoft working on security, be it, you know, building AI, building product, and actually protecting customers. And so we are in a very fortunate position that we can send them an email and say, hey, we've got this little podcast and we think you're doing some cool stuff. Can we talk to you about it? And try and bring to light some of the, you know, the great new techniques and research that's being uncovered on a daily basis. It's a very fun job. Natalia and I are very fortunate, and we're very much enjoying the podcast.

Other than just having the massive Rolodex, we also are fortunate to have so many eager new guests. The Microsoft security folks are so excited to share the work that they're doing, so you can feel that energy on the show. And it's also just awesome to continuously have new guests who want to come on and share the work that they're doing. It really speaks to that mission-driven approach to security.

As the co-hosts and also the producers of this podcast, we really do wanna make sure that we
aren't, you know, just talking about Microsoft and Microsoft products. We actually try not to say the word Microsoft on the podcast, or the names of the products, because that's not what this is about. This is about bringing to light the work that really, really talented and experienced people, dedicated folks, you know, at Microsoft really across the globe are doing to protect, obviously, ourselves and our customers, but also really trying to make cyberspace sort of a safer place. Some of the more recent episodes we did: a very recent episode was about how do you have cybersecurity conversations with business partners that have no idea what cybersecurity is. So that wasn't a technical discussion at all. It was really about how do you talk to people that don't really understand your domain. And then we've also dived into the nuts and bolts of the Rust programming language, and we've looked at how to secure firmware. We've really gone up and down the stack. We cover a very wide range of topics.

Natalia, I'm curious. You all are a few dozen episodes in now. What is the value proposition that you think the two of you bring to the table? Do you each, as co-hosts, bring a different perspective to the program?

I don't know about perspective, but I do think that we tend to ask different questions, which is great. We complement each other in that way. I'm going to speak for both of us, Nic, but you can correct me. I think we're both really interested in the cybersecurity domain. We have that inherent passion, and we're both very curious, and so we come to these episodes and speak to our guests with that perspective and mindset, just eager to find out what they're doing and eager to unlock that for, I used "unlocked," look at that, for the audience.

One thing I'll add is I'm not a security professional. That's not my background. And I've been at Microsoft a long time, sort of been in the technical space for a long time,
but I don't come from a professional cybersecurity background, and so I actually use that, I hope, to the benefit of the audience. I hopefully get to ask some questions that maybe sometimes don't get asked because they're thought of as, you know, sort of table stakes. So we do revisit a lot of those fundamentals, and I hope that the audience appreciates that, because we will from time to time come back and say, you know what, that's sort of a buzzy word that we've used a lot there. Let's just sort of revisit what that means and wrap our head around that concept. So, you know, I think we're forty episodes in on this one, so we are starting to understand the space, but we're also bringing to it that sort of fresh perspective of people that, you know, want to make sure that we're not glossing over a concept or an idea or a technique that may not be familiar to everybody.

Now, Nic, just for a point of clarification here. I mean, previously joining our CyberWire network was Microsoft Security Unlocked: CISO Series. This is Microsoft Security Unlocked. It's a bit of challenging branding differentiation there. Can you help us understand the difference between the two shows, so that people aren't confused and know why they should tune in to this one?

Thanks. We'll get on trying to create some clarity. We might need to revisit those. But yeah, there are two podcasts. The first one is Security Unlocked. Natalia and I co-host a weekly podcast. We've been going for about forty episodes now, and that's where we have conversations with anyone and everyone at Microsoft working on security, and we will cover a really wide range of topics based on what's going on. Security Unlocked: CISO Series with Bret, that actually came to the CyberWire earlier, a couple of months back, and that is with Microsoft's chief information security officer, Bret Arsenault. We have been pestering Bret for years to allow us to create a podcast with him.
Yeah, he has the ultimate Rolodex, and so his podcast comes out every two weeks, and that's him having conversations with his security leader colleagues at Microsoft, but also some of the CISOs of, you know, the biggest and most interesting companies out there. Think TikTok, Lululemon, telcos, you name it. He knows them all, and that's what's happening on his podcast. I would say to CyberWire listeners, you should really subscribe to both, listen to both, but they are different podcasts. One is weekly, that's Natalia and myself, and then the other comes out every two weeks, where he chats to other CISOs.

I have to say, for our listeners who may not have yet checked out Security Unlocked, there is a tremendous amount of energy and a real sense of curiosity there that I think is contagious. And one of the things I like about it is that there's something for everyone. You can be someone who's just starting out on their journey, or someone who's a seasoned pro who's been at this for a while, and the spectrum of things that you all cover, as you say, is so wide, everybody can get something out of it. It's time well spent.

That's Natalia Godyla and Nic Fillingham. They are co-hosts of the Microsoft Security Unlocked podcast. You can find it on our website, thecyberwire.com, or wherever the fine podcasts are listed.

It's time to take a moment to tell you about our sponsor, Tanium. Today, no industry is exempt from the growing threat of ransomware. Ransomware attacks against critical infrastructure, private companies, and municipalities are alarmingly more frequent and pervasive in 2021. IT and security leaders must act. Understanding network assets and their status is the first step to reducing an organization's attack surface, improving cyber resilience, and accelerating incident response. Check your environment today with Tanium's free cyber hygiene assessment. Visit tanium.com/cyber-hygiene-assessment. And we thank Tanium for sponsoring our show.

Pleased to be joined once again by David Dufour. He's the vice president of engineering and cybersecurity at Webroot. David, it's always great to have you back. You know, we've been seeing a lot in the news, obviously, about ransomware, certainly a hot topic. I want to touch base with you today about some of the things that are kind of running below the surface, some of those hidden costs that folks don't always think of when it comes to ransomware. What can you share with us today?

Hi, David. So yes, it is in the news everywhere. First of all, great to be back on the show. But yeah, you know, we think about paying the ransom, we think about the folks who maybe are not able to deliver solutions or do business when they've been affected by ransomware, but there are a lot of other costs behind the scenes, some of them tangible, some of them intangible, that I think a lot of people need to think about.

Let's go through some of them together.

Well, operationally, one of the first things you have to think of is how much it's gonna cost you to get back up and running. And that's not just "I have to restore some computers." You know, it could be systems that went down hard, maybe affected directly by it. How are you going to bring them back online? These large industrial systems, you don't just flip a power switch and turn them back on. You know, you don't reboot them like a PC or something. There's a lot of effort operationally in bringing large industrial systems online, and that's something people aren't thinking about.

What other things are you thinking of here?

Well, there's the brand reputation. I mean, you and I, you can't really hurt our brand.

'Cause it can't go lower than zero.

That's exactly right, so we don't worry about that. But you know, there's a lot of real companies out there where this brand reputation is a big deal. And one of the things we say tongue in cheek is
it's always nice to be the security guy at the competitor of the company that got hacked, because all of a sudden you're going to get a lot of money, the "you didn't get hacked" budget. But your company doesn't want your brand to go bad. So when that happens in your industry, and it's one hop over, you know, that's when people start paying attention and saying, you know, this really does affect our brand, and we've got to keep our reputation strong.

Right, right. "Well, what do I have to do, security person, to keep that from happening to us?"

That's exactly right. And again, you might see it happen in healthcare, but recently here, if you're in oil and gas, you're like, well, we're not healthcare, we don't care about that. But I promise you, everyone who was a competitor of JBS, their security people got a bump in their annual budget.

Yeah, that's interesting. Any other ones that come to mind for you?

You know, it's just a general shutdown of business. A lot of times people stop and say, you know, here's the cost if we want to recover from ransomware, but they don't look at the bigger picture. And if you can somehow factor in that larger picture across your organization, it becomes a lot more cost-efficient to be prepared for a ransomware attack, and that's easy to take to your senior management, your board, and justify the cost.

What about the emotional impact to a company, to have, I don't know, the sense of violation? It seems like it's hard to put a dollar sign on that.

That's something I haven't thought a lot about, because usually we're in the middle of it, trying to recover from it. But you're absolutely right. And not only that, you're wondering, will this happen again? Did I get everything? And so you're spending a lot of energy and a lot of cycles trying to make sure that you've done everything you can to prevent it, and your folks are wondering, could it happen again?

What sort of advice do you have for folks to make sure that they've got these
things covered?

Well, back in the stone ages, David, back in the eighties and nineties when I first started in this industry, we spent so much time protecting against environmental disasters. We'd have multiple setups, there was no cloud, and we would spend a lot of time testing failovers, testing recoveries. And people just have lost sight of that. They don't spend the time that we used to. I guess when you spend twenty million dollars on a computer in the eighties, you're going to take the time to verify that it will roll over. But now things have gotten so grand but less expensive that we just assume failover. So you need to take that time to ensure you can recover from things.

Yeah, it's interesting, because it strikes me that so many people, they cut these corners because they think it may give them some sort of competitive advantage, and maybe they're just playing the odds, whistling past the graveyard: it's not going to happen to us. But then when it does, boy, can that sure seem to be short-sighted.

You've nailed it, because it is short-sighted. And if you get away with it, then I guess it's okay. But I think somehow, as an industry, we talk about this a lot, but how do we get folks to consider, you know, that their posture, their defensive mechanisms that are in place really protect this company, and it's a cost of being a good company with a good reputation, so you wanna do it? Rather than the stockholders, you know, if it's a public company, always wanting your EBITDA, that you're hitting your margins, and all that. Like, how do you add that value and convince people how critical it is? Yeah, I did not answer your question. I put it out there. I don't know what we gotta do.

No, I mean, it's not an easy question. But certainly, if you're the person standing in front of the board of directors and saying, boy, you know, we were just crossing our fingers and hoping we'd be lucky, that's a hard conversation to face.
That's exactly right. And then you don't want to be the board where the security guy has a bunch of "I told you so" emails that said, I tried to bring this up but you wouldn't listen. You know what I mean? So we gotta figure out some way to make this an equitable thing that people value, so it actually adds value to an organization's bottom line, a monetary value, as much as this is a reassurance.

All right. Well, David Dufour, thanks for joining us.

Great, David.

Thanks to all of our sponsors for making the CyberWire possible. Find out more about sponsoring our programs at thecyberwire.com/sponsor.

And that's the CyberWire. For links to all of today's stories, check out our daily briefing at thecyberwire.com. The CyberWire podcast is proudly produced in Maryland out of the startup studios of DataTribe, where they're building the next generation of cybersecurity teams and technologies. Our amazing CyberWire team is Tre Hester, Elliott Peltzman, Puru Prakash, Justin Sabie, Tim Nodar, Joe Carrigan, Carole Theriault, Ben Yelin, Nick Veliky, Gina Johnson, Bennett Moe, Chris Russell, John Petrik, Jennifer Eiben, Rick Howard, Peter Kilpe, and I'm Dave Bittner. Thanks for listening. We'll see you back here tomorrow.

And now a word from our sponsor, Verizon. Mitigate the risks and realize the benefits of digital transformation with the help of Verizon, a leader in cybersecurity managed and professional services for nearly two decades. From secure cloud computing solutions to advanced detection and response capabilities, Verizon helps secure data, networks, and infrastructure of many of the world's best-known organizations. Their annual Data Breach Investigations Report is considered the gold standard of cybercrime research, and Verizon's leadership in network, wireless, and IoT connectivity makes it uniquely capable of protecting the ever-expanding attack surface. Let Verizon help you optimize your defenses and achieve the maximum return on your security investments.
Learn more at verizonenterprise.com/products/security.

Episode 182  Hands-On Kubernetes on Azure with Nills Franssens

Microsoft Cloud IT Pro Podcast

37:18 min | 1 year ago

Episode 182 Hands-On Kubernetes on Azure with Nills Franssens

"Welcome to episode 182 of the Microsoft Cloud IT Pro Podcast, recorded live on June 12th, 2020. This is a show about Microsoft 365 and Azure from the perspective of IT pros and end users, where we discuss a topic or recent news and how it relates to you. In this episode, Ben and Scott hop on a call with Nills Franssens, senior cloud solution architect at Microsoft and the author of Hands-On Kubernetes on Azure, which is currently available for free on azure.com.

We have another interview episode for everyone today. I'm kind of excited to go through this one. So we have a guest who I've done some work with in the past through a couple of Microsoft projects that I've been involved with and some community programs over there, like hack. And he just recently wrote a book too, which, having participated in and written some technical materials before, I very, very much feel the pain, and we can kind of commiserate about that if you want to as well. But Nills, why don't you go ahead and introduce yourself here real quick?

My name's Nills. I'm a senior cloud solution architect with Microsoft in California, and as Scott said, I recently wrote and published a book on running Kubernetes on Azure, which was a very fun experience, writing the book. Outside of Kubernetes, my main area of expertise within Azure is everything related to infrastructure, networking, storage, and the general automation of the platform. I'm actually very happy to be here and talk to Ben and Scott, because I'm a listener of the podcast as well.

All right, well, good to hear. Did you actually say you had fun writing the book? I think you're the first person that I've ever talked to that wrote a technical book that actually said it was fun.

There definitely were periods where it was not as fun, but I think the overall experience was a good experience. It took a lot of time and a lot of nights and weekends,
because the one thing that I didn't realize when I signed up was, I knew Microsoft had moonlighting policies, like I couldn't do anything during business hours, but what I didn't realize was exactly how much time was required to write the book. But if I look back on the process, fun might not have been the right word, but it's actually a really nice process that I went through. It's fun to actually have the physical book in my hands right now, and I learned a lot during the process, both technically and about becoming a better writer.

I can see that. So I've never written a book. I know Scott has helped on one. But I think from my perspective it would be interesting to go through the process. I feel like you'd learn a lot, because obviously you want the book to be correct, and you want to really cover everything. And I feel like once you actually had that book in your hand, like you said, it would be really rewarding, knowing that you went through all of that.

Yes, I think you're absolutely correct. You learn a ton, and I think when you're writing a book, you actually need to go back and take a beginner's mindset about the topic that you're writing about. Because once you're dealing with something, and you've been working with something for a couple of years, certain basic things you don't even consider. You don't even think about what's underneath those basics. And then once you actually start writing a book, you need to go back to basics and figure things out. And what I found important was using the right words and the right terminology to describe certain things. Just as a stupid example: in Kubernetes there's this thing called ingress, used to do some layer 7 load balancing, basically to do routing based on the host name that you set. And there's an ingress and an ingress controller. I always threw both terms together, and I didn't
even, I didn't even realize what the difference was between ingress and ingress controller. I just mixed the words. And when writing the book, I was like, I'm writing this, I actually want to use the correct terminology. That was some of the things that I researched and learned, because when you write a book, you actually want to be factually correct. You don't just want it to work, you actually want to be correct in what you say, if you're a good author.

Yes, there you go. So I make that mistake all the time. I tend to use those two synonymously as well. It's not something that I would very much think about along the way, 'cause usually once you're down to the nitty-gritty, you're like, what do I actually need to do? I need to set up a deployment for an ingress controller so I can actually get that software, that construct, whatever that thing is, out there so it can start routing traffic for me.

Now, one of the things that helped as well for me when I was writing this was I had somebody from the publisher. The book isn't self-published; I worked with Packt as a publisher. And I don't know what the actual term is for the person, but I had somebody working with me who reviewed everything that I wrote, and she wasn't a Kubernetes expert, which was actually perfect, because when I wrote something and something didn't make sense to her, she could actually ask clarifying questions, which I then had to research myself. Some things just seem so common sense when you're dealing with something for a couple of years that you don't realize, and having somebody working with you who actually points out to you, hey, why do you use this term, that added a lot in just growing my own knowledge and sharpening my own knowledge.

You should get on board that train. I am not a Kubernetes guy at all.

Well, probably. So does this book take it from
let's say, someone of my level, who literally knows just a little bit about Kubernetes from what Scott and I have talked about on the podcast, to someone at your level, where you're like a complete, total expert in Kubernetes? Or should you have some Kubernetes experience going into this? Kind of what is the level of the book, and what is, I guess, the journey or the path that the book will take you through as you work through it or read through it?

It doesn't require you to have prior knowledge of Kubernetes itself. The main focus of the book is hands-on Kubernetes on Azure, so the book is very practical, with a lot of examples in it. And if I think about how we laid it out, there's three sections in the book itself. One is just simple basics, where we cover what's Docker, what are containers, what's this thing Kubernetes and why do we even need it, to setting up the cluster, which are like the absolute basics. Then there's a second section, which covers more of the Kubernetes constructs that you need to know, like deploying pods, doing some ingress, or how you could potentially secure certain things. So that's a section which focuses more on Kubernetes itself, and it touches on Kubernetes on Azure in a couple of places, for instance with the cluster autoscaler, which needs to interact a little with Azure. But that section, I think ninety percent of that section you could do on an Azure Kubernetes cluster as well as on a Kubernetes cluster anywhere, let's say you run it on your own laptop, or even on GCP, theoretically. Yes, I think ninety percent of the content is pretty neutral. And then there's a third section of the book, where we actually go over some more of the in-depth Azure integrations, where we describe how you can integrate with some Azure services. There's a chapter on Event Hubs. There's a chapter on MySQL databases.
There's another chapter where we actually run Azure Functions with KEDA, which is a pretty new project, Azure Functions with KEDA on a Kubernetes cluster. You don't have to have any prior knowledge if you want to start reading the book, and if you actually have no prior knowledge, it's a good way to get you started, from like level 100 to level 300. The book itself doesn't make you an in-depth expert, because there's too much in the Kubernetes ecosystem to fit in one book, I believe. But I think if you have no knowledge of it, it's a good way to get started, to get a good understanding of what it takes to build and run applications on Kubernetes.

As IT professionals in the cloud era, sometimes it feels like we don't speak the same language as the rest of the organization. So when stakeholders from finance or other departments start asking about a specific project team's Azure costs, they don't always realize how much work is involved in obtaining that information: sifting through cluttered CSVs and a complex mass of metadata in order to manually create custom views and reports. It's a real headache. On top of helping you understand and reduce your organization's overall Azure spend, ShareGate Overcast lets you group resources into meaningful cost hubs and map them to real-world business scenarios. This way you can track costs in the way that makes most sense with your corporate structure, whether it's by product, business unit, team, or otherwise. It's a flexible, intuitive, and business-friendly way of tracking Azure infrastructure costs, and it's only available in ShareGate Overcast. Find out more at sharegate.com/itpro.

Yeah, I think there's a lot that goes into Kubernetes and that ecosystem, and then there's AKS, which is Kubernetes, kind of sort of, but you've got this entire management plane on top of the regular management plane that's sitting there
with Kubernetes itself as a construct. Like, last week I was doing a boot camp for partners and delivering infrastructure sessions on AKS, and we got down into the weeds on something. It was taints and tolerations, so, let's go through and set up new node pools and talk about ways that we can constrict certain types of compute, or pod specs, just to those node pools. And here's how you would do it in Kubernetes land, and by the way, over here in AKS it's gotta be a little bit different, because we have to drive through the management plane, through ARM, to set the taint on a node. I can't just go into the cluster and do it myself there. So there's just little one-offs like that. So it's worth it, totally, to understand Kubernetes, and then once you're into it, you come out the other side and you go, okay, so now how do I operationalize all this and make it work on top of Azure? Which is a really interesting and fun exercise. Like you mentioned, maybe talking to a database like MySQL, or you want to run like Cosmos DB, or get out to a function, things like that. Now you've got compute that has to interact with virtual networks and other parts of the plane to get out and do what it needs to do, and that's where it starts to get really kind of fun, because it breaks quickly if you're not too familiar with Kubernetes and AKS.

That's a very common confusion: what do I need to do against the Kubernetes API, meaning what do I do using kubectl, "cube control," "cube cuddle," whatever you wanna call it, and what do I actually do against the ARM API? When do I do a kubectl command, and when do I do an az aks command? Because certain things, like if you think about autoscaling, which is something we describe in the book itself, there's two axes to autoscaling. One is autoscaling your application itself, for which you would use horizontal pod autoscaling, which you configure in Kubernetes using kubectl.
To autoscale your cluster, you actually run an az command, because the cluster autoscaler is something that Azure configures on your behalf. So those little nuances; Scott, you're pretty much correct that there are little nuances left and right in how you have to do certain things. One of the interesting things about AKS, and I've found this over time working with it, is that it really drives you towards better practices around deployments in general. You end up with a lot of these features, like you mentioned the cluster autoscaler, and, you know, sometimes there are features that aren't even going to be enabled in my cluster unless I turn them on the very first time that I spin that cluster up. So quite often you end up tearing things down and standing other things back up, and it helps to understand that whole ecosystem of what needs to be there when I start. Like you mentioned ingress controllers: hey, it might actually help me to have my ingress controller live in some resource that can sit side by side with my cluster, like an App Gateway or something like that, and start to externalize some of that infrastructure and make it a little bit easier for me to do migrations and swings and all the random stuff that comes with AKS, because you're like, oh, it's Kubernetes, until it's not. And then you start going down the path. I don't know if you've ever run into anybody; do you get into AKS Engine at all in your book, or do you focus mostly on just the regular, vanilla Kubernetes offering? We describe it in like one paragraph, just to explain what AKS Engine is, but we don't touch it at all in the book. And based on my experience, and it's pretty strange, I thought that AKS Engine was a lot more used, but the customers that I deal with either run AKS itself, or they are very brave and they just run their own clusters. At least the customers that I work with; I don't know if you've seen other things. I've seen one or two.
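The application-side axis of autoscaling described above can be illustrated with the replica calculation that the Kubernetes documentation gives for the horizontal pod autoscaler: desired replicas = ceil(current replicas × current metric / target metric). The sketch below is a simplified illustration only. The function name and the min/max bounds are made up for this example, and it leaves out the tolerance band and stabilization behavior of the real controller. The cluster-side axis (adding or removing nodes) is configured through Azure, as the speakers note, and is not modeled here.

```python
import math


def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     min_replicas: int = 1,
                     max_replicas: int = 10) -> int:
    """Simplified horizontal-pod-autoscaler calculation (illustrative only)."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    # Scale the replica count by the ratio of observed metric to target metric.
    raw = math.ceil(current_replicas * (current_metric / target_metric))
    # Clamp to the configured bounds (hypothetical defaults for this sketch).
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # Four pods averaging 90% CPU against a 60% target utilization.
    print(desired_replicas(4, 90.0, 60.0))
```

With these inputs the ratio is 1.5, so the sketch asks for more pods than are currently running; a ratio below 1 would shrink the deployment, bounded by the minimum.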
I haven't seen anybody who uses AKS Engine. I did have some customers way back when who used ACS Engine, when that used to be a thing, but I think it's less and less of a driving force today. Most people have settled on Moby or Docker as a container runtime, so it's not a big deal to come in and say, okay, AKS, this is what you get out of the box. So unless you're driving into something like, you know, you don't want to do Ubuntu 16.04 or 18.04 for your Linux nodes, then you've got to go down this path. And once you start going down the other path, you can say AKS Engine, or, like you said, I can just spin it up myself. At that point you might want to spin it up yourself, because you probably have the expertise with Kubernetes to make that happen, and you understand how to operate it and keep it healthy after that. And if you don't have the knowledge to operate it yourself, you definitely don't want to use AKS Engine. AKS abstracts so many things for you, but with AKS Engine, that's not the managed service. You just get a template that deploys a cluster for you, and once the cluster is there, you're on your own. If you don't know how to operate it, you're going to have a hard time, because Kubernetes is very finicky. Do you feel overwhelmed by trying to manage your Office 365 environment? Are you facing unexpected issues that disrupt your company's productivity? Intelligink is here to help. Much like you take your car to the mechanic that has specialized knowledge on how to best keep your car running, Intelligink helps you with your Microsoft cloud environment, because that's their expertise. Intelligink keeps up with the latest updates in the Microsoft cloud to help keep your business running smoothly and ahead of the curve. From a small organization with just a few users up to an organization of several thousand employees.
They want to partner with you to implement and administer your Microsoft technology. Visit them at intelligink.com/podcast. That's I-N-T-E-L-L-I-G-I-N-K dot com slash podcast, for more information or to schedule a thirty-minute call to get started with them today. Remember, Intelligink focuses on the Microsoft cloud so you can focus on your business. I think it's one of those potential things you run into, and you might run into it even with certain pieces of software inside of IaaS. Like, Ben does a lot with SharePoint, so I'm not going to be able to run my SharePoint farms inside of VM scale sets, because the farm's not going to like new servers coming up and coming down and not being domain joined. It's just not a thing that's going to be very happy. So you end up back in IaaS, and you still use availability sets and spin things up manually, and maybe you find other pieces of automation. AKS Engine versus AKS is, I think, a little bit of that same argument, where you're going to say: I either want to take on the management, like, I want to be in infrastructure and that's where I want to be, or I want to be in this other managed service; just give me that management plane for free. And I think that's gotten a little bit better, particularly for some types of customers, now that there's the SLA that you can purchase for uptime. The gaps there have just been filled over time, and it makes it a lot easier and a lot more consumable, where you don't have to go down that path and say, well, I'm going to spin everything up myself in ARM templates and hope for the best later. To add to what you just said, it's cool; I like how the team evolves really quickly, and I also like how they are sharing all the updates that they're doing. They have a GitHub repository. They have an awesome roadmap, yes. They have that on GitHub, but they also have a changelog that you can see weekly.
Every week you can see which changes were made against the AKS service, and if you just look at the changelog, both the minor and major tweaks are stunning to see on a week-to-week basis. So if you decide to run your own, either using AKS Engine or running your own cluster, all of those little tweaks are things that you will have to engineer and operationalize as well. So using the managed service is, for me, almost a no-brainer if you're going to go down that path. I'd be one hundred percent with you on that. I think that one last hesitation was maybe some things around the SLA, and that last piece of friction was removed, so it makes it very consumable now, I think. You can step in and you can understand what you're getting. They've definitely ramped up on things like availability, so thinking about even just resiliency and being able to run with availability zones for your nodes now, being able to do multi-zone node pools, the external controllers like App Gateway, or integrations with other external services; it's a really compelling story if you're going to use all those bits and pieces once you get in there. And if you're just spinning up, like, I'm going to play with Kubernetes and I want to spin up a single-node cluster and see what's going on, then yeah, sure. But if you're going to be running at any kind of scale, you might as well be in that service. You're not paying anything for the management plane. You're only paying for the compute, and you're going to pay for that anyway if you're running it in AKS Engine or IaaS, however you want to do it. One of the things that you mentioned, just as an interesting topic of discussion, that I see customers still struggle with today is multi-region deployments, because I think the answer from a Kubernetes perspective is still: for multi-region deployments, just have your pipeline
deploy twice, once in each region, and figure out your data strategy. For a lot of companies that are used to things like Azure Site Recovery, which can just replicate VMs to a secondary region where you don't have to do anything except set up ASR, multi-region is still pretty difficult to wrap their heads around with Kubernetes. And that's not AKS itself; that's just Kubernetes in general. I would agree with you there, and I think some of it goes back to this: for better or worse, I've found that AKS drives you into a model where you end up spinning up a lot of new clusters, regardless of the whole multi-region thing. There's always going to be a reason to create yet another new cluster. You've got YAML, yet another markup language; you've got YANC, yet another new cluster. And you're driven into this model where automation is your friend, whether that's simple deployment scripts or richer CI/CD within there, and then that starts to give you some of that flexibility. Yes, it's a pain to potentially duplicate your pipelines and have ultimately two deployment targets, but it lets you do things like hot-warm or hot-cold scenarios. You understand the time that it takes to not only spin up a new cluster but spin up your infrastructure around it, and it lets you potentially use native tooling to help you through some of those things along the way. If you wanted to use something like Heptio, or Velero, for, say, backup and restore across clusters, you could totally do that in a multi-region scenario pretty easily. Velero's got a really nice integration with AKS, where it can snapshot directly to Azure Blob Storage, and then you can just pick up that blob with the SAS token from Velero in a cluster in the same region, or in another region, and go ahead and restore it back and get it
to where it needs to be. There are some weird nuances, like if you've got volume claims and things like that, but if you're just in a next-next-next kind of cluster and you understand your deployment model, it's a fairly flexible thing to get into. I mean, I would always wish for, yeah, just give me a button that does it for me, kind of like ASR does, but I think there are so many moving pieces with Kubernetes in general that it's a tall ask to get there. It's one thing for your service provider to understand the service plane that they offer to you. It's another thing once you start deploying your pods and running your applications with all your services and everything on top of it. I haven't played around with Velero at all, so I'm learning something new here. When you mention it snapshots your AKS cluster, does it make a snapshot of what you're running, so basically everything you have deployed, or does it make a snapshot of the data drives or disks that are attached to pods? Yes. You can do things like snapshot individual pods, or rather, in Velero they call it a backup. So you can back up individual pods. You can back up an entire namespace, so if in that namespace, say, you threw a couple of pods and you mounted some managed disks into those pods, it'll take snapshots of those managed disks as well and include those as part of your backup. When you go and restore into a new cluster, into the same cluster, whatever it happens to be, it will bring those managed disks back for you as well. It's pretty nifty. Yes, I'll absolutely play around with that someday. They made a pretty slick integration for it. There's certainly some infrastructure that comes with it, but I think there's infrastructure that comes with anything. You want to monitor your containers, you want to back them up; there are more system pods and things that you need to stand up. But it is, for the most part, fairly straightforward.
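The namespace-level backup and restore described above might look roughly like the following when scripted. This is a sketch under assumptions: the backup name `demo-backup` and the namespace `demo` are made up, the two helper functions are hypothetical, and the Velero subcommands shown (`velero backup create ... --include-namespaces` and `velero restore create --from-backup`) should be checked against the documentation for the installed version. The script only builds and prints the command lines; it does not run them.

```python
from typing import List


def backup_command(backup_name: str, namespace: str) -> List[str]:
    """Build a Velero command line that backs up a single namespace."""
    return [
        "velero", "backup", "create", backup_name,
        "--include-namespaces", namespace,
    ]


def restore_command(backup_name: str) -> List[str]:
    """Build a Velero command line that restores from an existing backup."""
    return ["velero", "restore", "create", "--from-backup", backup_name]


if __name__ == "__main__":
    # Hypothetical names; run the printed commands against the source
    # cluster (backup) and the target cluster (restore) respectively.
    print(" ".join(backup_command("demo-backup", "demo")))
    print(" ".join(restore_command("demo-backup")))
```

Keeping the commands as argument lists makes them easy to hand to `subprocess.run` later without shell quoting problems, once the flags have been verified.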
It's kind of like how we would do an integration with anything else. Maybe you want your cluster to talk to ACR; well, you've got to give it some type of RBAC so it can go ahead and pull, you know, let the service principal for the cluster through. Same thing with Velero. You've got a storage account. Everything just goes into blob storage, and then you can either interact with it through service principals, or, if you want to hook up with access through Storage Explorer or something like that, you're able to download individual backups or create SAS tokens to them, whatever you need to do. That's all possible as well. Did I catch you selling blob storage, Scott? What's up now? I'm not allowed to do that. I was just kidding. One of the interesting pieces that you highlighted there was all of the additional infrastructure that you still need when you're running in Kubernetes. Kubernetes itself is just, and I don't want to use the word simple, but it's just an orchestrator. There's so much more that you need to run an application at scale. You need to have your monitoring, your logging, your security, and all those things require additional infrastructure. That's a lot. I think it's a really cool tool to work with, but I don't think it's a fit for every application that should be developed and deployed. You and I actually have some experience together on moving something from Kubernetes into Azure Web Apps for Containers, which for some applications might actually be a better fit than a Kubernetes cluster. So here's something to talk about now. I thought of a question I can ask you guys, not being a Kubernetes person. Outlook add-ins are a great way to improve productivity and save time in the workplace, and Sperry Software has all the add-ins you'll ever need. The Save As PDF
add-in is a best seller and is great for project backups, legal discovery, and more. This add-in saves the email and attachments as PDF files. It's easy to download, easy to install, and Sperry Software's unparalleled customer service is always ready to help. Download a free trial at sperrysoftware.com, that's S-P-E-R-R-Y-S-O-F-T-W-A-R-E dot com, and see for yourself how great Save As PDF is. Listeners can get twenty percent off their order today by entering the code cloudit, that's C-L-O-U-D-I-T, all one word, at checkout. Sperry Software: work in email, not on email. So as you talk about that, what types of applications is AKS actually suitable for? Because like you said, it's not necessarily one-size-fits-all, just like SharePoint lists should never be used as a database. What scenarios or what applications does AKS tend to be a good fit for? I'll start, and I'll let Scott chime in as well. I think AKS and Kubernetes, just as a generalization, are a really good fit for applications that have been designed with a microservices mindset from the start. You can run traditional monolithic applications in Kubernetes as well, but it's less optimized for it. Another workload that fits really neatly in Kubernetes is everything that's stateless. If you have any stateless applications or stateless APIs that you need to run at scale, that works really well in Kubernetes. State and storage are getting better, and with every release of Kubernetes itself the support for state and the support for storage get better, but it's still sort of a hassle to manage a stateful application on Kubernetes. So I typically recommend to customers, for anything that has to do with state, just use a managed service. If you need a MySQL database, don't run MySQL on your cluster. Just run a managed service for MySQL, and that'll take care of a lot of pain for
you. Scott, what do you think are good prime applications for Kubernetes? I think that's kind of a great approach to it. I don't know that it works for too many existing things. Even if you have something that's a well-architected set of microservices, to take those and translate them into Kubernetes means learning potentially this whole new network plane. You know, how does DNS work in Kubernetes? How does service discovery work? What are my considerations around securing these endpoints? Those are things you probably already have a good handle on over in whatever your world is today. But if you're looking at something that's greenfield, it's an awesome option. I love the idea of putting your stateless things in Kubernetes and, for anything that's stateful, externalizing it as much as you can from the cluster, like databases and things like that. It's really easy to say, you know, I have MongoDB today, and Mongo's running in a container already, so I'm just going to bring that over as-is to a Kubernetes cluster. Well, now you've got to work through that problem of yet another new cluster, backups, restores, and it's just easier if you can externalize all that and have your app tier and your web tier and things like that sitting in your cluster, ready to go. Then you can get to that polyglot model where you're really taking advantage of the cloud for what it is and running the right tool for the job, where you're going to be able to eke out your best efficiencies and really move things forward as you do those deployments. I'm not a fan of Kubernetes for the sake of Kubernetes. I think there's too much else out there that does a really good job. So Azure Web Apps for Containers is an interesting thing. It's not the end-all, be-all, but it's interesting for some use cases. I'm a huge fan of Azure
Container Instances. I know that I use those all over the place for infrastructure deployments, even for some of these smaller microservice applications. Most people think of ACI as a single-instance thing, but it's really a container group, so I can put a bunch of containers in a little group together, and as long as they stay within the compute limits of one container instance, which are pretty darn high, I can put them all together and still make them do what they need to do, without having the overhead of an entire cluster weighing me down. I have a very unpopular opinion here, but there is nothing wrong with running a really nice, integrated, automated VM deployment as well. Absolutely not. If that's the thing that works, do what works, what you're going to be comfortable with, and what you're going to be able to operate today. Otherwise you're going to spend so much time learning Kubernetes rather than getting started and being able to actually accomplish what you set out to do, whatever your project was focused on. So you can learn those things on the side and get them going, and maybe over time, you start out in VMs, and as you refactor that application and you start to transform it, maybe it turns into microservices or little task runners or things that are out on the side that are better suited to containers in general. Then you can think about the container ecosystem, and whether Kubernetes is the thing. But there are tons of options out there. I've always been really impressed with the breadth and the number of options that you have for running containers in Azure. To add to what you said, start with VMs and see where that takes you: I'm actually working with a customer, and I don't want to spill their information here, but they moved something to Azure about a year ago. They did it the right way. They did it fully automated.
They had a Terraform deployment for their application and everything was fully automated. Then they saw increased demand, so they deployed more and more using Terraform, and they're right now at a point where they have a fully automated deployment system, but they're realizing that, using VMs, the density they're able to reach is not as high as they would want it to be. They're now looking into Kubernetes as a solution to get denser applications than they're able to run on virtual machines. They're taking the crawl-walk-run approach to cloud migrations. They started out with a move to VMs in an automated fashion. That's the crawl. Now they're walking and experimenting with Kubernetes, and in a couple of months they'll be running and having it all in production on Kubernetes at lower cost. But they didn't just move it in one big bang into Kubernetes, which I think is a realistic approach. It certainly sounds like it, and in the majority of cases you'll probably find that's the way to go. You've got enough to learn with going to Azure or AWS or GCP; adopting any of these platforms is enough to wrap your head around on top of getting your applications up and running and getting the ROI out of it to have it make sense. I agree one hundred percent. There's so much to learn, and maybe it's better to stick with something that you know right now than to go full blast and make everything new. That's not going to work; it's going to slow you down tremendously. And I think, as you were talking through that, that's something that I feel like we as technologists can have a tendency to do sometimes too. We hear about the latest hot new technology, whether it's AKS, or some new service in Azure, or some new software, and too often, I think, because I do it myself, it's like, oh, we've got to get everybody on this. We've got to skip the crawling and the walking and jump right to running because it's something new.
It's new, it's shiny, and we want to get everything on it rather than taking a methodical approach: figuring out what's actually best for me, what's the right path I can take, and making sure we're choosing the right technology, the right platform, the right services for the job that we need to do. I think I can only say yes, and I catch myself doing that every single day as well, just trying to get onto the latest and greatest thing. Like last Build, when WSL 2 was announced, I was one of the first guys to get into the Insider Preview fast ring so that I could get my hands on the shiny new object. It's fun, though. I mean, for me, it's part of the fun of being in technology, right, getting to play with new stuff. I think it's about recognizing that that's the case and that's the mindset. It's always worth taking a step back and taking a deep breath, like, okay, what's the right thing to do here? Do I go home and play with WSL 2 at night, or is that the thing where we go out to everybody in the world and say, hey, let's do WSL 2 now and run Insider rings and things like that? That's not often the case. It's always nice to step in, especially if you've played around with it and you understand that landscape, right? You bring that expertise to the table and say, all right, everybody, let's calm down. Here are some options. Let's rationalize them and see if we can't figure out where we're going from here. And quite often you have that crawl-walk-run thing where you can put that on the table, and if somebody wants to run, it might not be in their best interest, but at least you've said your piece. Yeah, kind of like you said even with your book, Nills, at the beginning, when you were going back and making sure you used the right terms and you weren't jumping too far ahead of where people should start and what they should understand. It's just good to take that step back every once in a while and look at the whole picture.
Yes, definitely. When I mentioned the WSL thing, that was on my own laptop, and if it broke, I would just do a fresh install of my developer machine. That doesn't work for a business-critical application, so I'm right with you. Yeah. All right, well, anything else, any other topics you guys would like to talk about? I think that was good. That was a lot. It was a fantastic talk. The only thing I'd add, and this is just some shameless self-promotion, but the book is available today. You can get a free copy, and I think the link will be in the show notes? Yep, we'll put it in there as long as you give us the link. Correct. In the show notes I already have a link to the copy that I think you're going to talk about, the one on azure.com. So there's a free copy that Microsoft is hosting on azure.com; we'll share the link. If you prefer a print version, it's available either directly from the publisher at packt.com, or you can find a copy on amazon.com as well. Excellent, we'll include links to both of those as well, and then people can choose. All right, we'll put Twitter handles and stuff in too. Is that the best place, if people have questions or want to reach out to you, Twitter? Twitter or LinkedIn; I'm equally active on both. Whatever platform you prefer, you can reach out to me. All right, we'll put both of those in the show notes as well, a link to your LinkedIn profile as well as your profile on Twitter. Thanks for your time. This was fun. Thank you. Have a good weekend, guys. We'll talk to you later. Thank you. If you enjoyed the podcast, go leave us a five-star rating in iTunes; it helps to get the word out so more IT pros can learn about Office 365 and Azure. If you have any questions you want us to address on the show, or feedback about the show, feel free to reach out via our website, Twitter, or Facebook. Thanks again for listening, and have a great day.

Cloud Native Is Trending: What You Need to Know

7 Layers

17:54 min | 6 months ago

Cloud Native Is Trending: What You Need to Know

"The next-generation IT infrastructure industry moves fast. Never miss a beat by subscribing to SDxCentral's daily newsletter at sdx.io/newsletter. Hello and welcome to 7 Layers, where every episode we take a look at a different technology that connects our world, from literal wires in the ground to switches and routers and all the way up to the exploding number of smart devices around us. Remember to subscribe to 7 Layers so you never miss an episode, and tune in to our next episode, where we'll cover open source technologies. As always, you can learn more about the current state of technology over at sdxcentral.com. The cloud era is here, and as the adoption of cloud environments continues, so does the cloud native approach to application development. By twenty twenty-two, IDC predicts ninety percent of new enterprise applications will be cloud native. This growth has been indicated by increasing cloud native technologies and market consolidation. In this episode, we will cover the basic elements of cloud native applications, the benefits and downsides of a cloud native approach, cloud native use cases, trends in the cloud native market, and the future of the cloud native landscape. Part one: breaking down cloud native applications. Cloud adoption is at an all-time high thanks to the surge of remote work in twenty twenty. The cloud industry grew by twenty percent in the first half of twenty twenty compared to the first half of twenty nineteen. As the cloud becomes a common approach to networking for enterprises, the dependence on retrofitted legacy applications is no longer viable. With the cloud's evolution, the lift-and-shift approach, the process of migrating current enterprise applications to the cloud, can pose latency and performance issues and doesn't truly utilize the cloud to its full potential. For a deeper dive on its evolution, check out the cloud episode of 7 Layers that we recently published. A cloud
native approach uses cloud native applications for enterprise networking needs. Cloud native applications are developed natively to the cloud, or simply put, the applications are designed for the cloud. This approach leverages cloud speed and agility by rapidly and frequently building, releasing, and deploying applications or application updates. Cloud native applications are built and deployed rapidly by relatively small teams, providing enterprises with increased agility, resilience, and scalability across clouds. The small-team approach is called a DevOps model. Alongside this model, cloud native development leverages several supporting technologies. Both microservices and containers are central to cloud native applications, as these applications are made of microservices that are packaged in containers. Using microservices is an architectural approach that breaks applications down into small modular components. These small components are then loosely coupled to create the application. This creates a highly flexible and scalable application that is typically part of an automated system that allows application updates to occur within a single component, limiting customer impact. Microservices are typically coded by the developers on the DevOps team, while containers are orchestrated by the operators on the team. Containers allow applications to be packaged and isolated in the cloud environment. This isolation makes workloads portable, even between different servers and clouds. This portability and elasticity results in an application that can rapidly scale, migrate workloads as needed, and accelerate time to market for new or more complex applications. Containers, software packages that house components of applications, offer some of the same benefits that attract people to the cloud in general: scalability, agility, and creativity. Scalability is accompanied by container orchestration
systems like Kubernetes, which I will touch on in a moment. Orchestration tools place workloads on the least-used node in a cluster and automatically scale depending on application use. Application delivery is more agile with container usage because of reduced overhead, allowing new deployments to occur quickly. The automation in orchestration platforms also facilitates the creation of programmable toolchains that are used in the CI/CD development process. Containers also promote creativity because they facilitate the DevOps culture of collaboration and creation. These benefits allow developers to rapidly explore and create new ideas while encouraging unique design approaches. Unique and different approaches further leverage container systems in the cloud and rapidly reduce the life cycle of new applications. This reliance on containers has led to the adoption of Kubernetes, a cloud native open source container orchestrator. A twenty twenty survey found that ninety-one percent of enterprises were using Kubernetes, eighty-three percent of which use it in production. These enterprises also reported they will move more cloud native applications into production, resulting in exponentially more Kubernetes clusters. The use of Kubernetes comes with several benefits. In the twenty twenty survey, enterprises ranked scalability, shorter deployment times, and improved availability as the top benefits of Kubernetes in cloud native projects. Developing and running applications in Kubernetes clusters also results in increased resilience, in that nodes can be restarted for error recovery; automation, that is, scaling and self-healing; and immutability, which optimizes package updates and rollbacks. APIs, or application programming interfaces, are another component that is critical to cloud native applications, as this is how microservices are often connected.
APIs are tools, definitions, and protocols that connect and define the interactions between applications and services. The use of containers and microservices results in the deployment of CI/CD, or continuous integration/continuous delivery. CI/CD is a method that uses ongoing automation throughout an application's life cycle, from testing to delivery and deployment. This ongoing automation means applications are updated with small updates on a near-constant basis. If a problem arises, the automated software will identify the issue. Team members will then create and release a solution efficiently and quickly. The CI/CD model results in applications and networks that are up to date and have better security and scalability when compared to traditional approaches. CI/CD is often accompanied by a DevOps model. DevOps is an approach that merges development and operations teams. Like I mentioned, this new, singular DevOps team works together across the entire application life cycle. These teams leverage automation to streamline the development process. The DevOps culture and use of automation increase the speed of innovation. Engineers can move at high velocity, responding to customer needs and markets faster, whether that be the development of a new application or a new feature for an already existing application. Cloud native development comes with a myriad of benefits, most of them grounded in the culture of DevOps. With DevOps comes increased efficiency. Teams are able to release applications and application updates faster, and when paired with automation, the development and deployment process becomes streamlined and reliable. Improved efficiency and scalability also stem from the ability to take advantage of ready-to-use infrastructure. Developers no longer have to focus on building a common framework. Instead, they can reuse existing components and focus on application-specific intricacies. Ease of updates is a major benefit of cloud native application development. Thanks to the use of microservices,
DevOps teams can update an application quickly by making a change to one or multiple microservices. Then the CI/CD model means updates are continuously integrated into the application. On the other hand, a traditional update approach can take months to reach the end user, as the entire application must be updated, not just a single component. The microservices and CI/CD model improves cloud security, as security updates reach end users at a much quicker pace. However, with these cloud native benefits also come some downsides. One of the biggest challenges of the cloud is application security. In fact, the Cloud Native Computing Foundation recommends security be integrated in the development process. Application security shortcomings are the result of the innate design of cloud native applications. Applications are no longer running constantly. Instead, infrastructures turn on and off. This makes running applications more efficient but also makes securing and managing vulnerabilities difficult. The agility of the cloud also poses unique security challenges, as the environment is always changing. In fact, sixty-one percent of enterprises surveyed in a CIO report said their environment changes at least once every minute. This means new updates or patches need to be launched within days, hours, or even minutes, a speed traditional security solutions can't operate at. One of the biggest challenges of cloud native development is making the transition to becoming cloud native. The cloud native landscape is constantly evolving, and upgrading systems can become increasingly difficult. This is particularly true when applications aren't containerized and cloud native counterparts to legacy technologies have not been found. Keeping up to date with cloud native technologies can be overwhelming, if not a burden, which is why open source
tools have become increasingly more common, as they provide relevant and updated technology at a lower risk with a guaranteed sense of reliability, quality, and security. In fact, fifty-three percent of respondents in a Red Hat customer survey said open source was critical to cloud native and digital transformation, up from eleven percent two years prior. And now a word from our sponsor. As demand for telemedicine grows, so does the need for connectivity. 5G meets that need. Qualcomm remains focused on giving doctors and patients superior, security-rich 5G connectivity. Learn more at qualcomm dot com slash invention age. On to cloud native trends. Cloud native use cases vary but center around transforming IT operations by approaching provisioning, scaling, integration, testing, and deployment from a new perspective. With modern technology, enterprises can bring applications and solutions to market faster by dedicating time to creating new solutions for new problems and allowing automation to tackle known issues. One of the most clear-cut cloud native use cases is implementing new functionality quickly. Automation is a key component of the CI/CD pipeline, so the process of developing and applying an update is automated. With continuous delivery, a developer's application update is automatically bug tested and uploaded to the repository. From there, the operations team can deploy the update and make it live to end users. This ensures minimal effort is needed on the development side. With continuous deployment, automation goes one step further, automatically releasing the update from the repository to production, where it is usable by end users. This ensures operations teams aren't weighed down by updates and manual processes. This ongoing automation means applications are updated with small updates on a near-constant basis. If a problem arises, the automated software will identify the issue, and DevOps team members will then create and release a solution efficiently and quickly.
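To make the distinction between continuous delivery and continuous deployment concrete, here is a minimal Python sketch. It is a toy model only, not any real CI/CD tool: the `Pipeline` class, its `auto_deploy` flag, and the string return values are all hypothetical names invented for this illustration. Both modes test an update and upload it to the repository; they differ only in whether the release to production is automatic.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Pipeline:
    """Toy model of a CI/CD pipeline (hypothetical, for illustration only)."""

    tests: List[Callable[[str], bool]]   # automated checks run on every update
    auto_deploy: bool                    # False: continuous delivery, True: continuous deployment
    repository: List[str] = field(default_factory=list)
    production: List[str] = field(default_factory=list)

    def submit(self, update: str) -> str:
        # Step 1: every update is tested automatically.
        if not all(test(update) for test in self.tests):
            return "rejected"
        # Step 2: a passing update is uploaded to the repository.
        self.repository.append(update)
        # Step 3: continuous deployment also releases it to production.
        if self.auto_deploy:
            self.production.append(update)
            return "deployed"
        # Continuous delivery stops here and waits for the operations team.
        return "awaiting manual deploy"

    def manual_deploy(self, update: str) -> None:
        # The operations team promotes an update that already passed testing.
        if update not in self.repository:
            raise ValueError(f"{update!r} has not passed the pipeline")
        if update not in self.production:
            self.production.append(update)


if __name__ == "__main__":
    no_known_bug = lambda update: "bug" not in update  # stand-in for a real test suite
    delivery = Pipeline(tests=[no_known_bug], auto_deploy=False)
    print(delivery.submit("feature-a"))    # stops at the repository
    deployment = Pipeline(tests=[no_known_bug], auto_deploy=True)
    print(deployment.submit("feature-b"))  # goes straight to production
```

In this sketch the only difference between the two modes is the final step, which matches the description above: delivery leaves the last release decision to the operations team, while deployment automates it.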
This means when an innovative idea strikes or a bug appears, DevOps teams can make solutions functional quickly, increasing the chances the solution is first to market. This seamless and speedy process of implementing new functionality ultimately means end users get a clean experience, which is critical to maintaining customer satisfaction and loyalty. Cloud native development is often used because it produces reliable applications. This reliability stems from the portability of the containerized infrastructure. Containers have the required dependencies to operate and only rely on the cloud for physical resources. With this portability comes the ability to move cloud native applications to a new data center within a distributed cloud if part of the network infrastructure fails or a data center experiences issues. The rolling updates approach also offers a level of reliability. For instance, if there is an error in the update, only a few applications, and ultimately only a few end users, will be impacted. These benefits extend not only to enterprises and end users but also to technology companies and developers. Cloud native gives engineers and developers a competitive advantage and offers end users and enterprise customers a more reliable and enjoyable experience. Cloud native's benefits, both for end users and developers, are reflected in the surge of activity in the cloud native industry. In twenty twenty, the cloud native market segment experienced some of the largest M&A strategic acquisitions in history. Some of the standout acquisitions of twenty twenty were VMware acquiring Octarine, a SaaS security platform for containers and Kubernetes, Cisco buying Banzai Cloud, a company that specializes in deploying cloud native applications with a host of open source tools, and SUSE
acquiring Rancher Labs, an enterprise Kubernetes management platform. The acquisitions indicate a shift in industry mindset, a mindset that fully embraces the cloud. This was Cisco's second cloud-focused acquisition in a two-month timeframe. Both of these acquisitions will join Cisco's emerging technologies incubation group, showing that Cisco is moving into cloud innovation. This market consolidation is what the VP of strategy at Aqua Security, Rani Osnat, refers to as the industry's version of natural selection. In a sign of cloud native's longevity and potential, VMware announced another acquisition in the cloud native space in March twenty twenty-one, when it revealed it will acquire Mesh7, an API security startup. Palo Alto Networks spent one hundred and fifty-six million dollars in February twenty twenty-one when it acquired Bridgecrew, the DevOps security startup. These latest acquisitions indicate a trend in the cloud native space: a focus on security. Earlier in March twenty twenty-one, Snyk, a developer-focused security startup, landed three hundred million dollars in Series E funding, pushing its valuation to four point seven billion dollars. The same day Snyk closed its deal, Aqua Security landed a hundred and thirty-five million dollars in Series E funding, pushing its valuation to over one billion dollars. Cloud native security is a growing solution for tackling the increased use of cloud native platforms and the increased number of cyber threats. But the DevOps culture of cloud native development means developers need to get on board with the security needs of enterprises and end users. This meshing of DevOps and security has resulted in a growing trend of DevSecOps groups in organizations.
DevSecOps brings security teams into the development process, writing policies into the code and protecting the application from the get-go. Essentially, security is no longer an afterthought. Improving security and building it into the development process also increases application resiliency. If an attack or breach occurs, it can be limited to just a single container, protecting the rest of the application. Cloud native technology is expected to see a spike, as it will power digital transformation strategies. The Forrester twenty twenty-one cloud computing report predicts that by the end of twenty twenty-one, sixty percent of companies will leverage containers in public clouds and twenty-five percent will go serverless. For more on serverless computing, check out my article from December on SDxCentral dot com. With this increased reliance on containers, Gartner anticipates major growth in the container management market segment, going from four hundred and sixty-five point eight million in twenty twenty to nearly nine hundred and forty-four million in twenty twenty-four. Within the market segment, public cloud container orchestration and serverless container offerings are expected to see the most growth. Thanks for joining us on this week's episode of 7 Layers. Special thanks to SDxCentral Studios editor Ashley Wiesner for writing our script. Before you go, let's do a brief overview of what we discussed today: the basic elements of cloud native applications, the benefits and downsides of a cloud native approach, cloud native use cases, trends in the cloud native market, and the future of the cloud native landscape. I've been your host, Connor Craven, associate studios editor at SDxCentral. Follow 7 Layers so you never miss an episode, and tune in to our next episode, where I'll be talking about open source tech. As always, you can learn more about the current state of technology over at SDxCentral dot com. And one more thing:
SDxCentral is on Twitter. Give us a follow at SDxCentral. Give my page a follow at [unclear]. Please leave 7 Layers a review on the podcast platform of your choice to help us reach a wider audience. Thanks for joining us, and I look forward to our next episode.

Special Edition Repeat: AWS Analysis with Corey Quinn

Software Engineering Daily

1:06:42 hr | 4 months ago


"Amazon Web Services changed how software engineers work. Before AWS, it was common for startups to purchase their own physical servers. AWS made server resources as accessible as an API request, and AWS has gone on to create higher-level abstractions for building applications. For the first few years of AWS, the abstractions were familiar. S3 provided distributed, reliable object storage. Elastic MapReduce provided a managed cloud Hadoop system. Kinesis provided a scalable queuing system. Amazon was providing developers with managed alternatives to complicated open source software. More recently, AWS has started to release products that are completely novel. They're unlike anything else. A perfect example is AWS Lambda, the first function-as-a-service platform. Other newer AWS products include Ground Station, which is a service for processing satellite data, and AWS DeepRacer, a miniature car for developers to build and test machine learning algorithms on. As AWS has grown into new categories, the blog announcements for new services and features have started coming so frequently that it is hard to keep track of it all. Corey Quinn is the author of Last Week in AWS, a popular newsletter about what is changing across Amazon Web Services. Corey joins the show today to give his perspective on the growing, shifting behemoth that is Amazon Web Services, as well as the other major cloud providers that have risen to prominence. Corey is the host of the Screaming in the Cloud podcast, which you should check out. I should also mention that we have our own newsletter. You can go to software engineering daily dot com slash newsletter to check it out and sign up. We are also looking for sponsors for Q1. If you're interested in reaching over fifty thousand developers, you can go to software engineering daily dot com slash sponsor. You can learn as much fancy theory as you want,
but at the end of the day, machine learning is still ninety percent data cleaning and infrastructure work, and doing it all manually is exhausting and not likely to make its way to production, especially when your data, your models, and your code are constantly changing. Pachyderm is an easy-to-use MLOps platform that empowers anyone to build scalable end-to-end machine learning workflows, regardless of language or framework. Pachyderm provides Git-like data versioning and lineage to automatically track every data change and final output result, meaning you'll also know exactly what data was used to build that latest model, automatically. Right now, SE Daily listeners can get over four hundred dollars in credits on Pachyderm Hub. Sign up today and build production-grade data science workflows in minutes without ever having to configure a single piece of infrastructure. Imagine being able to automate your entire data science workflow and still reproduce any result from any point in seconds with complete confidence. Head over to pachyderm dot com slash SE Daily to get over four hundred dollars in free credits. But you'll want to hurry, because this offer only lasts for a limited time. That's pachyderm dot com slash SE Daily, P-A-C-H-Y-D-E-R-M dot com slash SE Daily. Corey Quinn, you're the author of the Last Week in AWS newsletter. Welcome to Software Engineering Daily. Thank you, it's a pleasure to be here. When did you first start working with Amazon Web Services? Great question. There's a little bit of revisionist history that goes into it. I started back in two thousand eight, give or take, with, oh, what is this thing? You mean the online bookstore? I'd start clicking around in the console, or what passed for the console at the time, and it turned out it was super complicated and hard, and I very quickly started looking at things like RightScale to make sense of it. I've sort of dabbled with it ever since through a variety of jobs and consulting engagements,
but I didn't get really focused on AWS as a primary area of interest until a bit over two years ago. You've been in software for a while. How would you describe the world of software before and after AWS? That is a terrific question. I'd say beforehand, it had a much higher barrier to entry. You had to deal with a bunch of different VPS providers, all of them terrible in various ways, or you'd have to wind up building out your own data center. Having spent entirely too many years doing that, I can say it focuses you on doing all of the wrong behaviors. In your background, were you the person that was in a data center, or were you just working with engineers that were in a data center? Give me a little bit of background on what you've been doing engineering-wise and IT-wise in the past. Sure. I started my career as a grumpy Unix systems administrator, which you can also state as Unix systems administrator, and from there it turned into an awareness of what the rest of the world was doing. It was pretty clear that my initial area of emphasis, which was large-scale email systems, was not the wave of the future unless I wanted to work for one of maybe four companies. So I started pivoting. I focused increasingly on automation and data centers, using things like Puppet. I was one of the early developers behind SaltStack, which really should not be as much of a condemnation as it probably is, if you've ever seen my code. I love technology. My code is terrible, but at the time there was no one else to do some of the things that needed to get done. And there was another shift, where configuration management was no longer the way of the future. Immutable infrastructure as a concept started rising very rapidly. So if not getting environments to correct for drift, if that was not going to be what
I focused on next, what was? It seemed to me that when I left my last job and was trying to figure out what's next, becoming a consultant focused around a very specific problem was the right answer. Specifically: the AWS bill was too high. One problem I ran into early was that Amazon releases an awful lot of stuff constantly, all of which affects the economics of what you're doing in your environment. So how do you wind up understanding what's important and what moves that, without getting lost in the minutiae? I started collecting a list of everything that happened every week to keep myself up to speed, and it didn't take much for me to realize that maybe I wasn't the only person who would benefit from this. I put together a few drafts of a newsletter and told people I would be starting to send it out in a couple of weeks. I figured I'd get a couple of people who signed up. Charity Majors tweeted about it, and the first issue went to exactly five hundred and fifty people. Okay, I guess I'm doing this for a while. And that was, at the time of this recording, almost ninety weeks ago. So your newsletter is Last Week in AWS. Explain the goals of your newsletter. Sure. It has a few different goals, depending upon who you are. For me, it forces me to keep up with what's going on and not drift and ignore it. It also winds up gathering all of the news from Amazon's cloud ecosystem, both the announcements that they release internally as well as what people are doing in the community. The problem is that because they're a giant multinational company, they're not allowed to have a sense of humor, of which they are themselves aware. My first language is sarcasm and snark.
So I make fun of the announcements that come out, and I do it in a way that is hopefully not punching down and not calling out individual people, that focuses on snark and sarcasm and being uplifting rather than putting everything down and crapping on it all the time. That sort of keeps me engaged, because otherwise I'm more or less gathering a bunch of press releases, and midway through the second issue I would fall asleep and give up and go do something fun. So it keeps me engaged, lets me express creativity, and forces me to keep up with a very rapidly evolving ecosystem. As an added bonus, early on in the process I received an email from someone, I think the sponsor was Datadog, saying, we love what you're doing, can we give you money to mention our product in it? To which my response was: can you give me money? Well, of course you can give me money. How much money are we talking about? It very rapidly emerged that I had built a strange revenue model. I expected this to be a labor of love, but it's turned into something that's throwing off non-trivial amounts of revenue. So that became something I didn't expect. Not only that, the advantage of doing developer-focused media is you build a really strong familiarity with what is going on in technology. So in some sense you hedge your downside risk, because even if this whole newsletter thing doesn't work out, well, you've built a really good understanding of what's going on in AWS. Exactly. And if we want to take a slightly more cynical approach, again, my entire business is built around fixing AWS bills for large environments. That's lucrative, but it's also not something I can necessarily see myself doing for the next fifty years. I don't know what I'm going to do next. I know I don't hold still very well. But whatever I wind up doing in five years, I'm going to need a base of people to tell about it, so the time to build an audience, from that perspective, is now. Indeed. Well, let's get into a discussion of AWS.
Describe your mental model of Amazon Web Services. I view it as a company that operates in such a way that, compared to any other company on the planet, it may as well be an alien organism. They're effectively microservices driven, which means they have very small teams each working on individual projects. Anytime they wind up having to work on something that works cross-functionally across all of the service teams, it turns into what, I don't want to use the word disaster, so pretend I didn't. You see that manifest in things like the console, the bill, and CloudWatch, which has to aggregate metrics across everything. But how the individual services that launch are developed by small teams is fascinating. They have a product strategy of "yes," which means that they're at some point going to wind up competing with basically everyone, and that does give some people pause. But today, if I take a look at the entire cloud ecosystem, they are, for most use cases, the vendor that makes the most sense to work with. I'm not a partisan as far as these things go. The reason I focus on AWS bills is because that's where the customers are. If people were instead all on Azure, we'd be having a very different conversation. I don't know if I'd be writing Last Week in Azure or not, but I think that there would be a need for something like that in that ecosystem. Do you think of AWS as a set of products specifically for engineers, or do you think there are aims to move up the stack into, I don't know, design tools, or things for sending emails, or things like that? They've already got that to an extent: SES. At the time of this recording, they just released a whole email deliverability dashboard for Pinpoint, which is itself a service built on top of SES. With over one hundred and fifty services now, even Amazon employees aren't always sure whether I'm talking about a service that exists or something I made up just to mess with them. At re:Invent this year they announced
AWS Ground Station, which provides telemetry services for satellites passing overhead, and even after mentioning that in a couple of talks, in the Q&A section I've gotten: yeah, follow-up, are you making that up just to mess with us? No, it's real. It's called Ground Station. Here's what it does. Cool. Follow-up: no, seriously, are you messing with us? So if you're asking me whether there is anything that I would say definitively AWS would never move into as a market, really the only one I can think of is my own. I can't see AWS self-mockery [unclear]. So AWS is this sprawling set of services. What's your sense of how the engineering org is laid out to foster that kind of sprawl, to sustain that kind of sprawl? From conversations with people who've worked in that environment, some of whom were extreme champions of Amazon, others of whom would not cease the profanity, it seems to me that the feeling is closest to a bunch of internal startups that are competing for funding and for mindshare. They go through iterative rounds until something winds up getting released. At the end of an entire laborious process of iteration, going through a series of fundings, their quote-unquote exit is when someone at AWS gives the service a stupid name and launches it to the public. That means a few different things emerge. A lot of things get built that never see the light of day, and occasionally you'll see services launch that appear to directly compete with one another. And that becomes, in some ways, a fascinating story. I was doing a lot of coverage of the container orchestration system wars. There was this period of time where Docker had reached market acceptance and people were looking for the best solution for managing all of their Docker containers. Is it going to be Mesosphere? Is it going to be Docker Swarm? Is it going to be HashiCorp Nomad? And then eventually Kubernetes came out, and it reached enough saturation or market share, or was marketed
well enough, that people said, okay, this is what we're settling on. AWS hadn't placed a strong bet on any of the particular orchestrators. They had placed a bet on their own proprietary orchestrator, the ECS one. And then when Kubernetes got accepted, there was a sense that it impacted the strategy of AWS. How did Kubernetes impact the strategy of AWS? To my mind, I think you nailed it in that there were a lot of different competing standards, more or less. And when Kubernetes emerged as a more or less de facto standard across the board, it feels, and again I have no inside information on this, that it sort of caught AWS flat-footed. They wound up releasing EKS, or the AWS Elastic Container Service for Kubernetes; apparently they get paid by the syllable for that one. It was very much a one-dot-oh product when it launched. It took over fifteen minutes to provision a cluster. It was not at all clear what permissions it needed, so you needed to grant it roles that were incredibly broadly scoped. It didn't wind up doing logging appropriately. The Horizontal Pod Autoscaler didn't get released until well after initial launch. And the initial guidance around that was, well, it's not the first to market, it's not the best of breed, I'm not entirely sure why I would use this today. That said, it has, in typical Amazon fashion, improved rapidly since the time of launch, to the point where now, looking at it, it's not at all a bad option. That said, I think the larger ecosystem story here is that Kubernetes and how we orchestrate containers is not going to be an area of focus for too terribly long. When's the last time you had a deep-dive discussion with someone about which Linux distribution they should use? It becomes plumbing. It starts slipping below the surface and stops mattering to most people. I think that container orchestration is absolutely going to follow suit. Yeah, and what's interesting about that is, you know, the serverless idea.
AWS Lambda has gone from a fringe, kind of cool idea to looking like the modality that people are building their applications in from day one, more and more going forward. Although it's hard to see exactly what the serverless stack will look like, because today, I guess, the most serverless stack that you can build would be something where you're using a bunch of managed services, like queuing and some database system, and then you glue together these big managed services with glue code that runs in AWS Lambda. But that could certainly change. You could certainly have further developments of things on top of Lambda. What's your sense of the adoption of serverless? I think that serverless is one of those interesting technologies that, when it launched, started to look an awful lot like a toy. It's the sort of thing that more or less manifests itself as, this is neat for this one use case but never for my use case; it only supports certain languages and it only winds up working for five minutes. As capabilities continue to expand and it starts to move up the stack, it starts to look a lot more realistic. The hidden secret behind serverless, what powers it, is the event model, where an event happens in your environment and it automatically invokes a reaction to it that is highly parallelizable, and it almost completely removes the need for a company to look at infrastructure. It's also priced incredibly competitively, to the point where there is virtually no company on the planet that is spending a huge amount of money on Lambda. So it winds up shifting the attention in a few different ways. The problem is, as we look at serverless, that term has come to mean a lot of things to a lot of people, and it's not helped by people chiming in uselessly
to proclaim that serverless still runs on servers, as if it were this revelation that hadn't occurred to anyone until that person chimed in. Great, terrific, thank you. Do you have anything else to add beyond the same hackneyed comment that everyone makes all the time? That's not helpful, and you're not advancing the discourse in any meaningful way. Although there are fair criticisms of Lambda. It doesn't do everything these days. Okay, tell me your list. What are the shortcomings of using Lambda these days? The initial problem, of course, is shifting the way you think about software into this new paradigm. If you have a twenty-year-old monolithic application that has been doing something of business value, for example running the ATM networks, or this is what makes sure the traffic lights don't turn green at the same time, then shifting that to Lambda is likely not going to be beneficial in any strategic sense. Whereas if you're looking at this from the perspective of, we're building something greenfield, and changing the way we think about things, maybe that's adding significantly more value. The biggest danger, from my perspective, is people who see this new tool and see it strictly as a set of checkbox capabilities but don't shift the way they envision architecture, where they still try to shove an entire monolith into a Lambda function, or where they try to store state inside of Lambda. You can build an awful lot of terrible things with terrible anti-patterns using something like this technology, but that doesn't mean that's necessarily how you should be doing things. I gave a talk, Terrible Ideas in Lambda, as in serverless Silence of the Lambs, which showed exactly how not to do these things, and the anti-patterns. For example, it turns out databases fall over and catch fire when you have ten thousand Lambda functions concurrently trying to talk to them at the same time. And when you have a whole bunch of Lambda functions scanning the entire internet to find open Elasticsearch clusters, it turns out Amazon
would like a word, because their premier serverless platform isn't something they like you using to attack the larger internet. So there are ways of doing things well and there are ways of doing things terribly, just as with any tool. No tool survives first contact with other people's terrible architecture. AWS also uses the term serverless in the context of managed databases sometimes these days. So there is a serverless Aurora, and I haven't really delved into what that means. Aurora is, I think, their Postgres, or their managed relational database service. But I don't know why they use the term serverless, because S3 is in a sense serverless, or DynamoDB is in a sense serverless. What's different about serverless Aurora? Terrific question. I would consider S3 serverless, and they might have even called it that when it launched ten years ago if they'd thought of it in advance. You would have Serverless Simple Storage Service, so it would then be S4. There you go, exactly. I think there are a few things that define something as serverless. One of them is, sure, while you pay for data that lives in the system, it scales down to zero. So when it's under load and processing work, it costs you money; when it's not, it doesn't. That seems to be one of the key distinctions. Historically, DynamoDB was the subject of some debate, because you could only scale it down to one read and one write capacity unit, so it was still costing you a few bucks a month. Not that that's massive, but when you have fifteen hundred of them sitting in various dev environments, that starts to add up. So the ability to scale down to nothing other than what data you're storing, and then on demand spin up and begin processing things from a compute perspective, seems to be a fundamental tenet of serverless these days. They released, for example, DynamoDB on-demand capacity at re:Invent recently and specifically nailed that objection
right there. Whether or not something is serverless is, again, something that I leave more for the philosophers and people arguing on the internet while the rest of us go about our jobs. But I do get the sense with serverless Aurora, which I believe has now announced both MySQL and Postgres flavors, that you have this thing that lives there. It's a relational database. You don't need to think in a MySQL context, but now whenever your application gets traffic, it can talk to a database as it would traditionally, and then that database turns off when you're done. You're not getting billed for it. It doesn't need to sit around doing nothing. And given some of the new data models, it also starts to support much better concurrency, so you avoid the ten-thousand-Lambdas-all-talking-to-your-MySQL-at-once problem. Yotascale is a leading cloud cost management solution designed uniquely for engineers to make smart cloud cost decisions with smarter attributions and smarter analysis. With Yotascale, you get a complete view of your cloud infrastructure spend, including containers and Kubernetes, ninety-five percent cost attribution accuracy, actionable recommendations, and continuous cost anomaly detection. You get team-based alerts via Microsoft Teams and Slack to prevent monthly bill shocks, and machine learning based projections with predictive analysis and budget alerts for teams, products, and applications. Yotascale is widely adopted by some of the best engineering teams in the world, including Zoom, Hulu, and Compass, who depend on Yotascale to help them save up to fifty percent on their cloud costs. Request a demo and find out how Yotascale can empower your engineering team today. Visit yotascale dot com slash demo. That's yotascale dot com slash demo. Today's podcast is brought to you by Google Cloud and the DORA research team. The team recently launched a survey to collect insights for the twenty twenty-one State of DevOps report and would love your input.
The State of DevOps report is the largest and longest-running research of its kind, providing insight into how we can improve software delivery performance with DevOps. By completing the survey you get to shape the conversation on DevOps, along with over thirty thousand software professionals who took the survey over the last six years. So what are you waiting for? Take the survey at cloud.google.com/devops. That's cloud.google.com/devops. Thanks for being a supporter of Software Engineering Daily, and take that survey at cloud.google.com/devops. Well, that's the promise of serverless, then. I really like where this is going, because I don't know about you, but I have a bunch of applications, like stupid experiments that I've run, but I refuse to turn them off because someday I'm gonna come back to them and do something with them, and they're costing me twenty-five dollars or twelve dollars a month. And just, you know, when I look in QuickBooks or my bank statement, it's like, oh, twelve dollars here, twenty-five dollars there, and it starts to add up. And you know, hearing you talk, it makes me think this is going to be a relic of the past. Eventually these things are just going to shut down while they're not being used. It doesn't make sense to have a cloud infrastructure system that just stays up all the time while it's not being used. Absolutely. Yeah, I played with AWS once. That's why I pay Amazon twenty-two cents a month and will until the earth crashes into the sun. Yeah. So there are these companies that have huge amounts of infrastructure running on AWS. You've met many of them. What kinds of challenges does a company run into once they have a really large AWS deployment? Visibility, first and foremost. For example, my bill last month was sixteen dollars on AWS. The month before that, because I did a demo for someone and then left some things running inadvertently,
it was fifty bucks. So when your bill more than doubles, you notice that. When you're at significant scale and you're spending, I don't know, one hundred and twenty million dollars a year on AWS, it turns out that even big mistakes are easy to lose in the noise. You do some digging around and you realize that someone on your data science team wound up copying an extra few petabytes into S3 and left it there after they left your company three years ago, and you start to see things like that accumulate as waste and cruft. It all comes down to a certain lack of visibility and control. Large companies are generally used to the historical data center model, where you would wind up building things on a capital expense basis. You would plan out your data center buildouts, and it's super hard for a single engineer to accidentally order six million dollars' worth of hardware without getting fired or arrested. The new model, though, is that someone can inadvertently spin up that level of resource and not only not be aware of it, but no one is aware of that for, in some cases, years at a time. It is not at all transparent what's happening in your environment. I keep going back to the bill, not just because that is what I do for a living, but because it's also the only place in your entire AWS account where you can see on one screen all of the resources you have running across all of the regions in your account or linked accounts. There's no inventory service. It's just the bill. What are some other ways that teams that run on AWS end up overspending? A great example of this is, if you take it globally, something like sixty percent of all spend is on EC2. If you add four services you get to the eighty-five percent mark, and there's a long tail of other things. So you see people overspend, first, by not buying reserved instances because they're convinced they're going to turn that cluster off next week, and then months and months and months go by and that never gets turned off.
In fact, that expands. You also see, effectively, misunderstandings of the difference. If you're new to AWS and you want to spin up a T3 instance or a P3 instance, that's one letter. That sounds like the same difference, but you can go anywhere from spending half a penny an hour to forty-some-odd dollars an hour based upon that single letter, and that winds up being something that is a tremendous shock to people who've never played with this. The counter-challenge that you see at these large companies is, how do you govern that intelligently? You can act as a gatekeeper that says you're absolutely not going to be allowed to spin things up without a three-week approval process. Well, that's where cloud came from in the first place. When you have a corporate credit card that has a five thousand dollar spending limit, you can spin up a new cloud account. You're good to go until it's too big and serving production traffic, and then it gets noticed and accounted for. If you block people, they're going to start doing that process again or they're going to go work somewhere else. You also don't want to turn into the person who nags people about what they've spun up in Amazon all the time. So I'm looking at the bill by user and who spun stuff up, and dear Lord, I don't know who this Jenkins person is, but they're spinning up all the resources in production. And that leads to the biggest problem that you see: the person getting the bill in finance and the engineer spinning up the resources that impact that bill are five organizational levels apart in most cases. A person in engineering spins up some resources, and at the end of the month the person in finance sees this enormous Amazon bill and wonders how many books engineering is buying. They don't see them reading all that much. A company like Netflix has a massive AWS budget. Is there a department within Netflix that does cost negotiation and cost analysis? How do you think the dialog between Netflix and
AWS goes? Without speaking to specific companies, I have a number of clients in and around that general level of scale, and there's an evolution in some companies, especially those that are generally either born in the cloud or adopted a cloud native approach, and that permeates everything. You wind up going from the idea that an engineer can do this part time, perhaps partnered with someone in the finance group, to building out a cloud cost optimization team, to effectively having a dedicated, usually small team of folks whose entire purpose is to go and optimize things for cloud spend. And those people are not inexpensive, but at certain points of scale they can save ten million dollars a day depending upon what they're focusing on, so the economics begin to make an awful lot of sense. The challenging part for me, and one of the great inefficiencies that I see in cloud, is that hiring those people when you're spending tens or hundreds of millions of dollars a year makes sense. If you haven't done that, you're probably doing something wrong. Whereas if you're spending, I don't know, forty thousand dollars a month on your AWS bill, even hiring one person to do that will cost you more than any savings they could possibly arrive at. So there are inflection points as you go from zero to small, medium, large, and holy-crap-is-that-a-phone-number levels of AWS bills. Yeah, I mean, this is one of these areas that people would have never expected to be a business ten years ago, the cloud cost optimization specialist business. I've met probably four or five different companies that have built really big businesses off of this. And that's your day job. That's what you do as your core business: you help people fix their AWS bills. So what does that look like? What are the methods for cost controls that you have found to be most effective? Great question. I'm not going to talk smack about any of the vendors in the space.
There's over a dozen at this point now, but they all tend to tie back to the same model. They come into an environment, they drop a bunch of analytics onto an AWS account, they charge a percentage of your AWS bill every month, which incidentally finance hates, and then they wind up saying, we're pointing out cost savings opportunities for you to go ahead and implement. And in practice, almost no one does. And I start to see this entire space as something that doesn't respond nearly as well to a platform-as-a-service offering. Yes, you need analytics, but most of what I do takes a much more advisory tone. It's not about tooling. It's about getting the person in finance and the person in engineering to sit down and talk to one another. It's about building good governance processes. I don't sit there and write custom code for my clients. I have conversations, and that distills down to: do these five things and you'll knock twenty-two percent off your bill on average as an initial assessment. Then we'll get into deeper discussions around strategy, around how you want to govern this environment, how you want to start handling reporting of this, and how you can start shifting how the business views this, if it makes sense. Past a certain point, cost savings no longer matters to a company. You're not going to optimize your way to your next business milestone. You're just going to be a responsible steward of your money or improve the picture of unit economics. It's one of those areas where, at some point, your innovation is much better spent elsewhere. The challenge, of course, is that this is a problem every company has, and they all tend to solve it themselves internally like it's a bespoke unicorn. This is the business equivalent of, we're going to build our own version of container orchestration, no Kubernetes. The trouble is, today there's no Kubernetes for this problem. Let's talk about legacy enterprises.
There are large legacy enterprises that have been increasingly eager to adopt the cloud, whether it's banks or insurance companies or agriculture companies. There was a point in time where they were either resistant or were just shrugging their shoulders and saying, well, we've already got our own data centers, you know, I'm not sure we need that kind of thing. But now they realize that there is value. What is their path to adoption? Rocky is probably the best answer I've got for you in a nutshell. It requires a fundamental rethinking of your engagement with technology. It requires an understanding of aligning these things strategically. It requires accepting you're probably not going to save any money for the first three years that you do it. And it requires an awareness on the company's part of things such as, well, you're a two-hundred-year-old company and you're used to spending capex on your expenses. If you start shifting to cloud, almost all of that could theoretically wind up being reflected as opex, and that will affect earnings per share, that will affect how the market sees your business, and it could theoretically lead to problems for you. Make sure you're aware of what's happening before you get there. There's regulatory risk of people doing a click-through agreement before signing an enterprise agreement directly with a cloud vendor. There are a bunch of nuances that apply to large established companies. I live in San Francisco. It's somewhat natural, based upon the environment I'm in, to think of companies as this thing someone started in their garage three years ago, and it will be sloppy now and fix it later. But with a lot of companies migrating to the cloud, the quote-unquote legacy businesses, there's a lot more at stake. There's a lot more risk, and they have to move more deliberately. This is not, incidentally, in any way intended to be a condemnation of those companies.
It's just different, and making sure that they address this not just economically but from a business point of view, and have very clear outcomes and ways of measuring that as they go through that process, is critical. People like to make fun of the lift-and-shift idea of, you just take exactly what you have and you put it in the cloud, and then stage two is migrate to take advantage of cloud primitives. The problem with the other approach of re-architecting everything is you go, okay, now it's in the cloud. It doesn't work the way it did. You've introduced a bunch of regressions and no one has any idea why. Is it your code? Is it the changes to the code? Is it the environment? I mean, there's no silver bullet here, and the answer to almost everything in this space, as it is with any complex problem, is: it depends. Amazon has products that cater specifically to these kinds of companies that have been around for fifty years or one hundred years. One example is the virtual tape library or the tape gateway, which allows people to move their tape-based backups to the cloud. I thought this was an amusing example that I found in your newsletter at one point. What are some other ways that AWS services are tailored to legacy enterprises? Great question. I put in a feature request back in May of this year, twenty eighteen, and they announced two weeks ago that it is now supported. Namely, what I wanted was the ability to upload a file to S3 via SFTP. And when I tweeted that at them, I of course wound up with a whole bunch of people yelling at me: FTP is the past, just use the S3 API. Great, I appreciate where you're coming from on this. The problem is that with large banks, the de facto communication pattern that they have between them is generally SFTP of GPG-encrypted files for transaction logs. And in three different jobs now when I worked in finance, I had to build an instance that waited for FTP files to show up,
validate once that was done that it was not still uploading, package the entire thing, and put it into S3. Because you're never going to be able to teach a large bank how to use S3. If you start trying to talk to a banking partner about how S3 works, congratulations, you've opened Pandora's box of compliance and legal requirements, because now they have a lot more questions that you can't back out of. It's much healthier for those environments to meet people where they are. Now, the service costs a couple hundred bucks a month for an endpoint, and people are screaming about that. But it's not for consumers. It's for large banking institutions like that, where engineering time to build out and maintain an EC2 instance running this is an order of magnitude more expensive than just enabling this endpoint. Now, I saw that launch, said it was terrific, and then two hours later I had a follow-up request: okay, now give it a static IP, because otherwise we're going to wind up in a situation where companies that only update their firewall rules after six weeks of CAB approvals now have to do that constantly, and that becomes awful. There are workarounds for it, but I'd like something a bit more out of the box. So as AWS is accepted philosophically more and more by these legacy enterprises like banks, and now the banks can FTP their files to S3, and they can get their tape backups moved onto the cloud, the legacy enterprises are going to be more and more willing to adopt things. And at re:Invent this year, AWS announced Outposts, which allow for custom AWS hardware. I guess it's basically you order a box from AWS with specific services on it, and they give it to you on prem, so you can kind of get AWS functionality out of on-prem devices that Amazon sends you. This seems like a pretty big development. Explain what the implications of Outposts are. Absolutely.
But first I want to clarify something you just said, in that we talk about legacy companies like banks. For example, if you take a look at Capital One and their transformation story of being a bank that has gone from purely on prem to being entirely cloud driven, they have radically transformed their entire organization. If you're an on-prem company looking at digital transformation, you could do a lot worse than to model your transformation on Capital One, and I highly doubt you can do better. They've nailed this. So saying someone is a bank, therefore they're legacy, boring, crappy, and slow, is in many cases not accurate. I don't think that's what you were saying. I just want to make sure that that winds up being stated. I should stop using the term legacy, because I mean it in a, you know, categorically non-judgmental way. It's more like a lovingly phrased, agnostic way of just saying, this is a has-been-around-for-a-while company. But I need a better word for it. I should say something different. But yeah, okay, so go on. So the question was around AWS Outposts, where they ship you sealed racks starting next year that contain AWS hardware and run AWS services on prem. I think that this approach is, to be direct, brilliant. It's a great example of what drives Amazon in the sense of being focused on the needs of their customers. They're meeting customers where they are, and they are effectively extending their APIs and their model of doing business into the on-prem data centers. I've seen some fairly poor takes on this, not just on Twitter but things like headlines in Business Insider saying this is Amazon's tacit admission that cloud isn't forever. My counter-argument to that is they launched a downlink station service for satellites in orbit. If people with satellites in orbit is a large enough addressable market for them to focus on, I promise people who are intimidated by public cloud absolutely is.
I think that that product also goes a long way towards saying without saying one of the biggest problems you'll see in on-prem environments, which is that you cheap out on buying hardware, you don't manage it properly, and your effective systems management approach is awful. By dropping these things sealed on prem into your environment, we get rid of most of that. Now all you're really responsible for here is power and cooling, and we can't do anything about that until next re:Invent, when we'll announce AWS Power and Cooling or something. I have no idea if they'll do that. That is a complete guess, speculation. And the fact that I feel the need to disclaim that I have no knowledge of any plans to do such things really should give you a clue as to how far I think Amazon will go towards solving some of these problems. They do have wind farms, right? That's a good question. I know they have some. They say that they have some data centers and regions powered mostly or completely by renewable energy. I don't know offhand if they own the wind farms themselves, if they wind up leasing these from someone else, or if they just put a turbine in front of Larry Ellison when he starts mouthing off about the things that are not true about AWS and then just generate power from that. It really tends to come down to a few different ways that could be implemented. Their corporate structure is fascinating to me, but I try and stay in my lane with respect to AWS more than Amazon.com as a whole. So Outposts, you could say, are a form of, well, it probably wouldn't be a form of edge computing. I guess it's computing that is done outside of the cloud. But I think that term edge computing more generally refers to IoT devices or some kind of smart security camera that's sitting outside of the cloud. Maybe it's a WiFi network at a shipping yard. There are CloudFront edge locations, which can now themselves run Lambda as well.
We're starting to see a bit of an exodus where not just storage but compute is moving out of the regions and into the edge. I'm curious to see down the road if they wind up doing the same thing with state, if you can wind up not having to go all the way back to a region to store or update state. That begins to be something fascinating as well. We see parts of that now with certain implementations of things like AppSync, but these are still early days for a lot of this. You're right, the idea that everything Amazon offers now lives in Virginia, Oregon, and a burning fire somewhere in other parts of Virginia is great. But we're starting to see that rapidly expanding, not just with new regions and availability zones spinning up, but by taking these things and deploying them now into customer sites. There's a great story there. Well, tell me more about that. So you mentioned AppSync. That's like a mobile computing thing that they have. Tell me more about what they're doing on the edge. You're more of a software person than I am, I suspect, by a long stretch. But there's the idea that you can now have a mobile app where you break the network connection, you can still update things in that app, and once it rejoins the network it'll automatically wind up syncing any changes that wind up happening. That sounds like a minor implementation detail, but it starts to wind up pointing to a whole bunch of things. You wind up with reconciliation. You wind up with syncing ability. You wind up with now not having to trust the network, and you're starting to see things that are remote and far-flung going away from effectively dumb clients and into something that has more and more intelligence where you are. That has the effect of reducing latency. It reduces the reliance on the network bottleneck. It has the advantage that, in some cases, all of the processing of your sensitive data winds up happening in your environment, not in
theirs. And for things that are regulatory sensitive, for example stripping out credit card numbers from log data, quick and easy example, by not ever having that leave your facility or the device this stuff happens on, you wind up with a much neater story for not just being able to do the right things from a security perspective for compliance but, as always matters with compliance, demonstrating that and being able to prove to auditors that this is how this works. So it's addressing an awful lot of not just capability stories from a technology perspective but higher-level business needs as well. Stream is an enterprise-grade chat and activity feed provider serving more than a billion end users. Feature-rich products include robust client-side SDKs for iOS, Android, React, React Native, and Flutter, support for the most commonly used server-side languages, scalable and secure APIs, and a beautiful UI kit. If you need to build a chat solution for your application, check out Stream. Stream gives you building blocks for chat and activity feeds. Check it out and learn more about how to build with Stream at getstream.io/sed. That's getstream.io/sed. Thanks to Stream for supporting us, and check it out at getstream.io/sed. Demand for on-prem software remains enormous. It continues to grow and it's not going away. Take advantage of the automation, reliability patterns, and primitives provided by Kubernetes for not only your applications but also in how your on-prem and multi-prem apps are delivered and managed. Kubernetes and other cloud native technologies have led the way to modernizing on-prem software delivery. It no longer has to be a tarball and a one-hundred-and-fifty-page manual. Go to replicated.com/sedaily to learn how Replicated can help you modernize your on-prem software delivery strategy.
If you're a software vendor looking to modernize your application delivery and management to gain more enterprise adoption, check out replicated.com/sedaily. Replicated gives software vendors a container-based platform for easily deploying cloud native applications inside customers' environments to provide greater security and control. So check out replicated.com/sedaily and learn how to deliver and manage your software through all kinds of methods: bare metal servers, cloud VPC, GovCloud, even air-gapped. There's a secure way that your customers can use your application without ever having to send data outside of their control, and Replicated is already trusted by noteworthy customers like HashiCorp, CircleCI, and Snyk. Go to replicated.com/sedaily to get a free twenty-one-day trial of the full Replicated platform. There is a preponderance of AWS services. If you go to the dashboard you will see all these services, and newer developers can get intimidated by the amount of options on AWS. There has been this rise of the cloud providers that are simpler and have easier onboarding. Early on you saw this with Heroku, and then there's things like Firebase, and now Netlify is getting quite popular. Does AWS have a strategy for appealing to these kinds of users that want a simpler experience? I want to say yes, but I'm not sure how it's being implemented. When I first started using AWS, I logged into the console and was overwhelmed by the sheer number of services. I'm never going to learn all of this. What do I focus on first? This is incredibly confusing, and I have no idea how to go about solving my problem. There were twelve services. Now there's over one hundred fifty, and that problem hasn't gotten better. I'm considered to be something of an expert in AWS, largely because two years ago I said I was, and then it turns into a scenario where I'm very rapidly continuing to find myself incredibly overwhelmed.
You take a look at services that meet people where they are. Elastic Beanstalk early on was a decent example of this. A better one now is Lightsail, where you wind up getting an instance. It winds up having load balancers. You can add databases and disks. But it's fixed fee. There aren't five dimensions you get billed on. It's five bucks a month, ten bucks a month, whatever tier you pick. And at re:Invent this year they also announced a transition process where you can take that and convert it into some of the higher-level services like EC2 when you hit that point of evolution. But that is probably one of the best examples of easy onboarding for humans, without having to spend six weeks in cloud school first, that I could point to. We're also, and this is going to be somewhat controversial, I suspect, seeing that with Lambda. When you understand as a developer what the constraints are on the code you're writing and how it has to behave, and you don't have to worry about things like failover, durability, anything of that sort, and all of that is managed for you, there's an entire class of problem that largely goes away, and you just have to worry about writing code. Now, there are still constraints, and how that code winds up manifesting is still the subject of some debate. But that's the future. That's where we're heading to. How does AWS compare to Google Cloud these days? There are a couple of ways to answer that. I would start by saying that Google Cloud is arguably three to five years ahead of AWS in terms of pure technology. The problem is that it is not at all clear to me that Google has ever learned to speak to business. AWS exemplifies meeting customers where they are. Google tends to not understand a few key things. First, when a customer tries to move to Google and it doesn't work very well, Google takes a look at what they're doing and more or less says,
okay, the problem here is that your code is written like crap. You should instead write code the way we write it at Google. And it turns out that being incredibly condescending to the people who you're hoping to get money from is a winning sales strategy. There's also the concern, and Google people do yell at me when I bring this up, but we all remember Google Reader, where a widely beloved service was suddenly turned off. As of the day that we're recording this, which is December sixth, we wound up seeing that they turned off Allo; this morning or yesterday it made the headlines. Google has a history of turning off services that people grow to depend on, and while they do that less in the enterprise space, they all have the word Google in front of them. So people are leery about building their business on the backs of a service that may very well be turned off. With over one hundred and fifty services that AWS has offered going back to two thousand six, they have never turned off a service that had active users. That's something that winds up resonating with serious companies that take their business seriously. When a migration takes three years to execute, you don't want to have to do that again. So there's something to be said for being able to speak the language of business. And the more I talk to companies that are not tech darlings in San Francisco, the more I realize the second choice for cloud after AWS is going to be Azure, not GCP. Not until Google fundamentally changes how they approach this. I don't know how they do that, because it requires a complete chopping and reformation of the culture in some ways, and that is almost impossible. But until that happens, I don't see them growing outside of a very specific customer profile, with a few exceptions. Now, most startups are using the Google productivity suite, Gmail and Google Docs, and so I'm using it myself.
So does this give Google Cloud any advantage, or do you see the productivity tools as totally disjointed from the cloud infrastructure? To my understanding, the fact that I pay them five bucks per user per month for this does get lumped into the bucket that contains Google Cloud revenue. So there's that argument to be made. But at this point, from a collaboration perspective, it's kind of neat, but I have never yet built anything where I'm working on a data store or I'm working with S3 or whatnot and thought, ooh, now I need to integrate it with my office suite. That's not really how I tend to operate. If you start looking at serious businesses as well, where they have built entire complex applications that tie into spreadsheets, they're doing that in Excel. They're not doing that in Google Sheets. And to some extent you might see Office 365 having a story here that ties largely into Azure. But I don't see that level of integration on G Suite or Google Apps for Domains or whatever it is they're calling it this month. So you mentioned your take on Azure. It sounds like your belief is that that is going to have the second biggest market share, at least in the near future. I think it already does. Again, because of a similar willingness to meet customers where they're at, I guess, as well as their channel advantage with, you know, pre-existing Microsoft services. Is there anything else that you see as differentiating with the Azure world? I do, two things. First, and I do not in any way mean this as an insult, Microsoft has over forty years of experience apologizing for software failures. They speak the language of business fluently, and you need that ability in the cloud, because that's what things in the cloud do: they fail. As much as you try to build things that are completely bulletproof, nothing ever is. Everything breaks eventually, and being able to explain that in a realistic way without saying, read the book about error budgets
and then talk to us, is an incredibly valuable skill. The second thing that I find that Microsoft is doing that's going to absolutely change the landscape, and has changed the landscape, is this: I mentioned a minute or two ago that culture change is almost impossible. Microsoft has done it. They've gone from a company that I despised in the nineties to one that I deeply admire. And I don't know, people say, oh, what's the secret to that? And I've asked people and they say, half in jest, oh, they fired their loud CEO and then replaced him with someone good. Cool. There's more to it than that. I don't believe that one person can drive this. There has to be a collective cultural reckoning. And I can't believe I'm saying this in twenty eighteen: Microsoft is a bit of a darling of the open source world. That is a statement that the angry twenty-year-old version of me a couple of decades ago would be aghast at and wonder what had happened to me. The world changed, and an awareness of that change is something that has absolutely catapulted Microsoft to one of the most admired companies in the world right now. Another sizable player in the space is DigitalOcean, and DigitalOcean is, full disclosure, a sponsor. I think also a sponsor of you. They are. But I will say I think DigitalOcean is a sort of sleeping giant, because what I like about them is they take the alternative path to the other cloud providers, in the sense that they're super selective about the services that they make available to developers. And so it does make for this kind of constrained experience that I think AWS might be trying to do with Lightsail. What do you think is the long-term strategy of DigitalOcean? I think that whatever their long-term strategy is, it's very clearly working. I occasionally see Azure in my client accounts, occasionally GCP in my client accounts. All of my client accounts have AWS.
That's sort of a bit of a selection bias there, but I see DigitalOcean frequently enough that it rounds to all of my clients. There's always something running there: a marketing site, a status page, a blog, or something else where they wind up spending on something ancillary to the core product. I don't tend to see a lot of "this is the core application that makes us money" living in DigitalOcean, but I see an awful lot of other stuff. And the reason that I find, when I dig into that, almost invariably is that it is extremely approachable. You can get up and running within minutes and a matter of clicks. There's no back and forth of "first, set up these twelve foundational services, like IAM and all the rest" in order to start using it. It's click, click, done, and you're up and running and able to build something. They've been extremely selective in the services they support. They have a block store. They have managed [inaudible], they have a load balancer, and they of course have the VM, or EC2 equivalent. They recently announced that they're in the process of launching a Kubernetes cluster service, which, cool, good for them. I don't have a problem that looks like that right now, but I'm curious to see what they do with it. But you're right, they're very selective. Their product strategy is not "yes." They're not launching a bunch of high-end machine learning services, to the best of my knowledge. They're not out there building incredible data lake architectures for doing incredibly complex queries upon unstructured data that lives in the exabyte range. That's never been what they've been about. It's more or less a cloud for human beings. A lot of their constituents tend to be business-side users, not engineering-side. So it's easy for those of us who've spent a decade in the deep-dive technical architecture work to sit here and say, "Yeah, we're going to ignore that and instead focus on the bigger, more exciting, flashier things."
But you're right. They are a sleeping giant. They're quiet, but they're everywhere. And every time I have dealt with them, from a support perspective, from a business perspective, or from the perspective of just popping in and seeing what they're up to, it's been a wonderful experience. And again, they are a sponsor of some of the things that I do, but they pay me to include their links, not to say nice things about them. Personally, this is me being genuine. This is not me being paid for this. One subject that we've discussed in recent episodes is the idea of open source companies that are competing with AWS: companies like Elastic, which competes with Amazon's Elasticsearch product, or Redis Labs, which competes with Amazon's hosted Redis. What does an open source project have to do in order to succeed as a product company that might be competing with Amazon's much cheaper, easier-to-sell hosted product? I would say give up, because an open source project is not a business model. It's a means of development. It's a means of community engagement. It's a way of solving technical challenges. But there's an enormous difference between that and having a viable, functional, healthy business. A good example might very well be Elasticsearch on AWS. It's an awesome service. Click, click, and you receive it, and it works super well until you try to do anything even slightly off-book or complex, at which point it turns into a screaming fire. At that point, you're generally reaching out to Elastic to work with them, and that winds up being sort of the narrative of how the rest of this works. I think that if your company is solely built around this open source project that you've built, and your value-add is either a pretty dashboard for it or a consulting assistance series around that, you don't have much of a business to begin with. And without casting aspersions at them, I think on some level Docker suffered from this.
Their entire company was built around a transformative repackaging and branding around an idea, containers, whose time had largely come, but the best part of what Docker did was given away for free as part of open source. There was no narrative, when I was deep into the Docker world, of "step seven: now I cut Docker a large check." There was no upside outcome from my position in the universe. Maybe I'm wrong on that, but it felt like they wound up articulating different attempts at business models periodically. They weren't really sure what they were going to do next. And now, similar to what we'll see happening to Kubernetes at some point, what container system you use has slipped beneath the waves as well. That's no longer the interesting part of the story. Now it's a question of how you orchestrate them. Last year at re:Invent, Dr. Werner Vogels got onstage and had a great slide about the future. What does the future look like? "The only code you'll write is business logic." I of course photoshopped the crap out of that and made him say ridiculous things, like "What does the future look like? Cats will love Amazon Prime Meow." But his point was well taken, in that people don't want to think about this at a business level. As you move up the stack with things like Lambda, your code now handles business logic and becomes valuable and important to people making strategic decisions. When you're also paying per invocation in a large microservices environment like Lambda, you can also trace, to a high degree of accuracy, exactly what is costing you money. You trace the capital flow throughout your organization, as Simon Wardley puts it. That is transformative once you get more than trivial amounts of money being spent on Lambda. So we're nearing the end of our time. I just want to say you make some really good developer content, whether it's your newsletter or the Screaming in the Cloud podcast that you also host, and you have a distinctive voice, which I think is something
that's pretty important for at least a lot of the content that I want to consume these days, because most of the content I consume these days is written by somebody who I respect, and I know what the voice is. I know who I'm listening to, or I'm listening to a podcast by somebody whose opinion I respect and whose personality is something I can tolerate, or I look forward to tolerating. What's been your experience producing content that developers want to consume? It's sort of a byproduct of the confluence of two happy accidents. The first is that sarcasm is my first language. It's how I grew up. It's what we spoke at home. And I wind up seeing the world through a snarky, sarcastic lens, so being able to speak with that voice is refreshing, because not many people do it. And the reason for that is tied to the second, which is that, because I work on AWS bills, the opportunity for conflicts of interest to arise is massive. I have no partnerships. I am not an AWS partner. I have no partnerships with any vendors in this space. And I'm not one of those consultants who has a single large client that drives most of my business. I have no one company that's more than twenty percent of revenue, by design. So as a direct result, I am not beholden to anyone else. I'm not one awkward meeting with HR away from not having a job anymore. My personality and my voice were tremendous liabilities to me personally when I was an employee. Now that I'm independent and really don't have a corporate overseer, I've become free in a way that I never was before, and I'm taking advantage of that to say what I think. I don't always get it right, but I do occasionally tend to hit the nail on the head. It's one of those areas where that voice is hard to find, but I wouldn't give it up now for anything. Corey Quinn, thanks for coming on Software Engineering Daily. Thank you for having me. A pleasure.

Digital Ocean with John Allspaw

Software Engineering Daily

56:03 min | 4 months ago

"Hello there, I'm Corey Quinn, Chief Cloud Economist at The Duckbill Group. I also host two podcasts, AWS Morning Brief and Screaming in the Cloud, and I also write the Last Week in AWS newsletter. I'll be taking over hosting duties of Software Engineering Daily for this week and taking you on a tour of the cloud. The cloud that we're exploring is DigitalOcean, which people have heard of but often don't know where it starts and where it stops. A few announcements before we get started. One, if you like Clubhouse, subscribe to the club for Software Daily on Clubhouse. It's just Software Daily, and we'll be doing some interesting Clubhouse sessions within the next few weeks. And two, if you're looking for a job, we are hiring for a variety of roles. We're looking for a social media manager, we're looking for a graphic designer, and we're looking for writers. If you are interested in contributing content to Software Engineering Daily, or even if you're a podcaster and you're curious about how to get involved, we are looking for people with interesting backgrounds who can contribute to Software Engineering Daily. Again, mostly we're looking for social media help and design help, but if you're a writer or a podcaster, we'd also love to hear from you. You can send me an email with your resume: jeff at softwareengineeringdaily.com. That's jeff at softwareengineeringdaily.com. TeamCity Cloud is a new continuous integration service that is hosted by JetBrains. Last year we invited listeners of Software Engineering Daily to take part in the TeamCity Cloud beta. Now the service is officially released and ready to be used in production environments. TeamCity Cloud is based on the original on-premises version of TeamCity. It's the same great CI/CD, but managed by JetBrains. The best thing about TeamCity Cloud is that it doesn't tie you to any particular technology or workflow. It integrates with all popular version control systems, build and test frameworks, issue
trackers, IDEs, and cloud providers, and it supports them all equally well. And you don't have to deal with updating your tools or installing security patches. That's all done by JetBrains. TeamCity Cloud lets you run your pipelines on cloud agents provided by JetBrains, or connect build agents from your own network. To get started, go to teamcity.com, create an account, and get twenty hours of build time for free. Once again, that's teamcity.com. And as a bonus, if you want a personal introduction to TeamCity Cloud, just use the contact us form at teamcity.com and mention that you came from Software Engineering Daily, and the guys from JetBrains will get in touch with you and help you through your CI/CD path for free. Thanks for listening, and thanks to TeamCity Cloud for being a sponsor of Software Engineering Daily. Joining me today is John Allspaw, founder of Adaptive Capacity Labs. John, thanks for joining me. Thanks for having me. So people understand where you're coming from: you do not, nor have you ever, worked at DigitalOcean, correct? Correct. I absolutely have not. You have, of course, been in a lot of interesting places and done an awful lot of interesting things. The joy that I know myself, from going down the path of being the independent consultant or starting a consultancy, is that people don't really know what you do, or where you've been, or where your expertise lies. So you were at Etsy for something like seven years. Yeah, yep, that's about right. Lots happened during those seven years. Sometimes it felt like two weeks, and sometimes it felt like seventy years. You transitioned into a number of different roles there and wound up leaving as their CTO, having been that for a little over a year and a half. So you started your career, it says here, once upon a time as a systems engineer slash Unix systems engineer, which I sort of shorthand in my mind to Unix sysadmin. Then you can put the word grumpy in front of that because, having been one myself,
it's not like there's another kind. Unix sort of tears it out of you. Yeah, yes. That's what the learning curve, at least for me in the mid-nineties, did to me. I'd like to say that I certainly was grumpy, because it would be weird if I wasn't, but I would like to say that I wasn't the grumpiest. No, I think I worked with the person who was. And I think that for people who have gone down the path we have walked, where we got into tech and became the tech people that we are, our path was starting off as a Unix sysadmin and evolving from there. And even now, at a time when everything is cloud, cloud, cloud, we still find ourselves in many cases, at least I do, returning to my roots. When I'm trying to get a problem solved quickly, my default set of tooling is usually a pretty crappy bash script, because I'm not sure there's such a thing as a good one. I tie together a whole bunch of Unix commands that each do one thing right, and pipe from one to the next to the next. Are there better, more elegant ways to do these things? Absolutely. But I know these tools. I've been using them for most of my career, and they just work. Yeah, yep, same here. If there's any reflection on what you said, it would be that I'm at least old enough to know that the best tool is what helps you get it done, and I'm quite okay with not winning any elegance awards. On some level, though, that's a little more of a negative take than I was going to frame it as, but it's perfect for a segue, so we're going to go with it. This brings us to DigitalOcean, because when you start looking around, they've recently gone public and they've done well, but they're not a hyperscaler by any stretch of the imagination. They're making money most years and things are going well, but there are no companies that we really talk to in the hyperscale space that name them when you ask, "Oh, so what do you do for infrastructure?"
You saw the older companies like Google and Amazon themselves build all this stuff out. You have some of the Facebooks also doing that in different ways. Apple is sort of a weird special case in its own right. But then you have the new crop of hyperscalers that are very publicly referenced cloud customers. You have Spotify, which has done an awful lot of work on top of GCP. Netflix talks constantly about how much work they do with AWS. Pinterest is also on AWS. But you look around: great, who are those companies that are running on top of DigitalOcean? And you don't see any of them, and it's, "Oh, so they must not be a real company" in some respects. But then you start digging a little bit. Okay, so who is a DigitalOcean customer? And the answer is everyone, but they're not running these massive buildouts there. And that's why I wanted to talk to you. When I put the call out for who is willing to go on record talking about DigitalOcean, you were one of the few people who said yes. Because I talked to a bunch of DigitalOcean customers, and the story was always, "Hey, can I get you to talk about your experience?" "Well, it's kind of shadow IT. They don't really know that we use it here, and I don't want to get in trouble." Which is kind of a weird thing, but it makes perfect sense, and we'll get there. But tell me about what you do with DigitalOcean. So we use Droplets. We've been doing it since we started Adaptive Capacity Labs. For the most part, in a nutshell, we will sometimes temporarily launch maybe two or three Droplets just to mess with, and end up destroying them after what are mostly experiments. But we have two or three Droplets that have run continually. I don't know what the uptime on them looks like, but they've been running for a couple of years at this point, and they host a hodgepodge of tools that we use ourselves and develop ourselves in doing incident analysis work, in learning from incidents. In this,
there's nothing sort of technically exotic or novel. I mean, these are, you know, JS, or a few databases, and a couple of other open source packages, like OpenRefine, sort of data cleaning tools, and that sort of thing. So that's the high level. If you go to digitalocean.com today and take a look at what they're offering, it feels like they've sort of departed from the way that you and I clearly think about what they do. Instead of talking about Droplets, which are their virtual machines (because they're called DigitalOcean, they call their instances Droplets; it's adorable to someone, I'm sure), they've broken it down now, so instead they have compute offerings and database offerings and storage offerings and network offerings. And for some godforsaken reason they have a managed Kubernetes platform and a bunch of other things in there. And that's great. But the reason that I was a DigitalOcean customer was simply that you can come in from the perspective of knowing vaguely how a Linux box works, sign up for an account, start clicking around, and within a minute you'll have your Droplet up and running that you can log into, and then you can get to work. Say what you will about the other, larger hyperscalers and all of the technical mastery and the rest: there is so much work, and there are so many fundamentals, that you need when approaching these things for the first time. Getting an EC2 instance up and running in AWS takes touching almost twelve other services: IAM, learning how the console works, making sure the bill isn't going to surprise you. With DigitalOcean it's, cool, you're going to get this size resource for whatever it is that you want. You're going to pay a fixed fee every month for the thing, and that's it. Years later, AWS came up with their Lightsail offering, which tries to be the same thing, but reading their terms and conditions, it sounds like they're terrified someone would actually use it that way.
Their support for it can only be described as begrudging. Honestly, that is exactly what you described. And frankly, like I said, what I like about DigitalOcean, and I mentioned just spinning stuff up to mess with something, is that we can focus on whatever it is we're doing with the computer that they're giving us (I'm putting air quotes around "the computer") and not on how to get the computer. I don't know, it's the use case. It just gets everything else out of the way so I can focus on whatever it is I need to do. It's a very get-shit-done platform, as far as I'm concerned. Right, it doesn't force me to become an expert on DigitalOcean. And I know that sounds like a weird thing, but I've never understood how some of these tech companies have somehow gotten to a point where, in order to use the technology effectively, you don't just have to use it. You have to take classes on it and do deep studies on it, and they will sell you training on using their platform and give certifications, which they of course charge you for, on top of their platform, and it just goes on and on and on. And at some point it's, "I sure am doing an awful lot of work for something that directly benefits you. What's the deal here?" It's volunteering, in some ways, for large companies, and I've always had a problem with that. DigitalOcean, again, is not sponsoring this to my knowledge. They haven't paid me. Honestly, I wish they would. But I love the fact that they are the opposite of that. One thing that they've done for years: I have to say, far and away, their documentation for how to do various things is some of the best on the internet, full stop. Yes. Full disclosure, as you can imagine based on what I said, my colleagues and I are not doing anything particularly difficult. But any time that I've needed to figure out something, it's easy to find the stuff that I need to find in the documentation, and I
I'm not spending as much time navigating as I am reading and understanding. It's really straightforward. I don't actually... I know that I've seen their documentation, but for kind of different things. And I'm glad that you said this, because that brings us to an opportunity to talk about what I'm getting at. I don't run things on DigitalOcean myself. I haven't for a couple of years, because my business is so AWS-centric that for anything even slightly load-bearing, to be very direct, I prefer to have it in an environment where I know people who are high enough in the org that I can call them personally and yell at them to get it fixed if I need to. It's a weird support plan that's called being loud on Twitter. Who knew? What I will do is spin up something over on AWS when I'm trying to test something. Like, oh, I have to remember how to configure nginx. That used to be my daily job, but it's been a few years, and I don't remember the configuration file syntax off the top of my head. I punch "how to configure nginx" into Google, and more often than not a DigitalOcean documentation page is one of the first things you'll see. And sure, it talks on some level assuming a DigitalOcean Droplet-style environment, but it's not prescriptive to the point where none of it matches. It's, okay, you're on a Unix environment, here's where you start, here's where you stop, and there's nothing about DigitalOcean specifically that matters in that article. So I did some digging into how this works, and it turns out that they pay people in the community to write documentation articles on how to do things like that, and they will take whatever it is that they pay, I forget the exact dollar figure, and pay that much again to a registered charity, I believe of the author's choice, or selected from a list, or something like that. So you're writing documentation and getting your name out there, you're getting paid to do it, and you're benefiting a good cause. And I'm looking at this: this is amazing.
This is the exact opposite of "pay us and we'll tell people that you're certified on our platform." It's the inverse of that, and it's one of the most amazing things I've seen. As soon as I discovered this, it was, "Why does everyone not do this?" Yeah, that is amazing. I didn't know this. And by the way, I'll just also say that there are all kinds of things that I will search for all the time, and yeah, a DigitalOcean article, or even just a little portion of a DigitalOcean article, will have what I'm looking for. But I did not know that's how writing happens over there. That's very cool. Yeah. Okay, I just pulled up the page now. It's linked off the front page of their website. "Authors typically receive three hundred dollars per tutorial. The author then selects a single tech-focused charity or nonprofit, or a group of nonprofits united by similar missions, to receive a donation from DigitalOcean." So they wind up paying in both directions, which is kind of awesome. Absolutely. That is wild. It's why the documentation is so good. You have to apply for this, so it's not a content farm where any yobbo can wind up putting together something that reads like old eHow documents of "I'm just going to basically jam every search phrase we're trying to optimize for into this thing eighty-five times." It's well written. They clearly have an editing process, because the voice is relatively consistent between most of the things that I'm reading. And it's sort of the go-to answer, as opposed to something like Stack Overflow, where I ask, "How do I configure Apache?" and it's the first result on Google and the response is, "This topic has been closed as off-topic," or as a duplicate. It's great, but you're not going to delist it or anything from Google; you're going to pollute the search results. Awesome, great. That is what it takes to be a full Stack Overflow developer these days. Excellent, excellent. Yeah, there are all of the articles in... I'm just clicking around here.
But it seems as if there's so much in their little tutorials and other sorts of articles that it's exactly what I want to know. And if there are nudges or pushing to spend more on DigitalOcean, I'm not remembering that, and I'm definitely not conscious of it. Sure. Yeah, for whatever reason, it just seems to work. Now, they did do a significant layoff series over the past year. I don't have the exact dates on this, but it seems that a lot of people I knew who worked there, who were great at doing things, are no longer there. And it's clear that, again, as they filed to go public (I don't recall if they completed that or not; it's one of those areas). They are. As of March, they are publicly traded on the New York Stock Exchange. Good for them. They made three hundred nineteen million dollars in revenue in 2020. They're losing forty-three million a year right now, but marketing expense, fine. Making money is so passé, apparently. Great. But it seems like they've changed their strategy as a result, and their messaging, because going to digitalocean.com is not what it used to be for me, where it was "Get started, click here, we'll give you a pile of free credits, and you can spin up a Droplet and get started." Now it feels like they're trying to compete on the same grounds as the big cloud providers, the tier-one hyperscalers. But with love and respect to the company and the people there, they can't compete in that world. They don't have the capital expenditures that are almost necessary to get there. They don't have the ability, for better or worse, to wind up offering the full breadth of solutions. And perhaps most damaging, there is no existing giant pile of customers you can point at who've done all of these things. With AWS or Azure, for better or worse,
it doesn't matter what you're trying to do in those environments. Punch it into Google or ask on Twitter: someone has done something very similar on that platform before. In many cases, the public community blogs are far better than the official documentation, because they talk about the real-world personal experience people have gone through, and I find that a lot more understandable and relatable. There, you do have to evaluate from a somewhat conservative point of view as far as "is this actually going to be the right way to do things," because there's some terrible advice on the internet, and putting it in blog form doesn't make it better. Yeah. I don't know, I'm certainly not a marketing or business development person, but if anybody from DigitalOcean is listening: I like doing what I'm doing. If you continue to help me do that, I'm quite fine with whatever you want to try. For me, at least, I can answer questions with DigitalOcean that I cannot in AWS, around billing, for example. My entire business is built around optimizing the AWS bill for large environments. I would argue that at this point I am pretty close to being the undisputed expert in AWS billing on a global scale, just because of the different experiences I have, the customers I talk to, and the way I think about these things. And I still cannot tell you today, if I spin up an EC2 instance in AWS of a given size, and I put a WordPress blog on that thing and start posting about it, what my bill will be at the end of the month within anything close to about twenty percent accuracy. Because you're metered on the volumes that store the disk. Great. The snapshot backup functionality of that is billed based on how much has changed since the last snapshot. Data transfer is going to be non-deterministic. If I use their CDN in front of it, the data transfer cost is going to change based upon the geographic location of the requester, so I will pay more for traffic when someone requests something from the Middle East
than I will if they request it from Baltimore, and that becomes a very weird thing. I click the pricing page on DigitalOcean, and wow, it turns out that the DigitalOcean bill is not a business I will be able to get into, because it tells you up front. Their basic Droplets start at five dollars a month, and that comes with a gigabyte of RAM, twenty-five gigs of SSD disk, and a terabyte of transfer, and it goes up from there in various tiers and a whole bunch of different sizes. Go look at this yourselves. I'm not here to wind up selling products for them. But it is easily understood, and at this point predicting what my bill is going to be distills down to basic arithmetic. Yeah, absolutely. I don't know if you could hear it, but my eyes were rolling when you were describing trying to figure out what an EC2 instance would be. That's my experience. If I wanted to change something about my health insurance plan, then I would go to my insurance company and go through their ridiculous, mind-bending labyrinth, and it sounds very similar to that.
Yotascale is a leading cloud cost management solution designed uniquely for engineers to make smart cloud cost decisions with smarter attributions and smarter analysis. With Yotascale, you get a complete view of your cloud infrastructure spend, including containers and Kubernetes, ninety-five percent cost attribution accuracy, actionable recommendations, and continuous cost anomaly detection. You get team-based alerts via Microsoft Teams and Slack to prevent monthly bill shocks, and machine-learning-based projections with predictive analysis and budget alerts for teams, products, and applications. Yotascale is widely adopted by some of the best engineering teams in the world, including Zoom, Hulu, and Compass, who depend on Yotascale to help them save up to fifty percent on their cloud costs. Request a demo and find out how Yotascale can empower your engineering teams today. Visit yotascale.com/demo. That's yotascale.com/demo.
The Apache Airflow community would like to invite our listeners to join Google, Astronomer, AWS, Electronic Arts, BBC, Pinterest, and more leading companies on July 8th at the Airflow Summit 2021, a virtual conference designed for data engineers, data scientists, and anyone with a need to author, schedule, and monitor data pipelines using Python. In the keynotes, talks, workshops, and panels, you'll hear from the Airflow contributors as well as data pipeline experts using Airflow in their organizations. Topics covered include patterns for deploying Apache Airflow for analytics, creating Airflow providers, and productionizing machine learning pipelines, as well as lessons learned through real-life implementations. The conference runs July 8th through 16th and will be held in multiple time zones around the world. Last year's summit attracted over six thousand professionals working in the data stack, and the Airflow community has expanded rapidly over the last twelve months. Discover what's driving this excitement. Check out the full agenda at Software Engineering Daily dot com slash airflow summit and register now to reserve your spot. That's Software Engineering Daily dot com slash airflow summit.
Yeah, often those are directionally aligned as far as how the sort of relative complexity goes. Now, I feel like on some level folks are looking at DigitalOcean internally and saying, "Look, we need to compete with these other folks, because we need to offer all these higher-level differentiated services." Oh yeah, look at this, and I'm scoffing, because they don't even offer managed blockchain. How can they possibly be a serious provider without one of those useless things? No, they have a different perspective on this, and they have the building blocks of things that I consider the primitives you need: virtual machines.
They have an object store that they call Spaces, also with a reasonable, predictable pricing model, that is an S3 equivalent. They have load balancers, which are incredibly important when you're trying to have more than one of things for resiliency. Who'd have thunk it? And they have managed database offerings, because honestly, I really don't want to manage bespoke, handcrafted MySQL or Postgres databases ever again. And that's it. That's really all I want from you, and that is what I think of when you say, "I have a DigitalOcean environment." But now they offer a whole Marketplace area where a bunch of vendors can configure things. I don't know who uses that, if you've ever... I have, actually. Wonderful, tell me more. It was maybe one of the first times we were looking at OpenRefine, this open source project. It's basically for cleaning shitty data. I think it came out of Google, and forgive me if I'm misremembering this. When I had the use case for it, I just wanted to click a button and have it set up, and I thought this might be a thing. And I think they just had it in the Marketplace. You just say go, and the Droplet comes up, and it's just OpenRefine running. So I think that was my experience. So those sorts of experiments, where maybe once or twice a month I just check something out, or I actually have to do something very real and critical but it doesn't need to be up permanently: I think that's what I used the Marketplace for. I've seen some amazing stuff in the various marketplaces. One of the early folks, back when WordPress was one of the things that they wound up shipping, was a company called Bitnami, or as I insisted on mispronouncing it, "Bitten Am I." What I found great about them was that you could just go ahead and grab a thing from the marketplace with a small premium
That premium didn't matter that much in a business context, and I didn't have to think about these things. I didn't have to spend an hour setting up WordPress exactly as it should be done. And I've run WordPress at large scale for hosting companies before; I don't want to deal with it. I just don't. These are sensible defaults configured by someone who presumably knows what they're doing better than I do, and you get to get on with your business, and that's a powerful thing. The world is shifting and moving in a more up-the-stack direction. At this point, I'd argue the next phase of cloud is gonna look a lot less like buying a bunch of virtual machines and stringing them together with your custom software, and instead start to tie together a bunch of disparate APIs with either a low-code or no-code option, or building things that integrate between different providers more efficiently. Serverless is going to play some part in that, I suspect, but I'm not here to predict the future. I love how the DigitalOcean type of offering works today, and I hope they continue to go in this direction. I'm just kinda worried by what their website says these days, because it starts to feel like a very different thing. Yeah, yeah, I really hope they don't. I mean, if emulating the larger cloud providers means not paying attention to all of the things that I love about DigitalOcean, that'll be a huge, well, frankly, that'll be a huge opportunity for some other company. Because, I don't know, I mean, the last time I used Linode was too long ago; I don't remember it. But I don't really know who DigitalOcean's, despite what they think their competition is, in my mind I don't know who DigitalOcean's sort of competitors are, probably because I don't really care, 'cause they're doing whatever, right. Going on: I click on learning about the products, and it tells me that you can have a bunch of different distributions: Ubuntu, CentOS, Debian, Fedora, CoreOS, and FreeBSD, which was my first operating system, and those were the days.
But great. A lot of people listening to this don't know what those things are, and that's great. You don't need to. That does not make you any less of an engineer or less of a developer. That's fine. But for those of us who are looking for things like that and have angry, loud, obnoxious opinions about what distributions and operating systems to use, it's there. But then it goes directly down into: we have a benchmark. The next thing they tell you about is industry-leading price performance, and they're doing a comparison on the performance story between DigitalOcean's offering and Google and AWS. I'm sorry, who cares about that? When I'm looking at, do I put this workload in AWS or do I put it on DigitalOcean, I'm optimizing for simplicity, not, well, how do I wring the most money for infrastructure out of these things. Because let's be very realistic here: you're not going to spin up ten thousand droplets to host an application. If you're like most customers, you wanna get a few of these things up. They're going to be much more like pets instead of cattle, and you don't really care about squeezing performance out of it, because the cost economics blow themselves out completely when you start to realize just how much engineering time it's gonna take to get up to speed with other providers. It's going in the wrong direction. They still do a lot of great stuff, too. They have built-in cloud firewalls, for example. They have a backup option: you check a box and it backs up the stuff you care about. Not, oh, you got the wrong volume. And this is going to sound a little ridiculous, but bear with me: you turn on backups and it just starts working, and it's twenty percent of the cost of the droplet. The end. And, oh wow, I can figure out how much that's likely to cost, and you're done. That's kind of awesome. That is excellent, and frankly, my company is entirely bootstrapped, and I just don't wanna spend either money or time
Having to deal with steph. The use case. That i have is my guest. There's a whole bunch of lots of dilution partner. Customers have are using it. The same way that i'm used this and that they've important but it's not like i don't need to need to read a dissertation. I'm not interested in doing if i if i was interested in proselytizing the tools that we build for example than i would go. I probably go somewhere else but getting to the point where we would hard ties is entirely because of digital ocean. The way they have things absolutely. There's such value in having that. Simplicity and being able to articulate what they're doing and how my use case fit super well with the larger cloud providers. When i travel remember back in the days of travel how wonderful that could have been. Sometimes the only computer. I've been reduced taking with me. And i present a conferences from it. I wind up doing running my business from it is my ipad. And that's great so my development box was fundamentally just olympics. Box sitting in some cloud provider because honestly. I don't want to hear the fans in my home office anymore. So let's put that somewhere with good connectivity. Where i'd never have to worry about hard-drive replacements again and call it good. And there's so many finicky things you have to worry about. When getting these things set up and let's be clear. I don't know about your development practices but mine are rubbish. I wind up writing code in one key way. It's style guide here. That is badly. I look at these things. And whenever i have a professional engineer come in and tidy up some basic tool that i wrote to turn it into something. That's more extensible. Supportable the rest. They said great so. This is the input. this is the output. That's really all in. I don't wanna look at your code base anymore. Follow up question. Do you want me to repeat the replicate. The bugs are fixed them as we go and it's well that's just hurtful but thank you for asking. 
Yeah, and that's awesome. But my development environment is: I will install nonsense on my system when I need a new library there, and then, well, that's going to stick around until I reimage the box, because I don't get done and think, well, I can remove all those extra libraries I just installed via apt, DNF, yum, or whatever. But it gathers cruft, and it becomes this bespoke unicorn box. If I had to go and provision the whole thing again, I'd start over with very few packages installed, and for the first few days, as I tried to build things, I would keep smacking face first into a bunch of dependency errors and figure out, oh yeah, I have to install that thing again. And that catches up with time. Every configuration management option is about, oh, you should put this into a user data script, or use Puppet, Chef, Salt, Ansible, or something like it, or you could Terraform all this stuff. It's easier for me to type apt install, the thing that I want, and whack enter and not think about it again than it is for me to sit there and update a manifest somewhere, commit it to Git, and push it. And I treat it very much like a pet. This aligns very well with the DigitalOcean story. Yeah, absolutely. As you were describing that, I just noticed I have seventeen copies of a file called do it dot s h on my DigitalOcean instance. So I think that's good, actually. Go ahead. And that's primarily good. I think you're absolutely right. That's the thing about getting shit done. I just don't really, I don't need to, for what we're doing, I have to pay way more attention to the actual contents, the purpose of what I'm doing. Again, I don't need to win any elegance awards for any of the things that we do. None of it needs to, quote, scale. Absolutely. The things we say versus the things we do are worlds apart. That was my joy being a senior engineer once upon a time: I would be talking to new hires or interns and I would lecture them very sternly, remember, everything is cattle, not pets.
Everything must be done programmatically, never by hand, as I began the nine-month process of setting up my work laptop. We all do these things, and that's how life works. I think if I wound up having to reimage my Mac, I have backups; the data I actually care about is secure. You don't get to be a sysadmin for very long before you really learn to appreciate the value of backups. But getting all the applications that I use on a week-by-week basis installed, I'm still gonna be tripping over things that I've missed for months. I got a new one at the start of the pandemic, because I hadn't had a desktop computer in ten years, and I was troubleshooting a weird network issue the other day, and that was the first time I had to install Wireshark on this thing. Oh great, let's get that installed. Because I used to use that all the time. These days I don't really care about what's going on on the wire; in most environments I'm in, clouds lie about it anyway. But I needed it, and it was right there. But yeah, that's now gonna sit on my drive, probably not getting updated, until I replace the machine or reimage it. And again, that is the circle of life. It becomes load-bearing on some level; it becomes comfortable. Yes, fully agreed. On some level, I admit it feels a little disingenuous to have DigitalOcean included in the tour of the cloud series here, because there's a strong argument that it is not a cloud; it's someone else's computer. Now, there's that old saw, there is no cloud, it's just someone else's computer, which you may as well append with, I'm very intelligent. Great, you have made a point that doesn't actually change anything and is not going to make anyone like you or agree wholeheartedly with what you're saying. Good for you. You are the specialest person. Now, here in the real world, the rest of us have work to do. But it is a cloud in the purest sense. I used to work in data centers. I've been in the environments where: oh, we need a new media server.
Great, we're going to expedite this, so it'll be here in six weeks. It takes time. It takes capital expenditures, unless you're doing some zany lease agreement deal. You have to clear space for it, make time to go to the data center or have someone else do it for you, get it provisioned, get it racked, lose four hours because you forgot about the cable that wasn't working properly and put it back in the bin. Then, oh, there it is again. And it becomes an unfortunate process mess, whereas now it's: oh, I wanna test something at two in the morning because I can't sleep, for whatever reason, usually baby related, and I wanna spin something up and test it. Well, great, I can do that. I can spin up a node, install something on it, and discover a couple hours later that it didn't do what I thought it was going to do. Oh well, never mind. Turn it off. Turning it off is key to cloud economics, and my total infrastructure cost for that is something like twenty-six cents for the experiment. Conversely, theoretically, if I ever break character and come up with an actual good idea and it becomes something that has legs, a SaaS product or something, that twenty-six-cent investment at the initial infrastructure size could at some point theoretically grow to become millions and millions of dollars a month, like a lot of hyperscale, successful, publicly traded companies' are. But the fact that I have access to this type of rapid prototyping and iteration on the same type of infrastructure, with the same technology basis, that I would responsibly and correctly run a very large, very professional bank on top of, that's powerful. There's no crap-to-good-infrastructure transition here. And I want to be clear, I don't view DigitalOcean through that light either. At some point, if I were doing this on top of DigitalOcean, which is not a terrible idea, I would have to go ahead at some point and realize I've outgrown much of the platform and it's time for me to migrate somewhere else that's more capable. But there's no shame in that, and trying to say that I'm never going to build something on DigitalOcean as a direct result of that itself inherently becomes an early optimization that's not helpful for anyone.
One thousand percent. And again, from an economic standpoint, I mean, let's face it: what I pay for coffee is way, way more on a monthly basis than what I pay on DigitalOcean. And if we were to ever, you know, have a gazillion users on the application that we're using, then great. But I would never reach for any of GCP or AWS or Azure for, like, I'm just going to mess with this, because I can already predict I'm going to spend way more time dealing with the stuff that they need me to use in order to get to the thing that is the entire goal of whatever I'm doing than I would actually doing it. I think that it's difficult for me, even at the scale that I'm at, where I have customers who in aggregate spend billions of dollars a year on AWS: I don't know of any of my clients who spend more on AWS than they do on their own payroll. The people who work on technology are always more expensive than the technology itself, unless you're in a very strange, very interestingly structured company. Like, if you're a very small startup where you've basically just started to take a salary, or it's three people and you're a data science company, okay, great, you have that kind of data storage, petabytes of it, and you're paying yourself below market rate. Okay, you're an exception. You're also generally not of the scale where I'm consulting with you folks. But by the time you actually build this out and become a quote-unquote big company, that changes, that inflects massively. Yeah, I would be concerned if DigitalOcean's transition to a public company would be influencing this. It seems probable; it's almost certainly true that their investors out in the world of the market don't have some way of looking at one cloud provider versus another, I'm guessing. They probably just lump them all together, and so maybe that's why their marketing, they're nudging in directions that at least initially smell like AWS and Azure and GCP. But I hope that, I would like to stay as a customer, so let's keep doing what you have been doing that got you to be successful enough to become public.
Exactly. Whenever I talk to people who are trying to optimize a bill, well, I'm spending five hundred bucks a month on AWS and I'd like to get it down to three hundred or so, it's, honestly, unless you're about to just start hitting the scale button on this thing and trying to optimize so that it doesn't become absolutely massive, which is almost never the case, great, ignore it. You are not going to cost-cut your way to finding product-market fit, to scaling, to building out something that is in any way significant. Build the thing and make sure it works. You can always optimize it later, because if it doesn't, then all the time spent optimizing something was wasted time and effort. Get to your next feature. Launch that more effectively, and it will be more transformative to your business than saving almost any amount of money on your cloud bill in most cases. And there's nothing inherently wrong with that. Some sort of pathological drive to have to be set up for success, set up for scaling early, I think that's dumb.
This episode of Software Engineering Daily is brought to you by Datadog, a full-stack monitoring platform that integrates with over three hundred fifty technologies like Gremlin, PagerDuty, AWS Lambda, Spinnaker, and more. With rich visualizations and algorithmic alerts, Datadog can help you monitor the effects of chaos experiments. It can also identify weaknesses and improve the reliability of your systems. Visit software engineering daily dot com slash data dog to start a free fourteen day trial and receive one of Datadog's famously cozy T-shirts. That's software engineering daily dot com slash data dog. Thank you to Datadog for being a long-running sponsor of Software Engineering Daily. Showwcase is a social network built and optimized for developers. Developers can connect, share their knowledge, and showcase their projects with like-minded individuals. As the world is increasingly filled with more and more developers, it's about time we had a network built around developer workflows, tools, and features. On Showwcase, share blogs, videos, projects, and short messages, connect with developers that you've worked with, and have customizable profiles. Join thousands of developers on Showwcase today. If you're a content creator for developers, Showwcase helps you make money by putting your content behind a paywall. Whether you write daily JavaScript development articles, share advice for programmers' careers, or make video tutorials about certain frameworks, creators can grow their own communities, connect with an audience, and earn money sharing their knowledge with paid subscribers. To activate your paywall free for six months, go to showwcase dot com slash s e daily. That's Showwcase with two w's, s h o w w case dot com slash s e daily. Thanks to Showwcase for being a sponsor of Software Engineering Daily.
It is, it absolutely is, but we tend to worship at that altar. One thing that I've noticed historically is that engineers are incredibly good at, you know, engineering, but in many cases they don't have the overall business context. And speaking as one of those engineers myself, I love solving puzzles. It's fun. Leave me to my own devices, and I will cheerfully look at my six hundred dollar developer environment in AWS and spend a month of my time on the clock trying to find a way to knock two hundred bucks off of that. Great, that's fun, but that's more or less like doing sudoku at work, which is fine. It's a great distraction from time to time.
But you want to time box that, because at some point that's not going to appear on your evaluation, necessarily. I have enough problems with being addicted to buying too many books that I won't read. I don't need another distraction. Exactly. So I don't want to come across as being entirely positive towards DigitalOcean; I think we've been pretty even-keeled here. What, as a customer, have you seen from them that you don't like? What do you wish that they did differently, as someone who cuts them a check, on time and in full, every month? You know, it's a good question. I'm not entirely sure. The downside of, it just works and I don't have to worry about it, is, at least for them, I don't find anything lacking. If I'm trying to do a thing and I find that I can't do it with DigitalOcean, that time hasn't shown up yet. There's nothing that comes to mind about what they could, you know, what they could do better. There's very little that comes to mind. There's nothing that I would even waste any time talking about. What am I going to say? The thirty seconds, at times, it takes for a droplet to spin up is too long? I'm not gonna complain about thirty seconds. But I want it now. Yeah, that would be for the folks that wanna complain as a pastime. That's just not me. So yeah, I do want to be clear: a lot of what I'm saying is my own personal opinion on these things. There are people who could disagree and do so quite reasonably. For example, there is full Terraform support. You absolutely can do infrastructure as code on top of DigitalOcean. I just don't know people who are actually doing it, is all. You could in theory build out a very large-scale site on top of DigitalOcean. I'm sure some people have. They did not make hundreds of millions of dollars in revenue last year from just a whole bunch of developers each running five dollar droplets.
Let's be realistic with ourselves. There's there are clearly things there. I just don't see it in my ecosystem and that's okay. I speak to generally a fairly specific customer profile and running very large environments on digital ocean is not that. Yeah although if i were to say just as a as like a member of this community i if they wanted to compete with the scale the aws the jc paisley and if they can do that without messing with or or decreasing the attention that they give to the services that i'm using more power to them find great even even a temporary addition to the competition of the big three sure. I'm sure that that would probably be ba. Fine i'm just not. I'm just not interested in it but yeah almost certainly. There's there's at least some large customers don't know who they are some large customers. They're certainly the largest whoever that is at digital ocean. I just said. I i don't know i i wouldn't reach for digital ocean if i had really high confidence and the intention of spending the money on like scaling roller coaster style. But i don't think i just don't want them to change and i think that we're direction correct on this. I just clicked on their sales page contact sales. There's a bunch of mandatory questions. one of them is. What is your estimated spend going to date and the top tier is above twenty five hundred dollars. I'm a consultancy that doesn't really run a whole lot of no production workloads in critical path. For anything we have some with a few terabytes of data we have floating around that. We do some analysis on but our total spend when all is said and done for internal tooling is about that on. Aws it and that's the top tier. It tells me that pricing discrimination like that as far as what would bracket you put yourself in by when when you start talking about actual dollars and cents that is positioning and you start bucking yourself into how people should be thinking of you at enterprise scale for. 
It spend if someone pulls that up and sees that as the highest bucket default response from a psychological point of us. Oh never mind. They're not serious. But i've worked with digital and they are serious if if i needed for some reason to have a product for the correct answer was two thousand droplets provided that they have the capacity and through put to do it which i have no reason to doubt. There are perfectly viable option. And yeah that's gonna cost way more than their top end. Call call for details support tier. But i wonder if that's not doing them any favors. Yeah yeah how many cupcakes do you want. And how many pallets. How many trucks of cupcakes do you want. Those are two different things. Yeah yep fuller berry. I like to wind up having some baseline pricing that makes sense. But there's always always always got to be the call for details tear because when you're a big enterprise and you're going to buy a lot of things you know that there is no standard pricing. Click the agreement for scale that you're ever going to use. There's going to be custom deals. Negotiating with the cloud providers. None of these big customers are paying retail price full. Stop if they are. They're missing a very key. Point discounts tidal volume. Who knew exactly korea. I can't imagine that digital ocean would be a target for for duck. Bill just seems like it's entirely invited will be just a completely different business credit. Where do in years past. They have sponsored by nonsense and they did for a while because it turns out that well why why would digital ocean sponsor an aws newsletter. Yeah because you of was customers are everyone. And you know digital ocean's customers are those exact same people it's not either or it in fact. That is the one time. I've got an actual pushback for my newsletter over a sponsor that included someone at aws who is no longer there ruin. Their sales group reached out. 
Incensed that i had a non partner sponsoring this and if i was not going to pick one of their partners to sponsor who was aligned with their go to market strategy than this person was going to stop recommending. That people. Subscribe to the newsletter. I i happen to remind me. Every time i don't work for you first off. Secondly oh if you would like to sponsor or have some of your partner spots great. Here's my media kit. Let me know again. I as long as they're not going to harm people who are reading my newsletter if they implement whatever it is it's advertising i tend to be okay with it. I've not being prescriptive or exclusionary in that respect. I have thrilled to have reasonable products. That do good things advertising to the market. That's fine that's how this works. But i'm not going to sit here and and worry about that but also just blew my mind because it was very early days. It's your amazon and you're worried about competition from digital ocean. How insecure are you folks. Waste more than that under banana budget than their entire market cap. How how does that work. Yeah well it sounds like that Sponsorship way back then touched a nerve and did and honestly. I'm quite okay. And do it twice again if i could again. Most of my sponsors ball digestion pattern they sponsor for a while and they don't they come back for a while and they don't. There are very few publications that have one recurring sponsor that has been there for eight years. That people want to rotate try different things. Keep it fresh again. They're welcome to come back if not that's fine too. It's i have no ill-will toward digital ocean feel like someone there's going to be sitting there seething listening to this saying he hates us and he's being us how about we go to his house and kidnap his dog. I i don't need it to come across that way. And secondly she's a chihuahua please. She's awful take her off my hands. Corey look here's something that i know about you. 
If you don't think anything is worth your time, you don't say anything about it. So the worst thing that you could do for a company is not mention them at all. I'm picking five companies to talk about this week. They were one of them, and they were very clearly right near the top of the list when I was putting together my initial list of who to talk about, because they're great. I love what they're doing. Yeah, it is swinging a heavy bat. I'll go on record and say that if I were to start some bananas idea of, like, starting a cloud provider company, the OKRs or whatever sort of goal would be: how long does it take for Corey Quinn to acknowledge us? So even an absolute shit-talking comment, that would be a win. Absolutely. Again, what I've always said about sponsors is that you can buy my attention; you cannot buy my opinion. I don't care how much money you throw at it, it's not going to be enough to offset my credibility. And I would use DigitalOcean for a number of projects. In fact, having spent the past hour talking to you about this, I very likely will again soon. They're top of mind for something I'm thinking about. Again, customers come, customers go, and it's fine. Hopefully this has been helpful for folks and they have a bit more insight into what DigitalOcean is and how we think about it. People already know where to find me, but if people wanna yell at you for your terrible opinions today, and they're very angry about it, where can they find you? You can find me on Twitter, where lots of yelling happens. I'm allspaw, A L L S P A W, on Twitter. Excellent. Thank you so much for taking the time to speak with me about DigitalOcean. Always fun having a conversation. Yeah, thanks for riffing with me. John Allspaw, founder and principal at Adaptive Capacity Labs, and this ends today's tour of the cloud. If you've enjoyed this podcast, please follow me, QuinnyPig, that's Q U I double N Y pig, on Twitter, and head on over to last week in aws dot com and subscribe to hear more from me on my podcasts, the AWS Morning Brief and Screaming in the Cloud, and of course my newsletter, Last Week in AWS. Thanks for listening. Make good choices.

Data Lineage For Your Pipelines - Episode 82

Data Engineering Podcast

49:01 min | 2 years ago

"The. Hello. And welcome to the data engineering podcast, the show about modern data management. When you ready to build your next pipeline or want to test out the project to hear about on the show, you'll need somewhere to play it. So check out our friends at Leonard with two hundred gigabit private networking, scalable shared block storage, and forty gigabit public network. You get everything you need to run a fast, reliable, and bulletproof data platform, if you need global distribution, they've got that covered two with worldwide data centers, including new ones in Toronto and Mumbai and for your machine learning workloads, they just announced dedicated CPU instances. Good data engineering, podcast dot com slash lynyrd. That's L I N OD today to get a twenty dollar credit and watch a new server and under a minute and understanding how your customers are using your product as critical for businesses of any size to make it easier for startups to focus on delivering useful. Features segment offers a flexible and reliable data infrastructure for your customer analytics and custom events, you only need to maintain one integration to instrument your. Code and get a future proof way to send data to over two hundred fifty services with the flip of a switch, not only does it free up your engineers time. It lets your business users decide what day do they want wear? Good date engineering, podcast dot com slash segment. I o today to sign up for their start up plan and get twenty five thousand dollars and segment credits and one million dollars in free software for marketing analytics companies like AWS Google intercom on top of that. You'll get access to analyze academy for the educational resources you need to become an expert data analytics for measuring product market fit, and you listen to the show to learn and stay up to date with what's happening in databases streaming platforms. 
Big data and everything else, you need to know about modern data management for even more opportunities to meet listen and learn from your peers, you don't want to miss out on this year's conference season. We have partnered with organizations, such as Riley media data. Versity into the open data science conference, go to date engineering, podcast dot com slash conferences to learn more and take advantage of our partner. Counts when you register and good date engineering, podcast dot com to subscribe to the show, sign up for the mailing list, read, the show notes, and get in touch, and please help other people find the show by relieving review on I tunes and telling your friends and coworkers your host is Tobias. Macy and today Madrid doing. Joe Dolan her about pachyderm the platform the Letsie, deployed, manage stage, language agnostic data pipelines, maintain incomplete reproducibility and provenance. So Joe could you start introducing yourself? Yes. Sure. My name's Joe donor is great to be here talking to you today. Tobias. I am the founder and CEO of Packer, and I started life as a software engineer the first company ever worked at was rethink DB. And that's basically the only other company, I worked at besides little while at Airbnb between. And so, do you remember how you first got involved in the area of data management. Yeah. Absolutely. I mean, I have always been interested in data infrastructure tools. So rethink DB was an open source database. And I knew coming out of college. I wanted to work on these types of data management, data analysis data manipulation tool. So I joined that company, right out of college and got to cut my teeth doing like open source software development and data infrastructure and things like that. Absolutely fell in love with it. And then after I left rethink DB I got really interested in big data rethink Deby's, more of a transactional database, you use it as like the back end of your website. 
And so I wanted to learn what the world of, like, data science, data analysis, and everything looked like, and so I started sort of hacking on Pachyderm in my spare time. It was actually sort of started because I wanted to use the Hadoop platform to analyze some chess games. I'm a big chess fan, and the system was just really, really kludgy, and it was all based on Java, which I didn't like that much. So I sort of started hacking on what an alternative to this might be. And along the way, I spent some time working at Airbnb, and so I got a chance to see what their Hadoop infrastructure looked like and what the challenges were there. And so, doing this concurrently with hacking on my own stuff, it sort of eventually turned into the platform that became Pachyderm, and then we managed to get funding as a company, and the company sort of took off from there. And so I actually had Dan Whitenack on to talk about Pachyderm way back in episode one, about two years ago. But I'm wondering if you can talk a bit about what has happened in those two years, both in terms of the platform itself and the company, and just the overall environment of big data and data analytics that you're fitting your platform into. Yeah, absolutely. The core mission of the company hasn't really changed much. When I was working at Airbnb, I saw a lot of gaps in the data infrastructure that existed in that day and age, and the biggest one I saw was sort of an absence of the ability to track any sort of provenance or lineage of the data. The way that this really came up for us at Airbnb was, we had this massive pipeline of data analysis tasks that had been written by a bunch of different data scientists, and it was really, really challenging to keep all of these green at the same time, because everybody's modifying them, and they're all sort of working in dependency, and someone makes a change that's incompatible with ones downstream, and the whole thing just cascades red all the way down.
And so we would have important tasks, like our fraud models, that would just sort of start coming out blank when something went wrong. And when that happened, I'd be going in to debug it and trying to figure out, all right, where along the way did this break? And I didn't have any way to ask the system, give me the full lineage of this data, because it looks wrong, or something like that. And so that hasn't really changed. What has changed is the rest of the platform maturing around us. So when you first talked to Dan, we were probably about six months into using Kubernetes, and that was because Kubernetes 1.0 had been released probably five months before. And so we were sort of trying to figure out what we could do on this platform, what sort of stuff it could provide. And now that's a lot more clear, and there's been a lot of features that have been figured out in Kubernetes that we've been able to just pass along to our users. We've also figured out a lot about how to integrate with various machine learning packages that exist. So Kubeflow didn't exist at the time when you talked to Dan, or when Kubernetes first came out, but it does now, and it gives you a very, very good way to deploy a machine learning pipeline on Kubernetes, which by extension gives you a good way to deploy these machine learning tasks on Pachyderm. And it's actually very complementary with Pachyderm, because Pachyderm basically takes the data right up to the point where it gets into the machine learning models. So, you know, any type of sophisticated machine learning model is going to have a lot of steps in it that are cleaning the data, getting it into the right format, joining it with the right data you need to train stuff. And then the actual training process happens inside of Kubeflow.
And then that comes back out into Pachyderm, and then we start doing the inference steps and the checking of how good this machine learning model is, and stuff like that. And that all happens within Pachyderm as well.

And I know that there are a number of other features of Kubernetes itself that have occurred in those past two years, including things like StatefulSets. So I'm wondering what are some of the other primitives of the platform that have come along that have simplified or obviated certain parts of the Pachyderm code itself?

StatefulSets are definitely one of them, because we're not a stateless service. Pachyderm, in fact, is all about storing state, because it's for storing large amounts of data. The majority of the data is actually stored in object storage, and so that worked before StatefulSets, but we also rely on etcd as the sort of metadata and consensus system for Pachyderm. And so having a StatefulSet set up to manage etcd is really nice, and manages things really well. Another thing that's been really, really big for our customers in particular is GPU support. So, you know, you can now in Kubernetes, and this has been true for a while, submit a resource request to say that this pod needs this much memory, needs this much CPU, and you can also have it ask for GPUs. And what that'll do is tell the scheduler that this needs to be scheduled on a machine that has a GPU, and it needs to be given a GPU and have that available to it during processing. And so this is really nice when you want to run these high-powered machine learning tasks that train a lot faster on a GPU. Let's see, other things in Kubernetes that we've used: we've been relying a lot on the ingress features for some of the cloud stuff that we're building now. We're in the process of rolling out our cloud offering for Pachyderm, and Kubernetes can do a lot of sophisticated ingress things with load balancing.
And the fact that you can build authentication right into those has been really, really useful for us.

And so, as you mentioned, Kubeflow has come along, and that's, as you said, complementary to the capabilities of Pachyderm. But I'm wondering if you can just briefly talk about what the main pieces of Pachyderm itself are. I know that there's the Pachyderm File System for supporting versioning, and there's the Pachyderm Pipeline System. And I'm wondering if you can talk more about any additional complementary aspects of the overall big data ecosystem, things like Airflow or Kafka or various other big data pieces that fit together nicely with Pachyderm, or that Pachyderm sort of supplants, in terms of the overall workflow of somebody who's building an analytics pipeline on the Pachyderm platform.

Yeah, absolutely. So at a very high level, there are the two pieces that you just touched on, the Pachyderm File System and the Pachyderm Pipeline System. Basically, all of Pachyderm, everything that we have, we think of as being in one of those two camps. The file system, like you mentioned, is responsible for providing version control for your big data. So for those of you who haven't heard the first episode, where Dan talked about it, its semantics are very similar to Git: you've got commits, you've got repos, you've got branches. But it can store massive amounts of data, and it's storing it in cloud storage, so it's storing it in S3 or something like that. The Pachyderm File System is also the thing that's responsible for enforcing the provenance constraints. So it's got basically this constraint solver built into it, where you say, you know, here's a repo that contains images, and here's a repo that contains tags on those images, and then here's a branch that is associated with those two branches, meaning that it contains computations that have been done using those images and those tags on those images. And the Pachyderm pipeline system
uses this API to then implement a machine learning pipeline that takes these tags and these images and trains a classifier based on those. But in theory, something else can use that, and other people do their own things on top of this and basically use the provenance system without using our containerized execution system. You can insert SQL queries in there; you can insert all sorts of things in there. The pipeline system is what's responsible for the scheduling of these tasks. And so that, you know, uses Kubernetes to say, I want this branch to be materialized that contains machine learning models trained on the images that come in here and the tags that come in here. And the pipeline system knows, okay, when a new commit comes in, I need to spin up these pods, I need them to have GPUs, I need them to have these containers in there so they have TensorFlow. Then it runs all the code, and it slurps up all the data and makes sure the data gets into the pod, and then the data gets out of the pod, and ultimately you get your results. And because of the provenance system, your results will always be linked to the inputs that created them. So there's no way to short-circuit this. It's not like a system where, when you check in your results, you also check in a manifest of where it came from. It's basically, you know, hard-enforced by the system. To give you an idea of some of the new things and how this plays into other data systems: we recently released this feature called spouts, and these are sort of like a pipeline in that they schedule a pod on Kubernetes. What's different about them is that rather than pipelines, which normally take inputs, process that data, and produce outputs, these just stay up all the time, and they produce outputs. So it's like a spout of data coming into your system.
And so this is really, really useful for subscribing to a Kafka topic, for example. And this allows you to have a very convenient shim between Pachyderm and any other system that you can subscribe to, because it's a container: you can put whatever code, whatever libraries you want in there. So, you know, you can very easily have something that subscribes to a feed on Twitter and gets new tweets coming in, and those will just show up in your Pachyderm file system. And then downstream of that, you can have all sorts of sophisticated pipelines that are processing those tweets, that are training models on those tweets, stuff like that.

And I think that that's definitely one great differentiating factor from the Hadoop platform that you're working to sort of replace, where it's entirely batch oriented and there are sort of streaming capabilities that have been bolted onto it. But having it built into Pachyderm as a first-class feature, I think, is definitely useful, given that there is, particularly in the past couple of years, a lot more of a push to doing real-time streaming analytics.

Yeah, absolutely. And this was one of the earliest features that we conceived of, because we have very sophisticated capabilities. But it's not like when you're making a Pachyderm pipeline, you choose, okay, this is going to be a batch pipeline and this is going to be a streaming pipeline. There's really no difference between the two, and the reason that we can do that is the underlying version control system. So because we can always say, all right, this data has this hash, it's part of this commit, it hasn't changed since the last commit we processed, when we got a successful result, here's the result, again identified by hash. So we know that it corresponds to the same code, the same data, and everything, and we just get to reuse that result. And so really the reason that it's a streaming system
is that we've got this pretty sophisticated computation deduplication system in the background, so that whenever it goes to compute something, it tries to figure out if it's already computed it, and if it has, it just uses that result. And so this is often a bit of a magic moment for people when they first start using Pachyderm: they put in a bunch of data, they churn through it, it takes a little while because it's an expensive computation, and then they add a little bit more data and it happens super quickly. And we actually get people coming into our user channel asking, why did this happen so quickly? I think something's broken; did it process it? It's like, no, the system just figured out that it didn't need to reprocess all of that data, and so you got a result really quickly because there actually wasn't much to do.

And when I was reading through the documentation, I was definitely impressed by the deduplication and data hashing capabilities that you have in the file system, and how that supports the incrementality of computation, so that, as you said, you don't have to do a complete rebuild of an entire batch job; you can just work on the data that's new since the last time you ran something.

That was one of the things that I was most excited about having before I even really started working on Pachyderm, because I spent so much time waiting for things to recompute. And I think probably anybody who's tried to do a decent-sized data project has experienced this, where you write out all of your code, you run it on all of your data, and then you find that there's, like, one or two files that have some slightly different format that crash the whole thing. And so then you fix your code and try to get it to run, and you can't get it to run on just the stuff that it failed on. And so you have to sit there and wait for two hours to see if it works on these two files. And then if it doesn't, you have to do that again.
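
The result reuse described in this exchange can be pictured with a small sketch. It is not Pachyderm's implementation: `ResultCache`, `run_pipeline`, and the `code_version` string are hypothetical names made up for illustration. The only idea taken from the conversation is that a result is identified by a hash of the data together with the code that processed it, so unchanged inputs never need to be recomputed.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Identify a piece of data by the SHA-256 hash of its bytes."""
    return hashlib.sha256(data).hexdigest()


class ResultCache:
    """Hypothetical cache keyed by (code version, data hash)."""

    def __init__(self):
        self._results = {}
        self.computed = 0  # how many times real work was done

    def process(self, code_version, datum, fn):
        key = (code_version, content_hash(datum))
        if key not in self._results:
            # Never seen this code + data combination: do the work once.
            self._results[key] = fn(datum)
            self.computed += 1
        return self._results[key]


def run_pipeline(cache, code_version, files, fn):
    """Process every file, reusing any result that is already cached."""
    return {
        name: cache.process(code_version, data, fn)
        for name, data in files.items()
    }


cache = ResultCache()
files = {"a.txt": b"hello", "b.txt": b"world"}
run_pipeline(cache, "v1", files, len)   # both files are processed
files["c.txt"] = b"new data"
run_pipeline(cache, "v1", files, len)   # only c.txt is processed
print(cache.computed)
```

In this toy version, adding one file to an already processed set costs one computation, while changing `code_version` would invalidate every cached result, which mirrors the "same code, same data" condition mentioned above.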
And this was even worse at Airbnb, because we had so many things depending on each other, and we had so much data there, that basically the granularity that we had was running stuff once a day, because the pipelines would run every single night. And so if things were broken, then we'd basically write some new code, we'd commit it, and then we'd come in the next morning and hope that it worked. And if it didn't, then we'd do the same thing the next night.

Yeah, that's definitely a quick way to build a lot of frustration and burnout on a data team.

Yeah. You know, that's really the biggest reason that I wanted to do this company and this open source project: I felt like data teams in general were in a state where they could be a lot more productive if the tools looked a lot better. It reminded me a lot, and still does to a certain extent, of what making websites looked like before the LAMP stack, when people had all these CGI scripts and there were all these things that you could sort of cobble together, but there wasn't just this, like, well-known good platform that you could get out of the box and build a website in, like, a weekend in your garage or something like that. And then once that platform existed and people started to congeal around it and the tooling started to explode, you got this explosion of websites, and people were able to make all of this cool stuff. And I feel like that still hasn't quite happened yet for data science and data engineering, but we're getting a lot closer to it.

And that's definitely something that I'd like to talk through in the context of Pachyderm: how the collaboration between data scientists and data engineers, and the breakdown of responsibilities and workflow, happens within data teams,
both when it's just one data scientist doing everything and when you're working at a medium to large organization where you actually have that separation of roles, and just the overall process of going from conception to delivery of a data project.

Yeah, absolutely. So one of the first, most important things to say about this, because it's often a misconception that people have that throws them off a lot at the beginning, is that Pachyderm is not trying to be a replacement for Git or GitHub or any of these other code version control and collaboration tools. We're version controlling different things. And so when people are successfully collaborating on Pachyderm, normally what this looks like is you have your code in GitHub or GitLab, somewhere stored in version control, and you have a repo. I like to have it all in one repo, but you can have it across multiple repos. You have a repo that has your analysis code that can be compiled into Docker containers, and that also has your pipeline manifests that explain how to deploy this onto a Pachyderm cluster. And then from there, you set up a CI pipeline that basically redeploys these pipelines when commits come in, so that you can basically merge into master, and you can have a CI/CD process on top of this. And then from that, where you start to leverage the Pachyderm features is the fact that when you want to have a branch that people are working on, that's a sort of experimental thing, you can have your CI/CD process deploy that into separate branches and separate pipelines in Pachyderm that can still share all the underlying data. So you don't need to make a copy of the data; it's all still version controlled and deduped. But you can have these two pipelines running concurrently, and you can see, okay, this one's running like this.
You know, this one's succeeding whereas this one's failing, so we want to move on with the one that's succeeding; or this one is performing this much better, based on these metrics pipelines that we've put on at the end. And you can basically have a collaborative process around this, because the tools enable it. It's a very open-ended tool, similar to Git: people have a million different branching strategies on Git, people use monorepos, people use, like, micro repos for their projects, and Pachyderm isn't particularly more prescriptive than Git in that regard. So we see people using this in a bunch of different ways. But the core underlying concept is that you can collaborate because the system is tracking your versions for you. And so you sort of always know which way is up, because you can always just ask the system: what's the history of this data? What's the lineage? Lineage meaning, take me back to how this data was produced, versus history, which is, take me back to what it looked like yesterday, a year ago, et cetera. And you can do things like bisect. You know, you can say, this looks bad now, it looked good a week ago; where in between did it change?

And one of the challenges inherent in Pachyderm is just understanding some of the principles, some of the primitives, of things like Docker and Kubernetes. And so I'm wondering what you've found to be some of the common challenges or points of confusion or stumbling blocks for people who are coming into this project and trying to get up and running with it, because even just trying to define a Dockerfile can oftentimes be a nightmare in and of itself.

So that's definitely one of them: just understanding, you know, this idea that Docker is like a machine that you're sort of setting up every single time, but it's not really a VM, and sometimes the details of your machine poke up into it, because it's the same Linux kernel and everything like that.
That's definitely one of the challenges. I think that's one that people normally get past, at least these days. That used to be a lot more of a challenge maybe three years ago, but I think that both the learning materials about Docker and Dockerfiles, and just the communal knowledge of that, have really started to take hold. So, you know, most people at this point, if you're working at a decent-sized company, even if you don't know Docker, somebody there does who will be happy to sit you down and explain, here's how you make a Dockerfile, here's how you build it. I don't think the same can really be said for Kubernetes yet. And in some ways I think that makes sense, because Kubernetes is newer, and it's also a more specific tool and a more complicated tool. So definitely the biggest stumbling block for people getting started with Pachyderm is just getting Kubernetes set up. And we help people with that all the time, as much as we can, but we're actually not super Kubernetes experts. We understand how to use it, and we understand how to deploy it in our system and stuff like that. But for people who want to deploy it on prem, people who want to deploy it in sort of weird settings and stuff like that, we don't always know what to tell them about how to get Kubernetes to work. I think that those are the biggest two. The other one that is kind of interesting is getting the underlying storage set up. So to run Pachyderm, you need access to an object store, and you need some sort of persistent volume for etcd to run on. And on AWS or Azure, this is all pretty well known, and we have a deploy assistant that will basically just spit out a manifest that you can give to it that will set up all of these things for you on Kubernetes. But the variety of object stores that people want to run against seems to be growing, in our experience.
And so there's all of these sort of slightly off-the-beaten-path ones, like Ceph and SwiftStack, ECS, and things like that. And each one of those is a little bit of a new adventure to get the system set up on. And then it's also a bit of a new adventure for us, because while they all ostensibly support the same S3 API, there are little subtle differences in how they support that S3 API that occasionally trip our system up. And so we've been doing a decent amount of work recently on just trying to cover all of these different subtle differences between them and get it to work on all of these object stores.

And another thing that can often be challenging when working with cloud-oriented workflows is trying to figure out what the local dev story looks like. So I'm curious what the general approach is, or at least what your general approach is, for trying to do local experimentation and iteration on some code, or maybe trying to pull in some subset of the data in the Pachyderm file system, for getting things ready to go before you ship it off to production.

Yeah, absolutely. I'll say, sort of up front, that this is one of the parts of Pachyderm that I'm least satisfied with as it is right now. There is, I think, a lot of work to be done on it, and I think that there's a decent amount of work in just Docker land in general to make this really good. The sort of anti-pattern that you get into, that really sucks, is that your development loop looks like: write some code, build a Docker container, push that Docker container to Docker Hub, redeploy a pipeline that points to that container, which then pulls down the container and runs it. And then you see that that's possibly taken, you know, ten minutes or so. And then you get some results back on what you need to change, and it's like, oh, you know, this Python code doesn't run, you're referencing a variable that doesn't exist. You try it again.
And you can't just run it on your local machine, because you don't have the data accessible to you. What I do when I'm developing pipelines on Pachyderm, that works pretty well, is I do everything entirely on the same Docker host. So I have minikube running, and that's just running on Docker on my local machine. And then when I build my image, it just builds on my local Docker host, and then when I run it, the image is right there. So I don't need to push it anywhere, I don't need to pull it anywhere, because it's right there. And that leads to a pretty quick development loop. The other thing that you can do that can be pretty effective is that Pachyderm supports a FUSE mount for your file system. So you can just do pachctl mount, and a directory will show up that has all of the data in it that's available within your distributed file system, within PFS. And it's kind of cool, because you ls the directory and it's like, oh, here's a file that's terabytes in size. And of course, this is only working because it's not actually on your file system. And then you can run your code against this FUSE mount, and you can run with actual data and see how things are going to work. The challenge with this is that, one, it doesn't create the Kubernetes environment around it. So if you want to have, like, a secret available to you in Kubernetes, such that you can access some outside service, then you need to sort of mock that up. And sometimes the time spent mocking it is not really cancelling out the time that you're saving by not just pushing this into the Kubernetes cluster. The other thing is that Pachyderm gives you this pretty nice way to describe how data gets split up, which is just using glob patterns, which are the things that, if you're familiar with ls on a command line, when you do, like, ls star, that star is a glob character for glob patterns. And this is how you define that.
Like, you can process all of these things in parallel, and it parallelizes it. But when you just mount data in, it's not respecting that in any way. So we have some work to do in terms of the local development story for Pachyderm, for sure. Right now it's good enough that people can get things done. And where things really get nice is when you have some code that you want to be running in production and you want to be able to rely on this just running every single night. And then Pachyderm is great at just running every single night, keeping it going, and letting you know when there's an error.

And going back to the idea of data provenance and data lineage, you've mentioned that some of the way that it's tracked is through these versioning capabilities of the file system. But I'm wondering if you can just dig deeper into the underlying way that it's represented, as far as tracking it from source to delivery, and how that actually is exposed when you're trying to trace back from the end result all the way to where the data came from and what's happened to it along the way.

Yeah, absolutely. So the level that we track provenance at is the commit level. And the first problem that you have to solve if you want to track provenance is that you want to store a reference to some data that, you know, isn't going to change. Right? Because if I tell you this machine learning model was created using all of the images in this image directory, and then I go and add ten new images to that image directory, well, then that doesn't tell you anything anymore. Right? Because you don't know what was actually used to create the model; you just know where that data happened to be stored at the time when it was used. So commits allow us to have this immutable snapshot of what data looked like at a certain point in time. From there, we link these commits together.
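
One way to picture immutable commits that are linked to the commits they were computed from is the sketch below. It is an illustration only, not Pachyderm's data model or API: the `Commit` class, its field names, and the `lineage` helper are all hypothetical, and the repo names simply reuse the images and tags example from earlier in the conversation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Commit:
    """An immutable snapshot; provenance holds the commits it was computed from."""
    repo: str
    commit_id: str
    provenance: Tuple["Commit", ...] = ()


def lineage(commit: Commit) -> List[str]:
    """Walk upstream through provenance links and list every ancestor once."""
    seen = set()
    stack = list(commit.provenance)
    while stack:
        current = stack.pop()
        key = f"{current.repo}@{current.commit_id}"
        if key not in seen:
            seen.add(key)
            stack.extend(current.provenance)
    return sorted(seen)


# Hypothetical example: a model trained on images and tags, then evaluated.
images = Commit("images", "a1")
tags = Commit("tags", "b1")
model = Commit("model", "c1", provenance=(images, tags))
metrics = Commit("metrics", "d1", provenance=(model,))

print(lineage(metrics))
```

Because each commit is frozen, adding new images later would produce a new commit rather than changing `images@a1`, so the lineage of `metrics` keeps pointing at exactly the data that was used.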
So if you've got pipelines in Pachyderm, then the input to those pipelines is data commits, and the output from those pipelines is also data commits, and the relationship between these commits is the provenance relationship. So any commit in Pachyderm basically has this metadata attached to it that is just all of the commits that it has provenance on. And you can inspect these commits using the command line, using the API, using the web interface, and it'll just show you a list of these commits. And then, of course, you can track those commits up and look at what's in those commits. And so the actual structure of this is a pretty standard directed acyclic graph structure from computer science. Now, something that's sort of a cool aspect of the provenance system is that we actually track provenance at another level, which is the branch level. And this doesn't quite mean the same thing as commit provenance. Commit provenance is this sort of immutable snapshot that tells you, here's where this data came from. The provenance on branches basically describes how your data is flowing at the time. So if a branch has provenance on another branch, then that means that every time you get a commit to the upstream branch, you also get a commit to the downstream branch, and that downstream commit is the result of processing the upstream commit. Which means, of course, these commits are going to be linked via provenance as well.

What are some of the other advanced capabilities of Pachyderm that you think are worth calling out, that are often overlooked or underutilized?

I wouldn't say it's necessarily underutilized, but it's definitely not something that people immediately associate with it, but it gets a lot of use, which is our sort of cron functionality. And that's the ability to have a pipeline that isn't triggered by putting data in the top and getting data out the bottom, but rather it's triggered just on a cadence.
And so people use this a lot of times as a way to do something every hour, do something every night. They use it to scrape things, they use it to push things, and stuff like that. I think that's one of those features that's not actually super sexy; it's just super useful. Let's see. I think the fact that you can sort of expose a lot of the various underlying Kubernetes things is something that hasn't been fully explored. People are sort of finding new things to do with that every single day. So in Pachyderm, you can attach to pipelines sort of random modifications to your pods. And so this can be useful for assigning affinities; it can be useful for declaring resources that you need. But there's always these new things being added to Kubernetes that are really, really useful, and those sort of just naturally propagate up into Pachyderm.

And for the file system and for interacting with other source systems, does Pachyderm support things like the S3 Select API, or being able to run pushdowns on the different data sources, for trying to optimize for speed and latency and reducing the amount of data that actually needs to be transferred over the wire?

So it does; we can sort of select individual pieces of it, if that's what you're talking about. I actually don't know what S3 Select does specifically.

My understanding is that S3 recently added an API where, for certain file types, you can actually run a select query, so that rather than just pulling down a blob, it can actually index into the data itself and understand what's contained within it, so that you don't have to return the entire object.

We'll probably have some trouble leveraging this, because Pachyderm is designed to work on a bunch of different object stores. And so we're pretty reluctant to implement anything
that's only going to work on S3. One thing that this did remind me of, though, is this cool new feature that we just added, and it hasn't gotten anywhere near enough love because it was only very recently released: we now support an S3 API on top of PFS. And so if you have applications that are used to writing data into S3 as their data lake, then you can just swap in Pachyderm, and it speaks the S3 API, and you can put things in there, and those will turn into files in PFS that are committed and stuff like that. And underneath the hood, this is all still going into S3, so it's going to have much the same storage characteristics that you are used to in terms of cost and everything like that. But you're going to get this version control and the ability to run pipelines on top of it in addition.

Definitely really cool, being able to just transparently put Pachyderm in there, so that the end user doesn't even have to be aware of it, but at the same time they're getting some of that added benefit of the provenance and deduplication that Pachyderm supports.

This is also how we support data tools that are used to reading stuff out of S3. So, for example, this is how we support Spark: Spark can be told, read this data out of S3, perform this Spark operation on it, and then write it back into this other place in S3. And now, because we speak the API, that can just be Pachyderm under the hood, and, you know, you now have provenance on your Spark operations.

And so, in terms of the provenance, I know that because of the versioning of the containers that are executing as part of the pipeline, that is an added piece of information that goes into it, as far as: this is the data that was there when we started, this is the code that actually executed, and then this was the output. But for external systems, do you have any means of tracking the actual operations that were performed, to enrich the metadata associated with the provenance?

Yes.
So those can basically use the same system we use, which is that we track the information about all of the code that ran and, you know, the Docker container and everything like that. But we actually just do that by piggybacking on PFS's provenance system, because we just add that as a commit. So every job has what we call a spec commit that specifies how the job is supposed to be run, and that includes the code and the Docker container and everything like that. And so outside systems are basically just expected to, you know, however you can serialize this information, just put it in a commit. And then that's just, in essence, considered as an input into the pipeline. It's really not, in terms of the provenance tracking in the storage system, any different than any other input. It's just that this one happens to define the code that's running in the computation.

And so earlier you were saying how, because Pachyderm is so flexible, the ways that people are using it are sort of up to everyone's imagination. And so I'm curious what you have seen as far as the most interesting or innovative or unexpected ways that people have been leveraging the Pachyderm platform.

Man, let's see. I mean, so there's things that are interesting because the end results are interesting. So I think that, you know, a lot of the image processing and machine learning things that I've been seeing trained on it are the most interesting to me. They're not really, you know, sort of cute little hacks in the system or interesting abuses of the system. Some of the really interesting things that people do, in terms of things that I never thought anyone would do with the system, are sort of calling out to other Pachyderm APIs from within the pipeline. So you can have a pipeline that, as part of its operation, creates another pipeline or does something like that.
And this is something that we don't officially recommend that people do, because we haven't really thought about it and think there might be some weird things, but we've seen people do some really cool things with it. So we don't police this in any way or anything like that. That's sort of the great thing about open source software: we're not going to stop you from doing stuff. It's yours; you can do whatever you want with it. But it's not something that we had officially thought about as a use for it. So you'll let people aim the gun at whichever foot they want. Right, exactly. That's a very important principle to us: we don't want to give you foot guns in disguise, and we don't want to trick people into using foot guns. But if you can't shoot your foot off with the system, you also can't do anything clever with it. This is true of all of the Unix ecosystem and things like that. You can shoot your foot off with it, but you can also, in those abuses, find really cool, useful things to do. So we feel like we have to be open to people doing that with Pachyderm, because a lot of our best features and our best understanding originally came from people abusing the system. And so a few months ago you announced that you had raised a Series A round of funding. I know that with most venture capital that usually comes with some strings attached, where they're hoping for some measure of hypergrowth. So I'm curious how you're approaching that stage of growing and scaling the Pachyderm platform and business. Yeah, absolutely. And this always, I think, has an extra wrinkle to it.
When you're talking about an open source company, because there have been some, I think, notable cases where an open source company has raised money and stopped really subscribing to the open source roots that got them where they were, and it's gone pretty badly for the community. We feel like we are very aligned with our investors in terms of what Pachyderm needs to do long term to be successful. So we're not as much focused on, okay, we need to have this amount of revenue by this quarter or this year, or we need to have this number of users. We're much more focused on what it takes to build a long-term sustainable open source project, and a company around it that is also long-term sustainable. So we are much less focused on any particular revenue goal in the short term, and much more focused on making Pachyderm into the platform that we've always believed it could be, and making it a ubiquitous tool for the underlying data infrastructure, particularly on top of containers. We feel as if containers are going to be the underlying cloud infrastructure for everything, and so the data infrastructure that goes on top of them is really going to be the de facto infrastructure for everybody. Of course, investors invest because they ultimately want to see a return, and so we do need to make money off of Pachyderm. That comes from support contracts and our enterprise product, and we're currently rolling out our cloud offering, which we think is going to become, over time, the vast majority of our revenue. So far we don't feel as if any of these things are
at odds with each other or misaligned. It's just a little bit tricky to get all the puzzle pieces to fit together, to make sure that we're staying true to the open source community and everybody who has used and contributed to this product up until this point, and also keeping the company around it going, because the reality is the open source project probably couldn't survive without the company contributing to it. And so in terms of your overall experience of building, maintaining, and scaling the Pachyderm project and business, what have you found to be some of the most challenging, useful, or unexpected lessons that you've learned? Definitely the most useful lesson I think I've learned is just to really listen to your users, see how they're using the product, and try to go from there. I came into this with a whole bunch of ideas of what I thought a cool data infrastructure system would look like and what I thought was going to be important to people. I wouldn't say that I was wrong about everything, but I was surprised how much I didn't know. It wasn't even so much that the things I knew were wrong, just that there were these massive things that I hadn't even thought about. Provenance is kind of a great example, actually. We initially implemented provenance as an internal thing, where we said, okay, we need to do this to track and keep things consistent, and be able to see this stuff. Then it started to get more and more important for people, and people wanted it more and more. And then there started to be things like GDPR that actually legislated provenance into the system, or at least legislated that companies had to be able to give people an explanation for machine learning decisions, things like that. All of these things would have been easily missed if we hadn't really been listening and going back every single day and asking: okay,
how are people using this, how are people failing to use this, things like that. The other thing I think I've learned, and been rewarded for, is taking risks on new open source projects. Docker was pretty new when we started using it, and Kubernetes was brand new when we first started using it. There were a decent amount of internal discussions about whether we wanted to use a platform this new. Even at the beginning, there were a lot of discussions like, why are you guys building this system, rather than just building a Dockerized thing on top of Hadoop, or a provenance tracking thing for Hadoop? It took a lot of conviction to just say, no, we're going to build something new. We're going to take a stab at doing this our way and see what happens. Ultimately, I feel like we've been very rewarded for that, but it took a lot to be confident in doing that. And what are some of the limitations or edge cases of Pachyderm, and when is it the wrong choice? So it's definitely the wrong choice when what you're doing is a very well-established data pattern that there are very good tools for. I think the example of this is SQL. We have a lot of people ask something like: imagine I want to do Redshift-style data warehouse queries against Pachyderm, what's the best way to do that? Right now, the best answer to that is to just use Redshift, because it's really good at it, or any of the various other options. There's BigQuery, there's Hive, Presto, and things like that. You can start to integrate those things into Pachyderm; people will build Pachyderm pipelines that basically just orchestrate Redshift pipelines, or BigQuery pipelines, or things like that.
But SQL is not something where we're able to beat the things that just do SQL, because there's a lot there, and it's not the most interesting challenge to us right now. Pachyderm really tends to do well in the everything-else data case. That's when people are thinking: I've got these genetics files, and while there are pretty good toolkits for analyzing these on a single machine, there isn't really a distributed genetics pipeline tool or anything like that. For those, because Pachyderm is a super generic system and you can just package those tools up into Docker containers and run them, it's very, very nice. It gives you some structure for these tools that otherwise you'd just be firing off with ad hoc scripts on random EC2 boxes, things like that. And looking forward, what do you have planned for the future of Pachyderm? So the biggest change in terms of what the company offers is the rollout of our cloud offering, which is called PachHub. If you think of everything in open source Pachyderm as Git, in that it enables collaboration on data science, then PachHub is kind of like GitHub for data science. It's basically an online site where you can go and have your account, and it contains your data repositories and your pipelines that process those repositories. You can fork other people's pipelines, pull in other people's repositories, and things like that. It's a way for people to actually collaborate on live, running big data pipelines. That's the thing that we're most excited about. There's also, of course, tons and tons of work to be done on the core open source project.
So there are a lot of upgrades to the storage layer that are going to make it a lot more sophisticated and a lot faster, which I'm very, very excited about, and there are a lot of new pipeline features that are coming out. Spouts was one of the first of those. We're also implementing more sophisticated join support, so that you can join data sets together, and the ability to have more sophisticated pipelines that do loops, conditional branching, and things like that. And so for anybody who wants to follow along with you and the work that you're doing at Pachyderm, I'll have you add your preferred contact information to the show notes. As a final question, I'd just like to get your perspective on what you see as being the biggest gap in the tooling or technology that's available for data management today. Yeah. I mean, the biggest gap that I see is the one that I'm trying to fill, because that's why I wanted to do this company and what I felt the opportunity was. I would basically describe that as the absence of a really good set of tools that are prescriptive in how you're supposed to do these things. There are ways to do all of the things that Pachyderm allows you to do: you can write some form of version control on top of object storage, you can use Git repos in theory, and stuff like that. But there's nothing that really ties it all together, gets out of your way, and lets you focus on the actual data science that you're really good at. Again, I'll go back to the analogy of the LAMP stack. It used to be that to build a website you needed somebody who was an expert on actually implementing databases, because none of them worked well for you, and you needed somebody who understood how to run all these servers and all this stuff. Then, once you get this stack that people can congeal around, it's just a very well-known,
well-trodden path. The documentation starts to get really good because there are so many people using it, and then we can stop thinking about that stuff and do all the interesting stuff that it allows, like build Facebooks and eBays and things like that. We still feel that hasn't really happened with data science, and the way to get that to happen is to focus on the infrastructure layer that's needed to tie everything together, and do it in a generic enough way that people can use all of their different tools on top of it. I think the LAMP stack worked really well because you could do all sorts of things that you wanted to do with it. The P part, the PHP part, became very generic: people started swapping Python in there, people started swapping Perl in there, everything like that. We have that same level of flexibility with our Docker-container-centric workloads, but we provide the same underlying storage and orchestration primitives that we think are basically what people need to get stuff done. I appreciate you taking the time today to join me and discuss the work that you're doing on Pachyderm and how it has grown and evolved in the past couple of years. I definitely think that it's a great project. It's one that I've been keeping track of for a long time now, and I hope to be able to use it for my own purposes soon. So thank you for all that, and I hope you enjoy the rest of your day. Thank you, Tobias. It was great to be here.
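
The provenance model described earlier in this conversation, where a job's code and Docker image are recorded as a "spec commit" and treated like any other input, can be illustrated with a small sketch. This is not Pachyderm's API: the `Commit` class, the `run_job` and `lineage` functions, the repo names, and the image name are all hypothetical, chosen only to show how an output can point back at both the data and the code that produced it.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    """A toy commit: some bytes in a named repo, plus the commits it came from."""
    repo: str
    payload: bytes
    provenance: tuple = ()  # upstream commits (hypothetical field name)

    @property
    def id(self) -> str:
        # The id covers the payload and every upstream id, so changing the
        # data or the spec changes the id of everything downstream.
        h = hashlib.sha256()
        h.update(self.repo.encode())
        h.update(self.payload)
        for parent in self.provenance:
            h.update(parent.id.encode())
        return h.hexdigest()[:12]


def run_job(spec: Commit, data: Commit, transform) -> Commit:
    """Run `transform` on the data; record the spec as just another input."""
    return Commit("output", transform(data.payload), provenance=(spec, data))


def lineage(commit: Commit) -> list:
    """Walk back through provenance and return every upstream commit."""
    seen, stack = [], [commit]
    while stack:
        for parent in stack.pop().provenance:
            if parent not in seen:
                seen.append(parent)
                stack.append(parent)
    return seen


# The spec is serialized however is convenient (JSON here) and committed.
spec = Commit("spec", json.dumps({"image": "example/wordcount:1.0"}).encode())
data = Commit("raw", b"to be or not to be")
out = run_job(spec, data, lambda b: str(len(b.split())).encode())

print(out.payload, [c.repo for c in lineage(out)])
```

An external system would follow the same pattern: serialize whatever describes the operation it ran, commit it, and list that commit among the inputs of the result.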

Distributed Open Source Databases with Jonathan Ellis and Spencer Kimball

Software Engineering Daily

1:00:11 hr | Last month

"But most accounts the first databases came online in the nineteen sixties. This class of software has continued to evolve side. The technology runs on and the applications. It supports in the early days. Databases were typically close source commercial products today many databases run in the cloud on distributed systems increasingly. The leading tools are open source yet frequently supported by related commercial entity offering managed services and white gloves support in this episode. We interview jonathan ellis. Cto of data stacks and spencer kimball ceo of cockroach labs about the current state of distributed databases and the open source ecosystem for nearly two decades. The guard square team has been building and improving on pro guard. The top open source android app optimize her from decks guard a leading static and dynamic android app security solution to app. Sweep its newest android app testing tool great things. Come from open. Source routes guard squares mobile app. Security solutions are built for developers by developers with open source at the core. The latest release apps sweep is no exception to that rule with apps. Sweep you can scan your android app for free and get recommendations and how to fix potential security issues in your code and dependencies need support or want to network with like minded peers. The guard square community includes hundreds of talented developers users and guard square engineers eager to share their knowledge on open source and mobile app security to learn more go to guard square dot com slash s. e. daily. We'll spencer jonathan both of you. Welcome to software engineering daily. Thanks guys it's great to be back. Thank you kyle. Please join so. I hope listeners will go back and check out your previous appearances and little bit more about your background. And some of the things you've worked on but For those who have inter haven't done that yet. 
Maybe we can do a quick deep dive on your companies and various projects, perhaps starting with you, Spencer. Tell us a little bit about Cockroach Labs. Cockroach Labs is a relational, SQL-based database system, and it's really aiming at that product category where, let's say, Oracle is probably the exemplar, but really reimagining what a relational database looks like when we have a global public cloud. So that, I think, is really embracing two concepts of scale. One is extremely large data sets where you really want a relational database, and the other is geographic scale: how do you really make the database work well for customers no matter where they are around the world? We're almost seven years old now, based in New York. We're about three hundred employees and growing quickly. Very cool. And Jonathan, how about you give us the high level on DataStax? Yeah, I started DataStax eleven years ago to commercialize Apache Cassandra, which is a NoSQL database that is focused on availability at scale. We've had Cassandra clusters go through Hurricane Sandy and never lose either data or availability, because Cassandra is designed to run across regions and accept reads and writes anywhere in the world. For the past year or so, I've been working on a complementary project, which is Apache Pulsar, and how we tie that in to Cassandra to be not just the system of record but the messaging between services at scale. When we move databases to the cloud we get a lot of benefits: certain redundancy, distributed things like that. I guess, what are some of the drawbacks? So why is maybe a single, simple-tenant, one-database instance, even if it's impractical, desirable? Are there any challenges to moving to the cloud approach for databases? I think the more you care about performance, the more you need to understand how your system works, and most people are coming from that single-system background. So in some ways,
Cassandra or Cockroach isn't actually more complicated than an Oracle or a SQL Server, but it's different. It's not what people are used to. So when you have ten years of experience with that single-machine model, there's definitely a tendency to say, oh, this new thing is worse, it's more complicated. There is some of that when you move to a distributed system, but some of it is just that it's different. I definitely second that. You hit the nail on the head, Jonathan. It's about inertia. Some of the customers that we have for Cockroach, especially the really large ones that have been around for decades, have a very evolved and complex deployment reality, and that evolution often incorporates quite a bit of regulation depending on the vertical, financial services being a really good example, and a very evolved IT infosec posture. So if you think about all of the evolution that has occurred over decades of using databases, to up and move it into the cloud, and even go a step further and move it into a fully managed service where another vendor holds your operational data, is a lot to wrap your head around. I think that's where the most significant hurdle is. In the long run, when you squint at the horizon, you see that for every company, from the smallest, where it's most obviously useful to move to a fully managed service in the cloud, to the largest, where there are the most practical impediments, it's going to be a better TCO to really embrace cloud, because it lowers your total cost of ownership. You get faster time to value and can ultimately do more and do it less expensively. That's the picture that we have to paint, and the reality is that inertia is natural. It's natural for everyone. Wrapping your head around a new database is not something you do unless you really feel like there's a big benefit. Absolutely. With these cloud offerings available,
you have options, like you mentioned, to go fully managed: rather than standing this up myself, I can hire someone who is best in class and knows how to do it for me, and then I am a user and pay a service fee or something along those lines. But often there is still this story of some technology groups that want to maintain the presence or monitoring, or maybe they think they need to fine-tune the system. What do you see in practice? Are people migrating more towards fully managed services, or are there industries that really want to keep more control of things? I think that there's a mix, and I think that there's a growing understanding in the industry as well of how to best leverage that mix. What I mean by that is that with cloud, and cloud infrastructure in general, you're paying a premium, and you get the best return for that premium when you have very elastic workloads, when you need to expand and contract, and when you're on the contracting side you're only paying for what you use. So if you have a workload that is very consistent and very high volume, then that's probably going to cost you more money in the cloud than doing it yourself, even when you factor in all of the overheads of running your own infrastructure. One of the great things about distributed infrastructure, and distributed stateful infrastructure like Cassandra and Cockroach, is that you can take advantage of that infrastructure duality: you can deploy on premises where that makes sense, you can deploy in the cloud where that makes sense, and you can replicate between the two in a way that you couldn't with technology from fifteen years ago. Those are good points. We definitely see the whole spectrum. There are huge companies that are ready to move to fully managed already, and there are of course small companies that are insistent that they still run it themselves. I think what we're going to see is that those distributions across the different segments of companies.
They're all going to move continually towards being more eager and more accepting of fully managed services, but there will always remain a distribution. Just as an example, take a company like Facebook: the data architecture that they've built, I mean, they spent engineering millennia on it, and it is not something that could ever be put into Cockroach or put into Cassandra, right? They've built a custom, purpose-built database that is sort of a meta database made of hundreds of thousands, maybe, who knows, now even millions of instances of MySQL. That is an incredible system that they built, and it's not something that they would ever be able to have a vendor fully manage for them. On the other hand, there are things that looked big five years ago that are going to be easily within the footprint of a fully managed service, and I just feel like those distributions are going to move so that there's more and more of the fully managed in the mix over time. The other thing I'd bring up is that the dynamics of these industry-wide changes have tipping points. It happens in fits and starts: you see little green shoots, and all of a sudden everyone's doing it. That's a dynamic that we're already seeing play out, and I think it'll become very obvious. Just like, for example, it's very obvious in 2021 that no matter what vertical you're in or how big your company is, why would you run or build your own private cloud anymore? You've got to have an incredibly specific purpose. In fact, everyone realizes, hey, the public cloud really has economics that are favorable. That's going to permeate the entire ecosystem, all the way down to the database. I have a slightly related take on that, which is that just as there are some of the factors that you mentioned making it more attractive to use and adopt cloud in general, as well as fully managed cloud services.
Some of those factors are bleeding into private cloud and reducing costs on that side as well. I'm thinking specifically here of Kubernetes, where even a couple of years ago you could make a case that it was too early to go all in on Kubernetes: there were some growing pains around StatefulSets, and there were still other companies saying, hey, our technology is better than Kubernetes. But I think in late 2021 it's fair to say Kubernetes has won, and as someone who has issues with some of Kubernetes' design decisions, that still makes me happy, because having a single standard is just so much more convenient for everyone in the industry. What that means is that as we're creating Kubernetes-based solutions to run Cassandra in the cloud as part of our managed service, we can also bring that technology to customers who prefer to run their own private clouds, and they can leverage that same Kubernetes technology to reduce their costs and improve their efficiency. We see that dynamic also playing out. What we do is use Kubernetes internally, so that the tools and the runbooks that we develop, the operator for example, are becoming very sophisticated, and that's the same one that we ultimately want to share with all of our customers that still do want to self-host, and there are still many. I really do believe that's going to continue. I think the dynamic that's really going to push companies that have the expertise to make their way in a private or hybrid environment toward cloud is, in addition to what I think are going to be favorable economics in terms of total cost of ownership, this increasing ecosystem advantage in the public cloud. It's not just that Cockroach, for example, running as a fully managed service on AWS is easy to consume and has better economics. It's all the other things you need to build an application: Cassandra, right, it's your
it's your Elasticsearch, your Confluent. To the extent that you do that in a private cloud, there's an increasing set of things you have in Kubernetes, which makes it easier, and people will follow that road. But there's pressure because of the additional ecosystem advantage: when you're playing in an ecosystem like AWS or Azure, you get a growing list of very competent vendors, each with their own economies of scale, that ultimately, if you look at the horizon, can run that better than you can, even with a very sophisticated Kubernetes operator, and all those integration points kind of get built by the vendors in terms of their partnerships. So I think there's a gravitational pull towards managed services in the public cloud, which would be difficult to resist even for the companies that have, today and in the short-to-medium term, an advantage that they feel they can maintain by going it alone and really making it a core competency to run a private cloud. But like I said, it's just distributions moving. You're going to see the whole spectrum, and I think for companies like DataStax and Cockroach, one of our big advantages in 2021 is that we do allow hybrid, we do allow private. That's a strategic advantage that we have over the big cloud players, who say you're going to only use the public cloud, and in fact we're going to make it so that you only use our public cloud. That's not what a lot of companies are looking for. There is a bit of an existential threat in that regard, that maybe the big bad cloud provider could come in and offer a redundant service to what you're doing. It's especially a threat, I guess, when there are open source components. We've seen some drama recently between Elasticsearch and Amazon and their fork, and all that sort of thing. How do you guys perceive this potential existential threat to your own companies? I'll let Spencer go first, because I think he's got more interesting things to say about this one.
I don't know about that. I'm sure we've both been noodling on this problem for years now, like everyone in our shoes. Both of us are working on what were open core systems, and we're all trying to evolve and meet the challenges of this new reality. Amazon has done more than any other vendor to move the state of the art, but also to put competitive pressure on what I thought was a really good open core model. But hey, you either evolve or you die, right? I actually think that Amazon's moves make sense in terms of how they run their business and why they're as successful as they are. Of course, we have to respond to that. As I mentioned, I think that for all of the advantages that a hyperscale cloud vendor like AWS has, there are openings. There are advantages that players like Cockroach and DataStax have, and one of the big ones is that we're going to offer a lot more flexibility. If you think about the market, there's an incredible contingent of companies that are looking for what they're going to use for operational data stores to build their next generation of products and services, and those companies haven't really ever been in the segment of companies that are driving Amazon's success with, for example, Aurora and DynamoDB. These are companies that are still using Oracle and all of the last generation of technology, but they're coming en masse to cloud environments, and they're buying huge numbers of credits, for example with AWS, in blocks of hundreds of millions of dollars spent over some number of years. Those companies are a lot more sensitive to the flexibility that any solution they're going to embrace in 2021 is going to offer them over the next ten years. So Amazon is building on the model that has worked very well for the growth in the mid-level SMB or commercial segments of the market.
But what everyone's going to be contending for is the incredible inflow of dollars into cloud spend and cloud platform spend that's coming from the world's biggest companies, which dwarf the other segments in terms of the potential. All of that is going to grow the size of this market. So there's a lot at stake and there's a lot at play, and there's plenty of room for companies like ours to compete against these bigger players, because we're very focused, and we have the opportunity through that focus of really providing different consumption models that appeal to what ultimately is going to be an outsized portion of this very fast-growing market. Besides the consumption models, which are definitely an interesting aspect of the competitive market dynamics, there's going to be some effect here on open source itself. The classic open source model, I think it's fair to say, was kicked off by Richard Stallman and the Free Software Foundation in the eighties with the GPL, and then of course became even more popular in the nineties with Linux. You started to see other licenses like the MPL, the MIT license, the BSD license, which of course is super old, and the Apache license, but all of these had the assumption that you were either going to run this software yourself or pay a vendor to help you with it. If you're running it yourself, then you're motivated to contribute, if not patches, and I think it's fairly rare, even historically, for people using Linux to contribute patches to the Linux kernel itself, then bug reports. That's a form of contribution as well: you're creating value for the project and for everyone else using it by sending a bug report, or even just asking or answering questions on forums, on Stack Overflow, and so forth. You're actually adding value to that project.
And so that's what made open source so powerful as an engine for growth and for innovation: that dynamic where either you were paying the vendor who created it, and so you were literally paying for it that way, or you were contributing in other ways, and that was your form of paying for that project. I saw a statistic a few years back, famously, that Amazon had made more money from MySQL than Oracle, and that's probably true for most infrastructure projects out there. Not only are they making money from providing the infrastructure for the open source to run on, but they're standing up a fork of Elasticsearch to compete with Elastic, they have a Kafka service that competes with Confluent, and they have a Keyspaces service that competes with DataStax and Cassandra. Critically, Amazon hasn't shown a whole lot of interest in contributing to these projects themselves. It's one thing if you're coming in and saying, hey, we're running a Cassandra service, and by the way, here are the improvements we've made to Cassandra to make it run better on EC2 and so forth, but that hasn't been their method of engagement so far. So what you're seeing is a new generation of infrastructure companies treating this as basically a bug in the licenses that they're distributing their code under. I think the first one would have been the Affero GPL, and then people looked at that and said, well, this probably doesn't actually fix the problem as well as we need it to. Then you got the second generation, with the Business Source License, the Redis community license, the Confluent Community License, and so forth. Yeah, that's actually the route that we've embraced, which is the BSL, and happily MariaDB created it. It seemed like a really good model when we looked into all of the different alternatives. Essentially, I'll just give a quick explanation of the Business Source
License involves a couple of unique features. You start with the underlying license; in our case, it's the Apache license that our core was previously licensed under. And then when you introduce the BSL, what you do is create a term, so how long the BSL will control, and a list of exclusions on that underlying license. So you want to think of it as sort of like a patent protection: you protect your innovation for some time. In our case, we set the term at three years, and the exclusion we have is that you're not allowed, with that license, to create a commercially available database as a service out of Cockroach. So it's kind of like you can't just put Cockroach into RDS if you're Amazon, and of course it's not just Amazon, it's anyone. And then after three years, that version that's three years old reverts to Apache. So it always leaves a trail of open source in its wake, but it provides three years of protection for innovation. And I think that's one of the really crucial takeaways for me from this evolution of open source: there's still plenty of room for it, but you just need to make some common-sense alterations so that there's still room for innovation and the profit that can feed the innovation. And the reality is, you look at Redis, look at Elastic, look at Confluent, all the examples that Jonathan mentioned, and all of them are building very wonderful businesses competing directly with Amazon, for example, and others as well. And I mean, Amazon also has a great business selling Redis, and even more money than Oracle. I didn't know that; that's a pretty interesting stat. But at the same time, none of the companies competing with them are doing that poorly.
And I think there are that many more interesting developments to come because, to the point made earlier, there are a lot of additional companies now entering this new ecosystem, and that's going to drive more innovation. Companies that really focus on their core product, as opposed to offering everything under the sun like Amazon, can really create innovation. That's going to be interesting at the high end of the market, where a lot of the new opportunities are. I think it's going to be super interesting to see what happens over the next couple of years, to see if there's some kind of standardization around the BSL or around one of these other licensing approaches with this next generation of infrastructure. Do you guys have any advice for how to start and nurture an open source project? I guess it depends on what stage you're starting out at. If you're starting from zero, then step one is you need to eat, breathe, and dream about the community that you're building. So it's not just about the code. Engineers tend to have this kind of assumption that if you write great code, then it will be its own marketing. But that's not the way it works, in either open source or proprietary software. And so the first thing you do after you have that minimum viable product is you need to be available. If somebody comes onto your forum, or onto Stack Overflow, or onto Discord, or wherever you are directing people to ask questions, they need to get answers within a single-digit number of minutes or they're going to go away. And so it's just super, super critical to bootstrap that as religiously as you possibly can. I really would second that. That's a great answer. These open source projects, when you start them, are fragile. It's exactly what Jonathan is saying: there's a short window in which you can make somebody a champion or lose them forever.
I would say that there's also sort of a short window for building a product alone in a bubble: "You've got to keep this thing secret, it's going to be the greatest thing, and I need to get a really impressive MVP." I think that can be a mistake, because you want to get information from your fledgling community as soon as possible. So really dial back what that MVP is, and blog about it. That's one of the practical steps you can take to really start to put your tentacles out there, your feelers, and find out who might be interested in what you're building. Blog about it. Try to get that on Hacker News, which, by the way, is priceless. If you can get on the front page of Hacker News, that will give you your initial cohort of those early adopters and innovators who are interested in what you're building. And then, like everything in life, there's just this really simple way to succeed, which is to focus on the small things and put love and attention into them. Never miss an opportunity, if someone reaches out to you, to take the time to answer their question, no matter how foolish you might think the question is or how off-topic. Find out a little bit about what they're doing. Take the time to help them. You build these communities one person at a time, one question at a time. You start to focus on those little things with love and attention, and the whole thing will blossom. Cox Automotive is the technology company behind Kelley Blue Book, Autotrader.com, and many other car sales and information platforms. Cox Automotive transforms the way that the world buys, sells, and owns cars. They have the data and the user base to understand the future of car purchasing and ownership. Cox Automotive is looking for software engineers, data engineers, scrum masters, and a variety of other positions to help push the technology forward. If you want to innovate in the world of car buying, selling, and ownership, check out coxautotech.com.
That's C-O-X-A-U-T-O-T-E-C-H dot com, to find out more about career opportunities and what it's like working at Cox Automotive. Cox Automotive isn't a car company. They're a technology company that's transforming the automotive industry. Thanks to Cox Automotive. And if you want to support the show and check out the job opportunities at Cox Automotive, go to coxautotech.com. If you're a software engineer, you may know the old proverb: give a man a program, frustrate him for a day; teach a man to program, frustrate him for a lifetime. ClickUp is the no-code project management software and Jira alternative that brings all your engineering work into one secure, speedy, UX-friendly place. From startups to enterprises, engineering teams use ClickUp to collaborate on code, docs, goals, sprints, bug tracking, roadmaps, and more. You can also connect your favorite development tools, like GitHub, Bitbucket, and GitLab, to manage your code and team in one place. Used by over 200,000 teams at companies like Webflow, Google, and Uber, ClickUp has become the go-to platform for engineers who want to save time and get more done. Don't get stuck with Jira; sprint through agile with ClickUp and save one day every week. Try ClickUp for free today at clickup.com/sedaily. And I'm curious how being open source, or at least having certain open source components of your projects, impacts your development cycle. Do you feel under more pressure because all the code is there, naked, right on the repository? I think it probably has less of an impact on the development cycle than on the mentality, for lack of a better word, of doing everything in the open. So this can definitely be an adjustment for people who haven't worked on open source before: to come in, and your job is to contribute to Cassandra now, and you post your first pull request, and then some senior engineer, who's probably at a different company,
points out the things that you need to improve. And so having that be very public, where literally anyone on the internet can read where somebody reviewed your code and sent it back for improvements, that's a really hard adjustment for some people to make. For people fresh out of college, where that's the only experience they've had and that's normal for them, not so much. So again, there's definitely a difference based on what your expectations are. I've definitely found, in my experience with all the open source projects I've worked on, that there's value in terms of quality in having your work available. There are some aphorisms, right? "Many eyeballs make bugs transparent" in open source; one of the early open source luminaries said something along those lines. And also, "sunlight is the best disinfectant." I think these things are true. To have your work available out there does, I think, help the average open source contributor to have some motivation, an impetus to really care about what they're putting out, because their name is fundamentally attached to it, right? Anyone can blame that line and find out exactly what its provenance was, and all the comments in the code reviews and things like that. So I think it's kind of like this constant pressure that makes people try a little bit harder, and it adds up. Everyone who has built a complex project of any sort, whether it's behind closed doors or open source, knows the little bits of technical debt create a cumulative cost, so you want to minimize that as far as possible. I think open source is a really strong positive influence in that direction. The other thing I really like about open source is how it strips away the distractions from the technology and the decisions around it. So I guess there's a good side and a dark side to this.
The dark side is that sometimes you get this situation wherein, in an open source project, there will be some engineer who has way more time on his hands than is good, and who really wants to go to the mat to argue something to the death. I said "his"; it could be a her, but it usually is a his. He wants to just stall the progress on some innovation until everyone agrees to their way. But the good side of stripping away the usual interpersonal things and so forth is that I would review the most senior engineer's patch the same way as the most junior's, right? You have that kind of democratization of the code, where it's not what it says on your resume or where you went to school. It's: how good is your code? How good is this patch that you sent to the project? And it's refreshing to have that level playing field. That really echoes my sentiment exactly. It's that idea that it doesn't matter who the patch is coming from; put that equal effort into it. Really, especially give your love and your attention to that more junior engineer, because you're going to up-level them, and then they're going to become a stronger contributor over time, and ultimately take on some of that task themselves and really scale the project. In every open source project I've been involved in, and this is true for closed source too, unquestionably, there are certain connectors who become sort of the hubs in the graph of all the different people who are contributors to the code. And those are folks you never want to lose, because they really put in this incredible effort that I can never find myself capable of matching, responding to virtually every pull request that comes in. They're constantly out there trying to improve everything and to up-level everyone. And when you get someone like that, that's someone to cultivate and someone to really embrace, and to give as much responsibility as
they're willing to take. This is one of the cool things about open source: it really is larger than any single company. I can think of engineers who have contributed to Cassandra as employees of three different companies. And that's really cool, that the institutional knowledge about the project can be retained across that lifetime of work. That's a very neat example. Yeah, there was a historical perception that open source was often lagging behind, that it was the knockoff of the quote-unquote real product made by the commercial closed source team, and they were just kind of, I don't know, having a lagging process of copying features or something like that. I think that's probably a straw man argument, but maybe it was true at some point. What do you see as the current state? Why has open source really been a leading, flagship path that a lot of systems are taking, rather than being closed source? I can offer perspective just on databases. When we started building CockroachDB seven and three-quarter years ago, my reaction post-Google, in 2012 when I left and looked at all the different things that were available for operational databases, was just that, boy, of all the things out there, I'm certainly going to use open source, because databases can be pretty complex, and if you want to build something really ambitious, not being able to rely on that larger community, not being able to go in there and debug your own problems, is almost a nonstarter if you really have ambitions. And that was really the position of people at Google, just because no matter what open source project Google might bring in and use and really scale, there are going to be issues. I mean, Google is busy trying to fix things in the Linux kernel, as a pretty interesting example. But for them, with anything that they were forced to use that was closed source, it's just a much more difficult process to really get it to exactly what they wanted.
And that's always been my approach. And so open source, for me, was the only viable path to actually building a database that could succeed in the late 2010s and now going into the 2020s. I think that if you queried developers, for example, who are in that early adopter and innovator part of the "crossing the chasm" bell curve (those are the folks who are often on Hacker News), and asked them whether their comfort level is higher with open source than closed, you'd get a pretty definitive answer. And that's ultimately why it's open source that's leading the charge now. It's the faith that developers worldwide put into open source software. It's just a better model, and I think people recognize that. I think there are a couple of market dynamics that are contributing to this as well. If I'm starting a company, and this is specific to the infrastructure space, right, where I'm saying, "I'm going to build a better message bus and you should trust your data to my new product," I'm going to need a round of hundreds of millions of dollars to get that product to the point of maturity it takes to get awareness in the marketplace and to get people to have that level of trust in that proprietary product. Contrariwise, if I start that as open source, the way, since we're talking about message buses, Apache Kafka was by the founders of Confluent, or Apache Pulsar was by the people at Yahoo, then I can use that as a growth hack. It's an adoption and a marketing mechanism, where companies are much more comfortable
deploying an open source project that's a 1.0 project, or maybe even a 0.9 project, because they recognize that, worst-case scenario, "if I have to, my engineers can figure this out, and how to fix bugs in this, or how to migrate me off of it if it comes to that." And so there's that security and assurance that lets companies adopt your project or your product faster than they would with a proprietary approach. And then the other thing is that one of the things I was very serious about when I started DataStax was building a remote-first engineering culture, because there are some really, really smart engineers at Google and in the Bay Area in general, right? But if we compare all the smart engineers in all of the Bay Area versus the smart engineers in the rest of the world combined, the rest of the world wins hands down. And so there's a bit of that dynamic in open source as well. Do you want to bet on the engineers building the Solaris kernel? They're super smart guys, not to take anything away from them. But do you want to bet on them, or do you want to bet on everyone else in the world contributing to Linux? Those are great points. You knocked something loose in my memory, and I wish I could remember the gentleman. I was talking to someone who was advising me early on when building Cockroach Labs, and he made this point that there's an information asymmetry available to companies that are building open source products or open core products, in that you have that community that starts so early, and that early feedback from outside contributors. And one thing that you always see when you have an open source project is that some huge company, or some champion at a huge company, says, "We want to use this now," and it's like, "Well, it's alpha, it doesn't quite work yet." But that person can immediately give you all kinds of insight into what's going to make your product wonderful: "It could do this, this, and the other thing."
You don't get that when you're building in a bubble in a closed source environment, or it's much more difficult. You're out there trying to talk to people and educate them. You don't get a blast out on Hacker News to ten thousand people, of which some small percentage, but one that actually adds up to a meaningful absolute number, gives you information that's really targeted about how you can build the best product to actually meet the market. So there's an information asymmetry advantage available in the open source model that I think allows you to build better software faster. That's a really good point. Engineers working on open source are like two clicks away from getting direct feedback from people using that software, versus my experience, at least in the proprietary software world, is that it's usually filtered through several layers of support and product management. So I'm curious: when commercialization of open source was first introduced to me, it was the company Red Hat. They built a very popular Linux distribution. I guess if you're a developer evangelist at the company, you'd say it's the best Linux distribution. And they gave it away, and then they were going to sell services on top of that: you come to us when you need the people who built it to help you run it, or things like that. And the unit economics of that, I guess, worked out pretty well for them. Could you compare and contrast that model with your own companies'? I think at a super, super high level, you know, DataStax wants to be the Red Hat of Cassandra, right? And I think that's definitely the example that most people point to when they're talking to venture capitalists and so forth: Red Hat is the poster child for creating a successful business on top of open source. I think that the biggest thing I would point to that's different in 2021 versus...
Gosh, I don't remember when Red Hat was founded, but it was in the nineties sometime. The biggest difference now in the infrastructure space is the prevalence of the cloud. And that includes both the Kubernetes layer that allows you to deploy across private clouds in a standard way, but it also includes the managed services that you can build as one of the innovators around an open source project. I think that is the main way, more than the classic services and support model, to fund open source in the future: these managed services. And that's why, coming back to the earlier point, it's so critical to get the licensing model right, in a way that allows the companies doing the innovation to continue to fund that innovation. I'd love to try to add to that. Maybe I'm just going to echo your point exactly, Jonathan. You've touched on all the right things. Basically, I think the reason open source had such an ascendancy in the aughts and 2010s, maybe even the late nineties, is because it offered a faster time to value. Adoption of any software is ultimately going to come down to its utility. If you think about the closed source model previously, right, developers would see something at a conference or in some trade mag, and they'd be interested in it, and then they'd get some rep on the phone, and the rep would come visit you and explain it to you, and then you'd have to go through procurement. You'd finally get the thing, and maybe you'd need some machines to run it on, which you'd have to requisition. We're talking months; it could literally easily take six months, depending on how complex the software was, of course. Then you'd have to learn how to run it and everything, and use it. And by the way, you got printed manuals sent to you in the mail, right? With open source, of course, with the rise of the early cloud and so forth, being able to surf the web,
find these communities, things that have morphed into Stack Overflow over the years, but back then it was Usenet groups and things, all of that provided a much faster time to value, right? All of a sudden, a developer organization trying to make some product or service a reality is able to say, "I need this, this, and this other thing. They're all open source. I go read about them. I have questions, I put them in and get responses, or the responses are already there when I search on Google." So that was such an obvious win. You could shave months off your time to get something valuable that you could either show to other folks in order to get funding, or actually just build and deploy, which is another very common pattern. What we're seeing now is a move to even faster time to value, and that's why open source is again evolving. It's certainly not going away, for the reasons of community, transparency, speed of evolution, information asymmetry, and marketing. Open source is a wonderful marketing channel. People have a good feeling about it, right? They like the idea that they can get in there and understand the ideas if they need to. All of that is going to keep open source there. But really, how does it further evolve in order to create faster time to value? And here you realize that a fully managed service, especially one that has a totally free tier that you can start on, one that can get you a really long way before you might have to put a credit card down, actually means that not only can you access that community and embrace all the benefits of open source, but now you can eliminate the need to learn how to run the software. You still need to learn how to use it. For example, with a database, you've got to learn what the differences are in terms of the ORM and adapters or whatever it is, and how to integrate it with your application. But now you don't need to learn how to set up alerting and how to properly deploy and configure things in complex ways.
So now, instead of spending, say, several weeks getting started, trying to learn how to run the different pieces of open source software that you have to deploy into containers and onto VMs out in the public cloud, it's like, "Okay, I'm actually just going to write my application, and this whole thing's going to run, because other folks already know how to run this and actually have a very competent way of running it." So the time to value is decreased. Open source coupled with fully managed services that have free tiers: it's all the best aspects of open source, but with even faster time to value. So it seems pretty clear to me that that will be the future. Exactly. Stack Overflow for Teams brings the power of Stack Overflow to your company. It's an easy-to-use, flexible platform that helps thousands of developers answer questions and make progress in their work. Features in Stack Overflow for Teams include robust search functionality, so that you can easily benefit from the questions and answers documented on your team. Surface the most important information about onboarding, the development life cycle, feature releases, and more. Stack Overflow for Teams saves users time, and it powers up the workday by clearing the obstacles caused by unanswered questions. Try it now. Create a free team at stackoverflow.com/teams/sedaily. That's stackoverflow.com/teams/sedaily. G2i is a marketplace for pre-vetted JavaScript developers. Hire React, React Native, and Node.js developers you can trust, on a contract or full-time basis. G2i will match you with pre-vetted developers within three days of your onboarding call. You'll be able to review their technical profiles and set up interviews with candidates that you like. You get a detailed technical profile that provides the developer's assessment scores in each category, a copy of their code challenge, and a recording of their technical interview.
I love G2i, and I use it for all of my companies. I just think it's a great way to get started. It's a really fast way to find a great front-end developer, which is most of what we need these days. React developers are in such high demand, and G2i is the place to find the best React developers. You can test a working relationship with no risk. The first week is free if you decide the developer isn't a good fit. If you don't like it, you won't have to pay for it. G2i's litmus test is simple: can this developer make an impact in your code base within their first week? Impact is the thing that matters. Go to softwareengineeringdaily.com/g2i to get started. That's softwareengineeringdaily.com/g2i. I really don't have any problem promoting G2i, because I have been a power user of their services, and I've got to tell you, it will accelerate your development cadence. As a software engineer, I really don't want to ever have to worry about the compiler, the same as I don't have to worry about the electricity or the water coming into my house. Maybe if I'm a heads-down application developer, I'd like to take some of the same approach to adopting a distributed database. I'd love to just benefit from the right product choice, but under the hood, I know there are things like the CAP theorem, and maybe the Paxos protocol, going on. To what degree should a developer concern themselves with distributed systems before working with a distributed systems database? Man, that's a really excellent topic. As a database nerd, it hurts me to say this, but a lot of people want the database to be like the compiler, like you said. They just want it to work. They don't want to have to think too hard about one compiler flag versus another, or whatever. And similarly, on the database side, a lot of developers just want to put their data in and get their data out and move on with their day. So one of the things that I think is going to come out of this, and I'll come back to what I think
the flip side is, but one of the things that's going to come out of this is that people just want to get their data out with a REST API. They don't want to install a traditional fat client driver. They just want to make an HTTP call, whether it's REST or GraphQL, and that's how they're going to interact with their database. The flip side, as I said much earlier, is that the more you care about performance, the more you do need to understand how the system works under the hood, and that's not going to change. I don't see how it could change. And so there is always going to be that need for people who have that next-level-down understanding: what happens when I do a join that has to hit multiple machines, and why is that significantly slower than a query that only hits a single partition, and so forth. Yeah, I think performance is the critical threshold. Once you start to care about performance, then you need to learn more. I think before that, there's an opportunity to treat a lot of the things out on the market the same way, whether it's Mongo or Dynamo, Cassandra, Cockroach, Aurora: you can squint, and they're all going to pretty much do what you want, especially if they offer simplified interfaces, to Jonathan's point. I think there's an approach that can be very helpful for developers who are trying to build something and don't yet know all the ins and outs that would allow them to make a truly informed choice about what's the best software for the task at hand, and that's association. So what are other people doing? What are some of those use cases that you can pattern match with the thing that you have in mind that you want to build? Where are your customers? Are they geographically distributed? How much scale are you planning to have? What sort of features do you have?
And then you look at a company that has built something similar, or another open source project, for example, that you can look into and see what the choices were. There are lots of blog posts that companies put out, and you can just pattern match there and find something that's relatively close and say, "Well, what did they choose? Do I think they're a good set of engineers who are making good decisions?" And part and parcel of that, when you actually look at what a company like Cockroach or DataStax needs to do, is really to build reference architectures. So you have, let's say, a forkable repo that uses Cassandra and gets something useful done, and developers can just fork that code and actually start to build on it. And they know that, "Hey, this is already a reference architecture that's well tuned to what Cassandra's going to offer, and so I can run with that and feel pretty confident both that the company understands my use case and can support me, and that this use case will definitely work with Cassandra for doing something that's very close to what I need to do." So I think that's a good approach for people, because, to Jonathan's point about being a database nerd, and I guess I've become one too, you can spend years understanding these things and still realize you've got a lot to learn. Yes, I think the difference between good product design and bad is that with good design, you can let people be productive with a set of knowledge that's appropriate to what they're trying to accomplish. In other words, if I'm trying to build a demo on my laptop, the requirement of what I need to know about the system should be much lower than if I'm Home Depot serving one hundred thousand requests per second out of my production cluster. So having that learning curve where you can get to the next level when you need it, and not have to front-load that to get your first hello world done, that's good design.
Well, speaking of hello world, maybe to wrap up we could give the pitch: what is the use case that typical developers and companies find themselves having that leads them to explore CockroachDB, and also DataStax, as the tool of choice? So for Cockroach, I'd say the really killer differentiator that we've put a lot into... well, there are two. There's high scale, and we have a theory that there's big data, but now there's transactional big data, because it's not just humans on desktops or humans on mobile devices. It's now virtual and IoT-type things that are all hitting APIs, which ultimately hit an operational database. So there's that data intensity. And there's also, hey, nowadays with global app stores and the global public cloud, anyone could build a product or service that could reach people in Brazil as easily as it can reach people in Australia, and you want it to work equally well. And so really building for that geographic reach is something that we've invested heavily in with Cockroach. And of course, both of those are within the context of: do you need a SQL database? So I think that's sort of the killer feature: you need those differentiators, and then there's your preference, whether it's institutional muscle memory, or "we really want to have SQL as our query engine," or "we need to have ways to explicitly manage the data model." All those things would bias you towards using SQL. So if SQL is your product category and you have a need for those differentiators, that's the sweet spot for Cockroach. Cassandra was created to solve problems of performance and scale that SQL databases couldn't tackle, and certainly Cockroach is solving those in a different way. So the distinction I would make is that Cassandra places more emphasis on performance, to the point
where our default isolation level, to use a relational term, is just read everything, and you can opt in to more strict serialization. But Cassandra's emphasis is on: I want to do high performance, thousands or hundreds of thousands of operations per second, across anywhere in the world. Very cool. Well, Spencer, Jonathan, thank you both so much for taking the time to come on Software Engineering Daily. Thanks, Kyle. It was my pleasure.

Autonomous Driving Infrastructure with Vinoj Kumar

Software Engineering Daily

37:07 min | 3 weeks ago

"Interest in autonomous vehicles dates back to the nineteen twenties. It wasn't until the eighties that the first truly autonomous vehicle prototypes began to appear. The first DARPA Grand Challenge took place in two thousand and four, offering competitors one million dollars to complete a one hundred fifty mile course through the Mojave Desert. The prize was not claimed. Since then, rapid progress has been made in autonomous driving, fueled by advances in sensor technology, software, and the hardware which runs it. Infrastructure has become a serious consideration for autonomous vehicle companies. In this episode, I speak with Vinoj Kumar about infrastructure at Cruise, the company helping Walmart do all electric self-driving grocery delivery. Today's episode of Software Engineering Daily is brought to you by VMware Tanzu Advanced. Isn't it time to let developers focus on code, not infrastructure? With VMware Tanzu Advanced, they can plug into a secure, automated software supply chain and get to production quickly. Tanzu Advanced has modular full stack capabilities that help your teams in a variety of ways: embrace DevSecOps and stand up a platform for modern apps that works for your organization, automatically build a stream of compliant containers, secure your software supply chain end to end, operationalize containers and clusters across clouds, and do it all while scaling operations atop Kubernetes. See how your organization can finally deliver containers at scale using VMware Tanzu, spelled T-A-N-Z-U. Head over to hello dash tanzu dot vmware dot com slash se daily. That's hello dash tanzu dot vmware dot com slash se daily to learn more. Vinoj, welcome to Software Engineering Daily. Thank you, my pleasure. Well, before we get into infrastructure at Cruise, can you tell me a little bit about your background? So, a brief introduction.
I spent over a decade at Google helping them scale their technical infrastructure. That covered compute, storage, networking, machine learning products, data center stuff, all the stuff that goes inside the data center. So that's primarily the last twelve to fourteen years. Google is a company that's pretty famous for scaling things, and also scaling them in multiple different complex directions. Can you talk about what part of the scalability story you contributed to? Indeed. When I came in there, the technical infrastructure group was very tiny. It was in the hundreds back then. Google was building their own servers and networking. When I first came on board, I was working on building Watchtower, the first one hundred sixty port ten gig switch, and that kind of grew. There are some papers published around this. If you look at the epochs of networking, I have kind of played a key role in every single one of them, like the Jupiter networking stuff, scaling the WAN, and data center networking related projects. What are some of the major innovations in infrastructure you saw over that time? Yeah, to me, feeds and speeds, for example. On the networking side of things, it's about going from ten gig to hundred gig, and the port densities as well, and also scaling compute farms from, say, ten thousand servers to thirty thousand servers kind of thing, campuses, and networking on the wide area networking side. I saw a lot of innovation there, for example on the optics side of things, and also on the compute side, going from single socket servers to multi socket servers that power the cloud, and also into GPUs and TPUs, because Google was kind of innovating with tensor processing units. Well, for any listeners not yet familiar with Cruise, tell us a little bit about your products and mission. Yeah, let me start with Cruise's mission. Cruise's mission is around building the world's most advanced self-driving vehicles to transport people and things. It's based around three pillars: safety.
Powered by clean energy, right, protecting the environment, and freedom of movement that's very affordable. So the Cruise story: Cruise is building an advanced self-driving vehicle from the ground up. I'm reminded of the expression: if you want to create an apple pie from scratch, you must first invent the universe. Where does building it from the ground up begin for this technology? Technology is a big part of it, right, being able to conquer it in the most complex environment. For example, Cruise very consciously chose San Francisco as the training and testing ground. If you look at San Francisco, it's one of the most dense urban areas and a very good testing ground for autonomous vehicles, with pedestrians, cyclists, you name it, right? Even for the most experienced drivers, San Francisco is a very daunting experience. So Cruise made a very conscious decision to actually go test in that dense environment, so you can learn quicker, have better training models and stuff. So technology is one aspect of that. But again, if you think about it, it's not just technology, right? You have to build trust with users; you have to build this whole ecosystem around autonomous vehicles to transport people. That's what Cruise has been working on for the last five to seven years. There's interesting work to be done in both hardware and software for this application. What are some of your main areas of focus for innovation from the hardware point of view? Yes. There are three components, if you look at it, right: the hardware that powers the autonomous vehicles; a big part of it, the software that actually makes the vehicle move; and the third component or dimension, the infrastructure that supports the motion of this vehicle, right? If you look at it from the hardware side of things, it has lots of sensors that actually collect data, which needs to be analyzed and trained on, and while the vehicle is in motion, you're going to have to interpret, right?
Perceive, predict, and act upon it. So the software stack has to be very resilient. It has to be cutting edge. Supporting all this is the infrastructure side of things, which is my specialization, right? If you think about infrastructure, infrastructure needs to be built on multiple pillars: one, something that's very efficient; something that scales when you go from five vehicles to five hundred vehicles, when you go from one to tens of cities, right, in a very cost effective manner. So that is where a lot of the innovation is. If you think of it, it's multidimensional, when you put together cutting edge hardware and cutting edge sensors with cutting edge software, combined with an infrastructure that supports that stack. All of those cutting edge sensors must produce a lot of data, which means there's a lot of things to be processed. Where do you need to be strategic about data management? So when it comes to autonomous vehicles, it's all about data movement. On a daily basis, for example, as you have a fleet of vehicles, they have forty-odd sensors collecting data, and they ingest around four plus petabytes of data, right? That data has to be backhauled and stored somewhere, and then the AV engineers analyze it, right? You go through it, you have segments, you look at scenarios and figure out what went wrong or how we can improve. And this loop of look, analyze, make changes, run simulations, execute in a simulation environment or perhaps on the road, that loop has to be tight, right? So what it comes down to is all about data movement. So you have all of that data, you're running billions of simulations per day, and the data is stored in various, for example, back end stores, data storage components within the cloud. The question is: how do you move that efficiently, so that your loop of look at the data, analyze, test, execute is very tight? Moving the data is important. Do you also face challenges
giving the right people access to query it? This could be a huge bottleneck, right? For example, to give you an example of scale, we leverage BigQuery, and we process in excess of four petabytes per month, right? That's around twenty plus million queries across five hundred plus users that we have to process. What this means is that this impacts the engineering iteration cycle a lot, right, the time it takes an engineer to find out if their changes work, if the code improves the way the car behaves as intended. And the cycle can take an entire day or so. The question is how do you run the simulations, use the constrained GPU cloud pool, analyze the output, maybe use a hybrid between automation and manual analysis, and make improvements. So these are some of the challenges that you have to work with. GPUs are expensive when you buy them in the cloud, although I guess they are also expensive when you purchase the hardware. But once you've done that, you can run it till the device fails; it's yours after that, although you have to power it. How do you look at decisions around allocation in the cloud versus perhaps doing something on prem? So AV needs in general, let me step back, in general involve off-the-shelf as well as custom hardware components, which means you can have both on prem and cloud leveraged infrastructure, right? And you do need a solid on prem infrastructure for custom target hardware and perhaps very low latency, high touch, hardware-in-the-loop type of testing, right? So going back to your question, the way you manage each is very different. With cloud, we don't have to be concerned with physical hardware. It's all about APIs and abstractions for everything, but it also adds complexity. With cloud providers, also, what we've noticed is that there are contention points that are not quite so well published, that you don't have visibility into, and you rely on the cloud provider for assistance. Again,
I talked about cloud data storage and data movement. For example, when you have data spread around different regions or different subsystems, when you move data to where the computation and users are, these contention points come into play, either through egress bandwidth bottlenecks or some other throttles that are there, right? On the flip side, for on prem, you're very clear about how it flows, but you have the challenge of managing soup to nuts, including the supply chain, right? For example, provisioning and managing that stack takes a very different technology stack. You need to plan space and power well in advance, and given the long lead times for all these, it could be quite challenging. On the flip side, with cloud, you need to worry about, for example, multi-region support, designing your systems so that they're very resilient if one thing goes down, so you can fail over to another subsystem. And, for example, have you figured out where your data and compute need to be, right, so that your simulations or even training are efficient? And most importantly, considering all this, the cost impact. For example, if you don't design your architecture, so to speak, very well in the cloud, you could see your cloud bills go up significantly very quickly. Are you ready to build innovative apps and accelerate your app dev career? You're invited to join the global community of developers at the virtual OutSystems Developer Conference to learn, connect, and build the future of software development. OSDC is a conference being held November seventeenth and eighteenth, by developers, for developers, and focused exclusively on what you want to learn. Plus, this will be the biggest OutSystems Developer Conference yet, with more than fifty sessions and five plus tracks. If you're interested in learning about the latest app dev tech, connecting with developers from around the world, and supercharging your career, make sure to register today.
Plus, the event is one hundred percent virtual and one hundred percent free. Visit out systems dot com slash s e daily to learn more about the event and to claim your free ticket. Hope to see you there. This episode of Software Engineering Daily is brought to you by Datadog, a full stack monitoring platform that integrates with over three hundred and fifty technologies like Gremlin, PagerDuty, AWS Lambda, Spinnaker, and more. With rich visualizations and algorithmic alerts, Datadog can help you monitor the effects of chaos experiments. It can also identify weaknesses and improve the reliability of your systems. Visit software engineering daily dot com slash data dog to start a free fourteen day trial and receive one of Datadog's famously cozy t-shirts. That's software engineering daily dot com slash data dog. Thank you to Datadog for being a long running sponsor of Software Engineering Daily. I see a lot of reasons why simulation would be an important part of your process. It's very safe, for one thing, which is nice. How does simulation contribute to the things you're working to improve in the product? If you think about it, in the real world, right, when the car in the fleet drives around, you could have lots of scenarios, these edge cases or nuggets. For example, a bicyclist is not yielding and is cutting across. These kind of quirky edge cases are very hard to recreate in the real world to test your scenario, right? That's where simulations come in handy. For example, you can actually improve upon a situation like that golden nugget, where you have a corner case, and have a whole bunch of tests around this type of scenario. With simulation, you can actually create these types of scenarios and test them, right? And it's cost effective. You can actually test cases where it's difficult to do it in the real world. So that's one area. And also, if you think about it, I mean,
Cruise, as mentioned, has announced the Origin, which is a specially designed vehicle. It doesn't have a steering wheel, which means a lot of your testing and improvement has to be done in simulation. That's where we expect, in the next couple of years, the infrastructure around simulation to grow significantly, and therein lies the challenge for infrastructure. For example, your simulations are going to go up because you're going to test a lot more scenarios in the world, which means, from the infrastructure point of view, you've got to make sure that the growth of your cost structure and your infrastructure is not linear, right? It has to be sub linear, so your bottom line is not affected. How much cost forecasting and prediction work do you have to do as you think about budgeting for the future? Oh, we absolutely have to work on this. It's teamwork. For example, we have dedicated capacity teams. We actually look at, for example, organic growth in terms of simulations and how they can grow, right? On top of that, like I said, architecturally we can have a lot of influence on what type of tests we run or what kind of metadata we choose to pull. Segmentation, for example: like I mentioned, we have petabytes of data that's collected on a daily basis. This is where you can actually be intelligent in terms of how to segment it and how to load it appropriately, right, and utilize your infrastructure in a very efficient way. So in that sense, my team works very closely with the simulation team, for example, and the AV engineers and ML engineers, to figure out how we plan capacity, how we look ahead, what models we use, and how we use various sensitivity analyses to see where we are at any point in terms of utilization, right? What are some of the scalability challenges you think a lot about? So the biggest challenge: data movement can be a bottleneck. I mentioned this, right?
I also mentioned that we run millions of simulations a day, and I gave you a flavor of millions, tens of millions of VMs running these computations. Where it impacts is the AV engineering iteration cycle, and our goal here is: how do we make it frictionless, and how do we shorten that cycle? If you think about it, data movement is one of the biggest bottlenecks, right, as I mentioned earlier. The other bottleneck today is, for example, the global supply chain. As you scale in terms of your target hardware, the number of cities, and the cars, today that's becoming a huge problem as well. As you scale from R&D to commercialization, managing this global supply chain of both hardware and the systems associated with it is a bottleneck. What sorts of technologies are in the typical autonomous driving stack? An autonomous vehicle involves lots and lots of sensors collecting data and acting on it. Hence, it has lots of data movement and computation. Also, back end components: custom components in the vehicle, for example, need to securely connect to back end services that broker communication to a set of microservices that handle everything from dispatch, remote assistance, mapping, route planning, etc., right? And once a drive is done, there is more data movement, for example, ingesting data into our lakes and post processing. Then you do analysis, continuous improvement, and stuff like that. For this type of stuff, you know, it's a microservice architecture and the distribution of these. And also, we rely a lot on custom tools for analysis and visualization of data. For example, Webviz is one that we open sourced, as an example, right? So these are some of the components that are in the stack that supports autonomous driving. Again, not to mention the business side of things: when you have these vehicles roaming around, you need to have a good ecosystem of business components, like CRM tools and stuff like that, for example.
If you're doing rideshare, right, being able to do market analysis, payments, etc., all go alongside the ecosystem of autonomous driving. And the infrastructure needs to be very reliable, right? You're putting cars on the road without drivers behind the wheel. It needs to scale as you grow the number of vehicles, and as you grow into more cities it needs to scale. And equally importantly, as I mentioned, it needs to be cost effective; otherwise the business isn't viable, right? So the first and foremost focus is on automation. Historically, infrastructure has been a fairly manual ordeal. Now it's all about automation, using software to solve a lot of process-related problems. Part of that is a focus on API driven independent services and leveraging open source, with Kubernetes, Istio, and other types of mesh architectures. And another thing, for example: we at Cruise focus on the latest technology in terms of high scale processing, right? Leverage GPUs, perhaps custom inference engines and ML workloads, right? I mean, it's an evolving technology; leverage those. And the key for all this is knowing when to build versus buy, and you collaborate with large industry trends and with open software. Lastly, you're going to need to leverage multicloud at some point. At Cruise, we're doing a lot of work with partners such as GCP and Azure to optimize usage and cost. I'm aware of people being interested in multicloud for redundancy reasons and also for cost reasons. Which is interesting to you? Both, right? Budgetary is one aspect. The second aspect is also kind of leveraging the strengths of each of the vendors. For example, right, each cloud has its own unique strengths, and leveraging both is to our advantage. It's no surprise that you'd be making good use of GPUs. I'm also aware that there's some interest, just broadly speaking, in specialized hardware. Maybe the automated vehicle industry is probably an exemplar of pushing the limits of certain things.
Maybe there should be custom hardware, or do you think things will standardize on just the standard compute systems? Usually what happens in this type of industry is you start with generalized solutions so you can attack problems. As you solve the general problems, you start getting to these corner cases, differentiated cases, where you could potentially end up with a customized solution. Again, this comes down to cost versus what's unique to you and what's unique to your business plan, so to speak, and that's where you may have to make your choice. Autonomous vehicles is an up and coming industry. Certainly a good deal of what you're working on would be considered R&D. Can you speak to the role of R&D at Cruise? As we move from R&D to commercialization, it becomes about scale, right? Two things need to happen when you scale. The development cycle, the kind of fix cycle, the tight loop that we talked about, needs to be sharper, which means the infrastructure needs to support the engineers so that infrastructure is frictionless and efficient, right? For example, they can quickly analyze the data, make changes, and put in the code. For that to happen, the pipeline has to be efficient. Again, we talked about on premises and cloud. For example, your CI/CD pipeline needs to be very abstracted and efficient so that you can support both on prem and cloud infrastructures where you're handling millions of builds. Giving an example: when we run simulations, or when vehicles are on the road with all the back end microservices computation, sometimes we run, right, we run into millions of VMs, for example, right, both for training as well as perhaps pure computation. For these kinds of stuff, your platform needs to be very robust, right? For example, we want the engineers to have best in class tools in terms of compilers, debug tools, visual tools, etc. Infrastructure is not purely focused on the R&D side or supporting the development infrastructure.
It plays a key role in supporting the business as well. G2i is a marketplace for pre-vetted JavaScript developers. Hire React, React Native, and Node.js developers you can trust, on a contract or full time basis. G2i will match you with pre-vetted developers within three days of your onboarding call. You'll be able to review their technical profiles and set up interviews with candidates that you like. You get a detailed technical profile that provides the developer's assessment scores in each category, a copy of their code challenge, and a recording of their technical interview. I love G2i and I use it for all of my companies. I just think it's a great way to get started. It's a really fast way to find a great front end developer, which is most of what we need these days. React developers are in such high demand, and G2i is the place to find the best React developers. You can test a working relationship with no risk. The first week is free: if you decide the developer isn't a good fit, if you don't like it, you won't have to pay for it. G2i's litmus test is simple: can this developer make an impact in your code base within their first week? Impact is the thing that matters. Go to software engineering daily dot com slash g two i to get started. That's software engineering daily dot com slash g two i. I really don't have any problem promoting G2i, because I have been a power user of their services, and I've just got to tell you, it will accelerate your development cadence. Today's episode of Software Engineering Daily is sponsored by Prophecy dot io, a complete low code data engineering platform for the enterprise. Prophecy enables all your teams on Apache Spark with a unique low code designer. While you visually build your data flows, Prophecy generates high quality Spark code on Git. Then you can schedule it with Prophecy's low code Airflow. Not just that, Prophecy provides end-to-end visibility into your data flows with metadata search and column level
lineage. Prophecy makes enterprise data teams productive with new workflows, but what about existing workflows that are stuck in old proprietary ETL formats? For that, Prophecy has a transpiler that automatically converts Ab Initio, Informatica, and SSIS workflows to high quality Spark code. See how Prophecy can modernize your data engineering with a free account at prophecy dot io slash s e daily. How do your customers interact with the product, and under what opportunity might I be able to get into a Cruise vehicle? The initial goal, at least from a Cruise point of view, is rideshare, right, making transportation affordable to everyone. So the focus has been on rideshare, moving people, and, with our arrangement with Walmart, delivering goods at a very affordable cost point. Who knows, it could end up driving near you. What are some of the ways you have to give special consideration to security? Great question. We take security and safety very seriously, right? And again, when it comes to security, it's defense in depth. It's not like one solution that fits all. Like you said, it's more about best practices, and not only about best practices; it's about what frameworks you have in place to be secure. For example, you're collecting data; the data has to be secure. As cars move around, they connect to the back end and talk to a whole bunch of microservices in the back end, right, for the automobiles to move. So in that sense, every segment has to be secure, and overall security is sort of ingrained as DNA within us at every step of the way. Like I said, it's defense in depth: at every layer it needs to be secure. So when we talk about commercialization, it comes down to capacity at scale. So the capacity to acquire, commission, and operate sites at scale becomes critical. In that sense, for example, network monitoring and automation is an area. For example, when you're in different cities, how do you backhaul data through the network? Do you use a public network?
Do you have your own backbone? What does it mean to be secure and have cost effective connectivity? Those are some areas. In addition to that, for example, analytics in terms of data movement: as data moves around, not only does it have to be secure, but depending on your ingress and egress, there's a cost associated, right? As an example, as you get into the realm of commercialization from R&D, it needs to be reliable. You need to stay up. So, hence, enhanced visibility into network capacity, connectivity failures, failure detection, and responses is an area we're focusing on right now, and a lot of work has been happening on that. At the scale you're describing, it seems you might be testing the limits of some of the cloud providers you're working with, maybe seeing edge cases that I, as an independent developer, would never see at a smaller scale. Do you ever find that you are testing those limits? Yes, we do at times. GPU capacity is an example, right? We push the limits because of the amount of data that you're processing in terms of analyzing, right, and acting on. That's an example where you push the computation capacity of cloud providers sometimes. For example, we use them to train our models, and as a data point, we process roughly thirty million frames a day for continuous improvement of our models. When you process that amount of data, sometimes, depending on the availability of GPUs, you push to the limits. Same thing with compute power as well: running millions of simulations a day with tens of millions of VMs, you push the limits. As you do that, you're investing heavily, so you need to see some return on that investment. What are some of the metrics
you look at to determine if you're really well optimized in allocations? Yeah, we have a set of metrics that we track. There are lots of metrics. For example, from an AV point of view, you measure miles per safety critical event, or, from a stability point of view, how many miles per safe stop, and you measure comfort, right, as an example from an AV point of view. Again, from an infrastructure point of view, something that's very important is your total cost of ownership per build, for example, and how long it takes for a developer to make a change, because it translates into efficiency and also cost. So we look at those KPIs. And again, from a developer point of view, we look at how many engineering hours are saved, right? You could apply those to actually make changes to the stack, right, with the savings. So at every point in time, at every stage of the pipeline or the process, there are metrics that we track to make sure we're always trying to improve. Do you face any observability challenges? So that's another challenge, or another big investment we're making, in terms of observability, since you brought that up. This is where we want to make sure that the system architecture is such that there's resilience built in and reliability built in. Especially as you move from R&D to commercialization, there is a huge focus on reliability. Systems have to be reliable. Since we have tens, twenty, thirty, forty microservices, each dependent on one another, we want to make sure each of these microservices is architected well. So that brings in experts in observability, SREs. We embed them within the engineering teams so they can work with the engineering teams in developing highly reliable and scalable microservice architectures. Can you comment on the size of the team, and some of the efforts to make them effectively collaborate at scale? Yeah, hundreds of developers, AV and ML engineers, working collaboratively.
Usually the cycle is very collaborative, so to speak, especially for something that's cutting edge. We want to move quicker: the loop of developing, making changes, build, integrate, roll it out, how you test. It's a combination of processes, systems, or frameworks in place, and also the collaboration within the team that we build at Cruise. In other words, developers help one another. If something's blocked, they can actually walk over. Well, of course, now we are doing it virtually. They can Slack and get any help with that. It's a combination of a common mission or goal towards accomplishing what we want, supported by tools, infrastructure, processes, and frameworks, that is what differentiates us, in my mind, from anyone else. It's no surprise that the problems the players in your industry are tackling will require massive amounts of data. Is it that clean of a competition? Is it that whoever has the most data is likely to produce the best products? I can't speak to the other players in this area, but I do know what we value. For example, at Cruise our focus is on quality of data, not quantity of data, not volume of data. Going back, for example, I mentioned Cruise consciously decided on San Francisco as an example of a high density urban environment where the quality of the data is a lot higher, because you're going to run into situations that you wouldn't see in a sparsely populated suburb, right? So our focus has been using that quality of data and then working on the toughest problems so that we can get to the market faster and scale faster as well. In terms of a launch, when might someone be in a Cruise vehicle or getting something delivered via a Cruise vehicle? Cruise has already announced a partnership with Walmart.
We are actually doing pilots with the delivery, and pretty soon, in months, not in years, we will be launching a commercial service in San Francisco. And Cruise has also already announced a partnership with Dubai, coming out in twenty twenty three as the first international market. So pretty soon you'll be able to hop onto one of those rides within a Cruise vehicle. That's very exciting. Are there any particular technical challenges you're working on in preparation for that launch? Yeah, this falls into kind of three different areas. For an operation like this to be successful, it's not just the technical aspect, right? For example, you need to have an ecosystem of commercial subsystems in place: your capacity to scale and operate sites, as I mentioned, networking systems and stuff, and the back end systems need to be reliable. And also you need to have a whole bunch of partners in place to support your commercialization effort. For example, we are backed by a leading cloud provider, Microsoft. And also, from a hardware point of view, we have a strong relationship with Honda and GM, who have hundreds of years of combined experience in building hardware, which is an important aspect when you're building cars for commercial use, right? And not to mention our agreement with Walmart, the retail giant. So as we move in self driving from R&D to commercialization, it's an inflection point. It's a pretty exciting time. I mean, the industry in general, and Cruise in particular, is at an inflection point. This is the most exciting phase, where you're going from R&D to commercialization. As we ramp up, you can see autonomous vehicles without drivers behind the wheel driving around the city. There's a lot of exciting work both on the stack as well as on the infrastructure side of things. So I'm super excited to be here at this time, especially in an area and in a company that's at the cutting edge. Well, Vinoj, thank you so much for taking the time to come on Software Engineering Daily. Thank you very much. It's been a pleasure talking to you, Kyle.

Kubos: Moving space to the cloud

Your Space Journey

13:00 min | 2 months ago

Kubos: Moving space to the cloud

"Moving space to the cloud. That's what Kubos Corporation is doing with their cloud-based satellite software. Find out more in today's episode. Three to four. Welcome to Your Space Journey, where we venture into the future of space exploration. Your journey begins now. Hello, thanks so much for joining me today. In this episode we speak with Tyler Browder, CEO and founder of Kubos, a satellite software company. Tyler joins us today to discuss how their company's cloud-based mission control platform, called Major Tom, helps operators take care of every aspect of a satellite's flight software, from mission design and onboard software-to-hardware testing to mission operations in the cloud. Your Space Journey. Tyler, thank you so much for joining me today. I really appreciate it. Yeah, thanks for having me. I appreciate you letting me come on and talk. Well, you've got some exciting things going on. Your company just came off some pretty exciting news last month. You watched, what, four customers launch six satellites on two rockets on the same day? It was five customers, six satellites, on two different rockets in one day. Yes, either way, it's amazing. Before we talk about your awesome software, tell me just how you felt generally on that day. That's incredible. You know, I was actually describing it to one of the customers yesterday. They had this feeling of watching it go up into space, then waiting, using my software, for the first data to come back from the spacecraft. And I can't see what they see. I see on the back end that there's data coming in, but I don't know who it was or what it was. It's this mixed emotion. Honestly, I want to bug my customers and ask how it's going, but they are also really busy and don't really need me bothering them. And I actually don't want them to talk to me, because that means something went wrong, right? You know.
It's kind of this really isolated thing. I work out of my home, and I just hear, you know... It's sort of a weird emotional experience, but it was a really great day. What a dream come true, though. Because if I remember right, Kubos, your company, you formed that, I think, just a few years ago, 2015 or somewhere around there? That's right, 2015. All right, give us the nutshell version of how you're moving to the cloud. Tell us about Kubos. Yeah, so Kubos has been on a long journey. We actually started out in flight software, working onboard the spacecraft. Last year we pivoted hard to the cloud, doing mission control software. Traditionally in space, if you're going to operate a spacecraft, you do it on a server in a closet, and you have a direct line to your ground station. That really isn't scalable for the number of spacecraft we're seeing, or for the data we're seeing. So we find it more efficient, a better development experience, and better for redundancy if we just move things to the cloud. So we're really kind of leading the charge on adopting space, sorry, adopting cloud in a space environment. See, that had to be incredible. I just want to define it for those viewers out there who might not know: the cloud is basically computers that are somewhere else. That's right, that's right. And they're distributed, so we can work off lots of different servers. If one thing goes down, we can easily move to a different computer on a different server and keep everything up and running, right? There's not a single point of failure like there is with on-premise servers. Exactly. And that's what amazed me, because I thought the same thing as you. You know, I grew up always picturing mission control and all these launch companies (there weren't nearly as many back then), and they had the computers right there. It never even occurred to me to take that and do a cloud-based solution for it. So I imagine you came up against a little resistance when you did that.
Oh, sure, sure. We still get resistance, but the advantages are just overwhelming right now for scalability and reliability. You know, what we do is the straight talking to the satellite and getting the data back; we have a direct line to the satellite. And then you have all these other services and applications in order to actually do your mission, right? Whether you're taking images, or it's scientific, or whatever. Those have always been isolated in these data silos. With the cloud, we're really allowed to use APIs to interconnect and exchange data and make it much more of a connected ecosystem instead of these discrete individual silos. It's just a much better experience for customers and their customers, and it's this ongoing chain. One thing I also love, Tyler, is the name of your platform, Major Tom. That's right. You know, space is really serious. You've got this cool factor. I'm kind of goofy, and my background is in music, like growing up wanting to be a rock star. So it just made sense to do something in this space with "Space Oddity," the song. And you can tell the David Bowie fans, because they'll say, "Doesn't Major Tom die in the song?" And it's like, you're focusing on the wrong detail. Just come back over here; let's talk about this over here. So it's a fun name that sticks with people. I think it's great. And if I had the money, I would have licensed it for this episode right here. But now, music-wise, what instruments do you play? Well, I play a variety of instruments, but I was also in country music. I did recording studios and recorded music in Nashville. That was another lifetime ago, but it's part of who I am. See, in another life I would have said, oh, we can do a live rendition of the entire "Major Tom" right now. We can do a little jam. I'm not that good. There's a reason I was on the recording side and not the instrument side: I'm not that good at it. Now,
I want to ask you this, because I know where you came from, but where did the space interest come from? Was that a childhood thing, or did something occur that kind of sent you running in the direction of the space arena? Yeah, I'm more of an entrepreneur. I like starting businesses. I've started several in my life, and I grew up in an entrepreneurial family. So how I got into space was a little bit accidental. At the time we started the company, one of my co-founders was in space. He was a software engineer who had worked for some space startups, and he wanted to start a company. We knew each other through various friends, and he took me out for tacos and told me about what CubeSats and satellites are. He said he wanted to start a company. I said, you know, I don't know what those are. He goes, yeah. I said, okay, let's do it. And here we are, six or seven years later or whatever, and it's been a wild ride. It has been a lot of fun. I love that too, because I think, and I would love to say this, you're disrupting the space sector as well. I compare that to the Elon Musk, SpaceX kind of thing. It's just thinking outside that little box that companies tend to put themselves in. I think as an entrepreneur you're able to say, hey, wait, just because we've done this the same way for so long doesn't mean we need to keep doing it the same way. The space industry right now is going through quite a change, just from an industry standpoint. There are a lot of people like me who have never been in the industry, who didn't come up through school, getting a job as an intern and working our way up. We just come in and say there's a better way. The rest of the world is benefiting from the cloud; why can't space? We should really start innovating. A lot of the focus has been on hardware innovation: making things smaller, making rockets cheaper and reusable.
But I believe, and a lot of people believe, that if we're really going to accelerate our use of space, it has to be done through software. So we're doing our part through the cloud. There are other ways to do software, but that's really where the next evolution of disruption is going to come from. I love that. And I should mention too that you guys have a wonderful podcast for Kubos called Ground Control Checking In. I checked out a couple of episodes of that. You just started it, and it's really a treat. One of the things I really liked about it is, you know, you mentioned how you're a startup company, and it's been a challenging time recently with the pandemic and everything, that's for sure. But one thing that really occurred to me, to keep things positive, is that you have engineers coming in, people coming into your company who have no space knowledge either, and you have the challenge of introducing them to the space sector. That seems so overwhelming to me. How do you accomplish that? Where do you start? It's one of the most fun things I get to do, right? You get to have people who have had great careers, who are very talented in software development, who've always dreamed about space and been fascinated by space but never had the opportunity to get in. Now they have the opportunity to get in, and they just eat it up. They learn so much and want to know everything. We also have people who grew up in the industry, right? This is what they've always been doing. So we pair them together, and we create new ideas and new opportunities. It's just really fun to see these really talented engineers come in with new ideas and new ways of doing things and then just absorb it all. They're just ecstatic to be a part of this industry. That's wonderful. You mentioned before, too, the resistance from some customers and clients. Who are your ideal customers out there? Yeah, so right now.
We're finding a lot of traction with companies who are building constellations of satellites. They have large numbers of satellites, they're just getting started, and most are starting in a cloud environment. They want to scale, and they want to be able to move quickly. Those are the big sectors: commercial entities trying to, you know, sell data to other people. We're also seeing success in traditional industries, traditional aerospace. There, the angle is that we can lower your costs, right? We get rid of all that expensive hardware and all the staff you need to maintain it. When we move to the cloud, we lower your costs, still give you the same performance, and also allow you to do things in new ways and create new processes. Those are really the two angles that we're going after right now. Okay. Now, I do want to ask this question, and feel free to tell me if this is going too far, but just as a software developer, and I know lots of software developers will be listening to this right now, we're intrigued by your cloud solution. Is there any way you can tell us just a little bit about the technology used, just in a general sense? I'd love to show students. Yeah, yeah. So we built everything on Ruby on Rails; at least that's what the application is built on top of. We use Kubernetes and an InfluxDB database. We heavily use WebSockets and JSON, as well as Python. We use a lot of different scripting languages, and we have a GraphQL API. So we're trying to bring in some of these more modern technologies and not just rely on C or something to develop with. But we also have a lot of flexibility for integrating with our stack, so you can use whatever language you want. We expose a lot of APIs; we're really pushing the integrations angle. Yes, I saw your API documentation. It's wonderful; I took a look at that too. So again, folks out there, if you're interested: kubos.com, that's k-u-b-o-s dot com. What's next for Kubos? What's coming up?
Well, we have more launches this year, so that's really exciting. We're not done this year. We've been really focused on supporting these customers. There's a lot of feature development; we're working on new features and mission planning, and we integrate with a lot of different new systems. So there's just a lot of exciting activity on the development side of things. Wow, that's wonderful. Well, again, Tyler, we're very impressed with your company and just wish you the best. Thank you so much for taking the time to join us today. I really appreciate it. Thank you very much; I appreciate it. I really enjoyed my conversation with Tyler today, and I'm so excited for the future of space software in the cloud. I think that's incredible, and I'm excited for the future of Kubos. To learn more, visit their website at kubos.com. That's k-u-b-o-s dot com. Thanks to Tyler for joining us today, and thank you for joining us as well. Again, we'd love it if you'd leave a review on our site or share this episode with a friend. But otherwise, thank you so much for joining us today. We'll see you next time. God bless.

Insight into Pivotal's Kubernetes and Data Analytics Strategy with Chad Sakac

Big Data Beard

50:38 min | 2 years ago

Insight into Pivotal's Kubernetes and Data Analytics Strategy with Chad Sakac

"You are now listening to the Big Data Beard. This is our podcast for the trends, technology, and people making big data. This is the Big Data Beard team, and we are recording at Dell Technologies World here in Las Vegas, and we're joined by Chad Sakac. Chad, how are you doing? Guys, I'm doing great. Well, for those who don't know you, why don't you introduce yourself and tell us about what you do? My name's Chad Sakac, and I am lame by the beard standard. Nah, it looks good. You can't see this on the podcast, but this scruff here is like a solid six months of growth for me. It'll be good, though. Yeah, I'm really working on it. But yeah, my name's Chad. I am a nerd slash technologist slash leader, inside the family of companies for a long time. I led all of my fellow nerds and brothers and sisters inside EMC and then Dell EMC, so anyone who was an SE. Then I was the GM of the Converged Platforms and Solutions Division, which is focused on customers consuming technology rather than trying to build it: CI, HCI, a lot of stuff around the solution stacks of the companies. And then I most recently joined Pivotal, where I lead our Kubernetes efforts together with my colleagues at VMware. So that's exactly the reason we wanted to have you on the show, because one of the things that we're seeing, from software providers to service providers to hardware companies, is that they're all looking at this Kubernetes thing as the deployment methodology of the future, even for big data workloads. So before we jump all the way into Kubernetes, I want to just quickly get: Pivotal as a company, what's the general mission? So Pivotal is a fascinating part of Dell Technologies. Each one of the companies is a little bit different in its own way. Pivotal's mission is very simple, which is to transform the way that customers and companies build software. That's it in a single sentence.
Now, inevitably that involves changing the way that people write applications. Pivotal has existed for a long time. You know, before agile was a thing, they kind of invented agile: extreme programming, pair programming models, test-driven development, things that are now just considered the way that you build great software. So they have kind of led the charge on the methodology of how people build software, and over years of doing that they realized that a part of it wasn't just teaching people how to do it, but giving them the tools to do it. So we are the curator of a ton of open source projects, ultimately to help them build better platforms on which to build great software. So I'll tell this in a funny, maybe, story. Cory, you and I have known each other for some time. You know me well, and for the listeners that don't, I'm a very happy, social dude. I'm the kind of person who, when you sit down on an airplane, reaches over and says, "Hi, my name's Chad. What's your name? What are you doing? Where are you going?" And you're like, oh my, yeah, he's social; it's a five-hour flight and I'm trapped beside this guy. Headphones right in. He's put my headphones in, right? So I travel a lot, I see customers all around the world, and I find myself in hotels very frequently. I'm fascinated by human beings and cultures and languages. This is a lot of runway for a very basic story. I love it. Which is: I've also been an infrastructure-centric person for the bulk of my career, starting with infrastructure at the hardware layer, and very passionate about virtualization and VMware from 2006, before it was obvious that it was a thing, through 2010, when it started to become really obvious as a big thing, even till now. But an infra person. So I'm at a hotel in Paris, and it's empty, and one thing I love is meeting people in hotels. This, again, is not a
risqué story, so people, you don't have to worry. I see a guy, and he's working on his computer, and I go over and I'm like, "Hey, my name's Chad. What are you doing? What's your story?" The reason I love doing that with people in hotels is that, by definition, they're there with a story. They're not at home, which means that they're going somewhere, doing something. There's always a story. Like, what's your story? And he's like, "My name's Jack. I'm a PhD student working with a series of enterprise companies, building genomic sequencing and analysis software." And I'm like, that's really cool. And he's kind, and he doesn't say, "Monsieur, you are a jackass, can I be left alone?" He very nicely said, "Well, what's your name, and what are you doing?" This is pre-Dell acquisition. And so I say, "I work for EMC," and he goes, "What does EMC do?" I say, "We do storage." And he goes, "You mean like where I store my furniture? A locker?" Like, never heard of you. I try to explain, like, SANs and NFS. Fibre Channel, right there in the hotel. A bar fight over the pros and cons of Fibre Channel versus iSCSI. So I'm like, okay, how do I build a bridge with Jack? And I'm like, "Well, you know, you're writing software. Maybe, have you heard of VMware?" He goes, "Nope." And I'm like, okay, this is getting trickier. And then I'm like, "Well, what are you programming in? What framework, what language?" And he goes, "I'm using Spring Boot, and I love it." And I go, "Great! We do that." And he's like, "That's awesome. I love it." So that's what Pivotal does when I say they curate open source software platforms. The two parts of how we help people build great software are, number one, this methodology, and the other one is the technology in the platform itself. So we are primary contributors leading the Spring software ecosystem, which, for those of you that don't know it, is the most widely used modern Java development framework that exists.
We contribute to the OpenJDK, so if you hate Oracle and you want to get Java OpenJDK support from someone who's friendly and nice, we do that too. Good. Tomcat server, you know, is something that we're one of the primary contributors to. Postgres: for your listeners that are like, "Hey, I use Postgres," you know, we're one of the primary contributors to Postgres. Apache Geode, an in-memory database. And then, of course, Cloud Foundry, which is a PaaS, a platform as a service, that's been developed over years for people who want to build great twelve-factor, cloud-native apps on the cloud of their choice. But it's more than that, too: projects like Istio and Envoy, service meshes, and really cool work around Spinnaker and Concourse, so next-generation CI/CD tools for how people build and manage the builds of their software over time. So Pivotal's a really interesting place. And obviously, over the last couple of years, a big focus has been on: how do we make Kubernetes great? So help me, and the listeners, understand: why has Kubernetes just emerged in the market? There's so much noise, and I think there's oftentimes a bunch of confusion as to why this exists and what problems it solves. So, warning, dear listeners: I'm going to say some things that might, to some, sound negative. But I'm a Kubernetes cheerleader; in fact, within Pivotal, the chief cheerleader. But I'm also a pragmatist, right? Good, good quality. So Kubernetes is definitively, declaratively, officially in the peak hype cycle phase. If you go to KubeCon, in APJ, in EMEA, or in the Americas, it is a huge event. There is an ecosystem with hundreds of companies building software, tooling, and resources around Kubernetes. And you have a fear of missing out, FOMO, due to the "what's this thing everybody's talking about?"
You know how, like, Hadoop went through a hype cycle? Like everything, you go through a hype cycle, and then the trough of disillusionment, and all of that stuff. Kubernetes is in that hype cycle phase, and probably starting to enter the trough of disillusionment phase. But then there's the next stage, which is where you're doing things that actually matter with it. Exactly. Okay, so first, what is this Kubernetes thing? Right. Kubernetes is container cluster management and orchestration. It has emerged as the de facto standard way that people manage containers and container runtimes and deal with all of the fundamentals: how do I deploy, how do I update, how do I lifecycle? It is an open source project. It was born in 2015 within Google. It was Google's third iteration of "how do we do container resource scheduling inside Google" for a broad variety of purposes, and it was preceded by something called Borg, which was the same kind of idea. What it does is no more and no less than be a standard way to say, via the API, "I need to deploy pods, containers, on underlying infra." And it is declarative; it's a declarative distributed system, so it says, "Hey, if my state is not what it's supposed to be, I will fix said state." Now, it uses every container runtime that you can think of. The most commonly deployed container runtime is Docker, but containerd and many other container runtimes are supported by Kubernetes too. So in 2015, version 1.0 got released to the wild. There was a formalization of the ecosystem: underneath the Linux Foundation, they created the Cloud Native Computing Foundation, which is basically the home for governance of Kubernetes plus all of the ecosystem around it. Dell EMC, VMware, and Pivotal were all founding members, along with many others.
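A note for readers: the "declarative" idea described above (state what you want, and the system fixes any drift between desired and actual state) can be sketched as a toy reconcile loop. This is a minimal sketch in Python and not Kubernetes code; the `Cluster` class and the `reconcile` function are hypothetical names invented here for illustration, and real controllers work against the Kubernetes API rather than an in-memory dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy in-memory stand-in for a cluster (hypothetical, not the Kubernetes API)."""
    running: dict = field(default_factory=dict)  # app name -> number of running pods


def reconcile(desired: dict, cluster: Cluster) -> list:
    """Compare desired state with observed state, fix the drift, and report actions."""
    actions = []
    # Scale each desired app up or down until observed count matches desired count.
    for app, want in desired.items():
        have = cluster.running.get(app, 0)
        if have < want:
            actions.append(f"start {want - have} pod(s) of {app}")
        elif have > want:
            actions.append(f"stop {have - want} pod(s) of {app}")
        cluster.running[app] = want
    # Anything running that is no longer declared gets removed.
    for app in list(cluster.running):
        if app not in desired:
            actions.append(f"stop {cluster.running[app]} pod(s) of {app}")
            del cluster.running[app]
    return actions


if __name__ == "__main__":
    cluster = Cluster(running={"web": 1, "old": 2})
    print(reconcile({"web": 3}, cluster))  # drift is corrected
    print(reconcile({"web": 3}, cluster))  # nothing left to do
```

Running the loop a second time with the same desired state produces no actions, which is the property the speaker is pointing at: the caller declares the end state and never issues step-by-step commands.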
And in 2017 it started to show up in enterprises. It kind of started to settle; it left its enfant terrible phase and, you know, moved into a more stable steady state. Exactly. And so then what's Pivotal's opinion on Kubernetes, this PKS thing you talk about? So, again, for dear listeners, I'm very long-winded, as you can probably tell at this point. But the way I understand things is always to go back to first principles: if you understand this, you understand this, you understand this, progressively. So first things first: both VMware and Pivotal, at the same time, independently came to the conclusion that this Kubernetes thing was very important for us and our customers, for two wildly different reasons. Okay. The first one highlights something interesting about Kubernetes, which is that it can be viewed as an infra thing or as a dev thing. Right. And in the world of DevOps, those are kind of blurry definitions. Kind of blurry, I get it. So VMware looked at it and said, "We are the kernel-mode VM leader." That's kind of indisputable at this point. We've helped enterprises around the world, of every size and shape, kind of grok the ideas of software-defined infra: how do you build software-defined networking, compute, and storage constructs? How do you do automation, scheduling, high availability, restart, all that stuff? And they do a great job of it. They said, "Well, today containers in the enterprise are a relatively small set of workloads, but that's going to grow." And even if the bulk of containers run on kernel-mode VMs, which they do for a variety of reasons, it's something that they have to lead; otherwise they're going to get disrupted by a transition in the market. So they're like, "We must be the software-defined infrastructure leader. This is a form of software-defined infra, and we therefore need to lead that charge. So we've got to contribute. We've got to build something."
It's like, accept the reality. It's coming. There's no reason not to embrace it and lead it. At the same time, Pivotal looked at what we were doing. We had built something called Cloud Foundry, and Cloud Foundry existed before Docker, before Kubernetes, and it is a higher-order abstraction. In other words, if you think of Kubernetes, it says, "Give me your container, and I will run it for you." Right. But higher-order abstractions are like, "Give me your code. I'll do stuff with it, and part of what I do with it is I'll build a container and schedule it for you." Does that make sense? Yep, it makes sense. And so Cloud Foundry had to invent container schedulers and orchestration tools, because they didn't exist before. Right. And again, if you think back four years ago, everyone would have said Docker is going to be the container standard. Check, that has proved to be true. But they also said Docker would therefore be the container scheduler leader, so Docker Swarm would be the thing. That hasn't really turned out to be the case. Fast forward a year, and, you know, maybe Mesos was going to be the thing. And there are workloads that Docker Swarm and Mesos are great for, for sure. But increasingly what's become clear is that that market is starting to settle and kind of coalesce around Kubernetes as the standard. And we started looking at it and going, well, we have built a container thing that's embedded into Cloud Foundry, called Diego, that's deep inside the bowels of Cloud Foundry. When someone does a Cloud Foundry push command, part of that is it builds the container and poops it out, right, and runs it on the platform. Just visualize what that looks like. Well, and the platform takes care of doing all that. But we're like, look, ultimately, if Kubernetes has now turned into the standard, we should really figure out how to make developer experiences on Kubernetes awesome. Kubernetes is just the beginning of a developer platform, not the end, right?
And so we said, okay, we need to start to invest big time. But a part of it is that we have to have a Kubernetes that is ready to run the biggest enterprises in the world. This is not for your lab stuff; it has to be proven, because Cloud Foundry is running hundreds of thousands of containers at the biggest Fortune 500 enterprises on the planet. If we're going to adapt and evolve Cloud Foundry to run on Kubernetes, we need to make it run on any Kubernetes, but we need to minimally be able to say there's one Kubernetes that rocks. We're behind Kubernetes in general, but this is one that's enterprise ready. VMware and Pivotal came to those two independent conclusions at the same time. Interesting. Right. It was one of the first times where Michael Dell, as Dell Technologies, came in and said, "Hey, we can't have Pivotal running one way and VMware running the other way. That's not cool." Yeah, for sure. And basically the effort got kicked off as a joint project. So sometimes you'll hear PKS referred to as Pivotal PKS, and sometimes you'll hear it referred to as VMware PKS. Those are the same thing. You can download it from my.vmware.com. You can download it from Pivotal Network. You can buy it from Dell EMC, Pivotal, or VMware: same bits, same engineering team, same effort. So fundamentally, though, with containers: if we think about applications, many in the big data space were tightly coupled applications that wanted to run without any abstraction. They wanted Linux, right? "Give me bare-metal servers," and whatever the application, call it Hadoop, Spark, pick your flavor, was very against it. Then, for many applications, we started thinking about virtualization, because we were able to virtualize data-critical apps like SAP, Oracle, et cetera. Containerization, though, feels like it's coming in and going, maybe it's a bit of all of those. Why is that?
Why is it that Kubernetes, as a deployment methodology and as an orchestrator, is now becoming in vogue for these data-intensive applications? So the first reason is that it is an open source project, and open source loves open source. That's the first reason. The second reason is that the Google team that built it, two of three of whom are now VMware employees, Joe Beda and Craig McLuckie (the third co-founder is over at Microsoft working on Azure Kubernetes services), were very smart about how you create a distributed system that is designed to follow the Unix principles of "do as little as you can, but do it really well, and chain these things together." So anyone who uses Linux is familiar with that least-complex design principle. Yep. And also, how do you create a distributed system that can scale? So containers are open, Kubernetes is open, and Kubernetes is architected very well, so it's very, very flexible but also very, very scalable. And it also has this thriving ecosystem around it. Right. That's one bit. The other thing is that everybody is realizing that they've all been constructing their own resource schedulers independently, and it's like, man, this is a lot of work, and if we could tap into something, that would be a good thing for us. If you talk to anybody in the data ecosystem and you say, "How excited are you about your customers loving Kafka, or loving Postgres, or loving Mongo, or whatever?" they're like, "I'm really excited about that." Then if you said, "How excited are you about all of the work that goes into making it run?" Right. They're like, "Oh, if I could make that go away, that would be awesome." Exactly. So all of them have built, over the years, their own software into their own stacks, because they're all distributed systems unto themselves, that takes care of things like: how do you stand up an instance? How do you take care of distributed system faults and restarts?
And increasingly they're all scratching their heads going, "Hey, can we tap into something to do this?" So, as an example: Dell EMC needed to create a next-generation object store, and an object store and a data service are kind of kissing cousins. You know, they persist information, right, and they're distributed systems. When ECS was born, we were like, "Damn it, we need to have something that will deploy the component instances and think about cluster and distributed system health." We looked around and said, "We need one," and we used Mesos. Right. As ECS continues to evolve, ECS has now said, "Hey, look, ideally we'd love to have our ECS software say: as long as you have a Kubernetes that works, our ECS will work on that." Okay, interesting. Makes sense. Now here are the warning, negative comments, coming into this from a legacy infra worldview and a VMware worldview. The thing I would say is that this thing is moving at the speed of light. It's still very immature in many ways. Right. So, for example, Kubernetes gets released every three months, and every release is pretty materially different. For example, in the most recent release, 1.14, there's a whole new standardized way to deal with persistent storage, which, I imagine, for data platforms is kind of important. Kind of important, yeah. So there's that. Oh, that's changed; that's a big deal. Then there are whole new ways of doing networking that appear. Kubernetes has built a whole slew of ways to do cluster management; this Cluster API project is all brand new, hot off the presses. And then the data ecosystem, Mongo and all the others, are all just starting to release what are called their Kubernetes operators, which are like, "Here's how you make this work on this." Confluent did a blog post on how you get Confluent to work properly on PKS: here's the operator, here's how it works. All that stuff is hot off the presses.
They're still sharp edges. It's kind of like juggling flaming chainsaw. Is it possible? Yes. Is it time to get into juggling claiming chainsaws? I don't know. How do you see this compare with the emergence virtualization over a decade ago? So dating myself year. But I remember the first time I used ESX to dot O. And the first time I also remember the first I ever saw v motion occur. And I was like, holy shits. This is fricking loss up face melting, right? Smelt exactly. And I was thinking back. And I'm like, we're exactly where for the listeners who are from that universe of ESX, three dot O and V center to dot O. Right. It's just schist starting to like kinda get there and again, everyone who was part of that ecosystem we crossed the Rubicon when we got to vs, fear four zero four zero was like the first release where it was like, hey, this thing's really ready for prime time. But. Some people here that are like, well, then I'm not touching. This thing with the ten foot pole. That would be the wrong conclusion. And by the way, it's a good question the history's very instructive. Even though fear four zero four five. We're kind of the place where enterprises got a ton of value for a broad set of workloads with these fear. That was in two thousand and ten if you didn't do anything with VM ware until two thousand and ten you're kind of late to the party. There were truckloads of workloads where you could have been using it all the way back in two thousand and seven thousand six for sure. And you could have saved a lot of money you could have learned a lot of things you would have found all sorts of interesting use cases. But even in that time, there were people like, well, if it doesn't do this, it's a total waste. No, those people were wrong. Right. So today, you find people like, well, if Gruber, Netease, doesn't have this security model, then it's useless. Well, don't be that guy or gal they'll be wrong. Don't be wrong purview wrong again. 
So when you think about containers in the future, I wonder do you believe that the likelihood for Kuban Eddie's to get in? Into this like place. Where we're like actually delivering do you think it's better because there's so many more participants in the project by people like you know, suffer companies obvious gate in their engagement in the building of all these clusters systems near is a great sweet again, by Joe beta, where he basically said, like, hey, we've got to make this thing really boring. And, and I what I've noticed about ecosystems is that. In that hype cycle, like again, it's so funny, the people who went to VM world in two thousand and six to two thousand and ten were the same people who stopped going into thousand and ten was there like this isn't getting boring, and then they're over, and then they're like, I'm going to open stack summits, if right. And that was like two thousand and ten to two thousand and twelve thirteen and two thousand thirteen. They're like this is so boring. I'm going to Dr Khan, and then Mesdames Khan. And now. Kube con. Right. And then what happens is that when those people leave the party, a wave of pragmatists, like, come in and some people look at and go that means. No, that, that means it's like now getting into the point of. You've made it so simple to use and adopt. And it's moving into the primetime which people some people just don't want to do, right. Like to be in the cool crowd. So on the edge, you know what we're seeing? Now is that between us Google red hat and many? We're determined to make this boring and awesome in its boringness, which means that people can then do things with it that, our, our full. So I'm curious you. So speaking of like things that are in their hype cycle. I'm curious just based on your spending time with customers, lots of technologists you got a good can of view of the world. What's your take on like artificial intelligence in the enterprise today? 
We is in the hype cycle, that's not sure all marketing most dangerous milling. That's right. You know it's. Something can be in the hype cycle, and yet still be really important. So. I think part of the problem is terminology is frustrating. Right. So makes people think Terminator sky net. That sort of thing, we're toast. And when you describe, what is today, and machine learning and I are blurry lines. You know where you're basically, like look. It's a mathematical model that we iterative -ly churn through. Using a whole slew of data and iterating on that model until the model gets to the point where we can highly predictively, take a given set of input applied against the weighted model and predict outcome, people like is that how it works like. Yeah, that's how it works. They're like that sounds like pretty basic, math and hyper trivialized clearly is basic math. The math, by the way, is extremely complicated. Of relative. But the core idea is basic. The thing that, you know is fascinated me as a human is at highlights how bad humans are at really understanding the power of interational. This is true, by the way in software development, and in a in sheen learning where high degrees of interruption can produce these machine learning systems, which are really, really cool. And so look. I drive a tesla. I'm pissed that have a gen one tesla that that, you know, won't get fully autonomous driving. But even with my gen one system. I'm delighted to have a little bit of the low taken off me when I put it into autopilot. So. I hate stupid eight, I've ers. Okay. Right. But I love working with the chat, bot. Okay. Isn't that weird? I guess it is strange, but chat bots, work customers, prefer a chat bots, strangely. Then talking to a human. I think it's because we kind of like the distance of people headphones being right. Totally right. They do this thing like this. Like we the one of the they, they argue the truce us into society. 
What they're thinking is in their Google searches, because there's that layer of abstraction between the person and, like, another person. We're weird. Exactly. So, you know, AI and machine learning are everywhere. We spend a fair amount of time working with the car manufacturers and that ecosystem on autonomous driving systems. There's a lot of intersection of Kubernetes and those workloads, for sure. There is a ton of just fascinating stuff going on around AI/ML, and again, Kubernetes is kind of like a little bit of a how: how can you orchestrate said workload? And so it's kind of less interesting than the math models, for sure. But if it enables somebody to actually go and do those things, that's kind of interesting. One thing you hit on there that I think is really fundamental and interesting, especially for the world you live in, and that I think so many folks in the data world need to be paying attention to, is that if you build a model, you're going to have to deploy it, and you deploy models in software. Yes. So if you're going to be successful at this data stuff, you've got to have a software development capability. Do you agree? Oh, totally, totally. And one thing that is interesting about Kubernetes is it comes in many sizes and shapes. Right. So at the one end you've got, you know, the Kubernetes of the hyperscale clouds. On the other end, you've got minikube, and Canonical, for example, put out something that basically is a two-hundred-meg download and runs on a standalone Linux machine, right? Rancher's got a similar small-footprint deployment thing. So in many cases, people are like, hey, it's pretty cool that I could take a container workload that runs stuff, and I could equally run it in a hyperscale cloud, where I could say autoscale to the moon and beyond, and I could also package it into something that could be deployed on a small footprint. We see a lot of interest, by the way, in retail.
So a lot of customers, you know, like Starbucks and Walmart and Nordstrom, are going, hey, how do I build a small-footprint thing to deploy? FedEx Ground: they're like, I need to have a common software deployment model in hundreds and thousands of FedEx Ground locations, but my software developers are going to develop it in data centers or in the public clouds. And Kubernetes being a standard enables that sort of portability, for sure. Now again, hype cycle: that sentence is hype cycle. This is the practical reality: even Kubernetes instances that are all conformant, meaning that you can run a test against them and say each conforms with all of the Kubernetes APIs, are wildly different still when it comes to observability, tooling, admission controllers, all sorts of stuff like that. And that dream of portability is something that's very important to us. So we're very determined to make our contribution to upstream Kubernetes, make sure we're aligned, make sure that we keep it pure. But at the same time, you know, warning: vendor promises of portability are often overblown. Exactly. So Pivotal has had products in the past focused on data science, and you guys contribute a lot to the open source community for, you know, data science products. Just curious, what is Pivotal's view on data analytics and big data in general, and how are you helping customers solve their data challenges? So, first things first, this is now a personal comment, not a Pivotal comment. One thing that has always pissed me off about our ecosystem is the overweight influence that our friend Larry Ellison has in our industry. Now, it's not that Oracle doesn't make good stuff. They do. I've just yet to find a customer anywhere in any of my travels, regardless of company, vertical, language, or culture, that says the following sentence: I love Oracle. They've spent a lot of money with them. They're like, I don't feel loved.
I don't feel like they're out to help me, and yet, I'm spending a truckload of money with them for sure. So the first thing in the world of data is we believe that relational open source databases are ready willing and able to save people from blowing their money with oracle. Based on post Grosso post grass my sequel are, are there in my humble opinion are ready to run the majority of workloads, that are relational databases that run on oracle today for sure the fact that we haven't gone more open source in than areas. Stunning title actually understand the why behind it. You'll think it's a proprietary lock in for like the application stack owning the underlying for stocks database gonna end for I by that and data has gravity for sure. Trump. Sure you've talked about on the podcast of it's difficult to move. It's difficult to get off change and all that stuff. But they're coming certain point where just sheer evilness and anger normally would overcome that inertia. Right. We think the moment is now to throw off the shackles of our oppressor. Well and, and, and say, hey look, we're not. We know we're never going to displace oracle in the market as a whole, but hey, reach out to us, and we think that with a fraction of your oracle spent like ten percents, we could reduce your dependence on oracle by thirty to forty to fifty percent using open source, relational databases boast, grass, my sequel, guy that's part one. Part two is we've realized that if relational data is like the majority of what's in the enterprise. The other big majority is data, warehouses, and data warehousing has gone through multiple like gyrations and have Aleutians and am I going to use, you know, Ducasse my big data warehouse, like, how's that going to work out? And you know what's interesting is we were kind of head of our time with green Plum, which was distributed. 
MPP (massively parallel processing) data warehouse that worked ridiculously well. And a couple of years back we basically went and open sourced the whole thing and built it around a distributed, massively parallel version of Postgres. We think that is an awesome way to build an incredibly performant OSS-based data warehouse, and you can get it as software. You can get it as software that runs on Kubernetes, or you can get it as a hardened appliance. There's a thing called the Greenplum Building Block, which is Greenplum on PowerEdge servers, that is all of the goodness of appliances without their rigidity. So there's a broad set of x86 server configurations, and it's kind of the modular version of an appliance. It can start small and grow. It's pretty awesome. Good. We continue to think that every microservice needs a cache, so part of the data world and ecosystem is caching. And we do something with Geode, which is the open source branch of GemFire, and something called Pivotal Cloud Cache is a core part of how people build modern applications, because every modern app has something fronting the relational database. And that's an important part of the system. Now there's a ton more that we do. But that's where we stop organically; what we do with everyone else is we work with them furiously. Whether it's Mongo or Confluent, there's a ton of intersection between modern microservices design and event frameworks. So we see this combo of modern app and Kafka all the time. And so working with Confluent to make Confluent awesome on Kubernetes and deeply integrated into Spring is something that we're doing as well. There's a huge long list. It's fascinating stuff. Absolutely.
It's been fun to, to catch up with to hear what pivotals doing to solve problems, but also how pivotals helping drive forward, this exciting area of obstruction and containerisation that I think is something everybody needs to pay attention to it sounds like the standard going forward is the standard going forward, ton of also. Interesting stuff going on with us in video about, like, how do you take the tensor flow, libraries, make them work well, in our Kuban, Netease, and others? Great stuff off of accelerator technology is near and dear to our hearts, totally get that. So I do want to shift gears here. We're, we're going to have a little fun every time we have a guest on, we try to go through a rapid fire questions. We've learned a lot from our guests about big data now get fit personal in a segment we like to call rapid fire. So what is the latest book that you read that you would recommend to your listeners latest book that I read? So there's a whole bunch of like reading about Kuban Eddie's deep dives and refreshes on, on development stacks, those I wouldn't I'm just gonna those are those are fun professional reading. Homo sapiens, was a great book and do a sex, which was the follow on. I think I'm getting that, right. Was also good, but not as good as home associates. The home day is homogeneous. We heard what I just heard about. So it didn't quite capture the magic of homo sapiens, but home safety, it's really Doug, that was wild old classics Stiller, incredibly useful. So tribe is one of my favorites and tribes is, is so hit some of the fundamentals of like leadership and, and modern. How do you create cultures influence? Oh, leadership, which leadership. There's a big, big believer in regular reading of the economist. So this is gonna sound a little dry. Okay. But one thing that I think we live in a big wonderful world, the media, diet, that we normally watch read and consume is not highly reflective of the world, right? 
That's liberal, I strongly encourage anyone to get a passport. If you don't have one and travel immerse yourself in different cultures and languages and place in the world. And you realize the differences that exists are wonderful feature not a bug. Absolutely agree with economist while a little bit center. Right. Typically is a very balanced international worldview. And I would highly recommend it for people to understand how the world works very good like. All right. So we are Deltec world. You've been on stage in many conferences in your career in today's world. What is the song if you had to go onstage today tomorrow, whenever what's that song that you would wanna have played follow? Boy. Centuries. I'm a big boy fan Joyce, and they played live at one had one that was so cool. And so it's speed route bug. So, so, basically, my wife and I were he, I brought my wife first time last time to one of these big conferences, and we had no idea, it was going to be fall up or super fans super fans. We see them live like eight times. And so we're like, oh my God, I'm gonna go get to see the band. And so, like, we're getting ready to go. Backstage, meet the band, and, and we're like really excited nam playing in my head. I'm going to say you know, we love you, you know, big fan of your latest album. We saw you live in Montreal and Toronto. And, you know, just really happy to meet you, and I'm, like, playing my head like be cool. Chad Coleman John. We show up and they're like, it's a conga line like wedding reception. Right. So shake your hands. And I'm like, hi, Chad Toronto. Montreal follow up. They were like. They were like this guy. Gone away from me. It. That's okay. You fan girl for your big big fan. I get it. Right. So what piece of technology is currently making your life worse? Cooper netease. Good. I. Yeah. It's like I love it and they're still so much work that we're doing to make it better. I just I hate Facebook, just hate it viscerally. 
And it's kind of inescapable. And you know, just e it's weird huts woven itself, into our lives. Yeah. All right. So what is your biggest personal money pit right now? Cooper nettie. So, so money pit for me is a couple of things. So always my home lab. So I love technology. I love consuming it. I love, you know, playing around with it. So there's constantly a flow of new tech into the house. My wife once said that EBay saved our marriage because I was able to at least recoup some. Of the value. It's not the acquisition side that she's happy about it. It's the God. All this stuff is piling up. We gotta get rid of it. I bought a new e bike that I'm really excited about that. That's pretty cool. A good one. What kind of you back, did you? It's, it's what's Cusack specialized specialized, vodka, five okay because I, you know, I live in Toronto, pivotal is filled with millennial hipsters as a forty five year old I'm like two decades older than everybody else in the office, and they make me feel old and not not because they're mean literally just look around and go mad at all. A Frank assessment. So I'm like, you know what I'm gonna ride my bike into work, but I'm kinda lazy. So the idea of assisted like as. What are you binging on any shows right now? Oh my God. So yes game of thrones last night. Doc so good. Yeah. Although like by the time you hear this, if you haven't seen it, something's wrong with you internet will burn it for you. So the, the opening sequence the entire literally makes my salivated Pavlovian reflects like it's not good. But beyond that, so really dug the umbrella kademi on that flex of in the middle of their so I won't give sport. I it's it's I think really cool and a pretty faithful. Translation of the original comic book which was great. And then the other one on that flicks that I just binge watched and love was Russian doll. It's have you heard this one. So if you haven't watched it. Highly recommended. It's not for everybody. But. 
And then lastly, where's next interesting place that you're going. So always love going to visit Europe next week. I'm in New York. And how can you not love the Big Apple? I but just to be clear. This is a trick question. Because for me, the answer is anywhere. I love being at home with my family, my wife. My two wonderful daughters center of my universe. I love going to Miami Beach party. My brains out. And I love going to New York in seeping in the Big Apple I love going to France French's my favorite language in the world. I love Rome. I love Beijing. I love Sidney love, you know. So do you know almost five hundred thousand miles a year on one airline? Serious. Yeah. Which airline Air Canada, also like all of the other ones like periodically. I travel on. But since Toronto is home based securities. You know, like all things Canadian it's socialized nationalized really one air. Right. That's very. That's very, very friendly and choices. Eero. So, so, yeah, I, I love traveling all over the place. One thing I'm looking forward to is going to Cape Canaveral next time. So the launch well, we do that. We do this thing that, you know, the Dell EMC team has got this like heroes program then you bring together SE's from our partners and whatnot. And the guy leads it has like this book door to the two. One of the guys who runs the infrastructure for NASA real, and they hold it at that location. And I'm such a space, not that I'm trying desperately to coordinate it with either SpaceX launch or the new NASA. Esa less launch. And I've indoctrinated my children. So I'm I'm trying to coordinate that. But it's tricky. That's awesome. Glad you're NASA fan, even though you're Canadian dollars aren't helping the. Canadian space agency. Okay. By the way, like Chris Hatfield amazing astronaut Canadian. Right. We've actually had you know, a whole bunch of Canadian astronauts over the years. So it's actually a really good program. We provide those those arms. Yeah. You know. 
So the cannon arm one and two. That was cute. It is. It is classically Canadian like, hey, we can't get there. But we're happy to help. That's awesome. Well, chad. It has been super fun to chat with you about all things pivotal, what's happening in the world of Cooper, Netease and why it matters. Why people may be paying attention to it today. And frankly, it's just always fun to hang out with you, buddy. Thank you guys. Cool. Thanks for listening to the big data beard podcast. The music from this episode is by injury bell. Check him out on itunes or Spotify.

InfluxData: Time-Series Data with Russ Savage

Software Engineering Daily

50:19 min | 2 months ago


"Time series data is composed of sequence. You'll measurements or events that are tracked monitored down sampled and aggregated over time. This could be server. Metrics application performance monitoring network data sensor data events clicks trades in a market and many other types of analytical data the platform influx data is designed for building and operating time-series applications influx data is engineered for growth with enterprise grade security ingestion metrics events and logs in a high performing time series database and platform analytics for detecting and resolving problems in this episode. We talked to rush savage director of project management at influx data. Thanks to hub spot for their support of software. Engineering. daily transform. your organization's website from brochure ware into powerful business tool with hubs spots. Cms hub a developer friendly content management system backed by hub spots full crm platform. Cms hub enables you to create seamless personalized digital experiences for website visitors by leveraging the same. Crm data your marketing teams used to build customer relationships. Cms hub is loaded with advanced features such as service functions dynamic content membership management and more developed locally with the workflows. You prefer then deployed hub. Spot were marketers can make changes and create content with visual design tools plus with fully managed hosting. You can spend more time making an impact. Learn more and create a free develop protests. Count at software engineering daily dot com slash hub spot that software engineering daily dot com slash hug spot. Ross welcome to the show. Thanks for having me you work. At influx data and inflicts data is a category. That is broadly known as time series database. My sense is that there's a lot more to the category of time series database than meets the eye and it feels like just a category of time. Series database is actually kind of a platform. 
Do you agree with that statement. Yeah one hundred percent so you know influx dv. I think when we first started the company it was very much focused on the time. Series database as the senator. But i think as as the use cases have expanded and his people of built more more advanced applications on top of that i think the database is still at the core and at the center. But you start seeing more and more capabilities needed in order to support some of those advanced use cases so you start layering on things like background processing scheduled task. You start layering on visualizations. You start layering on all sorts of other capabilities that when adults pieces together the some of its parts the platform as much more valuable to developers building what is essentially time-series applications on top of our platform. What is it time series application. Yeah great question so a time application is usually an analytics application but it's a it's an application. That centered around time. Series data and time series data is essentially any data set with a time. Samp the changes over time. So you think about time-series applications as analytics applications. You think of it as a machine learning and modelling applications you think of it as just applications on your phone anytime you're looking at then you pull up an application your phone. That shows a graph of data over time that essentially is part of time series application that could be powered by our platform. Do you know the druid database product like the interactive analytics thing. Have you seen that company. Yeah i'm familiar with druid. So is the category of time series database the same as the druid category. Because when i think about you want to be slicing and dicing large swath of time. Series data on the fly is drew in that category. The operational analytics thing. I think it's part of it for sure. So young time-series data's everywhere it. 
Broadly categorizes into kind of this infrastructure machine metrics and this infrastructure monitoring category and. There's also a huge swath of iot industrial iot hobbyists iot sensors out there. That is also providing ton of data. And so i think you know when you think about those two major use cases you know you start looking at in a where things are optimized for and what tools are a right for the job i think from our perspective. We want to satisfy both of those needs man. We want to build a platform. That's flexible enough to satisfy applications that do both the infrastructure monitoring aspect. But also you know the iot the industrial iot space and so yes. I think one hundred percent okay. So i've done a few shows. Recently on this druid thing and my understanding is druid is an advancement on the basic premise. Of let's make an in memory database. Which is like a great idea and a lot of people have built cool in memory database structure things. But i think in the druid world a lot of the niceties of the druid database are that you can do this on the fly slicing and dicing kind of processing sorta thing which doesn't feel like you need to do in every application. There's a lot of applications where you kinda just the raw time series data you want some simple interpolations extrapolation or i don't know i'm kind of wondering what the different transactional properties or the different memory or storage volatility properties that you want out of these different classes if database types. Yeah for sure. So i think one of the key differences and what makes the time series database so powerful. 
It's really really optimized for writing and querying real time information so for example we're talking about you know you think of an individual censor are an individual device could have thirty forty metrics coming off of it but the people that were working within teams that are working with they're deploying tens of thousands hundreds of thousands of these sensors times forty fifty thousand and that is bringing in data. You know multiple times per minute for sure and sometimes multiple times per second so you start seeing these massive ingestion loads because of all of the data getting multiply and you know bringing all that data in is one challenge and you know you have to use different ways of thinking about indexing and working with that data on the fly but then also clearing that data out and the idea that being able to access the most recent data quickly. It's not an easy task. And it's something that time. Series database is really really excel at and so it's really that combination of incredibly fast gestion and queering the real time real time data the makes the time series database so powerful and the notion that you know overtime the data that's coming in it might not be as useful as when you first collected and so the idea of being able to you know shed data. That isn't valuable you know that. Cpa metric from that device very useful for the first couple minutes first hours maybe days or weeks but over time you want to aggregate that such that your storage costs can be kept in check with the rest of rest of the cost of running the system so another example of of expiring data after a period of time. Tell me more about some of the prototypical engineering problems within time. Series database creation. So i think the storage the slicing charting of all of the data coming in is probably the main one at influx. Db we've developed our own storage mechanism for handling all that we call it. Call it tsm and that's probably the biggest one. 
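The idea of aggregating older points and expiring data after a period of time can be sketched in a few lines of Python. This is a minimal illustration only, not InfluxDB's actual mechanism: the function names `downsample` and `apply_retention`, the 60-second bucket width, and the sample CPU readings are all assumptions made up for the example.

```python
from collections import defaultdict


def downsample(points, bucket_seconds):
    """Average (timestamp, value) points into fixed-width time buckets.

    Hypothetical helper: each point is assigned to the bucket that starts
    at the largest multiple of bucket_seconds not exceeding its timestamp.
    """
    buckets = defaultdict(list)
    for ts, value in points:
        bucket_start = ts - (ts % bucket_seconds)
        buckets[bucket_start].append(value)
    # One averaged point per bucket, in time order
    return [(start, sum(vals) / len(vals)) for start, vals in sorted(buckets.items())]


def apply_retention(points, now, max_age_seconds):
    """Drop points older than max_age_seconds relative to `now`."""
    return [(ts, v) for ts, v in points if now - ts <= max_age_seconds]


# Made-up CPU readings taken every 30 seconds: (unix_seconds, percent)
raw_cpu = [(0, 10.0), (30, 20.0), (60, 30.0), (90, 50.0)]

per_minute = downsample(raw_cpu, bucket_seconds=60)
recent_only = apply_retention(raw_cpu, now=100, max_age_seconds=50)
```

Keeping the coarse `per_minute` series for the long term and only the `recent_only` raw points is one simple way to trade resolution for storage cost, which is the trade-off described above.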
I think the other parts are around working with data and exposing it to applications. And so you've got you need to have a really powerful scripting language powerful query engine on the other side that's able to access that data. You know it's talking really close with the storage at understands. The storage layer and sailed acquire that data really quickly and so at influx. Db we have a language. We call flux. That's a really powerful functional query language that you can really do some powerful analytics and you're you're doing those analytics as close to the storage layer as possible which means you're getting the most performance that you can get which is awesome. Why did you create a language. So when i announced that we were building our own language. I think there was a lot of. There's a lot of discussion. There was a lot of debate. Quite frankly there was a lot of discussion internally to and i think what we came to the realization was is that as we look at. The people that are are using our platform. We started to see is you know wanted to do is we wanted to bring the compute and the analytics as close to the storage layer as possible and so in order to do that. What we found is a lot of people who are who are writing applications against our existing query language which is which was a sequel like language. There are running into limitations where they would essentially dump a large portion of the data out into their application and manipulated in their own code. And so what we're looking for is how can we start bringing those manipulations closer to the actual platform so that we can speed them up and we can provide kind of mechanisms so that all the users could benefit from these custom functions. These custom is custom code instead of just the individual and so we've started to look at the different requirements that we needed in order to do that. It's made sense to take a look at it. 
From a fresh perspective obviously sequel is well entrenched in the in the data community which is why we support inflexibilities sequel like interface into our data. We also wanted to provide those power users in those people who are really building incredible applications on our platform kind of the a language that they could use to really turbocharged their analytics. So that's where that's where we decided to create our own language locks and it's really awesome the team. That's that's behind. It has been working nonstop for for years on on getting that stuff right and people are really really having successful that. Tell me about standing up this kind of thing into a cloud service. Yeah it's not easy. Sometimes so our influx cloud in terms of house architect did it's made up of a bunch of a bunch of services that run inside kusnits control plane. And so you know. We have been slowly rolling out new clusters across the globe over the past couple of years. I don't know. I think we're close to twenty twenty regions. But it's interesting. Because you know the kuban eddie's is is really awesome. It's this notion that you're supposed to be able to deploy services into into different cooper netease instances and have them all work without too much too many changes. But you know we've found rolling out a multi tenant cloud service across the aws azure and gcp. Google cloud services. It's tricky. there's a lot of. There's a lot of details and a lot of nuances there but it's a really awesome and really scalable platform. So that means that you know if we're if we're we're constantly monitoring all the information coming in from our users we see a huge spike in reload or a huge spike in in rights we can scale that or or cuban cuban can scale that automatically and creates a really powerful powerful way for us to optimize our cloud service elastic for over a year. 
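As a rough illustration of scaling automatically on a spike in writes: the Kubernetes Horizontal Pod Autoscaler documents its core calculation as desired replicas = ceil(current replicas × current metric / target metric), with a tolerance band (0.1 by default) inside which it leaves the replica count alone. The Python sketch below reproduces only that arithmetic. The function name, the replica bounds, and the writes-per-second figures are hypothetical, and nothing here describes how InfluxData actually configures its clusters.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Simplified version of the documented HPA replica calculation.

    current_metric and target_metric are per-replica averages of whatever
    is being tracked (here, hypothetically, writes per second).
    """
    ratio = current_metric / target_metric
    # Within the tolerance band, do not scale at all
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds
    return max(min_replicas, min(max_replicas, desired))


# Hypothetical spike: each of 4 replicas sees 200 writes/s against a target of 100
scaled_up = desired_replicas(4, current_metric=200, target_metric=100)
```

The clamp to `max_replicas` matters in a multi-tenant service, since an unbounded response to one tenant's spike could starve the rest of the cluster.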
How does running Elastic compare to running InfluxDB?
Yeah, I really enjoyed my time at Elastic and I have no negative things to say about them. What I'll say is that the use cases for a time series database and, at least when I was working there, Elasticsearch, which was essentially a search database, are different. It turns out that when you have a really powerful search capability, you can apply it to a lot of use cases, which is really awesome, and you see Elastic coming more and more into the time series space. But the biggest difference, I'd say, is the amount of infrastructure and hardware you need to get the job done. I don't think this is a shock to anybody, but Elasticsearch does a lot more processing on the data ingestion side, so when you're ingesting large portions of data, you need more resources to do that indexing on the fly. So I see that as one of the major differences: for the use cases that a time series database solves, I think InfluxData has a lot lighter system requirements. The stuff that you would need a multi-node Elasticsearch cluster to do, you can do with a single instance of InfluxDB open source or a very small cloud footprint. So that's probably the biggest difference.
How has the customer set changed as InfluxDB has grown?
Yeah, it's interesting. The customer set has definitely evolved. Obviously Influx has evolved as well, there's no surprise there. But we've been around for a few years, and when we first came out with our InfluxDB database, there weren't a lot of other options out there. You've seen the market explode over the last couple of years, with a lot of different options for not only time series databases but also metrics monitoring platforms in general, some of them very purpose-built for monitoring very specific infrastructure.
And then we've also shifted from the on-prem or hosted-instance-in-the-cloud kind of world into this multi-tenant, shared infrastructure model. I think what we've seen is users that were at the time leveraging the database for a very specific purpose look for ways to expand out into applications that are dedicated to their purpose, or building their own application dedicated to their purpose. So I think our customer set has changed a little bit, from maybe the operators or the database operators to potentially the application builders themselves, the developers that are building the applications, because you're starting to see more and more companies move towards a model where the developers are also responsible for monitoring and maintaining those services. And so they're looking for technology and tools to do that.
Do you remember the Michael Stonebraker thing, the age of one-size-fits-all is over? Is that familiar to you at all?
I can't say that I've heard of that, although the phrase that you just uttered, I've certainly heard that phrase before.
Yes, so there's this database guy. He's the founder of VoltDB and several other databases, a serial database entrepreneur slash computer scientist, Michael Stonebraker. I think he became notorious because he was a Hadoop skeptic, and he was kind of proven wrong a little bit, but not proven wrong; he had very legitimate skepticism of Hadoop. But then he had this paper called the age of one size fits all is over. Basically the idea is you've got multi-model database systems, or multi-database, multi-model systems, and it really feels like we're there, right? You've basically got a database that's called a time series database. It's very domain specific. How domain specific do we need to get with our databases?
I've definitely heard that phrase before. I've thought of it as the right tool for the job. It's that your tool sets are ever increasing.
And so, yeah, you have the option of choosing the right database for the problem that you're solving. I think the idea of a general data store works up to a certain scale, but once you hit a certain level, you start spending more time on the infrastructure and less time on the actual application that you're building or supporting with that infrastructure, because of different concerns: performance, scaling, all sorts of things. So at InfluxDB we obviously believe in specialized databases, specialized platforms for solving problems. In terms of how specialized you need to go, I think what we're finding is that it really depends on the use case and the scale that you're working towards. For example, if you're working on a platform at a scale that can easily be handled by a generic database, there's no reason to over-optimize before you actually need to, unless you have specific plans or specific goals. I think we're starting to see time series databases become more and more flexible and more and more useful, and actually be able to bring in data from non-time-series locations. One of the things we see customers do a lot is that time series data will stream in, and then they'll leverage a reference relational database to enrich that time series data before it's stored. So you get the benefit of storing all the metadata with the actual series information, which means fast querying, but you also get the metadata stored in a relational data store, which is really great for those types of things. So in terms of how specialized you need to go, it's kind of a cop-out answer, but it really depends on the use case and the product that you're building.
What's your modern take on the tension between volatile and nonvolatile storage? Like, what do you want in a database?
When are you tiering storage? How important is that tiering system? Is it only important to cost, or is it also important to performance? Tell me a little bit about architecture as it pertains to storage systems and the memory hierarchy.
Yes. I mean, that question feels more geared towards maybe a college lecture on storage.
Boring? That's okay, too boring. You want to talk about developer experience or something? What do you like to talk about? I'm open to talking about whatever. It's a deep product.
I don't think I'm going to break new ground on how to describe multi-layered storage.
Actually, here's one thing I'll say on that. This is the kind of thing boot camps don't teach, which is fine, but it's exactly the kind of thing that is useful to learn in a podcast if you're somebody that came from a boot camp.
Yeah, very true. So in our world, more recent data is more valuable in a lot of cases, so the use case that we optimize for is being able to really quickly access recent data in the platform. What that means from a storage perspective is you want to keep that data as hot as possible and as easy to access as possible. Over time, the value of that data degrades. It's less likely that you're going to be querying it, or when you do query it, you don't need sub-second or sub-millisecond performance, and then you start to look at the equation of storage cost versus access. So in our world, the older the data is, it gradually gets moved, in our world we call it compacted, to different levels of performance from a storage perspective. Our database is using RAM for the most recent data but SSD for most of the storage, and then it can archive out to slower data stores if it needs to. So that's kind of the level that I can get to: in the time series world, the more recent the data, the more useful it is, and so we make access to that information much faster.
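To make the age-based tiering described above concrete, here is a minimal Python sketch. It is not InfluxDB's implementation: the function name `pick_tier`, the tier labels, and both time thresholds are assumptions chosen only for illustration, since the interview gives no actual numbers. It only shows the idea that the newest points stay in RAM, older points sit on SSD, and the oldest can be archived to slower storage.

```python
from datetime import datetime, timedelta, timezone

# Assumed, illustrative thresholds; the interview does not give real values.
HOT_WINDOW = timedelta(hours=1)    # newest points are served from RAM
WARM_WINDOW = timedelta(days=30)   # older, compacted points live on SSD


def pick_tier(point_time: datetime, now: datetime) -> str:
    """Return a hypothetical storage tier for a point based on its age."""
    age = now - point_time
    if age <= HOT_WINDOW:
        return "ram"
    if age <= WARM_WINDOW:
        return "ssd"
    return "archive"


if __name__ == "__main__":
    now = datetime(2021, 6, 1, tzinfo=timezone.utc)
    for age in (timedelta(minutes=5), timedelta(days=2), timedelta(days=90)):
        # Prints the tier chosen for a point of the given age.
        print(age, "->", pick_tier(now - age, now))
```

The trade-off is the one raised in the answer: fast storage is spent only where queries are most likely, and cost is saved on data that is rarely read.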
Datadog is the monitoring and security analytics platform for developers, IT operations teams, security engineers, and business users in the cloud age. Their SaaS platform integrates and automates infrastructure monitoring, APM, log management, and other features to provide unified, real-time observability of their customers' entire technology stack. Datadog is used by organizations of all sizes and across a wide range of industries to enable digital transformation and cloud migration, drive collaboration among teams, accelerate time to market, reduce mean time to recovery, and track key business metrics. Give Datadog a try with the free fourteen-day trial. Listeners of this podcast will also receive a free Datadog shirt. Go to software engineering daily dot com slash datadog to learn more. That's software engineering daily dot com slash datadog.
A user interface is like a joke: if you have to explain it, it's not that good. If you appreciate that, you might be a software engineer, and if you're a software engineer you will definitely appreciate the number one Jira alternative, ClickUp. No more using separate platforms for docs, goals, and sprints. ClickUp brings all of your work together in one fast, collaborative platform. Already used by over two hundred thousand teams at companies like Google, Webflow, and Uber, ClickUp gives you the integrations and tools to perfect every agile process. You can also connect your favorite developer tools like GitHub, Bitbucket, and GitLab to manage your code and team in one convenient place. Don't get stuck with Jira. Sprint through agile with ClickUp and save a day every week. Try ClickUp for free at click dot com slash sl daily.
Where do you feel like the most prominent battles are being fought across the product right now? Is it about just winning over people on the category of time series? Is it about engineering out the last bugs? Or is it about building a platform on top of the basic concept of Influx?
So I think the biggest battles that we're fighting, and I wouldn't call them battles, they're awesome problems to have, are these. Essentially we put out this platform, this generic time series platform, and you can solve tons of different use cases on it. What we're seeing is users using the platform in ways that we didn't actually think they would, and they're finding ways to exercise, I wouldn't use the word exploit, but really hone in and optimize on different parts of the application. When you're developing, you're building a bunch of different capabilities, and a lot of times those capabilities don't resonate, so you move on or you optimize somewhere else. We're really close to our customers that are leveraging our cloud platform, and we're seeing use cases. One of them is this notion of storing queries in my application. The downside is that anytime I have a bug in one of my queries or a problem in one of my analytics, I need to go out and update thousands of clients out there. What I'd like to do is update some lambda on a server, and then all of my clients automatically see that new information. So the same kind of continuous CI/CD process that we use in cloud, they want to develop for their customers, which makes a ton of sense. So we're now investing in some of our APIs around storing functions and reusing functions and bringing that capability in-house. It goes along the same model of bringing more compute closer to where the data is. That's just one example of where I think we see users taking some of the stuff that we've built and really running with it, and really demanding more investment. I think the other thing that we're really trying to figure out,
I mean, I don't think we've nailed it yet, is the role of a UI in a platform like this. I think our UI serves many different purposes today. It's an analytics tool. It's a resource management tool for the platform. It's an operational tool. It's a lot of different things. It's a developer tool, mainly. We're really looking at ways to focus that UI onto developers building on top of our APIs, and figuring out the best way we can help them develop faster, because that's really what we're trying to do. We're always trying to help people develop their applications and their capabilities faster and write less code on their side.
What about in terms of the software development, software engineering challenges? I know all about how to build web apps. I don't know as much about how to build a time series database. So what would surprise me about the actual software development process of building a time series database?
Good question. I'm thinking about other projects or other software that I've worked on. I think what we maybe underestimated a little bit was this. One of the goals that we had with our latest cloud offering, from an engineering perspective, was that we really wanted to hone in on the continuous integration and continuous delivery pipelines. We wanted changes that engineers were making to be out in public as fast as humanly possible, within the hour once they're checked in in many cases, obviously after going through various testing and all that sort of stuff. So I think one of the things that,
I think, was harder than we originally planned was getting that pipeline up and running smoothly, and making sure that we had the tools and the capabilities needed to understand where the problems were in that whole deployment process, because shipping a new version of an application every minute is non-trivial in a lot of cases. And it's not one application. It's many different applications that come together to form a unified experience and a platform. So I think that was one thing that was a little tougher. It's pretty unbelievable, the amount of engineering that's gone into creating this platform as a service, this shared infrastructure, and I can't even begin to describe all of the different layers there. But to me, having worked on other web apps, there are a lot of moving pieces when you're building out a database platform that a lot of times, when you're just building a UI or only one piece, you don't have to think about.
Is a time series database ever used as a transactional database, or is it mostly an append-only write thing, plus read-only, mostly for aggregations and rollups and things like that? Or is it ever used for transactional data?
We see very few use cases in our world that rely on transactional data. People who are building out systems or building out applications in our system are looking for append-only, and really the way to get the right performance that you'd need for a large-scale data application is to do that append-only. We do see customers coming in and overwriting previous data, but in our world, since a lot of our data is tied to real-world events, it's tough to go back and rewrite history. We don't see that use case very often. If we work with customers and we see them designing a solution that requires them to rewrite history, often we attempt to persuade them that there are other ways.
Whether it's through a continual change log that gets updated or something like that. So in our world, it's very write heavy. We don't do a lot of updates and deletes, and that's by design.
So again, how should I be thinking of a time series database like InfluxDB, InfluxData, in my stack, in my toolbox? What am I doing with this thing? Am I just using it to blindly write time series that seem useful, and then later on using it as it seems useful? Is it mostly used for metrics these days, like Kubernetes kind of stuff, Prometheus-based logging? I'd really love to know where it shines today, and what applications I'm really looking to align with this database product.
I think where we really shine is with customers that have many different devices out there, whether those devices are virtual machines or actual physical devices with sensor information, all sending data to a central location. I think ingesting all of that information at scale is one of our strengths. Within our platform, once the data comes in, there's this whole notion of turning data into knowledge, right? The data that comes in has very little value, but the more you work with it, the more you clean it, enhance it, aggregate it, and run analysis on top of it, it turns into knowledge. One of the things that I think our platform is really good at is finding that knowledge in all of the data that's coming in, and then it actually allows you to develop and present that information to your users, or leverage that information for yourself to make decisions on the fly. Say you're bringing in all the time series data and running it through a model to try to identify abnormalities or do predictive alerting.
All of that can be done in our platform, and then that information can be quickly presented to users or yourself or whoever. So, back to the standard: ingest everything you possibly can as fast as you possibly can, use Flux and the tooling that we provide to extract the knowledge from that information, and then present that information to whoever needs it, whether it's your end user or yourself or whomever.
Have you tried to lever up into the BI layer, or have you really stayed away from that layer and tried to just sit within non-transactional query semantics? It seems like you've really focused on this language. The language-based approach seems pretty good relative to going after the BI layer. I'd like to learn more about where you've gained traction and how the product is going to deepen its relationship with developers.
The BI layer is really interesting. There are a ton of different solutions out there for doing BI on top of that information. So I think what we're looking at is being a data source for those tools. In many cases we're really interested in storing and collecting that information, potentially aggregating and analyzing it before it gets to the BI tool. And just like any platform out there, you want to make sure that you can connect to the tools that people are actually using, so we want to provide APIs and mechanisms to connect to those BI tools. There are different investments you make depending on which tools you want to connect to. A huge swath of them work with RESTful APIs and integrate there. There's a huge category that works with SQL, works with Python, works with other languages. We want to figure out ways to be the data provider for all of those. We ourselves, I think, have stayed away from developing our own BI tool because the space is so crowded. And, you know,
I think it's not necessarily where our strengths are today. We purposefully decided to stay at the data provider level in many cases for now.
As somebody who's worked at several infrastructure companies, you've probably seen the different margin structures that can arise depending on what approach you take as a business. Do you have any general principles or lessons or takeaways from seeing various infrastructure companies go to market and how their profit structures have developed?
My background is that I've worked at a couple of different open source companies, so I think I can talk a little bit about some of the open source models that I've seen and how those work with business models. Historically, you've got the open source software, and then you sell some sort of support or help on top of it. That was kind of the original model, and it works up to a certain extent. But if you think about it, the goals of your support and the goals of your business aren't really that aligned, right? It's like, yes, we want to help our users, but only enough such that they still need our help in the future. So I think you've seen different companies leverage different techniques to avoid that going forward. You've got companies that are really focused on providing key capabilities that are enterprise-only and grouping those up into different structures, different tiers: free tier, bronze, silver, gold, that type of solution. I think Elasticsearch has been incredibly successful doing it that way. You've also seen the model of creating hosted instances of this open source tooling and open source software. For a while that was kind of our model. In a lot of cases we had a hosted version; it was a hosted version of our enterprise, where we were selling nodes. And, you know,
I think we've shifted slightly away from hosting our own open source tooling, and we're looking at it from the point of view that the things our customers need to run themselves, or need to use on their own infrastructure, on their own hardware, should always be open source, because it's transparency and a level of trust that we want to maintain with our customers. I think the idea that that software needs to be the same as what you run in large-scale cloud platforms isn't necessarily always true. So the APIs are incredibly compatible and the same, but the software behind them is very different depending on where you're running. I think that model, in my opinion, has a lot of strengths: you're still developing and providing a ton of open source value to the community, but you're able to provide value, and people are willing to pay for the value of operating and running that infrastructure at scale such that they don't need to worry about it.
What's the future of the company? One thing I can imagine, for example, is a very heavily event-driven platform: find certain triggers across time series, trigger certain infrastructure things off of that. That's a full platform, right? Like the time series automation platform. Is that interesting?
The future of our platform is one hundred percent moving more towards what they call application performance metrics, APM, basically event-level metrics, the events that are coming through some of these large-scale systems. They're massive, and the scale is massive. So we're investing heavily in a next-generation storage capability. We're calling it IOx. I think we've got some public talks on that as well. That foundation is going to allow us to really jump leaps and bounds and be able to ingest an insane amount of the events that are flowing through these systems. I think you're seeing, you know, in the past,
it was all about things happening at a regular frequency all the time, so data coming in once a second or ten times a second, every single time. These events that are coming through can be three events per second, two events per second, and then suddenly you get a hundred thousand events in a single second, right? You start seeing these shifts spike incredibly fast. During an incident, you obviously want to record a ton more information than you need to on a regular basis, so you can go back and see exactly what happened. So I think the scales are getting larger. We as a company are investing a ton of R&D to build out a storage layer capable of ingesting and meeting the needs of those workloads. And so as we look to the future of the company, we're looking for higher and higher what we call cardinality, which is basically the number of series in your database: workloads that have an incredibly high number of series, and being able to analyze those at scale. That's where we're spending a lot of energy, really building that next-generation storage layer.
And what does that mean in practice? What has to change in the storage layer? What kind of engineering has to take place in the storage layer?
Yeah, so I can't speak to the specific storage layer details. What I'll say is there's a technology out there, Apache Arrow, that's really amazing. It provides a way for you to work with data without copying it in many instances. That's probably a really horrible description to anybody who actually knows what Apache Arrow is, but that's not the point. I think the things that are different are around the set of technologies that you combine together to build a really powerful storage layer. They've evolved over the past five to seven years. So we're leveraging that new technology, leveraging some of the new work coming out of Apache Arrow.
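The phrase "work with data without copying it" can be illustrated with a small Python sketch. This is an analogy only, not Apache Arrow's API: it uses the standard library's `array` and `memoryview`, and the helper names `zero_copy_window` and `copied_window` are made up for this example. It shows the general difference between a view that shares the original buffer and a slice that allocates a new one.

```python
from array import array


def zero_copy_window(values: array, start: int, stop: int) -> memoryview:
    """Return a view over values[start:stop] that shares the original buffer."""
    return memoryview(values)[start:stop]


def copied_window(values: array, start: int, stop: int) -> array:
    """Return an independent copy of values[start:stop]."""
    return values[start:stop]


if __name__ == "__main__":
    readings = array("d", [0.5, 1.5, 2.5, 3.5, 4.5, 5.5])
    view = zero_copy_window(readings, 2, 5)
    copy = copied_window(readings, 2, 5)

    # Change the underlying data after both windows were created.
    readings[3] = 99.0

    print(view[1])  # 99.0: the view sees the change because nothing was copied
    print(copy[1])  # 3.5: the copy kept its own snapshot of the data
```

Avoiding the copy matters more as volumes grow, which fits the high-cardinality, bursty workloads discussed in this part of the conversation.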
And we're building a system that is performant enough to handle series in the hundreds of millions, right? We're really starting to see use cases where creating another database or creating another bucket in our system is not going to cut it. You really need to be able to process those large data volumes, and so that's what the storage layer is for. And it's going to be available as open source as well, so our goal is that the storage community as a whole can benefit from this work too.
Today's episode of Software Engineering Daily is powered by Subspace. Subspace is the global meta router, a dedicated, secure network for delivering tomorrow's internet today. Subspace uses real-time APIs, an optimized network, and near-effortless integration to give you more control over your real-time application than ever. You won't need to choose between speed and security with Subspace, thanks to an inline DDoS service that secures your traffic all the way through to its destination. Subspace is redefining video and voice experience to give you truly real-time network performance you can rely on. Take control of your network today. Visit software engineering daily dot com slash subspace.
For anyone who's into technology or uses technology, and that's all of us, season two of the New Reality series by Arm Podcasts is here, and it'll be exploring what the not-so-distant future of consumer technology will look like. This series will look five to ten years into the future to see how digital and mobile technology will further empower our lives. This isn't just what the future of mobile experiences will be; it's also the roadmap for how we get there. It's hosted by technologist and neuroscientist Poppy Crum and created by Arm, the leading provider of processor technology for mobile devices. The latest episode explores how interacting with technology like virtual reality can change the way our brains work for the better.
Check out the New Reality series by Arm Podcasts on Spotify, Apple, Google, Stitcher, or wherever you like to listen.
Let's pull back a little bit. I want to know a little bit about your history, because you've done a lot of various applications work. Actually, your background fascinates me. Can you tell me what the responsibility of a marketing engineer at Box was in July twenty thirteen? Serious question. By the way, my company, Software Engineering Daily, we're a marketing company. That's what we are. So marketing technology fascinates me.
Yeah, well, it fascinates me too. So how did I end up being a marketing engineer? My background is in engineering, and I switched over to being a product manager at a previous company, Orbitz.com, which I think now is part of Expedia. At the time we were working on a project with Google AdWords for keyword bidding on travel sites, basically figuring out a way to keyword-target millions of different destinations and destination combinations for search, and building a ton of technology around that. Through that process I learned an amazing amount about how Google AdWords worked, how the paid search world worked, how tracking in the marketing world worked such that you could get all the tracking information correct, how campaigns were set up and structured and tracked and analyzed, and all this stuff. What that led me to is this notion, which I'm still very much in favor of, that marketing is incredibly metrics driven, and the technology that's in the marketing space is some incredibly advanced technology, and I think it's underutilized by a lot of marketing companies out there. And so, you know, my role at Box as
marketing engineer was to work with all the individual marketing teams to ensure that all of their marketing efforts were optimized and tracked at a level that let us actually see exactly where money was going and optimize for ROI. It's kind of a full-time role, because there's just so much technology baked into a lot of the efforts out there, whether it's some of the automated email platforms that are being set up, running campaign tracking from all the different sources, or doing analysis on the information. Bringing all that information together and presenting a view that makes sense and is accurate is a really powerful position. So that was kind of my role there, making sure that all that stuff was working behind the scenes, technically. I will one hundred percent tell you that I am not a marketer, but if you pair me with a marketer, in my opinion, we could be much more successful in what we were trying to do, because they were able to come at it from a marketing perspective and I was able to come at it from the perspective of, this is the underlying technology you need to set up in order to meet that goal or run that campaign. So it was a really awesome combination, and I'm still incredibly fascinated with marketing technology and SEO optimization and all that sort of stuff. It's like a passion of mine.
Hey, maybe you can merge your main passions. There's gotta be some marketing tools that you could build on top of a time series database.
Yeah, for sure. One of our customers, Wayfair, leverages our platform to look at data on customers that are browsing their website.
So really, that would be like clickstream or mouse-stream sort of data?
Yeah, all time series data, all flows in there.
They're leveraging it for analyzing that information. So yeah, there's a ton of marketing applications, especially as you look at people's behavior as they move through different products, different websites, and the different actions that they take on the web, and then correlating all the information together. The online real-time bidding space still fascinates me; how that system is set up and works at such scale is really unbelievable.
Do you notice how it seems like every year the infrastructure companies get a little bit closer to being able to build applications on top of that infrastructure? I mean, at least the better infrastructure companies always go higher and higher level. It's just sort of interesting, because if you are anything like an infrastructure company, you're going to become a platform company, and if you're a platform company, you become an applications company.
Yeah, it's really kind of an evolution. It speaks to the number of resources, the scale, and the expertise that you need to be successful at each level. When you start getting towards the application side, you need a ton of domain experts in those particular areas in order to develop applications that are worth anything, so it takes a ton of resources and a ton of time. I think you see companies, as they grow and as they scale, start looking at those different areas as a possibility. If you start trying to do that too early, in my opinion, you really start stretching your resources thin, and you start building solutions that are incomplete or not useful to any of the end users that you're trying to target. So yeah, I definitely think of it as a process or a life cycle, and in my opinion it has a ton to do with resources and expertise at every level.
What do you think: how does a company become such a platform? Twilio was one of these companies that looked like just infrastructure, and then they've just become a behemoth. Yeah, totally. Twilio is the company that all of the other companies look up to and want to be like. They hit such an incredible pain point that they were able to capitalize on. I myself am a user, and I've written applications on top of Twilio APIs. It's an amazing experience; I can't say bad things about them. I think what's really interesting is they were able to break into an area where individual developers would never even be able to consider building things. Twilio made it accessible. They took a technology that's ubiquitous and made it accessible to every person, and as a result they've been incredibly successful. I think in general we've just seen a ton of innovation come out of that. You know, it's immeasurable how much innovation has come out of just unlocking that world of interfacing with the device that everybody has, the telephone. Yeah, it's an incredible company. Where we look at them very closely is how they were able to develop an API that's incredibly easy to use, that's incredibly simple but powerful. They solve use cases very, very well. And when you're working with it you're just thinking to yourself the entire time, "oh, that makes sense, that makes sense, that makes sense," and you can kind of guess in the right direction as long as you understand what's going on. So I think their API design is top notch. It's something we look to emulate, to be honest: trying to make APIs that are simple and powerful and easy to use. All right, well, as we begin to wrap up, give me a little bit more. Let's say I'm a developer. It sounds interesting to use a time series database for various things. Help me calcify this in my brain: if I'm a developer, what are the use cases
where I'd be using a time series database, and how should I evaluate my time series database options? So what I would say is, the way to know if you need a time series database sounds cliche, but it's if you have a time series data set that you want to analyze. If you are building an application and you realize that one of the core, fundamental components of your application is data that changes over time, then I think that's a good candidate for a time series database. When I'm evaluating different solutions or different options, I would keep in mind that the whole purpose of going towards a specialized platform or specialized database is that you need to write less infrastructure code, less code to get something to work, and more code that provides business value to your company. And so, in my opinion, when you're evaluating different platforms, you should look at what you need to do with that platform, how easy it is to accomplish that, and how much code you have to write in order to meet that goal, and look for the platforms that reduce that number, because, you know, writing less code is a great thing. So you obviously look at the performance and the capabilities and the different interfaces. I'd also look at the communities. A lot of these software companies and platforms have varying sizes of community where you can get help and talk to other people that are leveraging those tools, and so I think that's another important aspect to consider: if you run into problems, how can you find help quickly? Where can you get information quickly? Community is really powerful for that. Cool. Well, anything else to add about Influx or anything else that you're working on? Well, I'm clearly a superfan. A superfan of the database and the podcast? Yeah, superfan of the database. I think the time series space is really awesome. I am really excited to see the new stuff
that gets developed all the time. Whenever I talk to customers and see what they're working on, I'm in constant awe of just the stuff that's being developed out there, and it makes me get out of bed in the morning, so I really enjoy it. Yeah, I think that's about it. Well, thanks for coming on the show. Yeah, I really appreciate you having me.

Stemma: Understanding Big Data with Mark Grover

Software Engineering Daily

47:46 min | 4 months ago

Stemma: Understanding Big Data with Mark Grover

"Amundsen was started at Lyft and is a leading open source data catalog with a fast-growing community and a lot of integrations. Amundsen enables you to search your entire organization by text search, see automated and curated metadata, share context with co-workers, and learn from others by seeing the most common queries on a table or frequently used data. Powered by Amundsen, the company Stemma is a fully managed data catalog that bridges the gap between data producers and data consumers. Stemma adds features to Amundsen like showing meaningful data to individuals, adding metadata automatically, and documenting data on the fly. Stemma integrates with all the major data sources, like Snowflake, Redshift, BigQuery, and Airflow. In this episode we talk to Mark Grover, the founder of Stemma. Mark co-created Amundsen and authored the book Hadoop Application Architectures. He was an engineer at Cloudera before joining Lyft as a product manager. A few announcements before we get started. One, if you like Clubhouse, subscribe to the club for Software Daily on Clubhouse. It's just "Software Daily," and we'll be doing some interesting Clubhouse sessions within the next few weeks. And two, if you're looking for a job, we are hiring for a variety of roles. We're looking for a social media manager, we're looking for a graphic designer, and we're looking for writers. If you are interested in contributing content to Software Engineering Daily, or even if you're a podcaster and you're curious about how to get involved, we are looking for people with interesting backgrounds who can contribute to Software Engineering Daily. Again, we're looking for social media help and design help, but if you're a writer or a podcaster we'd also love to hear from you. You can send me an email with your resume: jeff at software engineering daily dot com. That's jeff at software engineering daily dot com. Demand for on-prem software remains enormous. It continues to grow, and it's not going away. Take advantage of the automation,
Reliability patterns and primitives provided by kuban eddie's for not only our applications but also. In how your on prem and multi prim apps are delivered in managed cooper netease and other cloud. Native technologies have led the way to modernizing on prem software delivery. It no longer has to be a tar ball. In one hundred and fifty page manual good replicated dot com slash se daily to learn how replicated can help you modernize your on prem software delivery strategy. If you're a software vendor looking to modernize your application delivery and management to gain more enterprise adoption checkout replicated dot com slash. save daily replicated software vendors a container based platform for easily deploying cloud native applications inside customers environments to provide greater security and control so checkout replicated dot com slash s daily and learn how to deliver and manage your software through all kinds of methods bare metal servers cloud. Vp gove cloud even air gapped. There's a secure way. Your customers can use your application without ever. Having to send data outside of their control and replicated has already trusted by noteworthy customers. Like core circle. See i am good. Replicated dot com slash s daily to get a free twenty one day trial of the full replicated platform mark. Welcome back to the show. Thank you for having me three back. Last time we talked about amundsen and some of the problems with data discovery and meta data at a large company like lift. And i'd like to go a little bit deeper and also talk about the company or starting around amundsen but first let's just give a brief review of amundsen and some of the data discovery problems. Can you review the problems that you built amundsen to solve it. Lift yup for sure. So what's happened over. The last few years is that companies have invested a lot of time and energy innovation in capturing processing and storing a lot of data and as an industry. 
we've had Fivetran and Stitch to bring in more data. We have brought in Airflow and Prefect and ETL tools like dbt to process more data. And then there's a slew of innovation happening in the data warehousing space, with Snowflake and BigQuery making it really easy for companies small to big to store all of this data. At the same time, there's a lot of innovation happening in the consumption space. So you have these BI tools that have existed for a long time, and now SaaS products that enable operators to make data-driven decisions day to day. But what's missing is that, while we're bringing more and more data into the organization and have provided tools for users, operators, data scientists, and analysts to make data-driven decisions, no one really knows what data exists within the company and what can be trusted. This is the problem I saw at Lyft, where everyone had access to data but few knew what existed, what was trustworthy, and how to use it. That problem was so severe that data scientists and data analysts spent over a third of their time finding and validating trustworthy data. That was the problem that Amundsen was created to solve. Amundsen today is a leading open source data catalog, and the more I work in the open source community, the clearer it is every day that this problem wasn't unique to Lyft. It exists at companies of all different shapes and sizes, and I'm happy to dig into the problem and the solution as we talk further. How widespread is this set of problems of data discovery and metadata? Does every company, once it gets to a certain scale, start to have this kind of problem? Absolutely. This problem doesn't exist in super small companies, where it's all in one person's head or it could be easily documented in one single place like a wiki or a doc. You can document what data exists and what each of the fields means.
It also doesn't exist where the setup is very stable: a small company where your data model is not evolving and the organization and the production systems that produce the data aren't evolving. But that is a very small subset of companies. Most companies are big enough. The place where I see it is a couple hundred employees: you are big enough that there's enough data and enough people in the mix that you don't have a single person who knows everything, and the company is evolving at a pace where you can't keep everything up to date with just a simple doc. So I would say anywhere from three hundred employees and up. The signals here are the number of employees and their churn (additions, subtractions, and people moving within the company), as well as the amount of data the organization has and the rate at which data is coming into the organization or changing. So you work on the open source project Amundsen to address these data discovery and metadata issues, and you are building a company around it. Tell me about the productization of Amundsen. What's the diff between the open source project and the closed source product? For sure. So, like I was saying, starting from the problem: I saw this problem at Lyft, and what was happening there is that there were these gossip protocols being built in Slack and through shoulder tapping, where people were asking each other what data exists. I remember data scientists trying to optimize ETAs, and they didn't know what the source of truth for ETAs was, which was one of the core metrics for Lyft. And worst of all, data got delayed, deprecated, or shut off, and the analysts and data scientists were the last ones to find out. So I co-created the Amundsen project to help solve that problem. It is an open source data discovery and catalog solution. The product gets metadata from various different sources, so it hooks onto your data warehouse, your HR system if you give permissions, and your BI tools,
and is able to bring all this information together to power a view of what's trustworthy, based on when it was last updated, who else is using it, what's built on top of it, and what conversations are happening about it. So it's an automated way of augmenting documentation and powering that experience of what is trustworthy, rather than relying on a single person, a data steward or volunteer, who's assigned to keep something up to date on an ongoing basis. So Amundsen, and I'll get to your question in a moment, is pretty successful: it has seven hundred and fifty users every week at Lyft, and over seventy-five percent of data scientists, data analysts, and data engineers at Lyft use it every week. Amundsen is used by more than thirty-five companies in the open source community, including ING, Square, Brex, Asana, Snap, and many more, and Conway has eighty percent of the company using Amundsen every month. What Stemma does is provide a managed version of the catalog that is a superset of what the Amundsen project provides, with two specific additions. The first one is enterprise management: it has super easy deployment with enterprise-grade security. The second thing it provides is more intelligence: it uses existing metadata to infer a richer and more personalized experience based on the user's role, activity, and what is happening in the organization. Those are the key ways we enable more value for organizations that choose Stemma. If I wanted to deploy Amundsen within my company to have my data indexed, what do I need to do? Yeah, so if you were to deploy the open source project, the first thing you usually end up doing is reading the documentation we have online. You also end up joining the Slack channel. But in terms of tactical steps, Amundsen has four parts to it: three of them are services and one of them is a library. The first of the three services is a front-end service, which is the UI that powers this product. The second service is a metadata service, which is powered by a graph database.
This metadata service has all the core metadata that is used in the product, so it has information about what tables exist, what columns, and what the various relationships between them are. What information are people querying? Who owns a particular table or dashboard? It contains information about dashboards and what tables they are built off. It has nodes for people, so if I am in one team, I'm linked to another person on my same team, and it also has information about what I as a person use every day. So that's the graph back end. The third service is a search service, which uses Elasticsearch in the back end to power the search experience, so when someone comes in and searches for ETA, they know what's behind it. It gets the metadata from the metadata service to surface a notion of trustworthiness, a PageRank-style implementation for data. And last is the databuilder library, which gives you the mechanism to pull this metadata. We also have beta functionality to push this metadata into the product. So this is a library that you integrate with to ingest data from your data warehouse, your BI tool, your HR system, and so on and so forth. So these are the four pieces. As you deploy it, the first thing you do is set it up, and we have very easy-to-set-up install scripts and Kubernetes containers that you can quickly deploy using Docker on your local machine and connect to your existing data warehouse to start. Once you're done with that, you would deploy each of these components in a more scalable, distributed manner and stand it up. How does a query against Amundsen work? So the two main query-serving aspects of Amundsen are the search engine and the graph. To give you a little more flavor of the product, it is more like a Google search for your data. It doesn't have any capability to query the data itself. So, for example, Amundsen doesn't provide you a JDBC connector to query the data
that's in the back end. Our sole goal is to take this haystack of data that has various degrees of trust and provide you a system of information, as well as a ranking of what could be trustworthy based on some automated signals. So the goal here is that you come to the product when you start. Say you're developing a new ETL job or a new insight. You come to the product and search for something like ETA, and this hits the search service. You get information in a ranked order of what is trustworthy, based on how it's being used in the organization, by whom, how often it's updated, et cetera. You click on a result and then you get to a table detail page. This table detail page has all the information you would want in order to understand: can I trust it, how often is it getting used, and how do I use it? So it has a description, which can come from an existing source like the data warehouse's description as well as from curated metadata, so you can edit this description. It has information that the Airflow integration brings about how often this table is updated and when it was last updated. We parse the query logs, so it has information about who the people are who frequently use this data. And then we obviously have information about the columns, their types, and their descriptions, and we generate column stats, so you can see the standard deviation or the number of distinct values in a particular column. Lastly, we have some information about lineage, which is again parsed from the query logs. You can see a preview of the data if you have access to it, and if all of this looks good then you go to the explore phase, which actually takes you out. That's the end of the discovery journey and the start of the exploration journey. From that point you leave Amundsen and go to the next tool in the process, which is usually a BI tool, so that could be Mode, Looker, Tableau, Apache Superset, things of that nature.
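As a rough illustration of the ranked search described above, here is a minimal sketch in Python. It is not Amundsen's actual ranking or API: the TableMetadata fields, the trust_score formula, and the search helper are all assumptions made up for this example. It only shows the general idea of matching a search term against table metadata and ordering the matches by automated usage and freshness signals.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class TableMetadata:
    """Hypothetical record of the signals a catalog might hold for one table."""
    name: str
    description: str
    weekly_readers: int       # distinct users who queried the table this week
    last_updated: datetime    # when the owning pipeline last wrote the table


def trust_score(table: TableMetadata, now: datetime) -> float:
    """Illustrative score only: more readers and fresher data rank higher."""
    usage = math.log1p(table.weekly_readers)
    age_days = max((now - table.last_updated).total_seconds() / 86400.0, 0.0)
    freshness = 1.0 / (1.0 + age_days)  # 1.0 when just updated, decays with age
    return usage * (0.5 + 0.5 * freshness)


def search(tables: list[TableMetadata], term: str, now: datetime) -> list[TableMetadata]:
    """Return tables whose name or description mentions the term, most trusted first."""
    needle = term.lower()
    matches = [
        t for t in tables
        if needle in t.name.lower() or needle in t.description.lower()
    ]
    return sorted(matches, key=lambda t: trust_score(t, now), reverse=True)
```

With a scoring function along these lines, a heavily used table that was refreshed an hour ago would be listed ahead of a rarely read scratch table that has not been updated in months, which is the kind of ordering the search service is described as producing for a query like "ETA".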
That's how that interaction works. Can you extend that to, like, if I wanted to get some data that's indexed in Amundsen and run a Spark job against it, what am I doing? Yeah, we've had requests to export the data that's in Amundsen into a relational form. So maybe you take all this metadata about query usage history and who's using it, export that to a data warehouse, and then run analysis on it. Maybe you build a graph of the most important data sets being queried in the organization, or of which data sets that I own have been delayed by more than an hour in the last month, things of that nature. For that, what Amundsen has today is the back-end graph, and the default for that is Neo4j. We also support Apache Atlas as well as Amazon Neptune, and there's work happening to support an RDS for it, so that the metadata, instead of being stored in a graph, would be stored in a relational database, while the graph models will still apply. That will allow Amundsen to query that metadata and provide all the experience that it provides to data engineers as well as data scientists and data analysts, but also allow further analysis to be done on top of the metadata directly on the back-end store. That way you don't have to export it, and you're able to do the same analysis you would do on the data warehouse, using an analytical tool on the back-end database of the product. Security threats in cloud native environments move fast, which means that security teams need to have the same visibility into their infrastructure, network, and applications as developers and operations. With Datadog Security Monitoring, engineering teams can easily detect malicious activity in real time before it affects their customers. Use out-of-the-box detection rules and detailed observability data in one unified platform to investigate security attacks.
See it in action by signing up for a live security demo and receive a Datadog t-shirt by visiting software engineering daily dot com slash datadog security. That's software engineering daily dot com slash datadog security to sign up for a live security demo and receive a Datadog t-shirt. Are you building cloud applications with a distributed team? Check out Teleport, an open source identity-aware access proxy for cloud resources. Teleport provides secure access to anything running somewhere behind NAT: SSH servers, Kubernetes clusters, internal web apps, and databases. Teleport gives engineers superpowers. Get access to everything via single sign-on with multi-factor. List and see all SSH servers, Kubernetes clusters, and databases available to you. Get instant access to them all using the tools you already have. Teleport ensures best security practices like role-based access, preventing data exfiltration, visibility, and ensuring compliance. Best of all, Teleport doesn't get in the way. Download Teleport at software engineering daily dot com slash teleport. That's software engineering daily dot com slash teleport. Can you give a little bit more context on where Amundsen fits into the current data engineering world? So among the popular tools of the data warehousing world, there's dbt and, you know, Census and Fivetran, all these different things that stick together. What is Amundsen's place? Where does it fit into the workflow? That's an excellent question. So what's happening with data engineers is that they are constantly bogged down with keeping everyone informed about upcoming changes and the current status of the data. At Lyft and many companies in the open source community, a data engineer will build pipelines using dbt and orchestrate them through Airflow, and then as the company evolves, sometimes the data is late and they have to notify all their stakeholders, particularly data scientists and analysts, that this thing is going to be late, right?
But the reason these things happen is that they, and the people upstream of them, who may be product engineers or software engineers, don't quite know how their data is being used or who is using it. So the most common way of announcing changes is to spray and pray: data engineers end up sending out blanket emails that no one reads, and when you make that change to the data pipeline and something breaks or a downstream pipeline is delayed, it surprises people. So this fits in two places. The first and most impactful place is when you have an existing pipeline that you are evolving: you get to see who is using the data that you are producing and in what ways. This includes ETL pipelines that have been built on the data that you are producing, dashboards that exist on top of the data you're producing, and ad hoc queries that are being run on your data. So when you are changing something that you own, you have a way to know who's using it and in what ways, and a way to notify them so that there are no surprises. That's the first, most impactful way this fits in their world. The second is when you're developing a new pipeline. You may be working with a lot of event data that's coming from upstream eventing systems like Segment or Heap or something like that. You may have to use data that's coming from third parties, so your CRM system, things of that nature. And lastly, you may be dealing with a lot of CDC data that's coming from production databases and getting replicated into the data warehouse. So the second place where this fits in is that data engineers themselves have a discovery problem: understanding what data is available in the data warehouse, whether they can trust it, and how to use it. So it's helping them discover the data, understand it, and then start to build their pipeline in whatever ETL tool they're using. While we're on the subject, how has data engineering changed since you left
Lyft, or since you were heavily involved with data engineering? It feels like the data engineering world is accelerating. Yeah, absolutely. I would say it's changed in two ways. One is that there are tools for data engineers that are top of mind that weren't in the past. These are products that help build ETL pipelines very quickly and easily, and products like dbt go into that space. There is also easier integration to take these existing pipelines that were built as prototypes and orchestrate them, that is, schedule them in a production-ready manner through things like Airflow. That wasn't as common a pattern when we last spoke almost two years ago, but that integration has become a lot more seamless. So that's one area: tools for data engineers to become more productive. Another area where I'm seeing a lot of investment, and which is where Stemma and the Amundsen project both fall, is data operations. Once you have made a pipeline, what do you do when you have to evolve it? This involves change management. This involves making sure that data is getting delivered on time, so reliability. This involves data quality. That's the second category. And the third one, which is slightly tangential but still related, is a little bit more the democratization of writing ETL (one tool, in my opinion, is a pioneer of this), where analysts and data scientists are able to write self-service ETLs so that you don't have to rely on data engineers. So those are three big areas of investment that I have seen over the last two years. What do you see as the most outstanding problems in the world of data engineering? In my mind, the biggest problems in data engineering that remain unsolved are still around
maintenance and upkeep of existing data pipelines, and these are the symptoms. They involve keeping stakeholders posted when a pipeline is running late and telling them when it's supposed to land. That's a highly manual process, and you can never keep everyone fully up to date: you always miss people, and you spam people, things of that nature. It also involves knowing what changes are coming that are going to impact me as a data engineer on the pipelines I operate, as well as, for the changes I am making, who I need to convey them to so they aren't surprised. And then making sure that the data I do own is getting delivered reliably and in a timely manner. So those are the places where gaps still remain, and there are a bunch of different products in the space that help with them. Ours is one of them, and I'm happy to chat more about what's evolved here for other products as well. I would like to go inside the perspective of starting a company around Amundsen. I know there are some other data catalog, data discovery, and metadata systems, and I'd love to hear about your competitive stance. How do you compare to the other products that are out there, and what's your strategy for competing with them? Absolutely. So the problem that I saw at Lyft, that I created Amundsen for, exists in a bunch of different companies. The status quo for solving that problem, when I evaluated existing tools at Lyft, was curation. Either you have a person whose full-time job is to make sure this metadata is up to date (and metadata here involves descriptions, the cadence of delivery, what data quality rules apply, things of that nature), or you assign a volunteer that responsibility, someone who already has a full-time job doing something else, to keep this metadata up to date. The biggest problem with that is that this information gets out of date really quickly.
People already have other jobs if they are volunteers, and if they have this as a full-time job, they almost always don't have the full context, so they then have to go to other domain experts to fill this information in. That was the key thing that was missing in existing tools, and in my opinion it continues to be missing today: there's a heavy curation angle and a full reliance on it, which doesn't work in organizations where change is happening really quickly and you are democratizing access to data so more people can make data-driven decisions themselves. Coming to today, the key place where Stemma differentiates itself is that we uniquely augment your data with automated documentation, so you don't have to document every single piece of information and certify every single data set. We still support curation, and this is very important when you're changing user behavior: if you're migrating from one data warehouse to another, curation is the way you move people off. But at the same time, for the large majority of your data warehouse, you don't have to go document every single field. So that's the first thing. The second thing I find is that the data ecosystem is always evolving, and so the integrations and the metadata obtained ten years ago, or even five years ago, aren't the state-of-the-art integrations today. Today you want metadata from your Snowflake, your dbt, your Airflow, and you want to bring that in and stitch together a model of what is trustworthy in the organization. So it's very important for any data catalog, Stemma included, to have the most integrations and keep them up to date. I think that's one place where having a vibrant community helps: Stemma's product is backed by Amundsen. Amundsen has the largest open source data catalog community, with over sixteen hundred people, and we have the most integrations. So keeping them up to date, so they work for the integrations
that organizations have today, and evolve as they invest in their future, is very important. Those are the two ways that we differentiate. Have you managed to get some early, like, beta customers? At this point I'm not in a place to share the names of the customers. We do have early customers, and they come from three different categories. About a third of them are Amundsen-aware: they know the project and they want it, but may not have the investment they need in order to deploy it themselves. They use us to provide a managed offering of Amundsen, with the enterprise management as well as richer metadata through intelligence. The second category is people who aren't Amundsen-aware but want a data catalog; Amundsen is the leading open source data catalog, and they end up choosing to deploy it in their organizations. The third category is organizations that already use Amundsen and have been successful with it, and they are moving to Stemma in order to get the enterprise management or to get some of the richer intelligence features that we are working on, which will really bring value to them. In your conversations with these early customers, was there any feedback they gave you that changed your mind about your perspective on the product and the product direction? Absolutely. One thing that's clear to me from working on Amundsen and on Stemma is that users now demand a consumer-like experience from their enterprise products. Slack really started this trend. It's an area that I and Stemma continue to invest in, both in the open source project and in the company, to make that a really seamless experience, especially for a product like ours. We have a lot of metadata we show: the descriptions, the owners, frequent users, when it was last updated, how often it gets updated, the columns, the fields, the stats, the preview,
lineage, all this stuff. It's very important that we understand these use cases and user journeys and invest our time in making that a very clean, curated experience in the product, so that users are not overwhelmed with information. The second thing that I've learned is that it's very important to create a managed offering that really lowers the time to deployment. Taking your data warehouse and connecting it to your Stemma install should be really quick and easy, and that's an area where we continue to invest our time. And so what's the infrastructure that you've built to have a hosted solution? Like, when I spin up an instance of Amundsen on the hosted version of your product, are you built completely on AWS? Give me a description of the underlying infrastructure you're using. Yeah. So if you choose Stemma, our goal is that you do as little work as needed for us to provide to you that same experience that has worked for Amundsen. What we do is a managed offering in which we deploy all the parts of Amundsen that you would have deployed, and manage them. We provide a Kubernetes-based deployment for you where we run each of the three services, and then we write the integrations. Most of the companies we're working with have integrations that work out of the box even in Amundsen, but in some cases we have to write specific integrations, say for an organization that has its own time series database that's specific to them, or custom metadata that exists in a GitHub repository or in a file. So we integrate those as well. All of our infrastructure is based on AWS right now, and we support deploying through Kubernetes or CloudFormation templates, depending on the customer. You can learn as much fancy theory as you want, but at the end of the day machine learning is still ninety percent data cleaning and infrastructure work, and doing it all manually is exhausting.
It's not likely to make its way to production, especially when your data, your models, and your code are constantly changing. Pachyderm is an easy-to-use MLOps platform that empowers anyone to build scalable end-to-end machine learning workflows, regardless of whatever language or framework they're built on. Pachyderm provides Git-like data versioning and lineage to automatically track every data change and final output result, meaning you'll always know exactly what data was used to build that latest model, automatically. Right now, SE Daily listeners can get over four hundred dollars in credits on Pachyderm Hub. Sign up today and build production-grade data science workflows in minutes without ever having to configure a single piece of infrastructure. Imagine being able to automate your entire data science workflow and still reproduce any result from any point in seconds with complete confidence. Head over to pachyderm.com/sedaily to get over four hundred dollars in free credits. But you'll want to hurry, because this offer only lasts for a limited time. That's pachyderm.com/sedaily, P-A-C-H-Y-D-E-R-M dot com slash sedaily. Scaling a SQL cluster has historically been a difficult task. CockroachDB makes scaling your relational database much easier. CockroachDB is a distributed SQL database that makes it simple to build resilient, scalable applications quickly. CockroachDB is Postgres-compatible, giving the same familiar SQL interface that database developers have used for years. But unlike older databases, scaling with CockroachDB is handled within the database itself, so you don't need to manage shards from your client application. And because the data is distributed, you won't lose data if a machine or data center goes down. CockroachDB is resilient and adaptable to any environment. You can host it on-prem, you can run it in a hybrid cloud, and you can even deploy across multiple clouds.
Some of the world's largest banks, massive online retailers, popular gaming platforms, and developers from companies of all sizes trust CockroachDB with their most critical data. Sign up for a free thirty day trial and get a free T-shirt at cockroachlabs.com/sedaily. Thanks to Cockroach Labs for being a sponsor, and nice work with CockroachDB. Have there been any particularly difficult engineering problems that you've encountered while turning Amundsen into a cloud product? So Amundsen was built for the cloud at Lyft; that's how the deployment was there. There were a lot of learnings in making it a cloud-first, cloud-native product to begin with, and I'm happy to talk about them. The place where I see us as a community, the community and some committers who work at Stemma, spending our time is this. Currently we have metadata around application context: what data exists, what the fields are, and what they mean. We have added information about behavior, which is who's using this data: what jobs or what people. And the last one is change, which is how this has evolved over time, so that you can travel back a month to see what the lineage was at that time, what the lineage is right now, and what the diff is between those two lineages. The change part actually remains uncaptured. It's something that we don't do today, and it's something that we would like to do in the future. That part is still very hard, just because of the veracity and detail that we have to capture and track over time, and bringing that change information into the open source project and Stemma remains a constant endeavor. In terms of what we learned at Lyft when we built Amundsen in the cloud, one thing that we did right, in my opinion, was to make the front end and all the services configurable. There's one problem that open source products have very commonly.
It is that you end up maintaining a fork in the organization. Before Lyft, I worked as an engineer and Spark developer at Cloudera, and I saw this very commonly in customers' environments as well, including at Cloudera, where you would have Hadoop and then you end up forking your own thing. The engineering team at Lyft was amazing, and they spent a lot of time and energy making sure that forking was kept to a minimum, because upgrading these forks becomes a huge hassle. So the repo structure at Lyft, which remains consistent for the open source companies, is that you have a front end configuration placed in a separate repo that overlays on top of the Amundsen front end repo. A lot of thought was put into the design so that users don't have to maintain and manage their own forks; it is simply a configuration that gets overlaid. And we continue to make investments in making that deployment really easy. Two of the more recent changes that we've made in the open source project: one is that we have consolidated all these separate repos. Amundsen used to be four different repos, one for each of the services and one for databuilder, and then we had one umbrella repo that submoduled all this stuff. We moved all that into one repo, so whether you're developing or deploying this as a user, it's really easy to wrap your head around, maintain, and manage. So that's one recent change. The second change is that we have published a deployment guide, and there is now an Amundsen custom repo, which makes that act of overlaying also very easy. We provide some templates for configuration files; all you have to do is fill that information in. If you use this custom repo along with the new monorepo of Amundsen, it makes it really easy for you to overlay these config changes and deploy in your own local environment the way you want to. Are you already starting to think about adjacencies to expand Amundsen into?
Or do you think just the data discovery and metadata challenges, and kind of the productization-of-open-source challenges you have ahead of you, are sort of enough to keep you busy for a long time? Yeah, so the space I am in is that of helping organizations, and users in the organization, trust their data. There are three categories of problems that need to be solved here. The first one is data governance: am I using the right data for the right purpose? As an analyst or data scientist who is onboarding, am I getting ramped up quickly, based on what my team uses and what my team owns? That's one category. The second category is data quality, which involves putting certain expectations on your data and making sure they are getting met on an ongoing basis; sometimes these expectations may be automatically suggested to you. And the last one is data operations, where you are investing time in products that make sure data is getting delivered reliably, on time, every day, and that it's very easy to keep doing that on an ongoing basis by staying on top of changes that are happening. We are in the data governance space, the first pillar, and we want to do a good job in that space before we venture anywhere else. There is enough here for us to invest our time in making organizations successful. It is also the place where most organizations have the biggest gap. So my goal, and the goal of the Amundsen project as well as Stemma, is to make this a non-problem for organizations. That means integrating with other products, like data quality tools, so that Amundsen and Stemma become a single pane of glass for you to see: okay, what does this data mean, what quality checks were run on it, and can I trust it based on all that information? That's our focus. In terms of servicing more legacy companies, do they have a heterogeneity of data storage mechanisms that makes it difficult to integrate with their infrastructure to the extent that you would like to?
That's a good question. Yes, the older the organization, the more disparate and diverse the ecosystem, and that could come from storage systems, BI tools, ETL tools, and so on and so forth. I have not found that to be a problem. In fact, that's the place where the product really shines, because where you have a more fragmented view of the world, the need to understand what exists and is trustworthy outside of my blinders is even more important. So I think the larger the organization and the more fragmented it is, the more value there is in a product like a data catalog that can help your users automatically uncover what is out there, what could be trustworthy, who to talk to, and who the main users are that I should be asking certain questions. Do you have anything else you can share about the ways that a company changes after it adopts Amundsen or adopts Stemma, the company you're building? Yeah, absolutely. So at Lyft and in the open source companies, over a third of data analysts' and data scientists' time is spent on finding and validating trustworthy data. After deploying a product like Amundsen, they've seen an improvement in the productivity of data analysts and data scientists of twenty to twenty five percent. That's because you've provided them the context to find trustworthy data and see where it's coming from and who it's used by. That number is very high, and it leads to very strong adoption of a product like this. At Lyft, seventy five percent of data scientists, analysts, and data engineers use it every week, and we are seeing over seven hundred users at ING. Eighty percent of the entirety of conway uses this product. So it's very sticky adoption, because these users consider a catalog product like Amundsen to be core to their workflow. They use it both when creating new work, ETL pipelines as well as analyses, and when their existing work needs to evolve or when they need to make sure changes that are happening are being communicated.
So it becomes a very, very core part of their workflow. All right. Well, as we begin to wind down, any predictions about the world of data engineering and data infrastructure that you've learned from your work? I'd say a few things. One is that every data user will become a data engineer. They take on responsibilities to understand and make decisions based on data, but often they have to modify the data for their own needs. I predict that more democratized tools for creating pipelines, for managing and maintaining pipelines, and for change-managing those pipelines, built for a more diverse set of personas and skills, are going to happen. The second is that there are going to be product experiences built that help this, call them whatever you may, new category of data engineers, who are broader than the currently skilled data engineers, with additional tools for maintaining their so-called pipelines, onboarding users to them, seeing how they're being used, and evolving them, which are going to make that process really smooth. So one is a philosophical change, and the other would require a bunch of tooling and product experience changes in order for these broader, more diversely skilled people to be able to write data-engineering-style pipelines. So just to wrap up: if we talk about a world of improved data infrastructure, a world where companies are using Stemma or other modern tools, how does the life of the data analyst improve? Absolutely. So the problem with data analysts is that they are under extreme pressure to deliver reports and models, and they inadvertently end up using the wrong source or the wrong logic to do this work, because they don't have the entire context about what's out there and what's trustworthy. What's worse is that data keeps changing underneath them. So something that was trustworthy
yesterday may not be today. Data gets delayed, deprecated, or completely shut off, and analysts and data scientists are the last ones to find out. So the place where a product like this helps them is that they always know the up-to-date status of the data through the augmented, automated documentation. You know how it's being used and when it was last updated, and you don't have to rely on pinging a data engineer to ask, hey, is this getting delivered on time? What does this column mean? That is the change that this product brings to their day-to-day workflow. This makes analysts and data scientists over twenty percent more productive on an ongoing basis. Okay, so zooming out, let's just summarize what we've been talking about. Can you review the problems of data discovery, what you've been working on, and the progress you've made so far? Absolutely. Yeah, so the key problem is that companies are collecting more data than ever before and processing more data than ever before, but of the users, data engineers, data analysts, data scientists, very few know what exists, what's trustworthy, and how to use it. This problem is severe. It existed at Lyft, where I co-created the Amundsen project, and it exists in companies of any size once you reach a couple hundred employees. Amundsen is the leading data catalog. It was started at Lyft. It's used by more than thirty five companies, Instacart, Square, ING, Brex, and many more, and users of Amundsen see a twenty to twenty five percent improvement in analyst and data scientist productivity after deploying it. I started Stemma very recently. We came out of stealth earlier this month, and Stemma provides managed Amundsen, which uniquely augments data with automated documentation and provides enterprise-grade security that we manage, so you can get the benefits of Amundsen and more from a managed offering. So if you're interested, check out Amundsen at amundsen.io, as well as Stemma at stemma.ai. Mark, thanks for coming back on the show.
Thank you for having me.

An Open Source Toolchain For Natural Language Processing From Explosion AI

The Python Podcast.__init__

51:19 min | 1 year ago

"Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great. When you're ready to launch your next app or want to try a project you hear about on the show, you'll need somewhere to deploy it, so take a look at our friends over at Linode. With two hundred gigabit private networking, node balancers, a forty gigabit public network, fast object storage, and a brand new managed Kubernetes platform, all controlled by a convenient API, you've got everything you need to scale up. And for your tasks that need fast computation, such as training machine learning models or running your CI and CD pipelines, they've got dedicated CPU and GPU instances. Go to pythonpodcast.com/linode, that's L-I-N-O-D-E, today to get a twenty dollar credit and launch a new server in under a minute. And don't forget to thank them for their continued support of this show. You listen to this show to learn and stay up to date with the ways that Python is being used, including the latest in machine learning and data analysis. For even more opportunities to meet, listen, and learn from your peers, you don't want to miss out on great conferences, and now the events are coming to you with no travel necessary. We have partnered with organizations such as ODSC and Data Council. Upcoming events include the Observe twenty twenty virtual conference on April sixth and ODSC East, which has also gone virtual, starting April sixteenth. Go to pythonpodcast.com/conferences to learn more about these and other events and take advantage of our partner discounts to save money when you register today. Your host as usual is Tobias Macey, and today I'm interviewing Matthew Honnibal about the Thinc and Prodigy tools and an update on spaCy. So Matthew, can you start by introducing yourself? Hi Tobias, thanks for having me again. I'm the creator of the spaCy natural language processing library. It's a popular tool for working with text in Python.
So it's often used for information extraction projects and also data science projects to understand text. And I'm the co-founder of a company, Explosion AI. We also make an annotation tool, Prodigy, and we've recently updated and released the machine learning component of spaCy, its library Thinc, as well, which is one of the things I'm excited to talk to you about today. And you were actually on the podcast about three and a half years ago to talk about spaCy, so I'm definitely excited to hear about where things have gone since then. But before that, can you share how you first got introduced to Python? Sure. So like a lot of people, I came to problems that I wanted to solve with programming before I came to decisions about languages and those sorts of technical things. Basically, I started out in linguistics. I was doing research, and I wanted to process text to answer questions about grammar, related to the linguistic theory that I was working with. It just started from there, and I started writing small scripts and everything. I actually started out with Perl and quite quickly switched across to Python. This was around two thousand four, two thousand five, and since then I've really worked with Python for pretty much my whole career, except that eventually I realized that I wanted to write programs that were faster, and in particular more memory efficient, so that I could write, basically, data structures that would work well with the problems I was working on. So then I started working with Cython and found it a really good compromise for that, because for some problems it is just a lot easier if you can sit down and plan out the memory ahead of time and reason about how much you can hold in memory. So that very much informed how spaCy was written.
Because the library is really implemented in Cython rather than in Python directly. And at the time when we spoke, the Natural Language Toolkit was still sort of the de facto standard for anybody who wanted to do any sort of natural language processing, but these days, most of the time when I see references to people doing any sort of NLP, spaCy has become the more prominent library for that. So I'm curious what your sense of that has been, as the creator and maintainer of spaCy, and how things have progressed over the past few years in terms of the level of popularity and adoption of your library. So NLTK is still an extremely popular and useful library, and they really do different things. I would never want to say that there's only one way to do it, or that other tools are deprecated or something like this. There's still a lot of functionality and use cases where people find its approach useful: not having to initialize much or load something large, just basically all these utility functions. So it's still certainly a very popular tool, but I have been pleased to see that a lot of people have been finding spaCy useful, especially for the pretrained processing pipelines and also the data structures for working with different annotations. One thing that I think spaCy is quite good at: if you've got annotations for, like, entities in text, and you want to do things like get relations between them, find the parts of speech, or get back to the text, the Doc interface in spaCy makes it quite easy to work with the annotation layers together, and people are finding that quite useful. Also the processing pipelines: being able to string together rule-based matching with entity recognition, then apply some other rules on top, and get out the document at the end, is something that I think spaCy is quite strong at as well. And so that's why people are using it.
These processing pipelines. And spaCy has a little bit more of an industrial use case focus, so it's more oriented towards production use cases. I think there are a lot of companies who've basically been looking for a tool that has that focus, rather than one which is more oriented towards teaching or research. And for the work that you're doing at Explosion, you mentioned that you founded the company around the same time that we talked, as a follow-on from your work on spaCy. I'm wondering if you can give a bit of an overview of the mission for that company and highlight some of the different projects that you've been working on there. Yes. So when we first started out with Explosion, we did some consulting projects for six or seven months. I started it together with my co-founder, Ines, and that client work was really a good way to basically understand what sort of problems people had with NLP and figure out what we wanted to do next. The product we ended up releasing was this annotation tool, Prodigy. That's been going very well, and it has really been funding our activities in the company since. The way that we see things is that one of the needs people have with machine learning technologies is to be able to develop them closely themselves. Around when we were founding Explosion, there were a lot of people who were thinking that AI technologies like NLP would be something that you consumed as a cloud service, and that you would really have very few developers working closely with these technologies. Our bet was different. Our bet was that this is a bit more like web development, in that to really make use of the technology effectively in projects and products, people would need to work with it closely, and there would be a lot of developers who wanted to understand
how all of these technologies work together, and so open source and sort of self-run technologies would be the way people wanted to build their projects. And I think that's largely true. That's largely the way people have been working with AI: using open source libraries, or at least technologies they can run themselves and can really understand in detail. So we wanted to basically have tooling with that sort of viewpoint, so that people could run their projects, basically move faster, and try things out. And so going back to spaCy: in the last three and a half years there have been a lot of new innovations and shifts in direction for the study and usage of natural language, with models such as BERT and GPT-2 coming out. I'm curious how that has influenced the direction or the implementation of spaCy itself, and any other product developments or project updates that you think are worth noting that happened in that timeframe. Yes, so that's definitely been a very exciting thing that's been happening with natural language processing. Essentially, all of these models give the ability to have very accurate models through language model pretraining. A problem for natural language processing technologies has always been this problem that's called the knowledge acquisition bottleneck. That is, there's so much knowledge that's kind of in the background about language, and that a model that has to work with language has to understand about the world, in order to get any specific application done. Let's say you want to do something reasonably boring, like extract financial figures from some documents, say profit and loss statements from company filings. There are all of these other words and all these background things that the model has to understand something about in order to figure out which sentences are of interest and which sentences are providing that information.
And if you have a person that you're teaching to do this, along with all of that knowledge about the world and general intelligence, they have this knowledge of language, which means they only have to learn a little bit about the task before they can do it very accurately. Whereas if you have a model that has to see all of these words for the first time, you need an enormous number of examples to teach it this boring task. It would be like having a new employee where, instead of just teaching them what to do, you have to teach them English as well. That's obviously a huge learning curve, where you can't import the general capability of English and just teach your task on top. That's always been an identified problem in natural language processing models, and finally, over the last couple of years, we've really had a big breakthrough in how this is done. These models basically start by learning to predict the next word, or some task similar to it, from large bodies of text, and you can start off with that knowledge and apply it to some specific task. Now, a challenge at the moment is that these models have largely been developed by research labs for whom compute costs are not a consideration at all, and they've especially been developed to favor GPU and TPU devices. This means that if you just run these models straight from research at the moment, they're really quite expensive to run. If you want to process text, and you want to run the processing several times because you want to keep experimenting, the cost of running those models starts to add up very quickly. You also have problems with serving them, because if you've got a model that requires a GPU device, the latency of using it effectively starts to get quite significant, because you need to batch up a lot of examples. So it's been very exciting to see these breakthroughs happen, but the challenge has been basically adjusting the architectures that we have and finding a compromise between models which are still cheap to run, models which are still low enough latency, while being able to take advantage of the high accuracy from these new techniques.
And these problems are constantly evolving. There's more and more that's come out recently about making these models smaller and more efficient as well. And another element of the large models that I've seen referenced is doing things like transfer learning, being able to take the existing models and then swap out a couple of the layers to make them fit your specific use case. Is that something that spaCy is used for in that context as well, or is that sort of outside of the scope? So we have a command, spacy pretrain, with which you can run language model pretraining, even basically from scratch. But you can also, especially with Thinc, quite easily plug these together and take advantage of these technologies in spaCy. The reason that we've sort of redesigned Thinc was really to take advantage of this type of model better. So one of the challenges that's basically introduced by the new transformer models and the new ways of doing machine learning is this. When spaCy was first developed, I thought carefully about what the right level of abstraction was to present to developers so that they could take advantage of natural language processing technologies, basically which bits of complexity to shield them from and which bits to present as decisions that they'll be making. The level of abstraction that was more sensible when I was designing the library was to think at the component level and say, all right, this is an entity recognizer, and it does this task of assigning labels to text, and then you can combine it with, like, a tagger, you can combine it with a parser, and these are things which will analyze text.
And then you get back this Doc object, and you basically work with the Doc object from that. Now, with the neural network technologies, and in particular with the transfer learning technologies, the level of abstraction that's most handy for developers to work with is a little bit different, because you want to be able to take these models and basically be thinking about the tensors, saying, all right, I'll feed this bit of this word representation into this layer, and I'll share that information with this other layer. That's really the level that developers want to be working at now, because the knowledge of these models is pretty detailed in the community. There are a lot of people who understand these things pretty well, and so the level of abstraction is different. So this is something that we've wanted to adjust in the library, to make it easier to work at that level, while also making sure that the library does the job it originally did as well, of working at the pipeline level. And another development that has happened since we last talked is the fact that you've added support for a number of other languages, whereas at the time I believe it was only English and German. I'm curious what you have found to be some of the most challenging aspects of building those additional models for different languages, and any challenges that you see in terms of being able to support things like symbolic languages like Japanese or Korean, or right-to-left languages such as Arabic. Yes, so in terms of supporting more languages, I would say that the two big challenges are DevOps and data. The first difficult challenge is simply that, as we've added more languages, the operational complexity of training all of the models grows, along with the automation required to have the jobs complete well, and to have pipelines where, reliably and with low manual effort, we can get all of these artifacts
tested and retrained, and that was something that took longer than I thought it might. The training jobs take a fair bit of time, and for each individual training job you need to be able to resume it and so on. So I tried a number of technologies, like Airflow, Luigi, and things, and ended up with basically a setup that works well for us, but this was definitely a challenge. It took a fair bit of time setting up these things, partly because these tasks aren't ones which I had such deep expertise in, so there was a bit of a learning curve there as well. And then the other one is just the data resources. We want to make sure that, for all of these languages, when we produce a model, it's basically useful to people, and that the models don't just exist for the sake of it. That's been difficult, especially with different datasets having inconsistent licensing and so on. We want to make sure that the models we produce are available for commercial use, and also that the data is good enough that the result is actually useful. One of the things that's changed since we last talked is that the Universal Dependencies corpora have gotten a lot better and gotten pretty consistent, so that's something we've been able to take advantage of in producing more of these models. And for being able to build out these models, as you said, one of the challenges is having the appropriate corpora. I imagine that another aspect is being able to label it effectively and find pre-labeled datasets, which I'm sure is where some of the inspiration for your Prodigy tool came from. So I'm wondering if you can talk through a bit of the motivation for creating that, and describe the use cases that it enables and the workflow for somebody using it. Definitely, questions about labeling data.
You know, some of the problems around this we saw when we were doing the consulting. It's definitely something that teams struggle with. So probably the most important thing that we felt we could offer, that was a little bit different and lacking in people's process, was, I guess you could say, more of an agile methodology for data labeling. The problem of labeling data can look like, well, you just pay and then tell somebody to apply the labeling scheme, and then the problem is just the task of getting things done. For some tasks it looks a little bit like that; some image tasks are a little bit more like that. But certainly for language, as soon as you come up with any labeling scheme and you start applying it to text, you very quickly realize that there are all these edge cases, and it's kind of edge cases all the way down. Even more importantly, you need to realize that there are ways you can adjust your labeling scheme that will hit a better compromise between what will be useful for the model, what will be useful for your end goal, what will be easy to annotate, and what the model will be able to annotate effectively. So just the other day we were working on a little Prodigy project, and we were annotating instances of ingredients in cooking discussions, because we wanted to see whether there are trends in what sort of food people are using, especially for home cooks and things. We wanted to see whether we could find trends in the frequencies with which different ingredients are mentioned. Simple enough, but then you quickly realize that there's not really a clear distinction between what's an ingredient and what's a finished product, because sometimes you might have something like chicken fillets, which could be an ingredient in a recipe or could be the recipe itself.
So there are all sorts of other examples like this, where you're kind of not sure of the boundaries, and so you're always making these decisions when you're annotating on any project. And that means that you have to basically take a pretty flexible view of what you're doing. It means that you have to be able to start and stop the annotations, and look at the data, and have it be basically an integrated process. That's really what we did with Prodigy. We made sure that it was a tool that was fully scriptable, so you can really have control of the annotation process yourself, and you'll be able to build out whatever capabilities of automation you need as well. So you can drive it from Python, and if you can basically write a function in Python that generates the data, then that will be something you can quite easily put into a little function and then have it run in a web browser for you to click through, with the results saved to a local database. And you can make different choices if you want to scale it out. For instance, you can have the database saved to, like, a MySQL instance instead of a local SQLite file. You can host the application in different ways. But at the core of it, it's a tool for any data scientist working individually. It's a really quick way to be able to build out these experiments yourself and try different things, so that you don't have the stumbling block where, as soon as you need some small amount of annotation, you hit a sort of purchase block and you have to do something different: you have to go to your team and, you know, basically apply for funding to send it out to an external labeling service. Instead, it's just something you can do flexibly yourself.

And I know that there are some other labeling tools out there. I'm curious what you saw as being some of the lacking features or capabilities in the available market that necessitated building out Prodigy as an alternative to them.

So the number one thing was really the design of it as a developer tool, and as a scriptable developer tool,
because when we talked to people about their experience doing annotation and using annotation tooling, almost all of them had built annotation tools in house. And that was something that was worth thinking about: okay, so if this is the type of problem where people are very frequently motivated to write their own tools, you know, why would that be? And the simple answer is, oh well, nobody's come up with the tool that everybody needs. And I don't think it's quite that. I think it's that the needs are flexible and people want to have control of the process, because different needs are different. So we wanted to make sure that it was something that people could really work with, and that they could work with as developers. So, you know, the scriptability, and the fact that people interact with it programmatically and host it themselves, is something that we really wanted to build into the design. Most of the other tools are designed as web applications, and programming against a web application that you're not hosting is always going to be limited. And the other problem is data privacy. The vast majority of users really don't want to, and often simply can't, upload their text into some cloud service. And this makes a lot of sense to me. If I've got data on a platform that's private to me, I don't think that the vendor or those people should be sending that data to, you know, some external parties. And since then the regulation has also caught up with this intuition that I have of how things should work, and I think that that's right. And I think that the US is catching up with this as well, and will have rules that are more standardized, the way it works in Europe too.

Another thing that I saw that was appealing about Prodigy is the fact that it supports multiple different types of data for being able to label it. So it has capabilities for text, so that you can do things like named entity recognition, like you were referring to earlier, being able to say, in the example you gave, this is an ingredient versus
this is a finished product. But it also has support for being able to do labeling of images and segmentation of those images, to say, you know, this is a rectangular area, this is a polygonal area, and this is the label associated with it. And then you also have support for some other data types as well. I'm curious what you found to be some of the challenges of building a tool that supports different data types, and some of the value that you've seen come out of it.

Yes, so it's basically just trying to make the right compromises between what people need in different use cases, without diffusing too much and being less useful for any particular use case. So to be clear, I think there'll always be other tools that people want to use as well. You know, I think that one of the things that annotation tools, or actually tools in general, can get wrong is trying to be the one-stop shop for all cases in all situations. I think it's important to do the job that you set out for yourself well. But a lot of people have found it useful to have this variety of capabilities in Prodigy, so that you don't have to have very different tools just because you now have an image task as opposed to a text task. We've also introduced audio support recently as well, and I had some fun building that and getting the interface to be helpful for that. So one of the challenges has been designing workflows for things that we don't do ourselves. You know, we still don't do a lot of image work in terms of our actual projects, and we don't have as deep an expertise in them, so making sure we're basically doing something that's helpful to people without having a closer connection to it is, I think, something that we've had to think carefully about. And then there are the different data sizes and stuff. So obviously the size of the input for something like a video, image or audio is quite different from text,
and so we had to make some adjustments in the way the database works and stuff, to make sure that it's well accommodated.

And you mentioned some of the challenge of being able to allocate funding for working with an external labeling service, whereas you can have the capacity for doing your own labeling, at least on a small to medium scale. And I'm curious what you see as being a reasonable scale of data that can be handled by an individual or a small team, and at what point you think it's necessary to start working with some of these labeling services to handle more large-scale or more fine-grained labeling for the data that you need to use for building your models or building your products.

Well, so I think that you definitely can get something to production with basically just sort of ad hoc resources: yourself, your team, and like maybe some interns or some other junior people around the place. And so it depends on the task, and it depends on how much data is needed to get the models trained, because there's no one answer; different tasks have different complexities and things. But I would say the number of examples that you need to train a model is dropping all the time, because the transfer learning technologies are very good. So I would say, you know, especially now, you need less than ever. And if you find that you're needing hundreds of hours of annotation, then rather than just saying, well, this is our life, that's just how much we need, I would always be suggesting that you look at ways that you can redesign the models, because it may be that, you know, something's wrong with the way you're actually defining the problem. So I'll give you an example of this, one of the examples that I use in some of my talks. Imagine that what you wanted to do was extract information from crime reports,
and you wanted to fill out this database of, you know, the victim's name, the perpetrator's name if it's stated, the location where the event happened, the event type or something. So one way of doing this is very directly. You might say, all right, I'm going to do this labeling task directly: I'll label this span of text, John Smith, as victim, and then this other span of text as location of crime or something. And so, you know, that's definitely a way that you will be able to train the model, but you're coupling pieces of information. You're coupling the identification of John Smith as a person with the fact that the sentence is about a crime event, and with the fact that John Smith's role in that event is the role of victim. So if you factor those pieces of information out, you can often need less data, because the decision of, all right, is this a person or not a person, is, you know, basically easier, and it doesn't require as much information about the whole rest of the sentence. Similarly, the information of whether the sentence is about a crime or not about a crime is one bit of information that you can annotate over the whole sentence. And so if you annotate these separately, and you train the models separately, you can often need less data. And so you'll have some situations where people find the model isn't converging well, and the first instinct is to either try a different architecture or to annotate more data, when by far the best lever is one which people don't really have practice pulling, because it's not one which, you know, you'll have gotten from shared tasks, or from writing papers and things. And that lever is to redesign the task: how can I find a different way to either annotate less for the application, or just structure the models differently, so that they attack different parts of the task and define things differently?
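As a rough illustration of the factoring described here, the sketch below keeps three simple annotation layers apart (generic entity type, a per-sentence crime flag, and a role for each person) and only recombines them when filling the database record. It is a minimal sketch under stated assumptions: the class names, field names, label strings and the "Main Street" example are all hypothetical, and none of this is Prodigy or spaCy API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class EntitySpan:
    text: str
    label: str  # layer 1: generic entity type, e.g. "PERSON" or "LOCATION"


@dataclass
class SentenceAnnotation:
    text: str
    is_crime: bool  # layer 2: one yes/no decision over the whole sentence
    spans: List[EntitySpan] = field(default_factory=list)  # layer 1
    roles: Dict[str, str] = field(default_factory=dict)  # layer 3: person -> role


def to_record(ann: SentenceAnnotation) -> Optional[Dict[str, str]]:
    """Recombine the separately annotated layers into one database record."""
    if not ann.is_crime:
        return None  # sentence-level decision filters out unrelated text
    record: Dict[str, str] = {}
    for span in ann.spans:
        if span.label == "PERSON" and span.text in ann.roles:
            # e.g. roles["John Smith"] == "victim" gives record["victim"]
            record[ann.roles[span.text]] = span.text
        elif span.label == "LOCATION":
            record["location"] = span.text
    return record


# Hypothetical example sentence, for illustration only.
example = SentenceAnnotation(
    text="John Smith was robbed on Main Street.",
    is_crime=True,
    spans=[EntitySpan("John Smith", "PERSON"), EntitySpan("Main Street", "LOCATION")],
    roles={"John Smith": "victim"},
)
print(to_record(example))
```

Each layer is an easier decision than the coupled "victim" span label, which is the reason given above for needing less annotated data per model.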
Say, well, what if I did this as a sentence labeling task rather than labeling the words in the sentence? Would my application be able to do with that slightly less precise information? Well, maybe you'll find that the model converges much faster. So, to answer the question about when I would actually switch to a labeling service: I think, actually, I would use a labeling service basically never. The alternative, after you're past the prototype, is that I would actually be hiring people to do the work in house. They can be remote employees, or you could have people on freelance contracts, but I would always want them to be specific people that I can talk to, under the supervision of the project. Because after you get past the prototyping stage, the task of annotating the data isn't a discrete event, where you do it once, get back this batch of data, and then the project is kind of shut down. It will be something where you constantly want this feed of data and feed of examples, so that you can keep monitoring the model and keep on improving it over time. And you don't want to have it as discrete contracts, where the data is going to be different each time you go back to the service, because you're getting it done by different people with different standards and potentially different pricing. It's basically something that you want to have consistent control over the whole time, because your needs will change as well: you're going to find that, oh, okay, I want to adjust slightly the way that the data is annotated, because there's a problem in the application, or a problem in the model, that needs to be solved.

And the third component that we mentioned at the opening, and that ties into this whole ecosystem that you're building out, is the Thinc project, which you mentioned was extracted from the spaCy project originally.
I'm wondering if you can just talk a bit more about the motivation for releasing it as its own library, and some of the primary problems that you're aiming to solve with it within the ecosystem of machine learning and data science.

Yes, so spaCy always kind of came with its own machine learning implementations. Initially it was basically a pretty simple linear model that was optimized to work well with very sparse features, using the averaged perceptron algorithm. These linear models were pretty common in NLP. Basically everybody would implement them themselves, and, you know, most parsers would have their own linear model implementations within them. So I did it the same way; I found that it was basically a helpful way to keep the model efficient and working well. And then over time, as the neural network models came in, I had already been implementing neural network code. By about the time PyTorch came out, I was basically done with the models, and we were experimenting with them for spaCy 2. If PyTorch had come out earlier than that, you know, before I had basically been doing all of that work, there's probably every chance we would have just used PyTorch from the start. But one of the advantages that we saw in spaCy 2 of sticking with our own implementation was that we could make the library a little bit smaller, because we only had to implement the models that we needed, and we didn't have to drag in this whole huge set of libraries from an external library. And we were able to make sure that we didn't have a dependency on a specific version of PyTorch, because we knew the library would evolve quickly, and we wanted to make sure people never hit the situation where they had two projects, and spaCy needed a particular range of versions of PyTorch and their other code needed a different range of versions of PyTorch, so that they had this deadlock. And it's not always something that we would be able to control or solve as well. So, you know, over time,
we have kept using our own implementations. But as I mentioned, more and more people have wanted to interact with the machine learning layer underneath. People need to be able to define their own models and bring their models into spaCy and into Prodigy. And so what we decided to do, rather than standardizing on PyTorch directly, was to sort of adapt Thinc into a library where you could see it as a wrapper around different machine learning solutions underneath. So in addition to Thinc's own implementations of things, you can use it as just a sort of interface layer above PyTorch: you can really easily define any PyTorch model you want, and then Thinc is just the interface that lets it interact with spaCy. So that was what we set out to do. And the way we approached that was to really think about, you know, what's the lightest-weight, most minimal interface that is necessary for this type of deep learning library. And we ended up with a functional-programming-inspired design that features a very minimal interface, with like a single model class, and the actual work of the layers is done in function definitions. And instead of bringing in a definition of some sort of autograd mechanism, with tape-based differentiation like you have in PyTorch, there's just this convention of callback mechanisms. And then the different relationships between layers, like, say, a feed-forward relationship, concatenation, subtraction or something, are all handled by higher-order functions. So we ended up with a design that's really quite lightweight and minimal, and the library itself is quite small and easy to read. And this means that you can really bring any model that you want from another library, whether it's MXNet or TensorFlow or PyTorch, and you can plug it into spaCy. We also built out a few other user-interface features that we felt would be very helpful.
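The callback convention and the higher-order combinators described above can be sketched in plain Python. This is not Thinc's actual API; it is a toy illustration, assuming layers operate on flat lists of floats, of a single model class whose forward function returns its output together with a backprop callback. The names `Model`, `scale`, `relu`, `chain` and `concatenate` here are hypothetical stand-ins chosen for the sketch.

```python
from typing import Callable, List, Tuple

Vector = List[float]
Backprop = Callable[[Vector], Vector]
Forward = Callable[[Vector], Tuple[Vector, Backprop]]


class Model:
    """The single model class: just a name plus a forward function."""

    def __init__(self, name: str, forward: Forward) -> None:
        self.name = name
        self._forward = forward

    def __call__(self, X: Vector) -> Tuple[Vector, Backprop]:
        return self._forward(X)


def scale(factor: float) -> Model:
    """Layer defined as a function: multiply every input by a constant."""
    def forward(X: Vector) -> Tuple[Vector, Backprop]:
        def backprop(dY: Vector) -> Vector:
            return [factor * d for d in dY]
        return [factor * x for x in X], backprop
    return Model(f"scale({factor})", forward)


def relu() -> Model:
    def forward(X: Vector) -> Tuple[Vector, Backprop]:
        def backprop(dY: Vector) -> Vector:
            # The gradient only flows where the input was positive.
            return [d if x > 0 else 0.0 for d, x in zip(dY, X)]
        return [max(0.0, x) for x in X], backprop
    return Model("relu", forward)


def chain(*layers: Model) -> Model:
    """Higher-order function for a feed-forward relationship."""
    def forward(X: Vector) -> Tuple[Vector, Backprop]:
        callbacks: List[Backprop] = []
        for layer in layers:
            X, backprop = layer(X)
            callbacks.append(backprop)

        def backprop_all(dY: Vector) -> Vector:
            for callback in reversed(callbacks):
                dY = callback(dY)
            return dY
        return X, backprop_all
    return Model(">>".join(layer.name for layer in layers), forward)


def concatenate(*layers: Model) -> Model:
    """Higher-order function that joins the outputs of several layers."""
    def forward(X: Vector) -> Tuple[Vector, Backprop]:
        outputs: List[Vector] = []
        callbacks: List[Backprop] = []
        for layer in layers:
            Y, backprop = layer(X)
            outputs.append(Y)
            callbacks.append(backprop)

        def backprop_all(dY: Vector) -> Vector:
            dX = [0.0] * len(X)
            start = 0
            for Y, callback in zip(outputs, callbacks):
                piece = callback(dY[start:start + len(Y)])
                dX = [a + b for a, b in zip(dX, piece)]  # gradients add up
                start += len(Y)
            return dX
        return [y for Y in outputs for y in Y], backprop_all
    return Model("|".join(layer.name for layer in layers), forward)


model = chain(scale(2.0), relu())
Y, backprop = model([1.0, -3.0])
print(Y, backprop([1.0, 1.0]))
```

Because each layer hands back its own gradient callback, no tape or autograd machinery is needed in this sketch; composing layers only requires composing the callbacks in reverse order.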
The main one addresses something that I think is kind of underrated as a problem in machine learning: the problem of configuration. So we always found in spaCy 2 this problem of having to pass configuration through a tree of objects. One way to do it is that you pass into some component a whole tree of configuration that defines that model, and that may be a nested model, et cetera. So you have this blob of configuration that you pass top-down into something. But this means that as soon as that component has pluggable sub-pieces, it can never know which functions, or what configuration options, its individual parts will need. So if I want to, say, configure a part-of-speech tagger, and I want to give flexibility over the model of that, or allow people to change individual pieces of that model, then I have to pass this big blob of configuration forward. And then those functions that are being configured probably have defaults and things, so you end up with the problem of different defaults being set, and you can very often have problems where you think that you've overridden a default and you haven't. So instead of passing the configuration down like that, we have a way of defining the configuration bottom-up through the config file, and letting the tree of objects be defined and sort of brought in from that. And we've found that really helpful in keeping things clean and in helping to manage this problem of defaults. And then finally, we've got typing. So Python 3 has type declarations, and we've really made good use of these in Thinc. It's the first time we're releasing really full support for typing NumPy arrays and things in the Python ecosystem. We have it so that you can get static type errors for something like indexing an array in a way that's invalid, because it's a three-dimensional array and you've used too many indices or something. So yeah, I think that that's an exciting feature that will be very helpful to people as well.

And one of the things that I thought about as I was looking at Thinc is that, because of the fact that it acts as a high-level
wrapper for multiple different frameworks for doing deep learning, it puts it in some sense in the same space as the Keras project. And I'm curious what your sense of the comparison is of Thinc versus Keras, in terms of acting as that wrapping layer.

Yes, so I really found the functional programming style in Keras very interesting. When I first saw it, it was definitely something that helped inspire the approach that we took in Thinc. But over time the sort of focus of Keras has shifted a little bit, and, you know, it's really part of TensorFlow these days, and it's really developed basically as an interface to TensorFlow, so for people using it, it's really a high-level API into TensorFlow, coupled to the TensorFlow ecosystem. So I would say that the focus is a little bit different with Thinc. And we've also benefited from coming into it a little bit later, and being able to come up with something a little bit tighter, and we'll be able to maintain a little bit more consistency over time. We really hope that we will not have to make breaking changes over time, and that we are able to basically keep the design quite concise and coherent. So I would say that the use case is different, in that Keras is not so much a wrapper around different things as much as just, you know, a key part of TensorFlow specifically.

And then, because of the fact that Thinc also acts as its own framework for building these neural nets and doing deep learning, I'm curious what you have found to be the strategy that you use in terms of determining when to do something entirely in Thinc versus when to incorporate either PyTorch or TensorFlow or some of those other frameworks into the network as a component of the project, rather than just doing it entirely without those frameworks.

So our goal is to avoid having, in the long term, models which have a strict dependency on PyTorch or TensorFlow in the core pipelines for spaCy, because
I think that this doesn't make things operationally simple for spaCy users. So the way I see it is that PyTorch in particular is a really excellent compiler of these architectures. You know, basically you can implement things in a very sort of natural way, without worrying about the performance details, and it will pretty much always do a pretty good job of that. And so it's a lot easier to get to basically a good performance level without having to manage the specifics of the computation, and in particular the specifics of the device. But that said, if you take any specific architecture and you implement something in CUDA yourself, you can usually match the performance that you would get from something like PyTorch. So the way I would do it is that when I'm experimenting with something and I want to try out a GRU, while I might not have a GRU implementation, I can just plug in the one from PyTorch, and there's no performance penalty for doing that. There's no overhead in translating the tensors from CuPy to PyTorch; it uses the DLPack format. When you have PyTorch, we have a thing that lets CuPy's memory be allocated via PyTorch as well, so they share one memory pool. So there's really no disadvantage, except that you have to have it installed. So there are all sorts of architectures where it's either easier to implement them initially in PyTorch, or where I would be using that directly, and then eventually, when we want to provide that to the spaCy users, or if I feel like I can do a little bit better than PyTorch by optimizing that specific architecture, I would switch over to Thinc. The other thing is, sometimes you have these sort of high-level building blocks of models, and some of that composition is actually easier to do in Thinc, by just thinking about the different sorts of functions that you plug together. So in particular, I'm used to writing things in Thinc,
and I find that it's a kind of concise way to define models and try different things out. But for different components, yeah, maybe there'll be something where it's easier to have a PyTorch wrapper around it. Now, in terms of users of spaCy and users of Prodigy, almost always they'll be most familiar with PyTorch, and they'll want to work directly in PyTorch. They'll want to have that, especially initially, as their development framework, and then they can just use a thin wrapper around it. And over time, maybe they'll decide to do some other specific thing in Thinc rather than doing it in PyTorch directly. But the aim is to let people work with the frameworks that they want to work with, and I imagine that most developers will want to work with, you know, PyTorch; that's a pretty standard technology in machine learning.

So all of these tools that we've been talking about are open source, and they're something that you're working on as a core element of your business. And I'm curious what you have found to be some of the biggest challenges in terms of building and maintaining these tools to meet the needs of data scientists and machine learning engineers, and the approach that you're taking to making them sustainable.

So one thing that's definitely difficult about this is that the technologies are changing so quickly, both in terms of the research and also the software ecosystems around things. We have to strike the right balance between maintaining a good amount of backwards compatibility and stability for people, while also moving quickly enough to take advantage of new opportunities, new technology, new integrations with things, and basically providing a better experience and improvements to data scientists. So I would say that's definitely something that's challenging about this type of work: to basically be pushing ourselves to deliver the best quality software that we can, you know.
Certainly, you know, there's this constant background thing of the continuous integration systems changing underneath us, or something. So at some point we implement everything and, you know, get set up with Travis, AppVeyor and CircleCI, and then, okay, Azure Pipelines comes out, and okay, that's a better option, so we migrate to that. And there are different wheel formats and different build tools and things. So there's this sort of background level of all of these basically boring problems: the technology stack around us is changing and improving, and the different libraries, all of these things. So that's definitely something that occupies a surprising amount of time: just all of the rest of these ecosystem things, interactions with, you know, all of the other software that you wouldn't think of as core parts of solving the problem. But it definitely needs to be done to basically keep delivering high-quality software.

And then, as far as the most interesting or impressive projects that you've seen, I'm curious what you have found to be notable and worth calling out, either things that your team is creating with the tools that you're building, or things that you have seen built with those tools that you're releasing.

So we're always really blown away by seeing all of the things that people are building with spaCy, and this is definitely something that's constantly increasing and improving. So we have a collection of these projects on the website, the spaCy Universe. Two which I'd call out in particular are spaCy pipelines for specific types of text. One is Blackstone, which is a spaCy pipeline for legal text processing. Another one is scispaCy, which is spaCy for biomedical text processing, developed by the Allen Institute. Another project that I think is super cool is this information extraction system called Holmes, based on predicate logic. So that's something I've always wanted to dig into a bit more. It was developed within a company and basically kindly open sourced.
So it's really quite a substantial project that I think is definitely cool. Another one, which we developed internally and, you know, people might want to check out, is a project where we train word vectors based on text that's been pre-processed with spaCy, so noun phrases have been merged into one token and entities merged into one token. And earlier this year we ran spaCy over all of the text. So this was several billion words, from 2010 to 2020, and we used this to get the entities and basically make vectors, and then we precomputed similarities for those vectors on GPU, so that, you know, you can get pretty much instant nearest-neighbor queries across all of these terms. So you can find similarities across entities and things, which is quite cool to play with.

And as a contributor and maintainer of these projects, and as somebody who is running a business that relies on them, what have been some of the most interesting or unexpected or challenging lessons that you've learned over the past few years?

So one of the things that's definitely important is how the project should be documented and communicated, and this goes back to our initial design decisions as well, and basically making things consistent in the projects and consistent across the libraries. And I think that this really makes things more useful and open to a wider audience. This was something in particular that improved with my collaboration with Ines. So she's been really driving getting the docs, and the level of explanation that we can deliver, to, you know, basically a high level. And I think that that's something that's really been setting apart some projects as well. We saw this in particular when we went back and did Thinc: there were so many things where we felt like we'd done this before, setting up these libraries, and setting up tooling which people find useful, and ways of doing the documentation, things that people would need.
So we've learned a lot from the types of questions people have, and the types of API design decisions that will lead us into maintenance problems or prove to be confusing to people. And we've been able to head off some of those things at the start in Thinc, which we've been pleased about. So those are all things which I definitely think we've learned as well. And also just setting up testing, and making sure that the code is well tested, to avoid some of these bugs in the first place.

And as you look to the future of the Explosion company and the projects that you're building there, what do you have planned, and what are you most excited for?

So one of the things that we've been working on for a long time, and are excited to finally get out, will be an extension to Prodigy called Prodigy Teams, which has more of a sort of team management interface, and has a hosted component where you can allocate work to individual annotators and start and stop annotation tasks and things. So that's something that we've been working hard on. Because it has this sort of managed architecture and a fairly intricate web app behind it, it's taken a little longer to develop, but it's been going well and we're really excited to get that out to people. And yeah, it's basically the main thing that we've got in the pipeline, as well as getting spaCy 3 out, which will make it much easier to use transformer models and really allow you to bring your own model, basically making it easier to interact with those technologies in spaCy.

Are there any aspects of your work at Explosion, or on the spaCy, Prodigy and Thinc tools, that we didn't discuss yet, or anything else in the space of natural language processing and deep learning that you'd like to cover before we close out the show?

Yes, so one of the things that's different with Explosion since we last spoke is that we've grown to be an extremely effective team
that we've been working with. So spaCy's maintainers now also include Adriane, so we've got a team of now four people working on it full time, and that's really been helpful. So in addition to that, we also have Sebastián Ramírez, who has joined the company. We started working with him because we started using his open source library FastAPI, which I think is a really great tool that any Python developer should check out, and so he's been working on Prodigy Teams. As well, we've got another developer, Justin, who's in the US, who's working on Teams as well. So yeah, we've now got a few extra people working with us on these things. It's still quite a small team, but, you know, we really feel blessed to be working with people who are very effective quite independently, and I feel like it's a very good collaboration that we have.

Well, for anybody who wants to get in touch with you and follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. And with that, I'll move us into the picks. This week I'm going to choose the movie Onward, which I watched recently with my family. It's a movie aimed at younger kids, but it's great for the whole family: hilarious, with a really interesting storyline. We had a great time watching that. So if you're looking for something to watch with the whole family, I definitely recommend it. And with that, I'll pass it to you, Matthew. Do you have any picks this week?

So outside of spaCy, a lot of my time over the last few weeks has been spent following the current virus pandemic. So I'm sure by the time you listen to this, anything that I say will be different. I guess just stay safe with that. And, you know, in terms of things within the ecosystem, for recommendations to make, one project that I think is really cool, that people might check out, is this library Ray, which is developed by some people originally from a lab at Berkeley. And I think it's a really cool way to, you
know, basically write distributed applications for machine learning in Python. It's still quite young, but I think they've got a nice design, and it's something that, you know, I think will continue to be popular, and it's one to check out.

Yeah, I'll definitely second that one. And the original creators of the library have also founded a company called Anyscale to try and accelerate the development of that framework and turn it into a viable business, so definitely something to keep an eye on there. Well, thank you very much for taking the time today to join me and share the work that you're doing at Explosion on all of your different projects. There are definitely a lot of interesting tools that contribute a lot to the ecosystem, so I appreciate all of your time and effort on that front, and I hope you enjoy the rest of your day.

Thanks, you too.

Thank you for listening. Don't forget to check out our other show, the Data Engineering Podcast, at dataengineeringpodcast.com, for the latest on modern data management. Visit the site at pythonpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email hosts@podcastinit.com with your story. To help other people find the show, please leave a review on iTunes and tell your friends and coworkers.
