Distributed In Memory Processing And Streaming With Hazelcast

Automatic Transcript

You hear about all these different data management platforms that talk about these things, but Hazelcast's advantage is around in-memory. In-memory isn't a new concept, it's been around for a while, but there have been some limitations on what it could do in the past, and some of those limitations are being mitigated so that in-memory speeds are opening up to more and more companies. Hazelcast was founded a little more than ten years ago, actually in Turkey, by a couple of very smart engineers, and they came to Silicon Valley to start Hazelcast as a formal company. It was all about in-memory computing, and so the first product was the IMDG product, which stands for in-memory data grid. It's very much like a database, but with a bit more capability in terms of distributed computing: ways to simplify building applications that can be spread across multiple nodes in a cluster and thus enable parallelization much more simply. So from the early roots it was all about trying to get applications that ran faster, but at the same time maintaining some of these enterprise qualities like security and reliability and availability, ensuring that you're not getting speed at any cost but getting the right amount of speed that you need to address your use cases while also protecting your data. We've added on stream processing since, and we have a set of technologies that work extremely well together and are fitting in quite well with the types of use cases that people are building today.

You mentioned that it's not being built at the expense of some of the reliability and durability guarantees that you might care about, particularly if you're working on mission-critical applications. I'm wondering if you can dig a bit more into some of the benefits and the potential trade-offs of in-memory compute, particularly for data-intensive workloads and things that are going to be operating on stateful sets of operations.

The benefits of in-memory computing have largely to do with the fact that you have fast access to data stored in memory. I've heard some people say that this notion of in-memory computation or in-memory processing is redundant, when in fact, if you think about it, the processing isn't done in memory. The processing is done in the CPU, or these days increasingly in the GPU, and in-memory simply means that all of the data is stored within memory and not necessarily spilled out to disk. So when you have a system that's designed to optimize that pattern, where you have all your data in memory, you get fast access to a lot of fast processing and can deliver on some of these use cases that have very narrow windows for service-level agreements. You get performance, but at the same time, when you have that fast access, you need to incorporate some of the typical characteristics of a distributed system, like replication in a variety of ways, and you need to have consistent replication. After doing some research, some competitive research, we've seen at least one technology where, at certain levels of throughput, it pauses some of the replication to be able to handle the throughput, and most people won't notice it.
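For context, here is a minimal sketch, not from the interview, of how the consistent replication mentioned above is typically expressed with the standard Hazelcast Java API (assuming a 4.x/5.x release): synchronous backups mean a write is acknowledged only after its backup copies are in place. The map name and backup counts are illustrative.

```java
import com.hazelcast.config.Config;
import com.hazelcast.config.MapConfig;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;

public class ReplicationSketch {
    public static void main(String[] args) {
        Config config = new Config();

        // Keep two synchronous backups of every partition of this map:
        // a put() completes only after the backup copies are written,
        // so a member failure does not silently drop acknowledged data.
        MapConfig payments = new MapConfig("payments")
                .setBackupCount(2)        // synchronous backups
                .setAsyncBackupCount(0);  // no fire-and-forget backups
        config.addMapConfig(payments);

        // This JVM joins the cluster as a member holding a share of the data.
        HazelcastInstance hz = Hazelcast.newHazelcastInstance(config);

        hz.getMap("payments").put("txn-1001", "captured");
        System.out.println(hz.getMap("payments").get("txn-1001"));

        hz.shutdown();
    }
}
```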
But it's one of those things where, if you're not watching, you could potentially have a big problem: when your data isn't replicated and nodes go down and you get failures, you might see a lot of unexpected data loss when you thought that all of the data protection capabilities were in place. For us, we don't make those trade-offs when we run our benchmarks. We say, here's what you get in a true production environment in terms of performance, and you can be sure that we keep everything retained for the business needs that you would expect. Certainly some of the trade-offs are pretty clear; with these systems, it's mostly about how much data you can store. So you wouldn't use Hazelcast as, say, your system of record for petabytes of data. We're talking more about operational data that you want to process very quickly. Things like payment processing or fraud detection are good cases where you might have a good amount of data in memory as a cache, but also have the engine processing in parallel and being able to use that data in an almost transient way. It's data that's persisted somewhere else, but we put it into our engines so that we can have those very stringent, very data-intensive workloads running.

My understanding is that the actual implementation is as a library that can be embedded…
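As background for that question, here is a rough sketch, again not from the interview, of the two ways the library is commonly run, assuming the standard Hazelcast Java APIs: embedded in the application JVM as a cluster member, or connecting from outside as a client. The map name and values are made up for illustration.

```java
import com.hazelcast.client.HazelcastClient;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;

public class DeploymentSketch {
    public static void main(String[] args) {
        // Embedded mode: the library runs inside the application JVM,
        // which becomes a cluster member and holds a share of the data.
        HazelcastInstance member = Hazelcast.newHazelcastInstance();

        // Client mode: the application stays outside the cluster and
        // reaches it over the network, much like a database client.
        HazelcastInstance client = HazelcastClient.newHazelcastClient();

        // Either handle exposes the same distributed structures, used here
        // as a fast operational cache in front of a separate system of record.
        client.getMap("recent-transactions").put("card-1234", 87.50);
        System.out.println(member.getMap("recent-transactions").get("card-1234"));

        client.shutdown();
        member.shutdown();
    }
}
```

Embedded mode keeps the data in the same process as the application for the lowest latency, while client mode lets the cluster scale and fail over independently of the application.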
