A highlight from StrategyQA and Big Bench

Data Skeptic

Automatic TRANSCRIPT

It's been a while since we ran our season on artificial intelligence. And if I were to pat myself on the back, I would say it was serendipitous that I planned it at the same time large language models were starting to change the NLP landscape. If I missed the ball on something, it was probably on running a season on computer vision right now. We've seen a mirroring set of advancements, most notably the recent diffusion models, and it's astounding to think how far we might go with this more or less identical underlying architecture used for both language and vision.

If there's one thing I am confident about, it's that the true test of whether or not something is artificially intelligent can only be performed with Alan Turing's imitation game, or as some of you know it, the Turing test. And despite recent advancements, I'm pretty confident we're still a ways off from an AGI. But between now and then, we're going to need bigger and badder challenges to press our machine learning algorithms up against. I mean, people still publish on MNIST and ImageNet a bit, but if there's one lesson we've learned, it's that more data and more distributed are the two paths to push forward on.

So that's why I wanted to take a quick respite from our ad tech season and bring you a story about a collaborative benchmark known as BIG-bench, or the Beyond the Imitation Game benchmark. This is a large collection of many different independent tasks in natural language. You knew it was a major feature of ELMo and BERT and all the other models that have followed since that they're useful in a wide assortment of seemingly independent tasks, or that the BERT embeddings used as features in an ML model will allow you to train something with hundreds of examples where you previously needed maybe hundreds of thousands. BIG-bench is one of the best benchmarks out there as we try and build algorithms that can be as general purpose as possible.
So today I speak with returning guest Mor Geva. We talk about StrategyQA, her specific contribution to the overall project, as
