Automatic Summarization

Data Skeptic


My name is Maartje ter Hoeve and I'm a PhD student at the University of Amsterdam. My main research is focused on natural language processing and information retrieval, and I'm especially interested in how we can learn from humans and human cognition to improve our AI models. Before that, I did my master's degree in artificial intelligence and my bachelor's degree in linguistics, so I like to take the knowledge I have from that background and apply it in my research.

Everything I've done related to natural language processing carries with it a sort of computer science bias. I don't have your background in linguistics. What advantages does that give you in your approaches to natural language processing?

It's quite interesting. We've seen the developments from the early days, where it was quite prominent, when we wanted to model language, to really look into specific linguistic structures. Then we went into an era where people threw that away: basically no linguistics at all, data only, we only want to learn patterns from data. And now we see a bit of a shift back again, so people try to incorporate knowledge from linguistics into models, with the question of whether models can really learn everything from data per se, or whether, if we have linguistic knowledge, that might give us an advantage when we decide which models could work well for a given task, for example. It's a pretty exciting thing to see that we're bringing that knowledge back in.

Definitely. There have been a couple of people who have taken a pretty provocative, or even extreme, point of view on this, and there's IBM's famous quote. Yes, I believe Frederick Jelinek said, "Every time I fire a linguist, the performance of the speech recognizer goes up." So I imagine that was a deliberately provocative statement to make. In your experience, how have these communities actually overlapped?
In the NLP community specifically, we want to model language, and that's basically what you also want to do in linguistics: you want to model language and you want to understand language. Linguists may not want to produce language so much; they rather observe, whereas from an NLP perspective you might want to produce as well as understand. So I think as a linguist you have certain intuitions about language that seem very obvious to you but that other people might not find so obvious, such as that negation can be a hard problem. To me that seems very obvious, because this is something linguistics teaches you, but from a computer science perspective you might never have thought about it. You might wonder why your model fails on these types of inputs or questions, and not realize that it was about negation.

That's right. It wasn't really that long ago that people still seriously considered that we could solve negation with just a couple of handcrafted rules.

Exactly, there's more to it than that. I think that in order to understand what works well, or what doesn't work well yet, linguistic knowledge really comes in handy.

Well, your paper that caught my attention is titled "What Makes a Good Summary? Reconsidering the Focus of Automatic Summarization." Now, automatic summarization is kind of interesting in that, just by hearing the name, even if you've never heard of this field before, you kind of intuitively know what it's all about. Yet there are still some open questions. Practically speaking, what does it mean to do summarization? Could you perhaps give us a survey or overview of the various techniques?

That's a great question. It's maybe not so clear, which is one of the reasons why we started to write this paper. But that said, I can give an overview, first of what is often perceived as the way to do it in the community. I'm talking about text summarization here, because there is video summarization as well, for example.
For text summarization, you often take an input document or a set of input documents, text articles for example, news articles, or social media articles, and you want to get the gist out of this input and write a few-sentence summary of it. That is the majority of the work that is done.

Now, how is this done? Well, we've seen a lot of progress there. It started off with unsupervised graph-based models such as TextRank and LexRank. Basically, people make a graph of the input documents and then determine which are the most important sentences and extract those. Then, with the rise of neural models, we saw sequence-to-sequence approaches: people used RNNs first, and now we see Transformers, and BERT and BERT-like models, pop up in the community.

And then you also asked about evaluation. There are a few forms of evaluation. Often people use ROUGE, which basically checks for lexical overlap: you have your labeled summary, the reference summary, and you check how many words, or n-grams to be more precise, it has in common with the summary you produced. Then there are also newer metrics, such as BERTScore, that don't measure lexical similarity but rather semantic similarity. Because with the lexical-overlap approach, if your summary has a word that means the same thing as a word in the reference summary but isn't the exact same word, it gets no credit, and you don't want that. So instead you try to measure semantic similarity; that's another type of scoring function people use. And then another way is human evaluation, where you would ask people questions like: which of these summaries is more fluent, or which one is more informative, or which one has the best coverage?
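To make the graph-based extractive idea concrete, here is a minimal sketch in the spirit of TextRank, not the published algorithm itself: the naive sentence splitter and the word-overlap similarity function are simplifications, and the damping factor and iteration count are assumed defaults. The core idea is the same: build a sentence-similarity graph, run a PageRank-style power iteration over it, and extract the top-ranked sentences.

```python
import math
import re


def split_sentences(text):
    # Naive splitter on ., !, ? followed by whitespace (assumes plain English prose).
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def similarity(a, b):
    # Word-overlap similarity, a simplification of the measures used in
    # TextRank/LexRank (which use log-normalized overlap or cosine similarity).
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / (math.log(len(wa) + 1) + math.log(len(wb) + 1) + 1e-9)


def textrank_summary(text, k=2, damping=0.85, iters=50):
    sents = split_sentences(text)
    n = len(sents)
    if n <= k:
        return sents
    # Build the sentence-similarity graph as an n x n weight matrix.
    w = [[similarity(sents[i], sents[j]) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    # PageRank-style power iteration over the weighted graph.
    scores = [1.0 / n] * n
    for _ in range(iters):
        new = []
        for i in range(n):
            rank = 0.0
            for j in range(n):
                out = sum(w[j])
                if w[j][i] > 0 and out > 0:
                    rank += w[j][i] / out * scores[j]
            new.append((1 - damping) / n + damping * rank)
        scores = new
    # Extract the k highest-scoring sentences, restored to document order.
    top = sorted(sorted(range(n), key=lambda i: scores[i], reverse=True)[:k])
    return [sents[i] for i in top]
```

Because the method only scores and extracts existing sentences, it never invents words that weren't in the input, which is exactly the trait that distinguishes extractive systems from the neural sequence-to-sequence (abstractive) models mentioned above.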
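The lexical-overlap evaluation described above can also be sketched in a few lines. This is a simplified ROUGE-N recall, not the official ROUGE package: it tokenizes by whitespace and lowercasing only, and reports the clipped fraction of reference n-grams that appear in the candidate summary.

```python
from collections import Counter


def ngrams(tokens, n):
    # All contiguous n-grams of a token list, as tuples.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def rouge_n(candidate, reference, n=1):
    """Simplified ROUGE-N recall: the fraction of reference n-grams also
    found in the candidate, with counts clipped as in the original metric."""
    cand = Counter(ngrams(candidate.lower().split(), n))
    ref = Counter(ngrams(reference.lower().split(), n))
    if not ref:
        return 0.0
    overlap = sum(min(cand[g], count) for g, count in ref.items())
    return overlap / sum(ref.values())
```

For example, `rouge_n("the cat sat on the mat", "the cat lay on the mat")` gives 5/6, since every reference unigram except "lay" is matched. Note that a candidate using "rested" instead of "sat" would be penalized just the same as one using a wrong word, which is precisely the weakness that semantic metrics such as BERTScore are meant to address.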
