May 26, 2018 | text, nlp, R, bot, fun
I started my text analytics journey with sentiment analysis because I thought it was so very cool that a machine can detect positivity or negativity in human writing! I ended up building a report that condensed review-style feedback: I trained a sentiment algorithm and applied it to gauge how positive or negative each reviewer was. As a by-product, by focusing on particularly positive or particularly negative comments, the report also produced a fairly decent summary.
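The "keep the most extreme comments" idea can be sketched in a few lines of base R. This is a minimal illustration under loud assumptions: the tiny `lexicon` and the helpers `score_sentence()` and `extreme_summary()` are hypothetical, and the original report used a trained sentiment algorithm rather than a hand-written word list.

```r
# Hypothetical toy lexicon: NOT the trained sentiment model used in the original report
lexicon <- c(great = 1, love = 1, excellent = 1, good = 0.5,
             bad = -0.5, terrible = -1, awful = -1, hate = -1)

# Score a sentence by summing the lexicon weights of its words (unknown words count as 0)
score_sentence <- function(s) {
  words <- tolower(unlist(strsplit(s, "[^A-Za-z']+")))
  words <- words[words != ""]
  sum(lexicon[words], na.rm = TRUE)
}

# Keep the n sentences with the strongest sentiment (positive or negative),
# returned in their original order
extreme_summary <- function(sentences, n = 2) {
  scores <- vapply(sentences, score_sentence, numeric(1), USE.NAMES = FALSE)
  keep <- order(abs(scores), decreasing = TRUE)[seq_len(min(n, length(sentences)))]
  sentences[sort(keep)]
}

reviews <- c("The food was great and the staff were excellent.",
             "We arrived at noon.",
             "The room was terrible and the bed was awful.",
             "Parking was fine.")
extreme_summary(reviews, n = 2)  # returns the first and third sentences
```

Ranking by the absolute score is what lets strongly negative comments compete with strongly positive ones; neutral sentences such as "We arrived at noon." score 0 and drop out first.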
This, however, only works on reviews. I wanted to build a universal automatic text summarizer that would work on factual documents as well as on more sentiment-charged articles.
And so… I created autoSmry!
autoSmry is a lightweight, modern-looking, automatic text summarizer built entirely in R and R Shiny.
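The post doesn't describe how autoSmry chooses its sentences. For readers who want something runnable to experiment with, here is a minimal sketch of one common extractive technique, word-frequency sentence scoring, in base R. It is an assumption-laden stand-in, not autoSmry's algorithm: `smry_sketch()`, its stop-word list and its scoring rule are all hypothetical.

```r
# Hypothetical extractive summarizer: NOT autoSmry's actual algorithm
stop_words <- c("the", "a", "an", "and", "or", "of", "to", "in", "on", "is",
                "was", "it", "that", "with", "for", "as", "at", "by", "his", "her")

# Split text into sentences at whitespace that follows ., ! or ?
split_sentences <- function(text) {
  s <- trimws(unlist(strsplit(text, "(?<=[.!?])\\s+", perl = TRUE)))
  s[nzchar(s)]
}

# Lower-case word tokens with stop words removed
tokenize <- function(s) {
  w <- tolower(unlist(strsplit(s, "[^A-Za-z']+")))
  w[nzchar(w) & !(w %in% stop_words)]
}

# Score each sentence by the average document frequency of its words,
# then return the top n sentences in their original order
smry_sketch <- function(text, n = 1) {
  sentences <- split_sentences(text)
  freq <- table(tokenize(text))
  scores <- vapply(sentences, function(s) {
    w <- tokenize(s)
    if (length(w) == 0) return(0)
    sum(freq[w]) / length(w)
  }, numeric(1), USE.NAMES = FALSE)
  top <- order(scores, decreasing = TRUE)[seq_len(min(n, length(sentences)))]
  sentences[sort(top)]
}

plot_text <- "The raptors escape. The raptors hunt the Indominus. Claire drinks coffee."
smry_sketch(plot_text, n = 1)  # returns "The raptors escape."
```

Dividing by the word count keeps long sentences from winning simply by being long; summing instead, or weighting words by TF-IDF, are equally plausible design choices.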
Test 1: Jurassic World Movie
As a test for autoSmry, I went with the text of something that I can comfortably recall and assess… the plot of the 2015 movie reboot, Jurassic World 👍.
Here are the results of autoSmry after applying it to the plot of Jurassic World:
That’s pretty cool! The Indominus Rex is indeed the main villain of the movie and drives all of the action our protagonists face!
Test 2: Rainbow Six Novel
For the second test, I went with the plot summary of the novel Rainbow Six by Tom Clancy.
Here are the results of autoSmry after applying it to the plot of Rainbow Six:
That’s pretty good too!
Test 3: A blog post from Ask a Manager
I enjoy reading Ask a Manager for its great, practical, everyday workplace advice, and for some great entertainment too! Here’s the auto summary of a post titled “when writing to a hiring manager, should I mention a shared hobby?”
A reader writes:
I thought this was okay. The summary of the reader’s letter could have been better if it had included a sentence on what the hobby was, but the summary of the response was pretty spot-on in my opinion.
Test 4: The about page from Prince of Travel
A friend of mine recently started a blog on how to travel the world in style, all on rewards and points. I ran autoSmry on his About | Prince of Travel page. Here are the results.
I can’t really find any faults with this summary. I’d say it’s the best one yet. (I’d also recommend giving his site a look if you are looking to travel somewhere!)
Test 5: A bunch of movies
I’m going to use the function directly to produce one-sentence summaries for a bunch of movies.
First, a small bit of code to pull the top 10 highest-grossing films from Wikipedia and pair them with their plots:
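```r
library(rvest)      # read_html(), html_node(), html_table()
library(dplyr)      # select(), group_by(), summarize(), mutate()
library(textreadr)  # read_dir()
library(textclean)  # replace_non_ascii()

# Get top 10 movies (first table on the page)
movie.url <- "https://en.wikipedia.org/wiki/List_of_highest-grossing_films#Highest-grossing_films"
movie.data <- read_html(movie.url) %>%
  html_node("table") %>%
  html_table() %>%
  head(10)

# Drop unneeded data
movie.data.clean <- movie.data %>%
  select(Rank, Title)

# Load movie plots (one text file per movie) and collapse each into a single string
plot.data <- read_dir("E:/Projects/Website/data/movie.plots") %>%
  group_by(document) %>%
  summarize(Plot = paste(content, collapse = " ") %>% replace_non_ascii())

# Combine data (cbind pairs rows by position, so the plot files must sort in rank order)
movie.plot.data <- cbind(movie.data.clean, plot.data) %>%
  select(-document)
```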
And now, to run the auto_smry function for all 10 movie plots:
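```r
library(purrr)  # map()

# Run auto_smry (autoSmry's summarizer function, not shown here) on every plot;
# the 1 asks for a one-sentence summary
movie.plot.autosmry <- movie.plot.data %>%
  mutate(autoSmry = map(Plot, auto_smry, 1)) %>%
  select(-Plot)
```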
This gives us the following table:
| Rank | Title | autoSmry |
|---|---|---|
| 1 | Avatar | Quaritch prepares to slit the throat of Jake’s avatar, but Neytiri kills Quaritch and saves Jake from suffocation. |
| 2 | Titanic | Discovered with Jack, Rose tells a concerned Cal that she was peering over the edge and Jack saved her from falling. |
| 3 | Star Wars: The Force Awakens | Rey and Chewbacca escape with the unconscious Finn in the Falcon. |
| 4 | Avengers: Infinity War | Stark is seriously wounded by Thanos, but is spared after Strange surrenders the Time Stone to Thanos. |
| 5 | Jurassic World | Owen re-establishes his bond with the raptors before the Indominus reappears. |
| 6 | The Avengers | The Tesseract suddenly activates and opens a wormhole, allowing Loki to reach Earth. |
| 7 | Furious 7 | Dom, Brian, Nobody and his team attempt to capture Shaw, but are ambushed by Jakande and his men and forced to flee while Jakande obtains God’s Eye. |
| 8 | Avengers: Age of Ultron | The Avengers fight amongst themselves when Stark secretly uploads J.A.R.V.I.S., who is still operational after hiding from Ultron inside the Internet, into the synthetic body. |
| 9 | Black Panther | Fighting in Wakanda’s vibranium mine, T’Challa disrupts Killmonger’s suit and stabs him. |
| 10 | Harry Potter and the Deathly Hallows – Part 2 | Harry discovers that he himself became a Horcrux when Voldemort originally failed to kill him and that Harry must die to destroy the piece of Voldemort’s soul within him. |
Reducing the summary to a single sentence makes it much harder to produce something sensible. In my opinion, the worst summary is the one produced for Black Panther; for the other movies, the results don’t appear to be too bad.
Yes… I am biased… but I really do think the results are fairly good! It can certainly serve as a fast first screen for things that I would otherwise not read at all. Feel free to try it out yourself and let me know if you have any thoughts! (autoSmry is located in the Projects section.)
I developed this tool as a quick and easy way to obtain summaries of text without worrying about whether my data will be logged or captured in some way. I plan on keeping it free for everyone to use for as long as I am able to.