Auto text summarization (autoSmry)

May 26, 2018
text nlp R bot fun

I started my text analytics journey with sentiment analysis because I thought it was so very cool that a machine can detect positivity or negativity in what humans write! I ended up creating a report that condensed reviews by training a sentiment algorithm and applying it to gauge how positive or negative each reviewer was. As a by-product, I found that focusing on particularly positive or particularly negative comments also produced a fairly decent summary.

This, however, only works on reviews. I wanted to look into a universal automatic text summarizer that works on factual documents as well as more sentiment-charged articles.

And so… I created autoSmry!

autoSmry

autoSmry is a lightweight, modern-looking, automatic text summarizer built entirely in R and R Shiny.
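
This post does not show how autoSmry picks its sentences, so the following is only a rough illustration of how a simple extractive summarizer can work in base R, not the autoSmry implementation. The function name `sketch_smry`, its `(text, n)` arguments, the word-frequency scoring, and the crude "drop short words" filter are all assumptions made for this sketch.

```r
# Hypothetical stand-in, NOT the autoSmry algorithm:
# score each sentence by the average document-wide frequency of its words,
# then return the n best-scoring sentences in their original order.
sketch_smry <- function(text, n = 1) {
  # Split into sentences on whitespace that follows ., ! or ?
  sentences <- trimws(unlist(strsplit(text, "(?<=[.!?])\\s+", perl = TRUE)))
  sentences <- sentences[nzchar(sentences)]
  if (length(sentences) == 0) return(character(0))

  # Lowercase word tokens; dropping words of 3 letters or fewer
  # is a crude substitute for a real stop-word list
  tokenize <- function(s) {
    words <- unlist(strsplit(tolower(s), "[^a-z']+"))
    words[nchar(words) > 3]
  }
  tokens <- lapply(sentences, tokenize)

  # How often each word appears across the whole text
  freq <- table(unlist(tokens))

  # Sentence score = mean frequency of the words it contains
  scores <- vapply(tokens, function(w) {
    if (length(w) == 0) return(0)
    mean(as.numeric(freq[w]))
  }, numeric(1))

  # Keep the n highest-scoring sentences, restored to reading order
  keep <- sort(head(order(scores, decreasing = TRUE), n))
  sentences[keep]
}

# Example call on a made-up three-sentence text
example_text <- paste(
  "The dinosaur escapes the enclosure.",
  "The dinosaur attacks the park and the park staff chase the dinosaur.",
  "Visitors enjoy lunch."
)
sketch_smry(example_text, n = 1)
```

Averaging the word frequencies, rather than summing them, keeps long sentences from winning on length alone. A real summarizer would likely use a proper stop-word list and a sentence splitter that copes with abbreviations.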

Test 1: Jurassic World Movie

As a test for autoSmry, I went with the text of something that I can comfortably recall and assess… the plot of the 2015 movie reboot, Jurassic World 👍.

Test results

Here are the results of autoSmry after applying it to the plot of Jurassic World:


That’s pretty cool! The Indominus Rex is indeed the main villain of the movie and the cause of all the action our protagonists face!

Test 2: Rainbow Six Novel

For the second test, I went with the plot summary of the novel Rainbow Six by Tom Clancy.

Test results

Here are the results of autoSmry after applying it to the plot of Rainbow Six:


That’s pretty good too!

Test 3: A blog post from Ask a Manager

I enjoy reading Ask A Manager for its great practical, everyday working advice, and for some great entertainment too! Here’s the auto summary of a post titled “when writing to a hiring manager, should I mention a shared hobby?”

Test results:

A reader writes:


Response:


I thought this was okay. The reader summary could’ve been better if it included a sentence on what that hobby was, but the response summary was pretty spot on in my opinion.

Test 4: The about page from Prince of Travel

A friend of mine recently started a blog on how to travel the world in style, all on rewards and points. I ran autoSmry on his About | Prince of Travel page. Here are the results.

Test results:


I can’t really find any faults with this summary. I’d say it’s the best one yet. (I’d also recommend giving his site a look if you are looking to travel somewhere!)

Test 5: A bunch of movies

I’m going to use the function directly to produce one-sentence summaries for a bunch of movies.

First, a small bit of code to pull the top 10 highest-grossing films from Wikipedia:

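# Packages used below (assumed from the function names; the original
# post does not show its library calls)
library(rvest)      # read_html(), html_node(), html_table()
library(dplyr)      # %>%, select(), group_by(), summarize()
library(textreadr)  # read_dir()
library(textclean)  # replace_non_ascii()

# Get top 10 movies (html_node("table") takes the first table on the page)
movie.url <- "https://en.wikipedia.org/wiki/List_of_highest-grossing_films#Highest-grossing_films"

movie.data <- read_html(movie.url) %>%
        html_node("table") %>%
        html_table() %>%
        head(10)

# Drop unneeded data
movie.data.clean <- movie.data %>%
        select(Rank, Title)

# Load movie plots: one text file per movie, collapsed into a single string each
plot.data <- read_dir("E:/Projects/Website/data/movie.plots") %>%
        group_by(document) %>%
        summarize(Plot = paste(content, collapse = " ") %>%
                          replace_non_ascii())

# Combine data
# cbind() pairs rows by position, so the plot files must sort in rank order
movie.plot.data <- cbind(movie.data.clean, plot.data) %>%
        select(-document)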

And now, to run the auto_smry function for all 10 movie plots:

# Run auto_smry for all plot lines
library(purrr)  # map()

# map() calls auto_smry(plot, 1) once per plot and returns a list-column
movie.plot.autosmry <- movie.plot.data %>%
        mutate(autoSmry = map(Plot, auto_smry, 1)) %>%
        select(-Plot)

Test results

This gives us the following table:

| Rank | Title | autoSmry |
|------|-------|----------|
| 1 | Avatar | Quaritch prepares to slit the throat of Jake’s avatar, but Neytiri kills Quaritch and saves Jake from suffocation. |
| 2 | Titanic | Discovered with Jack, Rose tells a concerned Cal that she was peering over the edge and Jack saved her from falling. |
| 3 | Star Wars: The Force Awakens | Rey and Chewbacca escape with the unconscious Finn in the Falcon. |
| 4 | Avengers: Infinity War | Stark is seriously wounded by Thanos, but is spared after Strange surrenders the Time Stone to Thanos. |
| 5 | Jurassic World | Owen re-establishes his bond with the raptors before the Indominus reappears. |
| 6 | The Avengers | The Tesseract suddenly activates and opens a wormhole, allowing Loki to reach Earth. |
| 7 | Furious 7 | Dom, Brian, Nobody and his team attempt to capture Shaw, but are ambushed by Jakande and his men and forced to flee while Jakande obtains God’s Eye. |
| 8 | Avengers: Age of Ultron | The Avengers fight amongst themselves when Stark secretly uploads J.A.R.V.I.S. (who is still operational after hiding from Ultron inside the Internet) into the synthetic body. |
| 9 | Black Panther | Fighting in Wakanda’s vibranium mine, T’Challa disrupts Killmonger’s suit and stabs him. |
| 10 | Harry Potter and the Deathly Hallows – Part 2 | Harry discovers that he himself became a Horcrux when Voldemort originally failed to kill him and that Harry must die to destroy the piece of Voldemort’s soul within him. |

Reducing the output to only 1 sentence makes it much harder to produce a sensible summary. The worst summary, in my opinion, is the one produced for Black Panther. For the other movies, though, the results don’t appear to be too bad.

Conclusion

Yes… I am biased… but I really do think the results are fairly good! It can certainly serve as a fast first screen for things that I would otherwise not read at all. Feel free to try it out yourself and let me know if you have any thoughts! (autoSmry is located in the Projects section.)

My plans

I developed this tool as a quick and easy way for me to obtain summaries of text without worrying about whether my data will be logged or captured in some way. I plan on keeping it free for everyone to use for as long as I am able to.
