A look at teacher misconduct in Canada...

August 8, 2018
R data

(Updated on Dec. 31, 2018 for new data)

A couple of years back there was a news article written about someone that I went to high school with. We were in the same grade and we shared many classes together, including some extra-curricular classes like band class. We were never that close, and we never stayed in contact after high school, but nonetheless I recognized the name and picture when I saw it.

It turns out that he made the news because he was charged with “sexual exploitation” of a student while he was a teacher at a high school about 10 minutes away from where I now live. I recently drove by that high school, which prompted me to look into the case and whether he was found guilty or not (he was).

While exploring the internet for the results of his hearing, I found that the Toronto District School Board has a website that lists all offenses from presumably all teachers, going as far back as 2012 or so. It piqued my interest to explore this dataset a bit more fully than to simply look up a past classmate’s fate.

Interactive Exploration

Exploring the text summary of teacher misconduct hearings

Method

I used a word embedding layer to reconstruct the linguistic context of the words. I then used a relatively novel (published Feb. 13, 2018) dimensionality reduction technique by McInnes and Healy, called uniform manifold approximation and projection (UMAP), to reduce the 300-dimensional word embeddings to 2 dimensions for easy visualization.

The visualization is then clustered at two levels. The first level (the physical cluster of words) reflects how close the words are to each other in the word embedding layer. The second level (the colour of the words) reflects the edge betweenness centrality.

Observations

There are some distinct clusters of topics within the hearing texts. Most notably, there’s a rather large cluster of sexual misconduct. There are a number of additional clusters representing various types of abuse and frequency of misconduct as well.

I also noticed that male teachers greatly outnumber female teachers in the hearings.

Appendix

1. Full dataset

I made the full dataset available to you to explore.

2. Obtaining and cleaning the data

How to scrape the data

The data from the “Professionally Speaking” website is split by month across many years.

Professionally Speaking Website

I decided to build the URLs with a couple of small functions:

# Function to repeat each year four times (once per issue month)
ps_year <- function(start.year, end.year) {
        yr <- rep(start.year:end.year, each = 4)
        return(yr)
}

# Function to return the four issue months
ps_month <- function() {
        mon <- c("March", "June", "September", "December")
        return(mon)
}

# Function to build professionally speaking URL
ps_url <- function(start.year, end.year) {
        url <- paste0("http://professionallyspeaking.oct.ca/hearings/hearings-",
                      ps_year(start.year, end.year), "-",
                      ps_month(), ".aspx")
        return(url)
        
}
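
The URL builder relies on R's vector recycling: each year appears four times in a row, and the four month names are recycled across that vector, so every year gets paired with every issue month. Here is a minimal base-R sketch of that pairing for a two-year range (the variable names are mine, for illustration only):

```r
# Illustrative names; not part of the functions above
years  <- rep(2012:2013, each = 4)                     # 2012 x4, then 2013 x4
months <- c("March", "June", "September", "December")  # recycled to length 8

urls <- paste0("http://professionallyspeaking.oct.ca/hearings/hearings-",
               years, "-", months, ".aspx")

print(length(urls))   # one URL per year-month pair
print(urls[c(1, 5)])  # the March issue of each year
```

Because the year vector's length is a multiple of four, the months line up cleanly with no recycling warning.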

With our functions made, we can now scrape all of the available “Hearings” data:

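library(rvest)   # read_html(), html_nodes(), html_text()
library(purrr)   # map(), possibly(), %>%
library(stringr) # str_extract(), str_replace_all()
library(tibble)  # tibble()

start.date <- 2012
end.date <- 2018

url <- ps_url(start.date, end.date)
# Wrap each step in possibly() so that URLs not yet available return NULL
# instead of stopping the whole scrape
read_url_possibly <- possibly(read_html, otherwise = NULL)
nodes_possibly <- possibly(html_nodes, otherwise = NULL)
text_possibly <- possibly(html_text, otherwise = NULL)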

url.data <- url %>%
        map(read_url_possibly) %>%
        map(nodes_possibly, "p") %>%
        map(text_possibly)
        
# View results
head(url.data[[1]], 2)
## [1] "Three-member panels of the Discipline  Committee conduct public hearings  into cases of alleged incompetence or  professional misconduct. The panels  are a mix of elected and appointed  Council members. Members found  guilty of incompetence or professional  misconduct may have their certificate  revoked, suspended or limited. In  professional misconduct matters only,  the committee may also reprimand,  admonish or counsel the member,  impose a fine, order the member to  pay costs, or publish the order in  Professionally Speaking."
## [2] "Member: C. Robert Clements Registration No: 266425Decision: Revoked"
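
For readers unfamiliar with `possibly()`: it wraps a function so that an error yields a fallback value instead of aborting the whole `map()` chain, which is what lets the scrape carry on past issues that have not been published yet. The same idea can be sketched in base R with `tryCatch()`. `possibly_sketch` and `safe_log` are hypothetical names for illustration, not purrr functions:

```r
# A rough base-R stand-in for the idea behind purrr::possibly()
possibly_sketch <- function(f, otherwise = NULL) {
        function(...) tryCatch(f(...), error = function(e) otherwise)
}

# log() fails on a character input; the wrapper returns NA instead
safe_log <- possibly_sketch(log, otherwise = NA_real_)

inputs <- list(1, "not a number", 100)
results <- lapply(inputs, safe_log)
print(results)
```

The failing element is replaced by the fallback while the others are processed normally, so one bad URL does not cost you the rest of the run.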

How to clean the data

OK, good. It worked. Let’s clean it up by removing that first paragraph and splitting the text by case rather than by line:

url.data.clean <- list()

# Remove preamble (the first paragraph of every page)
for (i in seq_along(url.data)) {
        url.data.clean[i] <- list(url.data[[i]][-1])
}

# Separate by case, instead of by sentence/line
url.data.clean <- url.data.clean %>%
        map(paste, collapse = " <br><br> ") %>%
        textclean::replace_white() %>%
        map(strsplit, split = "(?<=.)(?=Member: )", perl = TRUE)

# Label list by year-month
names(url.data.clean) <- paste(ps_year(start.date, end.date), 
                               ps_month(),"",
                               sep = "_")
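
The split pattern deserves a closer look. `(?=Member: )` is a zero-width lookahead, so the text is cut just before each "Member: " header without consuming it. The `(?<=.)` lookbehind requires at least one character before the cut, which keeps `strsplit()` from trying to split at the very start of the string. A small sketch on a made-up page (the names and summaries are invented for illustration):

```r
# Invented sample text in the same shape as a collapsed page
page <- paste0("Member: A. Example Decision: Revoked <br><br> Summary one. <br><br> ",
               "Member: B. Sample Decision: Suspended <br><br> Summary two.")

# Cut before every "Member: " that has at least one character in front of it
cases <- strsplit(page, split = "(?<=.)(?=Member: )", perl = TRUE)[[1]]
print(cases)
```

Each piece starts with its own "Member: " header, and pasting the pieces back together reproduces the original string, since nothing is consumed by the split.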

And now, let’s put everything in a data frame instead of working with lists, because I’m not as familiar with lists…

# Establish number of entries
n <- url.data.clean %>%
        unlist() %>%
        length()

# Skeleton data frame
url.data.df <- tibble(id = rep(NA, n),
                      name = NA,
                      gender = NA,
                      text = NA)

# Label date
url.data.df$id <- url.data.clean %>%
        unlist() %>%
        names()

# Extract year
url.data.df$year <- url.data.df$id %>%
        str_extract("20[0-9]+")

# Extract month
url.data.df$month <- url.data.df$id %>%
        str_extract("[A-Za-z]+")

# Place full text into `text` column
url.data.df$text <- url.data.clean %>%
        map(1) %>%
        unlist() %>%
        textclean::replace_white()
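
The `id` values come from how `unlist()` names its output: when a named list element holds several values, the element name gets a running number appended, which is why the list names were built with a trailing underscore. A simplified base-R sketch with a flat, made-up list (`regmatches()` stands in for `str_extract()` here):

```r
# Made-up stand-in for the cleaned list: two cases in one issue, three in another
cases <- list("2012_March_" = c("case a", "case b"),
              "2012_June_"  = c("case c", "case d", "case e"))

# unlist() appends 1, 2, ... to the name of each multi-value element
id <- names(unlist(cases))

# First run of digits starting with "20" is the year; first run of letters is the month
year  <- regmatches(id, regexpr("20[0-9]+", id))
month <- regmatches(id, regexpr("[A-Za-z]+", id))

print(data.frame(id, year, month))
```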

Let’s take a look to see what our data frame looks like right now:

id: 2012_March_1
name: NA
gender: NA
text:
Member: C. Robert Clements Registration No: 266425Decision: Revoked

A Discipline Committee panel ordered the Registrar to revoke the certificate of C. Robert Clements for sexually assaulting a 15-year-old male.

Clements, who joined the teaching profession in June 1964, did not attend the August 25, 2011 hearing and was not represented.

The panel heard evidence that, in January 2009, Clements invited the boy to his home where he fondled the boy. Subsequently, Clements was found guilty of sexual assault in criminal court and sentenced to 12 months probation. As well, he was ordered not to contact the boy or his family, not to be in the presence of anyone under 16 except with the child’s parent(s) present, not to possess any weapons for 10 years and to attend treatment and counselling.

Having considered the evidence, the Discipline Committee panel found the member guilty of professional misconduct and directed the Registrar to revoke his Certificate of Qualification and Registration.

“The member engaged in inappropriate and unprofessional conduct with the student while that student was under his care and supervision,” the panel said. “The member’s conduct is disgraceful and unbecoming a member of the profession. Revocation is the appropriate penalty for misconduct of this severity and protects the public interest.”

year: 2012
month: March

Perfect. That’s exactly what we are looking for (namely a structured view of the unstructured entries in Professionally Speaking).

Let’s continue our processing steps to extract the name of the teacher:

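## Extract name
# First, add a space where "Registration" or "Decision" is fused to the
# preceding letter or digit (e.g. "266425Decision"). Restricting the pattern
# to these two headers avoids splitting names such as "McDonald".
url.data.df$text <- url.data.df$text %>%
        str_replace_all("([0-9a-zA-Z])(Registration|Decision)", "\\1 \\2")

# Next, add a space after any ":" that is not already followed by whitespace
url.data.df$text <- url.data.df$text %>%
        str_replace_all(":(?=\\S)", ": ")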


# Looking for all words after "Member: " and before "Registration|Decision"
url.data.df$name <- url.data.df$text %>%
        str_extract("(?s)(?<=Member: )(.*?)(?= (Reg|Dec))") %>%
        textclean::replace_white()

The name column is now filled in:

id: 2012_March_1
name: C. Robert Clements
gender: NA
text:
Member: C. Robert Clements Registration No: 266425 Decision: Revoked

A Discipline Committee panel ordered the Registrar to revoke the certificate of C. Robert Clements for sexually assaulting a 15-year-old male.

Clements, who joined the teaching profession in June 1964, did not attend the August 25, 2011 hearing and was not represented.

The panel heard evidence that, in January 2009, Clements invited the boy to his home where he fondled the boy. Subsequently, Clements was found guilty of sexual assault in criminal court and sentenced to 12 months probation. As well, he was ordered not to contact the boy or his family, not to be in the presence of anyone under 16 except with the child’s parent(s) present, not to possess any weapons for 10 years and to attend treatment and counselling.

Having considered the evidence, the Discipline Committee panel found the member guilty of professional misconduct and directed the Registrar to revoke his Certificate of Qualification and Registration.

“The member engaged in inappropriate and unprofessional conduct with the student while that student was under his care and supervision,” the panel said. “The member’s conduct is disgraceful and unbecoming a member of the profession. Revocation is the appropriate penalty for misconduct of this severity and protects the public interest.”

year: 2012
month: March
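
To make the name extraction concrete, here is a base-R sketch run on the header line of the first case. It is one way to reproduce the step, not necessarily identical to the stringr calls; `gsub()` and `regmatches()` stand in for `str_replace_all()` and `str_extract()`, and the spacing pattern is my own choice:

```r
header <- "Member: C. Robert Clements Registration No: 266425Decision: Revoked"

# Separate "Registration" / "Decision" from a preceding letter or digit
spaced <- gsub("([0-9a-zA-Z])(Registration|Decision)", "\\1 \\2", header)

# Lazily capture everything between "Member: " and the next " Reg" or " Dec"
name <- regmatches(spaced,
                   regexpr("(?<=Member: ).*?(?= (Reg|Dec))", spaced, perl = TRUE))

print(spaced)
print(name)
```

The lazy `.*?` matters: a greedy `.*` would run on to the last " Dec" in the line and pull the registration number into the name.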

Let’s now extract the “Decision”:

## Extract Decision
# Looking for all words after "Decision: " and before the first " <br>" paragraph break
url.data.df$decision <- url.data.df$text %>%
        str_extract("(?s)(?<=Decision: )(.*?)(?= <br>)")

The data frame now has a decision column:

id: 2012_March_1
name: C. Robert Clements
gender: NA
text:
Member: C. Robert Clements Registration No: 266425 Decision: Revoked

A Discipline Committee panel ordered the Registrar to revoke the certificate of C. Robert Clements for sexually assaulting a 15-year-old male.

Clements, who joined the teaching profession in June 1964, did not attend the August 25, 2011 hearing and was not represented.

The panel heard evidence that, in January 2009, Clements invited the boy to his home where he fondled the boy. Subsequently, Clements was found guilty of sexual assault in criminal court and sentenced to 12 months probation. As well, he was ordered not to contact the boy or his family, not to be in the presence of anyone under 16 except with the child’s parent(s) present, not to possess any weapons for 10 years and to attend treatment and counselling.

Having considered the evidence, the Discipline Committee panel found the member guilty of professional misconduct and directed the Registrar to revoke his Certificate of Qualification and Registration.

“The member engaged in inappropriate and unprofessional conduct with the student while that student was under his care and supervision,” the panel said. “The member’s conduct is disgraceful and unbecoming a member of the profession. Revocation is the appropriate penalty for misconduct of this severity and protects the public interest.”

year: 2012
month: March
decision: Revoked

Let’s find the gender of the teacher:

# Classify as "male" if masculine pronouns outnumber feminine ones in the text;
# otherwise "female" (ties and texts without pronouns fall back to "female")
is_male <- function(txt) {
        tok <- tokenizers::tokenize_words(txt) %>%
                unlist()
        sum.he <- sum(tok %in% c("he", "him", "himself"))
        sum.she <- sum(tok %in% c("she", "her", "herself"))

        if (sum.he > sum.she) {
                return("male")
        } else {
                return("female")
        }
}

# map_chr() keeps `gender` a character column rather than a list column
url.data.df$gender <- url.data.df$text %>%
        map_chr(is_male)
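
One caveat with this heuristic is that it counts pronouns across the whole summary, including those that refer to students, so a text about a male teacher and a female student can tip the other way. A base-R sketch of the counting step shows this. It uses a simple regex tokenizer in place of `tokenizers::tokenize_words()`, and both the `count_pronouns` helper and the sample sentences are invented for illustration:

```r
# Hypothetical helper: count masculine and feminine pronouns in one text
count_pronouns <- function(txt) {
        tok <- strsplit(tolower(txt), "[^a-z]+")[[1]]
        c(he  = sum(tok %in% c("he", "him", "himself")),
          she = sum(tok %in% c("she", "her", "herself")))
}

clear_case <- "He did not attend the hearing. The panel found that he acted alone."
mixed_case <- "He sent her messages and gave her gifts."

print(count_pronouns(clear_case))
print(count_pronouns(mixed_case))
```

In the second sentence the feminine count is higher even though the subject is "He", so the rule above would label it "female". Spot-checking a sample of the labels by hand is a reasonable safeguard.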
