5.2 Exercise: Visualization of Discussion Interactions
Course instructors may find it helpful to visualize student discussion interactions as network diagrams. In this exercise, we will learn how to draw a network diagram in R using Canvas discussion data. Please refer to the Course Resources page for detailed instructions on installing R.
Part 1: Download Canvas discussion interaction data
Please install the Get Discussion Data userscript and use it to download the data:
- Install a browser add-on: Greasemonkey for Firefox or Tampermonkey for Chrome/Safari. Please skip this step if you have already installed the add-on previously.
- Install the Get Discussion Data userscript.
- Log in to Canvas, go to a course, navigate to the Discussions page, scroll to the bottom of the page, and click the "Get Discussion Entries" button to download the discussion data.
Open the downloaded file in Excel and create an edge list from the discussion data with four fields: from (reply_author), to (original_thread_author), weight (word_counts), and group (optional). Save it in CSV format as 'discussion.csv'.
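The same edge list can also be built in R instead of Excel. The following is a minimal sketch, assuming the downloaded export has columns named reply_author, original_thread_author, and word_counts. These names are taken from the field list above and may differ in your file, and the three sample rows are made up for illustration.

```r
# Hypothetical stand-in for the downloaded discussion export.
# In practice you would read your own file with read.csv().
discussion <- data.frame(
  reply_author           = c("studentA", "studentB", "studentC"),
  original_thread_author = c("studentB", "studentC", "studentA"),
  word_counts            = c(12, 16, 30),
  stringsAsFactors       = FALSE
)

# Keep only the columns the edge list needs and rename them.
edge_list <- data.frame(
  from   = discussion$reply_author,
  to     = discussion$original_thread_author,
  weight = discussion$word_counts,
  stringsAsFactors = FALSE
)

# Drop rows with a missing author, e.g. posts that reply to no one.
edge_list <- edge_list[!is.na(edge_list$from) & !is.na(edge_list$to), ]

# Save under the file name used in this exercise.
write.csv(edge_list, "discussion.csv", row.names = FALSE)
```

The optional group column is left out here; it could be added as a fourth column in the same way.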
If you don't have a Canvas course that contains discussion activities, you may download the sample data discussion.csv to experiment.
Please load the discussion.csv you created or the sample data we provided in the app below, and a visualization of discussion interactions will be generated for you:
Part 2: Draw a network diagram in R using the edge list file
If you would like to experiment with building a network diagram in R, please install R and the igraph package. Please refer to the Course Resources page for detailed instructions on installing R.
install.packages("igraph") # download the igraph package from CRAN and install it automatically
library(igraph) #Load the igraph package in R
Download the sample data sampledataset.zip, extract it, and load the links.csv file in R.
setwd("C:/.../...") # set the working directory to the folder that contains links.csv
edge=read.csv("links.csv", header=TRUE) # load the edgelist to R
el=as.matrix(edge) # coerces the data into a two-column matrix format that igraph likes
el[,1]=as.character(el[,1]) # define values in the first column as characters
el[,2]=as.character(el[,2]) # define values in the second column as characters
g=graph_from_edgelist(el,directed=TRUE) # turns the edgelist into a 'graph object'
# draw the discussion interaction diagram
plot.igraph(g,                     # the graph to be plotted
  layout=layout_with_fr,           # the Fruchterman-Reingold layout; see the igraph documentation for details
  main='discussion interaction',   # the title
  vertex.label.dist=0.5,           # puts the name labels slightly off the dots
  vertex.frame.color='blue',       # the color of the border of the dots
  vertex.label.color='black',      # the color of the name labels
  vertex.label.font=2,             # the font of the name labels (2 = bold)
  vertex.label=V(g)$name,          # the labels of the vertices; here the 'name' attribute is used
  vertex.label.cex=1,              # the font size of the labels; can also be made to vary
  vertex.size=degree(g)*5          # the size of each node, here five times its degree
)
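Before plotting, it can help to check that the graph was built as expected. The sketch below rebuilds the graph from the rows of Table 1, typed in by hand rather than read from links.csv (an assumption made so that the example is self-contained), and prints the counts that drive vertex.size in the plot above.

```r
library(igraph)

# Rows of Table 1, entered by hand so the example does not need links.csv.
from <- c("studentA", "studentA", "studentA", "studentB", "studentB",
          "studentB", "studentC", "studentC", "studentD", "studentE",
          "studentA", "studentA", "studentA", "instructorA")
to   <- c("studentB", "studentC", "studentE", "studentF", "studentC",
          "studentE", "studentD", "studentE", "studentE", "instructorA",
          "studentF", "studentC", "studentE", "studentE")
el <- cbind(from, to)

g <- graph_from_edgelist(el, directed = TRUE)

vcount(g)   # number of participants (7 for Table 1)
ecount(g)   # number of replies (14 for Table 1)

# degree() counts incoming plus outgoing edges by default;
# this is the value multiplied by 5 for vertex.size.
deg <- degree(g)
sort(deg, decreasing = TRUE)

# Replies received and replies given can be looked at separately.
in_deg  <- degree(g, mode = "in")
out_deg <- degree(g, mode = "out")
```

Using mode = "in" or mode = "out" for vertex.size instead of the default would emphasize who receives or who gives the most feedback.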
Part 3: Adding student attributes to the network diagram (Optional)
Course instructors may be curious whether there is a relationship between students' Canvas discussion interactions and their classwork performance. To include this information in a network diagram, we added student grade as an additional node attribute (Table 2) and the word count of each reply as a weight on each interaction (Table 3).
To experiment with node attributes in our analysis, we made up students' grades and converted each to one of two values: above the median or below the median. We also added a weight to each interaction that denotes the length of the reply. Graph 1 was generated by adding the reply lengths and student performance to the diagram.
Graph 1: Green denotes performance above the median, and red denotes performance below the median. The size of each node represents the number of interactions in both directions (replies given and received) for that node. The thickness of each arrow represents the number of academic words in that interaction.
Please use the sample data set to build a network diagram in R with the igraph package.
Table 1 - an edge list: An edge list is a two-column list of the pairs of nodes that are connected in a network.
discussion feedback provider | discussion feedback receiver |
studentA | studentB |
studentA | studentC |
studentA | studentE |
studentB | studentF |
studentB | studentC |
studentB | studentE |
studentC | studentD |
studentC | studentE |
studentD | studentE |
studentE | instructorA |
studentA | studentF |
studentA | studentC |
studentA | studentE |
instructorA | studentE |
Table 2 - nodes' attributes:
ID | performance |
studentA | above |
studentB | above |
studentC | below |
studentD | above |
studentE | below |
studentF | below |
Table 3 - a weighted edgelist:
provider | receiver | weight |
studentA | studentB | 12 |
studentA | studentC | 30 |
studentA | studentE | 20 |
studentB | studentF | 9 |
studentB | studentC | 16 |
studentB | studentE | 18 |
studentC | studentD | 10 |
studentC | studentE | 11 |
studentD | studentE | 7 |
studentA | studentF | 10 |
studentA | studentC | 30 |
studentA | studentE | 20 |
You may download the sample data set here: sampledataset.zip
After you have downloaded the data set, extract the three files. If you want to read the files from a specific location, you will need to set the working directory in R.
setwd("C:/.../...") # Set the working directory
edge=read.csv("links.csv", header=TRUE) # load the edgelist to R
el=as.matrix(edge) # coerces the data into a two-column matrix format that igraph likes
el[,1]=as.character(el[,1]) # define values in the first column as characters
el[,2]=as.character(el[,2]) # define values in the second column as characters
g=graph_from_edgelist(el,directed=TRUE) # turns the edgelist into a 'graph object'
# load the nodes file and add the performance attribute to the diagram
a=read.csv("nodes.csv", header=TRUE) # load the node attributes (ID and performance)
V(g)$performance=as.character(a$performance[match(V(g)$name,a$ID)]) # create a vertex attribute called "performance" by extracting the value of the column "performance" in the attributes file when the ID number matches the vertex name
V(g)$performance #print the new vertex attribute
V(g)$color=V(g)$performance #assign the "performance" attribute as the vertex color
V(g)$color=gsub("above","green",V(g)$color) # assign 'green' to nodes with above-median performance
V(g)$color=gsub("below","red",V(g)$color) # assign 'red' to nodes with below-median performance
# draw the diagram with the performance attribute for each student
plot.igraph(g,
  layout=layout_with_fr,
  main='discussion interactions',
  vertex.label.dist=0.5,
  vertex.frame.color='blue',
  vertex.label.color='black',
  vertex.label.font=2,
  vertex.label=V(g)$name,
  vertex.label.cex=1,
  vertex.size=degree(g)*5
)
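The two gsub() calls can be replaced by a lookup table, which also makes it easy to give a color to vertices that have no entry in the attributes file (instructorA does not appear in Table 2). This is a sketch with the rows of Table 2 and a few edges from Table 1 typed in by hand; the gray fallback color and the name performance_colors are assumptions, not part of the original exercise.

```r
library(igraph)

# A few edges from Table 1, including one that involves the instructor.
el <- rbind(c("studentA", "studentB"),
            c("studentB", "studentC"),
            c("studentE", "instructorA"))
g <- graph_from_edgelist(el, directed = TRUE)

# Table 2 entered by hand (stands in for nodes.csv).
a <- data.frame(
  ID          = c("studentA", "studentB", "studentC",
                  "studentD", "studentE", "studentF"),
  performance = c("above", "above", "below", "above", "below", "below"),
  stringsAsFactors = FALSE
)

# Same matching step as in the exercise; unmatched vertices get NA.
V(g)$performance <- a$performance[match(V(g)$name, a$ID)]

# Named vector used as a lookup table from performance to color.
performance_colors <- c(above = "green", below = "red")
cols <- unname(performance_colors[V(g)$performance])

# Vertices without a performance value (here the instructor) become gray.
cols[is.na(cols)] <- "gray"
V(g)$color <- cols

# Show the resulting name/color pairs.
data.frame(name = V(g)$name, color = V(g)$color)
```

Adding a third performance level later would then only require one more entry in performance_colors.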
# when using the weightedEdgeList file, turn the three-column list into a graph from a data frame; the third column becomes the edge attribute 'weight'
el=read.csv("weightedEdgeList.csv", header=TRUE)
g=graph_from_data_frame(el)
# draw the diagram with the weight attribute
plot.igraph(g,
  layout=layout_with_fr,
  main='discussion interaction - instructor is excluded',
  vertex.label.dist=0.5,
  vertex.frame.color='blue',
  vertex.label.color='black',
  vertex.label.font=2,
  vertex.label=V(g)$name,
  vertex.label.cex=1,
  vertex.size=degree(g)*5,
  edge.width=E(g)$weight/10        # the width of the edge for each interaction
)
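Because g is rebuilt from the weighted edge list, the last plot no longer carries the performance colors assigned earlier. One way to show both reply length and performance in a single diagram, in the spirit of Graph 1, is to pass the node table to graph_from_data_frame() through its vertices argument. The sketch below enters Tables 2 and 3 by hand instead of reading nodes.csv and weightedEdgeList.csv; the variable names edges and nodes and the plot title are assumptions.

```r
library(igraph)

# Table 3 entered by hand (stands in for weightedEdgeList.csv).
edges <- data.frame(
  provider = c("studentA", "studentA", "studentA", "studentB",
               "studentB", "studentB", "studentC", "studentC",
               "studentD", "studentA", "studentA", "studentA"),
  receiver = c("studentB", "studentC", "studentE", "studentF",
               "studentC", "studentE", "studentD", "studentE",
               "studentE", "studentF", "studentC", "studentE"),
  weight   = c(12, 30, 20, 9, 16, 18, 10, 11, 7, 10, 30, 20),
  stringsAsFactors = FALSE
)

# Table 2 entered by hand (stands in for nodes.csv).
nodes <- data.frame(
  ID          = c("studentA", "studentB", "studentC",
                  "studentD", "studentE", "studentF"),
  performance = c("above", "above", "below", "above", "below", "below"),
  stringsAsFactors = FALSE
)

# The first column of 'nodes' supplies the vertex names; the remaining
# columns become vertex attributes. Extra columns of 'edges' (here
# 'weight') become edge attributes.
g <- graph_from_data_frame(edges, directed = TRUE, vertices = nodes)

# Color by performance: green for above the median, red for below.
V(g)$color <- ifelse(V(g)$performance == "above", "green", "red")

plot(g,
     layout = layout_with_fr,
     main = "discussion interactions with performance and reply length",
     vertex.label.color = "black",
     vertex.size = degree(g) * 5,      # node size from number of interactions
     edge.width = E(g)$weight / 10)    # arrow thickness from word count
```

Note that every name in the edge list must appear in the node table when vertices is supplied, which holds here because Table 3 excludes the instructor.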