5.2 Exercise: Visualization of Discussion Interactions

Course instructors may find it helpful to visualize student discussion interactions as network diagrams. In this exercise, we will learn how to draw a network diagram in R using Canvas discussion data. Please refer to the Course Resources page for detailed instructions on installing R.

Part 1: Download Canvas discussion interaction data

Please install the Get Discussion Data userscript

Install a browser add-on: Greasemonkey for Firefox or Tampermonkey for Chrome/Safari. Skip this step if you have already installed the add-on.
Install the Get Discussion Data userscript.
Log in to Canvas, go to a course, navigate to the Discussions page, scroll to the bottom of the page, and click the "Get Discussion Entries" button.

Open the file in Excel and create an edge list from the discussion data with four fields: from (reply_author), to (original_thread_author), weight (word_counts), and group (optional). Save it in CSV format as 'discussion.csv'.
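If you prefer to build the edge list in R instead of Excel, the same reshaping can be sketched as follows. This is only an illustration: the column names reply_author, original_thread_author, and word_counts are assumed from the field names mentioned above and may differ in your export, and the three sample rows are typed in by hand. The output path uses a temporary folder so the sketch runs anywhere; point out_file at your own folder in practice.

```r
# Hypothetical stand-in for the exported discussion data.
# Column names are assumptions; check them against your own file.
raw <- data.frame(
  reply_author           = c("studentA", "studentB", "studentA"),
  original_thread_author = c("studentB", "studentC", "studentC"),
  word_counts            = c(12, 16, 30),
  stringsAsFactors = FALSE
)

# Rename the columns to the edge list fields: from, to, weight
edges <- data.frame(
  from   = raw$reply_author,
  to     = raw$original_thread_author,
  weight = raw$word_counts,
  stringsAsFactors = FALSE
)

# Drop rows that have no sender or no receiver
keep  <- !is.na(edges$from) & !is.na(edges$to) & edges$from != "" & edges$to != ""
edges <- edges[keep, ]

# Save as discussion.csv (written to a temporary folder in this sketch)
out_file <- file.path(tempdir(), "discussion.csv")
write.csv(edges, out_file, row.names = FALSE)
```

The optional group field is left out here; it could be added as a fourth column in the same way.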

If you don't have a Canvas course that contains discussion activities, you may download the sample data discussion.csv to experiment.

Please load the discussion.csv you created or the sample data we provided in the app below, and a visualization of discussion interactions will be generated for you:

Part 2: Draw a network diagram in R using the edge list file

If you would like to experiment with building a network diagram in R, please install R and the igraph package. Please refer to the Course Resources page for detailed instructions on installing R.

install.packages("igraph") #Download the igraph package from CRAN and install it automatically

library(igraph) #Load the igraph package in R

Download the sample data sampledataset.zip. Load the links.csv file in R.

setwd("C:/.../...") # set the working directory to the folder that contains links.csv

edge=read.csv("links.csv", header=TRUE) # load the edge list into R
el=as.matrix(edge) # coerce the data into the two-column matrix format that igraph expects
el[,1]=as.character(el[,1]) # treat values in the first column as characters
el[,2]=as.character(el[,2]) # treat values in the second column as characters
g=graph_from_edgelist(el,directed=TRUE) # turn the edge list into a 'graph object' (graph.edgelist is the older name of this function)
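Before plotting, it can help to confirm that the graph object contains what you expect. The sketch below rebuilds the edge list from Table 1 in memory instead of reading links.csv (whose contents are only assumed here to match Table 1) and inspects the result with igraph's vcount(), ecount(), and degree(). The name g_check is illustrative.

```r
library(igraph)

# Edge list typed in from Table 1 so that the example needs no file.
el <- matrix(c(
  "studentA",    "studentB",
  "studentA",    "studentC",
  "studentA",    "studentE",
  "studentB",    "studentF",
  "studentB",    "studentC",
  "studentB",    "studentE",
  "studentC",    "studentD",
  "studentC",    "studentE",
  "studentD",    "studentE",
  "studentE",    "instructorA",
  "studentA",    "studentF",
  "studentA",    "studentC",
  "studentA",    "studentE",
  "instructorA", "studentE"
), ncol = 2, byrow = TRUE)

g_check <- graph_from_edgelist(el, directed = TRUE)

vcount(g_check)                  # number of distinct participants
ecount(g_check)                  # number of interactions; repeated pairs stay separate edges
degree(g_check, mode = "out")    # replies given by each participant
degree(g_check, mode = "in")     # replies received by each participant
```

Note that degree() with its default mode counts incoming and outgoing edges together, which is what the vertex.size setting in the plot uses.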

#draw the discussion interaction diagram

plot.igraph(g, #the graph to be plotted

layout=layout_with_fr, #the layout method (Fruchterman-Reingold; layout.fruchterman.reingold is the older name). see the igraph documentation for details

main='discussion interaction', #specifies the title

vertex.label.dist=0.5, #puts the name labels slightly off the dots

vertex.frame.color='blue', #the color of the border of the dots

vertex.label.color='black', #the color of the name labels

vertex.label.font=2, #the font of the name labels

vertex.label=V(g)$name, #specifies the labels of the vertices. in this case the 'name' attribute is used

vertex.label.cex=1, #specifies the size of the font of the labels. can also be made to vary

vertex.size=degree(g)*5 #specifies the size of each node. can also be made to vary

)

Part 3: Adding student attributes to the network diagram (Optional)

Course instructors may be curious whether there is a relationship between students' Canvas discussion interactions and their classwork performance. To include this information in a network diagram, we added student grade as an additional node attribute (Table 2) and the word count of each reply as a weight on each interaction (Table 3).

To experiment with node attributes in our analysis, we made up students' grades and converted each one to a value of either above the median or below the median. We also added a weight to each interaction that denotes the length of the reply. Graph 1 was generated by adding the reply length and student performance to the diagram.

Graph 1: Green denotes a performance above the median, and red denotes a performance below the median. The size of each node represents the number of interactions, both given and received, for that node. The thickness of each arrow represents the number of academic words in the interaction.

Please use the sample data set to build a network diagram in R with the igraph package.

Table 1 - an edge list: a two-column list of the pairs of nodes that are connected in a network.

discussion feedback provider	discussion feedback receiver
studentA	studentB
studentA	studentC
studentA	studentE
studentB	studentF
studentB	studentC
studentB	studentE
studentC	studentD
studentC	studentE
studentD	studentE
studentE	instructorA
studentA	studentF
studentA	studentC
studentA	studentE
instructorA	studentE

Table 2 - nodes' attributes:

ID	performance
studentA	above
studentB	above
studentC	below
studentD	above
studentE	below
studentF	below

Table 3 - a weighted edgelist:

provider	receiver	weight
studentA	studentB	12
studentA	studentC	30
studentA	studentE	20
studentB	studentF	9
studentB	studentC	16
studentB	studentE	18
studentC	studentD	10
studentC	studentE	11
studentD	studentE	7
studentA	studentF	10
studentA	studentC	30
studentA	studentE	20

You may download the sample data set here: sampledataset.zip

After you download the data set, extract the three files. If you want to read the files from a specific location, you will need to set the working directory in R.

setwd("C:/.../...") # Set the working directory

#load the nodes file and add the performance attribute to the diagram

a= read.csv("nodes.csv", header=TRUE) #read the node attributes (ID and performance)

V(g)$performance=as.character(a$performance[match(V(g)$name,a$ID)]) # create a vertex attribute called "performance" by extracting the value of the column "performance" in the attributes file when the ID number matches the vertex name
V(g)$performance #print the new vertex attribute

V(g)$color=V(g)$performance #start from the "performance" attribute as the vertex color
V(g)$color=gsub("above","green",V(g)$color) #assign 'green' to nodes with above-median performance
V(g)$color=gsub("below","red",V(g)$color) #assign 'red' to nodes with below-median performance
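Any vertex that has no row in the nodes file ends up with a missing (NA) performance value, and therefore a missing color. If links.csv matches Table 1, that would be the case for instructorA. One possible way to handle this, sketched below with a small hand-typed vector rather than the real graph, is a named lookup vector with a fallback color. The names performance, color_map, and vertex_color, and the choice of gray, are illustrative assumptions.

```r
# Hypothetical performance values for three vertices; the last one is missing
performance <- c("above", "below", NA)

# Named lookup vector: performance level -> color
color_map <- c(above = "green", below = "red")

# Look up each value; entries without a match come back as NA
vertex_color <- unname(color_map[performance])

# Give vertices without performance data a neutral fallback color
vertex_color[is.na(vertex_color)] <- "gray"

vertex_color   # "green" "red" "gray"
```

With a real graph, the same lookup could be applied to V(g)$performance and the result assigned to V(g)$color.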

#draw the diagram with performance attribute for each student

plot.igraph(g, layout=layout_with_fr, main='discussion interactions', vertex.label.dist=0.5, vertex.frame.color='blue', vertex.label.color='black', vertex.label.font=2, vertex.label=V(g)$name, vertex.label.cex=1, vertex.size=degree(g)*5) #the vertex fill color is taken from V(g)$color

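#when using the weightedEdgeList file, we need to turn the three-column list into a graph object; the third column becomes the edge attribute "weight"

el= read.csv("weightedEdgeList.csv", header=TRUE)

g=graph_from_data_frame(el) #graph.data.frame is the older name of this function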

#draw the diagram with weight attribute

plot.igraph(g, layout=layout_with_fr, main='discussion interaction – instructor is excluded', vertex.label.dist=0.5, vertex.frame.color='blue', vertex.label.color='black', vertex.label.font=2, vertex.label=V(g)$name, vertex.label.cex=1, vertex.size=degree(g)*5,
edge.width=E(g)$weight/10) #specifies the width of the edge for each interaction
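Because g is rebuilt from the weighted edge list, the performance and color attributes assigned earlier are not carried over to the new graph. One way to get both the colors and the edge weights into a single diagram, as described for Graph 1, is to pass the node table to graph_from_data_frame() through its vertices argument. The sketch below types Tables 2 and 3 in by hand so that it runs without the sample files; the names nodes, edges, and g2 are illustrative, and this is not necessarily how Graph 1 was originally produced.

```r
library(igraph)

# Table 2: node attributes
nodes <- data.frame(
  ID          = c("studentA", "studentB", "studentC", "studentD", "studentE", "studentF"),
  performance = c("above", "above", "below", "above", "below", "below"),
  stringsAsFactors = FALSE
)

# Table 3: weighted edge list
edges <- data.frame(
  provider = c("studentA", "studentA", "studentA", "studentB", "studentB", "studentB",
               "studentC", "studentC", "studentD", "studentA", "studentA", "studentA"),
  receiver = c("studentB", "studentC", "studentE", "studentF", "studentC", "studentE",
               "studentD", "studentE", "studentE", "studentF", "studentC", "studentE"),
  weight   = c(12, 30, 20, 9, 16, 18, 10, 11, 7, 10, 30, 20),
  stringsAsFactors = FALSE
)

# The first column of 'vertices' is used as the vertex name;
# the remaining columns become vertex attributes.
g2 <- graph_from_data_frame(edges, directed = TRUE, vertices = nodes)

# Map performance to a fill color
V(g2)$color <- ifelse(V(g2)$performance == "above", "green", "red")

plot(g2,
     layout = layout_with_fr,
     main = "discussion interactions with performance and weight",
     vertex.label.dist = 0.5,
     vertex.label.color = "black",
     vertex.size = degree(g2) * 5,      # node size from the number of interactions
     edge.width = E(g2)$weight / 10)    # arrow thickness from the reply length
```

Every name that appears in the edge list must also appear in the node table, otherwise graph_from_data_frame() stops with an error; this is why the instructor, who has no row in Table 2, is left out here.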