Collaboratory data can be exported into a .csv (comma separated value) file for use beyond Collaboratory. These fields can be saved locally on your computer, then edited by a spreadsheet program like Microsoft Excel, and are also commonly accepted as an import file by other common programs/services like SPSS, Stata, or Microsoft Access.
How to Export Data
Administrators can export data for each content type collected via Collaboratory: Activities, Courses, Units, Organizations, and Institutional Profiles. To export data, select Administration | Reports, then select Download all Your Data at the bottom of the page.
Here, Administrators may select Export to export data for each content type as needed.
Interpreting Export Files
Data exports are organized by categories presented in Collaboratory such as activities, courses, units, and community organizations. Because Collaboratory is built using a relational database, export file data can be connected between categories by referring to the provided 32-character ID fields and mapping those to the corresponding ID keys in the related data.
- For example, in an activity export file, corresponding Courses will be listed as alphanumeric IDs under the Courses column. By exporting the Courses data and matching IDs, you can connect course information to their specific activities.
Understanding and Unpacking Relational Data
Our database stores data in graph form. While this has many advantages, it makes preparing and working with a data export tricky due to its complex structure. The remainder of this article is meant to give some insight into how to work with these data exports in more sophisticated ways.
The data exports have one instance per row - for example, the Activities export identifies one activity per row, the Community Organization export identifies one community organization per row, etc. The challenge is that Collaboratory data also includes a number of one-to-many relationships. The simplest case is when data has been entered through a multi-select checkbox. For example, a single activity can serve multiple populations. In this case, these populations are simply listed as comma separated values within a single cell.
In a more complex case, there may be relationships between entities where one will need to identify attributes from entities across the separate downloads. For example, notice that activities information and community organizations information are held in separate downloads. A single activity can be affiliated with multiple community organizations, and we may need to join data across the two downloads to grab columns from a particular activity and its affiliated community organization(s).
Using SQL to join tables
These types of relationships are stored as an array of UUIDs, which can be unnested to build a more standard relational or “lookup” table:
This can now be merged with the community organizations download by joining on community_org_id.
Some cases are even trickier. For example, community orgs may have multiple roles that are specific and unique to each activity. This means that if an activity is associated with more than one community org, the roles will be held as a nested array - a list of lists. The first step is to identify that these individual lists are delimited from the others by the character string:
Therefore we can parse this into individual rows like we did in the example above using SQL:
Now we have all the data we need, although in a slightly messy format. Since the roles are just a text string, we can do some cleanup using a bit of regex:
Finally, putting everything together, we have: