4.2. Data Delivered in Data Packages#

The edX data package that data czars download from Amazon S3 consists of a set of compressed and encrypted files that contain event logs and database snapshots for all of their organization’s edx.org and edge.edx.org courses.

Course-specific data is also available to the members of individual course teams. Users who are assigned the Admin or Staff role for the course can view and download data from the instructor dashboard in their live courses and from edX Insights. The data available to course teams from these applications is a subset of the data available in the data packages. For more information, see Building and Running an edX Course and Overview of EdX Insights.

4.2.1. Data Package Files#

A data package consists of different files that contain event data and database data.

Note

In all file names, the date is in {YYYY}-{MM}-{DD} format.

You download these files from different Amazon S3 “buckets” and folders. See Amazon S3 Buckets and Folders.

4.2.1.1. Event Data#

The {org}-{site}-events-{date}.log.gz.gpg file contains a daily log of course events. A separate file is available for courses running on edge.edx.org (with “edge” for {site} in the file name) and on edx.org (with “prod” for {site}).

For a partner organization named UniversityX, these daily files are identified by the organization name, the edX site name, and the date. For example, universityx-edge-events-2014-07-25.log.gz.gpg.
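
As an illustration of this naming pattern, the following Python sketch builds and parses event file names. The function names are hypothetical and are not part of any edX tool. The sketch assumes that {site} is always either "edge" or "prod", as described above.

```python
import re
from datetime import date

# Pattern for {org}-{site}-events-{date}.log.gz.gpg, with the date as YYYY-MM-DD.
EVENT_FILE_RE = re.compile(
    r"^(?P<org>.+)-(?P<site>edge|prod)-events-"
    r"(?P<date>\d{4}-\d{2}-\d{2})\.log\.gz\.gpg$"
)


def event_file_name(org: str, site: str, day: date) -> str:
    """Build the name of the daily event file for one organization and site."""
    if site not in ("edge", "prod"):
        raise ValueError(f"unexpected site: {site!r}")
    return f"{org}-{site}-events-{day.isoformat()}.log.gz.gpg"


def parse_event_file_name(name: str) -> dict:
    """Split an event file name into its organization, site, and date."""
    match = EVENT_FILE_RE.match(name)
    if match is None:
        raise ValueError(f"not an event file name: {name!r}")
    return {
        "org": match["org"],
        "site": match["site"],
        "date": date.fromisoformat(match["date"]),
    }


print(event_file_name("universityx", "edge", date(2014, 7, 25)))
```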

Each of these compressed files can range in size from hundreds of kilobytes to tens of megabytes. An extracted file is approximately 20 times larger than its compressed form. As a result, multiple gigabytes of space might be needed to store the tracking logs for a year.
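
A rough storage estimate can be worked out from these figures. The following sketch assumes that the approximate 20x expansion applies uniformly to every file and uses an illustrative daily file size of 1 MB. Actual sizes vary by organization, so treat the result as an order-of-magnitude guide only.

```python
def estimated_extracted_bytes(compressed_bytes_per_day, days=365, expansion_factor=20):
    """Estimate the space needed to store extracted daily logs.

    expansion_factor defaults to 20, the approximate ratio between the
    extracted and compressed sizes of an event file.
    """
    return compressed_bytes_per_day * expansion_factor * days


# Illustrative input only: a 1 MB compressed log every day for one year.
one_year = estimated_extracted_bytes(1_000_000)
print(f"{one_year / 1e9:.1f} GB")  # prints "7.3 GB"
```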

For information about the contents of these files, see Data Package Contents.

4.2.1.2. Database Data#

The {org}-{date}.zip file contains views on database tables. This file includes data as of the time of the export, for all of an organization’s courses on both the edx.org and edge.edx.org sites. A new file is available every week, representing the database at that point in time.

For a partner organization named UniversityX, each weekly file is identified by the organization name and its extraction date: for example, universityx-2013-10-27.zip.
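
Because each weekly file carries its extraction date in the name, the most recent snapshot can be selected from a list of file names. The following is a minimal sketch; the function name and the sample list are hypothetical, and the list of names is assumed to come from whatever tool you use to browse Amazon S3.

```python
import re


def latest_snapshot(file_names, org):
    """Return the newest {org}-{date}.zip name in file_names, or None."""
    pattern = re.compile(rf"^{re.escape(org)}-(\d{{4}}-\d{{2}}-\d{{2}})\.zip$")
    dated = []
    for name in file_names:
        match = pattern.match(name)
        if match:
            # YYYY-MM-DD strings sort chronologically when compared as text.
            dated.append((match.group(1), name))
    return max(dated)[1] if dated else None


names = [
    "universityx-2013-10-20.zip",
    "universityx-2013-10-27.zip",
    "otherorg-2013-11-03.zip",
]
print(latest_snapshot(names, "universityx"))  # prints "universityx-2013-10-27.zip"
```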

Compressed, these files can range in size from hundreds of megabytes to tens of gigabytes. An extracted file is approximately 20 times larger than its compressed form. As a result, institutions that receive data for several courses for several years might require from tens to hundreds of gigabytes of space for data storage.

For information about the contents of this file, see Data Package Contents.

4.2.2. Amazon S3 Buckets and Folders#

Data package files are located at the following Amazon S3 destinations:

  • The s3://edx-course-data/{org} folder contains the daily {org}-{site}-events-{date}.log.gz.gpg files of course event data.

  • The s3://course-data bucket contains the weekly {org}-{date}.zip database snapshot.

  • The s3://course-data/email-opt-in folder contains the report listing learners who have consented to be contacted by email.

For information about accessing Amazon S3, see Access Amazon S3.

4.2.3. Download Data Packages from Amazon S3#

You download the files in your data package from the Amazon S3 storage service.

4.2.3.1. Download Daily Event Files#

  1. To download daily event files, use the AWS Command Line Interface or a third-party tool to connect to the s3://edx-course-data/{org} folder on Amazon S3.

    For information about providing your credentials to connect to Amazon S3, see Access Amazon S3.

  2. Navigate within s3://edx-course-data/{org} to locate the files that you want:

    {org}/{site}/events/{year}

    The event logs in the {year} folder are in compressed, encrypted files named {org}-{site}-events-{date}.log.gz.gpg.

  3. Download the {org}-{site}-events-{date}.log.gz.gpg file.

    If your organization has courses running on both edx.org and edge.edx.org, separate log files are available for the “prod” site and the “edge” site. Repeat this step to download the file for the other site.
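
The steps above can be sketched in Python as follows. The sketch only builds the Amazon S3 locations and the matching `aws s3 cp` argument lists; it does not run them. It assumes that the {org}/{site}/events/{year} path is relative to the root of the edx-course-data bucket, and the function names are hypothetical. Verify the folder layout for your organization before relying on it.

```python
from datetime import date

EVENTS_BUCKET = "s3://edx-course-data"


def events_folder_uri(org, site, year):
    """Folder that holds one site's event files for one year (assumed layout)."""
    return f"{EVENTS_BUCKET}/{org}/{site}/events/{year}/"


def event_file_uri(org, site, day):
    """Full location of the event file for one site and one day."""
    name = f"{org}-{site}-events-{day.isoformat()}.log.gz.gpg"
    return events_folder_uri(org, site, day.year) + name


def download_commands(org, day, sites=("prod", "edge"), dest="."):
    """One `aws s3 cp` argument list per site, for use with subprocess.run."""
    return [["aws", "s3", "cp", event_file_uri(org, site, day), dest] for site in sites]


for command in download_commands("universityx", date(2014, 7, 25)):
    print(" ".join(command))
```

Returning argument lists rather than running the commands keeps the sketch free of side effects. Pass each list to `subprocess.run` only after your AWS credentials are configured.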

4.2.3.2. Download Weekly Database Files#

Note

If you are using a third-party tool to connect to Amazon S3, you might not be able to navigate directly between the s3://course-data bucket and the s3://edx-course-data/{org} folder. You might need to disconnect from Amazon S3 and then reconnect to the other destination.

  1. To download a weekly database data file, connect to the edX s3://course-data bucket on Amazon S3 using the AWS Command Line Interface or a third-party tool.

    For information about providing your credentials to connect to Amazon S3, see Access Amazon S3.

  2. Download the {org}-{date}.zip database data file from the s3://course-data bucket.

4.2.3.3. Download the Learner Email Opt-in Report#

  1. To download the report listing learners who have consented to be contacted by email, connect to the edX s3://course-data bucket on Amazon S3 using the AWS Command Line Interface or a third-party tool.

    For information about providing your credentials to connect to Amazon S3, see Access Amazon S3.

  2. Navigate within the s3://course-data bucket to the s3://course-data/email-opt-in folder.

  3. Download the {org}-{date}.zip file from the s3://course-data/email-opt-in folder.
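
The two downloads from the s3://course-data bucket differ only in the folder. As a hedged sketch, with hypothetical function names, the following Python code builds both locations and an `aws s3 cp` argument list for each. It does not run the commands.

```python
COURSE_DATA_BUCKET = "s3://course-data"


def database_zip_uri(org, date_str):
    """Location of the weekly database snapshot; date_str is YYYY-MM-DD."""
    return f"{COURSE_DATA_BUCKET}/{org}-{date_str}.zip"


def email_opt_in_zip_uri(org, date_str):
    """Location of the learner email opt-in report; date_str is YYYY-MM-DD."""
    return f"{COURSE_DATA_BUCKET}/email-opt-in/{org}-{date_str}.zip"


def aws_cp_command(uri, dest="."):
    """Argument list for `aws s3 cp`, suitable for subprocess.run."""
    return ["aws", "s3", "cp", uri, dest]


print(" ".join(aws_cp_command(database_zip_uri("universityx", "2013-10-27"))))
print(" ".join(aws_cp_command(email_opt_in_zip_uri("universityx", "2013-10-27"))))
```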

4.2.4. Data Package Contents#

Each of the files you download contains one or more files of research data.

4.2.4.1. Extracted Contents of {org}-{site}-events-{date}.log.gz.gpg#

The {org}-{site}-events-{date}.log.gz.gpg file contains all event data for courses on a single edX site for one 24-hour period. After you download this file for your institution, complete these steps.

  1. Use your private key to decrypt the file. See Decrypt an Encrypted File.

  2. Extract the log file from the compressed .gz file. The result is a single file named {org}-{site}-events-{date}.log. (Alternatively, you can decompress the data as a stream using a tool such as gzip.)

For more information about the events in this file, see Events in the Tracking Logs.
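
The two steps can be sketched in Python as shown below. The first function only builds a `gpg` argument list and assumes that GnuPG is installed and that your private key is already in its keyring. The second reads a decrypted .log.gz file as a stream, without writing the extracted log to disk. Both function names are hypothetical.

```python
import gzip


def gpg_decrypt_command(encrypted_path):
    """Argument list that decrypts FILE.gpg to FILE with GnuPG."""
    if not encrypted_path.endswith(".gpg"):
        raise ValueError(f"expected a .gpg file: {encrypted_path!r}")
    output_path = encrypted_path[: -len(".gpg")]
    return ["gpg", "--output", output_path, "--decrypt", encrypted_path]


def iter_log_lines(gz_path):
    """Yield the lines of a decrypted .log.gz file, decompressing as it reads."""
    with gzip.open(gz_path, "rt", encoding="utf-8") as handle:
        for line in handle:
            yield line.rstrip("\n")
```

Streaming avoids storing the extracted file, which is approximately 20 times larger than the compressed one.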

4.2.4.2. Extracted Contents of {org}-{date}.zip#

After you download the {org}-{date}.zip file for your institution, complete these steps.

  1. Extract the contents of the file. When you extract (or unzip) this file, all of the files that it contains are placed in the same directory. All of the extracted files end in .gpg, which indicates that they are encrypted.

  2. Use your private key to decrypt the extracted files. See Decrypt an Encrypted File.

The result of extracting and decrypting the {org}-{date}.zip file is the following set of .sql, .csv, .json, and .mongo files. Note that the .sql files are tab separated.
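
As a minimal sketch of working with these files in Python, the first function below extracts a downloaded package and lists the encrypted files it contains, and the second reads a decrypted, tab-separated .sql file into rows. The function names are hypothetical. The reader assumes that fields are not quoted and makes no assumption about whether the first row is a header.

```python
import csv
import zipfile
from pathlib import Path


def extract_package(zip_path, dest_dir):
    """Extract the package and return the relative paths of its .gpg files."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
    root = Path(dest_dir)
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*.gpg"))


def read_tab_separated(sql_path):
    """Read a decrypted .sql file as a list of rows, splitting on tabs."""
    with open(sql_path, newline="", encoding="utf-8") as handle:
        # QUOTE_NONE treats quotation marks as ordinary characters (assumption).
        return list(csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE))
```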

4.2.4.2.1. {org}-{course}-{run}-auth_user-{site}-analytics.sql#

Information about the users who are authorized to access the course. See Columns in the auth_user Table.

4.2.4.2.2. {org}-{course}-{run}-auth_userprofile-{site}-analytics.sql#

Demographic data provided by users during site registration. See Columns in the auth_userprofile Table.

4.2.4.2.3. {org}-{course}-{run}-certificates_generatedcertificate-{site}-analytics.sql#

The final grade and certificate status for learners (populated after course completion). See Columns in the certificates_generatedcertificate Table.

4.2.4.2.4. {org}-{course}-{run}-course_structure-{site}-analytics.json#

This file documents the structure of a course at a point in time. The file includes data for the course, including important dates, pages, and course-wide discussion topics. It also identifies each item of course content defined in the course outline. A separate file is included for each course on the site. For more information, see Course Content Data.

4.2.4.2.5. {org}-{course}-{run}-courseware_studentmodule-{site}-analytics.sql#

The courseware state for each learner, with a separate row for each item in the course content that the learner accesses. No file is produced for courses that do not have any records in this table (for example, recently created courses). See Columns in the courseware_studentmodule Table.

4.2.4.2.6. {org}-{course}-{run}-django_comment_client_role_users-{site}-analytics.sql#

This file lists the role that every enrolled user has for course discussions. See Columns in the django_comment_client_role_users Table.

4.2.4.2.7. {org}-{course}-{run}-student_courseaccessrole-{site}-analytics.sql#

This file reports the users who have a privileged role for the course. See Columns in the student_courseaccessrole Table.

4.2.4.2.8. {org}-{course}-{run}-student_courseenrollment-{site}-analytics.sql#

The enrollment status and type of enrollment selected by each learner in the course. See Columns in the student_courseenrollment Table.

4.2.4.2.9. {org}-{course}-{run}-student_languageproficiency-{site}-analytics.sql#

Indicates each learner’s self-reported language preference. See Columns in the student_languageproficiency Table.

4.2.4.2.10. {org}-{course}-{run}-teams_courseteam-{site}-analytics.sql#

Identifies the teams of learners established in a course that uses the teams feature. See Columns in the teams_courseteam Table.

4.2.4.2.11. {org}-{course}-{run}-teams_courseteammembership-{site}-analytics.sql#

In a course that uses the teams feature, this table indicates the learners who are members of each team. See Columns in the teams_courseteammembership Table.

4.2.4.2.12. {org}-{course}-{run}-user_api_usercoursetag-{site}-analytics.sql#

Metadata that describes different types of learner participation in the course. See Columns in the user_api_usercoursetag Table.

4.2.4.2.13. {org}-{course}-{run}-user_id_map-{site}-analytics.sql#

A mapping of user IDs to site-wide obfuscated IDs. See Columns in the user_id_map Table.

4.2.4.2.14. {org}-{course}-{run}-{site}.mongo#

The content and characteristics of course discussion interactions. See Discussion Forums Data.

4.2.4.2.15. ora Subdirectory#

The ora subdirectory contains SQL tables for data relating to any open response assessment (ORA) problems in your organization’s courses. For more information, see Open Response Assessment Data.

  • {org}-{course}-{run}-assessment_assessment-prod-analytics.sql.gpg

  • {org}-{course}-{run}-assessment_assessmentfeedback-prod-analytics.sql.gpg

  • {org}-{course}-{run}-assessment_assessmentfeedback_assessments-prod-analytics.sql.gpg

  • {org}-{course}-{run}-assessment_assessmentfeedback_options-prod-analytics.sql.gpg

  • {org}-{course}-{run}-assessment_assessmentfeedbackoption-prod-analytics.sql.gpg

  • {org}-{course}-{run}-assessment_assessmentpart-prod-analytics.sql.gpg

  • {org}-{course}-{run}-assessment_criterion-prod-analytics.sql.gpg

  • {org}-{course}-{run}-assessment_criterionoption-prod-analytics.sql.gpg

  • {org}-{course}-{run}-assessment_peerworkflow-prod-analytics.sql.gpg

  • {org}-{course}-{run}-assessment_peerworkflowitem-prod-analytics.sql.gpg

  • {org}-{course}-{run}-assessment_rubric-prod-analytics.sql.gpg

  • {org}-{course}-{run}-assessment_studenttrainingworkflow-prod-analytics.sql.gpg

  • {org}-{course}-{run}-assessment_studenttrainingworkflowitem-prod-analytics.sql.gpg

  • {org}-{course}-{run}-assessment_trainingexample-prod-analytics.sql.gpg

  • {org}-{course}-{run}-assessment_trainingexample_options_selected-prod-analytics.sql.gpg

  • {org}-{course}-{run}-submissions_score-prod-analytics.sql.gpg

  • {org}-{course}-{run}-submissions_scoresummary-prod-analytics.sql.gpg

  • {org}-{course}-{run}-submissions_studentitem-prod-analytics.sql.gpg

  • {org}-{course}-{run}-submissions_submission-prod-analytics.sql.gpg

  • {org}-{course}-{run}-workflow_assessmentworkflow-prod-analytics.sql.gpg

  • {org}-{course}-{run}-workflow_assessmentworkflowstep-prod-analytics.sql.gpg

4.2.4.2.16. {org}-{course}-{run}-student_anonymoususerid-prod-analytics.sql.gpg#

A mapping of user IDs to the course-specific anonymous IDs used by open response assessment tables. See Columns in the student_anonymoususerid Table.

4.2.4.2.17. {org}-{course}-{run}-wiki_article-{site}-analytics.sql#

Information about the articles added to the course wiki. See Fields in the wiki_article File.

4.2.4.2.18. {org}-{course}-{run}-wiki_articlerevision-{site}-analytics.sql#

Changes and deletions affecting course wiki articles. See Fields in the wiki_articlerevision File.
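
Because course and run identifiers can themselves contain hyphens, splitting a file name on "-" is unreliable. One possible approach, sketched below with a hypothetical function name, is to match the end of the name against the table names listed in this section. The table list covers only the top-level .sql files and omits the files in the ora subdirectory; the sample course and run values are invented for illustration.

```python
KNOWN_TABLES = (
    "auth_user",
    "auth_userprofile",
    "certificates_generatedcertificate",
    "courseware_studentmodule",
    "django_comment_client_role_users",
    "student_anonymoususerid",
    "student_courseaccessrole",
    "student_courseenrollment",
    "student_languageproficiency",
    "teams_courseteam",
    "teams_courseteammembership",
    "user_api_usercoursetag",
    "user_id_map",
    "wiki_article",
    "wiki_articlerevision",
)
SITES = ("prod", "edge")


def identify_table(file_name):
    """Return (course prefix, table, site) for a package file, or None."""
    base = file_name[: -len(".gpg")] if file_name.endswith(".gpg") else file_name
    for table in KNOWN_TABLES:
        for site in SITES:
            suffix = f"-{table}-{site}-analytics.sql"
            if base.endswith(suffix):
                # What remains is the {org}-{course}-{run} prefix.
                return base[: -len(suffix)], table, site
    return None


print(identify_table("universityx-CS101-2014_T1-auth_userprofile-prod-analytics.sql.gpg"))
```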

4.2.4.3. Extracted Contents of email-opt-in/{org}-{date}.zip#

After you download the email-opt-in/{org}-{date}.zip file for your institution, complete these steps.

  1. Extract the contents of the file. When you extract (or unzip) this file, the contents are a single file that ends in .gpg, which indicates that it is encrypted.

  2. Use your private key to decrypt the extracted file. See Decrypt an Encrypted File.

The result of extracting and decrypting the {org}-{date}.zip file is the following .csv file.

4.2.4.3.1. {org}-email_opt_in-{site}-analytics.csv#

This file reports the email preference selected by learners who are enrolled in any of your institution’s courses. See Email Opt-in Report.