You are required to carry out a series of analyses on publicly accessible datasets using R, the programming language used in this module, in a programming environment suitable for the task. It is recommended that you use at least two separate datasets. For each of the chosen datasets, you are required to compile a report of your analysis. Each dataset should have at least 1,000 records (rows). If you are unsure whether a dataset is appropriate, please check with your lecturer. You must provide evidence in your report that you are authorized to use the datasets that you have chosen.
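As a quick way to confirm the 1,000-record minimum before committing to a dataset, a check along the following lines may help. This is only a sketch: the helper name `has_enough_rows` and the stand-in data frames are hypothetical, and in practice you would load your own file (for example with `read.csv()`).

```r
# Hypothetical helper: TRUE when a data frame meets the minimum row count.
has_enough_rows <- function(df, min_rows = 1000) {
  stopifnot(is.data.frame(df))
  nrow(df) >= min_rows
}

# Stand-in data so the example runs on its own (not a real dataset).
small <- data.frame(id = 1:150)
large <- data.frame(id = 1:1200)

has_enough_rows(small)  # too few rows for this project
has_enough_rows(large)  # meets the minimum
```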
The main deliverable is a report that provides significant insights into the datasets that you have chosen to analyze. Your report should provide at least four unique insights based on your data analysis. Examples of insights might include relationships, trends/patterns, correlations, models based on the data, visuals, and statistical analyses.
All deliverables should be compiled into a project report document for submission, with all programming code elements included in an appendix. Please submit your report via the Turnitin upload link in Moodle. R scripts and additional files are to be uploaded to a separate link in Moodle. Your project report should discuss the challenges that you encountered while handling your chosen datasets and the means and mechanisms you implemented to overcome these challenges. The word count for your report should be no fewer than 2,000 words and no more than 2,500 words (not counting R code).
As with every piece of data analysis, you should ideally have a question or set of questions you expect your work to answer; these are your objectives. They will be graded for realism, imagination, ambition and clarity of expression.
Your chosen datasets should be included in their original form as ancillary files. If they are prohibitively large, you should include a well-chosen, representative subset. Where and how the datasets were located and downloaded should be clearly shown. They will be graded for richness, depth and interest factor.
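If a dataset is too large to submit whole, one possible way to draw a representative subset is to sample the same fraction of rows from each level of a grouping column. The sketch below uses base R only. The function name `stratified_subset`, the `region` column and the 10% fraction are illustrative assumptions, not requirements, and a different sampling scheme may suit your data better.

```r
# Hypothetical helper: sample the same fraction of rows from each group.
stratified_subset <- function(df, group_col, fraction, seed = 42) {
  stopifnot(is.data.frame(df), group_col %in% names(df),
            fraction > 0, fraction <= 1)
  set.seed(seed)  # fixed seed so the subset can be reproduced
  # Row indices for each group; drop = TRUE skips unused factor levels.
  row_groups <- split(seq_len(nrow(df)), df[[group_col]], drop = TRUE)
  keep <- unlist(lapply(row_groups, function(idx) {
    n_keep <- max(1, round(length(idx) * fraction))  # at least one row per group
    idx[sample.int(length(idx), n_keep)]
  }), use.names = FALSE)
  df[sort(keep), , drop = FALSE]  # preserve the original row order
}

# Made-up data standing in for a large dataset.
full <- data.frame(
  id = 1:1000,
  region = rep(c("north", "south", "east", "west"),
               times = c(400, 300, 200, 100))
)

subset_10 <- stratified_subset(full, "region", fraction = 0.10)
table(subset_10$region)  # group proportions match the full data
```

Recording the seed and the fraction in your report lets a reader regenerate the same subset from the original source.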
Your data analysis should be designed in advance and the design documented via description and visual aids such as tables, flowcharts, and other appropriate schematics. Please note that screenshots of your code do not generally count as visual aids and should be avoided unless there is a specific reason why they are appropriate.
There are 3 established approaches to data analysis
The results of your analysis should go as far as possible towards reaching your previously stated objectives (i.e. answering the questions you were hoping to answer). Note that a robust finding that the dataset or the analysis is insufficient to reach a specific conclusion is not a failure but is, in fact, a positive result!
You are required to use R to an extent that showcases your aptitude with the most important operations learned during the course (file I/O, control structures, functions, etc.). A substantial amount of code is expected.
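To illustrate the kinds of operations listed above, the following minimal sketch combines a user-defined function, an `if`/`else` chain, a `for` loop, and CSV file I/O in base R. The function `classify_value`, its thresholds and the made-up data are hypothetical. Your own code should be driven by your datasets and objectives and will need to be considerably more substantial.

```r
# Hypothetical function: classify a numeric value into a band using if/else.
classify_value <- function(x, low = 10, high = 20) {
  if (is.na(x)) {
    "missing"
  } else if (x < low) {
    "low"
  } else if (x <= high) {
    "medium"
  } else {
    "high"
  }
}

# File I/O: write a small made-up data frame to a temporary CSV, then read it back.
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:5, value = c(4, 12, NA, 25, 18)),
          csv_path, row.names = FALSE)
records <- read.csv(csv_path)

# Control structure: loop over the rows and apply the function to each value.
records$band <- character(nrow(records))
for (i in seq_len(nrow(records))) {
  records$band[i] <- classify_value(records$value[i])
}

print(records)
unlink(csv_path)  # remove the temporary file
```

The loop could equally be written as `vapply(records$value, classify_value, character(1))`; the explicit `for` loop is used here only to show the control structure.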
Your final project report, encompassing all the above, will be graded for structure, presentation and quality of its discussion of challenges. There is no requirement to structure your report as a scientific paper, though you are free to do so if you prefer.