Project Assignment 2
Proposal
Overview
For this part of the project, you will download the data that you discussed in your outline and wrangle it. “Wrangling” means cleaning the data and getting it into the proper format for analysis. Tableau and other commonly used data analysis applications need the data in long format: each row contains the values for one country or region in one year, and each column contains the data for one variable. You also need to filter the data for the years that you want to use and select the variables that you want to work with. If you are working with more than one dataset, you also have to merge the datasets. I show you how to do all of this using data from the World Bank and V-Dem in this video.
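If it helps to see the difference between wide and long format outside of Excel, here is a minimal sketch in Python with pandas. It is only an illustration, not the method used in class: the country names, the column name `gdp_growth`, and all values are made up.

```python
import pandas as pd

# Hypothetical wide-format data: one row per country, one column per year.
wide = pd.DataFrame(
    {
        "country": ["Country A", "Country B"],
        "2019": [1.0, 2.0],
        "2020": [1.5, 2.5],
    }
)

# Reshape to long format: one row per country per year.
long_df = wide.melt(id_vars="country", var_name="year", value_name="gdp_growth")

# The year labels were column headers (text), so convert them to integers.
long_df["year"] = long_df["year"].astype(int)
long_df = long_df.sort_values(["country", "year"]).reset_index(drop=True)

print(long_df)
```

The two year columns become a single `year` column, so the two-row wide table turns into a four-row long table with one value per country and year.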
Steps
- Download the data that you want to work with.
- Open the file in Excel’s Power Query Editor and see what the data look like.
- Are the data in long format or will you need to reshape them?
- How many years are there in the data and what will you need to filter out?
- Which variables do you want to work with?
If you want to display variables from more than one dataset in the same visualization (such as a scatter plot), you will need to merge the datasets.
Follow the steps discussed in class to clean the data and prepare it for analysis.
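For reference, the filter, select, and merge steps can also be sketched in Python with pandas. This is a rough illustration under stated assumptions, not a replacement for the Power Query steps from class: the two small tables, their column names (`gdp_per_capita`, `democracy_index`), and the year range 2000 to 2020 are all hypothetical.

```python
import pandas as pd

# Hypothetical long-format extracts; names and values are made up.
economic = pd.DataFrame(
    {
        "country": ["Country A", "Country A", "Country B", "Country B"],
        "year": [1999, 2020, 1999, 2020],
        "gdp_per_capita": [1.0, 2.0, 3.0, 4.0],
        "population": [10, 11, 20, 21],
    }
)
political = pd.DataFrame(
    {
        "country": ["Country A", "Country B", "Country C"],
        "year": [2020, 2020, 2020],
        "democracy_index": [0.4, 0.7, 0.8],
    }
)

# Filter the rows to the years of interest and keep only the needed columns.
in_range = economic["year"].between(2000, 2020)  # inclusive on both ends
economic_small = economic.loc[in_range, ["country", "year", "gdp_per_capita"]]

# Merge on country AND year. indicator=True adds a "_merge" column that
# flags rows found in only one of the two datasets.
merged = economic_small.merge(
    political, on=["country", "year"], how="outer", indicator=True
)

# Rows that did not match are worth inspecting: they often come from
# country names spelled differently in the two sources.
unmatched = merged[merged["_merge"] != "both"]
analysis = merged[merged["_merge"] == "both"].drop(columns="_merge")

print(unmatched[["country", "year"]])
print(analysis)
```

The outer merge is a deliberate choice here: it keeps the unmatched rows visible so you can decide whether to fix a country name or drop the row, rather than losing it silently.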
I recommend working with just a handful of variables for this assignment. You need at least two for the scatter plot. You can then choose whether you want to also display one or both of those in the other visualizations. If you have four visualizations in your report, then you should have a maximum of five variables. But just using two or three is also fine.
What to Submit
Submit a report in the form of a white paper that tells your boss your plan for creating a Tableau report. The white paper should include the following elements:
- Abstract (\(\approx\) 150 words)
- Problem statement (\(\approx\) 100 words)
- Background (\(\approx\) 250 words)
- Solution (\(\approx\) 250 words)
- Conclusion (\(\approx\) 100 words)
- Reference list (include at least five sources)
In the solution section, include a brief description of each variable in your dataset. Then describe in detail the visualizations you will build using these variables. The default number of visualizations is four: a line chart, a bar chart, a scatter plot, and a map. What will be on the x- and y-axes of each visualization? How many countries or regions will you include in each? You will probably want to include a selection of countries for the bar and line charts and all of the countries for the map. Please take this opportunity to also note any problems that you had in cleaning the data and any unresolved questions about your data. It is better to admit your limitations and try to find a solution than to simply turn in something that is not going to work.
Along with your report, please be sure to include a .csv or Excel file with your data. I want to make sure that you are ready to produce the visualizations in Power BI. You may even want to take some time to check that the data are ready for that as well.
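One way to check that a file is ready before you submit it is to look for duplicate country-year rows and missing values. The sketch below does this in Python with pandas. It is only an assumption about what "ready" might involve, and the table, the column name `gdp_per_capita`, and the values are hypothetical.

```python
import pandas as pd

# Hypothetical cleaned data with two deliberate problems:
# a repeated country-year pair and a missing value.
data = pd.DataFrame(
    {
        "country": ["Country A", "Country A", "Country A", "Country B"],
        "year": [2019, 2020, 2020, 2020],
        "gdp_per_capita": [1.0, 2.0, 2.0, None],
    }
)

# In long format, each country-year pair should appear exactly once.
is_duplicate = data.duplicated(subset=["country", "year"], keep=False)
print("Rows sharing a country-year pair:", int(is_duplicate.sum()))

# Count missing values per column so gaps are known before charting.
missing_per_column = data.isna().sum()
print(missing_per_column)

# Confirm the year range matches the filter you intended.
first_year, last_year = int(data["year"].min()), int(data["year"].max())
print("Years covered:", first_year, "to", last_year)

# Once the checks pass, the table could be saved with, for example:
# data.to_csv("my_data.csv", index=False)  # file name is a placeholder
```

Any duplicates or unexpected gaps that turn up here are good candidates for the problems and unresolved questions you report in the solution section.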
Don’t overdo it. The final length of your white paper should be approximately 4-5 pages, including references. No visualizations of the data are required at this time; however, you are welcome to include any preliminary visualizations in an appendix if you would like feedback on them.