Datasets to Learn and Grow With: Start with These
Exploring datasets from diverse fields gives you practical experience and insight as you start out in data analysis. Getting comfortable with datasets like these is an essential step towards becoming a competent data analyst.
The Paris 2024 Olympics dataset covers athlete performances, medal counts, and event outcomes from the recent Summer Games. It is well suited to historical analysis, although you will need to merge multiple CSV files to get a complete picture.
Similarly, datasets like the "100 Highest Paid Athletes" or "Cost of Living Index" provide rich economic analysis and trend identification opportunities, helping you hone your data exploration and visualisation skills.
Together, these datasets are useful for anyone looking to grow as a data analyst. They offer both the complexity of sports analytics and the simplicity of economic comparisons while providing a broad perspective on how data can inform, predict, and influence various sectors.
1) Paris 2024 Olympics Medals
Dataset Overview:
The dataset focuses on the recently concluded Paris 2024 Olympic Summer Games.
It contains detailed information on participating athletes, countries, and events.
Key Features Included:
Lists of sports and disciplines.
Data on participating athletes and nations.
Event schedules and results (as available).
Athletes and Events: Information on participants, including their performance metrics.
Medals and Results: Detailed records of medal counts.
Sports and Venues: Data on the sports venues.
Contains: The dataset includes the following CSV files:
athletes.csv
coaches.csv
events.csv
medallists.csv
medals.csv
medals_total.csv
nocs.csv
schedules.csv
schedules_preliminary.csv
teams.csv
technical_officials.csv
torch_route.csv
venues.csv
Purpose:
Analyse and visualise all the events, winners and their countries from the 2024 Olympic Games.
Useful for predictive modelling, historical analysis, and sports data exploration.
Complexity:
Medium: To better understand the various aspects of the Olympics, you will need to combine all 13 CSV files.
It might require knowledge of SQL to import and merge the data.
You can also import all CSV files into Power BI and create relationships between each query to complete your data analysis.
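As a rough illustration of the import-and-merge step, here is a minimal sketch that loads two CSV files into an in-memory SQLite database and joins them with SQL. It uses only Python's standard library. The column names (code, country, medal_type, name, country_code) and the sample rows are assumptions made up for the example, not the dataset's actual schema, so check the real headers in nocs.csv and medals.csv before adapting it.

```python
import csv
import io
import sqlite3

# Tiny stand-ins for two of the CSV files. The column names and rows are
# assumptions for illustration only, not the real contents of the dataset.
NOCS_CSV = """code,country
FRA,France
JPN,Japan
"""

MEDALS_CSV = """medal_type,name,country_code
Gold Medal,Athlete A,FRA
Silver Medal,Athlete B,FRA
Gold Medal,Athlete C,JPN
"""


def load_csv(conn, table, text):
    """Create a table from CSV text, using the header row as column names."""
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', data)


def medals_per_country(conn):
    """Join medals to countries and count the medals for each country."""
    query = """
        SELECT n.country, COUNT(*) AS medal_count
        FROM medals AS m
        JOIN nocs AS n ON n.code = m.country_code
        GROUP BY n.country
        ORDER BY medal_count DESC, n.country
    """
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
load_csv(conn, "nocs", NOCS_CSV)
load_csv(conn, "medals", MEDALS_CSV)
result = medals_per_country(conn)
print(result)
```

To work with the real files, you could read each one with open(path, newline="", encoding="utf-8") and pass its contents to load_csv. The same join logic maps onto a relationship between two queries in Power BI.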
URL: https://www.kaggle.com/datasets/piterfm/paris-2024-olympic-summer-games
2) Student Mental Health Survey
Dataset Overview:
It focuses on student mental health.
It is a relatively small dataset containing 101 entries with eight features (columns).
It aims to understand mental health issues, their prevalence, and potential factors influencing mental health among students.
It examines students between the ages of 17 and 26 studying for undergraduate or postgraduate degrees at five universities.
Key Features Included:
Age
Gender
Course
Year of Study
GPA
Reported campus discrimination, harassment or bullying
Sports engagement
Purpose:
To analyse mental health trends among students.
Identify correlations between features and reported mental health issues.
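To make the idea of a correlation concrete, here is a minimal sketch that computes a Pearson correlation coefficient between two numeric columns in plain Python. The column names (sports_hours, stress_score) and the values are made up for illustration and are not taken from MentalHealthSurvey.csv. With only 101 entries, treat any coefficient as a hint to investigate rather than as evidence of cause.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / sqrt(var_x * var_y)


# Made-up toy values with hypothetical column names, not real survey data.
sports_hours = [0, 1, 2, 3, 4]
stress_score = [5, 4, 4, 2, 1]

r = pearson(sports_hours, stress_score)
print(round(r, 3))
```

A value near -1 or +1 indicates a strong linear relationship and a value near 0 indicates a weak one. In Excel, the CORREL function performs the same calculation.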
Complexity:
Easy: Analysis can be done in Microsoft Excel, Power BI, SQL Server or any other analytical tool.
Contains: MentalHealthSurvey.csv
URL: https://www.kaggle.com/datasets/abdullahashfaqvirk/student-mental-health-survey
3) Cost of Living Index by Country
Dataset Overview:
Review the cost of living across various countries, based on indices compiled by Numbeo for 2024.
The cost of living indices in the dataset are relative to New York City (NYC), which has a baseline index of 100.
An index value of 120 means that entry is 20% more expensive than New York City.
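The arithmetic behind an index value can be sketched in a few lines of Python. The function names below are hypothetical helpers written for this example. The only fact taken from the dataset description is that NYC is the baseline at 100.

```python
NYC_BASELINE = 100.0  # New York City is the reference point for every index


def percent_vs_nyc(index_value):
    """Return how much more (+) or less (-) expensive an entry is than NYC, in percent."""
    return (index_value - NYC_BASELINE) / NYC_BASELINE * 100.0


def describe(index_value):
    """Turn an index value into a short plain-English comparison with NYC."""
    diff = percent_vs_nyc(index_value)
    if diff > 0:
        return f"{diff:.0f}% more expensive than NYC"
    if diff < 0:
        return f"{-diff:.0f}% cheaper than NYC"
    return "same as NYC"


print(describe(120))
print(describe(70))
```

By the same rule, an index of 70 means the entry is 30% cheaper than New York City.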
Credits:
Data scraped from the Numbeo website: www.numbeo.com/cost-of-living/rankings_by_country.jsp
Key Features Included:
Cost of Living Index (Excl. Rent):
Measures the cost of consumer goods, excluding rent.
Rent Index:
Compares rental apartment prices to NYC prices.
Cost of Living Plus Rent Index:
Combines consumer goods and rent costs relative to NYC.
Groceries Index:
Compares grocery prices to NYC prices.
Restaurants Index:
Evaluates meal and drink prices in restaurants against NYC standards.
Local Purchasing Power:
Reflects how many goods and services the average net salary can buy compared to NYC.
Purpose:
Comparative Analysis: Compare the cost of living between different countries for relocation, business expansion, or research.
Economic Research: Understand economic conditions, inflation rates, and purchasing power compared to other nations.
Personal Finance: This is useful if you are planning to move abroad or want to understand the financial implications of living in different countries.
Business Strategy: Help companies decide pricing strategies, wage levels, and market expansion plans.
Complexity:
Easy: Analysis can be done in Microsoft Excel, Power BI, SQL Server or any other analytical tool.
Contains: Cost_of_Living_Index_by_Country_2024.csv
URL: https://www.kaggle.com/datasets/myrios/cost-of-living-index-by-country-by-number-2024
4) 100 Highest Paid Athletes of the World
Dataset Overview:
Analyse all top-earning athletes worldwide by nationality, sport, team and total income.
Key Features Included:
Athlete Name
Sport
Total Earnings: Combined income from salary, winnings, and endorsements.
Salary/Winnings: Earnings directly from their sport, including bonuses and prize money.
Endorsements: Income from sponsorships, advertisements, and other off-field activities.
Nationality: The country the athlete represents.
Team/Club: Current team or club affiliation, if applicable.
Rank: Position in the list based on total earnings.
Purpose:
Economic Analysis: Determine which sports generate the most income for athletes, assess the impact of endorsements, and analyse how athlete earnings correlate with sports popularity and marketability.
Market Trends: Identify trends in athlete compensation, the role of endorsements, and the influence of global sports events or leagues on earnings.
Fan and Business Insights: This resource is for fans, sports analysts, marketers, and businesses interested in athlete branding and market value.
Complexity:
Easy: Analysis can be done in Microsoft Excel, Power BI, SQL Server or any other analytical tool.
Note:
Only some athletes will have a team, as some compete as individuals.
Some of the column data will require cleaning.
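As a hedged sketch of the kind of cleaning that may be needed, the helpers below turn money stored as text into numbers and treat an empty team field as missing. The input formats shown ("$136 M", blank team cells) are assumptions for illustration only. Inspect the actual file first, because the real columns may be formatted differently.

```python
import re


def parse_millions(raw):
    """Convert text such as '$136 M' to a float in millions, or None if blank."""
    if raw is None:
        return None
    text = raw.strip()
    if not text:
        return None
    # Drop thousands separators, then pick out the first number in the text.
    match = re.search(r"\d+(?:\.\d+)?", text.replace(",", ""))
    if match is None:
        raise ValueError(f"no number found in {raw!r}")
    return float(match.group())


def clean_team(raw):
    """Return the trimmed team name, or None for athletes with no team listed."""
    if raw is None:
        return None
    text = raw.strip()
    return text or None


# Hypothetical rows with placeholder names, not real entries from the dataset.
rows = [
    {"name": "Athlete A", "total": "$136 M", "team": " Club X "},
    {"name": "Athlete B", "total": " $85.5 M", "team": ""},
]

cleaned = [
    {
        "name": row["name"],
        "total_millions": parse_millions(row["total"]),
        "team": clean_team(row["team"]),
    }
    for row in rows
]
print(cleaned)
```

Keeping a missing team as None rather than a label such as "Individual" is a design choice: it leaves the gap visible, so you can decide later how to treat athletes who compete on their own.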
URL: https://www.kaggle.com/datasets/batrosjamali/100-highest-paid-athletes-of-the-world