1. Understand the role of data science in their discipline
2. Use Python and Jupyter to explore different datasets
3. Apply basic criteria and tools to transform and visualise their data
4. Interpret their data based on the results of an exploratory data analysis (EDA)
Prerequisites
None. This course is designed for students with no prior background in data analysis and minimal knowledge of statistics.
Teaching methods
Class activities focus on demonstrations, discussions, and problem solving through interaction: demos, group work, quizzes, and real-time feedback. Jupyter notebooks and Google Colab will be used to give easy access to coding and command-line tools; no prior experience is required.
Assessment
Multiple-choice quiz
Textbooks
- Python for Data Analysis, 3rd Edition, Wes McKinney, O’Reilly. https://wesmckinney.com/book/
- Python4DS, Arthur Turrell et al. https://aeturrell.github.io/python4DS/welcome.html
- Python Data Science Handbook, Jake VanderPlas, O’Reilly. https://github.com/jakevdp/PythonDataScienceHandbook/tree/maste
Contents
This course is aimed at a medical and life science audience with no prior background in data analysis and minimal knowledge of statistics. Its goal is to give students the most important tools and decision criteria for importing and visualising data from different sources (structured medical data, laboratory measurements, biological experiments), and for exploring and understanding the key elements of the datasets they might encounter in their studies or everyday practice.
To achieve this goal, the course focuses on practical data science techniques using Python, leveraging Jupyter Notebooks and/or Google Colab for hands-on practice. Key topics include:
1. Introduction to Python for Data Science:
   - Overview of Python's relevance to life sciences.
   - Introduction to Jupyter Notebooks and Google Colab.
   - Setting up your environment and working with online/cloud-based tools.
2. Python Basics for Data Analysis:
   - Data types and basic operations in Python.
   - Using pandas to manipulate tabular data.
   - NumPy for numerical computations.
3. Data Wrangling and Preparation:
   - Understanding and cleaning messy datasets.
   - The concept of “tidy” (tabular) data in Python.
   - Merging, grouping, and transforming datasets.
4. Data Visualization:
   - Exploratory plots using matplotlib and seaborn.
   - Interactive visualizations with Plotly.
   - Visualization techniques for biological and medical datasets.
5. Introduction to Statistics in Python:
   - Descriptive statistics and their applications.
   - Hypothesis testing using Python’s scipy/statsmodels.
6. Data Lab:
   - Hands-on exercises using Google Colab or local Jupyter Notebooks.
   - (BONUS TRACK) Use of ChatGPT in the data science workflow.
7. Applied Examples of Data Analysis:
   - Real-world case studies with medical and biological datasets.
   - Exploratory data analysis workflow, from data import to visualization.
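To give a flavour of the exploratory workflow listed in the topics above (import, clean, group, summarise), here is a minimal sketch using pandas. The dataset, column names, values, and helper function names are all hypothetical, invented for this illustration; they are not course material.

```python
import io

import pandas as pd

# Hypothetical laboratory measurements (made-up values, for illustration only).
# One row per patient, one column per variable: a "tidy" tabular layout.
RAW_CSV = """patient_id,group,glucose_mg_dl
P01,control,92
P02,control,88
P03,control,
P04,treated,110
P05,treated,104
P06,treated,99
"""


def load_measurements(text):
    """Import step: read CSV text into a pandas DataFrame."""
    return pd.read_csv(io.StringIO(text))


def clean(df):
    """Cleaning step: drop rows where the measurement is missing."""
    return df.dropna(subset=["glucose_mg_dl"]).reset_index(drop=True)


def summarise(df):
    """Descriptive statistics per group: count, mean, standard deviation."""
    return df.groupby("group")["glucose_mg_dl"].agg(["count", "mean", "std"])


df = clean(load_measurements(RAW_CSV))
summary = summarise(df)
print(summary)
```

In a Jupyter or Google Colab notebook the same cells could be followed by a plot (for example with matplotlib or seaborn) to complete the path from data import to visualization.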