500028 - SOCIAL STATISTICS

ID: 500028

Duration (hours): 44

CFU:

SSD: STATISTICA SOCIALE

Year: 2025

Date/time interval: Second Semester (16/02/2026 - 23/05/2026)

Course Objectives

The Social Statistics course aims to provide students with the skills necessary to analyze social data using advanced statistical techniques, including social network analysis, machine learning, and large language models (LLMs). Students will learn to apply these methods to extract meaningful information from social data and to support informed decisions.

Course Prerequisites

There are no formal prerequisites for this course. However, in order to follow the lectures effectively, students are strongly advised to have a solid understanding of the main concepts typically covered in introductory courses in mathematics and statistics within economics programs. In particular, familiarity with basic mathematical tools is recommended, including elements of differential calculus (e.g., derivatives of elementary functions and the interpretation of rates of change) and linear algebra, such as vectors, matrices, matrix operations, and the solution of simple systems of linear equations. From a statistical perspective, students are expected to know the fundamental concepts of descriptive statistics (e.g., mean, variance, and statistical distributions) and the basic principles of statistical inference, including hypothesis testing and confidence intervals, as well as the foundations of simple linear regression and the interpretation of its parameters. These competencies provide the methodological background necessary to understand and correctly apply the analytical tools presented in the social statistics course.

Teaching Methods

Lectures
Lectures are devoted to the presentation of the theoretical foundations of the statistical methods covered in the course. In particular, they introduce the conceptual and methodological framework underlying the analysis of social data, including topics such as social network analysis, machine learning techniques, and the use of large language models (LLMs). During the lectures, the main models, assumptions, and interpretation of results are discussed, with examples drawn from real-world social data applications.

Classroom Exercises (MATLAB-based)
Practical sessions are dedicated to the implementation of the methods presented in the lectures through programming activities in MATLAB. Students will work with real or simulated datasets, learning how to preprocess data, implement statistical and machine learning algorithms, analyze social networks, and apply language models for the extraction of information from textual data.

The combination of theoretical lectures and hands-on computational exercises is designed to support the achievement of the course objectives. Lectures provide the conceptual and methodological knowledge required to understand advanced statistical techniques, while the MATLAB-based exercises allow students to develop the practical skills necessary to apply these methods to real social data. Together, these teaching methods enable students to acquire the competencies needed to analyze complex social datasets, extract meaningful insights, and support data-driven decision-making.

Assessment Methods

The exam consists of a group project and an in-class presentation:

Students form groups to collect and analyze data on current topics. Each group prepares a research paper and a presentation to be given in class in front of their peers.

A detailed report on the work carried out is required, with a clear indication of the individual contribution of each member. Several classroom sessions will be scheduled to check the progress of the group work.

The final score will be the sum of the following elements:
Evaluation of the group work presentation (10 points)
Evaluation of the report and the paper (20 points)

Honors will be awarded to students who not only achieve the highest score but also demonstrate significant and active involvement in the proposed activities.

Texts

Dietz, T., & Kalof, L. (2009). Introduction to Social Statistics: The Logic of Statistical Reasoning. Wiley-Blackwell.

Linneman, T. J. (2025). Social Statistics: Managing Data, Conducting Analyses, Presenting Results (5th ed.). Routledge.

Agresti, A., & Finlay, B. (2018). Statistical Methods for the Social Sciences (5th ed.). Pearson.

Moore, D. S., McCabe, G. P., & Craig, B. A. (2017). Introduction to the Practice of Statistics (9th ed.). W. H. Freeman.

Field, A. (2013). Discovering Statistics Using IBM SPSS Statistics (4th ed.). Sage.

MathWorks. (2024). Statistics and Machine Learning Toolbox Documentation. Retrieved from https://www.mathworks.com/help/stats/

Barabási, A.-L. (2016). Network Science. Cambridge University Press.

Hoerl, A. E., & Kennard, R. W. (1970). Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 12(1), 55–67.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1), 267–288.

Zou, H., & Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2), 301–320.

Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32.

Course Contents

Module 1: Introduction to Social Statistics

The first module introduces the fundamentals of social statistics, explaining its definition, importance, and the different types of social data. The main sources of social data will be examined, and descriptive statistics such as measures of central tendency (mean, median, mode) and measures of dispersion (variance, standard deviation) will be covered. Additionally, the concepts of statistical inference and hypothesis testing will be addressed.
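
As an illustration of the descriptive measures listed above, the following is a minimal sketch using only Python's standard `statistics` module. The course exercises themselves use MATLAB; Python is used here purely for illustration. The `describe` helper and the `household_sizes` data are hypothetical and not taken from the course materials.

```python
import statistics


def describe(values):
    """Return basic measures of central tendency and dispersion."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        # Sample variance and standard deviation (divide by n - 1).
        "variance": statistics.variance(values),
        "std_dev": statistics.stdev(values),
    }


# Hypothetical data: household sizes reported by eight survey respondents.
household_sizes = [1, 2, 2, 3, 3, 3, 4, 6]

summary = describe(household_sizes)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

The sketch reports the sample variance (dividing by n - 1), which is the usual choice when the data are a sample from a larger population; `statistics.pvariance` would give the population version instead.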

Module 2: Social Network Analysis

The second module focuses on social network analysis, introducing basic concepts and the representation of social networks through nodes and edges. Different types of social networks, such as egocentric and sociocentric networks, will be discussed. Network metrics such as degree and other centrality measures will be examined, together with the analysis of connected components. Group cohesion and structure will also be covered, and software tools for visualizing social networks will be used. Finally, practical applications of social network analysis will be explored through case studies.
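
The node-and-edge representation, node degree, and connected components mentioned above can be sketched in a few lines. This is a minimal illustration in plain Python (the course exercises use MATLAB), assuming an undirected network. The function names and the `friendships` edge list are hypothetical.

```python
from collections import defaultdict, deque


def build_adjacency(edges):
    """Build an undirected adjacency map from a list of (node, node) pairs."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def degrees(adjacency):
    """Degree of each node: the number of distinct neighbors."""
    return {node: len(neighbors) for node, neighbors in adjacency.items()}


def connected_components(adjacency):
    """Find connected components with a breadth-first search."""
    seen = set()
    components = []
    for start in list(adjacency):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components


# Hypothetical friendship ties among six people.
friendships = [
    ("Ana", "Ben"), ("Ben", "Cleo"), ("Ana", "Cleo"),
    ("Cleo", "Dan"), ("Eva", "Finn"),
]

network = build_adjacency(friendships)
print(degrees(network))
print(connected_components(network))
```

In this toy network, the node with the highest degree is the one connecting the tightly knit triangle to the rest of its component, which is the kind of structural role that centrality measures are meant to capture.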

Module 3: Machine Learning for Social Statistics

The third module introduces machine learning, explaining the difference between supervised and unsupervised learning and the roles of training, validation, and testing. Supervised learning techniques such as linear and logistic regression, regularized regression (Lasso, Ridge, and Elastic Net), and decision trees will be explored. Ensemble methods such as Random Forest, LSBoost, and Bagging will also be discussed, with practical applications to the prediction of social phenomena. In unsupervised learning, techniques such as clustering (K-means and hierarchical) and principal component analysis (PCA) will be examined, with applications to the segmentation of social groups. Performance metrics for machine learning models and cross-validation techniques will be explained, with particular attention to avoiding overfitting and underfitting.
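
To make the link between regularization and cross-validation concrete, the sketch below fits a one-predictor ridge regression in closed form and compares penalty values by k-fold cross-validated mean squared error. It is an illustration in plain Python rather than the MATLAB workflow used in the course. The function names, the interleaved fold assignment, and the data are assumptions made for this example. The intercept is left unpenalized, and the predictor is assumed not to be constant within any training fold.

```python
def fit_ridge(xs, ys, lam):
    """One-predictor ridge fit: minimize sum (y - a - b*x)^2 + lam * b^2."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / (sxx + lam)          # lam > 0 shrinks the slope toward 0
    intercept = y_mean - slope * x_mean
    return intercept, slope


def k_fold_indices(n, k):
    """Split indices 0..n-1 into k interleaved folds."""
    return [list(range(i, n, k)) for i in range(k)]


def cross_validated_mse(xs, ys, lam, k=4):
    """Average squared prediction error on held-out folds."""
    n = len(xs)
    squared_errors = []
    for test_idx in k_fold_indices(n, k):
        held_out = set(test_idx)
        train_x = [xs[i] for i in range(n) if i not in held_out]
        train_y = [ys[i] for i in range(n) if i not in held_out]
        intercept, slope = fit_ridge(train_x, train_y, lam)
        squared_errors.extend(
            (ys[i] - (intercept + slope * xs[i])) ** 2 for i in test_idx
        )
    return sum(squared_errors) / n


# Hypothetical data: years of education and a civic participation score.
education = [8, 10, 11, 12, 13, 14, 16, 18]
participation = [2.1, 3.0, 2.8, 3.9, 4.1, 4.0, 5.2, 5.9]

for lam in (0.0, 1.0, 10.0):
    mse = cross_validated_mse(education, participation, lam)
    print(f"lambda={lam}: cross-validated MSE = {mse:.4f}")
```

The penalty with the lowest cross-validated error would be the one selected; because the error is measured on observations not used for fitting, it penalizes both overfitting (too little shrinkage) and underfitting (too much).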

Module 4: Large Language Models (LLMs)

The final module introduces large language models (LLMs), covering their definition and history and the main architectures behind them, such as RNN, LSTM, and Transformer. Applications of LLMs in social analysis will be discussed, including sentiment analysis and opinion extraction from social media, as well as automatic text generation and summarization. The module concludes with case studies and hands-on exercises using pre-trained models such as GPT-3 and BERT.

Course Language

Italian

More information

NOTE: Students enrolled in the Inclusive Learning Modalities programme (“Modalità didattiche inclusive”) are requested to contact the Professor and the Degree Course Coordinator in order to assess specific needs and define targeted support actions.