511333 - INFORMATION RETRIEVAL

ID:

511333

Duration (hours):

60

CFU:

SSD:

SISTEMI DI ELABORAZIONE DELLE INFORMAZIONI (Information Processing Systems)

Year:

2025

Date/time interval

First Semester (29/09/2025 - 16/01/2026)

Course Objectives

The course introduces modern approaches to retrieving information from a collection of documents. It describes the architecture of current systems and highlights the issues a designer faces when designing and implementing search engines and information retrieval systems.

Course Prerequisites

The student should have a basic knowledge of Internet and Web architecture, be able to develop applications using object-oriented languages (preferably Java), and know how to implement simple data structures, such as stacks, queues, lists and trees.

Teaching Methods

The course includes lectures and a series of laboratory sessions in which students develop an information retrieval project.

Assessment Methods

Written examination and project

Texts

Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008.
Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein, Introduction to Algorithms.
Online resources.

Contents

Advanced Data Structures for Information Retrieval (Linked List, Hash Table, Binary Tree, B-Tree, Binary Heap)
The Architecture of Modern Information Retrieval Systems
Dictionary and Posting List Management (Tokenization, Stemming, Porter's Algorithm, Linguistic Preprocessing)
Optimization Methods for Information Retrieval
Index Types (Biword Index, Positional Index, Permuterm Index, k-Gram Index, Soundex Index)
Data Structures for Dictionaries (Prefix Tree, Prefix Binary Tree, Prefix B-Tree)
Identification of Syntactic and Semantic Errors (Edit Distance, k-Gram Overlap, Jaccard Similarity Coefficient)
Index Construction Algorithms (Blocked Sort-Based Indexing Algorithm, Single-Pass In-Memory Indexing Algorithm, Distributed Indexing Algorithms, Dynamic Indexes)
Index Compression Techniques (Heaps' Law, Zipf's Law, Dictionary Compression, Postings File Compression, Gamma Code)
Identification of Duplicates (Fingerprint, Shingling, Signature, Min Hashing)
Document Ranking (Weighted Search, Inverse Document Frequency)
Document Representation in Vector Form (Bag of Words, Word Embedding, Document Embedding)
Document Similarity and Distance (Cosine Distance, Jaccard Distance, Edit Distance)
Word Embedding for Syntactic and Semantic Document Analysis, Sentiment Analysis, Text and Document Classification, Prediction of Next Words
Neural Networks for Word Embedding (Word2Vec, Continuous Bag of Words, Skip-Gram)
Neural Networks for Document Embedding (Doc2Vec, Distributed Memory Model of Paragraph Vectors, Paragraph Vector with a Distributed Bag of Words, FastText)
Solr
Image Retrieval Systems, Image Feature Extraction Techniques (Local Binary Pattern, Haar Wavelet Transform, Histogram of Oriented Gradients)
Document Databases and MongoDB

Course Language

English

More information

The teaching material will be available on the Kiro teaching page

Degrees

COMPUTER ENGINEERING

Master’s Degree

2 years

People

SANTANGELO LUIGI

Teaching staff
