The course provides an introduction to modern approaches for retrieving information from a document collection. It describes the foundations of indexing in both OLTP systems and the web, the properties of transaction management, and the architecture of modern systems, and it addresses the issues a designer faces when designing and implementing modern search engines. At the end of the course, students will be able to design an efficient indexing system and the components of a search engine.
Prerequisites
Students should have a basic knowledge of DBMSs, SQL, the Internet and the Web.
Teaching methods
The course is based on lectures introducing the main topics and on laboratory sessions. The initial phase of project development is an integral part of the learning activity and will be supported by the teacher with appropriate hints and advice.
Assessment
The assessment consists of a project developed by each student as a follow-up to the laboratory sessions, and of an on-premises test, delivered on each scheduled exam date through a suitable platform. The final grade is based on the project (70%) and the test (30%).
Textbooks
a) Lecture slides
b) Paolo Atzeni, Stefano Ceri, Stefano Paraboschi and Riccardo Torlone, “Database Systems - Concepts, Languages and Architectures”, McGraw Hill (out of print; available online for institutional training only at http://dbbook.dia.uniroma3.it/)
c) Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, “Introduction to Information Retrieval”, Cambridge University Press, 2008 (available online at https://nlp.stanford.edu/IR-book/information-retrieval-book.html)
Contents
Transactions and transaction management: concurrency control and policies in DBMSs and distributed systems. Indexing in DBMSs: B-trees and hashing; index selection in queries. The architecture of modern Information Retrieval systems; Term Vocabulary; Postings Lists; Data Structures for Dictionaries; Tolerant Retrieval; Wildcard Queries; Spelling Correction; Index Construction; Index Compression; Dictionary Compression; Postings File Compression; Scoring; Term Frequency; Term Weighting; Vector Space Model; Parametric Index; Zone Index; Web Search Basics; Web Crawling.
Teaching language
English
Further information
Students can access the e-learning platform at https://elearning.unipv.it/course/view.php?id=6807, where they can find all learning material and any other information or notices about the classes.