Strumenti Utente

Strumenti Sito


digitalhealth:0002a

Algorithms and data structures for data-intensive applications

Teacher: Paolo Ferragina e Giovanni Manzini

CFU: 9.

Language: English.

Degree: Informatics for Digital Health


Goals

The goal of this course is to enrich and strengthen the knowledge about basic algorithms and data structures, learned in undergraduate courses, with further models, methodologies, tools, and techniques to design, analyze, experiment, and improve solutions for managing massive datasets of multimodal types: integer, string, vector, text, time series and (labeled) graph.

These knowledge and skills constitute the building blocks upon which Big Data platforms and applications are built (e.g., key-value stores, graph DBs, vector DBs, Transformers, …), so this theoretical part will provide students with the “methodological toolbox” for designing and evaluating their big data solutions, and for choosing the right library to adopt for their implementation.

The theoretical lectures will be complemented by some hands-on experience in coding with relevant programming libraries that offer the above-mentioned building blocks. The course provides most of the algorithmic prerequisites for the other courses of this master’s degree.

Syllabus

  • Key issues about I/Os, streaming, and hierarchical memory in managing massive datasets.
  • Algorithmic and data structure issues concerning atomic versus variable-length items.
  • Two fundamental tools: sorting and permuting.
  • Key-value stores: From basic to advanced hash tables and tries.
  • Textual search engines: compressing and accessing posting lists (integer sequences) and text collections. Deduplicating similar/equal texts, clustering text collections.
  • Bio-Informatics engines: compressing and indexing arbitrary text for substring search, exact or approximate.
  • Vector DBs: nearest neighbor search, hamming, or Euclidean distance.
  • Advanced storage: Streaming and random access to compressed raw files, time series, and (labeled) graphs.

Lectures will include in-depth discussions on the practical efficiency of the proposed algorithms and data structures, plus hands-on experience with coding solutions for some topics.

Background

If you wish to refresh your mind on Algorithms and Data Structures, I suggest you to follow the Video Lectures by Erik Demaine and Charles Leiserson, specifically Lectures 1-5, 7 and 10. There it is missing the part on basic graph problems (representation, DFS, BFS, topological sort) which you may browse in any book, such as Introduction to Algorithms by Cormen-Leiserson-Rivest-Stein, third edition.

Current year

Previous years

digitalhealth/0002a.txt · Ultima modifica: 09/07/2024 alle 12:47 (5 mesi fa) da Paolo Ferragina

Donate Powered by PHP Valid HTML5 Valid CSS Driven by DokuWiki