University of Beira Interior

    DATA SCIENCE


2021/22, Spring

PROGRAM

- What is Data Science? Data infrastructure: challenges due to volume, heterogeneity and inconsistency/incompleteness;

- Data Science Fundamentals: Framing Problems, Data Wrangling, Exploratory Analysis, Feature Extraction and Modelling;

- Data Encoding and File Formats;

- Databases: Relational, Non-Structured Data;

- Data Visualization and Summarization:

  - Pie Charts, Bar Charts, Histograms, Boxplots, Scatterplots and Heat Maps;

- Dimensionality Reduction:

  - Axis Rotation (PCA);

  - Type Transformation (Wavelets, Spectral Analysis);

- Probability Distributions;

- Anscombe’s Quartet;

- Big Data:

  - Hadoop, HDFS, PySpark;

  - MapReduce Paradigm;

- Frequent Pattern Mining Model;

- Outlier Analysis;

- Meta-Algorithms;

- Mining Web Data and Social Network Analysis;

- Software Engineering and Computational Performance:

  - CRAP Design;

  - Key Data Structures;

  - Amortized and Average Performance;


BIBLIOGRAPHY

- C. Aggarwal. Data Mining: The Textbook. Springer, ISBN: 9783319141411, 2015.

- John Kelleher. Data Science. MIT Press Essential Knowledge Series, ISBN: 0262535432, 2018.

- Field Cady. The Data Science Handbook. Wiley, ISBN: 1119092949, 2017.


EVALUATION CRITERIA

- Attendance (A): To pass this course, students must attend at least 80% of the theoretical and practical classes

- Practical Project (P): The practical project accounts for 50% (10/20) of the final mark

  - To pass the course, a minimum mark of 5/20 must be obtained in the practical project;

  - The practical project mark is conditional on an individual presentation and discussion by each student;

- Written Test (F): Monday, June 6th, 2022, 14:00. Room 6.18

- Mark (M): M = (A >= 0.8) * (P * 10/20 + F * 10/20)

- Admission to Exams: Students with M >= 6 are admitted to the final exams

  - The practical project mark counts in all exam periods;
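
As an illustration only, the mark formula above can be sketched in Python. The sketch assumes that A is the fraction of classes attended (0 to 1) and that P and F are both graded on a 0-20 scale; the function names are hypothetical and are not part of the official rules.

```python
def final_mark(attendance: float, project: float, written_test: float) -> float:
    """Compute M = (A >= 0.8) * (P * 10/20 + F * 10/20).

    Assumptions (not stated explicitly in the criteria):
    attendance is a fraction in [0, 1]; project and written_test are on a 0-20 scale.
    """
    # The comparison acts as a 0/1 gate: below 80% attendance the mark is zero.
    attendance_ok = 1.0 if attendance >= 0.8 else 0.0
    return attendance_ok * (project * 10 / 20 + written_test * 10 / 20)


def admitted_to_exams(mark: float) -> bool:
    """Students with M >= 6 are admitted to the final exams."""
    return mark >= 6


def meets_project_minimum(project: float) -> bool:
    """A minimum of 5/20 in the practical project is required to pass."""
    return project >= 5


if __name__ == "__main__":
    m = final_mark(attendance=0.9, project=14, written_test=12)
    print(m, admitted_to_exams(m), meets_project_minimum(14))
```

The attendance condition multiplies the whole weighted sum, so a student below 80% attendance gets M = 0 whatever the project and test marks are.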




CLASSES

Theoretical slides: [pdf]

Practical Project: [pdf]

Practical Sheet 1 (Introduction Python/DS): [pdf]

Theoretical slides: [pdf]

Practical Sheet 2 (Distributed ETL): [pdf]

Theoretical slides (Clustering): [pdf]

Theoretical slides (Outlier Detection): [pdf]

Old Written Tests (Examples): [zip]

Theoretical slides (Model Interpretability): [pdf]

Theoretical slides (Meta Learning): [pdf]

Theoretical slides (Semi-Supervised Learning): [pdf]





FACULTY

HUGO PEDRO PROENÇA


Informatics Department

Theoretical + Practical classes