• University of Beira Interior

    DATA SCIENCE


    2021/22, Spring

    PROGRAM

    - What is Data Science? Data infrastructure: challenges due to volume, heterogeneity and inconsistency/incompleteness;

    - Data Science Fundamentals: Framing Problems, Data Wrangling, Exploratory Analysis, Feature Extraction and Modelling;

    - Data Encoding and File Formats;

    - Databases: Relational, Non-Structured Data;

    - Data Visualization and Summarization;

    - Pie, Bar Charts, Histograms, Boxplots, Scatterplots and Heat maps;

    - Dimensionality Reduction;

    - Axis Rotation (PCA);

    - Type Transformation (Wavelets, Spectral Analysis);

    - Probability Distributions;

    - Anscombe’s Quartet;

    - Big Data;

    - Hadoop, HDFS, PySpark;

    - MapReduce Paradigm;

    - Frequent Pattern Mining Model;

    - Outlier Analysis;

    - Meta-Algorithms;

    - Mining Web Data and Social Network Analysis;

    - Software Engineering and Computational Performance;

    - CRAP Design;

    - Key Data Structures;

    - Amortized and Average Performance;


    BIBLIOGRAPHY

    - C. Aggarwal. Data Mining: The Textbook. Springer, ISBN: 9783319141411, 2015.

    - John D. Kelleher and Brendan Tierney. Data Science. MIT Press Essential Knowledge Series, ISBN: 0262535432, 2018.

    - Field Cady. The Data Science Handbook. Wiley, ISBN: 1119092949, 2017.


    EVALUATION CRITERIA

    - Attendance (A): To pass this course, students must attend at least 80% of the theoretical and practical classes;

    - Practical Project (P): The practical project accounts for 50% (10/20) of the final mark;

    - To pass the course, a minimum mark of 5/20 must be obtained in the practical project;

    - The practical project mark is conditional on an individual presentation and discussion by each student;

    - Written Test (F): Monday, June 6th, 2022, 14:00, Room 6.18;

    - Mark (M): M = (A >= 0.8) * (P * 10/20 + F * 10/20);

    - Admission to Exams: Students with M >= 6 are admitted to the final exams;

    - The practical project mark is taken into account in all exam periods;
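
The mark formula above can be read as a small computation. The sketch below is only an illustration, not an official grading tool. It assumes that A is the fraction of classes attended (between 0 and 1) and that P and F are marks on the 0-20 scale. The function names `final_mark` and `admitted_to_exams` are hypothetical. The separate 5/20 minimum for the practical project is not modelled here.

```python
def final_mark(attendance: float, project: float, test: float) -> float:
    """Compute M = (A >= 0.8) * (P * 10/20 + F * 10/20).

    attendance: fraction of classes attended, in [0, 1] (assumed scale for A)
    project, test: marks P and F, assumed to be on the 0-20 scale
    """
    # The comparison yields True/False, which acts as a 1/0 multiplier:
    # a student below 80% attendance gets M = 0 regardless of P and F.
    attended_enough = int(attendance >= 0.8)
    return attended_enough * (project * 10 / 20 + test * 10 / 20)


def admitted_to_exams(mark: float) -> bool:
    """Students with M >= 6 are admitted to the final exams."""
    return mark >= 6


# Example with made-up marks: 90% attendance, P = 14, F = 12.
m = final_mark(attendance=0.9, project=14, test=12)
print(m, admitted_to_exams(m))
```

Under these assumptions, the attendance term works as an on/off switch, while the project and the written test each contribute half of the final mark.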




    CLASSES

    Theoretical slides: [pdf]

    Practical Project: [pdf]

    Practical Sheet 1 (Introduction Python/DS): [pdf]

    Theoretical slides: [pdf]

    Practical Sheet 2 (Distributed ETL): [pdf]

    Theoretical slides (Clustering): [pdf]

    Theoretical slides (Outlier Detection): [pdf]

    Old Written Tests (Examples): [zip]

    Theoretical slides (Model Interpretability): [pdf]

    Theoretical slides (Meta Learning): [pdf]

    Theoretical slides (Semi-Supervised Learning): [pdf]





    EVALUATION



    FACULTY

    HUGO PEDRO PROENÇA


    Informatics Department

    Theoretical + Practical classes