
Data Sources and Data Preprocessing (5 cr)

Code: TT00CD99

Credits

5 cr

Teaching language

  • Finnish
  • English

Responsible person

  • Antti Häkkinen

Objective

Data preprocessing is an essential part of data analytics and machine learning projects. As part of preprocessing, you will learn to collect data from different sources and combine them into a coherent whole that serves the project's goals. You will also learn the preprocessing methods applied to the collected data.
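As an illustration of combining data from different sources, the sketch below joins CSV records with JSON records on a shared key. The data, the `customer_id` key and the `combine_sources` function are hypothetical examples, not course material; a left join is assumed so that no customer is dropped, and customers without a matching record get `None`, which is exactly the kind of missing value later preprocessing steps must handle.

```python
import csv
import io
import json

# Hypothetical source 1: customer records exported as CSV text.
CSV_SOURCE = """customer_id,name,city
1,Aino,Tampere
2,Mikko,Helsinki
3,Liisa,Oulu
"""

# Hypothetical source 2: purchase totals returned by an API as JSON text.
JSON_SOURCE = '[{"customer_id": 1, "total": 120.5}, {"customer_id": 3, "total": 42.0}]'


def combine_sources(csv_text: str, json_text: str) -> list[dict]:
    """Left-join CSV records with JSON records on customer_id."""
    # CSV values are read as strings; convert the key to int so it matches the JSON ints.
    customers = [
        {**row, "customer_id": int(row["customer_id"])}
        for row in csv.DictReader(io.StringIO(csv_text))
    ]
    # Index the JSON records by key for constant-time lookup.
    totals = {rec["customer_id"]: rec["total"] for rec in json.loads(json_text)}
    # Keep every customer; those without a purchase record get None (a missing value).
    return [{**c, "total": totals.get(c["customer_id"])} for c in customers]


if __name__ == "__main__":
    for row in combine_sources(CSV_SOURCE, JSON_SOURCE):
        print(row)
```

Note the type conversion of the key: sources rarely agree on data types, and a join on mismatched types (`"1"` versus `1`) silently matches nothing.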

EUR-ACE Knowledge and Understanding
You understand different data sources and their special features and limitations.

EUR-ACE Engineering practice
You know how to plan and implement data preprocessing, taking into account the specific characteristics of data sources. You can identify, analyze and correct data quality problems and apply your knowledge and technologies to new problems in data preprocessing.
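Identifying data quality problems usually starts with a simple profile of the data. The sketch below counts three common problem types: missing values, exact duplicate rows and values outside a plausible range. The records, the `quality_report` function and the age range 0 to 120 are hypothetical choices for illustration only.

```python
from collections import Counter

# Hypothetical records with typical quality problems:
# missing ages, one exact duplicate row and one implausible age.
RECORDS = [
    {"id": 1, "age": 34, "email": "a@example.com"},
    {"id": 2, "age": None, "email": "b@example.com"},
    {"id": 2, "age": None, "email": "b@example.com"},
    {"id": 3, "age": -5, "email": "c@example.com"},
]


def quality_report(records, valid_ranges):
    """Summarise missing values, duplicate rows and out-of-range values."""
    columns = sorted({key for rec in records for key in rec})
    # Missing values: keys that are absent or explicitly None.
    missing = {col: sum(rec.get(col) is None for rec in records) for col in columns}
    # Exact duplicates: identical rows (as hashable tuples) seen more than once.
    counts = Counter(tuple(sorted(rec.items())) for rec in records)
    duplicates = sum(n - 1 for n in counts.values() if n > 1)
    # Out-of-range values: row indices whose non-missing value lies outside [low, high].
    out_of_range = {
        col: [i for i, rec in enumerate(records)
              if rec.get(col) is not None and not low <= rec[col] <= high]
        for col, (low, high) in valid_ranges.items()
    }
    return {"missing": missing, "duplicates": duplicates, "out_of_range": out_of_range}


if __name__ == "__main__":
    print(quality_report(RECORDS, {"age": (0, 120)}))
```

The report only identifies problems; whether a duplicate is deleted, a missing age imputed or a negative age treated as an entry error depends on the data source and the goals of the analysis.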

Content

In this course, you will learn the importance of data preprocessing in data analytics and machine learning projects. You will explore how to collect data from various sources and combine them into a cohesive dataset that meets your objectives. Additionally, you will learn to apply different data preprocessing methods to the collected data. By the end of the course, you will be able to design and implement data preprocessing processes, identify and correct data quality issues, and apply your knowledge and techniques to new challenges in data preprocessing.

  • Data sources
  • Data exploration and enrichment
  • Handling of missing values
  • Data cleaning and transformations
  • Data scaling
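Three of the topics listed above, cleaning, handling missing values and scaling, can be sketched as small functions applied in sequence. The example assumes mean imputation and min-max scaling as representative methods; these are common choices, not necessarily the exact methods taught in the course, and all function names and data are hypothetical.

```python
from statistics import mean


def clean_category(values):
    """Cleaning: normalise text labels by trimming whitespace and ignoring case."""
    return [v.strip().casefold() for v in values]


def impute_mean(values):
    """Missing values: replace None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def min_max_scale(values):
    """Scaling: map values linearly onto the range [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        # A constant column has no spread; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


if __name__ == "__main__":
    cities = clean_category([" Helsinki", "helsinki ", "HELSINKI", "Oulu"])
    ages = min_max_scale(impute_mean([20, None, 40, 30]))
    print(cities)
    print(ages)
```

In a machine learning project, the imputation mean and the scaling minimum and maximum are normally computed from the training data only and then reused on new data, so that information from the test set does not leak into preprocessing.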

Qualifications

Basics of Programming

Assessment criteria, satisfactory (1-2)

Sufficient (1)
You understand the basics of data sources and data preprocessing. You recognize the most important structural characteristics of different data sources. You identify the most common data quality problems, but the solutions may be incomplete.

Satisfactory (2)
You recognize the most important structural features of different data sources and the basic methods of data preprocessing. You know how to solve simple data quality problems and use basic tools for data preprocessing.

Assessment criteria, good (3-4)

Good (3)
You have a good understanding of the differences between data sources and the challenges of data preprocessing. You can independently apply what you have learned and use a variety of tools for data preprocessing. You can also identify and process data sources that contain incomplete data.

Very good (4)
You can critically evaluate different data sources and their suitability for different purposes. You can independently and creatively solve complex data preprocessing problems and integrate different data sources. You can combine relevant data from selected data sources and process incomplete data so that it partially meets the objectives.

Assessment criteria, excellent (5)

Excellent (5)
You have a broad and in-depth command of various data sources and aspects of data preprocessing. You can apply what you have learned as part of more demanding data preprocessing solutions. You know how to combine different methods and tools to achieve these solutions. You can combine relevant data from selected data sources and process incomplete data in accordance with the goals.