DATA SCIENCE

General description of the Data Science studies. MSc studies in Data Science at the Faculty of Automatic Control, Electronics and Informatics, Silesian University of Technology, are taught in English and address contemporary data processing algorithms and their applications. The teaching of data analysis techniques is interdisciplinary: it combines elements of information technology, computational and algorithmic aspects of data analysis, statistics, and links to domain knowledge in the areas of application. The studies comprise lectures, laboratories and projects taught by specialists in data analysis, using modern computational techniques and high-quality hardware, and focus on real data sets and important contemporary problems in their processing. Some of the optional courses are given by experts from information technology companies.
Studying data science in English gives wider access to the literature and better opportunities for international cooperation. The industrial stakeholders of the Data Science specialization are scientific institutions and enterprises that develop systems and software for data analysis, forecasting, and computer-aided business and industrial decision making, and that seek highly qualified staff to support their projects.

Profile of the graduate. A graduate of the Data Science specialization acquires competence in applying information techniques, modeling methods, computational and algorithmic methods, and statistical analyses to the processing of, and knowledge extraction from, large datasets generated by contemporary computer techniques. The graduate is prepared both for creative individual work and for work as a member of a team creating applied informatics solutions based on modern hardware systems, new computer technologies, and programming environments and tools.

Modular organization of the studies. The Data Science studies are organized into modules. They comprise the following five modules, which cover the most important areas of contemporary data analysis: Machine learning, Soft computing, Data mining, Big data and Statistics for data science.

Module: Machine learning 

The Machine learning module is headed by Prof. Joanna Polańska, a specialist in biostatistics, bioinformatics, exploratory data analysis, data clustering and classification. Prof. Polańska is the head of the Division of Exploratory Data Analysis in the Institute of Automatic Control, Silesian University of Technology, and the head of doctoral studies of the Faculty of Automatic Control, Electronics and Informatics, Silesian University of Technology. She is the author of over 100 scientific publications, cited in the Google Scholar database over 3000 times (h-index=19).

Courses in the module:
Classifiers. The aim of the course is to familiarize students with the problems and methods of supervised and unsupervised classification. The course content is presented in the context of a wide spectrum of applications, in particular in engineering, automatic control, electronics and information technologies.
Statistical learning. The aim of the course is to familiarize students with statistical problems related to machine learning: feature engineering for classification, model rank estimation and model selection, regression models for machine learning, model integration, and the analysis of significantly correlated datasets.
Evolutionary algorithms. The aim of the course is to familiarize students with evolutionary algorithms and their applications to engineering design in automation, electronics, informatics and biocybernetics. The relations between evolutionary algorithms, optimization theory and classification methods are emphasized.

Module: Soft computing 

The Soft computing module is headed by Prof. Sebastian Deorowicz, a specialist in computational algorithms, computer programming and their applications. Prof. Deorowicz is the head of the Division of Computer Programming in the Institute of Informatics, Silesian University of Technology, and the vice director for scientific research of the Institute of Informatics. He is the author of approximately 100 publications, cited in the Google Scholar database over 1300 times (h-index=19).

Courses in the module:
Scientific computing. The aim of the course is to familiarize students with computing techniques for science and engineering. The course focuses on the use of supercomputing centers and computing clusters.
Programming in R and Python. The aim of the course is to familiarize students with programming in the R and Python languages. An important part of the course shows applications of R and Python in the field of soft computing. Numerous examples demonstrate that R and Python are effective and efficient tools for data analysis.
Optimization theory. The aim of the course is to introduce students to advanced mathematical optimization methods and algorithms and to optimal control problems, and to develop the skills needed to formulate and solve complex optimization problems.
Formal languages. The aim of the course is to present the theory of formal languages and its connection to the theory of computing. The course also presents practical applications of formal languages in data analysis.
Fuzzy data analysis. The aim of the course is to familiarize students with modeling, classification and, more generally, data analysis based on the formalism of fuzzy set theory. Data analysis scenarios based on fuzzy sets are illustrated with many applications in biomedicine, engineering, automatic control, electronics and informatics.

Module: Data Mining

The Data Mining module is headed by Prof. Marek Sikora, a specialist in data mining, clustering and classification. Prof. Sikora is the head of the Machine Learning scientific group in the Institute of Informatics, Silesian University of Technology. He has extensive experience in industrial data analysis and cooperates closely with the EMAG scientific institute in Katowice, Poland. He is the author of approximately 100 publications, cited in the Google Scholar database over 600 times (h-index=15).

Courses in the module:
Data mining in practice. The aim of the course is to familiarize students with the methodology of the data exploration process, particularly with respect to data of complex structure. Use-case analyses are presented, along with the weak and strong points of particular analytical methods. Selected analytical platforms are discussed.
Knowledge discovery. The aim of the course is to familiarize students with methods of knowledge discovery in data (particularly in databases). Methods for building tree-based and rule-based models for classification, regression and survival analysis are presented. The foundations of rough set theory are discussed, along with its application in knowledge discovery.
Data visualization. The aim of the course is to familiarize students with methods, algorithms and tools for visualizing data: continuous and discrete numeric data, categories, relations, multidimensional data, time series and data streams. The importance of visualization techniques for data analysis and data-based inference is stressed.

Module: Big Data

The Big Data module is headed by Prof. Dariusz Mrozek, a specialist in databases, large-scale and parallel computations, bioinformatics and data mining techniques. Prof. Mrozek is the head of the Division of Theory of Informatics in the Institute of Informatics, Silesian University of Technology. He is the author of over 100 publications, cited in the Google Scholar database approximately 500 times (h-index=13).

Courses in the module:
Cloud platforms. The aim of the course is to provide students with the knowledge necessary to understand cloud computing: its architecture, models, platforms and interaction, and the programming of cloud-based solutions for various applications.
Hadoop ecosystem. The aim of the course is to provide students with the knowledge necessary to understand Big Data concepts, platforms for processing Big Data (including Hadoop) and their architecture, data storage and transformation solutions, the computational models applied on these platforms, and the development of solutions for Big Data analytics.
Visual Data. The aim of the course is to familiarize students with methods, algorithms and tools for visual data analysis. These data may come from various imaging sources, such as visible light cameras, X-rays, ultrasound (USG) or magnetic resonance.

Module: Statistics for data science 

The Statistics for data science module is headed by Prof. Adam Czornik, a specialist in mathematical modeling, statistics and system dynamics. Prof. Czornik is the dean of the Faculty of Automatic Control, Electronics and Informatics, Silesian University of Technology. He is the author of over 100 publications, cited in the Google Scholar database over 1000 times (h-index=21).

Courses in the module:
Markov models. The aim of the course is to familiarize students with the modeling of processes, systems and dynamical phenomena using Markov models. The lectures give an overview of multiple applications of Markov models.
Bayesian Data Analysis. The aim of the course is to familiarize students with the Bayesian approach to data analysis.
Models with hidden data. The aim of the course is to familiarize students with statistical models with hidden variables.