Master in Big Data and Analytics
Online Master in Big Data and Analytics
The Master in Big Data and Analytics was born from the union of the extensive experience in technology training and research that characterizes the UPC, backed by the national and international recognition and accreditations it holds, and the experience in online training, with a technological and business focus, of OBS.
The structure of the Master in Big Data and Analytics will allow students to identify when Big Data solutions can help an organization and to govern their implementation within it.
The program covers three major blocks:
Block I. Management and storage.
Block II. Processes and analysis.
Block III. Visualization and business.
During the program, knowledge about technological solutions will be applied in a practical way, since the objective is to bring students closer to the technology currently on the market so they can work with it hands-on and see its applicability.
At OBS we are committed to training as an engine of change, growth and improvement, both personal and professional. Applying the competencies developed throughout the program in the professional field will contribute to the growth and improvement of the students, as well as of their companies.
The digital transformation of companies is a reality, and it implies changes in their different functional areas. It is for this reason that professionals must be prepared to use the new tools provided by the new technological environment.
Career opportunities are grouped into:
Responsible for Big Data projects and advanced business analytics.
Head of Big Data Analytics Infrastructure in the ICT area.
Chief Data Officer.
With an ecosystem of 1,000 Big Data solutions, in a changing market that keeps growing year after year, it is necessary to continuously adapt and correctly define the foundations of the architectures, always under the umbrella of business expectations.
Large volumes, variety and velocity of data will do nothing but waste the resources of an organization that is not clear about what it wants to solve. Therefore, knowing why and what an organization expects (cost reduction or revenue increase) are elements that will determine the viability of a Big Data project.
The Master in Big Data and Analytics offers an eminently practical and rigorous approach, oriented to implementing Big Data technology with guarantees of success.
The general objective of the Master in Big Data and Analytics is to provide the necessary knowledge to manage a Big Data project from all its aspects, starting from how to identify the opportunity in an organization to what is delivered to the business areas.
The Master in Big Data and Analytics focuses on more than one solution. The objective is for the responsible professional to be able to identify whether an organization is facing a Big Data challenge and to have a defined process for the steps to follow: identify the type of solutions to adopt, the professional profiles that will be needed, prepare an economic viability plan, and have the fundamentals to manage its scalability.
The curriculum of the Master in Big Data and Analytics is designed to achieve the following specific objectives:
Understand how to transform a traditional organization into a data-driven organization by applying data analysis, or Big Data Analytics.
Know the main technological frameworks on the market and their main applications: Hadoop, Spark, Neo4j.
Identify the different types of information and their storage and quality processes.
Understand how to extract knowledge from the data to generate predictive models via predictive statistics and machine learning.
Master the techniques of data governance: acquisition, storage, process, analysis, visualization.
Discover the new dashboard visualization techniques to improve decision making.
Block I. Management and storage
Big Data Leveling Course
In parallel to Module 1, students start the Big Data and Analytics program with this leveling course, which provides the technical foundations necessary to pursue the program and to carry out Big Data projects. In this course, students will find materials that allow them to go deeper into the different subjects needed to follow the course, and will take multiple-choice tests that serve as a guide for the evaluation of their knowledge and for the final evaluation of the course.
Module 1. Big Data Analytics Management
In this module we will introduce the fundamental concepts of Big Data in order to identify the keys of each project and its scalability. Discovering, before starting, the variability, volume and velocity of the data will help us identify which phases should be carried out before starting a Big Data project and, most importantly: what return do we expect from the project? What expectations does the business have?
Identify when a project is Big Data.
Find the ROI in a Big Data project.
Understand and apply the concepts of Dark Data and Open Data.
Guide the organization toward becoming a data-driven organization and support the project.
Know the legal framework around data.
Define the role of the Data Scientist within an organization.
Module 2. Big Data Architecture
The Big Data solutions ecosystem will be as large as the data types and processing capacity the project requires. Most solutions focus on scalability and diversity and are based primarily on Cloud environments. Some companies do not want to upload their most critical data to the cloud and prefer to keep it in house; others prefer 100% Cloud or hybrid environments.
In this module we will discover the pros and cons of each architecture, the main solution providers, and how we can build the most elastic environments possible, always looking for the most efficient architecture in terms of solutions and cost.
Discover the typologies of On-Premise, Hybrid and Cloud architectures.
Understand the role of Hadoop and HDFS as foundations for process parallelization.
Know what Spark is, together with the Hive, Pig, Sqoop and Flume ecosystem.
Identify the advantages of Kubernetes and Databricks.
Module 3. ETLs and ELTs
Information can be obtained from sources outside the organization (social networks, open data, among others) and from internal databases (CRMs, ERPs, transactional systems, among others). All this data must be transformed, either before or after loading, and then processed through aggregation processes that allow us to obtain KPIs.
At this point, we must define quality rules to verify that the data is correct and does not degrade our Data Lake.
In this module we will learn to define the foundations that every data loading process must have in order to guarantee integrity, sanitization, historization and recurrence of loads.
Understand the difference between ETLs and ELTs.
Understand the benefits of ETL processes.
Identify KPIs for MDM.
Integrate the different systems.
Be able to manage exception handling.
Understand the difference between a Data Lake and a Data Warehouse.
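As a taste of what the module covers, the extract-transform-load cycle with a quality rule can be sketched in a few lines of plain Python. The record fields, the quality rule and the KPI are invented for illustration; they are not part of the program's material:

```python
# Minimal ETL sketch: extract raw records, apply a quality rule,
# then aggregate the clean records into a KPI (revenue per country).

raw_records = [  # "extract": data as it might arrive from a CRM export
    {"country": "ES", "revenue": "100.0"},
    {"country": "FR", "revenue": "250.5"},
    {"country": "ES", "revenue": "bad-value"},  # fails the quality rule
    {"country": "FR", "revenue": "49.5"},
]

def transform(record):
    """Quality rule: revenue must parse as a non-negative number."""
    try:
        revenue = float(record["revenue"])
    except ValueError:
        return None  # reject the record instead of polluting the Data Lake
    if revenue < 0:
        return None
    return {"country": record["country"], "revenue": revenue}

def load(records):
    """Aggregate clean records into the KPI: total revenue per country."""
    kpi = {}
    for rec in filter(None, map(transform, records)):
        kpi[rec["country"]] = kpi.get(rec["country"], 0.0) + rec["revenue"]
    return kpi

revenue_by_country = load(raw_records)
print(revenue_by_country)  # {'ES': 100.0, 'FR': 300.0}
```

Note how the rejected record never reaches the KPI: this is the "quality rule" idea from the module, reduced to its simplest form.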
Module 4. Data Lakes
Registering large amounts of information requires different types of databases beyond the more traditional relational ones; for example, video, routes or critical paths, documents and social networks are increasingly common among the data sources that interest a business.
The technology market has adapted to all of them and created solutions to store and exploit them optimally. In this module we will discover their advantages and disadvantages, and we will carry out small practical exercises on each of them to explore their potential.
Know relational vs. NoSQL databases.
Know the columnar databases.
Know graph databases: Neo4j.
Know document databases.
Discover data sources external to the organization to enrich our Data Lake.
Understand the role of data streams in real-time decision making.
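To make the relational-versus-document contrast concrete, here is a small sketch in plain Python with invented data: a document store keeps one nested, self-contained record per customer, while the relational model needs two flat tables joined by a key.

```python
import json

# Document model: one self-contained JSON record, nesting allowed.
document = json.loads("""
{
  "customer_id": 42,
  "name": "Acme",
  "orders": [
    {"order_id": 1, "total": 99.9},
    {"order_id": 2, "total": 10.1}
  ]
}
""")

# Relational equivalent: two flat tables joined by customer_id.
customers = [{"customer_id": 42, "name": "Acme"}]
orders = [
    {"customer_id": 42, "order_id": 1, "total": 99.9},
    {"customer_id": 42, "order_id": 2, "total": 10.1},
]

# The same business question answered in both models:
# how much has customer 42 spent in total?
doc_total = sum(o["total"] for o in document["orders"])
rel_total = sum(o["total"] for o in orders if o["customer_id"] == 42)
assert doc_total == rel_total  # same answer either way
```

The document model avoids the join but duplicates structure inside each record; which trade-off wins depends on the access patterns, which is exactly what this module examines.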
Block II. Processes and analysis
Module 5. Data Mining
In this module we will learn to extract information from the Data Lake and, above all, to make it interpretable. Throughout the process, we must be clear about the business objectives, the tools that will help us sanitize the data, which mathematical models work best, and how to assess the results.
Select, from the available data, the data set that can best answer the business question.
Transform the input data set.
Select the most appropriate data mining technique, for example, neural networks, decision trees, clustering or association rules.
Understand the process of knowledge extraction.
Interpret and evaluate the data.
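One of the techniques the module names, association rules, fits in a few lines of plain Python. The market baskets below are invented; the sketch computes the support and confidence of the rule "bread → butter":

```python
# Tiny association-rule example over a list of market baskets:
# support and confidence of the rule {bread} -> {butter}.

baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "jam"},
]

n = len(baskets)
bread = sum(1 for b in baskets if "bread" in b)
both = sum(1 for b in baskets if {"bread", "butter"} <= b)

support = both / n          # how often the pair appears together: 2/4
confidence = both / bread   # butter, given bread: 2/3

print(round(support, 2), round(confidence, 2))  # 0.5 0.67
```

Real data-mining tools compute thousands of such rules and rank them, but the two ratios are the whole idea.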
Module 6. Advanced analytics: R and Python
Once we have the correct data, it will be time to extract knowledge, interpret it and take it to a new level. In this module we will establish a small statistical base to work with two of the main Advanced Analytics tools on the market: R and Python. With them, small practical exercises will be carried out to discover when to use each one and how to extract the maximum potential from the data.
Know the basics of statistics and probability calculation.
Apply multivariate data analysis.
Understand and apply time series analysis.
Understand the process of statistical control of data quality.
Calculate correlations and patterns.
Know the process of clustering the data.
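As an example of the statistical base the module builds, the correlation between two variables can be computed by hand in a few lines of Python; the advertising-versus-sales figures are invented:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Advertising spend vs. sales (invented figures): strongly correlated.
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [2.1, 3.9, 6.2, 8.0, 9.8]
r = pearson(spend, sales)
print(round(r, 3))  # 0.999
```

In practice R and Python both ship this as a one-liner (`cor()` in R, `numpy.corrcoef` in Python); writing it out once makes the one-liners interpretable.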
Module 7. Machine Learning
In the previous modules we saw how to interpret existing data and how to extract knowledge from everything that has already happened. In this module we will approach Machine Learning to see how, with good information, we can move on to prediction. We will discover the main techniques and tools on the market, learn what type and volume of information is necessary, and carry out small practical exercises to see their applicability.
Understand the difference between supervised learning and unsupervised learning.
Know the different classification techniques, from decision trees to Bayesian techniques.
Understand the concept of machine learning.
Identify the main open source and commercial software.
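The supervised-learning idea can be illustrated with the simplest possible classifier, a one-nearest-neighbour sketch in plain Python; the points and labels are invented for illustration:

```python
import math

# Supervised learning starts from labeled examples:
# (feature vector, class label) pairs.
training = [
    ((1.0, 1.0), "low"),
    ((1.5, 2.0), "low"),
    ((8.0, 8.0), "high"),
    ((9.0, 7.5), "high"),
]

def predict(point):
    """1-nearest-neighbour: copy the label of the closest training example."""
    _, label = min((math.dist(point, x), lbl) for x, lbl in training)
    return label

print(predict((8.5, 8.0)))  # high
print(predict((1.2, 1.5)))  # low
```

Unsupervised learning, by contrast, would receive only the points without the "low"/"high" labels and have to discover the two groups itself, which is the distinction this module develops.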
Block III. Visualization and business
Module 8. Data Governance
Once we have all the information, we must define the rules of use: who can see the data, the definition of each data element, its lineage, and the tools users need to interpret it.
Data governance is where many companies fail: having two different values for the same KPI causes distrust between different business areas.
In this module, we will learn information governance techniques to maintain integrity, security and traceability, ensuring that the data helps in making sound decisions without generating distrust.
Understand what data accessibility is.
Manage data as an asset.
Define the main KPIs and data traceability.
Understand the concept of security.
Module 9. Visualization techniques
In this module we will discover the different data visualization techniques and when to use each of them. Large volumes of data need new graphical representations to be interpreted: heat maps, clusters, dimensions, critical paths, among others.
In addition to the graphics, it is important to attach an assessment to each of them and generate an interpretation. Indicators can confuse and hinder decision making; orienting them, putting them in context and qualifying them will help a better interpretation:
Discover the available types of graphics.
Know the use cases and their main graphic representations.
Understand the process of turning graphics into business storytelling.
Know how to simplify and aggregate data for management dashboards.
Module 10. Data visualization and self-service tools
Finally, since the ecosystem of visualization tools is complex, it will be important to choose the most suitable one for each type of organization. We must bear in mind that an organization seeks to simplify technology, so we must find a single solution for the entire organization. Only then can we easily guarantee the security, accessibility and availability of the same KPIs.
Self-service is a key piece in large organizations with delegations, branch offices, etc. It allows information to be decentralized and makes each node of the organization autonomous. In this module we will discover the main data visualization and self-service tools.
Identify the main market tools: QlikView/Qlik Sense, Tableau, Power BI, Cognos.
Know the strengths and weaknesses of each one.
Carry out a case study with Qlik Sense and self-service data.
During the Final Master's Project (PFM), students will work hand in hand with a real company on the development of a project. They will have the option of doing it for their own company or choosing among the options proposed by the school.
The proposed projects may take two approaches, Business and/or Technological, and within these approaches they may take multiple forms. Some examples are:
Approach 1. Business
Example 1. Development of the business plan, including teams, infrastructure and business deliverables, for a theoretical business case with different types of data.
Example 2. Business consulting project based on a real business case.
Approach 2. Technological
Example 1. Development of business dashboards with a market tool with different types of data and definition of indicators.
In the Master in Big Data and Analytics, students will have the opportunity to take part in two practical workshops. These workshops are:
Workshop 1. Neo4j
In this workshop we will learn, technically, how graph databases work and how to harness the power they offer compared with relational databases when looking for the most used critical paths or hierarchical recurrences.
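To see why a graph model fits path queries, here is a minimal shortest-path search in plain Python over an invented graph. In Neo4j the equivalent is a short Cypher query run natively by the engine; the sketch only shows the kind of traversal being optimized:

```python
from collections import deque

# A tiny directed graph as an adjacency list (nodes invented for illustration).
graph = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": ["E"],
    "E": [],
}

def shortest_path(start, goal):
    """Breadth-first search: the kind of path query a graph database runs natively."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # goal unreachable from start

print(shortest_path("A", "E"))  # ['A', 'C', 'E']
```

In a relational database the same question requires recursive joins whose cost grows with path depth, which is the contrast the workshop explores.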
Workshop 2. Watson
Solutions for setting up an Artificial Intelligence / Machine Learning environment can involve different interconnected open source products, or a single solution that brings together the main market products and guarantees interoperability between versions, automatic upgrades and contracting as a service. These are Watson's main advantages and, in this purely practical workshop, we will discover how to provision an environment for our Data Science team in which we can govern solutions and cost.
Throughout the course we will approach the main market solutions in each of the layers that make up a Big Data stack: the main solution providers (IBM, Amazon, Google Cloud, Azure), databases (Neo4j, HDFS, Cassandra, MongoDB), ETLs (Kafka, Pentaho, PowerCenter) and presentation (Tableau, Qlik), among others.
The Master does not seek to train its students technically in each of these tools, but rather to approach them from a theoretical point of view, see the pros and cons of each one, and carry out small practical exercises to understand their operation first hand. The exercises will be simple and easy to solve; they are not the main focus, but they ensure the Master is not purely theoretical.
Student profile and admission requirements
The modules of the program are designed for professionals from any sector who want to implement a Big Data project in their company, identify what types of projects are of this type and define the best roadmap for the project in order to solve it successfully.
The typical incoming profiles are:
Graduates in technical engineering, Business Administration and science (Medicine, Mathematics, Physics, Chemistry).
Professionals who are working in the ICT sector.
Middle managers who want to be well positioned for future opportunities in their company.
BI (Business Intelligence) professionals who want to expand their knowledge.
Technical profiles / consultants who currently work with data and who want to have an end-to-end management vision.
Upon completion of the program, students will obtain:
A degree from Three Points.
A degree accredited by the UPC, provided the University's requirements are fulfilled at the end of the program.