Welcome
IPN-Dharma IA Lab
This is an initiative of the Artificial Intelligence Laboratory of the CIC at IPN, in collaboration with DHARMA, to encourage researchers, professors, and students to take advantage of the courses, resources, and tools offered by the industry's leading technology platforms in the areas of Machine Learning, Data Science, Cloud Computing, Artificial Intelligence, and the Internet of Things, with the aim of building practical experience through a peer-to-peer, objective-based learning model.
Level 3: Building Solutions
AI Enterprise Workflow
This six-course specialization is designed to prepare you to take the certification examination for IBM AI Enterprise Workflow V1 Data Science Specialist. IBM AI Enterprise Workflow is a comprehensive, end-to-end process that enables data scientists to build AI solutions, starting with business priorities and working through to taking AI into production. The learning aims to elevate the skills of practicing data scientists by explicitly connecting business priorities to technical implementations, connecting machine learning to specialized AI use cases such as visual recognition and NLP, and connecting Python to IBM Cloud technologies. The videos, readings, and case studies in these courses are designed to guide you through your work as a data scientist at a hypothetical streaming media company.
Throughout this specialization, the focus will be on the practice of data science in large, modern enterprises. You will be guided through the use of enterprise-class tools on the IBM Cloud, tools that you will use to create, deploy, and test machine learning models. Your favorite open source tools, such as Jupyter notebooks and Python libraries, will be used extensively for data preparation and building models. Models will be deployed on the IBM Cloud using IBM Watson tooling that works seamlessly with open source tools. After successfully completing this specialization, you will be ready to take the official IBM certification examination for the IBM AI Enterprise Workflow.
Courses in this program
1) AI Workflow: Business Priorities and Data Ingestion
This is the first course of a six-part specialization. You are STRONGLY encouraged to complete these courses in order as they are not individual independent courses, but part of a workflow where each course builds on the previous ones.
This first course in the IBM AI Enterprise Workflow Certification specialization introduces you to the scope of the specialization and prerequisites. Specifically, the courses in this specialization are meant for practicing data scientists who are knowledgeable about probability, statistics, linear algebra, and Python tooling for data science and machine learning. A hypothetical streaming media company will be introduced as your new client. You will be introduced to the concept of design thinking, IBM's framework for organizing large enterprise AI projects. You will also be introduced to the basics of scientific thinking, because the quality that distinguishes a seasoned data scientist from a beginner is creative, scientific thinking. Finally, you will start your work for the hypothetical media company by understanding the data they have, and by building a data ingestion pipeline using Python and Jupyter notebooks.
By the end of this course you should be able to:
- Know the advantages of carrying out data science using a structured process.
- Describe how the stages of design thinking correspond to the AI enterprise workflow.
- Discuss several strategies used to prioritize business opportunities.
- Explain where data science and data engineering have the most overlap in the AI workflow.
- Explain the purpose of testing in data ingestion.
- Describe the use case for sparse matrices as a target destination for data ingestion.
- Know the initial steps that can be taken towards automation of data ingestion pipelines.
It is assumed you have a solid understanding of the following topics prior to starting this course: Fundamental understanding of Linear Algebra; Understand sampling, probability theory, and probability distributions; Knowledge of descriptive and inferential statistical concepts; General understanding of machine learning techniques and best practices; Practiced understanding of Python and the packages commonly used in data science: NumPy, Pandas, matplotlib, scikit-learn; Familiarity with IBM Watson Studio; Familiarity with the design thinking process.
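One of the objectives above mentions sparse matrices as a target destination for data ingestion. The following is a minimal sketch of that idea, not material from the course: it assumes a hypothetical log of (user, item, count) viewing events and aggregates it into coordinate-format (COO) triplets using only the standard library. The function name `ingest_to_coo` and the sample events are illustrative assumptions.

```python
from collections import defaultdict


def ingest_to_coo(events):
    """Aggregate (user, item, count) events into COO-style sparse triplets.

    Returns parallel lists (rows, cols, data) plus the matrix shape.
    Only non-zero cells are stored, which is the point of a sparse target.
    """
    user_index, item_index = {}, {}
    totals = defaultdict(int)
    for user, item, count in events:
        # Assign dense integer ids in order of first appearance.
        row = user_index.setdefault(user, len(user_index))
        col = item_index.setdefault(item, len(item_index))
        # Duplicate (user, item) pairs are summed into one cell.
        totals[(row, col)] += count

    rows, cols, data = [], [], []
    for (row, col), value in sorted(totals.items()):
        rows.append(row)
        cols.append(col)
        data.append(value)
    shape = (len(user_index), len(item_index))
    return rows, cols, data, shape


# Hypothetical viewing events for illustration only.
events = [
    ("ana", "film_a", 1),
    ("ben", "film_b", 2),
    ("ana", "film_a", 3),
    ("ana", "film_c", 1),
]
rows, cols, data, shape = ingest_to_coo(events)
```

If SciPy is available, the three lists can be handed to `scipy.sparse.coo_matrix((data, (rows, cols)), shape=shape)` to obtain a matrix object. The sketch stops short of that so it runs without third-party packages.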
2) AI Workflow: Data Analysis and Hypothesis Testing
This is the second course in the IBM AI Enterprise Workflow Certification specialization. You are STRONGLY encouraged to complete these courses in order as they are not individual independent courses, but part of a workflow where each course builds on the previous ones.
In this course you will begin your work for a hypothetical streaming media company by doing exploratory data analysis (EDA). Best practices for data visualization, handling missing data, and hypothesis testing will be introduced to you as part of your work. You will learn techniques of estimation with probability distributions and how to extend these estimates to apply null hypothesis significance tests. You will apply what you learn through two hands-on case studies: data visualization and multiple testing using a simple pipeline.
By the end of this course you should be able to:
- List several best practices concerning EDA and data visualization.
- Create a simple dashboard in Watson Studio.
- Describe strategies for dealing with missing data.
- Explain the difference between imputation and multiple imputation.
- Employ common distributions to answer questions about event probabilities.
- Explain the investigative role of hypothesis testing in EDA.
- Apply several methods for dealing with multiple testing.
It is assumed that you have completed Course 1 of the IBM AI Enterprise Workflow specialization and have a solid understanding of the following topics prior to starting this course: Fundamental understanding of Linear Algebra; Understand sampling, probability theory, and probability distributions; Knowledge of descriptive and inferential statistical concepts; General understanding of machine learning techniques and best practices; Practiced understanding of Python and the packages commonly used in data science: NumPy, Pandas, matplotlib, scikit-learn; Familiarity with IBM Watson Studio; Familiarity with the design thinking process.
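The objectives above include applying methods for dealing with multiple testing. As a rough illustration, and not necessarily the methods the course uses, the sketch below implements two widely known corrections in plain Python: Bonferroni (family-wise error rate) and Benjamini-Hochberg (false discovery rate). The p-values are made up for the example.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject H0 where p <= alpha / m, with m the number of tests."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Reject H0 for the k smallest p-values, where k is the largest rank
    such that the k-th smallest p-value is <= (k / m) * alpha."""
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    reject = [False] * m
    for idx in order[:cutoff_rank]:
        reject[idx] = True
    return reject


# Hypothetical p-values from five independent tests.
p_values = [0.039, 0.001, 0.27, 0.012, 0.025]
bonferroni_reject = bonferroni(p_values)
bh_reject = benjamini_hochberg(p_values)
```

On this made-up input Bonferroni rejects a single hypothesis while Benjamini-Hochberg rejects four, which shows the usual trade-off: the former is stricter, the latter tolerates a controlled share of false discoveries.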
3) AI Workflow: Feature Engineering and Bias Detection
This is the third course in the IBM AI Enterprise Workflow Certification specialization. You are STRONGLY encouraged to complete these courses in order as they are not individual independent courses, but part of a workflow where each course builds on the previous ones.
Course 3 introduces you to the next stage of the workflow for our hypothetical media company. In this stage of work you will learn best practices for feature engineering, handling class imbalances and detecting bias in the data. Class imbalances can seriously affect the validity of your machine learning models, and the mitigation of bias in data is essential to reducing the risk associated with biased models. These topics will be followed by sections on best practices for dimension reduction, outlier detection, and unsupervised learning techniques for finding patterns in your data. The case studies will focus on topic modeling and data visualization.
By the end of this course you will be able to:
- Employ the tools that help address class and class imbalance issues.
- Explain the ethical considerations regarding bias in data.
- Employ AI Fairness 360 open source libraries to detect bias in models.
- Employ dimension reduction techniques for both EDA and transformation stages.
- Describe topic modeling techniques in natural language processing.
- Use topic modeling and visualization to explore text data.
- Employ outlier handling best practices in high-dimensional data.
- Employ outlier detection algorithms as a quality assurance tool and a modeling tool.
- Employ unsupervised learning techniques using pipelines as part of the AI workflow.
- Employ basic clustering algorithms.
It is assumed that you have completed Courses 1 and 2 of the IBM AI Enterprise Workflow specialization and you have a solid understanding of the following topics prior to starting this course: Fundamental understanding of Linear Algebra; Understand sampling, probability theory, and probability distributions; Knowledge of descriptive and inferential statistical concepts; General understanding of machine learning techniques and best practices; Practiced understanding of Python and the packages commonly used in data science: NumPy, Pandas, matplotlib, scikit-learn; Familiarity with IBM Watson Studio; Familiarity with the design thinking process.
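The course description above notes that class imbalances can seriously affect model validity. One common mitigation, shown here only as an illustrative sketch, is to weight each class inversely to its frequency. The formula below, n_samples / (n_classes * class_count), is the same heuristic scikit-learn documents for `class_weight="balanced"`; the labels are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {class: weight} with weights inversely proportional
    to class frequency: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        cls: n_samples / (n_classes * count)
        for cls, count in counts.items()
    }


# Hypothetical imbalanced target: 8 subscribers stay, 2 churn.
labels = ["stay"] * 8 + ["churn"] * 2
weights = balanced_class_weights(labels)
```

With these labels the minority class receives a weight four times larger than the majority class, so each class contributes equally to a weighted loss.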
4) AI Workflow: Machine Learning, Visual Recognition and NLP
This is the fourth course in the IBM AI Enterprise Workflow Certification specialization. You are STRONGLY encouraged to complete these courses in order as they are not individual independent courses, but part of a workflow where each course builds on the previous ones.
Course 4 covers the next stage of the workflow, setting up models and their associated data pipelines for a hypothetical streaming media company. The first section covers the complex topic of evaluation metrics, where you will learn best practices for a number of different metrics including regression metrics, classification metrics, and multi-class metrics, which you will use to select the best model for your business challenge. The next topics cover best practices for different types of models including linear models, tree-based models, and neural networks. Out-of-the-box Watson models for natural language understanding and visual recognition will be used. There will be case studies focusing on natural language processing and on image analysis to provide realistic context for the model pipelines.
By the end of this course you will be able to:
- Discuss common regression, classification, and multilabel classification metrics.
- Explain the use of linear and logistic regression in supervised learning applications.
- Describe common strategies for grid searching and cross-validation.
- Employ evaluation metrics to select models for production use.
- Explain the use of tree-based algorithms in supervised learning applications.
- Explain the use of Neural Networks in supervised learning applications.
- Discuss the major variants of neural networks and recent advances.
- Create a neural net model in TensorFlow.
- Create and test an instance of Watson Visual Recognition.
- Create and test an instance of Watson NLU.
It is assumed that you have completed Courses 1 through 3 of the IBM AI Enterprise Workflow specialization and you have a solid understanding of the following topics prior to starting this course: Fundamental understanding of Linear Algebra; Understand sampling, probability theory, and probability distributions; Knowledge of descriptive and inferential statistical concepts; General understanding of machine learning techniques and best practices; Practiced understanding of Python and the packages commonly used in data science: NumPy, Pandas, matplotlib, scikit-learn; Familiarity with IBM Watson Studio; Familiarity with the design thinking process.
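The objectives above refer to classification metrics used to select models. As a small worked example, under the assumption of a binary task with made-up labels, the sketch below computes precision, recall, and F1 directly from the confusion counts. The function name is hypothetical.

```python
def binary_classification_report(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against division by zero when a class is never predicted/present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical ground truth and predictions.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
report = binary_classification_report(y_true, y_pred)
```

Here there are three true positives, one false positive, and one false negative, so precision and recall are both 3/4.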
5) AI Workflow: Enterprise Model Deployment
This is the fifth course in the IBM AI Enterprise Workflow Certification specialization. You are STRONGLY encouraged to complete these courses in order as they are not individual independent courses, but part of a workflow where each course builds on the previous ones.
This course introduces you to an area that few data scientists are able to experience: deploying models for use in large enterprises. Apache Spark is a very commonly used framework for running machine learning models. Best practices for using Spark will be covered in this course. Best practices for data manipulation, model training, and model tuning will also be covered. The use case will call for the creation and deployment of a recommender system. The course wraps up with an introduction to model deployment technologies.
By the end of this course you will be able to:
- Use Apache Spark's RDDs, DataFrames, and a pipeline.
- Employ spark-submit scripts to interface with Spark environments.
- Explain how collaborative filtering and content-based filtering work.
- Build a data ingestion pipeline using Apache Spark and Apache Spark streaming.
- Analyze hyperparameters in machine learning models on Apache Spark.
- Deploy machine learning algorithms using the Apache Spark machine learning interface.
- Deploy a machine learning model from Watson Studio to Watson Machine Learning.
It is assumed that you have completed Courses 1 through 4 of the IBM AI Enterprise Workflow specialization and you have a solid understanding of the following topics prior to starting this course: Fundamental understanding of Linear Algebra; Understand sampling, probability theory, and probability distributions; Knowledge of descriptive and inferential statistical concepts; General understanding of machine learning techniques and best practices; Practiced understanding of Python and the packages commonly used in data science: NumPy, Pandas, matplotlib, scikit-learn; Familiarity with IBM Watson Studio; Familiarity with the design thinking process.
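The objectives above mention collaborative filtering as one basis for a recommender system. The sketch below is a deliberately tiny, user-based variant in plain Python rather than Spark: it scores unseen items by the cosine similarity between users' rating vectors. The ratings, user names, and function names are all hypothetical.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity between two {item: rating} dicts."""
    shared = set(a) & set(b)
    dot = sum(a[item] * b[item] for item in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(target, ratings, top_n=1):
    """Rank items the target has not rated by similarity-weighted ratings."""
    scores = {}
    for other, other_ratings in ratings.items():
        if other == target:
            continue
        sim = cosine_similarity(ratings[target], other_ratings)
        for item, rating in other_ratings.items():
            if item not in ratings[target]:
                scores[item] = scores.get(item, 0.0) + sim * rating
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda item: (-scores[item], item))
    return ranked[:top_n]


# Hypothetical ratings on a 1-5 scale.
ratings = {
    "ana": {"drama_1": 5, "comedy_1": 1},
    "ben": {"drama_1": 5, "comedy_1": 1, "drama_2": 5},
    "cat": {"drama_1": 1, "comedy_1": 5, "comedy_2": 5},
}
top_pick = recommend("ana", ratings, top_n=1)
top_two = recommend("ana", ratings, top_n=2)
```

Because "ana" rates the shared titles the same way "ben" does, the title only "ben" has seen ranks first. Content-based filtering would instead compare item attributes and is not shown here.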
6) AI Workflow: AI in Production
This is the sixth course in the IBM AI Enterprise Workflow Certification specialization. You are STRONGLY encouraged to complete these courses in order as they are not individual independent courses, but part of a workflow where each course builds on the previous ones.
This course focuses on models in production at a hypothetical streaming media company. There is an introduction to IBM Watson Machine Learning. You will build your own API in a Docker container and learn how to manage containers with Kubernetes. The course also introduces several other tools in the IBM ecosystem designed to help deploy or maintain models in production. The AI workflow is not a linear process, so some time is dedicated to the most important feedback loops in order to promote efficient iteration on the overall workflow.
By the end of this course you will be able to:
- Use Docker to deploy a Flask application.
- Deploy a simple UI to integrate the ML model, Watson NLU, and Watson Visual Recognition.
- Discuss basic Kubernetes terminology.
- Deploy a scalable web application on Kubernetes.
- Discuss the different feedback loops in the AI workflow.
- Discuss the use of unit testing in the context of model production.
- Use IBM Watson OpenScale to assess bias and performance of production machine learning models.
It is assumed that you have completed Courses 1 through 5 of the IBM AI Enterprise Workflow specialization and you have a solid understanding of the following topics prior to starting this course: Fundamental understanding of Linear Algebra; Understand sampling, probability theory, and probability distributions; Knowledge of descriptive and inferential statistical concepts; General understanding of machine learning techniques and best practices; Practiced understanding of Python and the packages commonly used in data science: NumPy, Pandas, matplotlib, scikit-learn; Familiarity with IBM Watson Studio; Familiarity with the design thinking process.
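The objectives above include unit testing in the context of model production. As a hedged sketch of what such a test might look like, the example below checks the output contract of a prediction function with the standard `unittest` module. The `predict` function is a hypothetical stand-in for a deployed model, not an IBM or course API.

```python
import unittest


def predict(features):
    """Hypothetical stand-in for a deployed model.

    Returns a churn probability clipped to the range [0, 1].
    """
    score = 0.1 + 0.2 * features.get("hours_watched", 0)
    return max(0.0, min(1.0, score))


class PredictContractTest(unittest.TestCase):
    def test_output_is_probability(self):
        # The API contract: every prediction lies in [0, 1].
        for hours in (0, 1, 3, 100):
            p = predict({"hours_watched": hours})
            self.assertGreaterEqual(p, 0.0)
            self.assertLessEqual(p, 1.0)

    def test_missing_feature_falls_back_to_default(self):
        # A request without the feature should not raise.
        self.assertAlmostEqual(predict({}), 0.1)


# Run the tests programmatically so the script does not exit.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(PredictContractTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Tests like these check the interface rather than model accuracy, which makes them cheap to run on every build of a container image.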