IPN-Dharma IA Lab

    Welcome
    IPN-Dharma IA Lab

    This is an initiative of the Artificial Intelligence Laboratory of the CIC at IPN, in collaboration with DHARMA, to encourage researchers, professors, and students to take advantage of the courses, resources, and tools of the industry's leading technology platforms in the areas of Machine Learning, Data Science, Cloud Computing, Artificial Intelligence, and the Internet of Things, with the goal of building hands-on experience through a peer-to-peer, objective-driven learning model.

    Level 2: Contextual Knowledge

    Data Engineering, Big Data, and Machine Learning on GCP

    This program provides the skills you need to advance your career, along with training to support your preparation for the industry-recognized Google Cloud Professional Data Engineer certification.
    What you will learn:
    • Identify the purpose and value of the key Big Data and Machine Learning products in Google Cloud.
    • Use Cloud SQL and Dataproc to migrate existing MySQL and Hadoop/Pig/Spark/Hive workloads to Google Cloud.
    • Employ BigQuery to carry out interactive data analysis.
    • Choose between different data processing products on Google Cloud.

    Courses in this program

    1) Google Cloud Big Data and Machine Learning Fundamentals

    This course introduces participants to the big data capabilities of Google Cloud. Through a combination of presentations, demos, and hands-on labs, participants get an overview of Google Cloud and a detailed view of the data processing and machine learning capabilities. This course showcases the ease, flexibility, and power of big data solutions on Google Cloud.

    What you will learn:
    • Identify the purpose and value of the key Big Data and Machine Learning products in Google Cloud.
    • Use Cloud SQL and Dataproc to migrate existing MySQL and Hadoop/Pig/Spark/Hive workloads to Google Cloud.
    • Employ BigQuery to carry out interactive data analysis.
    • Choose between different data processing products on Google Cloud.
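The "interactive data analysis" that BigQuery performs is expressed in SQL over large tables. As a rough, pure-Python sketch of the same group-and-aggregate pattern (this is an illustration, not BigQuery's actual client API; the table and column names are made up):

```python
from collections import defaultdict

# Hypothetical rows, standing in for a BigQuery table of taxi trips.
rows = [
    {"city": "NYC", "fare": 12.5},
    {"city": "NYC", "fare": 7.0},
    {"city": "SF",  "fare": 20.0},
]

def total_fares_by_city(rows):
    """Equivalent of: SELECT city, SUM(fare) FROM trips GROUP BY city."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["city"]] += row["fare"]
    return dict(totals)

print(total_fares_by_city(rows))  # {'NYC': 19.5, 'SF': 20.0}
```

In BigQuery itself, the equivalent query runs serverlessly over terabytes; the point here is only the shape of the computation.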

    Effort: estimated 12 hours

    Language: Spanish and English

    Link: Coursera (Spanish)

    Link: Coursera (English)

    2) Modernizing Data Lakes and Data Warehouses with GCP

    The two key components of any data pipeline are data lakes and warehouses. This course highlights use-cases for each type of storage and dives into the available data lake and warehouse solutions on Google Cloud in technical detail. Also, this course describes the role of a data engineer, the benefits of a successful data pipeline to business operations, and examines why data engineering should be done in a cloud environment.

    What you will learn:
    • Understand the differences between data lakes and data warehouses, the two key components of any data pipeline.
    • Explore use-cases for each type of storage and dive into the available data lake and warehouse solutions on Google Cloud in technical detail.
    • Understand the role of a data engineer and the benefits of a successful data pipeline to business operations.
    • Examine why data engineering should be done in a cloud environment.
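The lake/warehouse distinction the course draws is essentially schema-on-read versus schema-on-write. A minimal sketch of that idea, assuming made-up field names (this is a conceptual illustration, not a GCP API):

```python
import json

# Data lake side: raw files stored as landed, schema applied only on read.
lake_blob = (
    '{"id": 1, "amount": "19.99", "note": "first order"}\n'
    '{"id": 2, "amount": "5.00"}\n'
)

# Warehouse side: a fixed schema enforced at load time (schema-on-write).
SCHEMA = {"id": int, "amount": float}

def load_to_warehouse(blob):
    """Parse raw lake records, keeping only schema columns, cast to type."""
    table = []
    for line in blob.splitlines():
        record = json.loads(line)
        table.append({col: cast(record[col]) for col, cast in SCHEMA.items()})
    return table

print(load_to_warehouse(lake_blob))
```

On Google Cloud, the lake role is typically played by Cloud Storage and the warehouse role by BigQuery; the sketch only shows why the two storage styles serve different needs.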

    Effort: estimated 7 hours

    Language: Spanish and English

    Link: Coursera (Spanish)

    Link: Coursera (English)

    3) Building Batch Data Pipelines on GCP

    Data pipelines typically fall under one of the Extract-Load, Extract-Load-Transform, or Extract-Transform-Load paradigms. This course describes which paradigm should be used and when for batch data. Furthermore, this course covers several technologies on Google Cloud for data transformation including BigQuery, executing Spark on Dataproc, pipeline graphs in Cloud Data Fusion and serverless data processing with Dataflow. Learners will get hands-on experience building data pipeline components on Google Cloud.

    What you will learn:
    • Review the different data loading methods (EL, ELT, and ETL) and when to use each.
    • Run Hadoop on Dataproc, leverage Cloud Storage, and optimize Dataproc jobs.
    • Use Dataflow to build your data processing pipelines.
    • Manage data pipelines with Data Fusion and Cloud Composer.
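The difference between the ETL and ELT paradigms the course contrasts is only where the transform step runs. A hedged, pure-Python sketch with an in-memory stand-in for the warehouse (all names here are invented for illustration):

```python
# Raw rows as extracted from a source system.
raw = [" alice,3 ", "bob,5"]

def transform(row):
    """Clean one raw row into a structured record."""
    name, n = row.strip().split(",")
    return {"name": name, "count": int(n)}

def etl(rows):
    """ETL: transform inside the pipeline, then load only clean rows."""
    return [transform(r) for r in rows]

def elt(rows):
    """ELT: load raw rows first, transform later in the warehouse
    (on Google Cloud, e.g. a SQL transformation inside BigQuery)."""
    staged = list(rows)                     # EL: land the data as-is
    return [transform(r) for r in staged]   # T: deferred transform

assert etl(raw) == elt(raw)  # same result, different place of work
```

The practical trade-off the course explores: ELT keeps the raw data available for re-processing, while ETL avoids storing data that fails validation.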

    Effort: estimated 17 hours

    Language: Spanish and English

    Link: Coursera (Spanish)

    Link: Coursera (English)

    4) Building Resilient Streaming Analytics Systems on GCP

    Processing streaming data is becoming increasingly popular as streaming enables businesses to get real-time metrics on business operations. This course covers how to build streaming data pipelines on Google Cloud. Pub/Sub is described for handling incoming streaming data. The course also covers how to apply aggregations and transformations to streaming data using Dataflow, and how to store processed records to BigQuery or Cloud Bigtable for analysis. Learners will get hands-on experience building streaming data pipeline components on Google Cloud.

    What you will learn:
    • Understand use-cases for real-time streaming analytics.
    • Use Pub/Sub asynchronous messaging service to manage data events.
    • Write streaming pipelines and run transformations where necessary.
    • Interoperate Dataflow, BigQuery and Pub/Sub for real-time streaming and analysis.
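The aggregations a Dataflow streaming pipeline applies are usually windowed. As a minimal sketch of a tumbling (fixed) window sum over timestamped events, assuming made-up event data (this illustrates the windowing concept, not the Dataflow or Pub/Sub APIs):

```python
from collections import defaultdict

# Hypothetical events: (timestamp_seconds, value), as if read from Pub/Sub.
events = [(1, 10), (4, 20), (61, 5), (62, 5), (125, 7)]

def tumbling_window_sums(events, window_seconds=60):
    """Sum values per fixed, non-overlapping time window."""
    sums = defaultdict(int)
    for ts, value in events:
        window_start = (ts // window_seconds) * window_seconds
        sums[window_start] += value
    return dict(sums)

print(tumbling_window_sums(events))  # {0: 30, 60: 10, 120: 7}
```

In a real pipeline the windowed results would then be written to BigQuery or Cloud Bigtable for analysis, as the course description notes.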

    Effort: estimated 8 hours

    Language: Spanish and English

    Link: Coursera (Spanish)

    Link: Coursera (English)

    5) Smart Analytics, Machine Learning, and AI on GCP

    Incorporating machine learning into data pipelines increases the ability of businesses to extract insights from their data. This course covers several ways machine learning can be included in data pipelines on Google Cloud depending on the level of customization required. For little to no customization, this course covers AutoML. For more tailored machine learning capabilities, this course introduces Notebooks and BigQuery machine learning (BigQuery ML). Also, this course covers how to productionalize machine learning solutions using Kubeflow. Learners will get hands-on experience building machine learning models on Google Cloud.

    What you will learn:
    • Identify options for incorporating machine learning into data pipelines on Google Cloud, depending on the level of customization required.
    • Use AutoML for use cases that require little to no customization.
    • Build more tailored models with Notebooks and BigQuery machine learning (BigQuery ML).
    • Productionalize machine learning solutions using Kubeflow.
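The course description mentions BigQuery ML, which trains models (such as linear regression) directly from SQL. As a hedged sketch of the kind of model its `linear_reg` option fits, here is simple linear regression in closed form on toy data (an illustration of the underlying math, not BigQuery ML's interface):

```python
# Toy training data: exactly y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Ordinary least squares for a single feature: slope and intercept.
slope = (
    sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    / sum((x - mean_x) ** 2 for x in xs)
)
intercept = mean_y - slope * mean_x

print(slope, intercept)  # 2.0 1.0
```

BigQuery ML extends the same idea to many features and rows without leaving the warehouse, which is why the course positions it between AutoML and fully custom Notebook workflows.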

    Effort: estimated 9 hours

    Language: Spanish and English

    Link: Coursera (Spanish)

    Link: Coursera (English)

    © 2015 | Laboratorio de Microtecnología y Sistemas Embebidos | Centro de Investigación en Computación | Instituto Politécnico Nacional