DP-100: Designing and Implementing a Data Science Solution

Skills at a glance
- Design and prepare a machine learning solution (20–25%)
- Explore data, and run experiments (20–25%)
- Train and deploy models (25–30%)
- Optimize language models for AI applications (25–30%)
Design and prepare a machine learning solution (20–25%)
Design a machine learning solution
- Identify the structure and format for datasets
- Determine the compute specifications for a machine learning workload
- Select the development approach to train a model
Create and manage resources in an Azure Machine Learning workspace
- Create and manage a workspace
- Create and manage datastores
- Create and manage compute targets
- Set up Git integration for source control
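As a minimal sketch of working with these resources from code, the following uses the Azure ML Python SDK v2 (azure-ai-ml) to connect to a workspace and provision a compute cluster; the subscription, resource group, workspace, and cluster names are placeholders.

```python
from azure.ai.ml import MLClient
from azure.ai.ml.entities import AmlCompute
from azure.identity import DefaultAzureCredential

# Connect to an existing workspace; the identifiers are placeholders.
ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace-name>",
)

# Provision a CPU cluster as a compute target; min_instances=0 lets it scale to zero.
cluster = AmlCompute(
    name="cpu-cluster",
    size="Standard_DS3_v2",
    min_instances=0,
    max_instances=4,
    idle_time_before_scale_down=120,
)
ml_client.compute.begin_create_or_update(cluster).result()
```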
Create and manage assets in an Azure Machine Learning workspace
- Create and manage data assets
- Create and manage environments
- Share assets across workspaces by using registries
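A sketch of registering a versioned data asset and a custom environment with the same SDK; the datastore path, asset names, and conda file are illustrative, and ml_client is the client constructed in the previous sketch.

```python
from azure.ai.ml.entities import Data, Environment
from azure.ai.ml.constants import AssetTypes

# ml_client as constructed in the workspace sketch above.
data_asset = Data(
    name="diabetes-data",
    version="1",
    type=AssetTypes.URI_FILE,
    path="azureml://datastores/workspaceblobstore/paths/data/diabetes.csv",
    description="Training data registered as a versioned, shareable asset",
)
ml_client.data.create_or_update(data_asset)

env = Environment(
    name="sklearn-env",
    image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04",
    conda_file="./environment/conda.yml",
    description="Curated base image plus a project conda specification",
)
ml_client.environments.create_or_update(env)
```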
Explore data, and run experiments (20–25%)
Use automated machine learning to explore optimal models
- Use automated machine learning for tabular data
- Use automated machine learning for computer vision
- Use automated machine learning for natural language processing
- Select and understand training options, including preprocessing and algorithms
- Evaluate an automated machine learning run, including responsible AI guidelines
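To make the tabular case concrete, here is a sketch of submitting an AutoML classification job with the SDK v2; the MLTable asset, target column, metric, and limits are assumptions for illustration.

```python
from azure.ai.ml import automl, Input
from azure.ai.ml.constants import AssetTypes

# Configure an AutoML classification job over a registered MLTable data asset.
classification_job = automl.classification(
    compute="cpu-cluster",
    experiment_name="automl-diabetes",
    training_data=Input(type=AssetTypes.MLTABLE, path="azureml:diabetes-mltable:1"),
    target_column_name="Diabetic",
    primary_metric="AUC_weighted",
    n_cross_validations=5,
)
classification_job.set_limits(timeout_minutes=60, max_trials=20)
classification_job.set_featurization(mode="auto")  # preprocessing is on by default

returned_job = ml_client.jobs.create_or_update(classification_job)
```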
Use notebooks for custom model training
- Use the terminal to configure a compute instance
- Access and wrangle data in notebooks
- Wrangle data interactively with attached Synapse Spark pools and serverless Spark compute
- Retrieve features from a feature store to train a model
- Track model training by using MLflow
- Evaluate a model, including responsible AI guidelines
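The tracking item is easiest to see in code. A sketch of training a scikit-learn model in a notebook and logging it with MLflow; the CSV file and column names are illustrative, and on an Azure ML compute instance the tracking URI already points at the workspace.

```python
import mlflow
import mlflow.sklearn
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

mlflow.set_experiment("notebook-training")

df = pd.read_csv("diabetes.csv")  # illustrative dataset
X, y = df.drop(columns="Diabetic"), df["Diabetic"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

with mlflow.start_run():
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(X_test))
    mlflow.log_metric("accuracy", accuracy)
    mlflow.sklearn.log_model(model, artifact_path="model")
```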
Automate hyperparameter tuning
- Select a sampling method
- Define the search space
- Define the primary metric
- Define early termination options
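All four items come together in a sweep job. A sketch, assuming a train.py that accepts these arguments and logs "accuracy" via MLflow:

```python
from azure.ai.ml import command
from azure.ai.ml.sweep import Choice, MedianStoppingPolicy, Uniform

# Base command job; hyperparameters are exposed as inputs.
job = command(
    code="./src",
    command="python train.py --learning_rate ${{inputs.learning_rate}} --batch_size ${{inputs.batch_size}}",
    inputs={"learning_rate": 0.01, "batch_size": 32},
    environment="azureml:sklearn-env:1",
    compute="cpu-cluster",
)

# Search space: swap the fixed inputs for distributions.
job_for_sweep = job(
    learning_rate=Uniform(min_value=0.001, max_value=0.1),
    batch_size=Choice(values=[16, 32, 64]),
)

# Sampling method, primary metric, limits, and early termination.
sweep_job = job_for_sweep.sweep(
    sampling_algorithm="random",
    primary_metric="accuracy",
    goal="Maximize",
)
sweep_job.set_limits(max_total_trials=20, max_concurrent_trials=4)
sweep_job.early_termination = MedianStoppingPolicy(evaluation_interval=1, delay_evaluation=5)

ml_client.jobs.create_or_update(sweep_job)
```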
Train and deploy models (25–30%)
Run model training scripts
- Consume data in a job
- Configure compute for a job run
- Configure an environment for a job run
- Track model training with MLflow in a job run
- Define parameters for a job
- Run a script as a job
- Use logs to troubleshoot job run errors
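A sketch of a command job that ties these items together, reusing the illustrative data asset and environment from earlier; the script arguments are assumptions about train.py.

```python
from azure.ai.ml import Input, command
from azure.ai.ml.constants import AssetTypes

job = command(
    code="./src",
    command="python train.py --training_data ${{inputs.training_data}} --reg_rate ${{inputs.reg_rate}}",
    inputs={
        "training_data": Input(type=AssetTypes.URI_FILE, path="azureml:diabetes-data:1"),
        "reg_rate": 0.01,
    },
    environment="azureml:sklearn-env:1",
    compute="cpu-cluster",
    experiment_name="train-diabetes",
    display_name="sklearn-train",
)
returned_job = ml_client.jobs.create_or_update(job)

# Stream the driver log to troubleshoot failures as they happen.
ml_client.jobs.stream(returned_job.name)
```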
Implement training pipelines
- Create custom components
- Create a pipeline
- Pass data between steps in a pipeline
- Run and schedule a pipeline
- Monitor and troubleshoot pipeline runs
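A sketch of a two-step pipeline built from YAML-defined components; the component files and their input/output names (input_data, cleaned_data, training_data, model_output) are assumptions about those YAML definitions.

```python
from azure.ai.ml import Input, load_component
from azure.ai.ml.constants import AssetTypes
from azure.ai.ml.dsl import pipeline

# Custom components defined in YAML; file names are illustrative.
prep = load_component(source="./components/prep.yml")
train = load_component(source="./components/train.yml")

@pipeline(default_compute="cpu-cluster")
def diabetes_pipeline(raw_data):
    prep_step = prep(input_data=raw_data)
    # Data passes between steps by wiring one step's outputs to the next step's inputs.
    train_step = train(training_data=prep_step.outputs.cleaned_data)
    return {"trained_model": train_step.outputs.model_output}

pipeline_job = diabetes_pipeline(
    raw_data=Input(type=AssetTypes.URI_FILE, path="azureml:diabetes-data:1")
)
ml_client.jobs.create_or_update(pipeline_job, experiment_name="diabetes-pipeline")
```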
Manage models
- Define the signature in the MLmodel file
- Package a feature retrieval specification with the model artifact
- Register an MLflow model
- Assess a model by using responsible AI principles
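A sketch of registering a completed job's output as an MLflow model; the job name is a placeholder. The MLmodel file inside the artifact carries the model's signature.

```python
from azure.ai.ml.entities import Model
from azure.ai.ml.constants import AssetTypes

model = Model(
    name="diabetes-classifier",
    type=AssetTypes.MLFLOW_MODEL,
    path="azureml://jobs/<job-name>/outputs/artifacts/paths/model/",
    description="MLflow-format model; the MLmodel file defines its signature",
)
registered_model = ml_client.models.create_or_update(model)
```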
Deploy a model
- Configure settings for online deployment
- Deploy a model to an online endpoint
- Test an online deployed service
- Configure compute for a batch deployment
- Deploy a model to a batch endpoint
- Invoke the batch endpoint to start a batch scoring job
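A sketch of the online path with a managed endpoint; the endpoint name and sample request file are illustrative. The batch path is analogous, using BatchEndpoint and BatchDeployment entities and invoking through ml_client.batch_endpoints.invoke to start a scoring job.

```python
from azure.ai.ml.entities import ManagedOnlineDeployment, ManagedOnlineEndpoint

endpoint = ManagedOnlineEndpoint(name="diabetes-endpoint", auth_mode="key")
ml_client.begin_create_or_update(endpoint).result()

# MLflow-format models can be deployed without a scoring script or environment.
deployment = ManagedOnlineDeployment(
    name="blue",
    endpoint_name="diabetes-endpoint",
    model="azureml:diabetes-classifier:1",
    instance_type="Standard_DS3_v2",
    instance_count=1,
)
ml_client.begin_create_or_update(deployment).result()

# Route traffic to the deployment, then test it with a sample request.
endpoint.traffic = {"blue": 100}
ml_client.begin_create_or_update(endpoint).result()
response = ml_client.online_endpoints.invoke(
    endpoint_name="diabetes-endpoint",
    deployment_name="blue",
    request_file="./sample-request.json",
)
print(response)
```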
Optimize language models for AI applications (25–30%)
Prepare for model optimization
- Select and deploy a language model from the model catalog
- Compare language models using benchmarks
- Test a deployed language model in the playground
- Select an optimization approach
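Beyond the playground, a deployed language model can be tested from code. A sketch using the azure-ai-inference client against a serverless deployment; the endpoint URL and key are placeholders.

```python
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

client = ChatCompletionsClient(
    endpoint="https://<deployment>.<region>.models.ai.azure.com",  # placeholder
    credential=AzureKeyCredential("<api-key>"),
)
response = client.complete(
    messages=[
        SystemMessage(content="You are a concise assistant."),
        UserMessage(content="Explain retrieval augmented generation in one sentence."),
    ],
    temperature=0.2,
    max_tokens=200,
)
print(response.choices[0].message.content)
```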
Optimize through prompt engineering and prompt flow
- Test prompts with manual evaluation
- Define and track prompt variants
- Create prompt templates
- Define chaining logic with the prompt flow SDK
- Use tracing to evaluate your flow
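A sketch of templating and tracing with the prompt flow tracing package (promptflow-tracing); the model call is stubbed and the function names are illustrative.

```python
from promptflow.tracing import start_trace, trace

@trace
def build_prompt(question: str) -> str:
    # A simple prompt template; variants would swap this preamble.
    return f"You are a helpful tutor. Answer briefly.\nQuestion: {question}\nAnswer:"

@trace
def run_flow(question: str) -> str:
    prompt = build_prompt(question)
    # A real flow would call a deployed model here; stubbed for the sketch.
    return f"(model response to: {prompt[:40]}...)"

start_trace()  # traced spans can then be inspected in the trace viewer
print(run_flow("What does an early termination policy do?"))
```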
Optimize through Retrieval Augmented Generation (RAG)
- Prepare data for RAG, including cleaning, chunking, and embedding
- Configure a vector store
- Configure an Azure AI Search-based index store
- Evaluate your RAG solution
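A sketch of the chunking and embedding step, using a naive fixed-size chunker and Azure OpenAI embeddings; the endpoint, key, embedding deployment name, and source file are placeholders.

```python
from openai import AzureOpenAI

def chunk(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    # Naive fixed-size chunking with overlap; production pipelines often split on structure.
    step = size - overlap
    return [text[i : i + size] for i in range(0, len(text), step)]

client = AzureOpenAI(
    azure_endpoint="https://<resource>.openai.azure.com",  # placeholder
    api_key="<api-key>",
    api_version="2024-02-01",
)

document = open("handbook.txt", encoding="utf-8").read()  # illustrative source
chunks = chunk(document)
result = client.embeddings.create(model="<embedding-deployment>", input=chunks)
vectors = [item.embedding for item in result.data]
# Next: upsert (chunk, vector) pairs into the vector store, e.g. an Azure AI Search index.
```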
Optimize through fine-tuning
- Prepare data for fine-tuning
- Select an appropriate base model
- Run a fine-tuning job
- Evaluate your fine-tuned model
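A sketch of preparing chat-format JSONL data and submitting an Azure OpenAI fine-tuning job through the openai SDK; the resource, key, and base model are placeholders, and a real training set needs far more examples than this.

```python
import json
from openai import AzureOpenAI

# Chat fine-tuning data is JSONL: one conversation per line.
examples = [
    {"messages": [
        {"role": "system", "content": "You answer Azure ML questions tersely."},
        {"role": "user", "content": "What is a compute target?"},
        {"role": "assistant", "content": "A designated resource where training or inference runs."},
    ]},
]
with open("train.jsonl", "w", encoding="utf-8") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")

client = AzureOpenAI(
    azure_endpoint="https://<resource>.openai.azure.com",  # placeholder
    api_key="<api-key>",
    api_version="2024-02-01",
)
uploaded = client.files.create(file=open("train.jsonl", "rb"), purpose="fine-tune")
job = client.fine_tuning.jobs.create(
    training_file=uploaded.id,
    model="gpt-35-turbo-0613",  # placeholder base model that supports fine-tuning
)
print(job.id, job.status)
```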