bitnamicharts/deepspeed

Verified Publisher

By VMware

Updated 9 months ago

Bitnami Helm chart for DeepSpeed


⁠Bitnami Secure Images Helm chart for DeepSpeed

DeepSpeed is a deep learning software suite for empowering ChatGPT-like model training. It features dense or sparse model inference, high throughput, and high compression.

Overview of DeepSpeed⁠

Trademarks: This software listing is packaged by Bitnami. The respective trademarks mentioned in the offering are owned by the respective companies, and use of them does not imply any affiliation or endorsement.

⁠ TL;DR

helm install my-release oci://REGISTRY_NAME/REPOSITORY_NAME/deepspeed

Note: You need to substitute the placeholders REGISTRY_NAME and REPOSITORY_NAME with a reference to your Helm chart registry and repository.

⁠ Introduction

This chart bootstraps a DeepSpeed⁠ deployment on a Kubernetes⁠ cluster using the Helm⁠ package manager.

DeepSpeed is built for full integration into Python, which enables you to use it with Python's libraries and main packages.

⁠ Before you begin

Kubernetes 1.23+
Helm 3.8.0+
PV provisioner support in the underlying infrastructure

⁠ Installing the Chart

To install the chart with the release name my-release:

helm install my-release oci://REGISTRY_NAME/REPOSITORY_NAME/deepspeed

Note: You need to substitute the placeholders REGISTRY_NAME and REPOSITORY_NAME with a reference to your Helm chart registry and repository. For example, in the case of Bitnami, you need to use REGISTRY_NAME=registry-1.docker.io and REPOSITORY_NAME=bitnamicharts.

This command deploys DeepSpeed on the Kubernetes cluster in the default configuration. The Parameters section lists the parameters that can be configured.

Tip: List all releases using helm list

⁠ Uninstalling the Chart

To uninstall/delete the my-release deployment:

helm delete my-release

The command removes all the Kubernetes components associated with the chart and deletes the release.

⁠ Configuration and installation details

This section describes credentials, configuration, and other installation options.

⁠ Resource requests and limits

Bitnami charts allow setting resource requests and limits for all containers inside the chart deployment. These are inside the resources value (check parameter table). Setting requests is essential for production workloads and these should be adapted to your specific use case.

To make this process easier, the chart contains the resourcesPreset value, which automatically sets the resources section according to different presets. Check these presets in the bitnami/common chart. However, using resourcesPreset in production workloads is discouraged, as it may not fully adapt to your specific needs. Find more information on container resource management in the official Kubernetes documentation.
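As a rough illustration of explicit requests and limits, the values fragment below sets them for both components. It assumes the resources value sits under the client and worker sections, in the same way as extraEnvVars; check the parameter table for the exact location. The CPU and memory figures are placeholders, not recommendations.

```yaml
# Hypothetical values.yaml fragment; sizes are illustrative only.
client:
  resources:
    requests:
      cpu: 500m
      memory: 1Gi
    limits:
      cpu: "1"
      memory: 2Gi

worker:
  resources:
    requests:
      cpu: "2"
      memory: 8Gi
    limits:
      cpu: "4"
      memory: 16Gi
```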

⁠ Rolling VS Immutable tags⁠

It is strongly recommended to use immutable tags in a production environment. This ensures your deployment does not change automatically if the same tag is updated with a different image.

Bitnami will release a new chart updating its containers if a new version of the main container, significant changes, or critical vulnerabilities exist.
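One way to pin an image immutably is by digest rather than by tag. The sketch below uses the image.digest parameter from the parameter table, which overrides the tag when set. The digest shown is a placeholder and must be replaced with a real value.

```yaml
image:
  # Placeholder: substitute the digest of the image you have validated.
  digest: "sha256:<digest-of-validated-image>"
```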

⁠ Deploy as Job

By default, the chart deploys the client container (the one that connects to the DeepSpeed workers) as a Deployment. This allows you to enter the container via kubectl exec and perform operations. To deploy it as a Kubernetes Job instead, set the client.useJob=true value.
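Expressed as a values file, the Job mode could look like the fragment below. Both keys come from the Client Deployment Parameters table; the backoff limit shown is the chart default.

```yaml
client:
  # Run the client as a Kubernetes Job instead of a Deployment.
  useJob: true
  # Number of retries before the Job is marked as failed (chart default).
  backoffLimit: 10
```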

⁠ Loading your files

The DeepSpeed chart supports three different ways to load your files. In order of priority, they are:

Using an existing config map
Adding files in the values.yaml file
Cloning a git repository

This means that if you specify a config map with your files, the chart ignores both the files defined in values.yaml and the git repository.

In order to use an existing config map, set the source.existingConfigMap=my-config-map parameter.

To add your files in the values.yaml file, set the source.configMap object with the files.
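A minimal sketch of inline files follows. It assumes that source.configMap maps file names to file contents; the file name train.py, its contents, and the launch command are hypothetical and only illustrate the shape of the values.

```yaml
source:
  type: configmap
  # Hypothetical command; adjust it to your project's entry point.
  launchCommand: "deepspeed train.py"
  configMap:
    # Assumed layout: one key per file, with the file body as the value.
    train.py: |
      import deepspeed
      print("DeepSpeed version:", deepspeed.__version__)
```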

Finally, if you want to clone a git repository, use these parameters:

source.type=git
source.git.repository=https://github.com/my-user/my-repo
source.git.revision=master
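The same git settings can be kept in a values file instead of being passed as individual parameters. In this sketch the repository URL is a hypothetical placeholder.

```yaml
source:
  type: git
  git:
    # Hypothetical repository; replace with the one that holds your project.
    repository: https://github.com/my-user/my-repo
    # Branch, tag, or commit to check out.
    revision: master
```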

⁠ Setting Pod's affinity

This chart allows you to set your custom affinity using the affinity parameter. Find more information about Pod affinity in the Kubernetes documentation.

As an alternative, you can use one of the preset configurations for pod affinity, pod anti-affinity, and node affinity available in the bitnami/common chart. To do so, set the podAffinityPreset, podAntiAffinityPreset, or nodeAffinityPreset parameters.
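As an example of the presets, the fragment below spreads worker pods across nodes where possible and restricts them to nodes of a given architecture. It assumes the preset parameters live under the worker section (the same may apply to client) and that nodeAffinityPreset takes type, key, and values fields as in other Bitnami charts; the node label and value are illustrative.

```yaml
worker:
  # "soft" prefers, but does not require, scheduling pods on different nodes.
  podAntiAffinityPreset: soft
  nodeAffinityPreset:
    # "hard" makes the node label match mandatory.
    type: hard
    key: "kubernetes.io/arch"
    values:
      - amd64
```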

⁠ Additional environment variables

In case you want to add extra environment variables (useful for advanced operations like custom init scripts), you can use the extraEnvVars property inside each of the subsections: client, worker.

client:
  extraEnvVars:
    - name: LOG_LEVEL
      value: error

worker:
  extraEnvVars:
    - name: LOG_LEVEL
      value: error

Alternatively, you can use a ConfigMap or a Secret with the environment variables. To do so, use the extraEnvVarsCM or the extraEnvVarsSecret values.
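For instance, the client can read its environment from objects that already exist in the namespace. The ConfigMap and Secret names below are hypothetical; every key in them is exposed to the container as an environment variable.

```yaml
client:
  # Hypothetical names of pre-existing objects in the release namespace.
  extraEnvVarsCM: deepspeed-client-env
  extraEnvVarsSecret: deepspeed-client-secrets
```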

⁠ Sidecars

If additional containers are needed in the same pod as DeepSpeed (such as additional metrics or logging exporters), they can be defined using the sidecars parameter.

sidecars:
- name: your-image-name
  image: your-image
  imagePullPolicy: Always
  ports:
  - name: portname
    containerPort: 1234

If these sidecars export extra ports, extra port definitions can be added using the service.extraPorts parameter (where available), as shown in the example below:

service:
  extraPorts:
  - name: extraPort
    port: 11311
    targetPort: 11311

NOTE: This Helm chart already includes sidecar containers for the Prometheus exporters (where applicable). These can be activated by adding the --enable-metrics=true parameter at deployment time. The sidecars parameter should therefore only be used for any extra sidecar containers.

If additional init containers are needed in the same pod, they can be defined using the initContainers parameter. Here is an example:

initContainers:
  - name: your-image-name
    image: your-image
    imagePullPolicy: Always
    ports:
      - name: portname
        containerPort: 1234

Learn more about sidecar containers⁠ and init containers⁠.

⁠ Backup and restore

To back up and restore Helm chart deployments on Kubernetes, you need to back up the persistent volumes from the source deployment and attach them to a new deployment using Velero⁠, a Kubernetes backup/restore tool. Find the instructions for using Velero in this guide⁠.

⁠ Persistence

The Bitnami DeepSpeed⁠ image can persist data. If enabled, the persisted path is /bitnami/deepspeed/data by default.

The chart mounts a Persistent Volume⁠ at this location. The volume is created using dynamic volume provisioning.

⁠ Adjust permissions of persistent volume mountpoint

As the image runs as non-root by default, it is necessary to adjust the ownership of the persistent volume so that the container can write data into it.

By default, the chart is configured to use Kubernetes Security Context to automatically change the ownership of the volume. However, this feature does not work in all Kubernetes distributions. As an alternative, this chart supports using an initContainer to change the ownership of the volume before mounting it in the final destination.

You can enable this initContainer by setting volumePermissions.enabled to true.
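In a values file, this setting is a short fragment:

```yaml
volumePermissions:
  # Run an init container that changes the ownership of the volume mountpoint.
  enabled: true
```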

⁠ Parameters

The following subsections list global, common, and component-specific parameters.

⁠ Global parameters

Name	Description	Value
`global.imageRegistry`	Global Docker image registry	`""`
`global.imagePullSecrets`	Global Docker registry secret names as an array	`[]`
`global.defaultStorageClass`	Global default StorageClass for Persistent Volume(s)	`""`
`global.storageClass`	DEPRECATED: use global.defaultStorageClass instead	`""`
`global.security.allowInsecureImages`	Allows skipping image verification	`false`
`global.compatibility.openshift.adaptSecurityContext`	Adapt the securityContext sections of the deployment to make them compatible with Openshift restricted-v2 SCC: remove runAsUser, runAsGroup and fsGroup and let the platform use their allowed default IDs. Possible values: auto (apply if the detected running cluster is Openshift), force (perform the adaptation always), disabled (do not perform adaptation)	`auto`

⁠ Common parameters

Name	Description	Value
`kubeVersion`	Override Kubernetes version	`""`
`nameOverride`	String to partially override common.names.fullname	`""`
`fullnameOverride`	String to fully override common.names.fullname	`""`
`commonLabels`	Labels to add to all deployed objects	`{}`
`commonAnnotations`	Annotations to add to all deployed objects	`{}`
`clusterDomain`	Kubernetes cluster domain name	`cluster.local`
`extraDeploy`	Array of extra objects to deploy with the release	`[]`
`diagnosticMode.enabled`	Enable diagnostic mode (all probes will be disabled and the command will be overridden)	`false`
`diagnosticMode.command`	Command to override all containers in the deployments/statefulsets	`["sleep"]`
`diagnosticMode.args`	Args to override all containers in the deployments/statefulsets	`["infinity"]`
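As a sketch of how the diagnostic mode parameters above combine, the following fragment disables all probes and keeps the containers idle so they can be inspected with kubectl exec. The command and args shown are the defaults from the table.

```yaml
diagnosticMode:
  enabled: true
  # Replaces the container command so the pod stays up without running DeepSpeed.
  command:
    - sleep
  args:
    - infinity
```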

⁠ Source code parameters

Name	Description	Value
`image.registry`	DeepSpeed image registry	`REGISTRY_NAME`
`image.repository`	DeepSpeed image repository	`REPOSITORY_NAME/deepspeed`
`image.digest`	DeepSpeed image digest in the form sha256:aa.... Note that this parameter, if set, overrides the tag	`""`
`image.pullPolicy`	DeepSpeed image pull policy	`IfNotPresent`
`image.pullSecrets`	Specify docker-registry secret names as an array	`[]`
`source.type`	Where the source comes from. Possible values: configmap, git, custom	`configmap`
`source.launchCommand`	deepspeed command to run over the project	`""`
`source.configMap`	List of files of the project	`{}`
`source.existingConfigMap`	Name of a configmap containing the files of the project	`""`
`source.git.repository`	Repository that holds the files	`""`
`source.git.revision`	Revision from the repository to checkout	`""`
`source.git.extraVolumeMounts`	Add extra volume mounts for the Git container	`[]`
`config.defaultHostFile`	Host file generated by default (only edit if you know what you are doing)	`""`
`config.overrideHostFile`	Override default host file with the content in this value	`""`
`config.existingHostFileConfigMap`	Name of a ConfigMap containing the hostfile	`""`
`config.defaultSSHClient`	Default SSH client configuration for the client node (only edit if you know what you are doing)	`""`
`config.overrideSSHClient`	Override default SSH client configuration with the content in this value	`""`
`config.existingSSHClientConfigMap`	Name of a ConfigMap containing the SSH client configuration	`""`
`config.defaultSSHServer`	Default SSH Server configuration for the worker nodes (only edit if you know what you are doing)	`""`
`config.overrideSSHServer`	Override SSH Server configuration with the content in this value	`""`
`config.existingSSHServerConfigMap`	Name of a ConfigMap with the SSH Server configuration	`""`
`config.sshPrivateKey`	Private key for the client node to connect to the worker nodes	`""`
`config.existingSSHKeySecret`	Name of a secret containing the ssh private key	`""`
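To illustrate the config parameters, the sketch below overrides the generated host file and points the client at an existing SSH key secret. The host names and the secret name are hypothetical; the lines follow DeepSpeed's hostfile format of a host name followed by slots=N, where N is the number of accelerator slots on that host. Overriding the host file is only needed when the default does not fit your setup.

```yaml
config:
  # Hypothetical worker host names; they must match the actual worker pods.
  overrideHostFile: |
    deepspeed-worker-0 slots=1
    deepspeed-worker-1 slots=1
  # Hypothetical name of a pre-created Secret holding the SSH private key.
  existingSSHKeySecret: deepspeed-ssh-key
```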

⁠ Client Deployment Parameters

Name	Description	Value
`client.enabled`	Enable Client deployment	`true`
`client.useJob`	Deploy as job	`false`
`client.backoffLimit`	Set the backoff limit of the Job	`10`
`client.extraEnvVars`	Array with extra environment variables to add to client nodes	`[]`
`client.extraEnvVarsCM`	Name of existing ConfigMap containing extra env vars for client nodes	`""`
`client.extraEnvVarsSecret`	Name of existing Secret containing extra env vars for client nodes	`""`
`client.annotations`	Annotations for the client deployment	`{}`
`client.command`	Override default container command (useful when using custom images)	`[]`
`client.args`	Override default container args (useful when using custom images)	`[]`
`client.terminationGracePeriodSeconds`	Client termination grace period (in seconds)	`""`
`client.livenessProbe.enabled`	Enable livenessProbe on Client nodes	`true`
`client.livenessProbe.initialDelaySeconds`	Initial delay seconds for livenessProbe	`5`
`client.livenessProbe.periodSeconds`	Period seconds for livenessProbe	`30`
`client.livenessProbe.timeoutSeconds`	Timeout seconds for livenessProbe	`60`
`client.livenessProbe.failureThreshold`	Failure threshold for livenessProbe	`5`
`client.livenessProbe.successThreshold`	Success threshold for livenessProbe