Synthetic Data Generation using Isaac Sim

A simulation pipeline built in NVIDIA Isaac Sim that generates a synthetic warehouse dataset with domain randomization and trains an object detection model using the NVIDIA TAO Toolkit.

In this project, I implemented an end-to-end synthetic data generation and training pipeline using NVIDIA Isaac Sim and the TAO Toolkit. The goal was to train a high-performance pallet jack detection model using entirely synthetic data.

I began by cloning the official NVIDIA synthetic_data_generation_training_workflow repository and thoroughly analyzing its scripts and domain randomization logic.

Warehouse Environment Setup

The core Python script, standalone_palletjack_sdg.py, initializes a simple warehouse environment using the Isaac Sim asset at:

ENV_URL = "/Isaac/Environments/Simple_Warehouse/warehouse.usd"

Using the Isaac Asset Server, I loaded the environment and then added assets from the SimReady Pallet Jack Library.
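
Inside Isaac Sim, the assets root is typically resolved at runtime (for example via get_assets_root_path() from omni.isaac.core.utils.nucleus) and joined with relative asset paths like ENV_URL. Since those calls only run inside the simulator, here is a plain-Python sketch of the URL composition; the Nucleus root below is a placeholder, not a value from the original script:

```python
# Sketch of how asset URLs are composed before loading them onto the stage.
# ASSETS_ROOT is a hypothetical placeholder; inside Isaac Sim it would come
# from the asset server (e.g. omni.isaac.core.utils.nucleus.get_assets_root_path()).
ASSETS_ROOT = "omniverse://localhost/NVIDIA/Assets/Isaac"

ENV_URL = "/Isaac/Environments/Simple_Warehouse/warehouse.usd"

def full_asset_url(root: str, relative: str) -> str:
    """Join the assets root with a relative asset path, normalizing slashes."""
    return root.rstrip("/") + "/" + relative.lstrip("/")

warehouse_url = full_asset_url(ASSETS_ROOT, ENV_URL)
```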

Adding Pallet Jacks and Camera

I loaded multiple pallet jack USDs (e.g., Scale_A, Heavy_Duty_A) into the warehouse. A camera was positioned in the scene using the Replicator API to support randomized viewpoint generation for data diversity.
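
The actual camera placement runs through Replicator calls inside Isaac Sim (e.g. rep.create.camera and a pose modification with a look_at target). The underlying viewpoint sampling can be sketched in plain Python; all ranges below are illustrative, not the original script's values:

```python
import math
import random

def sample_camera_pose(seed=None):
    """Sample a camera position in a box above the floor and aim it at a
    jittered look-at point near the scene center (ranges are illustrative)."""
    rng = random.Random(seed)
    position = (rng.uniform(-5.0, 5.0), rng.uniform(-5.0, 5.0), rng.uniform(1.0, 3.0))
    look_at = (rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5), rng.uniform(0.0, 0.5))
    # Unit vector from the camera toward the look-at point: the direction
    # the camera must face after the pose is applied.
    d = tuple(t - p for p, t in zip(position, look_at))
    norm = math.sqrt(sum(c * c for c in d))
    forward = tuple(c / norm for c in d)
    return position, look_at, forward

position, look_at, forward = sample_camera_pose(seed=0)
```

Resampling this pose every frame is what gives the dataset its viewpoint diversity.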

Applying Domain Randomization

To make the dataset robust, I applied a variety of domain randomization techniques:

  • Randomized camera positions and look-at points.
  • Color jittering of pallet jack components like SteerAxles.
  • Pose variation of pallet jacks (position, rotation, scale).
  • Dynamic lighting conditions using randomized color, intensity, and visibility of light sources.
  • Injected distractors like cones, barrels, and wet floor signs using custom pose variation.
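
In the script, each of these randomizers draws from Replicator distributions (such as rep.distribution.uniform) once per frame. Conceptually, one frame's worth of randomized parameters looks like the sketch below; every range here is an illustrative placeholder, not a value from the original script:

```python
import random

def sample_frame_params(rng: random.Random) -> dict:
    """Draw one frame's worth of domain-randomization parameters.
    Ranges are illustrative placeholders, not the original script's values."""
    return {
        # Pose variation of the pallet jack: position, z-rotation, uniform scale.
        "palletjack_pos": (rng.uniform(-4, 4), rng.uniform(-4, 4), 0.0),
        "palletjack_rot_z_deg": rng.uniform(0, 360),
        "palletjack_scale": rng.uniform(0.8, 1.2),
        # Color jitter for a component such as the SteerAxle (RGB in [0, 1]).
        "steer_axle_color": (rng.random(), rng.random(), rng.random()),
        # Dynamic lighting: intensity, color, and on/off visibility.
        "light_intensity": rng.uniform(500, 5000),
        "light_color": (rng.random(), rng.random(), rng.random()),
        "light_visible": rng.random() > 0.2,
        # Distractor pose (cones, barrels, wet floor signs).
        "distractor_pos": (rng.uniform(-6, 6), rng.uniform(-6, 6), 0.0),
    }

params = sample_frame_params(random.Random(42))
```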

Annotation and Data Export

I used rep.WriterRegistry.get("KittiWriter") to annotate the dataset in KITTI format, suitable for object detection tasks. Images and annotations were organized under three categories:

  • distractors_warehouse: Warehouse props (cones, bins, etc.)
  • distractors_additional: Non-warehouse props (bags, furniture, wheelchairs)
  • no_distractors: Clean scenes containing only pallet jacks

Each output contained RGB images and matching KITTI-style annotations.
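
A KITTI detection label file holds one 15-field line per object. For 2D detection data like this, only the class name and pixel bounding box carry information; the truncation, occlusion, alpha, and 3D fields are conventionally written as zeros. A minimal sketch of formatting one such line (the bbox values are made up for illustration):

```python
def kitti_label_line(cls: str, bbox: tuple) -> str:
    """Format one object as a KITTI detection label line (15 fields).
    For 2D-only data, truncation/occlusion/alpha and the seven 3D fields
    (dimensions, location, rotation_y) are written as zeros."""
    left, top, right, bottom = bbox
    fields = [cls, "0.0", "0", "0.0",
              f"{left:.2f}", f"{top:.2f}", f"{right:.2f}", f"{bottom:.2f}"]
    fields += ["0.0"] * 7  # 3D dimensions, location, and rotation_y
    return " ".join(fields)

line = kitti_label_line("palletjack", (120.0, 200.0, 480.0, 430.0))
```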

Dataset Generation

I modified the generate_data.sh script to generate the following:

  • 2000 images with warehouse distractors
  • 2000 images with additional distractors
  • 1000 clean images without any distractors

The image resolution was set to 960x544, and output data was saved to:

synthetic_data_generation_training_workflow/palletjack_sdg/palletjack_data/
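
The three runs from the modified generate_data.sh can be sketched as a small driver that builds one command line per dataset split. Note that the flag names below (--num_frames, --distractors, --width, --height, --data_dir) are assumptions modeled on the workflow's script, not verified against it:

```python
# Build the three SDG invocations from the dataset plan.
# Flag names are assumptions modeled on the workflow's script.
OUTPUT_ROOT = "synthetic_data_generation_training_workflow/palletjack_sdg/palletjack_data"

DATASET_PLAN = [
    ("distractors_warehouse", "warehouse", 2000),
    ("distractors_additional", "additional", 2000),
    ("no_distractors", "None", 1000),
]

def build_commands(plan, width=960, height=544):
    """Return one argv list per split, ready for subprocess.run inside Isaac Sim."""
    commands = []
    for subdir, distractors, num_frames in plan:
        commands.append([
            "./python.sh", "standalone_palletjack_sdg.py",
            "--num_frames", str(num_frames),
            "--distractors", distractors,
            "--width", str(width), "--height", str(height),
            "--data_dir", f"{OUTPUT_ROOT}/{subdir}",
        ])
    return commands

commands = build_commands(DATASET_PLAN)
```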

Model Training with TAO Toolkit

Once the dataset was generated, I set up the NVIDIA TAO Toolkit for training. I used the DetectNet_v2 architecture with a ResNet backbone for object detection. Training ran on an RTX 3080 GPU for 200 epochs with a batch size of 16.

The pretrained model achieved a mean Average Precision (mAP) of 78% out of the box; after fine-tuning on the custom synthetic dataset, performance improved to 84% mAP.

Conclusion and Learnings

This project validated that synthetic data alone can be used to train a performant object detection model. I was able to generate diverse training data using Isaac Sim, annotate it using Replicator tools, and fine-tune a deep learning model with strong results.

Through this experience, I gained expertise in:

  • Isaac Sim scripting and Replicator API
  • Domain randomization and synthetic data strategies
  • KITTI annotation pipeline and dataset structuring
  • Model fine-tuning using NVIDIA TAO Toolkit

Resources