Synthetic Data Generation using Isaac Sim

A simulation pipeline built in NVIDIA Isaac Sim that generates a synthetic warehouse dataset with domain randomization and trains an object detection model using the NVIDIA TAO Toolkit.

In this project, I implemented an end-to-end synthetic data generation and training pipeline using NVIDIA Isaac Sim and the TAO Toolkit. The goal was to train a high-performance pallet jack detection model using entirely synthetic data.

I began by cloning the official NVIDIA synthetic_data_generation_training_workflow repository and thoroughly analyzing its scripts and domain randomization logic.

Warehouse Environment Setup

The core Python script, standalone_palletjack_sdg.py, initializes a simple warehouse environment using the Isaac Sim asset at:

ENV_URL = "/Isaac/Environments/Simple_Warehouse/warehouse.usd"

Using the Isaac Asset Server, I loaded the environment and then added assets from the SimReady Pallet Jack Library.
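
Inside Isaac Sim, the assets root is typically resolved at runtime (for example via get_assets_root_path() from omni.isaac.core.utils.nucleus) and joined with relative asset paths like ENV_URL. Since those calls only run inside the simulator, here is a plain-Python sketch of the URL composition; the Nucleus root below is a placeholder, not a value from the original script:

```python
# Sketch of how asset URLs are composed before loading them onto the stage.
# ASSETS_ROOT is a hypothetical placeholder; inside Isaac Sim it would come
# from the asset server (e.g. omni.isaac.core.utils.nucleus.get_assets_root_path()).
ASSETS_ROOT = "omniverse://localhost/NVIDIA/Assets/Isaac"

ENV_URL = "/Isaac/Environments/Simple_Warehouse/warehouse.usd"

def full_asset_url(root: str, relative: str) -> str:
    """Join the assets root with a relative asset path, normalizing slashes."""
    return root.rstrip("/") + "/" + relative.lstrip("/")

warehouse_url = full_asset_url(ASSETS_ROOT, ENV_URL)
```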

Adding Pallet Jacks and Camera

I loaded multiple pallet jack USDs (e.g., Scale_A, Heavy_Duty_A) into the warehouse. A camera was positioned in the scene using the Replicator API to support randomized viewpoint generation for data diversity.
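
The actual camera placement runs through Replicator calls inside Isaac Sim (e.g. rep.create.camera and a pose modification with a look_at target). The underlying viewpoint sampling can be sketched in plain Python; all ranges below are illustrative, not the original script's values:

```python
import math
import random

def sample_camera_pose(seed=None):
    """Sample a camera position in a box above the floor and aim it at a
    jittered look-at point near the scene center (ranges are illustrative)."""
    rng = random.Random(seed)
    position = (rng.uniform(-5.0, 5.0), rng.uniform(-5.0, 5.0), rng.uniform(1.0, 3.0))
    look_at = (rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5), rng.uniform(0.0, 0.5))
    # Unit vector from the camera toward the look-at point: the direction
    # the camera must face after the pose is applied.
    d = tuple(t - p for p, t in zip(position, look_at))
    norm = math.sqrt(sum(c * c for c in d))
    forward = tuple(c / norm for c in d)
    return position, look_at, forward

position, look_at, forward = sample_camera_pose(seed=0)
```

Resampling this pose every frame is what gives the dataset its viewpoint diversity.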

Applying Domain Randomization

To make the dataset robust, I applied a variety of domain randomization techniques:

  • Randomized camera positions and look-at points.
  • Color jittering of pallet jack components like SteerAxles.
  • Pose variation of pallet jacks (position, rotation, scale).
  • Dynamic lighting conditions using randomized color, intensity, and visibility of light sources.
  • Injected distractors like cones, barrels, and wet floor signs using custom pose variation.
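
In the script, each of these randomizers draws from Replicator distributions (such as rep.distribution.uniform) once per frame. Conceptually, one frame's worth of randomized parameters looks like the sketch below; every range here is an illustrative placeholder, not a value from the original script:

```python
import random

def sample_frame_params(rng: random.Random) -> dict:
    """Draw one frame's worth of domain-randomization parameters.
    Ranges are illustrative placeholders, not the original script's values."""
    return {
        # Pose variation of the pallet jack: position, z-rotation, uniform scale.
        "palletjack_pos": (rng.uniform(-4, 4), rng.uniform(-4, 4), 0.0),
        "palletjack_rot_z_deg": rng.uniform(0, 360),
        "palletjack_scale": rng.uniform(0.8, 1.2),
        # Color jitter for a component such as the SteerAxle (RGB in [0, 1]).
        "steer_axle_color": (rng.random(), rng.random(), rng.random()),
        # Dynamic lighting: intensity, color, and on/off visibility.
        "light_intensity": rng.uniform(500, 5000),
        "light_color": (rng.random(), rng.random(), rng.random()),
        "light_visible": rng.random() > 0.2,
        # Distractor pose (cones, barrels, wet floor signs).
        "distractor_pos": (rng.uniform(-6, 6), rng.uniform(-6, 6), 0.0),
    }

params = sample_frame_params(random.Random(42))
```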

Annotation and Data Export

I used rep.WriterRegistry.get("KittiWriter") to annotate the dataset in KITTI format, suitable for object detection tasks. Images and annotations were organized under three categories:

  • distractors_warehouse: Warehouse props (cones, bins, etc.)
  • distractors_additional: Non-warehouse props (bags, furniture, wheelchairs)
  • no_distractors: Clean scenes containing only pallet jacks

Each output contained RGB images and matching KITTI-style annotations.
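
A KITTI detection label file holds one 15-field line per object. For 2D detection data like this, only the class name and pixel bounding box carry information; the truncation, occlusion, alpha, and 3D fields are conventionally written as zeros. A minimal sketch of formatting one such line (the bbox values are made up for illustration):

```python
def kitti_label_line(cls: str, bbox: tuple) -> str:
    """Format one object as a KITTI detection label line (15 fields).
    For 2D-only data, truncation/occlusion/alpha and the seven 3D fields
    (dimensions, location, rotation_y) are written as zeros."""
    left, top, right, bottom = bbox
    fields = [cls, "0.0", "0", "0.0",
              f"{left:.2f}", f"{top:.2f}", f"{right:.2f}", f"{bottom:.2f}"]
    fields += ["0.0"] * 7  # 3D dimensions, location, and rotation_y
    return " ".join(fields)

line = kitti_label_line("palletjack", (120.0, 200.0, 480.0, 430.0))
```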

Dataset Generation

I modified the generate_data.sh script to generate the following:

  • 2000 images with warehouse distractors
  • 2000 images with additional distractors
  • 1000 clean images without any distractors

The image resolution was set to 960x544, and output data was saved to:

synthetic_data_generation_training_workflow/palletjack_sdg/palletjack_data/
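
The three runs from the modified generate_data.sh can be sketched as a small driver that builds one command line per dataset split. Note that the flag names below (--num_frames, --distractors, --width, --height, --data_dir) are assumptions modeled on the workflow's script, not verified against it:

```python
# Build the three SDG invocations from the dataset plan.
# Flag names are assumptions modeled on the workflow's script.
OUTPUT_ROOT = "synthetic_data_generation_training_workflow/palletjack_sdg/palletjack_data"

DATASET_PLAN = [
    ("distractors_warehouse", "warehouse", 2000),
    ("distractors_additional", "additional", 2000),
    ("no_distractors", "None", 1000),
]

def build_commands(plan, width=960, height=544):
    """Return one argv list per split, ready for subprocess.run inside Isaac Sim."""
    commands = []
    for subdir, distractors, num_frames in plan:
        commands.append([
            "./python.sh", "standalone_palletjack_sdg.py",
            "--num_frames", str(num_frames),
            "--distractors", distractors,
            "--width", str(width), "--height", str(height),
            "--data_dir", f"{OUTPUT_ROOT}/{subdir}",
        ])
    return commands

commands = build_commands(DATASET_PLAN)
```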

Model Training with TAO Toolkit

Once the dataset was generated, I set up the NVIDIA TAO Toolkit for training. I used the DetectNet_v2 architecture with a ResNet backbone for object detection. Training ran on an RTX 3080 GPU for 200 epochs with a batch size of 16.

The pretrained model achieved a mean Average Precision (mAP) of 78% out of the box; after fine-tuning on the custom synthetic dataset, performance improved to 84% mAP.

Conclusion and Learnings

This project validated that synthetic data alone can be used to train a performant object detection model. I was able to generate diverse training data using Isaac Sim, annotate it using Replicator tools, and fine-tune a deep learning model with strong results.

Through this experience, I gained expertise in:

  • Isaac Sim scripting and Replicator API
  • Domain randomization and synthetic data strategies
  • KITTI annotation pipeline and dataset structuring
  • Model fine-tuning using NVIDIA TAO Toolkit

Resources