Problem with Use Your Own Images Video - Step 18 - Training time.
Related to: AI Unlimited
Date: 20/02/2023 10:39
User: Gbola
I have tried several times to go through the setup, but I keep getting errors at step 18 of the Use Your Own Images video section.
Below is the error from my latest attempt. Can you please assist?
Installing collected packages: transformers
Successfully installed transformers-4.26.1
2023-02-12 10:22:08.407657: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-02-12 10:22:11.925382: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-02-12 10:22:11.925627: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-02-12 10:22:11.925650: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
Generating class images 94% 117/125 [1:01:12<04:11, 31.39s/it]
Traceback (most recent call last):
File "/usr/lib/python3.8/subprocess.py", line 1083, in wait
Traceback (most recent call last):
File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 789, in <module>
main()
File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 481, in main
images = pipeline(example["prompt"]).images
File "/usr/local/lib/python3.8/dist-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return self._wait(timeout=timeout)
File "/usr/lib/python3.8/subprocess.py", line 1806, in _wait
return func(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py", line 495, in __call__
noise_pred = self.unet(latent_model_input, t, encoder_hidden_states=text_embeddings).sample
(pid, sts) = self._try_wait(0)
File "/usr/lib/python3.8/subprocess.py", line 1764, in _try_wait
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
(pid, sts) = os.waitpid(self.pid, wait_flags)
KeyboardInterrupt
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/accelerate_cli.py", line 43, in main
args.func(args)
File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/launch.py", line 837, in launch_command
simple_launcher(args)
File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/launch.py", line 352, in simple_launcher
process.wait()
File "/usr/lib/python3.8/subprocess.py", line 1096, in wait
self._wait(timeout=sigint_timeout)
File "/usr/lib/python3.8/subprocess.py", line 1800, in _wait
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/diffusers/models/unet_2d_condition.py", line 365, in forward
sample = upsample_block(
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
time.sleep(delay)
KeyboardInterrupt
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/diffusers/models/unet_2d_blocks.py", line 1251, in forward
hidden_states = attn(hidden_states, encoder_hidden_states=encoder_hidden_states).sample
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/diffusers/models/attention.py", line 219, in forward
hidden_states = block(hidden_states, context=encoder_hidden_states, timestep=timestep)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/diffusers/models/attention.py", line 479, in forward
hidden_states = self.attn2(norm_hidden_states, context=context) + hidden_states
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/diffusers/models/attention.py", line 550, in forward
key = self.to_k(context)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/linear.py", line 114, in forward
return F.linear(input, self.weight, self.bias)
KeyboardInterrupt
^C
Something went wrong
Thank you.
Date: 13/02/2023 09:03
User: Jorge Vila
At this point, the system will have generated the trained file, the one with the .CKPT extension.
What you can do now is go to the 1.5 process and, instead of choosing the standard CKPT file from Stable Diffusion, use the path to your newly generated CKPT file, then follow the steps as explained!
Let me know!
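Before pointing the 1.5 process at a new checkpoint, it may help to confirm that one was actually written. The sketch below is a minimal check under stated assumptions: the output folder is taken from the `--output_dir` value in the training command posted later in this thread (`/content/models/alo86og`), the exact folder layout the notebook uses is not known, so the search is recursive, and `find_checkpoints` is a hypothetical helper name.

```python
from pathlib import Path

# Assumption: taken from --output_dir in the training command posted in this thread.
OUTPUT_DIR = Path("/content/models/alo86og")


def find_checkpoints(root):
    """Return every .ckpt file under root (any letter case), newest first."""
    root = Path(root)
    if not root.is_dir():
        return []  # folder missing: training never got far enough to create it
    ckpts = [p for p in root.rglob("*") if p.is_file() and p.suffix.lower() == ".ckpt"]
    # Sort by modification time so the most recent checkpoint comes first.
    return sorted(ckpts, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    found = find_checkpoints(OUTPUT_DIR)
    if found:
        print("Newest checkpoint:", found[0])
    else:
        print("No .ckpt file found under", OUTPUT_DIR)
```

An empty result suggests that training stopped before a checkpoint was saved, in which case there is no new CKPT path to use yet.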
Date: 14/02/2023 15:45
User: Gbola
Hi Jorge,
Thanks for your quick response, but I still could not get this to work. Below is the error message I got this time.
I was not able to generate my own CKPT file because the failure occurred at the training stage. I tried reinstalling the libraries and resizing all the photos to 512x512, to no avail.
Your assistance would be highly appreciated.
Regards,
Gbola.
Successfully installed transformers-4.26.1
2023-02-14 15:15:44.809244: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-02-14 15:15:47.254364: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-02-14 15:15:47.254602: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-02-14 15:15:47.254633: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
'########:'########:::::'###::::'####:'##::: ##:'####:'##::: ##::'######:::
... ##..:: ##.... ##:::'## ##:::. ##:: ###:: ##:. ##:: ###:: ##:'##... ##::
::: ##:::: ##:::: ##::'##:. ##::: ##:: ####: ##:: ##:: ####: ##: ##:::..:::
::: ##:::: ########::'##:::. ##:: ##:: ## ## ##:: ##:: ## ## ##: ##::'####:
::: ##:::: ##.. ##::: #########:: ##:: ##. ####:: ##:: ##. ####: ##::: ##::
::: ##:::: ##::. ##:: ##.... ##:: ##:: ##:. ###:: ##:: ##:. ###: ##::: ##::
::: ##:::: ##:::. ##: ##:::: ##:'####: ##::. ##:'####: ##::. ##:. ######:::
:::..:::::..:::::..::..:::::..::....::..::::..::....::..::::..:::......::::
Progress:| | 0% 5/2000 [00:14<1:05:04, 1.96s/it, loss=0.174, lr=1e-6]
Traceback (most recent call last):
File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 789, in <module>
main()
File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 655, in main
for step, batch in enumerate(train_dataloader):
File "/usr/local/lib/python3.8/dist-packages/accelerate/data_loader.py", line 357, in __iter__
next_batch = next(dataloader_iter)
File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py", line 628, in __next__
data = self._next_data()
File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py", line 671, in _next_data
data = self._dataset_fetcher.fetch(index) # may raise StopIteration
File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/_utils/fetch.py", line 58, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/_utils/fetch.py", line 58, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 349, in __getitem__
instance_image = Image.open(path)
File "/usr/local/lib/python3.8/dist-packages/PIL/Image.py", line 2895, in open
raise UnidentifiedImageError(
PIL.UnidentifiedImageError: cannot identify image file '/content/data/alo86og/IMG-20191201-WA0014.png'
Progress:| | 0% 5/2000 [00:14<1:37:57, 2.95s/it, loss=0.174, lr=1e-6]
Traceback (most recent call last):
File "/usr/local/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/accelerate_cli.py", line 43, in main
args.func(args)
File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/launch.py", line 837, in launch_command
simple_launcher(args)
File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/launch.py", line 354, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '/content/diffusers/examples/dreambooth/train_dreambooth.py', '--save_starting_step=500', '--save_n_steps=0', '--train_text_encoder', '--pretrained_model_name_or_path=/content/stable-diffusion-v1-5', '--instance_data_dir=/content/data/alo86og', '--class_data_dir=/content/regularization_images/person_ddim', '--output_dir=/content/models/alo86og', '--with_prior_preservation', '--prior_loss_weight=1.0', '--instance_prompt=photo of alo86og person', '--class_prompt=a photo of a person, ultra detailed', '--seed=75576', '--resolution=512', '--mixed_precision=fp16', '--train_batch_size=1', '--gradient_accumulation_steps=1', '--gradient_checkpointing', '--use_8bit_adam', '--learning_rate=1e-6', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--center_crop', '--max_train_steps=2000', '--num_class_images=500']' returned non-zero exit status 1.
Something went wrong
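The final error in this log is `PIL.UnidentifiedImageError` for `IMG-20191201-WA0014.png`, which means Pillow could not recognize that file as any image format it supports, even though it has a `.png` extension. One plausible cause, an assumption not confirmed in this thread, is a file that was renamed or exported without being re-encoded. The sketch below uses Pillow to list the files it cannot open and re-saves the readable ones as genuine 512x512 RGB PNGs in a separate folder. `SRC` comes from `--instance_data_dir` in the failing command; `DST` (`alo86og_clean`) and `clean_images` are hypothetical names chosen for this example.

```python
from pathlib import Path
from PIL import Image, ImageOps

# SRC is the --instance_data_dir from the failing command above.
# DST is a hypothetical new folder used only for this example.
SRC = Path("/content/data/alo86og")
DST = Path("/content/data/alo86og_clean")
SIZE = 512  # matches --resolution=512 in the command above


def clean_images(src, dst, size=SIZE):
    """Re-save every image Pillow can open as a size x size RGB PNG in dst.

    Returns the names of files Pillow could not open, in sorted order."""
    src, dst = Path(src), Path(dst)
    dst.mkdir(parents=True, exist_ok=True)
    bad = []
    for path in sorted(p for p in src.iterdir() if p.is_file()):
        try:
            with Image.open(path) as raw:
                raw.load()  # force a full decode so truncated files fail here too
                img = ImageOps.exif_transpose(raw)  # apply phone rotation metadata
                img = ImageOps.fit(img.convert("RGB"), (size, size))  # center crop + resize
        except OSError:  # UnidentifiedImageError is a subclass of OSError
            bad.append(path.name)
            continue
        img.save(dst / (path.stem + ".png"), format="PNG")
    return bad


if __name__ == "__main__" and SRC.is_dir():
    unreadable = clean_images(SRC, DST)
    print("Could not open:", unreadable or "none")
```

Any file reported as unreadable would need to be replaced or removed before training again. How the notebook is told to use a different instance folder is not shown in this thread, so that step is left out here.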
Date: 14/02/2023 15:56
User: Jorge Vila
OK, let's do this: send the training images to my email, jorge@jorgevila.com, and I'll try running the training locally and get back to you.
Date: 20/02/2023 10:39
User:
Approved answer
I have just sent you the images.
Thanks.