You will need:
- Python and a capable NVIDIA GPU (at least 16 GB of VRAM)
- kohya-ss/sd-scripts, with or without extensions (e.g. cocktailpeanut/fluxgym)
- 1-2 hours, depending on the number of images
- the demo should take around 10-15 minutes
source env/bin/activate
./outputs/train.sh
----
nvidia-smi
Name and Surname:
Piotr Stapp
Experience in IT:
18 years
Position:
System Principal Architect
Specialization:
Cleaner, Cloud, Code, Infra
Distinguishing marks:
Don't Stapp me now!
Diffusion models learn to reverse a gradual noising process applied to training images.
The two key processes are:
Forward Process (Diffusion):
Noise is incrementally added to an image over several steps until it's indistinguishable from pure noise.
Reverse Process (Denoising):
The model learns how to undo this noising process by predicting and removing noise at each step to recover the original image.
Formally, they model:
A Markov chain that adds Gaussian noise:
$$q(x_t \mid x_{t-1}) = \mathcal{N}\left(x_t;\ \sqrt{1 - \beta_t}\, x_{t-1},\ \beta_t I\right)$$
A neural network learns the reverse denoising:
$$p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\left(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\right)$$
Source:
Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising Diffusion Probabilistic Models. arXiv preprint arXiv:2006.11239, 2020
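The forward process above can be sketched in a few lines of plain Python. This is only an illustration, not code from the talk: the toy four-value "image", the function names, and the linear schedule from 1e-4 to 0.02 over 1000 steps (the values reported in the DDPM paper) are all assumptions made for the example.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Evenly spaced noise variances beta_1..beta_T (assumed schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def forward_step(x_prev, beta, rng):
    """Sample x_t ~ N(sqrt(1 - beta) * x_prev, beta * I), element by element."""
    keep = math.sqrt(1.0 - beta)
    noise_scale = math.sqrt(beta)
    return [keep * value + noise_scale * rng.gauss(0.0, 1.0) for value in x_prev]


def remaining_signal(betas):
    """Factor that multiplies the original image after all steps."""
    product = 1.0
    for beta in betas:
        product *= 1.0 - beta
    return math.sqrt(product)


rng = random.Random(0)            # fixed seed so the run is repeatable
betas = linear_beta_schedule(1000)
x = [1.0, -1.0, 0.5, 0.0]         # hypothetical toy "image" of four pixel values
for beta in betas:
    x = forward_step(x, beta, rng)

print("remaining signal factor:", remaining_signal(betas))
print("noised sample:", x)
```

With this schedule the remaining signal factor ends up well below 0.01, which is why the final sample is practically pure noise. The reverse process is the part that needs a trained network, so it is not sketched here.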
During image generation:
There are different types of conditioning:
A U-Net is a type of convolutional neural network (CNN) originally developed for biomedical image segmentation, but it’s now widely used in many image-to-image tasks, including diffusion models like Stable Diffusion.
Dae-Young Kang, Hieu Pham Duong, and Jung-Chul Park. Application of Deep Learning in Dentistry and Implantology. Implantology, vol. 24, no. 3, 2020, pp. 148–181. arXiv
M. J. Cardoso, A. Li, A. D. McClure, et al. Deep learning tools for the cancer clinic: an open-source framework with head and neck contour validation. Radiation Oncology, 2022. Available at: https://www.researchgate.net/publication/358442721
Avi Chawla. Full Model Fine-Tuning vs LoRA vs QLoRA. Daily Dose of Data Science, 2025. Available at: https://blog.dailydoseofds.com/p/full-model-fine-tuning-vs-lora-vs
Theory is when we know everything, but nothing works!
Practice is when everything works, and no one knows why.
In this room, we combine theory with practice.
Nothing works, and no one knows why.
- prof. Jan Miodek
ELO score from 01-10-2024
Current:
A .safetensors file is a secure and efficient format for storing tensors - multi-dimensional arrays used in machine learning models.
It was developed as a safer alternative to the traditional PyTorch .pt or .pth files.
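To see why the format is considered safer, it helps to look at the layout. As far as the published format description goes, a file is an 8-byte little-endian header length, a JSON header describing each tensor, and then raw tensor bytes, so reading it never executes pickled code. The sketch below builds and parses such a blob with the standard library only. The helper names and the tensor name `lora.weight` are hypothetical, and real files should be read with the `safetensors` library, which also handles details such as the optional `__metadata__` entry.

```python
import json
import struct


def build_safetensors_bytes(name, values):
    """Serialize one float32 vector in the safetensors layout (illustration only)."""
    data = struct.pack(f"<{len(values)}f", *values)
    header = {
        name: {"dtype": "F32", "shape": [len(values)], "data_offsets": [0, len(data)]}
    }
    header_bytes = json.dumps(header).encode("utf-8")
    # 8-byte little-endian header size, then the JSON header, then the raw data
    return struct.pack("<Q", len(header_bytes)) + header_bytes + data


def read_header(blob):
    """Return the parsed JSON header and the offset where tensor data starts."""
    (header_len,) = struct.unpack("<Q", blob[:8])
    header = json.loads(blob[8:8 + header_len])
    return header, 8 + header_len


def read_tensor(blob, name):
    """Decode one F32 tensor as a flat list of floats."""
    header, data_start = read_header(blob)
    begin, end = header[name]["data_offsets"]
    raw = blob[data_start + begin:data_start + end]
    return list(struct.unpack(f"<{(end - begin) // 4}f", raw))


blob = build_safetensors_bytes("lora.weight", [0.5, -1.0, 2.0])
header, _ = read_header(blob)
print(header)
print(read_tensor(blob, "lora.weight"))
```

Because the header is plain JSON, tensor names, shapes, and dtypes can be inspected without loading any weights.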
from diffusers import FluxPipeline
import torch, os, datetime
from dotenv import load_dotenv

def load_model(model_name, safetensor_path=None):
    # Read HF_TOKEN from a local .env file
    load_dotenv()
    pipeline = FluxPipeline.from_pretrained(
        model_name, token=os.getenv("HF_TOKEN"), torch_dtype=torch.float16)
    # Attach the LoRA weights only when a file was given
    if safetensor_path:
        pipeline.load_lora_weights(".", weight_name=safetensor_path)
    pipeline = pipeline.to("cuda" if torch.cuda.is_available() else "cpu")
    return pipeline

def main():
    input_text = "A stapp as superman."
    pipeline = load_model("black-forest-labs/FLUX.1-dev", "./lora/stapp-v1.safetensors")
    # Make sure the output folder exists before saving
    os.makedirs("output", exist_ok=True)
    for i in range(3):
        timestamp = datetime.datetime.now().strftime('%Y%m%d_%H%M%S')
        output_image_path = f"output/generated_image_{timestamp}_{i}.png"
        pipeline(input_text).images[0].save(output_image_path)
        print(f"Image generated and saved to {output_image_path}")

if __name__ == "__main__":
    main()

Stapp as Captain America
Stapp wearing glasses as Captain America, in a dynamic full-body pose, holding a large shield. Dark-blue, high-tech costume with star-patterned chest, hints of bronze. Helmeted, serious expression. Dramatic sunset sky with dark clouds. Shield is wooden-toned with cream bands and central star. Battlefield setting. Realistic lighting, photorealistic style. Heroic pose, strong posture, confident demeanor. Highly detailed, dramatic lighting.
input_text_short = "Stapp with Barack Obama"
input_text_long = "Stapp with Barack Obama in the White House, both smiling and shaking hands. The entire scene should be visible, capturing the moment of camaraderie and friendship."
----
python src/main-simple.py

# stapp lora
pipeline.load_lora_weights(
    "./lora/stapp-v1.safetensors", weight_name="stapp.safetensors", adapter_name="stapp")
# IKEA lora
pipeline.load_lora_weights(
    "./lora/ikea.safetensors", weight_name="ikea.safetensors", adapter_name="ikea")
pipeline.set_adapters(["stapp", "ikea"], adapter_weights=[0.9, 0.7])
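What mixing two LoRAs with different weights means can be shown on a toy matrix. The sketch assumes each adapter contributes a low-rank update B·A and that the adapter weight is a plain multiplier on that update. That is a simplification of what the library does (it also applies each adapter's own alpha/rank scaling), and all names and numbers below are made up. The weights 0.5 and 0.25 are used instead of 0.9 and 0.7 only to keep the arithmetic exact.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def apply_adapters(base_weight, adapters, adapter_weights):
    """Return W + sum_i w_i * (B_i @ A_i) for low-rank pairs (B_i, A_i)."""
    merged = [row[:] for row in base_weight]  # copy so the base stays untouched
    for (b, a), weight in zip(adapters, adapter_weights):
        delta = matmul(b, a)
        for i, row in enumerate(delta):
            for j, value in enumerate(row):
                merged[i][j] += weight * value
    return merged


base = [[1.0, 0.0], [0.0, 1.0]]             # frozen base weight (2x2)
stapp = ([[1.0], [0.0]], [[0.0, 2.0]])      # rank 1: B is 2x1, A is 1x2
ikea = ([[0.0], [1.0]], [[4.0, 0.0]])       # a second rank-1 adapter
merged = apply_adapters(base, [stapp, ikea], [0.5, 0.25])
print(merged)  # [[1.0, 1.0], [1.0, 1.0]]
```

Lowering one adapter's weight shrinks only that adapter's contribution, which is how a style LoRA can be toned down without weakening the subject LoRA.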
# LoRA Ghibli + stapp lora
python src/main-ghibli-mix.py
# plain LoRA Ghibli
python src/main-ghibli-plain.py
# lora
python src/main-uncensored-lora.py
# plain flux
python src/main-uncensored-plain.py
python main_fill.py

import torch
from diffusers import FluxFillPipeline
from diffusers.utils import load_image
from dotenv import load_dotenv

load_dotenv()
# Source image and the mask marking the region to repaint
image = load_image("https://huggingface.co/datasets/diffusers/diffusers-images-docs/resolve/main/cup.png")
mask = load_image("https://huggingface.co/datasets/diffusers/diffusers-images-docs/resolve/main/cup_mask.png")
pipe = FluxFillPipeline.from_pretrained("black-forest-labs/FLUX.1-Fill-dev", torch_dtype=torch.bfloat16).to("cuda")
image = pipe(
    prompt="a cup with a handle and NIKE logo",
    image=image,
    mask_image=mask,
    height=1632,
    width=1232,
    guidance_scale=30,
    num_inference_steps=50,
    max_sequence_length=512,
    generator=torch.Generator("cpu")  # append .manual_seed(0) for reproducible output
).images[0]
image.save("flux-fill-dev.png")
print("Successfully inpainted image")

GPT-image-1 supports multiple modalities and features:
Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising Diffusion Probabilistic Models. arXiv preprint arXiv:2006.11239, 2020.
Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-Resolution Image Synthesis with Latent Diffusion Models. arXiv preprint arXiv:2112.10752, 2021.
Dae-Young Kang, Hieu Pham Duong, and Jung-Chul Park. Application of Deep Learning in Dentistry and Implantology. Implantology, vol. 24, no. 3, 2020, pp. 148–181. arXiv
M. J. Cardoso, A. Li, A. D. McClure, et al. Deep learning tools for the cancer clinic: an open-source framework with head and neck contour validation. Radiation Oncology, 2022. Available at: https://www.researchgate.net/publication/358442721
https://learn.microsoft.com/en-us/azure/virtual-machines/linux/n-series-driver-setup
Flux Style Test Gallery - https://enragedantelope.github.io/Styles-FluxDev/
FLUX NSFW / Nude Prompts and Learnings - https://betterwaifu.com/blog/flux-nsfw
Nextusos/Flux-Uncensored-V2 - https://github.com/Nextusos/Flux-Uncensored-V2
Flux docs - https://github.com/black-forest-labs/flux
photo AI (my repo) - https://github.com/ptrstpp950/photoAi
Live Portrait - https://huggingface.co/spaces/KwaiVGI/LivePortrait
Hugging Face - https://huggingface.co/