AngelBottomless committed
Commit 0a4fc35 · verified · 1 Parent(s): 2da64d3

Upload 18 files

LICENSE ADDED
@@ -0,0 +1,201 @@
+ Apache License
+ Version 2.0, January 2004
+ http://www.apache.org/licenses/
+
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+ 1. Definitions.
+
+ "License" shall mean the terms and conditions for use, reproduction,
+ and distribution as defined by Sections 1 through 9 of this document.
+
+ "Licensor" shall mean the copyright owner or entity authorized by
+ the copyright owner that is granting the License.
+
+ "Legal Entity" shall mean the union of the acting entity and all
+ other entities that control, are controlled by, or are under common
+ control with that entity. For the purposes of this definition,
+ "control" means (i) the power, direct or indirect, to cause the
+ direction or management of such entity, whether by contract or
+ otherwise, or (ii) ownership of fifty percent (50%) or more of the
+ outstanding shares, or (iii) beneficial ownership of such entity.
+
+ "You" (or "Your") shall mean an individual or Legal Entity
+ exercising permissions granted by this License.
+
+ "Source" form shall mean the preferred form for making modifications,
+ including but not limited to software source code, documentation
+ source, and configuration files.
+
+ "Object" form shall mean any form resulting from mechanical
+ transformation or translation of a Source form, including but
+ not limited to compiled object code, generated documentation,
+ and conversions to other media types.
+
+ "Work" shall mean the work of authorship, whether in Source or
+ Object form, made available under the License, as indicated by a
+ copyright notice that is included in or attached to the work
+ (an example is provided in the Appendix below).
+
+ "Derivative Works" shall mean any work, whether in Source or Object
+ form, that is based on (or derived from) the Work and for which the
+ editorial revisions, annotations, elaborations, or other modifications
+ represent, as a whole, an original work of authorship. For the purposes
+ of this License, Derivative Works shall not include works that remain
+ separable from, or merely link (or bind by name) to the interfaces of,
+ the Work and Derivative Works thereof.
+
+ "Contribution" shall mean any work of authorship, including
+ the original version of the Work and any modifications or additions
+ to that Work or Derivative Works thereof, that is intentionally
+ submitted to Licensor for inclusion in the Work by the copyright owner
+ or by an individual or Legal Entity authorized to submit on behalf of
+ the copyright owner. For the purposes of this definition, "submitted"
+ means any form of electronic, verbal, or written communication sent
+ to the Licensor or its representatives, including but not limited to
+ communication on electronic mailing lists, source code control systems,
+ and issue tracking systems that are managed by, or on behalf of, the
+ Licensor for the purpose of discussing and improving the Work, but
+ excluding communication that is conspicuously marked or otherwise
+ designated in writing by the copyright owner as "Not a Contribution."
+
+ "Contributor" shall mean Licensor and any individual or Legal Entity
+ on behalf of whom a Contribution has been received by Licensor and
+ subsequently incorporated within the Work.
+
+ 2. Grant of Copyright License. Subject to the terms and conditions of
+ this License, each Contributor hereby grants to You a perpetual,
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+ copyright license to reproduce, prepare Derivative Works of,
+ publicly display, publicly perform, sublicense, and distribute the
+ Work and such Derivative Works in Source or Object form.
+
+ 3. Grant of Patent License. Subject to the terms and conditions of
+ this License, each Contributor hereby grants to You a perpetual,
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+ (except as stated in this section) patent license to make, have made,
+ use, offer to sell, sell, import, and otherwise transfer the Work,
+ where such license applies only to those patent claims licensable
+ by such Contributor that are necessarily infringed by their
+ Contribution(s) alone or by combination of their Contribution(s)
+ with the Work to which such Contribution(s) was submitted. If You
+ institute patent litigation against any entity (including a
+ cross-claim or counterclaim in a lawsuit) alleging that the Work
+ or a Contribution incorporated within the Work constitutes direct
+ or contributory patent infringement, then any patent licenses
+ granted to You under this License for that Work shall terminate
+ as of the date such litigation is filed.
+
+ 4. Redistribution. You may reproduce and distribute copies of the
+ Work or Derivative Works thereof in any medium, with or without
+ modifications, and in Source or Object form, provided that You
+ meet the following conditions:
+
+ (a) You must give any other recipients of the Work or
+ Derivative Works a copy of this License; and
+
+ (b) You must cause any modified files to carry prominent notices
+ stating that You changed the files; and
+
+ (c) You must retain, in the Source form of any Derivative Works
+ that You distribute, all copyright, patent, trademark, and
+ attribution notices from the Source form of the Work,
+ excluding those notices that do not pertain to any part of
+ the Derivative Works; and
+
+ (d) If the Work includes a "NOTICE" text file as part of its
+ distribution, then any Derivative Works that You distribute must
+ include a readable copy of the attribution notices contained
+ within such NOTICE file, excluding those notices that do not
+ pertain to any part of the Derivative Works, in at least one
+ of the following places: within a NOTICE text file distributed
+ as part of the Derivative Works; within the Source form or
+ documentation, if provided along with the Derivative Works; or,
+ within a display generated by the Derivative Works, if and
+ wherever such third-party notices normally appear. The contents
+ of the NOTICE file are for informational purposes only and
+ do not modify the License. You may add Your own attribution
+ notices within Derivative Works that You distribute, alongside
+ or as an addendum to the NOTICE text from the Work, provided
+ that such additional attribution notices cannot be construed
+ as modifying the License.
+
+ You may add Your own copyright statement to Your modifications and
+ may provide additional or different license terms and conditions
+ for use, reproduction, or distribution of Your modifications, or
+ for any such Derivative Works as a whole, provided Your use,
+ reproduction, and distribution of the Work otherwise complies with
+ the conditions stated in this License.
+
+ 5. Submission of Contributions. Unless You explicitly state otherwise,
+ any Contribution intentionally submitted for inclusion in the Work
+ by You to the Licensor shall be under the terms and conditions of
+ this License, without any additional terms or conditions.
+ Notwithstanding the above, nothing herein shall supersede or modify
+ the terms of any separate license agreement you may have executed
+ with Licensor regarding such Contributions.
+
+ 6. Trademarks. This License does not grant permission to use the trade
+ names, trademarks, service marks, or product names of the Licensor,
+ except as required for reasonable and customary use in describing the
+ origin of the Work and reproducing the content of the NOTICE file.
+
+ 7. Disclaimer of Warranty. Unless required by applicable law or
+ agreed to in writing, Licensor provides the Work (and each
+ Contributor provides its Contributions) on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+ implied, including, without limitation, any warranties or conditions
+ of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+ PARTICULAR PURPOSE. You are solely responsible for determining the
+ appropriateness of using or redistributing the Work and assume any
+ risks associated with Your exercise of permissions under this License.
+
+ 8. Limitation of Liability. In no event and under no legal theory,
+ whether in tort (including negligence), contract, or otherwise,
+ unless required by applicable law (such as deliberate and grossly
+ negligent acts) or agreed to in writing, shall any Contributor be
+ liable to You for damages, including any direct, indirect, special,
+ incidental, or consequential damages of any character arising as a
+ result of this License or out of the use or inability to use the
+ Work (including but not limited to damages for loss of goodwill,
+ work stoppage, computer failure or malfunction, or any and all
+ other commercial damages or losses), even if such Contributor
+ has been advised of the possibility of such damages.
+
+ 9. Accepting Warranty or Additional Liability. While redistributing
+ the Work or Derivative Works thereof, You may choose to offer,
+ and charge a fee for, acceptance of support, warranty, indemnity,
+ or other liability obligations and/or rights consistent with this
+ License. However, in accepting such obligations, You may act only
+ on Your own behalf and on Your sole responsibility, not on behalf
+ of any other Contributor, and only if You agree to indemnify,
+ defend, and hold each Contributor harmless for any liability
+ incurred by, or claims asserted against, such Contributor by reason
+ of your accepting any such warranty or additional liability.
+
+ END OF TERMS AND CONDITIONS
+
+ APPENDIX: How to apply the Apache License to your work.
+
+ To apply the Apache License to your work, attach the following
+ boilerplate notice, with the fields enclosed by brackets "[]"
+ replaced with your own identifying information. (Don't include
+ the brackets!) The text should be enclosed in the appropriate
+ comment syntax for the file format. We also recommend that a
+ file or class name and description of purpose be included on the
+ same "printed page" as the copyright notice for easier
+ identification within third-party archives.
+
+ Copyright [yyyy] [name of copyright owner]
+
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
cache.py ADDED
@@ -0,0 +1,199 @@
+ import argparse
+ import os
+ import hashlib
+ import functools
+ import json
+ import yaml
+
+ import numpy as np
+ import torch
+ import torch.nn.functional as F
+ from PIL import Image
+ from diffusers import AutoencoderKL
+ from torchvision import transforms
+ from tqdm import tqdm
+
+ from imgproc import (
+     generate_crop_size_list,
+     to_rgb_if_rgba,
+     var_center_crop,
+ )
+ from data import read_general
+
+ # ---- Flux VAE scaling parameters ----
+ VAE_SCALE = 0.3611
+ VAE_SHIFT = 0.1159
+
+ def handle_image(image: Image.Image) -> Image.Image:
+     """
+     Ensure the image is in RGB format, converting from RGBA, L, P, etc.
+     Raise ValueError if the mode is unrecognized.
+     """
+     mode = image.mode.upper()
+     if mode == "RGB":
+         return image
+     elif mode == "RGBA":
+         return to_rgb_if_rgba(image)
+     elif mode in ("L", "P"):
+         return image.convert("RGB")
+     else:
+         raise ValueError(f"Unsupported image mode: {mode}")
+
+ def encode(vae: AutoencoderKL, img_tensor: torch.Tensor, device: torch.device) -> torch.Tensor:
+     """
+     Encode a normalized image tensor to latents using the Flux VAE, applying SHIFT+SCALE.
+     img_tensor shape: (C, H, W) or (1, C, H, W). We'll reshape to (1, C, H, W) if needed.
+     """
+     if img_tensor.dim() == 3:
+         img_tensor = img_tensor.unsqueeze(0)  # (1, C, H, W)
+     img_tensor = img_tensor.to(device, non_blocking=True)
+     with torch.no_grad():
+         # The VAE weights are float16, so the input tensor is cast to match before encode.
+         latent_dist = vae.encode(img_tensor).latent_dist
+         # Use .mode()[0] or .sample() depending on whether you prefer the mode or a random sample.
+         latents = latent_dist.mode()[0]
+         latents = (latents - VAE_SHIFT) * VAE_SCALE
+     return latents.float()
+
+ def load_image_paths_from_yaml(yaml_path: str) -> list:
+     """
+     Parse a YAML containing a 'META' key with paths to .jsonl files.
+     For each .jsonl (with 'type' == 'image_text'), read lines of JSON
+     where we expect an 'image_path' field. Collect these paths in a list.
+     """
+     with open(yaml_path, "r", encoding="utf-8") as f:
+         data = yaml.safe_load(f)
+
+     image_files = []
+     meta_list = data.get("META", [])
+     for meta_item in meta_list:
+         # Example: path=/data0/DanbooruWebp/booru1116Webp.jsonl
+         #          type=image_text
+         ftype = meta_item.get("type", "")
+         fpath = meta_item.get("path", "")
+         if ftype != "image_text":
+             # skip unknown types
+             continue
+         if not os.path.isfile(fpath):
+             print(f"[Warning] JSONL file not found: {fpath}")
+             continue
+
+         # Open the .jsonl and parse its lines
+         with open(fpath, "r", encoding="utf-8") as fin:
+             for line in fin:
+                 line = line.strip()
+                 if not line:
+                     continue
+                 try:
+                     obj = json.loads(line)
+                     if "image_path" in obj:
+                         # This is the actual disk path for the image
+                         image_files.append(obj["image_path"])
+                 except Exception as e:
+                     print(f"[Warning] JSON parse error in {fpath}: {e}")
+                     continue
+
+     return image_files
+
+ def main():
+     parser = argparse.ArgumentParser(description="Cache image latents using Flux VAE")
+     parser.add_argument("--data_yaml", type=str, required=True,
+                         help="Path to dataset YAML config (with META -> .jsonl paths)")
+     parser.add_argument("--resolution", type=int, required=True,
+                         help="Target resolution (e.g., 256, 512, 1024) for center-crop/resize")
+     parser.add_argument("--total_split", type=int, default=1,
+                         help="Total number of parallel splits/workers")
+     parser.add_argument("--current_worker_index", type=int, default=0,
+                         help="Index of this worker (0-based)")
+     parser.add_argument("--patch_size", type=int, default=8,
+                         help="Patch size used for generating potential crop sizes")
+     parser.add_argument("--random_top_k", type=int, default=1,
+                         help="Number of top crop options from var_center_crop to randomly pick")
+     args = parser.parse_args()
+
+     # ------------------------------------------------------------------
+     # 1) Set up the VAE model for encoding
+     # ------------------------------------------------------------------
+     vae = AutoencoderKL.from_pretrained(
+         "black-forest-labs/FLUX.1-dev",
+         subfolder="vae",
+         torch_dtype=torch.float16
+     ).eval()
+
+     device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
+     vae.to(device)
+
+     # ------------------------------------------------------------------
+     # 2) Prepare the transform (crop -> tensor -> normalize).
+     #    This must match how images are processed before training.
+     # ------------------------------------------------------------------
+     max_num_patches = round((args.resolution / (args.patch_size * 1.0)) ** 2)
+     crop_size_list = generate_crop_size_list(max_num_patches, args.patch_size)
+     image_transform = transforms.Compose([
+         transforms.Lambda(functools.partial(var_center_crop,
+                                             crop_size_list=crop_size_list,
+                                             random_top_k=args.random_top_k)),
+         transforms.ToTensor(),
+         transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
+     ])
+
+     # ------------------------------------------------------------------
+     # 3) Load image paths from the YAML / JSONL references
+     # ------------------------------------------------------------------
+     image_files = load_image_paths_from_yaml(args.data_yaml)
+     if not image_files:
+         print("[INFO] No image files found. Check your YAML & JSONL contents.")
+         return
+
+     # ------------------------------------------------------------------
+     # 4) Process each image => transform => encode => save .npz
+     # ------------------------------------------------------------------
+     worker_idx = args.current_worker_index
+     total_split = args.total_split
+     res = args.resolution
+
+     for image_path in tqdm(image_files, desc=f"Worker {worker_idx}"):
+         # 4.a) Determine if this file belongs to the current worker
+         hash_val = int(hashlib.sha1(image_path.encode("utf-8")).hexdigest(), 16)
+         if hash_val % total_split != worker_idx:
+             continue
+
+         # 4.b) Construct the cache path
+         base, _ = os.path.splitext(image_path)
+         out_path = f"{base}_{res}.npz"
+         if os.path.exists(out_path):
+             continue
+
+         # 4.c) Read the image from disk & handle its mode
+         try:
+             pil_image = Image.open(read_general(image_path))
+             pil_image = handle_image(pil_image)  # ensure RGB
+         except Exception as e:
+             print(f"[Warning] Could not open image {image_path}: {e}")
+             continue
+
+         # Optionally, you can do a simple resize (if your training expects it).
+         # Otherwise, rely solely on var_center_crop to pick a final crop size.
+         pil_image = pil_image.resize((res, res), Image.Resampling.LANCZOS)
+
+         # 4.d) Apply var_center_crop -> ToTensor -> Normalize
+         try:
+             transformed_tensor = image_transform(pil_image)  # shape=(3, H, W)
+         except Exception as e:
+             print(f"[Warning] Skipping {image_path} due to transform error: {e}")
+             continue
+         transformed_tensor = transformed_tensor.to(torch.float16)
+         # 4.e) Encode with the Flux VAE (shift+scale) => latent
+         latents = encode(vae, transformed_tensor, device=device)
+         latents_np = latents.cpu().numpy()  # shape=(C, H//8, W//8) typically
+
+         # 4.f) Save latents to .npz
+         try:
+             np.savez_compressed(out_path, latent=latents_np)
+         except Exception as e:
+             print(f"[Error] Saving .npz for {image_path} failed: {e}")
+
+ if __name__ == "__main__":
+     main()
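
For reference, a cached latent can be sanity-checked by inverting the shift/scale that cache.py applies and decoding it back to pixels. This is a minimal sketch, not part of the commit; the file name image_256.npz is a hypothetical output of the script above, and it assumes the same FLUX.1-dev VAE is available.

import numpy as np
import torch
from diffusers import AutoencoderKL

VAE_SCALE = 0.3611
VAE_SHIFT = 0.1159

vae = AutoencoderKL.from_pretrained(
    "black-forest-labs/FLUX.1-dev", subfolder="vae", torch_dtype=torch.float16
).eval()

# Load a latent cached by cache.py (hypothetical path).
latent = np.load("image_256.npz")["latent"]
z = torch.from_numpy(latent).unsqueeze(0).to(torch.float16)

# Invert the caching transform: cache.py stored (z_raw - VAE_SHIFT) * VAE_SCALE.
z_raw = z / VAE_SCALE + VAE_SHIFT

with torch.no_grad():
    decoded = vae.decode(z_raw).sample  # pixel values roughly in [-1, 1]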
data/__init__.py ADDED
@@ -0,0 +1,2 @@
+ from .data_reader import *
+ from .dataset import *
data/data_reader.py ADDED
@@ -0,0 +1,758 @@
+ import os
+ import json
+ import time
+ import logging
+ from io import BytesIO
+ from typing import Union, Optional, Tuple, Dict, Any, Protocol, List
+
+ import requests
+ from PIL import Image
+
+ # Disable Pillow’s large image pixel limit.
+ Image.MAX_IMAGE_PIXELS = None
+
+ #####################################################
+ # Configure Logging with Level Argument
+ #####################################################
+ logger = logging.getLogger(__name__)
+
+
+ def configure_logging(level: Union[str, int] = logging.INFO):
+     """
+     Configures the root logger (and thus 'logger') to a specific logging level.
+
+     :param level: Either a string like 'DEBUG'/'INFO'/'WARNING'
+                   or an integer like logging.DEBUG/logging.INFO/etc.
+     """
+     if isinstance(level, str):
+         level = getattr(logging, level.upper(), logging.INFO)
+
+     logging.basicConfig(
+         level=level,
+         format="%(asctime)s [%(levelname)s] %(name)s: %(message)s",
+     )
+
+
+ # Global Ceph/petrel client
+ client = None  # type: ignore
+
+ # Cache for JSON data loaded from a repo
+ loaded_jsons: Dict[str, Any] = {}
+
+ #####################################################
+ # Helpers for Hugging Face Token & HTTP Session
+ #####################################################
+
+
+ def _get_hf_access_token() -> Optional[str]:
+     """
+     Retrieves the Hugging Face access token from the environment or from 'env.json'.
+     Returns None if not found.
+     """
+     hf_access_token = os.environ.get("HF_ACCESS_TOKEN")
+     if not hf_access_token and os.path.isfile("env.json"):
+         with open("env.json", "r", encoding="utf-8") as f:
+             env_data = json.load(f)
+         hf_access_token = env_data.get("HF_ACCESS_TOKEN")
+
+     if not hf_access_token:
+         return None
+
+     return hf_access_token
+
+
+ def get_hf_session() -> requests.Session:
+     """
+     Creates and returns a requests.Session object with the Hugging Face token in the headers.
+     """
+     token = _get_hf_access_token()
+     session = requests.Session()
+     if token:
+         session.headers.update({"Authorization": f"Bearer {token}"})
+     return session
+
+
+ #####################################################
+ # Ceph/Petrel Client Initialization
+ #####################################################
+
+
+ def init_ceph_client_if_needed():
+     """
+     Initializes the global Ceph/petrel `client` if it has not yet been set.
+     """
+     global client
+     if client is None:
+         logger.info("Initializing Ceph/petrel client...")
+         start_time = time.time()
+         from petrel_client.client import Client  # noqa
+
+         client = Client("./petreloss.conf")
+         end_time = time.time()
+         logger.info(
+             f"Initialized Ceph/petrel client in {end_time - start_time:.2f} seconds"
+         )
+
+
+ #####################################################
+ # Reading & Caching JSON
+ #####################################################
+
+
+ def read_json_from_repo(
+     session: requests.Session, repo_addr: str, file_name: str, cache_dir: str
+ ) -> Optional[Dict[str, Any]]:
+     """
+     Reads JSON from a given repository address and file name, with caching:
+       1. If cached in memory (loaded_jsons), returns it.
+       2. Otherwise, checks the local disk cache (cache_dir).
+       3. If not found on disk, downloads and saves it locally.
+
+     :param session: requests.Session
+     :param repo_addr: URL base (e.g. "https://github.com/user/repo/tree/main")
+     :param file_name: Name of the JSON file
+     :param cache_dir: Local directory to store the cache
+     :return: Parsed JSON object or None
+     """
+     unique_key = f"{repo_addr}/{file_name}"
+     if unique_key in loaded_jsons:
+         logger.debug(f"Found in-memory cache for {unique_key}")
+         return loaded_jsons[unique_key]
+
+     # Check the local disk cache
+     cache_file = os.path.join(cache_dir, file_name)
+     if os.path.exists(cache_file):
+         logger.debug(f"Reading from local cache: {cache_file}")
+         with open(cache_file, "r", encoding="utf-8") as f:
+             result = json.load(f)
+         loaded_jsons[unique_key] = result
+         return result
+     else:
+         # Download and cache
+         url = f"{repo_addr}/{file_name}"
+         logger.debug(f"Downloading JSON from {url}")
+         response = session.get(url)
+         try:
+             response.raise_for_status()
+         except requests.HTTPError:
+             if response.status_code == 404:
+                 loaded_jsons[unique_key] = None
+                 return None
+             raise
+         data = response.json()
+         os.makedirs(cache_dir, exist_ok=True)
+         with open(cache_file, "w", encoding="utf-8") as f:
+             json.dump(data, f, indent=4)
+         loaded_jsons[unique_key] = data
+         return data
+
+
+ def load_json_index(
+     session: requests.Session,
+     json_url: str,
+     cache_path: Optional[str] = None,
+ ) -> Optional[Dict[str, Any]]:
+     """
+     Download (if needed) and cache a JSON file from `json_url`.
+     If `cache_path` is provided, data is saved/loaded from that path.
+
+     :param session: requests.Session
+     :param json_url: Direct URL to the JSON file
+     :param cache_path: Local path for caching the JSON
+     :return: Parsed JSON (dict) or None if 404
+     """
+     if cache_path is not None and os.path.isfile(cache_path):
+         logger.debug(f"Found cached JSON at {cache_path}")
+         with open(cache_path, "r", encoding="utf-8") as f:
+             return json.load(f)
+
+     logger.debug(f"Requesting JSON index from {json_url}")
+     resp = session.get(json_url)
+     if resp.status_code == 404:
+         logger.warning(f"JSON index not found (404): {json_url}")
+         return None
+     resp.raise_for_status()
+
+     data = resp.json()
+     if cache_path is not None:
+         os.makedirs(os.path.dirname(cache_path), exist_ok=True)
+         with open(cache_path, "w", encoding="utf-8") as f:
+             json.dump(data, f)
+         logger.debug(f"Saved JSON index to {cache_path}")
+     return data
+
+
+ #####################################################
+ # Downloading Byte Ranges
+ #####################################################
+
+
+ def download_range(session: requests.Session, url: str, start: int, end: int) -> bytes:
+     """
+     Downloads the inclusive byte range [start, end] from the specified URL via
+     an HTTP Range request and returns the raw bytes.
+
+     :param session: A requests.Session with appropriate headers
+     :param url: The file URL to download
+     :param start: Start byte (inclusive)
+     :param end: End byte (inclusive)
+     :return: Raw bytes of the specified range
+     """
+     headers = {"Range": f"bytes={start}-{end}"}
+     logger.debug(f"Downloading range {start}-{end} from {url}")
+     response = session.get(url, headers=headers, stream=True)
+     response.raise_for_status()
+     return response.content
+
+
+ #####################################################
+ # Repository Protocol and Implementations
+ #####################################################
+
+
+ class BaseRepository(Protocol):
+     """
+     A Protocol that each repository must implement. Must have a method:
+       find_image(session, image_id) -> (tar_url, start_offset, end_offset, filename) or None
+     """
+
+     def find_image(
+         self, session: requests.Session, image_id: Union[int, str]
+     ) -> Optional[Tuple[str, int, int, str]]: ...
+
+
+ def primary_subfolder_from_id(x: int) -> str:
+     """
+     Given an integer image ID, return a subfolder name based on the ID mod 1000.
+     E.g., 7502245 -> '0245'.
+     """
+     if not isinstance(x, int):
+         raise ValueError(f"Primary subfolder requires an integer ID, given: {x}")
+     val = x % 1000
+     return f"{val:04d}"
+
+
+ def secondary_chunk_from_id(x: int, chunk_size: int = 1000) -> int:
+     """
+     Compute the chunk index for a 'secondary' dataset given an image ID.
+     """
+     return x % chunk_size
+
+
+ class PrimaryRepository(BaseRepository):
+     """
+     Example of a 'primary' dataset repository:
+       - .tar files named "NNNN.tar" where NNNN = image_id % 1000
+       - Each .tar file has a companion JSON index "NNNN.json"
+       - The JSON maps "7501000.jpg" -> [start_offset, end_offset]
+     """
+
+     def __init__(self, base_url: str, cache_dir: str, entry: Optional[str] = None):
+         self.base_url = base_url
+         self.cache_dir = cache_dir
+         self.entry = entry
+         os.makedirs(self.cache_dir, exist_ok=True)
+
+     def _build_primary_id_map(self, json_index: Dict[str, Any]) -> Dict[int, str]:
+         """
+         From a JSON index like { "7501000.jpg": [start, end], ... },
+         create a map of integer ID -> filename (e.g. 7501000 -> "7501000.jpg").
+         """
+         out = {}
+         for filename in json_index.keys():
+             root, _ = os.path.splitext(filename)
+             try:
+                 num = int(root)
+                 out[num] = filename
+             except ValueError:
+                 continue
+         return out
+
+     def find_image(
+         self, session: requests.Session, image_id: Union[int, str]
+     ) -> Optional[Tuple[str, int, int, str]]:
+         if isinstance(image_id, str):
+             try:
+                 image_id = int(image_id)
+             except ValueError:
+                 logger.error(f"Invalid image ID: {image_id}")
+                 return None
+         folder = primary_subfolder_from_id(image_id)
+         json_name = f"{folder}.json"
+         json_url = f"{self.base_url}/{json_name}"
+         cache_path = os.path.join(self.cache_dir, json_name)
+
+         logger.debug(f"Looking for image {image_id} in {json_name} (folder: {folder})")
+         json_index = load_json_index(session, json_url, cache_path)
+         if not json_index:
+             logger.debug(f"No JSON index found for folder {folder}")
+             return None
+
+         # Build a map of integer_id -> filename
+         id_map = self._build_primary_id_map(json_index)
+         filename = id_map.get(image_id)
+         if not filename:
+             logger.debug(f"Image ID {image_id} not found in index for folder {folder}")
+             return None
+
+         start_offset, end_offset = json_index[filename]
+         tar_url = f"{self.base_url}/{folder}.tar"
+         logger.debug(
+             f"Found image {image_id} in {folder}.tar ({start_offset}-{end_offset})"
+         )
+         return tar_url, start_offset, end_offset, filename
+
+
+ class SecondaryRepository(BaseRepository):
+     """
+     Example for a 'secondary' dataset that:
+       - Uses chunk-based storage (each chunk is named data-XXXX.tar)
+       - For each chunk, there's a corresponding data-XXXX.json with a "files" mapping
+     """
+
+     def __init__(
+         self,
+         tar_base_url: str,
+         json_base_url: str,
+         cache_dir: str,
+         chunk_size: int = 1000,
+         entry: Optional[str] = None,
+     ):
+         self.tar_base_url = tar_base_url
+         self.json_base_url = json_base_url
+         self.cache_dir = cache_dir
+         self.chunk_size = chunk_size
+         self.entry = entry
+         os.makedirs(self.cache_dir, exist_ok=True)
+
+     def find_image(
+         self, session: requests.Session, image_id: Union[int, str]
+     ) -> Optional[Tuple[str, int, int, str]]:
+         if isinstance(image_id, str):
+             try:
+                 image_id = int(image_id)
+             except ValueError:
+                 logger.error(f"Invalid image ID: {image_id}")
+                 return None
+         chunk_index = secondary_chunk_from_id(image_id, self.chunk_size)
+         data_name = f"data-{chunk_index:04d}"
+
+         json_url = f"{self.json_base_url}/{data_name}.json"
+         cache_path = os.path.join(self.cache_dir, f"{data_name}.json")
+
+         logger.debug(f"Looking for image {image_id} in chunk {data_name}")
+         data = load_json_index(session, json_url, cache_path)
+         if not data or "files" not in data:
+             logger.debug(f"No file mapping found in {data_name}.json")
+             return None
+
+         filename_key = f"{image_id}.webp"
+         file_dict = data["files"].get(filename_key)
+         if not file_dict:
+             logger.debug(f"Image ID {image_id} not found in chunk {data_name}")
+             return None
+
+         offset = file_dict["offset"]
+         size = file_dict["size"]
+         start_offset = offset
+         end_offset = offset + size - 1  # inclusive
+
+         tar_url = f"{self.tar_base_url}/{data_name}.tar"
+         logger.info(
+             f"Found image {image_id} in {data_name}.tar ({start_offset}-{end_offset})"
+         )
+         return tar_url, start_offset, end_offset, filename_key
+
+
+ class CustomRepository(BaseRepository):
+     """
+     Repository that relies on a single 'all_indices.json' plus a structure:
+       key -> "tar_path#file_name"
+     and then a nested mapping of tar_path -> file_name -> [start_offset, end_offset].
+     """
+
+     def __init__(self, base_url: str, cache_dir: str, entry: Optional[str] = None):
+         self.base_url = base_url
+         self.cache_dir = cache_dir
+         self.entry = entry
+         os.makedirs(self.cache_dir, exist_ok=True)
+
+     def get_range_for_key(
+         self, session: requests.Session, key: Union[int, str]
+     ) -> Optional[Tuple[str, int, int, str]]:
+         # internal_map.json: { key: "tar_path#file_name" };
+         # all_indices.json: { tar_path: { file_name: [start, end] } }
+         key = str(key)
+         key_index = read_json_from_repo(
+             session, self.base_url, "internal_map.json", self.cache_dir
+         )
+         if key_index is None:
+             logger.debug(f"No internal_map.json found in custom repo: {self.base_url}")
+             return None
+         real_key = key_index.get(key)
+         if not real_key:
+             logger.debug(f"Key {key} not found in custom repo index")
+             return None
+         repo_index = read_json_from_repo(
+             session, self.base_url, "all_indices.json", self.cache_dir
+         )
+         if repo_index is None:
+             logger.debug(f"No all_indices.json found in custom repo: {self.base_url}")
+             return None
+         tar_path, file_name = real_key.split("#", 1)
+         if tar_path not in repo_index:
+             logger.debug(f"Key {real_key} not found in custom repo index")
+             return None
+         tar_info = repo_index.get(tar_path, {}).get(file_name, None)
+         if not tar_info or len(tar_info) < 2:
+             return None
+
+         start, end = tar_info
+         tar_url = f"{self.base_url}/{tar_path}"
+         logger.info(
+             f"Found key '{key}' in custom repository {tar_path} ({start}-{end})"
+         )
+         return tar_url, start, end, file_name
+
+     def find_image(
+         self, session: requests.Session, image_id: str
+     ) -> Optional[Tuple[str, int, int, str]]:
+         return self.get_range_for_key(session, image_id)
+
+
+ #####################################################
+ # Repository Configuration
+ #####################################################
+
+ class RepositoryConfig:
+     """
+     Manages loading/storing repository configurations from a JSON file,
+     and instantiates the corresponding repository objects, including custom 'entry' prefixes.
+     """
+
+     def __init__(self, config_path: str):
+         """
+         :param config_path: Path to the JSON configuration file.
+         """
+         self.config_path = config_path
+         # Lists to hold instantiated repository objects
+         self.repositories: List[BaseRepository] = []
+         self.custom_repositories: List[CustomRepository] = []
+
+         # Map from entry string -> list of repositories that handle that entry
+         self.entry_map: Dict[str, List[BaseRepository]] = {}
+
+     def load(self):
+         """
+         Reads the config file from disk and populates repositories and entry_map.
+         """
+         if not os.path.isfile(self.config_path):
+             raise FileNotFoundError(f"Config file not found: {self.config_path}")
+
+         logger.debug(f"Loading repository configuration from {self.config_path}")
+         print(f"Loading repository configuration from {self.config_path}")
+         with open(self.config_path, "r", encoding="utf-8") as f:
+             data = json.load(f)
+
+         self.from_dict(data)
+
+     def from_dict(self, data: Dict[str, Any]):
+         """
+         Populates repositories/customs from a dictionary, building self.entry_map as well.
+
+         :param data: A dict corresponding to the structure of `repository.json`.
+         """
+         # Clear existing repos
+         self.repositories.clear()
+         self.custom_repositories.clear()
+         self.entry_map.clear()
+
+         # Load standard repositories
+         repos_config = data.get("repositories", [])
+         for repo_dict in repos_config:
+             repo_obj = self._create_repository(repo_dict)
+             if repo_obj is not None:
+                 self.repositories.append(repo_obj)
+                 # If there's an "entry", register it in entry_map
+                 entry_name = repo_dict.get("entry")
+                 if entry_name:
+                     self.entry_map.setdefault(entry_name, []).append(repo_obj)
+
+         # Load custom repositories
+         custom_config = data.get("customs", [])
+         for custom_dict in custom_config:
+             custom_obj = self._create_custom_repository(custom_dict)
+             if custom_obj is not None:
+                 self.custom_repositories.append(custom_obj)
+                 entry_name = custom_dict.get("entry")
+                 if entry_name:
+                     self.entry_map.setdefault(entry_name, []).append(custom_obj)
+         logger.info(
+             f"Loaded {len(self.repositories)} standard repositories, "
+             f"{len(self.custom_repositories)} custom repositories, "
+             f"with {len(self.entry_map)} distinct entries."
+         )
+
+     def _create_repository(self, config: Dict[str, Any]) -> Optional[BaseRepository]:
+         """
+         Internal helper to instantiate a standard repository based on 'type'.
+         """
+         repo_type = config.get("type")
+         entry = config.get("entry", None)  # optional entry prefix
+
+         if repo_type == "primary":
+             base_url = config.get("base_url")
+             cache_dir = config.get("cache_dir")
+             if base_url and cache_dir:
+                 return PrimaryRepository(
+                     base_url=base_url,
+                     cache_dir=cache_dir,
+                     entry=entry,  # pass to the constructor
+                 )
+             else:
+                 logger.warning(
+                     "Invalid 'primary' repo config; missing base_url or cache_dir."
+                 )
+                 return None
+
+         elif repo_type == "secondary":
+             tar_base_url = config.get("tar_base_url")
+             json_base_url = config.get("json_base_url")
+             cache_dir = config.get("cache_dir")
+             chunk_size = config.get("chunk_size", 1000)
+             if tar_base_url and json_base_url and cache_dir:
+                 return SecondaryRepository(
+                     tar_base_url=tar_base_url,
+                     json_base_url=json_base_url,
+                     cache_dir=cache_dir,
+                     chunk_size=chunk_size,
+                     entry=entry,
+                 )
+             else:
+                 logger.warning(
+                     "Invalid 'secondary' repo config; missing tar_base_url/json_base_url/cache_dir."
+                 )
+                 return None
+
+         else:
+             logger.warning(
+                 f"Repository type '{repo_type}' is not recognized or not supported."
+             )
+             return None
+
+     def _create_custom_repository(
+         self, config: Dict[str, Any]
+     ) -> Optional[CustomRepository]:
+         """
+         Internal helper to instantiate a custom repository.
+         """
+         repo_type = config.get("type")
+         entry = config.get("entry", None)
+
+         if repo_type == "custom":
+             base_url = config.get("base_url")
+             cache_dir = config.get("cache_dir")
+             if base_url and cache_dir:
+                 return CustomRepository(
+                     base_url=base_url, cache_dir=cache_dir, entry=entry
+                 )
+             else:
+                 logger.warning(
+                     "Invalid 'custom' repo config; missing base_url or cache_dir."
+                 )
+                 return None
+
+         else:
+             logger.warning(
+                 f"Custom repository type '{repo_type}' is not recognized or not supported."
+             )
+             return None
+
+     def to_dict(self) -> Dict[str, Any]:
+         """
+         Reconstructs the config dictionary from the current repository objects.
+         """
+         return {
+             "repositories": [self._repo_to_dict(repo) for repo in self.repositories],
+             "customs": [
+                 self._custom_repo_to_dict(crepo) for crepo in self.custom_repositories
+             ],
+         }
+
+     def _repo_to_dict(self, repo: BaseRepository) -> Dict[str, Any]:
+         """
+         Rebuilds the config dict for a standard repository from its attributes.
+         """
+         # Each repository is expected to carry an .entry attribute.
+         entry_val = getattr(repo, "entry", None)
+
+         if isinstance(repo, PrimaryRepository):
+             return {
+                 "type": "primary",
+                 "base_url": repo.base_url,
+                 "cache_dir": repo.cache_dir,
+                 "entry": entry_val,
+             }
+         elif isinstance(repo, SecondaryRepository):
+             return {
+                 "type": "secondary",
+                 "tar_base_url": repo.tar_base_url,
+                 "json_base_url": repo.json_base_url,
+                 "cache_dir": repo.cache_dir,
+                 "chunk_size": repo.chunk_size,
+                 "entry": entry_val,
+             }
+         else:
+             return {"type": "unknown", "entry": entry_val}
+
+     def _custom_repo_to_dict(self, repo: CustomRepository) -> Dict[str, Any]:
+         """
+         Rebuilds the config dict for a CustomRepository from its attributes.
+         """
+         return {
+             "type": "custom",
+             "base_url": repo.base_url,
+             "cache_dir": repo.cache_dir,
+             "entry": getattr(repo, "entry", None),
+         }
+
+     def save(self, path: Optional[str] = None):
+         """
+         Saves the current config (based on the instantiated repo objects) back to a JSON file.
+         :param path: Optional; if None, uses self.config_path.
+         """
+         if path is None:
+             path = self.config_path
+
+         data = self.to_dict()
+         with open(path, "w", encoding="utf-8") as f:
+             json.dump(data, f, indent=4)
+         logger.info(f"Repository configuration saved to {path}")
+
+     def get_repositories_for_entry(self, entry: str) -> List[Union[BaseRepository, CustomRepository]]:
+         """
+         Retrieves the list of repositories (both standard and custom) that are mapped to a given entry prefix.
+         """
+         return self.entry_map.get(entry, [])
+
+     def search_entry_and_key(self, entry: str, key: str) -> Optional[BytesIO]:
+         """
+         Searches the repositories registered for `entry` and downloads the file for `key`.
+         Returns a BytesIO with the file content, or None if it is not found.
+         """
+         repositories = self.get_repositories_for_entry(entry)
+         if not repositories:
+             logger.warning(f"No repositories found for entry: {entry}")
+             return None
+         base_repos = BaseRepositoryPool(repositories)
+         result = base_repos.download_by_id(key)
+         if result:
+             return result
+         return None
+
+
+ #####################################################
+ class RepositoryPool(Protocol):
+     """
+     A Protocol for a set of repositories that can be searched for a given image ID.
+     Implementations must provide a download_by_id method.
+     """
+     def download_by_id(self, image_id: int) -> Optional[BytesIO]: ...
+
+
+ class BaseRepositoryPool(RepositoryPool):
+     """
+     A pool of BaseRepository objects, allowing for a unified download_by_id method.
+     """
+
+     def __init__(self, repositories: List[BaseRepository]):
+         self.repositories = repositories
+
+     def download_by_id(self, image_id: int) -> Optional[BytesIO]:
+         session = get_hf_session()
+         info = None
+         for repo in self.repositories:
+             info = repo.find_image(session, image_id)
+             logger.debug(f"Searching for image {image_id} in {repo}, result: {info}")
+             if info:
+                 break
+         if not info:
+             msg = f"Image ID {image_id} was not found in any repository. (Base)"
+             logger.info(msg)
+             return None
+         tar_url, start_offset, end_offset, _ = info
+         file_bytes = download_range(session, tar_url, start_offset, end_offset)
+         logger.debug(f"Successfully downloaded image {image_id} from {tar_url}")
+         return BytesIO(file_bytes)
+
+
+ #####################################################
+ # Universal Read Function
+ #####################################################
+ REPOSITORY_CONFIG: RepositoryConfig = RepositoryConfig("repository.json")
+ REPOSITORY_CONFIG.load()
+
+ def read_general(path: str) -> Union[str, BytesIO]:
+     """
+     A universal read function:
+       - If the path starts with "s3://", uses the Ceph/petrel client to retrieve data.
+       - If the path uses an entry prefix such as "danbooru://", parses out the key and
+         downloads it from the repositories configured for that entry; returns a BytesIO.
+       - Otherwise, treats the path as local and returns it as-is, raising
+         FileNotFoundError if it does not exist.
+
+     :param path: The path or URI to read
+     :return: Either a local path string or an in-memory BytesIO
+     """
+     config = REPOSITORY_CONFIG
+     if path.startswith("s3://"):
+         init_ceph_client_if_needed()
+         logger.debug(f"Downloading from Ceph/petrel: {path}")
+         file_data = client.get(path)  # type: ignore
+         return BytesIO(file_data)
+     if "://" in path:
+         parts = path.split("://", 1)
+         entry = parts[0]
+         result = config.search_entry_and_key(entry, parts[1])
+         if result:
+             return result
+         raise FileNotFoundError(f"Image ID not found in any repository: {path}")
+     # A local path must exist on disk
+     if not os.path.exists(path):
+         raise FileNotFoundError(f"File not found: {path}")
+
+     # Otherwise, assume it's a normal local path
+     logger.debug(f"Returning local path: {path}")
+     return path
+
+
+ if __name__ == "__main__":
+     # Configure logging at the desired level
+     configure_logging("DEBUG")  # or "INFO", "WARNING", etc.
+
+     # Example usage:
+     # try:
+     #     data = read_general("danbooru://6706939")
+     #     if isinstance(data, BytesIO):
+     #         img = Image.open(data)
+     #         img.show()
+     # except FileNotFoundError as e:
+     #     logger.error(str(e))
+     # try:
+     #     data = read_general("danbooru://8884993")
+     #     if isinstance(data, BytesIO):
+     #         img = Image.open(data)
+     #         img.show()
+     # except FileNotFoundError as e:
+     #     logger.error(str(e))
+     #
+     try:
+         data = read_general("anime://fancaps/8183457")
+         if isinstance(data, BytesIO):
+             img = Image.open(data)
+             img.show()
+     except FileNotFoundError as e:
+         logger.error(str(e))
+     # Other usage examples:
+     # data2 = read_general("s3://bucket_name/path/to/object.jpg")
+     # data3 = read_general("some/local/path.jpg")
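
The module expects a repository.json in the working directory (loaded at import time). A hypothetical sketch of the structure that RepositoryConfig.from_dict() parses; the URLs, cache paths, and entry names below are placeholders, not the actual dataset endpoints:

import json

example_config = {
    "repositories": [
        {
            "type": "primary",
            "base_url": "https://example.com/primary-set",  # placeholder URL
            "cache_dir": "./cache/primary",
            "entry": "danbooru",
        },
        {
            "type": "secondary",
            "tar_base_url": "https://example.com/secondary-tars",   # placeholder
            "json_base_url": "https://example.com/secondary-index",  # placeholder
            "cache_dir": "./cache/secondary",
            "chunk_size": 1000,
            "entry": "danbooru",
        },
    ],
    "customs": [
        {
            "type": "custom",
            "base_url": "https://example.com/custom-set",  # placeholder
            "cache_dir": "./cache/custom",
            "entry": "anime",
        }
    ],
}

with open("repository.json", "w", encoding="utf-8") as f:
    json.dump(example_config, f, indent=4)

With such a file in place, read_general("danbooru://6706939") would consult both "danbooru" repositories in order, while read_general("anime://fancaps/8183457") would be routed to the custom repository.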
data/dataset.py ADDED
@@ -0,0 +1,271 @@
1
+ from abc import ABC, abstractmethod
2
+ import copy
3
+ import json
4
+ import logging
5
+ import os
6
+ from pathlib import Path
7
+ import random
8
+ from time import sleep
9
+ import traceback
10
+ import warnings
11
+ import pandas as pd
12
+ from tqdm import tqdm
13
+ import h5py
14
+ import torch.distributed as dist
15
+ from torch.utils.data import Dataset
16
+ import yaml
17
+ logger = logging.getLogger(__name__)
18
+
19
+
20
+ class DataBriefReportException(Exception):
21
+ def __init__(self, message=None):
22
+ self.message = message
23
+
24
+ def __str__(self):
25
+ return f"{self.__class__}: {self.message}"
26
+
27
+
28
+ class DataNoReportException(Exception):
29
+ def __init__(self, message=None):
30
+ self.message = message
31
+
32
+ def __str__(self):
33
+ return f"{self.__class__}: {self.message}"
34
+
35
+
36
+ class ItemProcessor(ABC):
37
+ @abstractmethod
38
+ def process_item(self, data_item, training_mode=False):
39
+ raise NotImplementedError
40
+ def is_huggingface_path(path: str) -> bool:
41
+ # Heuristic: Hugging Face dataset paths are in format "user/dataset"
42
+ # and not an existing local file or directory.
43
+ return ("/" in path and not os.path.exists(path) and not "booru" in path) or (os.path.exists(path) and os.path.isdir(path))
44
+
45
+ global_log_count = 0
46
+ def log_every_n(n, msg):
47
+ global global_log_count
48
+ if global_log_count % n == 0:
49
+ logger.warning(msg)
50
+ global_log_count += 1
51
+ class MyDataset(Dataset):
52
+ def __init__(self, config_path, item_processor: ItemProcessor, cache_on_disk=False):
53
+ logger.info(f"read dataset config from {config_path}")
54
+ with open(config_path, "r") as f:
55
+ self.config = yaml.load(f, Loader=yaml.FullLoader)
56
+ logger.info("DATASET CONFIG:")
57
+ logger.info(self.config)
58
+
59
+ self.cache_on_disk = cache_on_disk
60
+ if self.cache_on_disk:
61
+ cache_dir = self._get_cache_dir(config_path)
62
+ if int(os.environ["LOCAL_RANK"]) == 0:
63
+ local_rank = dist.get_rank()
64
+ print(f"Building cache on rank {local_rank}")
65
+ self._collect_annotations_and_save_to_cache(cache_dir)
66
+ dist.barrier()
67
+ ann, group_indice_range = self._load_annotations_from_cache(cache_dir)
68
+ else:
69
+ cache_dir = None
70
+ ann, group_indice_range = self._collect_annotations()
71
+
72
+ self.ann = ann
73
+ self.group_indices = {key: list(range(val[0], val[1])) for key, val in group_indice_range.items()}
74
+
75
+ logger.info(f"total length: {len(self)}")
76
+
77
+ self.item_processor = item_processor
78
+
79
+ def __len__(self):
80
+ return len(self.ann)
81
+
82
+ def _collect_annotations(self):
83
+ meta_type_to_caption_type = {
84
+ "image_text" : "prompt",
85
+ "image_nl_caption" : "sentence",
86
+ "image_alttext" : "alttext",
87
+ "default" : "prompt",
88
+ "super_high_quality_caption" : "super_high_quality_caption",
89
+ "image_tags" : "tags",
90
+ }
91
+ switchable_keys = ["prompt", "sentence", "alttext", "super_high_quality_caption", "tags"]
92
+ group_ann = {}
93
+ for meta in self.config["META"]:
94
+ meta_path, meta_type = meta["path"], meta.get("type", "default")
95
+ meta_key = meta_type_to_caption_type.get(meta_type, "prompt")
96
+ logger.info(f"Reading {meta_path} with type {meta_type} and key {meta_key}")
97
+ if is_huggingface_path(meta_path):
98
+ raise NotImplementedError("Hugging Face datasets are not supported in this minimal example.")
99
+ else:
100
+ meta_ext = os.path.splitext(meta_path)[-1]
101
+ if meta_ext == ".json":
102
+ # with open(meta_path) as f:
103
+ # meta_l = json.load(f)
104
+ with open(meta_path, 'r') as json_file:
105
+ f = json_file.read()
106
+ meta_l = json.loads(f)
107
+ elif meta_ext == ".jsonl":
108
+ meta_l = []
109
+ with open(meta_path) as f:
110
+ for i, line in tqdm(enumerate(f), desc=f"Reading {meta_path}"):
111
+ try:
112
+ read_result = json.loads(line)
113
+ if isinstance(read_result, dict):
114
+ for key in switchable_keys:
115
+ if key in read_result and meta_key != key:
116
+ read_result[meta_key] = read_result[key]
117
+ read_result.pop(key)
118
+ break
119
+ if read_result[meta_key].strip():
120
+ meta_l.append(read_result)
121
+ else:
122
+ logger.error(f"Empty prompt in {meta_path} line {i}, file: {meta_path}")
123
+ log_every_n(10000, f"line {i}: {read_result}")
124
+ else:
125
+ raise ValueError(f"Expected a dictionary, got {type(read_result)} for {meta_path} line {i}")
126
+ except json.decoder.JSONDecodeError as e:
127
+ logger.error(f"Error decoding the following jsonl line ({i}):\n{line.rstrip()}")
128
+ raise e
129
+ elif meta_ext == ".parquet":
130
+ meta_l = []
131
+ df = pd.read_parquet(meta_path) # Read the Parquet file into a DataFrame
132
+ pq_cols = meta.get("pq_cols", None)
133
+ if pq_cols is not None:
134
+ cols = pq_cols.split(",")
135
+ else:
136
+ cols = None
137
+ if cols:
138
+ if "index" not in cols:
139
+ raise ValueError(f"The 'index' column must be included in the 'pq_cols' list., in {meta_path}")
140
+ if not all([col in df.columns for col in cols]):
141
+ raise ValueError(f"Columns in 'pq_cols' must be present in the Parquet file., in {meta_path}")
142
+ for _, row in tqdm(df.iterrows(), total=len(df), desc=f"Reading {meta_path}"):
143
+ # Pull the 'index' column (whatever column indicates image index/id)
144
+ index_val = row["index"]
145
+
146
+ # For each *other* column in the row, if not None/NaN, use it as "prompt"
147
+ for col in df.columns:
148
+ if col == "index":
149
+ continue
150
+ if cols:
151
+ if col not in cols:
152
+ continue
153
+ # Skip if the value is None or NaN
154
+ if pd.notna(row[col]) and str(row[col]):
155
+ log_every_n(10000, f"{meta_key}: {row[col]}")
156
+ meta_l.append({
157
+ "image_path": f"danbooru://{index_val}" if not os.path.exists(index_val) and "://" not in str(index_val) else str(index_val),
158
+ meta_key: str(row[col]) # Cast to str in case it's not a string
159
+ })
160
+ else:
161
+ raise NotImplementedError(
162
+ f'Unknown meta file extension: "{meta_ext}". '
163
+ f"Currently, .json, .jsonl, .parquet (with index column + caption columns) are supported. "
164
+ "If you are using a supported format, please set the file extension so that the proper parsing "
165
+ "routine can be called."
166
+ )
167
+ logger.info(f"{meta_path}, type{meta_type}: len {len(meta_l)}")
168
+ if "ratio" in meta:
169
+ random.seed(0)
170
+ meta_l = random.sample(meta_l, int(len(meta_l) * meta["ratio"]))
171
+ logger.info(f"sample (ratio = {meta['ratio']}) {len(meta_l)} items")
172
+ if "root" in meta:
173
+ for item in meta_l:
174
+ for path_key in ["path", "image_url", "image", "image_path"]:
175
+ if path_key in item:
176
+ item[path_key] = os.path.join(meta["root"], item[path_key])
177
+ if meta_type not in group_ann:
178
+ group_ann[meta_type] = []
179
+ group_ann[meta_type] += meta_l
180
+
181
+ ann = sum(list(group_ann.values()), start=[])
182
+
183
+ group_indice_range = {}
184
+ start_pos = 0
185
+ for meta_type, meta_l in group_ann.items():
186
+ group_indice_range[meta_type] = [start_pos, start_pos + len(meta_l)]
187
+ start_pos = start_pos + len(meta_l)
188
+
189
+ return ann, group_indice_range
190
+
191
+ def _collect_annotations_and_save_to_cache(self, cache_dir):
192
+ if (Path(cache_dir) / "data.h5").exists() and (Path(cache_dir) / "ready").exists():
193
+ # off-the-shelf annotation cache exists
194
+ warnings.warn(
195
+ f"Use existing h5 data cache: {Path(cache_dir)}\n"
196
+ f"Note: if the actual data defined by the data config has changed since your last run, "
197
+ f"please delete the cache manually and re-run this experiment, or the data actually used "
198
+ f"will not be updated"
199
+ )
200
+ return
201
+
202
+ Path(cache_dir).mkdir(parents=True, exist_ok=True)
203
+ ann, group_indice_range = self._collect_annotations()
204
+
205
+ # when cache on disk, rank0 saves items to an h5 file
206
+ serialized_ann = [json.dumps(_) for _ in ann]
207
+ logger.info(f"start to build data cache to: {Path(cache_dir)}")
208
+ with h5py.File(Path(cache_dir) / "data.h5", "w") as file:
209
+ dt = h5py.vlen_dtype(str)
210
+ h5_ann = file.create_dataset("ann", (len(serialized_ann),), dtype=dt)
211
+ h5_ann[:] = serialized_ann
212
+ file.create_dataset("group_indice_range", data=json.dumps(group_indice_range))
213
+ with open(Path(cache_dir) / "ready", "w") as f:
214
+ f.write("ready")
215
+ logger.info(f"data cache built")
216
+
217
+ @staticmethod
218
+ def _get_cache_dir(config_path):
219
+ config_identifier = config_path
220
+ disallowed_chars = ["/", "\\", ".", "?", "!"]
221
+ for _ in disallowed_chars:
222
+ config_identifier = config_identifier.replace(_, "-")
223
+ cache_dir = f"./accessory_data_cache/{config_identifier}"
224
+ return cache_dir
225
+
226
+ @staticmethod
227
+ def _load_annotations_from_cache(cache_dir):
228
+ while not (Path(cache_dir) / "ready").exists():
229
+ # cache has not yet been completed by rank 0
230
+ assert int(os.environ["LOCAL_RANK"]) != 0
231
+ sleep(1)
232
+ cache_file = h5py.File(Path(cache_dir) / "data.h5", "r")
233
+ annotations = cache_file["ann"]
234
+ group_indice_range = json.loads(cache_file["group_indice_range"].asstr()[()])
235
+ return annotations, group_indice_range
236
+
237
+ def get_item_func(self, index):
238
+ data_item = self.ann[index]
239
+ if self.cache_on_disk:
240
+ data_item = json.loads(data_item)
241
+ else:
242
+ data_item = copy.deepcopy(data_item)
243
+
244
+ return self.item_processor.process_item(data_item, training_mode=True)
245
+
246
+ def __getitem__(self, index):
247
+ try:
248
+ return self.get_item_func(index)
249
+ except Exception as e:
250
+ if isinstance(e, DataNoReportException):
251
+ pass
252
+ elif isinstance(e, DataBriefReportException):
253
+ logger.info(e)
254
+ else:
255
+ logger.info(
256
+ f"Item {index} errored, annotation:\n"
257
+ f"{self.ann[index]}\n"
258
+ f"Error:\n"
259
+ f"{traceback.format_exc()}"
260
+ )
261
+ for group_name, indices_this_group in self.group_indices.items():
262
+ if indices_this_group[0] <= index <= indices_this_group[-1]:
263
+ if index == indices_this_group[0]:
264
+ new_index = indices_this_group[-1]
265
+ else:
266
+ new_index = index - 1
267
+ return self[new_index]
268
+ raise RuntimeError
269
+
270
+ def groups(self):
271
+ return list(self.group_indices.values())
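
For reference, a minimal sketch of a .parquet metadata file that the parquet branch above would accept. Only the "index" column is required by the loader; the "caption" and "tag_string" columns here are hypothetical caption columns:

    # Build a toy metadata parquet (column names besides "index" are assumptions).
    import pandas as pd

    df = pd.DataFrame(
        {
            "index": [1000001, 1000002],  # ids resolve to danbooru://<id> when not a local path
            "caption": ["a girl holding an umbrella", None],  # None/NaN cells are skipped
            "tag_string": ["1girl, umbrella, rain", "landscape, sky"],
        }
    )
    df.to_parquet("meta.parquet")

A matching data-config entry could then set "pq_cols" to "index,caption,tag_string" to restrict which caption columns are read.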
grad_norm.py ADDED
@@ -0,0 +1,60 @@
+ from typing import Dict
+
+ import fairscale.nn.model_parallel.initialize as fs_init
+ from fairscale.nn.model_parallel.layers import ColumnParallelLinear, ParallelEmbedding, RowParallelLinear
+ import torch
+ import torch.distributed as dist
+ import torch.nn as nn
+
+
+ def get_model_parallel_dim_dict(model: nn.Module) -> Dict[str, int]:
+     ret_dict = {}
+     for module_name, module in model.named_modules():
+
+         def param_fqn(param_name):
+             return param_name if module_name == "" else module_name + "." + param_name
+
+         if isinstance(module, ColumnParallelLinear):
+             ret_dict[param_fqn("weight")] = 0
+             if module.bias is not None:
+                 ret_dict[param_fqn("bias")] = 0
+         elif isinstance(module, RowParallelLinear):
+             ret_dict[param_fqn("weight")] = 1
+             if module.bias is not None:
+                 ret_dict[param_fqn("bias")] = -1
+         elif isinstance(module, ParallelEmbedding):
+             ret_dict[param_fqn("weight")] = 1
+         else:
+             for param_name, param in module.named_parameters(recurse=False):
+                 ret_dict[param_fqn(param_name)] = -1
+     return ret_dict
+
+
+ def calculate_l2_grad_norm(
+     model: nn.Module,
+     model_parallel_dim_dict: Dict[str, int],
+ ) -> float:
+     mp_norm_sq = torch.tensor(0.0, dtype=torch.float32, device="cuda")
+     non_mp_norm_sq = torch.tensor(0.0, dtype=torch.float32, device="cuda")
+
+     for name, param in model.named_parameters():
+         if param.grad is None:
+             continue
+         name = ".".join(x for x in name.split(".") if not x.startswith("_"))
+         assert name in model_parallel_dim_dict
+         if model_parallel_dim_dict[name] < 0:
+             non_mp_norm_sq += param.grad.norm(dtype=torch.float32) ** 2
+         else:
+             mp_norm_sq += param.grad.norm(dtype=torch.float32) ** 2
+
+     dist.all_reduce(mp_norm_sq)
+     dist.all_reduce(non_mp_norm_sq)
+     non_mp_norm_sq /= fs_init.get_model_parallel_world_size()
+
+     return (mp_norm_sq.item() + non_mp_norm_sq.item()) ** 0.5
+
+
+ def scale_grad(model: nn.Module, factor: float) -> None:
+     for param in model.parameters():
+         if param.grad is not None:
+             param.grad.mul_(factor)
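
calculate_l2_grad_norm divides the non-model-parallel term by the model-parallel world size because those gradients are replicated across model-parallel ranks, so the preceding all-reduce counts each of them that many times. A hedged sketch of how the helpers compose into global-norm gradient clipping (model, loss, optimizer, and the max_norm threshold are assumptions, not defined in this file):

    mp_dims = get_model_parallel_dim_dict(model)  # build once, after model creation

    loss.backward()
    grad_norm = calculate_l2_grad_norm(model, mp_dims)
    max_norm = 1.0  # assumed clipping threshold
    if grad_norm > max_norm:
        scale_grad(model, max_norm / grad_norm)  # in-place rescale of every grad
    optimizer.step()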
imgproc.py ADDED
@@ -0,0 +1,80 @@
+ import random
+
+ from PIL import Image
+ import numpy as np
+
+
+ def center_crop_arr(pil_image, image_size):
+     """
+     Center cropping implementation from ADM.
+     https://github.com/openai/guided-diffusion/blob/8fb3ad9197f16bbc40620447b2742e13458d2831/guided_diffusion/image_datasets.py#L126
+     """
+     while min(*pil_image.size) >= 2 * image_size:
+         pil_image = pil_image.resize(tuple(x // 2 for x in pil_image.size), resample=Image.BOX)
+
+     scale = image_size / min(*pil_image.size)
+     pil_image = pil_image.resize(tuple(round(x * scale) for x in pil_image.size), resample=Image.BICUBIC)
+
+     arr = np.array(pil_image)
+     crop_y = (arr.shape[0] - image_size) // 2
+     crop_x = (arr.shape[1] - image_size) // 2
+     return Image.fromarray(arr[crop_y : crop_y + image_size, crop_x : crop_x + image_size])
+
+
+ def center_crop(pil_image, crop_size):
+     while pil_image.size[0] >= 2 * crop_size[0] and pil_image.size[1] >= 2 * crop_size[1]:
+         pil_image = pil_image.resize(tuple(x // 2 for x in pil_image.size), resample=Image.BOX)
+
+     scale = max(crop_size[0] / pil_image.size[0], crop_size[1] / pil_image.size[1])
+     pil_image = pil_image.resize(tuple(round(x * scale) for x in pil_image.size), resample=Image.BICUBIC)
+
+     # crop_left = random.randint(0, pil_image.size[0] - crop_size[0])
+     # crop_upper = random.randint(0, pil_image.size[1] - crop_size[1])
+     crop_left = (pil_image.size[0] - crop_size[0]) // 2
+     crop_upper = (pil_image.size[1] - crop_size[1]) // 2
+     crop_right = crop_left + crop_size[0]
+     crop_lower = crop_upper + crop_size[1]
+     return pil_image.crop(box=(crop_left, crop_upper, crop_right, crop_lower))
+
+
+ def var_center_crop(pil_image, crop_size_list, random_top_k=4):
+     w, h = pil_image.size
+     rem_percent = [min(cw / w, ch / h) / max(cw / w, ch / h) for cw, ch in crop_size_list]
+     crop_size = random.choice(
+         sorted(((x, y) for x, y in zip(rem_percent, crop_size_list)), reverse=True)[:random_top_k]
+     )[1]
+     return center_crop(pil_image, crop_size)
+
+
+ def var_center_crop_128(pil_image, crop_size_list, random_top_k=4):
+     w, h = pil_image.size
+     rem_percent = [min(cw / w, ch / h) / max(cw / w, ch / h) for cw, ch in crop_size_list]
+     crop_size = random.choice(
+         sorted(((x, y) for x, y in zip(rem_percent, crop_size_list)), reverse=True)[:random_top_k]
+     )[1]
+     # note: the sampled crop_size is unused; crop to the largest multiple-of-128
+     # size that fits inside the original image instead
+     return center_crop(pil_image, ((w // 128) * 128, (h // 128) * 128))
+
+
+ def generate_crop_size_list(num_patches, patch_size, max_ratio=4.0):
+     assert max_ratio >= 1.0
+     crop_size_list = []
+     wp, hp = num_patches, 1
+     while wp > 0:
+         if max(wp, hp) / min(wp, hp) <= max_ratio:
+             if ((wp * patch_size) // 32) % 2 == 0 and ((hp * patch_size) // 32) % 2 == 0:
+                 crop_size_list.append((wp * patch_size, hp * patch_size))
+         if (hp + 1) * wp <= num_patches:
+             hp += 1
+         else:
+             wp -= 1
+     return crop_size_list
+
+
+ def to_rgb_if_rgba(img: Image.Image):
+     if img.mode.upper() == "RGBA":
+         rgb_img = Image.new("RGB", img.size, (255, 255, 255))
+         rgb_img.paste(img, mask=img.split()[3])  # 3 is the alpha channel
+         return rgb_img
+     elif img.mode.upper() == "P":
+         return img.convert("RGB")
+     else:
+         return img
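
A small usage sketch of the bucketing helpers: generate_crop_size_list enumerates (w, h) buckets whose patch count stays within num_patches and whose aspect ratio stays within max_ratio, and var_center_crop picks at random among the best-fitting buckets for a given image. The patch budget below is an illustrative assumption:

    from PIL import Image

    # all buckets with up to 64x64 patches of 16 px, i.e. up to 1024x1024 pixels
    crop_sizes = generate_crop_size_list(num_patches=64 * 64, patch_size=16)
    img = Image.new("RGB", (1920, 1080))
    print(var_center_crop(img, crop_sizes).size)  # one of the top-4 aspect-ratio matches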
models/__init__.py ADDED
@@ -0,0 +1 @@
+ from .model import NextDiT_2B_GQA_patch2_Adaln_Refiner, NextDiT_3B_GQA_patch2_Adaln_Refiner, NextDiT_4B_GQA_patch2_Adaln_Refiner, NextDiT_7B_GQA_patch2_Adaln_Refiner
models/components.py ADDED
@@ -0,0 +1,54 @@
+ import warnings
+
+ import torch
+ import torch.nn as nn
+
+ try:
+     from apex.normalization import FusedRMSNorm as RMSNorm
+ except ImportError:
+     warnings.warn("Cannot import apex RMSNorm, switching to the vanilla implementation")
+
+     class RMSNorm(torch.nn.Module):
+         def __init__(self, dim: int, eps: float = 1e-6):
+             """
+             Initialize the RMSNorm normalization layer.
+
+             Args:
+                 dim (int): The dimension of the input tensor.
+                 eps (float, optional): A small value added to the denominator for numerical stability. Default is 1e-6.
+
+             Attributes:
+                 eps (float): A small value added to the denominator for numerical stability.
+                 weight (nn.Parameter): Learnable scaling parameter.
+
+             """
+             super().__init__()
+             self.eps = eps
+             self.weight = nn.Parameter(torch.ones(dim))
+
+         def _norm(self, x):
+             """
+             Apply the RMSNorm normalization to the input tensor.
+
+             Args:
+                 x (torch.Tensor): The input tensor.
+
+             Returns:
+                 torch.Tensor: The normalized tensor.
+
+             """
+             return x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
+
+         def forward(self, x):
+             """
+             Forward pass through the RMSNorm layer.
+
+             Args:
+                 x (torch.Tensor): The input tensor.
+
+             Returns:
+                 torch.Tensor: The output tensor after applying RMSNorm.
+
+             """
+             output = self._norm(x.float()).type_as(x)
+             return output * self.weight
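
A quick numerical sanity check of the fallback (this assumes apex is absent, so RMSNorm above is the pure-PyTorch class): after normalization every token has unit root-mean-square, since weight is initialized to ones.

    import torch

    norm = RMSNorm(dim=8)
    x = torch.randn(2, 4, 8) * 5.0
    y = norm(x)
    print(y.pow(2).mean(-1))  # ~1.0 for every token position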
models/model.py ADDED
@@ -0,0 +1,930 @@
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
+ # All rights reserved.
+
+ # This source code is licensed under the license found in the
+ # LICENSE file in the root directory of this source tree.
+ # --------------------------------------------------------
+ # References:
+ # GLIDE: https://github.com/openai/glide-text2im
+ # MAE: https://github.com/facebookresearch/mae/blob/main/models_mae.py
+ # --------------------------------------------------------
+
+ import math
+ from typing import List, Optional, Tuple
+
+ from flash_attn import flash_attn_varlen_func
+ from flash_attn.bert_padding import index_first_axis, pad_input, unpad_input  # noqa
+ import torch
+ import torch.nn as nn
+ import torch.nn.functional as F
+
+ from .components import RMSNorm
+
+
+ def modulate(x, scale):
+     return x * (1 + scale.unsqueeze(1))
+
+
+ #############################################################################
+ #               Embedding Layers for Timesteps and Class Labels             #
+ #############################################################################
+
+
+ class TimestepEmbedder(nn.Module):
+     """
+     Embeds scalar timesteps into vector representations.
+     """
+
+     def __init__(self, hidden_size, frequency_embedding_size=256):
+         super().__init__()
+         self.mlp = nn.Sequential(
+             nn.Linear(
+                 frequency_embedding_size,
+                 hidden_size,
+                 bias=True,
+             ),
+             nn.SiLU(),
+             nn.Linear(
+                 hidden_size,
+                 hidden_size,
+                 bias=True,
+             ),
+         )
+         nn.init.normal_(self.mlp[0].weight, std=0.02)
+         nn.init.zeros_(self.mlp[0].bias)
+         nn.init.normal_(self.mlp[2].weight, std=0.02)
+         nn.init.zeros_(self.mlp[2].bias)
+
+         self.frequency_embedding_size = frequency_embedding_size
+
+     @staticmethod
+     def timestep_embedding(t, dim, max_period=10000):
+         """
+         Create sinusoidal timestep embeddings.
+         :param t: a 1-D Tensor of N indices, one per batch element.
+                   These may be fractional.
+         :param dim: the dimension of the output.
+         :param max_period: controls the minimum frequency of the embeddings.
+         :return: an (N, D) Tensor of positional embeddings.
+         """
+         # https://github.com/openai/glide-text2im/blob/main/glide_text2im/nn.py
+         half = dim // 2
+         freqs = torch.exp(-math.log(max_period) * torch.arange(start=0, end=half, dtype=torch.float32) / half).to(
+             device=t.device
+         )
+         args = t[:, None].float() * freqs[None]
+         embedding = torch.cat([torch.cos(args), torch.sin(args)], dim=-1)
+         if dim % 2:
+             embedding = torch.cat([embedding, torch.zeros_like(embedding[:, :1])], dim=-1)
+         return embedding
+
+     def forward(self, t):
+         t_freq = self.timestep_embedding(t, self.frequency_embedding_size)
+         t_emb = self.mlp(t_freq.to(self.mlp[0].weight.dtype))
+         return t_emb
+
+
+ #############################################################################
+ #                             Core NextDiT Model                            #
+ #############################################################################
+
+
+ class JointAttention(nn.Module):
+     """Multi-head attention module."""
+
+     def __init__(
+         self,
+         dim: int,
+         n_heads: int,
+         n_kv_heads: Optional[int],
+         qk_norm: bool,
+     ):
+         """
+         Initialize the Attention module.
+
+         Args:
+             dim (int): Number of input dimensions.
+             n_heads (int): Number of heads.
+             n_kv_heads (Optional[int]): Number of kv heads, if using GQA.
+
+         """
+         super().__init__()
+         self.n_kv_heads = n_heads if n_kv_heads is None else n_kv_heads
+         self.n_local_heads = n_heads
+         self.n_local_kv_heads = self.n_kv_heads
+         self.n_rep = self.n_local_heads // self.n_local_kv_heads
+         self.head_dim = dim // n_heads
+
+         self.qkv = nn.Linear(
+             dim,
+             (n_heads + self.n_kv_heads + self.n_kv_heads) * self.head_dim,
+             bias=False,
+         )
+         nn.init.xavier_uniform_(self.qkv.weight)
+
+         self.out = nn.Linear(
+             n_heads * self.head_dim,
+             dim,
+             bias=False,
+         )
+         nn.init.xavier_uniform_(self.out.weight)
+
+         if qk_norm:
+             self.q_norm = RMSNorm(self.head_dim)
+             self.k_norm = RMSNorm(self.head_dim)
+         else:
+             self.q_norm = self.k_norm = nn.Identity()
+
+     @staticmethod
+     def apply_rotary_emb(
+         x_in: torch.Tensor,
+         freqs_cis: torch.Tensor,
+     ) -> torch.Tensor:
+         """
+         Apply rotary embeddings to the input tensor using the given frequency
+         tensor.
+
+         This function applies rotary embeddings to the given query or key
+         tensor 'x_in' using the provided frequency tensor 'freqs_cis'. The
+         input tensor is reshaped as complex numbers, and the frequency tensor
+         is reshaped for broadcasting compatibility. The resulting tensor
+         contains rotary embeddings and is returned as a real tensor.
+
+         Args:
+             x_in (torch.Tensor): Query or key tensor to apply rotary embeddings.
+             freqs_cis (torch.Tensor): Precomputed frequency tensor for complex
+                 exponentials.
+
+         Returns:
+             torch.Tensor: The input tensor with rotary embeddings applied.
+         """
+         with torch.cuda.amp.autocast(enabled=False):
+             x = torch.view_as_complex(x_in.float().reshape(*x_in.shape[:-1], -1, 2))
+             freqs_cis = freqs_cis.unsqueeze(2)
+             x_out = torch.view_as_real(x * freqs_cis).flatten(3)
+             return x_out.type_as(x_in)
+
+     # copied from huggingface modeling_llama.py
+     def _upad_input(self, query_layer, key_layer, value_layer, attention_mask, query_length):
+         def _get_unpad_data(attention_mask):
+             seqlens_in_batch = attention_mask.sum(dim=-1, dtype=torch.int32)
+             indices = torch.nonzero(attention_mask.flatten(), as_tuple=False).flatten()
+             max_seqlen_in_batch = seqlens_in_batch.max().item()
+             cu_seqlens = F.pad(torch.cumsum(seqlens_in_batch, dim=0, dtype=torch.int32), (1, 0))
+             return (
+                 indices,
+                 cu_seqlens,
+                 max_seqlen_in_batch,
+             )
+
+         indices_k, cu_seqlens_k, max_seqlen_in_batch_k = _get_unpad_data(attention_mask)
+         batch_size, kv_seq_len, num_key_value_heads, head_dim = key_layer.shape
+
+         key_layer = index_first_axis(
+             key_layer.reshape(batch_size * kv_seq_len, num_key_value_heads, head_dim),
+             indices_k,
+         )
+         value_layer = index_first_axis(
+             value_layer.reshape(batch_size * kv_seq_len, num_key_value_heads, head_dim),
+             indices_k,
+         )
+         if query_length == kv_seq_len:
+             query_layer = index_first_axis(
+                 query_layer.reshape(batch_size * kv_seq_len, self.n_local_heads, head_dim),
+                 indices_k,
+             )
+             cu_seqlens_q = cu_seqlens_k
+             max_seqlen_in_batch_q = max_seqlen_in_batch_k
+             indices_q = indices_k
+         elif query_length == 1:
+             max_seqlen_in_batch_q = 1
+             cu_seqlens_q = torch.arange(
+                 batch_size + 1, dtype=torch.int32, device=query_layer.device
+             )  # There is a memcpy here, that is very bad.
+             indices_q = cu_seqlens_q[:-1]
+             query_layer = query_layer.squeeze(1)
+         else:
+             # The -q_len: slice assumes left padding.
+             attention_mask = attention_mask[:, -query_length:]
+             query_layer, indices_q, cu_seqlens_q, max_seqlen_in_batch_q = unpad_input(query_layer, attention_mask)
+
+         return (
+             query_layer,
+             key_layer,
+             value_layer,
+             indices_q,
+             (cu_seqlens_q, cu_seqlens_k),
+             (max_seqlen_in_batch_q, max_seqlen_in_batch_k),
+         )
+
+     def forward(
+         self,
+         x: torch.Tensor,
+         x_mask: torch.Tensor,
+         freqs_cis: torch.Tensor,
+     ) -> torch.Tensor:
+         """
+         Args:
+             x: Input token sequence of shape (batch, seq_len, dim).
+             x_mask: Boolean mask of valid positions, shape (batch, seq_len).
+             freqs_cis: Precomputed rotary frequencies.
+
+         Returns:
+             torch.Tensor: Attention output of shape (batch, seq_len, dim).
+         """
+         bsz, seqlen, _ = x.shape
+         dtype = x.dtype
+
+         xq, xk, xv = torch.split(
+             self.qkv(x),
+             [
+                 self.n_local_heads * self.head_dim,
+                 self.n_local_kv_heads * self.head_dim,
+                 self.n_local_kv_heads * self.head_dim,
+             ],
+             dim=-1,
+         )
+         xq = xq.view(bsz, seqlen, self.n_local_heads, self.head_dim)
+         xk = xk.view(bsz, seqlen, self.n_local_kv_heads, self.head_dim)
+         xv = xv.view(bsz, seqlen, self.n_local_kv_heads, self.head_dim)
+         xq = self.q_norm(xq)
+         xk = self.k_norm(xk)
+         xq = JointAttention.apply_rotary_emb(xq, freqs_cis=freqs_cis)
+         xk = JointAttention.apply_rotary_emb(xk, freqs_cis=freqs_cis)
+         xq, xk = xq.to(dtype), xk.to(dtype)
+
+         softmax_scale = math.sqrt(1 / self.head_dim)
+
+         if dtype in [torch.float16, torch.bfloat16]:
+             # begin var_len flash attn
+             (
+                 query_states,
+                 key_states,
+                 value_states,
+                 indices_q,
+                 cu_seq_lens,
+                 max_seq_lens,
+             ) = self._upad_input(xq, xk, xv, x_mask, seqlen)
+
+             cu_seqlens_q, cu_seqlens_k = cu_seq_lens
+             max_seqlen_in_batch_q, max_seqlen_in_batch_k = max_seq_lens
+
+             attn_output_unpad = flash_attn_varlen_func(
+                 query_states,
+                 key_states,
+                 value_states,
+                 cu_seqlens_q=cu_seqlens_q,
+                 cu_seqlens_k=cu_seqlens_k,
+                 max_seqlen_q=max_seqlen_in_batch_q,
+                 max_seqlen_k=max_seqlen_in_batch_k,
+                 dropout_p=0.0,
+                 causal=False,
+                 softmax_scale=softmax_scale,
+             )
+             output = pad_input(attn_output_unpad, indices_q, bsz, seqlen)
+             # end var_len_flash_attn
+
+         else:
+             n_rep = self.n_local_heads // self.n_local_kv_heads
+             if n_rep >= 1:
+                 xk = xk.unsqueeze(3).repeat(1, 1, 1, n_rep, 1).flatten(2, 3)
+                 xv = xv.unsqueeze(3).repeat(1, 1, 1, n_rep, 1).flatten(2, 3)
+             output = (
+                 F.scaled_dot_product_attention(
+                     xq.permute(0, 2, 1, 3),
+                     xk.permute(0, 2, 1, 3),
+                     xv.permute(0, 2, 1, 3),
+                     attn_mask=x_mask.bool().view(bsz, 1, 1, seqlen).expand(-1, self.n_local_heads, seqlen, -1),
+                     scale=softmax_scale,
+                 )
+                 .permute(0, 2, 1, 3)
+                 .to(dtype)
+             )
+
+         output = output.flatten(-2)
+
+         return self.out(output)
+
+
+ class FeedForward(nn.Module):
+     def __init__(
+         self,
+         dim: int,
+         hidden_dim: int,
+         multiple_of: int,
+         ffn_dim_multiplier: Optional[float],
+     ):
+         """
+         Initialize the FeedForward module.
+
+         Args:
+             dim (int): Input dimension.
+             hidden_dim (int): Hidden dimension of the feedforward layer.
+             multiple_of (int): Value to ensure hidden dimension is a multiple
+                 of this value.
+             ffn_dim_multiplier (float, optional): Custom multiplier for hidden
+                 dimension. Defaults to None.
+
+         """
+         super().__init__()
+         # custom dim factor multiplier
+         if ffn_dim_multiplier is not None:
+             hidden_dim = int(ffn_dim_multiplier * hidden_dim)
+         hidden_dim = multiple_of * ((hidden_dim + multiple_of - 1) // multiple_of)
+
+         self.w1 = nn.Linear(
+             dim,
+             hidden_dim,
+             bias=False,
+         )
+         nn.init.xavier_uniform_(self.w1.weight)
+         self.w2 = nn.Linear(
+             hidden_dim,
+             dim,
+             bias=False,
+         )
+         nn.init.xavier_uniform_(self.w2.weight)
+         self.w3 = nn.Linear(
+             dim,
+             hidden_dim,
+             bias=False,
+         )
+         nn.init.xavier_uniform_(self.w3.weight)
+
+     # @torch.compile
+     def _forward_silu_gating(self, x1, x3):
+         return F.silu(x1) * x3
+
+     def forward(self, x):
+         return self.w2(self._forward_silu_gating(self.w1(x), self.w3(x)))
+
+
+ class JointTransformerBlock(nn.Module):
+     def __init__(
+         self,
+         layer_id: int,
+         dim: int,
+         n_heads: int,
+         n_kv_heads: int,
+         multiple_of: int,
+         ffn_dim_multiplier: float,
+         norm_eps: float,
+         qk_norm: bool,
+         modulation=True,
+     ) -> None:
+         """
+         Initialize a TransformerBlock.
+
+         Args:
+             layer_id (int): Identifier for the layer.
+             dim (int): Embedding dimension of the input features.
+             n_heads (int): Number of attention heads.
+             n_kv_heads (Optional[int]): Number of attention heads in key and
+                 value features (if using GQA), or set to None for the same as
+                 query.
+             multiple_of (int): Rounding unit for the feedforward hidden dimension.
+             ffn_dim_multiplier (float): Custom multiplier for the feedforward
+                 hidden dimension.
+             norm_eps (float): Epsilon used by the RMSNorm layers.
+
+         """
+         super().__init__()
+         self.dim = dim
+         self.head_dim = dim // n_heads
+         self.attention = JointAttention(dim, n_heads, n_kv_heads, qk_norm)
+         self.feed_forward = FeedForward(
+             dim=dim,
+             hidden_dim=4 * dim,
+             multiple_of=multiple_of,
+             ffn_dim_multiplier=ffn_dim_multiplier,
+         )
+         self.layer_id = layer_id
+         self.attention_norm1 = RMSNorm(dim, eps=norm_eps)
+         self.ffn_norm1 = RMSNorm(dim, eps=norm_eps)
+
+         self.attention_norm2 = RMSNorm(dim, eps=norm_eps)
+         self.ffn_norm2 = RMSNorm(dim, eps=norm_eps)
+
+         self.modulation = modulation
+         if modulation:
+             self.adaLN_modulation = nn.Sequential(
+                 nn.SiLU(),
+                 nn.Linear(
+                     min(dim, 1024),
+                     4 * dim,
+                     bias=True,
+                 ),
+             )
+             nn.init.zeros_(self.adaLN_modulation[1].weight)
+             nn.init.zeros_(self.adaLN_modulation[1].bias)
+
+     def forward(
+         self,
+         x: torch.Tensor,
+         x_mask: torch.Tensor,
+         freqs_cis: torch.Tensor,
+         adaln_input: Optional[torch.Tensor] = None,
+     ):
+         """
+         Perform a forward pass through the TransformerBlock.
+
+         Args:
+             x (torch.Tensor): Input tensor.
+             freqs_cis (torch.Tensor): Precomputed cosine and sine frequencies.
+
+         Returns:
+             torch.Tensor: Output tensor after applying attention and
+             feedforward layers.
+
+         """
+         if self.modulation:
+             assert adaln_input is not None
+             scale_msa, gate_msa, scale_mlp, gate_mlp = self.adaLN_modulation(adaln_input).chunk(4, dim=1)
+
+             x = x + gate_msa.unsqueeze(1).tanh() * self.attention_norm2(
+                 self.attention(
+                     modulate(self.attention_norm1(x), scale_msa),
+                     x_mask,
+                     freqs_cis,
+                 )
+             )
+             x = x + gate_mlp.unsqueeze(1).tanh() * self.ffn_norm2(
+                 self.feed_forward(
+                     modulate(self.ffn_norm1(x), scale_mlp),
+                 )
+             )
+         else:
+             assert adaln_input is None
+             x = x + self.attention_norm2(
+                 self.attention(
+                     self.attention_norm1(x),
+                     x_mask,
+                     freqs_cis,
+                 )
+             )
+             x = x + self.ffn_norm2(
+                 self.feed_forward(
+                     self.ffn_norm1(x),
+                 )
+             )
+         return x
+
+
+ class FinalLayer(nn.Module):
+     """
+     The final layer of NextDiT.
+     """
+
+     def __init__(self, hidden_size, patch_size, out_channels):
+         super().__init__()
+         self.norm_final = nn.LayerNorm(
+             hidden_size,
+             elementwise_affine=False,
+             eps=1e-6,
+         )
+         self.linear = nn.Linear(
+             hidden_size,
+             patch_size * patch_size * out_channels,
+             bias=True,
+         )
+         nn.init.zeros_(self.linear.weight)
+         nn.init.zeros_(self.linear.bias)
+
+         self.adaLN_modulation = nn.Sequential(
+             nn.SiLU(),
+             nn.Linear(
+                 min(hidden_size, 1024),
+                 hidden_size,
+                 bias=True,
+             ),
+         )
+         nn.init.zeros_(self.adaLN_modulation[1].weight)
+         nn.init.zeros_(self.adaLN_modulation[1].bias)
+
+     def forward(self, x, c):
+         scale = self.adaLN_modulation(c)
+         x = modulate(self.norm_final(x), scale)
+         x = self.linear(x)
+         return x
+
+
+ class RopeEmbedder:
+     def __init__(
+         self, theta: float = 10000.0, axes_dims: List[int] = (16, 56, 56), axes_lens: List[int] = (1, 512, 512)
+     ):
+         super().__init__()
+         self.theta = theta
+         self.axes_dims = axes_dims
+         self.axes_lens = axes_lens
+         self.freqs_cis = NextDiT.precompute_freqs_cis(self.axes_dims, self.axes_lens, theta=self.theta)
+
+     def __call__(self, ids: torch.Tensor):
+         self.freqs_cis = [freqs_cis.to(ids.device) for freqs_cis in self.freqs_cis]
+         result = []
+         for i in range(len(self.axes_dims)):
+             index = ids[:, :, i : i + 1].repeat(1, 1, self.freqs_cis[i].shape[-1]).to(torch.int64)
+             result.append(torch.gather(self.freqs_cis[i].unsqueeze(0).repeat(index.shape[0], 1, 1), dim=1, index=index))
+         return torch.cat(result, dim=-1)
+
+
+ class NextDiT(nn.Module):
+     """
+     Diffusion model with a Transformer backbone.
+     """
+
+     def __init__(
+         self,
+         patch_size: int = 2,
+         in_channels: int = 4,
+         dim: int = 4096,
+         n_layers: int = 32,
+         n_refiner_layers: int = 2,
+         n_heads: int = 32,
+         n_kv_heads: Optional[int] = None,
+         multiple_of: int = 256,
+         ffn_dim_multiplier: Optional[float] = None,
+         norm_eps: float = 1e-5,
+         qk_norm: bool = False,
+         cap_feat_dim: int = 5120,
+         axes_dims: List[int] = (16, 56, 56),
+         axes_lens: List[int] = (1, 512, 512),
+     ) -> None:
+         super().__init__()
+         self.in_channels = in_channels
+         self.out_channels = in_channels
+         self.patch_size = patch_size
+
+         self.x_embedder = nn.Linear(
+             in_features=patch_size * patch_size * in_channels,
+             out_features=dim,
+             bias=True,
+         )
+         nn.init.xavier_uniform_(self.x_embedder.weight)
+         nn.init.constant_(self.x_embedder.bias, 0.0)
+
+         self.noise_refiner = nn.ModuleList(
+             [
+                 JointTransformerBlock(
+                     layer_id,
+                     dim,
+                     n_heads,
+                     n_kv_heads,
+                     multiple_of,
+                     ffn_dim_multiplier,
+                     norm_eps,
+                     qk_norm,
+                     modulation=True,
+                 )
+                 for layer_id in range(n_refiner_layers)
+             ]
+         )
+         self.context_refiner = nn.ModuleList(
+             [
+                 JointTransformerBlock(
+                     layer_id,
+                     dim,
+                     n_heads,
+                     n_kv_heads,
+                     multiple_of,
+                     ffn_dim_multiplier,
+                     norm_eps,
+                     qk_norm,
+                     modulation=False,
+                 )
+                 for layer_id in range(n_refiner_layers)
+             ]
+         )
+
+         self.t_embedder = TimestepEmbedder(min(dim, 1024))
+         self.cap_embedder = nn.Sequential(
+             RMSNorm(cap_feat_dim, eps=norm_eps),
+             nn.Linear(
+                 cap_feat_dim,
+                 dim,
+                 bias=True,
+             ),
+         )
+         nn.init.trunc_normal_(self.cap_embedder[1].weight, std=0.02)
+         # nn.init.zeros_(self.cap_embedder[1].weight)
+         nn.init.zeros_(self.cap_embedder[1].bias)
+
+         self.layers = nn.ModuleList(
+             [
+                 JointTransformerBlock(
+                     layer_id,
+                     dim,
+                     n_heads,
+                     n_kv_heads,
+                     multiple_of,
+                     ffn_dim_multiplier,
+                     norm_eps,
+                     qk_norm,
+                 )
+                 for layer_id in range(n_layers)
+             ]
+         )
+         self.norm_final = RMSNorm(dim, eps=norm_eps)
+         self.final_layer = FinalLayer(dim, patch_size, self.out_channels)
+
+         assert (dim // n_heads) == sum(axes_dims)
+         self.axes_dims = axes_dims
+         self.axes_lens = axes_lens
+         self.rope_embedder = RopeEmbedder(axes_dims=axes_dims, axes_lens=axes_lens)
+         self.dim = dim
+         self.n_heads = n_heads
+
+     def unpatchify(
+         self, x: torch.Tensor, img_size: List[Tuple[int, int]], cap_size: List[int], return_tensor=False
+     ) -> List[torch.Tensor]:
+         """
+         x: (N, T, patch_size**2 * C)
+         imgs: (N, H, W, C)
+         """
+         pH = pW = self.patch_size
+         imgs = []
+         for i in range(x.size(0)):
+             H, W = img_size[i]
+             begin = cap_size[i]
+             end = begin + (H // pH) * (W // pW)
+             imgs.append(
+                 x[i][begin:end]
+                 .view(H // pH, W // pW, pH, pW, self.out_channels)
+                 .permute(4, 0, 2, 1, 3)
+                 .flatten(3, 4)
+                 .flatten(1, 2)
+             )
+
+         if return_tensor:
+             imgs = torch.stack(imgs, dim=0)
+         return imgs
+
+     def patchify_and_embed(
+         self, x: List[torch.Tensor] | torch.Tensor, cap_feats: torch.Tensor, cap_mask: torch.Tensor, t: torch.Tensor
+     ) -> Tuple[torch.Tensor, torch.Tensor, List[Tuple[int, int]], List[int], torch.Tensor]:
+         bsz = len(x)
+         pH = pW = self.patch_size
+         device = x[0].device
+
+         l_effective_cap_len = cap_mask.sum(dim=1).tolist()
+         img_sizes = [(img.size(1), img.size(2)) for img in x]
+         l_effective_img_len = [(H // pH) * (W // pW) for (H, W) in img_sizes]
+
+         max_seq_len = max(
+             (cap_len + img_len for cap_len, img_len in zip(l_effective_cap_len, l_effective_img_len))
+         )
+         max_cap_len = max(l_effective_cap_len)
+         max_img_len = max(l_effective_img_len)
+
+         position_ids = torch.zeros(bsz, max_seq_len, 3, dtype=torch.int32, device=device)
+
+         for i in range(bsz):
+             cap_len = l_effective_cap_len[i]
+             img_len = l_effective_img_len[i]
+             H, W = img_sizes[i]
+             H_tokens, W_tokens = H // pH, W // pW
+             assert H_tokens * W_tokens == img_len
+
+             position_ids[i, :cap_len, 0] = torch.arange(cap_len, dtype=torch.int32, device=device)
+             position_ids[i, cap_len : cap_len + img_len, 0] = cap_len
+             row_ids = torch.arange(H_tokens, dtype=torch.int32, device=device).view(-1, 1).repeat(1, W_tokens).flatten()
+             col_ids = torch.arange(W_tokens, dtype=torch.int32, device=device).view(1, -1).repeat(H_tokens, 1).flatten()
+             position_ids[i, cap_len : cap_len + img_len, 1] = row_ids
+             position_ids[i, cap_len : cap_len + img_len, 2] = col_ids
+
+         freqs_cis = self.rope_embedder(position_ids)
+
+         # build freqs_cis for cap and image individually
+         cap_freqs_cis_shape = list(freqs_cis.shape)
+         # cap_freqs_cis_shape[1] = max_cap_len
+         cap_freqs_cis_shape[1] = cap_feats.shape[1]
+         cap_freqs_cis = torch.zeros(*cap_freqs_cis_shape, device=device, dtype=freqs_cis.dtype)
+
+         img_freqs_cis_shape = list(freqs_cis.shape)
+         img_freqs_cis_shape[1] = max_img_len
+         img_freqs_cis = torch.zeros(*img_freqs_cis_shape, device=device, dtype=freqs_cis.dtype)
+
+         for i in range(bsz):
+             cap_len = l_effective_cap_len[i]
+             img_len = l_effective_img_len[i]
+             cap_freqs_cis[i, :cap_len] = freqs_cis[i, :cap_len]
+             img_freqs_cis[i, :img_len] = freqs_cis[i, cap_len : cap_len + img_len]
+
+         # refine context
+         for layer in self.context_refiner:
+             cap_feats = layer(cap_feats, cap_mask, cap_freqs_cis)
+
+         # refine image
+         flat_x = []
+         for i in range(bsz):
+             img = x[i]
+             C, H, W = img.size()
+             img = img.view(C, H // pH, pH, W // pW, pW).permute(1, 3, 2, 4, 0).flatten(2).flatten(0, 1)
+             flat_x.append(img)
+         x = flat_x
+         padded_img_embed = torch.zeros(bsz, max_img_len, x[0].shape[-1], device=device, dtype=x[0].dtype)
+         padded_img_mask = torch.zeros(bsz, max_img_len, dtype=torch.bool, device=device)
+         for i in range(bsz):
+             padded_img_embed[i, : l_effective_img_len[i]] = x[i]
+             padded_img_mask[i, : l_effective_img_len[i]] = True
+
+         padded_img_embed = self.x_embedder(padded_img_embed)
+         for layer in self.noise_refiner:
+             padded_img_embed = layer(padded_img_embed, padded_img_mask, img_freqs_cis, t)
+
+         mask = torch.zeros(bsz, max_seq_len, dtype=torch.bool, device=device)
+         padded_full_embed = torch.zeros(bsz, max_seq_len, self.dim, device=device, dtype=x[0].dtype)
+         for i in range(bsz):
+             cap_len = l_effective_cap_len[i]
+             img_len = l_effective_img_len[i]
+
+             mask[i, : cap_len + img_len] = True
+             padded_full_embed[i, :cap_len] = cap_feats[i, :cap_len]
+             padded_full_embed[i, cap_len : cap_len + img_len] = padded_img_embed[i, :img_len]
+
+         return padded_full_embed, mask, img_sizes, l_effective_cap_len, freqs_cis
+
+     def forward(self, x, t, cap_feats, cap_mask):
+         """
+         Forward pass of NextDiT.
+         t: (N,) tensor of diffusion timesteps
+         cap_feats: (N, L, D) tensor of caption features
+         cap_mask: (N, L) mask of valid caption tokens
+         """
+         t = self.t_embedder(t)  # (N, D)
+         adaln_input = t
+
+         cap_feats = self.cap_embedder(cap_feats)  # (N, L, D)  # todo check if able to batchify w.o. redundant compute
+
+         x_is_tensor = isinstance(x, torch.Tensor)
+         x, mask, img_size, cap_size, freqs_cis = self.patchify_and_embed(x, cap_feats, cap_mask, t)
+         freqs_cis = freqs_cis.to(x.device)
+
+         for layer in self.layers:
+             x = layer(x, mask, freqs_cis, adaln_input)
+
+         x = self.final_layer(x, adaln_input)
+         x = self.unpatchify(x, img_size, cap_size, return_tensor=x_is_tensor)
+
+         return x
+
+     def forward_with_cfg(
+         self,
+         x,
+         t,
+         cap_feats,
+         cap_mask,
+         cfg_scale,
+         cfg_trunc=100,
+         renorm_cfg=1,
+     ):
+         """
+         Forward pass of NextDiT, but also batches the unconditional forward pass
+         for classifier-free guidance.
+         """
+         # https://github.com/openai/glide-text2im/blob/main/notebooks/text2im.ipynb
+         half = x[: len(x) // 2]
+         if t[0] < cfg_trunc:
+             combined = torch.cat([half, half], dim=0)
+             model_out = self.forward(combined, t, cap_feats, cap_mask)
+             # Guidance is applied to the first `in_channels` channels; any extra
+             # channels in the model output are passed through unchanged.
+             eps, rest = model_out[:, : self.in_channels], model_out[:, self.in_channels :]
+             cond_eps, uncond_eps = torch.split(eps, len(eps) // 2, dim=0)
+             half_eps = uncond_eps + cfg_scale * (cond_eps - uncond_eps)
+             if float(renorm_cfg) > 0.0:
+                 # optionally cap the norm of the guided prediction at
+                 # `renorm_cfg` times the norm of the conditional prediction
+                 ori_pos_norm = torch.linalg.vector_norm(
+                     cond_eps, dim=tuple(range(1, len(cond_eps.shape))), keepdim=True
+                 )
+                 max_new_norm = ori_pos_norm * float(renorm_cfg)
+                 new_pos_norm = torch.linalg.vector_norm(
+                     half_eps, dim=tuple(range(1, len(half_eps.shape))), keepdim=True
+                 )
+                 if new_pos_norm >= max_new_norm:
+                     half_eps = half_eps * (max_new_norm / new_pos_norm)
+         else:
+             combined = half
+             model_out = self.forward(combined, t[: len(x) // 2], cap_feats[: len(x) // 2], cap_mask[: len(x) // 2])
+             eps, rest = model_out[:, : self.in_channels], model_out[:, self.in_channels :]
+             half_eps = eps
+
+         output = torch.cat([half_eps, half_eps], dim=0)
+         return output
+
+     @staticmethod
+     def precompute_freqs_cis(
+         dim: List[int],
+         end: List[int],
+         theta: float = 10000.0,
+     ):
+         """
+         Precompute the frequency tensors for complex exponentials (cis) with
+         the given dimensions.
+
+         This function calculates frequency tensors with complex exponentials
+         using the given dimension list 'dim' and end indices 'end'. The 'theta'
+         parameter scales the frequencies. The returned tensors contain complex
+         values in complex64 data type.
+
+         Args:
+             dim (list): Dimension of each frequency tensor.
+             end (list): End index for precomputing each set of frequencies.
+             theta (float, optional): Scaling factor for frequency computation.
+                 Defaults to 10000.0.
+
+         Returns:
+             List[torch.Tensor]: Precomputed frequency tensors with complex
+             exponentials, one per axis.
+         """
+         freqs_cis = []
+         for i, (d, e) in enumerate(zip(dim, end)):
+             freqs = 1.0 / (theta ** (torch.arange(0, d, 2, dtype=torch.float64, device="cpu") / d))
+             timestep = torch.arange(e, device=freqs.device, dtype=torch.float64)
+             freqs = torch.outer(timestep, freqs).float()
+             freqs_cis_i = torch.polar(torch.ones_like(freqs), freqs).to(torch.complex64)  # complex64
+             freqs_cis.append(freqs_cis_i)
+
+         return freqs_cis
+
+     def parameter_count(self) -> int:
+         total_params = 0
+
+         def _recursive_count_params(module):
+             nonlocal total_params
+             for param in module.parameters(recurse=False):
+                 total_params += param.numel()
+             for submodule in module.children():
+                 _recursive_count_params(submodule)
+
+         _recursive_count_params(self)
+         return total_params
+
+     def get_fsdp_wrap_module_list(self) -> List[nn.Module]:
+         return list(self.layers)
+
+     def get_checkpointing_wrap_module_list(self) -> List[nn.Module]:
+         return list(self.layers)
+
+
+ #############################################################################
+ #                               NextDiT Configs                             #
+ #############################################################################
+
+
+ def NextDiT_2B_GQA_patch2_Adaln_Refiner(**kwargs):
+     return NextDiT(
+         patch_size=2,
+         dim=2304,
+         n_layers=26,
+         n_heads=24,
+         n_kv_heads=8,
+         axes_dims=[32, 32, 32],
+         axes_lens=[300, 512, 512],
+         **kwargs,
+     )
+
+
+ def NextDiT_3B_GQA_patch2_Adaln_Refiner(**kwargs):
+     return NextDiT(
+         patch_size=2,
+         dim=2592,
+         n_layers=30,
+         n_heads=24,
+         n_kv_heads=8,
+         axes_dims=[36, 36, 36],
+         axes_lens=[300, 512, 512],
+         **kwargs,
+     )
+
+
+ def NextDiT_4B_GQA_patch2_Adaln_Refiner(**kwargs):
+     return NextDiT(
+         patch_size=2,
+         dim=2880,
+         n_layers=32,
+         n_heads=24,
+         n_kv_heads=8,
+         axes_dims=[40, 40, 40],
+         axes_lens=[300, 512, 512],
+         **kwargs,
+     )
+
+
+ def NextDiT_7B_GQA_patch2_Adaln_Refiner(**kwargs):
+     return NextDiT(
+         patch_size=2,
+         dim=3840,
+         n_layers=32,
+         n_heads=32,
+         n_kv_heads=8,
+         axes_dims=[40, 40, 40],
+         axes_lens=[300, 512, 512],
+         **kwargs,
+     )
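
All four configs keep the invariant asserted in NextDiT.__init__: head_dim == sum(axes_dims), i.e. each attention head's dimension is split evenly across the three rotary axes (caption position, image row, image column). A hedged instantiation sketch (flash-attn must be installed for the import to succeed; the qk_norm and cap_feat_dim values below are assumptions, not values this file prescribes):

    model = NextDiT_2B_GQA_patch2_Adaln_Refiner(qk_norm=True, cap_feat_dim=2304)
    head_dim = model.dim // model.n_heads     # 2304 / 24 = 96
    assert head_dim == sum(model.axes_dims)   # 32 + 32 + 32 = 96
    print(f"{model.parameter_count() / 1e9:.2f}B parameters")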
parallel.py ADDED
@@ -0,0 +1,97 @@
+ #!/usr/bin/env python
+
+ import os
+ import subprocess
+ from time import sleep
+
+ import fairscale.nn.model_parallel.initialize as fs_init
+ import torch
+ import torch.distributed as dist
+ from datetime import timedelta
+
+
+ def _setup_dist_env_from_slurm(args):
+     while not os.environ.get("MASTER_ADDR", ""):
+         os.environ["MASTER_ADDR"] = (
+             subprocess.check_output(
+                 "sinfo -Nh -n %s | head -n 1 | awk '{print $1}'" % os.environ["SLURM_NODELIST"],
+                 shell=True,
+             )
+             .decode()
+             .strip()
+         )
+         sleep(1)
+     if not os.environ.get("MASTER_PORT"):
+         os.environ["MASTER_PORT"] = str(args.master_port)
+     if not os.environ.get("WORLD_SIZE"):
+         os.environ["WORLD_SIZE"] = os.environ["SLURM_NPROCS"]
+     if not os.environ.get("RANK"):
+         os.environ["RANK"] = os.environ["SLURM_PROCID"]
+     if not os.environ.get("LOCAL_RANK"):
+         os.environ["LOCAL_RANK"] = os.environ["SLURM_LOCALID"]
+     if not os.environ.get("LOCAL_WORLD_SIZE"):
+         os.environ["LOCAL_WORLD_SIZE"] = os.environ["SLURM_NTASKS_PER_NODE"]
+
+
+ _INTRA_NODE_PROCESS_GROUP, _INTER_NODE_PROCESS_GROUP = None, None
+ _LOCAL_RANK, _LOCAL_WORLD_SIZE = -1, -1
+
+
+ def get_local_rank() -> int:
+     return _LOCAL_RANK
+
+
+ def get_local_world_size() -> int:
+     return _LOCAL_WORLD_SIZE
+
+
+ def distributed_init(args):
+     if any(x not in os.environ for x in ["RANK", "WORLD_SIZE", "MASTER_PORT", "MASTER_ADDR"]):
+         _setup_dist_env_from_slurm(args)
+
+     dist.init_process_group("nccl", timeout=timedelta(hours=5))
+     fs_init.initialize_model_parallel(args.model_parallel_size)
+     torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())
+
+     global _LOCAL_RANK, _LOCAL_WORLD_SIZE
+     _LOCAL_RANK = int(os.environ["LOCAL_RANK"])
+     _LOCAL_WORLD_SIZE = int(os.environ["LOCAL_WORLD_SIZE"])
+
+     global _INTRA_NODE_PROCESS_GROUP, _INTER_NODE_PROCESS_GROUP
+     local_ranks, local_world_sizes = [
+         torch.empty([dist.get_world_size()], dtype=torch.long, device="cuda") for _ in (0, 1)
+     ]
+     dist.all_gather_into_tensor(local_ranks, torch.tensor(get_local_rank(), device="cuda"))
+     dist.all_gather_into_tensor(local_world_sizes, torch.tensor(get_local_world_size(), device="cuda"))
+     local_ranks, local_world_sizes = local_ranks.tolist(), local_world_sizes.tolist()
+     node_ranks = [[0]]
+     for i in range(1, dist.get_world_size()):
+         if len(node_ranks[-1]) == local_world_sizes[i - 1]:
+             node_ranks.append([])
+         else:
+             assert local_world_sizes[i] == local_world_sizes[i - 1]
+         node_ranks[-1].append(i)
+     for ranks in node_ranks:
+         group = dist.new_group(ranks)
+         if dist.get_rank() in ranks:
+             assert _INTRA_NODE_PROCESS_GROUP is None
+             _INTRA_NODE_PROCESS_GROUP = group
+     assert _INTRA_NODE_PROCESS_GROUP is not None
+
+     if min(local_world_sizes) == max(local_world_sizes):
+         for i in range(get_local_world_size()):
+             group = dist.new_group(list(range(i, dist.get_world_size(), get_local_world_size())))
+             if i == get_local_rank():
+                 assert _INTER_NODE_PROCESS_GROUP is None
+                 _INTER_NODE_PROCESS_GROUP = group
+         assert _INTER_NODE_PROCESS_GROUP is not None
+
+
+ def get_intra_node_process_group():
+     assert _INTRA_NODE_PROCESS_GROUP is not None, "Intra-node process group is not initialized."
+     return _INTRA_NODE_PROCESS_GROUP
+
+
+ def get_inter_node_process_group():
+     assert _INTER_NODE_PROCESS_GROUP is not None, "Inter-node process group is not initialized."
+     return _INTER_NODE_PROCESS_GROUP
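
The intra- and inter-node groups are derived purely from the gathered local world sizes: every node contributes one contiguous block of ranks, and rank i of every node joins inter-node group i. The grouping arithmetic can be checked in plain Python (an assumed 8-rank job with 4 ranks per node; no torch.distributed needed):

    world_size, local_world_size = 8, 4
    intra = [list(range(i, i + local_world_size)) for i in range(0, world_size, local_world_size)]
    print(intra)  # [[0, 1, 2, 3], [4, 5, 6, 7]] -> one intra-node group per node
    inter = [list(range(i, world_size, local_world_size)) for i in range(local_world_size)]
    print(inter)  # [[0, 4], [1, 5], [2, 6], [3, 7]] -> one inter-node group per local rank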
transport/__init__.py ADDED
@@ -0,0 +1,70 @@
+ from .transport import ModelType, PathType, Sampler, Transport, WeightType
+
+
+ def create_transport(
+     path_type="Linear",
+     prediction="velocity",
+     loss_weight=None,
+     train_eps=None,
+     sample_eps=None,
+     snr_type="uniform",
+     do_shift=True,
+     seq_len=1024,  # corresponding to 512x512
+ ):
+     """Function for creating a Transport object.
+     **Note**: model prediction defaults to velocity
+     Args:
+     - path_type: type of path to use; defaults to "Linear"
+     - prediction: model prediction type: "velocity" (default), "noise", or "score"
+     - loss_weight: optional loss weighting: "velocity" or "likelihood"
+     - train_eps: small epsilon for avoiding instability during training
+     - sample_eps: small epsilon for avoiding instability during sampling
+     - snr_type: how timesteps/SNR are sampled during training
+     - do_shift: whether to apply the sequence-length-dependent timestep shift
+     - seq_len: token sequence length that the shift is calibrated for
+     """
+
+     if prediction == "noise":
+         model_type = ModelType.NOISE
+     elif prediction == "score":
+         model_type = ModelType.SCORE
+     else:
+         model_type = ModelType.VELOCITY
+
+     if loss_weight == "velocity":
+         loss_type = WeightType.VELOCITY
+     elif loss_weight == "likelihood":
+         loss_type = WeightType.LIKELIHOOD
+     else:
+         loss_type = WeightType.NONE
+
+     path_choice = {
+         "Linear": PathType.LINEAR,
+         "GVP": PathType.GVP,
+         "VP": PathType.VP,
+     }
+
+     path_type = path_choice[path_type]
+
+     if path_type in [PathType.VP]:
+         train_eps = 1e-5 if train_eps is None else train_eps
+         sample_eps = 1e-3 if sample_eps is None else sample_eps
+     elif path_type in [PathType.GVP, PathType.LINEAR] and model_type != ModelType.VELOCITY:
+         train_eps = 1e-3 if train_eps is None else train_eps
+         sample_eps = 1e-3 if sample_eps is None else sample_eps
+     else:  # velocity & [GVP, LINEAR] is stable everywhere
+         train_eps = 0
+         sample_eps = 0
+
+     # create flow state
+     state = Transport(
+         model_type=model_type,
+         path_type=path_type,
+         loss_type=loss_type,
+         train_eps=train_eps,
+         sample_eps=sample_eps,
+         snr_type=snr_type,
+         do_shift=do_shift,
+         seq_len=seq_len,
+     )
+
+     return state
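
A hedged usage sketch: the defaults already give a velocity-prediction linear-path transport, so a typical call mainly adjusts the sequence length. The value 4096 below assumes 2x2 patches on 64x64 latents (1024x1024 images), and the commented training_losses call is the customary entry point in this family of codebases, not something this file confirms:

    transport = create_transport(
        path_type="Linear",
        prediction="velocity",
        snr_type="uniform",
        do_shift=True,
        seq_len=4096,
    )
    # loss_dict = transport.training_losses(model, x1, model_kwargs)  # assumed API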
transport/dpm_solver.py ADDED
@@ -0,0 +1,1386 @@
1
+ # Copyright 2024 NVIDIA CORPORATION & AFFILIATES
2
+ #
3
+ # Licensed under the Apache License, Version 2.0 (the "License");
4
+ # you may not use this file except in compliance with the License.
5
+ # You may obtain a copy of the License at
6
+ #
7
+ # http://www.apache.org/licenses/LICENSE-2.0
8
+ #
9
+ # Unless required by applicable law or agreed to in writing, software
10
+ # distributed under the License is distributed on an "AS IS" BASIS,
11
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12
+ # See the License for the specific language governing permissions and
13
+ # limitations under the License.
14
+ #
15
+ # SPDX-License-Identifier: Apache-2.0
16
+
17
+ # This file is modified from https://github.com/PixArt-alpha/PixArt-sigma
18
+ import os
19
+
20
+ import torch
21
+ from tqdm import tqdm
22
+
23
+
24
+ class NoiseScheduleFlow:
25
+ def __init__(
26
+ self,
27
+ schedule="discrete_flow",
28
+ ):
29
+ """Create a wrapper class for the forward SDE (EDM type)."""
30
+ self.T = 1
31
+ self.t0 = 0.001
32
+ self.schedule = schedule # ['continuous', 'discrete_flow']
33
+ self.total_N = 1000
34
+
35
+ def marginal_log_mean_coeff(self, t):
36
+ """
37
+ Compute log(alpha_t) of a given continuous-time label t in [0, T].
38
+ """
39
+ return torch.log(self.marginal_alpha(t))
40
+
41
+ def marginal_alpha(self, t):
42
+ """
43
+ Compute alpha_t of a given continuous-time label t in [0, T].
44
+ """
45
+ return 1 - t
46
+
47
+ @staticmethod
48
+ def marginal_std(t):
49
+ """
50
+ Compute sigma_t of a given continuous-time label t in [0, T].
51
+ """
52
+ return t
53
+
54
+ def marginal_lambda(self, t):
55
+ """
56
+ Compute lambda_t = log(alpha_t) - log(sigma_t) of a given continuous-time label t in [0, T].
57
+ """
58
+ log_mean_coeff = self.marginal_log_mean_coeff(t)
59
+ log_std = torch.log(self.marginal_std(t))
60
+ return log_mean_coeff - log_std
61
+
62
+ @staticmethod
63
+ def inverse_lambda(lamb):
64
+ """
65
+ Compute the continuous-time label t in [0, T] of a given half-logSNR lambda_t.
66
+ """
67
+ return torch.exp(-lamb)
68
+
69
+
70
+ def model_wrapper(
71
+ model,
72
+ noise_schedule,
73
+ model_type="noise",
74
+ model_kwargs={},
75
+ guidance_type="uncond",
76
+ condition=None,
77
+ unconditional_condition=None,
78
+ guidance_scale=1.0,
79
+ interval_guidance=[0, 1.0],
80
+ classifier_fn=None,
81
+ classifier_kwargs={},
82
+ ):
83
+ """Create a wrapper function for the noise prediction model.
84
+
85
+ DPM-Solver needs to solve the continuous-time diffusion ODEs. For DPMs trained on discrete-time labels, we need to
86
+ firstly wrap the model function to a noise prediction model that accepts the continuous time as the input.
87
+
88
+ We support four types of the diffusion model by setting `model_type`:
89
+
90
+ 1. "noise": noise prediction model. (Trained by predicting noise).
91
+
92
+ 2. "x_start": data prediction model. (Trained by predicting the data x_0 at time 0).
93
+
94
+ 3. "v": velocity prediction model. (Trained by predicting the velocity).
95
+ The "v" prediction is derivation detailed in Appendix D of [1], and is used in Imagen-Video [2].
96
+
97
+ [1] Salimans, Tim, and Jonathan Ho. "Progressive distillation for fast sampling of diffusion models."
98
+ arXiv preprint arXiv:2202.00512 (2022).
99
+ [2] Ho, Jonathan, et al. "Imagen Video: High Definition Video Generation with Diffusion Models."
100
+ arXiv preprint arXiv:2210.02303 (2022).
101
+
102
+ 4. "score": marginal score function. (Trained by denoising score matching).
103
+ Note that the score function and the noise prediction model follows a simple relationship:
104
+ ```
105
+ noise(x_t, t) = -sigma_t * score(x_t, t)
106
+ ```
107
+
108
+ We support three types of guided sampling by DPMs by setting `guidance_type`:
109
+ 1. "uncond": unconditional sampling by DPMs.
110
+ The input `model` has the following format:
111
+ ``
112
+ model(x, t_input, **model_kwargs) -> noise | x_start | v | score
113
+ ``
114
+
115
+ 2. "classifier": classifier guidance sampling [3] by DPMs and another classifier.
116
+ The input `model` has the following format:
117
+ ``
118
+ model(x, t_input, **model_kwargs) -> noise | x_start | v | score
119
+ ``
120
+
121
+ The input `classifier_fn` has the following format:
122
+ ``
123
+ classifier_fn(x, t_input, cond, **classifier_kwargs) -> logits(x, t_input, cond)
124
+ ``
125
+
126
+ [3] P. Dhariwal and A. Q. Nichol, "Diffusion models beat GANs on image synthesis,"
127
+ in Advances in Neural Information Processing Systems, vol. 34, 2021, pp. 8780-8794.
128
+
129
+ 3. "classifier-free": classifier-free guidance sampling by conditional DPMs.
130
+ The input `model` has the following format:
131
+ ``
132
+ model(x, t_input, cond, **model_kwargs) -> noise | x_start | v | score
133
+ ``
134
+ And if cond == `unconditional_condition`, the model output is the unconditional DPM output.
135
+
136
+ [4] Ho, Jonathan, and Tim Salimans. "Classifier-free diffusion guidance."
137
+ arXiv preprint arXiv:2207.12598 (2022).
138
+
139
+
140
+ The `t_input` is the time label of the model, which may be discrete-time labels (i.e. 0 to 999)
141
+ or continuous-time labels (i.e. epsilon to T).
142
+
143
+ We wrap the model function to accept only `x` and `t_continuous` as inputs, and outputs the predicted noise:
144
+ ``
145
+ def model_fn(x, t_continuous) -> noise:
146
+ t_input = get_model_input_time(t_continuous)
147
+ return noise_pred(model, x, t_input, **model_kwargs)
148
+ ``
149
+ where `t_continuous` is the continuous time labels (i.e. epsilon to T). And we use `model_fn` for DPM-Solver.
150
+
151
+ ===============================================================
152
+
153
+ Args:
154
+ model: A diffusion model with the corresponding format described above.
155
+ noise_schedule: A noise schedule object, such as NoiseScheduleVP.
156
+ model_type: A `str`. The parameterization type of the diffusion model.
157
+ "noise" or "x_start" or "v" or "score".
158
+ model_kwargs: A `dict`. A dict for the other inputs of the model function.
159
+ guidance_type: A `str`. The type of the guidance for sampling.
160
+ "uncond" or "classifier" or "classifier-free".
161
+ condition: A pytorch tensor. The condition for the guided sampling.
162
+ Only used for "classifier" or "classifier-free" guidance type.
163
+ unconditional_condition: A pytorch tensor. The condition for the unconditional sampling.
164
+ Only used for "classifier-free" guidance type.
165
+ guidance_scale: A `float`. The scale for the guided sampling.
166
+ classifier_fn: A classifier function. Only used for the classifier guidance.
167
+ classifier_kwargs: A `dict`. A dict for the other inputs of the classifier function.
168
+ Returns:
169
+ A noise prediction model that accepts the noised data and the continuous time as the inputs.
170
+ """
171
+
172
+ def get_model_input_time(t_continuous):
173
+ """
174
+ Convert the continuous-time `t_continuous` (in [epsilon, T]) to the model input time.
175
+ For discrete-time DPMs, we convert `t_continuous` in [1 / N, 1] to `t_input` in [0, 1000 * (N - 1) / N].
176
+ For continuous-time DPMs, we just use `t_continuous`.
177
+ """
178
+ if noise_schedule.schedule == "discrete":
179
+ return (t_continuous - 1.0 / noise_schedule.total_N) * noise_schedule.total_N
180
+ elif noise_schedule.schedule == "discrete_flow":
181
+ return t_continuous * noise_schedule.total_N
182
+ else:
183
+ return t_continuous
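+ # Example: for a discrete-time DPM with total_N == 1000, the mapping above sends
+ # t_continuous == 1 / 1000 to t_input == 0 and t_continuous == 1.0 to t_input == 999,
+ # i.e. it recovers the training-time labels 0, ..., 999.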
184
+
185
+ def noise_pred_fn(x, t_continuous, cond=None):
186
+ t_input = get_model_input_time(t_continuous)
187
+ if cond is None:
188
+ output = model(x, t_input, **model_kwargs)
189
+ else:
190
+ output = model(x, t_input, cond, **model_kwargs)
191
+ if model_type == "noise":
192
+ return output
193
+ elif model_type == "x_start":
194
+ alpha_t, sigma_t = noise_schedule.marginal_alpha(t_continuous), noise_schedule.marginal_std(t_continuous)
195
+ return (x - expand_dims(alpha_t, x.dim()) * output) / expand_dims(sigma_t, x.dim())
196
+ elif model_type == "v":
197
+ alpha_t, sigma_t = noise_schedule.marginal_alpha(t_continuous), noise_schedule.marginal_std(t_continuous)
198
+ return expand_dims(alpha_t, x.dim()) * output + expand_dims(sigma_t, x.dim()) * x
199
+ elif model_type == "score":
200
+ sigma_t = noise_schedule.marginal_std(t_continuous)
201
+ return -expand_dims(sigma_t, x.dim()) * output
202
+ elif model_type == "flow":
203
+ sigma_t = noise_schedule.marginal_std(t_continuous)
+ # Some models return a (prediction, extras) tuple; keep only the prediction.
+ if isinstance(output, (tuple, list)):
+ output = output[0]
+ noise = (1 - expand_dims(sigma_t, x.dim()).to(x)) * output + x
+ return noise
209
+
210
+ def cond_grad_fn(x, t_input):
211
+ """
212
+ Compute the gradient of the classifier, i.e. nabla_{x} log p_t(cond | x_t).
213
+ """
214
+ with torch.enable_grad():
215
+ x_in = x.detach().requires_grad_(True)
216
+ log_prob = classifier_fn(x_in, t_input, condition, **classifier_kwargs)
217
+ return torch.autograd.grad(log_prob.sum(), x_in)[0]
218
+
219
+ def model_fn(x, t_continuous):
220
+ """
221
+ The noise prediction model function that is used for DPM-Solver.
222
+ """
223
+ guidance_tp = guidance_type
224
+ if guidance_tp == "uncond":
225
+ return noise_pred_fn(x, t_continuous)
226
+ elif guidance_tp == "classifier":
227
+ assert classifier_fn is not None
228
+ t_input = get_model_input_time(t_continuous)
229
+ cond_grad = cond_grad_fn(x, t_input)
230
+ sigma_t = noise_schedule.marginal_std(t_continuous)
231
+ noise = noise_pred_fn(x, t_continuous)
232
+ return noise - guidance_scale * expand_dims(sigma_t, x.dim()) * cond_grad
233
+ elif guidance_tp == "classifier-free":
234
+ if (
235
+ guidance_scale == 1.0
236
+ or unconditional_condition is None
237
+ or not (interval_guidance[0] < t_continuous[0] < interval_guidance[1])
238
+ ):
239
+ return noise_pred_fn(x, t_continuous, cond=condition)
240
+ else:
241
+ x_in = torch.cat([x] * 2)
242
+ t_in = torch.cat([t_continuous] * 2)
243
+ c_in = torch.cat([unconditional_condition, condition])
244
+ model_out = noise_pred_fn(x_in, t_in, cond=c_in)
+ # Some models return a (prediction, extras) tuple; keep only the prediction.
+ if isinstance(model_out, (tuple, list)):
+ model_out = model_out[0]
+ noise_uncond, noise = model_out.chunk(2)
248
+ return noise_uncond + guidance_scale * (noise - noise_uncond)
249
+
250
+ assert model_type in ["noise", "x_start", "v", "score", "flow"]
251
+ assert guidance_type in [
252
+ "uncond",
253
+ "classifier",
254
+ "classifier-free",
255
+ ]
256
+ return model_fn
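+
+ # Usage sketch (illustrative only; `my_unet`, `betas`, `cond_emb` and `uncond_emb` are
+ # placeholder names, not part of this file): wrap an epsilon-prediction UNet for
+ # classifier-free guidance and sample with DPM-Solver++.
+ #
+ # ns = NoiseScheduleVP(schedule="discrete", betas=betas)
+ # model_fn = model_wrapper(
+ #     my_unet, ns, model_type="noise", guidance_type="classifier-free",
+ #     condition=cond_emb, unconditional_condition=uncond_emb, guidance_scale=7.5,
+ # )
+ # dpm_solver = DPM_Solver(model_fn, ns, algorithm_type="dpmsolver++")
+ # x0 = dpm_solver.sample(x_T, steps=20, order=2, skip_type="time_uniform", method="multistep")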
257
+
258
+
259
+ class DPM_Solver:
260
+ def __init__(
261
+ self,
262
+ model_fn,
263
+ noise_schedule,
264
+ algorithm_type="dpmsolver++",
265
+ correcting_x0_fn=None,
266
+ correcting_xt_fn=None,
267
+ thresholding_max_val=1.0,
268
+ dynamic_thresholding_ratio=0.995,
269
+ ):
270
+ """Construct a DPM-Solver.
271
+
272
+ We support both DPM-Solver (`algorithm_type="dpmsolver"`) and DPM-Solver++ (`algorithm_type="dpmsolver++"`).
273
+
274
+ We also support the "dynamic thresholding" method in Imagen[1]. For pixel-space diffusion models, you
275
+ can set both `algorithm_type="dpmsolver++"` and `correcting_x0_fn="dynamic_thresholding"` to use the
276
+ dynamic thresholding. The "dynamic thresholding" can greatly improve the sample quality for pixel-space
277
+ DPMs with large guidance scales. Note that the thresholding method is **unsuitable** for latent-space
278
+ DPMs (such as stable-diffusion).
279
+
280
+ To support advanced algorithms in image-to-image applications, we also support corrector functions for
281
+ both x0 and xt.
282
+
283
+ Args:
284
+ model_fn: A noise prediction model function which accepts the continuous-time input (t in [epsilon, T]):
285
+ ``
286
+ def model_fn(x, t_continuous):
287
+ return noise
288
+ ``
289
+ The shape of `x` is `(batch_size, **shape)`, and the shape of `t_continuous` is `(batch_size,)`.
290
+ noise_schedule: A noise schedule object, such as NoiseScheduleVP.
291
+ algorithm_type: A `str`. Either "dpmsolver" or "dpmsolver++".
292
+ correcting_x0_fn: A `str` or a function with the following format:
293
+ ```
294
+ def correcting_x0_fn(x0, t):
295
+ x0_new = ...
296
+ return x0_new
297
+ ```
298
+ This function is to correct the outputs of the data prediction model at each sampling step. e.g.,
299
+ ```
300
+ x0_pred = data_pred_model(xt, t)
301
+ if correcting_x0_fn is not None:
302
+ x0_pred = correcting_x0_fn(x0_pred, t)
303
+ xt_1 = update(x0_pred, xt, t)
304
+ ```
305
+ If `correcting_x0_fn="dynamic_thresholding"`, we use the dynamic thresholding proposed in Imagen[1].
306
+ correcting_xt_fn: A function with the following format:
307
+ ```
308
+ def correcting_xt_fn(xt, t, step):
309
+ x_new = ...
310
+ return x_new
311
+ ```
312
+ This function is to correct the intermediate samples xt at each sampling step. e.g.,
313
+ ```
314
+ xt = ...
315
+ xt = correcting_xt_fn(xt, t, step)
316
+ ```
317
+ thresholding_max_val: A `float`. The max value for thresholding.
+ Valid only when using `dpmsolver++` and `correcting_x0_fn="dynamic_thresholding"`.
+ dynamic_thresholding_ratio: A `float`. The ratio for dynamic thresholding (see Imagen[1] for details).
+ Valid only when using `dpmsolver++` and `correcting_x0_fn="dynamic_thresholding"`.
321
+
322
+ [1] Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily Denton, Seyed Kamyar Seyed Ghasemipour,
323
+ Burcu Karagol Ayan, S Sara Mahdavi, Rapha Gontijo Lopes, et al. Photorealistic text-to-image diffusion models
324
+ with deep language understanding. arXiv preprint arXiv:2205.11487, 2022b.
325
+ """
326
+ self.model = lambda x, t: model_fn(x, t.expand(x.shape[0]))
327
+ self.noise_schedule = noise_schedule
328
+ assert algorithm_type in ["dpmsolver", "dpmsolver++"]
329
+ self.algorithm_type = algorithm_type
330
+ if correcting_x0_fn == "dynamic_thresholding":
331
+ self.correcting_x0_fn = self.dynamic_thresholding_fn
332
+ else:
333
+ self.correcting_x0_fn = correcting_x0_fn
334
+ self.correcting_xt_fn = correcting_xt_fn
335
+ self.dynamic_thresholding_ratio = dynamic_thresholding_ratio
336
+ self.thresholding_max_val = thresholding_max_val
337
+ self.register_progress_bar()
338
+
339
+ def register_progress_bar(self, progress_fn=None):
340
+ """
341
+ Register a progress bar callback function
342
+
343
+ Args:
344
+ progress_fn: Callback function that takes current step and total steps as parameters
345
+ """
346
+ self.progress_fn = progress_fn if progress_fn is not None else lambda step, total: None
347
+
348
+ def update_progress(self, step, total_steps):
349
+ """
350
+ Update sampling progress
351
+
352
+ Args:
353
+ step: Current step number
354
+ total_steps: Total number of steps
355
+ """
356
+ if hasattr(self, "progress_fn"):
+ try:
+ self.progress_fn(step / total_steps, desc=f"Generating {step}/{total_steps}")
+ except TypeError:
+ # Fall back to the plain (step, total_steps) callback signature.
+ self.progress_fn(step, total_steps)
365
+
366
+ def dynamic_thresholding_fn(self, x0, t):
367
+ """
368
+ The dynamic thresholding method.
369
+ """
370
+ dims = x0.dim()
371
+ p = self.dynamic_thresholding_ratio
372
+ s = torch.quantile(torch.abs(x0).reshape((x0.shape[0], -1)), p, dim=1)
373
+ s = expand_dims(torch.maximum(s, self.thresholding_max_val * torch.ones_like(s).to(s.device)), dims)
374
+ x0 = torch.clamp(x0, -s, s) / s
375
+ return x0
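+
+ # In words: with dynamic_thresholding_ratio == 0.995, `s` is the per-sample 99.5th percentile
+ # of |x0| (floored at thresholding_max_val), so clamping to [-s, s] and dividing by s keeps x0
+ # within [-1, 1] while only saturating the most extreme 0.5% of values.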
376
+
377
+ def noise_prediction_fn(self, x, t):
378
+ """
379
+ Return the noise prediction of the model.
380
+ """
381
+ return self.model(x, t)
382
+
383
+ def data_prediction_fn(self, x, t):
384
+ """
385
+ Return the data prediction of the model (with corrector).
386
+ """
387
+ noise = self.noise_prediction_fn(x, t)
388
+ alpha_t, sigma_t = self.noise_schedule.marginal_alpha(t), self.noise_schedule.marginal_std(t)
389
+ x0 = (x - sigma_t * noise) / alpha_t
390
+ if self.correcting_x0_fn is not None:
391
+ x0 = self.correcting_x0_fn(x0, t)
392
+ return x0
393
+
394
+ def model_fn(self, x, t):
395
+ """
396
+ Convert the model to the noise prediction model or the data prediction model.
397
+ """
398
+ if self.algorithm_type == "dpmsolver++":
399
+ return self.data_prediction_fn(x, t)
400
+ else:
401
+ return self.noise_prediction_fn(x, t)
402
+
403
+ def get_time_steps(self, skip_type, t_T, t_0, N, device, shift=1.0):
404
+ """Compute the intermediate time steps for sampling.
405
+
406
+ Args:
407
+ skip_type: A `str`. The type for the spacing of the time steps. We support three types:
408
+ - 'logSNR': uniform logSNR for the time steps.
409
+ - 'time_uniform': uniform time for the time steps. (**Recommended for high-resolutional data**.)
410
+ - 'time_quadratic': quadratic time for the time steps. (Used in DDIM for low-resolutional data.)
411
+ t_T: A `float`. The starting time of the sampling (default is T).
412
+ t_0: A `float`. The ending time of the sampling (default is epsilon).
413
+ N: An `int`. The number of time-step intervals (the function returns N + 1 time points).
414
+ device: A torch device.
415
+ Returns:
416
+ A pytorch tensor of the time steps, with the shape (N + 1,).
417
+ """
418
+ if skip_type == "logSNR":
419
+ lambda_T = self.noise_schedule.marginal_lambda(torch.tensor(t_T).to(device))
420
+ lambda_0 = self.noise_schedule.marginal_lambda(torch.tensor(t_0).to(device))
421
+ logSNR_steps = torch.linspace(lambda_T.cpu().item(), lambda_0.cpu().item(), N + 1).to(device)
422
+ return self.noise_schedule.inverse_lambda(logSNR_steps)
423
+ elif skip_type == "time_uniform":
424
+ return torch.linspace(t_T, t_0, N + 1).to(device)
425
+ elif skip_type == "time_quadratic":
426
+ t_order = 2
427
+ t = torch.linspace(t_T ** (1.0 / t_order), t_0 ** (1.0 / t_order), N + 1).pow(t_order).to(device)
428
+ return t
429
+ elif skip_type == "time_uniform_flow":
430
+ betas = torch.linspace(t_T, t_0, N + 1).to(device)
431
+ sigmas = 1.0 - betas
432
+ sigmas = (shift * sigmas / (1 + (shift - 1) * sigmas)).flip(dims=[0])
433
+ return sigmas
434
+ else:
435
+ raise ValueError(
436
+ f"Unsupported skip_type {skip_type}, need to be 'logSNR' or 'time_uniform' or 'time_quadratic'"
437
+ )
438
+
439
+ def get_orders_and_timesteps_for_singlestep_solver(self, steps, order, skip_type, t_T, t_0, device):
440
+ """
441
+ Get the order of each step for sampling by the singlestep DPM-Solver.
442
+
443
+ We combine DPM-Solver-1, -2 and -3 to use up all the function evaluations; the combined scheme is named "DPM-Solver-fast".
444
+ Given a fixed number of function evaluations by `steps`, the sampling procedure by DPM-Solver-fast is:
445
+ - If order == 1:
446
+ We take `steps` of DPM-Solver-1 (i.e. DDIM).
447
+ - If order == 2:
448
+ - Denote K = (steps // 2). We take K or (K + 1) intermediate time steps for sampling.
449
+ - If steps % 2 == 0, we use K steps of DPM-Solver-2.
450
+ - If steps % 2 == 1, we use K steps of DPM-Solver-2 and 1 step of DPM-Solver-1.
451
+ - If order == 3:
452
+ - Denote K = (steps // 3 + 1). We take K intermediate time steps for sampling.
453
+ - If steps % 3 == 0, we use (K - 2) steps of DPM-Solver-3, and 1 step of DPM-Solver-2 and 1 step of DPM-Solver-1.
454
+ - If steps % 3 == 1, we use (K - 1) steps of DPM-Solver-3 and 1 step of DPM-Solver-1.
455
+ - If steps % 3 == 2, we use (K - 1) steps of DPM-Solver-3 and 1 step of DPM-Solver-2.
456
+
457
+ ============================================
458
+ Args:
459
+ steps: An `int`. The total number of function evaluations (NFE).
+ order: An `int`. The max order for the solver (2 or 3).
461
+ skip_type: A `str`. The type for the spacing of the time steps. We support three types:
462
+ - 'logSNR': uniform logSNR for the time steps.
463
+ - 'time_uniform': uniform time for the time steps. (**Recommended for high-resolutional data**.)
464
+ - 'time_quadratic': quadratic time for the time steps. (Used in DDIM for low-resolutional data.)
465
+ t_T: A `float`. The starting time of the sampling (default is T).
466
+ t_0: A `float`. The ending time of the sampling (default is epsilon).
467
+ device: A torch device.
468
+ Returns:
469
+ timesteps_outer: A pytorch tensor of the outer time steps.
+ orders: A list of the solver order of each step.
470
+ """
471
+ if order == 3:
+ K = steps // 3 + 1
+ if steps % 3 == 0:
+ orders = [3] * (K - 2) + [2, 1]
+ elif steps % 3 == 1:
+ orders = [3] * (K - 1) + [1]
+ else:
+ orders = [3] * (K - 1) + [2]
+ elif order == 2:
+ if steps % 2 == 0:
+ K = steps // 2
+ orders = [2] * K
+ else:
+ K = steps // 2 + 1
+ orders = [2] * (K - 1) + [1]
+ elif order == 1:
+ K = 1
+ orders = [1] * steps
+ else:
+ raise ValueError("'order' must be '1' or '2' or '3'.")
503
+ if skip_type == "logSNR":
504
+ # To reproduce the results in DPM-Solver paper
505
+ timesteps_outer = self.get_time_steps(skip_type, t_T, t_0, K, device)
506
+ else:
507
+ timesteps_outer = self.get_time_steps(skip_type, t_T, t_0, steps, device)[
508
+ torch.cumsum(
509
+ torch.tensor(
510
+ [
511
+ 0,
512
+ ]
513
+ + orders
514
+ ),
515
+ 0,
516
+ ).to(device)
517
+ ]
518
+ return timesteps_outer, orders
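+
+ # Worked example: steps == 6 with order == 3 gives K == 3 and orders == [3, 2, 1], i.e. one
+ # step of DPM-Solver-3, one of DPM-Solver-2 and one of DPM-Solver-1, spending exactly
+ # 3 + 2 + 1 == 6 function evaluations.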
519
+
520
+ def denoise_to_zero_fn(self, x, s):
521
+ """
522
+ Denoise at the final step, which is equivalent to solving the ODE from lambda_s to infinity with a first-order discretization.
523
+ """
524
+ return self.data_prediction_fn(x, s)
525
+
526
+ def dpm_solver_first_update(self, x, s, t, model_s=None, return_intermediate=False):
527
+ """
528
+ DPM-Solver-1 (equivalent to DDIM) from time `s` to time `t`.
529
+
530
+ Args:
531
+ x: A pytorch tensor. The initial value at time `s`.
532
+ s: A pytorch tensor. The starting time, with the shape (1,).
533
+ t: A pytorch tensor. The ending time, with the shape (1,).
534
+ model_s: A pytorch tensor. The model function evaluated at time `s`.
535
+ If `model_s` is None, we evaluate the model by `x` and `s`; otherwise we directly use it.
536
+ return_intermediate: A `bool`. If true, also return the model value at time `s`.
537
+ Returns:
538
+ x_t: A pytorch tensor. The approximated solution at time `t`.
539
+ """
540
+ ns = self.noise_schedule
541
+ dims = x.dim()
542
+ lambda_s, lambda_t = ns.marginal_lambda(s), ns.marginal_lambda(t)
543
+ h = lambda_t - lambda_s
544
+ log_alpha_s, log_alpha_t = ns.marginal_log_mean_coeff(s), ns.marginal_log_mean_coeff(t)
545
+ sigma_s, sigma_t = ns.marginal_std(s), ns.marginal_std(t)
546
+ alpha_t = torch.exp(log_alpha_t)
547
+
548
+ if self.algorithm_type == "dpmsolver++":
549
+ phi_1 = torch.expm1(-h)
550
+ if model_s is None:
551
+ model_s = self.model_fn(x, s)
552
+ x_t = sigma_t / sigma_s * x - alpha_t * phi_1 * model_s
553
+ if return_intermediate:
554
+ return x_t, {"model_s": model_s}
555
+ else:
556
+ return x_t
557
+ else:
558
+ phi_1 = torch.expm1(h)
559
+ if model_s is None:
560
+ model_s = self.model_fn(x, s)
561
+ x_t = torch.exp(log_alpha_t - log_alpha_s) * x - (sigma_t * phi_1) * model_s
562
+ if return_intermediate:
563
+ return x_t, {"model_s": model_s}
564
+ else:
565
+ return x_t
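+
+ # In lambda (half-logSNR) space the dpmsolver++ branch above reads, with h = lambda_t - lambda_s
+ # and phi_1 = expm1(-h):
+ #     x_t = (sigma_t / sigma_s) * x_s - alpha_t * phi_1 * x0_theta(x_s, s),
+ # the first-order exponential-integrator step for the data-prediction ODE (equivalent to DDIM).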
566
+
567
+ def singlestep_dpm_solver_second_update(
568
+ self, x, s, t, r1=0.5, model_s=None, return_intermediate=False, solver_type="dpmsolver"
569
+ ):
570
+ """
571
+ Singlestep solver DPM-Solver-2 from time `s` to time `t`.
572
+
573
+ Args:
574
+ x: A pytorch tensor. The initial value at time `s`.
575
+ s: A pytorch tensor. The starting time, with the shape (1,).
576
+ t: A pytorch tensor. The ending time, with the shape (1,).
577
+ r1: A `float`. The hyperparameter of the second-order solver.
578
+ model_s: A pytorch tensor. The model function evaluated at time `s`.
579
+ If `model_s` is None, we evaluate the model by `x` and `s`; otherwise we directly use it.
580
+ return_intermediate: A `bool`. If true, also return the model value at time `s` and `s1` (the intermediate time).
581
+ solver_type: either 'dpmsolver' or 'taylor'. The type for the high-order solvers.
582
+ The type slightly impacts the performance. We recommend the 'dpmsolver' type.
583
+ Returns:
584
+ x_t: A pytorch tensor. The approximated solution at time `t`.
585
+ """
586
+ if solver_type not in ["dpmsolver", "taylor"]:
587
+ raise ValueError(f"'solver_type' must be either 'dpmsolver' or 'taylor', got {solver_type}")
588
+ if r1 is None:
589
+ r1 = 0.5
590
+ ns = self.noise_schedule
591
+ lambda_s, lambda_t = ns.marginal_lambda(s), ns.marginal_lambda(t)
592
+ h = lambda_t - lambda_s
593
+ lambda_s1 = lambda_s + r1 * h
594
+ s1 = ns.inverse_lambda(lambda_s1)
595
+ log_alpha_s, log_alpha_s1, log_alpha_t = (
596
+ ns.marginal_log_mean_coeff(s),
597
+ ns.marginal_log_mean_coeff(s1),
598
+ ns.marginal_log_mean_coeff(t),
599
+ )
600
+ sigma_s, sigma_s1, sigma_t = ns.marginal_std(s), ns.marginal_std(s1), ns.marginal_std(t)
601
+ alpha_s1, alpha_t = torch.exp(log_alpha_s1), torch.exp(log_alpha_t)
602
+
603
+ if self.algorithm_type == "dpmsolver++":
604
+ phi_11 = torch.expm1(-r1 * h)
605
+ phi_1 = torch.expm1(-h)
606
+
607
+ if model_s is None:
608
+ model_s = self.model_fn(x, s)
609
+ x_s1 = (sigma_s1 / sigma_s) * x - (alpha_s1 * phi_11) * model_s
610
+ model_s1 = self.model_fn(x_s1, s1)
611
+ if solver_type == "dpmsolver":
612
+ x_t = (
613
+ (sigma_t / sigma_s) * x
614
+ - (alpha_t * phi_1) * model_s
615
+ - (0.5 / r1) * (alpha_t * phi_1) * (model_s1 - model_s)
616
+ )
617
+ elif solver_type == "taylor":
618
+ x_t = (
619
+ (sigma_t / sigma_s) * x
620
+ - (alpha_t * phi_1) * model_s
621
+ + (1.0 / r1) * (alpha_t * (phi_1 / h + 1.0)) * (model_s1 - model_s)
622
+ )
623
+ else:
624
+ phi_11 = torch.expm1(r1 * h)
625
+ phi_1 = torch.expm1(h)
626
+
627
+ if model_s is None:
628
+ model_s = self.model_fn(x, s)
629
+ x_s1 = torch.exp(log_alpha_s1 - log_alpha_s) * x - (sigma_s1 * phi_11) * model_s
630
+ model_s1 = self.model_fn(x_s1, s1)
631
+ if solver_type == "dpmsolver":
632
+ x_t = (
633
+ torch.exp(log_alpha_t - log_alpha_s) * x
634
+ - (sigma_t * phi_1) * model_s
635
+ - (0.5 / r1) * (sigma_t * phi_1) * (model_s1 - model_s)
636
+ )
637
+ elif solver_type == "taylor":
638
+ x_t = (
639
+ torch.exp(log_alpha_t - log_alpha_s) * x
640
+ - (sigma_t * phi_1) * model_s
641
+ - (1.0 / r1) * (sigma_t * (phi_1 / h - 1.0)) * (model_s1 - model_s)
642
+ )
643
+ if return_intermediate:
644
+ return x_t, {"model_s": model_s, "model_s1": model_s1}
645
+ else:
646
+ return x_t
647
+
648
+ def singlestep_dpm_solver_third_update(
649
+ self,
650
+ x,
651
+ s,
652
+ t,
653
+ r1=1.0 / 3.0,
654
+ r2=2.0 / 3.0,
655
+ model_s=None,
656
+ model_s1=None,
657
+ return_intermediate=False,
658
+ solver_type="dpmsolver",
659
+ ):
660
+ """
661
+ Singlestep solver DPM-Solver-3 from time `s` to time `t`.
662
+
663
+ Args:
664
+ x: A pytorch tensor. The initial value at time `s`.
665
+ s: A pytorch tensor. The starting time, with the shape (1,).
666
+ t: A pytorch tensor. The ending time, with the shape (1,).
667
+ r1: A `float`. The hyperparameter of the third-order solver.
668
+ r2: A `float`. The hyperparameter of the third-order solver.
669
+ model_s: A pytorch tensor. The model function evaluated at time `s`.
670
+ If `model_s` is None, we evaluate the model by `x` and `s`; otherwise we directly use it.
671
+ model_s1: A pytorch tensor. The model function evaluated at time `s1` (the intermediate time given by `r1`).
672
+ If `model_s1` is None, we evaluate the model at `s1`; otherwise we directly use it.
673
+ return_intermediate: A `bool`. If true, also return the model value at time `s`, `s1` and `s2` (the intermediate times).
674
+ solver_type: either 'dpmsolver' or 'taylor'. The type for the high-order solvers.
675
+ The type slightly impacts the performance. We recommend the 'dpmsolver' type.
676
+ Returns:
677
+ x_t: A pytorch tensor. The approximated solution at time `t`.
678
+ """
679
+ if solver_type not in ["dpmsolver", "taylor"]:
680
+ raise ValueError(f"'solver_type' must be either 'dpmsolver' or 'taylor', got {solver_type}")
681
+ if r1 is None:
682
+ r1 = 1.0 / 3.0
683
+ if r2 is None:
684
+ r2 = 2.0 / 3.0
685
+ ns = self.noise_schedule
686
+ lambda_s, lambda_t = ns.marginal_lambda(s), ns.marginal_lambda(t)
687
+ h = lambda_t - lambda_s
688
+ lambda_s1 = lambda_s + r1 * h
689
+ lambda_s2 = lambda_s + r2 * h
690
+ s1 = ns.inverse_lambda(lambda_s1)
691
+ s2 = ns.inverse_lambda(lambda_s2)
692
+ log_alpha_s, log_alpha_s1, log_alpha_s2, log_alpha_t = (
693
+ ns.marginal_log_mean_coeff(s),
694
+ ns.marginal_log_mean_coeff(s1),
695
+ ns.marginal_log_mean_coeff(s2),
696
+ ns.marginal_log_mean_coeff(t),
697
+ )
698
+ sigma_s, sigma_s1, sigma_s2, sigma_t = (
699
+ ns.marginal_std(s),
700
+ ns.marginal_std(s1),
701
+ ns.marginal_std(s2),
702
+ ns.marginal_std(t),
703
+ )
704
+ alpha_s1, alpha_s2, alpha_t = torch.exp(log_alpha_s1), torch.exp(log_alpha_s2), torch.exp(log_alpha_t)
705
+
706
+ if self.algorithm_type == "dpmsolver++":
707
+ phi_11 = torch.expm1(-r1 * h)
708
+ phi_12 = torch.expm1(-r2 * h)
709
+ phi_1 = torch.expm1(-h)
710
+ phi_22 = torch.expm1(-r2 * h) / (r2 * h) + 1.0
711
+ phi_2 = phi_1 / h + 1.0
712
+ phi_3 = phi_2 / h - 0.5
713
+
714
+ if model_s is None:
715
+ model_s = self.model_fn(x, s)
716
+ if model_s1 is None:
717
+ x_s1 = (sigma_s1 / sigma_s) * x - (alpha_s1 * phi_11) * model_s
718
+ model_s1 = self.model_fn(x_s1, s1)
719
+ x_s2 = (
720
+ (sigma_s2 / sigma_s) * x
721
+ - (alpha_s2 * phi_12) * model_s
722
+ + r2 / r1 * (alpha_s2 * phi_22) * (model_s1 - model_s)
723
+ )
724
+ model_s2 = self.model_fn(x_s2, s2)
725
+ if solver_type == "dpmsolver":
726
+ x_t = (
727
+ (sigma_t / sigma_s) * x
728
+ - (alpha_t * phi_1) * model_s
729
+ + (1.0 / r2) * (alpha_t * phi_2) * (model_s2 - model_s)
730
+ )
731
+ elif solver_type == "taylor":
732
+ D1_0 = (1.0 / r1) * (model_s1 - model_s)
733
+ D1_1 = (1.0 / r2) * (model_s2 - model_s)
734
+ D1 = (r2 * D1_0 - r1 * D1_1) / (r2 - r1)
735
+ D2 = 2.0 * (D1_1 - D1_0) / (r2 - r1)
736
+ x_t = (
737
+ (sigma_t / sigma_s) * x
738
+ - (alpha_t * phi_1) * model_s
739
+ + (alpha_t * phi_2) * D1
740
+ - (alpha_t * phi_3) * D2
741
+ )
742
+ else:
743
+ phi_11 = torch.expm1(r1 * h)
744
+ phi_12 = torch.expm1(r2 * h)
745
+ phi_1 = torch.expm1(h)
746
+ phi_22 = torch.expm1(r2 * h) / (r2 * h) - 1.0
747
+ phi_2 = phi_1 / h - 1.0
748
+ phi_3 = phi_2 / h - 0.5
749
+
750
+ if model_s is None:
751
+ model_s = self.model_fn(x, s)
752
+ if model_s1 is None:
753
+ x_s1 = (torch.exp(log_alpha_s1 - log_alpha_s)) * x - (sigma_s1 * phi_11) * model_s
754
+ model_s1 = self.model_fn(x_s1, s1)
755
+ x_s2 = (
756
+ (torch.exp(log_alpha_s2 - log_alpha_s)) * x
757
+ - (sigma_s2 * phi_12) * model_s
758
+ - r2 / r1 * (sigma_s2 * phi_22) * (model_s1 - model_s)
759
+ )
760
+ model_s2 = self.model_fn(x_s2, s2)
761
+ if solver_type == "dpmsolver":
762
+ x_t = (
763
+ (torch.exp(log_alpha_t - log_alpha_s)) * x
764
+ - (sigma_t * phi_1) * model_s
765
+ - (1.0 / r2) * (sigma_t * phi_2) * (model_s2 - model_s)
766
+ )
767
+ elif solver_type == "taylor":
768
+ D1_0 = (1.0 / r1) * (model_s1 - model_s)
769
+ D1_1 = (1.0 / r2) * (model_s2 - model_s)
770
+ D1 = (r2 * D1_0 - r1 * D1_1) / (r2 - r1)
771
+ D2 = 2.0 * (D1_1 - D1_0) / (r2 - r1)
772
+ x_t = (
773
+ (torch.exp(log_alpha_t - log_alpha_s)) * x
774
+ - (sigma_t * phi_1) * model_s
775
+ - (sigma_t * phi_2) * D1
776
+ - (sigma_t * phi_3) * D2
777
+ )
778
+
779
+ if return_intermediate:
780
+ return x_t, {"model_s": model_s, "model_s1": model_s1, "model_s2": model_s2}
781
+ else:
782
+ return x_t
783
+
784
+ def multistep_dpm_solver_second_update(self, x, model_prev_list, t_prev_list, t, solver_type="dpmsolver"):
785
+ """
786
+ Multistep solver DPM-Solver-2 from time `t_prev_list[-1]` to time `t`.
787
+
788
+ Args:
789
+ x: A pytorch tensor. The initial value at time `t_prev_list[-1]`.
+ model_prev_list: A list of pytorch tensors. The previously computed model values.
+ t_prev_list: A list of pytorch tensors. The previous times; each has the shape (1,).
792
+ t: A pytorch tensor. The ending time, with the shape (1,).
793
+ solver_type: either 'dpmsolver' or 'taylor'. The type for the high-order solvers.
794
+ The type slightly impacts the performance. We recommend the 'dpmsolver' type.
795
+ Returns:
796
+ x_t: A pytorch tensor. The approximated solution at time `t`.
797
+ """
798
+ if solver_type not in ["dpmsolver", "taylor"]:
799
+ raise ValueError(f"'solver_type' must be either 'dpmsolver' or 'taylor', got {solver_type}")
800
+ ns = self.noise_schedule
801
+ model_prev_1, model_prev_0 = model_prev_list[-2], model_prev_list[-1]
802
+ t_prev_1, t_prev_0 = t_prev_list[-2], t_prev_list[-1]
803
+ lambda_prev_1, lambda_prev_0, lambda_t = (
804
+ ns.marginal_lambda(t_prev_1),
805
+ ns.marginal_lambda(t_prev_0),
806
+ ns.marginal_lambda(t),
807
+ )
808
+ log_alpha_prev_0, log_alpha_t = ns.marginal_log_mean_coeff(t_prev_0), ns.marginal_log_mean_coeff(t)
809
+ sigma_prev_0, sigma_t = ns.marginal_std(t_prev_0), ns.marginal_std(t)
810
+ alpha_t = torch.exp(log_alpha_t)
811
+
812
+ h_0 = lambda_prev_0 - lambda_prev_1
813
+ h = lambda_t - lambda_prev_0
814
+ r0 = h_0 / h
815
+ D1_0 = (1.0 / r0) * (model_prev_0 - model_prev_1)
816
+ if self.algorithm_type == "dpmsolver++":
817
+ phi_1 = torch.expm1(-h)
818
+ if solver_type == "dpmsolver":
819
+ x_t = (sigma_t / sigma_prev_0) * x - (alpha_t * phi_1) * model_prev_0 - 0.5 * (alpha_t * phi_1) * D1_0
820
+ elif solver_type == "taylor":
821
+ x_t = (
822
+ (sigma_t / sigma_prev_0) * x
823
+ - (alpha_t * phi_1) * model_prev_0
824
+ + (alpha_t * (phi_1 / h + 1.0)) * D1_0
825
+ )
826
+ else:
827
+ phi_1 = torch.expm1(h)
828
+ if solver_type == "dpmsolver":
829
+ x_t = (
830
+ (torch.exp(log_alpha_t - log_alpha_prev_0)) * x
831
+ - (sigma_t * phi_1) * model_prev_0
832
+ - 0.5 * (sigma_t * phi_1) * D1_0
833
+ )
834
+ elif solver_type == "taylor":
835
+ x_t = (
836
+ (torch.exp(log_alpha_t - log_alpha_prev_0)) * x
837
+ - (sigma_t * phi_1) * model_prev_0
838
+ - (sigma_t * (phi_1 / h - 1.0)) * D1_0
839
+ )
840
+ return x_t
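+
+ # D1_0 above rescales the difference of the two cached model outputs by r0 = h_0 / h, giving a
+ # finite-difference estimate of h * d(model)/d(lambda) at lambda_prev_0; reusing the cached
+ # evaluation is what keeps the multistep solver at one model call per step.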
841
+
842
+ def multistep_dpm_solver_third_update(self, x, model_prev_list, t_prev_list, t, solver_type="dpmsolver"):
843
+ """
844
+ Multistep solver DPM-Solver-3 from time `t_prev_list[-1]` to time `t`.
845
+
846
+ Args:
847
+ x: A pytorch tensor. The initial value at time `t_prev_list[-1]`.
+ model_prev_list: A list of pytorch tensors. The previously computed model values.
+ t_prev_list: A list of pytorch tensors. The previous times; each has the shape (1,).
850
+ t: A pytorch tensor. The ending time, with the shape (1,).
851
+ solver_type: either 'dpmsolver' or 'taylor'. The type for the high-order solvers.
852
+ The type slightly impacts the performance. We recommend the 'dpmsolver' type.
853
+ Returns:
854
+ x_t: A pytorch tensor. The approximated solution at time `t`.
855
+ """
856
+ ns = self.noise_schedule
857
+ model_prev_2, model_prev_1, model_prev_0 = model_prev_list
858
+ t_prev_2, t_prev_1, t_prev_0 = t_prev_list
859
+ lambda_prev_2, lambda_prev_1, lambda_prev_0, lambda_t = (
860
+ ns.marginal_lambda(t_prev_2),
861
+ ns.marginal_lambda(t_prev_1),
862
+ ns.marginal_lambda(t_prev_0),
863
+ ns.marginal_lambda(t),
864
+ )
865
+ log_alpha_prev_0, log_alpha_t = ns.marginal_log_mean_coeff(t_prev_0), ns.marginal_log_mean_coeff(t)
866
+ sigma_prev_0, sigma_t = ns.marginal_std(t_prev_0), ns.marginal_std(t)
867
+ alpha_t = torch.exp(log_alpha_t)
868
+
869
+ h_1 = lambda_prev_1 - lambda_prev_2
870
+ h_0 = lambda_prev_0 - lambda_prev_1
871
+ h = lambda_t - lambda_prev_0
872
+ r0, r1 = h_0 / h, h_1 / h
873
+ D1_0 = (1.0 / r0) * (model_prev_0 - model_prev_1)
874
+ D1_1 = (1.0 / r1) * (model_prev_1 - model_prev_2)
875
+ D1 = D1_0 + (r0 / (r0 + r1)) * (D1_0 - D1_1)
876
+ D2 = (1.0 / (r0 + r1)) * (D1_0 - D1_1)
877
+ if self.algorithm_type == "dpmsolver++":
878
+ phi_1 = torch.expm1(-h)
879
+ phi_2 = phi_1 / h + 1.0
880
+ phi_3 = phi_2 / h - 0.5
881
+ x_t = (
882
+ (sigma_t / sigma_prev_0) * x
883
+ - (alpha_t * phi_1) * model_prev_0
884
+ + (alpha_t * phi_2) * D1
885
+ - (alpha_t * phi_3) * D2
886
+ )
887
+ else:
888
+ phi_1 = torch.expm1(h)
889
+ phi_2 = phi_1 / h - 1.0
890
+ phi_3 = phi_2 / h - 0.5
891
+ x_t = (
892
+ (torch.exp(log_alpha_t - log_alpha_prev_0)) * x
893
+ - (sigma_t * phi_1) * model_prev_0
894
+ - (sigma_t * phi_2) * D1
895
+ - (sigma_t * phi_3) * D2
896
+ )
897
+ return x_t
898
+
899
+ def singlestep_dpm_solver_update(
900
+ self, x, s, t, order, return_intermediate=False, solver_type="dpmsolver", r1=None, r2=None
901
+ ):
902
+ """
903
+ Singlestep DPM-Solver with the order `order` from time `s` to time `t`.
904
+
905
+ Args:
906
+ x: A pytorch tensor. The initial value at time `s`.
907
+ s: A pytorch tensor. The starting time, with the shape (1,).
908
+ t: A pytorch tensor. The ending time, with the shape (1,).
909
+ order: An `int`. The order of DPM-Solver. We only support order == 1, 2, or 3.
910
+ return_intermediate: A `bool`. If true, also return the model value at time `s`, `s1` and `s2` (the intermediate times).
911
+ solver_type: either 'dpmsolver' or 'taylor'. The type for the high-order solvers.
912
+ The type slightly impacts the performance. We recommend the 'dpmsolver' type.
913
+ r1: A `float`. The hyperparameter of the second-order or third-order solver.
914
+ r2: A `float`. The hyperparameter of the third-order solver.
915
+ Returns:
916
+ x_t: A pytorch tensor. The approximated solution at time `t`.
917
+ """
918
+ if order == 1:
919
+ return self.dpm_solver_first_update(x, s, t, return_intermediate=return_intermediate)
920
+ elif order == 2:
921
+ return self.singlestep_dpm_solver_second_update(
922
+ x, s, t, return_intermediate=return_intermediate, solver_type=solver_type, r1=r1
923
+ )
924
+ elif order == 3:
925
+ return self.singlestep_dpm_solver_third_update(
926
+ x, s, t, return_intermediate=return_intermediate, solver_type=solver_type, r1=r1, r2=r2
927
+ )
928
+ else:
929
+ raise ValueError(f"Solver order must be 1 or 2 or 3, got {order}")
930
+
931
+ def multistep_dpm_solver_update(self, x, model_prev_list, t_prev_list, t, order, solver_type="dpmsolver"):
932
+ """
933
+ Multistep DPM-Solver with the order `order` from time `t_prev_list[-1]` to time `t`.
934
+
935
+ Args:
936
+ x: A pytorch tensor. The initial value at time `t_prev_list[-1]`.
+ model_prev_list: A list of pytorch tensors. The previously computed model values.
+ t_prev_list: A list of pytorch tensors. The previous times; each has the shape (1,).
939
+ t: A pytorch tensor. The ending time, with the shape (1,).
940
+ order: An `int`. The order of DPM-Solver. We only support order == 1, 2, or 3.
941
+ solver_type: either 'dpmsolver' or 'taylor'. The type for the high-order solvers.
942
+ The type slightly impacts the performance. We recommend the 'dpmsolver' type.
943
+ Returns:
944
+ x_t: A pytorch tensor. The approximated solution at time `t`.
945
+ """
946
+ if order == 1:
947
+ return self.dpm_solver_first_update(x, t_prev_list[-1], t, model_s=model_prev_list[-1])
948
+ elif order == 2:
949
+ return self.multistep_dpm_solver_second_update(x, model_prev_list, t_prev_list, t, solver_type=solver_type)
950
+ elif order == 3:
951
+ return self.multistep_dpm_solver_third_update(x, model_prev_list, t_prev_list, t, solver_type=solver_type)
952
+ else:
953
+ raise ValueError(f"Solver order must be 1 or 2 or 3, got {order}")
954
+
955
+ def dpm_solver_adaptive(
956
+ self, x, order, t_T, t_0, h_init=0.05, atol=0.0078, rtol=0.05, theta=0.9, t_err=1e-5, solver_type="dpmsolver"
957
+ ):
958
+ """
959
+ The adaptive step size solver based on singlestep DPM-Solver.
960
+
961
+ Args:
962
+ x: A pytorch tensor. The initial value at time `t_T`.
963
+ order: An `int`. The (higher) order of the solver. We only support order == 2 or 3.
964
+ t_T: A `float`. The starting time of the sampling (default is T).
965
+ t_0: A `float`. The ending time of the sampling (default is epsilon).
966
+ h_init: A `float`. The initial step size (for logSNR).
967
+ atol: A `float`. The absolute tolerance of the solver. For image data, the default setting is 0.0078, following [1].
968
+ rtol: A `float`. The relative tolerance of the solver. The default setting is 0.05.
969
+ theta: A `float`. The safety hyperparameter for adapting the step size. The default setting is 0.9, following [1].
970
+ t_err: A `float`. The tolerance for the time. We solve the diffusion ODE until the absolute error between the
971
+ current time and `t_0` is less than `t_err`. The default setting is 1e-5.
972
+ solver_type: either 'dpmsolver' or 'taylor'. The type for the high-order solvers.
973
+ The type slightly impacts the performance. We recommend the 'dpmsolver' type.
974
+ Returns:
975
+ x_0: A pytorch tensor. The approximated solution at time `t_0`.
976
+
977
+ [1] A. Jolicoeur-Martineau, K. Li, R. Piché-Taillefer, T. Kachman, and I. Mitliagkas, "Gotta go fast when generating data with score-based models," arXiv preprint arXiv:2105.14080, 2021.
978
+ """
979
+ ns = self.noise_schedule
980
+ s = t_T * torch.ones((1,)).to(x)
981
+ lambda_s = ns.marginal_lambda(s)
982
+ lambda_0 = ns.marginal_lambda(t_0 * torch.ones_like(s).to(x))
983
+ h = h_init * torch.ones_like(s).to(x)
984
+ x_prev = x
985
+ nfe = 0
986
+ if order == 2:
987
+ r1 = 0.5
988
+ lower_update = lambda x, s, t: self.dpm_solver_first_update(x, s, t, return_intermediate=True)
989
+ higher_update = lambda x, s, t, **kwargs: self.singlestep_dpm_solver_second_update(
990
+ x, s, t, r1=r1, solver_type=solver_type, **kwargs
991
+ )
992
+ elif order == 3:
993
+ r1, r2 = 1.0 / 3.0, 2.0 / 3.0
994
+ lower_update = lambda x, s, t: self.singlestep_dpm_solver_second_update(
995
+ x, s, t, r1=r1, return_intermediate=True, solver_type=solver_type
996
+ )
997
+ higher_update = lambda x, s, t, **kwargs: self.singlestep_dpm_solver_third_update(
998
+ x, s, t, r1=r1, r2=r2, solver_type=solver_type, **kwargs
999
+ )
1000
+ else:
1001
+ raise ValueError(f"For adaptive step size solver, order must be 2 or 3, got {order}")
1002
+ while torch.abs(s - t_0).mean() > t_err:
1003
+ t = ns.inverse_lambda(lambda_s + h)
1004
+ x_lower, lower_noise_kwargs = lower_update(x, s, t)
1005
+ x_higher = higher_update(x, s, t, **lower_noise_kwargs)
1006
+ delta = torch.max(torch.ones_like(x).to(x) * atol, rtol * torch.max(torch.abs(x_lower), torch.abs(x_prev)))
1007
+ norm_fn = lambda v: torch.sqrt(torch.square(v.reshape((v.shape[0], -1))).mean(dim=-1, keepdim=True))
1008
+ E = norm_fn((x_higher - x_lower) / delta).max()
1009
+ if torch.all(E <= 1.0):
1010
+ x = x_higher
1011
+ s = t
1012
+ x_prev = x_lower
1013
+ lambda_s = ns.marginal_lambda(s)
1014
+ h = torch.min(theta * h * torch.float_power(E, -1.0 / order).float(), lambda_0 - lambda_s)
1015
+ nfe += order
1016
+ print("adaptive solver nfe", nfe)
1017
+ return x
1018
+
1019
+ def add_noise(self, x, t, noise=None):
1020
+ """
1021
+ Compute the noised input xt = alpha_t * x + sigma_t * noise.
1022
+
1023
+ Args:
1024
+ x: A `torch.Tensor` with shape `(batch_size, *shape)`.
1025
+ t: A `torch.Tensor` with shape `(t_size,)`.
+ noise: An optional `torch.Tensor` with shape `(t_size, batch_size, *shape)`; if None, it is sampled from N(0, I).
+ Returns:
+ xt with shape `(t_size, batch_size, *shape)`, squeezed to `(batch_size, *shape)` when t_size == 1.
1028
+ """
1029
+ alpha_t, sigma_t = self.noise_schedule.marginal_alpha(t), self.noise_schedule.marginal_std(t)
1030
+ if noise is None:
1031
+ noise = torch.randn((t.shape[0], *x.shape), device=x.device)
1032
+ x = x.reshape((-1, *x.shape))
1033
+ xt = expand_dims(alpha_t, x.dim()) * x + expand_dims(sigma_t, x.dim()) * noise
1034
+ if t.shape[0] == 1:
1035
+ return xt.squeeze(0)
1036
+ else:
1037
+ return xt
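+
+ # Usage sketch: for x of shape (B, C, H, W) and t of shape (3,), add_noise returns a tensor of
+ # shape (3, B, C, H, W) holding x noised to the three times; for a single time (t_size == 1)
+ # the leading dimension is squeezed away.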
1038
+
1039
+ def inverse(
1040
+ self,
1041
+ x,
1042
+ steps=20,
1043
+ t_start=None,
1044
+ t_end=None,
1045
+ order=2,
1046
+ skip_type="time_uniform",
1047
+ method="multistep",
1048
+ lower_order_final=True,
1049
+ denoise_to_zero=False,
1050
+ solver_type="dpmsolver",
1051
+ atol=0.0078,
1052
+ rtol=0.05,
1053
+ return_intermediate=False,
1054
+ ):
1055
+ """
1056
+ Invert the sample `x` from time `t_start` to time `t_end` by DPM-Solver.
1057
+ For discrete-time DPMs, we use `t_start=1/N`, where `N` is the total time steps during training.
1058
+ """
1059
+ t_0 = 1.0 / self.noise_schedule.total_N if t_start is None else t_start
1060
+ t_T = self.noise_schedule.T if t_end is None else t_end
1061
+ assert (
1062
+ t_0 > 0 and t_T > 0
1063
+ ), "Time range needs to be greater than 0. For discrete-time DPMs, it needs to be in [1 / N, 1], where N is the length of betas array"
1064
+ return self.sample(
1065
+ x,
1066
+ steps=steps,
1067
+ t_start=t_0,
1068
+ t_end=t_T,
1069
+ order=order,
1070
+ skip_type=skip_type,
1071
+ method=method,
1072
+ lower_order_final=lower_order_final,
1073
+ denoise_to_zero=denoise_to_zero,
1074
+ solver_type=solver_type,
1075
+ atol=atol,
1076
+ rtol=rtol,
1077
+ return_intermediate=return_intermediate,
1078
+ )
1079
+
1080
+ def sample(
1081
+ self,
1082
+ x,
1083
+ steps=20,
1084
+ t_start=None,
1085
+ t_end=None,
1086
+ order=2,
1087
+ skip_type="time_uniform",
1088
+ method="multistep",
1089
+ lower_order_final=True,
1090
+ denoise_to_zero=False,
1091
+ solver_type="dpmsolver",
1092
+ atol=0.0078,
1093
+ rtol=0.05,
1094
+ return_intermediate=False,
1095
+ flow_shift=1.0,
1096
+ ):
1097
+ """
1098
+ Compute the sample at time `t_end` by DPM-Solver, given the initial `x` at time `t_start`.
1099
+
1100
+ =====================================================
1101
+
1102
+ We support the following algorithms for both noise prediction model and data prediction model:
1103
+ - 'singlestep':
1104
+ Singlestep DPM-Solver (i.e. "DPM-Solver-fast" in the paper), which combines different orders of singlestep DPM-Solver.
1105
+ We combine all the singlestep solvers with order <= `order` to use up all the function evaluations (steps).
1106
+ The total number of function evaluations (NFE) == `steps`.
1107
+ Given a fixed NFE == `steps`, the sampling procedure is:
1108
+ - If `order` == 1:
1109
+ - Denote K = steps. We use K steps of DPM-Solver-1 (i.e. DDIM).
1110
+ - If `order` == 2:
1111
+ - Denote K = (steps // 2) + (steps % 2). We take K intermediate time steps for sampling.
1112
+ - If steps % 2 == 0, we use K steps of singlestep DPM-Solver-2.
1113
+ - If steps % 2 == 1, we use (K - 1) steps of singlestep DPM-Solver-2 and 1 step of DPM-Solver-1.
1114
+ - If `order` == 3:
1115
+ - Denote K = (steps // 3 + 1). We take K intermediate time steps for sampling.
1116
+ - If steps % 3 == 0, we use (K - 2) steps of singlestep DPM-Solver-3, and 1 step of singlestep DPM-Solver-2 and 1 step of DPM-Solver-1.
1117
+ - If steps % 3 == 1, we use (K - 1) steps of singlestep DPM-Solver-3 and 1 step of DPM-Solver-1.
1118
+ - If steps % 3 == 2, we use (K - 1) steps of singlestep DPM-Solver-3 and 1 step of singlestep DPM-Solver-2.
1119
+ - 'multistep':
1120
+ Multistep DPM-Solver with the order of `order`. The total number of function evaluations (NFE) == `steps`.
1121
+ We initialize the first `order` values by lower order multistep solvers.
1122
+ Given a fixed NFE == `steps`, the sampling procedure is:
1123
+ Denote K = steps.
1124
+ - If `order` == 1:
1125
+ - We use K steps of DPM-Solver-1 (i.e. DDIM).
1126
+ - If `order` == 2:
1127
+ - We first use 1 step of DPM-Solver-1, then (K - 1) steps of multistep DPM-Solver-2.
+ - If `order` == 3:
+ - We first use 1 step of DPM-Solver-1, then 1 step of multistep DPM-Solver-2, then (K - 2) steps of multistep DPM-Solver-3.
1130
+ - 'singlestep_fixed':
1131
+ Fixed order singlestep DPM-Solver (i.e. DPM-Solver-1 or singlestep DPM-Solver-2 or singlestep DPM-Solver-3).
1132
+ We use singlestep DPM-Solver-`order` for `order`=1 or 2 or 3, with total [`steps` // `order`] * `order` NFE.
1133
+ - 'adaptive':
1134
+ Adaptive step size DPM-Solver (i.e. "DPM-Solver-12" and "DPM-Solver-23" in the paper).
1135
+ We ignore `steps` and use adaptive step size DPM-Solver with a higher order of `order`.
1136
+ You can adjust the absolute tolerance `atol` and the relative tolerance `rtol` to balance the computation
+ costs (NFE) and the sample quality.
1138
+ - If `order` == 2, we use DPM-Solver-12 which combines DPM-Solver-1 and singlestep DPM-Solver-2.
1139
+ - If `order` == 3, we use DPM-Solver-23 which combines singlestep DPM-Solver-2 and singlestep DPM-Solver-3.
1140
+
1141
+ =====================================================
1142
+
1143
+ Some advice on choosing the algorithm:
1144
+ - For **unconditional sampling** or **guided sampling with small guidance scale** by DPMs:
1145
+ Use singlestep DPM-Solver or DPM-Solver++ ("DPM-Solver-fast" in the paper) with `order = 3`.
1146
+ e.g., DPM-Solver:
1147
+ >>> dpm_solver = DPM_Solver(model_fn, noise_schedule, algorithm_type="dpmsolver")
1148
+ >>> x_sample = dpm_solver.sample(x, steps=steps, t_start=t_start, t_end=t_end, order=3,
1149
+ skip_type='time_uniform', method='singlestep')
1150
+ e.g., DPM-Solver++:
1151
+ >>> dpm_solver = DPM_Solver(model_fn, noise_schedule, algorithm_type="dpmsolver++")
1152
+ >>> x_sample = dpm_solver.sample(x, steps=steps, t_start=t_start, t_end=t_end, order=3,
1153
+ skip_type='time_uniform', method='singlestep')
1154
+ - For **guided sampling with large guidance scale** by DPMs:
1155
+ Use multistep DPM-Solver with `algorithm_type="dpmsolver++"` and `order = 2`.
1156
+ e.g.
1157
+ >>> dpm_solver = DPM_Solver(model_fn, noise_schedule, algorithm_type="dpmsolver++")
1158
+ >>> x_sample = dpm_solver.sample(x, steps=steps, t_start=t_start, t_end=t_end, order=2,
1159
+ skip_type='time_uniform', method='multistep')
1160
+
1161
+ We support three types of `skip_type`:
1162
+ - 'logSNR': uniform logSNR for the time steps. **Recommended for low-resolution images.**
+ - 'time_uniform': uniform time for the time steps. **Recommended for high-resolution images.**
1164
+ - 'time_quadratic': quadratic time for the time steps.
1165
+
1166
+ =====================================================
1167
+ Args:
1168
+ x: A pytorch tensor. The initial value at time `t_start`
1169
+ e.g. if `t_start` == T, then `x` is a sample from the standard normal distribution.
1170
+ steps: An `int`. The total number of function evaluations (NFE).
1171
+ t_start: A `float`. The starting time of the sampling.
1172
+ If `t_start` is None, we use self.noise_schedule.T (default is 1.0).
1173
+ t_end: A `float`. The ending time of the sampling.
1174
+ If `t_end` is None, we use 1. / self.noise_schedule.total_N.
1175
+ e.g. if total_N == 1000, we have `t_end` == 1e-3.
1176
+ For discrete-time DPMs:
1177
+ - We recommend `t_end` == 1. / self.noise_schedule.total_N.
1178
+ For continuous-time DPMs:
1179
+ - We recommend `t_end` == 1e-3 when `steps` <= 15; and `t_end` == 1e-4 when `steps` > 15.
1180
+ order: An `int`. The order of DPM-Solver.
1181
+ skip_type: A `str`. The type for the spacing of the time steps. 'time_uniform' or 'logSNR' or 'time_quadratic'.
1182
+ method: A `str`. The method for sampling. 'singlestep' or 'multistep' or 'singlestep_fixed' or 'adaptive'.
1183
+ denoise_to_zero: A `bool`. Whether to denoise to time 0 at the final step.
1184
+ Default is `False`. If `denoise_to_zero` is `True`, the total NFE is (`steps` + 1).
1185
+
1186
+ This trick was first proposed by DDPM (https://arxiv.org/abs/2006.11239) and
+ score_sde (https://arxiv.org/abs/2011.13456). It can improve the FID when sampling
+ diffusion models by diffusion SDEs for low-resolution images (such as CIFAR-10).
+ However, we observed that this trick does not matter for high-resolution images.
+ As it needs an additional NFE, we do not recommend it for high-resolution images.
1192
+ lower_order_final: A `bool`. Whether to use lower order solvers at the final steps.
1193
+ Only valid for `method=multistep` and `steps < 15`. We empirically find that
1194
+ this trick is a key to stabilizing the sampling by DPM-Solver with very few steps
1195
+ (especially for steps <= 10). So we recommend setting it to `True`.
1196
+ solver_type: A `str`. The Taylor expansion type for the solver, `dpmsolver` or `taylor`. We recommend `dpmsolver`.
1197
+ atol: A `float`. The absolute tolerance of the adaptive step size solver. Valid when `method` == 'adaptive'.
1198
+ rtol: A `float`. The relative tolerance of the adaptive step size solver. Valid when `method` == 'adaptive'.
1199
+ return_intermediate: A `bool`. Whether to save the xt at each step.
1200
+ When set to `True`, the method returns a tuple (x0, intermediates); when set to `False`, it returns only x0.
1201
+ Returns:
1202
+ x_end: A pytorch tensor. The approximated solution at time `t_end`.
1203
+
1204
+ """
1205
+ t_0 = 1.0 / self.noise_schedule.total_N if t_end is None else t_end
1206
+ t_T = self.noise_schedule.T if t_start is None else t_start
1207
+ assert (
1208
+ t_0 > 0 and t_T > 0
1209
+ ), "Time range needs to be greater than 0. For discrete-time DPMs, it needs to be in [1 / N, 1], where N is the length of betas array"
1210
+ if return_intermediate:
1211
+ assert method in [
1212
+ "multistep",
1213
+ "singlestep",
1214
+ "singlestep_fixed",
1215
+ ], "Cannot use adaptive solver when saving intermediate values"
1216
+ if self.correcting_xt_fn is not None:
1217
+ assert method in [
1218
+ "multistep",
1219
+ "singlestep",
1220
+ "singlestep_fixed",
1221
+ ], "Cannot use adaptive solver when correcting_xt_fn is not None"
1222
+ device = x.device
1223
+ intermediates = []
1224
+ with torch.no_grad():
1225
+ if method == "adaptive":
1226
+ x = self.dpm_solver_adaptive(
1227
+ x, order=order, t_T=t_T, t_0=t_0, atol=atol, rtol=rtol, solver_type=solver_type
1228
+ )
1229
+ elif method == "multistep":
1230
+ assert steps >= order
1231
+ timesteps = self.get_time_steps(
1232
+ skip_type=skip_type, t_T=t_T, t_0=t_0, N=steps, device=device, shift=flow_shift
1233
+ )
1234
+ assert timesteps.shape[0] - 1 == steps
1235
+ # Init the initial values.
1236
+ step = 0
1237
+ t = timesteps[step]
1238
+ t_prev_list = [t]
1239
+ model_prev_list = [self.model_fn(x, t)]
1240
+ if self.correcting_xt_fn is not None:
1241
+ x = self.correcting_xt_fn(x, t, step)
1242
+ if return_intermediate:
1243
+ intermediates.append(x)
1244
+ self.update_progress(step + 1, len(timesteps))
1245
+ # Init the first `order` values by lower order multistep DPM-Solver.
1246
+ for step in range(1, order):
1247
+ t = timesteps[step]
1248
+ x = self.multistep_dpm_solver_update(
1249
+ x, model_prev_list, t_prev_list, t, step, solver_type=solver_type
1250
+ )
1251
+ if self.correcting_xt_fn is not None:
1252
+ x = self.correcting_xt_fn(x, t, step)
1253
+ if return_intermediate:
1254
+ intermediates.append(x)
1255
+ t_prev_list.append(t)
1256
+ model_prev_list.append(self.model_fn(x, t))
1257
+ # update progress bar
1258
+ self.update_progress(step + 1, len(timesteps))
1259
+ # Compute the remaining values by `order`-th order multistep DPM-Solver.
1260
+ for step in tqdm(range(order, steps + 1), disable=os.getenv("DPM_TQDM", "False") == "True"):
1261
+ t = timesteps[step]
1262
+ # Originally the lower order was used only for steps < 10 (`if lower_order_final and steps < 10:`);
+ # always honoring `lower_order_final` was recommended by Shuchen Xue.
+ if lower_order_final:
1265
+ step_order = min(order, steps + 1 - step)
1266
+ else:
1267
+ step_order = order
1268
+ x = self.multistep_dpm_solver_update(
1269
+ x, model_prev_list, t_prev_list, t, step_order, solver_type=solver_type
1270
+ )
1271
+ if self.correcting_xt_fn is not None:
1272
+ x = self.correcting_xt_fn(x, t, step)
1273
+ if return_intermediate:
1274
+ intermediates.append(x)
1275
+ for i in range(order - 1):
1276
+ t_prev_list[i] = t_prev_list[i + 1]
1277
+ model_prev_list[i] = model_prev_list[i + 1]
1278
+ t_prev_list[-1] = t
1279
+ # We do not need to evaluate the final model value.
1280
+ if step < steps:
1281
+ model_prev_list[-1] = self.model_fn(x, t)
1282
+ # update progress bar
1283
+ self.update_progress(step + 1, len(timesteps))
1284
+ elif method in ["singlestep", "singlestep_fixed"]:
1285
+ if method == "singlestep":
1286
+ timesteps_outer, orders = self.get_orders_and_timesteps_for_singlestep_solver(
1287
+ steps=steps, order=order, skip_type=skip_type, t_T=t_T, t_0=t_0, device=device
1288
+ )
1289
+ elif method == "singlestep_fixed":
1290
+ K = steps // order
1291
+ orders = [
1292
+ order,
1293
+ ] * K
1294
+ timesteps_outer = self.get_time_steps(skip_type=skip_type, t_T=t_T, t_0=t_0, N=K, device=device)
1295
+ for step, order in enumerate(orders):
1296
+ s, t = timesteps_outer[step], timesteps_outer[step + 1]
1297
+ timesteps_inner = self.get_time_steps(
1298
+ skip_type=skip_type, t_T=s.item(), t_0=t.item(), N=order, device=device
1299
+ )
1300
+ lambda_inner = self.noise_schedule.marginal_lambda(timesteps_inner)
1301
+ h = lambda_inner[-1] - lambda_inner[0]
1302
+ r1 = None if order <= 1 else (lambda_inner[1] - lambda_inner[0]) / h
1303
+ r2 = None if order <= 2 else (lambda_inner[2] - lambda_inner[0]) / h
1304
+ x = self.singlestep_dpm_solver_update(x, s, t, order, solver_type=solver_type, r1=r1, r2=r2)
1305
+ if self.correcting_xt_fn is not None:
1306
+ x = self.correcting_xt_fn(x, t, step)
1307
+ if return_intermediate:
1308
+ intermediates.append(x)
1309
+ self.update_progress(step + 1, len(timesteps_outer))
1310
+ else:
1311
+ raise ValueError(f"Got wrong method {method}")
1312
+ if denoise_to_zero:
1313
+ t = torch.ones((1,)).to(device) * t_0
1314
+ x = self.denoise_to_zero_fn(x, t)
1315
+ if self.correcting_xt_fn is not None:
1316
+ x = self.correcting_xt_fn(x, t, step + 1)
1317
+ if return_intermediate:
1318
+ intermediates.append(x)
1319
+ if return_intermediate:
1320
+ return x, intermediates
1321
+ else:
1322
+ return x
1323
+
1324
+
1325
+ #############################################################
1326
+ # other utility functions
1327
+ #############################################################
1328
+
1329
+
1330
+ def interpolate_fn(x, xp, yp):
1331
+ """
1332
+ A piecewise linear function y = f(x), using xp and yp as keypoints.
1333
+ We implement f(x) in a differentiable way (i.e. applicable for autograd).
1334
+ The function f(x) is well-defined for all x. (For x beyond the bounds of xp, we use the two outermost keypoints of xp to define the linear function.)
1335
+
1336
+ Args:
1337
+ x: PyTorch tensor with shape [N, C], where N is the batch size, C is the number of channels (we use C = 1 for DPM-Solver).
1338
+ xp: PyTorch tensor with shape [C, K], where K is the number of keypoints.
1339
+ yp: PyTorch tensor with shape [C, K].
1340
+ Returns:
1341
+ The function values f(x), with shape [N, C].
1342
+ """
1343
+ N, K = x.shape[0], xp.shape[1]
1344
+ all_x = torch.cat([x.unsqueeze(2), xp.unsqueeze(0).repeat((N, 1, 1))], dim=2)
1345
+ sorted_all_x, x_indices = torch.sort(all_x, dim=2)
1346
+ x_idx = torch.argmin(x_indices, dim=2)
1347
+ cand_start_idx = x_idx - 1
1348
+ start_idx = torch.where(
1349
+ torch.eq(x_idx, 0),
1350
+ torch.tensor(1, device=x.device),
1351
+ torch.where(
1352
+ torch.eq(x_idx, K),
1353
+ torch.tensor(K - 2, device=x.device),
1354
+ cand_start_idx,
1355
+ ),
1356
+ )
1357
+ end_idx = torch.where(torch.eq(start_idx, cand_start_idx), start_idx + 2, start_idx + 1)
1358
+ start_x = torch.gather(sorted_all_x, dim=2, index=start_idx.unsqueeze(2)).squeeze(2)
1359
+ end_x = torch.gather(sorted_all_x, dim=2, index=end_idx.unsqueeze(2)).squeeze(2)
1360
+ start_idx2 = torch.where(
1361
+ torch.eq(x_idx, 0),
1362
+ torch.tensor(0, device=x.device),
1363
+ torch.where(
1364
+ torch.eq(x_idx, K),
1365
+ torch.tensor(K - 2, device=x.device),
1366
+ cand_start_idx,
1367
+ ),
1368
+ )
1369
+ y_positions_expanded = yp.unsqueeze(0).expand(N, -1, -1)
1370
+ start_y = torch.gather(y_positions_expanded, dim=2, index=start_idx2.unsqueeze(2)).squeeze(2)
1371
+ end_y = torch.gather(y_positions_expanded, dim=2, index=(start_idx2 + 1).unsqueeze(2)).squeeze(2)
1372
+ cand = start_y + (x - start_x) * (end_y - start_y) / (end_x - start_x)
1373
+ return cand
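+
+
+ # Example: with keypoints xp = [[0., 1.]] and yp = [[0., 2.]], interpolate_fn at x = [[0.5]]
+ # returns [[1.0]], and at x = [[2.0]] it extrapolates along the outermost segment to [[4.0]].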
1374
+
1375
+
1376
+ def expand_dims(v, dims):
1377
+ """
1378
+ Expand the tensor `v` to the dim `dims`.
1379
+
1380
+ Args:
1381
+ `v`: a PyTorch tensor with shape [N].
1382
+ `dims`: an `int`.
1383
+ Returns:
1384
+ a PyTorch tensor with shape [N, 1, 1, ..., 1] and the total dimension is `dims`.
1385
+ """
1386
+ return v[(...,) + (None,) * (dims - 1)]
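+
+
+ # Example: expand_dims(v, 4) for v of shape [N] returns a view of shape [N, 1, 1, 1], so a
+ # per-sample scalar such as alpha_t broadcasts against a batch of shape [N, C, H, W].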
transport/integrators.py ADDED
@@ -0,0 +1,122 @@
+ import torch as th
2
+ from torchdiffeq import odeint
3
+ from .utils import time_shift, get_lin_function
4
+
5
+ class sde:
6
+ """SDE solver class"""
7
+
8
+ def __init__(
9
+ self,
10
+ drift,
11
+ diffusion,
12
+ *,
13
+ t0,
14
+ t1,
15
+ num_steps,
16
+ sampler_type,
17
+ ):
18
+ assert t0 < t1, "SDE sampler has to be in forward time"
19
+
20
+ self.num_timesteps = num_steps
21
+ self.t = th.linspace(t0, t1, num_steps)
22
+ self.dt = self.t[1] - self.t[0]
23
+ self.drift = drift
24
+ self.diffusion = diffusion
25
+ self.sampler_type = sampler_type
26
+
27
+ def __Euler_Maruyama_step(self, x, mean_x, t, model, **model_kwargs):
28
+ w_cur = th.randn(x.size()).to(x)
29
+ t = th.ones(x.size(0)).to(x) * t
30
+ dw = w_cur * th.sqrt(self.dt)
31
+ drift = self.drift(x, t, model, **model_kwargs)
32
+ diffusion = self.diffusion(x, t)
33
+ mean_x = x + drift * self.dt
34
+ x = mean_x + th.sqrt(2 * diffusion) * dw
35
+ return x, mean_x
36
+
37
+ def __Heun_step(self, x, _, t, model, **model_kwargs):
38
+ w_cur = th.randn(x.size()).to(x)
39
+ dw = w_cur * th.sqrt(self.dt)
40
+ t_cur = th.ones(x.size(0)).to(x) * t
41
+ diffusion = self.diffusion(x, t_cur)
42
+ xhat = x + th.sqrt(2 * diffusion) * dw
43
+ K1 = self.drift(xhat, t_cur, model, **model_kwargs)
44
+ xp = xhat + self.dt * K1
45
+ K2 = self.drift(xp, t_cur + self.dt, model, **model_kwargs)
46
+ return (
47
+ xhat + 0.5 * self.dt * (K1 + K2),
48
+ xhat,
49
+ ) # at last time point we do not perform the heun step
50
+
51
+ def __forward_fn(self):
52
+ """TODO: generalize here by adding all private functions ending with steps to it"""
53
+ sampler_dict = {
54
+ "Euler": self.__Euler_Maruyama_step,
55
+ "Heun": self.__Heun_step,
56
+ }
57
+
58
+ try:
+ sampler = sampler_dict[self.sampler_type]
+ except KeyError:
+ raise NotImplementedError(f"Sampler type '{self.sampler_type}' is not implemented.")
62
+
63
+ return sampler
64
+
65
+ def sample(self, init, model, **model_kwargs):
66
+ """forward loop of sde"""
67
+ x = init
68
+ mean_x = init
69
+ samples = []
70
+ sampler = self.__forward_fn()
71
+ for ti in self.t[:-1]:
72
+ with th.no_grad():
73
+ x, mean_x = sampler(x, mean_x, ti, model, **model_kwargs)
74
+ samples.append(x)
75
+
76
+ return samples
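+
+ # Usage sketch (illustrative; `drift_fn`, `diffusion_fn` and `model` are placeholders supplied
+ # by the surrounding transport code):
+ #
+ # solver = sde(drift_fn, diffusion_fn, t0=0.0, t1=1.0, num_steps=50, sampler_type="Euler")
+ # trajectory = solver.sample(th.randn(8, 4, 32, 32), model)  # list of num_steps - 1 tensors
+ # x1 = trajectory[-1]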
77
+
78
+
79
+ class ode:
80
+ """ODE solver class"""
81
+
82
+ def __init__(
83
+ self,
84
+ drift,
85
+ *,
86
+ t0,
87
+ t1,
88
+ sampler_type,
89
+ num_steps,
90
+ atol,
91
+ rtol,
92
+ do_shift=False,
93
+ time_shifting_factor=None,
94
+ ):
95
+ assert t0 < t1, "ODE sampler has to be in forward time"
96
+
97
+ self.drift = drift
98
+ self.do_shift = do_shift
99
+ self.t = th.linspace(t0, t1, num_steps)
100
+ if time_shifting_factor:
101
+ self.t = self.t / (self.t + time_shifting_factor - time_shifting_factor * self.t)
102
+ self.atol = atol
103
+ self.rtol = rtol
104
+ self.sampler_type = sampler_type
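+
+ # The reparameterization t <- t / (t + k - k * t) with k == time_shifting_factor fixes the
+ # endpoints t == 0 and t == 1 while, for k > 1, concentrating the integration points near t == 0.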
105
+
106
+ def sample(self, x, model, **model_kwargs):
107
+ x = x.float()
108
+ device = x[0].device if isinstance(x, tuple) else x.device
109
+
110
+ def _fn(t, x):
111
+ t = th.ones(x[0].size(0)).to(device) * t if isinstance(x, tuple) else th.ones(x.size(0)).to(device) * t
112
+ model_output = self.drift(x, t, model, **model_kwargs).float()
113
+ return model_output
114
+
115
+ t = self.t.to(device)
116
+ if self.do_shift:
117
+ mu = get_lin_function(y1=0.5, y2=1.15)(x.shape[1])
118
+ t = time_shift(mu, 1.0, t)
119
+ atol = [self.atol] * len(x) if isinstance(x, tuple) else [self.atol]
120
+ rtol = [self.rtol] * len(x) if isinstance(x, tuple) else [self.rtol]
121
+ samples = odeint(_fn, x, t, method=self.sampler_type, atol=atol, rtol=rtol)
122
+ return samples
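A quick note on the `time_shifting_factor` warp in `ode.__init__` above (a minimal sketch; the factor value 3.0 is made up for illustration): the map t / (t + f - f*t) keeps the endpoints 0 and 1 fixed while pulling interior grid points toward 0, so more integration steps are spent near the start of the trajectory:

    import torch as th

    f = 3.0                            # hypothetical time_shifting_factor
    t = th.linspace(0.0, 1.0, 6)
    t_shifted = t / (t + f - f * t)    # same warp as in ode.__init__
    # tensor([0.0000, 0.0769, 0.1818, 0.3333, 0.5714, 1.0000])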
transport/path.py ADDED
@@ -0,0 +1,201 @@
+ import numpy as np
+ import torch as th
+
+
+ def expand_t_like_x(t, x):
+     """Function to reshape the time t to the broadcastable dimensions of x
+     Args:
+       t: [batch_dim,], time vector
+       x: [batch_dim,...], data point
+     """
+     dims = [1] * len(x[0].size())
+     t = t.view(t.size(0), *dims)
+     return t
+
+
+ #################### Coupling Plans ####################
+
+
+ class ICPlan:
+     """Linear Coupling Plan"""
+
+     def __init__(self, sigma=0.0):
+         self.sigma = sigma
+
+     def compute_alpha_t(self, t):
+         """Compute the data coefficient along the path"""
+         return t, 1
+
+     def compute_sigma_t(self, t):
+         """Compute the noise coefficient along the path"""
+         return 1 - t, -1
+
+     def compute_d_alpha_alpha_ratio_t(self, t):
+         """Compute the ratio between d_alpha and alpha"""
+         return 1 / t
+
+     def compute_drift(self, x, t):
+         """We always output the sde according to the score parametrization;"""
+         t = expand_t_like_x(t, x)
+         alpha_ratio = self.compute_d_alpha_alpha_ratio_t(t)
+         sigma_t, d_sigma_t = self.compute_sigma_t(t)
+         drift = alpha_ratio * x
+         diffusion = alpha_ratio * (sigma_t**2) - sigma_t * d_sigma_t
+
+         return -drift, diffusion
+
+     def compute_diffusion(self, x, t, form="constant", norm=1.0):
+         """Compute the diffusion term of the SDE
+         Args:
+           x: [batch_dim, ...], data point
+           t: [batch_dim,], time vector
+           form: str, form of the diffusion term
+           norm: float, norm of the diffusion term
+         """
+         t = expand_t_like_x(t, x)
+         choices = {
+             "constant": norm,
+             "SBDM": norm * self.compute_drift(x, t)[1],
+             "sigma": norm * self.compute_sigma_t(t)[0],
+             "linear": norm * (1 - t),
+             "decreasing": 0.25 * (norm * th.cos(np.pi * t) + 1) ** 2,
+             "increasing-decreasing": norm * th.sin(np.pi * t) ** 2,
+         }
+
+         try:
+             diffusion = choices[form]
+         except KeyError:
+             raise NotImplementedError(f"Diffusion form {form} not implemented")
+
+         return diffusion
+
+
+     def get_score_from_velocity(self, velocity, x, t):
+         """Wrapper function: transform a velocity prediction model to a score
+         Args:
+           velocity: [batch_dim, ...] shaped tensor; velocity model output
+           x: [batch_dim, ...] shaped tensor; x_t data point
+           t: [batch_dim,] time tensor
+         """
+         t = expand_t_like_x(t, x)
+         alpha_t, d_alpha_t = self.compute_alpha_t(t)
+         sigma_t, d_sigma_t = self.compute_sigma_t(t)
+         mean = x
+         reverse_alpha_ratio = alpha_t / d_alpha_t
+         var = sigma_t**2 - reverse_alpha_ratio * d_sigma_t * sigma_t
+         score = (reverse_alpha_ratio * velocity - mean) / var
+         return score
+
+     def get_noise_from_velocity(self, velocity, x, t):
+         """Wrapper function: transform a velocity prediction model to a denoiser
+         Args:
+           velocity: [batch_dim, ...] shaped tensor; velocity model output
+           x: [batch_dim, ...] shaped tensor; x_t data point
+           t: [batch_dim,] time tensor
+         """
+         t = expand_t_like_x(t, x)
+         alpha_t, d_alpha_t = self.compute_alpha_t(t)
+         sigma_t, d_sigma_t = self.compute_sigma_t(t)
+         mean = x
+         reverse_alpha_ratio = alpha_t / d_alpha_t
+         var = reverse_alpha_ratio * d_sigma_t - sigma_t
+         noise = (reverse_alpha_ratio * velocity - mean) / var
+         return noise
+
+     def get_velocity_from_score(self, score, x, t):
+         """Wrapper function: transform a score prediction model to a velocity
+         Args:
+           score: [batch_dim, ...] shaped tensor; score model output
+           x: [batch_dim, ...] shaped tensor; x_t data point
+           t: [batch_dim,] time tensor
+         """
+         t = expand_t_like_x(t, x)
+         drift, var = self.compute_drift(x, t)
+         velocity = var * score - drift
+         return velocity
+
+     def compute_mu_t(self, t, x0, x1):
+         """Compute the mean of the time-dependent density p_t"""
+         t = expand_t_like_x(t, x1)
+         alpha_t, _ = self.compute_alpha_t(t)
+         sigma_t, _ = self.compute_sigma_t(t)
+         if isinstance(x1, (list, tuple)):
+             return [alpha_t[i] * x1[i] + sigma_t[i] * x0[i] for i in range(len(x1))]
+         else:
+             return alpha_t * x1 + sigma_t * x0
+
+     def compute_xt(self, t, x0, x1):
+         """Sample xt from the time-dependent density p_t; rng is required"""
+         xt = self.compute_mu_t(t, x0, x1)
+         return xt
+
+     def compute_ut(self, t, x0, x1, xt):
+         """Compute the vector field corresponding to p_t"""
+         t = expand_t_like_x(t, x1)
+         _, d_alpha_t = self.compute_alpha_t(t)
+         _, d_sigma_t = self.compute_sigma_t(t)
+         if isinstance(x1, (list, tuple)):
+             return [d_alpha_t * x1[i] + d_sigma_t * x0[i] for i in range(len(x1))]
+         else:
+             return d_alpha_t * x1 + d_sigma_t * x0
+
+     def plan(self, t, x0, x1):
+         xt = self.compute_xt(t, x0, x1)
+         ut = self.compute_ut(t, x0, x1, xt)
+         return t, xt, ut
+
+
+ class VPCPlan(ICPlan):
+     """class for VP path flow matching"""
+
+     def __init__(self, sigma_min=0.1, sigma_max=20.0):
+         self.sigma_min = sigma_min
+         self.sigma_max = sigma_max
+         self.log_mean_coeff = (
+             lambda t: -0.25 * ((1 - t) ** 2) * (self.sigma_max - self.sigma_min) - 0.5 * (1 - t) * self.sigma_min
+         )
+         self.d_log_mean_coeff = lambda t: 0.5 * (1 - t) * (self.sigma_max - self.sigma_min) + 0.5 * self.sigma_min
+
+     def compute_alpha_t(self, t):
+         """Compute the coefficient of x1"""
+         alpha_t = self.log_mean_coeff(t)
+         alpha_t = th.exp(alpha_t)
+         d_alpha_t = alpha_t * self.d_log_mean_coeff(t)
+         return alpha_t, d_alpha_t
+
+     def compute_sigma_t(self, t):
+         """Compute the coefficient of x0"""
+         p_sigma_t = 2 * self.log_mean_coeff(t)
+         sigma_t = th.sqrt(1 - th.exp(p_sigma_t))
+         d_sigma_t = th.exp(p_sigma_t) * (2 * self.d_log_mean_coeff(t)) / (-2 * sigma_t)
+         return sigma_t, d_sigma_t
+
+     def compute_d_alpha_alpha_ratio_t(self, t):
+         """Special-purpose function for computing a numerically stable d_alpha_t / alpha_t"""
+         return self.d_log_mean_coeff(t)
+
+     def compute_drift(self, x, t):
+         """Compute the drift term of the SDE"""
+         t = expand_t_like_x(t, x)
+         beta_t = self.sigma_min + (1 - t) * (self.sigma_max - self.sigma_min)
+         return -0.5 * beta_t * x, beta_t / 2
+
+
+ class GVPCPlan(ICPlan):
+     def __init__(self, sigma=0.0):
+         super().__init__(sigma)
+
+     def compute_alpha_t(self, t):
+         """Compute the coefficient of x1"""
+         alpha_t = th.sin(t * np.pi / 2)
+         d_alpha_t = np.pi / 2 * th.cos(t * np.pi / 2)
+         return alpha_t, d_alpha_t
+
+     def compute_sigma_t(self, t):
+         """Compute the coefficient of x0"""
+         sigma_t = th.cos(t * np.pi / 2)
+         d_sigma_t = -np.pi / 2 * th.sin(t * np.pi / 2)
+         return sigma_t, d_sigma_t
+
+     def compute_d_alpha_alpha_ratio_t(self, t):
+         """Special-purpose function for computing a numerically stable d_alpha_t / alpha_t"""
+         return np.pi / (2 * th.tan(t * np.pi / 2))
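To make the coupling-plan interface concrete (a minimal sketch using only the classes above): for the linear plan, alpha_t = t and sigma_t = 1 - t, so the interpolant and its velocity come out in closed form:

    import torch as th

    plan = ICPlan()
    x1 = th.randn(4, 3)       # data batch
    x0 = th.randn_like(x1)    # noise batch
    t = th.rand(4)            # per-sample times in [0, 1]
    t, xt, ut = plan.plan(t, x0, x1)
    # xt == t * x1 + (1 - t) * x0  (t broadcast over the feature dim)
    # ut == x1 - x0                (the constant velocity of the straight path)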
transport/transport.py ADDED
@@ -0,0 +1,490 @@
+ import enum
+ import math
+ from typing import Callable
+
+ import numpy as np
+ import torch as th
+
+ from . import path
+ from .integrators import ode, sde
+ from .utils import mean_flat, expand_dims
+ from .dpm_solver import NoiseScheduleFlow, model_wrapper, DPM_Solver
+
+
+ class ModelType(enum.Enum):
+     """
+     Which type of output the model predicts.
+     """
+
+     NOISE = enum.auto()  # the model predicts epsilon
+     SCORE = enum.auto()  # the model predicts \nabla \log p(x)
+     VELOCITY = enum.auto()  # the model predicts v(x)
+
+
+ class PathType(enum.Enum):
+     """
+     Which type of path to use.
+     """
+
+     LINEAR = enum.auto()
+     GVP = enum.auto()
+     VP = enum.auto()
+
+
+ class WeightType(enum.Enum):
+     """
+     Which type of weighting to use.
+     """
+
+     NONE = enum.auto()
+     VELOCITY = enum.auto()
+     LIKELIHOOD = enum.auto()
+
+
+ class Transport:
+     def __init__(self, *, model_type, path_type, loss_type, train_eps, sample_eps, snr_type, do_shift, seq_len):
+         path_options = {
+             PathType.LINEAR: path.ICPlan,
+             PathType.GVP: path.GVPCPlan,
+             PathType.VP: path.VPCPlan,
+         }
+
+         self.loss_type = loss_type
+         self.model_type = model_type
+         self.path_sampler = path_options[path_type]()
+         self.train_eps = train_eps
+         self.sample_eps = sample_eps
+
+         self.snr_type = snr_type
+         self.do_shift = do_shift
+         self.seq_len = seq_len
+
+     def prior_logp(self, z):
+         """
+         Standard multivariate normal prior
+         Assume z is batched
+         """
+         shape = th.tensor(z.size())
+         N = th.prod(shape[1:])
+         _fn = lambda x: -N / 2.0 * np.log(2 * np.pi) - th.sum(x**2) / 2.0
+         return th.vmap(_fn)(z)
+
+     def check_interval(
+         self,
+         train_eps,
+         sample_eps,
+         *,
+         diffusion_form="SBDM",
+         sde=False,
+         reverse=False,
+         eval=False,
+         last_step_size=0.0,
+     ):
+         t0 = 0
+         t1 = 1
+         eps = train_eps if not eval else sample_eps
+         if type(self.path_sampler) in [path.VPCPlan]:
+             t1 = 1 - eps if (not sde or last_step_size == 0) else 1 - last_step_size
+
+         elif (type(self.path_sampler) in [path.ICPlan, path.GVPCPlan]) and (
+             self.model_type != ModelType.VELOCITY or sde
+         ):  # avoid numerical issue by taking a first semi-implicit step
+             t0 = eps if (diffusion_form == "SBDM" and sde) or self.model_type != ModelType.VELOCITY else 0
+             t1 = 1 - eps if (not sde or last_step_size == 0) else 1 - last_step_size
+
+         if reverse:
+             t0, t1 = 1 - t0, 1 - t1
+
+         return t0, t1
+
+     def sample(self, x1):
+         """Sampling x0 & t based on the shape of x1 (if needed)
+         Args:
+           x1 - data point; [batch, *dim]
+         """
+         if isinstance(x1, (list, tuple)):
+             x0 = [th.randn_like(img_start) for img_start in x1]
+         else:
+             x0 = th.randn_like(x1)
+         t0, t1 = self.check_interval(self.train_eps, self.sample_eps)
+
+         if self.snr_type.startswith("uniform"):
+             assert t0 == 0.0 and t1 == 1.0, "not implemented."
+             if "_" in self.snr_type:
+                 _, t0, t1 = self.snr_type.split("_")
+                 t0, t1 = float(t0), float(t1)
+             t = th.rand((len(x1),)) * (t1 - t0) + t0
+         elif self.snr_type == "lognorm":
+             u = th.normal(mean=0.0, std=1.0, size=(len(x1),))
+             t = 1 / (1 + th.exp(-u)) * (t1 - t0) + t0
+         else:
+             raise NotImplementedError("Not implemented snr_type %s" % self.snr_type)
+
+         if self.do_shift:
+             base_shift: float = 0.5
+             max_shift: float = 1.15
+             mu = self.get_lin_function(y1=base_shift, y2=max_shift)(self.seq_len)
+             t = self.time_shift(mu, 1.0, t)
+         t = t.to(x1[0])
+         return t, x0, x1
+
+     def time_shift(self, mu: float, sigma: float, t: th.Tensor):
+         # the following implementation was originally written for t=0: clean / t=1: noise;
+         # since we adopt the reverse convention, the 1 - t operations are needed
+         t = 1 - t
+         t = math.exp(mu) / (math.exp(mu) + (1 / t - 1) ** sigma)
+         t = 1 - t
+         return t
+
+     def get_lin_function(
+         self, x1: float = 256, y1: float = 0.5, x2: float = 4096, y2: float = 1.15
+     ) -> Callable[[float], float]:
+         m = (y2 - y1) / (x2 - x1)
+         b = y1 - m * x1
+         return lambda x: m * x + b
+
+     def training_losses(self, model, x1, model_kwargs=None):
+         """Loss for training the score model
+         Args:
+           - model: backbone model; could be score, noise, or velocity
+           - x1: datapoint
+           - model_kwargs: additional arguments for the model
+         """
+         if model_kwargs is None:
+             model_kwargs = {}
+         t, x0, x1 = self.sample(x1)
+         t, xt, ut = self.path_sampler.plan(t, x0, x1)
+         if "cond" in model_kwargs:
+             conds = model_kwargs.pop("cond")
+             xt = [th.cat([x, cond], dim=0) if cond is not None else x for x, cond in zip(xt, conds)]
+         model_output = model(xt, t, **model_kwargs)
+         B = len(x0)
+
+         terms = {}
+         # terms['pred'] = model_output
+         if self.model_type == ModelType.VELOCITY:
+             if isinstance(x1, (list, tuple)):
+                 assert len(model_output) == len(ut) == len(x1)
+                 for i in range(B):
+                     assert (
+                         model_output[i].shape == ut[i].shape == x1[i].shape
+                     ), f"{model_output[i].shape} {ut[i].shape} {x1[i].shape}"
+                 terms["task_loss"] = th.stack(
+                     [((ut[i] - model_output[i]) ** 2).mean() for i in range(B)],
+                     dim=0,
+                 )
+             else:
+                 terms["task_loss"] = mean_flat(((model_output - ut) ** 2))
+         else:
+             raise NotImplementedError
+
+         terms["loss"] = terms["task_loss"]
+         terms["task_loss"] = terms["task_loss"].clone().detach()
+         terms["t"] = t
+         return terms
+
+     def get_drift(self):
+         """member function for obtaining the drift of the probability flow ODE"""
+
+         def score_ode(x, t, model, **model_kwargs):
+             drift_mean, drift_var = self.path_sampler.compute_drift(x, t)
+             model_output = model(x, t, **model_kwargs)
+             return -drift_mean + drift_var * model_output  # by change of variable
+
+         def noise_ode(x, t, model, **model_kwargs):
+             drift_mean, drift_var = self.path_sampler.compute_drift(x, t)
+             sigma_t, _ = self.path_sampler.compute_sigma_t(path.expand_t_like_x(t, x))
+             model_output = model(x, t, **model_kwargs)
+             score = model_output / -sigma_t
+             return -drift_mean + drift_var * score
+
+         def velocity_ode(x, t, model, **model_kwargs):
+             model_output = model(x, t, **model_kwargs)
+             return model_output
+
+         if self.model_type == ModelType.NOISE:
+             drift_fn = noise_ode
+         elif self.model_type == ModelType.SCORE:
+             drift_fn = score_ode
+         else:
+             drift_fn = velocity_ode
+
+         def body_fn(x, t, model, **model_kwargs):
+             model_output = drift_fn(x, t, model, **model_kwargs)
+             assert model_output.shape == x.shape, "Output shape from ODE solver must match input shape"
+             return model_output
+
+         return body_fn
+
+     def get_score(
+         self,
+     ):
+         """member function for obtaining the score of
+         x_t = alpha_t * x + sigma_t * eps"""
+         if self.model_type == ModelType.NOISE:
+             score_fn = (
+                 lambda x, t, model, **kwargs: model(x, t, **kwargs)
+                 / -self.path_sampler.compute_sigma_t(path.expand_t_like_x(t, x))[0]
+             )
+         elif self.model_type == ModelType.SCORE:
+             score_fn = lambda x, t, model, **kwargs: model(x, t, **kwargs)
+         elif self.model_type == ModelType.VELOCITY:
+             score_fn = lambda x, t, model, **kwargs: self.path_sampler.get_score_from_velocity(
+                 model(x, t, **kwargs), x, t
+             )
+         else:
+             raise NotImplementedError()
+
+         return score_fn
+
+
+ class Sampler:
+     """Sampler class for the transport model"""
+
+     def __init__(
+         self,
+         transport,
+     ):
+         """Constructor for a general sampler; supporting different sampling methods
+         Args:
+           - transport: a transport object specifying the model prediction & interpolant type
+         """
+
+         self.transport = transport
+         self.drift = self.transport.get_drift()
+         self.score = self.transport.get_score()
+
+     def __get_sde_diffusion_and_drift(
+         self,
+         *,
+         diffusion_form="SBDM",
+         diffusion_norm=1.0,
+     ):
+         def diffusion_fn(x, t):
+             diffusion = self.transport.path_sampler.compute_diffusion(x, t, form=diffusion_form, norm=diffusion_norm)
+             return diffusion
+
+         sde_drift = lambda x, t, model, **kwargs: self.drift(x, t, model, **kwargs) + diffusion_fn(x, t) * self.score(
+             x, t, model, **kwargs
+         )
+
+         sde_diffusion = diffusion_fn
+
+         return sde_drift, sde_diffusion
+
+     def __get_last_step(
+         self,
+         sde_drift,
+         *,
+         last_step,
+         last_step_size,
+     ):
+         """Get the last step function of the SDE solver"""
+
+         if last_step is None:
+             last_step_fn = lambda x, t, model, **model_kwargs: x
+         elif last_step == "Mean":
+             last_step_fn = (
+                 lambda x, t, model, **model_kwargs: x + sde_drift(x, t, model, **model_kwargs) * last_step_size
+             )
+         elif last_step == "Tweedie":
+             alpha = self.transport.path_sampler.compute_alpha_t  # simple aliasing; the original name was too long
+             sigma = self.transport.path_sampler.compute_sigma_t
+             last_step_fn = lambda x, t, model, **model_kwargs: x / alpha(t)[0][0] + (
+                 sigma(t)[0][0] ** 2
+             ) / alpha(t)[0][0] * self.score(x, t, model, **model_kwargs)
+         elif last_step == "Euler":
+             last_step_fn = (
+                 lambda x, t, model, **model_kwargs: x + self.drift(x, t, model, **model_kwargs) * last_step_size
+             )
+         else:
+             raise NotImplementedError()
+
+         return last_step_fn
+
+     def sample_sde(
+         self,
+         *,
+         sampling_method="Euler",
+         diffusion_form="SBDM",
+         diffusion_norm=1.0,
+         last_step="Mean",
+         last_step_size=0.04,
+         num_steps=250,
+     ):
+         """returns a sampling function with the given SDE settings
+         Args:
+           - sampling_method: type of sampler used in solving the SDE; defaults to Euler-Maruyama
+           - diffusion_form: functional form of the diffusion coefficient; defaults to matching SBDM
+           - diffusion_norm: magnitude of the diffusion coefficient; defaults to 1
+           - last_step: type of the last step; defaults to identity
+           - last_step_size: size of the last step; defaults to match the stride of 250 steps over [0,1]
+           - num_steps: total number of integration steps for the SDE
+         """
+
+         if last_step is None:
+             last_step_size = 0.0
+
+         sde_drift, sde_diffusion = self.__get_sde_diffusion_and_drift(
+             diffusion_form=diffusion_form,
+             diffusion_norm=diffusion_norm,
+         )
+
+         t0, t1 = self.transport.check_interval(
+             self.transport.train_eps,
+             self.transport.sample_eps,
+             diffusion_form=diffusion_form,
+             sde=True,
+             eval=True,
+             reverse=False,
+             last_step_size=last_step_size,
+         )
+
+         _sde = sde(
+             sde_drift,
+             sde_diffusion,
+             t0=t0,
+             t1=t1,
+             num_steps=num_steps,
+             sampler_type=sampling_method,
+         )
+
+         last_step_fn = self.__get_last_step(sde_drift, last_step=last_step, last_step_size=last_step_size)
+
+         def _sample(init, model, **model_kwargs):
+             xs = _sde.sample(init, model, **model_kwargs)
+             ts = th.ones(init.size(0), device=init.device) * t1
+             x = last_step_fn(xs[-1], ts, model, **model_kwargs)
+             xs.append(x)
+
+             assert len(xs) == num_steps, "Number of samples does not match the number of steps"
+
+             return xs
+
+         return _sample
+
+     def sample_dpm(
+         self,
+         model,
+         model_kwargs=None,
+     ):
+
+         noise_schedule = NoiseScheduleFlow(schedule="discrete_flow")
+
+         def noise_pred_fn(x, t_continuous):
+             output = model(x, 1 - t_continuous, **model_kwargs)
+             _, sigma_t = noise_schedule.marginal_alpha(t_continuous), noise_schedule.marginal_std(t_continuous)
+             try:
+                 noise = x - (1 - expand_dims(sigma_t, x.dim()).to(x)) * output
+             except Exception:  # the model may return a tuple; fall back to its first element
+                 noise = x - (1 - expand_dims(sigma_t, x.dim()).to(x)) * output[0]
+             return noise
+
+         return DPM_Solver(noise_pred_fn, noise_schedule, algorithm_type="dpmsolver++").sample
+
+
+     def sample_ode(
+         self,
+         *,
+         sampling_method="dopri5",
+         num_steps=50,
+         atol=1e-6,
+         rtol=1e-3,
+         reverse=False,
+         do_shift=False,
+         time_shifting_factor=None,
+     ):
+         """returns a sampling function with the given ODE settings
+         Args:
+           - sampling_method: type of sampler used in solving the ODE; defaults to dopri5
+           - num_steps:
+             - fixed solver (Euler, Heun): the actual number of integration steps performed
+             - adaptive solver (Dopri5): the number of datapoints saved during integration; produced by interpolation
+           - atol: absolute error tolerance for the solver
+           - rtol: relative error tolerance for the solver
+         """
+
+         # for flux
+         drift = lambda x, t, model, **kwargs: self.drift(x, t, model, **kwargs)
+
+         t0, t1 = self.transport.check_interval(
+             self.transport.train_eps,
+             self.transport.sample_eps,
+             sde=False,
+             eval=True,
+             reverse=reverse,
+             last_step_size=0.0,
+         )
+
+         _ode = ode(
+             drift=drift,
+             t0=t0,
+             t1=t1,
+             sampler_type=sampling_method,
+             num_steps=num_steps,
+             atol=atol,
+             rtol=rtol,
+             do_shift=do_shift,
+             time_shifting_factor=time_shifting_factor,
+         )
+
+         return _ode.sample
+
+     def sample_ode_likelihood(
+         self,
+         *,
+         sampling_method="dopri5",
+         num_steps=50,
+         atol=1e-6,
+         rtol=1e-3,
+     ):
+         """returns a sampling function for calculating likelihood with the given ODE settings
+         Args:
+           - sampling_method: type of sampler used in solving the ODE; defaults to dopri5
+           - num_steps:
+             - fixed solver (Euler, Heun): the actual number of integration steps performed
+             - adaptive solver (Dopri5): the number of datapoints saved during integration; produced by interpolation
+           - atol: absolute error tolerance for the solver
+           - rtol: relative error tolerance for the solver
+         """
+
+         def _likelihood_drift(x, t, model, **model_kwargs):
+             x, _ = x
+             # Rademacher noise for the Hutchinson divergence estimate
+             eps = th.randint(2, x.size(), dtype=th.float, device=x.device) * 2 - 1
+             t = th.ones_like(t) * (1 - t)
+             with th.enable_grad():
+                 x.requires_grad = True
+                 grad = th.autograd.grad(th.sum(self.drift(x, t, model, **model_kwargs) * eps), x)[0]
+                 logp_grad = th.sum(grad * eps, dim=tuple(range(1, len(x.size()))))
+                 drift = self.drift(x, t, model, **model_kwargs)
+             return (-drift, logp_grad)
+
+         t0, t1 = self.transport.check_interval(
+             self.transport.train_eps,
+             self.transport.sample_eps,
+             sde=False,
+             eval=True,
+             reverse=False,
+             last_step_size=0.0,
+         )
+
+         _ode = ode(
+             drift=_likelihood_drift,
+             t0=t0,
+             t1=t1,
+             sampler_type=sampling_method,
+             num_steps=num_steps,
+             atol=atol,
+             rtol=rtol,
+         )
+
+         def _sample_fn(x, model, **model_kwargs):
+             init_logp = th.zeros(x.size(0)).to(x)
+             input = (x, init_logp)
+             drift, delta_logp = _ode.sample(input, model, **model_kwargs)
+             drift, delta_logp = drift[-1], delta_logp[-1]
+             prior_logp = self.transport.prior_logp(drift)
+             logp = prior_logp - delta_logp
+             return logp, drift
+
+         return _sample_fn
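Putting the pieces together (a hypothetical wiring sketch, not an invocation taken from this repo; the argument values are illustrative):

    from transport.transport import Transport, Sampler, ModelType, PathType, WeightType

    transport = Transport(
        model_type=ModelType.VELOCITY, path_type=PathType.LINEAR,
        loss_type=WeightType.NONE, train_eps=0.0, sample_eps=0.0,
        snr_type="uniform", do_shift=False, seq_len=256,
    )
    sampler = Sampler(transport)
    sample_fn = sampler.sample_ode(sampling_method="euler", num_steps=50)
    # xs = sample_fn(noise, model)  # noise: [B, ...]; model(x, t, **kwargs) -> velocity field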
transport/utils.py ADDED
@@ -0,0 +1,56 @@
+ import torch as th
+ import math
+
+ class EasyDict:
+     def __init__(self, sub_dict):
+         for k, v in sub_dict.items():
+             setattr(self, k, v)
+
+     def __getitem__(self, key):
+         return getattr(self, key)
+
+
+ def mean_flat(x):
+     """
+     Take the mean over all non-batch dimensions.
+     """
+     return th.mean(x, dim=list(range(1, len(x.size()))))
+
+
+ def log_state(state):
+     result = []
+
+     sorted_state = dict(sorted(state.items()))
+     for key, value in sorted_state.items():
+         # Check if the value is an instance of a class
+         if "<object" in str(value) or "object at" in str(value):
+             result.append(f"{key}: [{value.__class__.__name__}]")
+         else:
+             result.append(f"{key}: {value}")
+
+     return "\n".join(result)
+
+ def time_shift(mu: float, sigma: float, t: th.Tensor):
+     # the following implementation was originally written for t=0: clean / t=1: noise;
+     # since we adopt the reverse convention, the 1 - t operations are needed
+     t = 1 - t
+     t = math.exp(mu) / (math.exp(mu) + (1 / t - 1) ** sigma)
+     t = 1 - t
+     return t
+
+ def get_lin_function(x1: float = 256, y1: float = 0.5, x2: float = 4096, y2: float = 1.15):
+     m = (y2 - y1) / (x2 - x1)
+     b = y1 - m * x1
+     return lambda x: m * x + b
+
+ def expand_dims(v, dims):
+     """
+     Expand the tensor `v` to the dim `dims`.
+
+     Args:
+         `v`: a PyTorch tensor with shape [N].
+         `dims`: an `int`.
+     Returns:
+         a PyTorch tensor with shape [N, 1, 1, ..., 1] whose total number of dimensions is `dims`.
+     """
+     return v[(...,) + (None,) * (dims - 1)]
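As a usage sketch (the sequence length 1024 is an arbitrary example), `get_lin_function` maps a sequence length to the shift parameter mu, and `time_shift` then warps the sampling times so that, under this repo's convention (t=0: noise, t=1: clean), timesteps are pulled toward the noisy end:

    import torch as th

    mu = get_lin_function(y1=0.5, y2=1.15)(1024)  # ≈ 0.63 for 1024 tokens
    t = th.linspace(0.1, 0.9, 5)
    print(time_shift(mu, 1.0, t))  # every t moves toward 0 (the noise end); e.g. 0.5 -> ≈0.35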
util/misc.py ADDED
@@ -0,0 +1,150 @@
+ from collections import defaultdict, deque
+ import datetime
+ import logging
+ import random
+ import time
+
+ import numpy as np
+ import torch
+ import torch.distributed as dist
+
+ logger = logging.getLogger(__name__)
+
+
+ def random_seed(seed=0):
+     random.seed(seed)
+     torch.random.manual_seed(seed)
+     np.random.seed(seed)
+
+
+ class SmoothedValue(object):
+     """Track a series of values and provide access to smoothed values over a
+     window or the global series average.
+     """
+
+     def __init__(self, window_size=1000, fmt=None):
+         if fmt is None:
+             fmt = "{avg:.4f} ({global_avg:.4f})"
+         self.deque = deque(maxlen=window_size)
+         self.total = 0.0
+         self.count = 0
+         self.fmt = fmt
+
+     def update(self, value, n=1):
+         self.deque.append(value)
+         self.count += n
+         self.total += value * n
+
+     def synchronize_between_processes(self):
+         """
+         Warning: does not synchronize the deque!
+         """
+         t = torch.tensor([self.count, self.total], dtype=torch.float64, device="cuda")
+         dist.barrier()
+         dist.all_reduce(t)
+         t = t.tolist()
+         self.count = int(t[0])
+         self.total = t[1]
+
+     @property
+     def median(self):
+         d = torch.tensor(list(self.deque))
+         return d.median().item()
+
+     @property
+     def avg(self):
+         d = torch.tensor(list(self.deque), dtype=torch.float32)
+         return d.mean().item()
+
+     @property
+     def global_avg(self):
+         return self.total / self.count
+
+     @property
+     def max(self):
+         return max(self.deque)
+
+     @property
+     def value(self):
+         return self.deque[-1]
+
+     def __str__(self):
+         return self.fmt.format(
+             median=self.median, avg=self.avg, global_avg=self.global_avg, max=self.max, value=self.value
+         )
+
+
+ class MetricLogger(object):
+     def __init__(self, delimiter="\t", window_size=1000, fmt=None):
+         self.meters = defaultdict(lambda: SmoothedValue(window_size, fmt))
+         self.delimiter = delimiter
+
+     def update(self, **kwargs):
+         for k, v in kwargs.items():
+             if v is None:
+                 continue
+             elif isinstance(v, (torch.Tensor, float, int)):
+                 self.meters[k].update(v.item() if isinstance(v, torch.Tensor) else v)
+             elif isinstance(v, list):
+                 for i, sub_v in enumerate(v):
+                     self.meters[f"{k}_{i}"].update(sub_v.item() if isinstance(sub_v, torch.Tensor) else sub_v)
+             elif isinstance(v, dict):
+                 for sub_key, sub_v in v.items():
+                     self.meters[f"{k}_{sub_key}"].update(sub_v.item() if isinstance(sub_v, torch.Tensor) else sub_v)
+             else:
+                 raise TypeError(f"Unsupported type {type(v)} for metric {k}")
+
+     def __str__(self):
+         loss_str = []
+         for name, meter in self.meters.items():
+             loss_str.append("{}: {}".format(name, str(meter)))
+         return self.delimiter.join(loss_str)
+
+     def synchronize_between_processes(self):
+         for meter in self.meters.values():
+             meter.synchronize_between_processes()
+
+     def add_meter(self, name, meter):
+         self.meters[name] = meter
+
+     def log_every(self, iterable, print_freq, header=None, start_iter=0, samples_per_iter=None):
+         i = start_iter
+         if not header:
+             header = ""
+         start_time = time.time()
+         end = time.time()
+         iter_time = SmoothedValue(fmt="{avg:.4f}")
+         data_time = SmoothedValue(fmt="{avg:.4f}")
+         log_msg = [header, "[{0}/{1}]", "{meters}", "time: {time}", "data: {data}"]
+         if samples_per_iter is not None:
+             log_msg.append("samples/sec: {samples_per_sec:.2f}")
+         if torch.cuda.is_available():
+             log_msg.append("max mem: {memory:.0f}")
+         log_msg = self.delimiter.join(log_msg)
+         MB = 1024.0 * 1024.0
+         for obj in iterable:
+             data_time.update(time.time() - end)
+             yield obj
+             iter_time.update(time.time() - end)
+             if i % print_freq == 0:
+                 try:
+                     total_len = len(iterable)
+                 except TypeError:  # e.g. a generator without __len__
+                     total_len = "unknown"
+
+                 msg_kwargs = {
+                     "meters": str(self),
+                     "time": str(iter_time),
+                     "data": str(data_time),
+                 }
+                 if samples_per_iter is not None:
+                     msg_kwargs["samples_per_sec"] = samples_per_iter / iter_time.avg
+                 if torch.cuda.is_available():
+                     msg_kwargs["memory"] = torch.cuda.max_memory_allocated() / MB
+
+                 logger.info(log_msg.format(i, total_len, **msg_kwargs))
+             i += 1
+             end = time.time()
+         total_time = time.time() - start_time
+         total_time_str = str(datetime.timedelta(seconds=int(total_time)))
+         logger.info("{} Total time: {}".format(header, total_time_str))
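Finally, a minimal usage sketch for the logging utilities (the loss tensor here is a stand-in for a real training loss):

    import logging
    import torch

    logging.basicConfig(level=logging.INFO)

    metric_logger = MetricLogger(delimiter="  ")
    for step in metric_logger.log_every(range(100), print_freq=10, header="Epoch [0]"):
        loss = torch.rand(())            # stand-in for a real training loss
        metric_logger.update(loss=loss)  # .item() is taken internally for tensors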