birdfsd_yolov5.preprocessing package
Submodules
birdfsd_yolov5.preprocessing.add_bg_images module
- add_bg_images(background_label: str, output_dir: str = 'dataset-YOLO', bg_imgs_dir_name: str = 'bg_images', pct: int = 10, seed: int = 8) None[source]
Adds a given percentage (pct) of the background images to the dataset.
- Parameters
output_dir (str) – The dataset directory (the output of JSON2YOLO.run).
bg_imgs_dir_name (str) – The background images output directory name.
pct (int) – Percentage of background images to keep.
seed (int) – Seed to initialize the random number generator.
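The interaction between pct and seed can be illustrated with a minimal sketch. It is not the package's implementation: the helper name sample_background_images and the use of a sorted list of file names are assumptions made only to show how a seeded generator yields the same subset on every run.

```python
import random


def sample_background_images(image_paths, pct=10, seed=8):
    """Return a reproducible sample of roughly `pct` percent of `image_paths`.

    Hypothetical helper for illustration; not part of birdfsd_yolov5.
    """
    if not 0 <= pct <= 100:
        raise ValueError("pct must be between 0 and 100")
    # Sort first so the result does not depend on the input order.
    paths = sorted(image_paths)
    keep = round(len(paths) * pct / 100)
    # A local generator keeps the global random state untouched.
    rng = random.Random(seed)
    return sorted(rng.sample(paths, keep))
```

Calling the helper twice with the same seed returns the same file names, which is the property a fixed seed is meant to provide.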
birdfsd_yolov5.preprocessing.construct_dataset module
- construct_dataset(input_tasks: list) None[source]
Constructs a summarized dataset in Apache Parquet format.
This function gets all the tasks from MongoDB and sends them to the Ray cluster for processing. It then waits for the results and saves them to a Parquet file.
- Parameters
input_tasks (list) – A list of all the tasks in the database.
- get_all_tasks_from_mongodb() list[source]
Gets all the tasks from MongoDB.
- Returns
A list of all the tasks in the database.
- Return type
list
- simplify(task: dict) Optional[dict][source]
Creates a dict object out of the most important keys in a task.
This function takes a task from the original database and simplifies it to a format that is easier to work with.
- Parameters
task (dict) – A task from the original database.
- Returns
A simplified task.
- Return type
dict
birdfsd_yolov5.preprocessing.json2yolov5 module
- exception FailedToParseImageURL[source]
Bases: Exception
Exception raised when image URL is not valid.
- class JSON2YOLO(output_dir: str = 'dataset-YOLO', projects: Optional[str] = None, copy_data_from: Optional[str] = None, filter_rare_classes: Optional[str] = None, get_tasks_with_api: bool = False, force_update: bool = False, background_label: str = 'no animal', upload_dataset: bool = False, excluded_labels: Optional[Union[list, str]] = None, seed: int = 8, overwrite: bool = False, verbose: bool = False, imgs_dir_name: str = 'ls_images', labels_dir_name: str = 'ls_labels')[source]
Bases: object
Converts the output of a Label-studio project to a YOLO dataset.
The output is a folder with the following structure:
dataset-YOLO/
├── bar.jpg
├── classes.json
├── classes.txt
├── hist.jpg
├── images/
│   ├── train/
│   └── val/
├── labels/
│   ├── train/
│   └── val/
├── notes.json
└── tasks.json
The output will also be stored in a tarball with the same name as the output folder.
Tasks that fail to export for any reason are logged at the end of the run.
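As a quick sanity check after a run, the layout above can be verified with a short script. This is a sketch under assumptions: the helper name missing_entries is hypothetical, and it checks only the directories and the JSON/text files from the listing, leaving the two plot images out.

```python
from pathlib import Path

# Entries taken from the directory listing above; the plots are not checked.
EXPECTED_DIRS = ["images/train", "images/val", "labels/train", "labels/val"]
EXPECTED_FILES = ["classes.json", "classes.txt", "notes.json", "tasks.json"]


def missing_entries(dataset_dir):
    """Return the expected directories and files that are absent from dataset_dir.

    Hypothetical helper for illustration; not part of birdfsd_yolov5.
    """
    root = Path(dataset_dir)
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing
```

An empty list means every checked entry is present.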
- Parameters
output_dir (str) – The path to the output directory.
projects (str) – The project to export.
copy_data_from (str) – The path to a folder containing the dataset.
filter_rare_classes (str) – The number of instances of a class to keep. If set to ‘median’, the median of the class count will be used. If set to ‘mean’, the mean of the class count will be used.
get_tasks_with_api (bool) – If set to True, the tasks will be fetched from the Label-studio API.
force_update (bool) – If set to True, the dataset will be updated even if it already exists.
background_label (str) – The label to use for the background.
upload_dataset (bool) – If set to True, the dataset will be uploaded to the Label-studio API.
excluded_labels (list) – A list of labels to exclude from the dataset.
seed (int) – The seed for the random number generator.
overwrite (bool) – If set to True, the dataset will be overwritten if exists.
verbose (bool) – If set to True, more information will be logged.
imgs_dir_name (str) – The name of the images’ folder.
labels_dir_name (str) – The name of the labels’ folder.
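One plausible reading of filter_rare_classes is sketched below. It rests on assumptions: that the value acts as a minimum instance count, that classes below it are dropped, and that a numeric value arrives as a string because the parameter is typed as str. The helper names are hypothetical and the package may behave differently.

```python
from collections import Counter
from statistics import mean, median


def rare_class_threshold(class_counts, filter_rare_classes):
    """Resolve 'median', 'mean', or a numeric string to a count threshold."""
    if filter_rare_classes == "median":
        return median(class_counts.values())
    if filter_rare_classes == "mean":
        return mean(class_counts.values())
    return int(filter_rare_classes)


def keep_frequent_classes(labels, filter_rare_classes):
    """Return the class names whose instance count reaches the threshold.

    Assumption: classes with fewer instances than the threshold are dropped.
    """
    counts = Counter(labels)
    threshold = rare_class_threshold(counts, filter_rare_classes)
    return sorted(name for name, n in counts.items() if n >= threshold)
```

With class counts of 10, 4, and 1, the median threshold is 4 and the mean threshold is 5, so 'median' keeps two classes and 'mean' keeps one.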
- static bbox_ls_to_yolo(x: float, y: float, width: float, height: float) tuple[source]
From label-studio’s xywh to yolov5’s xywh.
Converts a bounding box from the format used by Label Studio to the format used by YOLOv5.
- Parameters
x – The x coordinate of the top left corner of the bounding box.
y – The y coordinate of the top left corner of the bounding box.
width – The width of the bounding box.
height – The height of the bounding box.
- Returns
A tuple containing the x, y, width and height of the bounding box in the format used by YOLOv5.
- Return type
tuple
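The conversion can be sketched as follows. It assumes the usual conventions: Label Studio stores rectangle x, y, width and height as percentages (0 to 100) of the image size with x, y at the top left corner, while YOLOv5 expects the box center plus width and height normalized to the range 0 to 1. The function name ls_bbox_to_yolo is hypothetical, and the actual method may differ in details such as rounding or clamping.

```python
def ls_bbox_to_yolo(x, y, width, height):
    """Convert a percent-based top-left xywh box to normalized center xywh.

    Illustrative only; assumes all inputs are percentages in the range 0..100.
    """
    # Shift from the top left corner to the box center, then scale to 0..1.
    x_center = (x + width / 2) / 100
    y_center = (y + height / 2) / 100
    return x_center, y_center, width / 100, height / 100
```

For example, a box at x=10, y=20 with width=40 and height=50 has its center at 30% and 45% of the image, so the normalized center is approximately (0.3, 0.45).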
- convert_to_yolo(task: dict) Optional[List[Any]][source]
Convert the task to YOLO format.
- Parameters
task (dict) – The task to be converted.
- Returns
A tuple with a list of the labels in the task and a list of background image path if the task is labeled as a background image.
- Return type
Optional[Tuple[list, list]]
- Raises
FailedToParseImageURL – If the image URL is not valid.
TypeError – If the image URL is not valid.
- download_image(task: dict, cur_img_path: str, img_url: str) Optional[bool][source]
Downloads the image from the URL.
- Parameters
task (dict) – A dictionary containing the task data.
cur_img_path (str) – The path to which the image will be written.
img_url (str) – The URL of the image.
- Returns
True if the image was downloaded successfully.
- Return type
Optional[bool]
- get_assets_info(task: dict) tuple[source]
This function is used to get assets info from a task.
- Parameters
task – A single task.
- get_data() list[source]
This function is used to get data from the database.
- Returns
A list of data.
- Return type
list
- get_excluded_labels()[source]
Get the excluded labels.
- Returns
The excluded labels.
- Return type
list
- plot_results(results: list) None[source]
Plots the results of the classification.
- Parameters
results (list) – The results of the classification.
- run() None[source]
Runs the preprocessing pipeline.
This method runs the main preprocessing pipeline and converts the data to the YOLOv5 format.
- Raises
BucketDoesNotExist – If the dataset S3 bucket does not exist.
FailedToParseImageURL – If the image URL is not valid.