S0ix Sleep and Hibernate on Meteor Lake, NVIDIA and the ThinkPad P1 Gen 7

Getting modern sleep (S0ix / “Modern Standby”) working on a Lenovo ThinkPad P1 Gen 7 (21KV) with Intel Meteor Lake and an NVIDIA RTX 4060 has been a significant undertaking. This post documents the hardware, the problems encountered, every fix applied, and the remaining blockers — in the hope it saves someone else the same multi-day debugging session.

Hardware

  • Laptop: Lenovo ThinkPad P1 Gen 7 (21KV0025UK)
  • CPU: Intel Core Ultra 7 165H (Meteor Lake)
  • GPU: NVIDIA RTX 4060 Max-Q Mobile (AD107M) — PRIME offload only, the Intel iGPU drives the display
  • OS: Fedora 43, kernel 6.19.x

The Problem: S0ix Does Not Work (Out of the Box)

Meteor Lake dropped legacy S3 sleep entirely. The only suspend option is S0ix (also called s2idle / Modern Standby), which requires every single device and subsystem to reach a low-power state. If any component stays awake, the CPU package cannot enter its deepest idle state and the laptop drains battery as if it were still running.

On this machine, over 35 PMC substate requirements were unmet out of the box. The root cause traces to the IOE die (PMC1) failing to power-gate, which appears to be a CSME firmware issue. The behaviour is even boot-dependent — some boots partially work while others do not.
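The unmet requirements can be inspected through pmc_core's debugfs interface (root required; the exact layout varies by kernel version, and on multi-PMC parts like Meteor Lake there may be one requirements file per die). A defensive sketch:

```shell
# Dump every substate_requirements file that pmc_core exposes; falls back
# to a message when debugfs is unavailable (non-root, or CONFIG_DEBUG_FS off).
show_substate_requirements() {
  local found=0 f
  for f in /sys/kernel/debug/pmc_core/substate_requirements \
           /sys/kernel/debug/pmc_core/*/substate_requirements; do
    if [ -r "$f" ]; then
      echo "== $f"
      cat "$f"
      found=1
    fi
  done
  [ "$found" -eq 1 ] || echo "pmc_core debugfs not readable (need root and CONFIG_DEBUG_FS)"
}

show_substate_requirements
```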

Kernel Parameters

The following kernel parameters were added to /etc/default/grub on the GRUB_CMDLINE_LINUX line:

pcie_aspm=force acpi.ec_no_wakeup=1 nmi_watchdog=0 snd_hda_intel.power_save=1
  • pcie_aspm=force — Force Active State Power Management on PCIe devices that don’t advertise support. Required because several Meteor Lake root ports won’t enter L1 otherwise.
  • acpi.ec_no_wakeup=1 — Prevent the Embedded Controller from generating spurious wakeups. ThinkPads are notorious for this.
  • nmi_watchdog=0 — Disable the NMI watchdog. It prevents the CPU from entering deep C-states.
  • snd_hda_intel.power_save=1 — Enable power saving on the HDA audio codec (1 second timeout).
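After a suspend/resume cycle, the residency counter pmc_core exposes at /sys/kernel/debug/pmc_core/slp_s0_residency_usec (root required) tells you whether the package actually reached S0ix: the counter only advances while the SoC is in S0ix. A minimal checker, shown here with example readings rather than live hardware values:

```shell
# Compare two readings of slp_s0_residency_usec taken before and after
# a suspend cycle; any increase means the SoC spent time in S0ix.
check_s0ix() {
  if [ "$2" -gt "$1" ]; then
    echo "S0ix reached (+$(( $2 - $1 )) us)"
  else
    echo "S0ix NOT reached"
  fi
}

# Example readings in microseconds; on real hardware take them from:
#   cat /sys/kernel/debug/pmc_core/slp_s0_residency_usec
check_s0ix 1200000 61200000
check_s0ix 1200000 1200000
```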

ACPI Wake Sources

Several ACPI wake sources were causing immediate or spurious wake from suspend. A systemd service was created to disable them at boot:

# /etc/systemd/system/disable-wakeup-sources.service
# Disables: RP12, TXHC, TDM0, TRP0, TRP1

[Unit]
Description=Disable problematic ACPI wake sources
After=multi-user.target

[Service]
Type=oneshot
ExecStart=/bin/bash -c 'for src in RP12 TXHC TDM0 TRP0 TRP1; do echo $src > /proc/acpi/wakeup || true; done'

[Install]
WantedBy=multi-user.target

These sources relate to PCIe root ports and Thunderbolt controllers that would wake the machine immediately after suspend or on any USB-C cable event.
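One subtlety: writing a device name to /proc/acpi/wakeup toggles its state rather than forcing it off, so the unit above is only safe when the sources start enabled. A sketch of filtering for currently-enabled sources first, demonstrated against a sample of the file's column format:

```shell
# Print only the targeted wake sources that are currently enabled, so a
# subsequent write toggles them off instead of accidentally back on.
enabled_sources() {
  awk '$1 ~ /^(RP12|TXHC|TDM0|TRP0|TRP1)$/ && /\*enabled/ { print $1 }' "$1"
}

# Demonstrate against a sample in /proc/acpi/wakeup's format
cat > /tmp/wakeup.sample <<'EOF'
Device  S-state   Status   Sysfs node
RP12      S4    *enabled   pci:0000:00:1d.0
TXHC      S4    *disabled  pci:0000:00:0d.0
TDM0      S4    *enabled   pci:0000:00:0d.2
EOF

enabled_sources /tmp/wakeup.sample
```

On the real system the loop becomes: `for src in $(enabled_sources /proc/acpi/wakeup); do echo "$src" > /proc/acpi/wakeup; done`.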

PMC LTR (Latency Tolerance Reporting) Overrides

Several devices were reporting LTR values that prevented the SoC from entering S0ix. These were overridden by writing to the PMC LTR ignore registers:

# LTR ignore indices:
# 1  = SOUTHPORT_B
# 3  = GBE (Gigabit Ethernet)
# 6  = ME (Management Engine)
# 8  = SOUTHPORT_C
# 25 = IOE_PMC

The GBE and ME entries are particularly important — the Intel Management Engine’s LTR blocks S0ix on virtually every Meteor Lake system unless ignored.
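The pmc_core debugfs file accepts one index per write; a dry-run wrapper over the indices listed above (set APPLY=1 to actually write, which needs root):

```shell
# Sketch: feed the LTR ignore indices to pmc_core. Dry-run by default so
# the list can be previewed without root or debugfs access.
apply_ltr_ignore() {
  local target=/sys/kernel/debug/pmc_core/ltr_ignore idx
  for idx in "$@"; do
    if [ "${APPLY:-0}" = "1" ] && [ -w "$target" ]; then
      echo "$idx" > "$target"
    else
      echo "would write $idx to $target"
    fi
  done
}

apply_ltr_ignore 1 3 6 8 25  # SOUTHPORT_B, GBE, ME, SOUTHPORT_C, IOE_PMC
```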

USB Subsystem

Two USB devices were holding the XHCI controller in D0, blocking package C-state entry:

  • The fingerprint reader
  • The Bluetooth adapter

Both were resolved by enabling USB autosuspend for all devices. Runtime PM was also explicitly enabled on the Thunderbolt host controller.
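The exact mechanism isn't shown above; a udev rule is one common way to enable autosuspend for every USB device as it appears (filename and rule are a sketch):

```
# /etc/udev/rules.d/50-usb-autosuspend.rules (hypothetical path)
ACTION=="add", SUBSYSTEM=="usb", TEST=="power/control", ATTR{power/control}="auto"
```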

BIOS: Disable Intel AMT

Intel Active Management Technology (AMT) was enabled by default in the BIOS. This caused three PMC blockers:

  • KVMCC — KVM Controller for remote management
  • USBR0 — USB Redirection
  • SUSRAM — Suspend RAM used by AMT

Disabling AMT in the BIOS (Config → Network → Intel AMT) immediately cleared all three.

NVIDIA GPU Power Management

The NVIDIA GPU required extensive configuration to stop it from blocking low-power states:

Services Disabled

  • nvidia-powerd — Was causing the GPU to cycle between D3cold and D0 repeatedly
  • nvidia-persistenced — Kept a handle on the GPU, preventing it from entering D3cold

D3cold Enabled

Runtime power management and D3cold (full power off) were enabled for the NVIDIA GPU via /etc/modprobe.d/nvidia-pm.conf.
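The file's contents aren't reproduced here; on the proprietary driver the documented module option for runtime D3 is NVreg_DynamicPowerManagement (0x02 selects fine-grained power control), so a plausible sketch is:

```
# /etc/modprobe.d/nvidia-pm.conf (sketch; actual contents not shown above)
options nvidia NVreg_DynamicPowerManagement=0x02
```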

Proprietary Driver with GSP Firmware Disabled

The open-source nvidia-open kernel module was replaced with the proprietary NVIDIA driver, and GSP (GPU System Processor) firmware was disabled:

options nvidia NVreg_EnableGpuFirmware=0

This was necessary because the GSP firmware interfered with proper suspend/resume cycling on this GPU generation.

CUDA Isolation for Background Services

Any background service that might poke the GPU (e.g., monitoring tools, bots) was configured with:

Environment=CUDA_VISIBLE_DEVICES=

This prevents them from initialising the CUDA runtime and keeping the GPU in D0.
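As a systemd drop-in for a hypothetical background unit, that looks like:

```ini
# /etc/systemd/system/gpu-monitor.service.d/no-cuda.conf (unit name hypothetical)
[Service]
Environment=CUDA_VISIBLE_DEVICES=
```

With the variable set to an empty value, the CUDA runtime reports no usable devices, so the service never wakes the GPU out of D3cold.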

Other Fixes

  • intel-lpmd — Intel’s Low Power Mode Daemon was installed to help manage CPU power states on Meteor Lake.
  • SELinux — A custom /data partition had incorrect SELinux contexts, which was blocking systemd-logind from writing hibernate state. Relabelling the partition fixed it.

Hibernate: Broken

Hibernate (suspend-to-disk) does not work on this hardware combination and is unlikely to work any time soon.

The kernel panics during the hibernate image write phase: the NVIDIA driver's freeze callback (invoked via pci_pm_freeze) either fails outright or panics the machine. This was tested with:

  • The open-source nvidia-open driver
  • The proprietary NVIDIA driver
  • GSP firmware on and off
  • Writing via /sys/power/state (sysfs) directly
  • Skipping VT switch (no chvt)

All combinations result in a panic. This is a known upstream bug tracked at NVIDIA/open-gpu-kernel-modules Issue #922.

Do not attempt hibernate on this platform with NVIDIA drivers loaded.

What Still Blocks S0ix

Even after all the above fixes, S0ix is not fully achieved. The remaining blockers are in the PMC and relate to PLLs and fabric that cannot be controlled from userspace:

PMC1 (IOE Die)

  • SBR0–4 (Sideband Router instances)
  • FABRIC_PLL, TMU_PLL, BCLK_PLL, D2D_PLL, G5FPW PLLs, REF_PLL
  • VNN_SOC_REQ_STS

PMC0 (SoC Die)

  • LPSS — I2C4 stuck in D0 (likely the touchpad or touchscreen controller)
  • FABRIC_PLL, IOE_COND_MET
  • ITSS_CLK_SRC_REQ_STS
  • LSX_Wake4–7

These are firmware-level blockers. The IOE die (PMC1) issue in particular appears to require a CSME firmware update from Lenovo.

Things to Avoid

  • Do not restart systemd-logind — It will kill your entire GNOME session.
  • Do not use sleep hooks that run modprobe -r nvidia — gnome-shell holds nvidia_drm open; the unload will hang.
  • Do not use rtcwake -m mem — It bypasses the NVIDIA systemd suspend services and will likely result in a broken resume.
  • Do not attempt hibernate — Kernel panic (see above).

Configuration Files Modified

For reference, the following files were created or modified during this process:

  • /etc/default/grub — Kernel parameters
  • /etc/modprobe.d/nvidia-pm.conf — NVIDIA power management options
  • /etc/modprobe.d/nvidia-installer-disable-nouveau.conf — Nouveau blacklist
  • /etc/systemd/logind.conf.d/lid-suspend-then-hibernate.conf — Lid action configuration
  • /etc/systemd/sleep.conf — Sleep/hibernate configuration
  • /etc/systemd/system/disable-wakeup-sources.service — ACPI wake source disabling
  • /usr/lib/systemd/system-sleep/ollama-nvidia-hibernate.sh — Pre/post sleep hooks

Conclusion

Modern Standby on Meteor Lake with NVIDIA discrete graphics is a minefield. While many of the userspace-controllable blockers can be resolved, the core IOE die power-gating issue and the NVIDIA hibernate panic are firmware bugs that no amount of kernel tuning can fix. If you’re buying a Meteor Lake workstation for Linux, check the NVIDIA hibernate bug status and your vendor’s CSME firmware changelog before committing.



Successfully Redeploying the Feeditout Service with Ansible

After a long journey of iteration, troubleshooting, and learning, I’m excited to share that I’ve successfully redeployed the Feeditout service using Ansible.

This wasn’t just a redeployment — it was a full re-architecture of how the system is provisioned, secured, monitored, and maintained. I went deep into infrastructure-as-code territory and came out the other side with a more robust, modular, and maintainable setup than ever before.

Lessons from My Ansible Journey

At the heart of this process was Ansible — and it’s fair to say I’ve come a long way in mastering it. What began as a handful of playbooks quickly evolved into a library of roles, reusable tasks, and templated configuration files.

I focused heavily on idempotency, readability, and separation of concerns. Along the way, I developed a strong preference for minimal inline logic and clean, descriptive variable names. I also became comfortable enforcing good practices, like avoiding the default item as a loop variable and steering clear of block statements unless genuinely needed.
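For instance, Ansible's loop_control lets you replace the default item variable with a descriptive name; a hypothetical task (package names illustrative):

```yaml
# Hypothetical task: a named loop variable instead of the default `item`
- name: Ensure monitoring packages are present
  ansible.builtin.package:
    name: "{{ monitoring_pkg }}"
    state: present
  loop:
    - prometheus
    - grafana
  loop_control:
    loop_var: monitoring_pkg
```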

Roles I Wrote

Here’s a snapshot of the roles I built and used during this process — each one crafted with purpose:

  • aide
  • alert_manager
  • ansible_pull
  • apache2
  • apparmor
  • apt
  • auditd
  • base_packages
  • certbot
  • chkrootkit
  • chuckbot
  • clamav
  • clean
  • cockpit
  • cron
  • dns
  • entropy
  • fail2ban
  • fail2counter
  • grafana
  • grub
  • hostname
  • iptables
  • kernel
  • keyboard
  • locale
  • logrotate
  • logwatch
  • memcached
  • motd
  • mysql
  • network_manager
  • node_exporter
  • ntp
  • opendkim
  • opendmarc
  • pam
  • passwd
  • php_fpm
  • postfix
  • postsrsd
  • prometheus
  • rclone
  • redis
  • root_password
  • rsyslog
  • saslauthd
  • services
  • spamassassin
  • sshd
  • sudo
  • swap
  • wayland

From security hardening (auditd, chkrootkit, aide, fail2ban) to service monitoring (grafana, prometheus, alert_manager), mail stack configuration (postfix, opendkim, opendmarc, postsrsd, saslauthd), and even custom integrations like chuckbot, every role played a part.

Each role encapsulates everything needed to configure a specific service — packages, configuration files, systemd services, and sensible defaults — while remaining fully overrideable via host_vars.
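The pattern is that roles ship defaults and hosts override only what differs; the variable names below are hypothetical, but the shape of a host_vars override file is:

```yaml
# host_vars/feeditout.yml (variable names are illustrative)
sshd_permit_root_login: "no"
fail2ban_bantime: 3600
certbot_email: admin@feeditout.example
```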

The Payoff

Feeditout is now:

  • Secure by default with automated auditing, logging, and spam controls.
  • Monitored with a complete Prometheus + Grafana setup and alert routing.
  • Configured from scratch using a fully automated Ansible repo.
  • Easier to maintain, extend, and recover from disaster.

Most importantly, I now have confidence in my infrastructure, because it’s reproducible and self-documented through code.

What’s Next?

Now that the foundation is solid, I’ll be iterating on:

  • Self-healing features (auto-restart, watchdogs),
  • Zero-downtime deployments,
  • Better observability dashboards,
  • Maybe even a public Git repo or guide for others to use and learn from.

If you’re thinking about doing something similar — take the plunge. It’s a challenge, but you’ll learn more about your systems and tools than you ever could from reading docs alone.


Exciting Updates in Jira Creator – Version 1.0.3 Released!

Hey there, fellow developers! I’m thrilled to share the latest updates from our project, Jira Creator, with the release of version 1.0.3. This update brings a variety of changes, from minor tweaks to significant improvements, all aimed at enhancing your experience with our tool.

What’s Changed?

In this release, we made several changes across various files, focusing on improving the overall structure and functionality of the tool. Here’s a quick rundown of what’s new:

  • Updated the environment variable names for clarity and consistency.
  • Refactored the code to improve readability and maintainability.
  • Fixed bugs related to AI provider integration and issue management.
  • Bumped the version number from 0.0.43 to 1.0.3, marking a significant milestone.

How Does This Change the Project?

This update is more than just a version bump. By renaming environment variables to include the prefix “JIRA_”, we’ve made it easier for users to understand what each variable relates to. This small change can help reduce confusion and improve the setup process.
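For example, the tool's configuration now groups under one prefix. JIRA_AI_API_KEY is one of the renamed variables (it appears in the project's companion scripts); the second name below is illustrative:

```shell
# Set the prefixed variables; JIRA_AI_MODEL here is a hypothetical example.
export JIRA_AI_API_KEY="example-key"
export JIRA_AI_MODEL="gpt-4o-mini"

# Everything the tool reads is now discoverable in one grep:
env | grep '^JIRA_' | sort
```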

Additionally, the refactoring efforts have led to cleaner code, making it easier for us to maintain and for contributors to understand. This means fewer bugs and a smoother experience for everyone!

Bug Fixes, Refactoring, and Feature Enhancements

We’ve tackled some pesky bugs in this release. For instance, the AI provider integration has been streamlined, ensuring that your AI-related commands work more reliably. The refactoring efforts also mean that the code is now more modular, which is a win for future development.

While there are no new features in this release, the improvements we’ve made are crucial for the stability and usability of the tool. We believe that these changes will significantly enhance your workflow when creating Jira issues.

What About Dependencies or Configurations?

No major changes were made to dependencies or configurations in this release. We’ve kept everything stable, ensuring that you can upgrade without worrying about breaking changes.

Release Info and Links

Here’s a quick summary of the release:

We hope you enjoy the improvements in this release! As always, your feedback is invaluable to us, so feel free to reach out with any thoughts or suggestions. Happy coding!



Generating Python Docstrings Automatically

Welcome to my blog! Today, I’m excited to share a neat Python script that automates the generation of docstrings for your Python files. This tool is especially useful for developers who want to maintain clean and well-documented code without spending too much time writing documentation manually.

The script utilizes OpenAI’s API to generate meaningful docstrings based on the code provided. It analyzes the structure of your Python classes and functions, and then creates concise and informative docstrings that follow standard conventions. Let’s dive into the code!


#!/usr/bin/python
import argparse
import hashlib
import json
import os
import re
import statistics
import subprocess
import sys
import tempfile
import time

import requests

system_prompt_general = """
You will be provided with a Python code.
Based on this, your task is to generate a Python docstring for it.
This is the docstring that goes at the top of the file.
The file may include multiple classes or other code.

Ensure the docstring follows Python's standard docstring conventions and provides
just enough detail to make the file understandable and usable without overwhelming the reader.

Please only return the docstring, enclosed in triple quotes, without any other explanation
or additional text. The format should be:

\"\"\"

\"\"\"

Make sure to follow the format precisely and provide only the docstring content.
"""

system_prompt_class = """
You will be provided with a Python class, including its code.
Based on this, your task is to generate a Python docstring for it.
Use the class signature and body to infer the purpose of the class,
the attributes it has, and any methods it includes.

Follow these guidelines to create the docstring:

1. Summary: Provide a concise summary of the class's purpose.
    Focus on what the class does and its main goal.
2. Attributes: List the attributes, their types, and a brief description
    of what each one represents.

Ensure the docstring follows Python's standard docstring conventions and provides
just enough detail to make the class understandable and usable without overwhelming the reader.

Please only return the docstring, enclosed in triple quotes, without any other
explanation or additional text. The format should be:

\"\"\"

\"\"\"

Make sure to follow the format precisely and provide only the docstring content.
"""

system_prompt_def = """
You will be provided with a Python function, including its code.
Based on this, your task is to generate a Python docstring for it.
Use the function signature and body to infer the purpose of the function,
the arguments it takes, the return value, and any exceptions it may raise.

Follow these guidelines to create the docstring:

1. Summary: Provide a concise summary of the function's purpose.
    Focus on what the function does and its main goal.
2. Arguments: List the parameters, their types, and a brief description
    of what each one represents.
3. Return: If the function has a return value, describe the return type
    and what it represents. If there's no return, OMIT THE SECTION.
4. Exceptions: If the function raises any exceptions, list them with descriptions.
    If no exceptions are raised, OMIT THE SECTION.
5. Side Effects (if applicable): If the function has side effects
    (e.g., modifies global state, interacts with external services), mention them.
    OMIT THE SECTION if it is not clear in the code.
6. Algorithm or Key Logic (optional): If the function is complex,
    provide a high-level outline of the logic or algorithm involved.
    OMIT THE SECTION if it is not clear in the code.

Ensure the docstring follows Python's standard docstring conventions and provides
just enough detail to make the function understandable and usable without overwhelming the reader.

Please only return the docstring, enclosed in triple quotes, without any other
explanation or additional text. The format should be:

\"\"\"

\"\"\"

Make sure to follow the format precisely and provide only the docstring content.
"""


class OpenAICost:
    # Static member to track the total cost
    cost = 0
    costs = []  # A list to store the individual request costs

    @staticmethod
    def send_cost(tokens, model):
        # The cost calculation can vary based on the model and tokens
        model_costs = {
            "gpt-3.5-turbo": 0.002,  # Example cost per 1k tokens
            "gpt-4o-mini": 0.003,  # Example cost per 1k tokens
            "gpt-4o": 0.005,  # Example cost per 1k tokens
        }

        cost_per_token = model_costs.get(model, 0)
        cost = (tokens / 1000) * cost_per_token  # Cost is proportional to tokens

        OpenAICost.cost += cost
        OpenAICost.costs.append(cost)

    @staticmethod
    def print_cost_metrics():
        print(f"\nTotal Cost: ${OpenAICost.cost:.4f}")
        if OpenAICost.costs:
            print(
                f"    Average Cost per Request: ${statistics.mean(OpenAICost.costs):.4f}"
            )
            print(f"    Max Cost for a Request: ${max(OpenAICost.costs):.4f}")
            print(f"    Min Cost for a Request: ${min(OpenAICost.costs):.4f}")
            if len(OpenAICost.costs) > 1:
                print(
                    f"    Standard Deviation of Cost: ${statistics.stdev(OpenAICost.costs):.4f}"
                )
        else:
            print("    No costs recorded.")


class OpenAIProvider:
    def __init__(self):
        self.api_key = os.getenv("JIRA_AI_API_KEY")
        if not self.api_key:
            raise EnvironmentError("JIRA_AI_API_KEY not set in environment.")
        self.endpoint = "https://api.openai.com/v1/chat/completions"
        self.model = os.getenv("OPENJIRA_AI_MODEL", "gpt-4")

    def estimate_tokens(self, text: str) -> int:
        tokens = len(text) // 3  # rough estimate: ~3 characters per token
        return tokens

    def select_model(self, input):
        tokens = self.estimate_tokens(input)

        if tokens < 1000:  # For small files (under ~1000 tokens)
            model = "gpt-3.5-turbo"
        elif tokens < 10000:  # For medium files (under ~10000 tokens)
            model = "gpt-4o-mini"
        else:  # For large files (over ~10000 tokens)
            model = "gpt-4o"

        OpenAICost.send_cost(tokens, model)

        return model

    def improve_text(self, prompt: str, text: str) -> str:
        headers = {
            "Authorization": f"Bearer {self.api_key}",
            "Content-Type": "application/json",
        }

        body = {
            "model": self.select_model(text),
            "messages": [
                {"role": "system", "content": prompt},
                {"role": "user", "content": text},
            ],
            "temperature": 0.5,
        }

        response = requests.post(self.endpoint, json=body, headers=headers, timeout=300)
        if response.status_code == 200:
            res = response.json()["choices"][0]["message"]["content"].strip()
            result = res
            # Count the occurrences of '"""'
            occurrences = result.count('"""')
            occurrences_backtick = result.count("```")

            # Check if there are more than two occurrences
            if occurrences > 2:
                # Find the positions of the first and second occurrences
                first_pos = result.find('"""')
                second_pos = result.find('"""', first_pos + 1)

                # Get everything from the first '"""' to the second '"""', inclusive
                result = result[first_pos : second_pos + 3]  # Include the second '"""'

            if occurrences_backtick > 2:
                # Find the positions of the first and second occurrences
                first_pos = result.find("```")
                second_pos = result.find("```", first_pos + 1)

                # Get everything from the first '```' to the second '```', inclusive
                result = result[first_pos : second_pos + 3]  # Include the second '```'

            return result
        raise Exception(
            f"OpenAI API call failed: {response.status_code} - {response.text}"
        )


class Docstring:
    def __init__(self, file_path, debug=False, exit=False):
        self.file_path = file_path
        self.lines = []
        self.ai = OpenAIProvider()
        self.line_index = 0
        self.multiline_index = 0
        self.cache_file = "docstring.cache"
        self.debug = debug
        self.exit = exit

        with open(self.file_path, "r") as file:
            self.lines = file.readlines()

        # Ensure cache file exists, create if necessary
        if not os.path.exists(self.cache_file):
            with open(self.cache_file, "w") as cache:
                json.dump({}, cache)

        self._load_cache()

        print(" -> " + self.file_path)

    def print_debug(self, title, out):
        if not self.debug:
            return

        print("=====================================================")
        print("=====================================================")
        print(" > > " + title)
        print("=====================================================")
        out = "".join(out) if isinstance(out, list) else out
        print(out)
        print("=====================================================")
        print("=====================================================")

    def _load_cache(self):
        with open(self.cache_file, "r") as cache:
            self.cache = json.load(cache)

    def _save_cache(self):
        with open(self.cache_file, "w") as cache_file:
            json.dump(self.cache, cache_file, indent=4)

    def _generate_sha1(self, user_prompt):
        return hashlib.sha1(user_prompt.encode("utf-8")).hexdigest()

    def _get_current_timestamp(self):
        return int(time.time())

    def get_ai_docstring(self, sys_prompt, user_prompt, signature):
        sha1_hash = self._generate_sha1(user_prompt)

        # Check if file is in cache
        if self.file_path in self.cache:
            # Check if the user prompt's SHA1 is in the self.cache for this file
            for entry in self.cache[self.file_path]:
                if entry["sha1"] == sha1_hash:
                    # Update last_accessed timestamp
                    entry["last_accessed"] = self._get_current_timestamp()
                    # Return cached docstring if found
                    return entry["docstring"]

        print("    Requesting AI for: " + signature)
        # If no self.cache hit, call the AI and get the docstring
        res = self.ai.improve_text(sys_prompt, user_prompt)

        # Create a new self.cache entry with last_accessed timestamp
        new_entry = {
            "sha1": sha1_hash,
            "docstring": res,
            "last_accessed": self._get_current_timestamp(),
        }

        # Add new entry to the self.cache for the current file
        if self.file_path not in self.cache:
            self.cache[self.file_path] = []

        self.cache[self.file_path].append(new_entry)

        # Return the new docstring from AI
        return res

    def remove_old_entries(self, minutes):
        current_timestamp = self._get_current_timestamp()
        threshold_timestamp = current_timestamp - (minutes * 60)

        # Remove old entries for each file in the self.cache
        for file_path, entries in self.cache.items():
            self.cache[file_path] = [
                entry
                for entry in entries
                if "last_accessed" in entry
                and entry["last_accessed"] >= threshold_timestamp
            ]

    def wrap_text(self, text: str, max_length=120, indent=0):
        wrapped_lines = []
        lines = text.strip().splitlines()

        # Ensure indent is an integer
        try:
            indent = int(indent)
        except ValueError:
            indent = 0  # Default to 0 if it's an invalid value

        spacer = ""
        if indent == 0:
            spacer = ""
        else:
            spacer = " " * (indent * 4)

        for line in lines:
            line = spacer + line.strip()
            while len(line) > max_length:
                # Find last space to split at
                split_at = line.rfind(" ", 0, max_length)
                if split_at == -1:
                    split_at = max_length  # no space found, split at max_length
                wrapped_lines.append(line[:split_at].rstrip())
                line = spacer + line[split_at:].strip()
            wrapped_lines.append(line)
        return wrapped_lines

    def count_and_divide_whitespace(self, line):
        leading_whitespace = len(line) - len(line.lstrip())
        if leading_whitespace == 0:
            return 0
        return leading_whitespace // 4

    def complete(self):
        # Create a temporary file
        with tempfile.NamedTemporaryFile(delete=False) as temp_file:
            temp_file_path = temp_file.name
            temp_file.write(
                "".join(self.lines).encode()
            )  # Write the content to the temporary file

        try:
            # Attempt to compile the temporary file
            result = subprocess.run(
                ["python", "-m", "py_compile", temp_file_path],
                capture_output=True,
                text=True,
            )

            # If there is no compilation error (i.e., result.returncode == 0), move the file to the destination
            if result.returncode == 0:
                with open(self.file_path, "w") as file:
                    print(f"    Wrote: {self.file_path}")
                    file.write("".join(self.lines))  # Write to the destination file
            else:
                print(f"    Error compiling file: {result.stderr}")
                if self.debug:
                    name = "/tmp/" + os.path.basename(self.file_path) + ".failed"
                    with open(name, "w") as file:
                        file.write("".join(self.lines))
                        print(f"    Copied here: {name}")
                        if self.exit:
                            sys.exit(1)

        finally:
            # Clean up the temporary file
            os.remove(temp_file_path)

        self.remove_old_entries(1440 * 14)
        self._save_cache()

    def generate_class_docstring(self):
        line = self.lines[self.line_index]
        class_definition = line
        output = re.sub("\\s+", " ", class_definition.rstrip().replace("\n", " "))
        print("   -> " + output)
        prompt_class_code = [class_definition]

        if self.count_and_divide_whitespace(class_definition) > 0:
            self.line_index = self.line_index + 1
            return None

        t = self.line_index + 1
        # Collect all lines that belong to the class, including "pass" or single-line classes
        while (
            t < len(self.lines)
            and not self.lines[t].startswith("def")
            and not self.lines[t].startswith("class")
        ):
            prompt_class_code.append(self.lines[t].rstrip())
            t += 1

        class_docstring = self.get_ai_docstring(
            system_prompt_class, "\n".join(prompt_class_code), output
        )
        class_docstring = self.wrap_text(class_docstring, max_length=120, indent=1)
        class_docstring[len(class_docstring) - 1] = (
            class_docstring[len(class_docstring) - 1] + "\n"
        )
        class_docstring = [line + "\n" for line in class_docstring]
        class_docstring[len(class_docstring) - 1] = (
            class_docstring[len(class_docstring) - 1].rstrip() + "\n"
        )

        # Check for existing docstring and replace it
        docstring_start_index = None
        docstring_end_index = None

        # Look for the class docstring (the second line should start with """ if it's there)
        if self.lines and self.lines[self.line_index + 1].strip().startswith('"""'):
            # Docstring exists, find the end of it
            docstring_start_index = (
                self.line_index + 1
            )  # The docstring starts from line after the class definition
            for i, line in enumerate(
                self.lines[self.line_index + 2 :], start=self.line_index + 2
            ):
                if line.strip().startswith('"""'):
                    docstring_end_index = i  # End of the docstring
                    break

        # If a docstring exists, replace it
        if docstring_start_index is not None and docstring_end_index is not None:
            self.lines = (
                self.lines[:docstring_start_index]
                + self.lines[docstring_end_index + 1 :]
            )

        # Insert the new docstring after the class definition
        self.lines = (
            self.lines[: self.line_index + 1]
            + class_docstring
            + self.lines[self.line_index + 1 :]
        )

        # self.print_debug("class docstring: " + class_definition.strip(), self.lines)

        return True

    def generate_function_docstring(self):
        line = self.lines[self.line_index]
        mutliline_line = ""

        if not (
            line.strip().endswith("):")
            and not re.search(r"\)\s*->\s*(.*)\s*:.*", line.strip())
        ):
            self.multiline_index = 0
            # multiline def signiture
            while self.line_index < len(self.lines):
                mutliline_line += self.lines[self.line_index]
                if re.match(
                    r".*\):$", self.lines[self.line_index].strip()
                ) or re.search(
                    r".*\)\s*->\s*(.*)\s*:.*", self.lines[self.line_index].strip()
                ):
                    break
                self.line_index += 1
                self.multiline_index += 1

        def_definition = line if mutliline_line == "" else mutliline_line
        output = re.sub("\\s+", " ", def_definition.rstrip().replace("\n", " "))
        print("     -> " + output)
        prompt_def_code = (
            mutliline_line.split("\n") if mutliline_line != "" else [def_definition]
        )

        indent_line = self.count_and_divide_whitespace(
            def_definition if mutliline_line == "" else mutliline_line.splitlines()[0]
        )
        spacer_line = "" if indent_line == 0 else " " * (indent_line * 4)
        spacer_line_minus = "" if indent_line < 2 else " " * ((indent_line - 1) * 4)
        spacer_line_plus = "" if indent_line == 0 else " " * ((indent_line + 1) * 4)

        t = self.line_index + 1
        # Collect all self.lines that belong to the function
        while t < len(self.lines):
            starts_with_def = self.lines[t].strip().startswith("def")

            # same indent
            if starts_with_def and self.lines[t].startswith(spacer_line):
                break

            # outside
            if starts_with_def and self.lines[t].startswith(spacer_line_minus):
                break

            # nested
            if starts_with_def and self.lines[t].startswith(spacer_line_plus):
                pass

            if self.lines[t].rstrip() != def_definition.strip():
                prompt_def_code.append(self.lines[t].rstrip())
            t += 1

        # Now that we have the full function signature, we generate the docstring
        indent = (
            self.count_and_divide_whitespace(
                def_definition
                if mutliline_line == ""
                else mutliline_line.splitlines()[0]
            )
            + 1
        )
        def_docstring = self.get_ai_docstring(
            system_prompt_def, "\n".join(prompt_def_code), output
        )
        def_docstring = self.wrap_text(def_docstring, max_length=120, indent=indent)
        def_docstring[-1] = def_docstring[-1] + "\n"
        if def_definition.strip() == def_docstring[0].strip():
            def_docstring = def_docstring[
                1 if mutliline_line == "" else self.multiline_index :
            ]
        def_docstring = [line + "\n" for line in def_docstring]
        def_docstring[-1] = def_docstring[-1].rstrip() + "\n"

        # Handle one-liner docstring or multi-line docstring
        if '"""' in self.lines[self.line_index + 1]:
            stripped_line = self.lines[self.line_index + 1].strip()
            if re.match(r'"""[\s\S]+?"""', stripped_line):
                # One-liner docstring: replace it with the generated multi-line docstring
                self.lines = (
                    self.lines[: self.line_index + 1]
                    + def_docstring
                    + self.lines[self.line_index + 2 :]
                )
            else:
                # Replace the entire docstring if it's multi-line
                end_index = self.line_index + 2
                while end_index < len(self.lines) and not self.lines[
                    end_index
                ].strip().startswith('"""'):
                    end_index += 1

                if (
                    end_index < len(self.lines)
                    and self.lines[end_index].strip() == '"""'
                ):
                    # Found the end of the docstring, now replace the entire docstring block
                    self.lines = (
                        self.lines[: self.line_index + 1]
                        + def_docstring
                        + self.lines[end_index + 1 :]
                    )
        else:
            # If no docstring exists, simply insert the generated docstring
            self.lines = (
                self.lines[: self.line_index + 1]
                + def_docstring
                + self.lines[self.line_index + 1 :]
            )

        # self.print_debug("def docstring: " + def_definition.strip(), self.lines)

        self.line_index = self.line_index + len(def_docstring)
        return True

    def generate_file_docstring(self):
        # Check if we should add a file-level docstring
        # if not self.should_add_file_docstring():
        #     return 0  # Skip generating file-level docstring if not needed

        shebang = ""

        # If the first line isn't a shebang (e.g. "#!/usr/bin/env python"), insert one
        if self.lines and not self.lines[0].startswith("#!"):
            self.lines = ["#!/usr/bin/env python\n"] + self.lines
            shebang = "#!/usr/bin/env python\n"
        else:
            shebang = self.lines[0]

        # Check if there's already an existing file-level docstring or comment block
        # We assume the file-level docstring starts with triple quotes (""" or ''') and is at the top
        docstring_start_index = None
        docstring_end_index = None

        if len(self.lines) > 1 and self.lines[1].strip().startswith('"""'):
            # If the second line starts with triple quotes, it may be a docstring
            docstring_start_index = 1  # The docstring starts from line 2
            for i, line in enumerate(self.lines[2:], start=2):
                if line.strip().startswith('"""'):
                    docstring_end_index = i  # End of the docstring
                    break

        # Generate new file-level docstring
        general_description = self.get_ai_docstring(
            system_prompt_general, "".join(self.lines), self.file_path
        )
        general_description = self.wrap_text(
            general_description, max_length=120, indent=0
        )
        docstring = [line + "\n" for line in general_description]

        # self.print_debug("docstring_end_index", str(docstring_end_index))
        # self.print_debug("self.lines[:docstring_start_index]", self.lines[:docstring_start_index])
        # self.print_debug("docstring", docstring)
        # self.print_debug("self.lines[docstring_end_index + 1 :]", self.lines[docstring_end_index + 1 :])

        # If a docstring exists, replace it with the new one
        if docstring_start_index is not None and docstring_end_index is not None:
            self.lines = (
                self.lines[:docstring_start_index]
                + docstring
                + self.lines[docstring_end_index + 1 :]
            )
        else:
            # Insert the generated docstring directly after the shebang (no extra newline)
            self.lines = [shebang] + docstring + self.lines[1:]

        # self.print_debug("file docstring: " + self.file_path, self.lines)

        return len(docstring)

    def generate_docstrings(self):
        if len(self.lines) == 0:
            return

        self.line_index = self.generate_file_docstring()

        while self.line_index < len(self.lines):
            line = self.lines[self.line_index]

            # For classes, generate class docstring
            if line.strip().startswith("class "):
                if not self.generate_class_docstring():
                    continue  # Skip to the next line

            # For functions, generate function docstring
            elif line.strip().startswith("def "):
                if not self.generate_function_docstring():
                    continue  # Skip to the next line

            self.line_index = self.line_index + 1

        self.complete()


def process_file(file_path, debug=False, exit=False):
    """Process a single file by generating docstrings."""
    Docstring(file_path, debug=debug, exit=exit).generate_docstrings()


def process_directory(directory_path, recursive=False, debug=False, exit=False):
    """Process all Python files in the directory with progress tracking."""

    # List all python files: walk the whole tree when recursive,
    # otherwise only look at the top level of the directory
    if recursive:
        python_files = [
            os.path.join(root, file)
            for root, dirs, files in os.walk(directory_path)
            for file in files
            if file.endswith(".py")
        ]
    else:
        python_files = [
            os.path.join(directory_path, file)
            for file in sorted(os.listdir(directory_path))
            if file.endswith(".py")
            and os.path.isfile(os.path.join(directory_path, file))
        ]

    total_files = len(python_files)
    processed_files = 0

    print(f"Processing {total_files} Python files...")

    for file_path in python_files:
        print(f"\nProcessing file {processed_files + 1}/{total_files}: {file_path}")
        process_file(file_path, debug=debug, exit=exit)
        processed_files += 1


def main():
    # Set up the argument parser
    parser = argparse.ArgumentParser(
        description="Generate file-level docstrings for Python files."
    )
    parser.add_argument("path", help="Path to a Python file or directory.")
    parser.add_argument(
        "-r",
        "--recursive",
        action="store_true",
        help="Recursively process all Python files in the directory.",
    )
    parser.add_argument(
        "-d",
        "--debug",
        action="store_true",
        help="Copies failed updates to /tmp/",
    )
    parser.add_argument(
        "-e",
        "--exit",
        action="store_true",
        help="Exits on failure",
    )
    # Parse the arguments
    args = parser.parse_args()

    # Check if the path is a file or directory
    if os.path.isfile(args.path):
        # If it's a file, process it directly
        process_file(args.path, debug=args.debug, exit=args.exit)
    elif os.path.isdir(args.path):
        # If it's a directory, process all Python files
        process_directory(
            args.path, recursive=args.recursive, debug=args.debug, exit=args.exit
        )
    else:
        print(f"Error: {args.path} is neither a valid file nor a directory.")

    OpenAICost.print_cost_metrics()


if __name__ == "__main__":
    main()

Now, let's break down how this script works:

  • OpenAICost Class: This class tracks the cost of using the OpenAI API based on the number of tokens used. It provides methods to send cost data and print cost metrics.
  • OpenAIProvider Class: This class handles the interaction with the OpenAI API. It estimates the number of tokens in the input text, selects the appropriate model based on the token count, and sends requests to improve the text.
  • Docstring Class: This is the heart of the script. It reads the Python file, generates docstrings for classes and functions using the OpenAI API, and saves the updated file. It also manages a cache for previously generated docstrings to optimize performance.
  • Main Functionality: The script can process a single file or an entire directory of Python files, generating docstrings for each. It includes command-line arguments for flexibility.
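At its core, the Docstring class works on the file as a plain list of lines and splices each generated docstring in with list slicing, directly after the `class` or `def` line. Here is a minimal standalone sketch of that insertion step, with a hand-written docstring standing in for the AI-generated one:

```python
lines = [
    "class Greeter:\n",
    "    def hello(self):\n",
    "        return 'hi'\n",
]
docstring = ['    """A friendly greeting helper."""\n']
line_index = 0  # index of the "class Greeter:" line

# Insert the new docstring directly after the class definition
lines = lines[: line_index + 1] + docstring + lines[line_index + 1 :]

print("".join(lines))
```

The same slicing pattern, with different start and end indices, is what replaces an existing docstring block in place.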

This tool can significantly enhance your coding workflow by ensuring that your code is well-documented and easy to understand. I hope you find it as useful as I do! Happy coding!

Enhance Your Git Commits with AI

Hey there, fellow developers! Today, I’m excited to share a nifty little Python script that can help you generate better commit messages for your Git repositories. This script uses the OpenAI API to analyze your code changes and suggest a meaningful commit message. Let’s dive into how it works!

Summary of the Code

This script, named fancy-git-commit.py, leverages the OpenAI API to create insightful commit messages based on the differences in your code. It reads the diff output from Git, cleans it up by removing comments, and then sends it to OpenAI’s model to generate a concise commit message. You can choose to either print the message or directly commit it to your repository.

Here’s the Code


#!/usr/bin/env python
import argparse
import os
import re
import subprocess
import sys

import requests


class OpenAIProvider:
    def __init__(self):
        self.api_key = os.getenv("AI_API_KEY")
        if not self.api_key:
            raise EnvironmentError("AI_API_KEY not set in environment.")
        self.endpoint = "https://api.openai.com/v1/chat/completions"
        self.model = os.getenv("OPENAI_MODEL", "gpt-4o-mini")

    def improve_text(self, prompt: str, text: str) -> str:
        headers = {
            "Authorization": f"Bearer {self.api_key}",
            "Content-Type": "application/json",
        }

        body = {
            "model": self.model,
            "messages": [
                {"role": "system", "content": prompt},
                {"role": "user", "content": text},
            ],
            "temperature": 0.3,
        }

        response = requests.post(self.endpoint, json=body, headers=headers, timeout=30)
        if response.status_code == 200:
            return response.json()["choices"][0]["message"]["content"].strip()

        raise Exception(
            f"OpenAI API call failed: {response.status_code} - {response.text}"
        )


def remove_comments_from_diff(diff: str) -> str:
    # Remove single-line comments
    diff = re.sub(r"#.*", "", diff)

    # Remove multi-line comments (triple quotes)
    diff = re.sub(r'("""[\s\S]*?"""|\'\'\'[\s\S]*?\'\'\')', "", diff)

    # Strip git diff file headers (lines starting with '---' or '+++')
    diff = re.sub(r"^\s*(---|\+\+\+)\s.*", "", diff, flags=re.MULTILINE)

    return diff.strip()


def get_diff_from_stdin():
    """Reads the diff from stdin (useful for piped input)."""
    diff = sys.stdin.read()
    return remove_comments_from_diff(diff)


def get_diff_from_git():
    """Runs 'git diff HEAD~1' and captures the output."""
    diff = subprocess.check_output(
        "GIT_PAGER=cat git diff HEAD~1", shell=True, text=True
    )
    return remove_comments_from_diff(diff)


def main():
    parser = argparse.ArgumentParser(
        description="Generate and commit a git commit message."
    )
    parser.add_argument(
        "--print",
        action="store_true",
        help="Print the commit message without creating the commit",
    )
    args = parser.parse_args()

    cwd = os.getcwd()

    diff = None

    if not sys.stdin.isatty():
        print("Reading git diff from stdin...")
        diff = get_diff_from_stdin()
    else:
        print("Getting git diff from HEAD~1...")
        diff = get_diff_from_git()

    if not diff:
        print("No diff found, nothing to commit.")
        return

    prompt = "Take the diff and give me a good commit message, give me the commit message and nothing else"

    openai_provider = OpenAIProvider()

    print("Generating commit message from OpenAI...")
    commit_message = openai_provider.improve_text(prompt, diff)

    if not commit_message:
        print("No commit message generated. Aborting commit.")
        return

    if args.print:
        print(f"Commit message (not committing): {commit_message}")
        return

    print(f"Committing with message: {commit_message}")

    result = subprocess.run(
        ["git", "commit", "-m", commit_message],
        check=True,
        cwd=cwd,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )

    print(f"Commit successful! {result.stdout.decode('utf-8')}")


if __name__ == "__main__":
    main()

Code Explanation

Let’s break down the main components of the script:

  • OpenAIProvider Class: This class handles interaction with the OpenAI API. It initializes with the API key and model, and has a method improve_text that sends a prompt along with the text (the diff) to the API and retrieves the generated commit message.
  • remove_comments_from_diff Function: This function takes the raw diff output and removes comments, ensuring that only the relevant changes are sent to the AI for processing. It handles both single-line and multi-line comments, as well as Git-specific diff headers.
  • get_diff_from_stdin and get_diff_from_git Functions: These functions are responsible for obtaining the diff. The first reads from standard input (which is useful for piping), while the second runs a Git command to get the latest changes directly from the repository.
  • main Function: This is where the script starts executing. It sets up argument parsing, determines how to get the diff, and then uses the OpenAIProvider to generate a commit message. Depending on the user’s input, it can print the message or commit the changes directly.
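To see the cleanup step in action, here is the same remove_comments_from_diff function run against a small hand-made diff (the sample input is invented for illustration):

```python
import re


def remove_comments_from_diff(diff: str) -> str:
    # Remove single-line comments
    diff = re.sub(r"#.*", "", diff)

    # Remove multi-line comments (triple quotes)
    diff = re.sub(r'("""[\s\S]*?"""|\'\'\'[\s\S]*?\'\'\')', "", diff)

    # Strip git diff file headers (lines starting with '---' or '+++')
    diff = re.sub(r"^\s*(---|\+\+\+)\s.*", "", diff, flags=re.MULTILINE)

    return diff.strip()


sample = """--- a/app.py
+++ b/app.py
+x = 1  # set x
+y = 2
"""

cleaned = remove_comments_from_diff(sample)
print(cleaned)
```

The file headers and the inline comment are gone, and only the substantive changed lines are passed on to the model, which keeps the prompt small and focused.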

The script is designed to be user-friendly and efficient, making it easier for developers to maintain a clear commit history without the hassle of crafting messages manually.

So, if you’re tired of writing commit messages or just want to add a bit of flair to your Git workflow, give this script a try! It’s a great way to leverage AI in your development process.

Final Thoughts

As we continue to integrate more tools into our workflows, scripts like this can save time and enhance our productivity. If you have any questions or suggestions, feel free to drop a comment below. Happy coding!
