innofactororg · ottolote · May 4, 2024 · May 4, 2024 · May 4, 2024 · May 5, 2024
diff --git a/.dockerignore b/.dockerignore
@@ -0,0 +1,10 @@
+__pycache__
+*.pyc
+.env
+venv
+.venv
+.vscode
+.idea
+.git
+.github
+
diff --git a/.github/workflows/test.yaml b/.github/workflows/test.yaml
@@ -0,0 +1,41 @@
+name: tests
+
+on:
+  push:
+    branches: [main]
+  pull_request:
+    branches: [main]
+
+jobs:
+  test:
+    runs-on: ubuntu-latest
+
+    steps:
+    - name: Check out repository
+      uses: actions/checkout@v4
+
+    - name: Set up Python 3.12
+      uses: actions/setup-python@v5
+      with:
+        python-version: "3.12"
+
+    - name: Install dependencies
+      run: |
+        python -m pip install --upgrade pip
+        pip install -r requirements.txt
+        pip install pytest
+        pip install coverage
+        pip install .
+
+    - name: Run tests
+      run: coverage run -m pytest
+
+    - name: Make coverage report
+      run: coverage lcov
+
+    - name: Comment coverage report on PR
+      if: ${{ github.event_name == 'pull_request' }}
+      uses: romeovs/lcov-reporter-action@v0.3.1
+      with:
+        lcov-file: coverage.lcov
+        delete-old-comments: true
diff --git a/.gptcontext b/.gptcontext
@@ -2,23 +2,13 @@ Additional context is provided below.
 
 Preferences for python code:
 - adhere to common style conventions, e.g. PEP8
-- keep lines under 80 characters long
+- you MUST keep lines under 80 characters long
 
 Markdown2confluence pushes a folder containing markdown files and pushes them to confluence, with a page structure like the file and folder structure of the markdown files, and ignoring any non-markdown files.
 
 Required behavior:
-All pages managed by markdown2confluence contains $CONFLUENCE_PAGE_TITLE_SUFFIX, e.g. '(autogenerated)'. New pages are created with this suffix, and on subsequent runs any pages with the suffix (or label, TBD) are overwritten or deleted.
-Depending on how confluence labels work it might be best to use labels instead. If using labels, refuse to delete any pages that does not have the page title suffix.
-Any markdown that contains full or relative links to local media files should be published as pages with attached media. Relative links in markdown to local media are resolved from the location of the markdown file. Full-path links in markdown are resolved from the $MARKDOWN_FOLDER
-
-
-Currently I am working on:
-- Publisher class in publish.py contains the old code for now, I am moving
-  functionality to the other classes.
-- Change from directly using requests to using the confluence client from
-  atlassian
-- Use labels instead of only relying on the suffix (previously called search
-  pattern)
+- All pages managed by markdown2confluence contain a suffix, e.g. '(autogenerated)'. New pages are created with this suffix, and on subsequent runs any pages with the suffix (or label, TBD) are overwritten or deleted. Depending on how Confluence labels work, it might be best to use labels instead. If using labels, refuse to delete any pages that do not have the page title suffix.
+- Any markdown file that contains full-path or relative links to local media files should be published as a page with the media attached. Relative links to local media are resolved from the location of the markdown file. Full-path links are resolved from $MARKDOWN_FOLDER.
 
 
 file structure:
@@ -29,12 +19,15 @@ markdown2confluence/
 │   └── usage.md
 ├── LICENCE
 ├── markdown2confluence
-│   ├── converter.py
 │   ├── __init_.py
 │   ├── main.py
+│   ├── converter.py
 │   ├── confluence.py
 │   ├── config.py
-│   ├── file_manager.py
+│   ├── content_tree.py
+│   ├── parser.py
+│   ├── util.py
+│   ├── version.py
 │   └── publisher.py
 ├── README.md
 ├── requirements.txt
@@ -48,7 +41,7 @@ markdown2confluence/
     │   └── test_integration.py
     └── unit
         ├── __init__.py
-        ├── test_file_manager.py
+        ├── test_parser.py
         ├── test_confluence.py
         └── test_publisher.py
 
@@ -79,63 +72,123 @@ CONFLUENCE_IGNOREFILE
 
 #### Components and Their Key Interfaces
 
-1. **ConfluenceClient**
+1. **Publisher**
 
-Responsible for direct interactions with the Confluence API, handling operations like page creation, updates, deletion, and labeling with retries and backoff for robustness.
+Abstract base class for publishing a content tree. It preserves the ContentTree hierarchy and manages parent/child page relationships.
 
 ```python
-class ConfluenceClient:
-    def __init__(self, confluence_config: dict):
-        """Initialize with API configuration."""
-
-    def create_or_update_page(self, title: str, html: str, parent_id=None, space_key: str, labels=None) -> dict:
-        """Create or update a Confluence page, applying labels."""
-
-    def delete_page(self, page_id: str) -> dict:
-        """Delete a Confluence page by ID."""
+class Publisher(ABC):
+    @abstractmethod
+    def publish_node(self, node: ContentNode, parent_id: str | None) -> str:
+        pass
+
+    def pre_publish_hook(self):
+        """
+        Optional step for actions to perform before publishing, such as
+        fetching/deleting previously published resources.
+        Can be overridden by subclasses.
+        """
+        pass
+
+    def post_publish_hook(self):
+        """
+        Optional step for actions to perform after publishing, such as
+        cleaning up resources or performing additional logging.
+        Can be overridden by subclasses.
+        """
+        pass
+
+    def publish_content(self, content_tree: ContentTree):
+        """
+        Traverse a content tree and call publish_node on each element.
+        """
+        pass
 ```
 
-2. **Publisher**
+2. **ConfluencePublisher**
 
-Orchestrates the conversion of Markdown to HTML and the subsequent publishing to Confluence, respecting the original directory structure and managing page relationships.
+Specialized publisher for Confluence. It implements publish_node, which creates or updates pages (including their labels) in Confluence.
 
 ```python
-class Publisher:
-    def __init__(self, confluence_client: ConfluenceClient, source_directory: str, space_key: str):
-        """Setup with Confluence client, source directory, and target space key."""
-
-    def publish(self):
-        """Main method to start the publishing process."""
-
-    def traverse_directory(self, directory: str, parent_id=None):
-        """Recursively traverse directories, converting and uploading Markdown files."""
+class ConfluencePublisher(Publisher):
+    def __init__(self, confluence: Confluence | None = None):
+        pass
+
+    def pre_publish_hook(self):
+        """
+        Specialized for this subclass.
+        Fetch all pages matching space, label and suffix
+        """
+
+    def post_publish_hook(self):
+        """
+        Specialized for this subclass.
+        Delete pages not in the ContentTree
+        """
+
+    def publish_node(self, node: ContentNode, parent_id: str | None) -> str:
+        """
+        Create or update a page and its attachments, labelling new pages.
+        """
+        pass
 ```
 
-3. **FileManager** (unchanged, conceptual)
+3. **Parser**
 
-Handles file reading and potentially logging or other file outputs, and maybe traversing the file system
+Responsible for parsing source files, e.g. from the file system.
 
 ```python
-class FileManager:
-    def read_file(self, path: str) -> str:
-        """Read the content of a file."""
+class Parser(ABC):
+    @abstractmethod
+    def parse_directory(self, directory: str) -> ContentTree:
+        pass
+
+
+class MarkdownParser(Parser):
+    def parse_directory(self, directory: str) -> ContentTree:
+        pass
 ```
 
-### Workflow Overview with Snippets
+4. **ContentTree**
+
+Defines the content data structure that Parser produces and Publisher consumes.
 
-- The process starts with `Publisher`, which is initialized with necessary configurations and an instance of `ConfluenceClient`.
-
 ```python
-publisher = Publisher(confluence_client=ConfluenceClient(confluence_config), source_directory="path/to/markdown", space_key="SPACEKEY")
-publisher.publish()
-```
+@dataclass
+class ContentNode:
+    name: str
+    content: str | None = None
+    metadata: dict | None = None
+    parent: 'ContentNode | None' = None
+    children: dict[str, 'ContentNode'] = field(default_factory=dict)
+
+    def add_child(self, node: 'ContentNode'):
+        pass
+
+    def get_child(self, name: str) -> 'ContentNode | None':
+        pass
 
-- `Publisher.publish()` begins the process, invoking `traverse_directory()` to walk through the directory structure, processing each Markdown file by converting it to HTML.
+    def is_leaf(self) -> bool:
+        pass
 
-- For each processed file, `Publisher` uses `ConfluenceClient.create_or_update_page()` to either create a new page or update an existing one in Confluence, applying a predefined label to mark the page as managed by `markdown2confluence`.
+    def is_root(self) -> bool:
+        pass
 
-- Should a page need to be deleted or labels added, `Publisher` utilizes other methods of `ConfluenceClient` like `delete_page()` and maybe `add_labels_to_page()`, ensuring the Confluence space remains synchronized with the source content.
+    def __str__(self, level: int = 0) -> str:
+        pass
 
-### Conclusion
 
-This architecture, enriched with interface snippets, outlines a clear, modular approach to converting and managing Markdown content within Confluence, ensuring scalability and maintainability through well-defined responsibilities and robust Confluence API interactions.
+@dataclass
+class ContentTree:
+    root: ContentNode = field(default_factory=lambda: ContentNode('root'))
+
+    def add_node(self, path_list: list, content: str | None = None,
+                 metadata: dict | None = None):
+        pass
+
+    def find_node(self, path_list: list) -> ContentNode | None:
+        pass
+
+    def __str__(self) -> str:
+        pass
+```
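The interfaces in the `.gptcontext` hunk above are stubs (`pass` bodies), so the following is only a minimal sketch of how they might fit together, not the implementation in this PR. Assumptions that are not in the diff: the synthetic `root` node is not published itself, `add_node` creates missing intermediate nodes, `parent` is excluded from `repr`/comparison to avoid recursion, and `RecordingPublisher` is a hypothetical in-memory stand-in for `ConfluencePublisher`.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class ContentNode:
    name: str
    content: str | None = None
    metadata: dict | None = None
    # Assumption: keep the back-reference out of repr/eq to avoid cycles.
    parent: ContentNode | None = field(default=None, repr=False, compare=False)
    children: dict[str, ContentNode] = field(default_factory=dict)

    def add_child(self, node: ContentNode) -> None:
        node.parent = self
        self.children[node.name] = node

    def get_child(self, name: str) -> ContentNode | None:
        return self.children.get(name)


@dataclass
class ContentTree:
    root: ContentNode = field(default_factory=lambda: ContentNode("root"))

    def add_node(self, path_list: list, content: str | None = None,
                 metadata: dict | None = None) -> ContentNode:
        # Walk down from the root, creating intermediate nodes as needed.
        node = self.root
        for name in path_list:
            child = node.get_child(name)
            if child is None:
                child = ContentNode(name)
                node.add_child(child)
            node = child
        node.content = content
        node.metadata = metadata
        return node

    def find_node(self, path_list: list) -> ContentNode | None:
        node = self.root
        for name in path_list:
            node = node.get_child(name)
            if node is None:
                return None
        return node


class Publisher(ABC):
    @abstractmethod
    def publish_node(self, node: ContentNode, parent_id: str | None) -> str:
        """Publish one node and return the ID of the resulting page."""

    def pre_publish_hook(self) -> None:
        pass

    def post_publish_hook(self) -> None:
        pass

    def publish_content(self, content_tree: ContentTree) -> None:
        self.pre_publish_hook()
        # Assumption: the synthetic root is a container, not a page.
        self._publish_children(content_tree.root, None)
        self.post_publish_hook()

    def _publish_children(self, node: ContentNode,
                          parent_id: str | None) -> None:
        # Depth-first: a parent is published before its children so that
        # its ID can be passed down as parent_id.
        for child in node.children.values():
            child_id = self.publish_node(child, parent_id)
            self._publish_children(child, child_id)


class RecordingPublisher(Publisher):
    """Hypothetical stand-in that records calls instead of calling an API."""

    def __init__(self) -> None:
        self.published: list[tuple[str, str | None, str]] = []

    def publish_node(self, node: ContentNode, parent_id: str | None) -> str:
        page_id = f"page-{len(self.published) + 1}"
        self.published.append((node.name, parent_id, page_id))
        return page_id


tree = ContentTree()
tree.add_node(["docs", "usage.md"], content="# Usage")
tree.add_node(["docs", "install.md"], content="# Install")
tree.add_node(["README.md"], content="# Readme")

publisher = RecordingPublisher()
publisher.publish_content(tree)
```

Publishing parents first is what lets each `publish_node` call receive a real `parent_id`. A stand-in publisher like this can also make the traversal testable without a Confluence instance.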
diff --git a/Dockerfile b/Dockerfile
@@ -1,21 +1,23 @@
-FROM python:3.10-slim
+FROM python:3.11-slim
 
 WORKDIR /app
 
 COPY requirements.txt /app/
-
 RUN pip install --no-cache-dir -r requirements.txt
 
+COPY . /app/
+
+# Install the current package
+RUN pip install .
+
 ENV CONFLUENCE_USERNAME=""
 ENV CONFLUENCE_PASSWORD=""
-ENV CONFLUENCE_URL="https://yourdomain.atlassian.net/wiki/rest/api/"
+ENV CONFLUENCE_URL="https://yourdomain.atlassian.net/wiki/"
 ENV CONFLUENCE_SPACE_ID="yourspace"
 ENV CONFLUENCE_PARENT_PAGE_ID="12345"
 ENV CONFLUENCE_PAGE_TITLE_SUFFIX="(autogenerated)"
 ENV CONFLUENCE_PAGE_LABEL="markdown2confluence"
 ENV MARKDOWN_FOLDER="./"
 ENV MARKDOWN_SOURCE_REF=""
 
-COPY ./markdown2confluence /app
-
-CMD ["python", "/app/main.py"]
+CMD ["python", "markdown2confluence/main.py"]
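The Dockerfile above configures the tool entirely through environment variables. As a rough sketch of how those variables could be read and validated at startup: `Settings` and `load_settings` are hypothetical names, and the choice of which variables are required is an assumption; the actual `config.py` in this repository is not shown in the diff and may differ.

```python
import os
from dataclasses import dataclass

# Assumption: these must be non-empty; the Dockerfile only ships placeholders.
REQUIRED = (
    "CONFLUENCE_USERNAME",
    "CONFLUENCE_PASSWORD",
    "CONFLUENCE_URL",
    "CONFLUENCE_SPACE_ID",
    "CONFLUENCE_PARENT_PAGE_ID",
)


@dataclass(frozen=True)
class Settings:
    username: str
    password: str
    url: str
    space_id: str
    parent_page_id: str
    page_title_suffix: str
    page_label: str
    markdown_folder: str


def load_settings(env=None) -> Settings:
    """Build Settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED if not env.get(key)]
    if missing:
        raise ValueError("missing environment variables: " + ", ".join(missing))
    return Settings(
        username=env["CONFLUENCE_USERNAME"],
        password=env["CONFLUENCE_PASSWORD"],
        url=env["CONFLUENCE_URL"],
        space_id=env["CONFLUENCE_SPACE_ID"],
        parent_page_id=env["CONFLUENCE_PARENT_PAGE_ID"],
        # Fallbacks mirror the defaults declared in the Dockerfile.
        page_title_suffix=env.get("CONFLUENCE_PAGE_TITLE_SUFFIX",
                                  "(autogenerated)"),
        page_label=env.get("CONFLUENCE_PAGE_LABEL", "markdown2confluence"),
        markdown_folder=env.get("MARKDOWN_FOLDER", "./"),
    )


example_env = {
    "CONFLUENCE_USERNAME": "user@example.com",
    "CONFLUENCE_PASSWORD": "token",
    "CONFLUENCE_URL": "https://yourdomain.atlassian.net/wiki/",
    "CONFLUENCE_SPACE_ID": "yourspace",
    "CONFLUENCE_PARENT_PAGE_ID": "12345",
}
settings = load_settings(example_env)
```

Passing the mapping explicitly keeps the loader testable without mutating the process environment. Failing fast on the empty `CONFLUENCE_USERNAME`/`CONFLUENCE_PASSWORD` placeholders would surface misconfiguration before any API call is made.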
diff --git a/Pipfile b/Pipfile
@@ -0,0 +1,16 @@
+[[source]]
+url = "https://pypi.org/simple"
+verify_ssl = true
+name = "pypi"
+
+[packages]
+atlassian-python-api = "*"
+markdown = "*"
+
+[dev-packages]
+setuptools = "*"
+pytest-watch = "*"
+pytest = "*"
+
+[requires]
+python_version = "3.11"