1 rewrite of scan batch dir by bdgregg · Pull Request #2 · ulsdevteam/scan-batch-dir

bdgregg · 2026-02-26T15:59:30Z

This PR addresses issue #1.

This is a full rewrite of the original scan-batch-dir script to make it more modular, so that future changes are easier to implement. It also adds the ability to use PDF files as newspaper issues.

ctgraham

Initial readthrough with some questions and suggestions.

scan-batch-dir

ojas-uls-dev · 2026-03-03T15:48:52Z

scan-batch-dir

@@ -1,4 +1,5 @@
-#!/usr/bin/python3
+#!/usr/bin/python3.12


Explicitly depending on 3.12 should be fine as long as the machine running the script has 3.12 installed (in the future, some machines may only have versions newer than 3.12). It might help to add a warning such as: if sys.version_info < (3, 12): logger.warning("unsupported python version")

One way to avoid both the 3.6 default and an explicitly pinned version would be to use #!/usr/bin/env python and run the script inside a virtual environment created with Python 3.12.

Choosing which version of Python runs the script has been an ongoing issue for a while. There are various ways to achieve this, and none is clearly the best. RHEL 8 ships with 3.6, RHEL 9 ships with 3.9, and the Ansible version used to build our systems requires 3.12. Red Hat recommends not removing the shipped version because its own code depends on it, but 3.6 is too old for some of the Python modules we need. How best to avoid having to worry about the required version? Your sys.version_info test will probably help, but only if the OS exposes the correct version to the script at runtime, even when 3.12 is installed. RHEL has the 'alternatives' command, but changing the default Python version may impact scripts the OS requires, so that is not recommended either. This is why I hardcoded the 3.12 version. Additionally, I don't believe logger.warning will work, because the logger hasn't been set up at the point where I would do the check, which is very early in the script.
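A minimal sketch of the version check discussed above, assuming it runs at the very top of the script before any logging is configured. The names MIN_PYTHON and python_is_supported are hypothetical and not taken from the PR. Note that sys.version_info is an attribute, not a function, and that the message goes to stderr because no logger exists yet.

```python
import sys

# Hypothetical constant: the oldest interpreter the script is expected to support.
MIN_PYTHON = (3, 12)


def python_is_supported(current=None, minimum=MIN_PYTHON):
    """Return True when the (major, minor) version meets the minimum."""
    if current is None:
        current = sys.version_info
    # Compare only (major, minor); slicing version_info yields a plain tuple.
    return tuple(current[:2]) >= tuple(minimum)


if not python_is_supported():
    # The logger is not configured this early, so write straight to stderr.
    print(
        "Warning: unsupported Python version "
        f"{sys.version_info.major}.{sys.version_info.minor}; "
        f"{MIN_PYTHON[0]}.{MIN_PYTHON[1]} or newer is expected.",
        file=sys.stderr,
    )
```

This only warns. It reports whichever interpreter the shebang or virtual environment actually selected, so it does not by itself solve the question of which interpreter gets chosen.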

ojas-uls-dev · 2026-03-03T15:51:26Z

scan-batch-dir

        print(f"Invalid content in YAML file: {path}.")
        raise
    except Exception as e:
        print(f"An unexpected error occurred while reading the YAML file: {str(e)}")


Log the YAML error so it can be seen in the log file if the script is executed from a non-CLI environment.

@ojas-uls-dev, at the time read_yaml_file executes, the log hasn't been created yet, so the function has nothing to write to. Moving the log creation before read_yaml_file is a catch-22, because the config holds the default log file settings. I'm not sure what the best approach is to resolve this.
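One possible way around the ordering problem, sketched under assumptions: buffer early log records in memory with the standard library's logging.handlers.MemoryHandler, then replay them into the file handler once the config has supplied the log path. The function names start_buffered_logging and attach_log_file are hypothetical, and this is not how the script currently works.

```python
import logging
import logging.handlers


def start_buffered_logging(name="scan-batch-dir"):
    """Create a logger that holds records in memory until a log file is known."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    # With target=None, a flush is a no-op and buffered records are retained.
    buffer = logging.handlers.MemoryHandler(capacity=1000, target=None)
    logger.addHandler(buffer)
    return logger, buffer


def attach_log_file(logger, buffer, log_path):
    """Replay buffered records into log_path, then log to the file directly."""
    file_handler = logging.FileHandler(log_path)
    file_handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    buffer.setTarget(file_handler)
    buffer.flush()  # writes everything logged before the config was read
    logger.removeHandler(buffer)
    buffer.close()
    logger.addHandler(file_handler)
    return file_handler
```

With something like this, read_yaml_file could call logger.error(...) instead of print(...). If the config cannot be read at all, there is still no log path, so a fallback (for example a StreamHandler on stderr as the buffer's target) would be needed before exiting.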

ojas-uls-dev · 2026-03-03T16:00:24Z

scan-batch-dir

-        parent    (str) The parent directory of the object.
-        df (pd.DataFrame)  The Pandas DataFrame we will be updating.
+    # Validate columns
+    if match_column not in df.columns:


Is match_column intended to be global or inherited from the caller? If not, it should be passed as an argument.

match_value and return_column also seem to be inherited.

@ojas-uls-dev, match_column and return_column are both passed into the get_value_from_df function as arguments. They are not global per se.
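For illustration only, a sketch of what a lookup with this shape might look like when every value arrives as an explicit argument. The argument order, the error type, and the None return for no match are assumptions. Only the function name, the column validation, and "first value from the return column" come from this PR.

```python
import pandas as pd


def get_value_from_df(df: pd.DataFrame, match_column: str, match_value, return_column: str):
    """Return the first value in return_column where match_column equals match_value."""
    # Validate columns (both names are parameters, not globals).
    for column in (match_column, return_column):
        if column not in df.columns:
            raise KeyError(f"Column not found in DataFrame: {column}")

    matches = df.loc[df[match_column] == match_value, return_column]
    if matches.empty:
        return None  # assumption: no match yields None rather than an error
    return matches.iloc[0]
```

Called as get_value_from_df(df, "model", "pdf", "islandora_model"), where the column names are made up for the example, nothing is read from the enclosing scope.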

ojas-uls-dev · 2026-03-03T16:02:35Z

scan-batch-dir

+        headers["Authorization"] = f"Bearer {auth_token}"
+
+    try:
+        response = requests.get(url, headers=headers, params=params)


response.raise_for_status() can also be used to handle 403s and 404s in the same try/except block.

@ojas-uls-dev, are you suggesting halting the program if a 403/404 is encountered? We can certainly add stanzas to handle 403/404 as well. I just wasn't sure whether you meant that a failure should trigger a stop, or only that 403/404 should be handled differently.
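A sketch of how raise_for_status() could fit into the existing try/except without forcing a halt. The function name fetch_json, the timeout value, the injectable get parameter, and the choice to return None on failure are all assumptions made for the example. raise_for_status() raises requests.exceptions.HTTPError for 4xx and 5xx responses, and the caller decides whether that stops the run.

```python
import logging

import requests

logger = logging.getLogger(__name__)


def fetch_json(url, headers=None, params=None, timeout=30, get=requests.get):
    """GET a URL and return parsed JSON, or None when the request fails."""
    try:
        response = get(url, headers=headers, params=params, timeout=timeout)
        response.raise_for_status()  # raises HTTPError on 4xx/5xx, e.g. 403 or 404
        return response.json()
    except requests.exceptions.HTTPError as e:
        # Compare with None explicitly: a Response with an error status is falsy.
        status = e.response.status_code if e.response is not None else None
        logger.warning("HTTP %s for %s; skipping", status, url)
        return None
    except requests.exceptions.RequestException as e:
        # Connection errors, timeouts, and other transport-level failures.
        logger.error("Request to %s failed: %s", url, e)
        return None
```

Returning None lets the caller continue with the next item. Re-raising inside the HTTPError branch would turn the same check into a hard stop, so either policy fits in one block.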

ojas-uls-dev · 2026-03-03T16:16:49Z

scan-batch-dir

+    return(df)
+
+
+def get_directory(directory: str):


os.walk does something very similar to this function.

@ojas-uls-dev, it does look like it might do the same as this function, except that the function sorts the result alphabetically. os.walk doesn't appear to do that, but maybe there is a way to sort the os.walk result the same way. This is something that could be looked into.
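On the sorting question: with the default topdown=True, os.walk lets the caller sort the dirnames list in place, and that controls the order in which subdirectories are visited. A rough sketch follows, assuming the goal is an alphabetical listing of file paths. The name walk_sorted is hypothetical, and what get_directory actually returns may differ.

```python
import os


def walk_sorted(directory: str):
    """Yield file paths under directory in alphabetical order, level by level."""
    for dirpath, dirnames, filenames in os.walk(directory):
        # Sorting in place tells os.walk which subdirectory to descend into next.
        dirnames.sort()
        for name in sorted(filenames):
            yield os.path.join(dirpath, name)
```

Files in a directory are yielded before the contents of its subdirectories. If get_directory interleaves files and directories differently, the ordering would need to be adjusted to match.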

bdgregg added 11 commits February 18, 2026 10:50

Initial issue commit.

7cc0f7a

Adjusting Islandora Models.

7075968

A bit of cleanup.

fb43989

Built out the PDF model, updated the model map, and explicitly set the islandora model in the model map.

97078c7

Added additional fields for PDFs and added minimal row_data for Simple Images.

2d83e10

Added tables to the README.md file.

0a92610

Updated the tables in the README.md file.

e946d26

Updated the tables in the README.md file.

cd956a3

Updated the tables in the README.md file.

a6cce36

Updated the tables in the README.md file.

05ed05c

Updated the tables in the README.md file.

90a0624

bdgregg requested review from alex-wreschnig, chryslovelace, ctgraham, ojas-uls-dev and rzhang152 February 26, 2026 15:59

bdgregg self-assigned this Feb 26, 2026

bdgregg linked an issue Feb 26, 2026 that may be closed by this pull request

Rewrite of scan-batch-dir #1

Open

ctgraham requested changes Feb 26, 2026

bdgregg added 11 commits February 26, 2026 13:12

Fixed spacing in function call parameters.

ea1316c

Adjust function documentation to correctly describe the return value as being the first value from the return column.

a7226d5

Added function documentation to process_file.

d2c1561

Adding some argument signatures to functions.

f04de97

Removed some unused functions.

4726124

Added the missing $ in the regex.

736b240

Remove function in preference for in-line code.

607681e

Removed unused function dump_df_columns.

91d8a30

Moved skip patterns to the config file to allow for customization.

fb5ca84

Updated the README.md file to address the 'skip' parameter.

e356d52

Renaming of Models and a bit of clean up and more logging.

454e125

ojas-uls-dev reviewed Mar 3, 2026

3 participants