Conversation
- Integrated Firebase for user login and signup using Firebase ID tokens.
- Updated login and signup mutations to handle Firebase authentication.
- Modified user schema to reflect changes in authentication method.
- Updated .gitignore to include Firebase service account key.

- Introduced a new function to extract scoring summaries for softball games, including inning and score details.
- Updated the database index setup to ensure uniqueness of game entries by including the time field.
- Added an 'inning' field to the BoxScoreEntryType for better data representation.

…oftball scraper with coderabbit feedback
- Added new environment variables for Google and Firebase credentials in .env_template.
- Updated Docker configuration to mount Firebase service account key.
- Improved error handling in login and signup mutations for better token validation.
- Enhanced database indexing for users to ensure unique Firebase UID entries.
- Improved softball scraper logic

- Added optional fields for game recap links, article titles, and published dates in the GameType and Game models.
- Updated CreateGame mutation to accept new recap fields.
- Implemented a new scraping function to extract recap headlines and published times from Cornell Sidearm story pages.
📝 Walkthrough

This pull request integrates Firebase authentication, extends the game data model with recap information, and enhances game scraping to fetch recap details from Sidearm story pages. Changes include Firebase Admin SDK setup, authentication mutation refactoring from net_id to ID token verification, new recap fields in the game model, time-aware game lookups, and conditional recap scraping logic.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor User
    participant Client
    participant Backend
    participant Firebase
    participant Database
    User->>Client: Provide ID token
    Client->>Backend: login(id_token) or signup(id_token, name, email)
    Backend->>Firebase: verify_id_token(id_token)
    Firebase-->>Backend: {uid, email, ...}
    Backend->>Database: find_one({firebase_uid: uid})
    Database-->>Backend: user doc or null
    alt User exists (login)
        Backend->>Database: (user already retrieved)
    else User not found (signup)
        Backend->>Database: insert_one({firebase_uid: uid, email, name, ...})
        Database-->>Backend: {_id, ...}
    end
    Backend->>Backend: create_access_token(_id)
    Backend-->>Client: {access_token, refresh_token}
    Client-->>User: Auth success
```
```mermaid
sequenceDiagram
    participant Scraper
    participant SchedulePage
    participant SidearmStory
    participant Database
    Scraper->>SchedulePage: fetch schedule page
    SchedulePage-->>Scraper: game list with links
    Scraper->>Scraper: parse_game_links() → recap_link, ticket_link
    alt Sport in SPORTS_WITH_SIDEARM_STORY_RECAP
        Scraper->>SidearmStory: scrape_sidearm_story_recap(recap_link)
        SidearmStory-->>Scraper: {recap_article_title, recap_published_at}
    else Sport without Sidearm story
        Scraper->>Scraper: scrape_game() → box score data
    end
    Scraper->>Scraper: merge recap/box score data
    Scraper->>Database: create/update game with recap_link, recap_article_title, recap_published_at
    Database-->>Scraper: game saved
```
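The scraper branch in the diagram above can be sketched in Python. This is a hypothetical stand-in, not the PR's actual code: only the names `parse_game_links`, `scrape_sidearm_story_recap`, `scrape_game`, and `SPORTS_WITH_SIDEARM_STORY_RECAP` come from the diagram, and the set's membership here is an assumption.

```python
# Hypothetical sketch of the diagram's recap-vs-box-score branch.
# The membership of SPORTS_WITH_SIDEARM_STORY_RECAP is assumed, and the
# scrape functions are injected so the control flow can run standalone.
SPORTS_WITH_SIDEARM_STORY_RECAP = {"softball", "baseball"}  # assumed membership

def build_game_data(sport, recap_link, scrape_recap, scrape_game):
    """Merge recap or box-score data depending on how the sport publishes recaps."""
    data = {"recap_link": recap_link}
    if sport in SPORTS_WITH_SIDEARM_STORY_RECAP and recap_link:
        # Sidearm story page: pull headline and publish time
        data.update(scrape_recap(recap_link))
    else:
        # No Sidearm story: fall back to box-score scraping
        data.update(scrape_game())
    return data
```

With stub scrape functions, a softball game with a recap link takes the Sidearm path while any other sport falls through to the box-score path.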
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes
🚥 Pre-merge checks | ✅ 3 | ❌ 2
❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
src/database.py (1)
72-90: ⚠️ Potential issue | 🟠 Major
Old unique index likely still present after rename — drop it during migration.

The previous unique compound index (without `time`) was created under a different name and will not be replaced by `create_index` just because the key spec and name changed here. On existing deployments you'll end up with both the old index (enforcing uniqueness without `time`) and the new `uniq_game_key_with_time`, which defeats the purpose of adding `time` to distinguish doubleheaders — inserts for a same-day doubleheader will still fail the old index. Please drop the prior index explicitly (by its old name) as part of the migration/startup, e.g.:

```python
try:
    game_collection.drop_index("<previous_unique_index_name>")
except OperationFailure:
    pass
```

Also note: since this index is not sparse, legacy documents missing `time` will compare as `time: null`, so doubleheaders where neither row has a `time` value still collide. Consider whether `sparse=True` (like the `firebase_uid` index below) or a partial filter is appropriate.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/database.py` around lines 72 - 90, drop the old unique compound index before creating the new one so the legacy uniqueness constraint (without "time") doesn't block doubleheader inserts: call game_collection.drop_index("<previous_unique_index_name>") inside a try/except that ignores OperationFailure, then proceed to the game_collection.create_index(...) call that creates "uniq_game_key_with_time"; additionally consider making the new index sparse=True or using a partialFilterExpression to avoid collisions for legacy documents missing the "time" field (or update legacy docs to include time).

src/types.py (1)
116-136: ⚠️ Potential issue | 🟠 Major
`GameType.__init__` is missing the `team` parameter and uses a fragile positional signature.

The issues are concrete:
- The `team` field is declared (line 110) and `resolve_team` accesses `parent.team` (line 151), but `__init__` never assigns it. This will raise `AttributeError` when `resolve_team` tries to access `parent.team`, though the fallback to `opponent_id` masks the failure. The fix: accept and assign `team=None`, matching the pattern in `TeamType`, `YoutubeVideoType`, and `ArticleType` already in this file.
- The signature forces 10 positional parameters in a fixed order — any caller using keyword args or a different order breaks. Future field additions shift all positions further. Prefer `**kwargs` (the standard in this codebase for the other GraphQL types), which also removes the need for positional parameter management.
- The `id` parameter shadows Python's builtin (Ruff A002). Using `**kwargs` resolves this.

♻️ Suggested refactor

```diff
-    def __init__(
-        self, id, city, date, gender, location, opponent_id, result, sport, state, time, box_score=None, score_breakdown=None, utc_date=None, ticket_link=None, recap_link=None, recap_article_title=None, recap_published_at=None
-    ):
-        self.id = id
-        self.city = city
-        self.date = date
-        self.gender = gender
-        self.location = location
-        self.opponent_id = opponent_id
-        self.result = result
-        self.sport = sport
-        self.state = state
-        self.time = time
-        self.box_score = box_score
-        self.score_breakdown = score_breakdown
-        self.utc_date = utc_date
-        self.ticket_link = ticket_link
-        self.recap_link = recap_link
-        self.recap_article_title = recap_article_title
-        self.recap_published_at = recap_published_at
+    def __init__(self, **kwargs):
+        kwargs.setdefault("team", None)
+        for key, value in kwargs.items():
+            setattr(self, key, value)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/types.py` around lines 116 - 136, GameType.__init__ currently omits assigning the team field and uses a fragile positional signature; update GameType.__init__ to accept a team parameter (team=None) and assign it to self.team, and refactor the signature to use **kwargs (matching TeamType, YoutubeVideoType, ArticleType) so callers can pass fields by name and avoid positional fragility and builtin shadowing of id; ensure resolve_team can safely access parent.team and that other existing callers constructing GameType continue to work with the new **kwargs pattern.

src/scrapers/games_scraper.py (1)
315-349: ⚠️ Potential issue | 🟡 Minor
Inconsistent handling of recap fields between update and create paths.

- Update (L315-321): `ticket_link` and `recap_link` are always written (can overwrite an existing non-null value with `None`), but `recap_article_title`/`recap_published_at` are conditional on key presence (preserving prior values when the scrape returns empty).
- Create (L346-349): all four fields are set unconditionally via `.get(...)`, so they land as `None` when missing.

If the intent is "never stomp a previously scraped title/date with `None` on re-scrape," then `recap_link` probably deserves the same protection (a recap page that 404s temporarily shouldn't wipe the stored link). Conversely, if `None` overwrites are fine, the conditional branches at L318-321 are unnecessary. Please pick one policy and apply it consistently.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/scrapers/games_scraper.py` around lines 315 - 349, the code treats recap fields inconsistently between the update path (where ticket_link and recap_link are always set but recap_article_title/recap_published_at are only set if present) and the create path (where all four fields use .get(...) so can be written as None); pick one policy — preserving existing stored values unless the scraper returns a non-null value is suggested — and apply it consistently: in the update flow around GameService.update_game only include/assign recap_link, recap_article_title, and recap_published_at when game_data contains a non-null value, and in the create flow do not set those keys to None — build the dict so those keys are only included when game_data.get(...) is not None (same treatment for ticket_link if you want it preserved), ensuring consistent behavior between creation and GameService.update_game.
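One way to express the "never overwrite with None" policy is a single helper shared by both the create and update paths. The field names below come from the review; the helper itself (`merge_recap_fields`) is a hypothetical sketch, not code from the PR:

```python
# Hypothetical helper encoding one consistent policy: a stored value is only
# replaced when the scraper actually returned a non-null value for that field.
RECAP_FIELDS = ("ticket_link", "recap_link", "recap_article_title", "recap_published_at")

def merge_recap_fields(existing, scraped):
    """Return a copy of `existing` with recap fields overwritten only by non-null scrape results."""
    merged = dict(existing)
    for field in RECAP_FIELDS:
        value = scraped.get(field)
        if value is not None:
            merged[field] = value
    return merged
```

Using the same helper before both `insert` and `update` removes the asymmetry the comment flags: a temporarily failing recap page yields `None` values and leaves stored data untouched.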
🧹 Nitpick comments (6)
src/models/game.py (1)
21-23: Nit: docstring uses single quotes for the new keys while other bullets use backticks.
Lines 20–23 mix the `'ticket_link'`/`'recap_link'` style with the backtick-quoted keys above. Low priority, but worth aligning for consistency.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/models/game.py` around lines 21 - 23, the docstring mixes single-quoted keys with the backtick-quoted style used elsewhere; update the bullets for recap_link, recap_article_title, recap_published_at (and ticket_link if present nearby) to use backticks instead of single quotes so the docstring formatting is consistent with the other key entries in that docstring.

app.py (1)
26-28: Nit: imports placed mid-file rather than with the other top-of-file imports.
`import os`, `import firebase_admin`, and `from firebase_admin import credentials` at lines 26–28 are added after an unrelated block of project imports. Consider hoisting `os` to the top (it's stdlib) and grouping the firebase imports with the other third-party imports for readability (and to satisfy most import-order linters if you enable one).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app.py` around lines 26 - 28, move the misplaced imports so they follow normal import ordering: hoist the standard-library "import os" into the top-of-file stdlib import block, and place "import firebase_admin" and "from firebase_admin import credentials" with the other third-party imports, so import-order linters (and readers) see stdlib, third-party, then local/project imports.

requirements.txt (1)
1-14: Most dependencies remain unpinned.
Only `Flask-JWT-Extended` and now `firebase-admin` are version-pinned; the rest float to whatever PyPI serves at build time. This makes builds non-reproducible and can introduce surprise breakage (e.g., a new Flask/Flask-GraphQL/graphene major). Consider pinning the remaining packages (or at least using `>=`,`<` upper bounds), or generating a `requirements.lock` via `pip-compile`. Out of scope for this PR — flagging for follow-up.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@requirements.txt` around lines 1 - 14, the requirements file leaves most packages unpinned (e.g., Flask, Flask-CORS, Flask-GraphQL, graphene, pymongo, beautifulsoup4, requests, pillow, Flask-APScheduler, python-dotenv, pytz, gunicorn), which makes builds non-reproducible; update requirements.txt to pin versions or use bounded ranges (e.g., package==x.y.z or package>=a.b,<c.d) for these entries, or alternatively generate and commit a frozen requirements.lock via pip-compile (or pip freeze) to capture exact resolved versions and reference that lock file in CI; ensure you update entries like Flask, graphene, pymongo, requests, pillow, etc., rather than leaving them floating.

src/mutations/signup_user.py (1)
45-48: Consider idempotent signup when a user already exists.
When a client calls signup after a previous successful signup (same Firebase user), this raises `"User already exists."` and the client must then call `loginUser`. Since the Firebase ID token is already verified and `firebase_uid` is the primary identity, you could make signup idempotent: on `DuplicateKeyError`, look up the existing user by `firebase_uid` and return fresh access/refresh tokens. This avoids an extra round trip for clients that don't track whether the user was previously registered.
Not blocking — purely a UX/API-design consideration.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/mutations/signup_user.py` around lines 45 - 48, on DuplicateKeyError in the signup flow, make signup idempotent: catch DuplicateKeyError, look up the existing user via db["users"].find_one({"firebase_uid": firebase_uid}) (use the same firebase_uid from the incoming request), and if found generate and return fresh access/refresh tokens for that user (reusing the existing token issuance logic such as create_access_token/create_refresh_token); only if no user is found re-raise the error. This keeps signup idempotent while preserving existing error behavior otherwise.

src/scrapers/games_scraper.py (2)
106-107: Consider `response.raise_for_status()` before parsing.
If cornellbigred.com returns a 4xx/5xx (which the comment above `HTTP_REQUEST_HEADERS` in `src/utils/constants.py` explicitly notes can happen with the default UA, and can still happen intermittently), `BeautifulSoup(response.content, ...)` silently parses an empty/error page and the sport's entire schedule is skipped with no signal. Raising here surfaces the failure to the thread and to your logs.

♻️ Proposed fix

```diff
     response = requests.get(url, headers=HTTP_REQUEST_HEADERS, timeout=30)
+    response.raise_for_status()
     soup = BeautifulSoup(response.content, "html.parser")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/scrapers/games_scraper.py` around lines 106 - 107, the code calls requests.get(url, ...) and immediately passes response.content into BeautifulSoup; add a call to response.raise_for_status() immediately after the requests.get(...) call (before BeautifulSoup(...)) so HTTP 4xx/5xx responses raise and surface to the thread/logs; optionally wrap the requests.get + raise_for_status sequence in a try/except to log the error and re-raise for clearer diagnostics.
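The effect of the proposed raise can be shown without a live request. In this sketch, `StubResponse` is a stand-in for `requests.Response` (only `status_code`, `content`, and `raise_for_status` mirror the real API, and a plain `RuntimeError` stands in for `requests.exceptions.HTTPError`):

```python
class StubResponse:
    """Stand-in for requests.Response with just the members used here."""
    def __init__(self, status_code, content=b""):
        self.status_code = status_code
        self.content = content

    def raise_for_status(self):
        # The real requests method raises requests.exceptions.HTTPError
        if self.status_code >= 400:
            raise RuntimeError(f"HTTP {self.status_code}")

def fetch_schedule_html(response):
    """Surface 4xx/5xx instead of silently parsing an error page."""
    response.raise_for_status()
    return response.content

html = fetch_schedule_html(StubResponse(200, b"<html>schedule</html>"))
```

A 503 stub makes `fetch_schedule_html` raise, which is exactly the signal the review says is currently missing when the scrape is handed an error page.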
70-71: Use `.get("href")` to avoid a potential `KeyError`.
`ticket_link_tag["href"]` raises `KeyError` if the anchor exists without an `href` attribute. `to_absolute_url` already handles `None`, so switching to `.get` is strictly safer with no behavioral downside (mirrors the pattern you use at L61 for the box-score tag).

🛡️ Proposed fix

```diff
     ticket_link_tag = game_item.select_one(GAME_TICKET_LINK)
-    ticket_link = to_absolute_url(ticket_link_tag["href"]) if ticket_link_tag else None
+    ticket_link = to_absolute_url(ticket_link_tag.get("href")) if ticket_link_tag else None
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/scrapers/games_scraper.py` around lines 70 - 71, replace the direct key access ticket_link_tag["href"] with ticket_link_tag.get("href") so an anchor without an href doesn't raise KeyError (mirroring the pattern used for the box-score tag at L61), and pass the possibly None value into to_absolute_url as before.
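BeautifulSoup `Tag` attribute access is dict-like, so the difference between `["href"]` and `.get("href")` can be demonstrated with a plain dict standing in for an anchor tag that has no `href`:

```python
# Dict stand-in for a bs4 Tag like <a class="ticket-btn"> with no href attribute.
anchor_without_href = {"class": ["ticket-btn"]}

# Direct indexing raises, exactly like ticket_link_tag["href"] on a Tag:
try:
    anchor_without_href["href"]
    raised = False
except KeyError:
    raised = True

# .get returns None instead, which to_absolute_url already tolerates:
href = anchor_without_href.get("href")
```

The same contrast holds on a real `bs4.Tag`, which is why the one-character change removes the crash without changing any successful path.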
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@app.py`:
- Around line 30-45: The module currently reads SERVICE_ACCOUNT_PATH at import
and calls initialize_firebase() unconditionally which causes hard failures and
premature logging; change initialize_firebase() to not run at import (remove the
top-level initialize_firebase() call), refactor it to attempt
firebase_admin.get_app() in a try/except instead of using firebase_admin._apps,
only initialize with credentials.Certificate(SERVICE_ACCOUNT_PATH) if
SERVICE_ACCOUNT_PATH is present, and when credentials are missing or unreadable
log a warning (using a module logger or after logging.basicConfig()) and return
None so callers can opt into Firebase; ensure any code that needs Firebase calls
initialize_firebase() explicitly (or use an app factory/request hook) rather
than relying on import-time side effects.
In `@src/mutations/login_user.py`:
- Around line 8-33: Remove firebase_auth.RevokedIdTokenError from the
_TOKEN_ERRORS tuple since verify_id_token(id_token) is called without
check_revoked=True; then in LoginUser.mutate add a separate except block for
firebase_auth.CertificateFetchError (imported from firebase_auth) to catch
transient certificate fetch/network errors and raise a distinct GraphQLError
like "Authentication service unavailable, please try again later." from err so
certificate fetch failures are distinguished from token validation failures;
keep the existing ValueError and other token-related exception handling around
verify_id_token.
In `@src/mutations/signup_user.py`:
- Around line 33-48: The uid-check in signup_user.py currently raises
GraphQLError("Token missing uid") and chains a synthetic KeyError("uid"); change
the GraphQLError message to match login_user.py ("Invalid or expired token.")
for consistent client-facing auth errors, and remove the artificial exception
chaining (drop the "from KeyError('uid')" so you simply raise GraphQLError(...))
at the firebase_uid check in the signup flow where firebase_uid =
decoded.get("uid") is validated.
In `@src/scrapers/game_details_scrape.py`:
- Around line 29-63: The scrape_sidearm_story_recap function currently swallows
all errors with a broad except Exception and returns {} silently; narrow the
exception handling to requests.RequestException (for the requests.get call) and
log any non-200 responses and caught exceptions at warning level so failures are
visible in production monitoring; keep the existing fallback parsing for
og:title and article:published_time, but replace the bare except block around
requests.get/BeautifulSoup with targeted try/except for requests and separately
handle parsing errors by logging a warning if BeautifulSoup parsing/selectors
(SIDEARM_STORY_HEADLINE, SIDEARM_STORY_PUBLISHED_TIME) return unexpected
results, ensuring you reference scrape_sidearm_story_recap, requests.get,
response.status_code, and the selector names when making the changes.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: deb16694-2e0a-4b85-8909-ca368440e431
📒 Files selected for processing (17)
- .env_template
- .gitignore
- app.py
- docker-compose.yml
- requirements.txt
- src/database.py
- src/models/game.py
- src/mutations/create_game.py
- src/mutations/login_user.py
- src/mutations/signup_user.py
- src/repositories/game_repository.py
- src/schema.py
- src/scrapers/game_details_scrape.py
- src/scrapers/games_scraper.py
- src/services/game_service.py
- src/types.py
- src/utils/constants.py
```python
SERVICE_ACCOUNT_PATH = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")


def initialize_firebase():
    if not firebase_admin._apps:
        if not SERVICE_ACCOUNT_PATH:
            raise ValueError(
                "GOOGLE_APPLICATION_CREDENTIALS is not set. Set it to your firebase-service-account-key.json path."
            )
        cred = credentials.Certificate(SERVICE_ACCOUNT_PATH)
        firebase_admin.initialize_app(cred)
        logging.info("Firebase app initialized.")
    return firebase_admin.get_app()


initialize_firebase()
```
🧩 Analysis chain
🌐 Web query:
firebase-admin Python 7.3.0 recommended way to check if default app is initialized
💡 Result:
The recommended way to check whether the default Firebase app is initialized in firebase-admin Python SDK 7.3.0 is to attempt to retrieve it with `firebase_admin.get_app()` and catch the `ValueError` if it does not exist. This is the idiomatic approach, since the official documentation states that `get_app` raises `ValueError` if the specified app (default `'[DEFAULT]'`) does not exist, while `initialize_app` raises `ValueError` if it already exists. Example code:

```python
import firebase_admin
from firebase_admin import credentials

try:
    app = firebase_admin.get_app()  # Checks if the default app exists
except ValueError:
    cred = credentials.Certificate('path/to/serviceAccountKey.json')
    app = firebase_admin.initialize_app(cred)
```

An alternative (not officially recommended, since it uses internal state) is `if not firebase_admin._apps: firebase_admin.initialize_app(cred)`. This checks the internal `_apps` dict, but `_apps` is an implementation detail (protected member). No public `is_app_initialized` function exists in v7.3.0; a PR for it remains open. Official docs emphasize initializing once and using `get_app` thereafter.
Citations:
- 1: https://stackoverflow.com/questions/44293241/check-if-a-firebase-app-is-already-initialized-in-python
- 2: https://stackoverflow.com/questions/44293241/check-if-a-firebase-app-is-already-initialized-in-python/68236344#68236344
- 3: https://stackoverflow.com/questions/68426892/why-i-get-this-error-on-python-firebase-admin-initialize-app
- 4: [FR] Improve Ability to detect if App has been Initialized firebase/firebase-admin-python#701
- 5: https://firebase.google.com/docs/reference/admin/python/firebase_admin
- 6: https://github.com/firebase/firebase-admin-python/blob/master/firebase_admin/__init__.py
- 7: https://github.com/firebase/firebase-admin-python/releases/tag/v7.3.0
🏁 Script executed:

```shell
# First, find app.py and check its content around lines 30-45
find . -name "app.py" -type f | head -5
```

Repository: cuappdev/score-backend
Length of output: 74

🏁 Script executed:

```shell
# Also check for docker-compose.yml
find . -name "docker-compose.yml" -type f
```

Repository: cuappdev/score-backend
Length of output: 86

🏁 Script executed:

```shell
# Look for test files that might import app
find . -path "*/test*" -name "*.py" -type f | head -10
```

Repository: cuappdev/score-backend
Length of output: 48

🏁 Script executed:

```shell
# Check app.py size and read it
wc -l ./app.py && echo "---" && cat -n ./app.py
```

Repository: cuappdev/score-backend
Length of output: 8210

🏁 Script executed:

```shell
# Check docker-compose.yml
cat -n ./docker-compose.yml
```

Repository: cuappdev/score-backend
Length of output: 843

🏁 Script executed:

```shell
# Look for test files more broadly
find . -name "*test*.py" -type f 2>/dev/null
```

Repository: cuappdev/score-backend
Length of output: 48

🏁 Script executed:

```shell
# Search for imports of app module to understand blast radius
rg "from app import|import app" --type py
```

Repository: cuappdev/score-backend
Length of output: 48

🏁 Script executed:

```shell
# Also check if scraper.py or other scripts might import app
grep -r "import app\|from app" --include="*.py" . 2>/dev/null | head -20
```

Repository: cuappdev/score-backend
Length of output: 48

🏁 Script executed:

```shell
# Find scraper.py and check it
find . -name "scraper.py" -type f && echo "---" && wc -l ./scraper.py
```

Repository: cuappdev/score-backend
Length of output: 98

🏁 Script executed:

```shell
# Check what happens when scraper.py runs without GOOGLE_APPLICATION_CREDENTIALS
head -50 ./scraper.py
```

Repository: cuappdev/score-backend
Length of output: 1441

🏁 Script executed:

```shell
# Also look for any pytest or test configuration
find . -name "pytest.ini" -o -name "conftest.py" -o -name "tox.ini" -o -name ".pytest_cache" 2>/dev/null
```

Repository: cuappdev/score-backend
Length of output: 48
Firebase initialization at import time will hard-fail any process that imports app.
The code captures SERVICE_ACCOUNT_PATH at module load (line 30) and invokes initialize_firebase() unconditionally at line 45, before logging.basicConfig() (line 115). This creates multiple failure modes:
- The scraper service in docker-compose.yml (line 18) does not set GOOGLE_APPLICATION_CREDENTIALS, so starting it will crash with ValueError, taking down the container.
- If the credentials file is missing or unreadable, credentials.Certificate() also fails with the same blast radius.
- The logging.info("Firebase app initialized.") call at line 41 executes before logging.basicConfig(), so the message goes to the root logger with default configuration and is effectively swallowed.

Move initialization out of module scope — either into an app factory, a request context, or behind an explicit initialization call. When credentials are absent, degrade gracefully (log a warning, skip init) rather than crashing. Only the mutations that actually require auth need Firebase.
Also, relying on the private firebase_admin._apps attribute is brittle. Use the documented approach:
```python
try:
    app = firebase_admin.get_app()
except ValueError:
    # App not initialized yet
    cred = credentials.Certificate(service_account_path)
    app = firebase_admin.initialize_app(cred)
```
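Putting the two recommendations together (lazy, import-safe init that degrades gracefully without credentials) might look like the sketch below. The Firebase callables are injected so the logic runs standalone; in app.py they would be `firebase_admin.get_app`, `firebase_admin.initialize_app`, and `credentials.Certificate`. The function name and parameters are illustrative, not code from the PR:

```python
import logging

def init_firebase(get_app, initialize_app, make_cred, service_account_path,
                  log=logging.getLogger(__name__)):
    """Return the Firebase app, initializing it on first use.

    Returns None (with a warning) when credentials are not configured, so
    processes that never touch auth (e.g. the scraper) can still import the
    module and start.
    """
    try:
        return get_app()  # already initialized by an earlier caller
    except ValueError:
        pass  # not initialized yet
    if not service_account_path:
        log.warning("GOOGLE_APPLICATION_CREDENTIALS not set; Firebase disabled.")
        return None
    return initialize_app(make_cred(service_account_path))
```

Auth mutations would call `init_firebase(...)` and reject requests when it returns None, instead of the whole process dying at import time.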
Verify each finding against the current code and only fix it if needed.
In `@app.py` around lines 30 - 45, The module currently reads SERVICE_ACCOUNT_PATH
at import and calls initialize_firebase() unconditionally which causes hard
failures and premature logging; change initialize_firebase() to not run at
import (remove the top-level initialize_firebase() call), refactor it to attempt
firebase_admin.get_app() in a try/except instead of using firebase_admin._apps,
only initialize with credentials.Certificate(SERVICE_ACCOUNT_PATH) if
SERVICE_ACCOUNT_PATH is present, and when credentials are missing or unreadable
log a warning (using a module logger or after logging.basicConfig()) and return
None so callers can opt into Firebase; ensure any code that needs Firebase calls
initialize_firebase() explicitly (or use an app factory/request hook) rather
than relying on import-time side effects.
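The refactor described above can be sketched as a lazy initializer that degrades gracefully when credentials are absent. This is a minimal sketch, not the repository's actual code: the name `get_firebase_app` is hypothetical, and the `firebase_admin` import is deferred into the credentialed branch so the module still imports cleanly without the SDK or a key file.

```python
import logging
import os

logger = logging.getLogger(__name__)

_firebase_app = None  # module-level cache so we initialize at most once


def get_firebase_app():
    """Lazily initialize Firebase; return None when credentials are absent.

    Callers that need auth (e.g. login/signup mutations) invoke this
    explicitly; services like the scraper never trigger initialization.
    """
    global _firebase_app
    if _firebase_app is not None:
        return _firebase_app

    path = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path or not os.path.isfile(path):
        # Degrade gracefully instead of crashing the whole process.
        logger.warning("Firebase credentials missing; auth features disabled.")
        return None

    # Deferred import: only needed on the credentialed path.
    import firebase_admin
    from firebase_admin import credentials

    try:
        _firebase_app = firebase_admin.get_app()
    except ValueError:
        # No default app yet; create one from the service account key.
        cred = credentials.Certificate(path)
        _firebase_app = firebase_admin.initialize_app(cred)
    return _firebase_app
```

With this shape, `docker-compose` services that never call an auth mutation run fine without `GOOGLE_APPLICATION_CREDENTIALS` set.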
```diff
+_TOKEN_ERRORS = (
+    firebase_auth.InvalidIdTokenError,
+    firebase_auth.ExpiredIdTokenError,
+    firebase_auth.RevokedIdTokenError,
+)
+
+
 class LoginUser(Mutation):
     class Arguments:
-        net_id = String(required=True, description="User's net ID (e.g. Cornell netid).")
+        id_token = String(required=True, description="Firebase ID token from the client.")

     access_token = String()
     refresh_token = String()

-    def mutate(self, info, net_id):
-        user = db["users"].find_one({"net_id": net_id})
+    def mutate(self, info, id_token):
+        try:
+            decoded = firebase_auth.verify_id_token(id_token)
+        except _TOKEN_ERRORS as err:
+            raise GraphQLError("Invalid or expired token.") from err
+        except ValueError as err:
+            raise GraphQLError("Invalid or expired token.") from err
+
+        firebase_uid = decoded.get("uid")
+        if not firebase_uid:
+            raise GraphQLError("Invalid or expired token.")
+        user = db["users"].find_one({"firebase_uid": firebase_uid})
```
🧩 Analysis chain
🌐 Web query:
firebase_admin auth verify_id_token check_revoked parameter Python SDK
💡 Result:
In the Firebase Admin Python SDK (`firebase_admin.auth`), the `verify_id_token` method includes a `check_revoked` parameter (boolean, default `False`). Usage:

```python
from firebase_admin import auth

try:
    decoded_token = auth.verify_id_token(id_token, check_revoked=True)
    uid = decoded_token['uid']  # Token valid and not revoked
except auth.RevokedIdTokenError:
    # Token revoked - force reauthentication
    pass
except auth.UserDisabledError:
    # User disabled
    pass
except auth.InvalidIdTokenError:
    # Invalid token
    pass
```

When `check_revoked=True`, it fetches the UserRecord for the token's uid and compares the token's `iat` (issued at) against `user.tokens_valid_after_timestamp` (milliseconds since epoch; `None` if no revocation). If `iat * 1000 < tokens_valid_after_timestamp`, it raises `RevokedIdTokenError`. It also checks whether the user is disabled (raising `UserDisabledError`). This requires an extra network call to the Firebase Auth backend, making it slower. Basic verification (`check_revoked=False`) only checks signature, expiration, issuer, and audience. Official docs confirm this behavior with Python examples using `check_revoked=True`. Note: an early bug, fixed in 2018, caused a `TypeError` when `None` was mishandled.
Citations:
- 1: https://firebase.google.com/docs/reference/admin/python/firebase_admin.auth
- 2: https://firebase.google.com/docs/auth/admin/manage-sessions
- 3: Broken auth.verify_id_token when check_revoked=True firebase/firebase-admin-python#199
- 4: https://firebase-dot-devsite-v2-prod.appspot.com/docs/auth/admin/manage-sessions
🏁 Script executed:
# First, let's check the actual code in the file
cat -n src/mutations/login_user.py | head -50
Repository: cuappdev/score-backend
Length of output: 1702
🌐 Web query:
firebase_admin.auth CertificateFetchError exception Python SDK
💡 Result:
The `firebase_admin.auth.CertificateFetchError` is a specific exception raised by the Firebase Admin Python SDK when it fails to fetch the public key certificates needed to verify Firebase ID tokens or session cookies. This occurs during calls to `auth.verify_id_token` or `auth.verify_session_cookie`. Key details:
- Bases: `UnknownError` (subtype of `FirebaseError`).
- Trigger: network or connectivity issues preventing access to Google's certificate endpoint (typically https://www.googleapis.com/robot/v1/metadata/x509/securetoken@system.gserviceaccount.com).
- Common causes: firewalls blocking outbound HTTPS to Google APIs, proxy misconfigurations (especially on macOS where system proxies intercept localhost but also affect external calls), network timeouts, ISP outages, SSL/TLS issues (outdated OpenSSL/LibreSSL), or corporate network restrictions.
- The SDK caches certificates after a successful fetch to reduce repeated calls.

Solutions and workarounds:
- Verify network connectivity: test direct access to the certs URL using `requests.get('https://www.googleapis.com/robot/v1/metadata/x509/securetoken@system.gserviceaccount.com')`.
- Check proxies: set `NO_PROXY='www.googleapis.com,.googleapis.com'` or `os.environ['NO_PROXY'] = 'localhost,127.0.0.1,.googleapis.com'`.
- Update the environment: ensure recent Python (3.9+), OpenSSL, and firebase-admin SDK versions; check `ssl.OPENSSL_VERSION`.
- Disable proxy trust if using requests: `session.trust_env = False` (though the SDK manages this internally).
- Firewall/VPN: temporarily disable to test; configure exceptions for Google domains.
- Retry logic: implement exponential backoff around `verify_id_token` calls.
- Ensure the latest SDK: it uses HTTP/2, improved timeouts, and caching.

Handle it in code:

```python
try:
    claims = auth.verify_id_token(id_token)
except auth.CertificateFetchError as exc:
    # Log cause: exc.cause may contain the underlying requests/urllib error
    raise  # or handle with retry
```
Citations:
- 1: https://firebase.google.com/docs/reference/admin/python/firebase_admin.auth
- 2: FR: auth.verify_id_token should throw a more specific exception firebase/firebase-admin-python#252
- 3: https://stackoverflow.com/questions/71915309/token-used-too-early-error-thrown-by-firebase-admin-auths-verify-id-token-metho
- 4: Firebase with Python 3.9 giving an SSL error firebase/firebase-admin-python#610
- 5: https://lightrun.com/answers/firebase-firebase-admin-python-requests-attempt-fails-when-it-is-trying-to-verify-a-token-with-transporterror
- 6: Requests attempt fails when it is trying to verify a token with TransportError firebase/firebase-admin-python#167
- 7: https://dev.to/pavelbuild/your-python-sdk-silently-routes-through-macos-proxy-3j22
- 8: OPENSSL_internal:CERTIFICATE_VERIFY_FAILED googleapis/python-firestore#916
🌐 Web query:
firebase_admin.auth exceptions complete list verify_id_token
💡 Result:
The complete list of exceptions raised by `firebase_admin.auth.verify_id_token(id_token, check_revoked=False, clock_skew_seconds=0)` is:
- `ValueError`: if `id_token` is not a string or is empty.
- `InvalidIdTokenError`: if `id_token` is not a valid Firebase ID token.
- `ExpiredIdTokenError`: if the specified ID token has expired.
- `RevokedIdTokenError`: if `check_revoked` is `True` and the ID token has been revoked.
- `TenantIdMismatchError`: if `id_token` belongs to a tenant different from this `Client` instance.
- `CertificateFetchError`: if an error occurs while fetching the public key certificates required to verify the ID token.
- `UserDisabledError`: if `check_revoked` is `True` and the corresponding user record is disabled.

These are the specific exceptions documented for this method. Note that some older versions raised a more generic `ValueError` for token issues; the specific auth exceptions were added later.
Citations:
- 1: https://firebase.google.com./docs/reference/admin/python/firebase_admin.auth
- 2: https://firebase.google.com/docs/reference/admin/python/firebase_admin.auth
Remove RevokedIdTokenError from caught exceptions—it's not reachable without check_revoked=True.
`firebase_auth.verify_id_token(id_token)` is called without `check_revoked=True`, so `RevokedIdTokenError` will never be raised. Remove it from `_TOKEN_ERRORS`.
Additionally, handle `firebase_auth.CertificateFetchError` separately. This transient network error (fetching Google's public key certificates) currently bubbles up unhandled, when it should be distinguished from authentication failures since it indicates an upstream infrastructure issue, not an invalid token.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/mutations/login_user.py` around lines 8 - 33, Remove
firebase_auth.RevokedIdTokenError from the _TOKEN_ERRORS tuple since
verify_id_token(id_token) is called without check_revoked=True; then in
LoginUser.mutate add a separate except block for
firebase_auth.CertificateFetchError (imported from firebase_auth) to catch
transient certificate fetch/network errors and raise a distinct GraphQLError
like "Authentication service unavailable, please try again later." from err so
certificate fetch failures are distinguished from token validation failures;
keep the existing ValueError and other token-related exception handling around
verify_id_token.
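The split the reviewer asks for (transient certificate-fetch failures vs. bad tokens) can be sketched with stand-in exception classes. Everything here is illustrative: `TokenError` and `CertFetchError` stand in for the `firebase_auth` exception classes, the injected `verify` callable stands in for `firebase_auth.verify_id_token`, and `ConnectionError`/`PermissionError` stand in for the two distinct `GraphQLError` messages.

```python
class TokenError(Exception):
    """Stand-in for firebase_auth.InvalidIdTokenError / ExpiredIdTokenError."""


class CertFetchError(Exception):
    """Stand-in for firebase_auth.CertificateFetchError."""


_TOKEN_ERRORS = (TokenError, ValueError)


def decode_id_token(id_token, verify):
    """Run a verify callable and map failures to distinct user-facing errors."""
    try:
        return verify(id_token)
    except CertFetchError as err:
        # Upstream outage while fetching Google's public keys: not a bad token,
        # so surface it as a service problem the client can retry.
        raise ConnectionError(
            "Authentication service unavailable, please try again later."
        ) from err
    except _TOKEN_ERRORS as err:
        # Malformed, invalid, or expired token: the client must re-authenticate.
        raise PermissionError("Invalid or expired token.") from err
```

Ordering matters: the transient-error clause comes first so an infrastructure failure is never mislabeled as an invalid token.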
```diff
 firebase_uid = decoded.get("uid")
 if firebase_uid is None:
     raise GraphQLError("Token missing uid") from KeyError("uid")

 email = email or decoded.get("email")
 user_doc = {
     "net_id": net_id,
     "firebase_uid": firebase_uid,
     "email": email,
     "favorite_game_ids": [],
 }
 if name is not None:
     user_doc["name"] = name
 if email is not None:
     user_doc["email"] = email
-result = db["users"].insert_one(user_doc)
+try:
+    result = db["users"].insert_one(user_doc)
+except DuplicateKeyError as err:
+    raise GraphQLError("User already exists.") from err
```
Minor: inconsistency with login_user.py and synthetic KeyError chain.
Two small nits:
- When `uid` is missing from the decoded token, this raises `GraphQLError("Token missing uid")`, but `login_user.py` raises `GraphQLError("Invalid or expired token.")` for the same condition. Consider aligning the user-facing message across both mutations so clients can handle auth failures uniformly.
- `raise GraphQLError("Token missing uid") from KeyError("uid")` constructs a throwaway `KeyError` solely for chaining. A bare `raise GraphQLError(...)` (no `from`) is more idiomatic here since there is no original exception to chain.
💡 Proposed tweak

```diff
-    firebase_uid = decoded.get("uid")
-    if firebase_uid is None:
-        raise GraphQLError("Token missing uid") from KeyError("uid")
+    firebase_uid = decoded.get("uid")
+    if not firebase_uid:
+        raise GraphQLError("Invalid or expired token.")
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
 firebase_uid = decoded.get("uid")
-if firebase_uid is None:
-    raise GraphQLError("Token missing uid") from KeyError("uid")
+if not firebase_uid:
+    raise GraphQLError("Invalid or expired token.")

 email = email or decoded.get("email")
 user_doc = {
-    "net_id": net_id,
     "firebase_uid": firebase_uid,
     "email": email,
     "favorite_game_ids": [],
 }
 if name is not None:
     user_doc["name"] = name
-if email is not None:
-    user_doc["email"] = email
 try:
     result = db["users"].insert_one(user_doc)
 except DuplicateKeyError as err:
     raise GraphQLError("User already exists.") from err
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/mutations/signup_user.py` around lines 33 - 48, The uid-check in
signup_user.py currently raises GraphQLError("Token missing uid") and chains a
synthetic KeyError("uid"); change the GraphQLError message to match
login_user.py ("Invalid or expired token.") for consistent client-facing auth
errors, and remove the artificial exception chaining (drop the "from
KeyError('uid')" so you simply raise GraphQLError(...)) at the firebase_uid
check in the signup flow where firebase_uid = decoded.get("uid") is validated.
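The aligned uid check can be factored into a small document builder, which also makes the logic unit-testable without a database. This is a sketch only: `build_user_doc` and the use of `PermissionError` (standing in for `GraphQLError`) are illustrative, not the repository's actual helpers.

```python
def build_user_doc(decoded, name=None, email=None):
    """Build the Mongo user document from a decoded Firebase token.

    Raises PermissionError (a stand-in for GraphQLError) with the same
    message the login mutation uses, so clients see one uniform auth failure.
    """
    firebase_uid = decoded.get("uid")
    if not firebase_uid:  # covers both a missing key and an empty string
        raise PermissionError("Invalid or expired token.")
    user_doc = {
        "firebase_uid": firebase_uid,
        # Prefer the caller-supplied email, fall back to the token's claim.
        "email": email or decoded.get("email"),
        "favorite_game_ids": [],
    }
    if name is not None:
        user_doc["name"] = name
    return user_doc
```

The mutation then reduces to `db["users"].insert_one(build_user_doc(decoded, name, email))` wrapped in the `DuplicateKeyError` handler.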
```python
def scrape_sidearm_story_recap(url):
    """
    Extract headline and published time from a Cornell Sidearm story/recap page
    """
    if not url:
        return {}
    try:
        response = requests.get(url, headers=HTTP_REQUEST_HEADERS, timeout=20)
        if response.status_code != 200:
            return {}
        soup = BeautifulSoup(response.text, "html.parser")
    except Exception:
        return {}
    headline = soup.select_one(SIDEARM_STORY_HEADLINE)
    time_el = soup.select_one(SIDEARM_STORY_PUBLISHED_TIME)
    title = headline.get_text(strip=True) if headline else None
    if not title:
        og = soup.find("meta", property="og:title")
        if og and og.get("content"):
            title = og["content"].strip()
    published_at = None
    if time_el:
        published_at = time_el.get_text(strip=True)
        if not published_at and time_el.get("datetime"):
            published_at = time_el["datetime"].strip()
    if not published_at:
        pmeta = soup.find("meta", property="article:published_time")
        if pmeta and pmeta.get("content"):
            published_at = pmeta["content"].strip()
    out = {}
    if title:
        out["recap_article_title"] = title
    if published_at:
        out["recap_published_at"] = published_at
    return out
```
Avoid blanket except Exception and log scraping failures.
`except Exception` silently swallows every failure (network, DNS, SSL, encoding, BeautifulSoup parse errors, programming errors) and returns `{}`. This is Ruff BLE001 and, more importantly, will make it hard to notice when the Sidearm markup drifts and the recap scraper quietly stops producing titles/dates.
Recommend narrowing to `requests.RequestException` and logging non-200 responses and exceptions at warning level so production monitoring can surface regressions.
🛠️ Proposed fix

```diff
+import logging
+logger = logging.getLogger(__name__)
+
 def scrape_sidearm_story_recap(url):
     """
     Extract headline and published time from a Cornell Sidearm story/recap page
     """
     if not url:
         return {}
     try:
         response = requests.get(url, headers=HTTP_REQUEST_HEADERS, timeout=20)
-        if response.status_code != 200:
-            return {}
-        soup = BeautifulSoup(response.text, "html.parser")
-    except Exception:
-        return {}
+    except requests.RequestException as exc:
+        logger.warning("Recap fetch failed for %s: %s", url, exc)
+        return {}
+    if response.status_code != 200:
+        logger.warning("Recap fetch returned %s for %s", response.status_code, url)
+        return {}
+    soup = BeautifulSoup(response.text, "html.parser")
```

🧰 Tools
🪛 Ruff (0.15.10)
[warning] 40-40: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/scrapers/game_details_scrape.py` around lines 29 - 63, The
scrape_sidearm_story_recap function currently swallows all errors with a broad
except Exception and returns {} silently; narrow the exception handling to
requests.RequestException (for the requests.get call) and log any non-200
responses and caught exceptions at warning level so failures are visible in
production monitoring; keep the existing fallback parsing for og:title and
article:published_time, but replace the bare except block around
requests.get/BeautifulSoup with targeted try/except for requests and separately
handle parsing errors by logging a warning if BeautifulSoup parsing/selectors
(SIDEARM_STORY_HEADLINE, SIDEARM_STORY_PUBLISHED_TIME) return unexpected
results, ensuring you reference scrape_sidearm_story_recap, requests.get,
response.status_code, and the selector names when making the changes.
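The fallback parsing preserved above (visible element text, then a `datetime` attribute, then a meta tag) reduces to picking the first non-empty candidate, which can be factored into one helper. This is a sketch: the name `first_nonempty` is hypothetical and does not exist in the scraper.

```python
def first_nonempty(*candidates):
    """Return the first candidate that is a non-blank string, stripped.

    Mirrors the recap scraper's fallback chain: each candidate is tried in
    order, and whitespace-only or None values are skipped.
    """
    for candidate in candidates:
        if candidate and candidate.strip():
            return candidate.strip()
    return None
```

With it, the published-time lookup becomes a single call, e.g. `first_nonempty(time_el_text, datetime_attr, meta_content)`, which keeps the precedence order explicit and easy to test.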
```python
def softball_summary(box_score_section):
    summary = []
    scoring_section = box_score_section.find(TAG_SECTION, {ATTR_ARIA_LABEL: LABEL_SCORING_SUMMARY})
    if scoring_section:
        scoring_rows = scoring_section.find(TAG_TBODY)
        if scoring_rows:
            for row in scoring_rows.find_all(TAG_TR):
                team = row.find_all(TAG_TD)[0].find(TAG_IMG)[ATTR_ALT]
                inning = row.find_all(TAG_TD)[3].text.strip()
                desc_cell = row.find_all(TAG_TD)[4]
                span = desc_cell.find(TAG_SPAN)
                if span:
                    span.extract()
                desc = desc_cell.get_text(strip=True)
                cornell_score = int(row.find_all(TAG_TD)[5].get_text(strip=True) or 0)
                opp_score = int(row.find_all(TAG_TD)[6].get_text(strip=True) or 0)
                summary.append({
                    'team': team,
                    'period': inning,
                    'inning': inning,
                    'description': desc,
                    'cor_score': cornell_score,
                    'opp_score': opp_score
                })
    if not summary:
        summary = [{"message": "No scoring events in this game."}]
    return summary
```
softball_summary will crash on malformed/edge-case rows.
Several lines assume an exact table shape and will raise unhandled exceptions, aborting the scrape for that game:
- Line 100: `row.find_all(TAG_TD)[0].find(TAG_IMG)[ATTR_ALT]` raises `TypeError` (if `find(TAG_IMG)` returns `None`) or `KeyError` (if the `<img>` has no `alt`). Summary headers/footers inside `<tbody>` or rows without a team icon will break this.
- Lines 107–108: `int(row.find_all(TAG_TD)[5].get_text(strip=True) or 0)` raises `ValueError` for any non-numeric text such as `"-"`, `"—"`, or a score column that briefly contains a hyphen. The `or 0` guard only covers an empty string.
- Hardcoded indices `[0, 3, 4, 5, 6]` silently shift semantics if Sidearm adds a column.

Since this is the scraper hot path, consider wrapping each row in a try/except and skipping malformed rows (with a `logger.warning`), and using a safe int-coercion helper. Other `*_summary` parsers in this file have the same pattern, but softball is the one being introduced here.
🛠️ Suggested hardening

```diff
+def _safe_int(text):
+    try:
+        return int(text)
+    except (TypeError, ValueError):
+        return 0
+
 def softball_summary(box_score_section):
     summary = []
     scoring_section = box_score_section.find(TAG_SECTION, {ATTR_ARIA_LABEL: LABEL_SCORING_SUMMARY})
     if scoring_section:
         scoring_rows = scoring_section.find(TAG_TBODY)
         if scoring_rows:
             for row in scoring_rows.find_all(TAG_TR):
-                team = row.find_all(TAG_TD)[0].find(TAG_IMG)[ATTR_ALT]
-                inning = row.find_all(TAG_TD)[3].text.strip()
-                desc_cell = row.find_all(TAG_TD)[4]
-                span = desc_cell.find(TAG_SPAN)
-                if span:
-                    span.extract()
-                desc = desc_cell.get_text(strip=True)
-                cornell_score = int(row.find_all(TAG_TD)[5].get_text(strip=True) or 0)
-                opp_score = int(row.find_all(TAG_TD)[6].get_text(strip=True) or 0)
-                summary.append({
+                tds = row.find_all(TAG_TD)
+                if len(tds) < 7:
+                    continue
+                img = tds[0].find(TAG_IMG)
+                team = img.get(ATTR_ALT) if img else None
+                if not team:
+                    continue
+                inning = tds[3].text.strip()
+                desc_cell = tds[4]
+                span = desc_cell.find(TAG_SPAN)
+                if span:
+                    span.extract()
+                desc = desc_cell.get_text(strip=True)
+                cornell_score = _safe_int(tds[5].get_text(strip=True))
+                opp_score = _safe_int(tds[6].get_text(strip=True))
+                summary.append({
                     'team': team,
                     'period': inning,
                     'inning': inning,
                     'description': desc,
                     'cor_score': cornell_score,
                     'opp_score': opp_score
                 })
```
Overview
Added scraping functionality for sports that do not have box scores but instead have a recap link.
Changes Made
This scraper captures the recap links for:
"Swimming & Diving",
"Track & Field",
"Wrestling",
"Golf",
"Polo",
"Fencing",
"Equestrian",
"Gymnastics"
and, using those links, visits each sport's news article and scrapes the article's title and published-at date/time.
Test Coverage
Tested locally with Graphql