Conversation
- Integrated Firebase for user login and signup using Firebase ID tokens.
- Updated login and signup mutations to handle Firebase authentication.
- Modified user schema to reflect changes in authentication method.
- Updated .gitignore to include Firebase service account key.

- Introduced a new function to extract scoring summaries for softball games, including inning and score details.
- Updated the database index setup to ensure uniqueness of game entries by including the time field.
- Added an 'inning' field to the BoxScoreEntryType for better data representation.

…oftball scraper with coderabbit feedback
- Added new environment variables for Google and Firebase credentials in .env_template.
- Updated Docker configuration to mount Firebase service account key.
- Improved error handling in login and signup mutations for better token validation.
- Enhanced database indexing for users to ensure unique Firebase UID entries.
- Improved softball scraper logic

- Added optional fields for game recap links, article titles, and published dates in the GameType and Game models.
- Updated CreateGame mutation to accept new recap fields.
- Implemented a new scraping function to extract recap headlines and published times from Cornell Sidearm story pages.
📝 Walkthrough

This pull request integrates Firebase authentication, extends the game data model with recap information, and enhances game scraping to fetch recap details from Sidearm story pages. Changes include Firebase Admin SDK setup, authentication mutation refactoring from net_id to ID token verification, new recap fields in the game model, time-aware game lookups, and conditional recap scraping logic.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor User
    participant Client
    participant Backend
    participant Firebase
    participant Database
    User->>Client: Provide ID token
    Client->>Backend: login(id_token) or signup(id_token, name, email)
    Backend->>Firebase: verify_id_token(id_token)
    Firebase-->>Backend: {uid, email, ...}
    Backend->>Database: find_one({firebase_uid: uid})
    Database-->>Backend: user doc or null
    alt User exists (login)
        Backend->>Database: (user already retrieved)
    else User not found (signup)
        Backend->>Database: insert_one({firebase_uid: uid, email, name, ...})
        Database-->>Backend: {_id, ...}
    end
    Backend->>Backend: create_access_token(_id)
    Backend-->>Client: {access_token, refresh_token}
    Client-->>User: Auth success
```
```mermaid
sequenceDiagram
    participant Scraper
    participant SchedulePage
    participant SidearmStory
    participant Database
    Scraper->>SchedulePage: fetch schedule page
    SchedulePage-->>Scraper: game list with links
    Scraper->>Scraper: parse_game_links() → recap_link, ticket_link
    alt Sport in SPORTS_WITH_SIDEARM_STORY_RECAP
        Scraper->>SidearmStory: scrape_sidearm_story_recap(recap_link)
        SidearmStory-->>Scraper: {recap_article_title, recap_published_at}
    else Sport without Sidearm story
        Scraper->>Scraper: scrape_game() → box score data
    end
    Scraper->>Scraper: merge recap/box score data
    Scraper->>Database: create/update game with recap_link, recap_article_title, recap_published_at
    Database-->>Scraper: game saved
```
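The scraper branch in the diagram above can be sketched in Python. This is a hypothetical stand-in, not the PR's actual code: only the names `parse_game_links`, `scrape_sidearm_story_recap`, `scrape_game`, and `SPORTS_WITH_SIDEARM_STORY_RECAP` come from the diagram, and the set's membership here is an assumption.

```python
# Hypothetical sketch of the diagram's recap-vs-box-score branch.
# The membership of SPORTS_WITH_SIDEARM_STORY_RECAP is assumed, and the
# scrape functions are injected so the control flow can run standalone.
SPORTS_WITH_SIDEARM_STORY_RECAP = {"softball", "baseball"}  # assumed membership

def build_game_data(sport, recap_link, scrape_recap, scrape_game):
    """Merge recap or box-score data depending on how the sport publishes recaps."""
    data = {"recap_link": recap_link}
    if sport in SPORTS_WITH_SIDEARM_STORY_RECAP and recap_link:
        # Sidearm story page: pull headline and publish time
        data.update(scrape_recap(recap_link))
    else:
        # No Sidearm story: fall back to box-score scraping
        data.update(scrape_game())
    return data
```

With stub scrape functions, a softball game with a recap link takes the Sidearm path while any other sport falls through to the box-score path.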
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes
🚥 Pre-merge checks | ✅ 3 | ❌ 2
❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
src/database.py (1)
72-90: ⚠️ Potential issue | 🟠 Major
Old unique index likely still present after rename — drop it during migration.

The previous unique compound index (without `time`) was created under a different name and will not be replaced by `create_index` just because the key spec and name changed here. On existing deployments you'll end up with both the old index (enforcing uniqueness without `time`) and the new `uniq_game_key_with_time`, which defeats the purpose of adding `time` to distinguish doubleheaders — inserts for a same-day doubleheader will still fail the old index. Please drop the prior index explicitly (by its old name) as part of the migration/startup, e.g.:

```python
try:
    game_collection.drop_index("<previous_unique_index_name>")
except OperationFailure:
    pass
```

Also note: since this index is not sparse, legacy documents missing `time` will compare as `time: null`, so doubleheaders where neither row has a `time` value still collide. Consider whether `sparse=True` (like the `firebase_uid` index below) or a partial filter is appropriate.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/database.py` around lines 72 - 90, drop the old unique compound index before creating the new one so the legacy uniqueness constraint (without "time") doesn't block doubleheader inserts: call game_collection.drop_index("<previous_unique_index_name>") inside a try/except that ignores OperationFailure, then proceed to the game_collection.create_index(...) call that creates "uniq_game_key_with_time"; additionally consider making the new index sparse=True or using a partialFilterExpression to avoid collisions for legacy documents missing the "time" field (or update legacy docs to include time).

src/types.py (1)
116-136: ⚠️ Potential issue | 🟠 Major
`GameType.__init__` is missing the `team` parameter and uses a fragile positional signature.

The issues are concrete:
- The `team` field is declared (line 110) and `resolve_team` accesses `parent.team` (line 151), but `__init__` never assigns it. This will raise `AttributeError` when `resolve_team` tries to access `parent.team`, though the fallback to `opponent_id` masks the failure. The fix: accept and assign `team=None`, matching the pattern in `TeamType`, `YoutubeVideoType`, and `ArticleType` already in this file.
- The signature forces 10 positional parameters in a fixed order — any caller using keyword args or a different order breaks. Future field additions shift all positions further. Prefer `**kwargs` (the standard in this codebase for the other GraphQL types), which also removes the need for positional parameter management.
- The `id` parameter shadows Python's builtin (Ruff A002). Using `**kwargs` resolves this.

♻️ Suggested refactor

```diff
-    def __init__(
-        self, id, city, date, gender, location, opponent_id, result, sport, state, time, box_score=None, score_breakdown=None, utc_date=None, ticket_link=None, recap_link=None, recap_article_title=None, recap_published_at=None
-    ):
-        self.id = id
-        self.city = city
-        self.date = date
-        self.gender = gender
-        self.location = location
-        self.opponent_id = opponent_id
-        self.result = result
-        self.sport = sport
-        self.state = state
-        self.time = time
-        self.box_score = box_score
-        self.score_breakdown = score_breakdown
-        self.utc_date = utc_date
-        self.ticket_link = ticket_link
-        self.recap_link = recap_link
-        self.recap_article_title = recap_article_title
-        self.recap_published_at = recap_published_at
+    def __init__(self, **kwargs):
+        kwargs.setdefault("team", None)
+        for key, value in kwargs.items():
+            setattr(self, key, value)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/types.py` around lines 116 - 136, GameType.__init__ currently omits assigning the team field and uses a fragile positional signature; update GameType.__init__ to accept a team parameter (team=None) and assign it to self.team, and refactor the signature to use **kwargs (matching TeamType, YoutubeVideoType, ArticleType) so callers can pass fields by name and avoid positional fragility and builtin shadowing of id; ensure resolve_team can safely access parent.team and that other existing callers constructing GameType continue to work with the new **kwargs pattern.

src/scrapers/games_scraper.py (1)
315-349: ⚠️ Potential issue | 🟡 Minor
Inconsistent handling of recap fields between update and create paths.

- Update (L315-321): `ticket_link` and `recap_link` are always written (can overwrite an existing non-null value with `None`), but `recap_article_title`/`recap_published_at` are conditional on key presence (preserving prior values when the scrape returns empty).
- Create (L346-349): all four fields are set unconditionally via `.get(...)`, so they land as `None` when missing.

If the intent is "never stomp a previously scraped title/date with `None` on re-scrape," then `recap_link` probably deserves the same protection (a recap page that 404s temporarily shouldn't wipe the stored link). Conversely, if `None` overwrites are fine, the conditional branches at L318-321 are unnecessary. Please pick one policy and apply it consistently.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/scrapers/games_scraper.py` around lines 315 - 349, the code treats recap fields inconsistently between the update path (where ticket_link and recap_link are always set but recap_article_title/recap_published_at are only set if present) and the create path (where all four fields use .get(...) so can be written as None); pick one policy — preserving existing stored values unless the scraper returns a non-null value is suggested — and apply it consistently: in the update flow around GameService.update_game only include/assign recap_link, recap_article_title, and recap_published_at when game_data contains a non-null value, and in the create flow do not set those keys to None — build the dict so those keys are only included when game_data.get(...) is not None (same treatment for ticket_link if you want it preserved), ensuring consistent behavior between creation and GameService.update_game.
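One way to express the "never overwrite with None" policy is a single helper shared by both the create and update paths. The field names below come from the review; the helper itself (`merge_recap_fields`) is a hypothetical sketch, not code from the PR:

```python
# Hypothetical helper encoding one consistent policy: a stored value is only
# replaced when the scraper actually returned a non-null value for that field.
RECAP_FIELDS = ("ticket_link", "recap_link", "recap_article_title", "recap_published_at")

def merge_recap_fields(existing, scraped):
    """Return a copy of `existing` with recap fields overwritten only by non-null scrape results."""
    merged = dict(existing)
    for field in RECAP_FIELDS:
        value = scraped.get(field)
        if value is not None:
            merged[field] = value
    return merged
```

Using the same helper before both `insert` and `update` removes the asymmetry the comment flags: a temporarily failing recap page yields `None` values and leaves stored data untouched.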
🧹 Nitpick comments (6)
src/models/game.py (1)
21-23: Nit: docstring uses single quotes for the new keys while other bullets use backticks.
Lines 20–23 mix the `'ticket_link'`/`'recap_link'` style with the backtick-quoted keys above. Low priority, but worth aligning for consistency.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/models/game.py` around lines 21 - 23, the docstring mixes single-quoted keys with the backtick-quoted style used elsewhere; update the bullets for recap_link, recap_article_title, recap_published_at (and ticket_link if present nearby) to use backticks instead of single quotes so the docstring formatting is consistent with the other key entries in that docstring.

app.py (1)
26-28: Nit: imports placed mid-file rather than with the other top-of-file imports.
`import os`, `import firebase_admin`, and `from firebase_admin import credentials` at lines 26–28 are added after an unrelated block of project imports. Consider hoisting `os` to the top (it's stdlib) and grouping the firebase imports with the other third-party imports for readability (and to satisfy most import-order linters if you enable one).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app.py` around lines 26 - 28, move the misplaced imports so they follow normal import ordering: hoist the standard-library "import os" into the top-of-file stdlib import block, and place "import firebase_admin" and "from firebase_admin import credentials" with the other third-party imports, so import-order linters (and readers) see stdlib, third-party, then local/project imports.

requirements.txt (1)
1-14: Most dependencies remain unpinned.
Only `Flask-JWT-Extended` and now `firebase-admin` are version-pinned; the rest float to whatever PyPI serves at build time. This makes builds non-reproducible and can introduce surprise breakage (e.g., a new Flask/Flask-GraphQL/graphene major). Consider pinning the remaining packages (or at least using `>=`,`<` upper bounds), or generating a `requirements.lock` via `pip-compile`. Out of scope for this PR — flagging for follow-up.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@requirements.txt` around lines 1 - 14, the requirements file leaves most packages unpinned (e.g., Flask, Flask-CORS, Flask-GraphQL, graphene, pymongo, beautifulsoup4, requests, pillow, Flask-APScheduler, python-dotenv, pytz, gunicorn), which makes builds non-reproducible; update requirements.txt to pin versions or use bounded ranges (e.g., package==x.y.z or package>=a.b,<c.d) for these entries, or alternatively generate and commit a frozen requirements.lock via pip-compile (or pip freeze) to capture exact resolved versions and reference that lock file in CI; ensure you update entries like Flask, graphene, pymongo, requests, pillow, etc., rather than leaving them floating.

src/mutations/signup_user.py (1)
45-48: Consider idempotent signup when a user already exists.
When a client calls signup after a previous successful signup (same Firebase user), this raises `"User already exists."` and the client must then call `loginUser`. Since the Firebase ID token is already verified and `firebase_uid` is the primary identity, you could make signup idempotent: on `DuplicateKeyError`, look up the existing user by `firebase_uid` and return fresh access/refresh tokens. This avoids an extra round trip for clients that don't track whether the user was previously registered.
Not blocking — purely a UX/API-design consideration.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/mutations/signup_user.py` around lines 45 - 48, on DuplicateKeyError in the signup flow, make signup idempotent: catch DuplicateKeyError, look up the existing user via db["users"].find_one({"firebase_uid": firebase_uid}) (use the same firebase_uid from the incoming request), and if found generate and return fresh access/refresh tokens for that user (reusing the existing token issuance logic such as create_access_token/create_refresh_token); only if no user is found re-raise the error. This keeps signup idempotent while preserving existing error behavior otherwise.

src/scrapers/games_scraper.py (2)
106-107: Consider `response.raise_for_status()` before parsing.
If cornellbigred.com returns a 4xx/5xx (which the comment above `HTTP_REQUEST_HEADERS` in `src/utils/constants.py` explicitly notes can happen with the default UA, and can still happen intermittently), `BeautifulSoup(response.content, ...)` silently parses an empty/error page and the sport's entire schedule is skipped with no signal. Raising here surfaces the failure to the thread and to your logs.

♻️ Proposed fix

```diff
     response = requests.get(url, headers=HTTP_REQUEST_HEADERS, timeout=30)
+    response.raise_for_status()
     soup = BeautifulSoup(response.content, "html.parser")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/scrapers/games_scraper.py` around lines 106 - 107, the code calls requests.get(url, ...) and immediately passes response.content into BeautifulSoup; add a call to response.raise_for_status() immediately after the requests.get(...) call (before BeautifulSoup(...)) so HTTP 4xx/5xx responses raise and surface to the thread/logs; optionally wrap the requests.get + raise_for_status sequence in a try/except to log the error and re-raise for clearer diagnostics.
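The effect of the proposed raise can be shown without a live request. In this sketch, `StubResponse` is a stand-in for `requests.Response` (only `status_code`, `content`, and `raise_for_status` mirror the real API, and a plain `RuntimeError` stands in for `requests.exceptions.HTTPError`):

```python
class StubResponse:
    """Stand-in for requests.Response with just the members used here."""
    def __init__(self, status_code, content=b""):
        self.status_code = status_code
        self.content = content

    def raise_for_status(self):
        # The real requests method raises requests.exceptions.HTTPError
        if self.status_code >= 400:
            raise RuntimeError(f"HTTP {self.status_code}")

def fetch_schedule_html(response):
    """Surface 4xx/5xx instead of silently parsing an error page."""
    response.raise_for_status()
    return response.content

html = fetch_schedule_html(StubResponse(200, b"<html>schedule</html>"))
```

A 503 stub makes `fetch_schedule_html` raise, which is exactly the signal the review says is currently missing when the scrape is handed an error page.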
70-71: Use `.get("href")` to avoid a potential `KeyError`.
`ticket_link_tag["href"]` raises `KeyError` if the anchor exists without an `href` attribute. `to_absolute_url` already handles `None`, so switching to `.get` is strictly safer with no behavioral downside (mirrors the pattern you use at L61 for the box-score tag).

🛡️ Proposed fix

```diff
     ticket_link_tag = game_item.select_one(GAME_TICKET_LINK)
-    ticket_link = to_absolute_url(ticket_link_tag["href"]) if ticket_link_tag else None
+    ticket_link = to_absolute_url(ticket_link_tag.get("href")) if ticket_link_tag else None
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/scrapers/games_scraper.py` around lines 70 - 71, replace the direct key access ticket_link_tag["href"] with ticket_link_tag.get("href") so an anchor without an href doesn't raise KeyError (mirroring the pattern used for the box-score tag at L61), and pass the possibly None value into to_absolute_url as before.
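BeautifulSoup `Tag` attribute access is dict-like, so the difference between `["href"]` and `.get("href")` can be demonstrated with a plain dict standing in for an anchor tag that has no `href`:

```python
# Dict stand-in for a bs4 Tag like <a class="ticket-btn"> with no href attribute.
anchor_without_href = {"class": ["ticket-btn"]}

# Direct indexing raises, exactly like ticket_link_tag["href"] on a Tag:
try:
    anchor_without_href["href"]
    raised = False
except KeyError:
    raised = True

# .get returns None instead, which to_absolute_url already tolerates:
href = anchor_without_href.get("href")
```

The same contrast holds on a real `bs4.Tag`, which is why the one-character change removes the crash without changing any successful path.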
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@app.py`:
- Around line 30-45: The module currently reads SERVICE_ACCOUNT_PATH at import
and calls initialize_firebase() unconditionally which causes hard failures and
premature logging; change initialize_firebase() to not run at import (remove the
top-level initialize_firebase() call), refactor it to attempt
firebase_admin.get_app() in a try/except instead of using firebase_admin._apps,
only initialize with credentials.Certificate(SERVICE_ACCOUNT_PATH) if
SERVICE_ACCOUNT_PATH is present, and when credentials are missing or unreadable
log a warning (using a module logger or after logging.basicConfig()) and return
None so callers can opt into Firebase; ensure any code that needs Firebase calls
initialize_firebase() explicitly (or use an app factory/request hook) rather
than relying on import-time side effects.
In `@src/mutations/login_user.py`:
- Around line 8-33: Remove firebase_auth.RevokedIdTokenError from the
_TOKEN_ERRORS tuple since verify_id_token(id_token) is called without
check_revoked=True; then in LoginUser.mutate add a separate except block for
firebase_auth.CertificateFetchError (imported from firebase_auth) to catch
transient certificate fetch/network errors and raise a distinct GraphQLError
like "Authentication service unavailable, please try again later." from err so
certificate fetch failures are distinguished from token validation failures;
keep the existing ValueError and other token-related exception handling around
verify_id_token.
In `@src/mutations/signup_user.py`:
- Around line 33-48: The uid-check in signup_user.py currently raises
GraphQLError("Token missing uid") and chains a synthetic KeyError("uid"); change
the GraphQLError message to match login_user.py ("Invalid or expired token.")
for consistent client-facing auth errors, and remove the artificial exception
chaining (drop the "from KeyError('uid')" so you simply raise GraphQLError(...))
at the firebase_uid check in the signup flow where firebase_uid =
decoded.get("uid") is validated.
In `@src/scrapers/game_details_scrape.py`:
- Around line 29-63: The scrape_sidearm_story_recap function currently swallows
all errors with a broad except Exception and returns {} silently; narrow the
exception handling to requests.RequestException (for the requests.get call) and
log any non-200 responses and caught exceptions at warning level so failures are
visible in production monitoring; keep the existing fallback parsing for
og:title and article:published_time, but replace the bare except block around
requests.get/BeautifulSoup with targeted try/except for requests and separately
handle parsing errors by logging a warning if BeautifulSoup parsing/selectors
(SIDEARM_STORY_HEADLINE, SIDEARM_STORY_PUBLISHED_TIME) return unexpected
results, ensuring you reference scrape_sidearm_story_recap, requests.get,
response.status_code, and the selector names when making the changes.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: deb16694-2e0a-4b85-8909-ca368440e431
📒 Files selected for processing (17)
- .env_template
- .gitignore
- app.py
- docker-compose.yml
- requirements.txt
- src/database.py
- src/models/game.py
- src/mutations/create_game.py
- src/mutations/login_user.py
- src/mutations/signup_user.py
- src/repositories/game_repository.py
- src/schema.py
- src/scrapers/game_details_scrape.py
- src/scrapers/games_scraper.py
- src/services/game_service.py
- src/types.py
- src/utils/constants.py
```python
SERVICE_ACCOUNT_PATH = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")


def initialize_firebase():
    if not firebase_admin._apps:
        if not SERVICE_ACCOUNT_PATH:
            raise ValueError(
                "GOOGLE_APPLICATION_CREDENTIALS is not set. Set it to your firebase-service-account-key.json path."
            )
        cred = credentials.Certificate(SERVICE_ACCOUNT_PATH)
        firebase_admin.initialize_app(cred)
        logging.info("Firebase app initialized.")
    return firebase_admin.get_app()


initialize_firebase()
```
🧩 Analysis chain
🌐 Web query:
firebase-admin Python 7.3.0 recommended way to check if default app is initialized
💡 Result:
The recommended way to check whether the default Firebase app is initialized in firebase-admin Python SDK 7.3.0 is to attempt to retrieve it with `firebase_admin.get_app()` and catch the `ValueError` if it does not exist. This is the idiomatic approach, since the official documentation states that `get_app` raises `ValueError` if the specified app (default `'[DEFAULT]'`) does not exist, while `initialize_app` raises `ValueError` if it already exists. Example code:

```python
import firebase_admin
from firebase_admin import credentials

try:
    app = firebase_admin.get_app()  # Checks if the default app exists
except ValueError:
    cred = credentials.Certificate('path/to/serviceAccountKey.json')
    app = firebase_admin.initialize_app(cred)
```

An alternative (not officially recommended, since it uses internal state) is `if not firebase_admin._apps: firebase_admin.initialize_app(cred)`. This checks the internal `_apps` dict, but `_apps` is an implementation detail (protected member). No public `is_app_initialized` function exists in v7.3.0; a PR for it remains open. Official docs emphasize initializing once and using `get_app` thereafter.
Citations:
- 1: https://stackoverflow.com/questions/44293241/check-if-a-firebase-app-is-already-initialized-in-python
- 2: https://stackoverflow.com/questions/44293241/check-if-a-firebase-app-is-already-initialized-in-python/68236344#68236344
- 3: https://stackoverflow.com/questions/68426892/why-i-get-this-error-on-python-firebase-admin-initialize-app
- 4: [FR] Improve Ability to detect if App has been Initialized firebase/firebase-admin-python#701
- 5: https://firebase.google.com/docs/reference/admin/python/firebase_admin
- 6: https://github.com/firebase/firebase-admin-python/blob/master/firebase_admin/__init__.py
- 7: https://github.com/firebase/firebase-admin-python/releases/tag/v7.3.0
🏁 Script executed:

```shell
# First, find app.py and check its content around lines 30-45
find . -name "app.py" -type f | head -5
```

Repository: cuappdev/score-backend
Length of output: 74

🏁 Script executed:

```shell
# Also check for docker-compose.yml
find . -name "docker-compose.yml" -type f
```

Repository: cuappdev/score-backend
Length of output: 86

🏁 Script executed:

```shell
# Look for test files that might import app
find . -path "*/test*" -name "*.py" -type f | head -10
```

Repository: cuappdev/score-backend
Length of output: 48

🏁 Script executed:

```shell
# Check app.py size and read it
wc -l ./app.py && echo "---" && cat -n ./app.py
```

Repository: cuappdev/score-backend
Length of output: 8210

🏁 Script executed:

```shell
# Check docker-compose.yml
cat -n ./docker-compose.yml
```

Repository: cuappdev/score-backend
Length of output: 843

🏁 Script executed:

```shell
# Look for test files more broadly
find . -name "*test*.py" -type f 2>/dev/null
```

Repository: cuappdev/score-backend
Length of output: 48

🏁 Script executed:

```shell
# Search for imports of app module to understand blast radius
rg "from app import|import app" --type py
```

Repository: cuappdev/score-backend
Length of output: 48

🏁 Script executed:

```shell
# Also check if scraper.py or other scripts might import app
grep -r "import app\|from app" --include="*.py" . 2>/dev/null | head -20
```

Repository: cuappdev/score-backend
Length of output: 48

🏁 Script executed:

```shell
# Find scraper.py and check it
find . -name "scraper.py" -type f && echo "---" && wc -l ./scraper.py
```

Repository: cuappdev/score-backend
Length of output: 98

🏁 Script executed:

```shell
# Check what happens when scraper.py runs without GOOGLE_APPLICATION_CREDENTIALS
head -50 ./scraper.py
```

Repository: cuappdev/score-backend
Length of output: 1441

🏁 Script executed:

```shell
# Also look for any pytest or test configuration
find . -name "pytest.ini" -o -name "conftest.py" -o -name "tox.ini" -o -name ".pytest_cache" 2>/dev/null
```

Repository: cuappdev/score-backend
Length of output: 48
Firebase initialization at import time will hard-fail any process that imports app.
The code captures SERVICE_ACCOUNT_PATH at module load (line 30) and invokes initialize_firebase() unconditionally at line 45, before logging.basicConfig() (line 115). This creates multiple failure modes:
- The scraper service in docker-compose.yml (line 18) does not set GOOGLE_APPLICATION_CREDENTIALS, so starting it will crash with ValueError, taking down the container.
- If the credentials file is missing or unreadable, credentials.Certificate() also fails with the same blast radius.
- The logging.info("Firebase app initialized.") call at line 41 executes before logging.basicConfig(), so the message goes to the root logger with default configuration and is effectively swallowed.

Move initialization out of module scope — either into an app factory, a request context, or behind an explicit initialization call. When credentials are absent, degrade gracefully (log a warning, skip init) rather than crashing. Only the mutations that actually require auth need Firebase.
Also, relying on the private firebase_admin._apps attribute is brittle. Use the documented approach:
```python
try:
    app = firebase_admin.get_app()
except ValueError:
    # App not initialized yet
    cred = credentials.Certificate(service_account_path)
    app = firebase_admin.initialize_app(cred)
```
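Putting the two recommendations together (lazy, import-safe init that degrades gracefully without credentials) might look like the sketch below. The Firebase callables are injected so the logic runs standalone; in app.py they would be `firebase_admin.get_app`, `firebase_admin.initialize_app`, and `credentials.Certificate`. The function name and parameters are illustrative, not code from the PR:

```python
import logging

def init_firebase(get_app, initialize_app, make_cred, service_account_path,
                  log=logging.getLogger(__name__)):
    """Return the Firebase app, initializing it on first use.

    Returns None (with a warning) when credentials are not configured, so
    processes that never touch auth (e.g. the scraper) can still import the
    module and start.
    """
    try:
        return get_app()  # already initialized by an earlier caller
    except ValueError:
        pass  # not initialized yet
    if not service_account_path:
        log.warning("GOOGLE_APPLICATION_CREDENTIALS not set; Firebase disabled.")
        return None
    return initialize_app(make_cred(service_account_path))
```

Auth mutations would call `init_firebase(...)` and reject requests when it returns None, instead of the whole process dying at import time.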
Verify each finding against the current code and only fix it if needed.
In `@app.py` around lines 30 - 45, The module currently reads SERVICE_ACCOUNT_PATH
at import and calls initialize_firebase() unconditionally which causes hard
failures and premature logging; change initialize_firebase() to not run at
import (remove the top-level initialize_firebase() call), refactor it to attempt
firebase_admin.get_app() in a try/except instead of using firebase_admin._apps,
only initialize with credentials.Certificate(SERVICE_ACCOUNT_PATH) if
SERVICE_ACCOUNT_PATH is present, and when credentials are missing or unreadable
log a warning (using a module logger or after logging.basicConfig()) and return
None so callers can opt into Firebase; ensure any code that needs Firebase calls
initialize_firebase() explicitly (or use an app factory/request hook) rather
than relying on import-time side effects.
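The refactor described above can be sketched as a lazy initializer that degrades gracefully when credentials are absent. This is a minimal sketch, not the repository's actual code: the name `get_firebase_app` is hypothetical, and the `firebase_admin` import is deferred into the credentialed branch so the module still imports cleanly without the SDK or a key file.

```python
import logging
import os

logger = logging.getLogger(__name__)

_firebase_app = None  # module-level cache so we initialize at most once


def get_firebase_app():
    """Lazily initialize Firebase; return None when credentials are absent.

    Callers that need auth (e.g. login/signup mutations) invoke this
    explicitly; services like the scraper never trigger initialization.
    """
    global _firebase_app
    if _firebase_app is not None:
        return _firebase_app

    path = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path or not os.path.isfile(path):
        # Degrade gracefully instead of crashing the whole process.
        logger.warning("Firebase credentials missing; auth features disabled.")
        return None

    # Deferred import: only needed on the credentialed path.
    import firebase_admin
    from firebase_admin import credentials

    try:
        _firebase_app = firebase_admin.get_app()
    except ValueError:
        # No default app yet; create one from the service account key.
        cred = credentials.Certificate(path)
        _firebase_app = firebase_admin.initialize_app(cred)
    return _firebase_app
```

With this shape, `docker-compose` services that never call an auth mutation run fine without `GOOGLE_APPLICATION_CREDENTIALS` set.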
```diff
+_TOKEN_ERRORS = (
+    firebase_auth.InvalidIdTokenError,
+    firebase_auth.ExpiredIdTokenError,
+    firebase_auth.RevokedIdTokenError,
+)
+
+
 class LoginUser(Mutation):
     class Arguments:
-        net_id = String(required=True, description="User's net ID (e.g. Cornell netid).")
+        id_token = String(required=True, description="Firebase ID token from the client.")

     access_token = String()
     refresh_token = String()

-    def mutate(self, info, net_id):
-        user = db["users"].find_one({"net_id": net_id})
+    def mutate(self, info, id_token):
+        try:
+            decoded = firebase_auth.verify_id_token(id_token)
+        except _TOKEN_ERRORS as err:
+            raise GraphQLError("Invalid or expired token.") from err
+        except ValueError as err:
+            raise GraphQLError("Invalid or expired token.") from err
+
+        firebase_uid = decoded.get("uid")
+        if not firebase_uid:
+            raise GraphQLError("Invalid or expired token.")
+        user = db["users"].find_one({"firebase_uid": firebase_uid})
```
🧩 Analysis chain
🌐 Web query:
firebase_admin auth verify_id_token check_revoked parameter Python SDK
💡 Result:
In the Firebase Admin Python SDK (`firebase_admin.auth`), the `verify_id_token` method includes a `check_revoked` parameter (boolean, default `False`). Usage:

```python
from firebase_admin import auth

try:
    decoded_token = auth.verify_id_token(id_token, check_revoked=True)
    uid = decoded_token['uid']  # Token valid and not revoked
except auth.RevokedIdTokenError:
    # Token revoked - force reauthentication
    pass
except auth.UserDisabledError:
    # User disabled
    pass
except auth.InvalidIdTokenError:
    # Invalid token
    pass
```

When `check_revoked=True`, it fetches the UserRecord for the token's uid and compares the token's `iat` (issued at) against `user.tokens_valid_after_timestamp` (milliseconds since epoch; `None` if no revocation). If `iat * 1000 < tokens_valid_after_timestamp`, it raises `RevokedIdTokenError`. It also checks whether the user is disabled (raising `UserDisabledError`). This requires an extra network call to the Firebase Auth backend, making it slower. Basic verification (`check_revoked=False`) only checks signature, expiration, issuer, and audience. Official docs confirm this behavior with Python examples using `check_revoked=True`. Note: an early bug, fixed in 2018, caused a `TypeError` when `None` was mishandled.
Citations:
- 1: https://firebase.google.com/docs/reference/admin/python/firebase_admin.auth
- 2: https://firebase.google.com/docs/auth/admin/manage-sessions
- 3: Broken auth.verify_id_token when check_revoked=True firebase/firebase-admin-python#199
- 4: https://firebase-dot-devsite-v2-prod.appspot.com/docs/auth/admin/manage-sessions
🏁 Script executed:
# First, let's check the actual code in the file
cat -n src/mutations/login_user.py | head -50
Repository: cuappdev/score-backend
Length of output: 1702
🌐 Web query:
firebase_admin.auth CertificateFetchError exception Python SDK
💡 Result:
The `firebase_admin.auth.CertificateFetchError` is a specific exception raised by the Firebase Admin Python SDK when it fails to fetch the public key certificates needed to verify Firebase ID tokens or session cookies. This occurs during calls to `auth.verify_id_token` or `auth.verify_session_cookie`. Key details:
- Bases: `UnknownError` (subtype of `FirebaseError`).
- Trigger: network or connectivity issues preventing access to Google's certificate endpoint (typically https://www.googleapis.com/robot/v1/metadata/x509/securetoken@system.gserviceaccount.com).
- Common causes: firewalls blocking outbound HTTPS to Google APIs, proxy misconfigurations (especially on macOS where system proxies intercept localhost but also affect external calls), network timeouts, ISP outages, SSL/TLS issues (outdated OpenSSL/LibreSSL), or corporate network restrictions.
- The SDK caches certificates after a successful fetch to reduce repeated calls.

Solutions and workarounds:
- Verify network connectivity: test direct access to the certs URL using `requests.get('https://www.googleapis.com/robot/v1/metadata/x509/securetoken@system.gserviceaccount.com')`.
- Check proxies: set `NO_PROXY='www.googleapis.com,.googleapis.com'` or `os.environ['NO_PROXY'] = 'localhost,127.0.0.1,.googleapis.com'`.
- Update the environment: ensure recent Python (3.9+), OpenSSL, and firebase-admin SDK versions; check `ssl.OPENSSL_VERSION`.
- Disable proxy trust if using requests: `session.trust_env = False` (though the SDK manages this internally).
- Firewall/VPN: temporarily disable to test; configure exceptions for Google domains.
- Retry logic: implement exponential backoff around `verify_id_token` calls.
- Ensure the latest SDK: it uses HTTP/2, improved timeouts, and caching.

Handle it in code:

```python
try:
    claims = auth.verify_id_token(id_token)
except auth.CertificateFetchError as exc:
    # Log cause: exc.cause may contain the underlying requests/urllib error
    raise  # or handle with retry
```
Citations:
- 1: https://firebase.google.com/docs/reference/admin/python/firebase_admin.auth
- 2: FR: auth.verify_id_token should throw a more specific exception firebase/firebase-admin-python#252
- 3: https://stackoverflow.com/questions/71915309/token-used-too-early-error-thrown-by-firebase-admin-auths-verify-id-token-metho
- 4: Firebase with Python 3.9 giving an SSL error firebase/firebase-admin-python#610
- 5: https://lightrun.com/answers/firebase-firebase-admin-python-requests-attempt-fails-when-it-is-trying-to-verify-a-token-with-transporterror
- 6: Requests attempt fails when it is trying to verify a token with TransportError firebase/firebase-admin-python#167
- 7: https://dev.to/pavelbuild/your-python-sdk-silently-routes-through-macos-proxy-3j22
- 8: OPENSSL_internal:CERTIFICATE_VERIFY_FAILED googleapis/python-firestore#916
🌐 Web query:
firebase_admin.auth exceptions complete list verify_id_token
💡 Result:
The complete list of exceptions raised by `firebase_admin.auth.verify_id_token(id_token, check_revoked=False, clock_skew_seconds=0)` is:
- `ValueError`: if `id_token` is not a string or is empty.
- `InvalidIdTokenError`: if `id_token` is not a valid Firebase ID token.
- `ExpiredIdTokenError`: if the specified ID token has expired.
- `RevokedIdTokenError`: if `check_revoked` is `True` and the ID token has been revoked.
- `TenantIdMismatchError`: if `id_token` belongs to a tenant different from this `Client` instance.
- `CertificateFetchError`: if an error occurs while fetching the public key certificates required to verify the ID token.
- `UserDisabledError`: if `check_revoked` is `True` and the corresponding user record is disabled.

These are the specific exceptions documented for this method. Note that some older versions raised a more generic `ValueError` for token issues; the specific auth exceptions were added later.
Citations:
- 1: https://firebase.google.com./docs/reference/admin/python/firebase_admin.auth
- 2: https://firebase.google.com/docs/reference/admin/python/firebase_admin.auth
Remove RevokedIdTokenError from caught exceptions—it's not reachable without check_revoked=True.
`firebase_auth.verify_id_token(id_token)` is called without `check_revoked=True`, so `RevokedIdTokenError` will never be raised. Remove it from `_TOKEN_ERRORS`.
Additionally, handle `firebase_auth.CertificateFetchError` separately. This transient network error (fetching Google's public key certificates) currently bubbles up unhandled, when it should be distinguished from authentication failures since it indicates an upstream infrastructure issue, not an invalid token.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/mutations/login_user.py` around lines 8 - 33, Remove
firebase_auth.RevokedIdTokenError from the _TOKEN_ERRORS tuple since
verify_id_token(id_token) is called without check_revoked=True; then in
LoginUser.mutate add a separate except block for
firebase_auth.CertificateFetchError (imported from firebase_auth) to catch
transient certificate fetch/network errors and raise a distinct GraphQLError
like "Authentication service unavailable, please try again later." from err so
certificate fetch failures are distinguished from token validation failures;
keep the existing ValueError and other token-related exception handling around
verify_id_token.
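The split the reviewer asks for (transient certificate-fetch failures vs. bad tokens) can be sketched with stand-in exception classes. Everything here is illustrative: `TokenError` and `CertFetchError` stand in for the `firebase_auth` exception classes, the injected `verify` callable stands in for `firebase_auth.verify_id_token`, and `ConnectionError`/`PermissionError` stand in for the two distinct `GraphQLError` messages.

```python
class TokenError(Exception):
    """Stand-in for firebase_auth.InvalidIdTokenError / ExpiredIdTokenError."""


class CertFetchError(Exception):
    """Stand-in for firebase_auth.CertificateFetchError."""


_TOKEN_ERRORS = (TokenError, ValueError)


def decode_id_token(id_token, verify):
    """Run a verify callable and map failures to distinct user-facing errors."""
    try:
        return verify(id_token)
    except CertFetchError as err:
        # Upstream outage while fetching Google's public keys: not a bad token,
        # so surface it as a service problem the client can retry.
        raise ConnectionError(
            "Authentication service unavailable, please try again later."
        ) from err
    except _TOKEN_ERRORS as err:
        # Malformed, invalid, or expired token: the client must re-authenticate.
        raise PermissionError("Invalid or expired token.") from err
```

Ordering matters: the transient-error clause comes first so an infrastructure failure is never mislabeled as an invalid token.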
```diff
 firebase_uid = decoded.get("uid")
 if firebase_uid is None:
     raise GraphQLError("Token missing uid") from KeyError("uid")

 email = email or decoded.get("email")
 user_doc = {
     "net_id": net_id,
     "firebase_uid": firebase_uid,
     "email": email,
     "favorite_game_ids": [],
 }
 if name is not None:
     user_doc["name"] = name
 if email is not None:
     user_doc["email"] = email
-result = db["users"].insert_one(user_doc)
+try:
+    result = db["users"].insert_one(user_doc)
+except DuplicateKeyError as err:
+    raise GraphQLError("User already exists.") from err
```
Minor: inconsistency with login_user.py and synthetic KeyError chain.
Two small nits:
- When `uid` is missing from the decoded token, this raises `GraphQLError("Token missing uid")`, but `login_user.py` raises `GraphQLError("Invalid or expired token.")` for the same condition. Consider aligning the user-facing message across both mutations so clients can handle auth failures uniformly.
- `raise GraphQLError("Token missing uid") from KeyError("uid")` constructs a throwaway `KeyError` solely for chaining. A bare `raise GraphQLError(...)` (no `from`) is more idiomatic here since there is no original exception to chain.
💡 Proposed tweak

```diff
-    firebase_uid = decoded.get("uid")
-    if firebase_uid is None:
-        raise GraphQLError("Token missing uid") from KeyError("uid")
+    firebase_uid = decoded.get("uid")
+    if not firebase_uid:
+        raise GraphQLError("Invalid or expired token.")
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
 firebase_uid = decoded.get("uid")
-if firebase_uid is None:
-    raise GraphQLError("Token missing uid") from KeyError("uid")
+if not firebase_uid:
+    raise GraphQLError("Invalid or expired token.")

 email = email or decoded.get("email")
 user_doc = {
-    "net_id": net_id,
     "firebase_uid": firebase_uid,
     "email": email,
     "favorite_game_ids": [],
 }
 if name is not None:
     user_doc["name"] = name
-if email is not None:
-    user_doc["email"] = email
 try:
     result = db["users"].insert_one(user_doc)
 except DuplicateKeyError as err:
     raise GraphQLError("User already exists.") from err
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/mutations/signup_user.py` around lines 33 - 48, The uid-check in
signup_user.py currently raises GraphQLError("Token missing uid") and chains a
synthetic KeyError("uid"); change the GraphQLError message to match
login_user.py ("Invalid or expired token.") for consistent client-facing auth
errors, and remove the artificial exception chaining (drop the "from
KeyError('uid')" so you simply raise GraphQLError(...)) at the firebase_uid
check in the signup flow where firebase_uid = decoded.get("uid") is validated.
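The aligned uid check can be factored into a small document builder, which also makes the logic unit-testable without a database. This is a sketch only: `build_user_doc` and the use of `PermissionError` (standing in for `GraphQLError`) are illustrative, not the repository's actual helpers.

```python
def build_user_doc(decoded, name=None, email=None):
    """Build the Mongo user document from a decoded Firebase token.

    Raises PermissionError (a stand-in for GraphQLError) with the same
    message the login mutation uses, so clients see one uniform auth failure.
    """
    firebase_uid = decoded.get("uid")
    if not firebase_uid:  # covers both a missing key and an empty string
        raise PermissionError("Invalid or expired token.")
    user_doc = {
        "firebase_uid": firebase_uid,
        # Prefer the caller-supplied email, fall back to the token's claim.
        "email": email or decoded.get("email"),
        "favorite_game_ids": [],
    }
    if name is not None:
        user_doc["name"] = name
    return user_doc
```

The mutation then reduces to `db["users"].insert_one(build_user_doc(decoded, name, email))` wrapped in the `DuplicateKeyError` handler.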
```python
def scrape_sidearm_story_recap(url):
    """
    Extract headline and published time from a Cornell Sidearm story/recap page
    """
    if not url:
        return {}
    try:
        response = requests.get(url, headers=HTTP_REQUEST_HEADERS, timeout=20)
        if response.status_code != 200:
            return {}
        soup = BeautifulSoup(response.text, "html.parser")
    except Exception:
        return {}
    headline = soup.select_one(SIDEARM_STORY_HEADLINE)
    time_el = soup.select_one(SIDEARM_STORY_PUBLISHED_TIME)
    title = headline.get_text(strip=True) if headline else None
    if not title:
        og = soup.find("meta", property="og:title")
        if og and og.get("content"):
            title = og["content"].strip()
    published_at = None
    if time_el:
        published_at = time_el.get_text(strip=True)
        if not published_at and time_el.get("datetime"):
            published_at = time_el["datetime"].strip()
    if not published_at:
        pmeta = soup.find("meta", property="article:published_time")
        if pmeta and pmeta.get("content"):
            published_at = pmeta["content"].strip()
    out = {}
    if title:
        out["recap_article_title"] = title
    if published_at:
        out["recap_published_at"] = published_at
    return out
```
Avoid blanket except Exception and log scraping failures.
`except Exception` silently swallows every failure (network, DNS, SSL, encoding, BeautifulSoup parse errors, programming errors) and returns `{}`. This is Ruff BLE001 and, more importantly, will make it hard to notice when the Sidearm markup drifts and the recap scraper quietly stops producing titles/dates.
Recommend narrowing to `requests.RequestException` and logging non-200 responses and exceptions at warning level so production monitoring can surface regressions.
🛠️ Proposed fix

```diff
+import logging
+logger = logging.getLogger(__name__)
+
 def scrape_sidearm_story_recap(url):
     """
     Extract headline and published time from a Cornell Sidearm story/recap page
     """
     if not url:
         return {}
     try:
         response = requests.get(url, headers=HTTP_REQUEST_HEADERS, timeout=20)
-        if response.status_code != 200:
-            return {}
-        soup = BeautifulSoup(response.text, "html.parser")
-    except Exception:
-        return {}
+    except requests.RequestException as exc:
+        logger.warning("Recap fetch failed for %s: %s", url, exc)
+        return {}
+    if response.status_code != 200:
+        logger.warning("Recap fetch returned %s for %s", response.status_code, url)
+        return {}
+    soup = BeautifulSoup(response.text, "html.parser")
```

🧰 Tools
🪛 Ruff (0.15.10)
[warning] 40-40: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/scrapers/game_details_scrape.py` around lines 29 - 63, The
scrape_sidearm_story_recap function currently swallows all errors with a broad
except Exception and returns {} silently; narrow the exception handling to
requests.RequestException (for the requests.get call) and log any non-200
responses and caught exceptions at warning level so failures are visible in
production monitoring; keep the existing fallback parsing for og:title and
article:published_time, but replace the bare except block around
requests.get/BeautifulSoup with targeted try/except for requests and separately
handle parsing errors by logging a warning if BeautifulSoup parsing/selectors
(SIDEARM_STORY_HEADLINE, SIDEARM_STORY_PUBLISHED_TIME) return unexpected
results, ensuring you reference scrape_sidearm_story_recap, requests.get,
response.status_code, and the selector names when making the changes.
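The fallback parsing preserved above (visible element text, then a `datetime` attribute, then a meta tag) reduces to picking the first non-empty candidate, which can be factored into one helper. This is a sketch: the name `first_nonempty` is hypothetical and does not exist in the scraper.

```python
def first_nonempty(*candidates):
    """Return the first candidate that is a non-blank string, stripped.

    Mirrors the recap scraper's fallback chain: each candidate is tried in
    order, and whitespace-only or None values are skipped.
    """
    for candidate in candidates:
        if candidate and candidate.strip():
            return candidate.strip()
    return None
```

With it, the published-time lookup becomes a single call, e.g. `first_nonempty(time_el_text, datetime_attr, meta_content)`, which keeps the precedence order explicit and easy to test.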
```python
def softball_summary(box_score_section):
    summary = []
    scoring_section = box_score_section.find(TAG_SECTION, {ATTR_ARIA_LABEL: LABEL_SCORING_SUMMARY})
    if scoring_section:
        scoring_rows = scoring_section.find(TAG_TBODY)
        if scoring_rows:
            for row in scoring_rows.find_all(TAG_TR):
                team = row.find_all(TAG_TD)[0].find(TAG_IMG)[ATTR_ALT]
                inning = row.find_all(TAG_TD)[3].text.strip()
                desc_cell = row.find_all(TAG_TD)[4]
                span = desc_cell.find(TAG_SPAN)
                if span:
                    span.extract()
                desc = desc_cell.get_text(strip=True)
                cornell_score = int(row.find_all(TAG_TD)[5].get_text(strip=True) or 0)
                opp_score = int(row.find_all(TAG_TD)[6].get_text(strip=True) or 0)
                summary.append({
                    'team': team,
                    'period': inning,
                    'inning': inning,
                    'description': desc,
                    'cor_score': cornell_score,
                    'opp_score': opp_score
                })
    if not summary:
        summary = [{"message": "No scoring events in this game."}]
    return summary
```
softball_summary will crash on malformed/edge-case rows.
Several lines assume an exact table shape and will raise unhandled exceptions, aborting the scrape for that game:
- Line 100: `row.find_all(TAG_TD)[0].find(TAG_IMG)[ATTR_ALT]` raises `TypeError` (if `find(TAG_IMG)` returns `None`) or `KeyError` (if the `<img>` has no `alt`). Summary headers/footers inside `<tbody>` or rows without a team icon will break this.
- Lines 107–108: `int(row.find_all(TAG_TD)[5].get_text(strip=True) or 0)` raises `ValueError` for any non-numeric text such as `"-"`, `"—"`, or a score column that briefly contains a hyphen. The `or 0` guard only covers an empty string.
- Hardcoded indices `[0, 3, 4, 5, 6]` silently shift semantics if Sidearm adds a column.

Since this is the scraper hot path, consider wrapping each row in a try/except and skipping malformed rows (with a `logger.warning`), and using a safe int-coercion helper. Other `*_summary` parsers in this file have the same pattern, but softball is the one being introduced here.
🛠️ Suggested hardening

```diff
+def _safe_int(text):
+    try:
+        return int(text)
+    except (TypeError, ValueError):
+        return 0
+
 def softball_summary(box_score_section):
     summary = []
     scoring_section = box_score_section.find(TAG_SECTION, {ATTR_ARIA_LABEL: LABEL_SCORING_SUMMARY})
     if scoring_section:
         scoring_rows = scoring_section.find(TAG_TBODY)
         if scoring_rows:
             for row in scoring_rows.find_all(TAG_TR):
-                team = row.find_all(TAG_TD)[0].find(TAG_IMG)[ATTR_ALT]
-                inning = row.find_all(TAG_TD)[3].text.strip()
-                desc_cell = row.find_all(TAG_TD)[4]
-                span = desc_cell.find(TAG_SPAN)
-                if span:
-                    span.extract()
-                desc = desc_cell.get_text(strip=True)
-                cornell_score = int(row.find_all(TAG_TD)[5].get_text(strip=True) or 0)
-                opp_score = int(row.find_all(TAG_TD)[6].get_text(strip=True) or 0)
-                summary.append({
+                tds = row.find_all(TAG_TD)
+                if len(tds) < 7:
+                    continue
+                img = tds[0].find(TAG_IMG)
+                team = img.get(ATTR_ALT) if img else None
+                if not team:
+                    continue
+                inning = tds[3].text.strip()
+                desc_cell = tds[4]
+                span = desc_cell.find(TAG_SPAN)
+                if span:
+                    span.extract()
+                desc = desc_cell.get_text(strip=True)
+                cornell_score = _safe_int(tds[5].get_text(strip=True))
+                opp_score = _safe_int(tds[6].get_text(strip=True))
+                summary.append({
                     'team': team,
                     'period': inning,
                     'inning': inning,
                     'description': desc,
                     'cor_score': cornell_score,
                     'opp_score': opp_score
                 })
```
Overview
Added scraping functionality for sports that do not have box scores but instead have a recap link.
Changes Made
This scraper captures the recap links for:
"Swimming & Diving",
"Track & Field",
"Wrestling",
"Golf",
"Polo",
"Fencing",
"Equestrian",
"Gymnastics"
and, using those links, visits each sport's news article and scrapes the article's title and published-at date/time.
Test Coverage
Tested locally with Graphql