Conversation
📝 WalkthroughWalkthroughThis PR introduces a canonical files system that persistently stores and caches Telegram media with public hashes for secure sharing. It adds database collections, new streaming endpoints, background file-record maintenance, and refactors the link generation pipeline to reuse cached media instead of re-forwarding. Changes
Sequence DiagramsequenceDiagram
participant Client
participant StreamBot as Stream Bot
participant DB as Database
participant BinChannel as BIN Channel
participant Streamer as ByteStreamer
Client->>StreamBot: send_channel_links(links, source_info, source_id)
StreamBot->>DB: get_or_create_canonical_file(source_message)
alt Canonical record exists & valid
DB->>BinChannel: fetch canonical message
BinChannel-->>DB: canonical_message
DB->>DB: touch_file_record (update cache)
DB-->>StreamBot: return existing record
else No valid canonical record
DB->>DB: acquire_file_ingest_claim
alt Claim acquired
StreamBot->>StreamBot: copy_media (forward to BIN)
StreamBot->>DB: create_file_record (persist)
DB-->>StreamBot: return new record
else Claim held by another worker
DB->>DB: poll claim & re-validate
DB-->>StreamBot: return record after other worker completes
end
end
StreamBot->>StreamBot: gen_canonical_links (from record)
StreamBot->>Client: send links (or fallback to gen_links)
Note over DB,Streamer: On file access via public hash
Client->>Streamer: GET /f/{secure_hash}/{name}
Streamer->>DB: get_file_by_hash(secure_hash)
DB-->>Streamer: return file record
Streamer->>BinChannel: resolve canonical_message_id
BinChannel-->>Streamer: media_ref
Streamer->>Streamer: stream_file(media_ref, on_fallback_message)
Streamer-->>Client: media bytes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 5
🧹 Nitpick comments (4)
Thunder/utils/database.py (2)
254-266: Inconsistent error handling between similar getter methods.
get_file_by_unique_idreturnsNoneon exception (line 259), whileget_file_by_hashre-raises the exception (line 266). This inconsistency can lead to unexpected behavior for callers expecting uniform behavior. Consider aligning the error handling strategy across all file lookup methods.♻️ Proposed fix to make error handling consistent
Either return
Noneconsistently (if callers should handle missing records gracefully):async def get_file_by_hash(self, public_hash: str) -> Optional[Dict[str, Any]]: try: return await self.files_col.find_one({"public_hash": public_hash}) except Exception as e: logger.error(f"Error getting file by hash {public_hash}: {e}", exc_info=True) - raise + return NoneOr raise consistently (if failures should propagate):
async def get_file_by_unique_id(self, file_unique_id: str) -> Optional[Dict[str, Any]]: try: return await self.files_col.find_one({"file_unique_id": file_unique_id}) except Exception as e: logger.error(f"Error getting file by unique_id {file_unique_id}: {e}", exc_info=True) - return None + raise🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Thunder/utils/database.py` around lines 254 - 266, Both getters (get_file_by_unique_id and get_file_by_hash) currently handle exceptions differently; make them consistent by modifying get_file_by_hash to mirror get_file_by_unique_id: catch exceptions, log the error with exc_info via logger.error including the public_hash value, and return None instead of re-raising; also scan other file lookup methods using files_col to ensure they follow the same pattern if the project convention is to return None on failure.
302-327: Silent error swallowing may hide database issues.Both
touch_file_recordandupdate_file_idlog errors but do not propagate them. While this might be intentional for non-critical "fire-and-forget" operations, it can hide persistent database problems. Consider whether these operations should at least have an option to raise or return a success indicator.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Thunder/utils/database.py` around lines 302 - 327, The current touch_file_record and update_file_id methods swallow exceptions after logging, which can hide persistent DB failures; modify these functions (touch_file_record and update_file_id) to return a success indicator (bool) or accept an optional parameter like raise_on_error: bool = False that, when true, re-raises the caught exception after logging; ensure you update the calls that rely on files_col.update_one to handle the returned bool or allow exceptions to propagate, and keep logger.error(..., exc_info=True) but do not silently suppress errors when raise_on_error is requested.Thunder/bot/plugins/stream.py (1)
389-403: Consider clarifying fallback behavior whenstored_msgis None.When
canonical_recordexists butstored_msgis None,send_channel_linksis called withtarget_msg=None. This works correctly becausesend_channel_linksusesStreamBot.send_messagewithreply_to_message_idin that case. However, a brief comment would help future readers understand this intentional design.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Thunder/bot/plugins/stream.py` around lines 389 - 403, Add a brief inline comment explaining the intentional fallback when canonical_record exists but stored_msg is None: that send_channel_links is deliberately called with target_msg=None so StreamBot.send_message will fall back to using reply_to_message_id for threading. Update the call site around the canonical_record / stored_msg logic (referencing variables canonical_record and stored_msg and the function send_channel_links) to include this explanatory comment so future readers understand why target_msg is passed as None instead of skipping or altering the call.Thunder/utils/render_template.py (1)
42-42: Function parameteridshadows Python builtin.Consider renaming to
message_idfor clarity and to avoid shadowing the builtinid()function.♻️ Proposed fix
-async def render_page(id: int, secure_hash: str, requested_action: str | None = None) -> str: +async def render_page(message_id: int, secure_hash: str, requested_action: str | None = None) -> str: try: try: - message = await StreamBot.get_messages(chat_id=int(Var.BIN_CHANNEL), message_ids=id) + message = await StreamBot.get_messages(chat_id=int(Var.BIN_CHANNEL), message_ids=message_id)(and update the error message at line 63 accordingly)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Thunder/utils/render_template.py` at line 42, Rename the parameter id in the async function render_page to message_id to avoid shadowing the builtin id(); update its type hint and every reference inside render_page (including uses in f-strings, comparisons, and the error/ValueError message currently referencing id) to message_id, and adjust any callers to pass message_id (or update call sites to use the new name) so behavior remains identical; also update the error message text that mentioned id to use message_id for clarity.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@Thunder/utils/canonical_files.py`:
- Around line 293-297: Don't treat any exception from
_is_canonical_record_valid(existing, file_unique_id) as a reuse; instead, only
call schedule_touch_file_record and return the existing record when validation
explicitly succeeds. Change the except branch to either re-raise the exception
(to surface permanent fetch/config errors) or return a non-reuse result (e.g.,
existing, error, False) so stale canonical mappings are not kept and
reuse_count/seen_count aren't inflated; ensure the only place that
touches/reuses the record is the success path where is_valid is True and retain
references to _is_canonical_record_valid, schedule_touch_file_record, existing,
and file_unique_id to locate the code.
- Around line 303-332: The code currently calls copy_media() before reserving
canonical ownership, which can leave untracked uploads if another worker wins
the unique-key race; change the flow in the function that uses copy_media,
create_file_record, replace_file_record and _merge_replacement_record so that
the unique canonical row is reserved first (e.g., insert a tentative record or
acquire a DB lock/transaction on file_unique_id), then call copy_media() to
upload and update that reserved record with stored_message and final metadata;
on DuplicateKeyError or if get_file_by_unique_id shows another owner, ensure you
schedule_touch_file_record(existing, reused=True) and cleanup any uploaded
message in Var.BIN_CHANNEL (so the caller can remove the stray stored_message),
and only call _remember(record) after the reservation+upload+DB update succeeds.
- Around line 258-279: The file_ingest_lock context manager increments
_upload_lock_counts before awaiting lock.acquire(), which lets a CancelledError
leak counts and lock entries; fix by adding a cancellation-safe path: after
incrementing but before yielding, wrap await lock.acquire() in a try/except that
catches asyncio.CancelledError (and other exceptions if desired), and inside
that except acquire the _upload_locks_guard and decrement/remove the
_upload_lock_counts and _upload_locks for the file_unique_id as done in the
finally cleanup, then re-raise the cancellation; keep the existing finally block
unchanged to handle normal releases. Ensure you reference the same symbols
(_upload_locks, _upload_lock_counts, _upload_locks_guard, file_ingest_lock,
lock.acquire) so the cleanup logic mirrors the existing removal behavior.
In `@Thunder/utils/custom_dl.py`:
- Around line 52-54: The current comparison can wrongly treat a string/file_id
or Message as unequal to an int fallback, so change the logic around refs,
media_ref and fallback_message_id to compare IDs explicitly: if
fallback_message_id is not None, derive media_id = media_ref if
isinstance(media_ref, int) else (media_ref.message_id if isinstance(media_ref,
Message) else None) and only append fallback_message_id to refs when media_id is
None (no comparable id) or fallback_message_id != media_id; use the symbols
refs, media_ref, fallback_message_id and the Message.type check to implement
this explicit type-aware comparison.
In `@Thunder/utils/render_template.py`:
- Around line 16-22: The Jinja2 Environment is created without autoescaping
which risks XSS; update the Environment construction for template_env (the
Environment call using FileSystemLoader) to enable autoescape (either
autoescape=True or use jinja2.select_autoescape for HTML templates) so templates
rendered by render_media_page (which currently manually escapes file_name) are
automatically escaped by default; ensure the change targets the template_env
setup so existing render calls continue to work and adjust any templates that
intentionally output raw HTML to use |safe.
---
Nitpick comments:
In `@Thunder/bot/plugins/stream.py`:
- Around line 389-403: Add a brief inline comment explaining the intentional
fallback when canonical_record exists but stored_msg is None: that
send_channel_links is deliberately called with target_msg=None so
StreamBot.send_message will fall back to using reply_to_message_id for
threading. Update the call site around the canonical_record / stored_msg logic
(referencing variables canonical_record and stored_msg and the function
send_channel_links) to include this explanatory comment so future readers
understand why target_msg is passed as None instead of skipping or altering the
call.
In `@Thunder/utils/database.py`:
- Around line 254-266: Both getters (get_file_by_unique_id and get_file_by_hash)
currently handle exceptions differently; make them consistent by modifying
get_file_by_hash to mirror get_file_by_unique_id: catch exceptions, log the
error with exc_info via logger.error including the public_hash value, and return
None instead of re-raising; also scan other file lookup methods using files_col
to ensure they follow the same pattern if the project convention is to return
None on failure.
- Around line 302-327: The current touch_file_record and update_file_id methods
swallow exceptions after logging, which can hide persistent DB failures; modify
these functions (touch_file_record and update_file_id) to return a success
indicator (bool) or accept an optional parameter like raise_on_error: bool =
False that, when true, re-raises the caught exception after logging; ensure you
update the calls that rely on files_col.update_one to handle the returned bool
or allow exceptions to propagate, and keep logger.error(..., exc_info=True) but
do not silently suppress errors when raise_on_error is requested.
In `@Thunder/utils/render_template.py`:
- Line 42: Rename the parameter id in the async function render_page to
message_id to avoid shadowing the builtin id(); update its type hint and every
reference inside render_page (including uses in f-strings, comparisons, and the
error/ValueError message currently referencing id) to message_id, and adjust any
callers to pass message_id (or update call sites to use the new name) so
behavior remains identical; also update the error message text that mentioned id
to use message_id for clarity.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: c4d662ab-86e0-4b4c-b713-4cb3b500961c
📒 Files selected for processing (9)
Thunder/__init__.pyThunder/__main__.pyThunder/bot/plugins/stream.pyThunder/server/stream_routes.pyThunder/utils/bot_utils.pyThunder/utils/canonical_files.pyThunder/utils/custom_dl.pyThunder/utils/database.pyThunder/utils/render_template.py
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
Thunder/server/stream_routes.py (1)
31-31:⚠️ Potential issue | 🟡 MinorReject unsupported multi-range headers explicitly.
match()accepts values likebytes=0-1,2-3and silently parses only the first span, but the response later advertises a singleContent-Range. Unsupported multi-range requests should fail fast instead of being partially honored.♻️ Proposed fix
-RANGE_REGEX = re.compile(r"bytes=(?P<start>\d*)-(?P<end>\d*)") +RANGE_REGEX = re.compile(r"bytes=(?P<start>\d*)-(?P<end>\d*)$") @@ - match = RANGE_REGEX.match(range_header) + match = RANGE_REGEX.fullmatch(range_header)Also applies to: 119-121
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Thunder/server/stream_routes.py` at line 31, RANGE_REGEX currently allows multi-range headers like "bytes=0-1,2-3" because it isn't anchored; update RANGE_REGEX (and any other occurrence) to only accept a single range by anchoring the pattern (or switch to using fullmatch) so the entire header must match a single "bytes=start-end". In the code paths that use RANGE_REGEX (the RANGE_REGEX variable and the logic that reads the Range header around the later checks at lines ~119-121), explicitly detect a comma or a non-full match and reject the request (return the appropriate error/416) instead of silently parsing only the first range.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@Thunder/utils/canonical_files.py`:
- Around line 58-79: build_file_record currently stores raw
get_fname(stored_message) which can be None or bytes and later becomes
canonical_record["file_name"] used in gen_canonical_links; replace that by
normalizing the filename to a str and applying the same fallback/name-derivation
logic used by ByteStreamer.get_file_info_sync: if get_fname(...) returns bytes
decode to UTF-8 (with errors='replace'), if it returns None compute a sensible
fallback (e.g., use derived extension or a default like "file" plus
file_unique_id or mime-derived name) and assign that string to "file_name" in
the returned dict so gen_canonical_links always gets a proper display name.
Ensure you reference get_fname(stored_message), file_unique_id,
ByteStreamer.get_file_info_sync logic, and canonical_record["file_name"] when
making the change.
In `@Thunder/utils/custom_dl.py`:
- Around line 59-88: Limit retries/fallbacks to the "no bytes sent yet" case and
surface real backend errors: in the async streaming loop in custom_dl.py (the
generator that calls self.client.stream_media), only swallow FloodWait or try
the next ref when started_stream is False; if a FloodWait or any Exception
occurs after started_stream is True re-raise it immediately. Also, after
exhausting refs, if last_error is set re-raise that original exception instead
of wrapping it in FileNotFound; only raise FileNotFound when there was no prior
exception. This affects the exception handlers around stream_media and the final
raise at the end of the function.
In `@Thunder/utils/render_template.py`:
- Around line 52-60: get_fname() may return bytes and Var.URL's path can be lost
by urllib.parse.urljoin; first coerce file_name to a str (if
isinstance(file_name, bytes) decode with 'utf-8' and safe fallback) before
calling .replace() and urllib.parse.quote, then build the preview URL without
dropping Var.URL's path (e.g. ensure a single trailing slash on Var.URL or join
using Var.URL.rstrip('/') + '/' + f'{secure_hash}{message_id}/{quoted_filename}'
instead of relying on urljoin). Keep the existing file_unique_id check
(get_uniqid) and still call render_media_page(file_name, src, requested_action)
after these fixes.
---
Outside diff comments:
In `@Thunder/server/stream_routes.py`:
- Line 31: RANGE_REGEX currently allows multi-range headers like "bytes=0-1,2-3"
because it isn't anchored; update RANGE_REGEX (and any other occurrence) to only
accept a single range by anchoring the pattern (or switch to using fullmatch) so
the entire header must match a single "bytes=start-end". In the code paths that
use RANGE_REGEX (the RANGE_REGEX variable and the logic that reads the Range
header around the later checks at lines ~119-121), explicitly detect a comma or
a non-full match and reject the request (return the appropriate error/416)
instead of silently parsing only the first range.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 2a1e19f4-ed8f-4134-8e02-5e0bcf39e715
📒 Files selected for processing (7)
Thunder/__main__.pyThunder/bot/plugins/stream.pyThunder/server/stream_routes.pyThunder/utils/canonical_files.pyThunder/utils/custom_dl.pyThunder/utils/database.pyThunder/utils/render_template.py
| media = get_media(stored_message) | ||
| file_unique_id = get_uniqid(stored_message) | ||
| if not media or not file_unique_id: | ||
| return None | ||
|
|
||
| now = datetime.datetime.utcnow() | ||
| return { | ||
| "file_unique_id": file_unique_id, | ||
| "public_hash": build_public_hash(file_unique_id), | ||
| "canonical_message_id": stored_message.id, | ||
| "file_id": getattr(media, "file_id", None), | ||
| "file_name": get_fname(stored_message), | ||
| "mime_type": _infer_mime_type(media), | ||
| "file_size": getattr(media, "file_size", 0) or 0, | ||
| "media_type": type(media).__name__.lower(), | ||
| "first_source_chat_id": source_chat_id, | ||
| "first_source_message_id": source_message_id, | ||
| "created_at": now, | ||
| "last_seen_at": now, | ||
| "seen_count": 1, | ||
| "reuse_count": 0 | ||
| } |
There was a problem hiding this comment.
Persist a real display filename in the canonical record.
build_file_record() stores raw get_fname(stored_message), but the new canonical path later feeds canonical_record["file_name"] into gen_canonical_links(). For media without a Telegram filename, users will get "None" in link text; byte filenames become b'...'. Normalize to str here and apply the same fallback-name logic already used by ByteStreamer.get_file_info_sync() before saving.
♻️ Proposed fix
def build_file_record(
stored_message: Message,
@@
) -> Optional[Dict[str, Any]]:
media = get_media(stored_message)
file_unique_id = get_uniqid(stored_message)
if not media or not file_unique_id:
return None
+ media_type = type(media).__name__.lower()
+ file_name = get_fname(stored_message)
+ if isinstance(file_name, bytes):
+ file_name = file_name.decode("utf-8", errors="replace")
+ if not file_name:
+ ext_map = {
+ "photo": "jpg",
+ "audio": "mp3",
+ "voice": "ogg",
+ "video": "mp4",
+ "animation": "mp4",
+ "videonote": "mp4",
+ "sticker": "webp",
+ }
+ file_name = f"Thunder_{stored_message.id}.{ext_map.get(media_type, 'bin')}"
+
now = datetime.datetime.utcnow()
return {
"file_unique_id": file_unique_id,
"public_hash": build_public_hash(file_unique_id),
"canonical_message_id": stored_message.id,
"file_id": getattr(media, "file_id", None),
- "file_name": get_fname(stored_message),
+ "file_name": file_name,
"mime_type": _infer_mime_type(media),
"file_size": getattr(media, "file_size", 0) or 0,
- "media_type": type(media).__name__.lower(),
+ "media_type": media_type,
"first_source_chat_id": source_chat_id,
"first_source_message_id": source_message_id,
"created_at": now,🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Thunder/utils/canonical_files.py` around lines 58 - 79, build_file_record
currently stores raw get_fname(stored_message) which can be None or bytes and
later becomes canonical_record["file_name"] used in gen_canonical_links; replace
that by normalizing the filename to a str and applying the same
fallback/name-derivation logic used by ByteStreamer.get_file_info_sync: if
get_fname(...) returns bytes decode to UTF-8 (with errors='replace'), if it
returns None compute a sensible fallback (e.g., use derived extension or a
default like "file" plus file_unique_id or mime-derived name) and assign that
string to "file_name" in the returned dict so gen_canonical_links always gets a
proper display name. Ensure you reference get_fname(stored_message),
file_unique_id, ByteStreamer.get_file_info_sync logic, and
canonical_record["file_name"] when making the change.
| last_error: Exception | None = None | ||
| for ref in refs: | ||
| started_stream = False | ||
| while True: | ||
| try: | ||
| target = await self.get_message(ref) if isinstance(ref, int) else ref | ||
| if ( | ||
| on_fallback_message is not None and | ||
| fallback_message_id is not None and | ||
| ref == fallback_message_id and | ||
| isinstance(target, Message) | ||
| ): | ||
| await on_fallback_message(target) | ||
| async for chunk in self.client.stream_media( | ||
| target, offset=chunk_offset, limit=chunk_limit | ||
| ): | ||
| started_stream = True | ||
| yield chunk | ||
| return | ||
| except FloodWait as e: | ||
| logger.debug(f"FloodWait: stream_file, sleep {e.value}s") | ||
| await asyncio.sleep(e.value) | ||
| except Exception as e: | ||
| last_error = e | ||
| logger.debug(f"Error streaming media ref {ref}: {e}", exc_info=True) | ||
| if started_stream: | ||
| raise | ||
| break | ||
|
|
||
| raise FileNotFound(f"Unable to stream file: {last_error}") |
There was a problem hiding this comment.
Only retry or fall back before any bytes have been sent.
Once Line 75 has already yielded a chunk, the FloodWait branch restarts stream_media() from the original chunk_offset, which duplicates bytes in the HTTP response. And if every pre-stream attempt fails, wrapping the last arbitrary exception in FileNotFound makes stream_routes.py return 404 for backend errors. Retries/fallbacks should be limited to the “no bytes sent yet” case, and the final non-FileNotFound error should be re-raised.
♻️ Proposed fix
last_error: Exception | None = None
for ref in refs:
started_stream = False
while True:
try:
@@
async for chunk in self.client.stream_media(
target, offset=chunk_offset, limit=chunk_limit
):
started_stream = True
yield chunk
return
except FloodWait as e:
+ if started_stream:
+ raise
logger.debug(f"FloodWait: stream_file, sleep {e.value}s")
await asyncio.sleep(e.value)
except Exception as e:
last_error = e
logger.debug(f"Error streaming media ref {ref}: {e}", exc_info=True)
if started_stream:
raise
break
- raise FileNotFound(f"Unable to stream file: {last_error}")
+ if isinstance(last_error, FileNotFound):
+ raise last_error
+ if last_error is not None:
+ raise last_error
+ raise FileNotFound("Unable to stream file")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Thunder/utils/custom_dl.py` around lines 59 - 88, Limit retries/fallbacks to
the "no bytes sent yet" case and surface real backend errors: in the async
streaming loop in custom_dl.py (the generator that calls
self.client.stream_media), only swallow FloodWait or try the next ref when
started_stream is False; if a FloodWait or any Exception occurs after
started_stream is True re-raise it immediately. Also, after exhausting refs, if
last_error is set re-raise that original exception instead of wrapping it in
FileNotFound; only raise FileNotFound when there was no prior exception. This
affects the exception handlers around stream_media and the final raise at the
end of the function.
| file_unique_id = get_uniqid(message) | ||
| file_name = get_fname(message) | ||
|
|
||
| if not file_unique_id or file_unique_id[:6] != secure_hash: | ||
| raise InvalidHash("File unique ID or secure hash mismatch during rendering.") | ||
|
|
||
| quoted_filename = urllib.parse.quote(file_name.replace('/', '_')) | ||
| src = urllib.parse.urljoin(Var.URL, f'{secure_hash}{id}/{quoted_filename}') | ||
| safe_filename = html_module.escape(file_name) | ||
| if requested_action == 'stream': | ||
| template = template_env.get_template('req.html') | ||
| context = { | ||
| 'heading': f"View {safe_filename}", | ||
| 'file_name': safe_filename, | ||
| 'src': f"{src}?disposition=inline" | ||
| } | ||
| else: | ||
| template = template_env.get_template('dl.html') | ||
| context = { | ||
| 'file_name': safe_filename, | ||
| 'src': src | ||
| } | ||
| return await template.render_async(**context) | ||
| except Exception as e: | ||
| logger.error(f"Error in render_page for ID {id} and hash {secure_hash}: {e}", exc_info=True) | ||
| raise | ||
|
|
||
| quoted_filename = urllib.parse.quote(file_name.replace('/', '_'), safe="") | ||
| src = urllib.parse.urljoin(Var.URL, f'{secure_hash}{message_id}/{quoted_filename}') | ||
| return await render_media_page(file_name, src, requested_action) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
python - <<'PY'
from urllib.parse import urljoin
for base in ("https://example.com/base", "https://example.com/base/"):
print(base, "=>", urljoin(base, "abc123/file.txt"))
PYRepository: fyaz05/FileToLink
Length of output: 194
🏁 Script executed:
# First, let's understand the repository structure
git ls-files | head -20Repository: fyaz05/FileToLink
Length of output: 505
🏁 Script executed:
# Find the render_template.py file and examine it
fd render_template.pyRepository: fyaz05/FileToLink
Length of output: 93
🏁 Script executed:
# Find and examine the get_fname function
rg "def get_fname" -A 10Repository: fyaz05/FileToLink
Length of output: 742
🏁 Script executed:
# Check bot_utils.py around lines 133-136
fd bot_utils.pyRepository: fyaz05/FileToLink
Length of output: 87
🏁 Script executed:
# Search for get_fname usage patterns
rg "get_fname" -B 2 -A 2Repository: fyaz05/FileToLink
Length of output: 3450
🏁 Script executed:
# Get the complete get_fname implementation
cat -n Thunder/utils/file_properties.py | head -40Repository: fyaz05/FileToLink
Length of output: 1468
🏁 Script executed:
# Get the full context around lines 133-136 in bot_utils.py
cat -n Thunder/utils/bot_utils.py | sed -n '120,145p'Repository: fyaz05/FileToLink
Length of output: 1484
🏁 Script executed:
# Check the complete render_template.py file to understand context
wc -l Thunder/utils/render_template.pyRepository: fyaz05/FileToLink
Length of output: 96
🏁 Script executed:
# Read the complete render_template.py
cat -n Thunder/utils/render_template.pyRepository: fyaz05/FileToLink
Length of output: 2897
🏁 Script executed:
# Get the complete get_fname function
cat -n Thunder/utils/file_properties.py | sed -n '29,50p'Repository: fyaz05/FileToLink
Length of output: 807
🏁 Script executed:
# Check if get_fname is fully shown
cat -n Thunder/utils/file_properties.py | grep -A 20 "def get_fname"Repository: fyaz05/FileToLink
Length of output: 896
🏁 Script executed:
# Get the rest of get_fname function
cat -n Thunder/utils/file_properties.py | sed -n '49,80p'Repository: fyaz05/FileToLink
Length of output: 1242
Handle potential bytes return from get_fname() and fix URL construction with urljoin().
The function annotation says get_fname() returns str, but defensive handling in bot_utils.py:136 checks for bytes—suggesting it can return bytes in practice. The current code calls .replace() directly without checking, which will fail if bytes are returned. Additionally, urllib.parse.urljoin(Var.URL, ...) drops any path prefix from Var.URL unless the base ends with /, breaking preview links behind a subpath.
♻️ Proposed fix
- file_unique_id = get_uniqid(message)
- file_name = get_fname(message)
+ file_unique_id = get_uniqid(message)
+ file_name_raw = get_fname(message)
+ if isinstance(file_name_raw, bytes):
+ file_name = file_name_raw.decode("utf-8", errors="replace")
+ else:
+ file_name = str(file_name_raw) if file_name_raw else f"file_{message_id}"
@@
- quoted_filename = urllib.parse.quote(file_name.replace('/', '_'), safe="")
- src = urllib.parse.urljoin(Var.URL, f'{secure_hash}{message_id}/{quoted_filename}')
+ quoted_filename = urllib.parse.quote(file_name.replace('/', '_'), safe="")
+ src = f"{Var.URL.rstrip('/')}/{secure_hash}{message_id}/{quoted_filename}"
return await render_media_page(file_name, src, requested_action)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Thunder/utils/render_template.py` around lines 52 - 60, get_fname() may
return bytes and Var.URL's path can be lost by urllib.parse.urljoin; first
coerce file_name to a str (if isinstance(file_name, bytes) decode with 'utf-8'
and safe fallback) before calling .replace() and urllib.parse.quote, then build
the preview URL without dropping Var.URL's path (e.g. ensure a single trailing
slash on Var.URL or join using Var.URL.rstrip('/') + '/' +
f'{secure_hash}{message_id}/{quoted_filename}' instead of relying on urljoin).
Keep the existing file_unique_id check (get_uniqid) and still call
render_media_page(file_name, src, requested_action) after these fixes.
Summary by CodeRabbit
New Features
Improvements