Skip to content

2.1#44

Open
fyaz05 wants to merge 2 commits intomainfrom
beta
Open

2.1#44
fyaz05 wants to merge 2 commits intomainfrom
beta

Conversation

@fyaz05
Copy link
Owner

@fyaz05 fyaz05 commented Mar 19, 2026

Summary by CodeRabbit

  • New Features

    • Canonical file deduplication and caching system for improved performance and faster access to frequently served files.
    • Fallback file streaming support for enhanced download reliability.
  • Improvements

    • Refactored media delivery endpoints with optimized routing.
    • Background index maintenance during startup for faster initialization.
    • Enhanced file link generation and URL pattern handling.

@coderabbitai
Copy link

coderabbitai bot commented Mar 19, 2026

📝 Walkthrough

Walkthrough

This PR introduces a canonical files system that persistently stores and caches Telegram media with public hashes for secure sharing. It adds database collections, new streaming endpoints, background file-record maintenance, and refactors the link generation pipeline to reuse cached media instead of re-forwarding.

Changes

Cohort / File(s) Summary
Canonical Files Infrastructure
Thunder/utils/canonical_files.py, Thunder/utils/database.py
Introduces new canonical_files.py module with file record creation, validation, caching, and reuse logic using in-memory TTL caches. Extends Database with files and file_ingest_locks collections, file lookup/write methods, and ingest-claim concurrency helpers to coordinate multi-worker file ingestion.
Link Generation & Rendering
Thunder/utils/bot_utils.py, Thunder/utils/render_template.py
Adds quote_media_name() and gen_canonical_links() for canonical link generation. Refactors gen_links() to use centralized _build_links() and new path patterns. Introduces render_media_page() for template rendering and updates render_page() signature to use message_id parameter.
Stream Handling
Thunder/bot/plugins/stream.py, Thunder/utils/custom_dl.py
Refactors send_channel_links() to accept optional target_msg and integrate canonical-file lookup; skips forwarding when canonical copy exists. Updates ByteStreamer.stream_file() to accept flexible media_ref, fallback references, and optional on_fallback_message callback with retry-on-fallback logic.
Server Routes
Thunder/server/stream_routes.py
Adds new canonical endpoints (GET /f/{secure_hash}/{name}, GET /watch/f/{secure_hash}/{name}) with public hash validation. Refactors media delivery into _serve_media_response() helper; updates original routes to delegate to the new helper.
Startup & Shutdown
Thunder/__main__.py
Adds schedule_index_ensure() to run index creation asynchronously and calls drain_background_touch_tasks() on shutdown to clean up background file-record touch tasks; logs index and cleanup outcomes.

Sequence Diagram

sequenceDiagram
    participant Client
    participant StreamBot as Stream Bot
    participant DB as Database
    participant BinChannel as BIN Channel
    participant Streamer as ByteStreamer

    Client->>StreamBot: send_channel_links(links, source_info, source_id)
    StreamBot->>DB: get_or_create_canonical_file(source_message)
    
    alt Canonical record exists & valid
        DB->>BinChannel: fetch canonical message
        BinChannel-->>DB: canonical_message
        DB->>DB: touch_file_record (update cache)
        DB-->>StreamBot: return existing record
    else No valid canonical record
        DB->>DB: acquire_file_ingest_claim
        alt Claim acquired
            StreamBot->>StreamBot: copy_media (forward to BIN)
            StreamBot->>DB: create_file_record (persist)
            DB-->>StreamBot: return new record
        else Claim held by another worker
            DB->>DB: poll claim & re-validate
            DB-->>StreamBot: return record after other worker completes
        end
    end
    
    StreamBot->>StreamBot: gen_canonical_links (from record)
    StreamBot->>Client: send links (or fallback to gen_links)
    
    Note over DB,Streamer: On file access via public hash
    Client->>Streamer: GET /f/{secure_hash}/{name}
    Streamer->>DB: get_file_by_hash(secure_hash)
    DB-->>Streamer: return file record
    Streamer->>BinChannel: resolve canonical_message_id
    BinChannel-->>Streamer: media_ref
    Streamer->>Streamer: stream_file(media_ref, on_fallback_message)
    Streamer-->>Client: media bytes
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • PR #31: Modifies the same core files (stream.py, stream_routes.py) with overlapping changes to send_channel_links and media delivery routing.
  • PR #11: Updates stream_routes.py and custom_dl.py with related changes to ByteStreamer.stream_file behavior and media delivery flow.
  • PR #36: Touches both __main__.py and database.py, indicating related startup/shutdown and database schema integration.

Poem

🐰 A canonical copy keeps the clutter at bay,
No forwarding circles, just links we can share,
Touch tokens refresh, the cache knows the way,
Public hashes bloom where the media's there—
One file, one truth, efficiently rare! ✨

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The title '2.1' is vague and generic, providing no meaningful information about the changeset's purpose or scope. Replace the title with a descriptive summary of the main change, such as 'Add canonical file caching and streaming infrastructure' or 'Implement canonical file records with public hash validation'.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch beta

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🧹 Nitpick comments (4)
Thunder/utils/database.py (2)

254-266: Inconsistent error handling between similar getter methods.

get_file_by_unique_id returns None on exception (line 259), while get_file_by_hash re-raises the exception (line 266). This inconsistency can lead to unexpected behavior for callers expecting uniform behavior. Consider aligning the error handling strategy across all file lookup methods.

♻️ Proposed fix to make error handling consistent

Either return None consistently (if callers should handle missing records gracefully):

     async def get_file_by_hash(self, public_hash: str) -> Optional[Dict[str, Any]]:
         try:
             return await self.files_col.find_one({"public_hash": public_hash})
         except Exception as e:
             logger.error(f"Error getting file by hash {public_hash}: {e}", exc_info=True)
-            raise
+            return None

Or raise consistently (if failures should propagate):

     async def get_file_by_unique_id(self, file_unique_id: str) -> Optional[Dict[str, Any]]:
         try:
             return await self.files_col.find_one({"file_unique_id": file_unique_id})
         except Exception as e:
             logger.error(f"Error getting file by unique_id {file_unique_id}: {e}", exc_info=True)
-            return None
+            raise
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Thunder/utils/database.py` around lines 254 - 266, Both getters
(get_file_by_unique_id and get_file_by_hash) currently handle exceptions
differently; make them consistent by modifying get_file_by_hash to mirror
get_file_by_unique_id: catch exceptions, log the error with exc_info via
logger.error including the public_hash value, and return None instead of
re-raising; also scan other file lookup methods using files_col to ensure they
follow the same pattern if the project convention is to return None on failure.

302-327: Silent error swallowing may hide database issues.

Both touch_file_record and update_file_id log errors but do not propagate them. While this might be intentional for non-critical "fire-and-forget" operations, it can hide persistent database problems. Consider whether these operations should at least have an option to raise or return a success indicator.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Thunder/utils/database.py` around lines 302 - 327, The current
touch_file_record and update_file_id methods swallow exceptions after logging,
which can hide persistent DB failures; modify these functions (touch_file_record
and update_file_id) to return a success indicator (bool) or accept an optional
parameter like raise_on_error: bool = False that, when true, re-raises the
caught exception after logging; ensure you update the calls that rely on
files_col.update_one to handle the returned bool or allow exceptions to
propagate, and keep logger.error(..., exc_info=True) but do not silently
suppress errors when raise_on_error is requested.
Thunder/bot/plugins/stream.py (1)

389-403: Consider clarifying fallback behavior when stored_msg is None.

When canonical_record exists but stored_msg is None, send_channel_links is called with target_msg=None. This works correctly because send_channel_links uses StreamBot.send_message with reply_to_message_id in that case. However, a brief comment would help future readers understand this intentional design.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Thunder/bot/plugins/stream.py` around lines 389 - 403, Add a brief inline
comment explaining the intentional fallback when canonical_record exists but
stored_msg is None: that send_channel_links is deliberately called with
target_msg=None so StreamBot.send_message will fall back to using
reply_to_message_id for threading. Update the call site around the
canonical_record / stored_msg logic (referencing variables canonical_record and
stored_msg and the function send_channel_links) to include this explanatory
comment so future readers understand why target_msg is passed as None instead of
skipping or altering the call.
Thunder/utils/render_template.py (1)

42-42: Function parameter id shadows Python builtin.

Consider renaming to message_id for clarity and to avoid shadowing the builtin id() function.

♻️ Proposed fix
-async def render_page(id: int, secure_hash: str, requested_action: str | None = None) -> str:
+async def render_page(message_id: int, secure_hash: str, requested_action: str | None = None) -> str:
     try:
         try:
-            message = await StreamBot.get_messages(chat_id=int(Var.BIN_CHANNEL), message_ids=id)
+            message = await StreamBot.get_messages(chat_id=int(Var.BIN_CHANNEL), message_ids=message_id)

(and update the error message at line 63 accordingly)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Thunder/utils/render_template.py` at line 42, Rename the parameter id in the
async function render_page to message_id to avoid shadowing the builtin id();
update its type hint and every reference inside render_page (including uses in
f-strings, comparisons, and the error/ValueError message currently referencing
id) to message_id, and adjust any callers to pass message_id (or update call
sites to use the new name) so behavior remains identical; also update the error
message text that mentioned id to use message_id for clarity.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@Thunder/utils/canonical_files.py`:
- Around line 293-297: Don't treat any exception from
_is_canonical_record_valid(existing, file_unique_id) as a reuse; instead, only
call schedule_touch_file_record and return the existing record when validation
explicitly succeeds. Change the except branch to either re-raise the exception
(to surface permanent fetch/config errors) or return a non-reuse result (e.g.,
existing, error, False) so stale canonical mappings are not kept and
reuse_count/seen_count aren't inflated; ensure the only place that
touches/reuses the record is the success path where is_valid is True and retain
references to _is_canonical_record_valid, schedule_touch_file_record, existing,
and file_unique_id to locate the code.
- Around line 303-332: The code currently calls copy_media() before reserving
canonical ownership, which can leave untracked uploads if another worker wins
the unique-key race; change the flow in the function that uses copy_media,
create_file_record, replace_file_record and _merge_replacement_record so that
the unique canonical row is reserved first (e.g., insert a tentative record or
acquire a DB lock/transaction on file_unique_id), then call copy_media() to
upload and update that reserved record with stored_message and final metadata;
on DuplicateKeyError or if get_file_by_unique_id shows another owner, ensure you
schedule_touch_file_record(existing, reused=True) and cleanup any uploaded
message in Var.BIN_CHANNEL (so the caller can remove the stray stored_message),
and only call _remember(record) after the reservation+upload+DB update succeeds.
- Around line 258-279: The file_ingest_lock context manager increments
_upload_lock_counts before awaiting lock.acquire(), which lets a CancelledError
leak counts and lock entries; fix by adding a cancellation-safe path: after
incrementing but before yielding, wrap await lock.acquire() in a try/except that
catches asyncio.CancelledError (and other exceptions if desired), and inside
that except acquire the _upload_locks_guard and decrement/remove the
_upload_lock_counts and _upload_locks for the file_unique_id as done in the
finally cleanup, then re-raise the cancellation; keep the existing finally block
unchanged to handle normal releases. Ensure you reference the same symbols
(_upload_locks, _upload_lock_counts, _upload_locks_guard, file_ingest_lock,
lock.acquire) so the cleanup logic mirrors the existing removal behavior.

In `@Thunder/utils/custom_dl.py`:
- Around line 52-54: The current comparison can wrongly treat a string/file_id
or Message as unequal to an int fallback, so change the logic around refs,
media_ref and fallback_message_id to compare IDs explicitly: if
fallback_message_id is not None, derive media_id = media_ref if
isinstance(media_ref, int) else (media_ref.message_id if isinstance(media_ref,
Message) else None) and only append fallback_message_id to refs when media_id is
None (no comparable id) or fallback_message_id != media_id; use the symbols
refs, media_ref, fallback_message_id and the Message.type check to implement
this explicit type-aware comparison.

In `@Thunder/utils/render_template.py`:
- Around line 16-22: The Jinja2 Environment is created without autoescaping
which risks XSS; update the Environment construction for template_env (the
Environment call using FileSystemLoader) to enable autoescape (either
autoescape=True or use jinja2.select_autoescape for HTML templates) so templates
rendered by render_media_page (which currently manually escapes file_name) are
automatically escaped by default; ensure the change targets the template_env
setup so existing render calls continue to work and adjust any templates that
intentionally output raw HTML to use |safe.

---

Nitpick comments:
In `@Thunder/bot/plugins/stream.py`:
- Around line 389-403: Add a brief inline comment explaining the intentional
fallback when canonical_record exists but stored_msg is None: that
send_channel_links is deliberately called with target_msg=None so
StreamBot.send_message will fall back to using reply_to_message_id for
threading. Update the call site around the canonical_record / stored_msg logic
(referencing variables canonical_record and stored_msg and the function
send_channel_links) to include this explanatory comment so future readers
understand why target_msg is passed as None instead of skipping or altering the
call.

In `@Thunder/utils/database.py`:
- Around line 254-266: Both getters (get_file_by_unique_id and get_file_by_hash)
currently handle exceptions differently; make them consistent by modifying
get_file_by_hash to mirror get_file_by_unique_id: catch exceptions, log the
error with exc_info via logger.error including the public_hash value, and return
None instead of re-raising; also scan other file lookup methods using files_col
to ensure they follow the same pattern if the project convention is to return
None on failure.
- Around line 302-327: The current touch_file_record and update_file_id methods
swallow exceptions after logging, which can hide persistent DB failures; modify
these functions (touch_file_record and update_file_id) to return a success
indicator (bool) or accept an optional parameter like raise_on_error: bool =
False that, when true, re-raises the caught exception after logging; ensure you
update the calls that rely on files_col.update_one to handle the returned bool
or allow exceptions to propagate, and keep logger.error(..., exc_info=True) but
do not silently suppress errors when raise_on_error is requested.

In `@Thunder/utils/render_template.py`:
- Line 42: Rename the parameter id in the async function render_page to
message_id to avoid shadowing the builtin id(); update its type hint and every
reference inside render_page (including uses in f-strings, comparisons, and the
error/ValueError message currently referencing id) to message_id, and adjust any
callers to pass message_id (or update call sites to use the new name) so
behavior remains identical; also update the error message text that mentioned id
to use message_id for clarity.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c4d662ab-86e0-4b4c-b713-4cb3b500961c

📥 Commits

Reviewing files that changed from the base of the PR and between 0dce97e and 2fe1440.

📒 Files selected for processing (9)
  • Thunder/__init__.py
  • Thunder/__main__.py
  • Thunder/bot/plugins/stream.py
  • Thunder/server/stream_routes.py
  • Thunder/utils/bot_utils.py
  • Thunder/utils/canonical_files.py
  • Thunder/utils/custom_dl.py
  • Thunder/utils/database.py
  • Thunder/utils/render_template.py

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
Thunder/server/stream_routes.py (1)

31-31: ⚠️ Potential issue | 🟡 Minor

Reject unsupported multi-range headers explicitly.

match() accepts values like bytes=0-1,2-3 and silently parses only the first span, but the response later advertises a single Content-Range. Unsupported multi-range requests should fail fast instead of being partially honored.

♻️ Proposed fix
-RANGE_REGEX = re.compile(r"bytes=(?P<start>\d*)-(?P<end>\d*)")
+RANGE_REGEX = re.compile(r"bytes=(?P<start>\d*)-(?P<end>\d*)$")
@@
-    match = RANGE_REGEX.match(range_header)
+    match = RANGE_REGEX.fullmatch(range_header)

Also applies to: 119-121

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Thunder/server/stream_routes.py` at line 31, RANGE_REGEX currently allows
multi-range headers like "bytes=0-1,2-3" because it isn't anchored; update
RANGE_REGEX (and any other occurrence) to only accept a single range by
anchoring the pattern (or switch to using fullmatch) so the entire header must
match a single "bytes=start-end". In the code paths that use RANGE_REGEX (the
RANGE_REGEX variable and the logic that reads the Range header around the later
checks at lines ~119-121), explicitly detect a comma or a non-full match and
reject the request (return the appropriate error/416) instead of silently
parsing only the first range.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@Thunder/utils/canonical_files.py`:
- Around line 58-79: build_file_record currently stores raw
get_fname(stored_message) which can be None or bytes and later becomes
canonical_record["file_name"] used in gen_canonical_links; replace that by
normalizing the filename to a str and applying the same fallback/name-derivation
logic used by ByteStreamer.get_file_info_sync: if get_fname(...) returns bytes
decode to UTF-8 (with errors='replace'), if it returns None compute a sensible
fallback (e.g., use derived extension or a default like "file" plus
file_unique_id or mime-derived name) and assign that string to "file_name" in
the returned dict so gen_canonical_links always gets a proper display name.
Ensure you reference get_fname(stored_message), file_unique_id,
ByteStreamer.get_file_info_sync logic, and canonical_record["file_name"] when
making the change.

In `@Thunder/utils/custom_dl.py`:
- Around line 59-88: Limit retries/fallbacks to the "no bytes sent yet" case and
surface real backend errors: in the async streaming loop in custom_dl.py (the
generator that calls self.client.stream_media), only swallow FloodWait or try
the next ref when started_stream is False; if a FloodWait or any Exception
occurs after started_stream is True re-raise it immediately. Also, after
exhausting refs, if last_error is set re-raise that original exception instead
of wrapping it in FileNotFound; only raise FileNotFound when there was no prior
exception. This affects the exception handlers around stream_media and the final
raise at the end of the function.

In `@Thunder/utils/render_template.py`:
- Around line 52-60: get_fname() may return bytes and Var.URL's path can be lost
by urllib.parse.urljoin; first coerce file_name to a str (if
isinstance(file_name, bytes) decode with 'utf-8' and safe fallback) before
calling .replace() and urllib.parse.quote, then build the preview URL without
dropping Var.URL's path (e.g. ensure a single trailing slash on Var.URL or join
using Var.URL.rstrip('/') + '/' + f'{secure_hash}{message_id}/{quoted_filename}'
instead of relying on urljoin). Keep the existing file_unique_id check
(get_uniqid) and still call render_media_page(file_name, src, requested_action)
after these fixes.

---

Outside diff comments:
In `@Thunder/server/stream_routes.py`:
- Line 31: RANGE_REGEX currently allows multi-range headers like "bytes=0-1,2-3"
because it isn't anchored; update RANGE_REGEX (and any other occurrence) to only
accept a single range by anchoring the pattern (or switch to using fullmatch) so
the entire header must match a single "bytes=start-end". In the code paths that
use RANGE_REGEX (the RANGE_REGEX variable and the logic that reads the Range
header around the later checks at lines ~119-121), explicitly detect a comma or
a non-full match and reject the request (return the appropriate error/416)
instead of silently parsing only the first range.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 2a1e19f4-ed8f-4134-8e02-5e0bcf39e715

📥 Commits

Reviewing files that changed from the base of the PR and between 2fe1440 and 89e33ed.

📒 Files selected for processing (7)
  • Thunder/__main__.py
  • Thunder/bot/plugins/stream.py
  • Thunder/server/stream_routes.py
  • Thunder/utils/canonical_files.py
  • Thunder/utils/custom_dl.py
  • Thunder/utils/database.py
  • Thunder/utils/render_template.py

Comment on lines +58 to +79
media = get_media(stored_message)
file_unique_id = get_uniqid(stored_message)
if not media or not file_unique_id:
return None

now = datetime.datetime.utcnow()
return {
"file_unique_id": file_unique_id,
"public_hash": build_public_hash(file_unique_id),
"canonical_message_id": stored_message.id,
"file_id": getattr(media, "file_id", None),
"file_name": get_fname(stored_message),
"mime_type": _infer_mime_type(media),
"file_size": getattr(media, "file_size", 0) or 0,
"media_type": type(media).__name__.lower(),
"first_source_chat_id": source_chat_id,
"first_source_message_id": source_message_id,
"created_at": now,
"last_seen_at": now,
"seen_count": 1,
"reuse_count": 0
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Persist a real display filename in the canonical record.

build_file_record() stores raw get_fname(stored_message), but the new canonical path later feeds canonical_record["file_name"] into gen_canonical_links(). For media without a Telegram filename, users will get "None" in link text; byte filenames become b'...'. Normalize to str here and apply the same fallback-name logic already used by ByteStreamer.get_file_info_sync() before saving.

♻️ Proposed fix
 def build_file_record(
     stored_message: Message,
@@
 ) -> Optional[Dict[str, Any]]:
     media = get_media(stored_message)
     file_unique_id = get_uniqid(stored_message)
     if not media or not file_unique_id:
         return None
 
+    media_type = type(media).__name__.lower()
+    file_name = get_fname(stored_message)
+    if isinstance(file_name, bytes):
+        file_name = file_name.decode("utf-8", errors="replace")
+    if not file_name:
+        ext_map = {
+            "photo": "jpg",
+            "audio": "mp3",
+            "voice": "ogg",
+            "video": "mp4",
+            "animation": "mp4",
+            "videonote": "mp4",
+            "sticker": "webp",
+        }
+        file_name = f"Thunder_{stored_message.id}.{ext_map.get(media_type, 'bin')}"
+
     now = datetime.datetime.utcnow()
     return {
         "file_unique_id": file_unique_id,
         "public_hash": build_public_hash(file_unique_id),
         "canonical_message_id": stored_message.id,
         "file_id": getattr(media, "file_id", None),
-        "file_name": get_fname(stored_message),
+        "file_name": file_name,
         "mime_type": _infer_mime_type(media),
         "file_size": getattr(media, "file_size", 0) or 0,
-        "media_type": type(media).__name__.lower(),
+        "media_type": media_type,
         "first_source_chat_id": source_chat_id,
         "first_source_message_id": source_message_id,
         "created_at": now,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Thunder/utils/canonical_files.py` around lines 58 - 79, build_file_record
currently stores raw get_fname(stored_message) which can be None or bytes and
later becomes canonical_record["file_name"] used in gen_canonical_links; replace
that by normalizing the filename to a str and applying the same
fallback/name-derivation logic used by ByteStreamer.get_file_info_sync: if
get_fname(...) returns bytes decode to UTF-8 (with errors='replace'), if it
returns None compute a sensible fallback (e.g., use derived extension or a
default like "file" plus file_unique_id or mime-derived name) and assign that
string to "file_name" in the returned dict so gen_canonical_links always gets a
proper display name. Ensure you reference get_fname(stored_message),
file_unique_id, ByteStreamer.get_file_info_sync logic, and
canonical_record["file_name"] when making the change.

Comment on lines +59 to +88
last_error: Exception | None = None
for ref in refs:
started_stream = False
while True:
try:
target = await self.get_message(ref) if isinstance(ref, int) else ref
if (
on_fallback_message is not None and
fallback_message_id is not None and
ref == fallback_message_id and
isinstance(target, Message)
):
await on_fallback_message(target)
async for chunk in self.client.stream_media(
target, offset=chunk_offset, limit=chunk_limit
):
started_stream = True
yield chunk
return
except FloodWait as e:
logger.debug(f"FloodWait: stream_file, sleep {e.value}s")
await asyncio.sleep(e.value)
except Exception as e:
last_error = e
logger.debug(f"Error streaming media ref {ref}: {e}", exc_info=True)
if started_stream:
raise
break

raise FileNotFound(f"Unable to stream file: {last_error}")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Only retry or fall back before any bytes have been sent.

Once Line 75 has already yielded a chunk, the FloodWait branch restarts stream_media() from the original chunk_offset, which duplicates bytes in the HTTP response. And if every pre-stream attempt fails, wrapping the last arbitrary exception in FileNotFound makes stream_routes.py return 404 for backend errors. Retries/fallbacks should be limited to the “no bytes sent yet” case, and the final non-FileNotFound error should be re-raised.

♻️ Proposed fix
         last_error: Exception | None = None
         for ref in refs:
             started_stream = False
             while True:
                 try:
@@
                     async for chunk in self.client.stream_media(
                         target, offset=chunk_offset, limit=chunk_limit
                     ):
                         started_stream = True
                         yield chunk
                     return
                 except FloodWait as e:
+                    if started_stream:
+                        raise
                     logger.debug(f"FloodWait: stream_file, sleep {e.value}s")
                     await asyncio.sleep(e.value)
                 except Exception as e:
                     last_error = e
                     logger.debug(f"Error streaming media ref {ref}: {e}", exc_info=True)
                     if started_stream:
                         raise
                     break
 
-        raise FileNotFound(f"Unable to stream file: {last_error}")
+        if isinstance(last_error, FileNotFound):
+            raise last_error
+        if last_error is not None:
+            raise last_error
+        raise FileNotFound("Unable to stream file")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Thunder/utils/custom_dl.py` around lines 59 - 88, Limit retries/fallbacks to
the "no bytes sent yet" case and surface real backend errors: in the async
streaming loop in custom_dl.py (the generator that calls
self.client.stream_media), only swallow FloodWait or try the next ref when
started_stream is False; if a FloodWait or any Exception occurs after
started_stream is True re-raise it immediately. Also, after exhausting refs, if
last_error is set re-raise that original exception instead of wrapping it in
FileNotFound; only raise FileNotFound when there was no prior exception. This
affects the exception handlers around stream_media and the final raise at the
end of the function.

Comment on lines +52 to +60
file_unique_id = get_uniqid(message)
file_name = get_fname(message)

if not file_unique_id or file_unique_id[:6] != secure_hash:
raise InvalidHash("File unique ID or secure hash mismatch during rendering.")

quoted_filename = urllib.parse.quote(file_name.replace('/', '_'))
src = urllib.parse.urljoin(Var.URL, f'{secure_hash}{id}/{quoted_filename}')
safe_filename = html_module.escape(file_name)
if requested_action == 'stream':
template = template_env.get_template('req.html')
context = {
'heading': f"View {safe_filename}",
'file_name': safe_filename,
'src': f"{src}?disposition=inline"
}
else:
template = template_env.get_template('dl.html')
context = {
'file_name': safe_filename,
'src': src
}
return await template.render_async(**context)
except Exception as e:
logger.error(f"Error in render_page for ID {id} and hash {secure_hash}: {e}", exc_info=True)
raise

quoted_filename = urllib.parse.quote(file_name.replace('/', '_'), safe="")
src = urllib.parse.urljoin(Var.URL, f'{secure_hash}{message_id}/{quoted_filename}')
return await render_media_page(file_name, src, requested_action)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
python - <<'PY'
from urllib.parse import urljoin

for base in ("https://example.com/base", "https://example.com/base/"):
    print(base, "=>", urljoin(base, "abc123/file.txt"))
PY

Repository: fyaz05/FileToLink

Length of output: 194


🏁 Script executed:

# First, let's understand the repository structure
git ls-files | head -20

Repository: fyaz05/FileToLink

Length of output: 505


🏁 Script executed:

# Find the render_template.py file and examine it
fd render_template.py

Repository: fyaz05/FileToLink

Length of output: 93


🏁 Script executed:

# Find and examine the get_fname function
rg "def get_fname" -A 10

Repository: fyaz05/FileToLink

Length of output: 742


🏁 Script executed:

# Check bot_utils.py around lines 133-136
fd bot_utils.py

Repository: fyaz05/FileToLink

Length of output: 87


🏁 Script executed:

# Search for get_fname usage patterns
rg "get_fname" -B 2 -A 2

Repository: fyaz05/FileToLink

Length of output: 3450


🏁 Script executed:

# Get the complete get_fname implementation
cat -n Thunder/utils/file_properties.py | head -40

Repository: fyaz05/FileToLink

Length of output: 1468


🏁 Script executed:

# Get the full context around lines 133-136 in bot_utils.py
cat -n Thunder/utils/bot_utils.py | sed -n '120,145p'

Repository: fyaz05/FileToLink

Length of output: 1484


🏁 Script executed:

# Check the complete render_template.py file to understand context
wc -l Thunder/utils/render_template.py

Repository: fyaz05/FileToLink

Length of output: 96


🏁 Script executed:

# Read the complete render_template.py
cat -n Thunder/utils/render_template.py

Repository: fyaz05/FileToLink

Length of output: 2897


🏁 Script executed:

# Get the complete get_fname function
cat -n Thunder/utils/file_properties.py | sed -n '29,50p'

Repository: fyaz05/FileToLink

Length of output: 807


🏁 Script executed:

# Check if get_fname is fully shown
cat -n Thunder/utils/file_properties.py | grep -A 20 "def get_fname"

Repository: fyaz05/FileToLink

Length of output: 896


🏁 Script executed:

# Get the rest of get_fname function
cat -n Thunder/utils/file_properties.py | sed -n '49,80p'

Repository: fyaz05/FileToLink

Length of output: 1242


Handle potential bytes return from get_fname() and fix URL construction with urljoin().

The function annotation says get_fname() returns str, but defensive handling in bot_utils.py:136 checks for bytes—suggesting it can return bytes in practice. The current code calls .replace() directly without checking, which will fail if bytes are returned. Additionally, urllib.parse.urljoin(Var.URL, ...) drops any path prefix from Var.URL unless the base ends with /, breaking preview links behind a subpath.

♻️ Proposed fix
-        file_unique_id = get_uniqid(message)
-        file_name = get_fname(message)
+        file_unique_id = get_uniqid(message)
+        file_name_raw = get_fname(message)
+        if isinstance(file_name_raw, bytes):
+            file_name = file_name_raw.decode("utf-8", errors="replace")
+        else:
+            file_name = str(file_name_raw) if file_name_raw else f"file_{message_id}"
@@
-        quoted_filename = urllib.parse.quote(file_name.replace('/', '_'), safe="")
-        src = urllib.parse.urljoin(Var.URL, f'{secure_hash}{message_id}/{quoted_filename}')
+        quoted_filename = urllib.parse.quote(file_name.replace('/', '_'), safe="")
+        src = f"{Var.URL.rstrip('/')}/{secure_hash}{message_id}/{quoted_filename}"
         return await render_media_page(file_name, src, requested_action)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Thunder/utils/render_template.py` around lines 52 - 60, get_fname() may
return bytes and Var.URL's path can be lost by urllib.parse.urljoin; first
coerce file_name to a str (if isinstance(file_name, bytes) decode with 'utf-8'
and safe fallback) before calling .replace() and urllib.parse.quote, then build
the preview URL without dropping Var.URL's path (e.g. ensure a single trailing
slash on Var.URL or join using Var.URL.rstrip('/') + '/' +
f'{secure_hash}{message_id}/{quoted_filename}' instead of relying on urljoin).
Keep the existing file_unique_id check (get_uniqid) and still call
render_media_page(file_name, src, requested_action) after these fixes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant