
add rss feed scraping #7

Open
Oliviayuuuu317 wants to merge 1 commit into main from olivia/rss_feed

Conversation


@Oliviayuuuu317 Oliviayuuuu317 commented Apr 22, 2026

Overview

Added rss_feed.go to scrape recent concert information from Chimes' RSS feed (https://apps.chimes.cornell.edu/music/rss.xml)

Changes Made

  • created models for concert days, time slots, and songs
  • added rss feed scraping and parsing into structured models

Test Coverage

  • ran the main method and printed the scraped results

Summary by CodeRabbit

  • New Features
    • Added RSS feed integration to fetch and parse music concert data
    • Songs are organized into morning, afternoon, and evening time slots
    • Concert information is formatted with title, artist, and source details for easy reference


coderabbitai Bot commented Apr 22, 2026

📝 Walkthrough

Walkthrough

A new Go program fetches an RSS feed from Cornell's music API, parses XML items into time-slot grouped structures (Morning, Afternoon, Evening), and extracts song metadata including title, artist, and source from HTML descriptions.

Changes

RSS Feed Parser (models/rss_feed.go): New main program that fetches and parses the RSS XML feed, converts HTML descriptions into time-slot groups (Morning, Afternoon, Evening), extracts song details (title, artist, source), and outputs formatted concert information with HTML stripping and entity normalization.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 A feast of songs from a distant feed,
Parsed line by line with careful speed,
Morning, afternoon, and evening's call,
Organized melodies, grouped for all!
The rabbit hops through XML's delight, 🎵

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Docstring Coverage ⚠️ Warning: Docstring coverage is 50.00%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Title check ✅ Passed: The title 'add rss feed scraping' directly and clearly summarizes the main change: adding RSS feed scraping functionality to the codebase.
Description check ✅ Passed: The description covers the required sections (Overview, Changes Made, Test Coverage) from the template and provides sufficient detail about the RSS feed scraping implementation.
Linked Issues check ✅ Passed: Skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed: Skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (2)
models/rss_feed.go (2)

104-138: No test coverage.

The PR description notes testing was done by running main and eyeballing output. Since parseSong and parseDescription are pure functions operating on strings, they're easy to unit-test with a small fixture of the feed's description XML. This will lock down the (from "..."), " / ", and <br> handling against future RSS format drift.

Want me to draft rss_feed_test.go with table-driven tests for parseSong and parseDescription?

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@models/rss_feed.go` around lines 104 - 138, The PR lacks unit tests for the
pure string-parsing functions; add a new table-driven test file
(rss_feed_test.go) that exercises parseSong and parseDescription with
representative description HTML snippets from the feed (covering "(from
\"...\")" source parsing, " / " artist separators, <br> line breaks, and edge
cases like missing fields), asserting expected Song and TimeSlot outputs; name
tests clearly, include fixtures for multiple item descriptions, and use subtests
to make failures actionable so future RSS format drift is caught by CI.
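The table-driven idea can be sketched as a standalone program. The Song type and the originPattern regex below are assumptions modeled on the parseSong code quoted later in this review; the real definitions in models/rss_feed.go may differ.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Song mirrors the struct implied by the review; field names are assumed.
type Song struct {
	Title, Artist, Source string
}

// originPattern is a guess at the `(from "...")` regex; the PR's actual
// pattern is not shown in this review.
var originPattern = regexp.MustCompile(`\s*\(from "([^"]+)"\)`)

// parseSong reproduces the logic quoted in the comment thread below:
// split off the artist at " / ", then extract the source from the title.
func parseSong(line string) Song {
	song := Song{}
	title, artist, found := strings.Cut(line, " / ")
	if found {
		song.Artist = strings.TrimSpace(artist)
	}
	if match := originPattern.FindStringSubmatch(title); match != nil {
		song.Source = match[1]
		title = strings.TrimSpace(originPattern.ReplaceAllString(title, ""))
	}
	song.Title = title
	return song
}

func main() {
	// Table-driven cases covering source-in-title, plain title/artist,
	// and a line with no artist separator.
	cases := []struct {
		line string
		want Song
	}{
		{`Jupiter (from "The Planets") / Gustav Holst`, Song{"Jupiter", "Gustav Holst", "The Planets"}},
		{"Amazing Grace / Traditional", Song{"Amazing Grace", "Traditional", ""}},
		{"Alma Mater", Song{"Alma Mater", "", ""}},
	}
	for _, c := range cases {
		if got := parseSong(c.line); got != c.want {
			panic(fmt.Sprintf("parseSong(%q) = %+v, want %+v", c.line, got, c.want))
		}
	}
	fmt.Println("all cases pass")
}
```

In a real rss_feed_test.go these cases would live in a table passed to t.Run subtests, so each failing input is reported by name.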

72-102: Shadowed current and fragile <br> splitting.

Two readability/robustness concerns:

  1. The outer var current *TimeSlot (line 74) is immediately shadowed by the loop variable current (line 77) used as a string. It happens to work because the pointer is reassigned on line 96 before any read, but reusing the name makes this code confusing and brittle to future edits. Rename one of them (e.g., chunk for the loop variable).
  2. strings.Split(desc, "<br>") won't match <br/>, <br />, or <BR>, which RSS descriptions frequently contain. A single regex split handles all variants.
🔧 Suggested fix
-var tagPattern = regexp.MustCompile(`<[^>]+>`)
+var tagPattern = regexp.MustCompile(`<[^>]+>`)
+var brPattern  = regexp.MustCompile(`(?i)<br\s*/?>`)
@@
-	var lines []string
-	for _, current := range strings.Split(desc, "<br>") {
-		for _, line := range strings.Split(current, "\n") {
-			lines = append(lines, line)
-		}
-	}
+	var lines []string
+	for _, chunk := range brPattern.Split(desc, -1) {
+		lines = append(lines, strings.Split(chunk, "\n")...)
+	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@models/rss_feed.go` around lines 72 - 102, In parseDescription, avoid
shadowing the outer pointer variable current and make <br> splitting robust:
replace the loop "for _, current := range strings.Split(desc, \"<br>\")" with a
loop over chunks (e.g., "for _, chunk := range ...") and split desc using a
case-insensitive regex that matches "<br>", "<br/>", "<br />" variants (e.g.,
regexp.MustCompile(`(?i)<br\s*/?>`).Split(desc, -1)), then iterate lines by
splitting each chunk on "\n"; ensure you update references to use the new chunk
variable name and keep assigning the pointer current (TimeSlot pointer) only
when creating a header slot.
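The regex-based split can be checked in isolation with a minimal sketch; splitLines is an illustrative helper name, not a function from the PR.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Case-insensitive pattern matching <br>, <br/>, <br /> and <BR> variants.
var brPattern = regexp.MustCompile(`(?i)<br\s*/?>`)

// splitLines splits a description on any <br> variant, then on newlines,
// mirroring the suggested replacement for strings.Split(desc, "<br>").
func splitLines(desc string) []string {
	var lines []string
	for _, chunk := range brPattern.Split(desc, -1) {
		lines = append(lines, strings.Split(chunk, "\n")...)
	}
	return lines
}

func main() {
	desc := "Morning<br>Song A<BR/>Song B<br />Song C"
	fmt.Println(splitLines(desc)) // [Morning Song A Song B Song C]
}
```

All three break variants produce the same segmentation, which a plain strings.Split on "<br>" would not.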

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: eb3482ee-4b12-43b5-abae-f65f4c39deb7

📥 Commits

Reviewing files that changed from the base of the PR and between e92d7a3 and fc7ef51.

📒 Files selected for processing (1)
  • models/rss_feed.go

Comment thread models/rss_feed.go
Comment on lines +1 to +10
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"net/http"
	"regexp"
	"strings"
)

🛠️ Refactor suggestion | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Confirm whether other files in models/ declare a different package.
fd . models -t f -e go -x head -n 2 {}

Repository: cuappdev/chimes-backend

Length of output: 160


Fix package declaration: package main in models/ directory will not compile.

This file declares package main while sibling files in models/ declare package models. Go requires all files in the same directory to share the same package, so this will cause a compilation error.

Refactor by splitting into:

  • models/rss.go — type definitions and helper functions as package models (reusable by handlers and tests).
  • cmd/rss_scraper/main.go — the main entry point that fetches the feed and invokes the library.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@models/rss_feed.go` around lines 1 - 10, The file currently declares "package
main" which conflicts with other files in the models directory; change the
package declaration to "package models" and refactor by moving only library code
(type definitions and helper functions) into a new models file (e.g., rss.go)
under package models, and create a separate command entrypoint
(cmd/rss_scraper/main.go) with "package main" that imports models and implements
main() to fetch the feed and call the library functions; ensure exported
types/functions keep capitalized names used by the new main and update any
imports accordingly (look for the package declaration and functions that
parse/fetch RSS to split between models and the cmd main).

Comment thread models/rss_feed.go
Comment on lines +52 to +69
func parseSong(line string) Song {
	song := Song{}

	title, artist, found := strings.Cut(line, " / ")
	if found {
		song.Artist = strings.TrimSpace(artist)
	}

	match := originPattern.FindStringSubmatch(title)

	if match != nil {
		song.Source = match[1]
		title = strings.TrimSpace(originPattern.ReplaceAllString(title, ""))
	}
	song.Title = title
	return song

}

⚠️ Potential issue | 🟡 Minor

parseSong can miss the Source when it lives in the artist segment.

strings.Cut(line, " / ") splits off the artist, but originPattern is then applied only to title. If the RSS line is shaped like Song Title / Artist Name (from "Movie") — which is common in the Chimes feed — the (from "...") ends up in artist and will be preserved verbatim in Song.Artist while Song.Source stays empty.

🔧 Suggested fix
 func parseSong(line string) Song {
 	song := Song{}
 
+	// Extract source first so it can't be split across title/artist.
+	if match := originPattern.FindStringSubmatch(line); match != nil {
+		song.Source = match[1]
+		line = strings.TrimSpace(originPattern.ReplaceAllString(line, ""))
+	}
+
 	title, artist, found := strings.Cut(line, " / ")
 	if found {
 		song.Artist = strings.TrimSpace(artist)
 	}
-
-	match := originPattern.FindStringSubmatch(title)
-
-	if match != nil {
-		song.Source = match[1]
-		title = strings.TrimSpace(originPattern.ReplaceAllString(title, ""))
-	}
-	song.Title = title
+	song.Title = strings.TrimSpace(title)
 	return song
-
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@models/rss_feed.go` around lines 52 - 69, parseSong currently only applies
originPattern to the title, so lines like `Title / Artist (from "X")` leave
Song.Source empty and the origin stuck in Song.Artist; update parseSong to check
originPattern against both title and artist: first attempt to extract source
from title (using originPattern.FindStringSubmatch and ReplaceAllString), and if
not found, try the same extraction on artist, removing the matched text from
artist when found; ensure you TrimSpace both title and artist before assigning
Song.Title and Song.Artist and prefer the source found in title over one found
in artist.
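The fixed ordering can be demonstrated with a standalone sketch. The Song type and originPattern regex are assumptions (the review does not show the real regex); the parseSong body follows the suggested fix above.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

type Song struct {
	Title, Artist, Source string
}

// Assumed `(from "...")` pattern; the PR's actual regex may differ.
var originPattern = regexp.MustCompile(`\s*\(from "([^"]+)"\)`)

// parseSong extracts the source from the whole line before splitting,
// so it is found whether it trails the title or the artist segment.
func parseSong(line string) Song {
	song := Song{}
	if match := originPattern.FindStringSubmatch(line); match != nil {
		song.Source = match[1]
		line = strings.TrimSpace(originPattern.ReplaceAllString(line, ""))
	}
	title, artist, found := strings.Cut(line, " / ")
	if found {
		song.Artist = strings.TrimSpace(artist)
	}
	song.Title = strings.TrimSpace(title)
	return song
}

func main() {
	// Source after the artist, the case the original code missed.
	s := parseSong(`Hedwig's Theme / John Williams (from "Harry Potter")`)
	fmt.Printf("%+v\n", s)
	// {Title:Hedwig's Theme Artist:John Williams Source:Harry Potter}
}
```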

Comment thread models/rss_feed.go
Comment on lines +104 to +118
func main() {
	resp, err := http.Get("https://apps.chimes.cornell.edu/music/rss.xml")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	data, err := io.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}

	var rss RSS
	if err := xml.Unmarshal(data, &rss); err != nil {
		panic(err)
	}

⚠️ Potential issue | 🟠 Major

Set an HTTP timeout and check the response status.

http.Get uses http.DefaultClient, which has no timeout — a slow/hung upstream will block this goroutine indefinitely. Also, a non-2xx response (e.g., 500, 404 HTML error page) will flow straight into xml.Unmarshal and panic with a confusing parse error. Finally, panicking on every transport/parse error is too aggressive for a library-shaped function; return errors so callers (including future HTTP handlers/cron jobs) can retry, log, or degrade gracefully.

🔧 Suggested fix
-func main() {
-	resp, err := http.Get("https://apps.chimes.cornell.edu/music/rss.xml")
-	if err != nil {
-		panic(err)
-	}
-	defer resp.Body.Close()
-	data, err := io.ReadAll(resp.Body)
-	if err != nil {
-		panic(err)
-	}
-
-	var rss RSS
-	if err := xml.Unmarshal(data, &rss); err != nil {
-		panic(err)
-	}
+func FetchConcerts(ctx context.Context, url string) ([]ConcertDay, error) {
+	client := &http.Client{Timeout: 15 * time.Second}
+	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
+	if err != nil {
+		return nil, err
+	}
+	resp, err := client.Do(req)
+	if err != nil {
+		return nil, fmt.Errorf("fetch rss: %w", err)
+	}
+	defer resp.Body.Close()
+	if resp.StatusCode/100 != 2 {
+		return nil, fmt.Errorf("rss feed returned status %d", resp.StatusCode)
+	}
+	data, err := io.ReadAll(resp.Body)
+	if err != nil {
+		return nil, fmt.Errorf("read rss body: %w", err)
+	}
+	var rss RSS
+	if err := xml.Unmarshal(data, &rss); err != nil {
+		return nil, fmt.Errorf("parse rss xml: %w", err)
+	}
+	// ... build and return []ConcertDay
+}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@models/rss_feed.go` around lines 104 - 118, The current main function uses
http.Get (which uses http.DefaultClient with no timeout), panics on errors, and
feeds non-2xx responses straight to xml.Unmarshal for the RSS type; change this
to use an http.Client with a reasonable Timeout (e.g., 10s), replace http.Get
with client.Do(req), check resp.StatusCode and return a descriptive error for
any non-2xx status before reading the body, read and unmarshal into the RSS
struct but propagate errors (return error) instead of calling panic so callers
can handle retries/logging; update the function signature (e.g., fetchRSS or
NewRSSFromURL) to return (*RSS, error) and reference the main function, http.Get
usage, resp.StatusCode check, xml.Unmarshal, and RSS type when making these
changes.
