
add rss feed scraping #7

Open
Oliviayuuuu317 wants to merge 1 commit into main from olivia/rss_feed

Conversation


@Oliviayuuuu317 Oliviayuuuu317 commented Apr 22, 2026

Overview

Added rss_feed.go to scrape recent concert information from Chimes' RSS feed (https://apps.chimes.cornell.edu/music/rss.xml)

Changes Made

  • created models for concert days, time slots, and songs
  • added rss feed scraping and parsing into structured models

Test Coverage

  • ran the main method and printed the scraped results

Summary by CodeRabbit

  • New Features
    • Added RSS feed integration to fetch and parse music concert data
    • Songs are organized into morning, afternoon, and evening time slots
    • Concert information is formatted with title, artist, and source details for easy reference


coderabbitai Bot commented Apr 22, 2026

📝 Walkthrough

Walkthrough

A new Go program fetches an RSS feed from Cornell's music API, parses XML items into time-slot grouped structures (Morning, Afternoon, Evening), and extracts song metadata including title, artist, and source from HTML descriptions.

Changes

RSS Feed Parser (models/rss_feed.go): New main program that fetches and parses the RSS XML feed, converts HTML descriptions into time-slot groups (Morning, Afternoon, Evening), extracts song details (title, artist, source), and outputs formatted concert information with HTML stripping and entity normalization.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 A feast of songs from a distant feed,
Parsed line by line with careful speed,
Morning, afternoon, and evening's call,
Organized melodies, grouped for all!
The rabbit hops through XML's delight, 🎵

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Docstring Coverage ⚠️ Warning: Docstring coverage is 50.00%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Title check ✅ Passed: The title 'add rss feed scraping' directly and clearly summarizes the main change: adding RSS feed scraping functionality to the codebase.
Description check ✅ Passed: The description covers the required sections (Overview, Changes Made, Test Coverage) from the template and provides sufficient detail about the RSS feed scraping implementation.
Linked Issues check ✅ Passed: Skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed: Skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (2)
models/rss_feed.go (2)

104-138: No test coverage.

The PR description notes testing was done by running main and eyeballing output. Since parseSong and parseDescription are pure functions operating on strings, they're easy to unit-test with a small fixture of the feed's description XML. This will lock down the (from "..."), " / ", and <br> handling against future RSS format drift.

Want me to draft rss_feed_test.go with table-driven tests for parseSong and parseDescription?

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@models/rss_feed.go` around lines 104 - 138, The PR lacks unit tests for the
pure string-parsing functions; add a new table-driven test file
(rss_feed_test.go) that exercises parseSong and parseDescription with
representative description HTML snippets from the feed (covering "(from
\"...\")" source parsing, " / " artist separators, <br> line breaks, and edge
cases like missing fields), asserting expected Song and TimeSlot outputs; name
tests clearly, include fixtures for multiple item descriptions, and use subtests
to make failures actionable so future RSS format drift is caught by CI.
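The table-driven idea can be sketched as a standalone program. The Song type and the originPattern regex below are assumptions modeled on the parseSong code quoted later in this review; the real definitions in models/rss_feed.go may differ.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Song mirrors the struct implied by the review; field names are assumed.
type Song struct {
	Title, Artist, Source string
}

// originPattern is a guess at the `(from "...")` regex; the PR's actual
// pattern is not shown in this review.
var originPattern = regexp.MustCompile(`\s*\(from "([^"]+)"\)`)

// parseSong reproduces the logic quoted in the comment thread below:
// split off the artist at " / ", then extract the source from the title.
func parseSong(line string) Song {
	song := Song{}
	title, artist, found := strings.Cut(line, " / ")
	if found {
		song.Artist = strings.TrimSpace(artist)
	}
	if match := originPattern.FindStringSubmatch(title); match != nil {
		song.Source = match[1]
		title = strings.TrimSpace(originPattern.ReplaceAllString(title, ""))
	}
	song.Title = title
	return song
}

func main() {
	// Table-driven cases covering source-in-title, plain title/artist,
	// and a line with no artist separator.
	cases := []struct {
		line string
		want Song
	}{
		{`Jupiter (from "The Planets") / Gustav Holst`, Song{"Jupiter", "Gustav Holst", "The Planets"}},
		{"Amazing Grace / Traditional", Song{"Amazing Grace", "Traditional", ""}},
		{"Alma Mater", Song{"Alma Mater", "", ""}},
	}
	for _, c := range cases {
		if got := parseSong(c.line); got != c.want {
			panic(fmt.Sprintf("parseSong(%q) = %+v, want %+v", c.line, got, c.want))
		}
	}
	fmt.Println("all cases pass")
}
```

In a real rss_feed_test.go these cases would live in a table passed to t.Run subtests, so each failing input is reported by name.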

72-102: Shadowed current and fragile <br> splitting.

Two readability/robustness concerns:

  1. The outer var current *TimeSlot (line 74) is immediately shadowed by the loop variable current (line 77) used as a string. It happens to work because the pointer is reassigned on line 96 before any read, but reusing the name makes this code confusing and brittle to future edits. Rename one of them (e.g., chunk for the loop variable).
  2. strings.Split(desc, "<br>") won't match <br/>, <br />, or <BR>, which RSS descriptions frequently contain. A single regex split handles all variants.
🔧 Suggested fix
-var tagPattern = regexp.MustCompile(`<[^>]+>`)
+var tagPattern = regexp.MustCompile(`<[^>]+>`)
+var brPattern  = regexp.MustCompile(`(?i)<br\s*/?>`)
@@
-	var lines []string
-	for _, current := range strings.Split(desc, "<br>") {
-		for _, line := range strings.Split(current, "\n") {
-			lines = append(lines, line)
-		}
-	}
+	var lines []string
+	for _, chunk := range brPattern.Split(desc, -1) {
+		lines = append(lines, strings.Split(chunk, "\n")...)
+	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@models/rss_feed.go` around lines 72 - 102, In parseDescription, avoid
shadowing the outer pointer variable current and make <br> splitting robust:
replace the loop "for _, current := range strings.Split(desc, \"<br>\")" with a
loop over chunks (e.g., "for _, chunk := range ...") and split desc using a
case-insensitive regex that matches "<br>", "<br/>", "<br />" variants (e.g.,
regexp.MustCompile(`(?i)<br\s*/?>`).Split(desc, -1)), then iterate lines by
splitting each chunk on "\n"; ensure you update references to use the new chunk
variable name and keep assigning the pointer current (TimeSlot pointer) only
when creating a header slot.
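The regex-based split can be checked in isolation with a minimal sketch; splitLines is an illustrative helper name, not a function from the PR.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Case-insensitive pattern matching <br>, <br/>, <br /> and <BR> variants.
var brPattern = regexp.MustCompile(`(?i)<br\s*/?>`)

// splitLines splits a description on any <br> variant, then on newlines,
// mirroring the suggested replacement for strings.Split(desc, "<br>").
func splitLines(desc string) []string {
	var lines []string
	for _, chunk := range brPattern.Split(desc, -1) {
		lines = append(lines, strings.Split(chunk, "\n")...)
	}
	return lines
}

func main() {
	desc := "Morning<br>Song A<BR/>Song B<br />Song C"
	fmt.Println(splitLines(desc)) // [Morning Song A Song B Song C]
}
```

All three break variants produce the same segmentation, which a plain strings.Split on "<br>" would not.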

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: eb3482ee-4b12-43b5-abae-f65f4c39deb7

📥 Commits

Reviewing files that changed from the base of the PR and between e92d7a3 and fc7ef51.

📒 Files selected for processing (1)
  • models/rss_feed.go

Comment thread models/rss_feed.go
Comment on lines +1 to +10
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"net/http"
	"regexp"
	"strings"
)

🛠️ Refactor suggestion | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Confirm whether other files in models/ declare a different package.
fd . models -t f -e go -x head -n 2 {}

Repository: cuappdev/chimes-backend

Length of output: 160


Fix package declaration: package main in models/ directory will not compile.

This file declares package main while sibling files in models/ declare package models. Go requires all files in the same directory to share the same package, so this will cause a compilation error.

Refactor by splitting into:

  • models/rss.go — type definitions and helper functions as package models (reusable by handlers and tests).
  • cmd/rss_scraper/main.go — the main entry point that fetches the feed and invokes the library.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@models/rss_feed.go` around lines 1 - 10, The file currently declares "package
main" which conflicts with other files in the models directory; change the
package declaration to "package models" and refactor by moving only library code
(type definitions and helper functions) into a new models file (e.g., rss.go)
under package models, and create a separate command entrypoint
(cmd/rss_scraper/main.go) with "package main" that imports models and implements
main() to fetch the feed and call the library functions; ensure exported
types/functions keep capitalized names used by the new main and update any
imports accordingly (look for the package declaration and functions that
parse/fetch RSS to split between models and the cmd main).

Comment thread models/rss_feed.go
Comment on lines +52 to +69
func parseSong(line string) Song {
	song := Song{}

	title, artist, found := strings.Cut(line, " / ")
	if found {
		song.Artist = strings.TrimSpace(artist)
	}

	match := originPattern.FindStringSubmatch(title)

	if match != nil {
		song.Source = match[1]
		title = strings.TrimSpace(originPattern.ReplaceAllString(title, ""))
	}
	song.Title = title
	return song

}

⚠️ Potential issue | 🟡 Minor

parseSong can miss the Source when it lives in the artist segment.

strings.Cut(line, " / ") splits off the artist, but originPattern is then applied only to title. If the RSS line is shaped like Song Title / Artist Name (from "Movie") — which is common in the Chimes feed — the (from "...") ends up in artist and will be preserved verbatim in Song.Artist while Song.Source stays empty.

🔧 Suggested fix
 func parseSong(line string) Song {
 	song := Song{}
 
+	// Extract source first so it can't be split across title/artist.
+	if match := originPattern.FindStringSubmatch(line); match != nil {
+		song.Source = match[1]
+		line = strings.TrimSpace(originPattern.ReplaceAllString(line, ""))
+	}
+
 	title, artist, found := strings.Cut(line, " / ")
 	if found {
 		song.Artist = strings.TrimSpace(artist)
 	}
-
-	match := originPattern.FindStringSubmatch(title)
-
-	if match != nil {
-		song.Source = match[1]
-		title = strings.TrimSpace(originPattern.ReplaceAllString(title, ""))
-	}
-	song.Title = title
+	song.Title = strings.TrimSpace(title)
 	return song
-
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@models/rss_feed.go` around lines 52 - 69, parseSong currently only applies
originPattern to the title, so lines like `Title / Artist (from "X")` leave
Song.Source empty and the origin stuck in Song.Artist; update parseSong to check
originPattern against both title and artist: first attempt to extract source
from title (using originPattern.FindStringSubmatch and ReplaceAllString), and if
not found, try the same extraction on artist, removing the matched text from
artist when found; ensure you TrimSpace both title and artist before assigning
Song.Title and Song.Artist and prefer the source found in title over one found
in artist.
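The fixed ordering can be demonstrated with a standalone sketch. The Song type and originPattern regex are assumptions (the review does not show the real regex); the parseSong body follows the suggested fix above.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

type Song struct {
	Title, Artist, Source string
}

// Assumed `(from "...")` pattern; the PR's actual regex may differ.
var originPattern = regexp.MustCompile(`\s*\(from "([^"]+)"\)`)

// parseSong extracts the source from the whole line before splitting,
// so it is found whether it trails the title or the artist segment.
func parseSong(line string) Song {
	song := Song{}
	if match := originPattern.FindStringSubmatch(line); match != nil {
		song.Source = match[1]
		line = strings.TrimSpace(originPattern.ReplaceAllString(line, ""))
	}
	title, artist, found := strings.Cut(line, " / ")
	if found {
		song.Artist = strings.TrimSpace(artist)
	}
	song.Title = strings.TrimSpace(title)
	return song
}

func main() {
	// Source after the artist, the case the original code missed.
	s := parseSong(`Hedwig's Theme / John Williams (from "Harry Potter")`)
	fmt.Printf("%+v\n", s)
	// {Title:Hedwig's Theme Artist:John Williams Source:Harry Potter}
}
```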

Comment thread models/rss_feed.go
Comment on lines +104 to +118
func main() {
	resp, err := http.Get("https://apps.chimes.cornell.edu/music/rss.xml")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	data, err := io.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}

	var rss RSS
	if err := xml.Unmarshal(data, &rss); err != nil {
		panic(err)
	}

⚠️ Potential issue | 🟠 Major

Set an HTTP timeout and check the response status.

http.Get uses http.DefaultClient, which has no timeout — a slow/hung upstream will block this goroutine indefinitely. Also, a non-2xx response (e.g., 500, 404 HTML error page) will flow straight into xml.Unmarshal and panic with a confusing parse error. Finally, panicking on every transport/parse error is too aggressive for a library-shaped function; return errors so callers (including future HTTP handlers/cron jobs) can retry, log, or degrade gracefully.

🔧 Suggested fix
-func main() {
-	resp, err := http.Get("https://apps.chimes.cornell.edu/music/rss.xml")
-	if err != nil {
-		panic(err)
-	}
-	defer resp.Body.Close()
-	data, err := io.ReadAll(resp.Body)
-	if err != nil {
-		panic(err)
-	}
-
-	var rss RSS
-	if err := xml.Unmarshal(data, &rss); err != nil {
-		panic(err)
-	}
+func FetchConcerts(ctx context.Context, url string) ([]ConcertDay, error) {
+	client := &http.Client{Timeout: 15 * time.Second}
+	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
+	if err != nil {
+		return nil, err
+	}
+	resp, err := client.Do(req)
+	if err != nil {
+		return nil, fmt.Errorf("fetch rss: %w", err)
+	}
+	defer resp.Body.Close()
+	if resp.StatusCode/100 != 2 {
+		return nil, fmt.Errorf("rss feed returned status %d", resp.StatusCode)
+	}
+	data, err := io.ReadAll(resp.Body)
+	if err != nil {
+		return nil, fmt.Errorf("read rss body: %w", err)
+	}
+	var rss RSS
+	if err := xml.Unmarshal(data, &rss); err != nil {
+		return nil, fmt.Errorf("parse rss xml: %w", err)
+	}
+	// ... build and return []ConcertDay
+}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@models/rss_feed.go` around lines 104 - 118, The current main function uses
http.Get (which uses http.DefaultClient with no timeout), panics on errors, and
feeds non-2xx responses straight to xml.Unmarshal for the RSS type; change this
to use an http.Client with a reasonable Timeout (e.g., 10s), replace http.Get
with client.Do(req), check resp.StatusCode and return a descriptive error for
any non-2xx status before reading the body, read and unmarshal into the RSS
struct but propagate errors (return error) instead of calling panic so callers
can handle retries/logging; update the function signature (e.g., fetchRSS or
NewRSSFromURL) to return (*RSS, error) and reference the main function, http.Get
usage, resp.StatusCode check, xml.Unmarshal, and RSS type when making these
changes.
