Code challenge submission #378

Open
6in4 wants to merge 5 commits into serpapi:master from 6in4:ruby-solution

Conversation

6in4 commented May 20, 2026

I used Nokogiri for HTML parsing and RSpec for testing.
Run `bundle exec rspec` for tests.
Run `ruby lib/page.rb <html file>` to output JSON.
I've added three additional HTML files: Michelangelo sculptures, Picasso paintings, and Monet paintings.

Key assumptions

  • The scrape method is a single public entrypoint by design
    • Adding a second block type would mean adding a private scrape_ method and merging the output there
    • The lazy-load map is built eagerly in the constructor — a known trade-off if block types diverge in how they load images
  • This only scrapes artwork carousels (kc:/visual_art/visual_artist:works) - the scope limitation is intentional
    • expected-array.json contains an object of the form { "artworks": [] }
    • Other types, like discography, filmography, and books, all internally use a grid (wp-grid-{view,tile})
      • Grids and carousels may look similar, but are semantically different, so I made the call to restrict it to artwork
  • Errors are handled higher up in the call stack (in other words, the caller handles observability)
    • The code here will raise at the slightest error - my intent is: "how do we know if the SERP structure changed?"
    • That's the sole reason the ScraperError class exists: to raise a specific error that states why it was raised (it could be split into further subclasses, but I haven't done so for this exercise)
    • This also implies that observability instrumentation is already in place elsewhere
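
For illustration, the shape described in the assumptions above might look roughly like the sketch below. This is not the code in this PR: `PageSketch`, `build_lazy_image_map`, `scrape_artworks`, and the input keys are hypothetical names, and the input is a plain Hash rather than a Nokogiri document so the example runs without any gems. It also assumes the lazy-load map goes from an image id to its deferred image source.

```ruby
require "json"

# Raised whenever the input does not look the way the scraper expects.
class ScraperError < StandardError; end

# Hypothetical stand-in for the page class. `doc` is a plain Hash here,
# not a parsed HTML document.
class PageSketch
  def initialize(doc)
    @doc = doc
    # Built eagerly: every block type pays for this map, even one that
    # would never need it. That is the trade-off mentioned above.
    @lazy_images = build_lazy_image_map
  end

  # Single public entrypoint. A second block type would add a private
  # scrape_ method and merge its key into this Hash.
  def scrape
    { "artworks" => scrape_artworks }
  end

  private

  def build_lazy_image_map
    @doc.fetch("lazy_images") do
      raise ScraperError, "lazy-load image data not found; structure may have changed"
    end
  end

  def scrape_artworks
    items = @doc["carousel"]
    raise ScraperError, "artwork carousel not found" if items.nil?

    items.map do |item|
      name = item.fetch("name") do
        raise ScraperError, "carousel item has no name: #{item.inspect}"
      end
      # Assumption: an item without a deferred image gets a nil image.
      { "name" => name, "image" => @lazy_images[item["image_id"]] }
    end
  end
end

doc = {
  "lazy_images" => { "img1" => "data:image/jpeg;base64,AAAA" },
  "carousel" => [{ "name" => "Example Artwork", "image_id" => "img1" }]
}
puts JSON.generate(PageSketch.new(doc).scrape)
```

Every missing piece raises a ScraperError with its own message, so a changed page structure surfaces as a named failure instead of an empty result.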

In short, my scraping philosophy is: fail fast, fail loudly, and be observable.
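
As a rough sketch of what "the caller handles observability" could mean in practice, the caller might wrap the scrape, record a structured event, and re-raise. The helper `scrape_with_reporting`, the event fields, and the file name are hypothetical and not part of this PR; the log target is an injected IO object (standard error by default), standing in for whatever instrumentation already exists.

```ruby
require "json"

class ScraperError < StandardError; end

# Hypothetical caller-side wrapper: runs the block, reports a ScraperError
# as one JSON line on `log`, then re-raises so the failure stays loud.
def scrape_with_reporting(source, log: $stderr)
  yield
rescue ScraperError => e
  log.puts(JSON.generate(event: "scrape_failed", source: source, error: e.message))
  raise
end

begin
  scrape_with_reporting("example.html") do
    # Stand-in for a real scrape that hits an unexpected page structure.
    raise ScraperError, "artwork carousel not found"
  end
rescue ScraperError => e
  puts "scrape aborted: #{e.message}"
end
```

The wrapper rescues only ScraperError, so any other exception passes through untouched and is not mislabeled as a structure change.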

Full transparency: this was my first time writing Ruby from scratch, and while I tried to write idiomatic Ruby, there may be some instances where the code is non-idiomatic. This branch contains cleaner history, but my full workflow is in the prototype branch (browser console -> Cheerio -> Nokogiri).


2 participants