How to structure multiple parsers? #227
Replies: 1 comment
Just following up with the solution I'm currently using:

```elixir
defmodule OregonLaws.Spider do
  # Module wrapper shown so the snippet compiles on its own; the module
  # name is illustrative.
  use Crawly.Spider

  require Logger

  @ors_home_page "https://www.oregonlegislature.gov/bills_laws/Pages/ORS.aspx"
  @chapter_root "https://www.oregonlegislature.gov/bills_laws/ors/ors"
  @anno_root "https://www.oregonlegislature.gov/bills_laws/ors/ano"

  @impl Crawly.Spider
  def base_url(), do: "https://www.oregonlegislature.gov/"

  @impl Crawly.Spider
  def init() do
    [start_urls: [@ors_home_page]]
  end

  # Dispatch on the request URL: the ORS home page, chapter files, and
  # annotation files are each handled by a dedicated parser module.
  @impl Crawly.Spider
  def parse_item(%{request_url: @ors_home_page} = response) do
    Logger.info("Parsing #{response.request_url}...")
    Parser.parse_home_page(response)
  end

  def parse_item(%{request_url: @chapter_root <> _} = response) do
    Logger.info("Parsing #{response.request_url}...")
    ChapterFile.parse(response)
  end

  def parse_item(%{request_url: @anno_root <> _} = response) do
    Logger.info("Parsing #{response.request_url}...")
    AnnotationFile.parse(response)
  end
end
```
I've read the docs, but I'm still a little unsure about how to structure multiple parsers. Say I have two page types: a home page and item pages. My non-framework way to handle this is:
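Roughly this shape (a condensed sketch; the full spider is in the follow-up above, and the `@...` module attributes are the URL-prefix strings defined there):

```elixir
def parse_item(%{request_url: @ors_home_page} = response),
  do: Parser.parse_home_page(response)

def parse_item(%{request_url: @chapter_root <> _} = response),
  do: ChapterFile.parse(response)

def parse_item(%{request_url: @anno_root <> _} = response),
  do: AnnotationFile.parse(response)
```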
So I'm using plain Elixir pattern matching on response properties to choose a parser. What would this code look like if implemented with Response Parsers? Could someone expand on the expected return type? (Maybe Crawly would benefit from a ResponseParser behaviour?)
So it must return a tuple, and the first item must be a ParsedItem? How should a Response Parser choose not to process a Response? I'm guessing by just returning an empty ParsedItem? And will the framework call each Response Parser in turn?