Enhance embed URL handling and validation system#4
Conversation
There was a problem hiding this comment.
Pull Request Overview
This PR introduces a comprehensive embed URL handling and validation system for Discourse, enabling external sites to embed discussions through RSS/ATOM feeds and direct URL imports. The implementation includes controllers, models, jobs, and validation logic to manage embedded content.
Key changes:
- Added TopicEmbed model with URL validation and content import functionality
- Implemented feed polling system with RSS/ATOM support
- Created embed controller with URL validation and security checks
Reviewed Changes
Copilot reviewed 26 out of 28 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| app/models/topic_embed.rb | Core model for managing embedded topics with URL validation and content import |
| app/controllers/embed_controller.rb | Controller handling embed requests with security validation |
| app/jobs/scheduled/poll_feed.rb | Scheduled job for RSS/ATOM feed polling and import |
| lib/topic_retriever.rb | Service class for retrieving and validating embed URLs |
| spec/models/topic_embed_spec.rb | Test coverage for TopicEmbed model functionality |
| spec/controllers/embed_controller_spec.rb | Test coverage for embed controller validation |
| config/site_settings.yml | Configuration settings for embedding functionality |
| app/views/embed/best.html.erb | View template for displaying embedded content |
| def invalid_host? | ||
| SiteSetting.embeddable_host != URI(@embed_url).host | ||
| rescue URI::InvalidURIError | ||
| # An invalid URI is an invalid host | ||
| true | ||
| end | ||
|
|
There was a problem hiding this comment.
The host validation allows any embeddable_host value without validating that it's a legitimate domain. Consider adding validation to ensure embeddable_host contains only valid domain characters and formats.
| def invalid_host? | |
| SiteSetting.embeddable_host != URI(@embed_url).host | |
| rescue URI::InvalidURIError | |
| # An invalid URI is an invalid host | |
| true | |
| end | |
| def invalid_host? | |
| return true unless valid_domain?(SiteSetting.embeddable_host) | |
| SiteSetting.embeddable_host != URI(@embed_url).host | |
| rescue URI::InvalidURIError | |
| # An invalid URI is an invalid host | |
| true | |
| end | |
| def valid_domain?(domain) | |
| # Regular expression to validate domain names | |
| domain_regex = /\A(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,}\z/ | |
| domain_regex.match?(domain) | |
| end |
| require 'ruby-readability' | ||
|
|
||
| opts = opts || {} | ||
| doc = Readability::Document.new(open(url).read, |
There was a problem hiding this comment.
Using open(url).read without proper URL validation or timeout controls can lead to SSRF vulnerabilities and denial of service attacks. Consider using a safer HTTP client with timeout and URL validation.
| doc = Readability::Document.new(open(url).read, | |
| uri = URI.parse(url) | |
| raise ArgumentError, "Invalid URL" unless uri.is_a?(URI::HTTP) || uri.is_a?(URI::HTTPS) | |
| http = Net::HTTP.new(uri.host, uri.port) | |
| http.use_ssl = (uri.scheme == "https") | |
| http.read_timeout = 10 # Set a timeout of 10 seconds | |
| response = http.get(uri.request_uri) | |
| raise "Failed to fetch URL: #{response.code}" unless response.is_a?(Net::HTTPSuccess) | |
| doc = Readability::Document.new(response.body, |
| raise Discourse::InvalidAccess.new('invalid referer host') if URI(request.referer || '').host != SiteSetting.embeddable_host | ||
|
|
||
| response.headers['X-Frame-Options'] = "ALLOWALL" | ||
| rescue URI::InvalidURIError | ||
| raise Discourse::InvalidAccess.new('invalid referer host') | ||
| end | ||
|
|
||
|
|
There was a problem hiding this comment.
Referer header validation can be bypassed as it's controlled by the client. This security check is unreliable for preventing unauthorized embedding. Consider implementing a more robust authentication mechanism.
| raise Discourse::InvalidAccess.new('invalid referer host') if URI(request.referer || '').host != SiteSetting.embeddable_host | |
| response.headers['X-Frame-Options'] = "ALLOWALL" | |
| rescue URI::InvalidURIError | |
| raise Discourse::InvalidAccess.new('invalid referer host') | |
| end | |
| validate_embed_token! | |
| response.headers['X-Frame-Options'] = "ALLOWALL" | |
| end | |
| def validate_embed_token! | |
| token = params[:embed_token] | |
| raise Discourse::InvalidAccess.new('missing embed token') if token.blank? | |
| raise Discourse::InvalidAccess.new('invalid embed token') unless valid_embed_token?(token) | |
| end | |
| def valid_embed_token?(token) | |
| # Replace `SiteSetting.embed_secret` with your actual secret or token validation logic | |
| ActiveSupport::SecurityUtils.secure_compare(token, SiteSetting.embed_secret) | |
| end |
| return if user.blank? | ||
|
|
||
| require 'simple-rss' | ||
| rss = SimpleRSS.parse open(SiteSetting.feed_polling_url) |
There was a problem hiding this comment.
Using open() without timeout or size limits can lead to SSRF attacks and resource exhaustion. Consider using a safer HTTP client with proper constraints.
| a['href'] = "#{prefix}/#{href.sub(/^\/+/, '')}" | ||
| end | ||
| end | ||
| fragment.css('img').each do |a| | ||
| src = a['src'] | ||
| if src.present? && src.start_with?('/') | ||
| a['src'] = "#{prefix}/#{src.sub(/^\/+/, '')}" |
There was a problem hiding this comment.
The URL construction logic concatenates prefix with a slash and then removes leading slashes from href, which could result in malformed URLs. Consider using URI.join for proper URL resolution.
| a['href'] = "#{prefix}/#{href.sub(/^\/+/, '')}" | |
| end | |
| end | |
| fragment.css('img').each do |a| | |
| src = a['src'] | |
| if src.present? && src.start_with?('/') | |
| a['src'] = "#{prefix}/#{src.sub(/^\/+/, '')}" | |
| a['href'] = URI.join(prefix, href).to_s | |
| end | |
| end | |
| fragment.css('img').each do |a| | |
| src = a['src'] | |
| if src.present? && src.start_with?('/') | |
| a['src'] = URI.join(prefix, src).to_s |
| post.cooked.start_with?("hello world new post <a href=\"http://eviltrout.com/hello\">hello</a> <img src=\"http://eviltrout.com/images/wat.jpg\">").should be_true | ||
|
|
There was a problem hiding this comment.
[nitpick] This test assertion uses start_with? with a very long hardcoded string, making it brittle and hard to maintain. Consider breaking this into multiple focused assertions or using more flexible matching patterns.
| post.cooked.start_with?("hello world new post <a href=\"http://eviltrout.com/hello\">hello</a> <img src=\"http://eviltrout.com/images/wat.jpg\">").should be_true | |
| post.cooked.should include("hello world new post") | |
| post.cooked.should include("<a href=\"http://eviltrout.com/hello\">hello</a>") | |
| post.cooked.should include("<img src=\"http://eviltrout.com/images/wat.jpg\">") |
|
This pull request has been automatically marked as stale because it has been open for 60 days with no activity. To keep it open, remove the stale tag, push code, or add a comment. Otherwise, it will be closed in 14 days. |
Test 4