This project explores Airbnb listings in Amsterdam using a dataset sourced from Inside Airbnb, capturing data as of December 6th, 2018. The objective is to identify trends, patterns, and insights into the Airbnb market, including pricing, availability, and customer reviews.
The dataset comprises approximately 20,000 listings and includes the following files:
- `calendar.csv`: Daily availability and pricing for each listing (365 days ahead).
- `listings.csv`: Key variables for visualizations, including price, location, and host details.
- `listings_details.csv`: 80 additional variables describing listings.
- `neighbourhoods.csv`: Names of Amsterdam neighborhoods.
- `neighbourhoods.geojson`: Geo-spatial data for mapping neighborhoods.
- `reviews.csv`: Review counts and related information.
- `reviews_details.csv`: Full details of customer reviews, suitable for text mining.
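Loading the raw files is mostly a matter of `pandas.read_csv`, but price columns need converting before any analysis. The sketch below assumes `calendar.csv` has the columns `listing_id`, `date`, `available` (stored as `t`/`f`) and `price` (strings such as `$1,200.00`), which is the usual Inside Airbnb layout; check the headers of your copy first. The helpers `parse_price` and `load_calendar` are hypothetical, not taken from the project code.

```python
import io

import pandas as pd


def parse_price(prices: pd.Series) -> pd.Series:
    """Turn strings such as "$1,200.00" into floats; blanks become NaN."""
    # Strip the currency symbol and thousands separators, then coerce to numbers.
    cleaned = prices.astype(str).str.replace(r"[$,]", "", regex=True)
    return pd.to_numeric(cleaned, errors="coerce")


def load_calendar(source) -> pd.DataFrame:
    """Read calendar.csv from a path or file-like object and normalise key columns."""
    cal = pd.read_csv(source, parse_dates=["date"])
    cal["available"] = cal["available"].eq("t")  # assumed 't'/'f' flags -> booleans
    cal["price"] = parse_price(cal["price"])
    return cal


# Tiny made-up sample in the assumed layout; use load_calendar("calendar.csv") on the real file.
sample = io.StringIO(
    "listing_id,date,available,price\n"
    "1,2018-12-06,f,\n"
    '1,2018-12-07,t,"$1,200.00"\n'
)
calendar = load_calendar(sample)
```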
- How does the price of listings vary across different neighborhoods in Amsterdam?
- Is there a relationship between room type (e.g., entire home, private room) and price?
- What is the relationship between reviews per month and listing price?
- How does the number of a host’s listings affect reviews and pricing?
- Which neighborhoods have the highest-priced listings?
- Can sentiment analysis of reviews provide insights into guest experiences?
- Does availability correlate with price and review count?
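Most of these questions reduce to grouped summaries and correlations on `listings.csv`. A minimal sketch, assuming the columns `neighbourhood`, `room_type`, `price`, `reviews_per_month`, `availability_365` and `calculated_host_listings_count` exist and `price` is already numeric. Medians are used instead of means because a few very expensive listings can dominate an average, and Spearman rank correlation (computed as Pearson correlation on ranks, so SciPy is not required) is our choice for skewed, monotonic relationships, not a method confirmed by the project. The function names are hypothetical.

```python
import pandas as pd


def median_price_by(listings: pd.DataFrame, column: str) -> pd.Series:
    """Median nightly price per category (e.g. neighbourhood), highest first."""
    return listings.groupby(column)["price"].median().sort_values(ascending=False)


def spearman_with_price(listings: pd.DataFrame, columns: list) -> pd.Series:
    """Spearman rank correlation between price and each column in `columns`."""
    result = {}
    for col in columns:
        pair = listings[[col, "price"]].dropna()  # keep rows where both values exist
        ranks = pair.rank()  # average ranks for ties
        # Pearson correlation of the ranks is Spearman's rho.
        result[col] = ranks[col].corr(ranks["price"])
    return pd.Series(result, dtype=float)
```

For example, `median_price_by(listings, "room_type")` addresses the room-type question, and `spearman_with_price(listings, ["reviews_per_month", "availability_365", "calculated_host_listings_count"])` returns one coefficient per variable.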
- Handled missing values in key columns (`name`, `reviews_per_month`, etc.) by replacing them with placeholders like `unknown`.
- Removed unrealistic outliers (e.g., minimum nights > 1000).
- Removed duplicate entries and unnecessary columns.
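The cleaning steps above might look like this in pandas. This is a sketch under assumptions: the column names `name` and `minimum_nights` and the 1000-night cut-off come from the text, while `clean_listings` is a hypothetical helper. It fills only the text column `name`; putting a string placeholder into a numeric column such as `reviews_per_month` would turn it into an object column, so that column is left as NaN in this sketch.

```python
import pandas as pd


def clean_listings(listings: pd.DataFrame, max_min_nights: int = 1000) -> pd.DataFrame:
    """Drop duplicates, fill missing names, and remove implausible minimum-night values."""
    df = listings.drop_duplicates().copy()
    df["name"] = df["name"].fillna("unknown")  # placeholder for missing listing names
    df = df[df["minimum_nights"] <= max_min_nights]  # e.g. drop minimum_nights > 1000
    return df.reset_index(drop=True)
```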
To comply with regulations on personally identifiable information (PII), 41 sensitive columns were removed from `listings_details.csv`, including:
- Host details (`host_name`, `host_location`, `host_response_rate`, etc.)
- Internal metadata (`listing_url`, `calendar_last_scraped`, etc.)
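Dropping sensitive columns can be done with `DataFrame.drop(columns=..., errors="ignore")`, which skips names that are absent from a given file. The column tuple below is hypothetical and partial: it contains only the five columns named above, not the full set of 41, and `drop_sensitive_columns` is an illustrative helper name.

```python
import pandas as pd

# Hypothetical, partial list: only the columns named in this README,
# not the complete set of 41 removed columns.
PII_COLUMNS = (
    "host_name",
    "host_location",
    "host_response_rate",
    "listing_url",
    "calendar_last_scraped",
)


def drop_sensitive_columns(details: pd.DataFrame, columns=PII_COLUMNS) -> pd.DataFrame:
    """Remove the given columns if present; absent names are ignored."""
    return details.drop(columns=list(columns), errors="ignore")
```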
- The `comments` column in `reviews_details.csv` had 530 missing values, which were replaced with `Unknown`.
- The dataset captures a snapshot from December 6th, 2018, which may not reflect current trends, especially post-COVID.
- Certain variables (e.g., host details) were removed to protect privacy, which limits analysis of host-specific behavior.
- Python: For data wrangling, analysis, and visualization.
- Libraries: Pandas, NumPy, Matplotlib, Seaborn, Geopandas, Scikit-learn, NLTK.
- Tools: Jupyter Notebook for interactive exploration.
- Pricing trends across neighborhoods.
- Influence of room type on customer preferences and pricing.
- Correlation between reviews, pricing, and availability.
- Host strategies for pricing and customer engagement.
- Identified outliers in minimum nights and pricing data.
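The README does not say how these outliers were identified. One common option, shown here as an assumption rather than the project's method, is Tukey's fences: flag values more than `k` interquartile ranges below the first quartile or above the third. The helper `iqr_outliers` is hypothetical.

```python
import pandas as pd


def iqr_outliers(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Boolean mask for values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR].

    NaN values compare as False, so they are never flagged.
    """
    q1, q3 = values.quantile([0.25, 0.75])
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)
```

Applied to `minimum_nights` or `price`, the mask can be inspected with `listings[iqr_outliers(listings["price"])]` before deciding whether to drop anything. Because prices are strongly right-skewed, the upper fence may also flag legitimate high-end listings.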
- Privacy: Anonymized all personally identifiable information.
- Bias Awareness: Recognized potential biases in reviews and listing data.
- Transparency: Limitations of the dataset were explicitly stated.
- Clone the repository:
git clone https://github.com/your-username/Amsterdam-Airbnb-Analysis.git