- Clone: `git clone https://github.com/iAbdullah/bootcamp_data.git`
- Setup: `pip install pandas numpy plotly nbformat`
- Open: `jupyter notebook notebooks/eda.ipynb`
This repository contains a comprehensive Exploratory Data Analysis (EDA) of e-commerce order data, focusing on regional performance and statistical significance.
- Data Cleaning: Processed over 5,000 orders, including date formatting and outlier management (Winsorization).
- Revenue Insights: Identified UAE and Kuwait as the primary revenue drivers.
- Trend Analysis: Analyzed monthly seasonality to understand peak sales periods.
- Statistical Testing: Performed a bootstrap comparison of refund rates between Saudi Arabia and the UAE.
- `notebooks/eda.ipynb`: The main analysis notebook with code and visualizations.
- `reports/figures/`: Exported charts (bar charts, histograms, line trends).
- `data/`: Processed datasets used for the analysis.
- Python (Pandas, NumPy)
- Plotly (Interactive Visualizations)
- Git/GitHub (Version Control)
Created as part of the AI Pros Bootcamp.
We compare the total revenue, number of orders, and Average Order Value (AOV) across different countries to identify the top-performing markets.
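The per-country comparison described above can be sketched with a single pandas `groupby`. This is a minimal illustration on toy data, not the notebook's actual code: `amount_winsor` is the column named in this README, while the `country` column name and all the values are assumptions made for the example.

```python
import pandas as pd

# Toy stand-in for the orders table. Column name "country" and all
# values are assumptions for illustration only.
orders = pd.DataFrame({
    "country": ["AE", "AE", "KW", "SA", "KW", "AE"],
    "amount_winsor": [120.0, 80.0, 150.0, 60.0, 90.0, 100.0],
})

# One row per country: total revenue, number of orders, and
# Average Order Value (mean winsorized amount per order).
by_country = (
    orders.groupby("country")["amount_winsor"]
    .agg(total_revenue="sum", n_orders="count", aov="mean")
    .sort_values("total_revenue", ascending=False)
)

print(by_country)
```

Sorting by `total_revenue` puts the leading market first, and keeping `n_orders` and `aov` side by side shows whether a country's revenue comes from order volume or from larger baskets.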
After auditing the data (5,250 rows) and performing exploratory data analysis (EDA), we found the following:
- Market Leadership: The UAE (AE) and Kuwait (KW) are our strongest markets, contributing the highest total revenue.
- Growth Trends: The monthly revenue analysis shows a steady trend with specific peaks, suggesting that our promotional periods are effective.
- Customer Behavior: Most orders are within a consistent "typical" range as shown in the distribution, allowing for predictable inventory planning.
- Data Reliability: By using Winsorization, we have ensured that our insights are not distorted by extreme outliers, providing a more realistic view of the business performance.
Next Steps:
- Investigate the factors driving high AOV in specific countries.
- Align marketing campaigns with the peak months identified in the trend analysis.
- Total Revenue by Country: Which country generates the highest total revenue (using amount_winsor)?
- Order Volume over Time: What is the monthly trend of order counts, and are there specific peak months?
- Average Order Value Comparison: Is there a significant difference in the average transaction amount between the top two countries?
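The monthly-trend question above can be approached roughly as follows. This is a sketch under assumptions: the timestamp column name `created_at` and the toy rows are hypothetical, and only `amount_winsor` comes from this README.

```python
import pandas as pd

# Toy orders. "created_at" is a hypothetical column name and the
# rows are made up for illustration.
orders = pd.DataFrame({
    "created_at": pd.to_datetime([
        "2024-01-05", "2024-01-20", "2024-02-11",
        "2024-03-02", "2024-03-15", "2024-03-28",
    ]),
    "amount_winsor": [100.0, 80.0, 120.0, 90.0, 110.0, 70.0],
})

# Bucket each order into its calendar month, then count orders and
# sum winsorized revenue per month.
monthly = (
    orders.assign(month=orders["created_at"].dt.to_period("M"))
    .groupby("month")
    .agg(n_orders=("amount_winsor", "count"), revenue=("amount_winsor", "sum"))
)

# The month with the most orders is a candidate "peak" month.
peak_month = monthly["n_orders"].idxmax()
print(monthly)
print("Peak month by order count:", peak_month)
```

Grouping on a monthly `Period` rather than on a formatted string keeps the months in chronological order, which matters when the result feeds a line chart.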
## Conclusion: Key Business Insights
Based on the Exploratory Data Analysis (EDA) of the orders data, we conclude the following:
- Market Leadership: The UAE (AE) is our primary revenue driver, followed by Kuwait (KW). Marketing efforts should prioritize maintaining this lead while investigating growth opportunities in other GCC markets.
- Temporal Trends: Monthly revenue shows clear seasonality. The peaks identified in the trend line suggest that current promotional strategies are effective during those periods.
- Order Consistency: The distribution of order amounts is right-skewed, indicating that most orders are smaller "typical" transactions, with only a few much larger ones in the tail.
- Data Reliability: By using Winsorization, we have ensured that our insights are robust and not skewed by extreme outliers, providing a realistic view for decision-making.
Final Recommendation: Maintain focus on the UAE market while exploring why certain months show lower performance to stabilize revenue year-round.
In this section, we analyze the distribution of transaction values using a histogram to understand what a "typical" order looks like and identify any remaining skewness.

- Typical Order: Most orders fall within the range of 50 to 150 (depending on your data), which represents the core customer segment.
- Skewness: The distribution remains right-skewed, indicating that while we have many small orders, there is a long tail of higher-value transactions.
- Outlier Management: By using the winsorized amount, we have successfully managed extreme values, making the chart easier to read and more representative of the business.
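For readers unfamiliar with Winsorization, the idea is to cap extreme values at chosen quantiles instead of dropping them. The sketch below uses NumPy only. The function name `winsorize`, the quantile cutoffs, and the sample amounts are assumptions for illustration, since this README does not state which thresholds the notebook used.

```python
import numpy as np

def winsorize(values, lower=0.01, upper=0.99):
    """Clip values to the [lower, upper] quantile range.

    The default cutoffs are illustrative assumptions, not the
    thresholds used in the notebook.
    """
    arr = np.asarray(values, dtype=float)
    lo, hi = np.quantile(arr, [lower, upper])
    # Values below lo become lo, values above hi become hi;
    # everything in between is left unchanged.
    return np.clip(arr, lo, hi)

# Made-up order amounts with one extreme "whale" order.
amounts = np.array([40, 55, 60, 75, 80, 95, 110, 120, 140, 5000], dtype=float)
amount_winsor = winsorize(amounts, lower=0.05, upper=0.95)

print("max before:", amounts.max())
print("max after: ", amount_winsor.max())
```

Every order stays in the dataset, so order counts are unaffected; only the size of the most extreme values is reduced.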
In this section, we analyze which countries generate the most revenue using the winsorized amount to ensure outliers do not distort the rankings.
- Market Leader: The UAE (AE) stands out as the highest revenue-generating market, significantly leading other regions.
- Secondary Markets: Kuwait (KW) and Saudi Arabia (SA) show strong performance, while Qatar (QA) remains a growing market.
- Revenue Stability: Since we used `amount_winsor`, these results are reliable and not driven by single massive "whale" orders.
Caveat: This analysis only looks at total revenue; it does not account for the operational costs or marketing spend in each country.
- The Result: The difference in refund rates is about 2 percentage points.
- Is it real? We can't say so, because the bootstrap interval includes zero (it goes from -0.017 to 0.059).
- Final Word: This means the difference between SA and AE could easily be down to luck/chance. Practically, we have no evidence that their refund rates differ.
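A bootstrap comparison like the one summarized above can be sketched as follows. This is an illustration under assumptions, not the notebook's code: the function name `bootstrap_diff_ci`, the percentile-interval method, the number of resamples, the direction of the difference (SA minus AE), and the synthetic refund flags are all hypothetical.

```python
import numpy as np

def bootstrap_diff_ci(a, b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for mean(a) - mean(b)."""
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        # Resample each group with replacement, keeping its original size.
        sample_a = rng.choice(a, size=a.size, replace=True)
        sample_b = rng.choice(b, size=b.size, replace=True)
        diffs[i] = sample_a.mean() - sample_b.mean()
    low, high = np.quantile(diffs, [alpha / 2, 1 - alpha / 2])
    return a.mean() - b.mean(), low, high

# Synthetic 0/1 refund flags (1 = refunded); not the real order data.
rng = np.random.default_rng(42)
refund_sa = rng.binomial(1, 0.10, size=500)
refund_ae = rng.binomial(1, 0.10, size=500)

observed, low, high = bootstrap_diff_ci(refund_sa, refund_ae)
includes_zero = low <= 0 <= high

print(f"observed difference: {observed:.3f}")
print(f"95% interval: [{low:.3f}, {high:.3f}]")
print("interval includes zero:", includes_zero)
```

If the interval includes zero, the data are compatible with no difference between the two groups, which is the reading given in the bullets above.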
- Best Markets: UAE and Kuwait are our "Big Players" in revenue.
- Sales Timing: Sales go up and down depending on the month (Seasonality).
- Typical Order: Most customers spend a similar amount; we don't have many crazy high or low orders.
- Refunds: No real difference in refunds between Saudi and UAE.
Conclusion: The business is doing great in the UAE. We should keep doing what we're doing but keep an eye on the busy months!
Follow these steps to run the analysis on your local machine:
- Clone the repository: `git clone https://github.com/iAbdullah/bootcamp_data.git`
- Navigate to the project folder: `cd bootcamp_data`
- Install dependencies: `pip install pandas numpy plotly nbformat`
- Run the notebook: `jupyter notebook notebooks/eda.ipynb`
- Data Cleaning: Processed over 5,000 orders.
- Statistical Testing: Performed `Bootstrap comparison` for refund rates.