diff --git a/02_activities/assignments/DC_Cohort/Assignment1.md b/02_activities/assignments/DC_Cohort/Assignment1.md index f78778f5b..eb1fec78c 100644 --- a/02_activities/assignments/DC_Cohort/Assignment1.md +++ b/02_activities/assignments/DC_Cohort/Assignment1.md @@ -205,5 +205,9 @@ Consider, for example, concepts of fairness, inequality, social structures, marg ``` -Your thoughts... +The article was published in 'Ideas' in November 2021 and authored by Rida Qadri, then a PhD candidate in MIT's program in urban information systems. In the article, Qadri shares the story of Riz, a Pakistani person who was excluded from the National Database and Registration Authority (NADRA) because their parents' marriages did not meet the (rigid) criteria of the new digitalized identification system. Because Riz could not be identified by NADRA, they were denied access to social and welfare services and faced significant constraints on their freedom of movement. To my understanding, the main claim of the article is that social hierarchies, social perspectives, and political power relations are embedded in data systems. This claim echoes scholars in Science and Technology Studies (e.g., Bruno Latour, Donna Haraway) and the Philosophy of Technology (e.g., Andrew Feenberg) who argued decades ago that technologies, and specifically information and communication technologies, are sociotechnical objects. + +While I agree with the main argument, in this comment I will problematize some of the assumptions made in the article. Then I will add some insights from my practice and research as a social worker/community organizer, focusing on the interplay between welfare/community practice and information technology. First, I would like to challenge the contrast between NADRA's lineage/genealogical system design and systems that identify individuals through the organization and processing of biometric data (i.e., 'unique' physical, behavioural, and biological characteristics).
While biometric data is seen as a more objective/reliable process for identification, it is imperative to remember that, as sociotechnical objects, data systems (and any other artifacts) can also be used (and abused) for purposes that deviate from the original one. Thus, while biometric systems automate, streamline, and particularize the identification process, they have also been used to profile and surveil marginalized groups, including poor and racialized individuals and communities. This brings me to my second argument. Similar to Qadri, I support the call for a more reflexive (critical) approach to the politics and ethics embedded in data system design. However, it is crucial to remember that the way we structure/construct the data (through schemes, possible data types, and even the practice of surveys/questionnaires in social science) is only one side of the story. The other side is how we interpret the data produced by, and/or mined/retrieved from, these systems and databases. In her book, Virginia Eubanks (2018) traces how these two practices—structuring data systems and interpreting their outputs—are increasingly used to profile and punish the poor in the U.S. (e.g., via decision algorithms in the child welfare system and in determining eligibility for allowances and social support). + +The freedom to interpret, however, can also work in another direction and prepare the ground for other social demands. In this regard, the example of Pakistan's Khawaja Sira community is essential. It shows that databases and information technologies can be a site for political demands and social change (even if not perfect). To motivate this change, it is essential to remember that we (still) have spaces (courts, streets, universities, community centers) that we should foster and take care of together to make these struggles for data justice, ownership, and sovereignty possible and effective. 
``` diff --git a/02_activities/assignments/DC_Cohort/Assignment2.md b/02_activities/assignments/DC_Cohort/Assignment2.md index 9b804e9ee..98299d438 100644 --- a/02_activities/assignments/DC_Cohort/Assignment2.md +++ b/02_activities/assignments/DC_Cohort/Assignment2.md @@ -54,7 +54,10 @@ The store wants to keep customer addresses. Propose two architectures for the CU **HINT:** search type 1 vs type 2 slowly changing dimensions. ``` -Your answer... +CUSTOMER_ADDRESS (type 1) → overwrites changes in the customer's address record; no history is retained +columns: customer_id; customer_first_name; customer_last_name; customer_current_address +CUSTOMER_ADDRESS (type 2) → does not overwrite changes in the customer's address record; address history is retained by adding two more columns +columns: customer_id; customer_first_name; customer_last_name; customer_address; address_start_date; address_end_date ``` *** @@ -183,5 +186,5 @@ Consider, for example, concepts of labour, bias, LLM proliferation, moderating c ``` -Your thoughts... +Recently, in one of the advanced qualitative methods classes I took, my colleagues and I had a very engaging discussion about the new AI features added to qualitative software programs such as NVivo and Dedoose. These features are supposed to ease the tedious line-by-line coding in methods such as Grounded Theory, automate at least the initial coding stage, and help research teams "do more!" - analyze a larger volume of qualitative data excerpts in less time. Reading Boykis's article, I was thinking about how easily we give up the skills we have been working to develop for so long. For decades, qualitative researchers have developed strategies to turn qualitative inquiry into a more rigorous process through methods such as collaborative coding and member checking, while also maintaining the unique perspectives of the researcher and research participants and accounting for the specifics of the context in which data were collected and generated.
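To make the two CUSTOMER_ADDRESS architectures above concrete, here is a minimal sketch using Python's built-in sqlite3 (table and column names follow the answer above; the names, dates, and addresses are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Type 1: one row per customer; an address change overwrites the old value in place.
cur.execute("""CREATE TABLE customer_address_type1 (
    customer_id INTEGER PRIMARY KEY,
    customer_first_name TEXT,
    customer_last_name TEXT,
    customer_current_address TEXT)""")
cur.execute("INSERT INTO customer_address_type1 VALUES (1, 'Riz', 'Khan', '12 Old Road')")
# The customer moves: UPDATE discards the previous address entirely.
cur.execute("UPDATE customer_address_type1 SET customer_current_address = '99 New Avenue' WHERE customer_id = 1")

# Type 2: one row per address *version*; a change closes the current row and inserts a new one.
cur.execute("""CREATE TABLE customer_address_type2 (
    customer_id INTEGER,
    customer_first_name TEXT,
    customer_last_name TEXT,
    customer_address TEXT,
    address_start_date TEXT,
    address_end_date TEXT)""")  # a NULL end date marks the current address
cur.execute("INSERT INTO customer_address_type2 VALUES (1, 'Riz', 'Khan', '12 Old Road', '2020-01-01', NULL)")
# The customer moves: close the old row, then add the new one.
cur.execute("""UPDATE customer_address_type2 SET address_end_date = '2024-06-30'
               WHERE customer_id = 1 AND address_end_date IS NULL""")
cur.execute("INSERT INTO customer_address_type2 VALUES (1, 'Riz', 'Khan', '99 New Avenue', '2024-07-01', NULL)")

# Type 1 keeps a single row (history lost); type 2 keeps every version.
print(cur.execute("SELECT COUNT(*) FROM customer_address_type1").fetchone()[0])  # 1
print(cur.execute("SELECT COUNT(*) FROM customer_address_type2").fetchone()[0])  # 2
```

The trade-off shown here is the one the hint points at: type 1 is simpler and smaller, type 2 answers "where did this customer live on date X" at the cost of extra rows and bookkeeping.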
Prompted by Boykis's article, I was curious to understand how these qualitative software programs and their new AI features actually work. Visiting the NVivo website, I discovered that NVivo 15's Lumivero AI Assistant automates the coding process in three stages: (1) learning the researcher's coding patterns from a batch of excerpts the researcher has already coded; (2) sending the excerpts to a third-party service such as Gemini or OpenAI's GPT for processing (with a commitment that interview data is not used to train the AI model); and (3) generating predicted codes for subsequent qualitative excerpts (e.g., interviews/policy documents/focus groups). As someone with extensive experience in qualitative research, I could not help but think about the consequences of this streamlining of the coding process. Qualitative research was developed to understand local, context-dependent social processes that can't be "captured" by quantitative methods and statistics. Statistical inference requires a sample with a limited amount of variance to draw statistically significant results (this variance should be above zero but not too large, depending on the size of the sample) and then generalizes to an imagined homogeneous group (i.e., a population). In contrast, qualitative research values diverse and contradictory perspectives within the same sample; it is then the researcher's work, drawing on social theory and other materials, to interpret this diversity of perspectives and experiences, to show how they can dwell in the same context, and to illuminate and represent the experiences of those who are absent. In this regard, I am afraid that the mechanism used by NVivo 15 will quantify qualitative research, through the decontextualization of data and its comparison to an "average" perspective/experience produced by Gemini/ChatGPT, and by doing so, it will undermine the goal and rationale that brought us to develop qualitative research in the first place.
It is worth emphasizing that I have nothing against quantitative research and often use it to answer specific questions. However, quantitative methodology cannot answer all questions. It cannot, for example, adequately capture the diverse perspectives and experiences of marginalized individuals and communities. It also fails to represent both the depth and breadth of viewpoints that are often overlooked due to "publication/dissemination bias" (i.e., the tendency for research studies with positive or statistically significant results to be published more frequently than those with negative, null, or inconclusive findings). Thus, the motivation for incorporating these AI tools seems to be something other than improving qualitative research itself. Here, it is time to return to the "doing more" celebrated in one of NVivo 15's AI-feature promotions. In an ultra-productivist academic culture, the social sciences are immersed in the imperative to "do more" (produce more articles, based on larger amounts of data), which becomes the only criterion for evaluating both researchers and research. I am afraid that an unreflexive adoption of these recent "innovative" features in qualitative research will undermine our ability to learn from neglected perspectives and will also push qualitative research to reproduce and echo the "average perspective", which is, unfortunately, biased towards those who have already been extensively represented in the public digital sphere. ``` diff --git a/02_activities/assignments/DC_Cohort/assignment1.sql b/02_activities/assignments/DC_Cohort/assignment1.sql index c992e3205..098881c7a 100644 --- a/02_activities/assignments/DC_Cohort/assignment1.sql +++ b/02_activities/assignments/DC_Cohort/assignment1.sql @@ -5,49 +5,126 @@ --SELECT /* 1. Write a query that returns everything in the customer table. */ - +SELECT * +FROM customer; /* 2. Write a query that displays all of the columns and 10 rows from the customer table, sorted by customer_last_name, then customer_first_name.
*/ +SELECT * +FROM customer +ORDER BY customer_last_name, customer_first_name +LIMIT 10; --WHERE /* 1. Write a query that returns all customer purchases of product IDs 4 and 9. */ +SELECT * +FROM customer_purchases +WHERE product_id = 4 +OR product_id = 9; - -/*2. Write a query that returns all customer purchases and a new calculated column 'price' (quantity * cost_to_customer_per_qty), +/*2. Write a query that returns all customer purchases and a new calculated column 'price' (quantity * cost_to_customer_per_qty), filtered by customer IDs between 8 and 10 (inclusive) using either: 1. two conditions using AND 2. one condition using BETWEEN */ -- option 1 +SELECT +quantity, +cost_to_customer_per_qty, +quantity * cost_to_customer_per_qty as price, +product_id, +market_date, +vendor_id, +customer_id, +transaction_time + +FROM customer_purchases + +WHERE customer_id >= 8 +AND customer_id <= 10; -- option 2 + +SELECT +quantity, +cost_to_customer_per_qty, +quantity * cost_to_customer_per_qty as price, +product_id, +market_date, +vendor_id, +customer_id, +transaction_time + +FROM customer_purchases + +WHERE customer_id BETWEEN 8 AND 10; --CASE /* 1. Products can be sold by the individual unit or by bulk measures like lbs. or oz.
Using the product table, write a query that outputs the product_id and product_name columns and add a column called prod_qty_type_condensed that displays the word “unit” -if the product_qty_type is “unit,” and otherwise displays the word “bulk.” */ +if the product_qty_type is “unit,” and otherwise displays the word “bulk.” */ +SELECT +product_id, +product_name, +product_qty_type +,CASE + WHEN product_qty_type = 'unit' THEN 'unit' + ELSE 'bulk' +END as prod_qty_type_condensed +FROM product; /* 2. We want to flag all of the different types of pepper products that are sold at the market. add a column to the previous query called pepper_flag that outputs a 1 if the product_name contains the word “pepper” (regardless of capitalization), and otherwise outputs 0. */ +SELECT +product_id, +product_name, +product_qty_type + +,CASE + WHEN product_qty_type = 'unit' THEN 'unit' + ELSE 'bulk' +END as prod_qty_type_condensed +,CASE + WHEN product_name LIKE '%pepper%' THEN 1 + ELSE 0 +END as pepper_flag + +FROM product; --JOIN /* 1. Write a query that INNER JOINs the vendor table to the vendor_booth_assignments table on the vendor_id field they both have in common, and sorts the result by vendor_name, then market_date. */ +SELECT * + +FROM vendor as v +INNER JOIN vendor_booth_assignments as vba + ON v.vendor_id = vba.vendor_id + +ORDER BY vendor_name, market_date; /* SECTION 3 */ @@ -56,14 +133,30 @@ vendor_id field they both have in common, and sorts the result by vendor_name, t /* 1. Write a query that determines how many times each vendor has rented a booth at the farmer’s market by counting the vendor booth assignments per vendor_id. */ - + SELECT vendor_id ,COUNT(market_date) as number_of_rentals + FROM vendor_booth_assignments + GROUP BY vendor_id; /* 2. The Farmer’s Market Customer Appreciation Committee wants to give a bumper sticker to everyone who has ever spent more than $2000 at the market.
Write a query that generates a list of customers for them to give stickers to, sorted by last name, then first name. HINT: This query requires you to join two tables, use an aggregate function, and use the HAVING keyword. */ +SELECT +cp.customer_id, +customer_first_name, +customer_last_name, +SUM(quantity * cost_to_customer_per_qty) as total_spend + +FROM customer_purchases as cp +INNER JOIN customer as c + ON c.customer_id = cp.customer_id + +GROUP BY c.customer_id +HAVING total_spend > 2000 + +ORDER BY customer_last_name, customer_first_name; --Temp Table @@ -78,9 +171,23 @@ When inserting the new vendor, you need to appropriately align the columns to be VALUES(col1,col2,col3,col4,col5) */ +DROP TABLE IF EXISTS temp.new_vendor; + +CREATE TABLE temp.new_vendor AS + +SELECT * + +FROM vendor; + +INSERT INTO temp.new_vendor (vendor_id, vendor_name, vendor_type, vendor_owner_first_name, vendor_owner_last_name) +VALUES(10, 'Thomas''s Superfood Store', 'Fresh Focused store', 'Thomas', 'Rosenthal'); + + + + --- Date +-- Date [no need for this assignment] /*1. Get the customer_id, month, and year (in separate columns) of every purchase in the customer_purchases table. HINT: you might need to search for strftime modifiers sqlite on the web to know what the modifiers for month
Write a query that selects from the customer_purchases table and numbers each customer’s @@ -34,16 +38,48 @@ each new market date for each customer, or select only the unique market dates p HINT: One of these approaches uses ROW_NUMBER() and one uses DENSE_RANK(). */ +SELECT + +customer_id +,market_date +,row_number() OVER (PARTITION BY customer_id ORDER BY market_date ASC) as visit_number + +FROM customer_purchases; + + /* 2. Reverse the numbering of the query from part a so each customer’s most recent visit is labeled 1, then write another query that uses this one as a subquery (or temp table) and filters the results to only the customer’s most recent visit. */ +SELECT + +customer_id +,market_date +,row_number() OVER (PARTITION BY customer_id ORDER BY market_date DESC) as visit_number + +FROM customer_purchases; + +SELECT * +FROM ( + SELECT + customer_id + ,market_date + ,row_number() OVER (PARTITION BY customer_id ORDER BY market_date DESC) as visit_number + + + FROM customer_purchases + +) as x +WHERE x.visit_number = 1; /* 3. Using a COUNT() window function, include a value along with each row of the customer_purchases table that indicates how many different times that customer has purchased that product_id. */ +SELECT * +,COUNT(*) OVER (PARTITION BY customer_id, product_id) as times_purchased_per_customer +FROM customer_purchases; -- String manipulations @@ -58,11 +94,26 @@ Remove any trailing or leading whitespaces. Don't just use a case statement for Hint: you might need to use INSTR(product_name,'-') to find the hyphens. INSTR will help split the column. */ +SELECT * + +,CASE + WHEN product_name like '%-%' THEN TRIM(SUBSTR(product_name, INSTR(product_name, '-') + 1)) + ELSE NULL +END as description + +FROM product; /* 2. Filter the query to show any product_size value that contain a number with REGEXP.
*/ +SELECT * +,CASE + WHEN product_name like '%-%' THEN TRIM(SUBSTR(product_name, INSTR(product_name, '-') + 1)) + ELSE NULL +END as description +FROM product +WHERE product_size REGEXP '[0-9]'; -- UNION /* 1. Using a UNION, write a query that displays the market dates with the highest and lowest total sales. @@ -75,6 +126,43 @@ HINT: There are possibly a few ways to do this query, but if you're struggling with a UNION binding them. */ +----stage 1 +DROP TABLE IF EXISTS temp.sales_per_day; + +CREATE TEMP TABLE temp.sales_per_day AS +SELECT + market_date, + SUM(quantity * cost_to_customer_per_qty) as total_per_day +FROM customer_purchases +GROUP BY market_date; + +SELECT * -- check +FROM temp.sales_per_day +ORDER BY total_per_day DESC; +--stage II +DROP TABLE IF EXISTS temp.ranked_days; + +CREATE TEMP TABLE temp.ranked_days AS +SELECT + market_date, + total_per_day, + RANK() OVER (ORDER BY total_per_day DESC) AS best_rank, + RANK() OVER (ORDER BY total_per_day ASC) AS worst_rank +FROM temp.sales_per_day; + +SELECT * -- check +FROM temp.ranked_days; + +--stage III +SELECT market_date, total_per_day, 'Best Day' AS day_type +FROM temp.ranked_days +WHERE best_rank = 1 + +UNION + +SELECT market_date, total_per_day, 'Worst Day' AS day_type +FROM temp.ranked_days +WHERE worst_rank = 1; /* SECTION 3 */ @@ -90,6 +178,49 @@ Think a bit about the row counts: how many distinct vendors, product names are t How many customers are there (y). Before your final group by you should have the product of those two queries (x*y).
*/ +-- first I will calculate the customer number * 5 (number of units of each product) + + --number of customers + SELECT + count(*) FROM customer; -- 26 * 5 (times sales) = 130 units of each product for each vendor + + --there are three vendors in vendor_inventory + --vendor 4 (product 16); vendor 7 (products 1-4); vendor 8 (products 5,7,8) -- 8 rows in total, each with 130 sales + --by prices + +SELECT DISTINCT product_id, vendor_id, original_price +FROM vendor_inventory +ORDER BY vendor_id ASC, product_id ASC; + + +---without cross join +SELECT DISTINCT + v.vendor_name, + p.product_name, + vi.original_price * 5 * 26 as total_income +FROM vendor_inventory vi +JOIN vendor v + ON vi.vendor_id = v.vendor_id +JOIN product p + ON vi.product_id = p.product_id +ORDER BY v.vendor_name, p.product_name; + + +---with cross join +SELECT DISTINCT + v.vendor_name, + p.product_name, + vi.original_price * 5 * c.customer_count as total_income +FROM vendor_inventory vi +JOIN vendor v + ON vi.vendor_id = v.vendor_id +JOIN product p + ON vi.product_id = p.product_id +CROSS JOIN ( + SELECT COUNT(*) AS customer_count + FROM customer +) as c +ORDER BY v.vendor_name, p.product_name; -- INSERT /* 1. @@ -98,19 +229,47 @@ This table will contain only products where the `product_qty_type = 'unit'`. It should use all of the columns from the product table, as well as a new column for the `CURRENT_TIMESTAMP`. Name the timestamp column `snapshot_timestamp`. */ +DROP TABLE IF EXISTS temp.product_units; +CREATE TEMP TABLE product_units AS +SELECT * FROM product; +SELECT * -- precheck +FROM product_units +WHERE product_qty_type != 'unit' OR product_qty_type IS NULL; -/*2. Using `INSERT`, add a new row to the product_units table (with an updated timestamp). -This can be any product you desire (e.g. add another record for Apple Pie).
*/ +DELETE FROM product_units +WHERE product_qty_type != 'unit' OR product_qty_type IS NULL; + +SELECT * -- post check +FROM product_units; + +ALTER TABLE product_units ADD COLUMN snapshot_timestamp datetime; +UPDATE product_units SET snapshot_timestamp = CURRENT_TIMESTAMP; +SELECT * +FROM product_units; +/*2. Using `INSERT`, add a new row to the product_units table (with an updated timestamp). +This can be any product you desire (e.g. add another record for Apple Pie). */ +INSERT INTO product_units +VALUES(25, 'Basbousa', '45"', 3, 'unit', CURRENT_TIMESTAMP); + +SELECT * +FROM product_units; -- DELETE /* 1. Delete the older record for whatever product you added. HINT: If you don't specify a WHERE clause, you are going to have a bad time.*/ +SELECT * -- pre-check +FROM product_units +WHERE product_id = 25; +DELETE FROM product_units +WHERE product_id = 25; +SELECT * -- post check +FROM product_units; -- UPDATE /* 1. We want to add the current_quantity to the product_units table. @@ -129,6 +288,56 @@ Finally, make sure you have a WHERE statement to update the right row, you'll need to use product_units.product_id to refer to the correct row within the product_units table. When you have all of these components, you can run the update statement. */ +ALTER TABLE product_units ADD COLUMN current_quantity INT; + +SELECT * -- check +FROM product_units; + +SELECT * -- review the table +FROM vendor_inventory +ORDER BY market_date DESC, product_id DESC; + +-- get the "last" quantity per product: + + +SELECT * +FROM ( + SELECT + market_date + ,product_id + ,quantity + ,row_number() OVER (PARTITION BY product_id ORDER BY market_date DESC) as rn_last + + FROM vendor_inventory + + +) as x +WHERE x.rn_last = 1; + +--coalesce null values to 0 +UPDATE product_units +SET current_quantity = COALESCE(current_quantity, 0) +WHERE current_quantity IS NULL; + +SELECT * -- checking +FROM product_units; +--SET current_quantity = (...your select statement...)
+UPDATE product_units +SET current_quantity = x.quantity + FROM ( + SELECT + market_date + ,product_id + ,quantity + ,row_number() OVER (PARTITION BY product_id ORDER BY market_date DESC) as rn_last + FROM vendor_inventory + ) as x + WHERE x.rn_last = 1 + AND current_quantity = 0 + AND product_units.product_id = x.product_id; + +SELECT * +FROM product_units; diff --git a/02_activities/assignments/DC_Cohort/images/Assigment 1 logical diagram.jpg b/02_activities/assignments/DC_Cohort/images/Assigment 1 logical diagram.jpg new file mode 100644 index 000000000..990230560 Binary files /dev/null and b/02_activities/assignments/DC_Cohort/images/Assigment 1 logical diagram.jpg differ diff --git a/02_activities/assignments/DC_Cohort/images/Assigment 2 propmpt 2 bookstore and shifts.jpg b/02_activities/assignments/DC_Cohort/images/Assigment 2 propmpt 2 bookstore and shifts.jpg new file mode 100644 index 000000000..72c266f41 Binary files /dev/null and b/02_activities/assignments/DC_Cohort/images/Assigment 2 propmpt 2 bookstore and shifts.jpg differ diff --git a/02_activities/assignments/DC_Cohort/images/Assignment 2 prompt 1 bookstore.jpg b/02_activities/assignments/DC_Cohort/images/Assignment 2 prompt 1 bookstore.jpg new file mode 100644 index 000000000..403ac0f28 Binary files /dev/null and b/02_activities/assignments/DC_Cohort/images/Assignment 2 prompt 1 bookstore.jpg differ diff --git a/04_this_cohort/live_code/DC/module_2/module_2.sqbpro b/04_this_cohort/live_code/DC/module_2/module_2.sqbpro index bdca500ba..5eea736eb 100644 --- a/04_this_cohort/live_code/DC/module_2/module_2.sqbpro +++ b/04_this_cohort/live_code/DC/module_2/module_2.sqbpro @@ -1,3 +1,265 @@ +<<<<<<< HEAD +
/* MODULE 2 */ +/* SELECT */ + + +/* 1. Select everything in the customer table */ + +-- SQL is not case sensitive +-- select * returns the whole table +-- however we should select specific columns as a best practice +-- debugging +-- select from the table by name + +SELECT * +FROM customer; + +/* 2. Use sql as a calculator */ + +SELECT 1+1 as first_grade, 5*5 as second, pi() as highschool; + +/* 3. Add order by and limit clauses */ + +SELECT * +FROM customer +ORDER BY customer_first_name +LIMIT 10; + +/* 4. Select multiple specific columns */ +-- we can use [] to escape reserved words +SELECT customer_id, customer_first_name as [SELECT] +FROM customer; + +/* 5. Add a static value in a column */ + +SELECT 2025 as this_year, 'October' as this_month, customer_id +FROM customer; /* MODULE 2 */ +/* WHERE */ + +/* 1. Select only customer 1 from the customer table */ +SELECT * +FROM customer +WHERE customer_id = 1; + + +/* 2. Differentiate between AND and OR */ +SELECT * +FROM customer +WHERE customer_id = 1 +OR customer_id = 2; + +-- we cannot use AND here - logical contradiction + + +/* 3. IN */ + +SELECT * +FROM customer +WHERE customer_id IN (3,4,5) +OR customer_postal_code IN ('M1L'); + +--we can use logical operators + +/* 4. LIKE */ +--all the peppers + +SELECT * +FROM product +WHERE product_name LIKE '%pepper%'; + --when customer_last_name starts with a +SELECT * +FROM customer +WHERE customer_last_name LIKE 'a%'; +/* 5. Nulls and Blanks*/ + +SELECT * +FROM product +WHERE product_size IS NULL; + +SELECT * +FROM product +WHERE product_size = ''; + + -- THERE IS A DIFFERENCE BETWEEN NULL AND BLANK + +/* 6. BETWEEN x AND y */ + +SELECT * +FROM customer +WHERE customer_id BETWEEN 1 AND 20; + +SELECT * +FROM market_date_info +WHERE market_date BETWEEN '2022-10-01' AND '2022-10-30'; +--between int and date → not strings! /* MODULE 2 */ +/* CASE */ + + +SELECT * +/* 1.
Add a CASE statement declaring which days vendors should come */ +,CASE WHEN vendor_type = 'Fresh Focused' THEN 'Wednesday' + WHEN vendor_type = 'Prepared Foods' THEN 'Thursday' + ELSE 'Saturday' +END as day_of_speciality + +/* 2. Add another CASE statement for Pie Day */ +,CASE WHEN vendor_name = "Annie's Pies" -- double quote if the name contains single quote + THEN 'Annie is great!' +END as pie_day -- when the row does not satisfy the condition the result is null + +/* 3. Add another CASE statement with an ELSE clause to handle rows evaluating to False */ + +,CASE WHEN vendor_name = "Annie's Pies" -- double quote if the name contains single quote + THEN 'Annie is great!' + ELSE 'no pie today' +END as pie_day + +FROM vendor; + +/* 4. Experiment with selecting a different column instead of just a string value */ + +SELECT * +,CASE WHEN cost_to_customer_per_qty < 1.00 + THEN cost_to_customer_per_qty*5 + ELSE cost_to_customer_per_qty + END as inflation +FROM customer_purchases; +/* MODULE 2 */ +/* DISTINCT */ + + +/* 1. Compare how many customer_ids are in the customer_purchases table, one select with distinct, one without */ + +-- 4221 rows +SELECT customer_id FROM customer_purchases; + +SELECT DISTINCT customer_id FROM customer_purchases; + + + +/* 2. Compare the difference between selecting market_day in market_date_info, with and without distinct: + what do these differences mean?*/ + + SELECT market_day FROM market_date_info; + --market is open 150 days - perspective of dates + SELECT DISTINCT market_day FROM market_date_info; + -- open on only two days - perspective of day types + +/* 3. Which vendor has sold products to a customer */ + +SELECT DISTINCT vendor_id +FROM customer_purchases; + +/*4. Which vendor has sold products to a customer ... and which product was it */ +SELECT DISTINCT vendor_id, product_id +FROM customer_purchases; + + +/* 5. Which vendor has sold products to a customer +... and which product was it? +...
AND to whom was it sold*/ + +SELECT DISTINCT vendor_id, product_id, customer_id +FROM customer_purchases; +/* MODULE 2 */ +/* INNER JOIN */ + + +/* 1. Get product names (from product table) alongside customer_purchases + ... use an INNER JOIN to see only products that have been purchased */ + +-- without table aliases + +SELECT product_name, -- from the product table +vendor_id, -- the rest from customer purchases +market_date, +customer_id, +customer_purchases.product_id +FROM product +INNER JOIN customer_purchases + ON customer_purchases.product_id = product.product_id; + + + +/* 2. Using the Query #4 from DISTINCT earlier + (Which vendor has sold products to a customer AND which product was it AND to whom was it sold) + + Add customers' first and last names with an INNER JOIN */ + +-- using table aliases + +SELECT DISTINCT +vendor_id, +product_id, +c.customer_id, +customer_first_name, +customer_last_name +FROM customer_purchases as cp +INNER JOIN customer as c + ON c.customer_id = cp.customer_id; +/* MODULE 2 */ +/* LEFT JOIN */ + + +/* 1. There are products that have been bought +... but are there products that have not been bought? +Use a LEFT JOIN to find out*/ + +SELECT DISTINCT +p.product_id, +cp.product_id as [cp.product_id], +product_name + +FROM product as p +LEFT JOIN customer_purchases as cp + ON p.product_id = cp.product_id; + +/* 2. Directions of LEFT JOINs matter ...*/ +-- only products that have been sold +SELECT DISTINCT +p.product_id, +cp.product_id as [cp.product_id], +product_name + +FROM customer_purchases as p +LEFT JOIN product as cp + ON p.product_id = cp.product_id; + + +/* 3. As do which values you filter on ... */ + +SELECT DISTINCT +p.product_id, +cp.product_id as [cp.product_id], +product_name + +FROM customer_purchases as p +LEFT JOIN product as cp + ON p.product_id = cp.product_id + +WHERE p.product_id BETWEEN 1 AND 6; + + +/* 4.
Without using a RIGHT JOIN, make this query return the RIGHT JOIN result set +...**Hint, flip the order of the joins** ... + +SELECT * + +FROM product_category AS pc +LEFT JOIN product AS p + ON pc.product_category_id = p.product_category_id + ORDER by pc.product_category_id + +...Note how the row count changed from 24 to 23 +*/ + +SELECT * + +FROM product AS p +LEFT JOIN product_category AS pc + ON pc.product_category_id = p.product_category_id + ORDER BY pc.product_category_id;
+=======
/* MODULE 2 */ /* SELECT */ @@ -313,3 +575,4 @@ LEFT JOIN customer_purchases as cp ORDER BY cp.product_id
+>>>>>>> eba6c4977c09aa65ee87a00aa06ac551f1891f43 diff --git a/04_this_cohort/live_code/DC/module_3/module_3.sqbpro b/04_this_cohort/live_code/DC/module_3/module_3.sqbpro index 3636a6be1..eb7f696c2 100644 --- a/04_this_cohort/live_code/DC/module_3/module_3.sqbpro +++ b/04_this_cohort/live_code/DC/module_3/module_3.sqbpro @@ -1,13 +1,65 @@ +<<<<<<< HEAD +
/* MODULE 3 */ +=======
/* MODULE 3 */ +>>>>>>> eba6c4977c09aa65ee87a00aa06ac551f1891f43 /* COUNT */ /* 1. Count the number of products */ +<<<<<<< HEAD +SELECT count (product_id) as num_of_product +FROM product; +======= SELECT COUNT(product_id) as num_of_product FROM product; +>>>>>>> eba6c4977c09aa65ee87a00aa06ac551f1891f43 - /* 2. How many products per product_qty_type */ +<<<<<<< HEAD + SELECT product_qty_type, COUNT (product_id) as num_of_product + FROM product + GROUP BY product_qty_type; + +/* 3. How many products per product_qty_type and per their product_size */ + SELECT product_size, + product_qty_type, + COUNT (product_id) as num_of_product + FROM product + GROUP BY product_size, product_qty_type + ORDER BY product_qty_type; +/* COUNT DISTINCT + 4. How many unique products were bought*/ +SELECT COUNT(DISTINCT product_id) as bought_product +FROM customer_purchases; + +SELECT COUNT(product_id) as bought_product +FROM customer_purchases +/* MODULE 3 */ +/* SUM & AVG */ + + +/* 1. How much did customers spend each day */ +SELECT +market_date, +customer_id, +SUM(quantity*cost_to_customer_per_qty) as total_cost + +FROM customer_purchases +GROUP BY customer_id, market_date +ORDER BY market_date; + + +/* 2. How much does each customer spend on average */ +SELECT +customer_first_name, +customer_last_name, +AVG (quantity*cost_to_customer_per_qty) as avg_spend +FROM customer_purchases as cp +INNER JOIN customer as c + ON c.customer_id = cp. customer_id +GROUP BY c.customer_id; +======= SELECT product_qty_type, COUNT(product_id) as num_of_product FROM product GROUP BY product_qty_type; @@ -51,33 +103,91 @@ INNER JOIN customer as c ON c.customer_id = cp.customer_id GROUP BY c.customer_id +>>>>>>> eba6c4977c09aa65ee87a00aa06ac551f1891f43 +SELECT +customer_first_name, +customer_last_name, +ROUND (AVG (quantity*cost_to_customer_per_qty),2) as avg_spend +FROM customer_purchases as cp +INNER JOIN customer as c + ON c.customer_id = cp. 
customer_id +GROUP BY c.customer_id +ORDER BY customer_first_name; +<<<<<<< HEAD +SELECT +customer_first_name, +customer_last_name, +market_date, +ROUND (AVG (quantity*cost_to_customer_per_qty),2) as avg_spend +FROM customer_purchases as cp +INNER JOIN customer as c + ON c.customer_id = cp.customer_id +GROUP BY c.customer_id, market_date +ORDER BY customer_first_name, market_date; /* MODULE 3 */ +======= /* MODULE 3 */ +>>>>>>> eba6c4977c09aa65ee87a00aa06ac551f1891f43 /* MIN & MAX */ /* 1. What is the most expensive product ...pay attention to how it doesn't handle ties very well */ +<<<<<<< HEAD + +SELECT +product_name, +max(original_price) as most_expensive_product + +FROM vendor_inventory as vi +INNER JOIN product as p + ON p.product_id = vi.product_id; + +======= SELECT product_name, max(original_price) as most_expensive FROM vendor_inventory as vi INNER JOIN product as p ON p.product_id = vi.product_id; +>>>>>>> eba6c4977c09aa65ee87a00aa06ac551f1891f43 /* 2. Prove that max is working */ SELECT DISTINCT product_name, original_price +<<<<<<< HEAD +SELECT DISTINCT +product_name, +original_price +======= FROM vendor_inventory as vi INNER JOIN product as p ON p.product_id = vi.product_id; +>>>>>>> eba6c4977c09aa65ee87a00aa06ac551f1891f43 - +FROM +vendor_inventory as vi +INNER JOIN product as p + ON p.product_id = vi.product_id + ORDER BY original_price; + /* 3. 
Find the minimum price per each product_qty_type */ +<<<<<<< HEAD +SELECT +product_name, +product_qty_type, +min (original_price) + +FROM vendor_inventory as vi +INNER JOIN product as p + ON p.product_id = vi.product_id + +GROUP BY product_qty_type; +======= SELECT product_name ,product_qty_type ,min(original_price) @@ -85,6 +195,7 @@ SELECT product_name FROM vendor_inventory as vi INNER JOIN product as p ON p.product_id = vi.product_id +>>>>>>> eba6c4977c09aa65ee87a00aa06ac551f1891f43 GROUP BY product_qty_type; @@ -94,20 +205,62 @@ SELECT DISTINCT product_name --,min(original_price) ,original_price +<<<<<<< HEAD +SELECT DISTINCT +product_name, +original_price, +product_qty_type + +FROM vendor_inventory as vi +INNER JOIN product as p + ON p.product_id = vi.product_id +ORDER BY product_qty_type, original_price; + +======= FROM vendor_inventory as vi INNER JOIN product as p ON p.product_id = vi.product_id; +>>>>>>> eba6c4977c09aa65ee87a00aa06ac551f1891f43 /* 5. Min/max on a string ... not particularly useful? */ SELECT max(product_name) FROM product +<<<<<<< HEAD +SELECT min (product_name) +FROM product; + +SELECT max (product_name) +FROM product; + +/* MODULE 3 */ +======= /* MODULE 3 */ +>>>>>>> eba6c4977c09aa65ee87a00aa06ac551f1891f43 /* Arithmitic */ /* 1. power, pi(), ceiling, division, integer division, etc */ +<<<<<<< HEAD +SELECT +power (4,2), pi(); +SELECT +CAST(10.0 as INT)/CAST(3.0 as INT) as integer_devision; + +/* 2. Every even vendor_id with modulo */ +SELECT * FROM vendor +WHERE vendor_id%2 = 0; + +--odd +SELECT * FROM vendor +WHERE vendor_id%2 != 0; + +/* 3. What about every third? */ +SELECT * FROM vendor +WHERE vendor_id%3 =0 +/* MODULE 3 */ +======= SELECT power(4,2), pi(); SELECT 10.0 / 3.0 as division, @@ -122,16 +275,30 @@ SELECT * FROM vendor WHERE vendor_id % 3 = 0; /* MODULE 3 */ +>>>>>>> eba6c4977c09aa65ee87a00aa06ac551f1891f43 /* HAVING */ /* 1. How much did a customer spend on each day? 
Filter to customer_id between 1 and 5 and total_spend > 50 ... What order of execution occurs?*/ +<<<<<<< HEAD + SELECT + market_date, + customer_id, + SUM (quantity*cost_to_customer_per_qty) as total_spend + FROM customer_purchases + WHERE customer_id BETWEEN 1 AND 5 + + GROUP BY market_date, customer_id + HAVING total_spend >50 + ORDER BY customer_id, market_date; +======= SELECT market_date ,customer_id ,SUM(quantity*cost_to_customer_per_qty) as total_spend +>>>>>>> eba6c4977c09aa65ee87a00aa06ac551f1891f43 FROM customer_purchases WHERE customer_id BETWEEN 1 AND 5 @@ -141,6 +308,15 @@ HAVING total_spend > 50; /* 2. How many products were bought? Filter to number of purchases between 300 and 500 */ +<<<<<<< HEAD +SELECT count (product_id) as number_of_product, +product_id + FROM customer_purchases + GROUP BY product_id + HAVING count (product_id) BETWEEN 300 AND 500 + + /* MODULE 3 */ +======= SELECT count(product_id) as num_of_prod, product_id FROM customer_purchases GROUP BY product_id @@ -148,6 +324,7 @@ HAVING count(product_id) BETWEEN 300 AND 500 /* MODULE 3 */ +>>>>>>> eba6c4977c09aa65ee87a00aa06ac551f1891f43 /* Subquery FROM */ @@ -159,21 +336,44 @@ product_id, inflation FROM ( SELECT product_id, cost_to_customer_per_qty, CASE WHEN cost_to_customer_per_qty < '1.00' THEN cost_to_customer_per_qty*5 +<<<<<<< HEAD + ELSE cost_to_customer_per_qty + END as inflation + + FROM customer_purchases +); + +======= ELSE cost_to_customer_per_qty END AS inflation FROM customer_purchases ); +>>>>>>> eba6c4977c09aa65ee87a00aa06ac551f1891f43 /* 2. 
What is the single item that has been bought in the greatest quantity?*/ --outer query SELECT product_name, MAX(quantity_purchased) +<<<<<<< HEAD +SELECT +product_name, max(quantity_purchased) +FROM product as p +INNER JOIN ( + SELECT product_id, + count(quantity) as quantity_purchased + FROM customer_purchases + GROUP BY product_id + +) as x + ON p.product_id = x.product_id +======= FROM product AS p INNER JOIN ( -- inner query SELECT product_id ,count(quantity) as quantity_purchased +>>>>>>> eba6c4977c09aa65ee87a00aa06ac551f1891f43 FROM customer_purchases GROUP BY product_id @@ -185,6 +385,13 @@ INNER JOIN ( /* 1. How much did each customer spend at each vendor for each day at the market WHEN IT RAINS */ +<<<<<<< HEAD +SELECT +market_date, +customer_id, +vendor_id, +SUM (quantity*cost_to_customer_per_qty) as total_spend +======= SELECT market_date, customer_id, @@ -202,25 +409,78 @@ WHERE market_date IN ( ) GROUP BY market_date, vendor_id, customer_id; +>>>>>>> eba6c4977c09aa65ee87a00aa06ac551f1891f43 +FROM customer_purchases +WHERE market_date IN ( + SELECT market_date + FROM market_date_info + WHERE market_rain_flag = 1 +) +GROUP BY market_date, vendor_id, customer_id; /* 2. What is the name of the vendor who sells pie */ +SELECT DISTINCT vendor_name +<<<<<<< HEAD +FROM customer_purchases as cp +======= SELECT DISTINCT vendor_name FROM customer_purchases as cp +>>>>>>> eba6c4977c09aa65ee87a00aa06ac551f1891f43 INNER JOIN vendor as v ON cp.vendor_id = v.vendor_id WHERE product_id IN ( SELECT product_id FROM product +<<<<<<< HEAD + WHERE product_name like '%pie%' + ) +/* MODULE 3 */ +/* Common Table Expression (CTE) */ + + +/* 1. 
Calculate sales per vendor per day */ +WITH vendor_daily_sales AS ( + SELECT + md.market_date, + market_day, + market_week, + market_year, + vendor_name, + SUM (quantity*cost_to_customer_per_qty) as sales + + FROM customer_purchases as cp + INNER JOIN vendor as v + ON v.vendor_id = cp.vendor_id + INNER JOIN market_date_info as md + ON cp.market_date = md.market_date + + GROUP BY md.market_date, v.vendor_id + +) + +/* ... re-aggregate the daily sales for each WEEK instead now */ + +SELECT +market_week, +market_year, +vendor_name, +sum (sales) as sales + +FROM vendor_daily_sales +GROUP BY market_year, market_week, vendor_name +ORDER BY vendor_name, market_week/* MODULE 3 */ +======= WHERE product_name LIKE '%pie%' ) /* MODULE 3 */ +>>>>>>> eba6c4977c09aa65ee87a00aa06ac551f1891f43 /* Temp Tables */ @@ -236,14 +496,28 @@ DROP TABLE IF EXISTS temp.new_vendor_inventory; CREATE TABLE temp.new_vendor_inventory AS -- definition of the table +<<<<<<< HEAD +SELECT *, +original_price*5 as inflation +FROM vendor_inventory; + + + +======= SELECT *, original_price*5 as inflation FROM vendor_inventory; +>>>>>>> eba6c4977c09aa65ee87a00aa06ac551f1891f43 /* 2. put the previous table into another temp table, e.g. as temp.new_new_vendor_inventory */ DROP TABLE IF EXISTS temp.new_new_vendor_inventory; CREATE TABLE temp.new_new_vendor_inventory AS +<<<<<<< HEAD +SELECT *, +inflation*3 as super_inflation +FROM temp.new_vendor_inventory +======= SELECT * ,inflation*2 as super_inflation @@ -252,6 +526,7 @@ FROM temp.new_vendor_inventory /* MODULE 3 */ /* Common Table Expression (CTE) */ +>>>>>>> eba6c4977c09aa65ee87a00aa06ac551f1891f43 /* 1. Calculate sales per vendor per day */ WITH vendor_daily_sales AS ( @@ -292,6 +567,27 @@ GROUP BY market_year,market_week, vendor_name /* 1. now */ +<<<<<<< HEAD +SELECT +date ('now') as [now], +datetime ('now') as [rightnow], +date()as [today], + +/* 2. 
strftime */ +strftime ('%Y/%m', 'now') as [i], +strftime ('%Y-%m-%d', 'now','+50 days') as the_future; + +SELECT DISTINCT +market_date, +strftime ('%Y-%m-%d',market_date,'+50 days','-1 year') as the_past + +FROM +market_date_info +ORDER BY market_date; +/* 3. adding dates, e.g. last date of the month */ + +--last date of last month +======= SELECT DISTINCT DATE('now') as [now] ,DATETIME('now') [rightnow] @@ -306,13 +602,26 @@ DATE('now') as [now] /* 3. adding dates, e.g. last date of the month */ --last date of last month ,DATE(market_date,'start of month','-1 day') as end_of_prev_month +>>>>>>> eba6c4977c09aa65ee87a00aa06ac551f1891f43 +SELECT DISTINCT + +date (market_date, 'start of month', '-1 day') as end_of_prev_month +From market_date_info; /* 4. difference between dates, a. number of days between now and each market_date b. number of YEARS between now and market_date c. number of HOURS bewtween now and market_date */ +<<<<<<< HEAD + SELECT DISTINCT + julianday ('now') - julianday (market_date) as datebetweenmarketdates, + (julianday ('now') - julianday (market_date)) /365.25 as differentyears, + (julianday ('now') - julianday (market_date))*24*60*60 as numberofseconds + + FROM market_date_info-- Reference to file "C:/Users/ofirs/OneDrive/Desktop/DSI PRACTICE/sql/02_activities/assignments/DC_Cohort/assignment2.sql" (not supported by this version) --
+======= ,market_date ,julianday('now') - julianday(market_date) as datesbetweenmktdate -- number of days between now and each market date ,(julianday('now') - julianday(market_date)) / 365.25 @@ -320,3 +629,4 @@ DATE('now') as [now] FROM market_date_info
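The `julianday()` arithmetic above can be sanity-checked outside DB Browser. Below is a minimal sketch (not part of the course files) using Python's built-in `sqlite3` module, with fixed dates in place of `'now'` so the output is deterministic; the date values are illustrative only.

```python
import sqlite3

# julianday() returns a fractional day number, so subtracting two of them
# gives the number of days between two dates; dividing by 365.25 converts
# that difference to years, exactly as in the queries above.
conn = sqlite3.connect(":memory:")
days, years = conn.execute(
    """
    SELECT julianday('2024-01-31') - julianday('2024-01-01'),
           (julianday('2025-01-01') - julianday('2024-01-01')) / 365.25
    """
).fetchone()
print(days)             # 30.0 -- days between Jan 1 and Jan 31
print(round(years, 3))  # 1.002 -- 366 / 365.25 (2024 is a leap year)
conn.close()
```

Using literal dates rather than `'now'` makes the result reproducible, which is useful when checking that a days-to-years conversion behaves as expected.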
+>>>>>>> eba6c4977c09aa65ee87a00aa06ac551f1891f43 diff --git a/04_this_cohort/live_code/DC/module_4/module_4.sqbpro b/04_this_cohort/live_code/DC/module_4/module_4.sqbpro index 14f953a1a..71f1e94d9 100644 --- a/04_this_cohort/live_code/DC/module_4/module_4.sqbpro +++ b/04_this_cohort/live_code/DC/module_4/module_4.sqbpro @@ -1,42 +1,82 @@ +<<<<<<< HEAD +
/* MODULE 4 */ +======= /* MODULE 4 */ +>>>>>>> eba6c4977c09aa65ee87a00aa06ac551f1891f43 /* NULL Management */ /* 1. IFNULL: Missing product_size, missing product_qty_type */ +<<<<<<< HEAD + +SELECT*, +======= SELECT * ,IFNULL(product_size,'Unknown') as new_product_size --replace with another column ,IFNULL(product_size,product_qty_type) as silly_replacement +>>>>>>> eba6c4977c09aa65ee87a00aa06ac551f1891f43 +ifnull (product_size, 'Unknown') as new_product_size + FROM product; +--replace in another column +SELECT * +, ifnull(product_size, product_name) as sillyrrplacemnt +FROM product; /* 2. Coalesce */ ,coalesce(product_size,product_qty_type,'missing') as more_replacements -- if the first value is null, then the second value, ... if the second value is null, then "missing" ,coalesce(product_qty_type,product_size,'missing') as more_replacements -- if the first value is null, then the second value, ... if the second value is null, then "missing" +<<<<<<< HEAD +SELECT * +,coalesce (product_size, product_qty_type, 'missing') as replacement -- if the value is null so the second value and in the second in null so missing +,coalesce (product_qty_type, product_size, 'missing') as anotherreplacemnt +-- we can get the same result in this way: +, ifnull(ifnull (product_qty_type, product_size),'missing') as onelastreplacment +======= ,IFNULL(IFNULL(product_size,product_qty_type),'missing') as equiv_replacements -- if the first value is null, then the second value, ... if the second value is null, then "missing" FROM product; +>>>>>>> eba6c4977c09aa65ee87a00aa06ac551f1891f43 +From product; /* 3. 
NULLIF finding values in the product_size column that are "blank" strings and setting them to NULL if they are blank */ +<<<<<<< HEAD +SELECT* +--,ifnull(product_size, 'Unknown') +--,nullif (product_size, '') +, ifnull(nullif(product_size, ''),'unknown') as newquant +, coalesce(nullif(product_size, ''),'unknown') as newquantII +======= SELECT * --,IFNULL(product_size, 'Unknown') --,NULLIF(product_size,'') ,coalesce(NULLIF(product_size,''),'Unknown') replaced_values +>>>>>>> eba6c4977c09aa65ee87a00aa06ac551f1891f43 FROM product; /* 4. NULLIF filtering which rows are null or blank */ +SELECT* +FROM product +<<<<<<< HEAD +WHERE nullif(product_size,'') IS NULL +-- this is a way to capture invalid values! +/* MODULE 4 */ +======= SELECT * FROM product WHERE NULLIF(product_size,'') IS NULL /* MODULE 4 */ +>>>>>>> eba6c4977c09aa65ee87a00aa06ac551f1891f43 /* NULLIF Budget (example from the slides) */ /* The following example creates a budgets table to show a department (dept) @@ -82,6 +122,19 @@ https://learn.microsoft.com/en-us/sql/t-sql/language-elements/nullif-transact-sq /* 1. 
What product is the highest price per vendor */ +<<<<<<< HEAD +SELECT x.* +,product_name +FROM ( + SELECT + vendor_id + ,product_id + ,original_price + ,market_date + ,ROW_NUMBER() OVER (PARTITION BY vendor_id ORDER BY original_price DESC) as price_rn + + FROM vendor_inventory +======= --outer query SELECT x.* ,product_name @@ -99,11 +152,19 @@ FROM ,ROW_NUMBER() OVER(PARTITION BY vendor_id,product_id ORDER BY original_price DESC) FROM vendor_inventory +>>>>>>> eba6c4977c09aa65ee87a00aa06ac551f1891f43 ) x INNER JOIN product p ON x.product_id = p.product_id +<<<<<<< HEAD +INNER JOIN product as p + ON x.product_id = p.product_id + +WHERE x.price_rn = 1; +======= WHERE x.price_rank = 1; +>>>>>>> eba6c4977c09aa65ee87a00aa06ac551f1891f43 /* See how this varies from using max due to the group by SELECT vendor_id, @@ -132,7 +193,28 @@ GROUP BY vendor_id--,product_id ) x WHERE x.sales_rank = 1 +<<<<<<< HEAD +--highest single purchace in a day per custumer: +SELECT* +FROM( + SELECT + + customer_id + ,product_id + ,market_date + ,quantity + ,quantity*cost_to_customer_per_qty as cost + ,row_number() OVER (PARTITION BY customer_id ORDER BY quantity*cost_to_customer_per_qty DESC) as sales_rank + + FROM customer_purchases + + +)x +WHERE x.sales_rank = 1 +ORDER BY cost DESC/* MODULE 4 */ +======= ORDER BY cost desc/* MODULE 4 */ +>>>>>>> eba6c4977c09aa65ee87a00aa06ac551f1891f43 /* Windowed functions: dense_rank, rank, row_number */ @@ -159,14 +241,32 @@ VALUES (9, 165000), (10, 100000), (11, 90000); +<<<<<<< HEAD +======= SELECT * ,ROW_NUMBER() OVER(ORDER BY salary DESC) as [row_number] ,RANK() OVER(ORDER BY salary DESC) as [rank] ,DENSE_RANK() OVER(ORDER BY salary DESC) as [dense_rank] +>>>>>>> eba6c4977c09aa65ee87a00aa06ac551f1891f43 +SELECT * +,row_number () OVER (ORDER BY salary DESC) as [row_number] +,rank () OVER (ORDER BY salary DESC) as [rank] +,dense_rank () OVER (ORDER BY salary DESC) as [dense_rank] FROM row_rank_dense +<<<<<<< HEAD +/* MODULE 4 */ +/* Windowed 
functions: NTILE */ + + +/* 1. Calculate quartiles, quintiles, and deciles from vendor daily sales */ +SELECT * +,ntile (4) OVER (PARTITION BY vendor_name ORDER BY sales ASC) as quartile +,ntile (5) OVER (PARTITION BY vendor_name ORDER BY sales ASC) as quintile +,ntile (10) OVER (PARTITION BY vendor_name ORDER BY sales ASC) as decile +======= /* MODULE 4 */ @@ -178,6 +278,7 @@ SELECT * ,NTILE(4) OVER (PARTITION BY vendor_name ORDER BY sales ASC) as [quartile] ,NTILE(5) OVER (PARTITION BY vendor_name ORDER BY sales ASC) as [quintile] ,NTILE(100) OVER (PARTITION BY vendor_name ORDER BY sales ASC) as [percentile] +>>>>>>> eba6c4977c09aa65ee87a00aa06ac551f1891f43 FROM ( -- vendor daily sales @@ -196,12 +297,42 @@ FROM ( ON v.vendor_id = cp.vendor_id GROUP BY cp.market_date, v.vendor_id +<<<<<<< HEAD + ) x +/* MODULE 4 */ +======= ) x/* MODULE 4 */ +>>>>>>> eba6c4977c09aa65ee87a00aa06ac551f1891f43 /* String Manipulations */ /* 1. ltrim, rtrim, trim*/ SELECT +<<<<<<< HEAD +LTRIM (' ofir sivan ') as [ltrim], +RTRIM (' ofir sivan ') as [rtrim], +TRIM (' ofir sivan ') as [trim]; +/* 2. replace*/ +SELECT +replace ('Itamar', 'mar', 'mama') as zoeitsa; + +SELECT +customer_first_name, +replace (customer_first_name,'a','mama') as custumamama +FROM customer; + +/* 3. upper, lower*/ +SELECT +customer_first_name, +upper (customer_first_name) as customer_CAP +FROM customer; + +/* 4. concat with || */ +SELECT +customer_first_name, +customer_first_name || ' ' || customer_last_name as customer_fullname +FROM customer; +======= LTRIM(' THOMAS ROSENTHAL ') as [ltrim] ,RTRIM(' THOMAS ROSENTHAL ') as [rtrim] ,TRIM(' THOMAS ROSENTHAL ') as [trim] @@ -225,6 +356,7 @@ LTRIM(' THOMAS ROSENTHAL ') as [ltrim] FROM customer; +>>>>>>> eba6c4977c09aa65ee87a00aa06ac551f1891f43 /* 5. 
substr */ SELECT DISTINCT @@ -233,7 +365,36 @@ customer_last_name ,SUBSTR(customer_last_name,4,2) as [4,2] ,SUBSTR(customer_last_name, -5,4) as [count_right] -- counting from the right +SELECT DISTINCT +customer_last_name +,substr(customer_last_name,4) as [4] --any length from the 4th charachter +,substr(customer_last_name,4,2) as [4,2] -- length of 2 from the 4th charachter +,substr(customer_last_name,-5,4) as [-5,4] -- going fiv back from the end and count 4 +FROM customer; + +--INSERT +--we use with substr + + /* 6. length */ +<<<<<<< HEAD +SELECT DISTINCT +customer_last_name, +length (customer_last_name) as len + +FROM customer; + +/* 7. unicode, char */ +SELECT +unicode ('o'); + +/* 8. REGEXP in a WHERE statement */ +SELECT DISTINCT +customer_last_name +FROM customer +WHERE customer_last_name REGEXP'(a)$' ---filtering customer_last_name ending with a +/* MODULE 4 */ +======= ,LENGTH(customer_last_name) as len /* 7. unicode, char */ @@ -253,6 +414,7 @@ FROM customer WHERE customer_last_name REGEXP '(a)$' -- filtering to cust last name ending in a /* MODULE 4 */ +>>>>>>> eba6c4977c09aa65ee87a00aa06ac551f1891f43 /* Substring & instring together */ @@ -272,4 +434,125 @@ SELECT INSTR('FirstWord, SecondWord, ThirdWord',',')+1)) ,',') + INSTR('FirstWord, SecondWord, ThirdWord',',')+1) AS ThirdDelim +<<<<<<< HEAD +/* MODULE 4 */ +/* UNION */ + +/* 1. Find the most and least expensive product by vendor with UNION (and row_number!) 
*/ +SELECT +vendor_id, product_id, original_price, rn_max as [row_number], 'max price' as type +FROM ( + + SELECT + vendor_id, + product_id, + original_price, + row_number () OVER (PARTITION BY vendor_id ORDER BY original_price DESC) as rn_max + FROM vendor_inventory + ) +WHERE rn_max = 1 + +UNION ALL + +SELECT +vendor_id, product_id, original_price, rn_min, 'min price' as type +FROM ( + + SELECT + vendor_id, + product_id, + original_price, + row_number () OVER (PARTITION BY vendor_id ORDER BY original_price ASC) as rn_min + FROM vendor_inventory + ) +WHERE rn_min = 1;/* MODULE 4 */ +/* UNION */ + +/* 1. Emulate a FULL OUTER JOIN with a UNION */ +DROP TABLE IF EXISTS temp.store1; +CREATE TEMP TABLE IF NOT EXISTS temp.store1 +( +costume TEXT, +quantity INT +); + +INSERT INTO temp.store1 +VALUES("tiger",6), + ("elephant",2), + ("princess", 4); + + +DROP TABLE IF EXISTS temp.store2; +CREATE TEMP TABLE IF NOT EXISTS temp.store2 +( +costume TEXT, +quantity INT +); + +INSERT INTO temp.store2 +VALUES("tiger",2), + ("dancer",7), + ("superhero", 5); + +SELECT s1.costume, s1.quantity as store1_quantity, s2.costume, s2.quantity as store2_quantity +FROM store1 as s1 +LEFT JOIN store2 as s2 + ON s1.costume = s2.costume + +UNION ALL + +-- both branches must return the same number of columns; s1.costume is NULL here +SELECT s1.costume, s1.quantity, s2.costume, s2.quantity + +FROM store2 as s2 +LEFT JOIN store1 as s1 + ON s1.costume = s2.costume +WHERE s1.costume is NULL; /* MODULE 4 */ +/* INTERSECT & EXCEPT */ + +/* 1. Find products that have been sold (e.g. are in customer purchases AND product) */ + +SELECT product_id +from customer_purchases +INTERSECT +select product_id +from product; + +/* 2. Find products that have NOT been sold (e.g. are NOT in customer purchases even though in product) */ +SELECT x.product_id, product_name +FROM( + select product_id + from product + except + SELECT product_id + FROM customer_purchases +)as x + +JOIN product as p + on x.product_id=p.product_id; + +/* 3. Directions matter... 
if we switch the order here: +products that do not exist, because no products purchased are NOT in the product table (e.g. are NOT in product even though in customer purchases)*/ + +-- returning 0 rows +select product_id +from customer_purchases +except +SELECT product_id +FROM product + +/* 4. We can remake the intersect with a WHERE subquery for more details ... */ + +select* +FROM product +WHERE product_id IN + ( + select product_id + from customer_purchases + INTERSECT + SELECT product_id + FROM product + ); +-- Reference to file "C:/Users/ofirs/OneDrive/Desktop/DSI PRACTICE/sql/02_activities/assignments/DC_Cohort/assignment2.sql" (not supported by this version) ---- Reference to file "C:/Users/ofirs/OneDrive/Desktop/DSI PRACTICE/sql/02_activities/assignments/DC_Cohort/assignment1.sql" (not supported by this version) -- +======= -- Reference to file "/Users/thomas/Documents/GitHub/02-intro_sql/04_this_cohort/live_code/DC/module_4/UNION_UNION_ALL.sql" (not supported by this version) ---- Reference to file "/Users/thomas/Documents/GitHub/02-intro_sql/04_this_cohort/live_code/DC/module_4/FULL_OUTER_JOIN_WITH_UNION.sql" (not supported by this version) ---- Reference to file "/Users/thomas/Documents/GitHub/02-intro_sql/04_this_cohort/live_code/DC/module_4/INTERSECT_EXCEPT.sql" (not supported by this version) -- +>>>>>>> eba6c4977c09aa65ee87a00aa06ac551f1891f43 diff --git a/04_this_cohort/live_code/DC/module_5/module_5.sqbpro b/04_this_cohort/live_code/DC/module_5/module_5.sqbpro index 0bb5b8cc6..46259ea2a 100644 --- a/04_this_cohort/live_code/DC/module_5/module_5.sqbpro +++ b/04_this_cohort/live_code/DC/module_5/module_5.sqbpro @@ -1 +1,168 @@ +<<<<<<< HEAD +
/* MODULE 5 */ +/* INSERT UPDATE DELETE */ + + +DROP TABLE IF EXISTS temp.product_expanded; +CREATE TEMP TABLE product_expanded AS + SELECT * FROM product; + +--SELECT * FROM product_expanded + +/* 1. add a product to the temp table */ + +INSERT INTO product_expanded +VALUES (24, 'almondes', '1 lb',3, 'lbs'); + +select* +from product_expanded; + +/* 2. change the product_size for THAT product */ +UPDATE product_expanded +SET product_size = '0.5KG', product_qty_type = 'KG' +WHERE product_id = 24; + +select* +from product_expanded; +/* 3. delete the newly added product */ + +SELECT* --- GETING TO KNOW WHAT TWE DELETE BEFORE WE DELETE +FROM product_expanded +WHERE product_id =24; + +DELETE FROM product_expanded +WHERE product_id =24; + +select* +from product_expanded; /* MODULE 5 */ +/* VIEW */ + +/* 1. Create a vendor daily sales view */ + + SELECT + md.market_date + ,market_day + ,market_week + ,market_year + ,vendor_name + ,SUM(quantity*cost_to_customer_per_qty) as sales + + + FROM market_date_info md + INNER JOIN customer_purchases cp + ON md.market_date = cp.market_date + INNER JOIN vendor v + ON cp.vendor_id = v.vendor_id + + GROUP BY cp.market_date, v.vendor_id; + +/* MODULE 5 */ +/* VIEW in another query */ + +/* 1. Transform the daily sales view into a sales by vendor per week result */ + + + +/* MODULE 5 */ +/* UPDATE statements for view */ + + +/* 1. SET market_date equal to today for new_customer_purchases */ + + + + +/* 2. Add today's info to the market_date_info + +we need to add +1. today's date +2. today's day +3. today's week number +4. 
today's year + +INSERT INTO market_date_info +VALUES('....','....','....','....','8:00 AM','2:00 PM','nothing interesting','Summer','25','28',0,0); + +*/ + +/* MODULE 5 */ +/* DYNAMIC VIEW */ + + + + + + + + + +/* spoilers below */ + + + + + + + + + + + + + + + + + +-- THIS ONLY WORKS IF YOU HAVE DONE THE PROPER STEPS FOR IMPORTING +-- 1) update new_customer_purchases to today +-- 2) add the union +-- 3) add the where statement +-- 4) update the market_date_info to include today + + + + +/* MODULE 5 */ +/* CROSS JOIN */ + + +/* 1. CROSS JOIN sizes with product*/ + +DROP TABLE IF EXISTS TEMP.sizes; +CREATE TEMP TABLE IF NOT EXISTS TEMP.sizes (size TEXT); + +INSERT INTO TEMP.sizes +VALUES('small'), +('medium'), +('large'); + +SELECT * FROM TEMP.sizes; + + + +/* MODULE 5 */ +/* SELF JOIN */ + + +/* 1. Create a self-joining hierarchy */ + +DROP TABLE IF EXISTS TEMP.employees; +CREATE TEMP TABLE TEMP.employees +( +emp_id INT +,emp_name text +,mgr_id INT +); + +INSERT INTO TEMP.employees +VALUES(1,'Thomas',3) +,(2,'Niyaz', 4) +,(3,'Rohan', NULL) +,(4, 'Jennie',3); + +SELECT * FROM TEMP.employees; +-- Reference to file "C:/Users/ofirs/OneDrive/Desktop/DSI PRACTICE/sql/02_activities/assignments/DC_Cohort/assignment2.sql" (not supported by this version) ---- Reference to file "C:/Users/ofirs/OneDrive/Desktop/DSI PRACTICE/sql/02_activities/assignments/DC_Cohort/assignment1.sql" (not supported by this version) --
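The self-joining hierarchy above can be replayed end to end with Python's `sqlite3` module. This is a sketch (not part of the course files) that reuses the same employees data and resolves each employee's manager name by joining the table to its own alias on `mgr_id`.

```python
import sqlite3

# Same sample hierarchy as the module's TEMP.employees table.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employees (emp_id INT, emp_name TEXT, mgr_id INT);
    INSERT INTO employees VALUES
        (1, 'Thomas', 3), (2, 'Niyaz', 4), (3, 'Rohan', NULL), (4, 'Jennie', 3);
    """
)
rows = conn.execute(
    """
    SELECT e.emp_name, IFNULL(m.emp_name, 'no manager') AS manager
    FROM employees AS e
    LEFT JOIN employees AS m       -- LEFT JOIN keeps the top of the hierarchy
      ON e.mgr_id = m.emp_id
    ORDER BY e.emp_id
    """
).fetchall()
print(rows)
# [('Thomas', 'Rohan'), ('Niyaz', 'Jennie'), ('Rohan', 'no manager'), ('Jennie', 'Rohan')]
conn.close()
```

An INNER JOIN would silently drop Rohan (the root, whose `mgr_id` is NULL), which is why the LEFT JOIN matters for hierarchies.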
+=======
-- Reference to file "/Users/thomas/Documents/GitHub/02-intro_sql/04_this_cohort/live_code/DC/module_5/INSERT_UPDATE_DELETE.sql" (not supported by this version) ---- Reference to file "/Users/thomas/Documents/GitHub/02-intro_sql/04_this_cohort/live_code/DC/module_5/FIRST_VIEW.sql" (not supported by this version) ---- Reference to file "/Users/thomas/Documents/GitHub/02-intro_sql/04_this_cohort/live_code/DC/module_5/VIEW_IN_A_QUERY.sql" (not supported by this version) ---- Reference to file "/Users/thomas/Documents/GitHub/02-intro_sql/04_this_cohort/live_code/DC/module_5/UPDATE_DYNAMIC_VIEW.sql" (not supported by this version) ---- Reference to file "/Users/thomas/Documents/GitHub/02-intro_sql/04_this_cohort/live_code/DC/module_5/DYNAMIC_VIEW.sql" (not supported by this version) ---- Reference to file "/Users/thomas/Documents/GitHub/02-intro_sql/04_this_cohort/live_code/DC/module_5/CROSS_JOIN.sql" (not supported by this version) ---- Reference to file "/Users/thomas/Documents/GitHub/02-intro_sql/04_this_cohort/live_code/DC/module_5/SELF_JOIN.sql" (not supported by this version) --
+>>>>>>> eba6c4977c09aa65ee87a00aa06ac551f1891f43 diff --git a/05_src/sql/farmersmarket.db b/05_src/sql/farmersmarket.db index 4720f2483..ff67109aa 100644 Binary files a/05_src/sql/farmersmarket.db and b/05_src/sql/farmersmarket.db differ
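Module 4's `row_rank_dense` exercise contrasts ROW_NUMBER, RANK, and DENSE_RANK; the difference only shows up on ties. The sketch below reproduces that contrast with Python's `sqlite3` module; the salary values here are invented to force a tie (the course table's data differs), and window functions require SQLite 3.25 or newer.

```python
import sqlite3

# Four rows with a deliberate tie at salary 150.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE row_rank_dense (id INT, salary INT);
    INSERT INTO row_rank_dense VALUES (1, 200), (2, 150), (3, 150), (4, 100);
    """
)
rows = conn.execute(
    """
    SELECT salary,
           ROW_NUMBER() OVER (ORDER BY salary DESC) AS rn,
           RANK()       OVER (ORDER BY salary DESC) AS rnk,
           DENSE_RANK() OVER (ORDER BY salary DESC) AS drnk
    FROM row_rank_dense
    ORDER BY salary DESC
    """
).fetchall()
for r in rows:
    print(r)
# On the 150/150 tie: ROW_NUMBER keeps incrementing (2 then 3), RANK repeats
# 2 and then skips to 4, DENSE_RANK repeats 2 and continues with 3.
conn.close()
```

Which function to use depends on whether ties should share a rank (RANK/DENSE_RANK) or be broken arbitrarily (ROW_NUMBER), and on whether gaps after ties are acceptable.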