6 changes: 5 additions & 1 deletion 02_activities/assignments/DC_Cohort/Assignment1.md
@@ -205,5 +205,9 @@ Consider, for example, concepts of fairness, inequality, social structures, marg


```
Your thoughts...
The article was published in 'Ideas' in November 2021 and authored by Rida Qadri, then a PhD candidate in MIT's urban information systems program. In it, Qadri shares the story of Riz, a Pakistani person who was excluded from the National Database and Registration Authority (NADRA) because their parents' marriages did not meet the rigid criteria of the new digitalized identification system. Being unregistrable with NADRA cut Riz off from social and welfare services and significantly constrained their freedom of movement. As I read it, the article's main claim is that social hierarchies, social perspectives, and political power relationships are embedded in data systems. This claim echoes scholars in Science and Technology Studies (e.g., Bruno Latour, Donna Haraway) and the philosophy of technology (e.g., Andrew Feenberg) who argued decades ago that technologies, and specifically information and communication technologies, are sociotechnical objects.

While I agree with the main argument, in this comment I will problematize some of the article's assumptions and then add some insights from my practice and research as a social worker and community organizer, focusing on the interplay between welfare/community practice and information technology. First, I would like to challenge the contrast between NADRA's lineage/genealogical system design and systems that identify individuals by organizing and processing biometric data (i.e., 'unique' physical, behavioural, and biological characteristics). While biometric data is often seen as a more objective and reliable basis for identification, it is imperative to remember that, as sociotechnical objects, data systems (like any other artifacts) can be used, and abused, for purposes that deviate from the original one. Thus, while biometric systems automate and streamline identification, they have also been used to profile and surveil marginalized groups, including poor and racialized individuals and communities. This brings me to my second argument. Like Qadri, I support the call for a more reflexive (critical) approach to the politics and ethics embedded in data system design. However, it is crucial to remember that how we structure the data (through schemas, permitted data types, and even the practice of surveys/questionnaires in social science) is only one side of the story. The other side is how we interpret the data produced by, and mined or retrieved from, these systems and databases. In her book Automating Inequality, Virginia Eubanks (2018) traces how these two practices, structuring data systems and interpreting their outputs, are increasingly used to profile and punish the poor in the U.S. (e.g., via decision algorithms in the child welfare system and in determining eligibility for allowances and social support).

The freedom to interpret, however, can also work in the other direction and prepare the ground for new social demands. In this regard, the example of Pakistan's Khawaja Sira community is instructive: it shows that databases and information technologies can become a site for political demands and social change (even if imperfect ones). To motivate this change, it is essential to remember that we still have spaces (courts, streets, universities, community centers) that we should foster and care for together to make these struggles for data justice, ownership, and sovereignty possible and effective.
```
7 changes: 5 additions & 2 deletions 02_activities/assignments/DC_Cohort/Assignment2.md
@@ -54,7 +54,10 @@ The store wants to keep customer addresses. Propose two architectures for the CU
**HINT:** search type 1 vs type 2 slowly changing dimensions.

```
Your answer...
CUSTOMER_ADDRESS (type 1) → overwrite changes in the customer's address record; no history is retained
columns: customer_id; customer_first_name; customer_last_name; customer_current_address
CUSTOMER_ADDRESS (type 2) → do not overwrite changes in the customer's address record; address history is retained by adding two more columns
columns: customer_id; customer_first_name; customer_last_name; customer_address; address_start_date; address_end_date
```
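
For concreteness, here is a minimal sketch of the two designs in SQLite DDL. The table names `customer_address_type1`/`customer_address_type2`, the column types, the sample values, and the convention that a NULL `address_end_date` marks the current address are my own assumptions for illustration, not part of the course schema.

```
-- Type 1 (overwrite in place): one row per customer, no history
CREATE TABLE customer_address_type1 (
	customer_id INTEGER PRIMARY KEY,
	customer_first_name TEXT,
	customer_last_name TEXT,
	customer_current_address TEXT
);

-- Type 2 (versioned rows): one row per address period, history retained
CREATE TABLE customer_address_type2 (
	customer_id INTEGER,
	customer_first_name TEXT,
	customer_last_name TEXT,
	customer_address TEXT,
	address_start_date TEXT,
	address_end_date TEXT -- NULL marks the current address (assumed convention)
);

-- A type 1 address change is a plain UPDATE; the old value is lost:
UPDATE customer_address_type1
SET customer_current_address = '99 New Street'
WHERE customer_id = 1;

-- A type 2 address change closes the current row and inserts a new one:
UPDATE customer_address_type2
SET address_end_date = '2024-01-31'
WHERE customer_id = 1 AND address_end_date IS NULL;

INSERT INTO customer_address_type2
	(customer_id, customer_first_name, customer_last_name,
	 customer_address, address_start_date, address_end_date)
VALUES (1, 'Jane', 'Doe', '99 New Street', '2024-02-01', NULL);
```

The trade-off: type 1 keeps the table small and updates trivial at the cost of history, while type 2 can answer "where did this customer live in March?" at the cost of repeated name columns and a two-step update.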

***
@@ -183,5 +186,5 @@ Consider, for example, concepts of labour, bias, LLM proliferation, moderating c


```
Your thoughts...
Recently, in an advanced qualitative methods class I took, my colleagues and I had a very engaging discussion about the new AI features inserted into qualitative analysis programs such as NVivo and Dedoose. These features are supposed to ease the tedious line-by-line coding required by methods such as Grounded Theory, save time in at least the initial coding stage, and help research teams "do more!", that is, analyze a larger volume of qualitative data excerpts in less time. Reading Boykis's article, I kept thinking about how easily we give up skills we have worked so long to develop. For decades, qualitative researchers have developed strategies to make qualitative inquiry more rigorous, through methods such as collaborative coding and member checking, while maintaining the unique perspectives of the researcher and research participants and accounting for the specific context in which data were collected and generated.

Following Boykis's move in the article, I was curious to understand how these qualitative software packages and their new AI features actually work. Visiting the NVivo website, I discovered that NVivo 15's Lumivero AI Assistant automates the coding process in three stages: (1) learning the researcher's coding patterns from a body of excerpts the researcher has already coded; (2) sending excerpts to a third-party service such as Gemini or OpenAI's GPT for processing (with a commitment that interview data is not used to train the AI model); and (3) generating predictions for coding subsequent qualitative excerpts (e.g., interviews, policy documents, focus groups).

As someone with extensive experience in qualitative research, I could not help but think about the consequences of streamlining coding this way. Qualitative research was developed to understand local, context-dependent social processes that cannot be "captured" by quantitative methods and statistics. Statistical inference requires a sample with a limited amount of variance to draw statistically significant results (the variance should be above zero but not too large, depending on the sample size) and then generalizes to an imagined homogeneous group (i.e., a population). In contrast, qualitative research values diverse and contradictory perspectives within the same sample; it is then the researcher's work, drawing on social theory and other materials, to interpret this diversity of perspectives and experiences, to show how they can dwell in the same context, and to illuminate and represent the experiences of those who are absent. In this regard, I am afraid that the mechanism used by NVivo 15 will quantify qualitative research, by decontextualizing data and comparing it to an "average" perspective/experience produced by Gemini/ChatGPT, and in doing so will undermine the goal and rationale that brought us to develop qualitative research in the first place.

It is worth emphasizing that I have nothing against quantitative research and often use it to answer specific questions. However, quantitative methodology cannot answer all questions. It cannot, for example, adequately capture the diverse perspectives and experiences of marginalized individuals and communities. It also fails to represent both the depth and breadth of viewpoints that are often overlooked due to publication/dissemination bias (i.e., the tendency for studies with positive or statistically significant results to be published more often than those with negative, null, or inconclusive findings). Thus, the motivation for incorporating these AI tools seems to be not just to improve qualitative analysis. Here it is worth returning to the "doing more" celebrated in one of NVivo 15's AI-feature promotions. In an ultra-productionist academic culture, the social sciences are immersed in the imperative to "do more" (produce more articles, based on larger amounts of data), which becomes the sole criterion for evaluating both researchers and research. I am afraid that an unreflexive adoption of these "innovative" features in qualitative research will undermine our ability to learn from neglected perspectives and will push qualitative research to reproduce and echo the "average perspective", which is, unfortunately, biased towards those already extensively represented in the public digital sphere.
```
119 changes: 113 additions & 6 deletions 02_activities/assignments/DC_Cohort/assignment1.sql
@@ -5,49 +5,126 @@
--SELECT
/* 1. Write a query that returns everything in the customer table. */


SELECT *
FROM customer;

/* 2. Write a query that displays all of the columns and 10 rows from the customer table,
sorted by customer_last_name, then customer_first_name. */

SELECT *
FROM customer
ORDER BY customer_last_name, customer_first_name
LIMIT 10;


--WHERE
/* 1. Write a query that returns all customer purchases of product IDs 4 and 9. */

SELECT *
FROM customer_purchases
WHERE product_id = 4
OR product_id = 9;
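
-- Note: an equivalent form (a stylistic alternative, not required by the
-- assignment) collapses the two OR conditions into a single IN list:
SELECT *
FROM customer_purchases
WHERE product_id IN (4, 9);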


/*2. Write a query that returns all customer purchases and a new calculated column 'price' (quantity * cost_to_customer_per_qty),
filtered by customer IDs between 8 and 10 (inclusive) using either:
1. two conditions using AND
2. one condition using BETWEEN
*/
-- option 1

SELECT
quantity,
cost_to_customer_per_qty,
quantity * cost_to_customer_per_qty AS price,
product_id,
market_date,
vendor_id,
customer_id,
transaction_time

FROM customer_purchases

WHERE customer_id >= 8
AND customer_id <= 10;

-- option 2

SELECT
quantity,
cost_to_customer_per_qty,
quantity * cost_to_customer_per_qty AS price,
product_id,
market_date,
vendor_id,
customer_id,
transaction_time

FROM customer_purchases

WHERE customer_id BETWEEN 8 AND 10;

--CASE
/* 1. Products can be sold by the individual unit or by bulk measures like lbs. or oz.
Using the product table, write a query that outputs the product_id and product_name
columns and add a column called prod_qty_type_condensed that displays the word “unit”
if the product_qty_type is “unit,” and otherwise displays the word “bulk.” */
SELECT
product_id,
product_name,
product_qty_type

,CASE
WHEN product_qty_type = 'unit' THEN 'unit'
ELSE 'bulk'
END as prod_qty_type_condensed

FROM product;

/* 2. We want to flag all of the different types of pepper products that are sold at the market.
Add a column to the previous query called pepper_flag that outputs a 1 if the product_name
contains the word “pepper” (regardless of capitalization), and otherwise outputs 0. */

SELECT
product_id,
product_name,
product_qty_type

,CASE
WHEN product_qty_type = 'unit' THEN 'unit'
ELSE 'bulk'
END as prod_qty_type_condensed

,CASE
WHEN product_name LIKE '%pepper%' THEN 1 -- SQLite's LIKE is case-insensitive for ASCII by default
ELSE 0
END as pepper_flag

FROM product;
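
-- A more portable sketch: in engines where LIKE is case-sensitive (an
-- assumption beyond this SQLite assignment), lowercasing the column first
-- achieves the same case-insensitive flag:
SELECT
product_id,
product_name

,CASE
WHEN LOWER(product_name) LIKE '%pepper%' THEN 1
ELSE 0
END as pepper_flag

FROM product;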

--JOIN
/* 1. Write a query that INNER JOINs the vendor table to the vendor_booth_assignments table on the
vendor_id field they both have in common, and sorts the result by vendor_name, then market_date. */

SELECT *
FROM vendor as v
INNER JOIN vendor_booth_assignments as vba
ON v.vendor_id = vba.vendor_id

ORDER BY vendor_name, market_date;


/* SECTION 3 */
@@ -56,14 +133,30 @@ vendor_id field they both have in common, and sorts the result by vendor_name, t
/* 1. Write a query that determines how many times each vendor has rented a booth
at the farmer’s market by counting the vendor booth assignments per vendor_id. */


SELECT vendor_id, COUNT(market_date) as number_booth_renting
FROM vendor_booth_assignments
GROUP BY vendor_id;

/* 2. The Farmer’s Market Customer Appreciation Committee wants to give a bumper
sticker to everyone who has ever spent more than $2000 at the market. Write a query that generates a list
of customers for them to give stickers to, sorted by last name, then first name.

HINT: This query requires you to join two tables, use an aggregate function, and use the HAVING keyword. */
SELECT
cp.customer_id,
customer_first_name,
customer_last_name,
SUM(quantity * cost_to_customer_per_qty) as total_spend

FROM customer_purchases as cp
INNER JOIN customer as c
ON c.customer_id = cp.customer_id

GROUP BY c.customer_id

HAVING total_spend > 2000

ORDER BY customer_last_name, customer_first_name;


--Temp Table
@@ -78,9 +171,23 @@ When inserting the new vendor, you need to appropriately align the columns to be
VALUES(col1,col2,col3,col4,col5)
*/

DROP TABLE IF EXISTS temp.new_vendor;

CREATE TABLE temp.new_vendor AS

SELECT *

FROM vendor;

INSERT INTO temp.new_vendor (vendor_id, vendor_name, vendor_type, vendor_owner_first_name, vendor_owner_last_name)
VALUES (10, 'Thomass Superfood Store', 'Fresh Focused store', 'Thomas', 'Rosenthal');
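
-- A defensive variant (a sketch, assuming vendor_id values are integers and
-- that 10 could already be taken in the copied data): compute the next id
-- instead of hard-coding it.
INSERT INTO temp.new_vendor (vendor_id, vendor_name, vendor_type, vendor_owner_first_name, vendor_owner_last_name)
SELECT MAX(vendor_id) + 1, 'Thomass Superfood Store', 'Fresh Focused store', 'Thomas', 'Rosenthal'
FROM temp.new_vendor;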
-- Date [no need for this assignment]
/*1. Get the customer_id, month, and year (in separate columns) of every purchase in the customer_purchases table.

HINT: you might need to search for strftime modifiers sqlite on the web to know what the modifiers for month