arthurgaillard/Churn_prediction

The dataset contains 19 columns of different types. It covers connection information for Spotify customers, along with their behavior while using the platform.

• status (int64) Records the HTTP response code returned for each user request in the session logs. Main values are:

  • 200: request successfully processed,
  • 307: temporary redirect, often related to authentication or routing,
  • 404: resource not found, indicating failed or incorrect requests.

• gender (object) Gender of the customer.

• firstName (object) Customer’s first name.

• level (object) Subscription type of the customer, taking two values: paid or free.

• lastName (object) Customer’s last name.

• userId (object) Unique identifier of the customer on the app. One of the two columns required in the submission file.

• ts (int64) Timestamp (in milliseconds) corresponding to the moment when the user action occurred. It can be converted to a datetime for clearer interpretation.

• auth (object) Indicates the authentication status for each request. Takes two values:

  • Logged In: request performed by an authenticated user,
  • Cancelled: authentication attempt not completed or interrupted. This does not refer to subscription cancellation. Frequent Cancelled values may reflect login issues and could be linked to higher churn risk.

• page (object) Type of action the user performed on the platform. There are 19 possible values. Notable values:

  • Cancellation Confirmation: the user cancelled their subscription.
  • Downgrade: the user accessed or viewed the page where they can switch from a paid subscription to a free plan. This often reflects a decrease in engagement or satisfaction with the service. Users who navigate to this page are more likely to discontinue their subscription in the near future, which makes it an important behavioral signal when predicting churn.
  • Thumbs Down: the user gave negative feedback on a song or piece of content by clicking the thumbs-down button. This typically indicates dissatisfaction with what is being played. A higher frequency of such events may suggest frustration or disengagement, making it a valuable behavioral signal in user-activity analysis.

• sessionId (int64) Unique identifier of the user’s session.

• location (object) Address from which the user connected, encoded as “City, ST” (US state abbreviation).

• itemInSession (int64) Position of the event within the user’s current session. Acts as a counter increasing with each user action. Useful for understanding interaction sequences and session structure.

• userAgent (object) Full HTTP user agent string describing the client software used to access Spotify. Includes browser, OS, and sometimes device type. Helps analyze usage patterns across platforms.

• method (object) HTTP request method, with only two values in the dataset. The balance between them may reflect user engagement:

  • GET: retrieve information from the server,
  • PUT: update information on the server.

• length (float64) Duration (in seconds) of the song or audio content for the event. Null values correspond to actions not involving music playback. These missing values are meaningful and indicative of non-listening interactions.

• song (object) Name of the song played. Null values indicate no song was played during the event.

• artist (object) Artist associated with the song. Null values also reflect non-listening interactions.

• time (datetime64[us]) Date and time at which the user connected to the session.

• registration (datetime64[us]) Account creation date and time. Useful for computing account age, an important predictor of churn, as newer users tend to churn more frequently.
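The column descriptions above suggest a few per-user behavioral features: negative-feedback counts, visits to the downgrade page, listening time (rows with a non-null length), and account age. The following is a minimal sketch in pandas, not the repository's actual pipeline. The helper name `build_user_features`, the feature names, the page value "NextSong", and all sample values are assumptions made for illustration.

```python
import pandas as pd

# Tiny synthetic log using the column names described above.
# All values are made up; "NextSong" is an assumed page value.
events = pd.DataFrame({
    "userId": ["1", "1", "1", "2", "2"],
    "page": ["NextSong", "Thumbs Down", "Downgrade", "NextSong", "NextSong"],
    "length": [200.0, None, None, 180.0, 240.0],
    "time": pd.to_datetime([
        "2018-10-01 10:00", "2018-10-01 10:04", "2018-10-02 09:00",
        "2018-10-03 12:00", "2018-10-05 12:00",
    ]),
    "registration": pd.to_datetime([
        "2018-09-01", "2018-09-01", "2018-09-01", "2018-06-05", "2018-06-05",
    ]),
})


def build_user_features(df: pd.DataFrame) -> pd.DataFrame:
    """Aggregate event-level rows into one row per user (hypothetical helper)."""
    df = df.assign(
        # A non-null length marks a listening event.
        is_listening=df["length"].notna(),
        is_thumbs_down=df["page"].eq("Thumbs Down"),
        is_downgrade=df["page"].eq("Downgrade"),
    )
    features = df.groupby("userId").agg(
        n_events=("page", "size"),
        n_songs=("is_listening", "sum"),
        listening_seconds=("length", "sum"),  # NaN lengths are skipped
        n_thumbs_down=("is_thumbs_down", "sum"),
        n_downgrade_views=("is_downgrade", "sum"),
        last_seen=("time", "max"),
        registration=("registration", "min"),
    )
    # Account age measured at the user's last observed event.
    features["account_age_days"] = (
        features["last_seen"] - features["registration"]
    ).dt.days
    return features


user_features = build_user_features(events)
print(user_features)
```

Measuring account age at the last observed event is only one possible choice; a fixed reference date for all users would work as well.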

Summary of column types:

  • 12 categorical columns (object)
  • 5 numerical columns (int or float)
  • 2 datetime columns
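A type summary like the one above can be checked directly against a loaded DataFrame. The sketch below classifies each column as datetime, numerical, or categorical. The helper `column_kind` and the four-column sample frame are hypothetical stand-ins for the full dataset.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype, is_numeric_dtype

# Small stand-in frame; the real dataset has more columns.
sample = pd.DataFrame({
    "status": [200, 307],
    "length": [215.3, None],
    "page": ["Home", "Downgrade"],
    "time": pd.to_datetime(["2018-10-01 10:00:00", "2018-10-01 10:05:00"]),
})


def column_kind(series: pd.Series) -> str:
    """Classify a column by dtype (hypothetical helper)."""
    if is_datetime64_any_dtype(series):
        return "datetime"
    if is_numeric_dtype(series):
        return "numerical"
    # Everything else (text columns) is treated as categorical.
    return "categorical"


kinds = {name: column_kind(sample[name]) for name in sample.columns}
type_counts = pd.Series(kinds).value_counts().to_dict()
print(type_counts)
```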

Note: the column ts should likely be converted to datetime as well.
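Since ts is stored as milliseconds, the conversion is a one-liner in pandas. This is a minimal sketch with made-up timestamps. The output column name `ts_datetime` is an assumption, and the result is a timezone-naive datetime interpreted as UTC.

```python
import pandas as pd

# Two made-up epoch timestamps in milliseconds, one minute apart.
logs = pd.DataFrame({"ts": [1538352000000, 1538352060000]})

# unit="ms" tells pandas the integers count milliseconds since the Unix epoch.
logs["ts_datetime"] = pd.to_datetime(logs["ts"], unit="ms")
print(logs)
```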

In this dataset, the target variable is not provided directly and must be constructed from user behavior. A user is considered to have churned if they have visited the "Cancellation Confirmation" page at least once. This event corresponds to the explicit action of cancelling their subscription. We therefore define the target variable churn at the user level: it takes the value 1 for any user who has a "Cancellation Confirmation" event in their history, and 0 otherwise. Since each user appears on multiple rows in the log data, the churn label is assigned consistently to all the records associated with that user. This ensures that the target reflects user-level churn rather than individual event-level behavior.
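The labeling rule described here can be sketched in pandas as follows. This is one possible implementation, not necessarily the one used in the repository. The sample rows and the page value "Home" are made up for illustration.

```python
import pandas as pd

# Made-up event log: user "1" cancels, users "2" and "3" do not.
events = pd.DataFrame({
    "userId": ["1", "1", "2", "2", "3"],
    "page": ["Home", "Cancellation Confirmation", "Home", "Thumbs Down", "Downgrade"],
})

# Flag the rows that are cancellation events.
is_cancel = events["page"].eq("Cancellation Confirmation")

# Broadcast the per-user maximum back to every row of that user,
# so all records of a churned user carry churn = 1.
events["churn"] = is_cancel.groupby(events["userId"]).transform("max").astype(int)

# One label per user, e.g. for building a user-level training table.
user_churn = events.groupby("userId")["churn"].max()
print(user_churn)
```

Using `transform` keeps the label aligned with the original event rows, while the final `groupby` gives the one-row-per-user view.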
