Finding User Sessions
The concept of a “user session” is central to many product’s analytics. The number of actions someone takes, how long someone is spending in the product, what actions lead to others. It’s all based around the concept of “user sessions”.
But what is a user session?
The answer depends. Some applications will define the end of a user session as the application being closed. Chat apps may have a user opening and closing the application several times but still have it all be in one user session. Lack of actions may define user sessions, even if the application is technically open the whole time.
Some tools will automatically define a user session for you, but there are cases that you’ll want to use raw event data to define your user sessions.
What is a user session?
A period of uninterrupted activity from a single user.
This may seem like a straightforward definition, but it actually has a lot of variables. What is a user? Is it someone logged in, anyone on a page?
What is uninterrupted activity? If someone has 5 minutes between activity, is that a new session? 10 minutes?If someone is inactive for 5 minutes, and then goes to a new page is that different than if they take action on the same page?
Answers differ from product to product, and may differ for the same product depending on what answers you are looking for. That’s why there is benefit in being able to define a user session manually, and have the logic of that easily exposed.
Define a User Session
We are going to define our users based on a unique identifier, in this case their email. Using Sigma parameters, we’ll set up a variable to determine the time between actions that will count as a new sessions.
Create a new parameter. Name it [Minimum Time Between Sessions (m)] and set the value at 10. You'll be able to easily update this number at any time.
Find the start of a user session
To do this analysis, your data needs two things: a unique user identifier, and time stamps for actions.
We’ll be looking at the gaps between actions for each user. When the gap is more than what we set as the minimum time between sessions, the next set of actions will count as a new session.
First, set up the data for analysis.
- Organize your data based on user. Create a new grouping defined by your unique user identifier, like user number or email.
- Organize the Timestamp data. Sort [Timestamp] ascending, which will put the first user events at the top of the list, and the most recent events at the bottom.
Now that our data is organized, we can start analyzing it.
Calculate the Time Between Each of the Events
Create a new Column to calculate the time between events. Call this [Time Between Events (m)]. Using the function DateDiff and the Window Function Lag we can calculate the time between the current event and the even right before.
DateDiff(“minute”, Lag([Timestamp]), [Timestamp])
Minute indicates we want the difference between the two dates to be reported in minutes. Lag([Timestamp]) puts the previous row’s Timestamp as the first date in the DateDiff function. [Timestamp] puts the current row’s timestamp as the ending date in the DateDiff function. This will give us the time in minutes between the current row’s timestamp and the previous row’s timestamp. The value for the first row of every group will be null because the Lag function is constrained to the windows defined in the User grouping.
Identify the Start of Each Session
Create a new column named [Start Session]. We will use the If function to determine which values to populated the new column with.
First we set up the logic that defines what counts as a new session. We’ll use our parameter [Minimum Time Between Sessions (m)] in the logic to determine if a timestamp is the beginning of a session. If the time between events is more than [Minimum Time Between Sessions (m)], we count that timestamp as the first of a new session.
The way we calculated the time between sessions means that the very first value for “time between sessions” will be null for each group. If the value is null, we print the value for [Timestamp].
If([Time Between Events (m)] > [Minimum Time Between Sessions (m)], [Timestamp], IsNull([Time Between Events (m)]) = True, [Timestamp])
This function will create a column where there is a value contained in every row that has a Timestamp that is the beginning of a user sessions, and a null value in all the other rows.
Map Each Event to Its Session
Now that we have a column with the timestamp values that start each session, we need to map the values to every event in the session.
Using FillDown we can easily print the the start of session time to every event in the user session. FillDown replaces a column’s null values with the closest prior non-null value. Create a new column [Start of Session].
You can now group by [Start of Session] to organize your data into user sessions.
To assign each session a session number, use the RowNumber function. Create a new column [User Session Number] on the level that is grouped by [Start of Session].
RowNumber([Start of Session])
Calculate Average Session Length
Now that we have the start time of each session, we can calculate several different metrics.
To calculate the length of a session, we will find the difference between the first and the last timestamp in each session. Create a new column [Length of Sessions (s)] under the grouping defined by [Start of Session]. We’ll calculate the length in seconds.
DateDiff(“second”, [Start of Session], Max([Timestamp]))
Now that we know the length of each session, we can aggregate the [Length of Sessions (s)] column to find the average length for each user. Create a new column under the grouping defined by [User Email].
Avg([Length of Session (s)])
Once you create this Workbook table, you can build all types of analysis on top of it. You can create aggregates to count types of actions in a session. You can create cohorts of users based on the number of user sessions. Plus answers to any other questions you have.