[08:31:36] o/
[08:31:53] ottomata thanks for pointer!
[08:35:02] dcausse inflatador flagging this thread on naming s3 vs swift labels in the flink-app Chart template https://phabricator.wikimedia.org/T375176
[08:35:23] gmodena: thanks!
[11:22:17] lunch
[14:15:36] \o
[14:16:44] o/
[14:32:36] dcausse re https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1099727. Usually we roll out ESC changes during backport windows. Are you a mw deployer? Happy to help in case.
[14:33:19] gmodena: yes, I'll schedule that for next Monday I think, thanks for the review btw!
[14:33:58] dcausse cool! And np at all for the review. Anytime.
[14:35:21] gmodena: relatedly I opened T382065 and attached a quick patch, unsure if the approach is correct so feedback is very much welcome :)
[14:35:22] T382065: Add support for active/active double compute streams in the EventStreams HTTP service - https://phabricator.wikimedia.org/T382065
[14:37:17] dcausse ack. I'll f/up on phab.
[14:42:13] thx!
[14:53:06] errand
[15:15:23] small CR to add the new wdqs host as a dsh target (so we can do scap deploys) if anyone has a chance to look: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1102874
[15:18:19] inflatador: fwiw I think that's exactly the kind of patch where it's fine to self-+2 :)
[15:19:29] not sure i've got this right, session length graph is curious: https://phabricator.wikimedia.org/F57934899
[15:20:43] cdanis ACK, I should probably be more flexible on this kinda thing
[15:24:14] ebernhardson: weird, what's the unit, minutes?
[15:25:43] dcausse: unit is seconds, i'm suspecting it's an artifact though. If i clip the lower bound at 0.1s it then looks like https://phabricator.wikimedia.org/F57935903 which is equally curious but different
[15:26:05] well, maybe all those sessions are a single query and then leave, so the session length is basically one event? should check
[15:26:59] new sessions start at 10min after inactivity?
[15:27:19] this is last event dt - first event dt, but yes the data collection is 10 minutes, reset each time they see results
[15:28:08] oh, hmm. actually this is slightly wrong though, because i normalized and threw away checkin events, so it's sorta the session length but sorta different :S not sure if that's important
[15:28:23] it would be different numbers, i guess it's a question of whether it's meaningful to include the page dwell time in session length
[15:28:53] what i mean is the last event is probably a visitPage to the click, when there could have been a few more minutes of checkin events coming in
[15:30:21] dcausse: oh, realized you asked a slightly different question. Yes the logic is that when a session ends the user can't start a new session for 10 minutes
[15:30:42] single event sessions are 0 so the second graph does not include those I suppose?
[15:30:56] what clip does is set all values less than 0.1 to 0.1
[15:31:08] ok
[15:31:10] basically forces a minimum value
[15:32:00] did you change the bucket in the second graph as well?
[15:32:19] dcausse: oddly no, that's the same set of sessions bucketed the same
[15:32:31] weird...
[15:32:36] my best guess is something awkward happened in the bucketing algo with the zero values?
[15:32:45] but that seems weird, 0 is pretty common to handle specially
[15:34:55] hmm, yea they are basically all 1 event sessions. There are a couple sessions that managed to fit 3 events into <1s, but mostly 1 event sessions
[15:36:23] school run, back in 20
[15:41:06] maybe we should exclude 1 click sessions here and possibly draw a distribution line instead of bars? but still curious to know if there's something fishy in the data when looking at the spike at https://phabricator.wikimedia.org/F57934899
[15:43:00] the spike is around 60s just guessing visually
[16:01:06] annoyingly, when i tell it to add the kde (kernel density estimate) it complains that the input dataset should have multiple elements.
Problems of farming out the analysis to a library :P
[16:02:06] kde might have problems anyways as it assumes a continuous distribution, i guess if we drop the 1 event sessions it might become continuous though
[16:16:18] hmm, digging into this there is actually a big bump of sessions at ~30s. Doing manual processing of the data, find (35, 30] has 201 sessions, (30, 35] has 1435 sessions, (35, 40] has 146 sessions. I suppose this needs more investigation
[16:16:51] err, (25, 30] has 201
[16:18:22] 1104 of those are (30, 31]
[16:24:32] odd... unless this includes checkin events that could be emitted at fixed times IIRC?
[16:29:19] no this excludes checkin events, the max checkin value is normalized into the visitPage event, and the checkins are dropped
[16:44:17] "the max checkin value is normalized into the visitPage event" <- what does it mean? you augment the visitPage event with this max checkin value?
[16:47:07] dcausse: yes, basically .withColumn('checkin', F.max(F.col('checkin')).over(Window.partitionBy('pageViewId').rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing)))
[16:47:21] dcausse: and then drop the checkin events
[16:48:00] and you use it to extend the session length?
[16:48:31] no, that gets ignored for session length. So session length is serp/click/visitPage events only
[16:49:19] then I have no clue why so many sessions are exactly 30 sec... seems suspicious? or maybe I misunderstood something
[16:49:39] yea it does...i'm thinking maybe some sort of botting? trying to tease something out of data but not sure yet
[16:49:55] sure
[17:19:01] heading out
[17:21:58] .o/
[17:40:07] aww, maybe if i ask nice? MemoryError: Unable to allocate 12.1 TiB for an array with shape (1666653000002,) and data type float64
[17:43:38] LOL
[18:14:28] been poking at the data...i dunno :S An interesting bit, time to first click vs session length. So many clicks in <100ms: https://phabricator.wikimedia.org/F57958707
[20:57:52] err, hmm.
It turns out, after way too long not noticing this column exists, there is also a `client_dt`. The weird bump completely goes away when using client_dt. Except generally we trust our own dt and not the client :P
[20:58:13] It happens across browsers and operating systems...i dunno maybe some intrinsic part of event delivery?
[20:59:50] i suppose, particularly for calculating deltas, it might be reasonable to trust the client's clock
[21:12:30] I would think that most of the time the client's clock would be consistent, if not necessarily accurate.
[21:25:39] yea, seems reasonable
[21:26:33] other problem i noticed...i've been using means everywhere, but after plotting the distribution for session lengths noticed mean is 104, median is 12. The outliers are really throwing things around, need to think about them
[21:33:27] i guess to be fair, most of my graphs are boolean conditions and a mean is fine there :P
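
For reference, a minimal stdlib-only sketch of the session length calculation discussed above: drop checkin events, take last event dt minus first event dt per session, optionally clip to a minimum value, and count sessions in (lo, hi] buckets. This is not the actual analysis code (that ran in Spark). The event tuples, the `serp`/`click`/`visitPage`/`checkin` labels as literal strings, the function names, and the sample timestamps are all assumptions made up for illustration.

```python
from collections import Counter, defaultdict
from datetime import datetime
from math import ceil
from statistics import mean, median

# Hypothetical sample events: (session_id, event_type, dt). Not real data.
EVENTS = [
    ("a", "serp", "2024-12-11T10:00:00"),
    ("b", "serp", "2024-12-11T10:00:00"),
    ("b", "click", "2024-12-11T10:00:12"),
    ("b", "visitPage", "2024-12-11T10:00:31"),
    ("c", "serp", "2024-12-11T10:05:00"),
    ("c", "visitPage", "2024-12-11T10:05:08"),
    ("c", "checkin", "2024-12-11T10:05:40"),
    ("d", "serp", "2024-12-11T11:00:00"),
    ("d", "visitPage", "2024-12-11T11:10:00"),
]


def session_lengths(events):
    """Seconds between first and last non-checkin event, per session."""
    by_session = defaultdict(list)
    for session_id, event_type, dt in events:
        if event_type == "checkin":
            continue  # checkins are dropped, so dwell time is not counted
        by_session[session_id].append(datetime.fromisoformat(dt))
    return {
        sid: (max(dts) - min(dts)).total_seconds()
        for sid, dts in by_session.items()
    }


def clip(values, lower=0.1):
    """Force a minimum value: anything below `lower` becomes `lower`."""
    return [max(v, lower) for v in values]


def bucket_counts(values, width=5):
    """Count values per half-open interval (lo, hi] of the given width."""
    counts = Counter()
    for v in values:
        hi = ceil(v / width) * width
        counts[(hi - width, hi)] += 1
    return counts


lengths = session_lengths(EVENTS)
values = list(lengths.values())
print(lengths)
print(bucket_counts(values))
print("mean:", mean(values), "median:", median(values))
```

Single-event sessions come out as exactly 0 here, which with (lo, hi] buckets lands them in the interval ending at 0 rather than in the first positive bucket, so they may be worth handling separately before plotting. The one long session in the sample also pulls the mean far above the median, the same effect noted for the real distribution.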