[09:49:38] springle, how do you feel about whisky/beer/delete as applicable? [09:53:57] Ironholds: both work :) [09:55:03] lemme know when you're next going to be in SF; we owe you a drink or twelve. [09:55:42] sounds good! [11:20:15] (CR) Gilles: "Add the scripts where? To the root of this repo?" [analytics/multimedia] - https://gerrit.wikimedia.org/r/130325 (owner: Gilles) [11:27:28] (PS2) Gilles: Add percentile data to the maps [analytics/multimedia/config] - https://gerrit.wikimedia.org/r/130323 [11:34:00] (PS2) Gilles: Generate the maps for each site dashboard [analytics/multimedia/config] - https://gerrit.wikimedia.org/r/130327 [16:11:52] (CR) Gilles: [C: 2] Add build scripts to revision control, huzzah [analytics/multimedia] - https://gerrit.wikimedia.org/r/130392 (owner: MarkTraceur) [16:13:24] (CR) Gilles: [V: 2] Add build scripts to revision control, huzzah [analytics/multimedia] - https://gerrit.wikimedia.org/r/130392 (owner: MarkTraceur) [16:26:42] (PS2) Gilles: Generate geo data per site [analytics/multimedia] - https://gerrit.wikimedia.org/r/130325 [16:27:42] (CR) Gilles: "Don't forget to add build-geoperf-tsvs to your crontab :)" [analytics/multimedia] - https://gerrit.wikimedia.org/r/130325 (owner: Gilles) [17:11:55] ottomata: I updated this patch btw: https://gerrit.wikimedia.org/r/#/c/130499/2 [17:13:56] :) [17:13:58] just merged [18:16:03] (CR) MarkTraceur: [C: 2] Add percentile data to the maps [analytics/multimedia/config] - https://gerrit.wikimedia.org/r/130323 (owner: Gilles) [18:16:09] (CR) MarkTraceur: [V: 2] Add percentile data to the maps [analytics/multimedia/config] - https://gerrit.wikimedia.org/r/130323 (owner: Gilles) [18:32:19] (CR) MarkTraceur: [C: 2 V: 2] Generate geo data per site [analytics/multimedia] - https://gerrit.wikimedia.org/r/130325 (owner: Gilles) [19:27:48] (PS17) Milimetric: Add ability to remove cohorts from database. 
[analytics/wikimetrics] - https://gerrit.wikimedia.org/r/119343 (owner: Terrrydactyl) [19:28:45] milimetric: did you add something to the change? [19:28:52] just rebased [19:28:57] oh okay [19:29:00] i always do that before reviewing in case you don't [19:29:24] let me know if that bothers you and you'd rather I ping you first so you can rebase [19:29:42] ah, no, i just didn’t understand what you did. rebase away [19:45:55] (CR) MarkTraceur: [C: 2 V: 2] Generate the maps for each site dashboard [analytics/multimedia/config] - https://gerrit.wikimedia.org/r/130327 (owner: Gilles) [19:51:03] (PS1) MarkTraceur: chmod +x the scripts, fix username [analytics/multimedia] - https://gerrit.wikimedia.org/r/130663 [19:51:21] (CR) MarkTraceur: [C: 2 V: 2] "Self merging, this is easy" [analytics/multimedia] - https://gerrit.wikimedia.org/r/130663 (owner: MarkTraceur) [19:56:35] (PS18) Milimetric: Add ability to remove cohorts from database. [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/119343 (owner: Terrrydactyl) [19:56:55] terrrydactyl: I messed up before about how strings concatenate in python [19:57:00] so I fixed it in another patchset^ [19:57:07] okay [19:57:09] let me know if that's ok and otherwise I'll merge [19:57:10] * terrrydactyl looks [19:59:18] milimetric: did you mean to leave the print statement in there? [19:59:28] ugh [19:59:29] :) [19:59:32] :) [20:00:05] (PS19) Milimetric: Add ability to remove cohorts from database. 
[analytics/wikimetrics] - https://gerrit.wikimedia.org/r/119343 (owner: Terrrydactyl) [20:00:08] updated terrrydactyl [20:00:35] (PS1) MarkTraceur: Fix the geoperf generation script [analytics/multimedia] - https://gerrit.wikimedia.org/r/130666 [20:01:11] ottomata: Hey, you might want to /mode -c in here if you want grrrit-wm to look prettier [20:01:53] (CR) MarkTraceur: [C: 2 V: 2] Fix the geoperf generation script [analytics/multimedia] - https://gerrit.wikimedia.org/r/130666 (owner: MarkTraceur) [20:02:04] (CR) Milimetric: [C: 2] Add ability to remove cohorts from database. [analytics/wikimetrics] - https://gerrit.wikimedia.org/r/119343 (owner: Terrrydactyl) [20:02:21] i dont' know what /mode -c is [20:02:22] but! [20:02:22] 742: c ncslk :MODE cannot be set due to channel having an active MLOCK restriction policy [20:02:38] Well that's dumb [20:39:17] OOF, i'm a little stuck at the moment [20:39:21] anybody avail for a brain bounce? [20:39:24] qchris: maybe? [20:39:29] Sure. [20:39:56] i'm in the trap [20:40:58] it's a trap? [20:41:06] topic [20:41:08] gah [21:19:19] ori: wanna talk about data gaps? [21:32:23] milimetric: sure [21:32:40] now? [21:32:56] qchris: I'm sorry I dropped the ball and haven't replied to your note about the diagram [21:33:06] ori: no worries. [21:33:23] too much context switching :/ [21:33:28] :-) [21:34:15] ori: sure, if you have time [21:34:47] so i'm looking into the data that the mobile team needs to power their reportcard [21:34:56] it looks like things are missing starting after Apr. 
27 [21:36:06] i have a script running on vanadium to iterate through the file logs from the 23rd to the 28th, inclusive, and to insert any missing records into the database [21:36:28] there's some context to share [21:37:04] so, when i first worked on the json->table mapping, i was debating whether or not to make the uuid column unique [21:37:18] i think i followed that discussion you had with sean [21:37:34] and your patch to make add a unique constraint [21:37:37] making it unique would have meant slower inserts, but also better guarantees about data integrity [21:37:41] * ori nods [21:38:26] so yeah, the column did not have a unique constraint before, but it does now. sean updated existing tables, and i updated EL code to ensure the constraint is declared on any tables created from now on [21:39:10] this basically means recovering data from the files into the db can be done by naively trying to insert every single event into the database, and just shrugging off the ones that are rejected as violating the uniqueness constraint on uuid [21:40:35] fair enough [21:40:56] but this doesn't explain why we're missing data starting with the 27th, i'm running some querries to figure that out [21:41:13] this is one example: http://mobile-reportcard.wmflabs.org/graphs/successful-edits-main [21:41:26] that graph used to just be broken (which made sense when the querries errored out) [21:41:28] which schema is that? 
[21:41:44] this is the messy select:
[21:41:45] select Date(timestamp) as day, count(*) from (SELECT timestamp, wiki, event_username, event_action, event_namespace, event_userEditCount FROM MobileWebEditing_5644223 UNION SELECT timestamp, wiki, event_username, event_action, event_namespace, event_userEditCount FROM MobileWebEditing_6077315 UNION SELECT timestamp, wiki, event_username, event_action, event_namespace, event_userEditCount from MobileWebEditing_6637866 UNION SELECT
[21:41:45] timestamp, wiki, event_username, event_action, event_namespace, event_userEditCount from MobileWebEditing_7675117) as MobileWebEditing where event_namespace = 0 and event_action = 'success' and wiki != 'testwiki' and timestamp > '20140427000000' group by day;
[21:41:57] so my guess is MobileWebEditing_7675117
[21:42:58] so ok, easier to look at:
[21:42:58] select Date(timestamp) as day, count(*) from MobileWebEditing_7675117 where event_namespace = 0 and event_action = 'success' and wiki != 'testwiki' and timestamp > '20140427000000' group by day;
[21:43:08] that returns results only for 4/27
[21:43:58] are you querying db1047 or db1048?
[21:44:05] 1047
[21:44:14] i don't have rights to 1048 i don't think
[21:44:41] (21:43) eventlog@db1048.eqiad.wmnet:[log]> select Date(timestamp) as day, count(*) from MobileWebEditing_7675117 where event_namespace = 0 and event_action = 'success' and wiki != 'testwiki' and timestamp > '20140427000000' group by day;
[21:44:41] +------------+----------+
[21:44:41] | day        | count(*) |
[21:44:43] +------------+----------+
[21:44:45] | 2014-04-27 |     2928 |
[21:44:47] | 2014-04-28 |     2474 |
[21:44:49] | 2014-04-29 |     2192 |
[21:44:51] | 2014-04-30 |     2217 |
[21:44:53] +------------+----------+
[21:44:55] 4 rows in set (0.59 sec)
[21:44:59] that's db1048, so it's a replication issue
[21:45:08] milimetric: 1047 is lagging also for s1 (in contrast to what icinga says). I would not trust 1047 data until all replication is done.
[21:45:09] either db1047 hasn't fully caught up [21:45:48] s/icinga/ganglia/ [21:46:06] cool qchris, that makes sense [21:46:18] where is the script that is generating that query running? [21:46:25] is there a public way for people to know when replication is caught up? [21:46:59] ori: the script is generate.py in /a/limn-mobile-data on stat1003 [21:47:07] it should be near-instantaneous. if it is chronically behind then we need to rethink the setup [21:47:20] but what was the latest from sean -- did he indicate that the initial replication was done? [21:47:27] or are we still in a transition period? [21:47:37] IIRC, we're still in transition. [21:47:40] i got a little confused with sean's updates [21:47:51] but yeah, i think in-progress was the last thing I heard [21:48:22] so how do I see this lag in ganglia and can I point arthur / maryana at it? [21:48:44] (3. db1047 is finishing up reloading log data [...] from http://lists.wikimedia.org/pipermail/analytics/2014-April/001909.html ) [21:48:51] milimetric: You cannot. [21:48:55] ganglia says 0. [21:49:09] But looking at recentchanges table of enwiki says ~8hours. [21:49:28] ok, so I'll tell people to basically watch bug https://bugzilla.wikimedia.org/show_bug.cgi?id=64445 [21:49:29] (but that's lag for enwiki) [21:49:41] when that's closed, that will herald replication as caught-up [21:49:43] milimetric: yeah, and you can specify that the numbers on db1048 look sane [21:49:49] k, cool [21:49:54] thanks ori, I think we're good for now [21:49:59] I'll send this to all the lists actually [21:51:32] qchris: your diagram is accurate, save for a small typo ('statd' should be 'statsd') [21:51:49] ori: Thanks. [21:52:02] I'll fix the typo :-) [21:55:22] message sent [21:55:52] milimetric: looks great, thanks [21:57:04] : milimetric: 1047 is lagging also for s1 (in contrast to what icinga says). I would not trust 1047 data until all replication is done." 
[21:57:14] That is my experience as well [21:57:33] it is completely missing data for the (briefly deployed) TrackedPageContentSaveComplete schema [21:57:37] which went out Monday [21:57:52] that makes sense. A timestamp from one of the tables I was looking at was 20140427121000 [21:57:56] StevenW: The log table has still to catch up. [21:57:59] I wouldn't expect any data past then [21:58:13] s/table/database/ [21:58:15] (timestamp past which no data is seen) [21:58:56] Meh. I cannot type a straight sentence :-( Time for me to go to bed. [21:59:06] See you :-) [22:02:39] nite for me toooo [22:02:40] laters all [22:03:23] see you guys [22:43:44] Ironholds, tnegrin I am futzing with your slides a bit [22:44:18] Eloquence, so are we! [22:44:19] so are we :) [22:44:21] and the graphics are being replaced [22:44:23] heh [22:44:34] mind if we standardize on Open Sans as a font [22:44:44] fine with me [22:44:46] * Ironholds goes to see if ggplot2 supports Open Sans [22:46:00] (I'll just spam small edits I make here, feel free to revert) [22:46:01] either it does, or it's silently failing [22:46:22] let's be optimistic. [22:46:44] (changing "enhanced PV definition" to "refined pageview definition") [22:46:58] works for me [22:47:06] I didn't spend all that time in minecraft to not know how to refine things. [22:47:12] :) [22:47:25] what does "user agents being masked" mean? [22:55:33] Ironholds, do we really need the "Other" and "Unidentifiable" lines if they both amount to 0% in slide 11? [22:56:29] Eloquence, it will be explained! (it's 'I have an app, and I decide that only the name of the app should appear in the user agent, because I am the enemy of all that is good and true") [22:56:45] and not so much, but there's no elegant way of automatically removing them. I can do so if you want. [22:56:57] I would note that they don't amount to 0, though, they're just ~1 percent. [22:57:36] Ironholds, you mean an app that doesn't disclose the OS? 
[22:58:26] yup [22:58:39] or a browser that doesn't disclose the device.. [23:00:16] ok [23:00:32] Ironholds, would it be easy to have a chart that shows % of all traffic, phone vs. desktop vs. tablet? [23:01:00] yep, already built it to standardise on using percentages :) [23:01:16] ok, that would be nice to include [23:01:16] trying to insert it now, and running into google's smarmy "I'm sorry, but this is >3500px" error. Workin' round it. [23:01:19] ah [23:01:29] yeah, it's meant to be in the slide after the framing, but I deleted the old version to add the new'n in. [23:02:48] The charts otherwise look good to me. It would be nice to tweak the visuals a bit if you have time - the lines are very thin, the x-axis/y-axis annotations in light gray may be a bit hard to read, and the fonts are tiny [23:04:21] yeah, I spent the afternoon, in order, doubling the font size, removing the internal hatching, and increasing the size of the lines [23:04:35] sat down with the researchers at 2 to trial run the presentation and they spotted the same thing [23:04:52] now I just need to find a way to actually make the flaming things save at 1200x890px, which they are...resisting. [23:05:12] fun [23:06:57] ok, will take a final look later, thanks for all the work so far :) [23:07:48] resizing done
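
Note on the recovery approach described at 21:39:10 (try to insert every logged event and shrug off the ones rejected by the unique constraint on uuid): the following is only a rough sketch of that idea, not the script that was running on vanadium. It assumes one JSON event per log line, uses Python's built-in sqlite3 in place of the real MySQL hosts so it can run on its own, and the table name `events`, the `payload` column, and the function names are hypothetical.

```python
import json
import sqlite3


def recover_events(conn, table, lines):
    """Naively insert every logged event; skip any whose uuid already exists.

    `table` is a trusted, hypothetical table name (it is interpolated into SQL).
    Returns (inserted, skipped).
    """
    inserted = skipped = 0
    for line in lines:
        event = json.loads(line)
        try:
            conn.execute(
                f"INSERT INTO {table} (uuid, timestamp, payload) VALUES (?, ?, ?)",
                (event["uuid"], event["timestamp"], json.dumps(event)),
            )
            inserted += 1
        except sqlite3.IntegrityError:
            # Rejected by the UNIQUE constraint on uuid: the row is already there.
            skipped += 1
    conn.commit()
    return inserted, skipped


def demo():
    """Build a tiny in-memory table with one existing row, then replay two log lines."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (uuid TEXT UNIQUE, timestamp TEXT, payload TEXT)"
    )
    conn.execute("INSERT INTO events VALUES ('a', '20140427000000', '{}')")
    log_lines = [
        '{"uuid": "a", "timestamp": "20140427000000"}',  # already in the table
        '{"uuid": "b", "timestamp": "20140428000000"}',  # missing, gets inserted
    ]
    return recover_events(conn, "events", log_lines)
```

Because duplicates are rejected by the database rather than checked for up front, replaying the same log files a second time should be harmless, which is what makes the naive replay workable once the constraint exists.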