[07:50:45] Analytics-Tech-community-metrics: Mediawiki support to be added to GrimoireLab - https://phabricator.wikimedia.org/T138007#2451648 (Aklapper) >>! In T138007#2409146, @Lcanasdiaz wrote: > The Mediawiki support for Perceval is being finished this week. Does that mean this task is resolved by now? If so, feel... [08:21:41] (PS1) Addshore: Update path to the db ini file [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/298421 (https://phabricator.wikimedia.org/T140064) [08:32:45] (CR) Addshore: [C: 2] Update path to the db ini file [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/298421 (https://phabricator.wikimedia.org/T140064) (owner: Addshore) [08:36:24] morning joal ! [08:38:57] Hi addshore [08:39:30] addshore: How are you today? [08:40:33] Good, and yourself? [08:40:42] not bad :) [08:40:57] How would you feel about pushing the oozie job out today then? :D [08:40:58] Weather is gently summerizing :) [08:41:08] haha, it's gone the opposite way over here :/ [08:41:13] Arf :( [08:43:11] I guess you're after some news about a deploy on our side [08:45:58] Yup! I have no idea about the process etc! ;) [08:46:20] addshore: two separate deploys [08:47:04] addshore: one for refinery-source, then refinery [08:49:14] awesome! [08:49:22] also, joal, can you merge things in operations-puppet? [08:49:33] addshore: And for refinery-source you're inaugurating a new deploy process madhuvishy has implemented [08:49:39] addshore: I don't [08:49:45] oooooh [08:50:03] addshore: I'm very sorry, you're inaugurating a few things with your patch :) [08:50:12] It's been fun! :D [08:50:24] addshore: the plan is for me and madhuvishy to deploy refinery-source later on today [08:50:50] addshore: Then if everything goes smoothly, I'll do refinery tonight, if not probably tomorrow [08:51:44] awesome! [09:20:01] elukey: \o [09:20:30] hello!!! o/ [09:20:43] elukey: How is it for you today? [09:20:57] debugging mod_proxy_fcgi, a nightmare. You? :D [09:21:24] good, wondering about cassandra [09:21:33] good luck with mod_proxy [09:21:42] ahhaah [09:21:58] do you want to chat about cassandra? [09:22:42] elukey: not really, I think if urandom agrees, we'll wipe the cluster tomorrow and start loading the old way [09:22:57] :( [09:23:03] I didn't follow the conversation [09:23:07] as you say :( [09:23:35] elukey: nothing really special, urandom tried to boost compaction yesterday through settings, but I see no difference [09:23:57] elukey: What's really weird is that compaction is completely stalled, no progress at all for almost a week [09:24:06] wow [09:24:47] starting the old way will take ages right? [09:25:09] elukey: will take time, but less than bulk so far ! [09:25:40] elukey: If we had started loading the old way instead of bulk last week, we'd currently be compacting the second month [09:26:22] yeah but a full year will take ~2 months? [09:26:33] elukey: probably [09:26:37] :( [09:26:57] elukey: But for the moment a full month has not even been compacted in a week so ...
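For context on the stall joal describes: a minimal sketch of how one might confirm that compaction is stuck rather than just slow, assuming stock nodetool on the affected host (the aqs boxes wrap it as nodetool-a, as a describecluster call later in this log shows). The sampling interval is arbitrary, and the throttle removal is only presumably what "boosting compaction through settings" involved.

```bash
# Pending compactions and per-task progress; re-run a few minutes apart.
nodetool compactionstats
sleep 300
nodetool compactionstats   # if "bytes compacted" barely moves, it's a stall

# Thread-pool view: active/pending/blocked tasks on the CompactionExecutor.
nodetool tpstats | grep -i compaction

# Remove the compaction throughput throttle entirely (0 = unlimited):
nodetool setcompactionthroughput 0
```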
[09:29:04] yes yes of course we need to take a decision about what to do, otherwise we'll waste too much time [09:29:20] elukey: That's where I stand [09:29:39] it is sad that after all this awesome work you didn't get a good result as a reward [09:30:10] elukey: And since no progress seems to be made, either we decide that bulk doesn't fit for us, or there is some parameter tuning we can try, but I don't want to wait longer with no progress being made by cassandra [09:32:27] yes +1, we need to take a decision after a deadline otherwise no progress will be made [09:33:12] elukey: For me the deadline is kinda today, in order to have a loading process/compaction running while I'm away at the end of this week [09:36:04] yep ok [09:36:19] * elukey suspects that joal is suggesting to wipe the cluster again [09:36:28] elukey: If not feasible, we'll do next week, but I'd rather have it moving [09:36:48] * joal is scared when elukey reads his mind [09:37:22] * elukey reassures joal since this is a side effect of working together daily [09:37:31] :D [09:37:46] jokes aside, I am going to wipe everything after lunch [09:37:47] is it ok? [09:37:57] I [09:38:01] elukey: Yes, let's do that [09:38:08] I'll also change the name of the cluster in puppet [09:38:30] from Analytics Query Service Test to Analytics Query Service NG [09:38:33] ok? [09:38:51] works for me [09:38:56] not sure if it matters but better than Test [09:39:04] elukey: We can change that back to prod without wiping it after loading? [09:39:07] I suspect that we can't change it at runtime [09:39:21] hm [09:39:30] not sure, it is speculation [09:39:32] didn't check the docs [09:39:42] I just don't want to leave test in there :) [09:40:32] elukey: good for me :) [09:43:55] ok!! [09:44:08] I just understood a little part of mod_proxy_fcgi [09:44:11] my head is exploding [09:49:48] but I tracked down the 304 issue [09:49:54] it is a "break" in the code [09:49:56] :/ [09:50:06] elukey: :S [09:50:34] but the FCGI protocol is really nice [09:50:43] much more powerful than expected [09:51:00] for example it allows "streams" over the same FCGI connection [09:51:30] mod_proxy_fcgi does not support them (and I am not sure if streams are transparent to the app layer) but they resemble HTTP/2 streams [11:01:11] brb lunch! [11:18:17] Analytics, Analytics-Wikistats, Labs-project-wikistats: Design new UI for Wikistats 2.0 - https://phabricator.wikimedia.org/T140000#2452139 (Danny_B) [11:27:48] hey y'all just a note that marco did a little experiment (might be useful to look at before wiping the cluster): https://ganglia.wikimedia.org/latest/graph_all_periods.php?h=aqs1004.eqiad.wmnet&m=cpu_report&r=4hr&s=by%20name&hc=4&mc=2&st=1468262986&g=cpu_report&z=large&c=Analytics%20Query%20Service%20eqiad [11:28:25] In his words "15:44:05 milimetric: unthrottling compaction on aqs1004, and double compactor concurrency" [11:29:33] He said it crawled a little faster after that: joal / elukey ^ [11:34:00] milimetric: o/ [11:34:19] yeah I think that joal looked at it but no substantial gain [11:34:41] but I'll wait for joal's confirmation before proceeding [11:34:54] it will take me ~30 mins so I can do it whenever you guys want [11:37:12] I think you can wipe, didn't mean to stop that, indeed I don't think the improvement was substantial [11:38:35] (PS1) Addshore: Big cleanup, reorder and document! [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/298447 [11:41:01] (PS2) Addshore: Big cleanup, reorder and document!
[analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/298447 [11:45:55] (CR) Hashar: "recheck" [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/298447 (owner: Addshore) [12:25:21] joal: you there? [12:36:05] (CR) Addshore: [C: 2] Big cleanup, reorder and document! [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/298447 (owner: Addshore) [12:36:25] (Merged) jenkins-bot: Big cleanup, reorder and document! [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/298447 (owner: Addshore) [12:45:11] (PS1) Addshore: Refactor how the simple config file is accessed [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/298465 [12:45:53] (CR) Addshore: [C: 2] Refactor how the simple config file is accessed [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/298465 (owner: Addshore) [12:46:13] (Merged) jenkins-bot: Refactor how the simple config file is accessed [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/298465 (owner: Addshore) [12:47:35] Ah nice yesterday's patch for Hadoop log retention cut in half most of the node managers' heaps! https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?panelId=17&fullscreen [13:01:48] Analytics, Revision-Slider, TCB-Team, WMDE-Analytics-Engineering, and 3 others: Data need: Explore range of article revision comparisons - https://phabricator.wikimedia.org/T134861#2452497 (Addshore) [13:02:55] morning [13:03:00] NICE! [13:03:19] elukey: you know i think there may have been 2 nodemanagers that didn't get restarted [13:03:27] i ran a salt command with puppet --test && ...restart [13:03:37] and i think a couple of the puppet --test didn't run because puppet was already running [13:03:47] ah yes that explains why we have different heap sizes! [13:03:49] hmmm, elukey, but i don't see any reduction in logs though [13:03:55] still 31T [13:04:21] ok elukey cool, now I can see which two didn't restart! [13:04:24] gonna restart those two now [13:04:45] super! [13:04:46] looks like 3 actually [13:04:50] I was about to say the same [13:05:30] !log restarting nodemanagers on analytics 1039 1046 and 1054 [13:05:38] morning ottomata ! [13:05:43] mornin [13:07:22] addshore: is it working? [13:07:26] yup! [13:07:35] though I have 1 more thing for puppet for you to merge [13:07:40] oh i see a patch [13:07:41] looking [13:07:51] I rearranged some of the scripts and added docs, and generally made things a bit nicer [13:07:58] addshore: if you like, you can do ensure => 'latest' [13:08:06] then you don't have to make puppet patches when you change your codebase [13:08:11] OR [13:08:14] we could use scap [13:08:16] maybe [13:08:21] to deploy your codebase [13:08:24] instead of having puppet clone it [13:08:34] yeh, but in this case it would have broken, as the cron scripts would have had to get updated at the same time [13:08:44] * addshore doesn't know anything about scap ;) [13:08:46] oh? [13:08:51] the cron scripts are updated? [13:08:53] oh [13:08:57] daily, etc. [13:09:00] yeh! as I moved things :) [13:09:05] addshore: you could put daily and minutely in your repo [13:09:33] I could do, that is one of the things I thought of today [13:09:41] just as erb files and then get puppet to grab them [13:09:44] hmm [13:09:46] not as erb files [13:09:48] as a bash script [13:09:51] that takes an argument [13:09:53] $base_path [13:09:53] or something [13:09:56] hmm, yeh!
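A minimal sketch of the wrapper ottomata proposes here (the $scripts_dir variant comes just below): puppet's cron entry calls a script kept inside the repo and passes the checkout path in, so moving report scripts around no longer needs a puppet patch. The file name and structure are hypothetical; only the page_size.php path quoted later in this log is taken from the conversation.

```bash
#!/bin/bash
# daily.sh -- hypothetical entry point kept in analytics/wmde/scripts.
# The puppet-managed cron would invoke: daily.sh <scripts_dir>
set -o errexit

SCRIPTS_DIR="${1:?usage: $0 <scripts_dir>}"   # the $base_path / $scripts_dir argument
cd "$SCRIPTS_DIR"

# Individual daily reports live in the repo, so reshuffling them is a repo-only change:
php src/wikidata/site_stats/page_size.php
```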
[13:10:00] or $scripts_dir [13:10:07] then puppet just passes it in when it makes the cron job [13:10:26] yeh, that would make sense :) I'll see if I can come up with something in the coming days! I'm just super glad it's all puppetized now :D [13:10:27] addshore: i'll go ahead and merge this, if you want to make those changes am happy to help then too :) [13:10:29] k np [13:12:10] ottomata: heap sizes dropping after restart :) [13:13:13] nice [13:13:25] addshore: merged and puppet ran [13:13:30] awesome! :) [13:13:41] ottomata: I'm going to go ahead and make a patch meaning I can sudo as the user too! [13:14:12] yeah, ok, that's going to take some ops discussion, but i think it will be fine eventually [13:14:14] addshore: do you know how? [13:14:23] you gotta make a new group in admin module data.yaml [13:14:39] maybe [13:14:50] I think I should be able to spot all of the individual bits! [13:14:52] 'wmde-users' or 'wmde-admins' or something [13:14:53] k [13:14:54] cool [13:15:02] you will have to file a phab ticket for that [13:15:04] explain everything [13:15:09] okay! [13:15:43] addshore: is this the path you want? [13:15:44] /a/analytics-wmde/src/scripts/src/wikidata/site_stats/page_size.php [13:15:48] doesn't exist afaict [13:16:39] hmm, it exists in the latest commit in the scripts repo! [13:17:03] hmm yeah it did not check out the commit you specified [13:17:03] hm [13:17:15] in the repo it is src/wikidata/site_stats/page_size.php [13:17:21] oooooh [13:17:41] i see it at 73c88575345d63115230a6f4ca7c75852fb735f0 [13:17:46] running puppet again just to double check [13:19:05] Analytics: Create ops dashboard with info like ipv6 traffic split - https://phabricator.wikimedia.org/T138396#2452597 (faidon) Per day, like Google's, would probably be more interesting, but per month would do too. [13:35:45] Analytics-Features, Project-Admins: Create new project "Analytics-Features" - https://phabricator.wikimedia.org/T863#2452762 (Danny_B) [13:45:18] ottomata: any ideas about checking out the right version? [13:45:40] ah sorry, got distracted... [13:45:57] dunno why puppet isn't doing it [13:45:59] Hey elukey [13:46:04] so did I :D Until I noticed icinga showing one of the checks as unknown ;) [13:46:15] sorry, got caught at home [13:46:29] oh addshore [13:46:42] git::clone doesn't support ensure => sha [13:47:13] Oh, that's lame, my brain must have assumed that it did from using ansible! [13:47:35] it supports a branch though [13:47:39] elukey: you can wipe when you want, I have not seen any drastic change [13:47:42] if you want to do that [13:47:48] your choices for ensure are [13:47:52] absent, present, or latest [13:47:58] hmm, I could have a production / deploy branch I guess? [13:48:02] ja you could [13:48:24] branch => 'deploy', ensure => 'latest' [13:48:24] Right, I'll do that! Is there a convention as to if it should be called deploy or production? [13:48:29] awesome! [13:48:30] naw, up to you [13:48:46] joal: all right! going to finish packaging varnishkafka and then I'll start [13:48:53] awesome :) [13:49:35] ottomata: patch is up, let me just make the branches quickly! [13:49:50] milimetric, mforns : let me know when you have a minute for a last pass over schemas [13:49:58] ottomata: Hello :) [13:50:09] Analytics-Kanban, EventBus, Patch-For-Review: Propose evolution of Mediawiki EventBus schemas to match needed data for Analytics need - https://phabricator.wikimedia.org/T134502#2452813 (Ottomata) In EventBus meeting yesterday we decided to remove the `sha1` visibility boolean.
It doesn't provide an... [13:50:15] ottomata: done! [13:50:19] joal: hi! [13:50:26] ottomata: I think when stat1002 got restarted the pivot and caravel servers didn't ;) [13:50:31] ottomata: Would you mind? [13:51:04] oh! [13:51:05] hi milimetric I'm here [13:51:07] they certainly wouldn't have [13:51:09] hm [13:51:12] oh no, it was joal [13:51:18] huhu :) [13:51:28] Hi mforns [13:51:33] hi joal :] [13:51:57] I wanted to have a last pass over the schemas if you have a minute (with milimetric would be better) [13:52:19] I have time, will be here until standup, just ping me [13:52:52] joal: try pivot now [13:52:55] stat1002 9090 [13:53:34] ottomata: works, but no datasource configured :( [13:53:56] ? [13:53:58] mforns: ok great, no news from milimetric yet :) [13:54:08] ok [13:54:26] hm [13:54:26] ottomata: the server responds but it says 'no datasource configured yet' [13:54:38] k [13:54:39] ... [13:54:39] i see [13:57:03] Hey mforns, just saw Dan's message ... I think we can go the two of us [13:57:14] mforns: batcave? [13:57:20] joal, oh yea, he is not feeling well [13:57:21] ok [13:57:23] omw [13:57:57] Analytics-Volunteering, Developer-Relations, Project-Admins, Blocked-on-Analytics, Need-volunteer: Analytics-Volunteering and Wikidata's Need-Volunteer tags; "New contributors" vs "volunteers" terms - https://phabricator.wikimedia.org/T88266#2452902 (Danny_B) [13:58:14] whoops, got it joal, better now [13:58:30] addshore: that worked too [13:58:34] now at 60aa970b0ae8ccb82034b8972775cd3bf6f15b1e [13:58:39] :) [13:59:54] aww joal, the caravel sqlite db was in /tmp [13:59:54] so it's gone [13:59:59] all the stuff you did :( [14:01:29] ottomata: Arf [14:02:10] arf [14:06:05] sorry ottomata one more! https://gerrit.wikimedia.org/r/#/c/298477/ [14:06:13] joal: caravel back up on 9091 [14:06:17] db now in my homedir [14:06:22] ottomata: thanks ottomata :) [14:06:23] inited with our cluster info [14:06:34] didn't load the examples, lemme know if you want them [14:06:40] figured you'd just want to play with our pageview stuff [14:06:51] And now I'll go and work on the thing so these changes don't have to be in the puppet repo! [14:07:43] That's great ottomata, thanks [14:08:24] ottomata, hey :] do you have rights to delete wiki pages? [14:09:12] uhhhh [14:09:16] on wikipedia? doubtful [14:09:22] ottomata, on testwiki [14:09:26] dunno! [14:09:34] mforns: what about beta.wmflabs.org [14:09:34] ? [14:09:55] there for sure [14:10:28] ottomata, the database for beta is in analytics-storage? I cannot find it, or the research user does not have rights to access it... [14:10:35] no, it's in labs [14:10:38] I see [14:10:39] in deployment-prep [14:11:02] i've never used testwiki, so i doubt i have rights there [14:11:07] ok [14:12:09] joal: did we say user_id_target etc. was better? [14:12:20] ottomata: I don't recall so [14:12:21] ottomata, how can I access the db in deployment-prep? [14:13:05] ottomata: I explained why we chose to go for differentiating names, but I can't recall having agreed on user_target [14:13:15] ottomata: On the other hand, so many things were discussed !
[14:13:23] i'm not sure why we chose, just that dan and mforns maybe liked it better [14:13:34] buuuut, not sure why [14:13:46] mforns: not totally sure, looking [14:13:48] in the meantime [14:13:53] ottomata: I removed sha1, now we're triple checking with mforns on the page_creation_timestamp/page_first_rev_id [14:13:54] what's your opinion on field names like [14:14:02] user_id_blocks_changed [14:14:04] and [14:14:26] vs [14:14:31] user_id_target [14:14:33] ottomata: I don't have a strong one [14:14:47] ottomata, I suggested user_id_blocks_changed, to be consistent with user_id_groups_changed [14:15:00] mforns: why not user_id_target for both? [14:15:01] but I think both are too long names [14:15:08] ottomata: there are advantages for both ... Maybe using target, since it's the same in multiple places, makes it easier [14:15:10] yes, I would agree [14:15:24] it seems to make sense, will be more consistent for future schemas too [14:15:24] it's simpler [14:15:29] sure [14:15:36] there is a user_id that is performing an action, and a possible user that the action is being performed on [14:15:40] ok, let's go for user_[id|text]_target :) [14:15:40] ok cool [14:15:51] ok, joal i'm modifying a few things now, will add that and amend your patch [14:16:05] ottomata: as you wish, I have a patch on the fly as well [14:16:40] ottomata: we also agreed on database_name instead of wiki_database [14:16:55] oh ok [14:16:59] ok cool [14:17:00] ottomata: And I think with that you have everything :) [14:17:11] mforns: i don't know how to access the db. i see two instances in depl-prep [14:17:13] db1 and db2 [14:17:20] but sudo mysql needs a pw [14:17:21] don't know it [14:17:24] except 1 field we want to add in the page_delete event on which we haven't found a name yet [14:17:25] you will have to ask rel-eng folks [14:17:32] ottomata, don't worry we'll find another way [14:17:37] ok, yes [14:17:42] mforns: not sure what you are doing, but you can test in mw vagrant, no? [14:17:48] mmmm, good [14:18:32] thanks ottomata :] [14:21:44] joal: can we expect that ALL mw generated eventbus events will have the 'database_name' field [14:21:45] ? [14:21:45] joal, ottomata: wiping the AQS cluster [14:21:58] aqs100[456] [14:22:01] :P [14:22:02] elukey: ok [14:22:03] :) [14:22:19] ottomata: Probably only the mediawiki ones [14:22:22] there is a createEvent function in the EventBus extension that populates the meta info [14:22:33] i could DRY up the database_name there [14:22:37] even though it's not part of meta [14:22:46] ottomata: hm, can't say [14:22:53] but then we'd need to enforce that all EventBus extension schemas have that field [14:22:55] ottomata: https://gerrit.wikimedia.org/r/#/c/298456/ - I am changing the name to Analytics Query Service NG [14:23:11] ottomata: sorry to bug you but could you quickly +2 https://gerrit.wikimedia.org/r/#/c/298477/? :) [14:23:19] ottomata: for instance, error, change-prop, resource-change etc, they probably don't want database_name [14:23:23] this can be done before creating the cluster so if you guys don't like it let me know now :P [14:23:28] elukey: did you mean to change cdh submodule in that patch? [14:23:33] also, will 'NG' be permanent? [14:25:13] thanks! [14:25:48] ottomata: nono that one is for AQS [14:25:53] it is the cassandra name [14:25:56] right [14:25:57] cluster name [14:25:57] but, i mean [14:26:05] will we be stuck with it forever, or will you change it back later?
[14:26:16] I think that we will not be able to change it [14:26:19] hm [14:26:24] then I don't like it :p [14:26:30] what about next time we do this? NNG? [14:26:55] Well I am happy with any name [14:27:00] haha [14:27:08] :P [14:27:18] so, you need to call it something different than just Analytics Query Service, because that currently conflicts with 100[123]? [14:27:20] correct? [14:28:02] joal: indeed, about database_name [14:28:07] but those events are not created by the eventbus mwextension [14:28:23] mobrovac: what do you think? [14:28:24] qq [14:28:25] yeah [14:28:41] yt? [14:28:44] so I am reading that THEORETICALLY it is possible to change the name of the cluster on the fly [14:28:57] elukey: how about just something that makes sense but doesn't conflict [14:28:57] like [14:28:58] via cqlsh first then editing the cassandra yaml [14:29:05] 'Analytics Query Service Storage' [14:29:05] ? [14:29:07] heh [14:29:22] looks nice [14:29:39] all right updating the code review [14:32:10] ottomata: https://gerrit.wikimedia.org/r/#/c/298456/3 [14:32:37] +1 [14:32:38] :) [14:32:43] gooood [14:41:10] hey ottomata [14:41:29] ottomata: We have the new field for the page_delete event [14:41:38] ottomata: You probably won't like it ;) [14:41:58] ottomata: move_over_redirect_page_id: integer - Not mandatory [14:42:31] ottomata: If present, this field contains the page_id of the redirect overwritten by the move [14:46:25] haha [14:46:34] wait, before I ask about what the heck. [14:46:46] is that instead of page_creation_dt [14:46:47] ? [14:46:50] or do you still need that too? [14:47:03] we need both ottomata !!! [14:47:10] ok, lemme ask a q about that first too [14:47:10] ottomata: batcave for the heck? [14:47:13] yeah! [14:59:37] (PS1) Addshore: Move cron files from puppet to this repo [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/298486 [15:00:10] (PS2) Addshore: Move cron files from puppet to this repo [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/298486 (https://phabricator.wikimedia.org/T140095) [15:00:35] nuria_, having problems joining... [15:03:07] mforns: we're in batcave with ottomata doing some tests on mediawiki if you're interested [15:03:16] joal: new cluster up and running [15:03:22] elukey: Yay ! [15:03:27] Thanks a lot elukey [15:05:44] elukey@aqs1006:/var/log/cassandra$ nodetool-a describecluster [15:05:45] Cluster Information: Name: Analytics Query Service Storag [15:06:01] sorry should be Name: Analytics Query Service Storage [15:06:03] goooood [15:06:07] everything is up [15:06:27] ottomata: I'm thinking https://gerrit.wikimedia.org/r/#/c/298486/2 and https://gerrit.wikimedia.org/r/#/c/298487/4 should do it if you're still free for a review!
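The runtime rename elukey says he read about, as it is commonly described: update the system.local table through cqlsh, flush it, then make cassandra.yaml agree and restart each node. A sketch only; the team wiped and re-created the cluster instead, so none of this was exercised here, and the yaml path plus the plain nodetool/cqlsh invocations (versus the nodetool-a wrapper seen above) are assumptions.

```bash
# Run on every node in turn:
cqlsh -e "UPDATE system.local SET cluster_name = 'Analytics Query Service Storage' WHERE key = 'local';"
nodetool flush system            # persist the rename to disk before the restart

# Make the on-disk config agree, then bounce the node:
sudo sed -i "s/^cluster_name:.*/cluster_name: 'Analytics Query Service Storage'/" /etc/cassandra/cassandra.yaml
sudo service cassandra restart

nodetool describecluster         # should report the new name
```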
[15:16:16] (CR) Addshore: [C: 1] Move cron files from puppet to this repo [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/298486 (https://phabricator.wikimedia.org/T140095) (owner: Addshore) [15:17:08] (PS3) Addshore: Move cron files from puppet to this repo [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/298486 (https://phabricator.wikimedia.org/T140095) [15:17:18] (CR) Addshore: [C: 2] Move cron files from puppet to this repo [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/298486 (https://phabricator.wikimedia.org/T140095) (owner: Addshore) [15:21:06] (Merged) jenkins-bot: Move cron files from puppet to this repo [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/298486 (https://phabricator.wikimedia.org/T140095) (owner: Addshore) [15:30:14] ottomata: looks like we are at 30T of logs so nothing has been deleted yet, I think we should make the setting small to start (12 hrs) to make sure it is actually being used [15:35:05] (PS1) Addshore: Fix execute permissions [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/298496 [15:35:42] (PS2) Addshore: Fix execute permissions [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/298496 [15:35:50] (PS1) Addshore: Fix execute permissions [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/298497 [15:36:04] Analytics-Cluster, Analytics-Kanban: Procure hardware for future druid cluster - https://phabricator.wikimedia.org/T116293#2453438 (Nuria) Open>Resolved [15:36:06] (CR) Addshore: [C: 2] Fix execute permissions [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/298497 (owner: Addshore) [15:36:09] (CR) Addshore: [C: 2] Fix execute permissions [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/298496 (owner: Addshore) [15:36:20] Analytics-Kanban: Page History: design algorithm to reconstruct page history - https://phabricator.wikimedia.org/T138851#2453441 (Nuria) Open>Resolved [15:36:22] Analytics-Kanban, Patch-For-Review: Extract edit oriented data from MySQL for small wiki - https://phabricator.wikimedia.org/T134790#2453442 (Nuria) [15:36:36] Analytics-Kanban: User History: design algorithm to reconstruct page history - https://phabricator.wikimedia.org/T138859#2453445 (Nuria) Open>Resolved [15:36:38] Analytics-Kanban, Patch-For-Review: Extract edit oriented data from MySQL for small wiki - https://phabricator.wikimedia.org/T134790#2277147 (Nuria) [15:37:10] (Merged) jenkins-bot: Fix execute permissions [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/298496 (owner: Addshore) [15:37:13] (CR) jenkins-bot: [V: -1] Move cron files from puppet to this repo [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/298498 (https://phabricator.wikimedia.org/T140095) (owner: Addshore) [15:37:16] (PS2) Addshore: Move cron files from puppet to this repo [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/298498 (https://phabricator.wikimedia.org/T140095) [15:38:02] Analytics-Cluster, Analytics-Kanban, Patch-For-Review: Fix hive-metastore vs libmysql-jar race condition when provisioning new hive metastore server - https://phabricator.wikimedia.org/T133198#2453463 (Nuria) Open>Resolved [15:38:15] Analytics-Kanban: Retention metric research - https://phabricator.wikimedia.org/T138611#2453464 (Nuria) Open>Resolved [15:38:50] Analytics-Kanban, Patch-For-Review: Extract edit oriented data from MySQL for small wiki - https://phabricator.wikimedia.org/T134790#2453466 (Nuria) [15:38:52] Analytics-Kanban, Patch-For-Review: User History: write scala for user history reconstruction
algorithm - https://phabricator.wikimedia.org/T138861#2453465 (Nuria) Open>Resolved [15:39:28] Analytics-Cluster, Analytics-Kanban, Patch-For-Review: Java OOM errors kill Hadoop HDFS daemons on analytics* - https://phabricator.wikimedia.org/T139071#2453470 (Nuria) Open>Resolved [15:47:23] milimetric: Hey buuuddy, https://edit-analysis.wmflabs.org/multimedia-health/#projects=commonswiki/metrics=Uploaders hasn't uploaded for a while, James_F says. What up with that? [15:47:51] a while = about five months [15:48:00] And I mean "updated" [15:48:14] Hi marktraceur, milimetric is off today [15:48:26] Oh, no problem. Do you have any insight into the above? [15:48:55] I remember doing a lot of gymnastics with config files so I would never have to touch the data again, and now I'm questioning the utility of it [15:48:56] marktraceur: I don't unfortunately [15:49:00] Shoot. [15:49:05] :( [15:51:14] Analytics-Kanban: Respawn the schema/field white-list for EL auto-purging {tick} - https://phabricator.wikimedia.org/T135190#2453551 (mforns) The new white-list including the new data for the modified schemas since Aug 2015 is done. Here's the file: {F4266082} [15:56:50] (PS1) Joal: Update changelog for version 0.0.32 [analytics/refinery/source] - https://gerrit.wikimedia.org/r/298501 [15:57:08] I'll file a ticket. [15:57:41] hey nuria_ : can you hear me in batcave? [15:58:54] Analytics, Multimedia: Multimedia analytics cron job not been running for a few months? - https://phabricator.wikimedia.org/T140121#2453599 (Jdforrester-WMF) [15:59:02] marktraceur: ^^ [15:59:14] Thanks James_F [15:59:21] I don't think it's literally a cronjob, though [15:59:29] I can't remember how it's supposed to work [16:00:09] ottomata: standdduppp [16:00:29] a-team: standddupppp [16:05:47] ottomata: you are frozen man [16:06:04] marktraceur: let's look at that after our standup, in 30 mins [16:06:07] Analytics, Research-and-Data-Archive, Research-consulting, Research-management: Draft announcement for wikistats transition plan - https://phabricator.wikimedia.org/T128870#2453702 (DarTar) [16:06:12] Done. [16:17:19] Analytics, Research-and-Data-Archive, Research-consulting, Research-management: Draft announcement for wikistats transition plan - https://phabricator.wikimedia.org/T128870#2453816 (ggellerman) Open>Resolved [16:21:57] Analytics-Cluster, Analytics-Kanban, Operations, ops-eqiad: analytics1049.eqiad.wmnet disk failure - https://phabricator.wikimedia.org/T137273#2453855 (Cmjohnson) It appears that a disk was in a foreign cfg mode. Cleared the foreign cfg and cache. Added the disk group back. [16:22:50] Analytics-Cluster, Analytics-Kanban, Operations, ops-eqiad: analytics1049.eqiad.wmnet disk failure - https://phabricator.wikimedia.org/T137273#2453876 (Cmjohnson) Return shipment of the first disk FEDEX 9611918 2393026 70283562 UPS 9202 3946 5301 2421 0335 48 [16:23:15] Analytics, Fundraising-Backlog, Blocked-on-Analytics, Fundraising Sprint Licking Cookies, and 2 others: Clicktracking data not matching up with donation totals - https://phabricator.wikimedia.org/T132500#2453878 (awight) @CCogdill_WMF Sorry to keep you waiting on this! I guess the workaround is... [16:24:17] madhuvishy: we are having staff earlier... [16:27:48] Analytics, Fundraising-Backlog, Blocked-on-Analytics, Fundraising Sprint Licking Cookies, and 2 others: Clicktracking data not matching up with donation totals - https://phabricator.wikimedia.org/T132500#2453905 (Jgreen) >>! In T132500#2453878, @awight wrote: > @CCogdill_WMF > Sorry to keep you...
[16:38:57] Analytics, Editing-Analysis, Notifications, Collab-Team-Q1-July-Sep-2016: Numerous Notification Tracking Graphs Stopped Working at End of 2015 - https://phabricator.wikimedia.org/T132116#2453972 (jmatazzoni) [16:42:54] Ping nuria_ [16:43:12] marktraceur: yes, did you look at whether your queries might be timing out? [16:43:18] marktraceur: let's look at that 1st [16:44:14] I haven't looked at anything yet, but I can try to pull up what I remember... [16:44:42] Like I said, I specifically tried to build something I would never have to look at again, and promptly left it alone for 5 months [16:45:26] (PS1) Addshore: Stop hardcoding graphite and statsd hosts [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/298520 (https://phabricator.wikimedia.org/T140081) [16:46:17] (PS1) Addshore: Stop hardcoding graphite and statsd hosts [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/298521 (https://phabricator.wikimedia.org/T140081) [16:48:45] (PS2) Addshore: Stop hardcoding graphite and statsd hosts [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/298521 (https://phabricator.wikimedia.org/T140081) [16:49:08] (PS2) Addshore: Stop hardcoding graphite and statsd hosts [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/298520 (https://phabricator.wikimedia.org/T140081) [16:49:45] (CR) Addshore: [C: 2] Stop hardcoding graphite and statsd hosts [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/298520 (https://phabricator.wikimedia.org/T140081) (owner: Addshore) [16:51:08] joal: aloha! AQS is complaining about no data :P [16:51:21] should I silence it or will it clear in a bit after the first load? [16:51:40] (PS3) Addshore: Move cron files from puppet to this repo [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/298498 (https://phabricator.wikimedia.org/T140095) [16:51:47] (CR) Addshore: [C: 2] Move cron files from puppet to this repo [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/298498 (https://phabricator.wikimedia.org/T140095) (owner: Addshore) [16:51:58] (PS3) Addshore: Stop hardcoding graphite and statsd hosts [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/298521 (https://phabricator.wikimedia.org/T140081) [16:52:04] (CR) Addshore: [C: 2] Stop hardcoding graphite and statsd hosts [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/298521 (https://phabricator.wikimedia.org/T140081) (owner: Addshore) [16:54:16] ottomata: I have piled 2 patches up in the scripts repo now depending on https://gerrit.wikimedia.org/r/#/c/298487/ (waiting for you) which should move most of the future changes out of ops-puppet and into the scripts repo! [16:55:50] cool, will get to it addshore, sorry had meetings and eating lunch [16:56:22] awesome :) In theory I think the 2 patches will just merge themselves in the scripts repo after the puppet one gets a +2 (i think) [16:57:14] nuria_: I don't think I ever knew how to check the output of my queries except asking Dan [16:57:35] marktraceur: do you have permission to access 1002 [16:57:42] nuria_: I do! [16:58:11] marktraceur: then queries can be tested on that machine, let's look at what query is updating the files that are stale [16:58:45] Hm, maybe not [16:58:53] nuria_: I think I have access to 1003 but not 1002, shoot. [17:00:28] I know the data goes to https://datasets.wikimedia.org/limn-public-data/metrics/multimedia-health/ at least.
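Since madhuvishy notes just below that the mysql side works from both stat1003 and stat1002, the stale report query can be tried by hand. A sketch, assuming the research-client defaults file and the analytics-store replica host described on the wikitech Data access page linked below; both are assumptions here, not something confirmed in this log.

```bash
# From stat1003: run a cut-down version of the failing report query by hand.
mysql --defaults-file=/etc/mysql/conf.d/research-client.cnf \
      -h analytics-store.eqiad.wmnet \
      -e "SELECT COUNT(*) FROM commonswiki.image WHERE img_major_mime = 'image';"
```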
[17:01:12] marktraceur: let me see if you can access db from 1003, i do not use that one that much but the idea is that looking at queries in 1002 should not be hard [17:01:35] nuria_: is staff still on? sorry I was up super late last night - found out we have to move houses again - lots of drama [17:01:51] joal: Hi! Do you want to deploy refinery-source? :) [17:02:49] madhuvishy: we are done, announcements are about us having a joint goal with the traffic team next quarter and CTO status [17:02:59] madhuvishy: https://etherpad.wikimedia.org/p/analytics-staff-meeting [17:03:02] nuria_: okay [17:03:13] madhuvishy: it was short, 30 minutes after standup [17:03:27] marktraceur: please file a request for access to 1002 [17:03:31] nuria_: yeah I guessed [17:03:41] Uh, k [17:04:14] nuria_: if it's mysql both 1003 and 1002 should work fine [17:04:42] nuria_: Do you know the tag for that off the top of your head? [17:05:39] marktraceur: what exactly do you need to access? [17:05:50] (sorry if you've already said this) [17:05:52] madhuvishy: I'm not sure, I'm trying to figure out why our queries aren't running [17:05:58] ah [17:06:06] madhuvishy: I think nuria wants me to look at the query logs on stat1002 which I don't have access to? [17:06:10] mysql? [17:06:16] interesting [17:06:18] okay [17:06:35] It's supposed to be automatically generated by something, but I always forget the names of things in the analytics world [17:06:56] hmmm [17:07:15] the equivalent of statistics-users for 1002 is statistics-privatedata-users [17:07:28] https://wikitech.wikimedia.org/wiki/Analytics/Data_access#Access_Groups [17:08:29] joal: Let me know about deployment - I'm around [17:09:15] Thanks madhuvishy [17:09:23] madhuvishy: the tag in phab is "access-request" right? [17:09:35] madhuvishy: ah sorry, did not see response [17:10:07] > A three business day waiting period must be observed after the request is filed [17:10:08] marktraceur: ok, access will take 3-5 days once you request it but that way you are set for when this happens gain [17:10:10] *again [17:10:13] OK then [17:10:26] I mean, assuming I remember what we do this time [17:10:34] marktraceur: let's try to see if we can see what is wrong now [17:11:31] yes tag on phab is Ops-Access-Requests [17:12:23] I filed the request, so we're good to go on that front [17:13:21] joal: added one week of downtime for AQS and acked the alarms [17:17:40] all right team logging off! [17:17:46] byyeee [17:18:46] marktraceur: this is your config: https://github.com/wikimedia/analytics-limn-multimedia-data/blob/master/multimedia/config.yaml#L32 [17:18:55] Sounds right [17:19:07] marktraceur: is there anything else? cause thus far i see just this one config file [17:19:26] nuria_: I don't think there's much else, no. It gets run through some system that Dan has [17:19:35] I really wish I could remember the name of it. [17:19:48] marktraceur: well, this is the system, correct? https://wikitech.wikimedia.org/wiki/Analytics/Reportupdater#Queries_and_scripts [17:19:59] nuria_: I guess there are the SQL queries there. Ah, yeah, ReportUpdater, that was it [17:20:57] And there are two commits from mforns that I don't recall seeing, maybe they have something to do with this? [17:23:46] Analytics-Kanban: Respawn the schema/field white-list for EL auto-purging {tick} - https://phabricator.wikimedia.org/T135190#2454245 (mforns) Note, I forgot to add the editCountBucket fields that were the result of the bucketization.
Here is the white list that includes those fields, the previous one is inco... [17:26:08] marktraceur: i do not see anything in the logs, but i think i am not looking at the right logs, let me ping mforns [17:26:08] Analytics, DBA: Set up auto-purging after 90 days {tick} - https://phabricator.wikimedia.org/T108850#2454271 (mforns) @jcrespo I finished updating the white-list. Sorry for the delay, I needed to make sure that changes to schemas between 2015's audit and today were included and that the respective schem... [17:27:18] elukey: I will show you how to insert that data :) [17:27:23] Analytics, DBA: Set up auto-purging after 90 days {tick} - https://phabricator.wikimedia.org/T108850#2454286 (mforns) a:jcrespo @jcrespo Please, let me know if the white-list is what you expected and what I can do to help you in the next steps. Thanks! [17:27:23] Hi madhuvishy ! [17:27:29] I'm ready for a deploy indeed :) [17:27:29] joal: Hi :) [17:27:33] awesome [17:28:07] joal: so, https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Refinery-source [17:28:17] I was actually reading this :) [17:28:18] as usual, changelog step is first [17:28:40] madhuvishy: https://gerrit.wikimedia.org/r/#/c/298501/ [17:28:50] I had my homework done ;) [17:29:08] (CR) Madhuvishy: [C: 2 V: 2] Update changelog for version 0.0.32 [analytics/refinery/source] - https://gerrit.wikimedia.org/r/298501 (owner: Joal) [17:29:14] joal: :D [17:29:16] okay done [17:29:30] do you want to do step 3? [17:30:05] joal: the SCM tag is the only gotcha on that step [17:30:21] madhuvishy: first, I logged in to jenkins :) [17:30:29] madhuvishy: I'll update the doc as we move along [17:30:35] okay [17:31:59] madhuvishy: Currently doing step 3, everything looks good :) [17:32:13] madhuvishy: I'll change the SCM tag to: v0.0.32 [17:32:18] madhuvishy: correct? [17:32:21] joal: yup [17:32:28] Great, doing ! [17:33:22] I see job launched [17:33:42] madhuvishy: waiting for the email ;) [17:33:55] joal: will take a while to build, and then we should get email when done. although i'm going to stare at console output nervously for a bit [17:34:12] huhuhu madhuvishy :) [17:34:30] madhuvishy: I'll do something else, trying not to build on your nerves :) [17:34:37] joal: ha ha [17:36:56] marktraceur / nuria_: the log file is /srv/reportupdater/log/limn-multimedia-data-multimedia.log on stat1003, not 1002 [17:36:59] and the error is: [17:37:19] 2016-07-12 17:00:30,191 - ERROR - Report "image-uploads" could not be executed because of error: pymysql can not execute query ((1064, u"You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'all\n\tfrom all.image\n\twhere img_major_mime = 'image' and\n\t\timg_timestamp >= [17:37:20] '2016' at line 2")). [17:37:37] joal: it failed :( [17:38:09] madhuvishy: :( Did I do something wrong? [17:38:15] i don't think so [17:38:16] checking [17:38:33] marktraceur: ok documented location of logs now at: https://wikitech.wikimedia.org/wiki/Analytics/Reportupdater#Where_are_logs_.3F [17:38:38] cc milimetric [17:38:50] that way next time we remember where they are at [17:40:02] Cool. [17:40:11] milimetric: Thanks! [17:40:20] joal: nothing in the error logs - my assumption is archiva login failed [17:40:21] and hopefully we can make reportupdater 100% self service cc milimetric [17:40:44] ottomata: the new archiva login we made is supposed to work right?
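What milimetric's log excerpt shows, spelled out: reportupdater substituted the literal wiki name "all" into the query template, so MySQL saw a non-existent database. Checking is just a matter of tailing the log he points at; the rendered query in the comments is an illustrative reconstruction from the 1064 error, not a verbatim copy.

```bash
# On stat1003:
tail -n 100 /srv/reportupdater/log/limn-multimedia-data-multimedia.log | grep -B1 -A2 ERROR

# The rendered SQL effectively becomes (reconstructed from the error above):
#   select ... from all.image
#   where img_major_mime = 'image' and img_timestamp >= '2016...'
# "all" is a by_wiki placeholder value, not a database, hence the syntax error.
```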
[17:40:45] madhuvishy: ok [17:41:39] madhuvishy: i think so, but i don't remember much [17:41:39] !log Insert test data in aqs100[456] to prevent false alarms [17:41:49] nuria_: Could this still be because "all" isn't a valid wikidb name? [17:41:56] ottomata: Failed to execute goal org.apache.maven.plugins:maven-deploy-plugin:2.7:deploy (default-deploy) on project refinery: Failed to deploy artifacts: Could not transfer artifact org.wikimedia.analytics.refinery:refinery:pom:0.0.32 from/to archiva.releases (https://archiva.wikimedia.org/repository/releases/): Failed to transfer file: [17:41:56] https://archiva.wikimedia.org/repository/releases/org/wikimedia/analytics/refinery/refinery/0.0.32/refinery-0.0.32.pom. Return code is: 401, ReasonPhrase:Unauthorized. -> [Help 1] [17:42:02] it looks like it isn't [17:42:08] i'll switch back to the old ones [17:42:50] marktraceur: right, select from all.image [17:42:53] joal: 5 minutes - let me fix this. I got a new set of archiva creds made for jenkins only - but looks like they aren't working [17:43:04] marktraceur: it is not going to work [17:43:07] nuria_: I thought we had dealt with this, shoot [17:43:09] madhuvishy: no prob :) [17:43:15] addshore: comments on puppet inline [17:43:16] madhuvishy: looking [17:43:31] madhuvishy: Thanks for doing this on the fly :) [17:43:34] what is the username we made? [17:43:53] I don't remember what happened last time I spoke with you all about this, but apparently it didn't get fixed like I thought it did [17:43:56] oh archiva-ci [17:44:11] marktraceur: but it seems a problem with the system, not with your select per se [17:44:31] Analytics, DBA: Set up auto-purging after 90 days {tick} - https://phabricator.wikimedia.org/T108850#2454323 (jcrespo) Please create a change review on puppet/operations (anywhere, I will move it somewhere else) instead of attaching it here, I will take it from here. [17:44:45] nuria_: No, I remember this vaguely, "all" is included in the list of wikis for some particular queries, and I can't use it because my query doesn't support it [17:44:47] marktraceur: and while access to 1002 will help for other things, my bad, in this case access to 1003 was all you needed [17:44:55] Well that's fine [17:45:00] More access isn't a bad thing [17:45:53] ottomata: yeah [17:46:31] nuria_: Now, I personally think that supporting an edge case at the expense of all other cases is silly, but I think Dan had a better reason than I can remember [17:46:50] marktraceur: wait, what is the edge case? [17:47:35] nuria_: Using "all" as a wiki db name [17:48:05] madhuvishy: ah [17:48:08] i think it didn't have proper perms [17:48:09] try now [17:48:28] ottomata: aah [17:48:29] okay [17:48:39] joal: we have to roll this back https://github.com/wikimedia/analytics-refinery-source/commit/26207eadce4ec41a5ca6ebbd7de8515f5bc51c66 [17:48:44] or we can skip this version [17:48:51] nuria_: At least that's what I remember. Again, it's been a few months [17:48:52] what do you think? [17:49:01] madhuvishy: what does a rollback incur? [17:49:16] marktraceur: i am trying to see where the configuration for all wikis is in reportupdater [17:49:28] madhuvishy: I don't have the proper understanding of it I think [17:49:53] marktraceur: here: https://github.com/wikimedia/analytics-reportupdater/blob/master/wikis.txt [17:50:16] joal: so i don't think we can do maven rollback given that our local machine has no history of the process. But all that has changed is this commit.
we could do git reset --hard to your last commit and force push [17:50:36] that would be https://github.com/wikimedia/analytics-refinery-source/commit/52301c12d709b2fb8b9179f65ecc62da7a659dd0 [17:50:45] i did it while testing and it was fine [17:50:48] marktraceur: but that file does not list "all" so i am not sure why the select ends up being "from all" [17:50:59] in general i don't think that's the safest idea [17:51:09] ottomata: ^ [17:51:19] Hm. [17:51:34] madhuvishy, ottomata : I'll do whatever I'm told :) [17:51:43] madhuvishy: git push --force : http://i.imgur.com/R7tEQPA.gif [17:51:58] joal: ha ha yes yes i know [17:52:06] nuria_: I think it's added specifically somewhere, but yeah I don't know why [17:52:25] joal: we can always just skip this version - add a new changelog and proceed [17:52:27] ha, sounds fine :) [17:52:37] ottomata: which one? [17:52:38] do as madhuvishy recommends :) [17:52:43] nuria_: https://github.com/wikimedia/analytics-reportupdater/blob/master/reportupdater/reader.py#L149 [17:52:44] i am also unopinionated [17:52:46] ottomata: lol [17:52:49] okay [17:52:57] i think git reset is okay in this case [17:53:04] okaaay :) [17:53:04] it's fairly safe [17:53:13] nuria_: Again, as I recall this is a special case but I don't know why [17:53:24] joal: do you want to, or I can? [17:53:34] madhuvishy: I'll do it, make me learn ;) [17:53:53] git reset --hard 52301c1 [17:53:59] madhuvishy: so, after having pulled the last change, in master, I go for a git reset --hard HEAD~1 [17:54:01] marktraceur: ya, i do not understand when it will work, there must be a case i do not know about [17:54:02] git push -f origin master [17:54:09] joal: HEAD~2 [17:54:13] marktraceur: but let me file a ticket with info thus far [17:54:16] there are 2 commits maven makes [17:54:19] madhuvishy: YES madam [17:54:24] hence i gave the commit hash [17:54:34] makes complete sense, safer with commit hash [17:54:43] madhuvishy: Will go with the hash :) [17:55:19] madhuvishy: done, clean [17:55:57] joal: cool - I'm gonna re-launch the release job from jenkins [17:56:04] madhuvishy: ok :) [17:56:17] it's running [17:56:29] if it doesn't work, we can blame ottomata :D [17:56:30] nuria_: I want to say it has something to do with either language or mobile, but I still don't remember which [17:56:39] Analytics-Kanban: Reportiupdater queries not working for uploader metric on multimedia - https://phabricator.wikimedia.org/T140137#2454360 (Nuria) [17:57:23] Analytics, Fundraising-Backlog, Blocked-on-Analytics, Fundraising Sprint Licking Cookies, and 2 others: Clicktracking data not matching up with donation totals - https://phabricator.wikimedia.org/T132500#2454391 (CCogdill_WMF) Yes, thank you all. We sent our first emails since the fix this mornin... [17:57:28] marktraceur: task created, it is on our kanban, I am not going to look into it further today cause i *think* it requires code changes to reportupdater (unless i am missing config somewhere) [17:57:38] marktraceur: do add yourself to ticket please [17:59:29] nuria_: I think James_F already filed a ticket, sorry [17:59:45] nuria_: I also cannot find any reportupdater repositories that use "by_wiki" except ours.
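The rollback madhuvishy talks joal through, consolidated into one place. Every command here is verbatim from the exchange above (plus the tag cleanup they remember just below); only the comments are added. The reset goes to a commit hash rather than HEAD~1 because the maven release plugin pushed two commits (the release itself and the next -SNAPSHOT bump).

```bash
# In a clean, up-to-date master of analytics/refinery-source:
git reset --hard 52301c1            # last good commit, before the two maven-made ones
git push -f origin master           # force push; tolerable only because the stray
                                    # commits were machine-generated minutes earlier

# The release tag has to go too:
git tag -d v0.0.32                  # delete locally
git push origin :refs/tags/v0.0.32  # delete on the remote
```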
[17:59:49] joal: gah i forgot that we also had to delete the tag [17:59:51] doing [18:00:14] marktraceur: if you could resolve the other ticket as a duplicate of this one it would be great [18:00:14] Analytics-Kanban, Datasets-Webstatscollector, RESTBase-Cassandra, Patch-For-Review: Better response times on AQS (Pageview API mostly) {melc} - https://phabricator.wikimedia.org/T124314#2454407 (JAllemandou) After a week of almost no progress on a month of bulk loaded data to compact, we wiped th... [18:00:24] marktraceur: in a meeting for a bit [18:00:30] Working on it. Thanks! [18:01:41] Analytics-Kanban: Reportiupdater queries not working for uploader metric on multimedia - https://phabricator.wikimedia.org/T140137#2454413 (MarkTraceur) Open>Resolved a:MarkTraceur This is a duplicate of T140137 [18:02:00] nuria_, marktraceur: Ticket is https://phabricator.wikimedia.org/T140121#2453599 [18:02:13] Analytics-Kanban: Reportiupdater queries not working for uploader metric on multimedia - https://phabricator.wikimedia.org/T140137#2454417 (MarkTraceur) Resolved>Open Sorry, misfired [18:02:29] Analytics, Multimedia: Multimedia analytics cron job not been running for a few months? - https://phabricator.wikimedia.org/T140121#2454422 (MarkTraceur) Open>Resolved a:MarkTraceur Duplicate of T140137 [18:02:37] joal: hm, am looking at the number_archived_revisions field [18:02:40] not sure about that one... [18:02:42] or how to get it [18:02:44] still looking, but... [18:02:56] James_F: Closed as duplicate (well, actually I couldn't see any mechanism for literally marking as a duplicate so I just mentioned it in a comment) [18:03:10] ottomata: it's not that bad if not feasible [18:03:12] Analytics-Kanban: Reportiupdater queries not working for uploader metric on multimedia - https://phabricator.wikimedia.org/T140137#2454360 (MarkTraceur) a:MarkTraceur>None [18:03:38] ottomata: it would have been very comfortable to be able to double check on how revisions are impacted at page deletion, but it's ok if we don't have it [18:03:49] marktraceur: Click "Edit Related Tasks…" > "Merge Duplicates In". [18:04:12] Analytics, Fundraising-Backlog, Blocked-on-Analytics, Fundraising Sprint Licking Cookies, and 2 others: Clicktracking data not matching up with donation totals - https://phabricator.wikimedia.org/T132500#2454429 (CCogdill_WMF) Open>Resolved [18:04:50] Analytics, Multimedia: Multimedia analytics cron job not been running for a few months? - https://phabricator.wikimedia.org/T140121#2454432 (MarkTraceur) [18:04:54] Analytics-Kanban: Reportiupdater queries not working for uploader metric on multimedia - https://phabricator.wikimedia.org/T140137#2454434 (MarkTraceur) [18:06:22] Hmm see also https://phabricator.wikimedia.org/T132404 [18:07:18] I agreed to this. Shoot. [18:07:29] But also, something changed to cause the reports to all fail when "all" fails. [18:07:57] HM [18:08:02] So if mforns_away can weigh in on that, especially since my brief survey of the various reportupdater scripts revealed that nobody else was using by_wiki [18:08:03] joal: where do the revisions of a page go when they are deleted?! [18:08:34] ottomata: when the page is deleted (not from a move_redir though, remember earlier today), its revisions are archived [18:08:39] OMG [18:08:45] BOTH page AND revision are stored in archive?! [18:08:47] in the same table [18:08:49] crazyyyy [18:08:49] joal: things are going well
[18:09:05] right [18:09:06] ottomata: the permissions look good now :) [18:09:12] Yay Madhu ! [18:09:16] great, sorry about that madhuvishy glad it works [18:09:18] madhuvishy: currently updatin [18:09:27] joal: its not done yet [18:09:35] madhuvishy : (doing again) currently updating docs :) [18:09:40] Analytics-Kanban: Reportiupdater queries not working for uploader metric on multimedia - https://phabricator.wikimedia.org/T140137#2454458 (MarkTraceur) See, for history, T132404 and T132481 which may have caused and/or not solved this issue. To my discredit, I should have followed up on the ticket, but in... [18:09:45] joal: okay :) [18:11:14] joal: re: T124314, I'm interested to see if the load method solves this [18:11:14] T124314: Better response times on AQS (Pageview API mostly) {melc} - https://phabricator.wikimedia.org/T124314 [18:11:42] urandom: Will keep you posted for sure ! [18:11:56] joal: i'm a little worried that LCS might not work for you guys (precisely because you do your loading in batches) [18:12:06] urandom: Thanks for yesterday test, it changed a bit the CPU usage (from 60% nice to 100% nice on 1004) [18:12:20] heh "nice" [18:12:29] urandom: I think on a daily basis, it'll work just fine [18:12:37] yeah, user barely nudged [18:12:40] that's the problem [18:12:40] urandom: Our problem now is bulk loading a year of data :) [18:12:56] the concurrency model on LCS isn't good for this [18:13:26] generally, it doesn't have any (concurrency) within levels [18:13:42] the exception there should be level 0 where it does size-tiered [18:14:11] but you can't merge from 0 down to 1 until you're below the threshold, which sets up quite a bottleneck [18:14:42] newer Cassandra versions fix this somewhat [18:14:51] urandom: hmmm [18:15:06] urandom: We'll try bulk loading again next time we upgrade ;) [18:15:56] i hope that the method of loading actually makes this tractable for you, it seems... strange [18:15:56] madhuvishy: To delete the tag, you did: git tag -d v0.0.32 [18:16:01] strange that it would change things so much [18:16:14] madhuvishy: then git push origin :refs/tags/v0.0.32 [18:16:17] madhuvishy: right? [18:16:20] joal: yes [18:16:49] urandom: I must say I don't explain to myself how cassandra doesn't deal with bulk loading better :( [18:16:55] madhuvishy: ok sool [18:16:57] joal: oh, also, before i forget... pending compactions on LCS is a SWAG [18:17:18] it's more or less gospel on the others, but not on LCS [18:17:44] urandom: riiiight, I can guess that [18:18:17] joal: that's not to say compaction wasn't way behind for you, the level histogram made that clear [18:19:13] urandom: yup, the nodetool results were just kinda freaky after a week of work ! [18:19:28] yeah, not enough concurrency :( [18:22:16] urandom: can we bump that easily? [18:22:26] even for not-bulk, it could help I guess [18:22:38] joal: i should say, not enough concurrency available where it is needed [18:23:08] yesterdays bump in cpu usage was the result of removing all compaction trottling, and doubling the compactor threads [18:23:10] madhuvishy: Job finished successfully :) [18:23:25] and it *barely* moved the user value :/ [18:23:41] urandom: right [18:23:59] because there was only one thread working on the level compactions [18:24:03] joal: where are you seeing this? [18:24:13] and nothing from level 0 could merge up until it was done [18:24:17] madhuvishy: I received an email ;) [18:24:22] wha [18:24:32] madhuvishy: oh no ! 
[18:24:34] joal: it's still running [18:24:42] madhuvishy: my mistake, it's the previous one [18:24:42] TL;DR even with 20 threads, it's bottle-necking on what one can do [18:24:45] the job number is 29 [18:24:55] sorry madhuvishy [18:25:14] joal: np :) it's almost done though [18:25:19] I think I understand urandom [18:25:21] it's uploading refinery-hive [18:25:31] sorry refinery-job [18:26:08] urandom: We'll see how things go with CQL loading [18:27:13] joal: sure, let me know! [18:29:33] joal: done now :) [18:29:58] madhuvishy: awesome :) [18:30:02] joal: i think we can do it. but it does require another mw core change [18:30:05] on it... [18:30:07] you should have email etc [18:30:20] madhuvishy: indeed ! [18:30:23] joal: we can move on to next step :) [18:30:28] :) [18:31:07] that should be smoother, fingers crossed [18:31:28] Analytics-Kanban, EventBus, Patch-For-Review: Propose evolution of Mediawiki EventBus schemas to match needed data for Analytics need - https://phabricator.wikimedia.org/T134502#2454572 (Ottomata) Ah! We need another MW Core change. We'd like to know the number of revisions archived during a page_d... [18:31:30] madhuvishy: Building ! [18:31:35] joal: out of curiosity, why did you decide against bootstrapping / copying over data to the new boxes? [18:31:39] ottomata: Thanks a lot for that ! [18:32:04] joal: i see it failed [18:32:22] gwicke: we actually decided to go that way instead of including the new boxes in the old cluster [18:32:38] gwicke: But bootstrapping involves loading almost 1 year of data [18:33:21] ah, we have different meanings of bootstrapping - I meant bootstrapping cassandra instances, as in joining the cluster [18:33:30] ah right gwicke :) [18:33:42] & then eventually decommissioning the spinning disk nodes [18:34:06] the other technique I mentioned is to essentially rsync over the data [18:34:13] gwicke: one main reason was that we want to change compaction [18:34:22] advantage is that it's very quick to switch, disadvantage is that it's very quick ;) [18:34:41] gwicke: DTCS isn't really helping us on the read side [18:35:26] gwicke: And SSDs have shown that they can handle LCS correctly (with the initial load time painful) [18:35:28] everything should be a lot less problematic once you are on SSDs [18:36:05] gwicke: more or less :) [18:36:12] madhuvishy: any idea for the failure ? [18:36:22] madhuvishy: I'm very sorry, you know me: la chkoumoune (the jinx)! [18:36:31] joal: such a weird message - awww [18:36:32] no [18:36:39] in any case, I would expect a regular bootstrap to be quicker than re-writing the entire dataset [18:36:43] madhu: I used 0.0.32 as the version [18:36:50] madhuvishy: Should it have been v0.0.32?
[18:36:56] I wondered [18:37:12] I am fairly certain it should be 0.0.32 [18:37:21] ok madhuvishy [18:38:17] gwicke: I completely agree, but changing compaction isn't that easy from what I've understood [18:39:09] you can change it in cql, and it'll apply to newly compacted sstables [18:39:54] newly bootstrapped nodes will see a lot of compaction, so should end up with pretty much pure leveled compaction [18:40:02] gwicke: right, and we'd rather try to optimize for read since the beginning [18:41:48] gwicke: I think I follow you, but the example we have with bulk loading is kinda weird [18:41:53] I would expect the outcome (in terms of read latency) to be pretty much the same either way [18:42:37] anyway, was just curious about why you went this way [18:42:46] gwicke: with one month of data bulk loaded (SSTables streamed to nodes the same way a bootstrap would do), we ended up with stalled compaction [18:43:29] gwicke: like 20k SSTables at level 0, and compactors fighting to absorb [18:43:47] gwicke: and not actually managing [18:43:53] joal: can we call that field 'archived_revision_count' ? [18:44:04] yeah, we saw similar issues with bootstrapping in early 2.1 versions [18:44:14] ottomata: That works for me (I was thinking that last time you mentioned it ;) [18:44:20] failure to bootstrap with LCS above a certain instance size [18:44:37] the limit was around 700G [18:44:53] gwicke: so, regular CQL loading takes a lot longer at load time, but makes compaction manageable [18:45:03] no issue with DTCS or STCS [18:45:12] k cool [18:45:13] but I think there were fixes on that front since [18:45:23] gwicke: we streamed 500G raw sstables, 1 month :( [18:47:35] if the streamed sstables aren't very sorted, then a compaction might need to access a lot of them at once [18:48:02] this is because leveled compaction has hard constraints on not having overlaps [18:48:16] gwicke: Correct, that's why I asked urandom if there were ways to have hadoop write bigger (more sorted) SSTables ... But I didn't look further [18:48:41] STCS and DTCS don't have that constraint, so have an easier time picking bite-sized compaction work [18:48:57] gwicke: makes sense, but less read perf [18:49:23] gwicke: perf tests with only 4 months loaded showed quite a degradation, so we'd better try to make the most of what's available [18:49:35] gwicke: --^ read test meaning [18:49:50] gwicke: We didn't test with DTCS compaction though [18:50:16] madhuvishy: I need to go [18:50:23] madhuvishy: I'm sorry I broke the thing :( [18:50:28] joal: yeah i'm looking at what's going on [18:50:31] you didn't [18:50:32] madhuvishy: I updated the doc some 20:43:47 < joal> gwicke: and not actually managing [18:50:35] oops [18:50:38] again [18:50:39] i don't know why yet though [18:50:45] hopefully will fix it soon [18:50:49] https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Refinery-source <-- madhuvishy [18:50:55] thanks joal [18:51:02] joal: fingers crossed that leveled compaction will be the trick! [18:51:05] madhuvishy: I'll let you finish the release, we'll discuss that tomorrow :) [18:51:13] joal: okay :) [18:51:26] gwicke: we hope, knowing that there'll always be limitations :) [18:51:34] Gone for tonight folks [18:51:43] thanks for the talk gwicke [18:51:50] Thanks for the help madhuvishy :) [18:51:55] Have a good day/night [18:52:12] joal: enjoy your evening!
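Two things from the urandom/gwicke thread made concrete: gwicke's point that the compaction strategy can be switched in CQL (it then applies as sstables get rewritten), and the per-level histogram urandom refers to, which is where the "20k SSTables at level 0" symptom shows up. The keyspace/table name below is hypothetical; the real AQS schema names are not in this log.

```bash
# Switch an existing table to leveled compaction; only newly written/compacted
# sstables pick the layout up, existing ones migrate as they get compacted:
cqlsh -e "ALTER TABLE pageviews.data WITH compaction = {'class': 'LeveledCompactionStrategy'};"

# Per-level SSTable counts for an LCS table:
nodetool cfstats pageviews.data | grep "SSTables in each level"
# e.g.  SSTables in each level: [20000/4, 10, 103/100, 0, 0, 0, 0, 0]
# "count/max" entries over their max are the compaction backlog joal describes.
```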
[18:56:58] joal: it was just a typo in the config [18:57:10] refs/head/master instead of refs/heads/master [18:57:13] i've fixed it [18:57:35] https://integration.wikimedia.org/ci/job/analytics-refinery-update-jars/37/ succeeded [18:57:51] (no email because I disabled it) [18:58:08] https://github.com/wikimedia/analytics-refinery/commit/c914469e38773acab88c2e0d899208d8a80869ae [19:00:26] sorry about the confusion - but hopefully from here on it should be good :) [19:20:43] Analytics-Kanban, EventBus, Patch-For-Review: Propose evolution of Mediawiki EventBus schemas to match needed data for Analytics need - https://phabricator.wikimedia.org/T134502#2454840 (Ottomata) Ah, we will need a change for page_is_redirect too. [19:21:12] > ottomata: done with interviews, working on puppet now [19:21:35] k cool [19:52:23] nuria_, milimetric, I'm back, I saw a lot of chat activity with marktraceur about a reportupdater issue [19:52:32] sorry I didn't catch the ping then [19:53:22] mforns: Yeah, talking about by_wiki and the 'all' related errors, as per Phabricator [19:53:52] mforns: here, we can look at it tomorrow, i do not think it is fast to resolve (let me know otherwise) [19:53:55] mforns: https://phabricator.wikimedia.org/T140137 [19:54:15] marktraceur, nuria_, looking [20:04:50] Analytics-Kanban, EventBus, Patch-For-Review: Propose evolution of Mediawiki EventBus schemas to match needed data for Analytics need - https://phabricator.wikimedia.org/T134502#2455049 (Ottomata) @JAllemandou, I left a pretty big comment here: https://gerrit.wikimedia.org/r/#/c/288210/11/jsonschema/... [20:08:09] addshore: hiera('statsd') works now [20:08:13] but does include the port [20:08:17] it will resolve to [20:08:34] statsd.eqiad.wmnet:8125 [20:08:43] Yeh! I'll have to look at that again tomorrow to adjust it in the scripts! [20:09:38] k [20:09:50] nuria_: merged cdh change [20:10:01] let's do ops puppet and nodemanager restart tomorrow [20:10:10] ottomata: yessir [20:18:17] mforns: Any thoughts?
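Since hiera('statsd') now resolves to statsd.eqiad.wmnet:8125 rather than a bare hostname, the scripts addshore mentions will need to split off the port before emitting metrics. A minimal sketch of that adjustment; the helper names and metric name are made up for illustration, not taken from the actual wmde scripts:

```python
# Minimal sketch (helper and metric names are hypothetical): consume a
# statsd target that now includes the port, e.g. "statsd.eqiad.wmnet:8125".
import socket

def parse_statsd_target(value, default_port=8125):
    """Split 'host:port' (or a bare 'host') into a (host, port) pair."""
    host, _, port = value.partition(":")
    return host, int(port) if port else default_port

def send_counter(host, port, metric, value=1):
    """Fire-and-forget a statsd counter over UDP ('name:value|c' format)."""
    payload = "{}:{}|c".format(metric, value).encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))

host, port = parse_statsd_target("statsd.eqiad.wmnet:8125")
send_counter(host, port, "daily.example.metric")  # hypothetical metric name
```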
and sad, sad too [20:36:49] marktraceur, actually the puppet job is pointing to 'metrics/beta-feature-enables' [20:37:04] Uhhh [20:37:10] instead of 'metrics/multimedia-health' [20:37:36] FFS how did that happen [20:37:55] marktraceur, yes, that folder is updated [20:38:10] Yeah, I see the job ran on 07-01, which is super [20:38:14] let me see [20:38:14] At least we have the data [20:38:20] yes, it seems so [20:38:43] marktraceur, https://github.com/wikimedia/operations-puppet/blob/production/manifests/role/statistics.pp#L121 [20:39:37] https://github.com/wikimedia/operations-puppet/commit/bf5fd0a1fd32ef7011d4ac89f730e9476e153d60 [20:40:53] marktraceur, maaaaaaaaaan [20:40:59] my bad, copy-paste [20:42:01] No problem, we caught it and we still have the data! [20:43:16] marktraceur, I think the data would be recoverable from the db anyway no? [20:43:27] mforns: Yes, I think except for new uploaders [20:43:31] or do I need to move the files? [20:43:32] But I could be wrong [20:43:44] ok, I'll check new uploaders [20:43:58] Moving the files shouldn't be a problem anyway I think? The reports won't run again until August and there are no changes except for new data. [20:44:18] The only thing that would change is we wouldn't need to re-run the reports for March-June [20:44:25] marktraceur, mforns : i wish i had thought about this earlier, added troubleshooting section to docs: https://wikitech.wikimedia.org/wiki/Analytics/Reportupdater#Troubleshooting [20:44:36] Cool, thanks nuria_ [20:44:49] I probably should have said something earlier, I got sidetracked by the 'all' messages [20:44:52] marktraceur, I'm fixing the puppet code right now [20:45:05] Thanks mforns, I'll update the Phab ticket with some info [20:45:11] ok [20:45:42] mforns: upon merging puppet it will re-run though, right? [20:45:53] nuria_, yes [20:46:04] mforns: or wait.. where does it keep runs executed [20:46:06] nuria_, but they are very fast queries [20:46:11] mforns: ah ok, that is what i thought [20:46:19] mforns: so no need to move files [20:46:25] it looks at the missing data and executes the corresponding reports [20:46:51] mforns: you can reference the created ticket on puppet code, marktraceur : only ops merges puppet so we will not be able to fix it right away [20:46:55] New uploaders is slow [20:46:57] nuria_, I don't think so. We can let RU execute everything [20:46:59] Well, slow-ish [20:47:32] mmm, the thing is it runs for all wikis [20:47:50] marktraceur, nuria_, ok I'll move the files [20:47:50] marktraceur: and hopefully for next time with the docs you can find your way around the logs [20:47:56] Analytics-Kanban: Reportupdater queries not working for uploader metric on multimedia - https://phabricator.wikimedia.org/T140137#2455248 (MarkTraceur) What Happened to Our Numbers, a story by #wikimedia-analytics: 1. We had our data at https://datasets.wikimedia.org/limn-public-data/metrics/multimedia-hea...
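mforns' point that reportupdater "looks at the missing data and executes the corresponding reports" (so strictly speaking no files need moving) comes down to diffing the dates already present in the output against the expected schedule. A rough sketch of that idea, with hypothetical file and function names rather than reportupdater's actual internals:

```python
# Rough sketch (names are hypothetical, not reportupdater's internals):
# find which scheduled dates have no row yet in a report's TSV output,
# so only the gaps get (re-)executed.
import csv
from datetime import date, timedelta

def missing_dates(report_path, start, today, step=timedelta(days=1)):
    """Return scheduled dates with no corresponding row in the TSV yet."""
    try:
        with open(report_path) as f:
            done = {row[0] for row in csv.reader(f, delimiter="\t") if row}
    except FileNotFoundError:
        done = set()
    gaps, current = [], start
    while current < today:
        if current.isoformat() not in done:
            gaps.append(current)
        current += step
    return gaps

# Moving the output files back is enough: once the data is where the runner
# expects it, those dates no longer look missing and won't be re-executed.
for day in missing_dates("multimedia-health/new_uploaders.tsv",  # hypothetical path
                         date(2016, 2, 1), date.today()):
    print("would execute report for", day)
```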
[20:47:59] nuria_: Yeah, should be good [20:48:32] marktraceur: ok, cause we want this to be self-service with our latest refactor that cleaned up a bunch of code [20:48:36] and we can also prioritize the task to handle the 'all' keyword maybe [20:50:08] I don't mind that as long as it doesn't crash the other reports (as we have decided it doesn't) [20:50:47] marktraceur, yes, it doesn't, it only pollutes the log file [20:50:54] Analytics-Kanban: Reportupdater queries not working for uploader metric on multimedia - https://phabricator.wikimedia.org/T140137#2455252 (Nuria) [20:50:57] which is also ugly [20:51:48] marktraceur, nuria_, https://gerrit.wikimedia.org/r/#/c/298605/ [20:53:04] Cheers mate [21:02:56] marktraceur, I moved the files, in 15 minutes the files should be replicated to https://datasets.wikimedia.org/limn-public-data/metrics/multimedia-health/ [21:03:08] with the new data [21:04:00] and the dashboard should see it [21:11:35] Thanks mforns! [21:12:01] np :] thanks for notifying! [21:15:24] Analytics-Kanban: Multimedia health metrics stalled since Feb 2016 - https://phabricator.wikimedia.org/T140137#2455334 (mforns) a:mforns [21:16:19] Analytics-Kanban: Multimedia health metrics stalled since Feb 2016 - https://phabricator.wikimedia.org/T140137#2454360 (mforns) I fixed the puppet code, but I used the other task's ID, so the gerrit patch went to the other one, sorry. https://gerrit.wikimedia.org/r/#/c/298605/ [22:11:23] Analytics, Analytics-Cluster, Analytics-Kanban, Deployment-Systems, and 2 others: Deploy analytics-refinery with scap3 - https://phabricator.wikimedia.org/T129151#2455647 (thcipriani) >>! In T129151#2441318, @elukey wrote: > The analytics refinery (https://phabricator.wikimedia.org/diffusion/ANRE...
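On handling the 'all' keyword so it stops polluting the logs: since 'all' is a keyword rather than a real wiki database, one plausible fix in the spirit of the task nuria_ mentions is to special-case it instead of attempting a database connection. A hedged sketch with hypothetical names, not reportupdater's actual code:

```python
# Hedged sketch (hypothetical names, not reportupdater's actual code):
# treat 'all' as an aggregate over per-wiki results instead of a database
# name, and keep one wiki's failure from stopping the others.
import logging

logger = logging.getLogger("reportupdater")

def run_by_wiki(run_query, wikis):
    """run_query(wiki) -> numeric result; 'all' becomes the sum of the rest."""
    results = {}
    for wiki in wikis:
        if wiki == "all":
            continue  # not a real database; handled after the loop
        try:
            results[wiki] = run_query(wiki)
        except Exception:
            # Matches the observed behaviour: a failing wiki only logs an
            # error, it does not crash the other reports.
            logger.exception("report failed for wiki %s", wiki)
    if "all" in wikis:
        results["all"] = sum(results.values())
    return results
```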