[07:50:45] Analytics-Tech-community-metrics: Mediawiki support to be added to GrimoireLab - https://phabricator.wikimedia.org/T138007#2451648 (Aklapper) >>! In T138007#2409146, @Lcanasdiaz wrote: > The Mediawiki support for Perceval is being finished this week. Does that mean this task is resolved by now? If so, feel... [08:21:41] (PS1) Addshore: Update path to the db ini file [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/298421 (https://phabricator.wikimedia.org/T140064) [08:32:45] (CR) Addshore: [C: 2] Update path to the db ini file [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/298421 (https://phabricator.wikimedia.org/T140064) (owner: Addshore) [08:36:24] morning joal ! [08:38:57] Hi addshore [08:39:30] addshore: How are you today? [08:40:33] Good, and yourself? [08:40:42] not bad :) [08:40:57] How would you feel about pushing the oozie job out today then? :D [08:40:58] Weather is gently summerizing :) [08:41:08] haha, it's gone the opposite way over here :/ [08:41:13] Arf :( [08:43:11] I guess you're after some news about a deploy on our side [08:45:58] Yup! I have no idea about the process etc! ;) [08:46:20] addshore: two separate deploys [08:47:04] addshore: one for refinery-source, then refinery [08:49:14] awesome! [08:49:22] also, joal, can you merge things in operations-puppet? [08:49:33] addshore: And for refinery-source you're inaugurating a new deploy process madhuvishy has implemented [08:49:39] addshore: I don't [08:49:45] oooooh [08:50:03] addshore: I'm very sorry, you're inaugurating a few things with your patch :) [08:50:12] It's been fun! :D [08:50:24] addshore: the plan is for me and madhuvishy to deploy refinery-source later on today [08:50:50] addshore: Then if everything goes smoothly, I'll do refinery tonight, if not probably tomorrow [08:51:44] awesome! [09:20:01] elukey: \o [09:20:30] hello!!! o/ [09:20:43] elukey: How is it for you today? [09:20:57] debugging mod_proxy_fcgi, a nightmare. You? :D [09:21:24] good, wondering about cassandra [09:21:33] good luck with mod_proxy [09:21:42] ahhaah [09:21:58] do you want to chat about cassandra? [09:22:42] elukey: not really, I think if urandom agrees, we'll wipe the cluster tomorrow and start loading the old way [09:22:57] :( [09:23:03] I didn't follow the conversation [09:23:07] as you say :( [09:23:35] elukey: nothing really special, urandom tried to boost compaction yesterday through settings, but I see no difference [09:23:57] elukey: What's really weird is that compaction is completely stalled, no progress at all for almost a week [09:24:06] wow [09:24:47] starting the old way will take ages right? [09:25:09] elukey: will take time, but less than bulk so far ! [09:25:40] elukey: If we had started loading the old way instead of bulk last week, we'd currently be compacting the second month [09:26:22] yeah but a full year will take ~2 months? [09:26:33] elukey: probably [09:26:37] :( [09:26:57] elukey: But for the moment a full month has not even been compacted in a week so ...
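For context on the stall joal describes: a minimal sketch of how one might confirm that compaction is stuck rather than just slow, assuming stock nodetool on the affected host (the aqs boxes wrap it as nodetool-a, as a describecluster call later in this log shows). The sampling interval is arbitrary, and the throttle removal is only presumably what "boosting compaction through settings" involved.

```bash
# Pending compactions and per-task progress; re-run a few minutes apart.
nodetool compactionstats
sleep 300
nodetool compactionstats   # if "bytes compacted" barely moves, it's a stall

# Thread-pool view: active/pending/blocked tasks on the CompactionExecutor.
nodetool tpstats | grep -i compaction

# Remove the compaction throughput throttle entirely (0 = unlimited):
nodetool setcompactionthroughput 0
```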
[09:29:04] yes yes of course we need to take a decision about what to do, otherwise we'll waste too much time [09:29:20] elukey: That's where I stand [09:29:39] it is sad that after all this awesome work you didn't get a good result as a reward [09:30:10] elukey: And since no progress seems to be made, either we decide that bulk doesn't fit for us, or there is some parameter tuning we can try, but I don't want to wait longer with no progress being made by cassandra [09:32:27] yes +1, we need to take a decision after a deadline otherwise no progress will be made [09:33:12] elukey: For me the deadline is kinda today, in order to have a loading process/compaction running while I'm away at the end of this week [09:36:04] yep ok [09:36:19] * elukey suspects that joal is suggesting to wipe the cluster again [09:36:28] elukey: If not feasible, we'll do next week, but I'd rather have it moving [09:36:48] * joal is scared when elukey reads his mind [09:37:22] * elukey reassures joal since this is a side effect of working together daily [09:37:31] :D [09:37:46] jokes aside, I am going to wipe everything after lunch [09:37:47] is it ok? [09:37:57] I [09:38:01] elukey: Yes, let's do that [09:38:08] I'll also change the name of the cluster in puppet [09:38:30] from Analytics Query Service Test to Analytics Query Service NG [09:38:33] ok? [09:38:51] works for me [09:38:56] not sure if it matters but better than Test [09:39:04] elukey: We can change that back to prod without wiping it after loading? [09:39:07] I suspect that we can't change it at runtime [09:39:21] hm [09:39:30] not sure, it is speculation [09:39:32] didn't check the docs [09:39:42] I just don't want to leave test in there :) [09:40:32] elukey: good for me :) [09:43:55] ok!! [09:44:08] I just understood a little part of mod_proxy_fcgi [09:44:11] my head is exploding [09:49:48] but I tracked down the 304 issue [09:49:54] it is a "break" in the code [09:49:56] :/ [09:50:06] elukey: :S [09:50:34] but the FCGI protocol is really nice [09:50:43] much more powerful than expected [09:51:00] for example it allows "streams" over the same FCGI connection [09:51:30] mod_proxy_fcgi does not support them (and I am not sure if streams are transparent to the app layer) but they resemble HTTP/2 streams [11:01:11] brb lunch! [11:18:17] Analytics, Analytics-Wikistats, Labs-project-wikistats: Design new UI for Wikistats 2.0 - https://phabricator.wikimedia.org/T140000#2452139 (Danny_B) [11:27:48] hey y'all just a note that marco did a little experiment (might be useful to look at before wiping the cluster): https://ganglia.wikimedia.org/latest/graph_all_periods.php?h=aqs1004.eqiad.wmnet&m=cpu_report&r=4hr&s=by%20name&hc=4&mc=2&st=1468262986&g=cpu_report&z=large&c=Analytics%20Query%20Service%20eqiad [11:28:25] In his words "15:44:05 milimetric: unthrottling compaction on aqs1004, and double compactor concurrency" [11:29:33] He said it crawled a little faster after that: joal / elukey ^ [11:34:00] milimetric: o/ [11:34:19] yeah I think that joal looked at it but no substantial gain [11:34:41] but I'll wait for joal's confirmation before proceeding [11:34:54] it will take me ~30 mins so I can do it whenever you guys want [11:37:12] I think you can wipe, didn't mean to stop that, indeed I don't think the improvement was substantial [11:38:35] (PS1) Addshore: Big cleanup, reorder and document! [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/298447 [11:41:01] (PS2) Addshore: Big cleanup, reorder and document!
[analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/298447 [11:45:55] (CR) Hashar: "recheck" [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/298447 (owner: Addshore) [12:25:21] joal: you there? [12:36:05] (CR) Addshore: [C: 2] Big cleanup, reorder and document! [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/298447 (owner: Addshore) [12:36:25] (Merged) jenkins-bot: Big cleanup, reorder and document! [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/298447 (owner: Addshore) [12:45:11] (PS1) Addshore: Refactor how the simple config file is accessed [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/298465 [12:45:53] (CR) Addshore: [C: 2] Refactor how the simple config file is accessed [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/298465 (owner: Addshore) [12:46:13] (Merged) jenkins-bot: Refactor how the simple config file is accessed [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/298465 (owner: Addshore) [12:47:35] Ah nice yesterday's patch for Hadoop log retention cut in half most of the node managers' heaps! https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?panelId=17&fullscreen [13:01:48] Analytics, Revision-Slider, TCB-Team, WMDE-Analytics-Engineering, and 3 others: Data need: Explore range of article revision comparisons - https://phabricator.wikimedia.org/T134861#2452497 (Addshore) [13:02:55] morning [13:03:00] NICE! [13:03:19] elukey: you know i think there may have been 2 nodemanagers that didn't get restarted [13:03:27] i ran a salt command with puppet --test && ...restart [13:03:37] and i think a couple of the puppet --test didn't run because puppet was already running [13:03:47] ah yes that explains why we have different heap sizes! [13:03:49] hmmm, elukey, but i don't see any reduction in logs though [13:03:55] still 31T [13:04:21] ok elukey cool, now I can see which two didn't restart! [13:04:24] gonna restart those two now [13:04:45] super! [13:04:46] looks like 3 actually [13:04:50] I was about to say the same [13:05:30] !log restarting nodemanagers on analytics 1039 1046 and 1054 [13:05:38] morning ottomata ! [13:05:43] mornin [13:07:22] addshore: is it working? [13:07:26] yup! [13:07:35] though I have 1 more thing for puppet for you to merge [13:07:40] oh i see a patch [13:07:41] looking [13:07:51] I rearranged some of the scripts and added docs, and generally made things a bit nicer [13:07:58] addshore: if you like, you can do ensure => 'latest' [13:08:06] then you don't have to make puppet patches when you change your codebase [13:08:11] OR [13:08:14] we could use scap [13:08:16] maybe [13:08:21] to deploy your codebase [13:08:24] instead of having puppet clone it [13:08:34] yeh, but in this case it would have broken, as the cron scripts would have had to get updated at the same time [13:08:44] * addshore doesn't know anything about scap ;) [13:08:46] oh? [13:08:51] the cron scripts are updated? [13:08:53] oh [13:08:57] daily, etc. [13:09:00] yeh! as I moved things :) [13:09:05] addshore: you could put daily and minutely in your repo [13:09:33] I could do, that is one of the things I thought of today [13:09:41] just as erb files and then get puppet to grab them [13:09:44] hmm [13:09:46] not as erb files [13:09:48] as a bash script [13:09:51] that takes an argument [13:09:53] $base_path [13:09:53] or something [13:09:56] hmm, yeh!
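A minimal sketch of the wrapper ottomata proposes here (the $scripts_dir variant comes just below): puppet's cron entry calls a script kept inside the repo and passes the checkout path in, so moving report scripts around no longer needs a puppet patch. The file name and structure are hypothetical; only the page_size.php path quoted later in this log is taken from the conversation.

```bash
#!/bin/bash
# daily.sh -- hypothetical entry point kept in analytics/wmde/scripts.
# The puppet-managed cron would invoke: daily.sh <scripts_dir>
set -o errexit

SCRIPTS_DIR="${1:?usage: $0 <scripts_dir>}"   # the $base_path / $scripts_dir argument
cd "$SCRIPTS_DIR"

# Individual daily reports live in the repo, so reshuffling them is a repo-only change:
php src/wikidata/site_stats/page_size.php
```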
[13:10:00] or $scripts_dir [13:10:07] then puppet just passes it in when it makes the cron job [13:10:26] yeh, that would make sense :) I'll see if I can come up with something in the coming days! I'm just super glad it's all puppetized now :D [13:10:27] addshore: i'll go ahead and merge this, if you want to make those changes am happy to help then too :) [13:10:29] k np [13:12:10] ottomata: heap sizes dropping after restart :) [13:13:13] nice [13:13:25] addshore: merged and puppet ran [13:13:30] awesome! :) [13:13:41] ottomata: I'm going to go ahead and make a patch meaning I can sudo as the user too! [13:14:12] yeah, ok, that's going to take some ops discussion, but i think it will be fine eventually [13:14:14] addshore: do you know how? [13:14:23] you gotta make a new group in admin module data.yaml [13:14:39] maybe [13:14:50] I think I should be able to spot all of the individual bits! [13:14:52] 'wmde-users' or 'wmde-admins' or something [13:14:53] k [13:14:54] cool [13:15:02] you will have to file a phab ticket for that [13:15:04] explain everything [13:15:09] okay! [13:15:43] addshore: is this the path you want? [13:15:44] /a/analytics-wmde/src/scripts/src/wikidata/site_stats/page_size.php [13:15:48] doesn't exist afaict [13:16:39] hmm, it exists in the latest commit in the scripts repo! [13:17:03] hmm yeah it did not check out the commit you specified [13:17:03] hm [13:17:15] in the repo it is src/wikidata/site_stats/page_size.php [13:17:21] oooooh [13:17:41] i see it at 73c88575345d63115230a6f4ca7c75852fb735f0 [13:17:46] running puppet again just to double check [13:19:05] Analytics: Create ops dashboard with info like ipv6 traffic split - https://phabricator.wikimedia.org/T138396#2452597 (faidon) Per day, like Google's, would probably be more interesting, but per month would do too. [13:35:45] Analytics-Features, Project-Admins: Create new project "Analytics-Features" - https://phabricator.wikimedia.org/T863#2452762 (Danny_B) [13:45:18] ottomata: any ideas about checking out the right version? [13:45:40] ah sorry, got distracted... [13:45:57] dunno why puppet isn't doing it [13:45:59] Hey elukey [13:46:04] so did I :D Until I noticed icinga showing one of the checks as unknown ;) [13:46:15] sorry, got caught at home [13:46:29] oh addshore [13:46:42] git::clone doesn't support ensure => sha [13:47:13] Oh, that's lame, my brain must have assumed that it did from using ansible! [13:47:35] it supports a branch though [13:47:39] elukey: you can wipe when you want, I have not seen any drastic change [13:47:42] if you want to do that [13:47:48] your choices for ensure are [13:47:52] absent, present, or latest [13:47:58] hmm, I could have a production / deploy branch I guess? [13:48:02] ja you could [13:48:24] branch => 'deploy', ensure => 'latest' [13:48:24] Right, I'll do that! Is there a convention as to if it should be called deploy or production? [13:48:29] awesome! [13:48:30] naw, up to you [13:48:46] joal: all right! going to finish packaging varnishkafka and then I'll start [13:48:53] awesome :) [13:49:35] ottomata: patch is up, let me just make the branches quickly! [13:49:50] milimetric, mforns : let me know when you have a minute for a last pass over schemas [13:49:58] ottomata: Hello :) [13:50:09] Analytics-Kanban, EventBus, Patch-For-Review: Propose evolution of Mediawiki EventBus schemas to match needed data for Analytics need - https://phabricator.wikimedia.org/T134502#2452813 (Ottomata) In EventBus meeting yesterday we decided to remove the `sha1` visibility boolean.
It doesn't provide an... [13:50:15] ottomata: done! [13:50:19] joal: hi! [13:50:26] ottomata: I think when stat1002 got restarted the pivot and caravel servers didn't ;) [13:50:31] ottomata: Would you mind? [13:51:04] oh! [13:51:05] hi milimetric I'm here [13:51:07] they certainly wouldn't have [13:51:09] hm [13:51:12] oh no, it was joal [13:51:18] huhu :) [13:51:28] Hi mforns [13:51:33] hi joal :] [13:51:57] I wanted to have a last pass over the schemas if you have a minute (with milimetric would be better) [13:52:19] I have time, will be here until standup, just ping me [13:52:52] joal: try pivot now [13:52:55] stat1002 9090 [13:53:34] ottomata: works, but no datasource configured :( [13:53:56] ? [13:53:58] mforns: ok great, no news from milimetric yet :) [13:54:08] ok [13:54:26] hm [13:54:26] ottomata: the server responds but it says 'no datasource configured yet' [13:54:38] k [13:54:39] ... [13:54:39] i see [13:57:03] Hey mforns, just saw Dan's message ... I think we can go the two of us [13:57:14] mforns: batcave? [13:57:20] joal, oh yea, he is not feeling well [13:57:21] ok [13:57:23] omw [13:57:57] Analytics-Volunteering, Developer-Relations, Project-Admins, Blocked-on-Analytics, Need-volunteer: Analytics-Volunteering and Wikidata's Need-Volunteer tags; "New contributors" vs "volunteers" terms - https://phabricator.wikimedia.org/T88266#2452902 (Danny_B) [13:58:14] whoops, got it joal, better now [13:58:30] addshore: that worked too [13:58:34] now at 60aa970b0ae8ccb82034b8972775cd3bf6f15b1e [13:58:39] :) [13:59:54] aww joal, the caravel sqlite db was in /tmp [13:59:54] so it's gone [13:59:59] all the stuff you did :( [14:01:29] ottomata: Arf [14:02:10] arf [14:06:05] sorry ottomata one more! https://gerrit.wikimedia.org/r/#/c/298477/ [14:06:13] joal: caravel back up on 9091 [14:06:17] db now in my homedir [14:06:22] ottomata: thanks ottomata :) [14:06:23] inited with our cluster info [14:06:34] didn't load the examples, lemme know if you want them [14:06:40] figured you'd just want to play with our pageview stuff [14:06:51] And now I'll go and work on the thing so these changes don't have to be in the puppet repo! [14:07:43] That's great ottomata, thanks [14:08:24] ottomata, hey :] do you have rights to delete wiki pages? [14:09:12] uhhhh [14:09:16] on wikipedia? doubtful [14:09:22] ottomata, on testwiki [14:09:26] dunno! [14:09:34] mforns: what about beta.wmflabs.org [14:09:34] ? [14:09:55] there for sure [14:10:28] ottomata, the database for beta is in analytics-storage? I cannot find it, or the research user does not have rights to access it... [14:10:35] no, it's in labs [14:10:38] I see [14:10:39] in deployment-prep [14:11:02] i've never used testwiki, so i doubt i have rights there [14:11:07] ok [14:12:09] joal: did we say user_id_target etc. was better? [14:12:20] ottomata: I don't recall so [14:12:21] ottomata, how can I access the db in deployment-prep? [14:13:05] ottomata: I explained why we chose to go for differentiating names, but I can't recall having agreed on user_target [14:13:15] ottomata: On the other hand, so many things were discussed !
[14:13:23] i'm not sure why we chose, just that dan and mforns maybe liked it better [14:13:34] buuuut, not sure why [14:13:46] mforns: not totally sure, looking [14:13:48] in the meantime [14:13:53] ottomata: I removed sha1, now we're triple checking with mforns on the page_creation_timestamp/page_first_rev_id [14:13:54] what's your opinion on field names like [14:14:02] user_id_blocks_changed [14:14:04] and [14:14:26] vs [14:14:31] user_id_target [14:14:33] ottomata: I don't have a strong one [14:14:47] ottomata, I suggested user_id_blocks_changed, to be consistent with user_id_groups_changed [14:15:00] mforns: why not user_id_target for both? [14:15:01] but I think both are too long names [14:15:08] ottomata: there are advantages for both ... Maybe using target, since it's the same in multiple places, makes it easier [14:15:10] yes, I would agree [14:15:24] it seems to make sense, will be more consistent for future schemas too [14:15:24] it's simpler [14:15:29] sure [14:15:36] there is a user_id that is performing an action, and a possible user that the action is being performed on [14:15:40] ok, let's go for user_[id|text]_target :) [14:15:40] ok cool [14:15:51] ok, joal i'm modifying a few things now, will add that and amend your patch [14:16:05] ottomata: as you wish, I have a patch on the fly as well [14:16:40] ottomata: we also agreed on database_name instead of wiki_database [14:16:55] oh ok [14:16:59] ok cool [14:17:00] ottomata: And I think with that you have everything :) [14:17:11] mforns: i don't know how to access the db. i see two instances in depl-prep [14:17:13] db1 and db2 [14:17:20] but sudo mysql needs a pw [14:17:21] don't know it [14:17:24] except 1 field we want to add in the page_delete event on which we haven't found a name yet [14:17:25] you will have to ask rel-eng folks [14:17:32] ottomata, don't worry we'll find another way [14:17:37] ok, yes [14:17:42] mforns: not sure what you are doing, but you can test in mw vagrant, no? [14:17:48] mmmm, good [14:18:32] thanks ottomata :] [14:21:44] joal: can we expect that ALL mw generated eventbus events will have the 'database_name' field [14:21:45] ? [14:21:45] joal, ottomata: wiping the AQS cluster [14:21:58] aqs100[456] [14:22:01] :P [14:22:02] elukey: ok [14:22:03] :) [14:22:19] ottomata: Probably only the mediawiki ones [14:22:22] there is a createEvent function in the EventBus extension that populates the meta info [14:22:33] i could DRY up the database_name there [14:22:37] even though it's not part of meta [14:22:46] ottomata: hm, can't say [14:22:53] but then we'd need to enforce that all EventBus extension schemas have that field [14:22:55] ottomata: https://gerrit.wikimedia.org/r/#/c/298456/ - I am changing the name to Analytics Query Service NG [14:23:11] ottomata: sorry to bug you but could you quickly +2 https://gerrit.wikimedia.org/r/#/c/298477/? :) [14:23:19] ottomata: for instance, error, change-prop, resource-change etc, they probably don't want database_name [14:23:23] this can be done before creating the cluster so if you guys don't like it let me know now :P [14:23:28] elukey: did you mean to change cdh submodule in that patch? [14:23:33] also, will 'NG' be permanent? [14:25:13] thanks! [14:25:48] ottomata: nono that one is for AQS [14:25:53] it is the cassandra name [14:25:56] right [14:25:57] cluster name [14:25:57] but, i mean [14:26:05] will we be stuck with it forever, or will you change it back later?
[14:26:16] I think that we will not be able to change it [14:26:19] hm [14:26:24] then I don't like it :p [14:26:30] what about next time we do this? NNG? [14:26:55] Well I am happy with any name [14:27:00] haha [14:27:08] :P [14:27:18] so, you need to call it something different than just Analytics Query Service, because that currently conflicts with 100[123]? [14:27:20] correct? [14:28:02] joal: indeed, about database_name [14:28:07] but those events are not created by the eventbus mwextension [14:28:23] mobrovac: what do you think? [14:28:24] qq [14:28:25] yeah [14:28:41] yt? [14:28:44] so I am reading that THEORETICALLY it is possible to change the name of the cluster on the fly [14:28:57] elukey: how about just something that makes sense but doesn't conflict [14:28:57] like [14:28:58] via cqlsh first then editing the cassandra yaml [14:29:05] 'Analytics Query Service Storage' [14:29:05] ? [14:29:07] heh [14:29:22] looks nice [14:29:39] all right updating the code review [14:32:10] ottomata: https://gerrit.wikimedia.org/r/#/c/298456/3 [14:32:37] +1 [14:32:38] :) [14:32:43] gooood [14:41:10] hey ottomata [14:41:29] ottomata: We have the new field for the page_delete event [14:41:38] ottomata: You probably won't like it ;) [14:41:58] ottomata: move_over_redirect_page_id: integer - Not mandatory [14:42:31] ottomata: If present, this field contains the page_id of the redirect overwritten by the move [14:46:25] haha [14:46:34] wait, before I ask about what the heck. [14:46:46] is that instead of page_creation_dt [14:46:47] ? [14:46:50] or do you still need that too? [14:47:03] we need both ottomata !!! [14:47:10] ok, lemme ask a q about that first too [14:47:10] ottomata: batcave for the heck? [14:47:13] yeah! [14:59:37] (PS1) Addshore: Move cron files from puppet to this repo [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/298486 [15:00:10] (PS2) Addshore: Move cron files from puppet to this repo [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/298486 (https://phabricator.wikimedia.org/T140095) [15:00:35] nuria_, having problems joining... [15:03:07] mforns: we're in batcave with ottomata doing some tests on mediawiki if you're interested [15:03:16] joal: new cluster up and running [15:03:22] elukey: Yay ! [15:03:27] Thanks a lot elukey [15:05:44] elukey@aqs1006:/var/log/cassandra$ nodetool-a describecluster [15:05:45] Cluster Information: Name: Analytics Query Service Storag [15:06:01] sorry should be Name: Analytics Query Service Storage [15:06:03] goooood [15:06:07] everything is up [15:06:27] ottomata: I'm thinking https://gerrit.wikimedia.org/r/#/c/298486/2 and https://gerrit.wikimedia.org/r/#/c/298487/4 should do it if you're still free for a review!
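The runtime rename elukey says he read about, as it is commonly described: update the system.local table through cqlsh, flush it, then make cassandra.yaml agree and restart each node. A sketch only; the team wiped and re-created the cluster instead, so none of this was exercised here, and the yaml path plus the plain nodetool/cqlsh invocations (versus the nodetool-a wrapper seen above) are assumptions.

```bash
# Run on every node in turn:
cqlsh -e "UPDATE system.local SET cluster_name = 'Analytics Query Service Storage' WHERE key = 'local';"
nodetool flush system            # persist the rename to disk before the restart

# Make the on-disk config agree, then bounce the node:
sudo sed -i "s/^cluster_name:.*/cluster_name: 'Analytics Query Service Storage'/" /etc/cassandra/cassandra.yaml
sudo service cassandra restart

nodetool describecluster         # should report the new name
```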
[15:16:16] (CR) Addshore: [C: 1] Move cron files from puppet to this repo [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/298486 (https://phabricator.wikimedia.org/T140095) (owner: Addshore) [15:17:08] (PS3) Addshore: Move cron files from puppet to this repo [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/298486 (https://phabricator.wikimedia.org/T140095) [15:17:18] (CR) Addshore: [C: 2] Move cron files from puppet to this repo [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/298486 (https://phabricator.wikimedia.org/T140095) (owner: Addshore) [15:21:06] (Merged) jenkins-bot: Move cron files from puppet to this repo [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/298486 (https://phabricator.wikimedia.org/T140095) (owner: Addshore) [15:30:14] ottomata: looks like we are at 30T of logs so nothing has been deleted yet, I think we should make the setting small to start (12 hrs) to make sure it is actually being used [15:35:05] (PS1) Addshore: Fix execute permissions [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/298496 [15:35:42] (PS2) Addshore: Fix execute permissions [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/298496 [15:35:50] (PS1) Addshore: Fix execute permissions [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/298497 [15:36:04] Analytics-Cluster, Analytics-Kanban: Procure hardware for future druid cluster - https://phabricator.wikimedia.org/T116293#2453438 (Nuria) Open>Resolved [15:36:06] (CR) Addshore: [C: 2] Fix execute permissions [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/298497 (owner: Addshore) [15:36:09] (CR) Addshore: [C: 2] Fix execute permissions [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/298496 (owner: Addshore) [15:36:20] Analytics-Kanban: Page History: design algorithm to reconstruct page history - https://phabricator.wikimedia.org/T138851#2453441 (Nuria) Open>Resolved [15:36:22] Analytics-Kanban, Patch-For-Review: Extract edit oriented data from MySQL for small wiki - https://phabricator.wikimedia.org/T134790#2453442 (Nuria) [15:36:36] Analytics-Kanban: User History: design algorithm to reconstruct page history - https://phabricator.wikimedia.org/T138859#2453445 (Nuria) Open>Resolved [15:36:38] Analytics-Kanban, Patch-For-Review: Extract edit oriented data from MySQL for small wiki - https://phabricator.wikimedia.org/T134790#2277147 (Nuria) [15:37:10] (Merged) jenkins-bot: Fix execute permissions [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/298496 (owner: Addshore) [15:37:13] (CR) jenkins-bot: [V: -1] Move cron files from puppet to this repo [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/298498 (https://phabricator.wikimedia.org/T140095) (owner: Addshore) [15:37:16] (PS2) Addshore: Move cron files from puppet to this repo [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/298498 (https://phabricator.wikimedia.org/T140095) [15:38:02] Analytics-Cluster, Analytics-Kanban, Patch-For-Review: Fix hive-metastore vs libmysql-jar race condition when provisioning new hive metastore server - https://phabricator.wikimedia.org/T133198#2453463 (Nuria) Open>Resolved [15:38:15] Analytics-Kanban: Retention metric research - https://phabricator.wikimedia.org/T138611#2453464 (Nuria) Open>Resolved [15:38:50] Analytics-Kanban, Patch-For-Review: Extract edit oriented data from MySQL for small wiki - https://phabricator.wikimedia.org/T134790#2453466 (Nuria) [15:38:52] Analytics-Kanban, Patch-For-Review: User History: write scala for user history reconstruction
algorithm - https://phabricator.wikimedia.org/T138861#2453465 (Nuria) Open>Resolved [15:39:28] Analytics-Cluster, Analytics-Kanban, Patch-For-Review: Java OOM errors kill Hadoop HDFS daemons on analytics* - https://phabricator.wikimedia.org/T139071#2453470 (Nuria) Open>Resolved [15:47:23] milimetric: Hey buuuddy, https://edit-analysis.wmflabs.org/multimedia-health/#projects=commonswiki/metrics=Uploaders hasn't uploaded for a while, James_F says. What up with that? [15:47:51] a while = about five months [15:48:00] And I mean "updated" [15:48:14] Hi marktraceur, milimetric is off today [15:48:26] Oh, no problem. Do you have any insight into the above? [15:48:55] I remember doing a lot of gymnastics with config files so I would never have to touch the data again, and now I'm questioning the utility of it [15:48:56] marktraceur: I don't unfortunately [15:49:00] Shoot. [15:49:05] :( [15:51:14] Analytics-Kanban: Respawn the schema/field white-list for EL auto-purging {tick} - https://phabricator.wikimedia.org/T135190#2453551 (mforns) The new white-list including the new data for the modified schemas since Aug 2015 is done. Here's the file: {F4266082} [15:56:50] (PS1) Joal: Update changelog for version 0.0.32 [analytics/refinery/source] - https://gerrit.wikimedia.org/r/298501 [15:57:08] I'll file a ticket. [15:57:41] hey nuria_ : can you hear me in batcave? [15:58:54] Analytics, Multimedia: Multimedia analytics cron job not been running for a few months? - https://phabricator.wikimedia.org/T140121#2453599 (Jdforrester-WMF) [15:59:02] marktraceur: ^^ [15:59:14] Thanks James_F [15:59:21] I don't think it's literally a cronjob, though [15:59:29] I can't remember how it's supposed to work [16:00:09] ottomata: standdduppp [16:00:29] a-team: standddupppp [16:05:47] ottomata: you are frozen man [16:06:04] marktraceur: let's look at that after our standup, in 30 mins [16:06:07] Analytics, Research-and-Data-Archive, Research-consulting, Research-management: Draft announcement for wikistats transition plan - https://phabricator.wikimedia.org/T128870#2453702 (DarTar) [16:06:12] Done. [16:17:19] Analytics, Research-and-Data-Archive, Research-consulting, Research-management: Draft announcement for wikistats transition plan - https://phabricator.wikimedia.org/T128870#2453816 (ggellerman) Open>Resolved [16:21:57] Analytics-Cluster, Analytics-Kanban, Operations, ops-eqiad: analytics1049.eqiad.wmnet disk failure - https://phabricator.wikimedia.org/T137273#2453855 (Cmjohnson) It appears that a disk was in a foreign cfg mode. Cleared the foreign cfg and cache. Added the disk group back. [16:22:50] Analytics-Cluster, Analytics-Kanban, Operations, ops-eqiad: analytics1049.eqiad.wmnet disk failure - https://phabricator.wikimedia.org/T137273#2453876 (Cmjohnson) Return shipment of the first disk FEDEX 9611918 2393026 70283562 UPS 9202 3946 5301 2421 0335 48 [16:23:15] Analytics, Fundraising-Backlog, Blocked-on-Analytics, Fundraising Sprint Licking Cookies, and 2 others: Clicktracking data not matching up with donation totals - https://phabricator.wikimedia.org/T132500#2453878 (awight) @CCogdill_WMF Sorry to keep you waiting on this! I guess the workaround is... [16:24:17] madhuvishy: we are having staff earlier... [16:27:48] Analytics, Fundraising-Backlog, Blocked-on-Analytics, Fundraising Sprint Licking Cookies, and 2 others: Clicktracking data not matching up with donation totals - https://phabricator.wikimedia.org/T132500#2453905 (Jgreen) >>! In T132500#2453878, @awight wrote: > @CCogdill_WMF > Sorry to keep you...
[16:38:57] Analytics, Editing-Analysis, Notifications, Collab-Team-Q1-July-Sep-2016: Numerous Notification Tracking Graphs Stopped Working at End of 2015 - https://phabricator.wikimedia.org/T132116#2453972 (jmatazzoni) [16:42:54] Ping nuria_ [16:43:12] marktraceur: yes, did you look at whether your queries might be timing out? [16:43:18] marktraceur: let's look at that 1st [16:44:14] I haven't looked at anything yet, but I can try to pull up what I remember... [16:44:42] Like I said, I specifically tried to build something I would never have to look at again, and promptly left it alone for 5 months [16:45:26] (PS1) Addshore: Stop hardcoding graphite and statsd hosts [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/298520 (https://phabricator.wikimedia.org/T140081) [16:46:17] (PS1) Addshore: Stop hardcoding graphite and statsd hosts [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/298521 (https://phabricator.wikimedia.org/T140081) [16:48:45] (PS2) Addshore: Stop hardcoding graphite and statsd hosts [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/298521 (https://phabricator.wikimedia.org/T140081) [16:49:08] (PS2) Addshore: Stop hardcoding graphite and statsd hosts [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/298520 (https://phabricator.wikimedia.org/T140081) [16:49:45] (CR) Addshore: [C: 2] Stop hardcoding graphite and statsd hosts [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/298520 (https://phabricator.wikimedia.org/T140081) (owner: Addshore) [16:51:08] joal: aloha! AQS is complaining about no data :P [16:51:21] should I silence it or will it clear in a bit after the first load? [16:51:40] (PS3) Addshore: Move cron files from puppet to this repo [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/298498 (https://phabricator.wikimedia.org/T140095) [16:51:47] (CR) Addshore: [C: 2] Move cron files from puppet to this repo [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/298498 (https://phabricator.wikimedia.org/T140095) (owner: Addshore) [16:51:58] (PS3) Addshore: Stop hardcoding graphite and statsd hosts [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/298521 (https://phabricator.wikimedia.org/T140081) [16:52:04] (CR) Addshore: [C: 2] Stop hardcoding graphite and statsd hosts [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/298521 (https://phabricator.wikimedia.org/T140081) (owner: Addshore) [16:54:16] ottomata: I have piled 2 patches up in the scripts repo now depending on https://gerrit.wikimedia.org/r/#/c/298487/ (waiting for you) which should move most of the future changes out of ops-puppet and into the scripts repo! [16:55:50] cool, will get to it addshore, sorry had meetings and eating lunch [16:56:22] awesome :) In theory I think the 2 patches will just merge themselves in the scripts repo after the puppet one gets a +2 (i think) [16:57:14] nuria_: I don't think I ever knew how to check the output of my queries except asking Dan [16:57:35] marktraceur: do you have permission to access 1002 [16:57:42] nuria_: I do! [16:58:11] marktraceur: then queries can be tested on that machine, let's look at what query is updating the files that are stale [16:58:45] Hm, maybe not [16:58:53] nuria_: I think I have access to 1003 but not 1002, shoot. [17:00:28] I know the data goes to https://datasets.wikimedia.org/limn-public-data/metrics/multimedia-health/ at least.
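Since madhuvishy notes just below that the mysql side works from both stat1003 and stat1002, the stale report query can be tried by hand. A sketch, assuming the research-client defaults file and the analytics-store replica host described on the wikitech Data access page linked below; both are assumptions here, not something confirmed in this log.

```bash
# From stat1003: run a cut-down version of the failing report query by hand.
mysql --defaults-file=/etc/mysql/conf.d/research-client.cnf \
      -h analytics-store.eqiad.wmnet \
      -e "SELECT COUNT(*) FROM commonswiki.image WHERE img_major_mime = 'image';"
```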
[17:01:12] marktraceur: let me see if you can access db from 1003, i do not use that one that much but the idea is that looking at queries in 1002 should not be hard [17:01:35] nuria_: is staff still on? sorry I was up super late last night - found out we have to move houses again - lots of drama [17:01:51] joal: Hi! Do you want to deploy refinery-source? :) [17:02:49] madhuvishy: we are done, announcements are about us having a joint goal with the traffic team next quarter and CTO status [17:02:59] madhuvishy: https://etherpad.wikimedia.org/p/analytics-staff-meeting [17:03:02] nuria_: okay [17:03:13] madhuvishy: it was short, 30 minutes after standup [17:03:27] marktraceur: please file a request for access to 1002 [17:03:31] nuria_: yeah I guessed [17:03:41] Uh, k [17:04:14] nuria_: if it's mysql both 1003 and 1002 should work fine [17:04:42] nuria_: Do you know the tag for that off the top of your head? [17:05:39] marktraceur: what exactly do you need to access? [17:05:50] (sorry if you've already said this) [17:05:52] madhuvishy: I'm not sure, I'm trying to figure out why our queries aren't running [17:05:58] ah [17:06:06] madhuvishy: I think nuria wants me to look at the query logs on stat1002 which I don't have access to? [17:06:10] mysql? [17:06:16] interesting [17:06:18] okay [17:06:35] It's supposed to be automatically generated by something, but I always forget the names of things in the analytics world [17:06:56] hmmm [17:07:15] the equivalent of statistics-users for 1002 is statistics-privatedata-users [17:07:28] https://wikitech.wikimedia.org/wiki/Analytics/Data_access#Access_Groups [17:08:29] joal: Let me know about deployment - I'm around [17:09:15] Thanks madhuvishy [17:09:23] madhuvishy: the tag in phab is "access-request" right? [17:09:35] madhuvishy: ah sorry, did not see response [17:10:07] > A three business day waiting period must be observed after the request is filed [17:10:08] marktraceur: ok, access will take 3-5 days once you request it but that way you are set for when this happens gain [17:10:10] *again [17:10:13] OK then [17:10:26] I mean, assuming I remember what we do this time [17:10:34] marktraceur: let's try to see if we can see what is wrong now [17:11:31] yes tag on phab is Ops-Access-Requests [17:12:23] I filed the request, so we're good to go on that front [17:13:21] joal: added one week of downtime for AQS and acked the alarms [17:17:40] all right team logging off! [17:17:46] byyeee [17:18:46] marktraceur: this is your config: https://github.com/wikimedia/analytics-limn-multimedia-data/blob/master/multimedia/config.yaml#L32 [17:18:55] Sounds right [17:19:07] marktraceur: is there anything else? cause thus far i see just this one config file [17:19:26] nuria_: I don't think there's much else, no. It gets run through some system that Dan has [17:19:35] I really wish I could remember the name of it. [17:19:48] marktraceur: well, this is the system, correct? https://wikitech.wikimedia.org/wiki/Analytics/Reportupdater#Queries_and_scripts [17:19:59] nuria_: I guess there are the SQL queries there. Ah, yeah, ReportUpdater, that was it [17:20:57] And there are two commits from mforns that I don't recall seeing, maybe they have something to do with this? [17:23:46] Analytics-Kanban: Respawn the schema/field white-list for EL auto-purging {tick} - https://phabricator.wikimedia.org/T135190#2454245 (mforns) Note, I forgot to add the editCountBucket fields that were the result of the bucketization.
Here is the white list that includes those fields, the previous one is inco... [17:26:08] marktraceur: i do not see anything in the logs, but i think i am not looking at the right logs, let me ping mforns [17:26:08] Analytics, DBA: Set up auto-purging after 90 days {tick} - https://phabricator.wikimedia.org/T108850#2454271 (mforns) @jcrespo I finished updating the white-list. Sorry for the delay, I needed to make sure that changes to schemas between 2015's audit and today were included and that the respective schem... [17:27:18] elukey: I will show you how to insert that data :) [17:27:23] Analytics, DBA: Set up auto-purging after 90 days {tick} - https://phabricator.wikimedia.org/T108850#2454286 (mforns) a:jcrespo @jcrespo Please, let me know if the white-list is what you expected and what I can do to help you in the next steps. Thanks! [17:27:23] Hi madhuvishy ! [17:27:29] I'm ready for a deploy indeed :) [17:27:29] joal: Hi :) [17:27:33] awesome [17:28:07] joal: so, https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Refinery-source [17:28:17] I was actually reading this :) [17:28:18] as usual, changelog step is first [17:28:40] madhuvishy: https://gerrit.wikimedia.org/r/#/c/298501/ [17:28:50] I had my homework done ;) [17:29:08] (CR) Madhuvishy: [C: 2 V: 2] Update changelog for version 0.0.32 [analytics/refinery/source] - https://gerrit.wikimedia.org/r/298501 (owner: Joal) [17:29:14] joal: :D [17:29:16] okay done [17:29:30] do you want to do step 3? [17:30:05] joal: the SCM tag is the only gotcha on that step [17:30:21] madhuvishy: first, I logged in to jenkins :) [17:30:29] madhuvishy: I'll update the doc as we move along [17:30:35] okay [17:31:59] madhuvishy: Currently doing step 3, everything looks good :) [17:32:13] madhuvishy: I'll change the SCM tag to: v0.0.32 [17:32:18] madhuvishy: correct? [17:32:21] joal: yup [17:32:28] Great, doing ! [17:33:22] I see job launched [17:33:42] madhuvishy: waiting for the email ;) [17:33:55] joal: will take a while to build, and then we should get email when done. although i'm going to stare at console output nervously for a bit [17:34:12] huhuhu madhuvishy :) [17:34:30] madhuvishy: I'll do something else, trying not to build on your nerves :) [17:34:37] joal: ha ha [17:36:56] marktraceur / nuria_: the log file is /srv/reportupdater/log/limn-multimedia-data-multimedia.log on stat1003, not 1002 [17:36:59] and the error is: [17:37:19] 2016-07-12 17:00:30,191 - ERROR - Report "image-uploads" could not be executed because of error: pymysql can not execute query ((1064, u"You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'all\n\tfrom all.image\n\twhere img_major_mime = 'image' and\n\t\timg_timestamp >= [17:37:20] '2016' at line 2")). [17:37:37] joal: it failed :( [17:38:09] madhuvishy: :( Did I do something wrong? [17:38:15] i don't think so [17:38:16] checking [17:38:33] marktraceur: ok documented location of logs now at: https://wikitech.wikimedia.org/wiki/Analytics/Reportupdater#Where_are_logs_.3F [17:38:38] cc milimetric [17:38:50] that way next time we remember where they are at [17:40:02] Cool. [17:40:11] milimetric: Thanks! [17:40:20] joal: nothing in the error logs - my assumption is archiva login failed [17:40:21] and hopefully we can make reportupdater 100% self service cc milimetric [17:40:44] ottomata: the new archiva login we made is supposed to work right?
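What milimetric's log excerpt shows, spelled out: reportupdater substituted the literal wiki name "all" into the query template, so MySQL saw a non-existent database. Checking is just a matter of tailing the log he points at; the rendered query in the comments is an illustrative reconstruction from the 1064 error, not a verbatim copy.

```bash
# On stat1003:
tail -n 100 /srv/reportupdater/log/limn-multimedia-data-multimedia.log | grep -B1 -A2 ERROR

# The rendered SQL effectively becomes (reconstructed from the error above):
#   select ... from all.image
#   where img_major_mime = 'image' and img_timestamp >= '2016...'
# "all" is a by_wiki placeholder value, not a database, hence the syntax error.
```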
[17:40:45] madhuvishy: ok [17:41:39] madhuvishy: i think so, but i don't remember much [17:41:39] !log Insert test data in aqs100[456] to prevent false alarms [17:41:49] nuria_: Could this still be because "all" isn't a valid wikidb name? [17:41:56] ottomata: Failed to execute goal org.apache.maven.plugins:maven-deploy-plugin:2.7:deploy (default-deploy) on project refinery: Failed to deploy artifacts: Could not transfer artifact org.wikimedia.analytics.refinery:refinery:pom:0.0.32 from/to archiva.releases (https://archiva.wikimedia.org/repository/releases/): Failed to transfer file: [17:41:56] https://archiva.wikimedia.org/repository/releases/org/wikimedia/analytics/refinery/refinery/0.0.32/refinery-0.0.32.pom. Return code is: 401, ReasonPhrase:Unauthorized. -> [Help 1] [17:42:02] it looks like it isn't [17:42:08] i'll switch back to the old ones [17:42:50] marktraceur: right, select from all.image [17:42:53] joal: 5 minutes - let me fix this. I got a new set of archiva creds made for jenkins only - but looks like they aren't working [17:43:04] marktraceur: it is not going to work [17:43:07] nuria_: I thought we had dealt with this, shoot [17:43:09] madhuvishy: no prob :) [17:43:15] addshore: comments on puppet inline [17:43:16] madhuvishy: looking [17:43:31] madhuvishy: Thanks for doing this on the fly :) [17:43:34] what is the username we made? [17:43:53] I don't remember what happened last time I spoke with you all about this, but apparently it didn't get fixed like I thought it did [17:43:56] oh archiva-ci [17:44:11] marktraceur: but it seems a problem with the system, not with your select per se [17:44:31] Analytics, DBA: Set up auto-purging after 90 days {tick} - https://phabricator.wikimedia.org/T108850#2454323 (jcrespo) Please create a change review on puppet/operations (anywhere, I will move it somewhere else) instead of attaching it here, I will take it from here. [17:44:45] nuria_: No, I remember this vaguely, "all" is included in the list of wikis for some particular queries, and I can't use it because my query doesn't support it [17:44:47] marktraceur: and while access to 1002 will help for other things, my bad, in this case access to 1003 was all you needed [17:44:55] Well that's fine [17:45:00] More access isn't a bad thing [17:45:53] ottomata: yeah [17:46:31] nuria_: Now, I personally think that supporting an edge case at the expense of all other cases is silly, but I think Dan had a better reason than I can remember [17:46:50] marktraceur: wait, what is the edge case? [17:47:35] nuria_: Using "all" as a wiki db name [17:48:05] madhuvishy: ah [17:48:08] i think it didn't have proper perms [17:48:09] try now [17:48:28] ottomata: aah [17:48:29] okay [17:48:39] joal: we have to roll this back https://github.com/wikimedia/analytics-refinery-source/commit/26207eadce4ec41a5ca6ebbd7de8515f5bc51c66 [17:48:44] or we can skip this version [17:48:51] nuria_: At least that's what I remember. Again, it's been a few months [17:48:52] what do you think? [17:49:01] madhuvishy: what does a rollback incur? [17:49:16] marktraceur: i am trying to see where the configuration for all wikis is in reportupdater [17:49:28] madhuvishy: I don't have the proper understanding of it I think [17:49:53] marktraceur: here: https://github.com/wikimedia/analytics-reportupdater/blob/master/wikis.txt [17:50:16] joal: so i don't think we can do maven rollback given that our local machine has no history of the process. But all that has changed is this commit.
we could do git reset --hard to your last commit and force push [17:50:36] that would be https://github.com/wikimedia/analytics-refinery-source/commit/52301c12d709b2fb8b9179f65ecc62da7a659dd0 [17:50:45] i did it while testing and it was fine [17:50:48] marktraceur: but that file does not list "all" so i am not sure why the select ends up being "from all" [17:50:59] in general i don't think that's the safest idea [17:51:09] ottomata: ^ [17:51:19] Hm. [17:51:34] madhuvishy, ottomata : I'll do whatever I'm told :) [17:51:43] madhuvishy: git push --force : http://i.imgur.com/R7tEQPA.gif [17:51:58] joal: ha ha yes yes i know [17:52:06] nuria_: I think it's added specifically somewhere, but yeah I don't know why [17:52:25] joal: we can always just skip this version - add a new changelog and proceed [17:52:27] ha, sounds fine :) [17:52:37] ottomata: which one? [17:52:38] do as madhuvishy recommends :) [17:52:43] nuria_: https://github.com/wikimedia/analytics-reportupdater/blob/master/reportupdater/reader.py#L149 [17:52:44] i am also unopinionated [17:52:46] ottomata: lol [17:52:49] okay [17:52:57] i think git reset is okay in this case [17:53:04] okaaay :) [17:53:04] it's fairly safe [17:53:13] nuria_: Again, as I recall this is a special case but I don't know why [17:53:24] joal: do you want to, or I can? [17:53:34] madhuvishy: I'll do it, make me learn ;) [17:53:53] git reset --hard 52301c1 [17:53:59] madhuvishy: so, after having pulled the last change, in master, I go for a git reset --hard HEAD~1 [17:54:01] marktraceur: ya, i do not understand when it will work, there must be a case i do not know about [17:54:02] git push -f origin master [17:54:09] joal: HEAD~2 [17:54:13] marktraceur: but let me file a ticket with info thus far [17:54:16] there are 2 commits maven makes [17:54:19] madhuvishy: YES madam [17:54:24] hence i gave the commit hash [17:54:34] makes complete sense, safer with commit hash [17:54:43] madhuvishy: Will go with the hash :) [17:55:19] madhuvishy: done, clean [17:55:57] joal: cool - I'm gonna re-launch the release job from jenkins [17:56:04] madhuvishy: ok :) [17:56:17] it's running [17:56:29] if it doesn't work, we can blame ottomata :D [17:56:30] nuria_: I want to say it has something to do with either language or mobile, but I still don't remember which [17:56:39] Analytics-Kanban: Reportiupdater queries not working for uploader metric on multimedia - https://phabricator.wikimedia.org/T140137#2454360 (Nuria) [17:57:23] Analytics, Fundraising-Backlog, Blocked-on-Analytics, Fundraising Sprint Licking Cookies, and 2 others: Clicktracking data not matching up with donation totals - https://phabricator.wikimedia.org/T132500#2454391 (CCogdill_WMF) Yes, thank you all. We sent our first emails since the fix this mornin... [17:57:28] marktraceur: task created, it is on our kanban, I am not going to look into it further today cause i *think* it requires code changes to reportupdater (unless i am missing config somewhere) [17:57:38] marktraceur: do add yourself to ticket please [17:59:29] nuria_: I think James_F already filed a ticket, sorry [17:59:45] nuria_: I also cannot find any reportupdater repositories that use "by_wiki" except ours.
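The rollback madhuvishy talks joal through, consolidated into one place. Every command here is verbatim from the exchange above (plus the tag cleanup they remember just below); only the comments are added. The reset goes to a commit hash rather than HEAD~1 because the maven release plugin pushed two commits (the release itself and the next -SNAPSHOT bump).

```bash
# In a clean, up-to-date master of analytics/refinery-source:
git reset --hard 52301c1            # last good commit, before the two maven-made ones
git push -f origin master           # force push; tolerable only because the stray
                                    # commits were machine-generated minutes earlier

# The release tag has to go too:
git tag -d v0.0.32                  # delete locally
git push origin :refs/tags/v0.0.32  # delete on the remote
```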
[17:59:49] joal: gah i forgot that we also had to delete the tag [17:59:51] doing [18:00:14] marktraceur: if you could resolve the other ticket as a duplicate of this one it would be great [18:00:14] Analytics-Kanban, Datasets-Webstatscollector, RESTBase-Cassandra, Patch-For-Review: Better response times on AQS (Pageview API mostly) {melc} - https://phabricator.wikimedia.org/T124314#2454407 (JAllemandou) After a week of almost no progress on a month of bulk loaded data to compact, we wiped th... [18:00:24] marktraceur: in a meeting for a bit [18:00:30] Working on it. Thanks! [18:01:41] Analytics-Kanban: Reportiupdater queries not working for uploader metric on multimedia - https://phabricator.wikimedia.org/T140137#2454413 (MarkTraceur) Open>Resolved a:MarkTraceur This is a duplicate of T140137 [18:02:00] nuria_, marktraceur: Ticket is https://phabricator.wikimedia.org/T140121#2453599 [18:02:13] Analytics-Kanban: Reportiupdater queries not working for uploader metric on multimedia - https://phabricator.wikimedia.org/T140137#2454417 (MarkTraceur) Resolved>Open Sorry, misfired [18:02:29] Analytics, Multimedia: Multimedia analytics cron job not been running for a few months? - https://phabricator.wikimedia.org/T140121#2454422 (MarkTraceur) Open>Resolved a:MarkTraceur Duplicate of T140137 [18:02:37] joal: hm, am looking at the number_archived_revisions field [18:02:40] not sure about that one... [18:02:42] or how to get it [18:02:44] still looking, but... [18:02:56] James_F: Closed as duplicate (well, actually I couldn't see any mechanism for literally marking as a duplicate so I just mentioned it in a comment) [18:03:10] ottomata: it's not that bad if not feasible [18:03:12] Analytics-Kanban: Reportiupdater queries not working for uploader metric on multimedia - https://phabricator.wikimedia.org/T140137#2454360 (MarkTraceur) a:MarkTraceur>None [18:03:38] ottomata: it would have been very comfortable to be able to double check on how revisions are impacted at page deletion, but it's ok if we don't have it [18:03:49] marktraceur: Click "Edit Related Tasks…" > "Merge Duplicates In". [18:04:12] Analytics, Fundraising-Backlog, Blocked-on-Analytics, Fundraising Sprint Licking Cookies, and 2 others: Clicktracking data not matching up with donation totals - https://phabricator.wikimedia.org/T132500#2454429 (CCogdill_WMF) Open>Resolved [18:04:50] Analytics, Multimedia: Multimedia analytics cron job not been running for a few months? - https://phabricator.wikimedia.org/T140121#2454432 (MarkTraceur) [18:04:54] Analytics-Kanban: Reportiupdater queries not working for uploader metric on multimedia - https://phabricator.wikimedia.org/T140137#2454434 (MarkTraceur) [18:06:22] Hmm see also https://phabricator.wikimedia.org/T132404 [18:07:18] I agreed to this. Shoot. [18:07:29] But also, something changed to cause the reports to all fail when "all" fails. [18:07:57] HM [18:08:02] So if mforns_away can weigh in on that, especially since my brief survey of the various reportupdater scripts revealed that nobody else was using by_wiki [18:08:03] joal: where do the revisions of a page go when they are deleted?! [18:08:34] ottomata: when the page is deleted (not from a move_redir though, remember earlier today), its revisions are archived [18:08:39] OMG [18:08:45] BOTH page AND revision are stored in archive?! [18:08:47] in the same table [18:08:49] crazyyyy [18:08:49] joal: things are going well
[18:09:05] right [18:09:06] ottomata: the permissions look good now :) [18:09:12] Yay Madhu ! [18:09:16] great, sorry about that madhuvishy glad it works [18:09:18] madhuvishy: currently updatin [18:09:27] joal: its not done yet [18:09:35] madhuvishy : (doing again) currently updating docs :) [18:09:40] Analytics-Kanban: Reportiupdater queries not working for uploader metric on multimedia - https://phabricator.wikimedia.org/T140137#2454458 (MarkTraceur) See, for history, T132404 and T132481 which may have caused and/or not solved this issue. To my discredit, I should have followed up on the ticket, but in... [18:09:45] joal: okay :) [18:11:14] joal: re: T124314, I'm interested to see if the load method solves this [18:11:14] T124314: Better response times on AQS (Pageview API mostly) {melc} - https://phabricator.wikimedia.org/T124314 [18:11:42] urandom: Will keep you posted for sure ! [18:11:56] joal: i'm a little worried that LCS might not work for you guys (precisely because you do your loading in batches) [18:12:06] urandom: Thanks for yesterday test, it changed a bit the CPU usage (from 60% nice to 100% nice on 1004) [18:12:20] heh "nice" [18:12:29] urandom: I think on a daily basis, it'll work just fine [18:12:37] yeah, user barely nudged [18:12:40] that's the problem [18:12:40] urandom: Our problem now is bulk loading a year of data :) [18:12:56] the concurrency model on LCS isn't good for this [18:13:26] generally, it doesn't have any (concurrency) within levels [18:13:42] the exception there should be level 0 where it does size-tiered [18:14:11] but you can't merge from 0 down to 1 until you're below the threshold, which sets up quite a bottleneck [18:14:42] newer Cassandra versions fix this somewhat [18:14:51] urandom: hmmm [18:15:06] urandom: We'll try bulk loading again next time we upgrade ;) [18:15:56] i hope that the method of loading actually makes this tractable for you, it seems... strange [18:15:56] madhuvishy: To delete the tag, you did: git tag -d v0.0.32 [18:16:01] strange that it would change things so much [18:16:14] madhuvishy: then git push origin :refs/tags/v0.0.32 [18:16:17] madhuvishy: right? [18:16:20] joal: yes [18:16:49] urandom: I must say I don't explain to myself how cassandra doesn't deal with bulk loading better :( [18:16:55] madhuvishy: ok sool [18:16:57] joal: oh, also, before i forget... pending compactions on LCS is a SWAG [18:17:18] it's more or less gospel on the others, but not on LCS [18:17:44] urandom: riiiight, I can guess that [18:18:17] joal: that's not to say compaction wasn't way behind for you, the level histogram made that clear [18:19:13] urandom: yup, the nodetool results were just kinda freaky after a week of work ! [18:19:28] yeah, not enough concurrency :( [18:22:16] urandom: can we bump that easily? [18:22:26] even for not-bulk, it could help I guess [18:22:38] joal: i should say, not enough concurrency available where it is needed [18:23:08] yesterdays bump in cpu usage was the result of removing all compaction trottling, and doubling the compactor threads [18:23:10] madhuvishy: Job finished successfully :) [18:23:25] and it *barely* moved the user value :/ [18:23:41] urandom: right [18:23:59] because there was only one thread working on the level compactions [18:24:03] joal: where are you seeing this? [18:24:13] and nothing from level 0 could merge up until it was done [18:24:17] madhuvishy: I received an email ;) [18:24:22] wha [18:24:32] madhuvishy: oh no ! 
[18:24:34] joal: it's still running [18:24:42] madhuvishy: my mistake, it's the previous one [18:24:42] TL;DR even with 20 threads, it's bottle-necking on what one can do [18:24:45] the job number is 29 [18:24:55] sorry madhuvishy [18:25:14] joal: np :) it's almost done though [18:25:19] I think I understand urandom [18:25:21] it's uploading refinery-hive [18:25:31] sorry refinery-job [18:26:08] urandom: We'll see how things go with CQL loading [18:27:13] joal: sure, let me know! [18:29:33] joal: done now :) [18:29:58] madhuvishy: awesome :) [18:30:02] joal: i think we can do it. but it does require another mw core change [18:30:05] on it... [18:30:07] you should have email etc [18:30:20] madhuvishy: indeed ! [18:30:23] joal: we can move on to next step :) [18:30:28] :) [18:31:07] that should be smoother, fingers crossed [18:31:28] Analytics-Kanban, EventBus, Patch-For-Review: Propose evolution of Mediawiki EventBus schemas to match needed data for Analytics need - https://phabricator.wikimedia.org/T134502#2454572 (Ottomata) Ah! We need another MW Core change. We'd like to know the number of revisions archived during a page_d... [18:31:30] madhuvishy: Building ! [18:31:35] joal: out of curiosity, why did you decide against bootstrapping / copying over data to the new boxes? [18:31:39] ottomata: Thanks a lot for that ! [18:32:04] joal: i see it failed [18:32:22] gwicke: we actually decided to go that way instead of including the new boxes in the old cluster [18:32:38] gwicke: But bootstrapping involves loading almost 1 year of data [18:33:21] ah, we have different meanings of bootstrapping - I meant bootstrapping cassandra instances, as in joining the cluster [18:33:30] ah right gwicke :) [18:33:42] & then eventually decommissioning the spinning disk nodes [18:34:06] the other technique I mentioned is to essentially rsync over the data [18:34:13] gwicke: one main reason was that we want to change compaction [18:34:22] advantage is that it's very quick to switch, disadvantage is that it's very quick ;) [18:34:41] gwicke: DTCS isn't really helping us on the read side [18:35:26] gwicke: And SSDs have shown that they can handle LCS correctly (with the initial load time painful) [18:35:28] everything should be a lot less problematic once you are on SSDs [18:36:05] gwicke: more or less :) [18:36:12] madhuvishy: any idea for the failure ? [18:36:22] madhuvishy: I'm very sorry, you know me: la chkoumoune (the jinx)! [18:36:31] joal: such a weird message - awww [18:36:32] no [18:36:39] in any case, I would expect a regular bootstrap to be quicker than re-writing the entire dataset [18:36:43] madhu: I used 0.0.32 as the version [18:36:50] madhuvishy: Should it have been v0.0.32?
[18:36:56] I wondered [18:37:12] I am fairly certain it should be 0.0.32 [18:37:21] ok madhuvishy [18:38:17] gwicke: I completely agree, but changing compaction isn't that easy from what I've understood [18:39:09] you can change it in cql, and it'll apply to newly compacted sstables [18:39:54] newly bootstrapped nodes will see a lot of compaction, so should end up with pretty much pure leveled compaction [18:40:02] gwicke: right, and we'd rather try to optimize for read since the beginning [18:41:48] gwicke: I think I follow you, but the example we have with bulk loading is kinda weird [18:41:53] I would expect the outcome (in terms of read latency) to be pretty much the same either way [18:42:37] anyway, was just curious about why you went this way [18:42:46] gwicke: with one month of data bulk loaded (SSTables streamed to nodes the same way a bootstrap would do), we ended up with stalled compaction [18:43:29] gwicke: like 20k SSTables at level 0, and compactors fighting to absorb [18:43:47] gwicke: and not actually managing [18:43:53] joal: can we call that field 'archived_revision_count' ? [18:44:04] yeah, we saw similar issues with bootstrapping in early 2.1 versions [18:44:14] ottomata: That works for me (I was thinking that last time you mentioned it ;) [18:44:20] failure to bootstrap with LCS above a certain instance size [18:44:37] the limit was around 700G [18:44:53] gwicke: so, regular CQL loading takes a lot longer at load time, but makes compaction manageable [18:45:03] no issue with DTCS or STCS [18:45:12] k cool [18:45:13] but I think there were fixes on that front since [18:45:23] gwicke: we streamed 500G raw sstables, 1 month :( [18:47:35] if the streamed sstables aren't very sorted, then a compaction might need to access a lot of them at once [18:48:02] this is because leveled compaction has hard constraints on not having overlaps [18:48:16] gwicke: Correct, that's why I asked urandom if there were ways to have hadoop write bigger (more sorted) SSTables ... But I didn't look further [18:48:41] STCS and DTCS don't have that constraint, so have an easier time picking bite-sized compaction work [18:48:57] gwicke: makes sense, but less read perf [18:49:23] gwicke: perf tests with only 4 months loaded showed quite a degradation, so we'd better try to make the most of what's available [18:49:35] gwicke: --^ read test meaning [18:49:50] gwicke: We didn't test with DTCS compaction though [18:50:16] madhuvishy: I need to go [18:50:23] madhuvishy: I'm sorry I broke the thing :( [18:50:28] joal: yeah i'm looking at what's going on [18:50:31] you didn't [18:50:32] madhuvishy: I updated the doc some 20:43:47 < joal> gwicke: and not actually managing [18:50:35] oops [18:50:38] again [18:50:39] i don't know why yet though [18:50:45] hopefully will fix it soon [18:50:49] https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Refinery-source <-- madhuvishy [18:50:55] thanks joal [18:51:02] joal: fingers crossed that leveled compaction will be the trick! [18:51:05] madhuvishy: I'll let you finish the release, we'll discuss that tomorrow :) [18:51:13] joal: okay :) [18:51:26] gwicke: we hope, knowing that there'll always be limitations :) [18:51:34] Gone for tonight folks [18:51:43] thanks for the talk gwicke [18:51:50] Thanks for the help madhuvishy :) [18:51:55] Have a good day/night [18:52:12] joal: enjoy your evening!
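Two things from the urandom/gwicke thread made concrete: gwicke's point that the compaction strategy can be switched in CQL (it then applies as sstables get rewritten), and the per-level histogram urandom refers to, which is where the "20k SSTables at level 0" symptom shows up. The keyspace/table name below is hypothetical; the real AQS schema names are not in this log.

```bash
# Switch an existing table to leveled compaction; only newly written/compacted
# sstables pick the layout up, existing ones migrate as they get compacted:
cqlsh -e "ALTER TABLE pageviews.data WITH compaction = {'class': 'LeveledCompactionStrategy'};"

# Per-level SSTable counts for an LCS table:
nodetool cfstats pageviews.data | grep "SSTables in each level"
# e.g.  SSTables in each level: [20000/4, 10, 103/100, 0, 0, 0, 0, 0]
# "count/max" entries over their max are the compaction backlog joal describes.
```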
[18:56:58] joal: it was just a typo in the config [18:57:10] refs/head/master instead of refs/heads/master [18:57:13] i've fixed it [18:57:35] https://integration.wikimedia.org/ci/job/analytics-refinery-update-jars/37/ succeeded [18:57:51] (no email because I disabled it) [18:58:08] https://github.com/wikimedia/analytics-refinery/commit/c914469e38773acab88c2e0d899208d8a80869ae [19:00:26] sorry about the confusion - but hopefully from here on it should be good :) [19:20:43] Analytics-Kanban, EventBus, Patch-For-Review: Propose evolution of Mediawiki EventBus schemas to match needed data for Analytics need - https://phabricator.wikimedia.org/T134502#2454840 (Ottomata) Ah, we will need a change for page_is_redirect too. [19:21:12] > ottomata: done with interviews, working on puppet now [19:21:35] k cool [19:52:23] nuria_, milimetric, I'm back, I saw a lot of chat activity with marktraceur about a reportupdater issue [19:52:32] sorry I didn't catch the ping then [19:53:22] mforns: Yeah, talking about by_wiki and the 'all' related errors, as per Phabricator [19:53:52] mforns: here, we can look at it tomorrow, i do not think it is fast to resolve (let me know otherwise) [19:53:55] mforns: https://phabricator.wikimedia.org/T140137 [19:54:15] marktraceur, nuria_, looking [20:04:50] Analytics-Kanban, EventBus, Patch-For-Review: Propose evolution of Mediawiki EventBus schemas to match needed data for Analytics need - https://phabricator.wikimedia.org/T134502#2455049 (Ottomata) @JAllemandou, I left a pretty big comment here: https://gerrit.wikimedia.org/r/#/c/288210/11/jsonschema/... [20:08:09] addshore: hiera('statsd') works now [20:08:13] but does include the port [20:08:17] it will resolve to [20:08:34] statsd.eqiad.wmnet:8125 [20:08:43] Yeh! I'll have to look at that again tomorrow to adjust it in the scripts! [20:09:38] k [20:09:50] nuria_: merged cdh change [20:10:01] let's do ops puppet and nodemanager restart tomorrow [20:10:10] ottomata: yessir [20:18:17] mforns: Any thoughts?
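Since hiera('statsd') now resolves to statsd.eqiad.wmnet:8125 rather than a bare hostname, the scripts addshore mentions will need to split off the port before emitting metrics. A minimal sketch of that adjustment; the helper names and metric name are made up for illustration, not taken from the actual wmde scripts:

```python
# Minimal sketch (helper and metric names are hypothetical): consume a
# statsd target that now includes the port, e.g. "statsd.eqiad.wmnet:8125".
import socket

def parse_statsd_target(value, default_port=8125):
    """Split 'host:port' (or a bare 'host') into a (host, port) pair."""
    host, _, port = value.partition(":")
    return host, int(port) if port else default_port

def send_counter(host, port, metric, value=1):
    """Fire-and-forget a statsd counter over UDP ('name:value|c' format)."""
    payload = "{}:{}|c".format(metric, value).encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))

host, port = parse_statsd_target("statsd.eqiad.wmnet:8125")
send_counter(host, port, "daily.example.metric")  # hypothetical metric name
```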
and sad, sad too [20:36:49] marktraceur, actually the puppet job is pointing to 'metrics/beta-feature-enables' [20:37:04] Uhhh [20:37:10] instead of 'metrics/multimedia-health' [20:37:36] FFS how did that happen [20:37:55] marktraceur, yes, that folder is updated [20:38:10] Yeah, I see the job ran on 07-01, which is super [20:38:14] let me see [20:38:14] At least we have the data [20:38:20] yes, it seems so [20:38:43] marktraceur, https://github.com/wikimedia/operations-puppet/blob/production/manifests/role/statistics.pp#L121 [20:39:37] https://github.com/wikimedia/operations-puppet/commit/bf5fd0a1fd32ef7011d4ac89f730e9476e153d60 [20:40:53] marktraceur, maaaaaaaaaan [20:40:59] my bad, copy-paste [20:42:01] No problem, we caught it and we still have the data! [20:43:16] marktraceur, I think the data would be recoverable from the db anyway no? [20:43:27] mforns: Yes, I think except for new uploaders [20:43:31] or do I need to move the files? [20:43:32] But I could be wrong [20:43:44] ok, I'll check new uploaders [20:43:58] Moving the files shouldn't be a problem anyway I think? The reports won't run again until August and there are no changes except for new data. [20:44:18] The only thing that would change is we wouldn't need to re-run the reports for March-June [20:44:25] marktraceur, mforns : i wish i had thought about this earlier, added troubleshooting section to docs: https://wikitech.wikimedia.org/wiki/Analytics/Reportupdater#Troubleshooting [20:44:36] Cool, thanks nuria_ [20:44:49] I probably should have said something earlier, I got sidetracked by the 'all' messages [20:44:52] marktraceur, I'm fixing the puppet code right now [20:45:05] Thanks mforns, I'll update the Phab ticket with some info [20:45:11] ok [20:45:42] mforns: upon merging puppet it will re-run though, right? [20:45:53] nuria_, yes [20:46:04] mforns: or wait.. where does it keep runs executed [20:46:06] nuria_, but they are very fast queries [20:46:11] mforns: ah ok, that is what i thought [20:46:19] mforns: so no need to move files [20:46:25] it looks at the missing data and executes the corresponding reports [20:46:51] mforns: you can reference the created ticket on puppet code, marktraceur : only ops merges puppet so we will not be able to fix it right away [20:46:55] New uploaders is slow [20:46:57] nuria_, I don't think so. We can let RU execute everything [20:46:59] Well, slow-ish [20:47:32] mmm, the thing is it runs for all wikis [20:47:50] marktraceur, nuria_, ok I'll move the files [20:47:50] marktraceur: and hopefully for next time with the docs you can find your way around the logs [20:47:56] Analytics-Kanban: Reportupdater queries not working for uploader metric on multimedia - https://phabricator.wikimedia.org/T140137#2455248 (MarkTraceur) What Happened to Our Numbers, a story by #wikimedia-analytics: 1. We had our data at https://datasets.wikimedia.org/limn-public-data/metrics/multimedia-hea...
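mforns' point that reportupdater "looks at the missing data and executes the corresponding reports" (so strictly speaking no files need moving) comes down to diffing the dates already present in the output against the expected schedule. A rough sketch of that idea, with hypothetical file and function names rather than reportupdater's actual internals:

```python
# Rough sketch (names are hypothetical, not reportupdater's internals):
# find which scheduled dates have no row yet in a report's TSV output,
# so only the gaps get (re-)executed.
import csv
from datetime import date, timedelta

def missing_dates(report_path, start, today, step=timedelta(days=1)):
    """Return scheduled dates with no corresponding row in the TSV yet."""
    try:
        with open(report_path) as f:
            done = {row[0] for row in csv.reader(f, delimiter="\t") if row}
    except FileNotFoundError:
        done = set()
    gaps, current = [], start
    while current < today:
        if current.isoformat() not in done:
            gaps.append(current)
        current += step
    return gaps

# Moving the output files back is enough: once the data is where the runner
# expects it, those dates no longer look missing and won't be re-executed.
for day in missing_dates("multimedia-health/new_uploaders.tsv",  # hypothetical path
                         date(2016, 2, 1), date.today()):
    print("would execute report for", day)
```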
[20:47:59] nuria_: Yeah, should be good [20:48:32] marktraceur: ok, cause we want this to be self-service with our latest refactor that cleaned up a bunch of code [20:48:36] and we can also prioritize the task to handle the 'all' keyword maybe [20:50:08] I don't mind that as long as it doesn't crash the other reports (as we have decided it doesn't) [20:50:47] marktraceur, yes, it doesn't, it only pollutes the log file [20:50:54] Analytics-Kanban: Reportupdater queries not working for uploader metric on multimedia - https://phabricator.wikimedia.org/T140137#2455252 (Nuria) [20:50:57] which is also ugly [20:51:48] marktraceur, nuria_, https://gerrit.wikimedia.org/r/#/c/298605/ [20:53:04] Cheers mate [21:02:56] marktraceur, I moved the files, in 15 minutes the files should be replicated to https://datasets.wikimedia.org/limn-public-data/metrics/multimedia-health/ [21:03:08] with the new data [21:04:00] and the dashboard should see it [21:11:35] Thanks mforns! [21:12:01] np :] thanks for notifying! [21:15:24] Analytics-Kanban: Multimedia health metrics stalled since Feb 2016 - https://phabricator.wikimedia.org/T140137#2455334 (mforns) a:mforns [21:16:19] Analytics-Kanban: Multimedia health metrics stalled since Feb 2016 - https://phabricator.wikimedia.org/T140137#2454360 (mforns) I fixed the puppet code, but I used the other task's ID, so the gerrit patch went to the other one, sorry. https://gerrit.wikimedia.org/r/#/c/298605/ [22:11:23] Analytics, Analytics-Cluster, Analytics-Kanban, Deployment-Systems, and 2 others: Deploy analytics-refinery with scap3 - https://phabricator.wikimedia.org/T129151#2455647 (thcipriani) >>! In T129151#2441318, @elukey wrote: > The analytics refinery (https://phabricator.wikimedia.org/diffusion/ANRE...
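On handling the 'all' keyword so it stops polluting the logs: since 'all' is a keyword rather than a real wiki database, one plausible fix in the spirit of the task nuria_ mentions is to special-case it instead of attempting a database connection. A hedged sketch with hypothetical names, not reportupdater's actual code:

```python
# Hedged sketch (hypothetical names, not reportupdater's actual code):
# treat 'all' as an aggregate over per-wiki results instead of a database
# name, and keep one wiki's failure from stopping the others.
import logging

logger = logging.getLogger("reportupdater")

def run_by_wiki(run_query, wikis):
    """run_query(wiki) -> numeric result; 'all' becomes the sum of the rest."""
    results = {}
    for wiki in wikis:
        if wiki == "all":
            continue  # not a real database; handled after the loop
        try:
            results[wiki] = run_query(wiki)
        except Exception:
            # Matches the observed behaviour: a failing wiki only logs an
            # error, it does not crash the other reports.
            logger.exception("report failed for wiki %s", wiki)
    if "all" in wikis:
        results["all"] = sum(results.values())
    return results
```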