[06:06:31] Analytics, ChangeProp, Citoid, ContentTranslation, and 12 others: Node 6 upgrade planning - https://phabricator.wikimedia.org/T149331#2948472 (GWicke) Over the last ~10 hours we have not seen any issues with node 6 & RESTBase. As expected, the most noticeable impact is significantly reduced memor...
[08:52:07] Analytics, ChangeProp, Citoid, ContentTranslation, and 12 others: Node 6 upgrade planning - https://phabricator.wikimedia.org/T149331#2758922 (hashar) That is great! I had T121850 about RESTBase emitting `Heap memory limit exceed`. The last one is at 2017-01-17T18:10:12 (logstash for 7 days http...
[09:49:31] good morning A-Team!
[09:49:38] Hi fdans :)
[10:06:45] o/
[10:07:00] just added https://wikitech.wikimedia.org/wiki/Kafka/Administration#Handling_a_broker_down and https://wikitech.wikimedia.org/wiki/Kafka/Administration#Verify_the_state_of_a_topic
[11:05:03] chasemp: Hello :)
[11:05:19] chasemp: is now a good time to bug you?
[11:14:44] chasemp: looks like not :)
[11:19:05] joal: isn't he in PST ?
[11:19:31] hm, I don't know :)
[11:25:15] yeah, he lives somewhere in the US midwest
[12:10:38] joal: do you have 10 minutes for my usual cassandra ramblings?
[12:12:17] not now, discussing with Ariel - when I'm done with him ?
[12:12:21] elukey: --^
[12:14:29] whenever you have time, not in a hurry :)
[12:14:40] I am trying to prep for next week's cluster upgrade
[12:35:51] elukey: Here !
[12:35:57] batcave elukey ?
[12:42:55] joal: what about in ~20 mins?
[12:43:03] whenever elukey :)
[12:43:13] super thanks! Commuting to the office :)
[13:25:48] hi, I need to roll out new firejail updates on aqs (it's used by the aqs node service). I don't expect any problems, but is anyone around during the Euro afternoon to make some sanity checks after the upgrade?
[13:28:17] moritzm: I'll do it
[13:28:45] joal: sorry for the delay, but in the co-working there are meetings and the noise level is a bit high :(
[13:28:52] would you mind if I write in here instead?
[13:28:53] would in 15 mins work for you?
[13:29:02] moritzm: yep!
[13:29:13] k, I'll ping you soon
[13:29:30] ok for me elukey
[13:29:55] thanks!
[13:30:33] so I tried to understand (again) what happens when a new node is added to a cassandra cluster, especially with rack awareness
[13:31:03] this is also to understand what is described in https://wikitech.wikimedia.org/wiki/Cassandra#On_Bootstrapping
[13:31:43] I know something is wrong in my understanding but I don't know what :D
[13:31:49] so here it goes
[13:32:39] My understanding of "num_tokens 256" is that each node randomly gets 256 tokens of the ring, getting their corresponding ranges.
[13:33:15] each of the token / ranges is replicated 3 times, possibly using different racks
[13:33:28] (as specified by the cassandra::rack option)
[13:34:09] elukey: BTW: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=851574
[13:34:20] now when a new node joins the cluster with num_tokens 256, I would expect that it gets 256 tokens and the ring gets rebalanced to adjust the new ranges
[13:35:11] so theoretically the new node should get streams of data from the primary replicas of its new tokens
[13:35:21] belonging to different hosts
[13:35:24] BUT
[13:35:38] https://wikitech.wikimedia.org/wiki/Cassandra#On_Bootstrapping seems to disagree with my interpretation :D
[13:36:40] moritzm: is he a debian devel? I am following his github fork of clickhouse!
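The bootstrapping question above can be explored with a toy model. This is a simplified sketch written for illustration, not Cassandra's implementation, and everything in it is an assumption: a single datacenter, random tokens per vnode (`num_tokens` 256), replication factor 3, three racks, and a replica placement that walks the ring clockwise and prefers racks not used yet (a rough approximation of NetworkTopologyStrategy). The node and rack names (`a1`, `rack-a`, ...) and the helper functions are hypothetical. It counts which existing nodes give up a replica to the joining node, which is where a bootstrap would stream from if consistent range movement is in effect (also assumed here).

```python
import bisect
import random
from collections import Counter

# Murmur3Partitioner tokens are signed 64-bit integers.
TOKEN_MIN, TOKEN_MAX = -(2**63), 2**63 - 1


def assign_tokens(nodes, num_tokens, rng):
    """Give each node `num_tokens` random tokens (vnodes); return a sorted [(token, node)] ring."""
    return sorted((rng.randint(TOKEN_MIN, TOKEN_MAX), node)
                  for node in nodes for _ in range(num_tokens))


def replicas(ring, idx, racks, rf):
    """Replicas of the range (ring[idx-1], ring[idx]]: walk clockwise from idx and
    prefer nodes on racks not used yet (simplified NetworkTopologyStrategy)."""
    chosen, seen_racks, skipped = [], set(), []
    for step in range(len(ring)):
        node = ring[(idx + step) % len(ring)][1]
        if node in chosen or node in skipped:
            continue
        if racks[node] not in seen_racks:
            chosen.append(node)
            seen_racks.add(racks[node])
            if len(chosen) == rf:
                return chosen
        else:
            skipped.append(node)
    # Fewer racks than rf: top up with same-rack nodes in ring order.
    return (chosen + skipped)[:rf]


def displaced_by(old_ring, new_ring, racks, rf, new_node):
    """Count, per existing node, the ranges whose replica moves to `new_node`
    (assumed to be the streaming sources under consistent range movement)."""
    old_tokens = [token for token, _ in old_ring]
    displaced = Counter()
    for i, (token, _) in enumerate(new_ring):
        new_reps = replicas(new_ring, i, racks, rf)
        if new_node not in new_reps:
            continue
        # The old range containing (prev, token] ends at the first old token >= token.
        j = bisect.bisect_left(old_tokens, token) % len(old_tokens)
        displaced.update(set(replicas(old_ring, j, racks, rf)) - set(new_reps))
    return displaced


# Hypothetical 6-node cluster, two nodes per rack, then a new node joins rack-a.
rng = random.Random(0)
racks = {"a1": "rack-a", "a2": "rack-a", "b1": "rack-b", "b2": "rack-b",
         "c1": "rack-c", "c2": "rack-c"}
old_ring = assign_tokens(list(racks), 256, rng)
racks["a3"] = "rack-a"
new_ring = sorted(old_ring + assign_tokens(["a3"], 256, rng))
moved = displaced_by(old_ring, new_ring, racks, rf=3, new_node="a3")
print(moved)
```

With three racks and RF 3 every rack holds a full copy of the data, so in this model the new rack-a node only takes ranges over from the other rack-a nodes instead of pulling from hosts in every rack, and it ends up holding roughly a third of the data while each rack-b and rack-c node still holds about half.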
[13:39:46] elukey: indeed there is something I don't understand either
[13:40:04] elukey: I think (at least for me) it is related to rack-awareness
[13:41:06] joal: I feel better now, I don't feel that stupid anymore :D
[13:41:15] ;)
[13:41:25] will try to ping urandom
[13:41:37] * elukey blames Eric for Cassandra's complexity
[13:41:40] :P
[13:42:59] elukey: he's not a DD, but maintains a few packages: https://qa.debian.org/developer.php?login=debian@jbfavre.org
[13:44:29] nice
[13:44:45] sooner or later it would be great to be able to do it myself :D
[14:09:10] halfak: Hi !
[14:09:20] halfak: excuse me I'm late
[14:52:54] Analytics, Analytics-Wikistats: stat1002: R library Cairo is missing, probably since server update - https://phabricator.wikimedia.org/T155254#2949176 (Ottomata) Hm, not sure! It totally just worked for me, and all the libcairo2 stuff is installed properly via puppet. ``` export http_proxy=http://webpr...
[15:29:48] ottomata, joal - ops sync or skip?
[15:29:57] sync !
[15:29:59] Joining
[15:32:07] ja!
[15:32:08] coming
[15:32:26] elukey: we in batcave
[15:33:19] joining
[16:00:54] mforns: standdduppp
[16:04:47] o/ ottomata
[16:05:43] When you're done syncing, I'm wondering if you can help me roughly ballpark the cost of a server that's like stat1003, but has room for a big, beefy GPU ala ellery's modeling needs.
[16:05:55] I just need a rough estimate of the capex cost.
[16:06:02] Figured you could help me ballpark that.
[16:08:08] halfak: sure will ping you when we done standup
[16:08:32] kk thanks man
[16:20:27] Analytics, Analytics-Wikistats: stat1002: R library Cairo is missing, probably since server update - https://phabricator.wikimedia.org/T155254#2949351 (ezachte) Open>Resolved Ah, there is a commented line in my R input file, which I overlooked (and apparently is needed just once) #install.packag...
[16:33:40] chasemp: maybe now?
[16:33:51] hey joal :)
[16:33:55] Hi !!
[16:34:01] it was 5 am when you pinged before ;)
[16:34:11] Wow, I'm very sorry
[16:34:19] I'll NEVER do it again ;)
[16:34:39] no worries just letting you know
[16:34:52] I have questions on labsdb etc, may I ping you again at some later time?
[16:35:25] now is cool, later is ok too but we are going to be doing a migration for NFS so I may be tied up
[16:35:38] starts in 25m
[16:36:02] ok - might be tomorrow too then ;)
[16:36:50] sure thing, if you drop the questions on me in a task or something I can do better async
[16:37:04] do we have a task for the new labsdb cluster data ingestion by analytics?
[16:42:03] sorry
[16:42:52] chasemp: not sure, checking and subscribing you, and adding questions in there for async
[16:43:03] nice
[16:49:17] Analytics: Showcase for new recent changes feed - https://phabricator.wikimedia.org/T155637#2949386 (Nuria)
[17:01:05] Ook halfak heya
[17:01:06] so hm
[17:01:20] i re-read the ticket
[17:01:31] so, we are planning to replace both stat1002 and stat1003 next quarter
[17:01:34] Q4
[17:01:34] ottomata: I tried to interpolate the strings but it doesn't seem to work :(
[17:01:41] with hiera?
[17:01:41] rats
[17:01:45] maybe you need :: in front?
[17:01:50] or maybe it just doesn't work :/
[17:02:03] it'd be nice if we didn't have to duplicate that value
[17:02:06] Analytics-Kanban, DBA: Sqoop doesn't run anymore - Seem related to a DB change (analytics store) - https://phabricator.wikimedia.org/T154685#2949425 (JAllemandou) Removing limitations has solved the issue.
[17:02:11] halfak: lemme know when you are there
[17:02:17] (we can also move over to -research if you like)
[17:02:32] yes but it doesn't seem a big deal atm.. anyhow, the rest looks ok??
[17:02:32] o/ Just started my own meeting :(
[17:02:41] doh
[17:03:16] fdans: I have my 1/1 with Nuria, wanna chat in 30 minutes or will you be gone?
[17:03:32] in 30 min is perfect actually
[17:03:42] haven't eaten in the whole day
[17:08:29] ah ottomata, anything against https://gerrit.wikimedia.org/r/#/c/331451/ ?
[17:08:42] I didn't see any problem but I wanted to ask you
[17:08:56] hashar is kinda blocked because of it
[17:09:33] +1 elukey
[19:32:20] ottomata: any issues with me deploying cluster so we can close 1 task we have in progress?
[19:33:06] joal: looks good
[19:33:54] nuria: go ahead!
[19:36:22] ottomata: k
[19:48:25] (PS1) Nuria: Changes for v0.0.39 [analytics/refinery/source] - https://gerrit.wikimedia.org/r/332809
[19:49:25] (CR) Nuria: [V: 2 C: 2] "Self merging to deploy V0.0.39" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/332809 (owner: Nuria)
[20:08:35] (PS1) Nuria: Update jar version for webrequest load job [analytics/refinery] - https://gerrit.wikimedia.org/r/332811
[20:09:16] ottomata, joal: please be so kind as to take a look
[20:09:18] https://gerrit.wikimedia.org/r/#/c/332811/
[20:16:38] (CR) Ottomata: [C: 1] Update jar version for webrequest load job [analytics/refinery] - https://gerrit.wikimedia.org/r/332811 (owner: Nuria)
[20:16:41] +1 nuria
[20:18:31] (CR) Nuria: [V: 2 C: 2] Update jar version for webrequest load job [analytics/refinery] - https://gerrit.wikimedia.org/r/332811 (owner: Nuria)
[20:18:48] (Abandoned) Nuria: Update jar version for webrequest load job [analytics/refinery] - https://gerrit.wikimedia.org/r/332811 (owner: Nuria)
[20:19:14] ottomata: sorry, just realized i had one changeset there that was not meant to be merged, redoing
[20:19:22] k
[20:22:27] (PS1) Nuria: Update jar version for webrequest load job [analytics/refinery] - https://gerrit.wikimedia.org/r/332814
[20:23:25] ottomata: redid it: https://gerrit.wikimedia.org/r/#/c/332814/
[20:24:28] milimetri, joalc: works on deploying cluster now are golden, thanks for the work you guys put there!
[20:24:47] (CR) Ottomata: [C: 1] Update jar version for webrequest load job [analytics/refinery] - https://gerrit.wikimedia.org/r/332814 (owner: Nuria)
[20:27:36] (CR) Nuria: [V: 2 C: 2] Update jar version for webrequest load job [analytics/refinery] - https://gerrit.wikimedia.org/r/332814 (owner: Nuria)
[20:43:23] ottomata: about to restart jobs, i need to restart from hour 20 correct?
[20:43:24] https://hue.wikimedia.org/oozie/list_oozie_bundle/0040263-161121120201437-oozie-oozi-B
[20:47:06] looks correct to me nuria
[20:47:13] ottomata: k sir
[20:48:57] ottomata:
[20:48:59] see
[20:49:02] https://www.irccloud.com/pastebin/jR5U4Dgy/
[20:51:39] ottomata: looks good?
[20:52:15] +1 nuria looks good to me
[20:52:26] do you need to restart refinement jobs too?
[20:53:02] oh sorry, they are not separate(?) man it's been a while since i looked at this
[20:53:43] ottomata: this is the webrequest load so nothing else right?
[20:54:22] the changes you deployed are just for wr load?
[20:54:25] then ja
[20:54:38] ottomata: no, wait
[20:55:00] ottomata: there is no such a thing as refine
[20:55:01] https://github.com/wikimedia/analytics-refinery/tree/master/oozie/webrequest
[20:55:12] rightttt?
[20:55:20] yeah we merged them a loong time ago
[20:55:21] sorry
[20:55:35] ottomata: ahahahah
[20:55:53] ottomata: ok, then i think we are ready?
[20:55:54] ja
[20:56:10] ottomata: ok, crossing fingers so nothing blows up
[20:56:31] :)
[20:56:45] ebernhardson: deployed your stemmer UDF with latest changes
[20:57:14] ebernhardson: https://gerrit.wikimedia.org/r/#/c/332809/1/changelog.md
[20:58:27] nuria: woo! thanks.
[20:59:12] ebernhardson: at this time i am hoping that my deployment to cluster is one that doesn't break anything ... ahem...
[21:00:26] :)
[21:07:16] Analytics-Kanban, Patch-For-Review: Productionize loading of edit data into Druid (contingent on success of research spike) - https://phabricator.wikimedia.org/T141473#2950455 (Nuria) Given the issues with volume of edit data in druid seems like this one should go back to "paused", correct?
[21:11:52] Analytics: Measure Community Backlog. - https://phabricator.wikimedia.org/T155497#2950464 (Milimetric)
[21:14:47] Analytics-Kanban, Patch-For-Review: Productionize loading of edit data into Druid (contingent on success of research spike) - https://phabricator.wikimedia.org/T141473#2950471 (Milimetric) I would like to down-scope it to load just 1 year of the data so we can show it next week at metrics. I take it upo...
[21:22:02] Analytics-Kanban, Patch-For-Review: Productionize loading of edit data into Druid (contingent on success of research spike) - https://phabricator.wikimedia.org/T141473#2950513 (Nuria) @Milimetric +1. Agreed. I think we talked about that today in standup. Sounds fine.
[21:47:46] Analytics, EventBus, Patch-For-Review, Services (doing): EventBus produces non-canonical page urls - https://phabricator.wikimedia.org/T155066#2950587 (Pchelolo) I've created a set of interdependent patches to solve this: - https://github.com/wikimedia/swagger-router/pull/53 - adds ability to dec...
[22:06:15] Analytics, ChangeProp, EventBus, Reading-Web-Backlog, and 4 others: Subscription: Trending service should be able to subscribe to edits in real time - https://phabricator.wikimedia.org/T145553#2950677 (Jdlrobson)
[23:21:37] Analytics: Measure Community Backlog. - https://phabricator.wikimedia.org/T155497#2945157 (Halfak) I think this is a great idea. One component that I'd like to address directly is thinking about a long-term solution for populating a backlog. Right now, there's a hodgepodge of strategies employed for flaggi...