[07:49:03] hi, I'll enable firewall rules on stat1002 in a few minutes, should not have any impact on your ongoing work. but should you notice something, speak up :-) [08:04:03] this has been enabled and I don't see any dropped traffic in iptables, seems all fine [08:11:13] * elukey commutes to the office, brb in 15mins [08:31:28] (PS3) Joal: Include webrequest refine oozie job into load one [analytics/refinery] - https://gerrit.wikimedia.org/r/285998 (https://phabricator.wikimedia.org/T130731) [08:32:39] (CR) Joal: "Thanks @ottomata for spotting those :)" (2 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/285998 (https://phabricator.wikimedia.org/T130731) (owner: Joal) [08:37:18] elukey: you're working out of a coworking space or something? [08:37:37] mobrovac: I think you are right :) [08:38:05] mobrovac: yep correct! [08:38:07] coworking [08:38:12] nice [08:38:41] i'm part of a local NGO here that tries to open up a coworking space in my city [08:38:48] and we're close to getting it done [08:39:28] Analytics-Kanban: Upgrade scripts to facilitate wiki data loading / treatment on hadoop - https://phabricator.wikimedia.org/T132590#2250713 (JAllemandou) [08:39:29] Analytics-Kanban: Examine wikistats reports, make a summary of the most granular data needed that would serve all reports - https://phabricator.wikimedia.org/T131783#2250714 (JAllemandou) [08:39:32] Analytics-Kanban, Analytics-Wikistats, Reading-Admin: {lama} Wikistats traffic reports 2.0 - https://phabricator.wikimedia.org/T107175#2250715 (JAllemandou) [08:40:22] Analytics-Kanban, Operations, ops-eqiad: rack/setup/deploy aqs100[456] - https://phabricator.wikimedia.org/T133785#2243191 (fgiunchedi) >>! In T133785#2249043, @Ottomata wrote: > Ok! We discussed partitioning today. We'd like the following: > > - / a small (30G?) RAID 1 partition on the first 2 dr... 
[08:40:37] Analytics-Kanban: Examine wikistats reports, make a summary of the most granular data needed that would serve all reports - https://phabricator.wikimedia.org/T131783#2178448 (JAllemandou) We should consider working with Research (@Halfak in particular), is the realm of that project: https://meta.wikimedia.o... [08:41:03] mobrovac: it is really nice for me since I can preserve a bit of mental sanity [08:41:38] :) [08:41:50] Analytics-Kanban, Operations, ops-eqiad: rack/setup/deploy aqs100[456] - https://phabricator.wikimedia.org/T133785#2250727 (JAllemandou) Interesting @fgiunchedi. But what in case of failure, two instances down? [08:42:18] elukey: is there any left though? [08:42:20] :-P [08:42:46] ahhahah joal only a bit! I am protecting it with all my strength! :P [08:43:22] Apr 29 07:48:00 kafka2001 systemd[102431]: Failed at step CHDIR spawning /usr/bin/python: No such file or directory [08:43:33] seems legit :P [08:43:44] really ? [08:43:52] No python in the kafka jungle? [08:43:57] but /usr/bin/python is present on kafka2001 [08:44:24] Ahhhh, here you go :) Sometimes well hidden, but always python [08:44:25] nono it seems a problem only on the new codfw-main cluster [08:44:29] oki [08:44:38] probably it is something to do with me [08:44:51] currently looking where is the issue :P [08:49:00] mobrovac: whenever you have 10 minutes I'd need some for event bus in codfw [08:49:25] elukey: i have now literally 10 min [08:49:35] euh, 9 actually [08:49:41] what's up? [08:54:03] mobrovac: do I need to do something special before the first deployment? 
Puppet complains about "failed: Execution of '/usr/bin/deploy-local --repo eventlogging/eventbus -D log_json:False' returned 70:" [08:54:25] *sigh* [08:55:00] that's a scap3 "feature" [08:55:04] :) [08:55:14] I am totally ignorant about scap and EB, just wanted to know if I was missing something before starting the journey [08:58:16] Analytics-Kanban, Operations, ops-eqiad: rack/setup/deploy aqs100[456] - https://phabricator.wikimedia.org/T133785#2250788 (fgiunchedi) @JAllemandou failure of which component? the other different thing for cassandra/restbase in production is that it maximizes available disk space, so ssds there for... [08:59:18] elukey: you need to add the servers in the https://gerrit.wikimedia.org/r/eventlogging/ repo to the scap/eventbus file [08:59:25] elukey: then pull that file on tin [09:00:16] elukey: you also need to be in the eventlogging gid on tin (look at ops/puppet/modules/admin/data) [09:00:38] elukey: then you need to probably manually connect from tin to the servers to accept the keys [09:00:47] *i think* [09:00:48] * elukey blames ottomata [09:00:53] then run the deploy from tin [09:01:05] thanks! Will double check, and maybe leave Andrew finish the work :P [09:01:28] probably better to ack the puppet failure in icinga and wait for him :D [09:01:50] i need to go out now, but will be back in an hour or so, and can probably help out [09:05:11] thanks! [10:27:17] elukey: made any progress? 
[10:31:12] mobrovac: fixed icinga config error, silenced/acked everything and filed https://gerrit.wikimedia.org/r/#/c/286136/ that should be what you suggested [10:31:22] yup, just merged it [10:31:24] :) [10:31:27] niceeee [10:31:58] ok, so now we'll do an experiment called "let's try to find out what 70 means to scap" [10:32:10] hahahaha [10:32:16] (btw it seems to report all errors with the code 70) [10:32:23] at least the ones i've seen thus far [10:32:53] i don't have the perms to deploy eventbus (which should probably be changed), so we'll need to do it together [10:33:37] sure [10:34:16] I guess that I should fetch/checkout /srv/deployment/eventlogging/eventbus/scap on tin first right? [10:34:25] ok, i've updated the repo on tin [10:34:28] elukey: did that [10:34:36] you are always one step ahead :P [10:34:47] haha [10:35:01] ok, elukey, now on tin go to /srv/deployment/eventlogging/eventbus [10:35:09] and issue "deploy -vf" [10:35:26] i'm there and using deploy-log which tails the output from the nodes [10:35:29] elukey: actually [10:35:37] let's try to deploy only to one node [10:35:50] it'll be easier than to have the output from 4 nodes [10:35:57] let me fix the file manually [10:36:25] elukey: ok, left only kafka2002.codfw.wmnet [10:36:32] you can start the deploy [10:36:45] should I sudo -u something or just with my cred? [10:36:58] no, no, do that as you [10:37:05] all right [10:37:26] 10:37:19 Finished Deploy: eventlogging/eventbus (duration: 00m 08s) [10:37:26] ok it worked! [10:37:31] \o/ [10:37:58] ok, i'll restore the dsh file back and we'll attempt a full deploy [10:38:04] ah you changed scap/eventbus [10:38:25] yup [10:38:30] elukey: ok, done, go ahead [10:38:32] deploy -fv [10:38:34] again :) [10:38:56] done! [10:38:59] and check! [10:39:15] now let's see if puppet complains [10:39:20] that was easy :) [10:39:26] famous last words ... 
[10:39:49] brace yourselves, winter is coming [10:39:53] haha [10:40:54] 12:39 RECOVERY - Check that eventlogging-service-eventbus is running on kafka2001 is OK: PROCS OK: 1 process with command name python, args [10:41:15] yuhuu [10:41:46] now it only needs LVS and it will be ready to go [10:41:59] will check after lunch what needs to be done! [10:42:59] cool [10:43:19] mobrovac: thanks a lot for the help! [10:43:32] don't mention it elukey [10:45:43] * elukey lunch! [10:46:21] elukey: before you go, can i bother you for a favour? [10:46:32] i'd need one command run on kafka1001 [10:46:41] (i don't have access there) [10:47:41] mobrovac: sure! [10:48:03] * elukey is not going to run rm -rf on behalf of mobrovac [10:48:36] hahaha [10:48:53] elukey: /srv/deployment/eventlogging/eventbus/bin/ensure-kafka-topics-exist [10:49:07] on 1001? [10:49:44] yup [10:50:20] IOError: [Errno 2] No such file or directory: './config/topics.yaml' [10:50:40] sigh [10:50:41] should I run it from event bus I guess [10:50:42] lemme check [10:50:46] ah yes [10:50:54] otto hardcoded that [10:50:56] yeah I saw the ./ afterwards [10:51:02] * elukey re-blames ottomata [10:51:13] haha [10:52:53] mmmm nope it seems not working [10:53:14] --topic-config TOPIC_CONFIG [10:53:40] uf [10:53:53] and I can see [10:53:54] ./docker/service/topics.yaml [10:54:03] ./config/schemas/config/eventbus-topics.yaml [10:54:03] lemme check the path [10:54:29] yes, it's that one [10:54:35] elukey: ./config/schemas/config/eventbus-topics.yaml [10:54:58] all right [10:55:57] mobrovac: https://dpaste.de/5VfN [10:56:08] better https://dpaste.de/5VfN/raw [10:56:31] topics created! yuhuu [10:56:35] cheers elukey [10:56:43] sorry to have kept you from your lunch [10:56:50] * mobrovac thought this would go much faster [10:56:59] no worriesssss [10:57:05] ttl! [10:57:11] buon appetito [10:57:19] grazie! 
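The `IOError` on `./config/topics.yaml` above happened because the script looked for its config relative to the current working directory rather than relative to itself. A minimal sketch of one way to avoid that, assuming a hypothetical helper `resolve_config` (it is not part of the eventlogging repo, and the directory layout in the example is only illustrative):

```python
import os


def resolve_config(script_file, relative_path, override=None):
    """Return an absolute path to a config file.

    Hypothetical helper, not taken from eventlogging: an explicit
    override (e.g. a --topic-config style argument) wins; otherwise the
    default is resolved against the directory containing the script,
    so the result does not depend on the caller's working directory.
    """
    if override is not None:
        return os.path.abspath(override)
    base_dir = os.path.dirname(os.path.abspath(script_file))
    return os.path.normpath(os.path.join(base_dir, relative_path))
```

A script living in a `bin/` directory could then call `resolve_config(__file__, "../config/topics.yaml")` and get the same path whether it is started from the repo root or from anywhere else.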
[11:05:00] Analytics-Kanban, DC-Ops, EventBus, MediaWiki-Cache, and 5 others: setup kafka2001 & kafka2002 - https://phabricator.wikimedia.org/T121558#2251078 (mobrovac) [11:51:17] Analytics-Kanban, DC-Ops, EventBus, MediaWiki-Cache, and 5 others: setup kafka2001 & kafka2002 - https://phabricator.wikimedia.org/T121558#2251101 (elukey) - Icinga configuration updated - Added kafka200[12] to eventbus scap config (thanks to Marko) - Run puppet on both nodes, no errors Next ste... [13:02:48] Analytics-Kanban, Operations, Patch-For-Review: Upgrade stat1001 to Debian Jessie - https://phabricator.wikimedia.org/T76348#2251236 (elukey) [13:02:51] Analytics-Kanban, Operations, Patch-For-Review: Upgrade stat1001 to Debian Jessie - https://phabricator.wikimedia.org/T76348#798408 (elukey) [13:03:02] Analytics-Kanban, Operations, ops-codfw, Patch-For-Review: rack/setup/deploy conf200[123] - https://phabricator.wikimedia.org/T131959#2251238 (elukey) [13:03:39] joal: you there? [13:04:52] Analytics-Kanban: Fix Dashiki's metrics-by-project breakdown - https://phabricator.wikimedia.org/T133944#2251239 (mforns) p:Triage>Unbreak! a:mforns [13:05:13] Analytics-Kanban: Fix Dashiki's metrics-by-project breakdown - https://phabricator.wikimedia.org/T133944#2251243 (mforns) [13:19:07] elukey, yt? [13:19:28] elukey: Yes ! [13:21:53] o/ [13:22:04] joal: if you have time we can check aqs beta [13:22:12] mforns: o/ [13:22:29] hi elukey : [13:22:30] :] [13:22:38] elukey: Yay :) [13:22:45] Hi mforns :) [13:23:20] just to know you were there, I'm planning to deploy my EL changes, but I've seen that there are 7 other changes in the master to be deployed to EL, so I was waiting for you to be here before trying :] [13:23:28] elukey, ^ [13:23:31] hi joal ! [13:23:52] mforns: ack, you can proceed :) [13:23:56] hehe, cool [13:23:58] mforns: Good call ! 
on fridays, always wait for an ops before breaking stuff ;) [13:24:05] hehe sure joal [13:24:13] THEORETICALLY no deployments should be done on a friday :P [13:24:18] mmmmm [13:24:25] elukey: Are we Friday? [13:24:27] I can wait till monday :P [13:24:58] elukey, do you want me to wait till monday? [13:25:03] np for me [13:25:18] mforns: jokes aside, if you need to deploy today no problem, otherwise monday looks better [13:25:46] joal: yesterday I didn't get the command that you were issuing on tin (for beta) [13:25:49] elukey, there's no rush, I just wanted to push the task to done. So monday it is! [13:25:56] super :) [13:25:59] :] [13:26:17] * joal loves inception :) [13:27:02] * mforns doesn't get the inception thing... [13:27:46] pushing ideas without actually telling them mforns :) [13:29:44] mforns: often doesn't work though, and explicit is needed :) [13:30:37] hehehehe [13:32:56] elukey: tell me when it's time for aqs-deploy :) [13:35:50] also elukey, would be interesting to get Filippo into a talk about partitions (see https://phabricator.wikimedia.org/T133785) [13:36:34] yep yep [13:36:35] :) [13:53:26] elukey: Oh, didn't see your question ! So the command I ran on beta was deploy :) [13:54:05] I was about to write :) [13:55:22] I am a bit ignorant but.. what was the host? Still tin or another one? 
https://www.reddit.com/r/explainlikeimfive/ [13:56:03] deployment-tin.deployment-puppetmaster.deployment-prep.eqiad.wmflabs [13:56:15] deployment-tin.deployment-prep.eqiad.wmflabs sorry [13:56:52] aaaaaaaahhhhhhhhhhhhh [13:57:01] now it makez sensez [13:57:15] (explain like I am 5 is amazing btw, there are also other ones on reddit) [14:11:03] Analytics-Kanban, Patch-For-Review: Make webrequest load and refine jobs a single bundle - https://phabricator.wikimedia.org/T130731#2251334 (JAllemandou) [14:11:13] Analytics-Kanban: Standardise naming in oozie jobs (particularly for top level ones) - https://phabricator.wikimedia.org/T130732#2251336 (JAllemandou) [14:25:11] joal: just had a chat with hashar, basically if you run keyholder status on tin (deployment-prep) you'll see that we don't have keys for AQS [14:25:20] so probably ottomata knows [14:25:37] elukey: we'll ask him :) [14:28:10] currently checking the scap repo for AQS but it should be ok.. [14:28:16] feel free to raise that in #wikimedia-releng [14:28:31] the scap3 / keyholder folks are about to join (us west coast) [14:28:32] elukey: I don't think it was actually: missing src [14:28:44] hashar: Thanks :) [14:31:09] joal: I meant /srv/deployment/analytics/aqs/deploy/scap [14:31:26] elukey: my bad ! [14:32:43] mmm reading https://phabricator.wikimedia.org/T132267 [14:33:25] ahh and https://phabricator.wikimedia.org/T116206 [14:36:10] Analytics, Beta-Cluster-Infrastructure, Services, scap, and 3 others: Set up AQS in Beta - https://phabricator.wikimedia.org/T116206#1743135 (elukey) Quick note: we are getting the following failure while trying to deploy: ``` elukey@deployment-tin:/srv/deployment/analytics/aqs/deploy$ deploy 14... [14:36:53] joal: just commented ---^ [14:37:15] awesome elukey ! 
[14:37:18] Thanks :) [14:58:31] a-team: just a reminder that today you need to complete the peer review list and submit it :) [15:12:52] Analytics, Research-and-Data, Research-management: Draft announcement for wikistats transition plan - https://phabricator.wikimedia.org/T128870#2251439 (DarTar) wiki page on [[ https://www.mediawiki.org/wiki/Analytics/Wikistats/DumpReports/Future_per_report | dump reports ]] ready to go live, @ezacht... [15:14:01] (CR) Amire80: Add sorted errors (1 comment) [analytics/limn-language-data] - https://gerrit.wikimedia.org/r/282228 (owner: Amire80) [15:17:12] Analytics, Beta-Cluster-Infrastructure, Services, scap, and 3 others: Set up AQS in Beta - https://phabricator.wikimedia.org/T116206#2251441 (thcipriani) Blerg. We really need to automate `known_hosts` for scap targets in beta. When connecting to a server for the first time, a fingerprint of the... [15:21:49] Analytics, Beta-Cluster-Infrastructure, Services, scap, and 3 others: Set up AQS in Beta - https://phabricator.wikimedia.org/T116206#1743135 (mmodell) {T72792} [15:47:34] joal: we should be able to deploy on aqs beta now, I tried but failed :P [15:47:45] so one step ahead but still not working completely [15:56:44] a-team, I won't make it to stand-up... sent an email with update, see you in a while [16:03:03] elukey: Managed to deploy successfully on beta ! [16:03:20] BUT, looks like aqs is actually not running :) [16:03:39] goooood :P [16:14:33] Analytics-Kanban, Operations, ops-eqiad: rack/setup/deploy aqs100[456] - https://phabricator.wikimedia.org/T133785#2243191 (GWicke) The reason RESTBase (and many other Cassandra users) are using RAID-0 or JBOD is that it tends to provide more resilience and throughput at a given data duplication rati... 
[16:30:39] Analytics-Kanban, Operations, ops-eqiad: rack/setup/deploy aqs100[456] - https://phabricator.wikimedia.org/T133785#2251618 (Eevans) > The reason RESTBase (and many other Cassandra users) are using RAID-0 or JBOD is that it tends to provide more resilience and throughput at a given data duplication ra... [16:32:54] Analytics-Kanban, Operations, ops-eqiad: rack/setup/deploy aqs100[456] - https://phabricator.wikimedia.org/T133785#2251619 (GWicke) On the other hand, losing one of only three machines is a larger blast radius than losing one of five or so, which when using RAID-0 cost about the same. [16:53:09] Analytics, Beta-Cluster-Infrastructure, Services, scap, and 3 others: Set up AQS in Beta - https://phabricator.wikimedia.org/T116206#2251658 (elukey) Adding some info after the chat with @thcipriani - to avoid "Agent admitted failure to sign key" we had to add myself and @joal to the deploy-servi... [16:57:52] all right a-team, logging off! [16:57:59] bye elukey ! [16:58:01] have a great weekend, talk with you on Monday :) [16:58:07] nice weekend [16:58:08] :] [16:58:11] o/ [17:10:11] joal, webrequest and pageview_hourly do not have namespace information, right? [17:23:52] mforns: no [17:24:02] i think only way is to extract from url [17:24:07] madhuvishy, thanks! that's what I thought [17:24:10] aha [17:24:43] User: Special: etc I guess will show up in the article title [17:25:11] and I guess absence of it is main namespace? 
[17:25:23] https://www.mediawiki.org/wiki/Namespaces [17:25:37] uhhh https://www.mediawiki.org/wiki/Help:Namespaces [17:26:34] yeah so I guess, if you had the article title and split it at :, provided the article doesn't already have a : in its title (for main namespace articles), you could extract namespace [17:26:36] mforns: [17:26:55] madhuvishy, makes sense, it remains to see if there are no edge cases that are difficult to parse [17:27:07] mforns: I'm sure there are [17:27:13] there are also calls to the api that probably have the namespace in other places [17:27:29] hmm api calls aren't pageviews though right? [17:27:39] i'm not sure [17:27:49] madhuvishy, some are, from mobile app [17:27:53] right [17:27:53] I think [17:27:55] yeah [17:28:07] but doesn't pageview hourly extract article title? [17:28:20] madhuvishy, yes! [17:28:26] that makes sense [17:28:32] mforns: so I guess we could go from there [17:28:45] madhuvishy, so, yes maybe it is not so difficult, but... hehehe [17:28:48] ok thanks! 
[17:29:04] edge case is indeed for mobile pageviews if namespace is not provided as part of the page title [17:29:07] mforns: he he - i think if you looked for a specific namespace like User: it's fairly easy [17:29:12] Which I don't know if it exists [17:29:13] for main namespace [17:29:15] it's hard [17:29:16] aha [17:29:16] yeah [17:29:34] joal: aah [17:29:46] i didn't know mobile ones don't pass namespaces [17:30:02] madhuvishy: I'm not sure [17:30:10] hmmm [17:30:13] madhuvishy: it could be different, but I'm not sure [17:30:30] namespaces, and isUserLoggedIn are things that would be cool to have in our logs [17:30:30] however a tricky part for any case is cross-language [17:30:56] namespaces have different "names" by language [17:31:10] true [17:31:16] https://es.wikipedia.org/wiki/Usuario:J.delanoy [17:32:21] So, to differentiate main namespace from others would be easy enough (restricting to desktop and mobile web), but identifying namespace would cost more [17:32:24] mforns: --^ [17:33:25] joal, of course! the language [17:33:28] makes sense [17:34:16] mforns: We could imagine getting namespace names for some core languages for a very special case, but that's really complicated [17:35:12] joal, madhuvishy, I see, I am responding to amir's email, thank you a lot for the enlightenment :] feel free to correct me in the thread [17:35:21] ;) [17:35:36] Thank you mforns for taking the time to answer [17:36:19] Analytics-Kanban, Operations, ops-eqiad: rack/setup/deploy aqs100[456] - https://phabricator.wikimedia.org/T133785#2251772 (fgiunchedi) >>! In T133785#2250716, @fgiunchedi wrote: >>>! In T133785#2249043, @Ottomata wrote: >> Ok! We discussed partitioning today. We'd like the following: >> >> - / a... [17:37:23] np, I should do this more [17:43:32] Logging off a-team ! [17:43:43] Have a good weekend :) [17:43:51] joal, bye, nice weekend! 
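The namespace idea discussed above (split the page title at `:` and treat anything without a known prefix as main namespace) could be sketched roughly as follows. This is only an illustration under loud assumptions: `NAMESPACE_PREFIXES` and `split_namespace` are hypothetical names, the prefix table is deliberately tiny and incomplete, and nothing here reflects how pageview_hourly actually stores titles.

```python
# Hypothetical, partial table of namespace prefixes per language.
# Real wikis define many more namespaces, and the localized names
# differ per language (e.g. "Usuario" on es.wikipedia).
NAMESPACE_PREFIXES = {
    "en": {"User", "Special", "Talk", "Help"},
    "es": {"Usuario"},
}


def split_namespace(page_title, language="en"):
    """Return (namespace, title); namespace is "" for the main namespace.

    A title is only split when the text before the first ':' is a known
    prefix for that language, so main-namespace titles that merely
    contain a colon are left untouched.
    """
    prefix, sep, rest = page_title.partition(":")
    known = NAMESPACE_PREFIXES.get(language, set())
    if sep and prefix in known:
        return prefix, rest
    return "", page_title
```

For example, `split_namespace("Usuario:J.delanoy", "es")` yields the `Usuario` prefix, while the same title looked up with the English table falls through to main namespace, which is exactly the cross-language problem raised in the conversation.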
[20:20:05] Analytics-Cluster, RESTBase-Cassandra, cassandra: Evaluate TimeWindowCompactionStrategy - https://phabricator.wikimedia.org/T133395#2252089 (Eevans) [20:26:29] Analytics, Discovery, Maps, RESTBase-Cassandra, and 2 others: Investigate and implement possible simplification of Cassandra Logstash filtering - https://phabricator.wikimedia.org/T130861#2252101 (Eevans) [20:26:54] Analytics, Discovery, Maps, RESTBase-Cassandra, and 2 others: Investigate and implement possible simplification of Cassandra Logstash filtering - https://phabricator.wikimedia.org/T130861#2148876 (Eevans) [20:37:20] Analytics-Cluster, RESTBase-Cassandra, cassandra: Standardized Cassandra dashboards - https://phabricator.wikimedia.org/T133403#2252136 (Eevans) [22:10:00] Analytics, Research-and-Data-Archive: Historical analysis of edit productivity for English Wikipedia - https://phabricator.wikimedia.org/T99172#2252520 (ggellerman) [22:11:16] Analytics, Research-and-Data-Archive: Historical analysis of edit productivity for English Wikipedia - https://phabricator.wikimedia.org/T99172#2252543 (ggellerman) Open>Resolved