[01:38:32] Analytics-Cluster, Analytics-Kanban: HDFS Blancing, productionize - https://phabricator.wikimedia.org/T94933#1176720 (kevinator) p:Triage>High [02:31:20] aagh [02:31:24] what the hell is up with hadoop?! [02:47:09] (PS1) Yurik: Insert leading zeroes for all stacked graphs [analytics/zero-sms] - https://gerrit.wikimedia.org/r/201645 [02:48:04] (CR) Yurik: [C: 2 V: 2] Insert leading zeroes for all stacked graphs [analytics/zero-sms] - https://gerrit.wikimedia.org/r/201645 (owner: Yurik) [13:57:58] Analytics-EventLogging, Analytics-Kanban, Wikimedia-Search: Estimate maximum throughput of Schema:Search (capacity) {oryx} - https://phabricator.wikimedia.org/T89019#1177856 (ggellerman) a:Nuria [14:14:44] Analytics-EventLogging, Analytics-Kanban, hardware-requests, operations: deploy eventlog1001 - https://phabricator.wikimedia.org/T90904#1177882 (Nuria) [14:22:17] nuria: as far as I can tell, everything on vanadium starts with a listener, so, i should be able to puppetize eventlog1001 exactly the same as step 1 [14:22:21] and nothing will happen [14:22:33] step 2 will be to change sources to point at eventlog1001, and i think everything will just work [14:22:34] Great [14:22:44] also vanadium -hafnium [14:26:10] hafnium looks like it is just a consumer, right? [14:26:16] so, when we are ready to switch, we switch it all at once [14:27:53] ottomata: yes, hafnium is a consumer [14:28:11] ottomata: it just has to be able to connect (firewall wise) [14:28:14] ottomata: also db [14:28:28] ottomata: as the new box needs to be able to connect to the new master [14:28:50] ottomata: well, not "new" anymore, but EL master [14:32:13] Analytics-EventLogging, Analytics-Kanban, operations: Upgrade box for EventLogging (vanadium) - https://phabricator.wikimedia.org/T90363#1177903 (Ottomata) Found the machine, finally! https://phabricator.wikimedia.org/T90904 On it... 
[15:24:57] Analytics-Cluster, Ops-Access-Requests, operations: Requesting access to analytics-users (stat1002) for Jkatz - https://phabricator.wikimedia.org/T94939#1178002 (Ottomata) Clarification: Jon is asking for 'analytics-privatedata-users' membership, not 'analytics-users'. [15:29:59] Analytics-Cluster, Analytics-Kanban: HDFS Blancing, productionize - https://phabricator.wikimedia.org/T94933#1178007 (Ottomata) a:Ottomata [15:36:24] ottomata: let me know if you need me to help you test the vanadium swap [15:37:03] joal: milimetric, nuria, mforns, if you have not read this article, you really should: https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying [15:37:13] very relevant [15:37:23] thanks ottomata [15:38:19] you can skip the first bit [15:38:22] start at Changelog 101: Tables and Events are Dual [15:38:22] if you want [15:39:17] Analytics-Cluster, Analytics-Kanban: HDFS Blancing, productionize - https://phabricator.wikimedia.org/T94933#1178013 (Ottomata) I am totally blancing the hdfs right now [15:39:26] haha [15:41:56] nuria, i'm pretty sure I can puppetize eventlog1001 with no production effects unless we change where data is going [15:41:58] so i'm going to do that now [15:42:06] ottomata: ok [15:53:12] ha, nuria, this eventlogging deploy process is dumb! [15:53:17] • Find the “update-eventlogging” (or a file with similar name) script in root's home directory on https://wikitech.wikimedia.org/wiki/Vanadium. [15:53:19] this is not puppetized! [15:53:32] ottomata: no it is [15:53:43] ottomata: i never use that [15:53:58] no? [15:54:00] you just do [15:54:01] eventloggingctl stop; python build; eventloggingctl start [15:54:05] that's it [15:54:05] python setup.py install [15:54:06] ? [15:54:11] python build?!?! [15:54:16] ya sorry [15:54:29] you mean python setup.py install [15:54:29] ? 
[15:54:34] eventloggingctl stop; python setup.py install; eventloggingctl start [15:54:34] ha [15:54:35] ok [15:55:07] ya, that script is not needed as EL is really managed by upstart, if you shut it down, it will re-start itself [15:56:24] I think it is there as a reminder to do the python step (now, that to me is the odd part but i have never deployed a python client app before) [15:58:03] nuria: do you remember what the package name is that gets you zsub? [15:58:18] ottomata: arghhhh [15:58:34] Analytics-Cluster, Ops-Access-Requests, operations: Requesting access to analytics-users (stat1002) for Jkatz - https://phabricator.wikimedia.org/T94939#1178045 (Dzahn) p:Triage>Normal [15:58:56] found it [15:58:58] zpubsub [15:59:07] i think it is not available for trusty [15:59:12] good, was just looking [15:59:41] ok, so the new host has a newer linux? [15:59:56] yes [16:01:43] welp, no zsub, i don't even know where that came from [16:01:44] haha [16:01:46] too bad [16:01:53] use eventlogging-consumer to subscribe :p [16:01:53] ottomata: https://github.com/atdt/zpubsub [16:01:58] ottomata: from ori [16:01:59] yeah, but there is a .deb package [16:02:05] it is in our apt [16:02:14] but it depends on some other package that is not available in trusty [16:02:40] you can do this instead: eventlogging-consumer tcp://0.0.0.0:8600 stdout:// [16:02:41] :p [16:03:08] ok [16:03:18] so, eventlog1001 is up and eventlogging is running there [16:03:26] i tested the kafka part, since that doesn't need any new configs [16:03:36] eventlog1001 gets kafka messages from /beacon/event.gif [16:03:46] ottomata: can it connect to the db? [16:04:15] ottomata: i think this is the master: db1046.eqiad.wmnet [16:05:19] yes [16:05:22] just checked [16:07:08] nuria, i think this transition will be pretty smooth [16:07:11] ottomata: ok, how do we know that we can listen to varnishncsa? 
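The `eventlogging-consumer tcp://0.0.0.0:8600 stdout://` invocation above names its input and its output as URIs. A minimal sketch of how such endpoint URIs can be taken apart with Python's standard `urllib.parse`; the `describe_endpoint` helper is hypothetical and is not EventLogging's own handler code:

```python
from urllib.parse import urlparse


def describe_endpoint(uri):
    """Split an endpoint URI into (scheme, host, port).

    Hypothetical helper for illustration only; host and port are None
    when the URI does not carry them (as with "stdout://").
    """
    parts = urlparse(uri)
    return parts.scheme, parts.hostname, parts.port


if __name__ == "__main__":
    # The two URIs quoted in the chat: a TCP input and a stdout output.
    for uri in ("tcp://0.0.0.0:8600", "stdout://"):
        print(uri, "->", describe_endpoint(uri))
```

The scheme alone is enough to pick a reader or writer, which is presumably why the same command can stand in for the missing zsub tool.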
[16:07:33] ummm, well, varnishncsa would have to be configured to send the stuff to eventlog1001 [16:07:35] hm [16:08:12] ottomata: right, that is the firewall stuff i was concerned about [16:08:14] oh actually, i can start up a secondary varnishncsa process on one of the bits [16:08:16] and see what happens [16:08:19] i think it will be ok [16:08:52] ottomata: ok, sounds good, if you stop the db consumer we can test using logs, welll.. not only stop but stopping it from inserting [16:09:00] good idea [16:09:00] ok [16:09:10] ottomata: also permits, we need all permits from vanadium be transferred to this new machine [16:09:45] should already be [16:09:46] try logging in [16:09:49] eventlog1001.eqiad.wmnet [16:10:02] how do I stop just one consumer? [16:10:57] kill -15 but wait we have to modify it so upstart doesn't start it [16:11:03] let me do that [16:12:17] ok... [16:12:25] can't I just tell upstart to stop it? [16:12:27] i just forget how [16:12:31] service eventlogging .... [16:13:38] start eventlogging/consumer NAME=graphite CONFIG=/etc/eventlogging.d/consumers/graphite [16:14:01] I just modified db consumer locally: [16:14:06] https://www.irccloud.com/pastebin/1AcBjRXy [16:14:14] haha [16:14:14] ok [16:14:23] so it would not "try" to access db [16:14:34] so let's start /stop ALL and get a stream from bits [16:15:04] now [16:15:05] i got it [16:15:07] not stop all [16:15:08] just mysql [16:15:09] got it [16:15:38] ok, mysql not running, everything else iws [16:15:39] is [16:15:59] cool check it out [16:16:04] in client-side-events.log [16:16:07] i started a varnishncsa [16:16:09] it's working [16:16:12] going to stop it [16:16:33] ottomata: lemme see [16:16:52] i just stopped it, but you can see that events rolled in [16:17:26] ottomata: can i ssh into vanadium still? 
[16:17:39] yes [16:18:02] ottomata: argh, is taking foreverrrr [16:18:51] ok, let me look arround and get some stuff from vanadium that i can use that i have on my homedir [16:20:00] hah, ok [16:23:19] ottomata: ok, looking at logs in new machine [16:25:39] ottomata: looks good..... I think .... [16:26:22] ottomata: so what's next? [16:26:25] ottomata: hafnium? [16:26:50] next is a big puppet patch I have coming in a few mins [16:26:59] ottomata: k [16:29:20] ottomata: GOOD, alarms are working, so box has gotten puppet [16:29:59] ha, yup :) [16:30:08] nuria: https://gerrit.wikimedia.org/r/#/c/201724/ [16:32:02] ottomata: k, see that but cannot +1 as there is a lot of code there i do not understand [16:32:10] ottomata: rather "systems" [16:35:26] oh nuria, can you fix the new mysql consumer [16:35:28] we can start it back up now [16:35:37] ottomata: yes [16:37:23] ottomata: ok, starting/stopping [16:38:07] ottomata: ready [16:38:27] ottomata: also i ackoledge alarms on box [16:39:30] danke [16:41:12] nuria: I am trying to understand this mediaiwki udp2log error logging thing [16:41:21] it seems to mention vanadium, but i can't see how it actually uses it [16:41:41] ottomata: just saw that on your puppet change....no idea [16:41:44] there is an mwerrors process running [16:41:47] listening on a udp port [16:42:01] it then counts errors and sends to statsd [16:42:10] but i can't find out what sends to that port [16:43:31] ah ok, udp2log on fluroine does [16:43:32] ok [16:43:33] let me see [16:43:36] ok got it, this is fine. 
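The mwerrors process described above listens on a UDP port, counts errors, and forwards the count to statsd. A rough sketch of that shape in Python, under loud assumptions: every datagram is treated as exactly one error, the metric name is whatever the caller passes in, and none of this is the real mwerrors code. Only the `name:value|c` counter line is standard statsd syntax.

```python
import socket


def statsd_counter(metric, count):
    """Format a statsd counter line such as b'some.metric:3|c'."""
    return f"{metric}:{count}|c".encode("ascii")


def count_and_report(listen_port, statsd_addr, metric, batch_size):
    """Receive batch_size datagrams on listen_port, then send one counter.

    Assumption for illustration: each datagram is one error, and the
    payload itself is ignored.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as inbound, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as outbound:
        inbound.bind(("127.0.0.1", listen_port))
        seen = 0
        while seen < batch_size:
            inbound.recvfrom(65535)  # blocks until a datagram arrives
            seen += 1
        outbound.sendto(statsd_counter(metric, seen), statsd_addr)
    return seen
```

Because UDP is connectionless, nothing on the receiving box records who the senders are, which matches the difficulty in the chat of finding out what sends to that port.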
[16:43:40] the port in the comments is just wrong [16:44:17] ah it is just the default mwerrors port [16:44:20] which is overridden to 8423 [16:44:21] ok [16:44:23] yes, this is fine [16:44:57] ottomata: ok, no idea [16:45:19] ok nuria, i think this should just work™ [16:45:26] ottomata: haha [16:45:43] if i merge this, puppet will run everywhere slowly [16:45:44] ottomata: or better, "we are almost done" [16:45:53] and logs will transfer over to eventlog1001 instead of vanadium [16:46:00] since vanadium and eventlog1001 can run at the same time [16:46:03] that is kinda cool [16:46:09] eventually vanadium will not get anything anymore [16:46:12] and we can stop stuff there [16:46:14] ottomata: but there is the DB [16:46:18] won't matter [16:46:23] only one host will get a unique log [16:46:29] so, yes, both will be inserting [16:46:33] but each will insert unique logs [16:46:35] right? [16:46:38] ottomata: no [16:46:43] ottomata: because of batching [16:46:47] ? [16:46:49] why [16:46:58] ottomata: and a duplicate error in an insert (), (), () [16:47:00] log A -> vanadium, log B -> eventlog1001 [16:47:11] both log A and log B go to mysql [16:47:11] will prevent the whole insert from succeeding [16:47:14] ? [16:47:22] why would there be duplicates? [16:47:35] ottomata: because the insert is not sequential [16:47:36] there will be no time in which both hosts will get the same logs [16:47:43] ottomata: ah then, yes [16:47:57] ottomata: sorry, i misunderstood [16:48:05] ja, so [16:48:20] bitsA -> vanadium [16:48:20] bitsB -> eventlog1001 [16:48:25] consumers on both of those will still be inserting [16:48:30] but the logs will be unique on each of them [16:49:10] ok? [16:49:12] ok [16:49:17] now i get it, sorry [16:49:25] k cool, mergin! [16:49:39] good ol' friday afternoon migration, i love it. [16:50:19] ottomata: ya.. 
me too, i still ha PTSD from my pager at amazon [16:50:26] *i still have [16:50:49] ottomata: ok, let's baby sit this for a bit [16:51:58] ottomata: let me know when the bits split is done [16:53:39] hm [16:55:40] whoa, this is using systemd [16:55:43] varnishncsa [16:55:45] on caches [16:55:47] new things! [16:55:48] hmmm [16:57:07] systemd instead of upstart? [16:57:57] oof, yes, and i just caused some problems [16:58:06] somethign is not working, the new varnishncsa is not starting, but the other one stopped [16:58:11] i just fixed, but we def lots logs there. [16:58:17] i mean, ihaven't fixed yet [16:58:22] but, varnishncsa-vanadium is running [16:58:32] ahammmmm [16:58:33] can you double check that logs are coming into vanadium right now? [16:58:44] yes [17:00:09] ottomata: yes [17:01:45] ottomata: things are coming just fine, client side [17:01:51] ok [17:01:54] ottomata: how do we connect with serverside [17:01:56] we lost a few minutes there [17:02:00] oh. [17:02:03] i don't know. [17:02:03] ottomata: it's fine, it's tier-2 [17:02:14] i have no idea, do they have to change code? [17:02:24] ottomata: they come straight from mediawiki [17:02:30] aye [17:02:34] they probalby do have to change [17:02:48] (PS4) Jdlrobson: Update limn graphs (untested) [analytics/limn-mobile-data] - https://gerrit.wikimedia.org/r/199162 (https://phabricator.wikimedia.org/T93690) [17:03:53] ottomata: man... let's look [17:04:22] i'm trying to fix varnishncsa right now... [17:05:23] ottomata: ok, no server side events on new box [17:05:33] ottomata: we need to figure out where those come from [17:05:36] aye [17:05:42] can you do that? 
i'm fighting this atm :) [17:06:39] (CR) Jdlrobson: [C: -1] "yeh this doesn't work" [analytics/limn-mobile-data] - https://gerrit.wikimedia.org/r/199162 (https://phabricator.wikimedia.org/T93690) (owner: Jdlrobson) [17:08:10] ottomata: yes, looking [17:21:59] nuria: a couple of the bits hosts are logging to eventlog1001 now [17:22:06] bblack is prepping a patch to fix the problem we just have [17:22:12] i will wait to apply his patch to the rest of the nodes [17:22:53] ottomata: ok, asked matt and php is logging directly to a udp socket for server side events: https://git.wikimedia.org/blob/mediawiki%2Fextensions%2FEventLogging.git/37effccb844672536ac10ec132996565d9e36b6a/includes%2FEventLogging.php#L67 [17:23:34] Internal error [17:23:34] https://git.wikimedia.org/ [17:23:37] git.wm.og never works! [17:23:40] gimme github link [17:24:17] ottomata: yesssir [17:25:08] https://github.com/wikimedia/mediawiki-extensions-EventLogging/blob/master/EventLogging.php#L71 [17:25:34] hopefully this means something top someone that (unlike me) knows how mediawiki works [17:25:49] will dig some more [17:30:09] ottomata: will try to see where that udp multicast is configured [17:41:30] ottomata: found it! [17:41:32] ottomata: [17:41:39] https://www.irccloud.com/pastebin/bAOQQncK [17:42:07] we also need to change these settings in CommonSettings.php [17:42:27] nuria: no more client side events going to vanadium :) [17:42:47] ottomata: ok, good!! [17:42:57] ottomata: let me check db is updated [17:43:24] ottomata: server side comes from the udp socket i pasted above [17:43:45] ottomata: and that lives on mediawiki config [17:43:59] hehe nuria...maybe we should make mw use kafka ! :p [17:44:01] ottomata: operations/mediawiki-config.git [17:44:13] the php logger, RIGHTTTTTT [17:44:13] nawww... 
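The earlier worry about batching (around 16:46) was that one duplicate row inside a multi-row `insert (), (), ()` makes the whole statement fail. A small sketch of that behaviour, using Python's built-in `sqlite3` as a stand-in; the table, column names, and `insert_batch` helper are invented for illustration, and the production MySQL consumer may behave differently depending on its storage engine:

```python
import sqlite3


def make_db():
    """In-memory table with a unique key, standing in for an event table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (uuid TEXT PRIMARY KEY, payload TEXT)")
    conn.commit()
    return conn


def insert_batch(conn, rows):
    """Insert all rows in ONE multi-row statement; return True on success.

    A duplicate key anywhere in the batch aborts the statement, so none
    of the batch's rows are kept.
    """
    placeholders = ", ".join("(?, ?)" for _ in rows)
    params = [value for row in rows for value in row]
    try:
        conn.execute(
            f"INSERT INTO events (uuid, payload) VALUES {placeholders}", params
        )
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        conn.rollback()
        return False


def count(conn):
    return conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

As the chat concludes, this only bites if both hosts can see the same event; with each bits host sending to exactly one box, the two consumers never insert the same row.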
[17:44:13] jk [17:44:44] this is php "looking" like it logs to a file but nono, it's really udp [17:45:08] probably someone in ops knows the rules of deploying that depo [17:46:10] ottomata: do you? [17:46:40] not really. [17:46:43] but i will submit patch and ask [17:47:09] ottomata: ok [17:47:18] ottomata: checking db counts for client side events [17:49:52] nuria: server side events now going to eventlog1001! [17:49:54] that wsa fast! [17:50:31] ottomata: ninja-deployment! [17:51:05] ottomata: nice !!!!! [17:51:09] i see thme now [17:51:35] Where is the documentation on how to get access to the EventLogging DB? [17:54:48] kevinator: https://wikitech.wikimedia.org/wiki/EventLogging#Access_to_EventLogging_data [17:54:59] ottomata: lemme look at db [18:03:54] nuria: thanks, but I need something with steps like: ssh stat1003, mysql xyz; use log [18:04:24] should I log into stat 1002 or 1003? [18:04:43] kevinator: you can edit that wiki and make it more explicit, ssh to 1003 [18:04:53] sample query: mysql --defaults-extra-file=/etc/mysql/conf.d/research-client.cnf --host dbstore1002.eqiad.wmnet -e "select left(timestamp,8) ts , COUNT(*) from log.NavigationTiming_10785754 where timestamp >= '20150402062613' group by ts order by ts"; [18:04:58] will do [18:07:01] nuria: it worked... editing the wiki now [18:07:39] nuria: looking good? [18:18:26] Analytics: As an end user, i'd like an up to date set of documentation on how to access our event logging data, so that I can make data informed decisions - https://phabricator.wikimedia.org/T95027#1178583 (Tfinc) NEW [18:19:24] Analytics: As an end user, i'd like an up to date set of documentation on how to access our event logging data, so that I can make data informed decisions - https://phabricator.wikimedia.org/T95027#1178597 (Tfinc) Kevin pointed me to https://wikitech.wikimedia.org/wiki/EventLogging#Access_to_EventLogging_dat... [18:24:28] ottomata: i think so.... 
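The sample query above groups on `left(timestamp,8)`, which works because the timestamps are fixed-width `YYYYMMDDHHMMSS` strings: the first eight characters are the day, and plain string comparison orders them chronologically. A sketch of the same per-day count in Python, with a hypothetical `daily_counts` helper:

```python
from collections import Counter


def daily_counts(timestamps, since):
    """Count events per day for 14-digit YYYYMMDDHHMMSS strings >= since.

    Mirrors: select left(timestamp,8) ts, count(*) ... where timestamp >= since
             group by ts order by ts
    """
    per_day = Counter(ts[:8] for ts in timestamps if ts >= since)
    return sorted(per_day.items())


if __name__ == "__main__":
    sample = ["20150402062612", "20150402062613", "20150403000000"]
    print(daily_counts(sample, "20150402062613"))
```

Comparing counts per day before and after the switch is one simple way to confirm that no gap opened while traffic moved between hosts.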
[18:24:36] ottomata: (famous last words) [18:25:12] i think so ™ [18:26:14] Analytics-EventLogging, Analytics-Kanban, operations, Patch-For-Review: Upgrade box for EventLogging (vanadium) - https://phabricator.wikimedia.org/T90363#1178620 (Ottomata) Things are looking good! eventlog1001 has fully taken over all vanadium duties. I will leave vanadium up for now, and decomm... [18:29:14] Analytics: As an end user, i'd like an up to date set of documentation on how to access our event logging data, so that I can make data informed decisions - https://phabricator.wikimedia.org/T95027#1178638 (Nuria) I updated the doc a bit... there is not much more to add I think: https://wikitech.wikimedia.o... [18:30:09] ottomata: haha, ok, let's make sure to check graphite [18:30:30] ottomata: are client side streams still coming into vanadium [18:31:09] no [18:32:19] nuria: qq, how are eventlogging schema versions resolved? [18:32:23] i see the schema name in the data [18:32:30] but not a schema revision [18:32:36] ottomata: version is part of incoming event [18:32:37] oh [18:32:37] sorry [18:32:38] is that [18:32:38] http://confluent.io/docs/current/installation.html#installation-apt [18:32:40] oops [18:32:43] "revision": 11331974, [18:32:47] sorry, didn't see it. [18:32:47] :) [18:32:50] yes [18:35:00] ottomata: did you stopped bits events from coming into vanadium? [18:36:20] yes [18:42:39] ottomata: ok, want to talk about stratta or you are busy with something else (either way is fine) [18:43:07] today would be good, gimme a few mins [18:50:04] Analytics: As an end user, i'd like an up to date set of documentation on how to access our event logging data, so that I can make data informed decisions - https://phabricator.wikimedia.org/T95027#1178746 (Tfinc) Yup, that's what i needed. Is there a CNAME for the used analytics server? That way if anything... [18:51:04] ok nuria [18:51:06] batcave? [18:51:09] actually wait [18:51:13] lemme get a coffee refill... 
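As noted above, the schema revision travels inside each event (`"revision": 11331974`), next to the schema name. A sketch of how the two could be combined into a table name of the form seen in the earlier query (`NavigationTiming_10785754`); the `"schema"` key and the `table_for` helper are assumptions for illustration, not confirmed by this log:

```python
import json


def table_for(raw_event):
    """Derive '<schema>_<revision>' from a JSON-encoded event.

    Assumes the event carries top-level "schema" and "revision" fields.
    """
    event = json.loads(raw_event)
    return f"{event['schema']}_{event['revision']}"


if __name__ == "__main__":
    raw = '{"schema": "NavigationTiming", "revision": 10785754, "event": {}}'
    print(table_for(raw))
```

Keeping the revision in the name means events from different revisions of one schema never share a table.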
[18:51:15] ottomata: sure give me 5 mins [19:00:29] ottomata: ready when you are [19:03:29] k here [19:04:24] nuria: [19:05:08] ottomata: going [19:07:16] Analytics-Cluster, Analytics-Kanban: Update Pageview UDF with dialect-specific directories {hawk} - https://phabricator.wikimedia.org/T92020#1178814 (JAllemandou) Discussed in standup this morning with the team : we are gonna live it as it is now. zh-(hans|hant) folder were not present in pageviews on the... [19:08:05] Analytics-Cluster, Analytics-Kanban: Story: AnalyticsEng has hourly PageView counts {hawk} - https://phabricator.wikimedia.org/T69804#1178818 (JAllemandou) [19:08:07] Analytics-Cluster, Analytics-Kanban: Update Pageview UDF with dialect-specific directories {hawk} - https://phabricator.wikimedia.org/T92020#1178817 (JAllemandou) Open>declined [19:25:02] milimetric, yt? Does https://edit-analysis.wmflabs.org/compare/ work for you? [19:50:47] Analytics, MediaWiki-API-Team, MediaWiki-Authentication-and-authorization: Create dashboard to track key authentication metrics before, during and after AuthManager rollout - https://phabricator.wikimedia.org/T91701#1178891 (Tgr) a:Tgr [19:54:25] gahhhh [19:54:31] i enabled two factor auth on wikitech [19:54:32] what's wrong [19:54:33] now I can't log in :( [19:54:47] need me to do something to an instance? [19:55:10] milimetric: do you use two factor auth? 
[19:55:17] no [19:56:16] i did it for some reason, to have some extra priviledge [19:56:18] i was able to log in last time [19:56:20] now i have no idea [19:56:30] heading to #labs [20:06:16] Bye all, good weekend :) [20:13:42] byaaa [21:20:07] Analytics-EventLogging, Analytics-Kanban, operations, Patch-For-Review: Upgrade box for EventLogging (vanadium) - https://phabricator.wikimedia.org/T90363#1179257 (yuvipanda) Open>Resolved Yay :) [21:45:17] (PS13) Milimetric: Implement A/B comparison [analytics/dashiki] - https://gerrit.wikimedia.org/r/198169 [21:46:00] (CR) Milimetric: Implement A/B comparison (2 comments) [analytics/dashiki] - https://gerrit.wikimedia.org/r/198169 (owner: Milimetric) [21:46:41] have a nice weekend everyone [22:43:35] (CR) Nemo bis: "> Oh! And getting the apps team to spit out pageid/ns in the same way MediaWiki does." [analytics/refinery/source] - https://gerrit.wikimedia.org/r/193985 (owner: OliverKeyes) [22:48:22] (CR) OliverKeyes: "No idea; Otto was reaching out to Dan, I think." [analytics/refinery/source] - https://gerrit.wikimedia.org/r/193985 (owner: OliverKeyes) [22:49:25] anyone around to ask about cohortList.js on wikimetrics? [23:41:41] Analytics-Engineering, Analytics-Wikimetrics, Community-Wikimetrics: User reads result of validation after creating a cohort - https://phabricator.wikimedia.org/T76914#1179578 (Fhocutt) I've been trying to make sense of what cohortList.js is doing, and I don't think it's correctly getting data into th... [23:48:10] Analytics-EventLogging, Ops-Access-Requests, operations: Grant user 'tomasz' access to dbstore1002 for Event Logging data - https://phabricator.wikimedia.org/T95036#1179580 (Dzahn) [23:59:44] (PS1) Yurik: Resorted 'via' based on 'a'=zero, 'b'=non-zero [analytics/zero-sms] - https://gerrit.wikimedia.org/r/201873