[00:50:31] jdlrobson: password changed, you should always use "mysql --defaults-file=/etc/mysql/conf.d/research-client.cnf" instead of copying the password
[00:50:41] jdlrobson: that file is updated anytime the password changes
[00:50:49] (it was shared publicly, that's why it was changed)
[09:32:36] Analytics-Kanban, User-Elukey: dbstore1002 /srv filling up - https://phabricator.wikimedia.org/T168303#3455713 (elukey) If my calculations are correct, the top 20 tables of the log database on store/slave weigh ~2.6TB, so even cutting their size in half would be a massive win for us. Just sent an email...
[12:49:58] so, something interesting discovered today: the eventbus hosts are constantly making a ton of DNS requests for statsd.eqiad.wmnet
[12:50:01] https://github.com/wikimedia/eventlogging/blob/be89e0fd08e0d175364c3fe6e0f288f231724576/eventlogging/handlers.py#L569
[12:50:32] we added the IP to /etc/hosts on kafka2002, depooled acamar again and it didn't fail the Pybal checks
[12:59:16] huh.
[12:59:25] ottomata: o/
[12:59:54] we just tried to apply the same fix to all kafka2*, no outage when acamar was depooled
[12:59:57] you sure that's the code where the request comes from though?
[13:00:18] it is the only one that resolves statsd afaics
[13:00:20] no?
[13:00:48] statsd_writer should only be used if it is configured as an eventlogging output
[13:01:03] and maybe that hostname is not statsd, mmmmm
[13:01:03] also
[13:01:05] weird
[13:01:09] no, that should be
[13:01:10] but
[13:01:16] that line will only be executed once
[13:01:21] that is a coroutine
[13:01:33] only the stuff in the while 1: part would be looped
[13:01:50] but, even so, eventbus doesn't use statsd_writer, i think we don't use that anywhere anymore
[13:02:16] but
[13:02:29] service.py uses a statsd tornado plugin to send request stats
[13:02:44] https://github.com/sprockets/sprockets.mixins.statsd
[13:04:34] Note that the socket for
[13:04:35] communicating with statsd is created once upon module import and will not
[13:04:37] change until the application is restarted or the module is reloaded
[13:05:17] maybe there is something that prevents that IP from being cached
[13:05:20] as it is stated
[13:05:29] anything else interested in statsd, ottomata?
[13:07:55] on eventbus hosts? I don't think so
[13:08:21] so if the statsd writer is not used (and apparently it is efficient) it might be sprockets.mixins.statsd?
[13:08:52] iirc that's the only thing that would talk to statsd, unless node-rdkafka does?
[13:08:57] looking
[13:08:59] sorry
[13:08:59] haa
[13:09:02] not node-rdkafka
[13:09:05] kafka-python
[13:10:16] yeah, pretty sure it's just the sprockets mixin
[13:10:25] oh
[13:10:31] jmxtrans
[13:10:31] ?
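A minimal sketch of the writer-coroutine pattern described at 13:01 above (simplified names, not the actual eventlogging code; the hostname and metric are only examples): everything before the loop runs exactly once, when the coroutine is primed, and only the body of the while 1: loop runs per stat. Resolving the hostname in the one-time setup is what keeps the hot loop free of DNS lookups.

    import socket

    def statsd_writer_sketch(hostname, port):
        # Setup: runs only once, when the coroutine is primed.
        addr = socket.gethostbyname(hostname), port
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        while 1:
            # Loop body: runs once per stat sent into the coroutine.
            stat = yield
            sock.sendto(stat.encode('utf-8'), addr)

    writer = statsd_writer_sketch('statsd.eqiad.wmnet', 8125)
    next(writer)                          # prime: executes the setup lines once
    writer.send('eventbus.requests:1|c')  # each send() re-runs only the loop body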
[13:10:36] but that's not eventlogging eventbus
[13:10:49] since the kafka brokers run there too
[13:11:18] jmxtrans runs there and sends jmx metrics
[13:11:19] to statsd
[13:11:29] but that's not the eventlogging process
[13:12:11] let me check on kafka1001 the process that does the queries, didn't check before
[13:13:12] not easy to get though
[13:16:58] so I used netstat -aup
[13:17:15] and I can see a ton of things for jmxtrans
[13:17:31] let me stop jmxtrans on kafka1001 to double check
[13:18:01] but it wouldn't make any sense
[13:19:20] ok in fact it is not that one
[13:19:29] :D
[13:20:12] nono it must be eventbus
[13:21:23] plus in graphite I can see "service" populated for eventbus
[13:21:40] kafka goes through jmxtrans, that is fine
[13:22:54] ya then it must be the sprockets mixin
[13:31:07] ottomata: could it be possible that the code related to sprockets.mixins is executed multiple times rather than only once
[13:31:10] ?
[13:32:03] I am thinking out loud, this thing doesn't make much sense :D
[13:35:28] The default statsd server that is used is ``localhost:8125``. The
[13:35:28] ``STATSD_HOST`` and ``STATSD_PORT`` environment variables can be used to
[13:35:31] set the statsd server connection parameters.
[13:35:40] but I can't find any trace of them in the plugin code
[13:37:37] looking
[13:37:42] i don't know much about how the mixin thing works
[13:37:43] so it is possible
[13:40:57] might be in sprockets.clients.statsd, which is a dependency
[13:42:07] https://github.com/sprockets/sprockets.clients.statsd/blob/master/sprockets/clients/statsd/__init__.py
[13:42:22] yeah
[13:42:25] https://github.com/sprockets/sprockets.clients.statsd/blob/34daf6972ebdc5ed1e8fde2ff85b3443b9c04d2c/sprockets/clients/statsd/__init__.py
[13:42:25] oh haha
[13:42:27] we found the same :)
[13:43:10] hmmmm yeah elukey, it looks like it makes the socket once, but without the addr
[13:43:21] the STATSD_ADDR is provided in the sendto call
[13:43:25] maybe that is doing a lookup
[13:43:41] yeah I wanted to say the same
[13:43:51] maybe it is so stupid that it needs a raw IP
[13:44:56] yeah
[13:45:03] connection_string = os.getenv('STATSD')
[13:45:03] if connection_string:
[13:45:03]     url = urlparse.urlparse(connection_string)
[13:45:04]     STATSD_ADDR = (url.hostname, url.port)
[13:45:11] elukey: i betcha if we did the lookup ourselves
[13:46:15] looking for where eventlogging sets STATSD or STATSD_HOST
[13:46:48] ah, it's in systemd
[13:46:49] env var
[13:46:54] Environment="STATSD_HOST=statsd.eqiad.wmnet"
[13:47:07] ya elukey, i betcha if we made eventlogging override that env var with an ip
[13:47:11] doing the host lookup ourselves
[13:47:18] socket.sendto will stop doing that
[13:47:19] but this is crazy!
[13:47:20] for every metric
[13:47:34] sigh
[13:47:35] elukey: i think it's a udp socket python thing
[13:47:54] if socket.sendto is given a hostname
[13:47:55] I'd expect the plugin to do the dns resolution
[13:47:59] it's gotta do a dns lookup, ya?
[13:48:02] if it finds that it is not an IP
[13:48:06] oh yes
[13:48:11] or, make a socket with an endpoint once
[13:48:47] else:
[13:48:47]     STATSD_ADDR = (os.getenv('STATSD_HOST', 'localhost'),
[13:48:48]                    int(os.getenv('STATSD_PORT', 8125)))
[13:48:48] oh elukey
[13:48:54] this one is flawed
[13:49:02] that's exactly what the eventlogging statsd_writer does
[13:49:05] addr = socket.gethostbyname(hostname), port
[13:49:06] then
[13:49:08] in the loop
[13:49:11] exactly
[13:49:12] sock.sendto(stat.encode('utf-8'), addr)
[13:49:17] but with the IP
[13:49:20] ya
[13:49:24] ok we found it
[13:49:38] elukey: i can make a quick patch to el for ya
[13:50:15] please do it mr ottomata
[13:50:42] (PS2) Milimetric: Implement Wikistats metrics as Druid queries [analytics/refinery] - https://gerrit.wikimedia.org/r/365806 (https://phabricator.wikimedia.org/T170882)
[13:51:15] ottomata: task is https://phabricator.wikimedia.org/T171048
[13:51:17] going to update it
[13:57:49] elukey: you have a way to test/repro this in prod?
[13:57:58] cause I can apply it manually on a codfw host and we can double check if my fix works
[13:58:28] Analytics, EventBus, Operations, User-Elukey: Eventbus does not handle gracefully changes in DNS recursors - https://phabricator.wikimedia.org/T171048#3456304 (elukey) Me and @ema set up an experiment, namely adding `10.64.32.155 statsd.eqiad.wmnet` in /etc/hosts on kafka2002 (and not on the othe...
[13:59:05] ottomata: sure, just tcpdump and see no statsd.e.w related traffic :D
[13:59:19] ema applied a hack to /etc/hosts
[13:59:29] so make sure that there is nothing in there
[13:59:38] elukey: https://gerrit.wikimedia.org/r/#/c/366561/
[14:00:24] yeah it looks good
[14:00:43] but then does it mean restarting the service if we change statsd.eqiad.wmnet's IP in DNS?
[14:00:53] ya it would
[14:00:55] yes..
[14:01:00] hello ema :)
[14:01:02] o/
[14:01:04] :)
[14:01:50] ottomata: mmmm one thing - is the "localhost" default necessary?
[14:02:04] no, but it is what the lib defaults to
[14:02:20] hmm, the service supports some sighup reloading
[14:02:30] yeah but I'd leave it confined in the lib.. we might just say
[14:02:31] ahhh but it doesn't matter
[14:02:35] if os[]: then ..
[14:02:37] because the lib caches the addr
[14:02:54] i suppose sendto to localhost isn't going to do remote dns
[14:03:00] ok
[14:03:04] but it will fail brutally :D
[14:03:36] also, say we put an IP for STATSD_HOST
[14:03:55] will gethostbyname be happy?
[14:04:05] checking
[14:04:22] i checked that
[14:04:25] it just returns the ip
[14:04:45] super
[14:05:08] ok, elukey i can apply this to a codfw box in prod
[14:05:09] buuuut
[14:05:21] can we verify that it actually solves the problem?
[14:05:51] ottomata: sure, it is sufficient to depool acamar
[14:06:05] ok cool, applying on kafka2001
[14:07:14] removed ema's statsd override from /etc/hosts
[14:07:35] so I was thinking of something similar for T151643, but perhaps caching the result and resolving it every now and then would be better than doing it one-off?
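A hedged side-by-side of the two patterns quoted above (names simplified; see sprockets.clients.statsd for the real code). The library keeps the hostname in STATSD_ADDR, so every sendto() makes the resolver look it up again, one DNS query per metric; eventlogging's statsd_writer instead resolves once and reuses the raw IP.

    import os
    import socket

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    # Flawed (what sprockets.clients.statsd does): the tuple keeps the
    # hostname, so each sendto() triggers a fresh DNS lookup.
    STATSD_ADDR = (os.getenv('STATSD_HOST', 'localhost'),
                   int(os.getenv('STATSD_PORT', 8125)))
    sock.sendto(b'test.metric:1|c', STATSD_ADDR)

    # eventlogging's statsd_writer approach: resolve once up front, then
    # reuse the raw IP for every packet.
    addr = (socket.gethostbyname(os.getenv('STATSD_HOST', 'localhost')),
            int(os.getenv('STATSD_PORT', 8125)))
    sock.sendto(b'test.metric:1|c', addr)

The trade-off, raised by fgiunchedi at 15:21 below, is that resolving once means a restart (or reload) is needed if the statsd name ever points at a new IP; ema's idea at 14:07 of caching with a periodic re-resolve is the middle ground.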
[14:07:39] T151643: python-varnishapi daemons seeing "Log overrun" constantly - https://phabricator.wikimedia.org/T151643
[14:08:15] also see T104442
[14:08:15] T104442: Investigate better DNS cache/lookup solutions - https://phabricator.wikimedia.org/T104442
[14:08:17] ema: yeah i thought about putting it in a sighup handler that is already there
[14:08:17] but
[14:08:23] the lib caches the value of the env var itself
[14:08:32] yeah
[14:08:42] https://github.com/sprockets/sprockets.clients.statsd/blob/34daf6972ebdc5ed1e8fde2ff85b3443b9c04d2c/sprockets/clients/statsd/__init__.py#L44-L52
[14:08:49] although, haha, it is global
[14:08:51] so we could keep hacking
[14:08:51] :p
[14:08:52] :/
[14:08:55] nope :D
[14:09:13] I think that this is a good short/medium term solution
[14:09:15] agree
[14:09:33] maybe we can make a pull request on the lib and it will get merged
[14:09:35] who knows
[14:09:54] I am doing it, this is a bug in their code
[14:10:27] let's see first if we fix the issue
[14:10:29] cool thanks
[14:10:33] ok restarting eventbus on kafka2001
[14:11:13] tcpdump with host statsd.eqiad.wmnet still shows stuff, buuuut, i betcha tcpdump is smart
[14:11:22] of course it is :)
[14:11:28] elukey: depool acamar?
[14:12:36] elukey: when we are done here, let's do ops sync :)
[14:14:16] so I don't see anything more with sudo tcpdump -vvv -l -n dst port 5
[14:14:19] that is super good :)
[14:14:53] ottomata: since it might cause a little outage, can we apply the fix to kafka200[23] before depooling acamar?
[14:15:06] sure, few mins
[14:15:07] maybe to 2/3 hosts
[14:15:12] super
[14:18:38] elukey: 2/3 hosts so we can see the 3rd fail? :)
[14:20:27] yep!
[14:21:29] ok done elukey
[14:21:33] applied on 2001 and 2002
[14:21:39] 2003 left running as is
[14:22:13] ema: do you have a minute to depool acamar?
[14:23:23] (removed entries in /etc/hosts on kafka200[23])
[14:28:27] elukey: sure
[14:28:39] <3
[14:31:11] elukey: you keep an eye on lvs2003?
[14:31:35] yep
[14:32:29] elukey: I'm seeing the timeouts on kafka2003 (resolving cp1008)
[14:32:32] Jul 20 14:32:23 lvs2003 pybal[45967]: [eventbus_8085] ERROR: Monitoring instance ProxyFetch reports server kafka2003.codfw.wmnet (enabled/up/pooled) down: Getting http://localhost/v1/topics took longer than 5 seconds.
[14:32:42] that's the only one without the fix \o/
[14:32:49] ottomata: --^
[14:32:49] great
[14:33:04] ema: feel free to repool
[14:33:24] ok repooling
[14:34:14] ok cool
[14:34:26] i'm going to do some EL deploys today anyway
[14:34:28] will do ones for eventbus too
[14:34:30] maybe that first
[14:34:42] elukey: merge my change?
[14:35:06] ottomata: +2
[14:35:42] ottomata: the tests failing are expected right? Still need to fix them?
[14:35:46] cool
[14:35:52] elukey: let's do ops sync while we still have time :)
[14:36:00] (thanks ema!)
[14:36:12] am in bc
[14:36:17] pleasure!
[14:36:35] ottomata: let me go outside the co-working space, a lot of noise, 1 min
[14:37:20] k
[14:37:29] mforns: can you log into deployment-eventlogging03 ?
[14:37:42] ottomata, trying
[14:38:00] I used to
[14:38:08] i wonder if its disk is full again or something
[14:38:11] ottomata, no, I can not
[14:38:20] aye, i can log into other deployment-prep instances fine
[14:38:25] Permission denied (publickey).
[14:38:27] maaan, i want to wipe that one and set up a new beta el
[14:38:45] mforns: objections to a shorter name?
[14:38:49] deployment-el01 maybe?
[14:38:49] no
[14:38:56] or
[14:38:59] maybe we keep it descriptive
[14:39:02] deployment-eventlog01
[14:39:05] like in prod sorta
[14:39:10] the no was to: no objection
[14:39:15] k cool
[14:39:48] ottomata, yes, deployment-eventlog01 is good imo
[14:39:59] k
[14:51:56] Analytics, Scoring-platform-team-Backlog, revscoring, artificial-intelligence: [Investigate] Hadoop integration for ORES training - https://phabricator.wikimedia.org/T170650#3438153 (Halfak)
[15:00:38] ping ottomata
[15:01:09] ping elukey
[15:21:32] Analytics, EventBus, Operations, Patch-For-Review, User-Elukey: Eventbus does not handle gracefully changes in DNS recursors - https://phabricator.wikimedia.org/T171048#3452124 (fgiunchedi) FYI caching statsd name forever also has its problems when failing over statsd as many services need re...
[15:31:58] hello
[15:32:26] on beta cluster I have removed a bunch of role::eventlogging::analytics::* classes which were applied project wide ( via https://horizon.wikimedia.org/project/puppet/ )
[15:32:33] that broke puppet on most instances
[15:32:44] ottomata: for after your meeting :D
[15:32:59] the class should be applied per prefix or per instance :}
[15:45:03] oh sorry
[15:45:05] just saw these
[15:45:08] responded in mwsec too
[15:45:13] they were applied project wide?!!
[15:45:22] OH MY GOODNESS
[15:45:23] sorry hashar
[15:45:28] that's my fault
[15:45:28] i just applied those
[15:45:32] but i meant to apply it to JUST MY NEW NODE
[15:45:35] i was on the wrong tab
[15:46:12] thank you hashar that was the right thing to do
[15:48:58] ping elukey
[15:59:38] Analytics, EventBus, Wikimedia-Stream: Add tags to recentchange stream - https://phabricator.wikimedia.org/T171182#3456794 (Nirmos)
[16:00:10] ottomata: some classes were applied project wide
[16:00:13] ottomata: :-}
[16:00:17] no worries!
[16:15:38] Analytics, Discovery, Discovery-Analysis: Public folder on stat1005 for Discovery's A/B test reports - https://phabricator.wikimedia.org/T171187#3456874 (mpopov)
[16:35:05] Analytics, Discovery, Discovery-Analysis: Public folder on stat1005 for Discovery's A/B test reports - https://phabricator.wikimedia.org/T171187#3456965 (Ottomata) Hm, would you mind if the report url was just analytics.wikimedia.org/datasets/discovery/reports/ ? If not, you could put stuff in publi...
[16:48:52] * elukey off!
[16:55:52] nuria_: hash ar had me make a new instance, and now i can log in
[16:56:06] going to see if a reboot of the old one works
[16:56:20] ottomata: k, so we scrap the old one, we need to update docs (i will do that) - what is the new one called?
[16:56:20] then i don't have to bother with making a new one (today)
[16:56:30] lemme try fixing the old one, it sounds like a reboot might fix it
[16:56:34] ottomata: k
[16:56:57] elukey, mforns: i think we can entirely delete this data too: https://meta.wikimedia.org/wiki/Schema_talk:PageCreation
[16:57:04] as it is in the data lake
[16:57:12] have sent aaron a note about it
[16:57:15] aha
[16:58:00] nuria_, the only difference is the availability, no?
data lake is updated monthly
[16:58:16] mforns: no, we also have page-creation from mw now
[16:58:25] mforns: which is a much better version of that schema
[16:59:46] mforns: see: https://github.com/wikimedia/mediawiki-event-schemas/tree/master/jsonschema/mediawiki/revision/create
[17:00:43] Analytics, Discovery, Discovery-Analysis: Public folder on stat1005 for Discovery's A/B test reports - https://phabricator.wikimedia.org/T171187#3457066 (mpopov) Open→Resolved a: mpopov We're okay with that :)
[17:00:47] nuria_, of course!
[17:06:30] nuria_: oh, what did we decide? i should just stop tranquility, right?
[17:11:43] ottomata: yes please
[17:11:49] ottomata: webrequest is updated hourly
[17:11:53] ottomata: and daily
[17:13:27] ok
[17:14:52] !log killed tranquility instances tranq-banners and tranq-netflow running on druid1003 in joal's screen sessions
[17:14:53] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[17:24:50] hey elukey fyi
[17:24:56] i see in my.cnf for db1046
[17:25:03] character_set_server = binary
[17:25:04] character_set_filesystem = binary
[17:25:04] collation_server = binary
[17:32:38] nuria_: do you know how to enable tokudb?
[17:32:40] i get
[17:32:48] Can't open shared library '/usr/lib/mysql/plugin/ha_tokudb.so' (errno: 17, cannot open shared object file: No such file or directory)
[17:32:51] when I configure it in my.cnf
[17:35:44] ottomata: yessir
[17:35:54] ottomata: it is on wikitech, let me see
[17:36:21] ottomata: /etc/init.d/mysql start --default-storage-engine=tokudb
[17:36:32] https://wikitech.wikimedia.org/wiki/Analytics/Systems/EventLogging/TestingOnBetaCluster
[17:36:55] ottomata: let me try to restart mysql on beta
[17:37:17] naw nuria_, something is wrong
[17:37:19] Can't open shared library '/usr/lib/mysql/plugin/ha_tokudb.so'
[17:37:26] i've got it configured as default storage
[17:37:32] ottomata: new machine or old machine?
[17:37:35] new machine
[17:37:39] old machine is locked
[17:37:40] can't get in
[17:37:44] ottomata: yes
[17:37:57] ottomata: what is the new machine's name?
[17:38:25] deployment-eventlog02
[17:39:31] i'm going to proceed for now without toku
[17:40:08] ottomata: i think insertions will not work
[17:40:15] ? they'll work
[17:40:16] sure
[17:40:18] they work in vagrant
[17:40:21] no toku there
[17:40:46] ottomata: ok, the el code specifies that engine but maybe it doesn't fail, ok
[17:41:16] it falls back to innodb
[17:41:46] k
[17:41:47] pretty sure
[17:42:39] ottomata: does this machine have a different os?
[17:42:53] no
[17:42:54] trusty
[17:43:11] aham
[17:43:13] what, innodb not supported?!
[17:44:00] wait no
[17:44:03] it's ok, i had it misconfigured
[17:44:04] cool.
[17:44:42] ottomata: k, you installed mysql w/o puppet right?
[17:45:41] ya
[17:45:49] mariadb-server-5.5
[17:46:48] ha, sigh, scap never works
[17:46:52] when i want it to :)
[17:48:58] ottomata: it never works on beta labs
[17:50:02] ottomata: k, let me know when you are done testing and i will try to install tokudb, i think i did this on the older one
[17:52:46] wow nuria_ no wonder the old one kept filling up
[17:52:54] ottomata: yes?
[17:53:16] lots of events
[17:53:20] all those mobilewikiapp things
[17:53:28] ottomata: ya, especially page create, they are driven by tests
[17:54:31] ottomata: what could we do here? disable the mysql consumer so they only go to files?
[17:54:54] it goes to files too
[17:55:10] ottomata: turn off el and only turn it on at will?
[17:56:50] sample?
[17:56:56] naw that would be annoying
[17:57:03] have them not send so many events?
[17:59:34] ottomata: code is the same
[18:00:24] ottomata: so they are sending at an equal rate to prod events, more so, as tests execute some code over and over
[18:00:50] aye yikes
[18:00:59] ottomata: we just need to run the eventlogging cleaner script with less than 1 week
[18:01:14] ottomata: and delete anything that is older
[18:01:18] ottomata: right?
[18:01:47] ottomata: files will be moved to archive and maybe we can also delete those on a more frequent schedule
[18:02:21] ottomata: does running the el cleaner script seem like a sane solution?
[18:02:48] ottomata: could we set it up in puppet so it runs on beta with a smaller frequency? even in a cron
[18:04:07] ya
[18:05:19] ottomata: ok, let's do that then
[18:05:26] ottomata: should i file a ticket?
[18:05:45] ya sure
[18:07:21] Analytics, Analytics-EventLogging: Run eventlogging purging script on beta labs to avoid disk getting full - https://phabricator.wikimedia.org/T171203#3457322 (Nuria)
[18:07:25] ottomata: k, done
[18:10:35] yagh, so i'm trying to test the whole pipeline, and the mirror maker from main to analytics kafka clusters doesn't seem to be working in beta right now
[18:10:36] yarrr
[18:13:20] ottomata: ARGH
[18:13:32] ottomata: that was working as of me testing page-create fixes last week
[18:13:43] oh really?
[18:13:45] that's cool
[18:13:49] hm
[18:13:49] then whyyyy
[18:13:50] ottomata: as in
[18:13:53] eventbus to main kafka is working
[18:13:56] ottomata: i saw events flowing in np
[18:14:01] they're just not going to the kafka cluster the eventlogging subscribes to
[18:14:53] AH https://deployment.wikimedia.beta.wmflabs.org/wiki/Test03
[18:14:55] ah
[18:14:56] wait i think it's going now
[18:15:04] ok, mirror maker needed a bump on both servers
[18:17:19] AHHH
[18:17:23] ottomata: k, good, cause otherwise .. man
[18:17:24] i need to delete the other el box
[18:17:29] nuria_: i'm doing that, s'ok?
[18:17:32] ottomata: yes please, i will update docs
[18:17:34] it's actually running, and consumers there are balancing
[18:17:40] taking over my new ones :)
[18:18:07] !log deleted instance deployment-eventlogging03 in favor of new instance deployment-eventlog02
[18:18:08] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[18:18:18] ottomata: k, i will also try to install tokudb when you are done with testing
[18:20:25] nuria_: ungh, i don't know what is happening, now it's not working
[18:20:29] eventbus says it's working fine
[18:20:38] but i can't consume the data from kafka, even from the main kafka cluster
[18:20:41] ottomata: but events are not consumed on the EL box?
[18:20:43] i can see eventbus getting my events fine
[18:20:53] no i can't even consume in kafka
[18:20:56] it was working
[18:20:57] it just stopped
[18:21:06] ottomata: and kafkacat on the new box can't connect either?
[18:21:18] ottomata: argh
[18:21:21] it can connect
[18:21:24] just doesn't see any messages
[18:21:28] they've got to be there though
[18:21:59] ottomata: have they been consumed by the 'other' box?
[18:22:12] doesn't work like that :)
[18:22:14] ottomata: no, that is not
[18:22:17] ottomata: ya
[18:22:29] i was like -> retardation
[18:23:57] man
[18:23:57] ERROR [Replica Manager on Broker 4]: Error processing fetch operation on partition [eqiad.mediawiki.page-create,0] offset 6140 (kafka.server.ReplicaManager)
[18:23:57] java.lang.IllegalArgumentException: Attempt to read with a maximum offset (6139) less than the start offset (6140).
[18:24:02] labs boxes just can't sit
[18:24:04] they get so rusty so fast
[18:24:29] ottomata: this is SO painful
[18:24:42] gonna try to wipe the main kafka cluster
[18:27:07] oh shoot, change prop uses this cluster too
[18:28:01] ottomata: but in beta
[18:28:10] ottomata: it should be wipeable
[18:28:54] yaa
[18:28:59] just not sure if they will need to bump them
[18:33:14] Analytics, Android-app-feature-Compilations-v1, Reading-Infrastructure-Team-Backlog: Track zim file downloads - https://phabricator.wikimedia.org/T171117#3454614 (Mholloway) Edited slightly for clarity. The compilations endpoint we're building is really technically something more like a compilation...
[18:39:51] Analytics, Android-app-feature-Compilations-v1, Reading-Infrastructure-Team-Backlog: Track zim file downloads - https://phabricator.wikimedia.org/T171117#3454614 (Nuria) >Analytics will probably have some ideas on how to count files served from wherever that ends up happening. For reader data such as...
[18:40:43] FINALLY
[18:40:44] did it nuria
[18:40:45] haha
[18:40:58] i was going crazy because i was producing revision create events but not seeing them in mysql
[18:41:03] but forgot that we don't produce them!
[18:41:04] :p
[18:41:12] ok!
[18:41:16] index creation tested!
[18:41:17] it works!
[18:43:28] what a day ottomata
[18:43:41] sorry, don't write them to mysql
[18:43:58] ottomata: SUPER thanks, on my end i will update docs to point to mysql, sorry about my senior moments with this
[18:44:13] haha, i had lots too
[18:44:18] oh no, there were a lot of things wrong
[18:44:23] wiping the kafka cluster helped :)
[18:44:27] ottomata: unrelated question:
[18:44:42] the dumps cluster is not fronted by varnish right?
[18:45:17] dumps.wm.org? pretty sure it's not a 'cluster' :)
[18:45:22] but yes, i think it is
[18:45:26] not certain
[18:45:27] but i think so
[18:45:33] why?
[18:46:13] cause someone was asking
[18:46:23] Analytics, Android-app-feature-Compilations-v1, Reading-Infrastructure-Team-Backlog: Track zim file downloads - https://phabricator.wikimedia.org/T171117#3457425 (Mholloway) I'm not sure what the caching situation will be. These files can be up to 10+ GB in size (I think we're saying up to as large...
[18:46:25] ottomata: i have to say i have not seen those requests in webrequest
[18:47:33] ottomata: ya, not there
[18:47:53] hmm ya maybe
[18:47:59] nuria_: we do copy the apache access logs to stat1005
[18:48:06] ottomata: so it is probably not fronted by varnish
[18:48:52] HMMMM
[18:48:55] or we should be...
[18:48:56] hmmm
[18:49:02] it's on stat1002
[18:49:04] will look into that
[18:49:34] ottomata: merging changeset for EL
[18:49:50] ottomata: ah you just did it
[18:49:56] :D
[18:49:57] ottomata: want me to deploy it to prod?
[18:50:10] ottomata: or are you doing that too?
[18:50:13] done it
[18:50:15] nuria_: hm.
[18:50:18] so here's a thing
[18:50:25] the eventlogging mysql-eventbus consumer
[18:50:28] is going to need a restart
[18:50:30] if a new schema is added
[18:50:31] hmmm
[18:50:34] whatatata
[18:50:36] i'll have to add a sighup handler
[18:50:42] like eventlogging service uses
[18:50:42] ottomata: why?
[18:50:48] it loads the schemas from disk during init
[18:50:58] ottomata: but wait, EL doesn't need a restart
[18:51:04] ottomata: if a schema changes
[18:51:12] that's because it looks it up on meta.wm.org
[18:51:30] ottomata: ah sorry, due to schema retrieval, not insertion
[18:51:33] ya
[18:51:40] although, i don't know why the consumer needs to look at the schema
[18:51:42] oh duh
[18:51:49] for mysql table creation
[18:51:49] right.
[18:51:53] ottomata: right
[18:51:59] we can add a sighup handler just like service
[18:52:06] then puppet will reload the schemas without restarting the process
[18:52:21] ottomata: but wait, how would we coordinate those two?
[18:52:28] ?
[18:52:40] nuria_: puppet pulls down the new schema commit to the repo
[18:52:41] and if it does
[18:52:45] it notifies the consumer process
[18:52:49] which does a sighup
[18:52:49] ottomata: ah ok
[18:52:59] then the el code just reloads all schemas from disk
[18:53:06] ottomata: upon restarting
[18:53:12] yes restarting
[18:53:12] but
[18:53:16] sighup does not need a restart
[18:53:22] reload instead of restart
[18:53:28] it just tells the process to reload the schemas into memory
[18:53:40] ottomata: reload, sorry, yes
[18:56:30] ottomata: do we have an example of this code elsewhere? does eventbus do the same to verify events?
[18:57:24] ja
[18:58:28] nuria_: https://github.com/wikimedia/eventlogging/blob/master/bin/eventlogging-service#L85
[18:58:29] also
[18:58:37] https://gerrit.wikimedia.org/r/#/c/366614/
[18:58:38] :)
[18:59:58] ottomata: you know i am remembering this now, we have talked about it before
[19:00:22] ottomata: and the puppet that goes with https://gerrit.wikimedia.org/r/#/c/366614/?
[19:00:59] for the el service it's different, since it is configured to run with systemd
[19:01:04] gotta figure out how to do it with upstart...
[19:01:05] buuuut
[19:01:24] https://github.com/wikimedia/puppet/blob/production/modules/eventlogging/manifests/service/service.pp#L156-L166
[19:01:51] and
[19:01:51] https://github.com/wikimedia/puppet/blob/production/modules/role/manifests/eventbus/eventbus.pp#L98
[19:03:12] Analytics-Kanban, WMF-Legal, Services (watching): License for pageview data - https://phabricator.wikimedia.org/T170602#3457522 (mforns) @GWicke > Licenses differ between end points, so changing the global disclaimer to CC0 doesn't seem accurate. This is why I suggested to add the CC0 license speci...
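A minimal sketch of the SIGHUP-reload approach linked above (assumed path and simplified logic, not the actual eventlogging-service code): puppet sends SIGHUP after pulling a new schema commit, and the handler re-reads the schemas from disk without restarting the process.

    import glob
    import json
    import signal

    SCHEMA_DIR = '/path/to/schema/repo'  # assumed: wherever puppet checks out schemas
    SCHEMAS = {}

    def load_local_schemas():
        # (Re)read every JSON schema under SCHEMA_DIR into memory.
        SCHEMAS.clear()
        for path in glob.glob(SCHEMA_DIR + '/**/*.json', recursive=True):
            with open(path) as f:
                SCHEMAS[path] = json.load(f)

    def handle_sighup(signum, frame):
        # puppet notifies the process after pulling a new schema commit;
        # reload instead of restart.
        load_local_schemas()

    signal.signal(signal.SIGHUP, handle_sighup)
    load_local_schemas()  # initial load at startup

In a tornado-based service like eventlogging-service, the handler would typically just schedule the reload on the IOLoop (e.g. IOLoop.current().add_callback_from_signal(...)) rather than doing file I/O directly inside the signal handler.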
[19:05:56] ottomata: ok, got it
[20:10:10] nuria_: here's the puppet to go with that sighup thing
[20:10:12] https://gerrit.wikimedia.org/r/#/c/366623/
[20:10:19] tested in a hacky way in beta
[20:10:42] no need to review now
[20:10:43] just fyi
[20:15:23] ottomata: i cannot review the puppet side but i just CR-ed eventlogging, i am happy to merge and deploy today if you want
[20:17:09] nuria_: sure, that one can be deployed really safely i think
[20:17:19] the puppet thing i can test more easily once that change is deployed
[20:17:21] in beta too
[20:17:26] although, ungh, deploy doesn't work in beta
[20:17:29] have to pull from the target host :/
[20:18:57] ottomata: we can still test it with maximum craftiness
[20:19:25] i did test it with some pretty crafty craftiness
[20:19:27] :)
[20:22:24] ottomata: ok, should we merge the code on eventlogging-consumer and deploy then?
[20:24:35] (PS2) Nuria: [WIP] Modifying ingestion spec after additions to edit history [analytics/refinery] - https://gerrit.wikimedia.org/r/366327 (https://phabricator.wikimedia.org/T170493)
[20:25:15] yeah
[20:25:17] nuria_: thanks
[20:25:18] that would be great
[20:25:23] ottomata: k
[20:29:57] !Log deploying eventlogging c1c2c39411ccd002ff8cea197bc535155213f5fb and restarting
[20:30:14] !log deploying eventlogging c1c2c39411ccd002ff8cea197bc535155213f5fb and restarting
[20:30:15] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[20:31:49] !log restarting eventlogging on eventlog1001
[20:31:49] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[20:33:08] ottomata: deployed and restarted, tailing logs
[20:33:38] thank you
[20:36:21] nuria_: you are watching the /var/log/upstart/eventlogging_consumer-mysql-eventbus.log log?
[20:36:22] ya?
[20:36:42] ottomata: i was looking at the other one,
[20:36:48] watch this one :)
[20:36:50] i'm going to run
[20:37:35] sudo reload eventlogging/consumer NAME=mysql-eventbus CONFIG=/etc/eventlogging.d/consumers/mysql-eventbus
[20:38:02] ottomata: i restarted them all
[20:38:08] yaya
[20:38:09] ottomata: why would this one need a restart?
[20:38:14] it's not a restart
[20:38:18] i'm going to reload
[20:38:20] AKA sighup
[20:38:31] if you watch the logs, you'll see it reload the local schemas
[20:38:36] you watching?
[20:38:46] ottomata: ah sorry, one sec
[20:39:24] ottomata: that pagecontentsavecomplete is a pain
[20:39:50] * nuria_ watching
[20:40:10] works! :)
[20:40:11] ajajam
[20:41:20] ottomata: ok, looks good
[20:47:36] nice nuria_, puppet reload on schema change works too :)
[20:47:49] thank you!
[20:47:51] that was quick!
[20:47:52] :)
[20:49:54] milimetric, yt? leaving out the topNav links highlighting, were there other concerns about the vue-router that need to be solved for https://phabricator.wikimedia.org/T170459 ?
[21:00:41] Analytics-Cluster, Analytics-Kanban, Patch-For-Review: Audit users and account expiry dates for stat boxes - https://phabricator.wikimedia.org/T170878#3457923 (bearND) In the past life as an Android app dev I used EL quite regularly. Now in Reading Infrastructure I haven't had the need yet but it cou...
[21:05:11] hi mforns
[21:06:17] so in my opinion, we could remove vue-router as a dependency and use just a simple component that mirrors some property from the $store to the url and vice versa
[21:07:43] Analytics-Kanban, Analytics-Wikistats: Cleanup Routing code - https://phabricator.wikimedia.org/T170459#3457962 (Milimetric) In my opinion, we could remove vue-router as a dependency and use just a simple component that mirrors some property from the $store to the url and vice versa. That's what I had i...
[21:12:46] (PS29) Ottomata: Camus JSON datasets -> Hive [analytics/refinery/source] - https://gerrit.wikimedia.org/r/346291 (https://phabricator.wikimedia.org/T161924) (owner: Joal)
[21:14:40] milimetric: ^ i added a bit to write the last mod time of the input dir into the done flag file
[21:14:52] the last mod time of a dir changes if any of its content changes
[21:15:28] so, we can hopefully write something fancy to run jobs based on comparing the last mod time when the refinement happened vs the current last mod time of the input
[21:17:45] oh cool, that's easier and more direct
[21:36:15] milimetric, the reason is reducing the size of the bundle?
[21:37:38] mforns: the main thing to clean up is to centralize the logic that interacts with the URL
[21:37:49] so that components don't need to know how to router.push
[21:38:05] and we figured that if you're centralizing the state that drives that in the $store
[21:38:38] then there's only one place where the router would be used (to do router.push($store.getters.url) or whatever)
[21:38:59] so that means the router becomes kind of useless and it's fairly easy to replace it
[21:39:11] (PS30) Ottomata: Camus JSON datasets -> Hive [analytics/refinery/source] - https://gerrit.wikimedia.org/r/346291 (https://phabricator.wikimedia.org/T161924) (owner: Joal)
[21:39:20] and there are smaller libraries that help with interacting with the URL
[21:39:25] but we don't have to do all that at once
[21:39:50] we can keep the router and just have App.vue watch $store.getters.url and do router.push or something, that's totally fine
[21:39:59] or main.js or whatever
[21:54:23] milimetric, I understand, thanks!
[22:16:42] bye team, cya!
[22:16:54] ottomata: for the life of me i cannot find the application id of the indexing job that just failed
[22:16:56] ottomata:
[22:17:00] https://www.irccloud.com/pastebin/GdpjIh7c/
[22:45:56] Analytics-Tech-community-metrics: Empty or incorrect data on Kibana's "Git-Demographics" dashboard - https://phabricator.wikimedia.org/T171240#3458588 (Aklapper)
[22:53:23] ottomata: well, found the error in the indexing logs in druid but i just do not know what could be wrong with the new format
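A sketch of the last-mod-time idea from 21:14 above (a local-filesystem stand-in with assumed paths and flag format; the real refinement job would use the Hadoop filesystem API against HDFS): record the input directory's mtime in the done-flag when refining, and later re-run only if the directory has been modified since.

    import os

    def dir_mtime(path):
        # A directory's mtime updates when entries are added, removed or renamed.
        return int(os.path.getmtime(path))

    def needs_refinement(input_dir, flag_path):
        # True if input_dir changed since the recorded refinement time.
        if not os.path.exists(flag_path):
            return True
        with open(flag_path) as f:
            refined_mtime = int(f.read().strip())
        return dir_mtime(input_dir) > refined_mtime

    def write_done_flag(input_dir, flag_path):
        with open(flag_path, 'w') as f:
            f.write(str(dir_mtime(input_dir)))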