[03:49:27] Quarry: Show desktop notification when a query is done - https://phabricator.wikimedia.org/T124625#1960941 (APerson) NEW
[07:41:59] (CR) Madhuvishy: [C: 2 V: 2] "LGTM! (Although I wish we din't have to look through all of refined webrequest for this data and computed stuff on streams ;))" [analytics/refinery] - https://gerrit.wikimedia.org/r/259662 (https://phabricator.wikimedia.org/T118938) (owner: Joal)
[07:46:43] (CR) Madhuvishy: Divide app session metrics job into global and split (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/264292 (https://phabricator.wikimedia.org/T117615) (owner: Mforns)
[07:55:36] (CR) Madhuvishy: "Minor style thing - LGTM otherwise." (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/264297 (https://phabricator.wikimedia.org/T117615) (owner: Mforns)
[10:13:18] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [30.0]
[10:18:28] elukey, yt?
[10:18:36] joal, yt?
[10:18:36] I was about to ping you :)
[10:18:38] https://grafana.wikimedia.org/dashboard/db/eventlogging
[10:18:44] yes I know... hehehe
[10:18:46] Hi lads :)
[10:18:46] the ops team has restarted two kafka nodes
[10:18:53] it seems kafka has some problems no?
[10:18:55] the GC logs were 20GBs
[10:19:06] wow
[10:19:09] Unable to connect to broker kafka1012.eqiad.wmnet:9092
[10:19:17] sounds bad
[10:20:03] I would restart eventlogging, but I'm afraid of starting the consumers with it
[10:20:06] kafka14 seems the only one still alive (from grafana charts)
[10:20:14] mforns: wait a bit
[10:20:17] ok
[10:21:00] let's jump to -ops
[10:21:03] sure
[10:37:26] guys I am trying kafka topic --describe on 1020
[10:37:39] but it tells me: Missing required argument "[zookeeper]"
[10:37:42] any idea
[10:37:43] ?
[10:39:22] elukey, no
[10:47:53] joal, do you know if I restart eventlogging, will consumers start with it?
[10:48:04] I don't know how andrew cut them off..
[10:48:17] mforns: I think so, but can't be sure ...
[10:51:20] https://grafana.wikimedia.org/dashboard/db/eventlogging?panelId=8&fullscreen - this is the only thing that I don't like..
[10:51:35] why do we need to restart EL in this case?
[10:52:46] mforns: have you found your answer?
[10:53:05] elukey, joal, no, I don't get why processors do not spit logs..
[10:53:26] mforns: yeah they seem to be stuck a while ago on the connection refused
[10:53:38] aha
[10:53:40] * elukey sees that Burrow is not happy
[10:53:50] ??
[10:53:52] that's why I thought of restarting
[10:54:14] mforns: I'm looking at puppet code to see if consumers have stopped in there
[10:54:25] joal, good idea
[10:56:28] mforns: seems ok to restart
[10:56:33] ok will do
[10:57:25] !log restarted EventLogging to overcome connection problems with Kafka
[10:58:14] joal: https://github.com/wikimedia/operations-puppet/commit/2d950c97ce26b8e29017af647b19c80cf015b309
[10:58:31] indeed elukey
[10:58:33] ah sorry I missed your comment :)
[10:58:43] :)
[10:59:16] joal, elukey, it seems EL is back: https://grafana.wikimedia.org/dashboard/db/eventlogging
[11:00:01] right mforns
[11:00:09] Analytics-Tech-community-metrics, pywikibot-core, DevRel-January-2016: Statistics for SCM project 'core' mix pywikibot/core, mediawiki/core and oojs/core - https://phabricator.wikimedia.org/T123808#1961419 (Dicortazar) Thanks for the issue @jayvdb !. I'm reassigning this to @Lcanasdiaz as he'll be in...
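[Editorial note: elukey's `kafka topic --describe` above fails because 0.8-era Kafka topic tooling is driven entirely through ZooKeeper, so --describe refuses to run without a connect string. A minimal hedged sketch, assuming the upstream kafka-topics tool; the ZooKeeper host/chroot and topic name below are placeholders, not values from this log.]

```bash
# Hypothetical connect string, for illustration only.
ZK='conf1001.eqiad.wmnet:2181/kafka/eqiad'

# Per partition this prints Leader, Replicas and Isr -- enough to see which
# brokers currently lead and whether restarted ones have rejoined the ISR.
kafka-topics.sh --describe --zookeeper "$ZK" --topic webrequest_text
```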
[11:00:29] Analytics-Tech-community-metrics, pywikibot-core, DevRel-January-2016: Statistics for SCM project 'core' mix pywikibot/core, mediawiki/core and oojs/core - https://phabricator.wikimedia.org/T123808#1961421 (Dicortazar) a:Dicortazar>Lcanasdiaz
[11:00:43] mmm what graph are you looking at? I am still waiting for https://grafana.wikimedia.org/dashboard/db/eventlogging?panelId=8&fullscreen to drop (but I believe it'll take a bit for grafana to show)
[11:00:55] if logs are ok on the host I am good :)
[11:01:35] elukey, I removed the -5m from the "now-5m" in the time range selector
[11:02:00] mforns knows the tricks :)
[11:02:19] * elukey hates grafana a bit now
[11:02:27] hehehehe
[11:02:28] https://grafana.wikimedia.org/dashboard/db/eventlogging?panelId=8&fullscreen - And we have lift off!
[11:04:51] Analytics-Kanban: Rotate kafka GC logs - https://phabricator.wikimedia.org/T124644#1961429 (JAllemandou)
[11:06:39] Thanks elukey and mforns for the fast reaction and actions :)
[11:07:28] thank *you* guys, I was feeling I could do nothing about it :o
[11:08:33] first outage together :D
[11:08:48] hehe cool! :]
[11:09:36] Now the only thing I'm worried about is varnish-kafka
[11:10:14] We had some trouble with ottomata last time we had outages like that (leader change, and varnish-kafka didn't manage to cope IIRC)
[11:11:13] mmm
[11:11:22] joal: do you remember what was the issue? Otherwise we can ask bblack
[11:13:03] also: ema needs to merge https://gerrit.wikimedia.org/r/#/c/266209/, that is basically the change to /etc/default/kafka done manually on 2/3 of the kafka nodes. The last one standing has a giant GC log too to be removed :)
[11:13:21] oh
[11:13:30] I don't see a major problem but we'd need to restart the last node, so before it we'd need to make sure that varnish-kafka is ok
[11:13:35] elukey: volume of bytes in in charts seems to match: let's not worry, I'll check on webrequest partitions stats see if any trouble
[11:17:01] guys, ema is gonna merge the patch in puppet, then restart nodes
[11:18:09] joal: one thing - https://grafana.wikimedia.org/dashboard/db/kafka?panelId=12&fullscreen
[11:18:20] it probably means we'll need to restart eventlogging again mforns
[11:18:22] any idea why 1020 dropped
[11:18:28] ?
[11:18:49] joal, ok, can you ping me?
[11:19:08] will do mforns
[11:20:00] elukey: I think it's a leader issue: having been restarted, 1020 and 1012 are not leaders anymore, therefore don't send bytes
[11:20:50] okok because the metric wasn't super clear
[11:21:11] all right I am going to ask ema to proceed
[11:21:24] elukey: You also can see on that chart 14 and 18 have bumped in bytes out --> they replace 12 and 20 as leaders
[11:21:36] yep yep
[11:21:39] all good :)
[11:21:45] just wanted to triple check
[11:21:49] elukey: told him I was ok, but you're the ops in charge :)
[11:22:09] ahahahah no no sorry I didn't want to overstep
[11:22:25] I had him in chat waiting this is why
[11:22:47] elukey: in absence of ottomata, you're the ops in charge now :) I give my opinion, you make the calls :)
[11:27:48] RECOVERY - Difference between raw and validated EventLogging overall message rates on graphite1001 is OK: OK: Less than 20.00% above the threshold [20.0]
[11:29:05] \o/
[11:30:24] elukey: how are we doing on merging / restarting ?
[11:32:18] we need to review https://gerrit.wikimedia.org/r/#/c/266209/1 (git submodule update) - it should be straightforward but it is the first time
[11:33:23] elukey: you do it or do you want me to do it?
[11:33:59] if you have done it before please go ahead
[11:34:26] elukey: I have not, but should be ok though :)
[11:37:11] elukey: In fact, you'll do it :)
[11:37:16] muhahahaha :)
[11:37:30] elukey: I don't have +2 in ops
[11:37:36] me too
[11:38:11] Arf
[11:38:22] You can give a +1 then, and ask them to merge :)
[11:38:50] Talked with ema, and the submodule patch has already been merged, so we can happily go for the main merge
[11:38:54] elukey: --^
[11:59:27] we are waiting for puppet on 1018
[11:59:34] then restart
[11:59:45] then ema will re-enable puppet on 1020 and 1012
[12:04:37] mforns: kafka1018 will restart soon, EL restart probably needed (I can do it if you want, not to bother you more :)
[12:05:45] joal, I'll do it
[12:05:47] :]
[12:06:42] mforns: kafka restarted on kafka1018, restart done
[12:07:29] oh, ok
[12:08:20] !log restarted EventLogging to sync with kafka1018 restart
[12:18:07] Analytics-Kanban: Rotate kafka GC logs - https://phabricator.wikimedia.org/T124644#1961543 (ema) The puppet changes have been merged. I have re-enabled puppet on kafka1020 and kafka1012. On kafka1018, where the size of kafkaServer-gc.log was also starting to become a problem, I have restarted kafka after p...
[12:18:37] RECOVERY - Throughput of event logging NavigationTiming events on graphite1001 is OK: OK: Less than 15.00% under the threshold [1.0]
[12:19:05] Analytics-Tech-community-metrics, Developer-Relations, DevRel-March-2016: Who are the top 50 independent contributors and what do they need from the WMF? - https://phabricator.wikimedia.org/T85600#1961551 (Aklapper)
[12:21:22] all right 1020 and 1012 have puppet running
[12:21:30] all good now
[12:21:59] I believe that we should receive alarms in here about kafka
[12:22:48] brb lunch!
[12:25:51] joal, mforns: I'll write an email to -internal if you want, with all the details
[12:26:04] and the phab references
[12:26:39] cool elukey
[12:26:50] I agree, kafka alarms would make sense in here
[12:27:01] yep I'll try to add them
[12:27:04] as part of the follow up
[12:27:07] also elukey, I think there still are other machines to reapply, no ?
[12:27:38] so 1012 and 1020 got the update by ema with puppet disabled
[12:27:48] We've done 12, 18, 20 -> there is 13, 14 and 22 to go :)
[12:28:01] yup, and ema just did 18
[12:28:04] ah snap we have 6 nodes???
[12:28:14] :( :(
[12:28:51] all right let me check their status
[12:29:53] 1013: /dev/md0 28G 19G 7.1G 73% /
[12:30:22] 14: /dev/md0 28G 24G 2.1G 93% /
[12:30:47] 22: /dev/md3 28G 24G 2.7G 90% /
[12:31:50] they already have KAFKA_OPTS="-XX:GCLogFileSize=50M -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5"
[12:32:01] so we should just restart them
[12:33:03] right elukey
[12:34:13] elukey: one after the other, and with possible rest time in between to facilitate partition sync :)
[12:37:01] yep sure
[12:38:48] joal, mforns: restarted 1014
[12:39:12] elukey, do we need to restart EL?
[12:39:32] mforns: let's wait for the 3 restarts instead?
[12:39:52] I see
[12:40:34] mforns: no dataloss while waiting, right ?
[12:40:54] joal, not now
[12:41:11] That's what I thought
[12:41:12] but before, when events weren't making it into Kafka yes, I guess
[12:42:13] oh! this week it's me on ops-duty!
[12:42:34] hehehe, cool, I'll write the incident page
[12:42:44] mforns: before, you mean when using 0mq?
[12:42:47] :)
[12:42:59] no, before, when we had the kafka outage
[12:43:20] only 1020 and 1012 were broken, so I think we have not lost any data :)
[12:43:27] The power of replication :)
[12:43:33] joal, oooh, makes sense
[12:44:00] so when we restarted the other nodes, they all resync'd?
[12:44:12] Will still be good to monitor, but I think we are ok :)
[12:44:19] they did :)
[12:44:27] ok, as part of ops-duty I'll also check the db
[12:45:06] mforns: looking at 6 hours behind in https://grafana.wikimedia.org/dashboard/db/kafka is interesting to notice what happened
[12:46:38] joal, I don't understand these graphs, they don't seem to match the problem we had..
[12:48:00] batcave ?
[12:48:11] sure
[12:49:40] 1018: /dev/md0 28G 2.8G 24G 11% / :)
[12:50:01] joal: I don't think that we need to relaunch a leader election, I am seeing three leaders in --describe
[12:50:09] maybe in the end of the restarts?
[12:52:32] 1014 and 1018 seem not leader at all
[12:52:39] https://grafana.wikimedia.org/dashboard/db/kafka?panelId=20&fullscreen
[12:53:21] I would launch kafka preferred-replica-election
[12:54:01] helllooooo
[12:54:08] :D
[12:54:22] elukey: wait for having restarted the other nodes :)
[12:54:29] and elect after
[12:55:13] all right :)
[12:56:34] elukey: you can continue to restart
[12:56:36] :)
[12:58:09] yep already doing 13
[12:59:35] /dev/md0 28G 2.9G 24G 11% /
[13:01:56] I'll wait 5 min then I'll restart 22
[13:08:37] 1022 restarted /dev/md3 28G 3.1G 23G 12% /
[13:09:25] joal, mforns: restarts completed, I am going to run kafka preferred-replica-election on 1022
[13:09:36] elukey, will restart EL then
[13:09:39] elukey: ensure partitions in sync first :)
[13:09:43] ok
[13:10:41] joal: isr has three brokers for each topic
[13:10:43] as the other nodes :
[13:10:44] :)
[13:10:50] cool
[13:11:22] elukey: just double checked that you had that checked before reelecting leaders
[13:11:53] yep, even before restarting
[13:12:12] one question: should I run sudo kafka preferred-replica-election on one of the nodes?
[13:12:27] https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Kafka/Administration#Safe_Broker_Restarts is not super clear
[13:12:46] reelection is needed, yes - The command can happen in any node
[13:13:09] reelection happens for the full cluster, so it just needs to be launched once
[13:14:18] mmmmm it doesn't work
[13:14:40] it prints the help
[13:14:53] error in command :)
[13:15:28] ah not with sudo
[13:15:29] :)
[13:15:47] all right all done
[13:16:12] mforns: restaaaaaaaaaaart :)
[13:16:15] ok!
[13:16:46] !log restarted EventLogging after Kafka leader reelection
[13:20:45] joal, mforns: would you mind if I grab something to eat?
[13:20:50] brb in 40mins
[13:20:55] np at all!
[13:21:01] elukey: please :)
[13:21:17] all right, everything seems good from the dashboards :)
[13:21:20] talk with you in a bit!
[13:21:26] elukey: first outage solved, please have some food ;)
[13:21:33] \o/
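[Editorial note: for readers following along, here is a hedged sketch of the "safe broker restart" routine being applied above — restart one broker, wait for the ISR to be whole again, repeat, then rebalance leaders once at the end. Script, service and host names are assumptions about this 0.8-era setup, not commands lifted from the wikitech page elukey cites.]

```bash
ZK='conf1001.eqiad.wmnet:2181/kafka/eqiad'   # placeholder connect string

# one broker at a time, with rest time in between to facilitate partition sync
sudo service kafka restart

# a partition is healthy when its Isr list shows all three replicas again;
# keep polling --describe until nothing is under-replicated
kafka-topics.sh --describe --zookeeper "$ZK" | grep 'Isr:'

# after the last broker: trigger the reelection once, from any node --
# it acts on the whole cluster
kafka-preferred-replica-election.sh --zookeeper "$ZK"
```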
[13:48:53] a-team, I'm away for some time, see you in a bit
[13:49:01] ok joal, cya
[14:08:26] (PS1) MarkTraceur: Fix cross-wiki upload script [analytics/limn-multimedia-data] - https://gerrit.wikimedia.org/r/266224
[14:09:44] (Abandoned) MarkTraceur: Flesh out SQL, tweak configuration [analytics/limn-multimedia-data] - https://gerrit.wikimedia.org/r/237134 (owner: MarkTraceur)
[14:17:40] Analytics-Kanban, Research-and-Data: Research Spike: Article Title normalization contains weird chars [8 pts] {hawk} - https://phabricator.wikimedia.org/T108867#1961722 (mforns) I wrote a page in Wikitech commenting the conclusions I drew: https://wikitech.wikimedia.org/wiki/Analytics/Data/Redirects
[14:18:03] back!
[14:34:37] https://grafana.wikimedia.org/dashboard/db/kafka looks very nice
[14:35:16] mforns: I've read above that you are planning to add a wikipage with a post-mortem
[14:35:19] danke
[14:35:51] helllooooooooooo ottomata :)
[14:36:37] hiyyaa
[14:36:53] I am going to send an email with all the details so you guys will be up to speed
[14:37:01] 10/15 mins and it should be ready
[14:40:37] woah kafka funkyness happened?
[14:40:42] yesss
[14:40:54] ENOSPACE left
[14:41:22] oh is that what this is about?
[14:41:23] https://gerrit.wikimedia.org/r/#/c/266203/
[14:41:43] yikes
[14:41:50] no alarms at all?
[14:41:50] wow
[14:43:47] yes!
[14:46:45] Analytics-Kanban, DBA, Patch-For-Review: Pending maintenance on the eventlogging databases (db1046, db1047, dbstore1002, other dbstores) - https://phabricator.wikimedia.org/T120187#1961748 (Ottomata) Cool, totally missed this on Saturday! Starting them now...
[14:51:51] Analytics-Kanban: Rotate kafka GC logs - https://phabricator.wikimedia.org/T124644#1961749 (elukey) p:Unbreak!>High
[14:54:40] !log restarting eventlogging mysql consumers after TokuDB conversion has finished
[14:55:17] ottomata: email sent
[14:55:26] ah k
[14:55:33] oh thats what you were sending email about
[14:55:34] k
[15:05:16] hmmmm
[15:05:21] i just restarted el mysql consumers
[15:05:33] and i am worried that they did not start from where they left off, but from the latest offset
[15:07:25] Analytics-Kanban: Rotate kafka GC logs - https://phabricator.wikimedia.org/T124644#1961774 (elukey) 1014, 1018 and 1022 Kafka brokers were restarted and cleaned from the huge logs. Last action was to run kafka preferred-replica-election to rebalance the partition leaders. EL was restarted again.
[15:08:20] :(
[15:08:49] is there an SQL query that we can run later on to quickly identify if there is a hole or not?
[15:11:27] yeah i'm seeing data inserted for now
[15:11:35] but none is in the db from over the weekend
[15:11:56] i'm currently just going to find an offset from shortly before when we stopped the mysql consumers
[15:12:02] and set up a separate process to backfill from three
[15:12:03] there
[15:12:44] I would be curious to know how you are going to start the process, but only when you have time
[15:17:26] Analytics-Tech-community-metrics, DevRel-January-2016: Make GrimoireLib display *one* consistent name for one user, plus the *current* affiliation of a user - https://phabricator.wikimedia.org/T118169#1961804 (Lcanasdiaz) I thought having more than one affiliation at top-contributors was a requirement and...
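[Editorial note: on elukey's question above about spotting a hole with SQL — EventLogging tables carry an EL-style YYYYMMDDHHMMSS timestamp column, so bucketing counts per hour makes a weekend-sized gap obvious. A hedged sketch: the host and table names are real names from this log, but the column format is an assumption.]

```bash
mysql -h analytics-store.eqiad.wmnet log -e "
  SELECT LEFT(timestamp, 10) AS hour, COUNT(*) AS events
  FROM Edit_13457736
  GROUP BY hour
  ORDER BY hour;"
# hours with zero or sharply lower counts than their neighbours mark the gap
```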
[15:28:20] elukey: i'm not totally sure how yet, i thought i could pass an offset from which to start to the pykafka consumer
[15:28:56] ah so you want to kick off manually a python process with ad hoc parameters
[15:29:04] yes
[15:33:44] hm, ok, i can't seem to do it, but i can consume from kafka and pipe to eventlogging consumer stdin...
[15:33:52] might use kafkatee to do this, so i can save offsets
[15:42:43] (PS1) MarkTraceur: Expand to run on all wikis [analytics/limn-multimedia-data] - https://gerrit.wikimedia.org/r/266240
[15:51:52] Analytics, MediaWiki-extensions-WikimediaEvents, The-Wikipedia-Library, Wikimedia-General-or-Unknown, Patch-For-Review: Implement Schema:ExternalLinkChange - https://phabricator.wikimedia.org/T115119#1961889 (Sadads) @Legoktm I don't think its an issue, persay, we are really mostly concerned a...
[15:54:12] Analytics-Tech-community-metrics, DevRel-January-2016: Make GrimoireLib display *one* consistent name for one user, plus the *current* affiliation of a user - https://phabricator.wikimedia.org/T118169#1961892 (Aklapper) >>! In T118169#1961804, @Lcanasdiaz wrote: > I thought having more than one affiliatio...
[16:00:41] ottomata: do we have the event bus meeting?
[16:01:19] yes
[16:06:14] nuria: coming?
[16:42:46] ottomata: can you undo changes on puppet and restart EL?
[16:42:58] undo changes?
[16:43:24] ottomata: I thought you disabled puppet?
[16:43:31] nooo
[16:43:34] eh?
[16:43:45] nuria: mysql consumers are running now
[16:43:54] but, they did not backfill automatically like they should have
[16:43:56] not yet sure why
[16:43:59] right now i'm trying to backfill
[16:44:22] mmm.. did they start from a different offset?
[16:44:27] than you expect them to do?
[16:44:37] they seemed to have started from the end...today
[16:45:01] before the eventbus meeting, i was trying to backfill directly from kafka, but it seemed to only partially backfill? not yet sure
[16:45:06] am investigating
[16:45:12] i may try to backfill from all-events.log files
[16:52:17] ottomata: I see, but consumers were stopped correct? did you look at whether they might have started and thus recommit offsets?
[16:52:25] ottomata: logs should be able to tell us that
[16:53:14] ottomata: if puppet was not stopped could a run of puppet have restarted consumers?
[16:54:41] nuria consumers were stopped
[16:54:43] i puppetized their removal
[16:54:46] over the weekend
[16:54:51] only today did I repuppetize them
[16:54:54] puppet was not stopped
[16:54:56] ottomata: ah
[16:54:58] mforns: Hey, mind looking at some of my limn-multimedia-data patches so I can start recalculating some data overnight?
[16:55:13] ottomata: i see
[16:55:18] ottomata: then i do not get it
[16:55:24] nuria: ja, haven't yet looked into why consumers did not start from where they left off
[16:55:32] MarkTraceur, sure! please, add me as a reviewer :]
[16:55:33] perhaps offsets get stale and are removed if no commits come in?
[16:55:35] KK
[16:55:43] i'm just trying to get backfilling started asap right now
[16:55:51] ottomata: i have stopped consumers before without trouble i think, not for this long though
[16:55:55] right
[16:55:55] {{done}}
[16:55:56] same
[16:55:59] thx MarkTraceur
[16:56:10] No, thank you! :)
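[Editorial note: what ottomata describes above — consuming from a chosen offset and piping into the consumer's stdin — could look roughly like this. He mentions kafkatee; the sketch substitutes kafkacat purely because it takes an explicit start offset. The topic, partition, offset and mysql DSN are placeholders, not values from the log.]

```bash
kafkacat -C -b kafka1014.eqiad.wmnet:9092 \
  -t eventlogging-valid-mixed -p 0 \
  -o 123456789 -e |                      # start at the pre-stop offset, exit at end
eventlogging-consumer stdin:// \
  'mysql://user:pass@m4-master.eqiad.wmnet/log'   # placeholder DSN
```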
[16:56:40] MarkTraceur, I have a couple meetings for the next 2 hours, but I'll try to do it in between
[16:56:49] Sure
[16:57:02] If milimetric shows up at all I might bother him about it
[16:57:25] But I guess it's not a huge rush, I just don't have any other reviewers on the repository yet...I guess I should probably add some
[17:01:48] Analytics-Kanban, operations, HTTPS, Patch-For-Review: EventLogging sees too few distinct client IPs {oryx} [8 pts] - https://phabricator.wikimedia.org/T119144#1962106 (Nuria) Open>Resolved
[17:07:54] Analytics-Kanban, DBA, Patch-For-Review: Pending maintenance on the eventlogging databases (db1046, db1047, dbstore1002, other dbstores) - https://phabricator.wikimedia.org/T120187#1847396 (Nuria) We are backfilling events that were not inserted during scheduled maintenance window.
[17:18:27] * MarkTraceur waves at milimetric
[17:18:37] I added you on some patches, mforns too, y'all can fight over them
[17:18:47] I have to delete some tainted data after at least one of them gets merged
[17:20:22] (CR) Madhuvishy: Add split-by-os argument to AppSessionMetrics job (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/264297 (https://phabricator.wikimedia.org/T117615) (owner: Mforns)
[17:21:49] Analytics-Kanban, Patch-For-Review: Remove queue of batches from EventLogging [8 pts] {oryx} - https://phabricator.wikimedia.org/T121151#1962161 (Nuria) Open>Resolved
[17:22:33] joal: ottomata as an update: 1. I wrote an LDAP authenticator for JupyterHub, and it also allows us to restrict access by group, 2. Ops didn't want us using npm / node stuff directly yet, so I just rewrote the nodejs parts of jupyterhub yesterday :D 3. Only thing left for fully securing it is a docker api whitelisting proxy, but that can probably be done this week.
[17:22:50] awesooome
[17:23:02] YuviPanda: i haven't worked on the spark thing since the all staff
[17:23:07] YuviPanda: you ROKC !
[17:23:17] :D
[17:23:27] ROCK, ROKC, ORCK, whatever :)
[17:23:32] so right not
[17:23:34] *now
[17:23:38] you still need an ssh tunnel to access it
[17:23:47] because I don't know if misc-varnish allows websockets through
[17:23:54] I didn't know I could do that !
[17:24:20] do what?
[17:24:36] ssh tunnel and use
[17:24:38] ah
[17:25:21] Meetings tonight, but I'll bug you to know how :)
[17:25:24] YuviPanda: --^
[17:25:44] ssh -L 8000:analytics1021.eqiad.wmnet:8000 stat1003.eqiad.wmnet
[17:25:51] if you do that on your console right now
[17:25:55] and then hit
[17:25:59] localhost:8000
[17:26:06] you will actually see a jupyterhub
[17:26:11] that you can authenticate via LDAP
[17:26:22] it doesn't spawn yet, working on that next
[17:26:36] Thanks a million YuviPanda :)
[17:28:48] mforns: am going to rest of ops meeting...
[17:29:01] ottomata, ok
[17:29:12] ottomata, when do you want to try and backfill
[17:29:15] ?
[17:31:23] mforns: i'm prepping now
[17:31:43] copied and trimming some files on eventlog1001 in /srv/backfill_files/otto-2016-01-25
[17:31:52] (CR) Mforns: Divide app session metrics job into global and split (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/264292 (https://phabricator.wikimedia.org/T117615) (owner: Mforns)
[17:32:13] ottomata, do you want me to skip tasking?
[17:32:35] Analytics, Editing-Analysis: Queries for the edit analysis dashboard failing since December 2015 - https://phabricator.wikimedia.org/T124383#1962190 (Milimetric) Hm... all of a sudden around that time we've had lots of db issues. We were busy putting out fires last week, so this seems to me like another...
[17:32:42] oh tasking is happening
[17:32:47] man, ops meeting conflicts with everything?!
[17:32:48] naw its ok
[17:32:55] Analytics-Kanban, Editing-Analysis: Queries for the edit analysis dashboard failing since December 2015 - https://phabricator.wikimedia.org/T124383#1962192 (Milimetric) p:Triage>High a:Milimetric
[17:33:04] hmm, mforns maybe some of it
[17:33:13] YuviPanda: Tried to connect, either I have forgotten my pwd, or there seems to be an issue :(
[17:33:16] i'll ping you when i'm ready to start and see
[17:33:28] ottomata, ok let's do that
[17:33:33] joal: > [W 2016-01-25 17:32:21.092 JupyterHub ldapauthenticator:74] Invalid username
[17:33:44] joal: you should use your shell name
[17:34:38] ofc, you'll get a 500 error after logging in
[17:34:43] I think that's what I do :(
[17:34:45] because I don't have a spawner yet
[17:34:51] AH, ok :)
[17:34:55] joal: what are you using?
[17:34:59] as your username?
[17:35:08] joal
[17:35:11] :)
[17:35:29] hmm
[17:35:39] [W 2016-01-25 17:32:41.096 JupyterHub ldapauthenticator:104] Invalid password
[17:35:47] joal: so that's what it said the last time :)
[17:35:51] Ok ... I need to recall that pass then
[17:36:00] sorry :S
[17:36:03] :) np!
[17:36:20] I just verified it works for me
[17:36:25] with my wikitech password
[17:36:25] Analytics-Kanban: Prepare presentation quaterly review [3 pts] - https://phabricator.wikimedia.org/T123528#1962214 (Milimetric)
[17:38:13] Analytics-Kanban, Patch-For-Review: Remove LegacyPageviews from vital-signs [3 pts] - https://phabricator.wikimedia.org/T124244#1962226 (Milimetric)
[17:38:56] Analytics-Kanban, Patch-For-Review: Remove LegacyPageviews from vital-signs [3 pts] - https://phabricator.wikimedia.org/T124244#1950171 (Milimetric) a:Nuria
[17:43:37] Analytics-Kanban, Editing-Analysis: Queries for the edit analysis dashboard failing since December 2015 [5 pts] - https://phabricator.wikimedia.org/T124383#1962236 (Milimetric)
[17:44:18] Analytics-Kanban: Rotate kafka GC logs [3 pts] - https://phabricator.wikimedia.org/T124644#1962239 (Milimetric)
[17:44:59] mforns: i'm about to start backfilling from /srv/backfill_files/otto-2016-01-25/splits/20160122.1453393669
[17:45:30] going to run
[17:46:23] on each of the splits pipe into eventlogging-consumer stdin:// mysql://...
[17:47:57] Analytics-Kanban: Edit schema needs purging, table is too big for queries to run (500G before conversion) - https://phabricator.wikimedia.org/T124676#1962252 (Nuria) NEW
[17:48:02] Analytics-Kanban: Edit schema needs purging, table is too big for queries to run (500G before conversion) - https://phabricator.wikimedia.org/T124676#1962260 (Nuria)
[17:48:54] Analytics-Kanban, RESTBase: Update AQS config to new config format {melc} [5 pts] - https://phabricator.wikimedia.org/T122249#1962266 (Milimetric)
[17:51:49] mforns: looking good i think, if this split goes well, i'll look at parallelizing for the others
[17:51:54] i'm seeing old events come into the db
[17:52:02] ottomata, awesome!
[17:52:49] ottomata, the removal of the queue will help in backfilling also I think
[17:53:16] just one thread, blocking consumption while inserting...
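[Editorial note: a hedged reconstruction of the backfill ottomata starts here — trim all-events.log down to the missing window, split it, and feed each chunk to a stdin consumer. The directory and file names come from the log; the chunk size and the mysql DSN are placeholders.]

```bash
cd /srv/backfill_files/otto-2016-01-25
mkdir -p splits

# chunk the trimmed log so several consumers can later work in parallel
split -l 500000 all-events.log-20160122 splits/

for f in splits/*; do
  eventlogging-consumer stdin:// \
    'mysql://user:pass@m4-master.eqiad.wmnet/log' < "$f"
done
```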
[17:53:26] actually, mforns i had a q about that
[17:53:35] aha
[17:53:35] i see things like this
[17:53:37] 2016-01-25 17:53:21,909 (MainThread) Inserted 168 Edit_13457736 events in 0.704253 seconds
[17:53:37] 2016-01-25 17:53:21,927 (MainThread) Inserted 4 Edit_13457736 events in 0.017552 seconds
[17:53:37] 2016-01-25 17:53:22,293 (MainThread) Inserted 82 Edit_13457736 events in 0.363897 seconds
[17:53:37] 2016-01-25 17:53:22,327 (MainThread) Inserted 7 Edit_13457736 events in 0.033619 seconds
[17:53:37] 2016-01-25 17:53:22,572 (MainThread) Inserted 51 Edit_13457736 events in 0.243748 seconds
[17:53:38] 2016-01-25 17:53:22,656 (MainThread) Inserted 21 Edit_13457736 events in 0.082975 seconds
[17:53:55] they happen all at once, you can see the log timestamps there
[17:54:00] that's the same schema
[17:54:06] why wouldn't those be inserted in the same statement?
[17:54:17] yes, that's because they have different sets of columns
[17:54:22] OHHH
[17:54:22] right
[17:54:29] some events have optional fields
[17:54:33] ok
[17:55:30] quick question: do you guys have an idea why https://gerrit.wikimedia.org/r/#/c/266264 is pending on Jaime's commit? It was before mine when I ran git review but it doesn't make sense. I can't add reviewers
[17:57:13] Analytics-Kanban: Edit schema needs purging, table is too big for queries to run (500G before conversion) {oryx} [8 pts] - https://phabricator.wikimedia.org/T124676#1962293 (Milimetric)
[17:58:57] Analytics-Kanban: Productionize last access jobs for daily and monthly calculations {bear} - https://phabricator.wikimedia.org/T122514#1962311 (Nuria)
[18:00:40] mforns: why do we batch them by optional fields?
[18:00:45] Analytics-Kanban: Productionize last access jobs for daily calculations {bear} [3 pts] - https://phabricator.wikimedia.org/T122514#1962317 (Milimetric)
[18:00:47] those fields don't get inserted to the db, do they?
[18:01:09] because the insertion statement can not be formulated with different sets of columns
[18:01:17] and there's no way to force an empty value
[18:01:22] empty != null
[18:01:27] I think
[18:01:32] IIRC
[18:01:41] oh, i see, those fields are inserted
[18:01:47] if they are present
[18:01:53] aha
[18:02:07] hmm, you can do that with mysql, if the fields have a default value, no?
[18:02:37] and if the column names for the values are specified in the insert statement
[18:02:47] Analytics-Kanban: Productionize last access jobs for monthly calculations {bear} [? pts] - https://phabricator.wikimedia.org/T124678#1962321 (Milimetric) NEW a:JAllemandou
[18:02:52] ok, gonna make some lunch
[18:02:58] ottomata, you're right, that is a TODO in the code
[18:03:00] Analytics-Kanban: Productionize last access jobs for daily calculations {bear} [8 pts] - https://phabricator.wikimedia.org/T122514#1962333 (Milimetric)
[18:03:18] but we should talk to all schema owners to add default values to their fields
[18:03:19] ...
[18:03:49] ottomata, or else, we could batch by schema AND set of fields
[18:05:32] Analytics-Kanban: Productionize last access jobs for daily calculations {bear} [8 pts] - https://phabricator.wikimedia.org/T122514#1962355 (Nuria) Please take a look at past numbers: https://docs.google.com/spreadsheets/d/1xfDOjEX9WTf0HJQGR-Zt7h1kCQpMm5D91zkcZIK-Edc/edit#gid=1104950651 Also we should remove...
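[Editorial note: to make mforns' explanation concrete — a multi-row INSERT carries one fixed column list, so events that did and did not send an optional field can't share a statement, unless (as ottomata suggests) the columns have defaults and each statement names its columns explicitly. A hedged illustration: the field names are invented, only the Edit_13457736 table name is from the log.]

```bash
mysql -h analytics-store.eqiad.wmnet log <<'SQL'
-- batch 1: events that carried the optional field
INSERT INTO Edit_13457736 (uuid, timestamp, event_action, event_editor)
VALUES ('e1', '20160122120000', 'saveAttempt', 'wikitext'),
       ('e2', '20160122120003', 'saveSuccess', 'wikitext');

-- batch 2: events without it must go separately -- or, with the column
-- declared DEFAULT NULL, they could join batch 1 by passing explicit NULLs
INSERT INTO Edit_13457736 (uuid, timestamp, event_action)
VALUES ('e3', '20160122120005', 'init');
SQL
```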
[18:06:09] Analytics-Kanban: Research and validate assumptions for pageview sanitization with Research Team [8 pts] {mole} - https://phabricator.wikimedia.org/T120640#1962358 (JAllemandou)
[18:06:13] ottomata: we decided to not do that
[18:06:50] Analytics-Kanban: {mole} Data Security and Privacy - https://phabricator.wikimedia.org/T124679#1962361 (Milimetric) NEW
[18:06:53] ottomata: as defaults on db did not translate well to json schema intentions, if
[18:06:58] i remember it right
[18:07:10] Analytics-Kanban: {mole} Data Security - https://phabricator.wikimedia.org/T124679#1962370 (Milimetric) p:Triage>Normal
[18:12:08] that's fine w me, just make all fields DEFAULT NULL in mysql
[18:12:29] they're going to be that anyway in the current case
[18:16:34] Analytics-Kanban: Modify wikimetrics local install to account for recent changes to puppet to get rid of the self-hosted puppet master {dove} [13 pts] - https://phabricator.wikimedia.org/T123749#1962419 (Milimetric)
[18:17:45] mforns: whatcha think? https://gerrit.wikimedia.org/r/#/c/266264/
[18:17:47] looks good to me
[18:18:00] Analytics-Kanban: Modify wikimetrics local install to account for recent changes to puppet to get rid of the self-hosted puppet master {dove} [13 pts] - https://phabricator.wikimedia.org/T123749#1962428 (madhuvishy) a:madhuvishy
[18:19:49] ottomata, will look in 10 mins
[18:20:05] Analytics: Add wm:BE-Wikimedia-Belgium to Wikimetrics tags {dove} - https://phabricator.wikimedia.org/T124492#1962439 (Milimetric) p:Triage>Normal
[18:20:58] Analytics, EventBus: Continuous Integration for mediawiki/event-schemas - https://phabricator.wikimedia.org/T124438#1962443 (Milimetric) p:Triage>Normal
[18:22:16] Analytics, Continuous-Integration-Infrastructure, Patch-For-Review: Add json linting test for schemas in mediawiki/event-schemas - https://phabricator.wikimedia.org/T124319#1962462 (Milimetric) p:Triage>Normal
[18:22:59] Analytics: Make Dashiki get pageview data from pageview API {melc} - https://phabricator.wikimedia.org/T124063#1962467 (Milimetric) p:Triage>Normal
[18:28:29] Analytics, Editing-Analysis, Editing-Department: Consider scrapping Schema:PageContentSaveComplete and Schema:NewEditorEdit, given we have Schema:Edit - https://phabricator.wikimedia.org/T123958#1962490 (Nuria) Actually we do not want to have large catch-it-all schemas but rather distinct schemas per...
[18:29:57] Analytics, Wikipedia-Android-App: Beta Event Logging no longer functional - https://phabricator.wikimedia.org/T123781#1962502 (Milimetric) Open>Resolved a:Milimetric deployment-eventlogging03 is working now.
[18:30:20] Analytics-Kanban, Wikipedia-Android-App: Beta Event Logging no longer functional {oryx} - https://phabricator.wikimedia.org/T123781#1962506 (Milimetric)
[18:31:46] Analytics-Kanban, Wikipedia-Android-App: Beta Event Logging no longer functional {oryx} - https://phabricator.wikimedia.org/T123781#1962510 (Niedzielski) @Milimetric, it looks like there's no `/var/log/eventlogging/all-events.log`.
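[Editorial note: context for the alert patch (266264) discussed above, and the T124204 task it is linked to just below — alerting on a smoothed series rather than the raw one keeps short dips from paging. A hedged sketch of inspecting the transformed metric via Graphite's render API; the URL and the 10-point window are assumptions, though movingAverage is a standard Graphite function and the metric name appears in the task title.]

```bash
curl -sG 'https://graphite.wikimedia.org/render' \
  --data-urlencode 'target=movingAverage(eventlogging.overall.inserted.rate,10)' \
  --data-urlencode 'from=-1h' \
  --data-urlencode 'format=json'
# the check would then threshold these smoothed datapoints, not the raw rate
```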
[18:34:34] Analytics-Kanban: Polish script that checks eventlogging lag to use it for alarming - https://phabricator.wikimedia.org/T124306#1962517 (elukey) https://gerrit.wikimedia.org/r/#/c/266264
[18:35:03] Analytics-Kanban: Tune eventlogging.overall.inserted.rate alert to use a movingAverage transformation - https://phabricator.wikimedia.org/T124204#1962520 (elukey) https://gerrit.wikimedia.org/r/#/c/266264
[18:41:34] Analytics-Kanban: Edit schema needs purging, table is too big for queries to run (500G before conversion) {oryx} [8 pts] - https://phabricator.wikimedia.org/T124676#1962542 (Neil_P._Quinn_WMF) I imagine purging refers to the deletion of old rows? And I imagine you have the tools to do automatic purging alrea...
[18:42:32] Analytics-Kanban: Edit schema needs purging, table is too big for queries to run (500G before conversion) {oryx} [8 pts] - https://phabricator.wikimedia.org/T124676#1962552 (jcrespo) > it's just a matter of us deciding how long we want to retain the logs Exactly that.
[18:45:35] Analytics-Kanban: Edit schema needs purging, table is too big for queries to run (500G before conversion) {oryx} [8 pts] - https://phabricator.wikimedia.org/T124676#1962570 (Nuria) @Neil_P._Quinn_WMF : Correct, for most schemas we purge after 90 days
[18:48:04] mforns: I am now parallelizing inserts for all of the events > than 2016-01-21 16:28 in file all-events.log-20160122
[18:48:12] will let that finish before i make this more seamless
[18:48:35] there are currently 6 mysql backfilling processes happening
[18:48:50] hopefully mysql will be ok :/
[18:48:51] :)
[18:48:55] so far it seems fine
[18:48:59] a-team, I'm off for dinner, back for some time after
[18:49:16] laters!
[18:50:01] I am getting worried, but for some things unrelated :-/
[18:50:31] but if you are happy now, I am happy too
[18:51:29] bye joal !
[18:51:33] jynus: ja let me know if m4 master is unhappy
[18:53:15] a-team: going offline, talk with you tomorrow!
[18:53:30] laters!
[18:55:42] bye elukey !
[18:58:41] jynus: inserts seem pretty fast, is that new or am I just remembering wrong
[18:58:42] ?
[18:59:43] this could be new!
[18:59:56] remember that we did the maintenance for something
[18:59:59] heheh
[19:00:08] tokudb is optimized for faster bulk inserts
[19:00:10] nice
[19:00:11] yeah
[19:00:26] Inserted 3000 PageContentSaveComplete_5588433 events in 2.267616 seconds
[19:00:26] also, 600GB saved means less IO
[19:00:57] also, i have purging disabled for now, that may also be a thing
[19:01:01] oh ja
[19:01:07] might be good to keep that off till we are done backfilling
[19:01:08] but specially that table was converted
[19:01:11] yep
[19:01:28] and the difference in very large tables could be huge
[19:01:38] please tell me if you see something strange
[19:01:48] my worry is that I have seen some stalls
[19:02:09] as far as I can tell everything looks great
[19:02:10] ping me if it freezes at some point
[19:02:18] ok
[19:02:20] I think it was temporary
[19:02:27] but I want to be sure
[19:02:55] toku does a lot of lazy actions, and sometimes it says it has finished but it is doing lots of things in the background
[19:03:08] jynus: insert rate is around 2K+ / sec right now
[19:03:16] not surprised
[19:03:17] i could make it more :D
[19:03:24] can I?
[19:03:39] or is that a good rate to keep it at
[19:03:39] is that INSERTs or rows?
[19:03:42] rows
[19:04:00] we could get 100K/s easily
[19:04:13] awesome, k, they are batched within that differently
[19:04:19] that is overall events per sec inserted
[19:04:20] now that I think
[19:04:24] I have made a mistake
[19:04:28] oh?
[19:04:44] I left it with the old configuration, which means a large innodb buffer
[19:05:07] we are wasting memory there, maybe
[19:05:09] oh, ok. if you want to restart, let me finish with this set of backfilling
[19:05:13] and we can stop consumers so you can restart
[19:05:15] not yet
[19:05:17] k
[19:05:35] maybe giving it a week, do a very quick restart
[19:05:39] ok
[19:05:50] I want to evaluate how it is going
[19:06:03] but please share the performance improvement already :-)
[19:08:10] halfak: So I'm thinking about the TSV your script generated of images on pages, and I'm wondering if it would be better to have that data in a SQL table on stat1003 so I could query it instead of trying to write a secondary TSV-handling script to make it into data that would work for what I want
[19:08:33] I'm curious about your thoughts, I've never created a table on stat1003 and I'm not sure what that process entails or if it would be too much overhead
[19:09:08] "Table 'log.MobileWabClickTracking_5929948' doesn't exist" interesting
[19:09:11] MarkTraceur, +1. I made the TSV format natively usable by MySQL
[19:09:48] * halfak should really write the "so you want to load a TSV into a table on analytics-store" page of Wikitech
[19:09:58] Yes, yes you should
[19:10:05] I figure it will require permissions I don't yet have
[19:10:20] Which is fine, but it means a request for said permissions probably (or maybe this is that?)
[19:11:53] MarkTraceur, halfak http://dbahire.com/testing-the-fastest-way-to-import-a-table-into-mysql-and-some-interesting-5-7-performance-results/
[19:12:07] heh
[19:12:10] that's strnage
[19:12:12] strange
[19:12:37] so, I'd use mysqlimport rather than a custom python script.
[19:12:48] Yeah, that's fine with me
[19:13:07] Probably negligible difference in performance given it's stat1003, but I'll be nice™ about it
[19:13:07] So, you don't want to use the commands that they have here
[19:13:09] which is the right way, as I mentioned there
[19:13:14] halfak: http://blogs.microsoft.com/next/2016/01/25/microsoft-releases-cntk-its-open-source-deep-learning-toolkit-on-github/ btw (apropos of nothing)
[19:13:21] You can't do "load data infile"
[19:13:43] You'll need to use the `mysqlimport` utility with `--local` set
[19:13:53] which is the same thing
[19:13:57] "same"
[19:14:31] halfak: Sidenote, obviously stat1003 should alias minnesota-nice to nice -n 19
[19:15:52] LOAD DATA [LOCAL] INFILE
[19:15:52] halfak: Anyway, if you have any documentation about what permissions I need and/or what I need to do to create a table that's specific to stat1003 then I can follow it whenever
[19:17:02] * halfak lets jynus take this one
[19:17:41] halfak, https://github.com/mysql/mysql-server/blob/5.7/client/mysqlimport.c#L224
[19:17:42] (CR) Mforns: [C: -1] Expand to run on all wikis (1 comment) [analytics/limn-multimedia-data] - https://gerrit.wikimedia.org/r/266240 (owner: MarkTraceur)
[19:18:12] do not try to teach mysql to the dba --Abraham Lincoln
[19:18:18] hahaha
[19:18:54] * ori adds to tools.wmflabs.org/bash
[19:19:14] heheh
[19:20:00] jynus, would you like me to continue to help getting this TSV loaded or would you like to?
[19:21:08] I can help
[19:21:42] (CR) MarkTraceur: Expand to run on all wikis (1 comment) [analytics/limn-multimedia-data] - https://gerrit.wikimedia.org/r/266240 (owner: MarkTraceur)
[19:22:06] jynus: Are you going to be around for at least another hour? I'm going to drive to a coffee shop real quick
[19:22:18] yes
[19:22:44] I do it all the time for labs users, I do not see why not for analytics/research
[19:33:15] nuria: joining meeting?
[19:33:21] madhuvishy: yesss
[19:35:05] heading to post office, back shortly
[19:35:24] hmm, actually, no i'll wait to start the next backfilling process
[19:38:10] Analytics, DBA: Set up bucketization of editCount fields {tick} - https://phabricator.wikimedia.org/T108856#1962995 (jcrespo)
[19:38:12] Analytics-EventLogging, DBA: db1046 innodb signal 6 abort and restart - https://phabricator.wikimedia.org/T104748#1962991 (jcrespo) Open>Resolved Most of the issues if not all have been fixed with T120187 or are now irrelevant.
[19:43:58] Analytics-Kanban, DBA, Patch-For-Review: Pending maintenance on the eventlogging databases (db1046, db1047, dbstore1002, other dbstores) - https://phabricator.wikimedia.org/T120187#1963036 (jcrespo) So, some of the pending tasks to discuss/solve: * With the new application configuration (almost all ta...
[19:50:01] OK jynus, let's do this thing
[19:51:46] jynus: Currently I have a TSV at /home/marktraceur/illustrations/historical-illustration-enwiki.2015-09-09.tsv that is Big™
[19:52:28] jynus: I can import it into wherever, but I'd like to do this for other wikis too, so either a single database with tables for each wiki (meh) or adding tables to each wiki's database? I'm not sure which is better or more in line with what we already do
[19:54:42] Hm, having it in the wiki's database might be best, because I'm going to need to join on the wiki's revisions table probably...the data isn't complete, I'll need context for it
[19:57:10] hey jynus
[19:57:14] just thought of something
[19:57:30] i think the eventlogging_sync script will not do replication backfill properly
[19:57:31] will it?
[19:57:45] it gets the max timestamp from a table
[19:57:47] in the slave
[19:57:55] and then gets data greater than that timestamp in the master
[20:02:13] yep
[20:03:04] MarkTraceur, import it where?
[20:03:30] jynus: I don't know, either a table on the enwiki database on stat1003 or a new database there
[20:03:44] no, wiki databases do not get touched
[20:03:51] it has to be on a separate database
[20:03:53] jynus: Really I just need it to be in mysql so I can join it with enwiki.revisions and figure out some context
[20:04:03] you can still do joins between databases, no worries
[20:04:07] OK cool
[20:04:19] this is mysql, not poor postgres
[20:04:20] I'm not the DBA, and I didn't presume to teach the DBA how to use mysql ;)
[20:04:22] jynus: but there are already timestamps from today
[20:04:45] jynus: So a new database, fine, but will I have access to stick more data in it?
[20:05:06] well, I would be the one providing that access, so you better talk to me
[20:05:13] :-)
[20:05:19] I assume the relevant access group is "statistics-users" but I'm not sure if that's correct
[20:05:55] is it stat1003 that you use?
[20:06:15] ottomata, what about filling them using primary keys?
[20:06:25] will that work?
[20:06:48] Yeah, I'm using stat1003 currently for testing and writing temporary scripts
[20:06:58] And I belieeeeeve that reportupdater is running there too
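[Editorial note: for the import being set up here, jynus' advice boils down to: name the TSV after the target table and let mysqlimport stream it from the client side. A hedged sketch — the --local flag and the missing FILE privilege are explained later in the log; the file and defaults-file names are placeholders.]

```bash
# mysqlimport loads a file into the table named like the file
# (enwiki.tsv -> table `enwiki`); --local reads the TSV client-side,
# since the research user has no FILE privilege on the server
nice mysqlimport --defaults-file="$HOME/.my.cnf.research" --local \
  --host=analytics-store.eqiad.wmnet \
  --fields-terminated-by='\t' \
  project_illustrations "$HOME/illustrations/enwiki.tsv"
```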
[20:07:10] MarkTraceur: that is the proper group if you want stat1003 access
[20:07:12] I'd probably be using reportupdater to generate results from my saved data
[20:07:20] jynus: using primary keys?
[20:07:22] I already have stat1003 access, but I think it's researcher access
[20:07:28] but?
[20:07:37] you mean you do not have mysql access?
[20:07:57] Oh, I have both.
[20:08:04] MarkTraceur: jynus, yeah, researchers lets you read a password file
[20:08:07] on stat1003
[20:08:47] so you only need a database to be created and access for the research account to that db, right?
[20:09:45] Right!
[20:09:46] We generally use the "staging" database for everything because of some old policy against creating per-user databases.
[20:10:06] that is actually bad
[20:10:09] jynus, I wonder what you think of allowing per-user databases again?
[20:10:13] Yeah. Totally bad. :)
[20:10:20] look
[20:10:30] for now, I can stick to that
[20:10:36] * halfak misses the old days when he didn't need to prefix his table names with "halfak" because they were in the "halfak" database.
[20:10:48] but soon, there should be individual accounts
[20:10:51] \o/
[20:11:09] and at least db per account
[20:11:09] \o/
[20:11:18] I said there should
[20:11:25] I do not know when or how
[20:11:55] the question is that sharing accounts is more dangerous than having just one
[20:12:37] I'm not thinking of a user-specific database, I was thinking it should be separate because there will be a table for each wiki (eventually) (probably)
[20:12:39] we need to integrate LDAP for that, so do not get too excited
[20:12:56] It's not my own, it's useful data about the pages that doesn't exist currently
[20:13:07] let me see the current databases and grants
[20:13:14] * halfak gets back to his model evaluation statistics.
[20:13:45] also, I need to blame someone when queries go crazy
[20:15:22] I try to blame Ironholds or RoanKattouw, whoever's not in the room at the time
[20:15:22] And neither of them are currently, so we can blame both
[20:15:22] so, technically you have permissions already for staging, and datasets
[20:15:28] should this be under a different name?
[20:17:02] jynus: I would suggest putting it somewhere else, it'll be substantial, and there will be a lot of tables
[20:17:11] I don't want to clutter either of those unnecessarily
[20:17:14] But it's up to you
[20:17:19] I'm ok, databases are free
[20:19:25] what is a good name for this?
[20:19:51] jynus: I've been calling it "illustrations" but I'm open to other ideas
[20:19:55] halfak: Thoughts?
[20:20:32] Considering a database name for the project?
[20:20:36] Yeah
[20:21:11] Oh. Uhhh. Not sure what a good name would be. We should probably have a prefix on it that will show up in sorting
[20:21:22] E.g. "project_..." or maybe "_..."
[20:21:39] So "_illustrations."
[20:22:02] We've not done per-project databases AFAIK, so I'm making things up that seem good.
[20:22:33] I like project_ better, I think
[20:22:54] project_illustrations
[20:23:22] well, whatever that doesn't contain wik in the name
[20:24:10] Heh, yeah
[20:24:23] utf8 or will it contain wiki data? (wikis are stored in binary)
[20:27:46] Hm
[20:27:54] jynus: utf8, I'm only storing revision data really
[20:28:19] Well, maybe page names
[20:36:52] well, you can later decide the collation of your tables yourself
[20:37:32] OK then
[20:37:48] you should have access already, I am only puppetizing it
[20:39:20] please note that if later we setup more databases, we may have to rename it to a proper pattern, etc.
[20:39:58] (CR) Mforns: [C: -1] Fix cross-wiki upload script (1 comment) [analytics/limn-multimedia-data] - https://gerrit.wikimedia.org/r/266224 (owner: MarkTraceur)
[20:40:14] jynus: Well, that's fine
[20:42:51] (CR) Mforns: Expand to run on all wikis (1 comment) [analytics/limn-multimedia-data] - https://gerrit.wikimedia.org/r/266240 (owner: MarkTraceur)
[20:43:14] a-team AGH!
[20:43:16] i found it
[20:43:20] why offsets were not respected
[20:43:26] offsets.retention.minutes
[20:43:30] o.O
[20:43:32] Log retention window in minutes for offsets topic
[20:43:36] default
[20:43:36] 1440
[20:43:39] 24 hours
[20:43:39] :(
[20:43:44] to the puppet for a fix!
[20:43:46] aha
[20:43:52] why the crap would anyone want that?! :p
[20:45:10] (CR) MarkTraceur: Fix cross-wiki upload script (1 comment) [analytics/limn-multimedia-data] - https://gerrit.wikimedia.org/r/266224 (owner: MarkTraceur)
[20:48:30] (PS2) MarkTraceur: Expand to run on all wikis [analytics/limn-multimedia-data] - https://gerrit.wikimedia.org/r/266240
[20:48:38] (CR) MarkTraceur: Expand to run on all wikis (1 comment) [analytics/limn-multimedia-data] - https://gerrit.wikimedia.org/r/266240 (owner: MarkTraceur)
[20:49:02] (CR) MarkTraceur: "Also fixed cross-wiki-uploads because no wiki that is not Commons will have any numbers for it." [analytics/limn-multimedia-data] - https://gerrit.wikimedia.org/r/266240 (owner: MarkTraceur)
[20:51:02] (CR) Mforns: [C: -1] Expand to run on all wikis (1 comment) [analytics/limn-multimedia-data] - https://gerrit.wikimedia.org/r/266240 (owner: MarkTraceur)
[20:51:13] can you check that it works, MarkTraceur ?
[20:51:39] (PS3) MarkTraceur: Expand to run on all wikis [analytics/limn-multimedia-data] - https://gerrit.wikimedia.org/r/266240
[20:51:45] mforns: OK it's been a long day.
[20:51:49] jynus: Yeah, sec
[20:51:51] hehe
[20:52:17] hmm, actually not sure, because that is not an 0.8.2 setting, so i'm still confused but this seems related
[20:52:28] MarkTraceur, did you want to add something to https://gerrit.wikimedia.org/r/#/c/266224/ ? Otherwise I'll merge it
[20:52:30] jynus: Apparently research user doesn't have access, which is maybe expected
[20:52:56] mforns: Maybe merge it after the patch that takes enwiki and dewiki out of the cwu config
[20:53:09] Not that it matters a lot
[20:53:35] hm, no it is in 0.8.2, just not documented?
[20:53:38] ok
[20:53:43] which server are you connecting to, MarkTraceur ?
[20:54:01] or what are you writing on console?
[20:54:09] jynus: Uhhh, analytics-store.eqiad.wmnet
[20:54:17] mysql --defaults-file=/home/marktraceur/.my.cnf.relevant project_illustrations --host=analytics-store.eqiad.wmnet -A
[20:54:29] I probably got that command, like, two years ago honestly
[20:54:45] (CR) Mforns: [C: 2 V: 2] "LGTM!" [analytics/limn-multimedia-data] - https://gerrit.wikimedia.org/r/266240 (owner: MarkTraceur)
[20:54:48] project_illustration without s
[20:54:53] Ohhh christ
[20:55:00] OK, better
[20:55:08] works?
[20:55:10] Yup!
[20:55:21] jynus: So now create table enwiki and import them thar TSV, right?
[20:55:30] then resolve https://phabricator.wikimedia.org/T124705
[20:55:41] I get paid $5000 per ticket resolved
[20:55:48] Sounds right
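[Editorial note: ottomata's "to the puppet for a fix!" above presumably amounts to a one-line broker setting plus a restart. A hedged sketch — the file path and the one-week value are assumptions; only the 1440-minute default is from the log.]

```bash
# offsets.retention.minutes defaults to 1440 (24h); committed consumer
# offsets older than that are discarded, which is why the weekend-stopped
# mysql consumers woke up at the latest offset instead of their own
sudo tee -a /etc/kafka/server.properties <<'EOF'
offsets.retention.minutes=10080
EOF
sudo service kafka restart   # the setting is read at broker startup
```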
[20:55:54] (PS2) Mforns: Fix cross-wiki upload script [analytics/limn-multimedia-data] - https://gerrit.wikimedia.org/r/266224 (owner: MarkTraceur)
[20:55:57] mforns: btw, i'm inserting around 3000 events / second :o
[20:56:10] ottomata, O.o
[20:56:13] wow
[20:56:13] I should clarify that is a joke, just in case someone else is in the channel
[20:56:14] Fixed a li'l typo too
[20:56:35] {{done}}
[20:56:53] The cash should be in your offshore account in a few minutes
[20:56:54] ah yes
[20:57:25] so, you should now be able to create table and LOAD data... I mean, mysqlimport
[20:57:26] (CR) Mforns: [C: 2 V: 2] "LGTM!" [analytics/limn-multimedia-data] - https://gerrit.wikimedia.org/r/266224 (owner: MarkTraceur)
[20:57:33] Right
[20:57:38] right, or do you need help with that?
[20:57:44] :-P
[20:57:45] No, I should be able to do it
[20:57:54] See: "should"
[20:58:00] But I'll figure it out one way or the other
[20:58:10] haha, ok, for the time being I am going to talk to other "clients"
[20:58:20] Fair enough, cheers
[20:58:21] ask for help if you cannot
[21:00:56] (CR) Mforns: "LGTM! Please, rebase and I'll merge that." [analytics/limn-multimedia-data] - https://gerrit.wikimedia.org/r/265757 (owner: MarkTraceur)
[21:01:02] milimetric: will you join? :-)
[21:01:13] leila: trying - getting an error
[21:02:31] MarkTraceur, the last patch is OK. But I can't rebase it from gerrit. Please, if you rebase it, I'll merge
[21:02:39] Sure sure
[21:03:36] Analytics, Reading-Web, Wikipedia-iOS-App-Product-Backlog: As an end-user I shouldn't see non-articles in the list of trending articles - https://phabricator.wikimedia.org/T124082#1963460 (BGerstle-WMF) another thing that's not obvious is how clients show pageview data across time zones. will need to...
[21:04:00] jynus: lemme know when you want to talk about how we gonna get this data backfilled on slave
[21:04:18] talk to me
[21:04:20] (PS2) MarkTraceur: Add deletions script [analytics/limn-multimedia-data] - https://gerrit.wikimedia.org/r/265757
[21:04:30] mforns: Also fixed that so it's all wikis instead of just the three as before
[21:04:36] jynus: so ja uh, i assume, if the eventlogging_sync.sh script is only looking for max timestamp on slave
[21:04:41] that it is only selecting new data now
[21:04:46] So that one will need to generate the commons numbers, and enwiki and dewiki, but otherwise it's fine
[21:04:47] ottomata, there is a script for backfilling
[21:04:51] because the max timestamp is greater than the data that is being inserted now
[21:04:52] oh?
[21:05:28] (CR) Mforns: [C: 2 V: 2] "LGTM!" [analytics/limn-multimedia-data] - https://gerrit.wikimedia.org/r/265757 (owner: MarkTraceur)
[21:06:26] https://github.com/wikimedia/operations-software/blob/master/dbtools/eventlogging_resync.sh
[21:06:37] probably it is as bad as the other one
[21:07:31] a-team, I'm leaving for today! good night everyone!
[21:08:27] looking
[21:08:29] laters mforns!
[21:15:02] ahh ok jynus, i see
[21:15:15] it just tries a reinsert of everything in a given time range
[21:15:22] and passes over duplicates with --insert-ignore
[21:15:23] ja?
[21:18:18] no, only those ranges with fewer rows than on the master
[21:18:43] if [ $mcount -ne $scount ]; then
[21:20:03] since="$2"
[21:20:24] that is probably a yes :-)
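[Editorial note: pulling jynus' description and the quoted fragments together, the resync script's core idea is roughly the following — re-copy a time range from the master only when its row count differs from the slave's, with --insert-ignore making the copy idempotent. A hedged sketch, not the real eventlogging_resync.sh; table name and range are placeholders.]

```bash
table='Edit_13457736'; since='20160121000000'; until='20160122000000'
q="SELECT COUNT(*) FROM log.$table WHERE timestamp >= $since AND timestamp < $until"

mcount=$(mysql -h m4-master.eqiad.wmnet -N -e "$q")
scount=$(mysql -h analytics-store.eqiad.wmnet -N -e "$q")

if [ "$mcount" -ne "$scount" ]; then
  # rows the slave already has are skipped thanks to INSERT IGNORE
  mysqldump -h m4-master.eqiad.wmnet --insert-ignore --no-create-info \
    --where="timestamp >= $since AND timestamp < $until" log "$table" |
    mysql -h analytics-store.eqiad.wmnet log
fi
```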
[21:21:10] so the idea would be running it since thursday or so
[21:21:41] although m2-master has to change to m4-master
[21:21:56] and other small tweaks
[21:26:36] ah yeah
[21:26:45] ok
[21:26:54] well, i guess we can start that somewhere when this backfill is done, maybe some more hours
[21:27:16] I can run that on iron
[21:27:20] tomorrow
[21:27:23] ok
[21:27:46] jynus: Does research have access to import a file into a table? I created the table fine but for some reason it tells me permission denied when running mysqlimport
[21:27:58] --local
[21:28:00] MarkTraceur, ^
[21:28:10] Otherwise it's trying to load the file on the DB server itself.
[21:28:10] Ugh
[21:28:28] Got it
[21:28:35] :)
[21:28:42] Running!!!
[21:28:55] Now I need to make the python script stick the data into the SQL table instead of a TSV
[21:29:07] yes, you have no FILE permission to import from the server filesystem
[21:29:10] MarkTraceur, wait... what?
[21:29:15] Why write directly to MySQL?
[21:29:30] halfak: Well, I don't need the TSVs, all of my analysis will be derived from the SQL tables
[21:29:47] Oh! I'd still recommend writing to a file rather than directly to the DB.
[21:29:54] Hm, care to elaborate?
[21:29:57] And then importing when ready.
[21:30:08] It seems odd to have so much data sitting on the filesystem if I can avoid it
[21:30:19] Batch inserts will be faster. There's less likelihood that you'll accidentally write bad data or overwrite good data in the DB.
[21:30:23] Ah, hm.
[21:30:33] We have some big discs.
[21:30:39] Can compress if it is an issue.
[21:31:05] Sure
[21:31:13] And I mean, I can also remove the old files after they're imported
[21:31:47] well, I agree with 2 things there
[21:31:54] batch importing will be faster
[21:32:10] and analytics slave user tables are not backed up
[21:32:42] Oh, shoot, I didn't nice the mysqlimport, foolish
[21:32:56] MarkTraceur, won't matter. That doesn't use much CPU
[21:33:09] All right!
[21:33:14] Yay not messing up too badly
[21:33:19] Really only need nice when you are running a giant CPU-heavy process.
[21:33:37] Hence why it's in my history for image_diff.py
[21:35:31] also try using db compression for large tables (TokuDB)
[21:37:21] jynus: I hope I can do that after the fact... :)
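[Editorial note: on jynus' compression tip just above — it can indeed be done after the fact, since ALTER TABLE rebuilds the table onto the new engine. A hedged one-liner; the zlib row format is an assumption about a sensible choice, not the DBA's actual recommendation.]

```bash
mysql -h analytics-store.eqiad.wmnet project_illustrations -e "
  ALTER TABLE enwiki ENGINE=TokuDB ROW_FORMAT=TOKUDB_ZLIB;"
# rewrites the whole table, so best run once the bulk import has finished
```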
[21:38:24] ottomata: just reading about offsets.retention.minutes
[21:38:29] aye
[21:39:26] ottomata: I love it when things make sense
[21:40:52] (PS2) Yuvipanda: Don't die when downloading a file named with unicode characters [analytics/quarry/web] - https://gerrit.wikimedia.org/r/266023 (https://phabricator.wikimedia.org/T123031) (owner: Alex Monk)
[21:41:07] (CR) Yuvipanda: [C: 2] "<3" [analytics/quarry/web] - https://gerrit.wikimedia.org/r/266023 (https://phabricator.wikimedia.org/T123031) (owner: Alex Monk)
[21:43:39] Analytics-Kanban: Edit schema needs purging, table is too big for queries to run (500G before conversion) {oryx} [8 pts] - https://phabricator.wikimedia.org/T124676#1963600 (Nuria) Please see: https://phabricator.wikimedia.org/T108850
[21:44:38] (Merged) jenkins-bot: Don't die when downloading a file named with unicode characters [analytics/quarry/web] - https://gerrit.wikimedia.org/r/266023 (https://phabricator.wikimedia.org/T123031) (owner: Alex Monk)
[21:45:17] Analytics-Kanban, DBA, Patch-For-Review: Pending maintenance on the eventlogging databases (db1046, db1047, dbstore1002, other dbstores) - https://phabricator.wikimedia.org/T120187#1963604 (Nuria) @jcrespo: >Let's keep purging conversation to purging ticket: https://phabricator.wikimedia.org/T108850...
[21:46:32] Analytics-Kanban, DBA, Patch-For-Review: Pending maintenance on the eventlogging databases (db1046, db1047, dbstore1002, other dbstores) - https://phabricator.wikimedia.org/T120187#1963609 (jcrespo) Not for the second. Maybe for the first, but it is not yet decided- I am coordinating with Otto.
[21:49:31] (CR) Yuvipanda: [C: 2] When listing queries for user profiles, fall back to 'Untitled query #' if no title is set [analytics/quarry/web] - https://gerrit.wikimedia.org/r/266031 (https://phabricator.wikimedia.org/T101394) (owner: Alex Monk)
[21:49:55] Krenair: I've merged your changes. let me know when you are around so I can help you dpeloy :)
[21:49:57] *deploy
[21:50:32] I'm around now
[21:51:11] Krenair: ok, wanna deploy?
[21:51:36] yep, what's the process?
[21:51:50] Krenair: on your local checkout
[21:51:53] just do
[21:51:55] 'fab update_git'
[21:51:55] Analytics-EventLogging, MediaWiki-API, Easy, Google-Code-In-2015, Patch-For-Review: ApiJsonSchema implements ApiBase::getCustomPrinter for no good reason - https://phabricator.wikimedia.org/T91454#1963616 (greg) Update on wmf.11 rollout: We had to rollback to wmf.10 last week (ie: right now all...
[21:52:00] if that doesn't work you need to have fabric installed
[21:52:04] Analytics-Kanban, DBA, Patch-For-Review: Pending maintenance on the eventlogging databases (db1046, db1047, dbstore1002, other dbstores) - https://phabricator.wikimedia.org/T120187#1963617 (Ottomata) If we just have to do an easy peasy restart of mysql to apply a config change, I think we do not need t...
[21:52:56] Krenair: this assumes you've fabric installed
[21:53:05] pip install fabric or equivalent with your package manager
[21:53:06] yeah, installing now
[21:53:20] The program 'fab' is currently not installed. You can install it by typing:
[21:53:20] sudo apt-get install fabric
[21:53:21] :)
[21:53:31] ah
[21:53:34] the ubuntu niceness
[21:54:18] that program is still banned from labs I assume?
[21:54:29] so I ran update_git?
[21:54:43] well, 'banned'
[21:54:45] s/?//
[21:54:53] Krenair: yeah
[21:55:05] Krenair: yeah, so that just did a git pull equivalent on all 3 nodes
[21:55:12] 'main' which has mysql, redis and webservers
[21:55:15] and two runners that just run celery
[21:55:22] now since this is just the webserver changes
[21:55:27] you don't need to restart celery
[21:55:30] so you can now just do
[21:55:33] fab restart_uwsgi
[21:55:37] and that'll restart the webserver
[21:55:43] and should show you your changes
[21:56:08] (PS3) Alex Monk: When listing queries for user profiles, fall back to 'Untitled query #' if no title is set [analytics/quarry/web] - https://gerrit.wikimedia.org/r/266031 (https://phabricator.wikimedia.org/T101394)
[21:56:30] Analytics-Kanban, Editing-Analysis: Edit schema needs purging, table is too big for queries to run (500G before conversion) {oryx} [8 pts] - https://phabricator.wikimedia.org/T124676#1963632 (Neil_P._Quinn_WMF)
[21:56:31] realised that patch didn't go in
[21:56:46] (CR) Alex Monk: [C: 2] "re-applying yuvi's +2" [analytics/quarry/web] - https://gerrit.wikimedia.org/r/266031 (https://phabricator.wikimedia.org/T101394) (owner: Alex Monk)
[21:57:00] oh, needed +2
[21:57:02] err
[21:57:04] needed rebase
[21:57:07] cool didn't notice
[21:57:36] (Merged) jenkins-bot: When listing queries for user profiles, fall back to 'Untitled query #' if no title is set [analytics/quarry/web] - https://gerrit.wikimedia.org/r/266031 (https://phabricator.wikimedia.org/T101394) (owner: Alex Monk)
[21:58:54] okay
[22:02:21] Quarry, Patch-For-Review: Cannot download data from a query with Unicode characters in its title - https://phabricator.wikimedia.org/T123031#1963675 (Krenair) Open>Resolved
[22:04:56] Quarry, Patch-For-Review: Query counter increases but draft query is not accessible when window is closed and query doesn't have a title - https://phabricator.wikimedia.org/T101394#1963690 (Krenair) Open>Resolved
[22:06:35] Quarry: Number of queries shown in profile is wrong - https://phabricator.wikimedia.org/T86512#1963694 (Krenair) Sounds like T101394 to me - for some reason sometimes 'None' was shown, sometimes an empty string (which meant you couldn't use it). It no longer shows empty strings.
[22:06:44] YuviPanda, right, thanks
[22:07:16] Krenair: np! thanks for all the patches :D
[22:07:43] there should be more to come :P
[22:07:45] but when I have time
[22:07:48] like maybe at the weekend
[22:10:54] YuviPanda: in office?
[22:11:03] madhuvishy: thinking of coming
[22:11:06] Krenair: thanks!
[22:11:11] madhuvishy: but probably not.
[22:11:15] okay :)
[22:11:26] madhuvishy: anything specific?
[22:12:13] YuviPanda: we were talking about setting up a small version of labs db in a docker container for wikimetrics
[22:12:28] and Dan was saying we can just use sqlalchemy to create the schemas
[22:12:53] madhuvishy: schemas of what?
[22:13:21] YuviPanda: the mediawiki table schemas are already in models - so we can just create the tables from that
[22:13:39] oh
[22:13:43] for wikimetrics at least - and maybe use the test setup fixtures to populate test data
[22:13:44] in your wikimetrics models you mean?
[22:13:46] yeah
[22:13:49] yeah
[22:13:49] you can probably do that sure
[22:14:16] that would probably be the shortest path to getting this done without investing a lot of time
[22:14:32] yeah
[22:15:05] in general if we want a small updatable local copy of labsdb i am happy to work on it - but maybe outside of this
[22:16:39] madhuvishy: yeah
[22:17:41] YuviPanda: okay cool.
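Dan's suggestion — reusing the SQLAlchemy model declarations to stand up empty mediawiki-like tables — needs very little code, since declarative models carry their own DDL. A minimal sketch, with a stand-in model and an assumed connection URL rather than wikimetrics' real ones:

```python
# Minimal sketch: emit CREATE TABLE for every declared model.
# `Revision` is a stand-in for wikimetrics' real mediawiki models; the
# essential call is Base.metadata.create_all().
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class Revision(Base):
    __tablename__ = 'revision'
    rev_id = Column(Integer, primary_key=True)
    rev_comment = Column(String(255))

# Point this at the docker container's MySQL (sqlite here for a quick test).
engine = create_engine('sqlite:///wiki_local.db')  # assumed URL
Base.metadata.create_all(engine)  # creates all tables known to Base.metadata
```

Populating those tables could then reuse the test fixtures, as madhuvishy suggests.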
also - I need to turn on AllowOverride for the htaccess mechanism for the static sites role to work - wasn't sure if it should be AllowOverride all
[22:18:37] madhuvishy: do you already have the .htaccess file you need
[22:18:43] if so just allowoverride the things it uses?
[22:18:48] YuviPanda: no
[22:18:51] hmmm
[22:19:39] well that would imply anyone adding an .htaccess file to a dashboard will have to first get a puppet patch for AllowOverride done
[22:20:01] maybe we should figure out a few things to turn on by default
[22:23:11] madhuvishy: yeah, that's what I meant.
[22:23:20] madhuvishy: figure out what things you will use in .htaccess and turn them on
[22:23:26] and if someone else needs to use different things
[22:23:27] YuviPanda: yeah alright
[22:23:28] they can too
[22:23:35] i don't want to do AllowOverride all either
[22:23:46] cool
[22:35:26] milimetric: around?
[22:36:20] I'm having trouble starting vagrant to figure this out - but in the local setup for wikimetrics - is "wiki" the only available project?
[22:38:03] hey madhuvishy
[22:38:20] hey :)
[22:38:24] wiki and wiki2 are used for "dev"
[22:38:36] wiki_testing and wiki2_testing are used for running tests
[22:38:38] where is this controlled?
[22:40:43] milimetric: ^
[22:41:55] I will never know all the things this codebase has/does
[23:12:33] madhuvishy: sorry! was hanging out with nathaniel working on some code
[23:12:40] milimetric: np :)
[23:12:42] what do you mean by "controlled"?
[23:12:53] milimetric: as in configured
[23:13:26] oh! um... I think it's just convention, there's nothing stopping people from making more dbs
[23:13:40] for testing it's set up in fixtures, one sec
[23:13:57] https://github.com/wikimedia/analytics-wikimetrics/blob/master/tests/fixtures.py#L34
[23:14:23] milimetric: yeah i saw that - was wondering where it came from for dev - maybe puppet?
[23:14:28] for "dev" I just created wiki and wiki2 on my local mysql and I use both of those when creating cohorts (I just paste them in there)
[23:14:48] uh... oh! there's that list that auto-completes, right!
[23:14:50] I forgot about that
[23:14:58] let's see where that comes from
[23:15:34] milimetric: hmmm what happened in the vagrant setup? that's where i'm lost
[23:16:06] https://www.irccloud.com/pastebin/5JTC9Imb/
[23:17:02] right, that's for testing... but for "local" mode it has wiki, wiki2, enwiki, dewiki or something, right?
[23:17:06] and it goes to the client here: https://github.com/wikimedia/analytics-wikimetrics/blob/master/wikimetrics/templates/forms/csv_upload.html#L38
[23:17:29] so that comes from here: https://github.com/wikimedia/analytics-wikimetrics/blob/master/wikimetrics/controllers/cohorts.py#L231
[23:18:14] which of course is this: https://github.com/wikimedia/analytics-wikimetrics/blob/master/wikimetrics/database.py#L267
[23:18:49] which is of course just hard coded: https://github.com/wikimedia/analytics-wikimetrics/blob/31baafc6ab2b37a72e4009a2cba95a9c6b55bb3a/wikimetrics/configurables.py#L45
[23:19:26] milimetric: whaa
[23:19:29] okay...
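The chain milimetric traces above (template → cohorts controller → database.py → configurables.py) bottoms out in a hard-coded project-host mapping. Roughly, the shape is something like the sketch below — simplified and partly invented here, so treat the linked configurables.py as the authoritative version:

```python
# Simplified, hypothetical sketch of how a hard-coded dev project map can
# feed an autocompleting project list; names are illustrative, see the
# linked configurables.py and cohorts.py for the real wikimetrics code.
DEV_PROJECT_HOST_MAP = {
    'wiki': 'localhost',
    'wiki2': 'localhost',
}

def get_project_host_map():
    # In 'dev' mode the map is just hard-coded; production would instead
    # build it from the live list of wiki databases.
    return DEV_PROJECT_HOST_MAP

def cohort_upload_context():  # stand-in for the cohort upload controller
    # The template's autocomplete sees exactly these keys, which is why
    # only a fixed handful of projects shows up in a local setup.
    return {'projects': sorted(get_project_host_map())}
```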
[23:19:41] whaa like confusing or whaa like wtf did you do this for :)
[23:19:45] so setting DEBUG=true in our staging
[23:19:50] is wrong
[23:19:53] because
[23:19:57] you can connect to labsdb
[23:20:07] it's usually not set to true
[23:20:27] there are multiple debugs though, the "DB" one is usually set to false in staging
[23:20:31] milimetric: i thought it just meant it would set up test dbs
[23:20:36] i have no idea
[23:20:53] no, test is a separate thing, it usually refers to the unit / integration test suite
[23:21:06] which sets up its own project host map
[23:21:09] ooh, brb, gotta go
[23:21:18] milimetric: np - can chat more tomorrow
[23:21:27] cya :)
[23:31:04] madhuvishy: need more help with wikimetrics?
[23:31:48] nuria: i'm trying to figure out what parts of fixtures to run to populate the dbs with test data
[23:44:28] also nuria, https://github.com/wikimedia/analytics-wikimetrics/blob/03ecb0ee0a32b191a660288e4e770646f9ae125e/tests/fixtures.py#L654 why does setUp need to call tearDown?
[23:45:07] does it delete things from before?
[23:49:05] Analytics-Kanban, DBA, Patch-For-Review: Pending maintenance on the eventlogging databases (db1046, db1047, dbstore1002, other dbstores) - https://phabricator.wikimedia.org/T120187#1964165 (Ottomata) Backfilling on m4-master done. @jcrespo, feel free to start the slave resync script as soon as you get...
[23:59:11] madhuvishy: back
[23:59:29] madhuvishy: yes, setUp assumes a 100% clean db
[23:59:36] madhuvishy: thus every piece of data is scrapped
[23:59:40] nuria: hmmm
[23:59:55] madhuvishy: but let me understand, why are we trying to "freeze" some data?
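nuria's answer — setUp assumes a 100% clean database, so everything is scrapped first — is a common fixture pattern: having setUp call tearDown means a previous run that crashed before cleanup can't leak rows into the next one. A toy illustration of the pattern (sqlite file, table, and fixture data are all invented here; the real logic is in the linked fixtures.py):

```python
# Sketch of the setUp-calls-tearDown pattern discussed above.
import sqlite3
import unittest

class DatabaseTest(unittest.TestCase):
    DB_PATH = '/tmp/wikimetrics_test.db'  # persistent between runs, unlike :memory:

    def setUp(self):
        self.db = sqlite3.connect(self.DB_PATH)
        self.db.execute('CREATE TABLE IF NOT EXISTS cohorts (id INTEGER)')
        # Assume the db is dirty (a previous run may have died before
        # cleanup) and scrap everything before building fixtures.
        self.tearDown()
        self.db.execute('INSERT INTO cohorts VALUES (1)')  # known fixture
        self.db.commit()

    def tearDown(self):
        # Unconditional delete keeps this safe to call from setUp too.
        self.db.execute('DELETE FROM cohorts')
        self.db.commit()

    def test_fixture_present(self):
        n = self.db.execute('SELECT COUNT(*) FROM cohorts').fetchone()[0]
        self.assertEqual(n, 1)

if __name__ == '__main__':
    unittest.main()
```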