[03:49:27] Quarry: Show desktop notification when a query is done - https://phabricator.wikimedia.org/T124625#1960941 (APerson) NEW
[07:41:59] (CR) Madhuvishy: [C: 2 V: 2] "LGTM! (Although I wish we din't have to look through all of refined webrequest for this data and computed stuff on streams ;))" [analytics/refinery] - https://gerrit.wikimedia.org/r/259662 (https://phabricator.wikimedia.org/T118938) (owner: Joal)
[07:46:43] (CR) Madhuvishy: Divide app session metrics job into global and split (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/264292 (https://phabricator.wikimedia.org/T117615) (owner: Mforns)
[07:55:36] (CR) Madhuvishy: "Minor style thing - LGTM otherwise." (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/264297 (https://phabricator.wikimedia.org/T117615) (owner: Mforns)
[10:13:18] PROBLEM - Difference between raw and validated EventLogging overall message rates on graphite1001 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [30.0]
[10:18:28] elukey, yt?
[10:18:36] joal, yt?
[10:18:36] I was about to ping you :)
[10:18:38] https://grafana.wikimedia.org/dashboard/db/eventlogging
[10:18:44] yes I know... hehehe
[10:18:46] Hi lads :)
[10:18:46] the ops team has restarted two kafka nodes
[10:18:53] it seems kafka has some problems no?
[10:18:55] the GC logs were 20GBs
[10:19:06] wow
[10:19:09] Unable to connect to broker kafka1012.eqiad.wmnet:9092
[10:19:17] sounds bad
[10:20:03] I would restart eventlogging, but I'm afraid of starting the consumers with it
[10:20:06] kafka14 seems the only one still alive (from grafana charts)
[10:20:14] mforns: wait a bit
[10:20:17] ok
[10:21:00] let's jump to -ops
[10:21:03] sure
[10:37:26] guys I am trying kafka topic --describe on 1020
[10:37:39] but it tells me: Missing required argument "[zookeeper]"
[10:37:42] any idea
[10:37:43] ?
[10:39:22] elukey, no
[10:47:53] joal, do you know if I restart eventlogging, will consumers start with it?
[10:48:04] I don't know how andrew cut them off..
[10:48:17] mforns: I think so, but can't be sure ...
[10:51:20] https://grafana.wikimedia.org/dashboard/db/eventlogging?panelId=8&fullscreen - this is the only thing that I don't like..
[10:51:35] why do we need to restart EL in this case?
[10:52:46] mforns: have you found your answer?
[10:53:05] elukey, joal, no, I don't get why processors do not spit logs..
[10:53:26] mforns: yeah they seem to be stuck a while ago on the connection refused
[10:53:38] aha
[10:53:40] * elukey sees that Burrow is not happy
[10:53:50] ??
[10:53:52] that's why I thought of restarting
[10:54:14] mforns: I'm looking at puppet code to see if consumers have stopped in there
[10:54:25] joal, good idea
[10:56:28] mforns: seems ok to restart
[10:56:33] ok will do
[10:57:25] !log restarted EventLogging to overcome connection problems with Kafka
[10:58:14] joal: https://github.com/wikimedia/operations-puppet/commit/2d950c97ce26b8e29017af647b19c80cf015b309
[10:58:31] indeed elukey
[10:58:33] ah sorry I missed your comment :)
[10:58:43] :)
[10:59:16] joal, elukey, it seems EL is back: https://grafana.wikimedia.org/dashboard/db/eventlogging
[11:00:01] right mforns
[11:00:09] Analytics-Tech-community-metrics, pywikibot-core, DevRel-January-2016: Statistics for SCM project 'core' mix pywikibot/core, mediawiki/core and oojs/core - https://phabricator.wikimedia.org/T123808#1961419 (Dicortazar) Thanks for the issue @jayvdb !. I'm reassigning this to @Lcanasdiaz as he'll be in...
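[Editorial note: elukey's `kafka topic --describe` above fails because 0.8-era Kafka topic tooling is driven entirely through ZooKeeper, so --describe refuses to run without a connect string. A minimal hedged sketch, assuming the upstream kafka-topics tool; the ZooKeeper host/chroot and topic name below are placeholders, not values from this log.]

```bash
# Hypothetical connect string, for illustration only.
ZK='conf1001.eqiad.wmnet:2181/kafka/eqiad'

# Per partition this prints Leader, Replicas and Isr -- enough to see which
# brokers currently lead and whether restarted ones have rejoined the ISR.
kafka-topics.sh --describe --zookeeper "$ZK" --topic webrequest_text
```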
[11:00:29] Analytics-Tech-community-metrics, pywikibot-core, DevRel-January-2016: Statistics for SCM project 'core' mix pywikibot/core, mediawiki/core and oojs/core - https://phabricator.wikimedia.org/T123808#1961421 (Dicortazar) a:Dicortazar>Lcanasdiaz
[11:00:43] mmm what graph are you looking at? I am still waiting for https://grafana.wikimedia.org/dashboard/db/eventlogging?panelId=8&fullscreen to drop (but I believe it'll take a bit for grafana to show)
[11:00:55] if logs are ok on the host I am good :)
[11:01:35] elukey, I removed the -5m from the "now-5m" in the time range selector
[11:02:00] mforns knows the tricks :)
[11:02:19] * elukey hates grafana a bit now
[11:02:27] hehehehe
[11:02:28] https://grafana.wikimedia.org/dashboard/db/eventlogging?panelId=8&fullscreen - And we have lift off!
[11:04:51] Analytics-Kanban: Rotate kafka GC logs - https://phabricator.wikimedia.org/T124644#1961429 (JAllemandou)
[11:06:39] Thanks elukey and mforns for the fast reaction and actions :)
[11:07:28] thank *you* guys, I was feeling I could do nothing about it :o
[11:08:33] first outage together :D
[11:08:48] hehe cool! :]
[11:09:36] Now the only thing I'm worried about is varnish-kafka
[11:10:14] We had some trouble with ottomata last time we had outages like that (leader change, and varnish-kafka didn't manage to cope IIRC)
[11:11:13] mmm
[11:11:22] joal: do you remember what was the issue? Otherwise we can ask bblack
[11:13:03] also: ema needs to merge https://gerrit.wikimedia.org/r/#/c/266209/, that is basically the change to /etc/default/kafka done manually on 2/3 of the kafka nodes. The last one standing has a giant GC log too to be removed :)
[11:13:21] oh
[11:13:30] I don't see a major problem but we'd need to restart the last node, so before it we'd need to make sure that varnish-kafka is ok
[11:13:35] elukey: volume of bytes in in charts seems to match: let's not worry, I'll check on webrequest partitions stats see if any trouble
[11:17:01] guys, ema is gonna merge the patch in puppet, then restart nodes
[11:18:09] joal: one thing - https://grafana.wikimedia.org/dashboard/db/kafka?panelId=12&fullscreen
[11:18:20] it probably means we'll need to restart eventlogging again mforns
[11:18:22] any idea why 1020 dropped
[11:18:28] ?
[11:18:49] joal, ok, can you ping me?
[11:19:08] will do mforns
[11:20:00] elukey: I think it's a leader issue: having been restarted, 1020 and 1012 are not leaders anymore, therefore don't send bytes
[11:20:50] okok because the metric wasn't super clear
[11:21:11] all right I am going to ask ema to proceed
[11:21:24] elukey: You also can see on that chart 14 and 18 have bumped in bytes out --> they replace 12 and 20 as leaders
[11:21:36] yep yep
[11:21:39] all good :)
[11:21:45] just wanted to triple check
[11:21:49] elukey: told him I was ok, but you're the ops in charge :)
[11:22:09] ahahahah no no sorry I didn't want to overstep
[11:22:25] I had him in chat waiting this is why
[11:22:47] elukey: in absence of ottomata, you're the ops in charge now :) I give my opinion, you make the calls :)
[11:27:48] RECOVERY - Difference between raw and validated EventLogging overall message rates on graphite1001 is OK: OK: Less than 20.00% above the threshold [20.0]
[11:29:05] \o/
[11:30:24] elukey: how are we doing on merging / restarting ?
[11:32:18] we need to review https://gerrit.wikimedia.org/r/#/c/266209/1 (git submodule update) - it should be straightforward but it is the first time
[11:33:23] elukey: you do it or do you want me to do it?
[11:33:59] if you have done it before please go ahead
[11:34:26] elukey: I have not, but should be ok though :)
[11:37:11] elukey: In fact, you'll do it :)
[11:37:16] muhahahaha :)
[11:37:30] elukey: I don't have +2 in ops
[11:37:36] me too
[11:38:11] Arf
[11:38:22] You can give a +1 then, and ask them to merge :)
[11:38:50] Talked with ema, and the submodule patch has already been merged, so we can happily go for the main merge
[11:38:54] elukey: --^
[11:59:27] we are waiting for puppet on 1018
[11:59:34] then restart
[11:59:45] then ema will re-enable puppet on 1020 and 1012
[12:04:37] mforns: kafka1018 will restart soon, EL restart probably needed (I can do it if you want, not to bother you more :)
[12:05:45] joal, I'll do it
[12:05:47] :]
[12:06:42] mforns: kafka restarted on kafka1018, restart done
[12:07:29] oh, ok
[12:08:20] !log restarted EventLogging to sync with kafka1018 restart
[12:18:07] Analytics-Kanban: Rotate kafka GC logs - https://phabricator.wikimedia.org/T124644#1961543 (ema) The puppet changes have been merged. I have re-enabled puppet on kafka1020 and kafka1012. On kafka1018, where the size of kafkaServer-gc.log was also starting to become a problem, I have restarted kafka after p...
[12:18:37] RECOVERY - Throughput of event logging NavigationTiming events on graphite1001 is OK: OK: Less than 15.00% under the threshold [1.0]
[12:19:05] Analytics-Tech-community-metrics, Developer-Relations, DevRel-March-2016: Who are the top 50 independent contributors and what do they need from the WMF? - https://phabricator.wikimedia.org/T85600#1961551 (Aklapper)
[12:21:22] all right 1020 and 1012 have puppet running
[12:21:30] all good now
[12:21:59] I believe that we should receive alarms in here about kafka
[12:22:48] brb lunch!
[12:25:51] joal, mforns: I'll write an email to -internal if you want, with all the details
[12:26:04] and the phab references
[12:26:39] cool elukey
[12:26:50] I agree, kafka alarms would make sense in here
[12:27:01] yep I'll try to add them
[12:27:04] as part of the follow up
[12:27:07] also elukey, I think there still are other machines to reapply, no ?
[12:27:38] so 1012 and 1020 got the update by ema with puppet disabled
[12:27:48] We've done 12, 18, 20 -> there is 13, 14 and 22 to go :)
[12:28:01] yup, and ema just did 18
[12:28:04] ah snap we have 6 nodes???
[12:28:14] :( :(
[12:28:51] all right let me check their status
[12:29:53] 1013: /dev/md0 28G 19G 7.1G 73% /
[12:30:22] 14: /dev/md0 28G 24G 2.1G 93% /
[12:30:47] 22: /dev/md3 28G 24G 2.7G 90% /
[12:31:50] they already have KAFKA_OPTS="-XX:GCLogFileSize=50M -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5"
[12:32:01] so we should just restart them
[12:33:03] right elukey
[12:34:13] elukey: one after the other, and with possible rest time in between to facilitate partition sync :)
[12:37:01] yep sure
[12:38:48] joal, mforns: restarted 1014
[12:39:12] elukey, do we need to restart EL?
[12:39:32] mforns: let's wait for the 3 restarts instead?
[12:39:52] I see
[12:40:34] mforns: no dataloss while waiting, right ?
[12:40:54] joal, not now
[12:41:11] That's what I thought
[12:41:12] but before, when events weren't making it into Kafka yes, I guess
[12:42:13] oh! this week it's me on ops-duty!
[12:42:34] hehehe, cool, I'll write the incident page
[12:42:44] mforns: before, you mean when using 0mq?
[12:42:47] :)
[12:42:59] no, before, when we had the kafka outage
[12:43:20] only 1020 and 1012 were broken, so I think we have not lost any data :)
[12:43:27] The power of replication :)
[12:43:33] joal, oooh, makes sense
[12:44:00] so when we restarted the other nodes, they all resync'd?
[12:44:12] Will still be good to monitor, but I think we are ok :)
[12:44:19] they did :)
[12:44:27] ok, as part of ops-duty I'll also check the db
[12:45:06] mforns: looking at 6 hours behind in https://grafana.wikimedia.org/dashboard/db/kafka is interesting to notice what happened
[12:46:38] joal, I don't understand these graphs, they don't seem to match the problem we had..
[12:48:00] batcave ?
[12:48:11] sure
[12:49:40] 1018: /dev/md0 28G 2.8G 24G 11% / :)
[12:50:01] joal: I don't think that we need to relaunch a leader election, I am seeing three leaders in --describe
[12:50:09] maybe in the end of the restarts?
[12:52:32] 1014 and 1018 seem not leader at all
[12:52:39] https://grafana.wikimedia.org/dashboard/db/kafka?panelId=20&fullscreen
[12:53:21] I would launch kafka preferred-replica-election
[12:54:01] helllooooo
[12:54:08] :D
[12:54:22] elukey: wait for having restarted the other nodes :)
[12:54:29] and elect after
[12:55:13] all right :)
[12:56:34] elukey: you can continue to restart
[12:56:36] :)
[12:58:09] yep already doing 13
[12:59:35] /dev/md0 28G 2.9G 24G 11% /
[13:01:56] I'll wait 5 min then I'll restart 22
[13:08:37] 1022 restarted /dev/md3 28G 3.1G 23G 12% /
[13:09:25] joal, mforns: restarts completed, I am going to run kafka preferred-replica-election on 1022
[13:09:36] elukey, will restart EL then
[13:09:39] elukey: ensure partitions in sync first :)
[13:09:43] ok
[13:10:41] joal: isr has three brokers for each topic
[13:10:43] as the other nodes :
[13:10:44] :)
[13:10:50] cool
[13:11:22] elukey: just double checked that you had that checked before reelecting leaders
[13:11:53] yep, even before restarting
[13:12:12] one question: should I run sudo kafka preferred-replica-election on one of the nodes?
[13:12:27] https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Kafka/Administration#Safe_Broker_Restarts is not super clear
[13:12:46] reelection is needed, yes - The command can happen in any node
[13:13:09] reelection happens for the full cluster, so it just needs to be launched once
[13:14:18] mmmmm it doesn't work
[13:14:40] it prints the help
[13:14:53] error in command :)
[13:15:28] ah not with sudo
[13:15:29] :)
[13:15:47] all right all done
[13:16:12] mforns: restaaaaaaaaaaart :)
[13:16:15] ok!
[13:16:46] !log restarted EventLogging after Kafka leader reelection
[13:20:45] joal, mforns: would you mind if I grab something to eat?
[13:20:50] brb in 40mins
[13:20:55] np at all!
[13:21:01] elukey: please :)
[13:21:17] all right, everything seems good from the dashboards :)
[13:21:20] talk with you in a bit!
[13:21:26] elukey: first outage solved, please have some food ;)
[13:21:33] \o/
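[Editorial note: for readers following along, here is a hedged sketch of the "safe broker restart" routine being applied above — restart one broker, wait for the ISR to be whole again, repeat, then rebalance leaders once at the end. Script, service and host names are assumptions about this 0.8-era setup, not commands lifted from the wikitech page elukey cites.]

```bash
ZK='conf1001.eqiad.wmnet:2181/kafka/eqiad'   # placeholder connect string

# one broker at a time, with rest time in between to facilitate partition sync
sudo service kafka restart

# a partition is healthy when its Isr list shows all three replicas again;
# keep polling --describe until nothing is under-replicated
kafka-topics.sh --describe --zookeeper "$ZK" | grep 'Isr:'

# after the last broker: trigger the reelection once, from any node --
# it acts on the whole cluster
kafka-preferred-replica-election.sh --zookeeper "$ZK"
```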
[13:48:53] a-team, I'm away for some time, see you in a bit
[13:49:01] ok joal, cya
[14:08:26] (PS1) MarkTraceur: Fix cross-wiki upload script [analytics/limn-multimedia-data] - https://gerrit.wikimedia.org/r/266224
[14:09:44] (Abandoned) MarkTraceur: Flesh out SQL, tweak configuration [analytics/limn-multimedia-data] - https://gerrit.wikimedia.org/r/237134 (owner: MarkTraceur)
[14:17:40] Analytics-Kanban, Research-and-Data: Research Spike: Article Title normalization contains weird chars [8 pts] {hawk} - https://phabricator.wikimedia.org/T108867#1961722 (mforns) I wrote a page in Wikitech commenting the conclusions I drew: https://wikitech.wikimedia.org/wiki/Analytics/Data/Redirects
[14:18:03] back!
[14:34:37] https://grafana.wikimedia.org/dashboard/db/kafka looks very nice
[14:35:16] mforns: I've read above that you are planning to add a wikipage with a post-mortem
[14:35:19] danke
[14:35:51] helllooooooooooo ottomata :)
[14:36:37] hiyyaa
[14:36:53] I am going to send an email with all the details so you guys will be up to speed
[14:37:01] 10/15 mins and it should be ready
[14:40:37] woah kafka funkyness happened?
[14:40:42] yesss
[14:40:54] ENOSPACE left
[14:41:22] oh is that what this is about?
[14:41:23] https://gerrit.wikimedia.org/r/#/c/266203/
[14:41:43] yikes
[14:41:50] no alarms at all?
[14:41:50] wow
[14:43:47] yes!
[14:46:45] Analytics-Kanban, DBA, Patch-For-Review: Pending maintenance on the eventlogging databases (db1046, db1047, dbstore1002, other dbstores) - https://phabricator.wikimedia.org/T120187#1961748 (Ottomata) Cool, totally missed this on Saturday! Starting them now...
[14:51:51] Analytics-Kanban: Rotate kafka GC logs - https://phabricator.wikimedia.org/T124644#1961749 (elukey) p:Unbreak!>High
[14:54:40] !log restarting eventlogging mysql consumers after TokuDB conversion has finished
[14:55:17] ottomata: email sent
[14:55:26] ah k
[14:55:33] oh thats what you were sending email about
[14:55:34] k
[15:05:16] hmmmm
[15:05:21] i just restarted el mysql consumers
[15:05:33] and i am worried that they did not start from where they left off, but from the latest offset
[15:07:25] Analytics-Kanban: Rotate kafka GC logs - https://phabricator.wikimedia.org/T124644#1961774 (elukey) 1014, 1018 and 1022 Kafka brokers were restarted and cleaned from the huge logs. Last action was to run kafka preferred-replica-election to rebalance the partition leaders. EL was restarted again.
[15:08:20] :(
[15:08:49] is there an SQL query that we can run later on to quickly identify if there is a hole or not?
[15:11:27] yeah i'm seeing data inserted for now
[15:11:35] but none is in the db from over the weekend
[15:11:56] i'm currently just going to find an offset from shortly before when we stopped the mysql consumers
[15:12:02] and set up a separate process to backfill from three
[15:12:03] there
[15:12:44] I would be curious to know how you are going to start the process, but only when you have time
[15:17:26] Analytics-Tech-community-metrics, DevRel-January-2016: Make GrimoireLib display *one* consistent name for one user, plus the *current* affiliation of a user - https://phabricator.wikimedia.org/T118169#1961804 (Lcanasdiaz) I thought having more than one affiliation at top-contributors was a requirement and...
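[Editorial note: on elukey's question above about spotting a hole with SQL — EventLogging tables carry an EL-style YYYYMMDDHHMMSS timestamp column, so bucketing counts per hour makes a weekend-sized gap obvious. A hedged sketch: the host and table names are real names from this log, but the column format is an assumption.]

```bash
mysql -h analytics-store.eqiad.wmnet log -e "
  SELECT LEFT(timestamp, 10) AS hour, COUNT(*) AS events
  FROM Edit_13457736
  GROUP BY hour
  ORDER BY hour;"
# hours with zero or sharply lower counts than their neighbours mark the gap
```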
[15:28:20] elukey: i'm not totally sure how yet, i thought i could pass an offset from which to start to the pykafka consumer
[15:28:56] ah so you want to kick off manually a python process with ad hoc parameters
[15:29:04] yes
[15:33:44] hm, ok, i can't seem to do it, but i can consume from kafka and pipe to eventlogging consumer stdin...
[15:33:52] might use kafkatee to do this, so i can save offsets
[15:42:43] (PS1) MarkTraceur: Expand to run on all wikis [analytics/limn-multimedia-data] - https://gerrit.wikimedia.org/r/266240
[15:51:52] Analytics, MediaWiki-extensions-WikimediaEvents, The-Wikipedia-Library, Wikimedia-General-or-Unknown, Patch-For-Review: Implement Schema:ExternalLinkChange - https://phabricator.wikimedia.org/T115119#1961889 (Sadads) @Legoktm I don't think its an issue, persay, we are really mostly concerned a...
[15:54:12] Analytics-Tech-community-metrics, DevRel-January-2016: Make GrimoireLib display *one* consistent name for one user, plus the *current* affiliation of a user - https://phabricator.wikimedia.org/T118169#1961892 (Aklapper) >>! In T118169#1961804, @Lcanasdiaz wrote: > I thought having more than one affiliatio...
[16:00:41] ottomata: do we have the event bus meeting?
[16:01:19] yes
[16:06:14] nuria: coming?
[16:42:46] ottomata: can you undo changes on puppet and restart EL?
[16:42:58] undo changes?
[16:43:24] ottomata: I thought you disabled puppet?
[16:43:31] nooo
[16:43:34] eh?
[16:43:45] nuria: mysql consumers are running now
[16:43:54] but, they did not backfill automatically like they should have
[16:43:56] not yet sure why
[16:43:59] right now i'm trying to backfill
[16:44:22] mmm.. did they start from a different offset?
[16:44:27] than you expect them to do?
[16:44:37] they seemed to have started from the end...today
[16:45:01] before the eventbus meeting, i was trying to backfill directly from kafka, but it seemed to only partially backfill? not yet sure
[16:45:06] am investigating
[16:45:12] i may try to backfill from all-events.log files
[16:52:17] ottomata: I see, but consumers were stopped correct? did you look at whether they might have started and thus recommit offsets?
[16:52:25] ottomata: logs should be able to tell us that
[16:53:14] ottomata: if puppet was not stopped could a run of puppet have restarted consumers?
[16:54:41] nuria consumers were stopped
[16:54:43] i puppetized their removal
[16:54:46] over the weekend
[16:54:51] only today did I repuppetize them
[16:54:54] puppet was not stopped
[16:54:56] ottomata: ah
[16:54:58] mforns: Hey, mind looking at some of my limn-multimedia-data patches so I can start recalculating some data overnight?
[16:55:13] ottomata: i see
[16:55:18] ottomata: then i do not get it
[16:55:24] nuria: ja, haven't yet looked into why consumers did not start from where they left off
[16:55:32] MarkTraceur, sure! please, add me as a reviewer :]
[16:55:33] perhaps offsets get stale and are removed if no commits come in?
[16:55:35] KK
[16:55:43] i'm just trying to get backfilling started asap right now
[16:55:51] ottomata: i have stopped consumers before without trouble i think, not for this long though
[16:55:55] right
[16:55:55] {{done}}
[16:55:56] same
[16:55:59] thx MarkTraceur
[16:56:10] No, thank you! :)
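[Editorial note: what ottomata describes above — consuming from a chosen offset and piping into the consumer's stdin — could look roughly like this. He mentions kafkatee; the sketch substitutes kafkacat purely because it takes an explicit start offset. The topic, partition, offset and mysql DSN are placeholders, not values from the log.]

```bash
kafkacat -C -b kafka1014.eqiad.wmnet:9092 \
  -t eventlogging-valid-mixed -p 0 \
  -o 123456789 -e |                      # start at the pre-stop offset, exit at end
eventlogging-consumer stdin:// \
  'mysql://user:pass@m4-master.eqiad.wmnet/log'   # placeholder DSN
```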
[16:56:40] MarkTraceur, I have a couple meetings for the next 2 hours, but I'll try to do it in between
[16:56:49] Sure
[16:57:02] If milimetric shows up at all I might bother him about it
[16:57:25] But I guess it's not a huge rush, I just don't have any other reviewers on the repository yet...I guess I should probably add some
[17:01:48] Analytics-Kanban, operations, HTTPS, Patch-For-Review: EventLogging sees too few distinct client IPs {oryx} [8 pts] - https://phabricator.wikimedia.org/T119144#1962106 (Nuria) Open>Resolved
[17:07:54] Analytics-Kanban, DBA, Patch-For-Review: Pending maintenance on the eventlogging databases (db1046, db1047, dbstore1002, other dbstores) - https://phabricator.wikimedia.org/T120187#1847396 (Nuria) We are backfilling events that were not inserted during scheduled maintenance window.
[17:18:27] * MarkTraceur waves at milimetric
[17:18:37] I added you on some patches, mforns too, y'all can fight over them
[17:18:47] I have to delete some tainted data after at least one of them gets merged
[17:20:22] (CR) Madhuvishy: Add split-by-os argument to AppSessionMetrics job (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/264297 (https://phabricator.wikimedia.org/T117615) (owner: Mforns)
[17:21:49] Analytics-Kanban, Patch-For-Review: Remove queue of batches from EventLogging [8 pts] {oryx} - https://phabricator.wikimedia.org/T121151#1962161 (Nuria) Open>Resolved
[17:22:33] joal: ottomata as an update: 1. I wrote an LDAP authenticator for JupyterHub, and it also allows us to restrict access by group, 2. Ops didn't want us using npm / node stuff directly yet, so I just rewrote the nodejs parts of jupyterhub yesterday :D 3. Only thing left for fully securing it is a docker api whitelisting proxy, but that can probably be done this week.
[17:22:50] awesooome
[17:23:02] YuviPanda: i haven't worked on the spark thing since the all staff
[17:23:07] YuviPanda: you ROKC !
[17:23:17] :D
[17:23:27] ROCK, ROKC, ORCK, whatever :)
[17:23:32] so right not
[17:23:34] *now
[17:23:38] you still need an ssh tunnel to access it
[17:23:47] because I don't know if misc-varnish allows websockets through
[17:23:54] I didn't know I could do that !
[17:24:20] do what?
[17:24:36] ssh tunnel and use
[17:24:38] ah
[17:25:21] Meetings tonight, but I'll bug you to know how :)
[17:25:24] YuviPanda: --^
[17:25:44] ssh -L 8000:analytics1021.eqiad.wmnet:8000 stat1003.eqiad.wmnet
[17:25:51] if you do that on your console right now
[17:25:55] and then hit
[17:25:59] localhost:8000
[17:26:06] you will actually see a jupyterhub
[17:26:11] that you can authenticate via LDAP
[17:26:22] it doesn't spawn yet, working on that next
[17:26:36] Thanks a million YuviPanda :)
[17:28:48] mforns: am going to rest of ops meeting...
[17:29:01] ottomata, ok
[17:29:12] ottomata, when do you want to try and backfill
[17:29:15] ?
[17:31:23] mforns: i'm prepping now
[17:31:43] copied and trimming some files on eventlog1001 in /srv/backfill_files/otto-2016-01-25
[17:31:52] (CR) Mforns: Divide app session metrics job into global and split (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/264292 (https://phabricator.wikimedia.org/T117615) (owner: Mforns)
[17:32:13] ottomata, do you want me to skip tasking?
[17:32:35] Analytics, Editing-Analysis: Queries for the edit analysis dashboard failing since December 2015 - https://phabricator.wikimedia.org/T124383#1962190 (Milimetric) Hm... all of a sudden around that time we've had lots of db issues. We were busy putting out fires last week, so this seems to me like another...
[17:32:42] oh tasking is happening
[17:32:47] man, ops meeting conflicts with everything?!
[17:32:48] naw its ok
[17:32:55] Analytics-Kanban, Editing-Analysis: Queries for the edit analysis dashboard failing since December 2015 - https://phabricator.wikimedia.org/T124383#1962192 (Milimetric) p:Triage>High a:Milimetric
[17:33:04] hmm, mforns maybe some of it
[17:33:13] YuviPanda: Tried to connect, either I have forgotten my pwd, or there seems to be an issue :(
[17:33:16] i'll ping you when i'm ready to start and see
[17:33:28] ottomata, ok let's do that
[17:33:33] joal: > [W 2016-01-25 17:32:21.092 JupyterHub ldapauthenticator:74] Invalid username
[17:33:44] joal: you should use your shell name
[17:34:38] ofc, you'll get a 500 error after logging in
[17:34:43] I think that's what I do :(
[17:34:45] because I don't have a spawner yet
[17:34:51] AH, ok :)
[17:34:55] joal: what are you using?
[17:34:59] as your username?
[17:35:08] joal
[17:35:11] :)
[17:35:29] hmm
[17:35:39] [W 2016-01-25 17:32:41.096 JupyterHub ldapauthenticator:104] Invalid password
[17:35:47] joal: so that's what it said the last time :)
[17:35:51] Ok ... I need to recall that pass then
[17:36:00] sorry :S
[17:36:03] :) np!
[17:36:20] I just verified it works for me
[17:36:25] with my wikitech password
[17:36:25] Analytics-Kanban: Prepare presentation quaterly review [3 pts] - https://phabricator.wikimedia.org/T123528#1962214 (Milimetric)
[17:38:13] Analytics-Kanban, Patch-For-Review: Remove LegacyPageviews from vital-signs [3 pts] - https://phabricator.wikimedia.org/T124244#1962226 (Milimetric)
[17:38:56] Analytics-Kanban, Patch-For-Review: Remove LegacyPageviews from vital-signs [3 pts] - https://phabricator.wikimedia.org/T124244#1950171 (Milimetric) a:Nuria
[17:43:37] Analytics-Kanban, Editing-Analysis: Queries for the edit analysis dashboard failing since December 2015 [5 pts] - https://phabricator.wikimedia.org/T124383#1962236 (Milimetric)
[17:44:18] Analytics-Kanban: Rotate kafka GC logs [3 pts] - https://phabricator.wikimedia.org/T124644#1962239 (Milimetric)
[17:44:59] mforns: i'm about to start backfilling from /srv/backfill_files/otto-2016-01-25/splits/20160122.1453393669
[17:45:30] going to run
[17:46:23] on each of the splits pipe into eventlogging-consumer stdin:// mysql://...
[17:47:57] Analytics-Kanban: Edit schema needs purging, table is too big for queries to run (500G before conversion) - https://phabricator.wikimedia.org/T124676#1962252 (Nuria) NEW
[17:48:02] Analytics-Kanban: Edit schema needs purging, table is too big for queries to run (500G before conversion) - https://phabricator.wikimedia.org/T124676#1962260 (Nuria)
[17:48:54] Analytics-Kanban, RESTBase: Update AQS config to new config format {melc} [5 pts] - https://phabricator.wikimedia.org/T122249#1962266 (Milimetric)
[17:51:49] mforns: looking good i think, if this split goes well, i'll look at parallelizing for the others
[17:51:54] i'm seeing old events come into the db
[17:52:02] ottomata, awesome!
[17:52:49] ottomata, the removal of the queue will help in backfilling also I think
[17:53:16] just one thread, blocking consumption while inserting...
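[Editorial note: a hedged reconstruction of the backfill ottomata starts here — trim all-events.log down to the missing window, split it, and feed each chunk to a stdin consumer. The directory and file names come from the log; the chunk size and the mysql DSN are placeholders.]

```bash
cd /srv/backfill_files/otto-2016-01-25
mkdir -p splits

# chunk the trimmed log so several consumers can later work in parallel
split -l 500000 all-events.log-20160122 splits/

for f in splits/*; do
  eventlogging-consumer stdin:// \
    'mysql://user:pass@m4-master.eqiad.wmnet/log' < "$f"
done
```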
[17:53:26] actually, mforns i had a q about that
[17:53:35] aha
[17:53:35] i see things like this
[17:53:37] 2016-01-25 17:53:21,909 (MainThread) Inserted 168 Edit_13457736 events in 0.704253 seconds
[17:53:37] 2016-01-25 17:53:21,927 (MainThread) Inserted 4 Edit_13457736 events in 0.017552 seconds
[17:53:37] 2016-01-25 17:53:22,293 (MainThread) Inserted 82 Edit_13457736 events in 0.363897 seconds
[17:53:37] 2016-01-25 17:53:22,327 (MainThread) Inserted 7 Edit_13457736 events in 0.033619 seconds
[17:53:37] 2016-01-25 17:53:22,572 (MainThread) Inserted 51 Edit_13457736 events in 0.243748 seconds
[17:53:38] 2016-01-25 17:53:22,656 (MainThread) Inserted 21 Edit_13457736 events in 0.082975 seconds
[17:53:55] they happen all at once, you can see the log timestamps there
[17:54:00] that's the same schema
[17:54:06] why wouldn't those be inserted in the same statement?
[17:54:17] yes, that's because they have different sets of columns
[17:54:22] OHHH
[17:54:22] right
[17:54:29] some events have optional fields
[17:54:33] ok
[17:55:30] quick question: do you guys have an idea why https://gerrit.wikimedia.org/r/#/c/266264 is pending on Jaime's commit? It was before mine when I ran git review but it doesn't make sense. I can't add reviewers
[17:57:13] Analytics-Kanban: Edit schema needs purging, table is too big for queries to run (500G before conversion) {oryx} [8 pts] - https://phabricator.wikimedia.org/T124676#1962293 (Milimetric)
[17:58:57] Analytics-Kanban: Productionize last access jobs for daily and monthly calculations {bear} - https://phabricator.wikimedia.org/T122514#1962311 (Nuria)
[18:00:40] mforns: why do we batch them by optional fields?
[18:00:45] Analytics-Kanban: Productionize last access jobs for daily calculations {bear} [3 pts] - https://phabricator.wikimedia.org/T122514#1962317 (Milimetric)
[18:00:47] those fields don't get inserted to the db, do they?
[18:01:09] because the insertion statement can not be formulated with different sets of columns
[18:01:17] and there's no way to force an empty value
[18:01:22] empty != null
[18:01:27] I think
[18:01:32] IIRC
[18:01:41] oh, i see, those fields are inserted
[18:01:47] if they are present
[18:01:53] aha
[18:02:07] hmm, you can do that with mysql, if the fields have a default value, no?
[18:02:37] and if the column names for the values are specified in the insert statement
[18:02:47] Analytics-Kanban: Productionize last access jobs for monthly calculations {bear} [? pts] - https://phabricator.wikimedia.org/T124678#1962321 (Milimetric) NEW a:JAllemandou
[18:02:52] ok, gonna make some lunch
[18:02:58] ottomata, you're right, that is a TODO in the code
[18:03:00] Analytics-Kanban: Productionize last access jobs for daily calculations {bear} [8 pts] - https://phabricator.wikimedia.org/T122514#1962333 (Milimetric)
[18:03:18] but we should talk to all schema owners to add default values to their fields
[18:03:19] ...
[18:03:49] ottomata, or else, we could batch by schema AND set of fields
[18:05:32] Analytics-Kanban: Productionize last access jobs for daily calculations {bear} [8 pts] - https://phabricator.wikimedia.org/T122514#1962355 (Nuria) Please take a look at past numbers: https://docs.google.com/spreadsheets/d/1xfDOjEX9WTf0HJQGR-Zt7h1kCQpMm5D91zkcZIK-Edc/edit#gid=1104950651 Also we should remove...
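[Editorial note: to make mforns' explanation concrete — a multi-row INSERT carries one fixed column list, so events that did and did not send an optional field can't share a statement, unless (as ottomata suggests) the columns have defaults and each statement names its columns explicitly. A hedged illustration: the field names are invented, only the Edit_13457736 table name is from the log.]

```bash
mysql -h analytics-store.eqiad.wmnet log <<'SQL'
-- batch 1: events that carried the optional field
INSERT INTO Edit_13457736 (uuid, timestamp, event_action, event_editor)
VALUES ('e1', '20160122120000', 'saveAttempt', 'wikitext'),
       ('e2', '20160122120003', 'saveSuccess', 'wikitext');

-- batch 2: events without it must go separately -- or, with the column
-- declared DEFAULT NULL, they could join batch 1 by passing explicit NULLs
INSERT INTO Edit_13457736 (uuid, timestamp, event_action)
VALUES ('e3', '20160122120005', 'init');
SQL
```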
[18:06:09] Analytics-Kanban: Research and validate assumptions for pageview sanitization with Research Team [8 pts] {mole} - https://phabricator.wikimedia.org/T120640#1962358 (JAllemandou)
[18:06:13] ottomata: we decided to not do that
[18:06:50] Analytics-Kanban: {mole} Data Security and Privacy - https://phabricator.wikimedia.org/T124679#1962361 (Milimetric) NEW
[18:06:53] ottomata: as defaults on db did not translate well to json schema intentions, if
[18:06:58] i remember it right
[18:07:10] Analytics-Kanban: {mole} Data Security - https://phabricator.wikimedia.org/T124679#1962370 (Milimetric) p:Triage>Normal
[18:12:08] that's fine w me, just make all fields DEFAULT NULL in mysql
[18:12:29] they're going to be that anyway in the current case
[18:16:34] Analytics-Kanban: Modify wikimetrics local install to account for recent changes to puppet to get rid of the self-hosted puppet master {dove} [13 pts] - https://phabricator.wikimedia.org/T123749#1962419 (Milimetric)
[18:17:45] mforns: whatcha think? https://gerrit.wikimedia.org/r/#/c/266264/
[18:17:47] looks good to me
[18:18:00] Analytics-Kanban: Modify wikimetrics local install to account for recent changes to puppet to get rid of the self-hosted puppet master {dove} [13 pts] - https://phabricator.wikimedia.org/T123749#1962428 (madhuvishy) a:madhuvishy
[18:19:49] ottomata, will look in 10 mins
[18:20:05] Analytics: Add wm:BE-Wikimedia-Belgium to Wikimetrics tags {dove} - https://phabricator.wikimedia.org/T124492#1962439 (Milimetric) p:Triage>Normal
[18:20:58] Analytics, EventBus: Continuous Integration for mediawiki/event-schemas - https://phabricator.wikimedia.org/T124438#1962443 (Milimetric) p:Triage>Normal
[18:22:16] Analytics, Continuous-Integration-Infrastructure, Patch-For-Review: Add json linting test for schemas in mediawiki/event-schemas - https://phabricator.wikimedia.org/T124319#1962462 (Milimetric) p:Triage>Normal
[18:22:59] Analytics: Make Dashiki get pageview data from pageview API {melc} - https://phabricator.wikimedia.org/T124063#1962467 (Milimetric) p:Triage>Normal
[18:28:29] Analytics, Editing-Analysis, Editing-Department: Consider scrapping Schema:PageContentSaveComplete and Schema:NewEditorEdit, given we have Schema:Edit - https://phabricator.wikimedia.org/T123958#1962490 (Nuria) Actually we do not want to have large catch-it-all schemas but rather distinct schemas per...
[18:29:57] Analytics, Wikipedia-Android-App: Beta Event Logging no longer functional - https://phabricator.wikimedia.org/T123781#1962502 (Milimetric) Open>Resolved a:Milimetric deployment-eventlogging03 is working now.
[18:30:20] Analytics-Kanban, Wikipedia-Android-App: Beta Event Logging no longer functional {oryx} - https://phabricator.wikimedia.org/T123781#1962506 (Milimetric)
[18:31:46] Analytics-Kanban, Wikipedia-Android-App: Beta Event Logging no longer functional {oryx} - https://phabricator.wikimedia.org/T123781#1962510 (Niedzielski) @Milimetric, it looks like there's no `/var/log/eventlogging/all-events.log`.
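[Editorial note: context for the alert patch (266264) discussed above, and the T124204 task it is linked to just below — alerting on a smoothed series rather than the raw one keeps short dips from paging. A hedged sketch of inspecting the transformed metric via Graphite's render API; the URL and the 10-point window are assumptions, though movingAverage is a standard Graphite function and the metric name appears in the task title.]

```bash
curl -sG 'https://graphite.wikimedia.org/render' \
  --data-urlencode 'target=movingAverage(eventlogging.overall.inserted.rate,10)' \
  --data-urlencode 'from=-1h' \
  --data-urlencode 'format=json'
# the check would then threshold these smoothed datapoints, not the raw rate
```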
[18:34:34] Analytics-Kanban: Polish script that checks eventlogging lag to use it for alarming - https://phabricator.wikimedia.org/T124306#1962517 (elukey) https://gerrit.wikimedia.org/r/#/c/266264
[18:35:03] Analytics-Kanban: Tune eventlogging.overall.inserted.rate alert to use a movingAverage transformation - https://phabricator.wikimedia.org/T124204#1962520 (elukey) https://gerrit.wikimedia.org/r/#/c/266264
[18:41:34] Analytics-Kanban: Edit schema needs purging, table is too big for queries to run (500G before conversion) {oryx} [8 pts] - https://phabricator.wikimedia.org/T124676#1962542 (Neil_P._Quinn_WMF) I imagine purging refers to the deletion of old rows? And I imagine you have the tools to do automatic purging alrea...
[18:42:32] Analytics-Kanban: Edit schema needs purging, table is too big for queries to run (500G before conversion) {oryx} [8 pts] - https://phabricator.wikimedia.org/T124676#1962552 (jcrespo) > it's just a matter of us deciding how long we want to retain the logs Exactly that.
[18:45:35] Analytics-Kanban: Edit schema needs purging, table is too big for queries to run (500G before conversion) {oryx} [8 pts] - https://phabricator.wikimedia.org/T124676#1962570 (Nuria) @Neil_P._Quinn_WMF : Correct, for most schemas we purge after 90 days
[18:48:04] mforns: I am now parallelizing inserts for all of the events > than 2016-01-21 16:28 in file all-events.log-20160122
[18:48:12] will let that finish before i make this more seamless
[18:48:35] there are currently 6 mysql backfilling processes happening
[18:48:50] hopefully mysql will be ok :/
[18:48:51] :)
[18:48:55] so far it seems fine
[18:48:59] a-team, I'm off for dinner, back for some time after
[18:49:16] laters!
[18:50:01] I am getting worried, but for some things unrelated :-/
[18:50:31] but if you are happy now, I am happy too
[18:51:29] bye joal !
[18:51:33] jynus: ja let me know if m4 master is unhappy
[18:53:15] a-team: going offline, talk with you tomorrow!
[18:53:30] laters!
[18:55:42] bye elukey !
[18:58:41] jynus: inserts seem pretty fast, is that new or am I just remembering wrong
[18:58:42] ?
[18:59:43] this could be new!
[18:59:56] remember that we did the maintenance for something
[18:59:59] heheh
[19:00:08] tokudb is optimized for faster bulk inserts
[19:00:10] nice
[19:00:11] yeah
[19:00:26] Inserted 3000 PageContentSaveComplete_5588433 events in 2.267616 seconds
[19:00:26] also, 600GB saved means less IO
[19:00:57] also, i have purging disabled for now, that may also be a thing
[19:01:01] oh ja
[19:01:07] might be good to keep that off till we are done backfilling
[19:01:08] but specially that table was converted
[19:01:11] yep
[19:01:28] and the difference in very large tables could be huge
[19:01:38] please tell me if you see something strange
[19:01:48] my worry is that I have seen some stalls
[19:02:09] as far as I can tell everything looks great
[19:02:10] ping me if it freezes at some point
[19:02:18] ok
[19:02:20] I think it was temporary
[19:02:27] but I want to be sure
[19:02:55] toku does a lot of lazy actions, and sometimes it says it has finished but it is doing lots of things in the background
[19:03:08] jynus: insert rate is around 2K+ / sec right now
[19:03:16] not surprised
[19:03:17] i could make it more :D
[19:03:24] can I?
[19:03:39] or is that a good rate to keep it at
[19:03:39] is that INSERTs or rows?
[19:03:42] rows
[19:04:00] we could get 100K/s easily
[19:04:13] awesome, k, they are batched within that differently
[19:04:19] that is overall events per sec inserted
[19:04:20] now that I think
[19:04:24] I have made a mistake
[19:04:28] oh?
[19:04:44] I left it with the old configuration, which means a large innodb buffer
[19:05:07] we are wasting memory there, maybe
[19:05:09] oh, ok. if you want to restart, let me finish with this set of backfilling
[19:05:13] and we can stop consumers so you can restart
[19:05:15] not yet
[19:05:17] k
[19:05:35] maybe giving it a week, do a very quick restart
[19:05:39] ok
[19:05:50] I want to evaluate how it is going
[19:06:03] but please share the performance improvement already :-)
[19:08:10] halfak: So I'm thinking about the TSV your script generated of images on pages, and I'm wondering if it would be better to have that data in a SQL table on stat1003 so I could query it instead of trying to write a secondary TSV-handling script to make it into data that would work for what I want
[19:08:33] I'm curious about your thoughts, I've never created a table on stat1003 and I'm not sure what that process entails or if it would be too much overhead
[19:09:08] "Table 'log.MobileWabClickTracking_5929948' doesn't exist" interesting
[19:09:11] MarkTraceur, +1. I made the TSV format natively usable by MySQL
[19:09:48] * halfak should really write the "so you want to load a TSV into a table on analytics-store" page of Wikitech
[19:09:58] Yes, yes you should
[19:10:05] I figure it will require permissions I don't yet have
[19:10:20] Which is fine, but it means a request for said permissions probably (or maybe this is that?)
[19:11:53] MarkTraceur, halfak http://dbahire.com/testing-the-fastest-way-to-import-a-table-into-mysql-and-some-interesting-5-7-performance-results/
[19:12:07] heh
[19:12:10] that's strnage
[19:12:12] strange
[19:12:37] so, I'd use mysqlimport rather than a custom python script.
[19:12:48] Yeah, that's fine with me
[19:13:07] Probably negligible difference in performance given it's stat1003, but I'll be nice™ about it
[19:13:07] So, you don't want to use the commands that they have here
[19:13:09] which is the right way, as I mentioned there
[19:13:14] halfak: http://blogs.microsoft.com/next/2016/01/25/microsoft-releases-cntk-its-open-source-deep-learning-toolkit-on-github/ btw (apropos of nothing)
[19:13:21] You can't do "load data infile"
[19:13:43] You'll need to use the `mysqlimport` utility with `--local` set
[19:13:53] which is the same thing
[19:13:57] "same"
[19:14:31] halfak: Sidenote, obviously stat1003 should alias minnesota-nice to nice -n 19
[19:15:52] LOAD DATA [LOCAL] INFILE
[19:15:52] halfak: Anyway, if you have any documentation about what permissions I need and/or what I need to do to create a table that's specific to stat1003 then I can follow it whenever
[19:17:02] * halfak lets jynus take this one
[19:17:41] halfak, https://github.com/mysql/mysql-server/blob/5.7/client/mysqlimport.c#L224
[19:17:42] (CR) Mforns: [C: -1] Expand to run on all wikis (1 comment) [analytics/limn-multimedia-data] - https://gerrit.wikimedia.org/r/266240 (owner: MarkTraceur)
[19:18:12] do not try to teach mysql to the dba --Abraham Lincoln
[19:18:18] hahaha
[19:18:54] * ori adds to tools.wmflabs.org/bash
[19:19:14] heheh
[19:20:00] jynus, would you like me to continue to help getting this TSV loaded or would you like to?
[19:21:08] I can help
[19:21:42] (CR) MarkTraceur: Expand to run on all wikis (1 comment) [analytics/limn-multimedia-data] - https://gerrit.wikimedia.org/r/266240 (owner: MarkTraceur)
[19:22:06] jynus: Are you going to be around for at least another hour? I'm going to drive to a coffee shop real quick
[19:22:18] yes
[19:22:44] I do it all the time for labs users, I do not see why not for analytics/research
[19:33:15] nuria: joining meeting?
[19:33:21] madhuvishy: yesss
[19:35:05] heading to post office, back shortly
[19:35:24] hmm, actually, no i'll wait to start the next backfilling process
[19:38:10] Analytics, DBA: Set up bucketization of editCount fields {tick} - https://phabricator.wikimedia.org/T108856#1962995 (jcrespo)
[19:38:12] Analytics-EventLogging, DBA: db1046 innodb signal 6 abort and restart - https://phabricator.wikimedia.org/T104748#1962991 (jcrespo) Open>Resolved Most of the issues if not all have been fixed with T120187 or are now irrelevant.
[19:43:58] Analytics-Kanban, DBA, Patch-For-Review: Pending maintenance on the eventlogging databases (db1046, db1047, dbstore1002, other dbstores) - https://phabricator.wikimedia.org/T120187#1963036 (jcrespo) So, some of the pending tasks to discuss/solve: * With the new application configuration (almost all ta...
[19:50:01] OK jynus, let's do this thing
[19:51:46] jynus: Currently I have a TSV at /home/marktraceur/illustrations/historical-illustration-enwiki.2015-09-09.tsv that is Big™
[19:52:28] jynus: I can import it into wherever, but I'd like to do this for other wikis too, so either a single database with tables for each wiki (meh) or adding tables to each wiki's database? I'm not sure which is better or more in line with what we already do
[19:54:42] Hm, having it in the wiki's database might be best, because I'm going to need to join on the wiki's revisions table probably...the data isn't complete, I'll need context for it
[19:57:10] hey jynus
[19:57:14] just thought of something
[19:57:30] i think the eventlogging_sync script will not do replication backfill properly
[19:57:31] will it?
[19:57:45] it gets the max timestamp from a table
[19:57:47] in the slave
[19:57:55] and then gets data greater than that timestamp in the master
[20:02:13] yep
[20:03:04] MarkTraceur, import it where?
[20:03:30] jynus: I don't know, either a table on the enwiki database on stat1003 or a new database there
[20:03:44] no, wiki databases do not get touched
[20:03:51] it has to be on a separate database
[20:03:53] jynus: Really I just need it to be in mysql so I can join it with enwiki.revisions and figure out some context
[20:04:03] you can still do joins between databases, no worries
[20:04:07] OK cool
[20:04:19] this is mysql, not poor postgres
[20:04:20] I'm not the DBA, and I didn't presume to teach the DBA how to use mysql ;)
[20:04:22] jynus: but there are already timestamps from today
[20:04:45] jynus: So a new database, fine, but will I have access to stick more data in it?
[20:05:06] well, I would be the one providing that access, so you better talk to me
[20:05:13] :-)
[20:05:19] I assume the relevant access group is "statistics-users" but I'm not sure if that's correct
[20:05:55] is it stat1003 that you use?
[20:06:15] ottomata, what about filling them using primary keys?
[20:06:25] will that work?
[20:06:48] Yeah, I'm using stat1003 currently for testing and writing temporary scripts
[20:06:58] And I belieeeeeve that reportupdater is running there too
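[Editorial note: for the import being set up here, jynus' advice boils down to: name the TSV after the target table and let mysqlimport stream it from the client side. A hedged sketch — the --local flag and the missing FILE privilege are explained later in the log; the file and defaults-file names are placeholders.]

```bash
# mysqlimport loads a file into the table named like the file
# (enwiki.tsv -> table `enwiki`); --local reads the TSV client-side,
# since the research user has no FILE privilege on the server
nice mysqlimport --defaults-file="$HOME/.my.cnf.research" --local \
  --host=analytics-store.eqiad.wmnet \
  --fields-terminated-by='\t' \
  project_illustrations "$HOME/illustrations/enwiki.tsv"
```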
[20:07:10] MarkTraceur: that is the proper group if you want stat1003 access
[20:07:12] I'd probably be using reportupdater to generate results from my saved data
[20:07:20] jynus: using primary keys?
[20:07:22] I already have stat1003 access, but I think it's researcher access
[20:07:28] but?
[20:07:37] you mean you do not have mysql access?
[20:07:57] Oh, I have both.
[20:08:04] MarkTraceur: jynus, yeah, researchers lets you read a password file
[20:08:07] on stat1003
[20:08:47] so you only need a database to be created and access for the research account to that db, right?
[20:09:45] Right!
[20:09:46] We generally use the "staging" database for everything because of some old policy against creating per-user databases.
[20:10:06] that is actually bad
[20:10:09] jynus, I wonder what you think of allowing per-user databases again?
[20:10:13] Yeah. Totally bad. :)
[20:10:20] look
[20:10:30] for now, I can stick to that
[20:10:36] * halfak misses the old days when he didn't need to prefix his table names with "halfak" because they were in the "halfak" database.
[20:10:48] but soon, there should be individual accounts
[20:10:51] \o/
[20:11:09] and at least db per account
[20:11:09] \o/
[20:11:18] I said there should
[20:11:25] I do not know when or how
[20:11:55] the question is that sharing accounts is more dangerous than having just one
[20:12:37] I'm not thinking of a user-specific database, I was thinking it should be separate because there will be a table for each wiki (eventually) (probably)
[20:12:39] we need to integrate LDAP for that, so do not get too excited
[20:12:56] It's not my own, it's useful data about the pages that doesn't exist currently
[20:13:07] let me see the current databases and grants
[20:13:14] * halfak gets back to his model evaluation statistics.
[20:13:45] also, I need to blame someone when queries go crazy
[20:15:22] I try to blame Ironholds or RoanKattouw, whoever's not in the room at the time
[20:15:22] And neither of them are currently, so we can blame both
[20:15:22] so, technically you have permissions already for staging, and datasets
[20:15:28] should this be under a different name?
[20:17:02] jynus: I would suggest putting it somewhere else, it'll be substantial, and there will be a lot of tables
[20:17:11] I don't want to clutter either of those unnecessarily
[20:17:14] But it's up to you
[20:17:19] I'm ok, databases are free
[20:19:25] what is a good name for this?
[20:19:51] jynus: I've been calling it "illustrations" but I'm open to other ideas
[20:19:55] halfak: Thoughts?
[20:20:32] Considering a database name for the project?
[20:20:36] Yeah
[20:21:11] Oh. Uhhh. Not sure what a good name would be. We should probably have a prefix on it that will show up in sorting
[20:21:22] E.g. "project_..." or maybe "_..."
[20:21:39] So "_illustrations."
[20:22:02] We've not done per-project databases AFAIK, so I'm making things up that seem good.
[20:22:33] I like project_ better, I think
[20:22:54] project_illustrations
[20:23:22] well, whatever that doesn't contain wik in the name
[20:24:10] Heh, yeah
[20:24:23] utf8 or will it contain wiki data? (wikis are stored in binary)
[20:27:46] Hm
[20:27:54] jynus: utf8, I'm only storing revision data really
[20:28:19] Well, maybe page names
[20:36:52] well, you can later decide the collation of your tables yourself
[20:37:32] OK then
[20:37:48] you should have access already, I am only puppetizing it
[20:39:20] please note that if later we setup more databases, we may have to rename it to a proper pattern, etc.
[20:39:58] (CR) Mforns: [C: -1] Fix cross-wiki upload script (1 comment) [analytics/limn-multimedia-data] - https://gerrit.wikimedia.org/r/266224 (owner: MarkTraceur)
[20:40:14] jynus: Well, that's fine
[20:42:51] (CR) Mforns: Expand to run on all wikis (1 comment) [analytics/limn-multimedia-data] - https://gerrit.wikimedia.org/r/266240 (owner: MarkTraceur)
[20:43:14] a-team AGH!
[20:43:16] i found it
[20:43:20] why offsets were not respected
[20:43:26] offsets.retention.minutes
[20:43:30] o.O
[20:43:32] Log retention window in minutes for offsets topic
[20:43:36] default
[20:43:36] 1440
[20:43:39] 24 hours
[20:43:39] :(
[20:43:44] to the puppet for a fix!
[20:43:46] aha
[20:43:52] why the crap would anyone want that?! :p
[20:45:10] (CR) MarkTraceur: Fix cross-wiki upload script (1 comment) [analytics/limn-multimedia-data] - https://gerrit.wikimedia.org/r/266224 (owner: MarkTraceur)
[20:48:30] (PS2) MarkTraceur: Expand to run on all wikis [analytics/limn-multimedia-data] - https://gerrit.wikimedia.org/r/266240
[20:48:38] (CR) MarkTraceur: Expand to run on all wikis (1 comment) [analytics/limn-multimedia-data] - https://gerrit.wikimedia.org/r/266240 (owner: MarkTraceur)
[20:49:02] (CR) MarkTraceur: "Also fixed cross-wiki-uploads because no wiki that is not Commons will have any numbers for it." [analytics/limn-multimedia-data] - https://gerrit.wikimedia.org/r/266240 (owner: MarkTraceur)
[20:51:02] (CR) Mforns: [C: -1] Expand to run on all wikis (1 comment) [analytics/limn-multimedia-data] - https://gerrit.wikimedia.org/r/266240 (owner: MarkTraceur)
[20:51:13] can you check that it works, MarkTraceur ?
[20:51:39] (PS3) MarkTraceur: Expand to run on all wikis [analytics/limn-multimedia-data] - https://gerrit.wikimedia.org/r/266240
[20:51:45] mforns: OK it's been a long day.
[20:51:49] jynus: Yeah, sec
[20:51:51] hehe
[20:52:17] hmm, actually not sure, because that is not an 0.8.2 setting, so i'm still confused but this seems related
[20:52:28] MarkTraceur, did you want to add something to https://gerrit.wikimedia.org/r/#/c/266224/ ? Otherwise I'll merge it
[20:52:30] jynus: Apparently research user doesn't have access, which is maybe expected
[20:52:56] mforns: Maybe merge it after the patch that takes enwiki and dewiki out of the cwu config
[20:53:09] Not that it matters a lot
[20:53:35] hm, no it is in 0.8.2, just not documented?
[20:53:38] ok
[20:53:43] which server are you connecting to, MarkTraceur ?
[20:54:01] or what are you writing on console?
[20:54:09] jynus: Uhhh, analytics-store.eqiad.wmnet
[20:54:17] mysql --defaults-file=/home/marktraceur/.my.cnf.relevant project_illustrations --host=analytics-store.eqiad.wmnet -A
[20:54:29] I probably got that command, like, two years ago honestly
[20:54:45] (CR) Mforns: [C: 2 V: 2] "LGTM!" [analytics/limn-multimedia-data] - https://gerrit.wikimedia.org/r/266240 (owner: MarkTraceur)
[20:54:48] project_illustration without s
[20:54:53] Ohhh christ
[20:55:00] OK, better
[20:55:08] works?
[20:55:10] Yup!
[20:55:21] jynus: So now create table enwiki and import them thar TSV, right?
[20:55:30] then resolve https://phabricator.wikimedia.org/T124705
[20:55:41] I get paid $5000 per ticket resolved
[20:55:48] Sounds right
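[Editorial note: ottomata's "to the puppet for a fix!" above presumably amounts to a one-line broker setting plus a restart. A hedged sketch — the file path and the one-week value are assumptions; only the 1440-minute default is from the log.]

```bash
# offsets.retention.minutes defaults to 1440 (24h); committed consumer
# offsets older than that are discarded, which is why the weekend-stopped
# mysql consumers woke up at the latest offset instead of their own
sudo tee -a /etc/kafka/server.properties <<'EOF'
offsets.retention.minutes=10080
EOF
sudo service kafka restart   # the setting is read at broker startup
```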
[20:55:54] (PS2) Mforns: Fix cross-wiki upload script [analytics/limn-multimedia-data] - https://gerrit.wikimedia.org/r/266224 (owner: MarkTraceur)
[20:55:57] mforns: btw, i'm inserting around 3000 events / second :o
[20:56:10] ottomata, O.o
[20:56:13] wow
[20:56:13] I should clarify that is a joke, just in case someone else is in the channel
[20:56:14] Fixed a li'l typo too
[20:56:35] {{done}}
[20:56:53] The cash should be in your offshore account in a few minutes
[20:56:54] ah yes
[20:57:25] so, you should now be able to create table and LOAD data... I mean, mysqlimport
[20:57:26] (CR) Mforns: [C: 2 V: 2] "LGTM!" [analytics/limn-multimedia-data] - https://gerrit.wikimedia.org/r/266224 (owner: MarkTraceur)
[20:57:33] Right
[20:57:38] right, or do you need help with that?
[20:57:44] :-P
[20:57:45] No, I should be able to do it
[20:57:54] See: "should"
[20:58:00] But I'll figure it out one way or the other
[20:58:10] haha, ok, for the time being I am going to talk to other "clients"
[20:58:20] Fair enough, cheers
[20:58:21] ask for help if you cannot
[21:00:56] (CR) Mforns: "LGTM! Please, rebase and I'll merge that." [analytics/limn-multimedia-data] - https://gerrit.wikimedia.org/r/265757 (owner: MarkTraceur)
[21:01:02] milimetric: will you join? :-)
[21:01:13] leila: trying - getting an error
[21:02:31] MarkTraceur, the last patch is OK. But I can't rebase it from gerrit. Please, if you rebase it, I'll merge
[21:02:39] Sure sure
[21:03:36] Analytics, Reading-Web, Wikipedia-iOS-App-Product-Backlog: As an end-user I shouldn't see non-articles in the list of trending articles - https://phabricator.wikimedia.org/T124082#1963460 (BGerstle-WMF) another thing that's not obvious is how clients show pageview data across time zones. will need to...
[21:04:00] jynus: lemme know when you want to talk about how we gonna get this data backfilled on slave
[21:04:18] talk to me
[21:04:20] (PS2) MarkTraceur: Add deletions script [analytics/limn-multimedia-data] - https://gerrit.wikimedia.org/r/265757
[21:04:30] mforns: Also fixed that so it's all wikis instead of just the three as before
[21:04:36] jynus: so ja uh, i assume, if the eventlogging_sync.sh script is only looking for max timestamp on slave
[21:04:41] that it is only selecting new data now
[21:04:46] So that one will need to generate the commons numbers, and enwiki and dewiki, but otherwise it's fine
[21:04:47] ottomata, there is a script for backfilling
[21:04:51] because the max timestamp is greater than the data that is being inserted now
[21:04:52] oh?
[21:05:28] (CR) Mforns: [C: 2 V: 2] "LGTM!" [analytics/limn-multimedia-data] - https://gerrit.wikimedia.org/r/265757 (owner: MarkTraceur)
[21:06:26] https://github.com/wikimedia/operations-software/blob/master/dbtools/eventlogging_resync.sh
[21:06:37] probably it is as bad as the other one
[21:07:31] a-team, I'm leaving for today! good night everyone!
[21:08:27] looking
[21:08:29] laters mforns!
[21:15:02] ahh ok jynus, i see
[21:15:15] it just tries a reinsert of everything in a given time range
[21:15:22] and passes over duplicates with --insert-ignore
[21:15:23] ja?
[21:18:18] no, only those ranges with fewer rows than on the master
[21:18:43] if [ $mcount -ne $scount ]; then
[21:20:03] since="$2"
[21:20:24] that is probably a yes :-)
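[Editorial note: pulling jynus' description and the quoted fragments together, the resync script's core idea is roughly the following — re-copy a time range from the master only when its row count differs from the slave's, with --insert-ignore making the copy idempotent. A hedged sketch, not the real eventlogging_resync.sh; table name and range are placeholders.]

```bash
table='Edit_13457736'; since='20160121000000'; until='20160122000000'
q="SELECT COUNT(*) FROM log.$table WHERE timestamp >= $since AND timestamp < $until"

mcount=$(mysql -h m4-master.eqiad.wmnet -N -e "$q")
scount=$(mysql -h analytics-store.eqiad.wmnet -N -e "$q")

if [ "$mcount" -ne "$scount" ]; then
  # rows the slave already has are skipped thanks to INSERT IGNORE
  mysqldump -h m4-master.eqiad.wmnet --insert-ignore --no-create-info \
    --where="timestamp >= $since AND timestamp < $until" log "$table" |
    mysql -h analytics-store.eqiad.wmnet log
fi
```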
[21:21:10] so the idea would be running it since thursday or so
[21:21:41] although m2-master has to change to m4-master
[21:21:56] and other small tweaks
[21:26:36] ah yeah
[21:26:45] ok
[21:26:54] well, i guess we can start that somewhere when this backfill is done, maybe some more hours
[21:27:16] I can run that on iron
[21:27:20] tomorrow
[21:27:23] ok
[21:27:46] jynus: Does research have access to import a file into a table? I created the table fine but for some reason it tells me permission denied when running mysqlimport
[21:27:58] --local
[21:28:00] MarkTraceur, ^
[21:28:10] Otherwise it's trying to load the file on the DB server itself.
[21:28:10] Ugh
[21:28:28] Got it
[21:28:35] :)
[21:28:42] Running!!!
[21:28:55] Now I need to make the python script stick the data into the SQL table instead of a TSV
[21:29:07] yes, you have no FILE permission to import from the server filesystem
[21:29:10] MarkTraceur, wait... what?
[21:29:15] Why write directly to MySQL?
[21:29:30] halfak: Well, I don't need the TSVs, all of my analysis will be derived from the SQL tables
[21:29:47] Oh! I'd still recommend writing to a file rather than directly to the DB.
[21:29:54] Hm, care to elaborate?
[21:29:57] And then importing when ready.
[21:30:08] It seems odd to have so much data sitting on the filesystem if I can avoid it
[21:30:19] Batch inserts will be faster. There's less likelihood that you'll accidentally write bad data or overwrite good data in the DB.
[21:30:23] Ah, hm.
[21:30:33] We have some big discs.
[21:30:39] Can compress if it is an issue.
[21:31:05] Sure
[21:31:13] And I mean, I can also remove the old files after they're imported
[21:31:47] well, I agree with 2 things there
[21:31:54] batch importing will be faster
[21:32:10] and analytics slave user tables are not backed up
[21:32:42] Oh, shoot, I didn't nice the mysqlimport, foolish
[21:32:56] MarkTraceur, won't matter. That doesn't use much CPU
[21:33:09] All right!
[21:33:14] Yay not messing up too badly
[21:33:19] Really only need nice when you are running a giant CPU-heavy process.
[21:33:37] Hence why it's in my history for image_diff.py
[21:35:31] also try using db compression for large tables (TokuDB)
[21:37:21] jynus: I hope I can do that after the fact... :)
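[Editorial note: on jynus' compression tip just above — it can indeed be done after the fact, since ALTER TABLE rebuilds the table onto the new engine. A hedged one-liner; the zlib row format is an assumption about a sensible choice, not the DBA's actual recommendation.]

```bash
mysql -h analytics-store.eqiad.wmnet project_illustrations -e "
  ALTER TABLE enwiki ENGINE=TokuDB ROW_FORMAT=TOKUDB_ZLIB;"
# rewrites the whole table, so best run once the bulk import has finished
```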
[21:38:24] ottomata: just reading about offsets.retention.minutes
[21:38:29] aye
[21:39:26] ottomata: I love it when things make sense
[21:40:52] (PS2) Yuvipanda: Don't die when downloading a file named with unicode characters [analytics/quarry/web] - https://gerrit.wikimedia.org/r/266023 (https://phabricator.wikimedia.org/T123031) (owner: Alex Monk)
[21:41:07] (CR) Yuvipanda: [C: 2] "<3" [analytics/quarry/web] - https://gerrit.wikimedia.org/r/266023 (https://phabricator.wikimedia.org/T123031) (owner: Alex Monk)
[21:43:39] Analytics-Kanban: Edit schema needs purging, table is too big for queries to run (500G before conversion) {oryx} [8 pts] - https://phabricator.wikimedia.org/T124676#1963600 (Nuria) Please see: https://phabricator.wikimedia.org/T108850
[21:44:38] (Merged) jenkins-bot: Don't die when downloading a file named with unicode characters [analytics/quarry/web] - https://gerrit.wikimedia.org/r/266023 (https://phabricator.wikimedia.org/T123031) (owner: Alex Monk)
[21:45:17] Analytics-Kanban, DBA, Patch-For-Review: Pending maintenance on the eventlogging databases (db1046, db1047, dbstore1002, other dbstores) - https://phabricator.wikimedia.org/T120187#1963604 (Nuria) @jcrespo: >Let's keep purging conversation to purging ticket: https://phabricator.wikimedia.org/T108850...
[21:46:32] Analytics-Kanban, DBA, Patch-For-Review: Pending maintenance on the eventlogging databases (db1046, db1047, dbstore1002, other dbstores) - https://phabricator.wikimedia.org/T120187#1963609 (jcrespo) Not for the second. Maybe for the first, but it is not yet decided- I am coordinating with Otto.
[21:49:31] (CR) Yuvipanda: [C: 2] When listing queries for user profiles, fall back to 'Untitled query #' if no title is set [analytics/quarry/web] - https://gerrit.wikimedia.org/r/266031 (https://phabricator.wikimedia.org/T101394) (owner: Alex Monk)
[21:49:55] Krenair: I've merged your changes. let me know when you are around so I can help you dpeloy :)
[21:49:57] *deploy
[21:50:32] I'm around now
[21:51:11] Krenair: ok, wanna deploy?
[21:51:36] yep, what's the process?
[21:51:50] Krenair: on your local checkout
[21:51:53] just do
[21:51:55] 'fab update_git'
[21:51:55] Analytics-EventLogging, MediaWiki-API, Easy, Google-Code-In-2015, Patch-For-Review: ApiJsonSchema implements ApiBase::getCustomPrinter for no good reason - https://phabricator.wikimedia.org/T91454#1963616 (greg) Update on wmf.11 rollout: We had to rollback to wmf.10 last week (ie: right now all...
[21:52:00] if that doesn't work you need to have fabric installed
[21:52:04] Analytics-Kanban, DBA, Patch-For-Review: Pending maintenance on the eventlogging databases (db1046, db1047, dbstore1002, other dbstores) - https://phabricator.wikimedia.org/T120187#1963617 (Ottomata) If we just have to do an easy peasy restart of mysql to apply a config change, I think we do not need t...
[21:52:56] Krenair: this assumes you've fabric installed
[21:53:05] pip install fabric or equivalent with your package manager
[21:53:06] yeah, installing now
[21:53:20] The program 'fab' is currently not installed. You can install it by typing:
[21:53:20] sudo apt-get install fabric
[21:53:21] :)
[21:53:31] ah
[21:53:34] the ubuntu niceness
[21:54:18] that program is still banned from labs I assume?
[21:54:29] so I ran update_git?
[21:54:43] well, 'banned'
[21:54:45] s/?//
[21:54:53] Krenair: yeah
[21:55:05] Krenair: yeah, so that just did a git pull equivalent on all 3 nodes
[21:55:12] 'main' which has mysql, redis and webservers
[21:55:15] and two runners that just run celery
[21:55:22] now since this is just the webserver changes
[21:55:27] you don't need to restart celery
[21:55:30] so you can now just do
[21:55:33] fab restart_uwsgi
[21:55:37] and that'll restart the webserver
[21:55:43] and should show you your changes
[21:56:08] (PS3) Alex Monk: When listing queries for user profiles, fall back to 'Untitled query #' if no title is set [analytics/quarry/web] - https://gerrit.wikimedia.org/r/266031 (https://phabricator.wikimedia.org/T101394)
[21:56:30] Analytics-Kanban, Editing-Analysis: Edit schema needs purging, table is too big for queries to run (500G before conversion) {oryx} [8 pts] - https://phabricator.wikimedia.org/T124676#1963632 (Neil_P._Quinn_WMF)
[21:56:31] realised that patch didn't go in
[21:56:46] (CR) Alex Monk: [C: 2] "re-applying yuvi's +2" [analytics/quarry/web] - https://gerrit.wikimedia.org/r/266031 (https://phabricator.wikimedia.org/T101394) (owner: Alex Monk)
[21:57:00] oh, needed +2
[21:57:02] err
[21:57:04] needed rebase
[21:57:07] cool didn't notice
[21:57:36] (Merged) jenkins-bot: When listing queries for user profiles, fall back to 'Untitled query #' if no title is set [analytics/quarry/web] - https://gerrit.wikimedia.org/r/266031 (https://phabricator.wikimedia.org/T101394) (owner: Alex Monk)
[21:58:54] okay
[22:02:21] Quarry, Patch-For-Review: Cannot download data from a query with Unicode characters in its title - https://phabricator.wikimedia.org/T123031#1963675 (Krenair) Open>Resolved
[22:04:56] Quarry, Patch-For-Review: Query counter increases but draft query is not accessible when window is closed and query doesn't have a title - https://phabricator.wikimedia.org/T101394#1963690 (Krenair) Open>Resolved
[22:06:35] Quarry: Number of queries shown in profile is wrong - https://phabricator.wikimedia.org/T86512#1963694 (Krenair) Sounds like T101394 to me - for some reason sometimes 'None' was shown, sometimes an empty string (which meant you couldn't use it). It no longer shows empty strings.
[22:06:44] YuviPanda, right, thanks
[22:07:16] Krenair: np! thanks for all the patches :D
[22:07:43] there should be more to come :P
[22:07:45] but when I have time
[22:07:48] like maybe at the weekend
[22:10:54] YuviPanda: in office?
[22:11:03] madhuvishy: thinking of coming
[22:11:06] Krenair: thanks!
[22:11:11] madhuvishy: but probably not.
[22:11:15] okay :)
[22:11:26] madhuvishy: anything specific?
[22:12:13] YuviPanda: we were talking about setting up a small version of labs db in a docker container for wikimetrics
[22:12:28] and Dan was saying we can just use sqlalchemy to create the schemas
[22:12:53] madhuvishy: schemas of what?
[22:13:21] YuviPanda: the mediawiki table schemas are already in models - so we can just create the tables from that
[22:13:39] oh
[22:13:43] for wikimetrics at least - and maybe use the test setup fixtures to populate test data
[22:13:44] in your wikimetrics models you mean?
[22:13:46] yeah
[22:13:49] yeah
[22:13:49] you can probably do that sure
[22:14:16] that would probably be the shortest path to getting this done without investing a lot of time
[22:14:32] yeah
[22:15:05] in general if we want a small updatable local copy of labsdb i am happy to work on it - but maybe outside of this
[22:16:39] madhuvishy: yeah
[22:17:41] YuviPanda: okay cool.
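Dan's suggestion — reusing the SQLAlchemy model declarations to stand up empty mediawiki-like tables — needs very little code, since declarative models carry their own DDL. A minimal sketch, with a stand-in model and an assumed connection URL rather than wikimetrics' real ones:

```python
# Minimal sketch: emit CREATE TABLE for every declared model.
# `Revision` is a stand-in for wikimetrics' real mediawiki models; the
# essential call is Base.metadata.create_all().
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class Revision(Base):
    __tablename__ = 'revision'
    rev_id = Column(Integer, primary_key=True)
    rev_comment = Column(String(255))

# Point this at the docker container's MySQL (sqlite here for a quick test).
engine = create_engine('sqlite:///wiki_local.db')  # assumed URL
Base.metadata.create_all(engine)  # creates all tables known to Base.metadata
```

Populating those tables could then reuse the test fixtures, as madhuvishy suggests.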
also - I need to turn on AllowOverride for the htaccess mechanism for the static sites role to work - wasn't sure if it should be AllowOverride all
[22:18:37] madhuvishy: do you already have the .htaccess file you need
[22:18:43] if so just allowoverride the things it uses?
[22:18:48] YuviPanda: no
[22:18:51] hmmm
[22:19:39] well that would imply anyone adding an .htaccess file to a dashboard will have to first get a puppet patch for AllowOverride done
[22:20:01] maybe we should figure out a few things to turn on by default
[22:23:11] madhuvishy: yeah, that's what I meant.
[22:23:20] madhuvishy: figure out what things you will use in .htaccess and turn them on
[22:23:26] and if someone else needs to use different things
[22:23:27] YuviPanda: yeah alright
[22:23:28] they can too
[22:23:35] i don't want to do AllowOverride all either
[22:23:46] cool
[22:35:26] milimetric: around?
[22:36:20] I'm having trouble starting vagrant to figure this out - but in the local setup for wikimetrics - is "wiki" the only available project?
[22:38:03] hey madhuvishy
[22:38:20] hey :)
[22:38:24] wiki and wiki2 are used for "dev"
[22:38:36] wiki_testing and wiki2_testing are used for running tests
[22:38:38] where is this controlled?
[22:40:43] milimetric: ^
[22:41:55] I will never know all the things this codebase has/does
[23:12:33] madhuvishy: sorry! was hanging out with nathaniel working on some code
[23:12:40] milimetric: np :)
[23:12:42] what do you mean by "controlled"?
[23:12:53] milimetric: as in configured
[23:13:26] oh! um... I think it's just convention, there's nothing stopping people from making more dbs
[23:13:40] for testing it's set up in fixtures, one sec
[23:13:57] https://github.com/wikimedia/analytics-wikimetrics/blob/master/tests/fixtures.py#L34
[23:14:23] milimetric: yeah i saw that - was wondering where it came from for dev - maybe puppet?
[23:14:28] for "dev" I just created wiki and wiki2 on my local mysql and I use both of those when creating cohorts (I just paste them in there)
[23:14:48] uh... oh! there's that list that auto-completes, right!
[23:14:50] I forgot about that
[23:14:58] let's see where that comes from
[23:15:34] milimetric: hmmm what happened in the vagrant setup? that's where i'm lost
[23:16:06] https://www.irccloud.com/pastebin/5JTC9Imb/
[23:17:02] right, that's for testing... but for "local" mode it has wiki, wiki2, enwiki, dewiki or something, right?
[23:17:06] and it goes to the client here: https://github.com/wikimedia/analytics-wikimetrics/blob/master/wikimetrics/templates/forms/csv_upload.html#L38
[23:17:29] so that comes from here: https://github.com/wikimedia/analytics-wikimetrics/blob/master/wikimetrics/controllers/cohorts.py#L231
[23:18:14] which of course is this: https://github.com/wikimedia/analytics-wikimetrics/blob/master/wikimetrics/database.py#L267
[23:18:49] which is of course just hard coded: https://github.com/wikimedia/analytics-wikimetrics/blob/31baafc6ab2b37a72e4009a2cba95a9c6b55bb3a/wikimetrics/configurables.py#L45
[23:19:26] milimetric: whaa
[23:19:29] okay...
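The chain milimetric traces above (template → cohorts controller → database.py → configurables.py) bottoms out in a hard-coded project-host mapping. Roughly, the shape is something like the sketch below — simplified and partly invented here, so treat the linked configurables.py as the authoritative version:

```python
# Simplified, hypothetical sketch of how a hard-coded dev project map can
# feed an autocompleting project list; names are illustrative, see the
# linked configurables.py and cohorts.py for the real wikimetrics code.
DEV_PROJECT_HOST_MAP = {
    'wiki': 'localhost',
    'wiki2': 'localhost',
}

def get_project_host_map():
    # In 'dev' mode the map is just hard-coded; production would instead
    # build it from the live list of wiki databases.
    return DEV_PROJECT_HOST_MAP

def cohort_upload_context():  # stand-in for the cohort upload controller
    # The template's autocomplete sees exactly these keys, which is why
    # only a fixed handful of projects shows up in a local setup.
    return {'projects': sorted(get_project_host_map())}
```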
[23:19:41] whaa like confusing or whaa like wtf did you do this for :)
[23:19:45] so setting DEBUG=true in our staging
[23:19:50] is wrong
[23:19:53] because
[23:19:57] you can connect to labsdb
[23:20:07] it's usually not set to true
[23:20:27] there are multiple debugs though, the "DB" one is usually set to false in staging
[23:20:31] milimetric: i thought it just meant it would set up test dbs
[23:20:36] i have no idea
[23:20:53] no, test is a separate thing, it usually refers to the unit / integration test suite
[23:21:06] which sets up its own project host map
[23:21:09] ooh, brb, gotta go
[23:21:18] milimetric: np - can chat more tomorrow
[23:21:27] cya :)
[23:31:04] madhuvishy: need more help with wikimetrics?
[23:31:48] nuria: i'm trying to figure out what parts of fixtures to run to populate the dbs with test data
[23:44:28] also nuria, https://github.com/wikimedia/analytics-wikimetrics/blob/03ecb0ee0a32b191a660288e4e770646f9ae125e/tests/fixtures.py#L654 why does setUp need to call tearDown?
[23:45:07] does it delete things from before?
[23:49:05] Analytics-Kanban, DBA, Patch-For-Review: Pending maintenance on the eventlogging databases (db1046, db1047, dbstore1002, other dbstores) - https://phabricator.wikimedia.org/T120187#1964165 (Ottomata) Backfilling on m4-master done. @jcrespo, feel free to start the slave resync script as soon as you get...
[23:59:11] madhuvishy: back
[23:59:29] madhuvishy: yes, setUp assumes a 100% clean db
[23:59:36] madhuvishy: thus every piece of data is scrapped
[23:59:40] nuria: hmmm
[23:59:55] madhuvishy: but let me understand, why are we trying to "freeze" some data?
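nuria's answer — setUp assumes a 100% clean database, so everything is scrapped first — is a common fixture pattern: having setUp call tearDown means a previous run that crashed before cleanup can't leak rows into the next one. A toy illustration of the pattern (sqlite file, table, and fixture data are all invented here; the real logic is in the linked fixtures.py):

```python
# Sketch of the setUp-calls-tearDown pattern discussed above.
import sqlite3
import unittest

class DatabaseTest(unittest.TestCase):
    DB_PATH = '/tmp/wikimetrics_test.db'  # persistent between runs, unlike :memory:

    def setUp(self):
        self.db = sqlite3.connect(self.DB_PATH)
        self.db.execute('CREATE TABLE IF NOT EXISTS cohorts (id INTEGER)')
        # Assume the db is dirty (a previous run may have died before
        # cleanup) and scrap everything before building fixtures.
        self.tearDown()
        self.db.execute('INSERT INTO cohorts VALUES (1)')  # known fixture
        self.db.commit()

    def tearDown(self):
        # Unconditional delete keeps this safe to call from setUp too.
        self.db.execute('DELETE FROM cohorts')
        self.db.commit()

    def test_fixture_present(self):
        n = self.db.execute('SELECT COUNT(*) FROM cohorts').fetchone()[0]
        self.assertEqual(n, 1)

if __name__ == '__main__':
    unittest.main()
```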