[00:00:10] 10Analytics, 10Analytics-Kanban: Issues with page deleted dates on data lake - https://phabricator.wikimedia.org/T190434 (10Milimetric) a:03Milimetric [00:54:04] nuria: are you around? got a question about where the aggregated net-new-pages metric is stored [05:54:00] !log re-run failed mediacounts and browser-general coordinators with hive-site -> hdfs://analytics-hadoop/user/hive/hive-site.xml [05:54:02] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [05:54:53] joal: Bonjour! I guess that for a permanent fix for --^ I'll need to kill/restart the coordinators right? (I just manually fixed the parameter with Re-run) [05:55:09] (going to physiotherapy, if yes I'll try to fix them later on to learn :) [05:58:08] I have also removed log files from eventlog1002 up to today, partition full this time :( [05:58:12] now we have ~33G [05:58:31] should be fine for the next couple of hours, but not great, we need to fix it today [05:58:34] (lovely) [05:58:37] * elukey afk [07:13:29] back! [07:14:31] HaeB: going to fix the documentation today, yesterday we were busy :) [07:14:40] Hi elukey - Kids day today - I'm restarting mediacounts-archive and browser general for them to pick up the correct setting (change in UI when rerunning doesn't work) [07:15:44] thanks elukey [07:15:47] joal: can I try to do it? Should be easy enough right? [07:15:59] re-run didn't work with the correct parameters, I expected a different result [07:16:19] !log Full restart of mediacount-archive oozie job [07:16:20] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [07:16:29] all right will not do it :) [07:16:57] elukey: Yes, the rerun in hue is not super nice: it lets you change the params but doesn't actually change the job [07:17:12] ahahha [07:17:13] nice [07:17:21] WAT?? [07:18:11] ?
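[Editor's note] The gotcha in the exchange above is that re-running a coordinator from Hue lets you edit parameters in the form but does not actually change the job's configuration, so a permanent fix means killing the coordinator and resubmitting it with a corrected properties file. A command sketch of that flow — the coordinator id and file names are placeholders, and the property name `hive_site_xml` is a guess at the refinery convention, not confirmed by the log:

```
# Sketch only: id and file names are hypothetical.
# Assumes OOZIE_URL is set (otherwise pass -oozie <url>).
oozie job -kill 0012345-181009000000000-oozie-oozi-C
oozie job -config mediacounts-archive.properties -run
# where mediacounts-archive.properties carries the corrected setting, e.g.:
#   hive_site_xml = hdfs://analytics-hadoop/user/hive/hive-site.xml
```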
[07:18:29] mediacount-archvie restart failed with XML-error :( [07:19:34] (03PS4) 10Joal: Correct oozie jobs after move to an-coord1001 [analytics/refinery] - 10https://gerrit.wikimedia.org/r/465422 [07:19:47] !log patch mediacount-archive job in prod [07:19:49] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [07:23:19] !log Full restart of browser-general oozie job [07:23:20] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [07:23:27] !log add ipv6 mapped addresses (and DNS PTRs) to analytics-tools* [07:23:28] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [07:24:36] elukey: I think we're good on jobs (until next failure ...) [07:24:46] back to kids :) [07:25:00] joal: yeah, leave the next ones to me so I'll practice a bit :) [07:25:15] ttl! [07:55:04] 10Analytics, 10Operations, 10ops-eqiad, 10User-Elukey: rack/setup/install stat1007.eqiad.wmnet (stat1005 user replacement) - https://phabricator.wikimedia.org/T203852 (10elukey) [08:02:01] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: eventlogging logs taking a huge amount of space on eventlog1002 and stat1005 - https://phabricator.wikimedia.org/T206542 (10elukey) a:03elukey [08:14:30] 10Analytics, 10Analytics-Kanban, 10Operations, 10Traffic, and 2 others: Add Accept header to webrequest logs - https://phabricator.wikimedia.org/T170606 (10JAllemandou) @Ottomata - After standup please, as today is kids-day for me :) [08:38:27] Wikimedia Analytics Team public chat - Type 'a-team' to ping the team members - Channel logs available at https://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-analytics - Kanban https://phabricator.wikimedia.org/tag/analytics-kanban/ [08:38:31] nope [08:48:16] 10Analytics, 10Operations, 10ops-eqiad, 10User-Elukey: rack/setup/install stat1007.eqiad.wmnet (stat1005 user replacement) - https://phabricator.wikimedia.org/T203852 (10elukey) @Cmjohnson quick question (might be wrong since I am a n00b 
with Juniper): is stat1007 in the analytics VLAN? ``` elukey@asw2-b-... [09:03:11] 10Analytics, 10Analytics-Kanban, 10User-Elukey: Q1 2018/19 Analytics procurement - https://phabricator.wikimedia.org/T198694 (10elukey) [09:03:56] - Error message: FS010: chmod, path [hdfs://analytics-hadoop/wmf/data/archive/mediacounts/daily/2018] invalid permissions mask [0775] [09:03:59] what? [09:04:01] :D [09:09:39] so our dear oozie seems to like 3 digits [09:17:29] (03PS1) 10Elukey: oozie mediacounts: fix chmod command [analytics/refinery] - 10https://gerrit.wikimedia.org/r/465579 [09:18:49] joal: left --^ for you :) [09:30:13] I have to go in a bit to do an extra round of physiotheraphy, will take an early lunch :) [11:28:52] back! [11:51:42] mforns: are yall doing alright over there? just learned about the current floods in mallorca :/ [11:52:54] same question :( [12:06:03] I am restarting a job and making some spam, sorry people [12:06:07] luca == n00b [12:11:42] > Heya elukey [12:12:19] How have you restarted the job for it to be changed? [12:13:59] joal: I am trying to de-n00bify me, I did a little hack that I didn't want to tell you :D [12:14:08] :D [12:14:13] I uploaded the new file to hdfs and restarted the coordinator [12:14:47] elukey: should work - However there was a previous patch to that file (see https://gerrit.wikimedia.org/r/465422 [12:14:51] elukey: should work - However there was a previous patch to that file (see https://gerrit.wikimedia.org/r/465422) [12:14:54] oops sorry [12:15:09] then, it depends where you uploaded the file :) [12:15:14] I grabbed the content from hdfs and re-uploaded [12:15:19] I just did the exact same [12:15:22] hm [12:15:27] has it worked? 
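[Editor's note] The `FS010: chmod ... invalid permissions mask [0775]` error above comes from a `<chmod>` element in the workflow's fs action: Oozie validates the permissions mask and, as observed ("oozie seems to like 3 digits"), rejects the leading-zero 4-digit form. A minimal sketch of the corrected action — the path expression is a placeholder, not the real mediacounts workflow:

```
<fs>
    <!-- "0775" fails Oozie's permissions-mask validation; "775" passes. -->
    <chmod path="${archive_directory}/${year}" permissions="775" dir-files="false"/>
</fs>
```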
[12:15:47] it is running now afaict [12:15:52] Ah great :) [12:15:53] but it has not reached the chmod part [12:16:00] ok we'll see - Nice :) [12:16:06] \o/ [12:16:25] I am trying to do what one Joseph process does during the day [12:16:32] elukey: do you mind if I move our patch to the biggest one with al the corrections? [12:16:35] not pretending to do what all the other cores do [12:16:42] Single patch without rebase [12:16:46] sure [12:16:51] :D [12:17:13] So far most of the cores are resting from half a day with two kids ;) [12:17:52] well this is p99.99 [12:18:22] with a very high 15min load indeed - Not even talking about I/O overload [12:18:33] :) [12:18:59] (03PS5) 10Joal: Correct oozie jobs after move to an-coord1001 [analytics/refinery] - 10https://gerrit.wikimedia.org/r/465422 [12:19:17] ahhahaha [12:19:56] (03Abandoned) 10Joal: oozie mediacounts: fix chmod command [analytics/refinery] - 10https://gerrit.wikimedia.org/r/465579 (owner: 10Elukey) [12:37:51] (03Abandoned) 10Fdans: Allow whole metric areas to be collapsed [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/464800 (https://phabricator.wikimedia.org/T206311) (owner: 10Fdans) [12:38:48] 10Analytics, 10Analytics-Kanban, 10Analytics-Wikistats, 10Patch-For-Review: Make area metrics collapsible - https://phabricator.wikimedia.org/T206311 (10fdans) 05Open>03declined [13:36:20] (03CR) 10Ottomata: [C: 031] Refactor EventLoggingToDruid to use whitelists and ConfigHelper [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/465532 (https://phabricator.wikimedia.org/T206342) (owner: 10Mforns) [13:55:05] FYI mediawiki switchover back to eqiad happening in a bit [14:23:33] fdans, yea, we're fine, the flood was on the other side of the island... [14:23:41] has been tough though [14:23:49] thx for asking :] [14:24:13] mforns: me alegro mucho de que estéis bien :) [14:24:20] :) [14:52:32] hey ottomata :], where should the config properties file go for the EL2Druid job? Where in puppet? 
Couldn't find any other example... [14:53:23] (03PS2) 10Mforns: Refactor EventLoggingToDruid to use whitelists and ConfigHelper [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/465532 (https://phabricator.wikimedia.org/T206342) [14:54:49] hm [14:54:57] oh mforns [14:55:05] example would be in refine_job.pp [14:55:10] that is just an abstraction around spark_job.pp [14:55:19] aha [14:55:32] probably in /etc/refinery/ ... something? [14:55:33] hm [14:55:57] what did you call the puppet class? I forget/ druid_load? [14:56:02] /etc/refinery/druid_load ? [14:56:07] ottomata, but is the config file going to be managed by puppet? [14:56:10] yes [14:56:35] and where in puppet should I put it? [14:56:41] mforns: it uses profile::analytics::refinery::job::config define [14:56:44] to render the properties file [14:56:53] ah! there's the template [14:56:54] mforns: in druid_load? [14:57:00] aaah [14:57:15] so, the parameters still go in the spark_job? [14:57:27] spark_job takes two types of params [14:57:34] I seeeeee [14:57:41] spark_opts (which are for spark itself) and then your job's argv [14:57:42] job_opts [14:57:45] you do [14:58:00] job_opts => "--config_file /path/to/properties.properties" [14:58:06] and it will load what was rendered by profile::analytics::refinery::job::config [14:58:29] ok [14:58:32] will try [14:58:57] thanks! [15:06:56] ottomata: o/ - do you think that I can proceed with the rsync changes for stat1005 and eventlog1002? [15:07:11] err sorry, data retention policies [15:07:13] rsync changes? oh the log retention? [15:07:14] for sure! [15:07:15] not really rsync [15:07:16] proceed!
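[Editor's note] Putting the pieces of the exchange above together, the wiring being described is roughly the following puppet shape: one resource renders the properties file, and the spark job resource points at it via `job_opts`. Resource names, parameters, and paths here are illustrative guesses, not taken from the actual patch:

```
# Illustrative only: define names, parameters, and paths are assumptions.
profile::analytics::refinery::job::config { 'eventlogging_to_druid':
    properties_file => '/etc/refinery/eventlogging_to_druid.properties',
    properties      => $job_config,
}

profile::analytics::refinery::job::spark_job { 'eventlogging_to_druid':
    jar        => $refinery_job_jar,
    class      => 'org.wikimedia.analytics.refinery.job.EventLoggingToDruid',
    spark_opts => '--master yarn --deploy-mode cluster',
    # argv for the job itself, loading the rendered properties file:
    job_opts   => '--config_file /etc/refinery/eventlogging_to_druid.properties',
}
```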
[15:07:19] super :) [15:14:20] /msg fdans friendly remiander that it is your turn to go to SoS [15:14:30] sorry [15:19:26] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: eventlogging logs taking a huge amount of space on eventlog1002 and stat1005 - https://phabricator.wikimedia.org/T206542 (10elukey) Both changes merged, the space consumption should go down on both eventlog1002 and stat1005 after the next logrotate run. Ke... [15:28:31] ottomata, I was thinking of adding a druid_load_job.pp, and then use it from druid_load.pp, does it make sense? [15:28:50] otherwise there will be some repetition in druid_load.pp... [15:32:16] mforns: i think it makes sense, but only reason i'd hesitate is that you only have one EventLogging2Druid (right? or are there many?) and druid_load_job would eventually be for something more generic than just EventLogging? [15:32:24] but, it probably makes sense for now [15:37:13] ottomata, yes, for now just EventLoggingToDruid [15:37:55] ottomata, so should it be el2druid_job.pp then? [15:41:50] eventlogging_to_druid_job.pp [15:41:58] ? [15:44:28] (sorry in meeting) [15:48:08] np :] [15:50:24] oh a-team i will miss standup! better use of data meeting it today [15:50:34] k! [15:51:50] mforns: do you have more than one EventLoggingToDruid to run? [15:51:58] ottomata, yes 3 [15:52:05] oh [15:52:06] hm [15:52:20] ok mforns el2druid_job.pp, with usages in druid_load.pp ? [15:52:40] yes, if it's ok to you [15:52:56] or: eventlogging_to_druid_job.pp [15:56:34] ok [15:56:52] 10Analytics, 10Analytics-Cluster: Update to CDH 6 or other up-to-date Hadoop distribution - https://phabricator.wikimedia.org/T203693 (10elukey) [15:57:29] 10Analytics, 10Analytics-Cluster: Update to CDH 6 or other up-to-date Hadoop distribution - https://phabricator.wikimedia.org/T203693 (10elukey) ``` As of 6.0 we (Cloudera) no longer support/build on debian: https://www.cloudera.com/documentation/enterprise/6/release-notes/topics/rg_deprecated_items.html#conce... 
[15:57:48] 10Analytics, 10Analytics-Cluster, 10User-Elukey: Update to CDH 6 or other up-to-date Hadoop distribution - https://phabricator.wikimedia.org/T203693 (10elukey) [16:00:59] ping joal , ottomata , elukey [16:02:30] 10Analytics, 10User-Elukey: Return to real time banner impressions in Druid - https://phabricator.wikimedia.org/T203669 (10AndyRussG) @elukey so just to confirm, it's fine to go ahead and use the `eventlogging_CentralNoticeImpression` stream for this. Thanks so much again and apologies again for the delays in... [16:23:20] (03PS6) 10Joal: Correct oozie jobs after move to an-coord1001 [analytics/refinery] - 10https://gerrit.wikimedia.org/r/465422 [16:23:32] fdans: This is the patch to deploy --^ [16:24:13] fdans: All of that has already been fixed in prod, but we need the available code to reflect (so that next restart doesn't fail for wrong reasons) [16:25:43] thank youuu joal [16:25:51] np fdans :) [16:27:32] fdans: AFAIK, no need to deploy refinery-source - let's confirm with the team [16:27:52] yeah I was looking at that and I think not :) [16:56:07] (03PS1) 10Framawiki: Store and show query execution time [analytics/quarry/web] - 10https://gerrit.wikimedia.org/r/465659 (https://phabricator.wikimedia.org/T126888) [16:58:39] 10Analytics, 10Operations, 10ops-eqiad, 10User-Elukey: rack/setup/install stat1007.eqiad.wmnet (stat1005 user replacement) - https://phabricator.wikimedia.org/T203852 (10elukey) [16:58:57] 10Analytics, 10Operations, 10ops-eqiad, 10User-Elukey: rack/setup/install stat1007.eqiad.wmnet (stat1005 user replacement) - https://phabricator.wikimedia.org/T203852 (10elukey) [17:02:40] (03CR) 10Zhuyifei1999: "Patch LGTM. The template needs to be compiled." 
[analytics/quarry/web] - 10https://gerrit.wikimedia.org/r/465659 (https://phabricator.wikimedia.org/T126888) (owner: 10Framawiki) [17:04:41] (03CR) 10Zhuyifei1999: Store and show query execution time (032 comments) [analytics/quarry/web] - 10https://gerrit.wikimedia.org/r/465659 (https://phabricator.wikimedia.org/T126888) (owner: 10Framawiki) [17:19:15] * elukey off! [17:24:09] milimetric: wait, my change to EL is merged and deployed, i was wrong about thta [17:24:11] *that [17:24:13] sorry [17:24:30] 10Analytics-EventLogging, 10Analytics-Kanban, 10MW-1.32-notes (WMF-deploy-2018-10-02 (1.32.0-wmf.24)), 10Patch-For-Review: Make sampling by session more obvious in eventlogging module - https://phabricator.wikimedia.org/T203612 (10Nuria) 05Open>03Resolved [17:24:47] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review, 10User-Elukey: Replace the Analytics Hadoop coordinator - Hive/Oozie/etc... (hardware refresh) - https://phabricator.wikimedia.org/T205509 (10Nuria) 05Open>03Resolved [17:25:03] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban: Deprecation Information for EventLogging ResourceLoader modules - https://phabricator.wikimedia.org/T205744 (10Nuria) 05Open>03Resolved [17:25:09] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10MW-1.32-notes (WMF-deploy-2018-10-16 (1.32.0-wmf.26)), and 2 others: Spin out a tiny EventLogging RL module for lightweight logging - https://phabricator.wikimedia.org/T187207 (10Nuria) [17:27:15] 10Analytics, 10Analytics-Kanban: Top metrics error with blank page for closed wikis - https://phabricator.wikimedia.org/T206682 (10Nuria) [17:28:21] 10Analytics, 10Analytics-Kanban: Top metrics error with blank page for closed wikis - https://phabricator.wikimedia.org/T206682 (10Nuria) 05Open>03declined [17:31:58] nuria: meeting? [17:32:10] joal: coming, had to change rooms [17:32:12] oo joal when you are ready, let's do accept field ya? [17:32:45] ottomata: sure in 1/2 hour [17:34:25] nuria: that’s good about the EL change. 
The first phase of the tiny EL module is merged too, will test that in beta [17:50:11] ottomata, is there a reason why spark_job crons do not specify environment MAILTO ? Can I add that? EventLoggingToDruid code does not use refine, thus does not send emails.. [17:51:51] 10Analytics: AQS: category information per project (ex: wiki-project-medicine) available in API - https://phabricator.wikimedia.org/T206686 (10Nuria) [17:54:41] 10Analytics: AQS: category information per project (ex: wiki-project-medicine) available in API - https://phabricator.wikimedia.org/T206686 (10Nuria) [17:57:00] ottomata: ready :) [17:59:59] joal: ok! [18:00:19] ottomata: IRC or cave? [18:00:27] let's irc for now ya? [18:00:36] sure [18:00:43] ok so step one: alter tables [18:00:44] yes? [18:00:46] ottomata: let;s move fast on killing webrequest- hour just finished :) [18:00:51] ok! [18:01:01] we can stop it now ya? [18:01:49] ottomata: let's wait a minute - hour 16 is almost finished [18:01:54] oh ok [18:02:13] let's wait until it's done and then kill before too much of next hour is started [18:02:44] joal the alters will be: [18:02:44] ALTER TABLE wmf_raw.webrequest ADD COLUMNS (`accept` string COMMENT 'Accept header of request'); [18:02:44] ALTER TABLE wmf.webrequest ADD COLUMNS (`accept` string COMMENT 'Accept header of request'); [18:02:45] ya? [18:03:00] looks correct :) [18:03:00] also, ok if i merge? [18:03:01] https://gerrit.wikimedia.org/r/#/c/analytics/refinery/+/464574/ [18:03:28] joal: ^ [18:03:35] (03PS3) 10Ottomata: Add accept header to webrequest raw and refined tables [analytics/refinery] - 10https://gerrit.wikimedia.org/r/464574 (https://phabricator.wikimedia.org/T170606) [18:03:39] was reading one last time - Merge please ! [18:03:43] k! 
(03CR) 10Ottomata: [V: 032 C: 032] Add accept header to webrequest raw and refined tables [analytics/refinery] - 10https://gerrit.wikimedia.org/r/464574 (https://phabricator.wikimedia.org/T170606) (owner: 10Ottomata) [18:03:52] fdans: How are we on deploy? tomorrow? [18:04:27] !log Kill webrequest-load-coord-upload [18:04:28] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [18:04:41] why kill the coord? [18:04:47] instead of just stop the bundle? [18:04:57] gonna deploy refinery (not to hdfs yet) [18:04:57] ottomata: Good timing :) We need to be patient for webrequest_text, still some work to do [18:05:01] hm ok [18:05:25] ottomata: if you want to deploy the cluster, let's merge https://gerrit.wikimedia.org/r/c/analytics/refinery/+/465422 first [18:05:30] oh [18:05:38] OGH YES [18:05:39] let's do it [18:06:00] (03CR) 10Ottomata: [C: 031] Correct oozie jobs after move to an-coord1001 [analytics/refinery] - 10https://gerrit.wikimedia.org/r/465422 (owner: 10Joal) [18:06:03] ottomata: nothing to do, prod already updated, but good to have code deployed [18:06:08] ahhh ok [18:06:10] joal ottomata sorry, had to run for a couple errands, was planning to do it a bit later [18:06:32] np fdans :) ottomata's on it [18:06:48] (03CR) 10Ottomata: [V: 032 C: 032] Correct oozie jobs after move to an-coord1001 [analytics/refinery] - 10https://gerrit.wikimedia.org/r/465422 (owner: 10Joal) [18:06:50] (03CR) 10Joal: [V: 032 C: 032] "Merge before deploy" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/465422 (owner: 10Joal) [18:06:53] Ahaha [18:06:56] hehe [18:06:57] :) [18:07:31] This patch has a double [V: 2 C: 2] - This is my first time :D [18:07:53] omg joal how dare you!!! :D [18:17:31] ottomata: how are you on deployment? [18:17:35] scap doing it.. [18:17:39] sloowwlyyy [18:17:50] ottomata: hour 16 almost done for text (manually killed hour 17) [18:17:55] coo [18:18:08] ottomata: we NEED to clean the artifact folder.
[18:18:20] ottomata: This is where scap spends its time [18:18:38] Every time it downloads all history of all our jars [18:18:59] ya :/ [18:19:57] ottomata: no task for that it seems - creating one [18:20:51] ok done [18:21:14] joal: ok for me to deploy to hdfs? [18:21:24] +1 ottomata [18:21:28] no impoact [18:23:46] !log kill Webrequest-load bundle [18:23:47] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [18:23:50] ready ottomata [18:23:58] gr8 [18:24:00] we can alter ya? [18:24:08] joal: ? [18:24:16] ottomata: give me 1sec [18:24:18] k [18:24:53] ottomata: jobs are following after wr-text hour 16 just finished - Let's wait please, so as not to break them [18:24:59] ottomata: sorry for that :( [18:25:17] ok [18:25:20] sure [18:30:00] ottomata: good to process [18:30:02] Thanks :) [18:30:06] ALTER ! [18:31:29] ok! [18:31:36] doing raw [18:33:18] 10Analytics-Kanban, 10Product-Analytics, 10Reading-analysis: Final Vetting of Family Wide unique devices data - https://phabricator.wikimedia.org/T169550 (10Tbayer) >>! In T169550#4623614, @Nuria wrote: > ping @tbayer, do you think you could get to this task in the next month? (Also discussed this in person... [18:34:41] getting a weird error when i try [18:34:44] not sure why yet [18:34:49] at least one column must be specified for the table [18:34:57] when running ALTER TABLE wmf_raw.webrequest ADD COLUMNS (`accept` string COMMENT 'Accept header of request'); [18:36:11] oh [18:36:15] wmf_raw.webrequet is....? [18:36:16] empty? [18:36:39] mforns: let me know if you want to go to bc [18:36:48] joal bc? [18:37:17] mforns: batcave occupied, i guess it is going to be bc 2 [18:37:26] nuria, sure, gimme 3 mins [18:38:48] mforns: no rush, you ping me when you are available [18:38:59] ottomata: joining [18:39:05] nuria, omw [18:39:15] a-batcave-2 [18:43:42] joal: when you're done with that, I think I need some help with managing memory for Hive... [18:46:30] Hi! 
Anyone ever get incorrect labels on the x axis in tornillo? https://bit.ly/2NzT6OW [18:47:31] Try hovering over any of the lines, you'll see the correct time in the pop-up next to the mouse ^ [18:47:37] hi AndyRussG it looks like the axis is time zone specific [18:48:07] (just a guess) [18:48:11] milimetric: hi! ah hmmmmm interesting [18:48:39] There's a little config button to set the time zone, mine is currently set to URC [18:48:41] UTC [18:50:00] milimetric, hey wanna discuss wikistats router-links? [18:50:09] yeah, omw mforns [18:50:13] a-batcave-2 [18:50:16] mforns: bc 2 [18:50:30] Though indeed the labels do correspond to what my local time was at that time in UTC [18:50:45] I guess I should file a ticket? I don't see one already in Fab [19:21:50] 10Analytics: turnilo x axis improperly labeled - https://phabricator.wikimedia.org/T197276 (10Nuria) 05declined>03Open [19:22:05] AndyRussG: we alredy have a ticket to update turnilo when that bug is fixed, it had been wrongly closed: https://phabricator.wikimedia.org/T197276 [19:22:24] 10Analytics: turnilo x axis improperly labeled - https://phabricator.wikimedia.org/T197276 (10Nuria) a:03Ottomata [19:22:39] 10Analytics, 10Analytics-Kanban: turnilo x axis improperly labeled - https://phabricator.wikimedia.org/T197276 (10Nuria) [19:23:07] nuria: ah cool, thx!!! 
:) [19:25:48] !lof Restart webrequest-load oozie bundle [19:26:09] (03PS1) 10Ottomata: Fixes to webrequest create table LOCATIONS and get-add-webrequest-partition-statements [analytics/refinery] - 10https://gerrit.wikimedia.org/r/465687 [19:27:01] (03CR) 10Ottomata: [V: 032 C: 032] Fixes to webrequest create table LOCATIONS and get-add-webrequest-partition-statements [analytics/refinery] - 10https://gerrit.wikimedia.org/r/465687 (owner: 10Ottomata) [19:27:17] (03CR) 10Joal: [C: 031] "Nice :) Thanks for that" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/465687 (owner: 10Ottomata) [19:27:22] !log Restart webrequest-load oozie bundle [19:27:23] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [19:28:15] Hey milimetric - What about your hive errors? [19:28:33] joal: I did SET mapreduce.map.memory.mb=8096; [19:28:43] and I'm scared again... because I did other similar stuff and it failed [19:28:45] but it's running now [19:28:53] maybe I should also do SET mapreduce.reduce.memory.mb=8096; ? [19:29:16] any other advice/better ways of doing it? [19:29:28] milimetric: depends on the error you get I assume :) If it happens map-side or reduce-side [19:29:51] milimetric: do you want to try spark-sql? [19:29:54] it happened in both places, seemingly *after* I increased the memory, so I was trying to figure out what the default was [19:29:57] maybe I decreased it.. [19:30:14] spark-sql has no memory problems? [19:30:26] spark-sql has plenty problems :) [19:30:37] but I find it easier to configure [19:30:42] then no, I got 99 problems and I don't want spark to be one [19:30:42] (and should be faster than hive [19:30:53] Makes sense :) [19:30:53] oooh... faster... [19:31:12] hm... when can we get presto? :) [19:31:35] : [19:31:37] ) [19:31:49] oh, what about the superset SQL console thing, does that have more memory? [19:31:56] how would we handle it if I wanted to run a bigger query there? 
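[Editor's note] On the Hive memory settings discussed above: raising `mapreduce.map.memory.mb` alone only grows the YARN container; the mapper JVM heap is governed separately by `mapreduce.map.java.opts`, so the usual pattern is to bump the container and its heap together (heap roughly 80% of the container), and to treat the map side and reduce side independently. A session-level sketch with illustrative sizes — these numbers are examples, not recommendations for this query:

```sql
-- Illustrative Hive session settings; sizes are examples only.
SET mapreduce.map.memory.mb=4096;
SET mapreduce.map.java.opts=-Xmx3276m;     -- ~80% of the map container
SET mapreduce.reduce.memory.mb=8192;
SET mapreduce.reduce.java.opts=-Xmx6553m;  -- ~80% of the reduce container
```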
[19:32:13] nope milimetric [19:32:25] ok [19:32:34] milimetric: spark2-sql --master yarn --executor-memory 8G --executor-cores 2 --driver-memory 4G --conf spark.dynamicAllocation.maxExecutors=128 [19:32:40] milimetric: can you paste your query? [19:32:48] ok, cool, I'll try that if it keeps failing [19:32:49] here: [19:32:57] https://www.irccloud.com/pastebin/25EnggM8/ [19:35:42] milimetric: does the 'order by id' has any importance here? [19:36:08] Actually yes, it has ! [19:36:12] right? [19:36:35] I actually don't know for sure [19:36:45] because collect_list is less awesome than group_concat [19:37:04] where group_concat allows me to specify the order inline, collect_list I think hopefully uses the order of the rows it goes over [19:37:36] I'm shocked that hive is worse at something than mysql [19:37:50] hm - I think it would be better to group by id, start_timestamp --> This will order by id and start_timestamp , doing it in parallel - Simple order is not parallelized [19:38:57] I'd have to group by all three, caused_... as well [19:39:03] but I can try that if this fails [19:40:27] https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy [19:40:31] milimetric: --^ [19:42:18] oh cool: For Hive 0.11.0 through 2.1.x, set hive.groupby.orderby.position.alias to true (the default is false). [19:47:06] And actually milimetric - the sort doesn't work in the query - Since you group by in the next query, the sorting is messed up when grouping [19:47:32] so is there no way to do this then?... 
[19:47:49] * joal thinks hard [19:48:39] milimetric: without sorting, request done in less than a minute by spark - Trying to find a way to sort [19:49:56] oh wow, that's a lot faster [19:50:34] it finished this time in Hive too, results appear sorted mostly properly [19:50:51] milimetric: solution is actually to use windowing [19:51:13] oh, right, duh [19:51:15] thanks [19:51:21] but if spark is faster, I'll definitely try that [19:53:12] also milimetric - the lifecycles are unreadable :) [19:53:35] joal: only a few, most are very interesting [19:53:57] I just have to window it to make sure they make sense and are sorted properly and I'll post results [19:54:17] but yeah, some of them have like 100000000000 steps :) [19:55:56] I have the query - https://gist.github.com/jobar/e5f90b61e708e755d986bfbba6772f17 [19:56:24] milimetric: -^ [19:57:20] thx, yeah, was just doing the same, that should work right? [19:59:04] works in spark-sql (but takes some time) [20:23:28] ottomata: webrequest-upload successfull - I'm gonna call it a success :) [20:26:47] mforns: loook at the cool awesome thing I just learned: https://www.mediawiki.org/wiki/API:Query#Getting_a_list_of_page_IDs [20:26:59] lookin! [20:27:22] joal cool accept works on refined table? [20:28:00] didn't try - refine succeeded, assumed it work :) Trying now [20:28:10] ottomata: --0^ [20:28:49] milimetric, what would you use it with? [20:28:58] like the mediawiki-storage [20:29:10] ah! for config pages>? [20:29:21] https://github.com/wikimedia/analytics-mediawiki-storage/blob/master/src/mediawiki-storage.js#L126 [20:29:52] aha [20:29:55] cool [20:29:58] ottomata: confirmed :) [20:29:58] yeah, especially because you can get multiple pages per request [20:31:57] ok gone for tonight team - see ou! [20:33:23] great! [20:46:32] hm, spark-sql just spinning in place it looks like... 
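[Editor's note] The windowing fix linked above (the gist itself is not quoted in-channel) is generally along these lines — `events`, `id`, `step`, and `start_timestamp` are hypothetical stand-ins for the real table and columns:

```sql
-- collect_list over an explicit full-partition window preserves the
-- ORDER BY, unlike collect_list applied after a plain GROUP BY.
SELECT id, steps
FROM (
    SELECT id,
           collect_list(step) OVER (
               PARTITION BY id
               ORDER BY start_timestamp
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
           ) AS steps,
           row_number() OVER (PARTITION BY id ORDER BY start_timestamp) AS rn
    FROM events
) t
WHERE rn = 1;
```

The explicit `ROWS ... UNBOUNDED FOLLOWING` frame matters: with an `ORDER BY` in the window, the default frame only extends to the current row, so each row would otherwise see just a prefix of its list.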
/me gonna run back to hive [20:51:32] 10Analytics, 10EventBus, 10Services (next), 10Wikimedia-production-error: Timeouts on event delivery to EventBus - https://phabricator.wikimedia.org/T180017 (10Krinkle) [20:53:06] 10Analytics, 10Analytics-Kanban, 10Page-Issue-Warnings, 10Product-Analytics, and 3 others: Ingest data from PageIssues EventLogging schema into Druid - https://phabricator.wikimedia.org/T202751 (10Nuria) >BTW, I understand we are focusing on use in Turnilo for now, but out of curiosity (and considering the... [21:09:56] milimetric: your suggestion with the frozen stuff was SO Good.... [21:10:55] yay, I did something right :) [21:11:29] milimetric: OF COURSE [21:45:07] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review, 10Readers-Web-Backlog (Tracking): Ingest data into druid for readingDepth schema - https://phabricator.wikimedia.org/T205562 (10ovasileva) [22:05:46] [Fun] Not sure if anyone saw this, but I thought that it might be of interest (the links are also interesting reading on the potential future of academic data analysis/publishing): https://qz.com/1417145/economics-nobel-laureate-paul-romer-is-a-python-programming-convert/ [22:16:19] (03PS4) 10Nuria: Time dimension should be reseted to "1-Month" for top metrics [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/465296 (https://phabricator.wikimedia.org/T206479) [22:17:30] marlier: nice [22:18:13] milimetric: please take a look https://gerrit.wikimedia.org/r/#/c/analytics/wikistats2/+/465296/ [22:19:17] milimetric: I found yet another bug on top metric urls, this is a new one due to splits (cc fdans ) but i think the main url issue due to time might be fixed doing it the way you suggested [22:19:47] (03CR) 10jerkins-bot: [V: 04-1] Time dimension should be reseted to "1-Month" for top metrics [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/465296 (https://phabricator.wikimedia.org/T206479) (owner: 10Nuria) [22:19:55] cool, I’ll take a look after dinner [22:27:46] 10Quarry, 
10Documentation: admin docs: quarry - https://phabricator.wikimedia.org/T206710 (10Aklapper) [23:10:01] (03PS5) 10Nuria: Time dimension should be reseted to "1-Month" for top metrics [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/465296 (https://phabricator.wikimedia.org/T206479)