[00:26:43] Analytics, Analytics-EventLogging, MediaWiki-extensions-NavigationTiming, Operations, and 2 others: Increase maxUrlSize from 1000 to 1500 - https://phabricator.wikimedia.org/T112002 (Krinkle)
[00:57:02] Analytics, Analytics-EventLogging, Operations, Performance-Team, Traffic: Increase EventLogging limit from 2K to 4K - https://phabricator.wikimedia.org/T208282 (Krinkle)
[00:58:21] Analytics, Analytics-EventLogging, Operations, Performance-Team, Traffic: Increase EventLogging limit from 2K to 5K - https://phabricator.wikimedia.org/T208282 (Krinkle)
[04:42:03] groceryheist: I think you need to optimize the query but there are different ways to do that and one of them is running it in stages calculating intermediate results for just one wiki
[04:42:57] groceryheist: let's first try to understand what you are trying to do
[04:47:17] Analytics, Analytics-Kanban: Clickstream dataset for Persian Wikipedia only includes external values - https://phabricator.wikimedia.org/T191964 (Nuria) Sorry progress is so slow, issue is one of encoding of urls versus page titles so there is no match and thus it seems no internal hits are happening for...
[05:09:24] nuria: the specific problem that I want to solve is to see how dwell time (from the reading events) varies by page length (from the history). It turns out that the reading depth instrumentation didn't record revision ids, so finding out the length of the page /that the user read/ requires joining /by page/ and then finding the minimum positive time distance between the reading event
[05:09:30] and the revisions.
[05:12:45] I've thought of 2 options: 1. build an intermediate subquery / view that has only the revisions that were made during a narrower window of time (i.e. 1 month, at most 1 year) and for the pages that were not edited during that period have the last revision.
[05:14:22] and then join that table with the views on page.
The downside is that we still have to join on page but at least we'll only have to deal with a fraction of the revisions.
[05:18:33] 2. build an intermediate table of page_title | time_block | page_length where we have a row for every time_block, and we join on page_title, time_block.
[05:26:26] the downside there is that the intermediate table could be pretty huge.
[05:27:20] and also there will be some error, since we won't have the exact revision that was viewed, just a revision from around the same time.
[07:00:28] RECOVERY - Check the last execution of check_webrequest_partitions on an-coord1001 is OK: OK: Status of the systemd unit check_webrequest_partitions
[07:06:54] manually started it --^
[08:15:24] Hi elukey
[08:15:44] elukey: any issue with me deploying on the cluster?
[08:15:45] bon jour!
[08:15:55] +2
[08:18:11] Ack! Starting then :)
[08:18:38] After that I'll spend some time helping groceryheist - two cluster-stale in a day is too much :)
[08:19:39] two?
[08:35:36] joal: I'm in PST and going to sleep. I'd appreciate help in about 8 hours :)
[08:39:26] Hi groceryheist: I have some homework to do for you, so help in 8 hours is actually great :)
[08:39:35] See you later groceryheist
[08:40:10] elukey: sorry had not seen your answer: one yesterday morning that we solved, one yesterday evening that Andrew and Nuria solved
[08:41:22] ahhh
[08:42:32] (CR) Joal: [V: 2 C: 2] "Merging for deploy" [analytics/refinery] - https://gerrit.wikimedia.org/r/467700 (owner: Milimetric)
[08:43:55] Analytics-Kanban: Update alert email address in oozie mediawiki-load job - https://phabricator.wikimedia.org/T208294 (JAllemandou)
[08:44:38] Analytics-Kanban: Update alert email address in oozie mediawiki-load job - https://phabricator.wikimedia.org/T208294 (JAllemandou) a:Milimetric
[08:52:53] (CR) Joal: [C: -1] "One thing left: We miss the mw_change_tag_table_partitioned dataset definition (datasets_raw.xml file, at the bottom of file).
This datase" [analytics/refinery] - https://gerrit.wikimedia.org/r/465416 (owner: Fdans)
[08:53:19] Hey fdans - If you're around, I'm happy to wait for the last bit of changes on --^ before deploying
[08:53:30] fdans: I'm sorry I didn't get to that earlier :(
[08:53:40] actually, I can do them myself if you prefer
[09:19:48] Analytics, Analytics-Kanban, User-Elukey: Update to cloudera 5.15 - https://phabricator.wikimedia.org/T204759 (elukey) List of changes (5.11 -> 5.15.1): https://www.cloudera.com/documentation/enterprise/release-notes/topics/cdh_rn_fixed.html The most interesting ones are: ``` Apache Hive Code Chan...
[09:21:16] Analytics, Analytics-Kanban, User-Elukey: Update to cloudera 5.15 - https://phabricator.wikimedia.org/T204759 (elukey)
[09:30:51] joal: sorry, just logging in!
[09:34:36] np fdans - there is no rush for deploy, so I waited for you :)
[09:34:49] fdans: Shall we implement the thing I mention?
[09:37:29] (PS9) Fdans: Add change_tag to mediawiki_history sqoop [analytics/refinery] - https://gerrit.wikimedia.org/r/465416
[09:39:27] joal: thank you for your patience! I just applied the changes you mentioned
[09:40:19] fdans: no worries at all - I didn't review your patch for long either - Thanks for your patience as well :)
[09:40:56] Last thing fdans - Have you tested somehow?
[09:41:10] if you've tested on your own DB etc, let's merge :)
[09:41:58] joal: I tested the query and the sqooping, but not the coordinator
[09:42:28] hm - Could be good to at least do a dry-run
[09:42:36] You know how to?
[09:43:34] joal: hmmm not sure how to dry run this :)
[09:45:32] fdans: You download the patch on a stat machine, copy the modified files to your own oozie folder (don't forget the datasets file), and launch an oozie command setting oozie_directory to your own folder, with the dry-run command instead of run
[09:46:17] fdans: https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Oozie#Running_a_real_oozie_example
[09:47:06] joal: oh, yes of course, so I guess not having the change_Tag table yet won't be a problem since it's a dry run?
[09:47:31] fdans: I think so yes
[09:47:51] cooool I'll give that a try
[09:48:03] fdans: dry-run will try to generate the parameters for some workflows
[09:48:25] fdans: a successful dry-run normally makes your eyes bleed out of XML :)
[09:49:09] joal: that's my favorite kind of eye bleeding
[09:57:57] joal: I can't remember, do I have to put my refinery folder in hdfs?
[09:58:26] fdans: the oozie one yes
[09:58:49] fdans: oozie looks for files in $oozie_directory, which should be in hdfs
[09:59:14] And don't forget when testing to set -Doozie_directory=... to your own oozie folder
[10:08:45] joal: I found something interesting
[10:08:54] yes?
[10:08:55] about the yarn freeze
[10:09:01] Ahh!
[10:09:11] I am running sar -d -p 2 in a loop since yesterday
[10:09:17] and today I got this
[10:09:23] elukey@an-master1001:~$ sudo grep -rni "org.apache.hadoop.util.JvmPauseMonitor" /var/log/hadoop-yarn/yarn-yarn-resourcemanager-an-master1001.log.2 -A 1
[10:09:25] 956123:2018-10-30 01:37:39,872 INFO org.apache.hadoop.util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 1894ms
[10:09:28] 956124-No GCs detected
[10:09:31] it didn't cause any issue
[10:09:44] but I went into the sar logs to see if anything was weird
[10:09:54] lemme know if you see anything strange in the following
[10:10:49] https://phabricator.wikimedia.org/P7737
[10:11:43] I have no idea what that is
[10:11:55] but it is definitely I/O the cause of the stall
[10:13:09] those peaks saturate the disk I/O probably
[10:13:14] and yarn stalls
[10:13:27] so I am not sure yet if it is yarn itself causing the mess
[10:13:30] Mwarf
[10:13:30] or something else
[10:13:42] super interesting
[10:14:03] what are the means to find the wr_sec/s culprit?
[10:14:30] And thanks for teaching how to check raw-metrics when something goes wrong!
[10:15:20] so wr_sec spans from ~50 to ~700 tops, but the avg is more close to say 100/150
[10:15:31] those values are absolutely out of the ordinary
[10:17:40] ok probably with pidstat we'll be able to get the pid causing the trouble
[10:22:00] Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: JVM pauses cause Yarn master to failover - https://phabricator.wikimedia.org/T206943 (elukey) Since yesterday I have been running `sar -d -p2` on an-master1001, and this is interesting: ``` elukey@an-master1001:~$ sudo grep -rni "org.a...
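
Editor's sketch, not part of the channel log: the grep at 10:09:23 yields one JvmPauseMonitor line per detected pause, and lining those up against the sar samples needs the timestamp and the pause length from each. A minimal way to pull both out is shown below. The regular expression is written only against the single log line quoted at 10:09:25, and the `find_pauses` helper and its `min_ms` threshold are hypothetical names introduced for illustration.

```python
import re

# Pattern derived from the one log line quoted in the channel; other Hadoop
# versions may format the message differently (assumption).
PAUSE_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} \w+ "
    r"org\.apache\.hadoop\.util\.JvmPauseMonitor: .*?"
    r"pause of approximately (?P<ms>\d+)ms"
)


def find_pauses(lines, min_ms=1000):
    """Return (timestamp, pause_ms) for each logged pause of at least min_ms.

    min_ms is an illustrative threshold, not a value taken from the log.
    """
    pauses = []
    for line in lines:
        match = PAUSE_RE.search(line)
        if match and int(match.group("ms")) >= min_ms:
            pauses.append((match.group("ts"), int(match.group("ms"))))
    return pauses


# The two grep output lines pasted at 10:09:25 and 10:09:28.
sample = [
    "956123:2018-10-30 01:37:39,872 INFO org.apache.hadoop.util.JvmPauseMonitor: "
    "Detected pause in JVM or host machine (eg GC): pause of approximately 1894ms",
    "956124-No GCs detected",
]
print(find_pauses(sample))
```

The returned timestamps can then be compared by hand with the sar intervals in the paste linked at 10:10:49.
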
[10:31:59] Andrew suggested to check if it was the current snapshotting of hdfs causing the trouble
[10:32:15] but checking in /var/lib/hadoop I don't see timings related to the stall
[10:32:27] now I am really curious about what process does this mess :D
[10:34:04] joal: successfully obtained a slurry of xml :)
[10:34:54] \o/ fdans !
[10:35:33] fdans: There still could be an error in workflow (dry-run doesn't check that), but I'm happy merging as is :)
[10:35:41] fdans: merging and deploying !
[10:36:35] (CR) Joal: [V: 2 C: 2] "Merging for deploy. Thanks fdans :)" [analytics/refinery] - https://gerrit.wikimedia.org/r/465416 (owner: Fdans)
[10:37:21] fdans: I'm assuming we need a patch in puppet to add the table to sqoop and drop ?
[10:38:37] joal: yesss I already got it, will push it in a bit
[10:38:44] great
[10:39:12] fdans: I moved your task in ready to deploy on the kanban, as a reminder of the stuff I have to double check once done :)
[10:39:15] (joal: I’m testing this from an immigration office in Madrid, waiting for our number to be called)
[10:39:37] I love the idea that wikimedia-code run from an immigration office :)
[10:39:57] we run EVERYWHERE :)
[11:01:51] mwarf - jenkins is not happy :(
[11:32:22] groceryheist: For when you're online - I have generated your data using this query in Spark-SQL - https://gist.github.com/jobar/e5d54cdf40f5ec6ce63e2251c4d3fda9https://gist.github.com/jobar/e5d54cdf40f5ec6ce63e2251c4d3fda9
[11:32:32] Arf sorry for the double paste
[11:32:34] https://gist.github.com/jobar/e5d54cdf40f5ec6ce63e2251c4d3fda9
[11:33:03] Let's discuss when you come online about the approach and all
[12:26:05] heya team :]
[12:31:14] Hi mforns
[12:31:20] :]
[12:36:08] Team - Deploy is delayed by an issue with jenkins
[12:36:24] I'm working in ops chan on this with hashar
[12:44:41] ok
[13:05:44] joal: would you say that jenkins is being a jerkins?
[13:06:44] Analytics, Project-Admins: Create project for SWAP - https://phabricator.wikimedia.org/T207425 (Neil_P._Quinn_WMF) >>! In T207425#4692576, @Milimetric wrote: >@Neil_P._Quinn_WMF, would a subproject work for you in this context? Absolutely! 😁
[13:09:06] * joal thinks hard of something funny to answer fdans ...
[13:09:25] :D
[13:24:00] (CR) Fdans: "@Nuria it's here" [analytics/aqs] - https://gerrit.wikimedia.org/r/468927 (https://phabricator.wikimedia.org/T206968) (owner: Fdans)
[13:48:23] elukey: Heya - Couldn't find any trace of T178832 in logstash - Did I do it wrong?
[13:48:23] T178832: Investigate AQS cassandra schema hash warninga - https://phabricator.wikimedia.org/T178832
[13:49:30] Ah- found some actually
[13:49:34] ok :)
[14:10:16] (CR) Fdans: [V: 2] Replace literal "anonymous editor" with null [analytics/aqs] - https://gerrit.wikimedia.org/r/468927 (https://phabricator.wikimedia.org/T206968) (owner: Fdans)
[14:21:43] FYI I am upgrading the labs cluster to cdh 5.15
[14:21:53] so expect some turbulence in there :)
[14:22:44] ack !
[14:28:39] Analytics: Investigate AQS cassandra schema hash warninga - https://phabricator.wikimedia.org/T178832 (JAllemandou) Here are my findings: - The warning message happens for 2 schemas of AQS: `pageviews.per.article.flat` and `top.pageviews`. The difference between schema definitions are on the `version` field...
[14:28:45] elukey: --^
[14:28:52] It finally happened
[14:31:18] first time that I see this task :)
[14:36:31] elukey: almost a year of dumb warnings :(
[14:36:51] * joal feels ashamed
[15:02:15] a-team: ping
[15:02:21] oops
[15:03:25] ping milimetric ottomata
[15:08:04] OHHH BOY
[15:08:05] wow
[15:13:41] Analytics, Analytics-Dashiki, CX-analytics: The language-reportcard.wmflabs.org/cx2 chart is stuck at 2018-10-21 - https://phabricator.wikimedia.org/T208324 (Amire80)
[15:34:23] !log kafka topics --alter --topic eventlogging_VirtualPageView --partitions 12
[15:34:24] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[16:02:36] fdans, ping me if you want to deploy aqs together!
[16:11:55] mforns: let's do it!
[16:12:23] fdans, ok, gimme 2 mins
[16:21:25] (PS8) Joal: Add spark code for wikidata json dumps parsing [analytics/refinery/source] - https://gerrit.wikimedia.org/r/346726
[16:42:36] (PS1) Mforns: Update aqs to f0fc0e4 [analytics/aqs/deploy] - https://gerrit.wikimedia.org/r/470628
[16:43:13] (CR) Fdans: [V: 2 C: 2] Update aqs to f0fc0e4 [analytics/aqs/deploy] - https://gerrit.wikimedia.org/r/470628 (owner: Mforns)
[16:45:07] !log Starting AQS deployment using scap
[16:45:08] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[16:55:01] (CR) Mforns: [C: 2] Handle null name values in top metrics from UI [analytics/wikistats2] - https://gerrit.wikimedia.org/r/468964 (https://phabricator.wikimedia.org/T206968) (owner: Fdans)
[16:55:48] !log Finished AQS deployment using scap
[16:55:49] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[16:58:26] (PS1) Fdans: Release 2.4.7 [analytics/wikistats2] - https://gerrit.wikimedia.org/r/470634
[16:58:42] (CR) Fdans: [V: 2 C: 2] Release 2.4.7 [analytics/wikistats2] - https://gerrit.wikimedia.org/r/470634 (owner: Fdans)
[17:05:10] (PS1) Milimetric: Add debug log of
previously completed dates [analytics/reportupdater] - https://gerrit.wikimedia.org/r/470636
[17:08:18] (CR) jerkins-bot: [V: -1] Add debug log of previously completed dates [analytics/reportupdater] - https://gerrit.wikimedia.org/r/470636 (owner: Milimetric)
[17:08:58] Analytics, Analytics-Dashiki, CX-analytics: The language-reportcard.wmflabs.org/cx2 chart is stuck at 2018-10-21 - https://phabricator.wikimedia.org/T208324 (Milimetric) This is due to a bad review by me on the last code change. I should not have allowed the date used in the first column to change,...
[17:16:17] * elukey off!
[17:32:02] Analytics, Analytics-Dashiki, CX-analytics: The language-reportcard.wmflabs.org/cx2 chart is stuck at 2018-10-21 - https://phabricator.wikimedia.org/T208324 (Pginer-WMF) >>! In T208324#4706843, @Milimetric wrote: > This is due to a bad review by me on the last code change. I should not have allowed...
[17:33:42] Analytics, Analytics-Dashiki, CX-analytics: The language-reportcard.wmflabs.org/cx2 chart is stuck at 2018-10-21 - https://phabricator.wikimedia.org/T208324 (Milimetric) Yeah, apologies for the trouble, happy to help port the other stats to a dashboard with the start-date-convention if that helps.
[17:48:09] Gone for today team - See you tomorrow
[17:55:11] (CR) Milimetric: [C: 2] Memoizing results of state functions [analytics/wikistats2] - https://gerrit.wikimedia.org/r/468205 (https://phabricator.wikimedia.org/T207352) (owner: Nuria)
[17:55:21] (CR) Milimetric: [C: 2] "very nice" [analytics/wikistats2] - https://gerrit.wikimedia.org/r/468205 (https://phabricator.wikimedia.org/T207352) (owner: Nuria)
[17:58:00] (CR) Milimetric: Add scheduling for Content Translation MT engine data (2 comments) [analytics/limn-language-data] - https://gerrit.wikimedia.org/r/469390 (https://phabricator.wikimedia.org/T207765) (owner: Amire80)
[18:01:50] milimetric: you around today?
[18:02:12] ottomata: yea!
[18:02:29] just had to go to dentist so missed the morning, I sent an email... right? (/me checks)
[18:02:32] yaaaa
[18:02:37] i read but i don't retain anything
[18:02:54] :) phew, ok, yeah, sent it
[18:02:59] wanna chat?
[18:03:01] ya
[18:03:32] bc
[18:08:35] milimetric: moved retro to tomorrow
[18:08:40] right after standup
[18:08:45] will be ruthless so it happens
[18:34:30] :) ok nuria no problem for me
[18:34:34] ottomata: ok, ready
[19:06:04] Analytics, Analytics-Kanban, Patch-For-Review: Table view of timely results in wikistats 2 should be ordered in time descending - https://phabricator.wikimedia.org/T199693 (Nuria) Open>Resolved
[19:06:17] Analytics, Analytics-Kanban, Patch-For-Review: Splits on top metrics when selections are present on url - https://phabricator.wikimedia.org/T206822 (Nuria) Open>Resolved
[19:14:28] Analytics, Analytics-Kanban, Patch-For-Review: Move users from stat1005 to stat1007 - https://phabricator.wikimedia.org/T205846 (Nuria)
[19:14:32] Analytics, Analytics-Kanban, Patch-For-Review: Geoip data archive repository cause puppet to run for minutes - https://phabricator.wikimedia.org/T208028 (Nuria) Open>Resolved
[19:15:14] Analytics, Analytics-Kanban, Patch-For-Review: eventlogging logs taking a huge amount of space on eventlog1002 and stat1005 - https://phabricator.wikimedia.org/T206542 (Nuria) Did we update docs with the new location for logs older than 90 days?
[21:17:08] Analytics, Analytics-EventLogging: Allow (almost?) all EventLogging events to go into MySQL in beta - https://phabricator.wikimedia.org/T208359 (Ottomata)
[21:47:04] Analytics-EventLogging, Analytics-Kanban, EventBus, Core Platform Team Backlog (Later), Services (later): Make schemas use required $schema property with absolute path to the schema - https://phabricator.wikimedia.org/T208361 (Pchelolo) p:Triage>Normal
[21:50:10] Analytics, Analytics-EventLogging: Allow (almost?)
all EventLogging events to go into MySQL in beta - https://phabricator.wikimedia.org/T208359 (MMiller_WMF)
[21:58:13] Analytics, Analytics-EventLogging: Allow (almost?) all EventLogging events to go into MySQL in beta - https://phabricator.wikimedia.org/T208359 (MMiller_WMF) Thanks for filing this, @Ottomata. I nested it under the Growth team's "Understanding first day" epic because it is blocking our ability to test o...
[22:32:47] Analytics, Analytics-EventLogging: Allow (almost?) all EventLogging events to go into MySQL in beta - https://phabricator.wikimedia.org/T208359 (Nuria) >because it is blocking our ability to test our new EditorJourney schema. It shouldn't, EditorJourney events on beta are available in kafka, and while ha...
[23:12:45] Analytics-EventLogging, Analytics-Kanban, EventBus, Core Platform Team Backlog (Later), Services (later): Make schemas use required $schema property with absolute path to the schema - https://phabricator.wikimedia.org/T208361 (Pchelolo) Some related conversations I was able to find https://gi...
[23:33:55] Analytics, New-Readers: Instrument the landing page - https://phabricator.wikimedia.org/T202592 (Prtksxna) Hey @nuria, the URL is going to be `bienvenida.wikimedia.org`.
[23:34:30] Analytics, New-Readers: Instrument the landing page - https://phabricator.wikimedia.org/T202592 (Prtksxna)
[23:45:16] Analytics, New-Readers: Instrument the landing page - https://phabricator.wikimedia.org/T202592 (Nuria) Ok, @Prtksxna let us know when it is live and we can test the snippet, just created a site: