[00:52:40] Analytics, DBA, Operations, User-Elukey: Prep to decommission old dbstore hosts (db1046, db1047) - https://phabricator.wikimedia.org/T156844#3746363 (Tgr) Checking how much different user cohorts (bucketed by editcount) click on a button, for example. If it can be done in Hadoop, that sounds gre...
[01:40:03] (PS2) TerraCodes: git.wikimedia.org -> phab [analytics/geowiki] - https://gerrit.wikimedia.org/r/389907 (https://phabricator.wikimedia.org/T139089)
[02:47:39] (CR) 20after4: [C: 1] "recheck" [analytics/geowiki] - https://gerrit.wikimedia.org/r/389907 (https://phabricator.wikimedia.org/T139089) (owner: TerraCodes)
[02:47:52] (CR) jerkins-bot: [V: -1] git.wikimedia.org -> phab [analytics/geowiki] - https://gerrit.wikimedia.org/r/389907 (https://phabricator.wikimedia.org/T139089) (owner: TerraCodes)
[02:56:05] (PS3) TerraCodes: git.wikimedia.org -> phab [analytics/geowiki] - https://gerrit.wikimedia.org/r/389907 (https://phabricator.wikimedia.org/T139089)
[05:34:15] (CR) TerraCodes: [C: 1] "Jenkins pls" [analytics/geowiki] - https://gerrit.wikimedia.org/r/389907 (https://phabricator.wikimedia.org/T139089) (owner: TerraCodes)
[08:57:37] Analytics, EventBus, Patch-For-Review, Services (next): Timeouts on event delivery to EventBus - https://phabricator.wikimedia.org/T180017#3746776 (mobrovac) >>! In T180017#3744585, @Ottomata wrote: > +1 ^ > > There's also this old unmerged patch: > https://gerrit.wikimedia.org/r/#/c/302372/ >...
[08:59:17] Analytics, EventBus, Patch-For-Review, Services (next): Timeouts on event delivery to EventBus - https://phabricator.wikimedia.org/T180017#3746784 (Pchelolo) Judging by the logs (on mw-log1001, not currently available in logstash) the timeouts did not disappear. Maybe we could consider increasing...
[09:09:09] Analytics, EventBus, Patch-For-Review, Services (next): Timeouts on event delivery to EventBus - https://phabricator.wikimedia.org/T180017#3743921 (elukey) >>! In T180017#3746784, @Pchelolo wrote: > Judging by the logs (on mw-log1001, not currently available in logstash) the timeouts did not disa...
[09:40:32] Analytics, EventBus, Patch-For-Review, Services (next): Timeouts on event delivery to EventBus - https://phabricator.wikimedia.org/T180017#3746880 (Pchelolo) Seems like we don't specify it in mediawiki-config, so we're using the default of 5 seconds.
[10:40:04] * elukey running errand + early lunch!
[10:40:13] Bye elukey
[11:04:42] elukey: I'll have a question for you when you'll be back :)
[11:04:46] elukey: It's about a possib
[11:05:08] possible change in network config between hadoop workers and druid100[123]
[11:31:54] Analytics, Analytics-Wikistats: X-axis is at odds with stated period in header of trend charts for 'total articles' for a wiki - https://phabricator.wikimedia.org/T180118#3747322 (Erik_Zachte)
[11:34:21] Analytics, Analytics-Wikistats: X-axis is at odds with stated period in header of trend charts for 'total articles' for a wiki - https://phabricator.wikimedia.org/T180118#3747339 (Erik_Zachte) That looks odd, indeed, also for other languages.
[11:41:45] (PS1) Joal: Add clickstream oozie job [analytics/refinery] - https://gerrit.wikimedia.org/r/390226 (https://phabricator.wikimedia.org/T175844)
[11:43:44] (PS1) Joal: Correct bug in ClickstreamBuilder [analytics/refinery/source] - https://gerrit.wikimedia.org/r/390227 (https://phabricator.wikimedia.org/T175844)
[12:12:17] joal: I am here :)
[12:12:25] Good afternoon elukey :)
[12:12:44] elukey: is now a good time? You having pinged me, I assume so, but prefer to check :)
[12:13:17] it is!
Just came back from lunch :)
[12:13:39] elukey: I'm sorry to disturb your post-prandial pause :)
[12:14:16] hhahah I am working joal, it is fine :)
[12:14:23] huhu
[12:15:03] elukey: so, here is my problem: my streaming job for banners started to fail yesterday at midnight
[12:15:22] elukey: 2017-11-08T00:00:00
[12:15:44] elukey: reading from kafka and processing is fine, but my job failed to index in druid
[12:16:04] elukey: whatever I tried since - restarting jobs etc failed - logs say there is a connection timeout
[12:16:44] elukey: So, I wonder if anything has changed lately on those hosts
[12:17:34] nothing that I know of, but it happens from time to time that we get those weird indexing timeouts no?
[12:17:42] we can check metrics/logs on the host if you want
[12:18:05] elukey: it had not happened to me for some time
[12:18:21] elukey: I'll try to relaunch the job and we can monitor
[12:18:31] ack!
[12:18:47] elukey: job launched
[12:20:24] elukey: for the moment, it processes old data, therefore drop it
[12:42:49] Analytics-Kanban, User-Elukey: Restart Analytics JVM daemons for open-jdk security updates - https://phabricator.wikimedia.org/T179943#3747605 (elukey) Currently waiting for the jdk7 updates for jessie.
[12:43:00] Analytics-Kanban, User-Elukey: Restart Analytics JVM daemons for open-jdk security updates - https://phabricator.wikimedia.org/T179943#3747607 (elukey) Open>stalled
[13:08:03] elukey: job is stuck - I really don't understand :(
[13:09:02] joal: so iiuc it is trying to submit an index job to the overlord?
[13:09:27] elukey: I think so yes, but can't confirm - tranquility is basically timing out
[13:11:23] elukey: in previous task, it was failing trying to open connections to druid100[123].eqiad.wmnet:2181
[13:12:01] oh elukey, it might actually be zookeeper !
[13:13:17] ahhh it is the private cluster!
[13:13:29] ?
[13:13:30] I was checking the overlord logs in the other one :P
[13:13:38] Arf sorry elukey
[13:13:50] elukey: I should have been more explicit
[13:17:09] joal: ahh is it tranquility?? We don't allow it to connect to zk anymore :(
[13:17:21] elukey: I think that's it indeed
[13:17:33] elukey: tranquility is run inside a spark job
[13:17:35] sorry I didn't know it was it :(
[13:17:49] elukey: how come we don't allow it?
[13:18:49] * joal don't understand :(
[13:28:57] joal: so some time ago Andrew merged a change for the Druid clusters and since we didn't use Tranquility anymore he removed the support for it
[13:29:01] (iirc)
[13:29:31] elukey: hm, what kind of support would that be?
[13:29:45] joal: Ferm rules :)
[13:29:50] elukey: other weird thing is that problem seem to have started like 2 days ago :(
[13:29:59] elukey: Ah, that could be ;)
[13:30:30] profile::zookeeper::firewall::srange: '(($HADOOP_MASTERS $KAFKA_BROKERS_ANALYTICS $KAFKA_BROKERS_JUMBO $KAFKA_BROKERS_MAIN $ZOOKEEPER_HOSTS_MAIN @resolve(krypton.eqiad.wmnet)))'
[13:30:40] elukey: Well, we actually still use tranquility - Inside spark, but we use it
[13:31:13] joal: let's bc for a sec ok ?
[13:31:21] hm - I'm definitely not fluent in ferm :)
[13:31:30] sure elukey
[13:31:33] OME
[13:54:25] elukey: Found it !
[13:54:43] elukey: tranquility uses zookeeper to discover the topology of the druid cluster
[13:57:54] elukey: more precisely - tranquility uses zookeeper to find the current overlord-master
[13:58:40] elukey: but, interestingly: https://github.com/druid-io/tranquility/issues/251
[13:58:40] ahhhh
[13:59:20] elukey: Do you think reopening the ports will be problematic?
[13:59:39] joal: do you need it only for the "private" cluster right?
[13:59:44] correct sir
[13:59:56] all right, so this one is basically inside the analytics network
[14:00:11] so I don't see any trouble in opening the zk port
[14:00:22] elukey: <3
[14:01:16] elukey: about the patch for HDFS and YARN heap size - Have seen interesting changes in an1030 behavior after your patch?
[14:02:10] joal: I did! https://grafana.wikimedia.org/dashboard/db/analytics-hadoop?orgId=1 - you can see a line that behaves differently for heap sizes
[14:02:41] moreover I thought that since we'll add the javaagent for prometheus soon to all the daemons a bit more heap size will surely be good
[14:02:48] since it is already heavily used
[14:02:59] Analytics, DBA, Operations, User-Elukey: Prep to decommission old dbstore hosts (db1046, db1047) - https://phabricator.wikimedia.org/T156844#3747806 (Ottomata) > Would there be a separate webrequest hadoop access and EL-only Hadoop access? It doesn't work 100% as it should, but for Hadoop access...
[14:03:27] works for me elukey :)
[14:03:55] (CR) Ottomata: [C: 2] Correct bug in ClickstreamBuilder [analytics/refinery/source] - https://gerrit.wikimedia.org/r/390227 (https://phabricator.wikimedia.org/T175844) (owner: Joal)
[14:06:50] joal just curious, why archive?
[14:06:55] oh because no hive table?
[14:07:00] this is intended for publishing?
[14:07:06] ottomata: correct, and text file, and to be presented externally :)
[14:07:11] gr8 k
[14:07:20] ottomata: morninggggg
[14:08:09] joal,ottomata - https://gerrit.wikimedia.org/r/#/c/390242/1/hieradata/role/common/druid/analytics/worker.yaml
[14:08:39] Rreeeallly?!
[14:08:45] huh!
[14:09:03] ottomata: didn't notice it, but yeah, streaming job was broken
[14:09:23] thanks elukey
[14:09:30] (CR) Ottomata: [C: 1] Add clickstream oozie job [analytics/refinery] - https://gerrit.wikimedia.org/r/390226 (https://phabricator.wikimedia.org/T175844) (owner: Joal)
[14:12:09] ottomata: only thing I dislike in current clickstream job is that output folders will be: 2017-09/wiki_db=enwiki/file
[14:12:34] ottomata: I'm thinking of adding a sh action to rename wiki_db=XX to XX
[14:14:54] meh why joal, because it looks bad in web browser? :)
[14:15:03] kinda elukey :)
[14:15:11] kinda ottomata sorry
[14:15:14] elukey@analytics1040:~$ telnet druid1001.eqiad.wmnet 2181
[14:15:14] Trying 10.64.5.101...
[14:15:14] Connected to druid1001.eqiad.wmnet.
[14:15:14] Escape character is '^]'.
[14:15:15] hehheh
[14:15:15] Mwarf, tired
[14:15:46] all right tranquility should be happy now
[14:16:07] elukey: <3 <3
[14:21:18] Analytics-Kanban, Patch-For-Review: Remove AppInstallIId from EventLogging purging white-list - https://phabricator.wikimedia.org/T178174#3748009 (mforns) @JMinor Please, invite me to the meeting as well. Regardless of the outcome, I'd like to know Legal's point of view on this subject, so that we can a...
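Note on the 14:15 telnet check above: it confirms that the ZooKeeper port (2181) on druid1001 accepts TCP connections from a Hadoop worker again. A minimal Python sketch of the same reachability test follows. The function name `port_is_open` and the 3-second default timeout are assumptions made for illustration; the host names expand the druid100[123] pattern mentioned in the log. It only tests that a TCP connection can be opened, not that ZooKeeper or Tranquility behave correctly once connected.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper, equivalent in spirit to `telnet host port`.
    """
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures.
        return False


if __name__ == "__main__":
    # Hosts taken from the druid100[123] pattern in the log; 2181 is the
    # ZooKeeper client port that the firewall change reopened.
    for name in ("druid1001", "druid1002", "druid1003"):
        host = f"{name}.eqiad.wmnet"
        state = "open" if port_is_open(host, 2181) else "unreachable"
        print(f"{host}:2181 {state}")
```

Running this from a Hadoop worker before and after a firewall change would show whether the port is still blocked, without needing an interactive telnet session.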
[14:56:40] Taking a break a-team
[14:57:15] ah we use different garbage collectors for druid daemons
[14:57:18] didn't know that
[14:57:31] I thought we were using G1
[14:57:37] (for all of them)
[15:13:57] Quarry, DBA, Data-Services: Raise concurrent mysql connection limit for Quarry (or throttle application concurrency) - https://phabricator.wikimedia.org/T180141#3748147 (bd808)
[15:29:15] Quarry, DBA, Data-Services: Raise concurrent mysql connection limit for Quarry (or throttle application concurrency) - https://phabricator.wikimedia.org/T180141#3748196 (zhuyifei1999) There may be as many as 48 celery workers: {P6298}
[15:35:01] elukey: there is def not a reason i know of that we aren't other than I never specified anything specific
[15:36:48] ottomata: no big deal, I was only reviewing metrics in the logs and found multiple names so I was confused :D
[15:45:52] (PS11) Mforns: [WIP] Add scala-spark core class and job to import data sets to Druid [analytics/refinery/source] - https://gerrit.wikimedia.org/r/386882 (https://phabricator.wikimedia.org/T166414)
[15:54:20] Hi a-team, a little joy before standup: rik@mediawiki/Yurik] has joined #wikimedia-analytics
[15:54:23] 16:45:52 < wikibugs> (PS11) Mforns: [WIP] Add scala-spark core class and job to import data sets to Druid [analytics/refinery/source] - https://gerrit.wikimedia.org/r/386882
[15:54:27] (https://phabricator.wikimedia.org/T166414)
[15:54:30] [16:54:14] [joal(+i)] [2:freenode/#wikimedia-analytics(+n)] [Act: 3,4,5,6,7,8,9,12]
[15:54:34] Wow - unexpected paste
[15:54:35] doing it again
[15:54:46] Hi a-team, a little joy before standup: http://turnoff.us/geek/follow-the-hype/
[15:55:43] xD
[15:56:02] I thought he was going to punch him
[15:56:30] mforns: well, there is that extra strip if you click ...
;)
[15:56:46] xDDDD
[15:56:51] I see :]
[16:20:23] Analytics-Kanban, Patch-For-Review: Remove AppInstallIId from EventLogging purging white-list - https://phabricator.wikimedia.org/T178174#3748313 (mforns) @JMinor Actually, no need to invite me. We can follow up on Legal's position after the meeting. Thanks!
[16:45:25] Analytics, Operations, ops-eqiad, User-Elukey: Possibly faulty BBU on analytics1029 - https://phabricator.wikimedia.org/T178742#3748365 (RobH) So the determination of ordering new hardware for failed will have to also rely on budgeting and if analytics can run without this host or require it. @e...
[16:58:32] (PS12) Mforns: [WIP] Add scala-spark core class and job to import data sets to Druid [analytics/refinery/source] - https://gerrit.wikimedia.org/r/386882 (https://phabricator.wikimedia.org/T166414)
[17:05:27] nuria_: or milimetric, quick brain bounce about some lawsuit data stuff before we resume?
[17:05:28] i'm still in bc
[17:05:38] ottomata: k, one sec
[17:13:19] ping milimetric elukey
[17:14:00] coming sorry
[17:20:24] Analytics, Analytics-Wikistats: Beta: How to link/to cite Wikistats 2.0 - https://phabricator.wikimedia.org/T179975#3748496 (mforns)
[17:20:48] Analytics, Analytics-Wikistats: Beta: How to link to/cite Wikistats 2.0 - https://phabricator.wikimedia.org/T179975#3742432 (mforns)
[17:24:11] Analytics, Operations, ops-eqiad, User-Elukey: Possibly faulty BBU on analytics1029 - https://phabricator.wikimedia.org/T178742#3748503 (RobH) Discussed some, Chris is going to pull a BBU out of a decom system to replace the defective one. Also analytics may have to start planning for the replac...
[17:24:16] Analytics-Kanban, Analytics-Wikistats: Alpha release: Label breakdown categories and other non-obvious concepts - https://phabricator.wikimedia.org/T177950#3748508 (mforns)
[17:24:18] Analytics, Analytics-Wikistats: Add tooltips for metric definitions in Wikistats 2.0 - https://phabricator.wikimedia.org/T179973#3748505 (mforns)
[17:24:38] Analytics-Kanban, Analytics-Wikistats: Alpha release: Label breakdown categories and other non-obvious concepts - https://phabricator.wikimedia.org/T177950#3675829 (mforns) From duplicate task: > The current version of Wikistats 2.0 links out to the full page on Meta describing a metric. It would be use...
[17:25:05] ottomata: fyi https://phabricator.wikimedia.org/T178742#3748503
[17:25:31] Analytics-Kanban, Analytics-Wikistats: Format timestamps as human-readable datetimes in Wikistats 2.0 table view - https://phabricator.wikimedia.org/T179971#3748516 (mforns)
[17:32:54] Analytics, Analytics-Wikistats, I18n: Move non-SI prefixes to user- or locale-specific interface - https://phabricator.wikimedia.org/T179906#3739830 (Ottomata) WOW an American learns something new today. I'm all for this! Let's go SI for everything! 'There are 7 Gigapeople alive today.' YES!
[17:35:14] Analytics, Analytics-Wikistats, I18n: Move non-SI prefixes to user- or locale-specific interface - https://phabricator.wikimedia.org/T179906#3739830 (mforns) We'll undo the change. This looks like a really good argument. Our Alpha launch will not have internationalization, we are aiming to implement...
[17:35:30] Analytics-Kanban, Analytics-Wikistats, I18n: Move non-SI prefixes to user- or locale-specific interface - https://phabricator.wikimedia.org/T179906#3748531 (mforns)
[17:45:55] (CR) Chad: "Basically this is failing flake8, badly. We need to fix all the violations first.
Which is fun, since the setup.py doesn't target python 2" [analytics/geowiki] - https://gerrit.wikimedia.org/r/389907 (https://phabricator.wikimedia.org/T139089) (owner: TerraCodes)
[17:57:02] Analytics, DBA, Operations, User-Elukey: Prep to decommission old dbstore hosts (db1046, db1047) - https://phabricator.wikimedia.org/T156844#3748567 (Nuria) @Tgr : i think your use case would work in hadoop as edit data related information is available since the beginning of time, now, it is true...
[17:59:02] ottomata: give me 15 mins and let's review the backup info ok?
[18:06:14] Analytics, Proton, Readers-Web-Backlog, MW-1.31-release-notes (WMF-deploy-2017-10-31 (1.31.0-wmf.6)), and 2 others: Implement Schema:Print purging strategy - https://phabricator.wikimedia.org/T175395#3748579 (phuedx) Will we be purging data up until today's deploy differently from the data afterw...
[18:14:24] ottomata: batcave?
[18:14:32] nuria_: am in spark project showcase
[18:14:39] ottomata: k np
[18:14:51] ottomata: you let me know
[18:24:14] nuria i think i know what is best to do. i'm just going to make the redundant copy in wmf-vs-nsa/ and then redo verification there
[18:29:44] ottomata: let's talk when you are out of spark
[18:36:01] Quarry, DBA, Data-Services: Raise concurrent mysql connection limit for Quarry (or throttle application concurrency) - https://phabricator.wikimedia.org/T180141#3748147 (Marostegui) This user has no limits on the old replicas, how many would be a decent limit on the new ones? I'd rather set some high...
[18:37:26] ok nuria_ going to bc
[18:39:29] ottomata: omw
[18:54:48] Quarry, DBA, Data-Services: Raise concurrent mysql connection limit for Quarry (or throttle application concurrency) - https://phabricator.wikimedia.org/T180141#3748774 (bd808) @Marostegui can we set it to 48? That appears to match the current parallel worker queue size.
[19:18:59] * elukey off!
[21:41:37] (PS1) TerraCodes: git.wikimedia.org -> phab [analytics/kafkatee] - https://gerrit.wikimedia.org/r/390326 (https://phabricator.wikimedia.org/T139089)
[21:53:54] mi webpack patch is SO LOW TECH
[22:45:43] (CR) Chad: [V: 2 C: 2] git.wikimedia.org -> phab [analytics/kafkatee] - https://gerrit.wikimedia.org/r/390326 (https://phabricator.wikimedia.org/T139089) (owner: TerraCodes)
[22:51:06] (CR) Chad: [V: 2 C: 2] git.wikimedia.org -> phab [analytics/geowiki] - https://gerrit.wikimedia.org/r/389907 (https://phabricator.wikimedia.org/T139089) (owner: TerraCodes)
[23:53:54] Analytics-Kanban, Contributors-Analysis: Use native timestamp types in Data Lake edit data (needs Hive 1.2) - https://phabricator.wikimedia.org/T161150#3749629 (Neil_P._Quinn_WMF)
[23:54:21] Analytics, Contributors-Analysis: Use native timestamp types in Data Lake edit data (needs Hive 1.2) - https://phabricator.wikimedia.org/T161150#3122979 (Neil_P._Quinn_WMF)
[23:55:48] Analytics, Contributors-Analysis: Use native timestamp types in Data Lake edit data (needs Hive 1.2) - https://phabricator.wikimedia.org/T161150#3122979 (Neil_P._Quinn_WMF) a:JAllemandou>None