[00:03:34] Analytics, Android-app-feature-Compilations, Wikipedia-Android-App-Backlog, Reading-Infrastructure-Team-Backlog (Kanban): Determine how to gather top-viewed article lists for use in generating ZIM files - https://phabricator.wikimedia.org/T172296#3495671 (Mholloway) Open>Resolved a:Mho...
[00:55:32] Analytics-Cluster, Analytics-Kanban, Patch-For-Review: Replacement of stat1002 and stat1003 - https://phabricator.wikimedia.org/T152712#3495844 (Catrope) >>! In T152712#3489069, @Ottomata wrote: > @Catrope just emailed: > >> I would love to migrate to stat1006 from stat1003, but stat1006 is unusably...
[01:56:52] Analytics, Beta-Cluster-Infrastructure, Wikimedia-Stream: Decom RCStream in Beta Cluster - https://phabricator.wikimedia.org/T172356#3495960 (Krinkle)
[01:57:00] Analytics, Beta-Cluster-Infrastructure, Wikimedia-Stream: Decom RCStream in Beta Cluster - https://phabricator.wikimedia.org/T172356#3495976 (Krinkle) Also, is there EventStreams in Beta Cluster?
[05:21:20] Analytics, Analytics-EventLogging, Performance-Team: Make webperf eventlogging consumers use eventlogging on Kafka - https://phabricator.wikimedia.org/T110903#3496151 (Krinkle) >>! In T110903#2190538, @Ottomata wrote: > I talked with @Krinkle on IRC yesterday and we came up with a plan of action. Th...
[08:03:30] !log restart hive-metastore to pick up new JVM Xms settings
[08:03:37] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[08:04:41] hope that this --^ has not broken any ongoing job (nothing in yarn atm related to hive)
[08:06:33] ah I broke druid webrequest/pageviews jobs
[08:08:20] restarted
[08:08:45] !log restarted Druid jobs failed over night (druid_loader.py error) and due to Hive metastore restart
[08:08:46] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[08:35:13] Analytics-Kanban: Calculate how much Popups events EL databases can host - https://phabricator.wikimedia.org/T172322#3496338 (elukey) As Nuria pointed out the analytics-store is now under disk consumption pressure and this experiment might be problematic. There are a couple of things that we could do to allo...
[08:35:31] Analytics-Kanban, User-Elukey: Calculate how much Popups events EL databases can host - https://phabricator.wikimedia.org/T172322#3496340 (elukey) p:Triage>Normal
[08:39:00] morning elukey :D
[08:39:41] o/
[08:40:14] So, I'm going to try and get the initial bits of Goran's work puppetized in the coming days / weeks
[08:41:01] and was just wondering do you think it is fine to just keep adding stuff to https://github.com/wikimedia/puppet/blob/production/modules/statistics/manifests/wmde.pp or perhaps start a different manifest?
[08:41:37] If we go for a different manifest I might rename the current one to wmde_graphite_metrics or something?
[08:41:48] as that is ALL it does
[08:47:43] addshore: so classes should be generic and also self contained (https://wikitech.wikimedia.org/wiki/Puppet_coding#Modules), I like the idea of having multiple modules
[08:47:49] so we know what they are doing
[08:47:56] and we collect them in a profile/role
[08:47:59] wdyt?
[08:49:25] hmm, so, module would mean it is outside of the statistics module?
[08:49:50] I think when I first wrote that puppet stuff I started drafting it as a module but then we pulled it inside of statistics
[08:50:02] I think I like what you are sayinbg
[08:50:04] *saying
[08:51:32] nono sorry what I meant to say is that we could use different classes for different self contained things, even adding statistics --> wmde --> graphite.pp/someother.pp/etc..
[08:52:31] my fear is that we keep adding stuff to the statistics::wmde class (that I am totally ignorant about what it does) and we end up with a monster with multiple heads
[08:52:39] each one doing completely different things
[08:52:48] a nightmare to maintain in the longer term
[08:52:57] also it is difficult to reason about what it does
[08:53:57] ebernhardson: hello :)
[08:53:58] ahhh okay
[08:54:15] elukey: yeh, I am against having one big thing that does all of the stuff
[08:54:42] Right, I'm going to try and make a patch that just pushes some stuff around then before we start adding more :)
[08:54:44] thanks! :)
[08:54:46] ebernhardson: stat1005 seems overloaded by something with your username, the OOM is killing like crazy
[08:54:59] addshore: thank you! Will be happy to help/review
[09:17:59] elukey: ^^ there is attempt 1 >> https://gerrit.wikimedia.org/r/#/c/369857/
[09:27:53] addshore: maybe wmde.pp could become a class under the wmde directory?
[09:28:06] elukey: yeh, wmde::init? or
[09:29:07] yeah
[09:29:22] What is the deal with this line?
[09:29:25] Class['::statistics'] -> Class['::statistics::wmde']
[09:29:34] * addshore isn't sure when he needs to include that or what it does
[09:29:51] it says that the statistics module needs to be loaded before the wmde one
[09:30:15] ahh okay!
[09:30:21] I am not sure when to use init.pp though, never done it
[09:30:36] it should be to please the autoloader, so if you include statistics::wmde it works
[09:31:20] ahh okay!
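Editor's note on the init.pp question above: a minimal sketch, in Python rather than Puppet, of the class-name-to-file mapping that Puppet's autoloader is generally documented to use. The helper name `autoload_path` and the `modules` base directory are assumptions made for illustration and are not part of the repository being discussed. Under this mapping `init.pp` is reserved for the class named after the module itself, so a file at `wmde/init.pp` would be expected to hold `statistics::wmde::init`, not `statistics::wmde`.

```python
from pathlib import PurePosixPath


def autoload_path(class_name: str, modules_dir: str = "modules") -> PurePosixPath:
    """Return the manifest path where the autoloader would look for a class.

    Hypothetical helper: it only mirrors the documented naming convention,
    it does not call Puppet.
    """
    # A leading "::" (as in Class['::statistics']) just anchors the name at top scope.
    parts = [p for p in class_name.split("::") if p]
    if not parts:
        raise ValueError("empty class name")
    module, rest = parts[0], parts[1:]
    manifests = PurePosixPath(modules_dir) / module / "manifests"
    if not rest:
        # Only the class named after the module lives in init.pp.
        return manifests / "init.pp"
    # Every further namespace segment is a directory; the last one is the file.
    return manifests.joinpath(*rest[:-1]) / (rest[-1] + ".pp")


if __name__ == "__main__":
    for name in ("statistics", "::statistics::wmde", "statistics::wmde::graphite"):
        print(name, "->", autoload_path(name))
```

Read this way, keeping `wmde.pp` next to a `wmde/` directory that holds the sub-classes is consistent with the layout, while moving the main class into `wmde/init.pp` changes the class name the autoloader expects to find there.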
[09:31:34] so I dont need to update the thing that includes statistics::wmde
[09:31:54] Pushed another PS :)
[09:33:56] addshore: now you entered into the refactoring nightmare :D
[09:34:04] oh noes
[09:34:06] so theoretically statsd_host => hiera('statsd'), is not allowed in a class
[09:34:12] but only into profiles
[09:34:23] since we don't want classes to make explicit hiera calls
[09:34:23] hmmm, but it was always there before! :P
[09:34:27] I knowwww
[09:34:41] so, now to learn about profiles ;)
[09:34:55] so if you want I can work on your PS later on today
[09:35:00] and then we can discuss the changes
[09:35:09] basically the roles are now collections of profiles
[09:35:32] and only in profiles you can make explicit hiera calls, but only in the parameters
[09:35:33] hahahaaaaa, okay, I'll have a quick dig around and see if I can figure it out for myself first :)
[09:35:52] https://wikitech.wikimedia.org/wiki/Puppet_coding
[09:35:54] so then the statsd host has to get passed all the way through the main wmde class i guess?
[09:37:00] afaics statistics::wmde is included in profile::statistics::private
[09:37:11] so it is sufficient to add a parameter in there
[09:37:25] and pass it to the classes
[09:37:44] okay!
[09:37:49] thanks again! :)
[09:38:02] Yeh, I dont think this profile stuff was being used when I first wrote this class
[09:39:27] Analytics, Beta-Cluster-Infrastructure, Wikimedia-Stream: Decom RCStream in Beta Cluster - https://phabricator.wikimedia.org/T172356#3496434 (hashar) It runs on `deployment-stream.deployment-prep.eqiad.wmflabs` `10.68.17.106` created by @ori when he did the RCStream project. The instance has a float...
[09:39:30] it is recent so it will take a long time before we transition the whole codebase to it, but it is really better to inspect
[09:39:45] with this scheme you know exactly where hiera calls are made
[09:39:56] and how things are passed around
[09:56:49] !log set piwik in maintenance mode to allow mysql updates
[09:56:51] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[10:05:16] elukey: so, I think I have done that in the next PS now :)
[10:16:37] addshore: I added some comments to PS4
[10:18:28] awesome!
[10:23:08] elukey: another PS up :)
[10:25:16] alas, it has style issue
[10:25:37] addshore: great :) - one question - where do we use $graphite_host? It seems redundant no?
[10:26:08] modules/statistics/manifests/wmde/init.pp:6 ERROR statistics::wmde not in autoload module layout (autoloader_layout)
[10:26:31] http://puppet-lint.com/checks/autoloader_layout/
[10:26:40] elukey: it is used in the config template
[10:26:47] hmmmm
[10:27:10] graphite_host <%= @graphite_host %>
[10:27:24] ahhh okok
[10:28:10] and so it needs to be statistics/wmde/manifests/init.pp ?
[10:28:38] addshore: ignorant question - should we use only statsd.eqiad.wmnet or is there a reason to use graphite directly?
[10:29:17] So statsd is used for realtime / live data, graphite is used for historical data, so these scripts put data in graphite for previous days at given timestamps for example
[10:30:04] ok so we'd need a way to grab the graphite host from somewhere else, like hiera
[10:30:07] checking it
[10:30:26] for the init.pp I'd remove it, let me check what is best
[10:30:46] okay!
[10:31:01] and yeh, graphite might be in hiera now, but I know at the time of writing this it wasnt
[10:31:20] but afaik I can't see what is in hiera to check anyway :)
[10:34:45] as fyi
[10:34:46] statsd.eqiad.wmnet. 182 IN CNAME graphite1001.eqiad.wmnet.
[10:34:46] graphite1001.eqiad.wmnet. 2076 IN A 10.64.32.155
[10:34:51] argh horrible paste
[10:35:49] and afaik statsd is supposed to send metrics in batches to graphite
[10:36:24] Yup, but you can't specify timestamps, it buckets data for 1 minute, but that can only be the current minute, and then sends it on to graphite
[10:38:02] The stuff that gets sent to graphite directly is generally more daily / weekly data than minutely, and it is usually, this is what the value was at this time rather than counters and times and what not
[10:42:25] addshore: for the classes, I'd go back to the previous layout (sorry): so wmde.pp + wmde::graphite in the wmde directory (that already contains ::user)
[10:43:07] now we could also add the username/group parameters (with defaults) to statistics::wmde::user or just include that code in wmde.pp
[10:43:18] Done in the PS i just pushed :)
[10:43:27] I'll do that too :)
[10:43:42] aaaand we'd need to use graphite-in.eqiad.wmnet
[10:43:45] just asked to Filippo
[10:44:13] is that in hiera or should i just change it in the string?
[10:44:21] it is not in hiera, but we can pass it to the profile's parameters
[10:44:33] okay!
[10:45:14] so, for the user, can Group[$username] go in the class param default?
[10:45:25] or, is leaving that inside the class itself fine?
[10:46:12] wait, ignore that, I guess the group and user should both just stay in that file, i'll just push another PS in a sec
[10:47:03] hey, I want to take out top viewed articles of enwiki that happened from Iran in the past seven months, It has no granularity to give out any PII. Is it okay to publish it?
[10:47:13] I can also get rid of the usage of $statistics_working_path now as /a is now /srv so the default works!
[10:48:06] and, pushed again
[10:52:09] Amir1: is it urgent? I'll ask to my team when everybody is up
[10:52:49] addshore: I have an idea to de-couple it more, can I do it?
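Editor's note on the statsd vs graphite point above (statsd accepts no timestamp, graphite does): a small sketch of the two wire formats as they are commonly documented. The metric names are made up for illustration, and the function names are hypothetical; nothing here is taken from the WMDE scripts themselves. The sketch only builds the lines and does not open any network connection.

```python
import calendar
from datetime import datetime, timezone


def graphite_line(path: str, value, when: datetime) -> str:
    """Build one line of Graphite's plaintext protocol: '<path> <value> <epoch>'.

    The explicit timestamp is what allows backfilling a value for a past day.
    """
    epoch = calendar.timegm(when.utctimetuple())  # seconds since epoch, in UTC
    return f"{path} {value} {epoch}\n"


def statsd_line(bucket: str, value, kind: str = "c") -> str:
    """Build a StatsD datagram: '<bucket>:<value>|<type>' (c=counter, g=gauge, ms=timer).

    There is no timestamp field; the value lands in whatever interval is
    currently being aggregated.
    """
    return f"{bucket}:{value}|{kind}"


if __name__ == "__main__":
    # "daily.example.metric" is an illustrative name, not a real metric.
    day = datetime(2017, 8, 1, tzinfo=timezone.utc)
    print(graphite_line("daily.example.metric", 42, day), end="")
    print(statsd_line("example.counter", 1))
```

A script reporting "the value was X on day Y" therefore has to speak the first format to a Graphite endpoint (graphite-in.eqiad.wmnet in the discussion above), since the second format has nowhere to carry the day.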
[10:52:59] I'm super super excited because it's in a middle of a heated discussion, beside that, no :D
[10:53:16] elukey: go for it :D
[11:07:22] elukey: the last article in the top 100 has 22K views in the past seven months
[11:21:47] Amir1: should be super fine to publish, but I'd prefer to get a second opinion
[11:22:30] theoretically there is no risk of identification of small communities since it is aggregated by the whole country
[11:22:51] soooo if it is super urgent I'd say ok, otherwise let's wait a couple of hours
[11:23:08] sure
[11:23:19] * elukey lunch!
[11:43:09] Analytics, Android-app-feature-Feeds, Pageviews-API, RESTBase-API, and 2 others: Why top views data of different sources is not the same? - https://phabricator.wikimedia.org/T172379#3496756 (Shizhao)
[12:13:10] (CR) DCausse: [C: 1] UDF for extracting primary full text search request [analytics/refinery/source] - https://gerrit.wikimedia.org/r/327855 (https://phabricator.wikimedia.org/T162054) (owner: EBernhardson)
[12:30:15] elukey: awesome!
[12:30:34] I'll make a patch changing the dir soon
[12:56:29] elukey: FYI I just made 3 more small patches cleaning up things & putting everything under a sub dir, I have another meeting now though :/
[12:57:18] in fact, that is a lie, I have no meeting
[12:58:30] everything is a lie
[12:58:31] :D
[13:02:31] :D
[13:05:31] will review them in ~1h!
[13:05:34] awesome!
[13:17:58] Analytics-Kanban, Operations, Traffic, Patch-For-Review, User-Elukey: Update Varnishkafka to support TLS encryption/authentication - https://phabricator.wikimedia.org/T165736#3496930 (elukey) Re-tested it in labs now and the current version of varnishkafka is able to use TLS without any modif...
[13:22:43] Analytics-Cluster, Analytics-Kanban, Patch-For-Review, User-Elukey: Understand Kafka ACLs and figure out what ACLs we want for production topics - https://phabricator.wikimedia.org/T167304#3496938 (elukey) After checking the `kafka-authorizer.log` file I had to add the following rules to avoid de...
[13:24:47] elukey: did you rerun some of these failed oozie jobs?
[13:24:56] like https://hue.wikimedia.org/oozie/list_oozie_workflow/0057756-170621131133576-oozie-oozi-W/ sent a fatal error in email
[13:25:01] but looks ok
[13:25:33] milimetric: I did (logged in the chan's sal log) but some of them kept failing
[13:25:41] then I had some things to do and didn't check
[13:25:48] same step, druid_loader.py
[13:25:54] oh, cool, andrew's over and we're re-running and cleaning stuff
[13:25:55] checking now
[13:26:15] ahhhh okok
[13:26:18] no worries, I guess I'm just paranoid these things will develop sentience and rerun themselves
[13:26:21] so yesterday
[13:26:24] there were just a few failed jobs
[13:26:32] and, i reran them, but the first attempt failed
[13:26:44] hola ottomata :)
[13:26:45] looking at logs, i saw that there was a temp dir that couldn't be cleaned up(?)
[13:26:47] so i removed the tmp dir
[13:26:49] and reran
[13:26:51] and then they succeeded
[13:27:06] it was difficult for me to find what was the error
[13:27:14] dunno why they failed in the first place though, it looks like the druid http endpoint the job was polling didn't finish in time?
[13:27:45] I had the same impression but since I am super ignorant about druid I decided to wait for you guys :)
[13:27:51] I keep seeing emails though
[13:28:35] its really strange that most jobs succeed
[13:41:24] elukey: dan noticed that I forgot to update the mysql-metadata-storage jar to the new druid version, which seems unlikely to be the problem
[13:41:35] but i'll add docs to the README.debian so I don't forget next time
[13:41:40] updating and building new deb now
[13:41:42] then will restart service
[13:41:43] s
[13:42:34] ottomata: ack
[13:52:28] FYI people I removed the an-kanban tag from T155065
[13:52:51] (the cluster has been completely expanded)
[13:54:47] Analytics, Analytics-Cluster, Operations, ops-eqiad, Patch-For-Review: rack/setup/install new kafka nodes kafka-jumbo100[1-6] - https://phabricator.wikimedia.org/T167992#3497020 (elukey) >>! In T167992#3493916, @RobH wrote: > @elukey: So we do prefer sw raid over hw raid when purchasing serve...
[13:56:50] addshore: https://puppet-compiler.wmflabs.org/compiler02/7291/stat1005.eqiad.wmnet/ - does it look ok?
[13:57:57] elukey: yup
[13:59:19] addshore: will you clean up after this patch on stat1005? (the old files not anymore in /srv/analytics-wmde)
[14:00:39] yup!
[14:01:18] thanks! Merging the last one
[14:04:24] elukey: can we just move that ticket to done and let nuria close it?
[14:05:05] ottomata: didn't close it since it is listed as "pending invoice/payment", so it is in the hand of procurement now
[14:05:14] hellooooooo
[14:05:14] this is why I removed the tag
[14:06:08] addshore: all done!
[14:06:16] great! thanks!
[14:10:15] ah hm ok
[14:11:01] !log pausing oozie druid jobs and doing a cluster upgrade/restart again to make sure updated version of mysql-metadata-storage jar is properly loaded
[14:11:03] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:16:36] Analytics-Kanban, Analytics-Wikistats, Continuous-Integration-Infrastructure, Release-Engineering-Team (Kanban): Fix Wikistats build in Jenkins - https://phabricator.wikimedia.org/T171599#3497168 (fdans) @hashar thank you so much! I've changed a couple of wrong calls to semantic that I added some...
[14:23:54] hey elukey shouldn't we have gotten an email alert when a cron on analytics1003 failed? This is what failed: https://github.com/wikimedia/puppet/blob/488f55cd0bfffd2e6d81acfe1e727dfa075faccd/modules/role/manifests/analytics_cluster/refinery/job/sqoop_mediawiki.pp#L32
[14:26:44] milimetric: theoretically there is no MAILTO set so if any output to stdout/err was emitted it should have been sent to root@wikimedia, namely the opsens
[14:27:19] checking on 1003
[14:28:45] yeah we have only MAILTO=analytics-alerts@wikimedia.org before the last item
[14:29:22] but I don't remember to have seen cronspam from sqoop milimetric (on root@wikimedia.org)
[14:29:35] so maybe it doesn't emit anything in stdout/err?
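Editor's note on the question of why no cron mail arrived: classic cron implementations mail whatever a job writes to stdout/stderr and do not look at the exit status, so a job whose output is redirected into a file stays silent even when it fails. A minimal sketch of that effect follows; the failing command, the helper names and the log file are stand-ins invented for the example, not the real sqoop job.

```python
import os
import subprocess
import tempfile


def run_like_cron(command: str):
    """Run a shell command and return (exit code, combined stdout+stderr).

    The combined output approximates what cron would have available to mail.
    """
    proc = subprocess.run(
        ["sh", "-c", command],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    return proc.returncode, proc.stdout


def demo():
    # Stand-in for a job that fails and complains on stderr.
    failing_job = "echo 'job failed' >&2; exit 1"
    with tempfile.TemporaryDirectory() as tmp:
        log_file = os.path.join(tmp, "job.log")
        plain = run_like_cron(failing_job)
        # Same job, but with all output appended to a log file.
        redirected = run_like_cron(f"({failing_job}) >> '{log_file}' 2>&1")
        with open(log_file) as handle:
            logged = handle.read()
    return plain, redirected, logged


if __name__ == "__main__":
    plain, redirected, logged = demo()
    print("without redirect:", plain)    # non-empty output: cron would mail it
    print("with redirect:", redirected)  # same exit code, nothing left to mail
    print("log file holds:", repr(logged))
```

Both runs exit non-zero, but only the first leaves any output for cron to send, so the failure in the second case is visible only to someone who reads the log file.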
[14:29:50] ah snap
[14:29:50] >> ${log_file} 2>&1
[14:29:53] yes ok :D
[14:30:22] so this is a broader subject/problem that we have at the moment, namely how to sanely alarm on errors happening in log files
[14:31:28] the saner option is, in my opinion, to have our scripts log to stderr when they fail and not redirect it to a logfile
[14:31:37] so cron will send an email in case of failure
[14:33:16] Analytics-Kanban, Analytics-Wikistats: Wikistats2 bugs (1/4) - Dashboard and general UI - https://phabricator.wikimedia.org/T170933#3497278 (fdans)
[14:33:18] Analytics-Kanban, Analytics-Wikistats: Addition of Unique Devices metric - https://phabricator.wikimedia.org/T170461#3497279 (fdans)
[14:51:21] (Draft2) Reedy: Add hiwikiversity [analytics/refinery] - https://gerrit.wikimedia.org/r/369926 (https://phabricator.wikimedia.org/T168765)
[15:31:50] Analytics, Android-app-feature-Feeds, Pageviews-API, RESTBase-API, and 2 others: Why top views data of different sources is not the same? - https://phabricator.wikimedia.org/T172379#3497560 (elukey) Open>declined Because they are different APIs showing different content :) Please re-open...
[15:32:27] Analytics-Kanban, Beta-Cluster-Infrastructure, Wikimedia-Stream, Patch-For-Review: Decom RCStream in Beta Cluster - https://phabricator.wikimedia.org/T172356#3497562 (elukey)
[15:33:02] Analytics, Analytics-Wikistats: WiViVi: Per-person should account for connected percentage? - https://phabricator.wikimedia.org/T172335#3497564 (elukey) a:ezachte
[15:38:38] Analytics-Cluster, Analytics-Kanban, Operations, ops-eqiad: rack/setup/install druid100[456].eqiad.wmnet - https://phabricator.wikimedia.org/T171626#3497579 (Cmjohnson)
[15:39:29] Analytics-Cluster, Analytics-Kanban, Operations, ops-eqiad: rack/setup/install druid100[456].eqiad.wmnet - https://phabricator.wikimedia.org/T171626#3471295 (Cmjohnson) a:Cmjohnson>RobH Assigning to @robh to do the installs. **Network ports are disabled
[15:41:26] Analytics-Kanban, Analytics-Wikistats: Add piwik to wikistats 2.0 site - https://phabricator.wikimedia.org/T171642#3497591 (elukey) a:fdans
[15:42:11] Analytics, Analytics-Cluster, Operations, ops-eqiad: rack/setup/install druid100[456].eqiad.wmnet - https://phabricator.wikimedia.org/T171626#3497594 (elukey)
[15:45:55] Analytics, Analytics-EventLogging, Contributors-Analysis, DBA, and 2 others: Add index to mediawiki_page_create_1 table - https://phabricator.wikimedia.org/T170990#3497629 (elukey)
[15:47:04] Analytics-Kanban, Analytics-Wikistats: Error handling - https://phabricator.wikimedia.org/T171487#3497635 (elukey) a:fdans
[15:48:41] Analytics, DBA, Research: Phase out and replace analytics-store (multisource) - https://phabricator.wikimedia.org/T172410#3497641 (Halfak)
[15:50:56] Analytics, Operations, Research: Phase out and replace analytics-store (multisource) - https://phabricator.wikimedia.org/T172410#3497676 (jcrespo)
[15:51:25] Analytics-Cluster, Analytics-Kanban, User-Elukey: Make refinery drop data scripts email analytics-alerts if they fail - https://phabricator.wikimedia.org/T168415#3497678 (elukey) p:Unbreak!>Normal
[15:52:25] Analytics, DBA: Purge all old data from EventLogging master - https://phabricator.wikimedia.org/T168414#3497685 (elukey)
[15:54:21] Analytics-Kanban, Analytics-Wikistats: Routing - https://phabricator.wikimedia.org/T167672#3497688 (elukey)
[15:54:23] Analytics-Kanban, Analytics-Wikistats: Cleanup Routing code - https://phabricator.wikimedia.org/T170459#3497690 (elukey)
[15:55:59] Analytics-Kanban, User-Elukey: Alarms on pageview API latency increase - https://phabricator.wikimedia.org/T164243#3497691 (elukey)
[15:56:38] Analytics-Kanban: Design document for wikistats prototype backend - https://phabricator.wikimedia.org/T162817#3497695 (elukey) a:Milimetric
[15:58:27] Analytics, Operations, Wikimedia-Stream, hardware-requests, Patch-For-Review: decommission rcs100[12] - https://phabricator.wikimedia.org/T170157#3497701 (elukey)
[16:15:45] !log druid cluster restarted with 0.9.2 mysql-metadata-storage extension, un-suspending oozie druid jobs
[16:15:47] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[16:38:22] * elukey off!
[17:40:00] Analytics, Android-app-feature-Feeds, Pageviews-API, RESTBase-API, and 2 others: Why top views data of different sources is not the same? - https://phabricator.wikimedia.org/T172379#3497966 (Shizhao) declined>Open >>! In T172379#3497560, @elukey wrote: > Because they are different APIs sh...
[17:47:08] o/
[17:49:27] hey, I have the list of most visited articles of English Wikipedia from Iran, the last one has 22K views and has no granularity whatsoever, Can I publish it?
[17:49:36] nuria_: milimetric ^
[17:49:50] (I got it from hadoop)
[17:50:49] Analytics, Analytics-Cluster, Operations, ops-eqiad, Patch-For-Review: rack/setup/install druid100[456].eqiad.wmnet - https://phabricator.wikimedia.org/T171626#3498017 (RobH)
[17:51:38] Analytics, Analytics-Cluster, Operations: rack/setup/install druid100[456].eqiad.wmnet - https://phabricator.wikimedia.org/T171626#3471295 (RobH) a:RobH>Ottomata These 3 systems are ready to be placed into service, and are calling into puppet with role spare at present. This task can be reso...
[18:34:41] we added a new language wiki today
[18:34:42] https://gerrit.wikimedia.org/r/#/c/369926/ :)
[18:36:03] thanks Reedy we see the warnings, one of us will make the necessary changes soon
[18:37:25] Amir1: sounds ok to me from a privacy standpoint, will pm with one possible other thought
[19:08:42] (PS1) Milimetric: Fix failure in failed_jobs handling [analytics/refinery] - https://gerrit.wikimedia.org/r/369988 (https://phabricator.wikimedia.org/T172426)
[19:31:19] (CR) Ottomata: [V: 2 C: 2] Fix failure in failed_jobs handling [analytics/refinery] - https://gerrit.wikimedia.org/r/369988 (https://phabricator.wikimedia.org/T172426) (owner: Milimetric)
[19:40:43] Analytics, ChangeProp, EventBus, Epic, and 2 others: [EPIC] Develop a JobQueue backend based on EventBus - https://phabricator.wikimedia.org/T157088#3498644 (Pchelolo)
[19:40:45] Analytics, ChangeProp, EventBus, MW-1.30-release-notes (WMF-deploy-2017-07-25_(1.30.0-wmf.11)), and 2 others: Create JobQueue implementation that posts to EventBus - https://phabricator.wikimedia.org/T163379#3498642 (Pchelolo) Open>Resolved The `JobQueueEventBus` was merged, deployed and...
[19:45:47] Analytics, Contributors-Analysis, DBA, Chinese-Sites, Patch-For-Review: Data Lake edit data missing for many wikis - https://phabricator.wikimedia.org/T165233#3498662 (Milimetric) Hm, @Marostegui I need some help. I ran the sqoop job to import from all the wikis except the ones you mentioned...
[19:48:40] Analytics-Kanban, Patch-For-Review: Monthly Mediawiki Sqoop job failed - https://phabricator.wikimedia.org/T172426#3498668 (Milimetric) The jobs that didn't run were identified in T165233#3498662 and the error seems to have to do with database access, so for now I'm moving this to done and deploying the...
[20:21:54] Analytics, Contributors-Analysis, DBA, Chinese-Sites, Patch-For-Review: Data Lake edit data missing for many wikis - https://phabricator.wikimedia.org/T165233#3498832 (Marostegui) @Milimetric can you try jawiki_p again for instance?
[20:27:50] (PS1) Milimetric: Change stat1002 to stat1005 [analytics/refinery/scap] - https://gerrit.wikimedia.org/r/370008
[20:28:12] (CR) Milimetric: [V: 2 C: 2] Change stat1002 to stat1005 [analytics/refinery/scap] - https://gerrit.wikimedia.org/r/370008 (owner: Milimetric)
[20:46:38] Amir1: filing a ticket describing data will be best
[20:46:55] Amir1: how will it be shared and file format
[20:52:41] (CR) Nuria: [V: 2 C: 2] Add hiwikiversity [analytics/refinery] - https://gerrit.wikimedia.org/r/369926 (https://phabricator.wikimedia.org/T168765) (owner: Reedy)
[20:54:48] ottomata: yt
[20:54:57] ottomata: ?
[20:55:44] ottomata: i think i have a major hole in how to restart an oozie job that i am not modifying locally, i was to start it so it uses cluster's latest
[20:56:18] ottomata: specifying oozie-Url just gives me another error
[20:56:22] https://www.irccloud.com/pastebin/oE5wTxJJ/
[20:56:33] ottomata: so paths must need to be specified differently?
[21:01:05] -config should be a local path
[21:01:07] not an hdfs path
[21:01:22] so you want the path to your coordinator.properties locally
[21:12:42] Analytics, Analytics-EventLogging, Performance-Team: Make webperf eventlogging consumers use eventlogging on Kafka - https://phabricator.wikimedia.org/T110903#3498974 (Ottomata) Woo wee lots of questions, ok! > Perhaps it would make sense to first update statsv.py to use the eventlogging library on...
[21:20:41] Analytics, Analytics-EventLogging, Performance-Team: Make webperf eventlogging consumers use eventlogging on Kafka - https://phabricator.wikimedia.org/T110903#3498979 (Krinkle) >>! In T110903#3498974, @Ottomata wrote: > Woo wee lots of questions, ok! > >> Perhaps it would make sense to first update...
[22:12:26] had a new eventlogging schema go out today. I can see a few trickling in by grepping eventlogging-client-side topic, but not seeing anything end up in the eventlogging_HumanSearchRelevance topic. I'm not able to figure out if there are error messages anywhere about this though, not finding anything in logstash and the schema doesn't show up in grafana dashboards
[22:20:17] Analytics, Analytics-EventLogging, Performance-Team: Make webperf eventlogging consumers use eventlogging on Kafka - https://phabricator.wikimedia.org/T110903#3499122 (Krinkle) Looks like it's not accepting the Kafka read method. At first I forgot to set PYTHONPATH (so it was using the old global ins...
[22:39:55] ahha, found it in eventlogging_EventError.
[23:10:51] ebernhardson: On a related note, the schema name wasn't in the Grafana dashboard drop down menu because that menu was populated using a Graphite query that predates Kafka. The graph used Kafka to get the numbers etc. but the 'schema' template value was still based only on schemas that existed in Graphite pre-Kafka. I've fixed that as well.
[23:11:02] So the schema name now shows up, albeit still with only 0s.
[23:11:24] at https://grafana.wikimedia.org/dashboard/db/eventlogging-schema
[23:13:00] it seems logstash is also a bug, there is an input to send the EventError topic into logstash, but it's not showing up. I'll have to look into that
[23:30:23] Analytics-Kanban: Run mediawiki edit reconstruction with new set of wikis - https://phabricator.wikimedia.org/T172463#3499358 (Nuria)
[23:31:57] Analytics-Kanban: Run mediawiki edit reconstruction with new set of wikis - https://phabricator.wikimedia.org/T172463#3499384 (Nuria) See ticket where we ping DBAS about access issue: https://phabricator.wikimedia.org/T165233
[23:32:10] Analytics-Kanban: Run mediawiki edit reconstruction 2017-07 snapshot with new set of wikis - https://phabricator.wikimedia.org/T172463#3499402 (Nuria)