[06:14:15] Analytics, Analytics-EventLogging, DBA: db1047 has been restarted - needs another restart - https://phabricator.wikimedia.org/T166452#3297456 (Marostegui) I have restarted it and able to set the replication filter on the s1 channel: ``` Replicate_Wild_Ignore_Table: enwiki.__wmf_checksums ``` I am go...
[09:45:08] !log Restarted wikidata-articleplaceholder_metrics-wf-2017-5-27
[09:45:09] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[11:14:36] Analytics-Kanban, DBA, Operations, ops-eqiad: Degraded RAID on db1046 - https://phabricator.wikimedia.org/T166422#3298035 (Volans)
[11:16:15] Analytics-Kanban, DBA, Operations, ops-eqiad: Degraded RAID on db1046 - https://phabricator.wikimedia.org/T166422#3298040 (Marostegui) @elukey @Ottomata maybe this can be done along with: T166141
[11:17:45] Analytics-Kanban, DBA, Operations, ops-eqiad: Degraded RAID on db1046 - https://phabricator.wikimedia.org/T166422#3298063 (elukey) +1
[11:36:44] * elukey lunch!
[11:55:29] (PS2) Mforns: Add script to purge old mediawiki data snapshots [analytics/refinery] - https://gerrit.wikimedia.org/r/355601 (https://phabricator.wikimedia.org/T162034)
[12:00:47] Analytics-Kanban, Patch-For-Review: Create purging script for mediawiki-history data - https://phabricator.wikimedia.org/T162034#3298184 (mforns)
[12:38:05] Analytics, Analytics-Wikistats, Wikimedia-Site-requests: Add li: Wikibooks to Wikistats - https://phabricator.wikimedia.org/T165634#3298244 (Ooswesthoesbes) Is it safe to assume that it will be included in Wikistats 2.0 or is it something we should keep an eye out for?
[12:46:12] Analytics-Kanban, Easy, Patch-For-Review: Don't accept data from automated bots in Event Logging - https://phabricator.wikimedia.org/T67508#711398 (Tgr) How will this affect EventLogging calls made from PHP (which might need to be recorded whether the user used some bot framework or not)?
[13:00:51] Hey joal, I've been working on a spark job for the translation recommendations, but have gotten to a point where I run out of memory on my local machine when running against the entire dataset. Do you have time to review and/or advise on the next steps I should take?
[13:01:11] code for reference: https://github.com/schana/recommendation-translation/blob/master/src/main/scala/TranslationRecommendations.scala
[13:05:04] Hi schana, will try to find some time later today to look at the code
[13:05:14] thank you
[13:06:03] Analytics: Pivot - Article Page Views - https://phabricator.wikimedia.org/T166331#3298260 (JAllemandou) Given that the number of distinct pages would be small, could be a use-case for streaming computation ... To be discussed with Analytics team.
[13:40:17] joal: https://gerrit.wikimedia.org/r/#/c/356040/
[13:41:51] elukey: http://tinyurl.com/y7pgd2kc
[13:42:55] awesome! :)
[13:43:08] :)
[13:43:45] cool :)
[14:05:00] Analytics, EventBus, Operations, hardware-requests, and 2 others: New SCB nodes - https://phabricator.wikimedia.org/T166342#3298338 (Ottomata) Robh, let's aim for +3 scb nodes in each DC. So +6 nodes total.
[14:24:21] Analytics, Analytics-Cluster: eqiad: hadoop expansion part deux - https://phabricator.wikimedia.org/T166509#3298370 (Ottomata)
[14:29:48] Analytics, EventBus, Operations, hardware-requests, and 2 others: New SCB nodes - https://phabricator.wikimedia.org/T166342#3298386 (mobrovac) FYI, this expansion will also come in handy for the new service being developed by the Research team - the #recommendation-api service.
[14:29:56] Analytics, Analytics-Cluster, Operations, hardware-requests: eqiad: (3)+ nodes for Druid / analytics - https://phabricator.wikimedia.org/T166510#3298388 (Ottomata)
[14:30:11] Analytics, Analytics-Cluster, Operations, hardware-requests: eqiad: hadoop expansion part deux - https://phabricator.wikimedia.org/T166509#3298403 (Ottomata)
[14:33:08] Analytics, EventBus, Operations, hardware-requests, and 2 others: New SCB nodes - https://phabricator.wikimedia.org/T166342#3298405 (Ottomata) p:Triage>High
[14:33:14] Analytics, Analytics-Cluster, Operations, hardware-requests: eqiad: hadoop expansion part deux - https://phabricator.wikimedia.org/T166509#3298406 (Ottomata) p:Triage>High
[14:33:26] Analytics, EventBus, Operations, hardware-requests, and 2 others: New SCB nodes - https://phabricator.wikimedia.org/T166342#3293197 (Ottomata) p:High>Normal
[14:33:36] Analytics, Analytics-Cluster, Operations, hardware-requests: eqiad: (3)+ nodes for Druid / analytics - https://phabricator.wikimedia.org/T166510#3298408 (Ottomata) p:Triage>High
[15:00:29] ping fdans elukey
[15:00:33] standdup
[15:15:34] Analytics-Kanban, Easy, Patch-For-Review: Don't accept data from automated bots in Event Logging - https://phabricator.wikimedia.org/T67508#3298492 (Nuria) @tgr: All calls go through varnish, there are no direct posts from php anymore (it is been a while), thus they are all process equally.
[15:18:53] Analytics, Analytics-EventLogging, DBA: db1047 has been restarted - needs another restart - https://phabricator.wikimedia.org/T166452#3296553 (Nuria) This is a slave machine that is not used.
[15:20:38] Analytics, Analytics-EventLogging, DBA: db1047 has been restarted - needs another restart - https://phabricator.wikimedia.org/T166452#3298512 (Marostegui) Hey Nuria! But we still need to maintain it, right?
As in, it is not going to be decommissioned soon but it is a backup server just in case? Tha...
[15:22:20] Analytics, Analytics-EventLogging, DBA: db1047 has been restarted - needs another restart - https://phabricator.wikimedia.org/T166452#3296553 (Ottomata) @marostegui, correct.
[15:22:48] Analytics, Performance-Team: Explore NavigationTiming by faceted properties - https://phabricator.wikimedia.org/T166414#3298515 (Nuria) Pivot will work dimension-wise. The catch is that you need this data to be real-time ish correct? Let's talk a bit more about it cause we can do that too but we need t...
[15:23:51] Analytics, Performance-Team: Explore NavigationTiming by faceted properties - https://phabricator.wikimedia.org/T166414#3295705 (Nuria) Note: this using eventlogging refine could be loaded into druid easily.
[15:25:49] Analytics: Monitor if/when mediawiki history reconstruction partitions and imports fall out of sync - https://phabricator.wikimedia.org/T166405#3298524 (Nuria) Open>declined Turns out that ipblock was not included on the repair job for partitions
[15:28:52] Analytics: Are watchlists dead? - https://phabricator.wikimedia.org/T166339#3293102 (Nuria) If this is arequest for data (not sure) it probably needs to reach the analysts. I have added @kaldari just in case this is in reference to an existing project.
[15:30:36] Analytics, Analytics-Dashiki, Wikimedia-log-errors: Warning: JsonConfig: Invalid $wgJsonConfigModels['JsonConfig.Dashiki'] array value, 'class' not found - https://phabricator.wikimedia.org/T166335#3298547 (Nuria) p:Triage>Low
[15:31:26] Analytics: Pivot - Article Page Views - https://phabricator.wikimedia.org/T166331#3298549 (Nuria) p:Triage>Low
[15:37:43] Analytics, Analytics-EventLogging: Find an alternative query interface for eventlogging on analytics cluster that can replace MariaDB - https://phabricator.wikimedia.org/T159170#3058941 (Nuria) p:High>Normal
[15:40:24] Analytics, Analytics-Cluster: kafka alarms audit - https://phabricator.wikimedia.org/T151211#3298565 (Nuria) Part of the ops handover for tier-1 kafka
[15:43:06] Analytics: Provide cumulative edit count in Data Lake edit data - https://phabricator.wikimedia.org/T161147#3298567 (Nuria) p:Normal>High
[15:46:40] Analytics, Discovery, EventBus, Wikidata, and 3 others: Create reliable change stream for specific wiki - https://phabricator.wikimedia.org/T161731#3141307 (Nuria) ping @Smalyshev is this still a need? Maybe we should set up a short 30 minute sync up
[15:49:36] Analytics, EventBus, Wikimedia-Stream, Services (designing), User-mobrovac: Puppetize event schema topic configuration - https://phabricator.wikimedia.org/T161027#3298589 (Nuria) p:Normal>Low
[15:51:05] Analytics: Improve Oozie error emails for testing - https://phabricator.wikimedia.org/T161619#3137344 (JAllemandou) The way I currently manage to have that working for me is by: # Having updated the `oozie/util/send_error_email/workflow.xml` updated to have only my address as default, and copied that def...
[15:51:13] Analytics: Use native timestamp types in Data Lake edit data - https://phabricator.wikimedia.org/T161150#3298593 (Nuria) p:Normal>High
[15:52:25] Analytics, Operations, User-Elukey: Investigate recent Kafka Burrow alarms for EventLogging - https://phabricator.wikimedia.org/T160886#3113658 (Nuria) ping @elukey please review & close if pertains
[15:54:07] Analytics, Datasets-General-or-Unknown, Easy, Security: Pageview dumps incorrectly formatted, looks like a result of possibly malicious activity - https://phabricator.wikimedia.org/T144100#3298601 (Nuria)
[15:56:28] Analytics, Easy: Investigate requests flagged as pageview in analytics header coming from bots - https://phabricator.wikimedia.org/T135251#3298605 (Nuria)
[16:00:58] Analytics: User limits for stat machines. Limit space on /home dir and possibly /tmp - https://phabricator.wikimedia.org/T151904#3298615 (Nuria)
[16:01:02] Analytics, Analytics-Cluster, Operations, ops-eqiad, Patch-For-Review: rack/setup/install replacement to stat1005 (stat1002 replacement) - https://phabricator.wikimedia.org/T165368#3298614 (Nuria)
[16:01:25] Analytics: Old deleted pages have empty fields in Analytics Cluster edit data - https://phabricator.wikimedia.org/T165201#3298619 (Nuria) Ping @mforns for update after brief reserach
[16:03:49] mforns: a-batcave2
[16:03:52] hahhaah
[16:03:59] check out this diagram
[16:03:59] elukey, I'm in it...
[16:04:00] https://en.wikipedia.org/wiki/Public_key_infrastructure#/media/File:Public-Key-Infrastructure.svg
[16:04:07] from the wp page for public key infrastructure
[16:04:12] hahahah
[16:04:13] elukey, oh, wait
[16:04:50] hahaha
[16:05:48] "apprehensive business dude with huge keys navigates bureaucracy just so hey can buy some hotpants"
[16:11:16] elukey: Heya, question for you
[16:11:37] elukey: Do you know if the 'content-type' we receive from varnish is the response or request one?
[16:35:37] joal: here I am, reading
[16:36:03] joal: almost sure it is the response one
[16:39:30] checking
[16:41:10] so on a random cache::text host, varnishkafka webrequest has this conf tag: %{Content-Type@content_type}o
[16:41:49] %o corresponds to https://github.com/wikimedia/varnishkafka/blob/master/varnishkafka.c#L863-L869
[16:42:05] that grabs SLT_RespHeader
[16:42:18] joal: ---^ 100% sure it is response :)
[17:08:25] * elukey afk!
[17:10:51] Thanks a lot elukey !
[17:17:45] (PS3) Joal: Provide RedirectToPageview function and UDF [analytics/refinery/source] - https://gerrit.wikimedia.org/r/353310 (https://phabricator.wikimedia.org/T143928)
[17:33:50] Analytics, Performance-Team: Explore NavigationTiming by faceted properties - https://phabricator.wikimedia.org/T166414#3298794 (Gilles) We don't need the data to be updated in real time, this would be used to investigate performance changes after the fact. Having it updated once a day would be acceptabl...
[17:41:35] (PS4) Joal: Add last access uniques global oozie jobs [analytics/refinery] - https://gerrit.wikimedia.org/r/352181 (https://phabricator.wikimedia.org/T143928)
[17:49:45] Analytics, Research-and-Data: Improve bot identification at scale - https://phabricator.wikimedia.org/T138207#3298831 (leila) @Nuria and team: I see that there is a tag for this task to be picked up in July-September 2017. If that is the case, please let me know and I will set aside time for it.
[18:08:32] Analytics, Research-and-Data: Improve bot identification at scale - https://phabricator.wikimedia.org/T138207#3298858 (Nuria) @leila: it will probably get bumped up to after september 2017
[18:08:46] (PS5) Joal: Add last access uniques global oozie jobs [analytics/refinery] - https://gerrit.wikimedia.org/r/352181 (https://phabricator.wikimedia.org/T143928)
[18:12:36] Analytics, Performance-Team: Explore NavigationTiming by faceted properties - https://phabricator.wikimedia.org/T166414#3298860 (Nuria) Then (cc @ootomata and @joseph for confirmation) we can get it done now in the same fashion that we load pageviews, there is an issue with "merging" data from schemas s...
[18:17:32] Analytics, Analytics-Wikistats, Wikimedia-Site-requests: Add li: Wikibooks to Wikistats - https://phabricator.wikimedia.org/T165634#3298862 (Nuria) >Is it safe to assume that it will be included in Wikistats 2.0 or is it something we should keep an eye out for? Yes, if there is data for it, it will b...
[18:22:17] Analytics: Serbian Wikipedia edits spike 2016 - https://phabricator.wikimedia.org/T158310#3033071 (JAllemandou) 2 minutes looking into the dataset: Seems that all those edits are made by user https://sh.wikipedia.org/wiki/Korisnik:Kolega2357 https://pivot.wikimedia.org/#mediawiki-history-beta/line-chart/2/E...
[18:23:11] Analytics: Are watchlists dead? - https://phabricator.wikimedia.org/T166339#3298865 (Whatamidoing-WMF) Yes, this is a request for data. Any information that might suggest an answer this question would be helpful; I don't think that we need an absolutely definitive or ideal set of data.
[18:24:57] Analytics, Performance-Team: Explore NavigationTiming by faceted properties - https://phabricator.wikimedia.org/T166414#3298867 (Gilles) OK, we have a plan to fix some issues with NavigationTiming and its schema: {T104902}. We have that work scheduled for next quarter. I think it'll be better if the data...
[18:25:13] Analytics, Performance-Team: Explore NavigationTiming by faceted properties - https://phabricator.wikimedia.org/T166414#3298869 (Gilles) p:Triage>Normal
[18:28:21] Analytics, Research-and-Data: Improve bot identification at scale - https://phabricator.wikimedia.org/T138207#3298870 (leila) @Nuria I got you. Then Q2 it is, it seems. :)
[18:28:46] Analytics, Research-and-Data: [Open question] Improve bot identification at scale - https://phabricator.wikimedia.org/T138207#3298873 (leila)
[18:33:28] hey joal. I'm looking at https://github.com/ewulczyn/wiki-readers/blob/master/src/data_generation/create_hive_traces.py and it's calling a module called db_utils. do you know where I can find it?
[18:37:25] joal, by any chance you have the link of the etherpad where we studied/vetted the contents of the mediawiki history data? or you know a way to search for it?
[18:38:36] Analytics: Are watchlists dead? - https://phabricator.wikimedia.org/T166339#3298879 (Nuria) @Whatamidoing-WMF : please take a look at link provided, analytics does not grant data requests (unless they are of legal nature) it is up to the requester to reach the data analysts
[18:40:17] Analytics, Performance-Team: Explore NavigationTiming by faceted properties - https://phabricator.wikimedia.org/T166414#3298880 (Nuria) @Gilles :please but our work can start earlier, we will just scrape data once you call it good.
[18:45:31] lzia: that looks like custom code from ellery's (or someone) in stat 1002
[18:45:51] I see, nuria_. do you expect it to be in Ellery's home?
[18:46:01] lzia: in 1002, let me see
[18:47:10] Hey mforns, this is what I have: https://etherpad.wikimedia.org/p/edit_history_vetting
[18:47:37] joal, you are da man
[18:47:45] thanks :]
[18:48:59] leila: i think that package is in 1002 but it is just custom code for analytics store
[18:49:09] leila: ./wmf/util/db_utils.py
[18:49:19] * leila checks
[18:49:31] on ellery's directory
[18:49:40] leila: it is just a bunch of utility functions
[18:49:52] leila: /home/ellery$ more ./wmf/util/db_utils.py
[18:49:52] right, so nuria_: if I run a code on analytics-store, I should be able to use that, right?
[18:50:10] leila: no, it is a bunch of custom code
[18:50:52] got you, nuria_. no problem. I think this is already helpful cuz https://github.com/ewulczyn/wiki-readers/blob/master/src/data_generation/create_hive_traces.py calls it and I didn't know where to look for it.
[18:50:54] thanks, nuria_
[18:51:18] leila: just copy that folder such it is available to the script
[18:52:37] yup. thanks nuria_. :)
[18:54:36] Analytics, EventBus, Operations, hardware-requests, and 2 others: SSDs for main Kafka clusters - https://phabricator.wikimedia.org/T166341#3298909 (mobrovac) >>! In T166341#3293262, @RobH wrote: > Is this something that you want done in next years budget, or is it now invalid? Please advise. He...
[18:56:15] Hey leila, sorry missed your earlier ping - The repo for that code is actually: https://github.com/ewulczyn/wmf/tree/master/util
[18:56:36] leila: it also contains some more utilities
[19:22:45] ah! thanks, joal. took a note.
[19:29:55] Analytics, EventBus, Operations, hardware-requests, and 2 others: SSDs for main Kafka clusters - https://phabricator.wikimedia.org/T166341#3298990 (Ottomata) Apparently the stuff has to be actually received at the datacenter for it to count towards this year's budget.
[19:47:50] Analytics, EventBus, Operations, hardware-requests, and 2 others: SSDs for main Kafka clusters - https://phabricator.wikimedia.org/T166341#3298996 (mobrovac) Well, that's unfortunate. We definitely want them under warranty as it's an important production system. We still want them, so I guess we'...
[19:58:12] Analytics, Discovery, EventBus, Wikidata, and 3 others: Create reliable change stream for specific wiki - https://phabricator.wikimedia.org/T161731#3299006 (Smalyshev) @Nuria yes, still very much needed and unsolved. Please feel welcome to set up a meet.
[20:52:14] (CR) Nuria: "I think we should document here: https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Unique_Devices/Last_access_solution#Uniqu" (3 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/352181 (https://phabricator.wikimedia.org/T143928) (owner: Joal)
[21:56:31] Analytics: Old deleted pages have empty fields in Analytics Cluster edit data - https://phabricator.wikimedia.org/T165201#3299109 (mforns) Hello @Neil_P._Quinn_WMF ! Yes, you're right, data is missing in some places. We couldn't reconstruct it from the source MediaWiki databases, especially older data (2007...
[23:05:20] (CR) Nuria: "I have a bunch of suggestions here but the bottom line is that I do not think we should mix the pageview definition with the redirect code" (6 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/353310 (https://phabricator.wikimedia.org/T143928) (owner: Joal)