[07:19:49] Analytics, ContentTranslation-Analytics, MediaWiki-extensions-ContentTranslation, Operations, and 4 others: schedule a daily run of ContentTranslation analytics scripts - https://phabricator.wikimedia.org/T122479#2274511 (MoritzMuehlenhoff)
[07:19:53] Analytics, ContentTranslation-Analytics, MediaWiki-extensions-ContentTranslation, Operations, and 2 others: Add amire80 to analytics-privatedata-users group - https://phabricator.wikimedia.org/T122524#2274508 (MoritzMuehlenhoff) Open>Resolved @Amire80 : I've merged the patch, let me know...
[10:22:08] !log restarted eventlogging on eventlogging1001 for security upgrades
[10:49:41] (CR) DCausse: Fixes Prefix API request detection (1 comment) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/287264 (owner: Bearloga)
[10:51:16] joal: o/
[10:51:33] this afternoon I need to restart the whole hadoop cluster for java upgrades :P
[10:52:08] Hi elukey
[10:52:33] I'll be AFK this afternoon, so you'll have to deal with it while I'm away ;)
[10:52:41] I'll be back hopefully for standup
[10:53:00] joal: ahhhh okok! :)
[10:53:06] all good during the weekned?
[10:53:09] *weekend?
[10:53:27] yup, no issue (as I've seen)
[10:53:36] nono I mean you :)
[10:53:59] Yesir, //me is good :)
[10:54:04] Thanks :)
[10:54:08] What about you?
[10:57:13] elukey: --^ ?
[10:57:51] joal: all good!
[10:58:01] Great :)
[10:58:29] elukey: I'll probably miss EventBus sync, would you mind confirming to the team that I'm actively working on schemas?
[10:59:00] elukey: To be precise, I'll have a meeting with Dan on that matter tomorrow, and hopefully will submit a proposal before end-of-week
[10:59:12] elukey: If you prefer I can send an email :)
[10:59:19] sure!
I can do it
[11:01:32] elukey: sent an email anyhow ;) Just in case :)
[11:01:58] :)
[11:11:06] Analytics-Kanban: Compile a request data set for caching research and tuning - https://phabricator.wikimedia.org/T128132#2275503 (JAllemandou) a:JAllemandou
[11:16:40] (CR) Aklapper: "@Ottomata: This patch has been sitting here for more than a year without any -2/-1/+1/+2 review." [analytics/refinery] - https://gerrit.wikimedia.org/r/201009 (https://phabricator.wikimedia.org/T94596) (owner: Ottomata)
[11:17:15] * elukey lunch!
[11:27:02] * joal is AFK for the afternoon
[12:54:46] Analytics-Kanban, Operations, ops-eqiad, Patch-For-Review: rack/setup/deploy aqs100[456] - https://phabricator.wikimedia.org/T133785#2275859 (elukey) @Cmjohnson: tried to file a code review for the DHCP config, not sure if correct though!
[13:06:05] hey A team :D
[13:06:10] o/
[13:06:13] :D
[13:06:51] So I have someone asking for some numbers and I was just going to paste what they were looking for in here in case anyone had any bright ideas :D
[13:07:36] 1) Data on how many revisions back in time a Diff is when a Diff is viewed in MediaWiki
[13:07:58] 2) Data on how many revisions are between the two compared revisions in a Diff in MediaWiki
[13:08:53] 3) Data on if it is common to jump between different versions of a page / different Diffs in a short time.
[13:10:48] addshore: I have no idea but I'll make sure that somebody will read it :)
[13:10:49] Right now I'm thinking 1 & 2 could be done by slightly abusing a hook in MediaWiki and taking sampled data out. As for 3, my only idea for anything like this is tracking the times the next and previous revision links are pressed. I guess something could be done through eventlogging, but that might not be worth the time.
[13:12:26] !log restarting hadoop java daemons for Java upgrades on analytics102X and analytics 103X hosts
[13:26:10] ottomata1: regarding that puppet patch I am slowly working on, how do I get secret information in there? ie.
that cant go in puppet
[13:26:35] aye, it goes into a private repo which is not in gerrit
[13:26:37] in prod
[13:26:41] i can put it there
[13:27:04] okay
[13:28:37] (Abandoned) Ottomata: Add oozie util workflow to launch spark jobs [analytics/refinery] - https://gerrit.wikimedia.org/r/201009 (https://phabricator.wikimedia.org/T94596) (owner: Ottomata)
[13:31:44] also ottomata currently all of the crons just run scripts like this https://github.com/wikimedia/analytics-wmde-scripts/blob/master/daily_misc.sh
[13:31:49] which in turn run a bunch of stuff.
[13:32:09] Do you think it would be better to define every individual cron in puppet? or keep them batched like this?
[13:32:25] naw i think keeping them in a separate repo is better
[13:32:33] if there was just one or mayybe two, puppet is ok
[13:32:33] okay!
[13:32:45] I may try and reduce the number there are still
[13:33:04] basically the only reason they are like that is to reduce / spread out their processing :)
[13:33:27] aye
[13:34:57] ottomata: o/
[13:35:09] I am restarting all the hadoop nodes for java upgrades
[13:37:33] elukey: oOoook thank you!
[13:37:41] hm, btw, elukey did we ever get an48 back?
[13:38:46] ottomata: nope, still waiting for a new disk..
[13:39:17] aye
[13:55:15] Analytics-Wikistats, Internet-Archive: Total page view numbers on Wikistats do not match new page view definition - https://phabricator.wikimedia.org/T126579#2276002 (ezachte) The discrepancy between [1] and [2] is as follows: rightmost column in [2] is calculated from other columns, numbers which were a...
[14:01:02] Analytics-Wikistats, Internet-Archive: Total page view numbers on Wikistats do not match new page view definition - https://phabricator.wikimedia.org/T126579#2276016 (ezachte) I should have said *part of the discrepancy* is as follows [..]. There seems to be more than rounding error. So it looks like s...
[14:04:53] (PS1) Addshore: DNM prepare for puppet stuff [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/287622
[14:06:31] damn, still have to do groups / permissions
[14:06:43] *reads back through the comments*
[14:10:10] hmm ottomata after grepping through puppet for analytics-search I can't really see how that is given the access to hive it needs? Or is all the access it needs to be on stat1002?
[14:11:18] addshore: it needs to be on analytics1001, is all really
[14:11:25] how about the sql login details?
[14:11:26] hadoop permissions are calculated on the namenode
[14:11:47] for hive? there aren't any, it just uses posix user accounts on the namenode, just like hadoop
[14:12:03] noo, for the mysql replicas :)
[14:12:07] oh, uhhh
[14:12:20] that is given to folks via .conf files in /etc/mysql/conf.d
[14:12:25] that contain user and password in mysql conf format
[14:12:29] so, if you have access to those files
[14:12:30] you can do something like
[14:12:39] mysql --extra-conf-file (or something) /etc/mysql/conf/pw.conf
[14:12:50] (whatever that file might be...)
[14:13:13] but how do I give the user I am making in puppet access to /etc/mysql/conf.d/analytics-research-client.cnf ?
[14:14:08] Analytics-Cluster, Analytics-Kanban, Operations, Traffic, Patch-For-Review: Upgrade analytics-eqiad Kafka cluster to Kafka 0.9 (or 0.10?) - https://phabricator.wikimedia.org/T121562#2276061 (Ottomata)
[14:14:37] ottomata: https://gerrit.wikimedia.org/r/#/c/287605/1 - FYI
[14:14:51] let me know if you have anything against the names etc.
[14:15:10] addshore: the user would have to be in the analytics-privatedata-users group
[14:15:12] which ummmm
[14:15:14] i'm not sure it can be
[14:15:19] since it is a system user
[14:15:31] thus far those conf files have just been used for real people
[14:15:38] ahhh
[14:17:02] a system user can't be part of multiple groups?
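For reference, the stock `mysql` client option for reading credentials from an additional option file is `--defaults-extra-file`, and it has to come first among the command-line options. A minimal Python sketch of building such an invocation, assuming the `.cnf` path mentioned above; the helper name `mysql_command` and the replica host name are hypothetical:

```python
import shlex


def mysql_command(defaults_file, host, query):
    """Build an argv list for the mysql client.

    --defaults-extra-file is placed first because the client only
    accepts it as the first option on the command line.
    """
    return [
        "mysql",
        f"--defaults-extra-file={defaults_file}",
        f"--host={host}",
        "-e", query,
    ]


if __name__ == "__main__":
    argv = mysql_command(
        "/etc/mysql/conf.d/analytics-research-client.cnf",  # path from the discussion above
        "replica.example.org",  # hypothetical replica host
        "SELECT 1",
    )
    # Print a shell-safe rendering instead of executing anything.
    print(shlex.join(argv))
```

The sketch only assembles the command; whether a given account can read the `.cnf` file still depends on the file's group ownership.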
[14:17:28] a system user can't be part of a puppet managed user group
[14:17:35] because of the way it is set up
[14:17:54] puppet makes it so that the exact group membership specified in admin data.yaml exists
[14:18:03] system users are not managed in admin.yaml
[14:18:04] ahhhh
[14:18:07] so you can't mix and match
[14:18:16] ja its pretty annoying :(
[14:18:20] ewww, okay
[14:18:34] well, I guess the details are in the puppet private repo thing right?
[14:18:51] so this puppet thing could put them somewhere else?
[14:19:00] yea that could work, actually i think we've done that before
[14:19:10] any idea what to grep for? :D
[14:19:50] addshore: i forget, do your scripts access hadoop? or not?
[14:19:59] some of them yes :/
[14:21:05] yeah, i think i'd just do the same as is done for analytics-research-client.cnf
[14:21:14] mysql::config::client { 'analytics-research':
[14:22:15] (CR) Nikerabbit: [C: -1] Add sorted errors (1 comment) [analytics/limn-language-data] - https://gerrit.wikimedia.org/r/282228 (owner: Amire80)
[14:22:26] mysql::config::client { 'stats-research': ?
[14:22:38] make a new one of those for your user
[14:22:45] but using the same pw
[14:22:49] :/
[14:23:12] mysql::config::client { 'whatever-user-something':
[14:23:13] group => 'whatever-your-user-group'
[14:24:18] !log restarting hadoop java daemons for Java upgrades on analytics104X and analytics105X hosts
[14:38:52] ottomata: I think https://gerrit.wikimedia.org/r/#/c/269467/5 might actually have everything covered now!
[14:39:26] bah *pushes the last version*
[14:58:33] joal, mind if I copy-paste your recent email into phab so that I can track the work better?
[14:58:39] (re. event structures)
[14:58:44] halfak: please do !
[14:59:25] halfak: I'm currently preparing a PR and plan to add your comments to that
[14:59:33] great :)
[14:59:37] halfak: Will send you a link to previous version
[15:00:09] halfak: after tomorrow's meeting with Dan :)
[15:01:26] That'll help.
Right now, I'm thinking that I want to build the field set from scratch for the items you asked about.
[15:02:07] halfak: oh no, ok, I'll send you some code soon :)
[15:02:45] :)
[15:04:43] * halfak finally makes task
[15:04:43] https://phabricator.wikimedia.org/T134767
[15:16:00] milimetric, hi! yt?
[15:32:14] ottomata, joal: https://gerrit.wikimedia.org/r/#/c/287605 - anything against the naming scheme?
[15:32:25] it is the one used for restbase multi-instance afaik
[15:32:35] restbase/cassandra multi-instance
[15:34:13] joal: if you have time (even tomorrow) I'd need to figure out how to restart oozie workflows :)
[15:34:21] elukey: batcave!
[15:37:04] elukey: on naming, if it follows an existing convention, good for me !
[15:41:05] elukey: aqs100[456]? ja that's correct i think
[15:44:30] hey mforns, what's up
[15:47:59] ottomata, joal: last one sorry - I need to restart analytics1001/2 for the java upgrades, want me to take extra precautions? Like disabling camus, oozie bundles, etc..?
[15:48:15] hm, elukey maybe just camus
[15:48:16] elukey: yessir ! Good call
[15:48:32] i don't trust it to not corrupt its offsets
[15:48:39] elukey: camus for sure, wait for camus run stop, the rest should be ok (hopefully)
[15:48:41] but the others should be fine, and the worst case is we'd have to rerun something
[15:48:43] ja
[15:49:16] most likely camus *should* be fine too, but the worst case if camus corrupts offsets is wayyy more annoying than if an oozie job fails
[15:49:38] ottomata: correct !
[15:49:43] all right, disabling camus :)
[15:49:50] thx elukey :)
[15:51:21] !log disabled camus+puppet on analytics1027 as prep step for maintenance on the cluster.
[15:53:52] hi milimetric! back
[15:54:50] milimetric, I was wondering if we do a granularity selector in the metrics-per-project layout?
or we specify the granularity in the metric config
[15:55:41] I'd say we specify it in the metric config
[15:56:35] to keep the UI simpler, and we can mention non-daily granularities in the metric name like "monthly pageviews"
[15:56:36] milimetric, but, we specify the options in the metric config? like: granularities: ['daily', 'monthly'] and then the user can choose among them in the ui?
[15:56:54] milimetric, or we specify a fixed one in the config? and the ui has no options?
[15:56:59] i was thinking fixed
[15:57:02] ok
[15:57:33] Analytics, Datasets-General-or-Unknown, WMDE-Analytics-Engineering: Fix permissions on dumps.wm.o access logs synced to stats1002 - https://phabricator.wikimedia.org/T134776#2276672 (Addshore)
[16:01:02] madhuvishy: Heya :) Standup ?
[16:06:29] Analytics, Hovercards, Reading-Web-Backlog, Reading-Web-Sprint-72-N: Verify X-Analytics: preview=1 in stable - https://phabricator.wikimedia.org/T133067#2276729 (dr0ptp4kt)
[16:28:37] (PS3) Mforns: Add support for unique devices in pageview api [analytics/dashiki] - https://gerrit.wikimedia.org/r/287192 (https://phabricator.wikimedia.org/T122533)
[16:29:52] milimetric, I pushed the comments ^. I think there is still something to solve: when the unique devices number is lower than a threshold it is cut off, and the project has no line in the chart...
[16:33:32] Analytics, Datasets-General-or-Unknown, WMDE-Analytics-Engineering: Fix permissions on dumps.wm.o access logs synced to stats1002 - https://phabricator.wikimedia.org/T134776#2276863 (mforns) p:Triage>Unbreak!
[16:34:09] Analytics-Kanban, Datasets-General-or-Unknown, WMDE-Analytics-Engineering: Fix permissions on dumps.wm.o access logs synced to stats1002 - https://phabricator.wikimedia.org/T134776#2276870 (mforns)
[16:34:13] Analytics-Kanban, Datasets-General-or-Unknown, WMDE-Analytics-Engineering: Fix permissions on dumps.wm.o access logs synced to stats1002 - https://phabricator.wikimedia.org/T134776#2276672 (Milimetric) We'll try to fix this right away, but I'm just curious why/how these logs are used.
[16:38:48] Analytics, Data-release, Research-and-Data-Backlog: Wikipedia Clickstream dataset. Programmatic Access - https://phabricator.wikimedia.org/T134231#2258736 (mforns) Does it need sanitization?
[16:40:41] Analytics-Kanban, WMDE-Analytics-Engineering: Remove http://datasets.wikimedia.org/aggregate-datasets/wikidata/ - https://phabricator.wikimedia.org/T125407#2276908 (mforns)
[16:43:03] Analytics-Kanban: Check if we can deprecate legacy TSVs production (same time as pagecounts?) - https://phabricator.wikimedia.org/T130729#2276909 (mforns)
[16:46:02] Analytics, Datasets-General-or-Unknown, WMDE-Analytics-Engineering: Fix permissions on dumps.wm.o access logs synced to stats1002 - https://phabricator.wikimedia.org/T134776#2276926 (JAllemandou)
[16:46:38] Analytics, WMDE-Analytics-Engineering: Remove http://datasets.wikimedia.org/aggregate-datasets/wikidata/ - https://phabricator.wikimedia.org/T125407#2276930 (JAllemandou)
[16:46:50] Analytics: Check if we can deprecate legacy TSVs production (same time as pagecounts?) - https://phabricator.wikimedia.org/T130729#2276948 (JAllemandou)
[16:46:53] Analytics: Doc cleanup day 2.0 {flea} - https://phabricator.wikimedia.org/T112024#2276950 (mforns) Open>declined It seems we're not going to do this.
[16:47:12] Analytics: Compile a request data set for caching research and tuning - https://phabricator.wikimedia.org/T128132#2276954 (JAllemandou) a:JAllemandou>None
[16:47:45] Analytics: Describe threat model for sanitized pageview data {mole} - https://phabricator.wikimedia.org/T131158#2276958 (JAllemandou)
[16:49:38] Analytics: Get jenkins to update refinery with deploy of new jars {hawk} - https://phabricator.wikimedia.org/T130123#2276963 (mforns)
[16:50:44] Analytics: Unique devices endpoint Graphana Dashboard {bear} - https://phabricator.wikimedia.org/T132795#2276969 (mforns)
[16:51:08] Analytics, Analytics-EventLogging, Scap3 (Scap3-Adoption-Phase1): Stop using global eventlogging install on hafnium (and any other eventlogging lib user) - https://phabricator.wikimedia.org/T131977#2276974 (JAllemandou)
[16:56:38] Analytics, Datasets-General-or-Unknown, WMDE-Analytics-Engineering: Fix permissions on dumps.wm.o access logs synced to stats1002 - https://phabricator.wikimedia.org/T134776#2277009 (Addshore) @Milimetric they are not yet used, but see T119070
[16:59:21] sorry milimetric, mforns : I need to take 5 minutes now for Lino, let's meet at past 5 in batcave please
[16:59:33] joal, np
[17:06:13] milimetric, mforns : here !
[17:14:09] joal: I know that your patience will have a limit someday, buuuut can you tell me how I can check that Camus is already done with its activities?
[17:14:16] hue, yarn, else?
[17:14:37] elukey: ps ax on analytics1015 ;)
[17:14:53] that's it? Checking camus processes?
[17:14:58] correct :)
[17:15:10] * elukey goes in the corner and start crying
[17:16:58] all right restarting analytics1001 then
[17:22:03] !log analytics1001 Yarn+HDFS masters failed over to analytics1002 for Java upgrades
[17:27:15] Analytics: Spike - Extract edit oriented dqata from MySQL on Simplewiki to match EventBus schemas - https://phabricator.wikimedia.org/T134790#2277147 (JAllemandou)
[17:28:57] Analytics: Spike - MySQL edit data extraction - https://phabricator.wikimedia.org/T134790#2277160 (JAllemandou)
[17:29:14] Analytics-Kanban: Propose evolution of Mediawiki EventBus schemas to match needed data for Analytics need - https://phabricator.wikimedia.org/T134502#2277163 (JAllemandou)
[17:29:16] Analytics: Spike - MySQL edit data extraction - https://phabricator.wikimedia.org/T134790#2277147 (JAllemandou)
[17:31:00] !log analytics1002 Yarn+HDFS masters failed over to analytics1001 for Java upgrades (restored original state)
[17:31:33] Analytics: Scale MySQL edit data extraction - https://phabricator.wikimedia.org/T134791#2277177 (JAllemandou)
[17:31:57] Analytics: Spike - MySQL edit data extraction - https://phabricator.wikimedia.org/T134790#2277180 (JAllemandou)
[17:31:59] Analytics: Scale MySQL edit data extraction - https://phabricator.wikimedia.org/T134791#2277164 (JAllemandou)
[17:33:48] Analytics, Hovercards, Reading-Web-Backlog, Reading-Web-Sprint-72-99 problems but Nirzar aint one: Verify X-Analytics: preview=1 in stable - https://phabricator.wikimedia.org/T133067#2277182 (Jdlrobson)
[17:34:15] Analytics: Spike - Slowly Changing Dimensions on Druid - https://phabricator.wikimedia.org/T134792#2277183 (JAllemandou)
[17:34:19] Analytics, Hovercards, Reading-Web-Backlog, Reading-Web-Sprint-72-99 problems but Nirzar aint one: Verify X-Analytics: preview=1 in stable - https://phabricator.wikimedia.org/T133067#2218761 (Jdlrobson) @dr0ptp4kt please take care to leave old sprint tags on when moving things over.
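The Camus check described above (look at `ps ax` on the launcher host and wait until nothing Camus-related is left) can be sketched in Python. This is a rough illustration, not the team's tooling: the helper names are hypothetical, and so is the assumption that a running Camus job has the string `camus` somewhere in its command line.

```python
import subprocess


def camus_lines(ps_output):
    """Return the lines of `ps ax` output that mention camus (case-insensitive)."""
    return [
        line for line in ps_output.splitlines()
        if "camus" in line.lower()
    ]


def camus_running():
    """Run `ps ax` locally and report whether any camus process shows up."""
    result = subprocess.run(
        ["ps", "ax"], capture_output=True, text=True, check=True
    )
    return bool(camus_lines(result.stdout))


if __name__ == "__main__":
    print("camus still running" if camus_running() else "no camus processes found")
```

A plain substring match is crude (it would also match an editor that has a Camus config file open), so it is only a quick pre-maintenance sanity check.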
[17:36:51] Analytics: Edit data schemas for anaylitcs - https://phabricator.wikimedia.org/T134793#2277198 (JAllemandou)
[17:37:05] Analytics: Edit data schemas for anaylitcs - https://phabricator.wikimedia.org/T134793#2277210 (JAllemandou)
[17:37:07] Analytics: Spike - Slowly Changing Dimensions on Druid - https://phabricator.wikimedia.org/T134792#2277211 (JAllemandou)
[17:37:09] Analytics-Kanban: Propose evolution of Mediawiki EventBus schemas to match needed data for Analytics need - https://phabricator.wikimedia.org/T134502#2277212 (JAllemandou)
[17:37:46] !log camus+puppet re-enabled on analytics1027
[17:39:50] a-team: whole hadoop cluster rebooted, together with Yarn/HDFS master nodes, please keep an extra eye during the next hours just in case :)
[17:40:01] awesome elukey ! thanks :)
[17:40:05] :]
[17:40:16] NICE thanks elukey
[17:40:22] s/rebooted/restarted sorry :)
[17:43:23] a-team: gone for dinner, will be back after for a quick hadoop check
[17:43:36] bye!
[17:43:41] bye!
[17:47:05] joal: thanks!
[17:53:44] going offline, byeeeee o/
[18:48:54] a-team: double checked hadoop: load is a bit late, but no job has failed
[18:49:01] Logging off for tonight :)
[18:56:50] laters!
[19:09:29] Analytics, Wikipedia-iOS-App-Product-Backlog, iOS-app-feature-Analytics: Fix iOS uniques in mobile_apps_uniques_daily after 5.0 launch - https://phabricator.wikimedia.org/T130432#2277561 (JMinor) Open>Resolved a:JMinor
[20:18:38] Analytics-Cluster, Analytics-Kanban, Operations, Traffic, Patch-For-Review: Upgrade analytics-eqiad Kafka cluster to Kafka 0.9 - https://phabricator.wikimedia.org/T121562#2277851 (Ottomata)
[20:19:48] Analytics-Cluster, Analytics-Kanban, Operations, Traffic, Patch-For-Review: Upgrade analytics-eqiad Kafka cluster to Kafka 0.9 - https://phabricator.wikimedia.org/T121562#1881753 (Ottomata) Upgraded the analytics Kafka cluster in deployment-prep today. Along the way I had to create an extra...
[21:23:12] Analytics-Kanban: Dedicated and/or automated Wikimedia pageviews API project/tag in Phabricator Maniphest [1 pts] - https://phabricator.wikimedia.org/T119151#2278166 (Milimetric) yeah, we haven't figured out a way to fix this, we're waiting for Herald to get smarter.
[21:28:09] Analytics-Kanban, DBA: Set up auto-purging after 90 days {tick} - https://phabricator.wikimedia.org/T108850#2278202 (mforns) a:mforns
[21:40:05] Analytics, Analytics-Wikistats, Internet-Archive: Total page view numbers on Wikistats do not match new page view definition - https://phabricator.wikimedia.org/T126579#2278293 (Milimetric)
[21:44:19] Analytics, Editing-Analysis, Notifications, Collab-Team-2016-Apr-Jun-Q4: Numerous Notification Tracking Graphs Stopped Working at End of 2015 - https://phabricator.wikimedia.org/T132116#2278306 (DarTar)
[21:45:16] Analytics, Analytics-Wikistats, Internet-Archive: Total page view numbers on Wikistats do not match new page view definition - https://phabricator.wikimedia.org/T126579#2278313 (Milimetric) I followed along so far, Erik and Tilman, let me know if I can help.
[21:55:54] Analytics, Collaboration-Team-Interested, DBA, Notifications: Purge all Schema:Echo data after 90 days - https://phabricator.wikimedia.org/T128623#2278373 (Milimetric) If it's easier, @jcrespo, you can just delete all Echo_% tables any time, we have confirmation from Roan that they don't need tha...
[22:09:11] Analytics, RESTBase, Services, User-mobrovac: configure RESTBase pageview proxy to Analytics' cluster on wiki-specific domains - https://phabricator.wikimedia.org/T119094#2278458 (Milimetric) a:Milimetric
[22:09:34] Analytics, RESTBase, Services, User-mobrovac: configure RESTBase pageview proxy to Analytics' cluster on wiki-specific domains - https://phabricator.wikimedia.org/T119094#1817629 (Milimetric) just assigning this to myself so I can catch up with the bike-shedding and start work at some point hopef...
[22:56:19] Analytics-Tech-community-metrics, Developer-Relations, Community-Tech-Sprint: Investigation: Can we find a new search API for CorenSearchBot and Copyvio Detector tool? - https://phabricator.wikimedia.org/T125459#2278820 (kaldari) Here's what came out of our meeting with Microsoft today. (For those of...
[23:09:01] Analytics-Tech-community-metrics, Developer-Relations, Community-Tech-Sprint: Investigation: Can we find a new search API for CorenSearchBot and Copyvio Detector tool? - https://phabricator.wikimedia.org/T125459#2278845 (Earwig) @Ricordisamoa As a service, it seems fairly limited. Maybe in the future...
[23:11:57] Analytics-Tech-community-metrics, Developer-Relations, Community-Tech-Sprint: Investigation: Can we find a new search API for CorenSearchBot and Copyvio Detector tool? - https://phabricator.wikimedia.org/T125459#2278856 (Compassionate727) Would it still cost $10k if only Coren were using Google?
[23:12:14] Analytics-Tech-community-metrics, Developer-Relations, Community-Tech-Sprint: Investigation: Can we find a new search API for CorenSearchBot and Copyvio Detector tool? - https://phabricator.wikimedia.org/T125459#2278857 (Earwig) I don't think the copyvios tool actually takes advantage of any Labs-spe...