[00:28:51] (PS1) Zhuyifei1999: worker: refuse to save if rowcount is > 65536 [analytics/quarry/web] - https://gerrit.wikimedia.org/r/415489 (https://phabricator.wikimedia.org/T188564)
[00:28:54] Analytics-Kanban, Discovery-Analysis, Reading-analysis: Pageviews/Stats on dataviz-literacy.wmflabs.org - https://phabricator.wikimedia.org/T187104#4012479 (mpopov) >>! In T187104#4012317, @Nuria wrote: > You just have to add tracking code to your site Thank you! Just added snippet & deployed. Is th...
[00:36:06] (CR) BryanDavis: worker: refuse to save if rowcount is > 65536 (1 comment) [analytics/quarry/web] - https://gerrit.wikimedia.org/r/415489 (https://phabricator.wikimedia.org/T188564) (owner: Zhuyifei1999)
[00:39:11] (PS2) Zhuyifei1999: worker: refuse to save if rowcount is > 65536 [analytics/quarry/web] - https://gerrit.wikimedia.org/r/415489 (https://phabricator.wikimedia.org/T188564)
[01:48:24] Quarry: Implement SQL Query Validator in Quarry - https://phabricator.wikimedia.org/T188538#4012684 (zhuyifei1999) Is there an open source 'SQL Query Validator' that supports MySQL syntax? My quick google search didn't find anything.
[04:46:35] Analytics, Wikipedia-iOS-App-Backlog: iOS traffic data is not available on Piwik since Feb 20, 2018 - https://phabricator.wikimedia.org/T188559#4012799 (Nuria) I wonder if the archive cron (which I think is running once a day) has not run due to machine failures a couple of times and now it cannot run d...
[04:50:45] Analytics, Wikipedia-iOS-App-Backlog: iOS traffic data is not available on Piwik since Feb 20, 2018 - https://phabricator.wikimedia.org/T188559#4012805 (Nuria) Executed archiving by hand and got the following for website "3" which is iOS INFO [2018-03-01 04:37:22] - Will invalidate archived reports for 2...
[07:18:23] Analytics, Wikipedia-iOS-App-Backlog: iOS traffic data is not available on Piwik since Feb 20, 2018 - https://phabricator.wikimedia.org/T188559#4012973 (elukey) Executed the following: ``` elukey@bohrium:/var/log/piwik$ elukey@bohrium:/var/log/piwik$ for el in {20..28}; do sudo -u www-data /usr/share/pi...
[07:46:00] Analytics-Kanban, User-Elukey: Reboot all Analytics hosts for Kernel upgrade - https://phabricator.wikimedia.org/T188594#4012985 (elukey) p:Triage>High
[08:10:14] Hi elukey - I'm assuming the burrow alarm is from you testing stuff?
[08:11:24] joal: nope, if it fires we need to check it :)
[08:11:45] the monitors are fully productionized
[08:11:49] plus I am testing in beta
[08:12:17] I rebalanced the analytics cluster though recently, kafka1023 was out of the leaders
[08:12:37] hm - Let's wait some time and see if it recovers
[08:12:55] it is only the eventlogging_consumer_mysql_00
[08:12:57] weird
[08:13:39] hm
[08:13:43] joal: https://grafana.wikimedia.org/dashboard/db/kafka-consumer-lag?orgId=1&var-datasource=eqiad%20prometheus%2Fanalytics&var-cluster=eqiad&var-topic=All&var-consumer_group=All&from=now-3h&to=now
[08:14:44] elukey: I don't understand that chart - It tells us about webrequest_text, while the error is on EL_mysql
[08:14:50] * joal is getting lost
[08:15:14] so the graph lists all the consumer groups and their related topic/partitions
[08:15:24] americium is a host inside frack
[08:15:32] (fundraising)
[08:15:51] so probably kafkatee is recovering after some consumer group changes
[08:15:54] Ah!
Makes sense
[08:15:57] ok
[08:15:58] if you scroll down you can see the el ones
[08:16:06] buuut I don't see any issue with the mysql consumer
[08:16:08] that is really weird
[08:16:27] joal: even better https://grafana.wikimedia.org/dashboard/db/kafka-consumer-lag?orgId=1&from=now-3h&to=now&var-datasource=eqiad%20prometheus%2Fanalytics&var-cluster=eqiad&var-topic=All&var-consumer_group=eventlogging_consumer_mysql_00
[08:16:29] yeah, they all say 0
[08:16:33] this isolates the cgroup
[08:17:07] hm - burrow is freaking out?
[08:20:55] we have burrow 0.1 that is super old (1.0 is the last one) so there might be some differences between what the metrics are showing and the heuristic that fires the email
[08:21:08] elukey: ok let's wait
[08:21:10] I can see in the m4-consumer logs that a consumer group rebalance happened
[08:21:19] and it didn't take it very well
[08:21:23] but now it works fine
[08:21:48] so as far as I can see we are good
[08:22:06] I'll wait a bit before rebooting kafka1012
[08:23:55] ok
[08:24:18] I was looking at the logs as well, but you were faster :)
[08:26:10] Thanks elukey for having found the issue :)
[08:27:56] I still don't like a lot how kafka-python reacts when a consumer group leader changes
[08:28:31] also this morning is not great - I tried to reprocess some piwik archive data and ganeti1006 died
[08:28:34] :(
[08:32:07] Analytics, Wikipedia-iOS-App-Backlog: iOS traffic data is not available on Piwik since Feb 20, 2018 - https://phabricator.wikimedia.org/T188559#4013034 (elukey) Good news: Nuria was right, the archiver was adding missing data after invalidation. Bad news: the extra IO (probably) triggered the Ganeti bug...
[08:32:15] --^
[08:41:15] elukey: it's a shame we hit perf/scale issues with piwik when we process more than a tera a day of webrequest :(
[08:42:24] the bigger shame is that there's so many iOS devices out there :-)
[08:43:37] mwahahaha moritzm
[08:44:56] elukey: usual beginning of month cluster load
[08:45:38] moritzm: it is not fair to make fun of piwik, too easy :D
[09:08:32] PROBLEM - Number of banner_activity realtime events received by Druid over a 30 minutes period on einsteinium is CRITICAL: CRITICAL - druid_realtime_banner_activity is 0 https://grafana.wikimedia.org/dashboard/db/prometheus-druid?refresh=1m&panelId=41&fullscreen&orgId=1
[09:09:32] joal: --^
[09:09:54] Yes, just saw that
[09:09:57] not sure if it is stuck or if we need to have a wider check period (like 2h)
[09:10:24] elukey: I think we have an issue
[09:10:52] elukey: I don't know how, but it looks like the job died (or was killed)
[09:11:39] Then, let's have a look at the restarting cron output - I'm assuming it has failed because of spark-version mismatches between jar and cluster deployed one
[09:11:42] :(
[09:12:55] elukey: could you manually update the cron to log onto a file?
[09:13:59] ah because it does /dev/null 2>&1 now
[09:14:12] should we do it permanently?
[09:15:22] elukey: I don't think so
[09:15:40] elukey: or maybe we could ... hmm
[09:15:45] joal: why not?
[09:16:28] does the spark job emit anything useful to stdout/err?
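Editor's note on the cron exchange above: the restart cron currently discards its output (`/dev/null 2>&1`), which is why the failure reason was not visible. A minimal sketch of the alternative being discussed, appending both stdout and stderr to a file, might look like the following. The function name `run_logged` and the log file name are hypothetical and are not taken from the actual cron definition; log rotation is not handled here.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_logged(cmd, log_path):
    """Run cmd, append its stdout and stderr to log_path, return the exit code."""
    with open(log_path, "ab") as log:
        # stderr=subprocess.STDOUT mirrors the shell idiom `>> file 2>&1`.
        return subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT).returncode


if __name__ == "__main__":
    # Hypothetical log location, used only for this demonstration.
    log_file = Path(tempfile.gettempdir()) / "streaming_restart_example.log"
    exit_code = run_logged(
        [sys.executable, "-c", "print('restart attempted')"], log_file
    )
    print(exit_code, log_file)
```

Appending rather than truncating keeps earlier attempts in the file, so a rotation policy would be needed if this were made permanent.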
[09:16:31] elukey: there are many uninformative logs
[09:16:37] elukey: but I guess it's the same everywhere
[09:16:43] yeah
[09:16:48] elukey: If we do so we'd need to set up some rotation
[09:17:09] it should be already in place for /var/log/refinery
[09:20:08] elukey: I'm thinking of patching in an unclean way: make the change in refinery-source, and manually move a jar to the correct place - not good he
[09:21:26] (PS1) Joal: Use spark 2.2.1 to follow cluster upgrade [analytics/refinery/source] - https://gerrit.wikimedia.org/r/415539
[09:21:36] elukey: --^
[09:22:11] Analytics-Kanban, Research, WMDE-Analytics-Engineering, User-Addshore, User-Elukey: Phase out and replace analytics-store (multisource) - https://phabricator.wikimedia.org/T172410#4013113 (Neil_P._Quinn_WMF) >>! In T172410#4010821, @Nuria wrote: > Ah, sorry, i see. Then -since we plan to add...
[09:22:49] elukey: I'm actually going to properly deploy the new jar
[09:23:10] (CR) Joal: [V: 2 C: 2] "Self merging bug-fix" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/415539 (owner: Joal)
[09:23:19] joal: no idea what's best, super ignorant :(
[09:24:22] elukey: in any case, I need to manually deploy that jar as it is not part of our regular jars
[09:24:46] elukey: So I'm doing it now, for that jar only (no other deploy), to fix the issue
[09:25:07] elukey: I do hope it'll work as is :S
[09:25:58] joal: out of curiosity, what would be the correct/safe/best way to do it?
[09:27:09] elukey: Manually deploying the jar is correct
[09:27:29] elukey: however deploying that jar without the others is not how we usually do it
[09:27:39] ah ok I thought it was an elegant hack waiting for $later_step
[09:29:44] going to reboot analytics1030 in the meantime :)
[09:30:17] elukey: currently testing the new jar
[09:33:13] Analytics, Operations, User-Elukey: Import some Analytics git puppet submodules to operations/puppet - https://phabricator.wikimedia.org/T188377#4013131 (MoritzMuehlenhoff) p:Triage>Normal
[09:34:46] RECOVERY - Number of banner_activity realtime events received by Druid over a 30 minutes period on einsteinium is OK: OK - druid_realtime_banner_activity is 647 https://grafana.wikimedia.org/dashboard/db/prometheus-druid?refresh=1m&panelId=41&fullscreen&orgId=1
[09:35:19] elukey: --^
[09:35:45] The job is running in a joal-screen on stat1004
[09:36:00] elukey: this proves the solution I suggest works
[09:36:23] elukey: if you agree, let's wait for Andrew and make a decision with him on how to proceed
[09:36:53] elukey: also, I'll need to leave soon to have lunch with family, so I'm happy that a temporary solution is found :)
[09:36:58] joal: you guys know better than me what's best, I have no idea, I trust your judgment
[09:37:40] (jar management is one of those dark corners of analytics I have not yet explored/understood :)
[09:37:47] have a good lunch!
[09:45:55] !log rerun webrequest-load-wf-text-2018-3-1-6 manually, failed due to analytics1030's reboot
[09:45:57] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[09:48:17] elukey: the banner job AM is located on analytics1051 - please ping me before restarting this one :)
[09:48:23] I'll be back after lunch
[09:51:58] joal: ack!
[10:18:51] Analytics, Analytics-Cluster: Migrate mjolnir Kafka clients to use Kafka jumbo - https://phabricator.wikimedia.org/T188408#4013240 (Gehel)
[10:26:12] (CR) Thiemo Kreuz (WMDE): "Oh no, this got lost. I would really love to see charts for the individual priorities." [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/396065 (owner: Thiemo Kreuz (WMDE))
[10:28:30] (PS5) Addshore: Record metrics for Wikidata task priorities (via color) [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/396065 (owner: Thiemo Kreuz (WMDE))
[10:30:03] PROBLEM - Hadoop NodeManager on analytics1030 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager
[10:32:02] urgh
[10:33:07] rebooting
[10:38:04] RECOVERY - Hadoop NodeManager on analytics1030 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager
[10:44:15] (CR) Addshore: [V: -1] "So I have run this:" [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/396065 (owner: Thiemo Kreuz (WMDE))
[10:53:41] (PS6) Thiemo Kreuz (WMDE): Record metrics for Wikidata task priorities (via color) [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/396065
[10:54:41] (CR) Thiemo Kreuz (WMDE): "Fixed it. Thanks for looking into this again!"
[analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/396065 (owner: Thiemo Kreuz (WMDE))
[11:02:11] (CR) Addshore: [C: 2] Record metrics for Wikidata task priorities (via color) [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/396065 (owner: Thiemo Kreuz (WMDE))
[11:02:16] (PS1) Addshore: Record metrics for Wikidata task priorities (via color) [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/415544
[11:02:19] (Merged) jenkins-bot: Record metrics for Wikidata task priorities (via color) [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/396065 (owner: Thiemo Kreuz (WMDE))
[11:02:21] (CR) Addshore: [C: 2] Record metrics for Wikidata task priorities (via color) [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/415544 (owner: Addshore)
[11:02:29] (Merged) jenkins-bot: Record metrics for Wikidata task priorities (via color) [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/415544 (owner: Addshore)
[11:24:01] Analytics-Tech-community-metrics, Developer-Relations (Jan-Mar-2018): "Authors With No Activity During Last 6 Months" on C_Gerrit_Demo lists some users who have been active in the last 6 months - https://phabricator.wikimedia.org/T185316#4013333 (Aklapper)
[11:33:02] Analytics-Data-Quality, Analytics-Kanban, WMDE-Analytics-Engineering, User-GoranSMilovanovic: Please review: new WMDE public data set on stat1005 - https://phabricator.wikimedia.org/T187606#4013355 (mforns) Hi @GoranSMilovanovic Sorry for the delay in taking care of this task. I see that your d...
[11:37:10] Analytics-Tech-community-metrics, Developer-Relations (Apr-Jun-2018): Advertise wikimedia.biterg.io more widely in the Wikimedia community - https://phabricator.wikimedia.org/T179820#4013366 (Aklapper)
[11:37:12] Analytics-Tech-community-metrics, Developer-Relations (Jan-Mar-2018): "Authors With No Activity During Last 6 Months" on C_Gerrit_Demo lists some users who have been active in the last 6 months - https://phabricator.wikimedia.org/T185316#4013362 (Aklapper) Open>Resolved There were two bugs that s...
[11:39:48] Analytics-Kanban, Discovery-Analysis, Reading-analysis: Pageviews/Stats on dataviz-literacy.wmflabs.org - https://phabricator.wikimedia.org/T187104#3964689 (mforns) Hi @mpopov! > Is there a special login I'll need to use? I enter piwik.wikimedia.org with my Wikitech credentials. > (Once there's dat...
[11:51:23] * elukey lunch!
[12:03:58] Analytics, Analytics-Wikistats: Wikistats Bug - differences in view stats for smaller wikipedias - https://phabricator.wikimedia.org/T188613#4013530 (Kipala)
[12:04:52] Analytics, Analytics-Wikistats: Wikistats Bug - differences in view stats for smaller wikipedias - https://phabricator.wikimedia.org/T188613#4013542 (Kipala)
[12:05:19] Analytics, Analytics-Wikistats: Wikistats Bug - differences in view stats for smaller wikipedias - https://phabricator.wikimedia.org/T188613#4013530 (Kipala)
[12:07:30] Analytics, Analytics-Wikistats: Wikistats Bug - differences in view stats for smaller wikipedias - https://phabricator.wikimedia.org/T188613#4013553 (Kipala)
[12:37:59] Analytics-Data-Quality, Analytics-Kanban, WMDE-Analytics-Engineering, User-GoranSMilovanovic: Please review: new WMDE public data set on stat1005 - https://phabricator.wikimedia.org/T187606#4013641 (GoranSMilovanovic) @mforns Thank you for the review! - The data set will be sanitized as per your...
[12:44:57] Analytics, Wikipedia-iOS-App-Backlog: iOS traffic data is not available on Piwik since Feb 20, 2018 - https://phabricator.wikimedia.org/T188559#4013672 (elukey) Added some documentation: https://wikitech.wikimedia.org/wiki/Analytics/Systems/Piwik#Invalidate_old_reports Seems better now (archiver still r...
[13:09:11] Analytics-Data-Quality, Analytics-Kanban, WMDE-Analytics-Engineering, User-GoranSMilovanovic: Please review: new WMDE public data set on stat1005 - https://phabricator.wikimedia.org/T187606#4013746 (mforns) Thanks @GoranSMilovanovic!
[13:54:40] Analytics-Kanban, User-Elukey: Reboot all Analytics hosts for Kernel upgrade - https://phabricator.wikimedia.org/T188594#4013854 (elukey)
[14:17:04] joal: hiyyya i had a thought last night
[14:17:20] maybe hivecontext isn't working in 1.6 because...we don't have hive-site.xml in the oozie sharelib?
[14:17:27] the hortonworks instructions say to put it there
[14:17:44] wondering if you'll have the same problem with our temp 2.2.1 sharelib, since we also didn't put it there
[14:20:47] o/
[14:21:01] kafka jumbo and analytics rebooted today, all good
[14:23:28] nice thanks elukey
[14:27:12] ottomata: if you are ok I'd merge the el systemd change and kill the deployment-prep el ubuntu instance
[14:30:32] elukey: +1
[14:43:13] (CR) Fdans: [C: 2] Fix issues with numer formatting [analytics/wikistats2] - https://gerrit.wikimedia.org/r/409714 (https://phabricator.wikimedia.org/T187010) (owner: Nuria)
[14:50:16] Heya ottomata
[14:50:57] I've had a similar issue with Spark 2.2.1 indeed - If you don't mind adding the hive-site.xml file in the test sharelib we have, we could confirm :)
[14:51:27] Also ottomata, banner streaming job experienced issues this morning, but with the new version, I had to quick-patch and manually restart a job
[14:53:20] hm, ok...
[14:56:52] joal: hive-site.xml now in spark2_etst0
[14:57:13] ottomata: and share-lib updated?
[14:57:47] ottomata: I actually think I have an idea of why it originally didn't work with the --files thing
[14:58:17] ya
[14:58:19] ottomata: oozie adds its own --files option containing all the jars of the sharelib after the options provided in the conf
[14:58:38] And I think only the last one is taken into account
[14:59:07] nuria_: thanks for https://wikitech.wikimedia.org/wiki/Analytics/Systems/EventLogging/TestingOnBetaCluster#How_to_verify_events !
[14:59:40] joal: I don't think we include suppressed / hidden revisions either, because labs sanitizes those
[14:59:42] right?
[14:59:51] elukey: i'm going to go a little crazy with something, got a second for a brainbounce in cave?
[14:59:53] !log shutdown deployment-eventlog02 in favor of deployment-eventlog05 in deployment-prep (Ubuntu -> Debian EL migration)
[14:59:54] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:59:59] no hurry!
[15:00:05] ottomata: when you have time, I'd like to take 5 mins to discuss streaming job issue and solution
[15:00:08] ottomata: sure! Do I need coffee first? :D
[15:00:15] dunno!
[15:00:24] want to get coffee and I talk to joal first?
[15:00:29] +1
[15:00:37] joal: headed to cave
[15:01:49] milimetric: hidden revisions are present in labs, just have NULL user and/or sha1
[15:02:27] joal: oh, so do we count those as anonymous edits?
[15:05:41] joal: hm, I only see examples with sha1 and rev_len null, not user
[15:06:03] so we do count them, and attribute them to the user, unless the user field is null
[15:11:02] yessir
[15:11:28] milimetric: we have a ticket about adding revision_deleted field in MWH
[15:11:51] ottomata: jar path: stat1003:/home/joal/refinery-job-spark-2.2-0.0.59.jar
[15:12:16] ottomata: tell me when you're ready, I'll kill my manual version of the job
[15:13:38] joal: ok, last detail seems to be that we attribute them to an "anonymous" user if the rev_user is null:
[15:13:39] https://github.com/wikimedia/analytics-refinery-source/blob/78ee0716cf8f9815276886fccb0dc35503661a30/refinery-job/src/main/scala/org/wikimedia/analytics/refinery/job/mediawikihistory/denormalized/MediawikiEvent.scala#L307
[15:13:42] https://github.com/wikimedia/analytics-refinery-source/blob/78ee0716cf8f9815276886fccb0dc35503661a30/refinery-job/src/main/scala/org/wikimedia/analytics/refinery/job/mediawikihistory/denormalized/DenormalizedRunner.scala#L145
[15:14:03] (those are the two relevant lines, unless something else happens with the None that user_id gets set to later on)
[15:14:17] that seems wrong
[15:14:42] hm
[15:15:01] maybe we should have a "suppressed" user? This kind of overloads the meaning of "anonymous", though technically these folks are anonymous
[15:15:50] I updated https://wikitech.wikimedia.org/wiki/Analytics/Systems/Wikistats/Metrics/FAQ#Does_active_editor_metrics_include_contributors_that_were_blocked_and_whose_contributions_were_deleted_?
[15:15:51] milimetric: I don't actually know how those special cases are to be considered
[15:16:47] yeah, joal, it seems to me like a good idea to go over the whole pipeline and group all the different types of inputs we can have, and figure out how we compute each distinct combination
[15:16:48] milimetric: Thanks for the doc !
Super clear :)
[15:17:57] ok, Sveta, here is the answer to your question from last night, after some more digging into the process. I hope it helps, and we are going to look at these special cases more closely soon: https://wikitech.wikimedia.org/wiki/Analytics/Systems/Wikistats/Metrics/FAQ#Does_active_editor_metrics_include_contributors_that_were_blocked_and_whose_contributions_were_deleted_?
[15:30:02] ottomata: Success with hive-site file in sharelib :)
[15:30:05] That's great!
[15:30:49] joal: , can we try 1.6 too? i'll put hive-site there
[15:30:53] in spark sharelib
[15:31:09] (done)
[15:33:59] (PS18) Mforns: Add EL and whitelist sanitization [analytics/refinery/source] - https://gerrit.wikimedia.org/r/412939 (https://phabricator.wikimedia.org/T181064)
[15:35:09] ottomata: launching the test when cluster gets a breathing time
[15:35:17] o yeah mar 1 ok!
[15:49:04] happy martisor everyone :) https://usercontent.irccloud-cdn.com/file/5k9bjc8n/image.png
[15:49:48] in romania, we celebrate spring with these cute little things, usually made from red and white threads
[15:50:06] (coming of spring, you know what I mean)
[15:58:16] oooohhhh
[16:01:01] :) milimetric - Happy March 1st to you :)
[16:01:48] ottomata: spark with hive 1.6 failed - I suspect because of datanucleus jars not being present in the oozie sharelib
[16:01:53] Analytics, Wikipedia-iOS-App-Backlog: iOS traffic data is not available on Piwik since Feb 20, 2018 - https://phabricator.wikimedia.org/T188559#4014190 (Nuria) Super thanks @elikey. @chelsyx data should be back but please keep in mind that piwik is not a very reliable data store, nor does it have the av...
[16:02:10] Analytics, Wikipedia-iOS-App-Backlog: iOS traffic data is not available on Piwik since Feb 20, 2018 - https://phabricator.wikimedia.org/T188559#4014191 (Nuria) cc @JMinor
[16:03:10] ottomata: Checked on hdfs - The jars are there !!!!
So it must be something else
[16:09:14] ottomata: one difference I find in the jars present or not (one that my gut tells me could lead to a jdbc error) is hive-jdbc
[16:09:29] this jar is present in spark2 lib but not in spark
[16:11:51] oh huh
[16:11:54] interesting
[16:11:59] joal shall I just put it there and we try?
[16:12:00] again?
[16:12:03] i mean, i guess we don't need to know
[16:13:35] (CR) Nuria: "Joseph , can you describe a bit more on commit message why is this change needed?" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/415539 (owner: Joal)
[16:14:52] Analytics, Wikipedia-iOS-App-Backlog: iOS traffic data is not available on Piwik since Feb 20, 2018 - https://phabricator.wikimedia.org/T188559#4014243 (elukey) Interesting stat: it took ~5h to archive the iOS data for the past week :D ``` Archived website id = 3, 4 API requests, Time elapsed: 19351.548...
[16:15:36] wow it took 5h for piwik to archive 7 days of data :D
[16:17:47] ottomata: as you wish :)
[16:18:18] ottomata: now that it works with 2.2.1 and we're to migrate, I guess it's not that important, but we can try :)
[16:18:30] nuria_: sorry about the small commit message :(
[16:18:36] joal: np
[16:19:02] nuria_: that patch was a quick fix to repair the streaming job, and it's true I have not been super explicit in the message
[16:19:15] joal: wild guess of today: extrapolating from this data https://www.comscore.com/Insights/Rankings/comScore-Ranks-the-Top-50-US-Digital-Media-Properties-for-January-2016
[16:19:21] joal: and our unique devices
[16:19:33] joal: google has 2.5 billion users per month
[16:20:16] joal: or ~2.2 billion rather
[16:21:18] joal: unless you care, let's not bother, that way I don't have to mess with original jars in spark 1.6 sharelib
[16:22:12] sounds good to me ottomata
[16:22:26] ottomata: Thanks for having found the hive-site.xml one !!!!
[16:23:59] yaaah glad that works!
[16:24:45] Analytics-Tech-community-metrics, Developer-Relations (Jan-Mar-2018): Have "Last Attracted Developers" information for Gerrit automatically updated / Integrate new demography panels in GrimoireLab product - https://phabricator.wikimedia.org/T151161#4014289 (Aklapper)
[16:25:56] elukey: https://gerrit.wikimedia.org/r/#/c/415465/ when you have a sec
[16:28:12] Sveta: added to our FAQ: https://wikitech.wikimedia.org/wiki/Analytics/Systems/Wikistats/Metrics/FAQ#Does_active_editor_metrics_includes_contributors_that_were_blocked_and_contributions_deleted_
[16:33:10] joal: cron job modified on an03
[16:33:14] with your jar
[16:33:32] ottomata: ok, killing manual job
[16:33:43] fdans, milimetric : we can deploy wikistats right?
[16:33:55] nuria_: yep
[16:34:04] any thoughts milimetric ?
[16:34:40] Analytics-Kanban, Discovery-Analysis, Reading-analysis: Pageviews/Stats on dataviz-literacy.wmflabs.org - https://phabricator.wikimedia.org/T187104#4014337 (Nuria)
[16:34:44] job killed ottomata, I'm monitoring the fact that it is relaunched
[16:34:52] Analytics-Kanban, Discovery-Analysis, Extension-MobileApp, Wikipedia-Android-App-Backlog, Patch-For-Review: Bug behavior of QTree[Long] for quantileBounds - https://phabricator.wikimedia.org/T184768#4014340 (Nuria)
[16:35:08] fdans: i can do it
[16:36:11] k
[16:36:17] ottomata: confirmed it worked
[16:36:22] Thanks ottomata :)
[16:36:54] Analytics-Kanban, Patch-For-Review: Spark 2.2.1 as cluster default (working with oozie) - https://phabricator.wikimedia.org/T159962#4014349 (Ottomata)
[16:37:07] gr8
[16:39:27] nuria_: no, that patch looked good, I saw fdans' comments and left it alone
[16:39:42] milimetric: ok, will do deployment steps
[16:41:12] Analytics-Cluster, Analytics-Kanban, Patch-For-Review: Move webrequest varnishkafka and consumers to Kafka jumbo cluster. - https://phabricator.wikimedia.org/T185136#4014354 (Jgreen) >>!
In T185136#4006541, @Ottomata wrote: > @Jgreen, I think we are ready for webrequest_text. Can we find a time to d...
[16:42:10] * elukey hates go and Burrow
[16:44:06] fdans: the drop down menu navigation, still does not work on FF, does it work for you?
[16:44:48] nuria_: which menu navigation?
[16:44:59] fdans: the drop down "topic explorer"
[16:45:53] Analytics-Cluster, Analytics-Kanban, Patch-For-Review: Move webrequest varnishkafka and consumers to Kafka jumbo cluster. - https://phabricator.wikimedia.org/T185136#4014394 (Ottomata) Cool, we'll start working together on this at around 9:30am EST on Tuesday.
[16:46:09] nuria_: it's working for me, except for the metrics with the issue described in the other task
[16:46:26] fdans: do select a metric that is not available for "all-projects"
[16:46:34] fdans: after that can you do a further selection?
[16:47:49] I can, but the issue i find is that i can't select further if I select a bytes metric, for example
[16:48:03] because of the naming problem I presume
[16:48:21] but if I select top articles with all wikis i've no problem
[16:50:22] (PS1) Nuria: Prepare for release 2.1.11 [analytics/wikistats2] - https://gerrit.wikimedia.org/r/415595
[16:50:45] fdans: ok
[16:51:25] fdans: would you please take a look at the version here to make sure it makes sense for deploy: https://gerrit.wikimedia.org/r/#/c/415595/1/package.json
[16:51:55] fdans: argh, wait a sec
[16:52:35] (PS2) Nuria: Prepare for release 2.1.11 [analytics/wikistats2] - https://gerrit.wikimedia.org/r/415595
[16:53:24] joal: how are the streaming job kafka brokers configured ?
[16:53:33] fdans: https://gerrit.wikimedia.org/r/#/c/415595/2/package.json
[16:54:49] Analytics-Tech-community-metrics, Developer-Relations (Jan-Mar-2018): For new authors on C_Gerrit_Demo, provide a way to access the list of Gerrit patches of each new author - https://phabricator.wikimedia.org/T187895#4014465 (Aklapper) a:Aklapper
[16:55:05] nuria_: oh, nice catch on the package.json
[16:55:09] it looks good to me
[16:55:10] Analytics-Tech-community-metrics, Developer-Relations (Jan-Mar-2018): For new authors on C_Gerrit_Demo, provide a way to access the list of Gerrit patches of each new author - https://phabricator.wikimedia.org/T187895#3989396 (Aklapper) Ahem, that is way easier to achieve...
[16:55:21] fdans: ok, will deploy after standup
[16:55:53] (CR) Nuria: [V: 2 C: 2] Prepare for release 2.1.11 [analytics/wikistats2] - https://gerrit.wikimedia.org/r/415595 (owner: Nuria)
[16:56:46] ottomata: I don't understand your question :(
[16:58:56] joal: the banner streaming job consumes from kafka, right?
[16:59:12] where are the kafka broker bootstrap hostnames set?
[16:59:15] hardcoded?
[16:59:38] set by default in the job - can be overwritten
[16:59:44] ottomata: --^
[17:00:20] ping ottomata fdans standduppp
[17:00:23] ah ok great
[17:00:38] will need to set that in the job too, tuesday we are going to move webrequest text to jumbo
[17:00:57] ottomata: standup?
[17:07:05] Analytics-Tech-community-metrics, Developer-Relations (Jan-Mar-2018): For new authors on C_Gerrit_Demo, provide a way to access the list of Gerrit patches of each new author - https://phabricator.wikimedia.org/T187895#4014543 (Aklapper) p:Low>Normal @srishakatux: On https://wikimedia.biterg.io/ap...
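Editor's note on the broker exchange above: joal describes the Kafka bootstrap brokers as "set by default in the job - can be overwritten". As a rough illustration of that pattern only (the real job lives in refinery-source and its option names are not shown in this log), a command-line default that a cron entry could override might be sketched like this. The flag name `--kafka-brokers` and the `example.org` hosts are hypothetical placeholders.

```python
import argparse

# Hypothetical placeholder hosts, not the real broker list.
DEFAULT_BROKERS = "kafka-a.example.org:9092,kafka-b.example.org:9092"


def parse_args(argv=None):
    """Parse job options; the broker list falls back to a built-in default."""
    parser = argparse.ArgumentParser(description="Streaming job options (sketch)")
    parser.add_argument(
        "--kafka-brokers",
        default=DEFAULT_BROKERS,
        help="comma-separated host:port list used to bootstrap the consumer",
    )
    return parser.parse_args(argv)


def broker_list(args):
    """Split the comma-separated option into a clean list of host:port strings."""
    return [b.strip() for b in args.kafka_brokers.split(",") if b.strip()]


if __name__ == "__main__":
    print(broker_list(parse_args()))
```

With this shape, moving a consumer to another cluster only requires passing a different `--kafka-brokers` value where the job is launched, without rebuilding the job itself.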
[17:07:47] Analytics-Kanban, Wikipedia-iOS-App-Backlog: iOS traffic data is not available on Piwik since Feb 20, 2018 - https://phabricator.wikimedia.org/T188559#4014551 (Nuria)
[17:29:56] Analytics-Kanban, Discovery-Analysis, Reading-analysis: Pageviews/Stats on dataviz-literacy.wmflabs.org - https://phabricator.wikimedia.org/T187104#4014612 (mpopov) >>! In T187104#4013380, @mforns wrote: > Hi @mpopov! > >> Is there a special login I'll need to use? > I enter piwik.wikimedia.org with...
[17:35:59] Analytics: Piwik to measure traffic for new wikimedia foundation site - https://phabricator.wikimedia.org/T188419#4007049 (Ottomata) Q: This is to be a WordPress site, right? Looks like WordPress comes with built in pageview and unique stats? https://en.support.wordpress.com/stats/#views-and-visitors Is...
[17:42:06] Analytics, Analytics-EventLogging, Readers-Web-Backlog: Open up EventLogging's debug mode to certain users - https://phabricator.wikimedia.org/T188640#4014632 (phuedx)
[17:52:45] ping ottomata groossskinnn
[17:54:56] Analytics-Kanban, Discovery-Analysis, Reading-analysis: Pageviews/Stats on dataviz-literacy.wmflabs.org - https://phabricator.wikimedia.org/T187104#4014689 (Nuria) iOS issues fixed, next report will run at some point today/tomorrow
[17:57:05] Analytics, Analytics-EventLogging, Readers-Web-Backlog: Open up EventLogging's debug mode to certain users - https://phabricator.wikimedia.org/T188640#4014698 (Milimetric) Timo, is this feasible to do in JS? We're not familiar enough with mediawiki
[17:57:14] Analytics, Analytics-EventLogging, Readers-Web-Backlog: Open up EventLogging's debug mode to certain users - https://phabricator.wikimedia.org/T188640#4014700 (Milimetric) is it somewhere like mw.user.group?
[18:01:47] Analytics-Kanban, Wikipedia-iOS-App-Backlog: iOS traffic data is not available on Piwik since Feb 20, 2018 - https://phabricator.wikimedia.org/T188559#4014745 (chelsyx) Thanks @Nuria and @elukey ! @JMinor, considering the unreliability of piwik, we should figure out a way with the Analytics team to back...
[18:03:19] Analytics, Analytics-Wikistats: Wikistats Bug - differences in view stats for smaller wikipedias - https://phabricator.wikimedia.org/T188613#4013530 (Milimetric) Sorry, it's hard to know what you're comparing to what from the description (I think the table formatting didn't work out). Can you give one e...
[18:05:13] Analytics, Analytics-Wikistats: Wikistats 2.0: allow to view stats for all language versions (a.k.a. Project families) - https://phabricator.wikimedia.org/T188550#4011848 (Milimetric)
[18:05:16] Analytics, Analytics-Wikistats: Remaining reports. - https://phabricator.wikimedia.org/T186121#4014771 (Milimetric)
[18:11:49] Analytics: Stats on which clients/user-agents are making the most API edits - https://phabricator.wikimedia.org/T188549#4014856 (Milimetric) Open>Resolved a:Milimetric You can look it up on here, an example report: https://pivot.wikimedia.org/#webrequest/table/2/EQUQLgxg9AqgKgYWAGgN7APYAdgC5gQAWA...
[18:12:24] 10Analytics: Stats on which clients/user-agents are making the most API edits - https://phabricator.wikimedia.org/T188549#4014870 (10Milimetric) (that's 1/128 sampled) [18:12:26] 10Analytics: Stats on which clients/user-agents are making the most API edits - https://phabricator.wikimedia.org/T188549#4014872 (10Milimetric) (that's 1/128 sampled) [18:14:49] 10Analytics, 10Patch-For-Review: Update druid to latest release (0.11) - https://phabricator.wikimedia.org/T164008#3218113 (10Milimetric) [18:16:55] 10Analytics-Kanban, 10Wikipedia-iOS-App-Backlog: iOS traffic data is not available on Piwik since Feb 20, 2018 - https://phabricator.wikimedia.org/T188559#4014912 (10Nuria) Data is backed up, now, piwik is a tool meant to be used for low-traffic sites, otherwise it just cannot handle it. In this case IOS dat... [18:25:46] nuria_: ok thanks [18:26:22] 10Analytics-EventLogging, 10Analytics-Kanban: Implement purging scheme for eventlogging data on top of eventlogging refine - https://phabricator.wikimedia.org/T176426#3624787 (10Milimetric) a:03mforns [18:27:13] 10Analytics-EventLogging, 10Analytics-Kanban, 10Patch-For-Review: Purge refined JSON data after 90 days - https://phabricator.wikimedia.org/T181064#4015013 (10Ottomata) [18:27:15] 10Analytics-EventLogging, 10Analytics-Kanban: Implement purging scheme for eventlogging data on top of eventlogging refine - https://phabricator.wikimedia.org/T176426#4015015 (10Ottomata) [18:27:48] 10Analytics-EventLogging, 10Analytics-Kanban, 10Patch-For-Review: Sanitize Hive EventLogging - https://phabricator.wikimedia.org/T181064#3778325 (10Ottomata) [18:29:58] 10Analytics-Kanban, 10Wikipedia-iOS-App-Backlog: iOS traffic data is not available on Piwik since Feb 20, 2018 - https://phabricator.wikimedia.org/T188559#4015034 (10chelsyx) @Nuria I was talking about the [outage on Nov 23 2017](https://wikitech.wikimedia.org/wiki/Analytics/Systems/Piwik#Known_outages), which... 
[18:30:43] 10Analytics: Whitelist analytics.wikimedia.org and stats.wikimedia.org in ad blockers - https://phabricator.wikimedia.org/T182816#4015044 (10Milimetric) [18:31:57] 10Analytics, 10Analytics-Wikistats: When searching for a project language, display a full list of languages - https://phabricator.wikimedia.org/T182960#4015060 (10Milimetric) 05Open>03Resolved a:03Milimetric Done with new improvements in the wiki selector [18:32:27] 10Analytics, 10Analytics-Wikistats: When searching for a project language, allow user to type the wiki's URL - https://phabricator.wikimedia.org/T182961#4015067 (10Milimetric) 05Open>03Resolved a:03Milimetric Wiki selector was improved, this is possible! yay :) [18:38:09] 10Analytics, 10User-Elukey: Setup regular backups from mysql eventlogging master - https://phabricator.wikimedia.org/T183000#3840880 (10Milimetric) 05Open>03declined [18:39:35] (03CR) 10Zhuyifei1999: [C: 032] worker: refuse to save if rowcount is > 65536 [analytics/quarry/web] - 10https://gerrit.wikimedia.org/r/415489 (https://phabricator.wikimedia.org/T188564) (owner: 10Zhuyifei1999) [18:39:58] (03Merged) 10jenkins-bot: worker: refuse to save if rowcount is > 65536 [analytics/quarry/web] - 10https://gerrit.wikimedia.org/r/415489 (https://phabricator.wikimedia.org/T188564) (owner: 10Zhuyifei1999) [18:42:51] 10Analytics, 10Research: Formal announcement of productized clickstream dataset - https://phabricator.wikimedia.org/T183097#4015092 (10Milimetric) 05Open>03Resolved done! yay [18:43:57] 10Analytics: Create LVS endpoint for druid-public-overlord (for oozie job indexing) - https://phabricator.wikimedia.org/T180971#4015102 (10Milimetric) [18:45:30] * elukey off! 
[18:45:49] 10Analytics, 10Analytics-Kanban: Hardware refresh eventlogging box - https://phabricator.wikimedia.org/T182925#4015118 (10Ottomata) [18:45:51] 10Analytics, 10Operations: setup/install eventlog1002.eqiad.wmnet - https://phabricator.wikimedia.org/T185667#4015116 (10Ottomata) [18:46:23] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Patch-For-Review, 10User-Elukey: Upgrade eventlogging servers to Stretch - https://phabricator.wikimedia.org/T114199#4015119 (10Ottomata) [18:46:42] milimetric: can you delete a column? [18:46:46] EventLogging Refine [18:46:47] don't need [18:47:44] 10Quarry, 10Patch-For-Review: Quarry should refuse to save results that are way too large - https://phabricator.wikimedia.org/T188564#4015124 (10zhuyifei1999) 05Open>03Resolved https://quarry.wmflabs.org/query/24945 [18:47:58] ottomata: you can't actually, you can only hide :) but I'll hide it [18:48:05] 10Analytics, 10Analytics-Cluster, 10Analytics-Kanban, 10Operations, and 3 others: Encrypt Kafka traffic, and restrict access via ACLs - https://phabricator.wikimedia.org/T121561#4015126 (10Ottomata) [18:48:25] 10Analytics, 10Analytics-Cluster, 10Analytics-Kanban: Import Kafka messages into HDFS authenticating with TLS/SSL - https://phabricator.wikimedia.org/T166832#4015128 (10Ottomata) [18:48:26] (ottomata just curious, do you have the option to hide when you click on that little down triangle in the column?) [18:48:29] 10Quarry: Gigantic query results cause a SIGKILL and the query status do not update - https://phabricator.wikimedia.org/T172086#4015129 (10zhuyifei1999) p:05Triage>03Low With T188564 fixed, hopefully this won't be necessary. 
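The Quarry patch merged above ("worker: refuse to save if rowcount is > 65536") describes a simple size guard. A minimal sketch of that idea follows; it is not the code from the Gerrit change. Only the 65536 threshold comes from the patch title, while `MAX_ROWS`, `TooManyRowsError` and `check_rowcount` are hypothetical names chosen for illustration.

```python
# Sketch only: a guard that refuses oversized result sets before saving.
# The 65536 threshold comes from the patch title; every name here is hypothetical.
MAX_ROWS = 65536


class TooManyRowsError(Exception):
    """Raised when a result set is too large to be saved."""


def check_rowcount(rowcount, limit=MAX_ROWS):
    """Return rowcount unchanged, or raise if it is strictly above the limit."""
    # The patch title says "> 65536", so exactly 65536 rows is still accepted.
    if rowcount > limit:
        raise TooManyRowsError(
            "result has %d rows, limit is %d" % (rowcount, limit)
        )
    return rowcount
```

A worker would presumably call such a check before writing results, so that an oversized query is marked as failed rather than being killed partway through saving, which is the symptom described in T172086.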
[18:48:33] 10Analytics, 10Analytics-Cluster: Import Kafka messages into HDFS authenticating with TLS/SSL - https://phabricator.wikimedia.org/T166832#3309097 (10Ottomata) [18:48:42] 10Analytics, 10Analytics-Cluster, 10Analytics-Kanban, 10Patch-For-Review: Port Kafka clients to new jumbo cluster - https://phabricator.wikimedia.org/T175461#4015132 (10Ottomata) [18:49:18] 10Analytics, 10Analytics-Cluster, 10Analytics-EventLogging, 10Analytics-Kanban, 10Patch-For-Review: Move EventLogging analytics processes to Kafka jumbo-eqiad cluster - https://phabricator.wikimedia.org/T183297#4015133 (10Ottomata) [18:49:28] 10Analytics, 10Analytics-Cluster, 10Analytics-Kanban, 10Patch-For-Review: Move webrequest varnishkafka and consumers to Kafka jumbo cluster. - https://phabricator.wikimedia.org/T185136#4015134 (10Ottomata) [18:49:37] 10Analytics, 10Analytics-Cluster, 10Analytics-Kanban, 10Patch-For-Review: Move EventStreams to new jumbo cluster. - https://phabricator.wikimedia.org/T185225#4015135 (10Ottomata) [18:49:40] 10Analytics, 10Analytics-Cluster, 10Analytics-Kanban, 10Patch-For-Review: Migrate Mediawiki Monolog Kafka producer to Kafka Jumbo - https://phabricator.wikimedia.org/T188136#4015136 (10Ottomata) [18:52:06] 10Analytics-Kanban, 10Wikipedia-iOS-App-Backlog: iOS traffic data is not available on Piwik since Feb 20, 2018 - https://phabricator.wikimedia.org/T188559#4015149 (10JMinor) Yes, per @nuria this is a low priority system. I appreciate that gaps in data are problematic for analysis, but given that we plan to mov... [18:52:22] 10Analytics, 10EventBus, 10MediaWiki-JobQueue, 10Services (doing): Failed to acquire page lock in LinksUpdate - https://phabricator.wikimedia.org/T188106#4015150 (10Pchelolo) This happened again and here's the chronology of events according to logs and kafka message for a single page: 1. 2018-03-01T15:56:... 
[18:54:25] 10Analytics, 10Analytics-Cluster, 10Patch-For-Review: Port Kafka clients to new jumbo cluster - https://phabricator.wikimedia.org/T175461#4015156 (10Ottomata) [18:54:38] 10Analytics, 10Analytics-Cluster, 10Operations, 10Traffic, and 2 others: Encrypt Kafka traffic, and restrict access via ACLs - https://phabricator.wikimedia.org/T121561#4015160 (10Ottomata) [18:54:51] (03PS1) 10Nuria: Squashed commit of the following: [analytics/wikistats2] (release) - 10https://gerrit.wikimedia.org/r/415625 (https://phabricator.wikimedia.org/T187010) [18:55:33] 10Analytics-Kanban, 10EventBus, 10Wikimedia-Stream, 10Services (watching), 10User-mobrovac: EventStreams - https://phabricator.wikimedia.org/T130651#4015162 (10Ottomata) 05Open>03Resolved I think its time to close this task :) [18:55:56] (03PS2) 10Nuria: Squashed commit of the following: [analytics/wikistats2] (release) - 10https://gerrit.wikimedia.org/r/415625 (https://phabricator.wikimedia.org/T187010) [18:59:37] (03Abandoned) 10Nuria: Squashed commit of the following: [analytics/wikistats2] (release) - 10https://gerrit.wikimedia.org/r/415625 (https://phabricator.wikimedia.org/T187010) (owner: 10Nuria) [19:01:19] 10Analytics-Tech-community-metrics, 10Developer-Relations (Jan-Mar-2018): "Changesets" on "Gerrit-Delays" should not list ABANDONED changesets? - https://phabricator.wikimedia.org/T151215#4015193 (10Aklapper) [19:01:52] 10Analytics-Tech-community-metrics, 10Developer-Relations (Jan-Mar-2018): "Changesets" on "Gerrit-Delays" should not list ABANDONED changesets? - https://phabricator.wikimedia.org/T151215#2810948 (10Aklapper) 05Open>03Resolved The dashboard "Gerrit-Delays" does not exist anymore. https://wikimedia.biterg.i... 
[19:05:34] 10Analytics-Kanban, 10Operations, 10ops-eqiad: DIMM errors for analytics1062 - https://phabricator.wikimedia.org/T187164#4015226 (10Cmjohnson) Record: 7 Date/Time: 02/13/2018 09:09:38 Source: system Severity: Critical Description: Multi-bit memory errors detected on a memory device at location... [19:26:13] 10Analytics-Kanban, 10Patch-For-Review: Spark 2.2.1 as cluster default (working with oozie) - https://phabricator.wikimedia.org/T159962#4015298 (10Ottomata) [19:26:49] 10Analytics-Kanban, 10Patch-For-Review: Spark 2.2.1 as cluster default (working with oozie) - https://phabricator.wikimedia.org/T159962#3084705 (10Ottomata) @joal, spark2.2.1 sharelib exists. Let me know if I can remove our test spark2_test0 one. [19:28:51] 10Analytics-Tech-community-metrics, 10Developer-Relations (Jan-Mar-2018): Explain decrease in number of patchset authors for same time span when accessed 3 months later - https://phabricator.wikimedia.org/T184427#4015304 (10Aklapper) >>! In T184427#3909716, @Aklapper wrote: > I saved the CSV data for Dec2017 a... [19:32:18] 10Analytics-Tech-community-metrics, 10Developer-Relations (Jan-Mar-2018): Explain decrease in number of patchset authors for same time span when accessed 3 months later - https://phabricator.wikimedia.org/T184427#4015311 (10Aklapper) The artifact we saw might have been related to * T185316#4013362 because unti... [19:34:30] 10Analytics-Kanban, 10Patch-For-Review: Spark 2.2.1 as cluster default (working with oozie) - https://phabricator.wikimedia.org/T159962#4015335 (10Ottomata) @mforns FYI, we'd like to get your Sanitize job merged before we proceed with this...and we're hoping we can do this next week! 
:D [19:37:43] 10Analytics-Tech-community-metrics, 10Developer-Relations (Jan-Mar-2018): Explain decrease in number of patchset authors for same time span when accessed 3 months later - https://phabricator.wikimedia.org/T184427#4015357 (10Aklapper) 05Open>03stalled [19:38:37] (03CR) 10Ottomata: [C: 031] "Nits on comment, but +1 from me!" (033 comments) [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/412939 (https://phabricator.wikimedia.org/T181064) (owner: 10Mforns) [19:44:40] 10Analytics: Stats on which clients/user-agents are making the most API edits - https://phabricator.wikimedia.org/T188549#4015369 (10Reedy) For reference, it requires ldap (wikitech) login details Thanks @Milimetric! (Why does #huggle use a unique UA per user?) {F14211548 size=full} [19:47:38] 10Analytics, 10EventBus, 10MediaWiki-JobQueue, 10Services (doing): Failed to acquire page lock in LinksUpdate - https://phabricator.wikimedia.org/T188106#4015380 (10Pchelolo) Tracked another instance and it's a bit different: 1. 2018-03-01T15:23:37 - original event fired 2. 2018-03-01T15:23:38 - `SlowTime... [19:53:57] nuria_: yt? [19:57:45] or milimetric? [19:57:59] hey ottomata [19:58:00] sup [19:58:02] hey so [19:58:05] https://phabricator.wikimedia.org/T186728 [19:58:08] agent_type [19:58:14] looking at how this is done for webrequest [19:58:41] https://github.com/wikimedia/analytics-refinery/blob/master/oozie/webrequest/load/refine_webrequest.hql#L98-L101 [19:58:43] so [19:58:47] device_family = Spider [19:58:48] fine. [19:58:49] but. [19:58:52] in eventlogging data [19:58:59] userAgent string has already been parsed (by EL) [19:59:15] so uhhh, dunno how i can really use this [19:59:15] since [19:59:26] it is some fancy regex joseph(?) made on userAgent? 
[19:59:30] https://phabricator.wikimedia.org/T108598 [20:00:08] hm, yeah, we'd have to do it in the EL processor [20:00:10] do you know, if the data needed to match spiderPattern would be available in one of the uaparser parsed ua fields? [20:00:31] I think it's just the UA itself that's needed [20:00:35] looking at refinery-source now [20:01:01] https://github.com/wikimedia/analytics-refinery-source/blob/master/refinery-core/src/main/java/org/wikimedia/analytics/refinery/core/Webrequest.java#L76-L92 [20:01:49] yeah, I mean it needs the full userAgent: https://github.com/wikimedia/analytics-refinery-source/blob/master/refinery-core/src/main/java/org/wikimedia/analytics/refinery/core/Webrequest.java#L130 [20:02:01] yeah, but maybe that regex really only needs to match on a parsed field? [20:02:07] or, would we lose too much info from the raw string? [20:03:12] i'm inclined to re-add raw userAgent into el data [20:03:16] and blacklist it from mysql [20:03:18] like we did for IP [20:03:20] hm, that's a good question, I'll do a query on the parsed UA map to see if I find examples of each of the possibilities in spiderPattern [20:03:28] if we can't do it otherwise [20:03:35] hm ok thanks [20:03:43] I don't have an easy answer but this'll tell us what we need [20:06:03] 10Analytics, 10EventBus, 10Availability (Multiple-active-datacenters), 10Services (watching), 10User-mobrovac: WANObjectCache relay daemon or mcrouter support - https://phabricator.wikimedia.org/T97562#4015427 (10mobrovac) [20:13:52] 10Analytics: Stats on which clients/user-agents are making the most API edits - https://phabricator.wikimedia.org/T188549#4011836 (10Anomie) >>! In T188549#4014856, @Milimetric wrote: > You can look it up on here, an example report: https://pivot.wikimedia.org/#webrequest/table/2/EQUQLgxg9AqgKgYWAGgN7APYAdgC5gQA... 
[20:16:05] 10Analytics: Stats on which clients/user-agents are making the most API edits - https://phabricator.wikimedia.org/T188549#4015453 (10Milimetric) @Reedy: careful with sharing UAs in public like this, though. That's why I didn't paste a list but a link behind authentication. [20:23:06] rawr, milimetric either way, i think we probably need to bring back raw ua for el data [20:23:17] it's tricky to do this in a backwards-compatible way with other consumers, e.g. mysql.. [20:23:20] i can do it hacky, but not clean [20:24:30] ottomata: yeah, I found a couple of examples of user_agent that match that regex and nothing in the user_agent_map seems to help, but I'm double checking [20:24:45] like, trying to find if there's a pattern [20:24:45] aye [20:27:38] yeah, there's not ottomata, some of it's just looking for stuff deep in the UA string [20:27:44] like "yahoo" [20:27:52] which none of the UA fields catch [20:28:11] OH right, we have the mysql_mapper plugin, which will make this easier to make backwards compatible if we have to change, hmmm [20:28:15] ok [20:28:16] hm [20:37:56] 10Analytics, 10Cloud-VPS, 10EventBus, 10Services (watching): Set up a Cloud VPS Kafka Cluster with replicated eventbus production data - https://phabricator.wikimedia.org/T187225#4015485 (10mobrovac) [21:00:53] 10Analytics, 10Analytics-EventLogging, 10Readers-Web-Backlog: Open up EventLogging's debug mode to certain users - https://phabricator.wikimedia.org/T188640#4015626 (10Krinkle) I've updated the summary to remove notion of user groups. This feature is in no way restricted or associated with user groups. It's... 
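The agent_type discussion above comes down to two points: the spider regex has to run against the raw user agent string, because tokens such as "yahoo" never surface in the parsed fields, and the raw string would then need to be kept out of MySQL, as was done for IP. A rough sketch of both steps follows, under loud assumptions: `SPIDER_PATTERN` is a made-up stand-in and not the spiderPattern from refinery-source, and the `userAgent` event key, the `device_family` map key and both function names are hypothetical.

```python
import re

# Hypothetical stand-in pattern for illustration only; the real spiderPattern
# lives in refinery-source (Webrequest.java) and is not reproduced here.
SPIDER_PATTERN = re.compile(r"bot|spider|crawler|yahoo", re.IGNORECASE)


def agent_type(raw_user_agent, parsed_ua):
    """Classify an event as 'spider' or 'user'.

    parsed_ua is the already-parsed user agent map; raw_user_agent is the
    original header string, which may be None if it was not retained.
    """
    # First signal: the parser itself flagged the device as a spider.
    if parsed_ua.get("device_family") == "Spider":
        return "spider"
    # Second signal: a regex over the raw string, catching tokens that
    # none of the parsed fields expose.
    if raw_user_agent and SPIDER_PATTERN.search(raw_user_agent):
        return "spider"
    return "user"


def strip_raw_user_agent(event):
    """Return a copy of the event without the raw UA (assumed key: 'userAgent')."""
    return {key: value for key, value in event.items() if key != "userAgent"}
```

With only the parsed map available, the second branch can never fire, which is why bringing back the raw string is being considered. Copying the event instead of mutating it keeps the raw value available to other consumers such as the Hadoop refine step.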
[21:08:41] 10Analytics, 10Analytics-EventLogging, 10Performance-Team, 10Readers-Web-Backlog: Open up EventLogging's debug mode to certain users - https://phabricator.wikimedia.org/T188640#4015640 (10Krinkle) [21:10:21] 10Analytics, 10Analytics-Wikistats: Wikistats Bug - differences in view stats for smaller wikipedias - https://phabricator.wikimedia.org/T188613#4015641 (10Kipala) So I look at https://stats.wikimedia.org/EN_Africa/Sitemap.htm Wednesday May 31, 2017 ( TABLE 1) The figures for the hourly pageviews in May 2... [21:13:27] 10Analytics, 10ChangeProp, 10EventBus, 10MediaWiki-JobQueue, and 2 others: Support dynamic rates for ChangeProp - https://phabricator.wikimedia.org/T188667#4015643 (10Pchelolo) p:05Triage>03Normal [21:16:15] 10Analytics, 10Analytics-Wikistats: Wikistats Bug - differences in view stats for smaller wikipedias - https://phabricator.wikimedia.org/T188613#4015674 (10Kipala) Table 1 is "Wikipedia Statistics - Africa" (the present table, as per 31 May 2017) Table 2 is "Wikipedia Statistics" (the present table, as per 30... 
[21:17:53] 10Analytics: Update user_history and page_history column naming convention - https://phabricator.wikimedia.org/T188669#4015680 (10Milimetric) [21:31:12] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban: Bring back raw user_agent in EventLogging data so we can do further processing in Hadoop - https://phabricator.wikimedia.org/T188673#4015747 (10Ottomata) [21:42:31] nuria_: hiiii [21:44:50] (03PS3) 10Milimetric: [WIP] Compute geowiki statistics for Druid from cu_changes data [analytics/refinery] - 10https://gerrit.wikimedia.org/r/413265 [22:29:04] 10Analytics-Kanban, 10User-Elukey: Reduction of stat1005's disk space usage - https://phabricator.wikimedia.org/T186776#4015910 (10ezachte) just a quick heads-up: I (finally) managed to regain access to stat1005, and will start cleanup tomorrow [23:35:33] ottomata[m]: sorry, totally missed your ping [23:35:45] ottomata[m]: but i did look at the code [23:36:15] neilpquinn: does mediawiki store anywhere teh editor with which the edits were made? (VE, wikitext...) [23:36:16] *the