[05:03:48] (PS2) Nuria: Automate calculations for number of pages using wikidata items [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/593092 (https://phabricator.wikimedia.org/T247099)
[05:05:45] (CR) Nuria: "@mforns I have now tested the sql, let me know if you see anything not correct with the conventions we abide by in reportupdater" [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/593092 (https://phabricator.wikimedia.org/T247099) (owner: Nuria)
[05:08:15] (PS3) Nuria: Automate calculations for number of pages using wikidata items [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/593092 (https://phabricator.wikimedia.org/T247099)
[05:23:17] Analytics, Pageviews-Anomaly: Abnormal peaks @ huwiki - https://phabricator.wikimedia.org/T249792 (Nuria) Closing, this was a bot and traffic should be flagged as "automated" going forward.
[05:23:23] Analytics, Pageviews-Anomaly: Abnormal peaks @ huwiki - https://phabricator.wikimedia.org/T249792 (Nuria) Open→Resolved
[05:23:25] Analytics, Research-Backlog: [Open question] Improve bot identification at scale - https://phabricator.wikimedia.org/T138207 (Nuria)
[05:24:29] Analytics: Correct pageview_hourly and derived data for T141506 - https://phabricator.wikimedia.org/T175870 (Nuria) Open→Declined
[05:24:32] Analytics, Research-Backlog: [Open question] Improve bot identification at scale - https://phabricator.wikimedia.org/T138207 (Nuria)
[05:26:47] Analytics, Pageviews-API: Pageview API: Better filtering of bot traffic on top endpoints - https://phabricator.wikimedia.org/T123442 (Nuria) Closing, the automated marker has been deployed and top endpoints will not be reporting data marked as 'automated'. See: https://wikitech.wikimedia.org/wiki/Analytics/Da...
[05:26:56] Analytics, Pageviews-API: Pageview API: Better filtering of bot traffic on top endpoints - https://phabricator.wikimedia.org/T123442 (Nuria) Open→Resolved
[05:26:58] Analytics, Research-Backlog: [Open question] Improve bot identification at scale - https://phabricator.wikimedia.org/T138207 (Nuria)
[05:29:41] Analytics: https://www.tracemyfile.com/ is a bot, UA: Mozilla/5.0 (compatible; tracemyfile/1.0) - https://phabricator.wikimedia.org/T212486 (Nuria) We see at this time 500 reqs per day, quite small. Closing.
[05:29:50] Analytics: https://www.tracemyfile.com/ is a bot, UA: Mozilla/5.0 (compatible; tracemyfile/1.0) - https://phabricator.wikimedia.org/T212486 (Nuria) Open→Declined
[05:29:58] Analytics: https://www.tracemyfile.com/ is a bot, UA: Mozilla/5.0 (compatible; tracemyfile/1.0) - https://phabricator.wikimedia.org/T212486 (Nuria) Declined→Resolved
[05:31:00] Analytics, Analytics-Wikistats, Wikidata, User-Addshore: Wikistats for Wikidata lists several bots as normal users - https://phabricator.wikimedia.org/T59379 (Nuria) Open→Declined
[05:33:27] Analytics, MediaWiki-General, Pageviews-Anomaly: Check abnormal pageviews for some pages on itwiki - https://phabricator.wikimedia.org/T209404 (Nuria) Closing; the automated marker added to pageviews will address issues such as these. Top pageview traffic going forward is just traffic not marked a...
[05:33:33] Analytics, MediaWiki-General, Pageviews-Anomaly: Check abnormal pageviews for some pages on itwiki - https://phabricator.wikimedia.org/T209404 (Nuria) Open→Resolved
[05:34:40] nuria o/
[05:34:48] Analytics, Analytics-Wikistats: Unexpected increase in traffic for 4 languages in same region, on smaller projects - https://phabricator.wikimedia.org/T136084 (Nuria) Automated marker deployed to pageview data. Please see: https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/BotDetection
[05:34:55] fdans: time warp!
[05:35:17] phabricator evening
[05:36:03] nothing like sitting in a comfy chair, throwing over a blanket, opening a bottle of wine, and going through a few hundred phab tasks
[05:37:25] fdans: ayayayay
[05:39:42] Analytics: small bot activity marked as user in Manuel_de_Pedrolo page - https://phabricator.wikimedia.org/T213148 (Nuria) Notice the automated marker on many of the pageviews to this page in recent times: https://tools.wmflabs.org/pageviews/?project=eu.wikipedia.org&platform=all-access&agent=automated&redirects...
[05:40:37] Analytics, Research-Backlog: [Open question] Improve bot identification at scale - https://phabricator.wikimedia.org/T138207 (Nuria)
[05:40:38] Analytics: small bot activity marked as user in Manuel_de_Pedrolo page - https://phabricator.wikimedia.org/T213148 (Nuria)
[05:42:04] Analytics: Bot from an Azure cloud cluster is causing a false pageview spike (can we identify as bot?) - https://phabricator.wikimedia.org/T137454 (Nuria) Open→Resolved
[05:42:06] Analytics, Research-Backlog: [Open question] Improve bot identification at scale - https://phabricator.wikimedia.org/T138207 (Nuria)
[05:42:30] Analytics, Analytics-Wikistats: Unexpected increase in traffic for 4 languages in same region, on smaller projects - https://phabricator.wikimedia.org/T136084 (Nuria) Open→Resolved
[05:42:32] Analytics, Research-Backlog: [Open question] Improve bot identification at scale - https://phabricator.wikimedia.org/T138207 (Nuria)
[05:43:14] Analytics: small bot activity marked as user in Manuel_de_Pedrolo page - https://phabricator.wikimedia.org/T213148 (Nuria) Open→Resolved
[05:43:16] Analytics, Research-Backlog: [Open question] Improve bot identification at scale - https://phabricator.wikimedia.org/T138207 (Nuria)
[05:44:24] Analytics: clear bot spam-scraping [[en:United States Senate]] not being detected as a bot - https://phabricator.wikimedia.org/T247085 (Nuria) Open→Resolved
[05:44:58] Analytics: clear bot spam-scraping [[en:United States Senate]] not being detected as a bot - https://phabricator.wikimedia.org/T247085 (Nuria) See automated marker added: https://tools.wmflabs.org/pageviews/?project=en.wikipedia.org&platform=all-access&agent=automated&redirects=0&start=2020-02-01&end=2020-05...
[05:46:14] Analytics: Possible bot traffic from SA not categorized as bot traffic but regular pageviews - https://phabricator.wikimedia.org/T249835 (Nuria) The automated marker was deployed in late April, so it would not have caught this event; thus declining. This should not happen going forward.
[05:46:56] Analytics: Possible bot traffic from SA not categorized as bot traffic but regular pageviews - https://phabricator.wikimedia.org/T249835 (Nuria) Please see docs: https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/BotDetection; as I said, effective from late April onwards
[05:47:13] Analytics: Possible bot traffic from SA not categorized as bot traffic but regular pageviews - https://phabricator.wikimedia.org/T249835 (Nuria) Open→Declined
[05:51:55] Analytics, Pageviews-Anomaly: Manipulation of pageview statistics German Wikipedia - https://phabricator.wikimedia.org/T232992 (Nuria) The automated marker has been deployed; issues such as these should be mitigated going forward: https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/BotDetection
[05:52:03] Analytics, Pageviews-Anomaly: Manipulation of pageview statistics German Wikipedia - https://phabricator.wikimedia.org/T232992 (Nuria) Open→Resolved
[05:52:05] Analytics, Research-Backlog: [Open question] Improve bot identification at scale - https://phabricator.wikimedia.org/T138207 (Nuria)
[05:52:44] (CR) Fdans: "The job has now been successfully tested" [analytics/refinery] - https://gerrit.wikimedia.org/r/597541 (https://phabricator.wikimedia.org/T252857) (owner: Fdans)
[06:01:19] goood morning
[06:01:46] the new druid nodes for the public cluster have almost caught up (segment-wise) with the rest of the cluster
[06:32:15] I'll try to add the new hosts behind the load balancer IP that AQS calls
[07:08:08] Analytics, Dumps-Generation, Core Platform Team Workboards (Clinic Duty Team), Patch-For-Review: page_restrictions field incomplete in current and historical dumps - https://phabricator.wikimedia.org/T251411 (Naike) Open→Stalled
[07:09:04] Analytics, MediaWiki-extensions-WikimediaEvents, Core Platform Team Workboards (Clinic Duty Team), Performance-Team (Radar): Remove usage of MEDIAWIKI_JOB_RUNNER from WikimediaEvents extension - https://phabricator.wikimedia.org/T247130 (Naike) Open→Stalled
[07:09:11] !log add druid100[7,8] to the LVS druid-public-brokers service (serving AQS's traffic)
[07:09:12] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[07:09:21] new hosts are serving traffic now, everything seems fine
[07:34:55] brb
[08:13:10] (CR) Elukey: [V: +2 C: +2] Release upstream version 0.36.0 [analytics/superset/deploy] - https://gerrit.wikimedia.org/r/591440 (https://phabricator.wikimedia.org/T249495) (owner: Elukey)
[08:15:06] !log superset down for maintenance
[08:15:08] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[08:23:34] of course the db upgrade shows a new error on analytics-tool1004 that I didn't see in staging
[08:23:37] sigh
[08:31:37] interesting: in the superset_staging.dbs table, `encrypted_extra` is a blob, while in superset_production it is text
[08:31:46] I am not sure how this happened
[08:32:38] and the col is full of nulls
[09:01:51] (PS19) Fdans: Add Pageviews Complete dumps backfilling job [analytics/refinery] - https://gerrit.wikimedia.org/r/597541 (https://phabricator.wikimedia.org/T252857)
[09:04:13] nope, can't really solve the issue, need to roll back, sigh
[09:07:30] (PS1) Elukey: Revert "Release upstream version 0.36.0" [analytics/superset/deploy] - https://gerrit.wikimedia.org/r/598001
[12:18:31] so I tracked down the issue
[12:18:52] sqlalchemy 1.3.15 vs 1.3.17
[12:19:07] in .17 they added a check that raises the exception
[12:19:11] sigh
[12:30:25] no, this is not true! Still need to find the issue..
[14:01:17] Analytics, Analytics-Kanban: Add new Druid nodes to analytics and public clusters - https://phabricator.wikimedia.org/T252771 (elukey)
[14:02:07] Analytics, Analytics-Kanban: Add new Druid nodes to analytics and public clusters - https://phabricator.wikimedia.org/T252771 (elukey) a: elukey
[14:05:12] Analytics, Analytics-Kanban: Move the Analytics infrastructure to Debian Buster - https://phabricator.wikimedia.org/T234629 (elukey) For the stat100x, Kafka and Druid nodes it would be great if the Partman recipe left /srv intact, but it doesn't seem feasible at the moment: https://phabricator.wikimedi...
[14:05:46] * elukey brb!
[14:20:28] Analytics, Analytics-Kanban, Product-Analytics: EventLogging data missing from event_sanitized schemas - https://phabricator.wikimedia.org/T253182 (Nuria) Open→Resolved
[14:31:21] Analytics-Kanban, Better Use Of Data, Product-Analytics, Patch-For-Review: Upgrade to Superset 0.36.0 - https://phabricator.wikimedia.org/T249495 (elukey) Posted a question in https://github.com/apache/incubator-superset/pull/8493#issuecomment-632721492
[14:45:03] Analytics-Kanban, Better Use Of Data, Product-Analytics, Patch-For-Review: Upgrade to Superset 0.36.0 - https://phabricator.wikimedia.org/T249495 (Nuria) The #9878 is not an old bug, right? The version of mysql used by the fellow reporting it is, but this is an issue with alchemy running the migrati...
[14:54:33] Analytics-Kanban, Better Use Of Data, Product-Analytics, Patch-For-Review: Upgrade to Superset 0.36.0 - https://phabricator.wikimedia.org/T249495 (elukey) >>! In T249495#6158443, @Nuria wrote: > The #9878 is not an old bug right? The version of mysql used by the fellow reporting is, but this is a...
[15:00:04] Analytics-Kanban, Better Use Of Data, Product-Analytics, Patch-For-Review: Upgrade to Superset 0.36.0 - https://phabricator.wikimedia.org/T249495 (Nuria) I think this is the migration bit that is throwing an error, because the text field might require a specific length that is not specified. try:...
[15:04:04] Analytics-Kanban, Better Use Of Data, Product-Analytics, Patch-For-Review: Upgrade to Superset 0.36.0 - https://phabricator.wikimedia.org/T249495 (elukey) @Nuria we don't have postgres; the problem comes from the except part of the migration script, which fails IIUC from the stacktrace. I had alrea...
[15:26:59] going off a little earlier today, but ping me if needed! (on the phone :)
[15:27:00] (CR) Mforns: "Left a couple comments on string interpolation and output format." (4 comments) [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/593092 (https://phabricator.wikimedia.org/T247099) (owner: Nuria)
[15:27:02] o/
[15:27:21] byeeeee
[15:45:21] Analytics-Kanban, Better Use Of Data, Product-Analytics, Patch-For-Review: Upgrade to Superset 0.36.0 - https://phabricator.wikimedia.org/T249495 (elukey) Ok I think I know what went wrong, and... drum roll... it is of course my fault :D What I did to test this version has been: 1) pick 0.36 an...
[16:15:45] Analytics, Analytics-Kanban, Product-Analytics: EventLogging data missing from event_sanitized schemas - https://phabricator.wikimedia.org/T253182 (MMiller_WMF) @mforns -- the data looks good to me now. Thank you.
[16:34:12] Analytics, Better Use Of Data, Event-Platform: Document in-schema who sets which fields - https://phabricator.wikimedia.org/T253392 (mpopov)
[16:39:28] Analytics: analytics.wikimedia.org TLC - https://phabricator.wikimedia.org/T253393 (mforns)
[17:46:05] Analytics-Kanban, Better Use Of Data, Product-Analytics, Patch-For-Review: Upgrade to Superset 0.36.0 - https://phabricator.wikimedia.org/T249495 (Nuria) Sounds great, let me know if you need me to re-test in staging.
[17:52:11] (PS4) Nuria: Automate calculations for number of pages using wikidata items [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/593092 (https://phabricator.wikimedia.org/T247099)
[17:54:14] (PS5) Nuria: Automate calculations for number of pages using wikidata items [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/593092 (https://phabricator.wikimedia.org/T247099)
[17:54:30] (CR) Nuria: "Corrected issues and added config" (3 comments) [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/593092 (https://phabricator.wikimedia.org/T247099) (owner: Nuria)
[18:09:13] (CR) Mforns: "LGTM! One typo comment." (1 comment) [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/593092 (https://phabricator.wikimedia.org/T247099) (owner: Nuria)
[18:10:22] (PS6) Nuria: Automate calculations for number of pages using wikidata items [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/593092 (https://phabricator.wikimedia.org/T247099)
[18:11:28] (CR) Nuria: Automate calculations for number of pages using wikidata items (1 comment) [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/593092 (https://phabricator.wikimedia.org/T247099) (owner: Nuria)
[18:12:19] nuria: want me to test that in RU?
[18:12:46] mforns: how would I do that? I tested that the select works fine, but what else can I do?
[18:16:17] nuria, I usually run it in stat1007 with reportupdater: clone reportupdater, clone reportupdater-queries, and execute: python3 ~/reportupdater/update_reports.py ~/reportupdater-queries/ ~/reportupdater-output
[18:16:38] mforns: ok, will do that
[18:16:54] nuria: this will run all reports for the query folder, so if you want to avoid the other report running, you can comment it out in the config file
[18:17:22] mforns: k, will do that in a sec
[18:49:28] hi, does someone know if there is a way to identify articles that have been created with the Content Translation Tool? I found a table in the wikishared db, but it doesn't say if the article is new or not. And apparently that table is not stored in Hive.
[19:23:27] dsaez: right, wikishared is not imported currently, as it is part of an extension only available for some wikis. It can be done; it is just infrequent that extension-specific tables are useful
[19:25:22] dsaez: the easiest way to see whether a page was created with content translation is looking at page tags
[19:25:51] dsaez: see https://superset.wikimedia.org/r/220
[19:25:59] hi nuria, thx. The problem with tags is that they are language dependent.
[19:26:07] Oh, great, I'll check that
[19:26:12] dsaez: the tag is 'contenttranslation'
[19:26:50] dsaez: these tags are content dependent? are you sure, because they are the same ones used to split mobile/desktop edits, for example
[19:27:07] nuria, in case I want to import that table (or any other) to hdfs, should I use sqoop?
[19:27:25] nuria, I've seen that in French the tag was different, something in French
[19:27:32] dsaez: ya, you can import using sqoop into your local space
[19:28:01] dsaez: some examples: https://wikitech.wikimedia.org/wiki/Analytics/Systems/EventLogging/Administration#Sqoop
[19:29:09] nuria: cool! I'll give it a try, and use the opportunity to learn how to use sqoop
[19:29:12] thx
[20:05:20] Analytics: Better redirect handling for pageview API - https://phabricator.wikimedia.org/T121912 (MusikAnimal) Could this be re-prioritized? As I understand it, we are now processing wikitext, so including the pageviews of redirects should be possible. I keep getting questions about pageviews of all COVID-rel...
[20:37:30] mforns: does reportupdater execute with python 2.7 or python3?
[20:37:48] nuria: python3 I think
[20:38:07] we migrated it a while ago
[20:38:14] mforns: it has syntax like passed_params = {k: v for k, v in passed_params.iteritems() if v is not None}
[20:38:27] mforns: which I think no longer works on python3
[20:38:33] oh, hm
[20:38:44] maybe we never migrated it?
[20:38:54] mforns: I think we never did
[20:38:57] or maybe it used to work with earlier versions of python 3?
[20:39:19] Analytics: reportupdater should run with python3 - https://phabricator.wikimedia.org/T253418 (Nuria)
[20:39:28] mforns: just created the ticket
[20:39:33] ok
[20:40:04] mforns: and since pip is not installed on stat1007, how do we get the other deps for reportupdater, like pymysql?
[20:41:01] nuria: hm, does reportupdater fail in stat1007 for you?
[20:41:29] mforns: it does not get the deps (which makes sense) from my local copy; wait, maybe I can call the global instance
[20:41:30] maybe try an-launcher1001 then
[20:41:40] reportupdater is running from there currently, no?
[20:41:57] mforns: from an-launcher?
[20:41:59] yes
[20:47:53] mforns: wait, do reportupdater jobs need a puppet patch as well? https://github.com/wikimedia/puppet/blob/production/modules/profile/manifests/reportupdater/jobs/mysql.pp
[20:48:41] nuria: only if they introduce a new query folder; in this case the puppet snippet is already there, because the query folder has another report
[20:48:53] mforns: I see
[20:48:57] structured-data
[20:50:55] mforns: but how about wmcs?
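The dict comprehension quoted at 20:38:14 is indeed Python 2 only: `dict.iteritems()` was removed in Python 3, so that line raises AttributeError there. A minimal sketch of the fix (the sample data below is made up for illustration, not taken from reportupdater):

```python
# Python 2 reportupdater code quoted in the log:
#   passed_params = {k: v for k, v in passed_params.iteritems() if v is not None}
# dict.iteritems() was removed in Python 3; .items() does the same job
# (on Python 2 it returns a list instead of an iterator, but works too).
passed_params = {"wiki": "enwiki", "date": None, "limit": 10}  # made-up sample
passed_params = {k: v for k, v in passed_params.items() if v is not None}
print(passed_params)  # the None-valued 'date' entry is dropped
```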
[20:51:07] mforns: this job has been done for a while but has no puppet entry
[20:51:17] mforns: it is a hive job just like this one
[20:51:27] mforns: so that one should have one, right?
[20:51:32] yes
[20:51:35] one sec
[20:51:44] Analytics, Analytics-Kanban, Cloud-Services, Developer-Advocacy: Data missing on the hierarchical view on the wmcs-edits tool - https://phabricator.wikimedia.org/T252915 (Nuria) @milimetric, doesn't this job need a puppet entry at: https://github.com/wikimedia/puppet/blob/production/modules/profi...
[20:52:38] nuria: there are two reportupdater job files in puppet, one is mysql and the other is hive
[20:52:49] the wmcs one is in the hive one
[20:53:26] mforns: I see
[20:53:30] together with structured-data, no?
[20:53:53] mforns: yes yes
[20:54:06] Analytics, Analytics-Kanban, Cloud-Services, Developer-Advocacy: Data missing on the hierarchical view on the wmcs-edits tool - https://phabricator.wikimedia.org/T252915 (Nuria) Never mind, it is at https://github.com/wikimedia/puppet/blob/production/modules/profile/manifests/reportupdater/jobs.pp#L59
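The reportupdater testing recipe mforns gave at 18:16 boils down to three commands. A small sketch that assembles them (the Gerrit clone URLs are assumptions based on the typical analytics/* repo paths; only the final update_reports.py invocation appears verbatim in the log):

```python
# Hedged sketch of the local reportupdater test run described at 18:16:17.
# Clone URLs are assumed, not taken from the log; the commands are printed
# rather than executed so the recipe is visible end to end.
import os

home = os.path.expanduser("~")
commands = [
    # clone the runner and the query repo (URLs assumed)
    f"git clone https://gerrit.wikimedia.org/r/analytics/reportupdater {home}/reportupdater",
    f"git clone https://gerrit.wikimedia.org/r/analytics/reportupdater-queries {home}/reportupdater-queries",
    # runs every report in the queries folder; comment out unwanted reports
    # in the config file first (per 18:16:54)
    f"python3 {home}/reportupdater/update_reports.py {home}/reportupdater-queries/ {home}/reportupdater-output",
]
for cmd in commands:
    print(cmd)
```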