[01:15:30] Analytics, Analytics-Data-Quality, Analytics-Kanban, WMDE-Analytics-Engineering, User-GoranSMilovanovic: Data set review for the Wiktionary Cognate Dashboard - https://phabricator.wikimedia.org/T199851 (GoranSMilovanovic) @Milimetric Thank you very much. > In general if you're just analyzing...
[06:22:21] Analytics-Legal, WMF-Legal, Wikidata: Solve legal uncertainty of Wikidata - https://phabricator.wikimedia.org/T193728 (Psychoslave) Hi @Pintoch if a license is compatible with CC0 requirements, then yes it can be imported into any dataset covered by CC0, including Wikidata. The link you are providin...
[07:12:09] Morning team :)
[08:22:12] (CR) Zhuyifei1999: "I'll see how to build this in Cloud on August 9th." [analytics/quarry/web] - https://gerrit.wikimedia.org/r/440007 (https://phabricator.wikimedia.org/T192698) (owner: Framawiki)
[09:09:42] Analytics, Contributors-Analysis, Product-Analytics: Decommision edit analysis dashboard - https://phabricator.wikimedia.org/T199340 (Neil_P._Quinn_WMF)
[09:25:45] Analytics, Product-Analytics: Load change_tag tables into the Analytics Data Lake on a daily basis - https://phabricator.wikimedia.org/T201062 (Neil_P._Quinn_WMF) >>! In T201062#4477232, @Milimetric wrote: > I see @chelsyx. We only save the user agent string in one place, the recentchanges and cu_change...
[12:21:50] !log Warning in webrequest-upload-2018-8-1-13 contained only false positives
[12:21:51] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[12:23:48] Hi milimetric - Let me know when you're in and have a minute :)
[12:32:25] Analytics, Wikimedia-Incident, cloud-services-team (Kanban): Alarms on throughput on refined data - https://phabricator.wikimedia.org/T198908 (JAllemandou) Sounds good to me! Maybe we could put a bigger list of topics to check, to multiply the probability of catching errors, but except from that sou...
[12:34:43] Analytics: Repairing Partitions Breaks - https://phabricator.wikimedia.org/T201100 (JAllemandou) This problem already happened last month and @fdans is working on it as part of T198600. I'll merge this task in the other one as a duplicate.
[12:35:25] Analytics, Analytics-Kanban, Patch-For-Review: Correct data-removal jobs for mediawiki tables (public and private) - https://phabricator.wikimedia.org/T198600 (JAllemandou)
[12:35:27] Analytics: Repairing Partitions Breaks - https://phabricator.wikimedia.org/T201100 (JAllemandou)
[13:03:04] o/
[13:03:12] heya mforns, what was the deal with your IRC nick being banned?
[13:03:19] when I first sign on and try to auto-join this channel
[13:03:20] it says that
[13:03:27] but then I can eventually join and its fine
[13:05:01] wow - there have been some banning last week ottomata ?
[13:06:07] yeah, but i don't know the details
[13:06:09] joal hiIII!
[13:06:17] Hello ottomata :)
[13:06:21] How was Ukraine?
[13:06:44] oh man
[13:06:46] so so amazing
[13:06:52] so beautiful
[13:07:51] :)
[13:07:55] Have you sang a lot?
[13:08:03] Analytics, Analytics-Wikistats: Getting historical country data for WikiSource - https://phabricator.wikimedia.org/T201176 (JAllemandou) Hi @Astinson :) Indeed there is more data available. Not since beginning of wiki times, but at least a few years. Data is available by months through API calls of the f...
[13:08:29] joal: o/
[13:08:38] Hi bmansurov :)
[13:08:42] joal: have you moved wikidata parquet files somewhere else?
[13:08:57] /user/joal/wikidata/parquet seems missing
[13:09:01] bmansurov: oh man, I think I did
[13:09:05] bmansurov: I'm very sorry
[13:09:08] joal yeah we sang A LOT
[13:09:10] no problem
[13:09:17] ottomata: <3
[13:09:22] * joal loves singing people :)
[13:10:02] joal doesn't love when people stop singing lol ;)))
[13:10:57] joal: where's the new location?
[13:11:05] bmansurov: looking for it now
[13:11:12] joal: also, do you think I should copy those files over to my folder?
[13:11:14] ok thanks
[13:11:41] bmansurov: not needed really, I reorganized a bit when doing some cleaning and forgot I gave you those locations
[13:11:51] ok cool
[13:13:26] bmansurov: /user/joal/wmf/data/wmf/mediawiki/wikidata_parquet/20180108
[13:13:34] joal: thanks!
[13:13:54] bmansurov: everything I play with (wikidata, wikitext) can be found in /user/joal/wmf/data/wmf/mediawiki
[13:14:07] got it
[13:16:39] Analytics, Analytics-Kanban, Patch-For-Review: Spark Jupyter Notebook integration - https://phabricator.wikimedia.org/T190443 (diego) With pyspark I'm getting this error a lot (even when I'm working with small datasets, for example a list of 1 million integers): ``` Reason: Container killed by YAR...
[13:20:31] Analytics, Analytics-Wikistats: Underreporting WikiSource edits? - https://phabricator.wikimedia.org/T201177 (JAllemandou) Hi again @Astinson, The `wikisource` item in Wikistats selector refers to the `wikisource.org` url, not every wikisource bundled together. See https://stats.wikimedia.org/v2/#/en.wik...
[13:30:29] Analytics, Analytics-EventLogging, Analytics-Kanban, EventBus, Services (watching): Modern Event Platform: Scalable Event Intake - https://phabricator.wikimedia.org/T201068 (Ottomata) The EventLogging javascript (optionally?) does client some client side validation. If this fails, the event i...
[13:32:04] Analytics, EventBus, Operations, Discovery-Search (Current work), and 2 others: Create kafka topic for mjolinr bulk daemon and decide on cluster - https://phabricator.wikimedia.org/T200215 (Ottomata) Not 100%, but I believe MirrorMaker will handle the additional throughput. If it doesn't, it wil...
[13:40:47] a-team: holaaa
[13:40:57] nuria_: hiiiiaiiaiai
[13:43:32] hi nuria_!
[13:45:09] Hi nuria_ :)
[13:45:17] joal! Hi!
[13:45:18] hola hola
[13:45:18] :)
[13:45:31] Hi milimetric !
[13:47:14] joal: ok, you wanted to chat?
[13:47:17] cave?
[13:47:54] milimetric: yes, wanted to say sorry for the sqoop failure, and to let you know about the partition-repair issue that is a known problem
[13:47:59] sure milimetric
[13:59:12] (CR) Mforns: "A .DS_Store file slipped into the commit! Also, a couple inline comments. Otherwise code looks good!" (2 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/447665 (https://phabricator.wikimedia.org/T197971) (owner: Fdans)
[13:59:42] (CR) Mforns: [C: 1] Update mediawiki_history_reduced datasource name [analytics/refinery] - https://gerrit.wikimedia.org/r/448504 (owner: Joal)
[14:04:51] Analytics, Analytics-Dashiki: Update mixing-deep in dashiki - https://phabricator.wikimedia.org/T201323 (Nuria)
[14:05:47] Analytics, Analytics-Dashiki: Update mixing-deep in dashiki - https://phabricator.wikimedia.org/T201323 (Nuria) Open>Resolved
[14:07:20] Analytics, Analytics-Kanban, wikimediafoundation.org: Measure traffic for new wikimedia foundation site - https://phabricator.wikimedia.org/T188419 (Nuria) In order to access piwik (and any internal tool) you need to have ldap credentials (a user and a password) once you enter those the wikimedia15...
[14:16:15] (CR) Mforns: [C: 1] "LGTM!" (2 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/448551 (https://phabricator.wikimedia.org/T197889) (owner: Fdans)
[14:16:55] Analytics, Analytics-Wikistats: Underreporting WikiSource edits? - https://phabricator.wikimedia.org/T201177 (Astinson) @JAllemandou That is not clear on the labeling of that item -- it says "Sources" which suggests a plural set of projects. is there anyway we could run a report that produces the coun...
[14:19:10] (CR) Mforns: [C: 2] "Sorry for the delay, LGTM!" [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/450050 (https://phabricator.wikimedia.org/T200864) (owner: Cicalese)
[14:31:18] (CR) Mforns: [C: 1] "LGTM! Just one suggestion" (2 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/445395 (https://phabricator.wikimedia.org/T198600) (owner: Fdans)
[14:33:37] Analytics, Analytics-Cluster, cloud-services-team (Kanban): Upgrade spark 2.3.0 -> 2.3.1 on analytics cluster - https://phabricator.wikimedia.org/T200732 (Ottomata)
[14:34:09] Analytics, Analytics-Cluster, Analytics-Kanban: Upgrade spark 2.3.0 -> 2.3.1 on analytics cluster - https://phabricator.wikimedia.org/T200732 (Ottomata)
[14:34:22] Analytics, Analytics-Cluster, Analytics-Kanban: Upgrade spark 2.3.0 -> 2.3.1 on analytics cluster - https://phabricator.wikimedia.org/T200732 (Ottomata) a:Ottomata
[14:35:26] (CR) Mforns: [C: -1] "Code looks good, but the line-chart's popup flickers whenever I hover over the popup. Let's leave this on pause for now. We can decide to " [analytics/wikistats2] - https://gerrit.wikimedia.org/r/448852 (https://phabricator.wikimedia.org/T192416) (owner: Sahil505)
[14:40:52] Analytics, Analytics-Cluster, Analytics-Kanban: Upgrade spark 2.3.0 -> 2.3.1 on analytics cluster - https://phabricator.wikimedia.org/T200732 (Ottomata) deb built: https://apt.wikimedia.org/wikimedia/pool/main/s/spark2/ Tested in labs with Refine job, works fine. @JAllemandou any objects to upgrad...
[14:52:03] (PS1) Ottomata: Add --hive-server-url flag to Refine job [analytics/refinery/source] - https://gerrit.wikimedia.org/r/450581
[14:59:20] Analytics, Analytics-Kanban, wikimediafoundation.org: Measure traffic for new wikimedia foundation site - https://phabricator.wikimedia.org/T188419 (Nuria) Piwik is registering about 5000 visits per day
[15:08:19] I'm back in !!
[15:10:38] Analytics, Analytics-EventLogging, Analytics-Kanban: [EL sanitization] Retroactively sanitize (including hash and salt appInstallId fields) data in the events database - https://phabricator.wikimedia.org/T199902 (mforns)
[15:11:52] (PS1) Mforns: Change appInstallID labels to hash in EL sanitization whitelist [analytics/refinery] - https://gerrit.wikimedia.org/r/450588 (https://phabricator.wikimedia.org/T199902)
[15:16:08] hey I'm back in :]
[15:18:32] (CR) Nuria: Filter out unwanted wikis from wmf.virtualpageview_hourly (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/447665 (https://phabricator.wikimedia.org/T197971) (owner: Fdans)
[15:35:06] been logged out from google...
[15:57:07] (CR) Ottomata: [C: 2] Add --hive-server-url flag to Refine job [analytics/refinery/source] - https://gerrit.wikimedia.org/r/450581 (owner: Ottomata)
[15:57:32] ooo joal i want to start back up that revision-score discussion with ya later this week
[16:01:02] hi ottomata, I have two questions for you
[16:01:12] yaaa
[16:01:18] where were you in Ukraine?
[16:01:53] ping joal still blocked?
[16:02:16] and, have you seen my last comment in the Jupyter notebooks ticket?
[16:02:37] dsaez: all over, mostly the north and west
[16:02:40] little villages
[16:02:43] singing with babusi
[16:02:46] (grandmas)
[16:02:48] dsaez: yes
[16:02:51] babusi?
[16:02:54] haven't looked into it yet though
[16:03:02] baba == grandma
[16:03:05] babasi == grandmas
[16:03:14] grandmothers
[16:03:20] oh, I see
[16:03:29] I have some collaborators in Lviv
[16:03:38] been twice in the last year
[16:03:45] oh cool
[16:03:47] yea we went there
[16:04:16] very nice place, good people, good food, good prices
[16:04:40] ping milimetric
[16:05:09] ya
[16:23:27] Analytics, EventBus, Operations, Discovery-Search (Current work), and 2 others: Create kafka topic for mjolinr bulk daemon and decide on cluster - https://phabricator.wikimedia.org/T200215 (EBernhardson) >>! In T200215#4481210, @Ottomata wrote: > Not 100%, but I believe MirrorMaker will handle th...
[16:23:44] Analytics, EventBus, Operations, Discovery-Search (Current work), and 2 others: Create kafka topic for mjolinr bulk daemon and decide on cluster - https://phabricator.wikimedia.org/T200215 (EBernhardson)
[16:27:34] Analytics, EventBus, Operations, Discovery-Search (Current work), and 2 others: Create kafka topic for mjolinr bulk daemon and decide on cluster - https://phabricator.wikimedia.org/T200215 (Ottomata) Oh yes, singular is better, that was a typo great.
[16:31:39] Analytics, Analytics-Cluster, Analytics-Kanban: Upgrade spark 2.3.0 -> 2.3.1 on analytics cluster - https://phabricator.wikimedia.org/T200732 (JAllemandou) I have not tested but I don't see why it would break :) Let's go !
[16:42:44] Analytics, Analytics-Wikistats: Wikistats 2.0: allow to view stats for all language versions (a.k.a. Project families) - https://phabricator.wikimedia.org/T188550 (Nuria)
[16:42:46] Analytics, Analytics-Wikistats: Underreporting WikiSource edits? - https://phabricator.wikimedia.org/T201177 (Nuria)
[16:43:50] Analytics, Analytics-Wikistats: Underreporting WikiSource edits? - https://phabricator.wikimedia.org/T201177 (Milimetric) @Astinson the ultimate solution to this is the parent task that Nuria just added. If you need answers quicker, I'm happy to help you write some sql and get to some answers. Ping me...
[16:44:20] Analytics, Analytics-Wikistats: Underreporting WikiSource edits? - https://phabricator.wikimedia.org/T201177 (Nuria) @Astinson : labelling comes from sitematrix : https://meta.wikimedia.org/w/api.php?action=sitematrix Please fix as needed be, we just present the labeling there. This is the entry on the s...
[16:46:30] Analytics, Analytics-Kanban, Operations, netops, Patch-For-Review: Review analytics-in4/6 rules on cr1/cr2 eqiad - https://phabricator.wikimedia.org/T198623 (ayounsi) No flows logged since at least the last 3 days. So it looks fine to me. I removed the syslog statement to minimize noise while...
[16:46:30] ottomata: so can we create the topics now? :)
[16:46:34] ya
[16:46:40] \o/
[16:48:13] Analytics, EventBus, Operations, Discovery-Search (Current work), and 2 others: Create kafka topic for mjolinr bulk daemon and decide on cluster - https://phabricator.wikimedia.org/T200215 (Ottomata) Done.
[16:50:22] Gone for dinner team, back after
[16:50:26] Analytics, EventBus, Operations, Discovery-Search (Current work), and 2 others: Create kafka topic for mjolinr bulk daemon and decide on cluster - https://phabricator.wikimedia.org/T200215 (EBernhardson) Thanks!
[16:57:51] Analytics, Analytics-EventLogging, Analytics-Kanban, EventBus, Services (watching): Modern Event Platform: Scalable Event Intake - https://phabricator.wikimedia.org/T201068 (phuedx) >>! In T201068#4476098, @mobrovac wrote: > Could you elaborate on that? There would be validation upon event me...
[17:01:15] (CR) Nuria: [V: 2 C: 2] Change appInstallID labels to hash in EL sanitization whitelist [analytics/refinery] - https://gerrit.wikimedia.org/r/450588 (https://phabricator.wikimedia.org/T199902) (owner: Mforns)
[17:02:18] thx nuria, this should be deployed together with the already merged scala code, otherwise EL sanitization job would fail
[17:02:50] we'll surely deploy both together this week, but just fyi
[17:11:32] Analytics, Analytics-EventLogging, Analytics-Kanban, EventBus, Services (watching): Modern Event Platform: Scalable Event Intake - https://phabricator.wikimedia.org/T201068 (mobrovac) IMHO, relying on client libraries for validation is not really an option if we want to ensure the well-functi...
[17:13:45] Analytics, Analytics-EventLogging, Analytics-Kanban, EventBus, Services (watching): Modern Event Platform: Scalable Event Intake - https://phabricator.wikimedia.org/T201068 (Ottomata) Yaya, I don't think anyone wants to get rid of server side validation; that will always be present. The sugg...
[17:25:27] Analytics, Analytics-Kanban, wikimediafoundation.org: Measure traffic for new wikimedia foundation site - https://phabricator.wikimedia.org/T188419 (Varnent) >>! In T188419#4481359, @Nuria wrote: > In order to access piwik (and any internal tool) you need to have ldap credentials (a user and a passw...
[17:44:34] Analytics, Analytics-Wikistats: Underreporting WikiSource edits? - https://phabricator.wikimedia.org/T201177 (Astinson) @Milimetric @Nuria The amazing @Samwalton9 came to my rescue with some scripting and API pulls and got pretty far. I think for the quick and dirty stuff I need right now, that will do...
[18:43:00] (CR) Cicalese: "Thank you! Interesting that this hasn't been verified/merged. Does that need to be done manually in this repo?" [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/450050 (https://phabricator.wikimedia.org/T200864) (owner: Cicalese)
[19:00:07] (CR) Mforns: [C: 2] "Yes, this repo needs manual verification and merge." [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/450050 (https://phabricator.wikimedia.org/T200864) (owner: Cicalese)
[19:11:58] !log upgrading from spark 2.3.0 -> spark 2.3.1 everywhere
[19:11:59] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[19:14:47] (CR) Cicalese: "Great! I did test the queries and the generated output files were sane, so I will Verified+2. Thanks!" [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/450050 (https://phabricator.wikimedia.org/T200864) (owner: Cicalese)
[19:14:49] (CR) Cicalese: [V: 2] Filter out erroneous pingback data caused by T200864. [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/450050 (https://phabricator.wikimedia.org/T200864) (owner: Cicalese)
[19:57:45] hi you all. I may have missed a thread: any reason we may not be able to ssh to stat1005?
[19:57:58] (I can't do it, and I know bmansurov can't do it either.)
[20:00:00] Hi leila - No idea - I can't connect either and the machine doesn't seem overloaded
[20:00:02] leila: no issues for me but I'm using the virginia bastion (bast1002) as of today, so maybe west coast bastion is down? are you ssh'ing through bast4001? that's my best guess
[20:00:04] ottomata: any idea?
[20:00:57] bearloga: I'm using bast1002, and didn't manage to login :(
[20:00:59] huhhhlooking
[20:01:00] no idea
[20:01:10] i can get in just fine...
[20:01:18] mwarf
[20:02:31] ottomata: problem seems wider - stat1004 unreachable for me either
[20:02:43] bast1002 i think is not working for me either..
[20:03:26] right
[20:03:41] ottomata: which one should we use?
[20:03:50] usually that one
[20:03:53] :)
[20:04:14] asking in ops
[20:04:31] ottomata: from bearloga comment, bast4001 works for me
[20:04:44] it looks pretty down
[20:04:46] aye
[20:05:21] brion says to use a different bastion for now
[20:05:22] 4001
[20:05:23] Thanks ottomata
[20:05:23] is fine
[20:05:29] leila: --^
[20:06:19] * leila uses 4001 and tests
[20:08:14] Thanks leila for having noticed :)
[20:12:21] hey, so here's the map with the other bastion servers to pick from: https://wikitech.wikimedia.org/wiki/Bastion
[20:12:40] the one that is closest to you should be the best
[20:13:02] np joal. bmansurov brought it up. ;)
[20:13:06] if that would normally be 1002, then 2001 should be the next best
[20:13:27] so for now please just replace 1002 with 2001 in your ssh config, thanks
[20:13:41] makes sense, mutante.
[20:15:31] I confirm that via bast4001 I can connect to stat1005. thanks.
[20:15:48] great
[20:16:31] mutante: As I am in europe, should I use esams, or eqiad/codfw?
[20:17:01] joal: esams
[20:17:15] ack mutante, thanks for the map!
[20:29:56] (PS1) Ottomata: Add email error reporting to CamusPartitionChecker [analytics/refinery/source] - https://gerrit.wikimedia.org/r/450861 (https://phabricator.wikimedia.org/T198908)
[20:40:36] Analytics, Product-Analytics: Load change_tag tables into the Analytics Data Lake on a daily basis - https://phabricator.wikimedia.org/T201062 (Milimetric) We talked about this today, and here's our plan: * Daily incremental sqoops are too hard to squeeze in before the 24th, without derailing our other...
[21:04:43] Analytics, Product-Analytics: Load change_tag tables into the Analytics Data Lake on a daily basis - https://phabricator.wikimedia.org/T201062 (Ottomata) > Another option you have is to work with someone like Gergo to add the change tags to the mediawiki_revision_create event, we don't own that and any m...
[21:15:27] (CR) jerkins-bot: [V: -1] Add email error reporting to CamusPartitionChecker [analytics/refinery/source] - https://gerrit.wikimedia.org/r/450861 (https://phabricator.wikimedia.org/T198908) (owner: Ottomata)