[01:38:21] Analytics, Analytics-Kanban: Decide: start_timestamp for mediawiki history - https://phabricator.wikimedia.org/T220507 (Neil_P._Quinn_WMF) >>! In T220507#5110485, @JAllemandou wrote: > > In order to be as explicit and precise as we can, I will add a field for users named `userRegistrationTimestamp`, and...
[06:38:53] Analytics, Operations, Research-management, Patch-For-Review, User-Elukey: Remove computational bottlenecks in stats machine via adding a GPU that can be used to train ML models - https://phabricator.wikimedia.org/T148843 (elukey) >>! In T148843#5125683, @dr0ptp4kt wrote: > (Detour) > > @Nur...
[07:04:17] !log releasing refinery source v0.0.86 for what I hope is the last time
[07:04:20] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[07:15:42] fdans: o/
[07:15:52] did you guys find a way to fix refinery?
[07:17:04] elukey: yes we overwrote the master branch with the correct tree
[07:17:24] I'm going to skip the version to 0.0.87 though
[07:18:01] what was the procedure? I am curious
[07:19:29] (I didn't get fully the explanation on the email)
[07:21:26] Morning team
[07:23:01] bonjour
[07:25:18] elukey we located the HEAD that corresponded to the commit history that was not corrupted, checked it out locally, forced push to master
[07:27:14] fdans: how did you do the first part? (when you have time)
[07:29:05] elukey: wanna batcave and I'll fill you in?
[07:30:11] Analytics, Growth-Team, Product-Analytics: Revisions missing from mediawiki_revision_create - https://phabricator.wikimedia.org/T215001 (JAllemandou) I'm not working on this (yet?) - Seems event related.
[07:33:21] fdans: nono just a line in here is fine, don't want to have the full long explanation, I am just curious :)
[07:33:54] joal: this is interesting
[07:33:54] elukey@stat1007:~$ id analytics
[07:33:55] uid=497(analytics) gid=498(analytics) groups=498(analytics)
[07:33:58] :D
[07:34:16] ?
[07:34:57] hahahah yes that was my reaction as well
[07:35:10] I was trying to see puppet diffs for the new user
[07:35:16] but on stat1007 is already there
[07:36:42] Ah!
[07:37:55] (PS3) Joal: Fix mediawiki-history user event join [analytics/refinery/source] - https://gerrit.wikimedia.org/r/504834
[07:38:25] elukey: basically, even if you corrupt the master branch by altering history, gerrit maintains a tree for every code review, so just checking out this one was enough:
[07:38:26] https://gerrit.wikimedia.org/r/#/c/analytics/refinery/source/+/504883/
[07:39:03] ahhh nice!
[07:39:10] you like gerrit more now right? :D
[07:40:27] Thanks for fixing that fdans :)
[07:40:48] elukey: not really
[07:41:24] in reality joal and I found the correct tree by using github's event API, but then nuria suggested just using an old code review
[07:43:05] fdans: you could give gerrit some slack after saving a repo in a quick way!
[07:43:08] :P
[07:43:18] never
[07:44:23] ahhahaha
[07:49:41] ahhh mystery solved
[07:49:41] Notice: /Stage[main]/Profile::Analytics::Refinery::Repository/Scap::Target[analytics/refinery]/File[/var/lib/analytics]/ensure: created
[07:50:20] scap::target { 'analytics/refinery':
[07:50:20] deploy_user => 'analytics',
[07:50:20] key_name => 'analytics_deploy',
[07:50:20] manage_user => true,
[07:50:20] }
[07:51:06] completely forgot about it sigh
[07:56:48] so I guess that another name is probably needed
[08:02:18] :S
[08:02:38] I personally like 'analytics-hdfs'
[08:03:10] because it depicts very well what the user is fo
[08:03:11] *for
[08:03:32] thoughts?
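A rough sketch, in Python, of the recovery fdans describes at 07:38: fetch the tree Gerrit keeps for a code review, check it out, and force-push it to master. It assumes Gerrit's standard `refs/changes/<last two digits>/<change>/<patchset>` ref layout. The helper names, the `origin` remote, and patchset number 1 are assumptions for illustration and do not come from the log. The code only builds the git commands as strings; it does not run them.

```python
from typing import List


def gerrit_change_ref(change_number: int, patchset: int) -> str:
    """Build the Git ref under which Gerrit stores one patchset of a change.

    Gerrit shards change refs by the last two digits of the change number,
    zero-padded, e.g. change 504883 lives under refs/changes/83/504883/.
    """
    if change_number <= 0 or patchset <= 0:
        raise ValueError("change_number and patchset must be positive")
    shard = f"{change_number % 100:02d}"
    return f"refs/changes/{shard}/{change_number}/{patchset}"


def recovery_commands(remote: str, change_number: int, patchset: int,
                      branch: str = "master") -> List[str]:
    """Return (as strings, not executed) the git commands for the recovery.

    Hypothetical helper: the remote name and patchset are assumptions, and the
    force push needs the matching permission on the Gerrit project.
    """
    ref = gerrit_change_ref(change_number, patchset)
    return [
        f"git fetch {remote} {ref}",        # download the review's tree
        "git checkout FETCH_HEAD",          # inspect it locally first
        f"git push --force {remote} HEAD:refs/heads/{branch}",
    ]


if __name__ == "__main__":
    # Change 504883 is the review linked in the log; patchset 1 is assumed.
    for command in recovery_commands("origin", 504883, 1):
        print(command)
```

Building the commands as text keeps the destructive step (the force push) a deliberate manual action after the checked-out tree has been verified.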
[08:03:33] works for me :)
[08:03:41] (PS1) Fdans: Update changelog for 0.0.87 release [analytics/refinery/source] - https://gerrit.wikimedia.org/r/505704
[08:03:48] I need to convince Andrew of course :D
[08:07:17] (CR) Fdans: [V: +2 C: +2] Update changelog for 0.0.87 release [analytics/refinery/source] - https://gerrit.wikimedia.org/r/505704 (owner: Fdans)
[08:10:31] * fdans if this release doesn't work I'm going to cry
[08:11:30] we are with you Fran!
[08:19:56] Analytics, Product-Analytics, Epic, User-Elukey: Provide feature parity between the wiki replicas and the Analytics Data Lake - https://phabricator.wikimedia.org/T212172 (JAllemandou) Regarding the quick-lookups, I suggest using spark in shell mode (whether in python or in scala): - Extract the...
[08:22:36] ok it finally worked yaisas
[08:22:37] Analytics, Product-Analytics: Identify imported revisions in mediawiki_history - https://phabricator.wikimedia.org/T221482 (JAllemandou) Super good idea and good presentation of the difficulty :) Maybe one day ;)
[08:24:09] Analytics, Dumps-Generation, WikiCite, Wikidata, Patch-For-Review: Update wikidata-entities dump generation to fixed day-of-month instead of fixed weekday - https://phabricator.wikimedia.org/T216160 (JAllemandou) Community has spoken, we'll find workarounds - Thanks a lot @ArielGlenn for help...
[08:26:23] !log refinery source v0.0.87 released and symlinks updated
[08:26:25] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[08:28:56] !log deploying refinery
[08:28:57] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[08:29:48] (PS3) Joal: Remove leftover files in oozie folders [analytics/refinery] - https://gerrit.wikimedia.org/r/504914 (https://phabricator.wikimedia.org/T221460)
[08:30:10] Analytics, Analytics-Kanban, Patch-For-Review: Remove dead code from refinery/oozie folders - https://phabricator.wikimedia.org/T221460 (JAllemandou) >>! In T221460#5128400, @fdans wrote: > We have to remove mediawiki_history_druid Good catch ! Done.
[08:32:10] (PS1) Fdans: Update jar version for webrequest load to 0.0.87 [analytics/refinery] - https://gerrit.wikimedia.org/r/505706
[08:32:35] joal: this is ok with you? https://gerrit.wikimedia.org/r/#/c/analytics/refinery/+/505706/
[08:33:37] fdans: question - Should we also bump the pageview-version-number?
[08:33:44] (CR) Joal: [V: +2 C: +2] "Works for me :)" [analytics/refinery] - https://gerrit.wikimedia.org/r/505706 (owner: Fdans)
[08:34:10] joal: no idea, your choice!
[08:34:25] Let's do it please ;)
[08:34:46] joal: is that the record_version?
[08:34:51] yessir
[08:34:54] cool
[08:37:42] don't want to be pedantic, but these info should be in https://etherpad.wikimedia.org/p/analytics-weekly-train if we want to be consistent
[08:37:51] before deploying
[08:38:01] (PS1) Fdans: Bump record version in webrequest load job [analytics/refinery] - https://gerrit.wikimedia.org/r/505707 (https://phabricator.wikimedia.org/T144100)
[08:38:12] (maybe even reviewed and +1ed ?)
[08:38:12] yes elukey well put
[08:38:28] Analytics, Analytics-Kanban: Add caused_by_user_text to mediawiki_page_history - https://phabricator.wikimedia.org/T167608 (JAllemandou) @nuria: The `caused_by_user_text` field contains the event-performer `user_text` so `additional_info` is not accurate enough IMO. We could use a complex structure for `c...
[08:38:58] Ah fdans - I meant the record_version in oozie/pageview/hourly :)
[08:39:17] oo
[08:39:26] fdans: we can skip this one, my point is that we'd need to do it before starting to deploy as step to ensure that we know what to do. Later is kinda pointless :)
[08:39:30] fdans: updating both makes sense
[08:40:01] elukey: I understand, just updating now so that there's precedent
[08:41:22] (PS2) Fdans: Bump record version in webrequest load job and pageview hourly [analytics/refinery] - https://gerrit.wikimedia.org/r/505707 (https://phabricator.wikimedia.org/T144100)
[08:42:05] (CR) Joal: [V: +2 C: +2] "LGTM!!" [analytics/refinery] - https://gerrit.wikimedia.org/r/505707 (https://phabricator.wikimedia.org/T144100) (owner: Fdans)
[08:42:37] merged fdans - please update the record-version accordingly when updating the docs ;)-
[08:42:50] yessir
[08:43:02] Thanks :)
[09:13:47] !log refinery deployed successfully
[09:13:49] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[09:24:35] Analytics, Knowledge-Integrity, Research, Epic, Patch-For-Review: Citation Usage: run third round of data collection - https://phabricator.wikimedia.org/T213969 (Miriam) @bmansurov many thanks!! @RyanSteinberg @tizianopiccardi FYI, the data collection is over!
[09:58:08] Analytics, Analytics-Kanban: Decide: start_timestamp for mediawiki history - https://phabricator.wikimedia.org/T220507 (JAllemandou) >>! In T220507#5129134, @Neil_P._Quinn_WMF wrote: > I agree with the overall philosophy of being very explicit and precise in this dataset, but I do still wonder if it's ne...
[10:29:28] Very cool: https://twitter.com/universal_sci/status/1120458513732579330
[10:39:16] * elukey lunch!
[11:46:41] Analytics, Knowledge-Integrity, Research, Epic, Patch-For-Review: Citation Usage: run third round of data collection - https://phabricator.wikimedia.org/T213969 (bmansurov) Open→Resolved a:bmansurov Data collection is over.
[11:47:02] Analytics, Knowledge-Integrity, Research, Epic, Patch-For-Review: Citation Usage: run third round of data collection - https://phabricator.wikimedia.org/T213969 (bmansurov)
[11:57:52] Analytics, Dumps-Generation, WikiCite, Wikidata, Patch-For-Review: Update wikidata-entities dump generation to fixed day-of-month instead of fixed weekday - https://phabricator.wikimedia.org/T216160 (ArielGlenn) Open→Declined I'll close this for now as declined; the idea of getting a...
[13:35:22] joal: https://wikitech.wikimedia.org/wiki/Analytics/Systems/Druid#Delete_segments_from_deep_storage is still good to drop a datasource from druid?
[13:35:33] I'd like to nuke the popups testing one
[13:35:47] (so we can also test in the analytics cluster that nothing explodes)
[13:36:24] elukey: updating to remove the part that shouldn't be there, but except from that it's ok - give me a minute please :)
[13:36:52] ack!
[13:37:41] elukey: updated - can you confirm it's ok for you?
[13:38:02] elukey: so that you know, I'm also working with druid now, let me know if you want me to do it
[13:38:30] joal: I can do it so I'll learn something :)
[13:38:38] ack :)
[13:39:12] joal: from your guide I can see that I have to rm -rf segments directly
[13:39:15] is it ok?
[13:39:16] * elukey runs
[13:39:29] also elukey, kids' holiday being finished, I'll take my leave to get them in ~20 minutes, and will be back for standup - And, I'll be evening-working tomorrow :)
[13:39:46] sure!
[13:39:50] * joal knows elukey runs faster
[13:39:57] * joal SHOUTS AFTER elukey
[13:40:06] :)
[13:40:10] that is not what I remember from the last part of the NYC run :D
[13:49:47] !log delete tbayer_popups from druid analytics - T220575
[13:49:49] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:06:51] elukey: will you be able to look at T221408 anytime soon? Without updated puppet runs those VMs are depending on a Trusty nameserver that I need to shut down.
[14:06:52] T221408: Puppet broken on most VMs in the 'analytics' project - https://phabricator.wikimedia.org/T221408
[14:07:28] andrewbogott: of course! Lemme check now
[14:07:35] thanks!
[14:07:51] I completely missed the task, sorry!
[14:13:12] ottomata: o/
[14:13:21] can I nuke ottotest1.analytics.eqiad.wmflabs ?
[14:14:43] I cannot even ssh so I guess the answer is yes
[14:15:08] Analytics: Puppet broken on most VMs in the 'analytics' project - https://phabricator.wikimedia.org/T221408 (elukey) Open→Resolved All done!
[14:15:12] andrewbogott: --^
[14:16:04] Analytics, Dumps-Generation, WikiCite, Wikidata, Patch-For-Review: Update wikidata-entities dump generation to fixed day-of-month instead of fixed weekday - https://phabricator.wikimedia.org/T216160 (Lydia_Pintscher) T94019 might be relevant for getting things more in sync.
[14:16:11] elukey: thank you!
[14:16:27] Analytics, Analytics-Cluster, Operations: furud - DISK CRITICAL - /mnt/hdfs is not accessible: Input/output error - https://phabricator.wikimedia.org/T221483 (Ottomata) a:Dzahn→Ottomata Thanks! We should probably unpuppetize the Hadoop part of these nodes and unmount /mnt/hdfs until we need...
[14:16:47] Analytics, Analytics-Cluster, Operations: furud - DISK CRITICAL - /mnt/hdfs is not accessible: Input/output error - https://phabricator.wikimedia.org/T221483 (Ottomata) a:Ottomata→Dzahn
[14:17:20] Analytics, Analytics-Cluster, Operations: Remove Hadoop configs and unmount /mnt/hdfs from unused backup hosts (furud, +) - https://phabricator.wikimedia.org/T221629 (Ottomata)
[14:29:56] Analytics, EventBus, Product-Analytics, Core Platform Team Backlog (Watching / External), Services (watching): Use Z UTC suffix in EventBus emitted events rather than +00:00 - https://phabricator.wikimedia.org/T217041 (Ottomata) Invalid→Open Not quite! The new monolog based events do...
[14:29:58] Analytics, Product-Analytics, MW-1.33-notes (1.33.0-wmf.21; 2019-03-12), Patch-For-Review: Standardize datetimes/timestamps in the Data Lake - https://phabricator.wikimedia.org/T212529 (Ottomata)
[15:30:57] A-TEAM RETRO YO
[15:31:45] I’m sick, been trying to see if I’d feel better but I have to call it, can’t make it
[15:32:46] nuria ^
[15:33:03] fdans: on meeting, sorry
[15:33:54] we have 3 people missing, I'd say that we could skip this one?
[15:49:07] a-team: moved retro to later today
[15:51:31] nuria prob can't make that, going to do a deployment at :30. or maybe i can make and half pay attention :p
[15:51:48] ottomata: can we postpone deployment to after meeting?
[15:51:54] no , the windows are tight today
[15:52:12] ottomata: ok, let's reschedule
[15:52:20] i'm already overlapping a bit
[15:52:20] https://wikitech.wikimedia.org/wiki/Deployments#Tuesday,_April_23
[16:32:49] Analytics, Analytics-Data-Quality, Product-Analytics: Many small wikis missing from mediawiki_history dataset - https://phabricator.wikimedia.org/T220456 (Nuria) @Neil_P._Quinn_WMF the private wikis are not included on the labs replicas and that is intentional, if you notice we also do not report pag...
[16:33:09] Analytics, Analytics-Data-Quality, Product-Analytics, cloud-services-team (Kanban): Many small wikis missing from mediawiki_history dataset - https://phabricator.wikimedia.org/T220456 (Nuria) p:Low→Normal a:fdans
[16:33:20] Analytics, Analytics-Data-Quality, Analytics-Kanban, Product-Analytics: Many small wikis missing from mediawiki_history dataset - https://phabricator.wikimedia.org/T220456 (Nuria)
[16:37:35] ottomata: one thing that I forgot to mention - we already use 'analytics' for scap :(
[16:37:48] elukey: i saw, i commented on your phab ticket
[16:37:49] would it be ok to have 'analytics-hdfs' ?
[16:37:55] ah sorry!
[16:38:33] mforns: you didn't invite me to the talk-rehearsal meeting :(
[16:43:25] joal: it is in the WMF analytics calendar
[16:43:41] I was fooled as well
[16:44:07] Ahhh ! Thanks elukey :)
[16:44:27] elukey: you actually made me realize the expected calendar tab was not open on the correct g-account
[16:51:39] joal, :'(
[16:51:51] I created an event in the analytics team calendar
[16:51:58] joal, I didn't invite anyone personally
[16:52:10] but will send invites now!
[16:52:50] done joal
[17:00:28] heya a-team, whoever wants to hear the presentation, I'm in the meeting :]
[17:05:59] Analytics, Analytics-EventLogging, Analytics-Kanban, Fundraising-Backlog, and 3 others: Fix EventLogging schemas that use array for items type - https://phabricator.wikimedia.org/T218617 (chelsyx) @Nuria , like @Ottomata said, I don't have an opinion here. Just let me know if you need me to chang...
[17:38:19] Analytics, Analytics-Data-Quality, Analytics-Kanban, Product-Analytics: Many small wikis missing from mediawiki_history dataset - https://phabricator.wikimedia.org/T220456 (Neil_P._Quinn_WMF) >>! In T220456#5131546, @Nuria wrote: > @Neil_P._Quinn_WMF the private wikis are not included on the labs...
[18:04:01] * elukey off!
[18:10:55] Analytics, Analytics-Kanban, Datasets-General-or-Unknown, Patch-For-Review, and 2 others: Pageview dumps incorrectly formatted, need to escape special characters - https://phabricator.wikimedia.org/T144100 (Nuria) Open→Resolved
[18:11:45] Analytics, Analytics-Kanban, Datasets-General-or-Unknown, Patch-For-Review, and 2 others: Pageview dumps incorrectly formatted, need to escape special characters - https://phabricator.wikimedia.org/T144100 (Nuria) Super thanks!
[18:19:28] (CR) Nuria: [C: +2] Remove leftover files in oozie folders [analytics/refinery] - https://gerrit.wikimedia.org/r/504914 (https://phabricator.wikimedia.org/T221460) (owner: Joal)
[18:24:16] Analytics, Analytics-Kanban: Add caused_by_user_text to mediawiki_page_history - https://phabricator.wikimedia.org/T167608 (Nuria) "caused_by_user_additional_text"?
[18:46:53] Analytics, MediaWiki-extensions-ORES, Core Platform Team Backlog (Designing), MW-1.34-notes (1.34.0-wmf.3; 2019-04-30), and 3 others: ORES hook integration with EventBus - https://phabricator.wikimedia.org/T201869 (Pchelolo) a:Ladsgroup→Pchelolo
[18:50:13] Analytics-Kanban, EventBus: Allow analytics VLAN to reach schema.svc.$site.wmnet - https://phabricator.wikimedia.org/T221690 (Ottomata)
[18:50:28] Analytics-Kanban, EventBus, Operations, netops: Allow analytics VLAN to reach schema.svc.$site.wmnet - https://phabricator.wikimedia.org/T221690 (Ottomata)
[18:54:43] Analytics, Product-Analytics: Eventbus revisions are duplicated in event.mediawiki_revision_tags_change - https://phabricator.wikimedia.org/T218246 (Pchelolo) @Milimetric The entry point for generating the tags change events is located at https://github.com/wikimedia/mediawiki-extensions-EventBus/blob/ma...
[18:55:50] Analytics, EventBus, Product-Analytics, Core Platform Team Kanban (Doing), Services (doing): Eventbus revisions are duplicated in event.mediawiki_revision_tags_change - https://phabricator.wikimedia.org/T218246 (Pchelolo) a:Pchelolo
[19:44:39] Is data on wikidata_ids for wikipedia pages available in the data lake?
[19:45:10] I don't see it in mediawiki_page_history
[19:54:19] I'd like to be able to do this at a fairly large scale
[20:24:49] groceryheist: i can't recall off the top of my head, but i don't think so
[20:24:57] i think that is a requested thing from others too though
[20:25:01] probably on a backlog somewhere
[20:40:25] Analytics, Product-Analytics, Epic, User-Elukey: Provide feature parity between the wiki replicas and the Analytics Data Lake - https://phabricator.wikimedia.org/T212172 (Groceryheist) > Wikipedia-to-Wikidata linkage patterns (T209891#4798717, using the page_props table) I have a use-case for th...
[20:40:45] ^ottomata
[20:47:19] ah saw that one but couldn't parse it to tell if it was relevant, great!~
[20:51:17] Analytics, Analytics-Kanban, EventBus, MW-1.33-notes (1.33.0-wmf.22; 2019-03-19), Patch-For-Review: Make Refine use JSONSchemas of event data to support Map types and proper types for integers vs decimals - https://phabricator.wikimedia.org/T215442 (Ottomata) Yahoo! ` hive (event)> describe...
[20:55:55] Analytics, Analytics-EventLogging, EventBus, Core Platform Team Backlog (Watching / External), and 2 others: Modern Event Platform: Stream Intake Service: Implementation - https://phabricator.wikimedia.org/T206785 (Ottomata) eventgate-analytics is now handing ~6K api-request events per second. T...
[20:56:07] Analytics, Analytics-EventLogging, Analytics-Kanban, EventBus, and 3 others: Modern Event Platform: Stream Intake Service: Implementation - https://phabricator.wikimedia.org/T206785 (Ottomata)
[20:56:48] Analytics, Analytics-EventLogging, Analytics-Kanban, EventBus, and 2 others: Modern Event Platform: Deploy instance of EventGate service that produces events to kafka main - https://phabricator.wikimedia.org/T218346 (Ottomata)
[20:58:35] Analytics, Analytics-EventLogging, Analytics-Kanban, EventBus, and 3 others: Modern Event Platform: Stream Intake Service: Implementation - https://phabricator.wikimedia.org/T206785 (Nuria) ta-ta-channnnnnnnn
[20:59:23] Analytics, Analytics-EventLogging, Analytics-Kanban, Discovery, and 5 others: Rewrite Avro schemas (ApiAction, CirrusSearchRequestSet) as JSONSchema and produce to EventGate - https://phabricator.wikimedia.org/T214080 (Ottomata) Cool! The Hive table event.mediawiki_api_request is now being filled...