[00:32:51] 10Analytics, 10Operations, 10Research, 10Article-Recommendation, and 3 others: Transferring data from Hadoop to production MySQL database - https://phabricator.wikimedia.org/T213566 (10Nuria) Any further thoughts on this, i think we agree that best solution is to run and deploy these scripts from some vir... [03:31:48] 10Analytics, 10Operations, 10Research, 10serviceops, and 4 others: Transferring data from Hadoop to production MySQL database - https://phabricator.wikimedia.org/T213566 (10Dzahn) [03:45:30] 10Analytics, 10Operations, 10Research, 10serviceops, and 4 others: Transferring data from Hadoop to production MySQL database - https://phabricator.wikimedia.org/T213566 (10Dzahn) @Nuria I agree it seems the most likely solution is using a Ganeti VM though but due to allhands we still did not have an SRE m... [04:09:33] 10Analytics: Restoring the daily traffic anomaly reports - https://phabricator.wikimedia.org/T215379 (10Tbayer) [04:49:34] lol, i ran connected components to see if it would help my spark problem. The largest component is 14M vertices, the next is 94 :) [05:05:17] morning! anyone else still jetlagged? [05:32:37] fdans: morning :) [05:34:18] RoanKattouw: ah! you were using s3-analytics-slave! I sent an email to several mailing lists to announce the move to the -replica suffix, but I would have probably needed to send it to engineering@ too [05:39:16] RoanKattouw: I modified again the warning message in https://wikitech.wikimedia.org/wiki/Analytics/Data_access#MariaDB_replicas, if you have time/patience let me know if it is more clear. 
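An aside on the connected-components result above (one 14M-vertex giant component next to a 94-vertex one): that heavily skewed split is typical for link graphs. A minimal union-find sketch, not the actual Spark job used here, shows how such component sizes are computed:

```python
from collections import Counter

def component_sizes(edges):
    """Union-find sketch: return connected-component sizes, largest first."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    for a, b in edges:
        union(a, b)

    # Group every seen vertex by its root and count group sizes.
    return sorted(Counter(find(x) for x in parent).values(), reverse=True)

# Toy graph: one big component {1..4} and a tiny one {5,6},
# echoing the skewed 14M-vs-94 split mentioned in the log.
print(component_sizes([(1, 2), (2, 3), (3, 4), (5, 6)]))  # → [4, 2]
```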
[06:35:17] 10Analytics, 10User-Elukey: Restoring the daily traffic anomaly reports - https://phabricator.wikimedia.org/T215379 (10elukey) p:05Triage→03Normal [06:43:59] 10Analytics, 10Analytics-Kanban, 10DBA, 10Patch-For-Review, 10User-Banyek: Migrate dbstore1002 to a multi instance setup on dbstore100[3-5] - https://phabricator.wikimedia.org/T210478 (10Marostegui) [07:01:41] 10Analytics, 10User-Elukey: Restoring the daily traffic anomaly reports - https://phabricator.wikimedia.org/T215379 (10elukey) I think that the first step, if this job is important, could be to restore the cronjob under a user's crontab that will actively maintain it. The second step is to discuss how importan... [07:04:25] 10Analytics: Clean up home dirs for user mkroetzsch - https://phabricator.wikimedia.org/T214501 (10elukey) Forgot that we are waiting confirmation from Markus, just found the email thread about it. [07:05:36] fdans: https://wikitech.wikimedia.org/wiki/Analytics/Ops_week#Have_any_users_left_the_Foundation? - not sure if you saw the update, I added the script to quickly check stat/notebooks/hive/etc.. [07:05:43] less tedious [07:14:05] elukey: sorry, was getting coffee [07:14:21] np, it is pretty early :) [07:14:33] I updated all the open tasks with the output of the script [07:14:36] oooo nice!! love the script [07:40:22] 10Analytics, 10Fundraising-Backlog: Clean up old fundraising-related user data on Analytics hosts - https://phabricator.wikimedia.org/T215382 (10elukey) p:05Triage→03Normal [07:43:44] 10Analytics, 10Fundraising-Backlog: Clean up old fundraising-related user data on Analytics hosts - https://phabricator.wikimedia.org/T215382 (10elukey) * mwalker ` ====== stat1007 ====== total 648 -rwxr-xr-x 1 2454 wikidev 3400 Nov 11 2013 aggByTime.py drwxr-xr-x 2 2454 wikidev 4096 Nov 11 2013 counts... 
[08:01:11] 10Analytics, 10Operations, 10SRE-Access-Requests: Allow Erik Bernhardson to have root access on stat1005 for GPU testing - https://phabricator.wikimedia.org/T215384 (10elukey) p:05Triage→03Normal [08:04:03] ebernhardson: --^ [08:04:09] I am also reading https://medium.com/tensorflow/amd-rocm-gpu-support-for-tensorflow-33c78cc6a6cf [08:04:12] that seems encouraging [08:04:28] (not sure what changed since then) [08:05:00] ah wait https://gpuopen.com/rocm-tensorflow-1-8-release/ is the link that we checked at all hands [08:06:26] https://rocm.github.io/ROCmInstall.html#ubuntu-support---installing-from-a-debian-repository [08:07:43] brb [08:48:58] 10Analytics, 10Analytics-Wikistats: Check wikistats numbers for agreggations for "all-wikipedias" - https://phabricator.wikimedia.org/T189626 (10fdans) > Example: or 201505-01 druid has 10.4 M - Compared to 10.3 for WKS1 @Nuria I'm confused by this, where are you seeing these numbers? If it's number of edits... [08:54:38] https://github.com/ROCmSoftwarePlatform/tensorflow-upstream seems nice! [08:54:55] there is a tensorflow-rocm python package [09:37:59] * elukey going to the doctor! [09:53:10] 10Analytics, 10Analytics-Kanban, 10DBA, 10Patch-For-Review, 10User-Banyek: Migrate dbstore1002 to a multi instance setup on dbstore100[3-5] - https://phabricator.wikimedia.org/T210478 (10Marostegui) @elukey after merging: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/487000/ I have done the foll... [11:48:13] Wow - Sqooping without comments is faster :) Done now for the labs part [12:51:04] who's ops oncall this week? [12:51:19] I opened a bunch of tasks to clear old user dirs [12:51:46] the only tricky part now is move/delete hive databases [12:51:46] elukey: it's mine [12:51:49] ah! [12:51:51] :D [12:51:53] :D [12:52:01] hellooOOOOOooo :) [12:53:22] :) [12:53:33] joal: time to chat about the hive db drop stuff?
[12:53:42] elukey: yes, until kids wake up :) [12:53:57] We should be good for some time hopefully [12:53:59] ah snap it is Wed! [12:54:03] uff I always forget [12:54:07] nono nevermind [12:54:14] no worries, let's do it! [12:54:16] this evening after standup [12:54:18] nah nah [12:54:24] forget what I have said :P [12:54:25] k, as you want :) [12:54:42] as FYI, I removed some ssh keys from the hadoop master nodes [12:54:46] for the analytics-search users [12:54:50] the users are still there [12:54:54] but no ssh access [12:55:08] and I deployed users without ssh to the test workers [12:55:09] all good [12:55:19] so in theory yarn is ready to get properly containerized [13:45:37] elukey: https://phabricator.wikimedia.org/T215413 done :) Not sure which tags I should add, I added "Research" but feel free to add more :) thanks a lot! [13:46:55] 10Analytics, 10Operations, 10Research-management, 10User-Elukey: GPU upgrade for stats machine - https://phabricator.wikimedia.org/T148843 (10elukey) a:05Halfak→03elukey [13:47:19] 10Analytics, 10Research: Image Classification Working Group - https://phabricator.wikimedia.org/T215413 (10elukey) [13:47:19] miriam: very nice! [13:48:05] elukey :) [13:49:53] 10Analytics, 10Analytics-Kanban, 10DBA, 10Patch-For-Review, 10User-Banyek: Migrate dbstore1002 to a multi instance setup on dbstore100[3-5] - https://phabricator.wikimedia.org/T210478 (10elukey) First of all, I just realized that all these IPs and ports need to be whitelisted on the Analytics VLAN's firewall... [14:05:12] 10Analytics, 10Knowledge-Integrity, 10Research, 10Epic, 10Patch-For-Review: Citation Usage: run third round of data collection - https://phabricator.wikimedia.org/T213969 (10Miriam) @RyanSteinberg @bmansurov I can see all events on the client side. I'll do some tests there. On the server side, I can see...
[14:05:56] o/ [14:08:42] 10Analytics, 10Discovery-Search, 10Research: Image Classification Working Group - https://phabricator.wikimedia.org/T215413 (10dcausse) [14:12:46] 10Analytics, 10Analytics-Kanban, 10DBA, 10Patch-For-Review, 10User-Banyek: Migrate dbstore1002 to a multi instance setup on dbstore100[3-5] - https://phabricator.wikimedia.org/T210478 (10Marostegui) Then also make sure to whitelist dbstore1003:3340 as that is where the staging database will leave. [14:19:18] 10Analytics, 10Analytics-Kanban, 10DBA, 10Patch-For-Review, 10User-Banyek: Migrate dbstore1002 to a multi instance setup on dbstore100[3-5] - https://phabricator.wikimedia.org/T210478 (10elukey) analytics-in4 diff: ` + term mysql-dbstore { + from { + destination-address { +... [14:21:14] 10Analytics, 10Analytics-Kanban, 10DBA, 10Patch-For-Review, 10User-Banyek: Migrate dbstore1002 to a multi instance setup on dbstore100[3-5] - https://phabricator.wikimedia.org/T210478 (10Marostegui) >>! In T210478#4931616, @elukey wrote: > analytics-in4 diff: > > ` > + term mysql-dbstore { > +... [14:26:23] 10Analytics, 10Analytics-Kanban, 10DBA, 10Patch-For-Review, 10User-Banyek: Migrate dbstore1002 to a multi instance setup on dbstore100[3-5] - https://phabricator.wikimedia.org/T210478 (10elukey) After a first quick test it looks good: ` elukey@stat1007:~$ mysql -e 'show databases' -P 3312 -u research -p... [14:33:27] ottomata: o/ [14:34:05] hiyaaa [14:41:19] RoanKattouw: still me sorry for the pings :) We have just enabled access to the new dbstore nodes, the ones basically that were not working yesterday (https://wikitech.wikimedia.org/wiki/Analytics/Data_access#MariaDB_replicas) [14:41:42] if you want to test them now as beta tester it would be great :) [14:41:53] you are using the research user to connect right? 
[14:50:24] 10Analytics, 10Operations, 10Research, 10serviceops, and 4 others: Transferring data from Hadoop to production MySQL database - https://phabricator.wikimedia.org/T213566 (10bmansurov) [14:50:54] 10Analytics, 10Operations, 10Research, 10serviceops, and 4 others: Transferring data from Hadoop to production MySQL database - https://phabricator.wikimedia.org/T213566 (10bmansurov) Thanks, @Dzahn for the info. I've this task: {T215421}. [15:12:20] 10Analytics, 10Wikimedia-Stream: EventStreams returns 502 errors from outside the WMF network - https://phabricator.wikimedia.org/T215013 (10Ottomata) Wow that is a lot more clients than we've ever had! @jcrespo thanks for bouncing the service. Where/how did you see this MAX_CONCURRENT_STREAMS == 128 error? [15:17:52] 10Analytics-EventLogging, 10Analytics-Kanban, 10EventBus, 10Product-Analytics, and 5 others: Modern Event Platform: Schema Guidelines and Conventions - https://phabricator.wikimedia.org/T214093 (10Ottomata) > For the lowercasing headers ... seems like something that would naturally be handled during import... [15:19:26] 10Analytics-EventLogging, 10Analytics-Kanban, 10EventBus, 10Product-Analytics, and 5 others: Modern Event Platform: Schema Guidelines and Conventions - https://phabricator.wikimedia.org/T214093 (10Ottomata) I need to check with @JAllemandou to see how/if we can handle map types in our Refine code. Joseph,... [15:19:37] * addshore search up through logs for the answer to a question he asked in here ages ago.... [15:23:19] [2018-12-06 19:10:16] addshore: TL;DR - total-bytes is ~45G over ~100M unique strings. Usefull-bytes (minus duplication) is 1 order of magnitude smaller: 4G, with ~25M duplicates - The idea of creating an indirection table could be very valuable :) [15:23:20] success! 
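The indirection-table idea quoted in the old log above (~45G of strings, only ~4G of distinct values) amounts to string interning: store each distinct string once and have rows reference it by id. A toy sketch of the idea, not the real table schema being redesigned here:

```python
def build_indirection(values):
    """Deduplicate strings into an id table; rows keep ids, not strings."""
    string_to_id = {}
    rows = []
    for v in values:
        if v not in string_to_id:
            string_to_id[v] = len(string_to_id)  # assign the next id
        rows.append(string_to_id[v])
    return string_to_id, rows

# Four rows, but only two distinct strings need to be stored.
table, rows = build_indirection(["Berlin", "Berlin", "Paris", "Berlin"])
print(table)  # → {'Berlin': 0, 'Paris': 1}
print(rows)   # → [0, 0, 1, 0]
```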
[15:23:55] 10Analytics, 10Knowledge-Integrity, 10Research, 10Epic, 10Patch-For-Review: Citation Usage: run third round of data collection - https://phabricator.wikimedia.org/T213969 (10Ottomata) @Miriam, the eventlogging MySQL stuff in beta is very flaky. I just bounced it there, can you try again? [15:24:51] 10Analytics, 10Operations, 10Research, 10serviceops, and 4 others: Transferring data from Hadoop to production MySQL database - https://phabricator.wikimedia.org/T213566 (10Ottomata) Marco's suggestion of using mwmaint1002 is not a bad idea... [15:25:22] (03PS1) 10Elukey: [WIP] Introduce analytics-mysql [analytics/refinery] - 10https://gerrit.wikimedia.org/r/488473 [15:26:59] (03PS2) 10Elukey: [WIP] Introduce analytics-mysql [analytics/refinery] - 10https://gerrit.wikimedia.org/r/488473 (https://phabricator.wikimedia.org/T212386) [15:30:12] afk for a bit [15:31:50] 10Analytics, 10Analytics-EventLogging, 10EventBus, 10Core Platform Team (Modern Event Platform (TEC2)), and 2 others: Develop a library for JSON schema backwards incompatibility detection - https://phabricator.wikimedia.org/T206889 (10Ottomata) Hm, I'm not sure if we need to try and do what Avro does here.... [15:33:47] 10Analytics, 10Performance-Team (Radar): [Bug] Type mismatch between NavigationTiming EL schema and Hive table schema - https://phabricator.wikimedia.org/T214384 (10Ottomata) Altering the table will fix this going forward, but it might break existing data... [15:34:38] 10Analytics, 10MediaWiki-Vagrant: Kafka in mw vagrant: kafka_broker.keystore.jks has expired - https://phabricator.wikimedia.org/T214593 (10Ottomata) a:03Ottomata Ah! Interesting! I had this problem too but didn't see this error. The keys are generated and added to the vagrant repo manually. I should pro... 
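On the JSON schema backwards-incompatibility detection discussed in T206889 above: one rule such a library would almost certainly need is "adding a required field breaks old producers." A minimal sketch of just that check (the names and shape here are illustrative, not the library's actual API):

```python
def added_required_fields(old_schema, new_schema):
    """Return required fields present in new_schema but not in old_schema.

    Adding a required field is backwards-incompatible: events valid
    against the old schema would fail validation against the new one.
    """
    old_req = set(old_schema.get("required", []))
    new_req = set(new_schema.get("required", []))
    return sorted(new_req - old_req)

old = {"type": "object", "required": ["meta"]}
new = {"type": "object", "required": ["meta", "http"]}
print(added_required_fields(old, new))  # → ['http']
```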
[15:36:08] 10Analytics, 10Wikimedia-Stream: EventStreams returns 502 errors from outside the WMF network - https://phabricator.wikimedia.org/T215013 (10jcrespo) It was the only other information other than the return status that the headers or the content returned. The error only happened outside of the internal network-... [16:01:35] 10Analytics, 10Analytics-EventLogging, 10EventBus, 10Core Platform Team (Modern Event Platform (TEC2)), and 2 others: Develop a library for JSON schema backwards incompatibility detection - https://phabricator.wikimedia.org/T206889 (10Pchelolo) > With JSON, consumers don't really 'read data' with the schem... [16:02:07] a-team: I just woke up and don’t feel well, gonna relax [16:04:46] 10Analytics, 10Analytics-EventLogging, 10EventBus, 10Core Platform Team (Modern Event Platform (TEC2)), and 2 others: Develop a library for JSON schema backwards incompatibility detection - https://phabricator.wikimedia.org/T206889 (10Ottomata) Gr8 :D [16:13:48] milimetric: got that [16:15:27] 10Analytics, 10Analytics-Wikistats: Check wikistats numbers for agreggations for "all-wikipedias" - https://phabricator.wikimedia.org/T189626 (10Nuria) I can explain, but let me unassign this task and move into "for later" we should work on the geowiki task first [16:15:29] 10Analytics, 10Analytics-Wikistats: Check wikistats numbers for agreggations for "all-wikipedias" - https://phabricator.wikimedia.org/T189626 (10Nuria) p:05High→03Normal a:05fdans→03None [16:31:19] (03CR) 10Elukey: "Still very WIP, need to work a bit more on it, will add reviewers when ready! :)" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/488473 (https://phabricator.wikimedia.org/T212386) (owner: 10Elukey) [16:31:36] elukey: ops sync? we can skip if you need to run! (just saw email) [16:31:44] ottomata: I am in bc! [16:31:49] oh k coming! 
[16:31:50] will only need to skip standup [16:45:03] 10Analytics, 10EventBus, 10Parsoid, 10Research, and 5 others: How to surface link changes as a stream? - https://phabricator.wikimedia.org/T214706 (10Jhernandez) [16:51:22] awesome addshore :) [16:51:53] just made this https://usercontent.irccloud-cdn.com/file/gQEYEVPG/image.png [16:52:23] not sure if that is a step too far, or not :P, gonna try it out (once I get access back to the server i was testing on) and see how it goes [16:53:15] addshore: Since I don't know the current schema, can't really say - But denormalization for SQL engines sounds correct to my ears :) [16:53:25] hahaaaa, you don't want to see the current schema [16:53:45] this is it https://usercontent.irccloud-cdn.com/file/zG5FGu6z/image.png [16:53:46] joal: ^^ [16:54:26] Ah - right - and by the way - I meant normalization, not denormalization above ... My bad [16:54:41] yeh, this table is so stupid :P [16:54:47] I'm so used to denormalizing in the hadoop world that I often mix them up [16:56:38] addshore: Your schema looks a lot nicer - but will be more difficult to update/query [16:56:56] yup [16:58:26] but that's what we have machines for ;) to do the complex bit [16:59:01] true! [17:01:07] 10Analytics: Aggregate pageviews to Wikidata entities - https://phabricator.wikimedia.org/T215438 (10Sascha) [17:01:09] * elukey to the doctor! [17:17:46] nuria: by "the geowiki task" you mean this one? https://phabricator.wikimedia.org/T190535 [17:26:51] fdans: yes, sorry i could not be at standup today [17:27:30] nuria: coooool! sounds good [17:27:51] 10Analytics, 10Analytics-Kanban, 10DBA, 10Patch-For-Review, 10User-Banyek: Migrate dbstore1002 to a multi instance setup on dbstore100[3-5] - https://phabricator.wikimedia.org/T210478 (10JAllemandou) Did a quick test, it's working for me (one of them) :) Thanks a million @Marostegui and @elukey [17:34:45] ottomata: do you want us to test the hive-schema precreation for refine?
[17:37:50] joal: that would be awesome [17:37:56] if you have time [17:38:02] will make a task... [17:38:04] ottomata: let's try :) [17:39:30] ottomata: IIRC spark-refine uses the existing hive schema if types are incompatible, right? [17:41:15] 10Analytics, 10EventBus: Spike: Can Refine handle map types if Hive Schema already exists with map fields? - https://phabricator.wikimedia.org/T215442 (10Ottomata) p:05Triage→03Normal [17:41:16] yes.... [17:41:32] maybe it'll do it automagically? [17:41:38] so maybe some stuff in HiveExtensions or DataFrameToDruid will be enough [17:42:03] it would be interesting to know what happens if we read the json data with a spark schema that has a map type in it [17:42:10] Task made: T215442 [17:42:10] T215442: Spike: Can Refine handle map types if Hive Schema already exists with map fields? - https://phabricator.wikimedia.org/T215442 [17:42:18] ottomata: if spark can cast struct to map, then it should not even be needed [17:44:09] aye [17:45:16] ottomata: will drop for dinner time now, then back at it [17:47:33] 10Analytics, 10Knowledge-Integrity, 10Research, 10Epic, 10Patch-For-Review: Citation Usage: run third round of data collection - https://phabricator.wikimedia.org/T213969 (10Miriam) @Ottomata thanks you're the best! I now see the table corresponding to the new version of the schema, but I don't see the ev... [18:10:48] 10Analytics: Clean up home dirs for user mkroetzsch - https://phabricator.wikimedia.org/T214501 (10leila) @elukey I see that the email thread with Markus is moving forward. If you need my help there, let me know. In this case, be aware that Stas may be able to help you with moving the data needed to his home fol...
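On the struct-to-map question above: a Hive/Spark map has a single value type, so a struct can only be treated as map<string, T> when every field shares one type T (keys are always strings). A pure-Python sketch of that precondition, standing in for whatever check would actually live in Refine's code:

```python
def struct_castable_to_map(struct_fields):
    """True if a struct could become map<string, T>.

    struct_fields: dict of field name -> type name, a hypothetical
    stand-in for a real Spark StructType, for illustration only.
    """
    # All field value types must collapse to at most one type T.
    types = set(struct_fields.values())
    return len(types) <= 1

print(struct_castable_to_map({"a": "string", "b": "string"}))  # → True
print(struct_castable_to_map({"a": "string", "b": "long"}))    # → False
```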
[18:12:11] 10Analytics, 10Knowledge-Integrity, 10Research, 10Epic, 10Patch-For-Review: Citation Usage: run third round of data collection - https://phabricator.wikimedia.org/T213969 (10Ottomata) Hm, the consumer is now inserting events, and I see a few in your new table, but I don't know if current ones are coming... [18:13:31] elukey: re databases and tables, do you want to create a norm for these so you can track who owns them more easily? For example, you can say that every table/database created by a user should have their username in the name. [18:14:16] elukey: I'm wondering if you can even enforce this via automatic amending of the names. [18:18:53] joal: looks like denormalize failed [18:19:15] wait... why did it even start... [18:19:58] ... the sqoop for actor/comment hasn't even started [18:20:35] I'm confused, this doesn't have the new requirements: https://hue.wikimedia.org/oozie/list_oozie_coordinator/0070362-181112144035577-oozie-oozi-C/ [18:20:45] and somehow the sqoop from cloud was like 2x faster [18:32:34] actually milimetric, denormalize succeeded, but check failed [18:33:36] milimetric: and indeed sqoop has been a lot faster [18:33:46] joal: yeah, but how could denormalize succeed, the second sqoop starts on the 7th [18:33:54] yessir [18:33:55] something's off. I gotta eat lunch, will look more after [18:34:05] milimetric: currently checking as well [18:38:19] milimetric: I think we've been driven to error by the fact that jobs had been restarted from January [18:38:49] milimetric: the codebase the jobs rely on doesn't have the from-prod dataset dependency [18:39:18] milimetric: will drop the mediawiki-history computed data and restart jobs [18:39:19] that makes sense, I didn't know we restarted since the compat change, weird [18:39:26] yup [18:40:14] ok joal sounds good. I tried to look for the jar version in the config but didn't find it.
I should learn how to do that properly next time [18:41:01] milimetric: I looked at the coordinator config in hue - It contains the exact path for the various files it uses (hdfs://analytics-hadoop/wmf/refinery/2019-01-07T21.16.01+00.00--scap_sync_2019-01-07_0001/oozie/mediawiki/history for instance) [18:41:13] I then checked manually on HDFS for the folder [18:41:50] !log Killing-restarting mediawiki-history related oozie jobs [18:41:52] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [18:41:55] ah, ok [18:48:37] milimetric: and by the way, found a bug in mediawiki-h [18:48:44] mediawiki-history-checker oozie code [18:51:55] (03PS1) 10Joal: Correct mediawiki-history-checker oozie job [analytics/refinery] - 10https://gerrit.wikimedia.org/r/488531 [18:52:08] milimetric: --^ [18:52:31] milimetric, nuria: If you're ok let's merge that, and I'll deploy and restart tonight [18:53:11] The bonus point for deploying tonight is to fix the discrepancy there was between sqooped data from labs and the one sqooped from prod [18:53:21] elukey: to confirm, staging database will be only on one shard of the new db cluster, right? (this is so we know superset could connect to staging db) [18:54:17] leila: hi! having the username prefixed in table names would really be great, but there is no real way to enforce it.. the main issue is that we use the 'research' user for all the people, so I cannot even see who last created/modified a table. I think that the right solution is to introduce proper accounts and deprecate 'research', but it will take a while :) [18:55:16] nuria: correct, it will be on one shard. I am planning to have something like staging-db-analytics.eqiad.wmnet as CNAME, so it is easy to remember/use. Would it be ok?
[18:55:27] the port of course will need to be remembered [18:56:04] elukey: ya, let's make a superset dashboard as an example so others know how to use it, I can help as needed once the cname is ready [18:56:23] elukey: so it is clear that having a dashboard on top of data in staging is also possible [18:58:02] sure [18:59:32] basically https://gerrit.wikimedia.org/r/#/c/operations/dns/+/488535 [19:01:06] joal: thanks, I was about to write the same in the email thread :) [19:01:26] np elukey - Trying to improve on the communication side ;) [19:01:31] but I think that we lost manuel in the Cc [19:01:38] Ah crap [19:01:41] Will forward [19:01:56] super, Aaron seems happy about sqoop [19:02:22] is the table import something that we can prioritize this week? [19:02:34] having it dropped from dbstore1002 would be a game changer for Manuel IIUC [19:03:09] I'll try to have it done yes :) [19:03:15] thanks :) [19:03:28] ping milimetric, nuria about a deploy of refinery tonight for the patch above :) [19:03:42] joal: sorry, in a meeting right now [19:04:05] np nuria - will make a decision with milimetric [19:04:36] joal: will catch up in 30 mins [19:04:40] looking at change joal [19:05:01] 10Analytics, 10Analytics-Kanban, 10User-Elukey: Sqoop staging.mep_word_persistence to HDFS and drop the table from dbstore1002 - https://phabricator.wikimedia.org/T215450 (10elukey) p:05Triage→03High [19:05:07] created--^ [19:06:00] awesome elukey - Thanks! [19:06:55] (03CR) 10Milimetric: [V: 03+2 C: 03+2] "wow, a bug we would only see on the first month of the year, cool. We had a 2019-12 snapshot?
- so close to time travel :)" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/488531 (owner: 10Joal) [19:09:51] joal: I can deploy and restart jobs (always makes me nervous) [19:10:28] milimetric: If you can deploy (refinery only, no need for refinery-source), I'll take care of the restarts [19:10:38] deploying [19:10:58] milimetric: You deploying gives me the time to try to help halfak and elukey with sqoop :) [19:12:13] actually elukey, I don't see staging.mep_word_persistence table on analytics-slave :( [19:12:19] elukey: any idea? [19:12:33] joal: use analytics-store! :P [19:12:40] Maaaaaan [19:12:54] elukey: can you remind me the diff between analytics-slave and analytics-store? [19:13:19] -slave is basically db1108, and it holds the log database, -store is the wiki replicas (dbstore1002) [19:13:28] Ah [19:13:35] And staging db exists [19:13:38] on both [19:13:42] only heck [19:13:56] yes correct, we have a staging db also on -slave but I think it is not used [19:14:06] elukey: plenty of tables in there [19:14:17] there might be some automated thing that uses it [19:17:49] 10Analytics, 10Knowledge-Integrity, 10Research, 10Epic, 10Patch-For-Review: Citation Usage: run third round of data collection - https://phabricator.wikimedia.org/T213969 (10Miriam) Yess it now works after starting a new session on the client. Thanks @Ottomata ! [19:22:29] joal: if you are ok I'd go off for dinner [19:22:45] Please elukey :) [19:22:51] thanks a lot for the mep table <3 [19:22:55] ;) [19:28:17] halfak: quick question for you on mep_word_persistence table if you have a minute [19:28:54] !log deployed refinery [19:28:55] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [19:29:02] \o/ [19:29:04] joal, I'm here [19:29:50] there are 3 fields being tinyint(4) (minor, censored and non_self_censored) - Shall I consider them boolean? [19:30:06] Yes. [19:30:14] Ok - Thanks [19:32:28] milimetric: by refinery deployed, you mean also to hadoop?
[19:33:29] yes joal sorry not clear [19:33:37] np milimetric - Will restart jobs [19:33:45] k, thank you! [19:34:03] Thank you actually :) [19:41:41] ok milimetric, jobs restarted - They are now waiting for actor and comment :) [19:42:01] sweet [19:53:38] 10Analytics, 10Analytics-Kanban, 10User-Elukey: Sqoop staging.mep_word_persistence to HDFS and drop the table from dbstore1002 - https://phabricator.wikimedia.org/T215450 (10JAllemandou) Job started with the command: ` sudo -u hdfs sqoop import \ -D mapred.job.name='sqoop-staging-mep_word_persistence' \... [19:54:04] \o/ [19:54:17] So nice to be able to move these kinds of datasets to HDFS/Hive :) [19:54:53] halfak: Let's hope the job will not break the mysql machine :S [19:55:16] But yes, HDFS is the right place (at least I think it is) for this type of dataset [20:00:57] 10Analytics, 10Analytics-EventLogging, 10Discovery, 10EventBus, and 2 others: Rewrite Avro schemas (ApiAction, CirrusSearchRequestSet) as JSONSchema and produce to EventGate - https://phabricator.wikimedia.org/T214080 (10Ottomata) @bd808, let's discuss your Monolog idea from https://gerrit.wikimedia.org/r/... [20:01:45] joal: here and ready to talk about mw history [20:06:09] nuria: heya - milimetric has deployed refinery for a quick bug fix, and jobs have been restarted [20:07:14] joal: this one? https://gerrit.wikimedia.org/r/#/c/analytics/refinery/+/488531/1/oozie/mediawiki/history/check_denormalize/coordinator.xml [20:07:44] yes [20:10:07] joal: was that not an issue for prior snapshots?
[20:10:49] nuria: the bug only affects the year part of the date, meaning same-year snapshot checks don't fail [20:11:01] nuria: it only fails for january [20:11:07] every year :) [20:12:01] joal: aahahah [20:17:21] 10Analytics, 10Discovery-Search, 10Multimedia, 10Research, and 2 others: Image Classification Working Group - https://phabricator.wikimedia.org/T215413 (10Ramsey-WMF) [20:28:25] Gone for tonight team - ottomata: will continue my tests tomorrow, but I think we'll need to be very careful in terms of allowed structs we'll convert: Every field of the struct needs to be the same ... [20:31:55] joal: that should be doable with validation [20:32:29] wait, every field of the struct? [20:32:43] oh just the types, right? [20:32:48] correct [20:32:49] not the names of the struct fields (map keys) [20:32:50] right ok [20:32:53] that is possible via validation [20:32:56] https://gerrit.wikimedia.org/r/#/c/mediawiki/event-schemas/+/487154/2/jsonschema/test/event/0.0.3 [20:32:58] line 50 [20:33:10] additionalProperties [20:33:15] specifies the type of the value [20:33:17] and keys are always strings [20:33:47] ok ottomata, noted - Will test with that in mind [20:33:48] so, it's possible to have an object value (which will be a struct in Hive), but it will be specified in the schema and validated before it gets to hive [20:33:49] ok cool [20:33:51] thanks joal! [20:40:36] 10Analytics, 10Analytics-EventLogging, 10EventBus, 10Core Platform Team Backlog (Later), 10Services (later): Make schemas use required $schema property with absolute path (not absolute URL) to the schema - https://phabricator.wikimedia.org/T208361 (10Ottomata) Wow @Pchelolo, I just came across https://cl... [20:56:01] 10Analytics, 10Analytics-EventLogging, 10EventBus, 10Core Platform Team Backlog (Later), 10Services (later): Make schemas use required $schema property with absolute path (not absolute URL) to the schema - https://phabricator.wikimedia.org/T208361 (10Pchelolo) Interesting..
I will have to read this first... [21:01:51] ottomata: do we feel it is ok to add an event with about half as much traffic as this one? https://grafana.wikimedia.org/d/000000018/eventlogging-schema?orgId=1&var-schema=VirtualPageView (I think so but want to triple check) [21:17:01] 10Analytics, 10Analytics-EventLogging, 10Discovery, 10EventBus, and 2 others: Rewrite Avro schemas (ApiAction, CirrusSearchRequestSet) as JSONSchema and produce to EventGate - https://phabricator.wikimedia.org/T214080 (10Pchelolo) > If we were to use Monolog, we'd likely want to do it with the aim of conve... [21:26:23] 10Analytics, 10Analytics-EventLogging, 10EventBus, 10Core Platform Team Backlog (Later), 10Services (later): Make schemas use required $schema property with absolute path (not absolute URL) to the schema - https://phabricator.wikimedia.org/T208361 (10Ottomata) Yeah, here `data` is basically the same as o... [21:43:35] 10Analytics, 10Analytics-EventLogging, 10Discovery, 10EventBus, and 2 others: Rewrite Avro schemas (ApiAction, CirrusSearchRequestSet) as JSONSchema and produce to EventGate - https://phabricator.wikimedia.org/T214080 (10bd808) >>! In T214080#4932784, @Ottomata wrote: > @bd808, let's discuss your Monolog i... [22:22:35] elukey: Works great [22:23:01] elukey: Although one thing that's potentially going to trip people up is that if you connect to s2-analytics-replica on port 3313 (should be 3312), it'll connect but you get the s3 wikis [22:23:53] I know that's not what I'm supposed to do, and I understand why it happens (both cnames point to the same IP), but someone somewhere is likely to be confused by that for a while [22:24:10] Not sure that you have a better option because you can't tie port numbers to cnames [22:25:24] I guess maybe each dbstore host could listen on a number of different IPs, one for each sN? I guess that would take up more IP address space, but hopefully we have enough in the 10.* space?
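The port confusion described above comes from the multi-instance layout: every sN CNAME resolves to the same host, and only the port selects the section. Judging from the examples in this log (s2 on 3312, s3 on 3313), the convention appears to be 3310 plus the shard number; a hypothetical helper, assuming that mapping holds, makes it explicit:

```python
def shard_port(section):
    """Guess the MySQL port for a core section on the multi-instance
    dbstore hosts, assuming the 3310 + shard-number convention
    suggested by the log (s2 -> 3312, s3 -> 3313)."""
    if not (section.startswith("s") and section[1:].isdigit()):
        raise ValueError("not a core section: %s" % section)
    return 3310 + int(section[1:])

print(shard_port("s2"))  # → 3312
print(shard_port("s3"))  # → 3313
```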
[22:25:57] Anyway, I don't mean to Monday morning quarterback / second-guess the design for this. If I do what the docs tell me to do, everything works, so yay :) [23:12:47] 10Analytics, 10Analytics-EventLogging, 10Discovery, 10EventBus, and 2 others: Rewrite Avro schemas (ApiAction, CirrusSearchRequestSet) as JSONSchema and produce to EventGate - https://phabricator.wikimedia.org/T214080 (10Tgr) IMO Monolog, and event logging in general, is meant for collecting data. Much of...