[00:28:29] Analytics, DBA, Data-Services, Research, cloud-services-team (Kanban): Implement technical details and process for "datasets_p" on wikireplica hosts - https://phabricator.wikimedia.org/T173511#3530850 (MZMcBride) This task is potentially of interest to me. I previously ran a dumb report with...
[00:53:34] Analytics, DBA, Data-Services, Research, cloud-services-team (Kanban): Implement technical details and process for "datasets_p" on wikireplica hosts - https://phabricator.wikimedia.org/T173511#3739451 (bd808) >>! In T173511#3739381, @MZMcBride wrote: > If this task were implemented, under "Pr...
[02:22:41] (PS1) Milimetric: Add hif.wiktionary to whitelist [analytics/refinery] - https://gerrit.wikimedia.org/r/389658
[02:22:56] (CR) Milimetric: [V: 2 C: 2] Add hif.wiktionary to whitelist [analytics/refinery] - https://gerrit.wikimedia.org/r/389658 (owner: Milimetric)
[07:15:55] Analytics-Kanban, Analytics-Wikistats: Fonts from fonts.googleapis.com on wikistats - https://phabricator.wikimedia.org/T178317#3688129 (Nemo_bis) > this might violate the privacy policy Definitely yes. Thanks for removing them.
[07:30:41] Analytics, Analytics-Wikistats, I18n: Move non-SI prefixes to user- or locale-specific interface - https://phabricator.wikimedia.org/T179906#3739830 (Nemo_bis)
[08:54:50] hello team! eventlogging_cleaner just finished to sanitize data up to the middle of 2016 \o/
[09:31:49] Yay elukey !
[09:41:08] it is difficult to find a table with data to sanitize
[10:23:52] joal: would you mind to log to the staging database on db1108 and see if you can create a table?
[10:24:18] sure elukey
[10:24:36] elukey: however I am not sure about user/procedure
[10:24:48] elukey: Should I use my ldap username, or research?
[10:25:13] In theory both should work
[10:25:27] ok
[10:27:08] elukey: ldap username fails
[10:28:31] joal: can you retry now?
[10:29:05] elukey: worked using research user
[10:29:49] elukey: not for my personal user nope
[10:31:11] are you able to connect to anything on db1108? I checked and you don't seem to have perms for your username
[10:31:44] (sorry for the annoying questions :( )
[10:32:15] elukey: elukey nope, can't even log in mysql
[10:32:34] elukey: don't worry, I'm glad I can help :)
[10:33:12] ahhh okok so it makes sense
[10:33:17] good all works as expected :)
[10:33:24] great !
[10:33:36] so I created a empty staging db on db1108 so people will have a scratchpad
[10:33:54] then we'll need to figure out how/where to migrate the one on db1047 to dbstore1002 or elsewhere
[10:34:05] I'll write everything in the email to engineering
[10:34:10] and ask people to chime in the task
[10:34:11] You rock elukey :)
[10:34:20] \o/
[11:44:16] Analytics-Tech-community-metrics, Developer-Relations (Oct-Dec 2017): Empty or incorrect data on Kibana's "Git-Demographics" dashboard - https://phabricator.wikimedia.org/T171240#3740667 (Aklapper)
[11:44:29] Analytics-Tech-community-metrics, Developer-Relations (Oct-Dec 2017): Empty or incorrect data on Kibana's "Git-Demographics" dashboard - https://phabricator.wikimedia.org/T171240#3458588 (Aklapper) Open>Resolved
[12:34:54] Taking a break a-team
[12:35:14] elukey you're the beeeeest
[12:47:03] fdans: hello Francisco, not sure why but thanks <3
[12:47:16] * elukey going to lunch + errand
[14:16:42] mforns: o/
[14:18:16] whenever you have time I'd like to do some vetting on el tables (db1108)
[14:18:20] el cleaner completed!
[14:36:32] elukey, coool, OK will do today
[14:36:45] elukey, completed completed? until today?
[14:37:10] mforns: nope, until 2016-06-21
[14:37:23] ok, great :]
[14:37:33] I tried to check this morning some tables but most of the times all fields were already nulled
[14:37:40] this way I can test before and after
[14:37:45] aha
[14:38:07] so I didn't find some proof that all is working :D
[14:39:25] Analytics-EventLogging, Analytics-Kanban, Patch-For-Review: Resolve EventCapsule / MySQL / Hive schema discrepancies - https://phabricator.wikimedia.org/T179625#3741250 (Ottomata) @krinkle, I just submitted https://gerrit.wikimedia.org/r/#/c/389713/, let me know what you think. Also, do you use the...
[14:40:07] mforns: but it is impressive that in 3/4 days db1108 crunched this amount of data
[14:40:17] true that we don't have anymore some huge tables
[14:40:33] aha
[14:40:34] but it runs fast anyway :D
[14:40:49] 100k updates with sleep 2 seconds
[14:41:12] yea, that's awesome
[14:43:51] ottomata: o/
[14:44:02] I am rolling restart jumbo for jvm updates
[14:45:10] hiii ok!
[14:50:12] Analytics-EventLogging, Analytics-Kanban: Timestamp format in Hive-refined EventLogging tables is incompatible with MySQL version - https://phabricator.wikimedia.org/T179540#3741320 (Ottomata) > Aye! but MySQL users are not the only user of this data! The performance team folks use it build Grafana dashb...
[14:58:54] Analytics-Kanban, User-Elukey: Restart Analytics JVM daemons for open-jdk security updates - https://phabricator.wikimedia.org/T179943#3741372 (elukey)
[15:06:14] joal: --^
[15:23:54] elukey: not in the list: NameNode and ResourceManager (analytics100[12] - On purpose? already done?)
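The "100k updates with sleep 2 seconds" remark describes a batch-and-pause purge loop. The minimal sketch below shows that pattern. It is not the eventlogging_cleaner code. It uses the stdlib `sqlite3` module as a stand-in for MariaDB, and the table and column names are hypothetical. SQLite lacks `UPDATE ... LIMIT` by default, so each batch is selected with an `IN (SELECT ... LIMIT ?)` subquery. MariaDB does not accept `LIMIT` inside an `IN` subquery, but it does support single-table `UPDATE ... LIMIT n` directly.

```python
import sqlite3
import time


def purge_in_batches(conn, table, columns_to_null, cutoff,
                     batch_size=100_000, sleep_s=2.0):
    """Set non-whitelisted columns to NULL for rows older than `cutoff`, in batches.

    Returns the total number of rows updated. Table/column names are interpolated
    into the SQL, so they must come from trusted configuration, never user input.
    """
    set_clause = ", ".join(f"{col} = NULL" for col in columns_to_null)
    # Only pick rows that still hold something to sanitize; already-nulled rows
    # are skipped, which also guarantees the loop terminates.
    pending = " OR ".join(f"{col} IS NOT NULL" for col in columns_to_null)
    total = 0
    while True:
        cur = conn.execute(
            f"UPDATE {table} SET {set_clause} "
            f"WHERE id IN (SELECT id FROM {table} "
            f"WHERE timestamp < ? AND ({pending}) LIMIT ?)",
            (cutoff, batch_size),
        )
        conn.commit()
        if cur.rowcount == 0:
            break
        total += cur.rowcount
        time.sleep(sleep_s)  # leave room for replication and other queries
    return total
```

Skipping rows that are already NULL matches the observation above that most old rows "were already nulled". Re-running the purge over them is then a cheap no-op.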
[15:26:10] nono I forgot, adding them
[15:27:00] Analytics-Kanban, User-Elukey: Restart Analytics JVM daemons for open-jdk security updates - https://phabricator.wikimedia.org/T179943#3741501 (elukey)
[15:36:32] Analytics-Kanban, User-Elukey: Restart Analytics JVM daemons for open-jdk security updates - https://phabricator.wikimedia.org/T179943#3741522 (elukey)
[15:37:17] ottomata: qq whenever you have time - I'd need to create a druid cluster (even one node) in labs to test my prometheus stuff
[15:37:49] is there anything running by any chance?
[15:37:52] naw
[15:37:56] but should be easy enough
[15:38:21] have fun copy and pasting all those hiera yamls into the right place because I know you love those profile params without defaults so much nannanananabooboooooo :p :)
[15:38:48] * elukey <_<
[15:39:04] :D
[15:39:13] what deep storage should I configure?
[15:39:20] do you need to test hdfs stuff? role::druid::worker
[15:39:23] oops sorry
[15:39:25] first q:
[15:39:29] do you need to test hdfs stuff?
[15:39:30] ?
[15:39:56] nono i need something that pushes metrics via http
[15:40:10] ok, i think if you don't set that stuff
[15:40:16] it'll pick sane defaults in the module
[15:40:20] super
[15:40:28] and I guess I can run a single node right?
[15:40:40] yeah
[15:40:45] you'll need zk
[15:40:47] somewhere
[15:41:07] yep yep
[15:41:12] i think you can leave out setting all the *::properties hiera profile params to empty
[15:41:27] profile::druid::common::use_cdh: false
[15:41:36] ack!
[15:41:56] going to do some unit testing and then I'll try to work on druid in labs
[15:42:13] oo, if you get a zk running locally on the same host
[15:42:19] the druid will default to using localhost:2181
[15:49:26] ottomata: hola
[15:49:52] ottomata: the db on beta cluster might be horked , i am going to reboot mysql
[15:51:12] waait
[15:51:14] i'm in it now
[15:51:41] nuria_: what's the problem?
[15:51:50] ottomata: it does not create tables
[15:51:59] ottomata: there is no popup table
[15:52:11] i deleted the log database yesterday
[15:52:15] because it had totally filled up disk
[15:52:22] do we know there are popup events in beta right now?
[15:52:40] ottomata: i send several yesterday
[15:52:44] ottomata: and you can too
[15:52:46] https://wikitech.wikimedia.org/wiki/Analytics/Systems/EventLogging/TestingOnBetaCluster#How_to_log_a_client-side_event_to_Beta_Cluster_directly
[15:53:00] ottomata: let me see those are valid ones
[15:57:18] i see them coming in, but not going to valid-mixed topic
[15:59:53] Analytics-Kanban, User-Elukey: Restart Analytics JVM daemons for open-jdk security updates - https://phabricator.wikimedia.org/T179943#3741623 (MoritzMuehlenhoff) Note that the hadoop clusters and kafka* are running Java 7 and there hasn't been an openjdk-7 release yet (so also no update in Debian), so a...
[16:00:43] ping milimetric ottomata
[16:01:22] oh
[16:01:47] milimetric: Can you hear us?
[16:01:50] Analytics-Kanban, User-Elukey: Restart Analytics JVM daemons for open-jdk security updates - https://phabricator.wikimedia.org/T179943#3741633 (elukey) Yep, I was planning to keep this open until the new updates will arrive for jdk 7, but if this is too far in the future I'll open a new phab task later o...
[16:06:51] oh nuria duh. Popups is blacklisted from MySQL
[16:29:21] Analytics-Kanban: Geowiki stopped updating on October 24th - https://phabricator.wikimedia.org/T179952#3741723 (Milimetric)
[16:31:27] ottomata: i am a dummy . twice
[17:10:34] Gone for dinner a-team, back after
[17:29:03] Analytics, EventBus, Services (next): Clean up retry-retry Kafka topics - https://phabricator.wikimedia.org/T179958#3741948 (Pchelolo)
[17:49:27] * elukey off!
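The beta puzzle ("i see them coming in", yet no popup table) was resolved by "Popups is blacklisted from MySQL". Valid events can therefore reach the pipeline and still never be written to MySQL. The sketch below shows the general idea of such a consumer-side schema filter. The regex and the function name are hypothetical illustrations, not the actual EventLogging MySQL consumer configuration.

```python
import re
from typing import Dict, Iterable, Iterator

# Hypothetical blacklist; the real MySQL consumer's setting may look different.
MYSQL_SCHEMA_BLACKLIST = re.compile(r"^(Popups)$")


def events_for_mysql(events: Iterable[Dict],
                     blacklist=MYSQL_SCHEMA_BLACKLIST) -> Iterator[Dict]:
    """Yield only events whose schema is NOT blacklisted for the MySQL sink.

    Blacklisted events are dropped silently here, so no table is ever created
    for them, even though they are perfectly valid events.
    """
    for event in events:
        if not blacklist.match(event["schema"]):
            yield event
```

A silent filter like this is why "no table" does not by itself mean "no events". Checking the blacklist before debugging the database can save a MySQL restart.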
[17:50:25] a-team let me know if you want me to make adjustments to the email for engineering@ (About the db refactor), if I don't hear anything I'll send it tomorrow morning :)
[17:50:28] byyyee
[18:20:35] milimetric: i had put the caching dataset here: https://analytics.wikimedia.org/datasets/archive/limn-public-data/caching/
[18:20:54] milimetric: but i think things got moved around, do you know where it could be now?
[18:30:00] nuria_: https://analytics.wikimedia.org/datasets/archive/public-datasets/analytics/caching/
[18:30:02] is that it?
[18:31:03] milimetric: YES!
[18:31:12] :) yay
[18:31:21] Analytics-Kanban: Compile a request data set for caching research and tuning - https://phabricator.wikimedia.org/T128132#3742235 (Nuria) Updated link to data: https://analytics.wikimedia.org/datasets/archive/public-datasets/analytics/caching/
[18:31:27] hey team, anyone is up for some navigationtiming-druid brainbounce? there's some data already in pivot: https://tinyurl.com/ybs7u4z5
[18:32:17] mforns_: i can , let me move places and find a source of power
[18:32:28] k!
[18:36:44] mforns: omw to cave
[18:40:48] anyone know how to fail a hive job in oozie if some partition doesn't exist? I re-deployed one of our repos yesterday but forgot to set the partition for wmf_raw.mediawiki_project_namespace_map appropriately, so while the jobs finished their result was partitions with 0 rows. Ideally i'd like to figure out how to make the hive script fail if the partition doesn't exist
[18:42:59] hmmm, good q ebernhardson not sure hm.
[18:43:54] basically you want fail if 0 results?
[18:43:58] hm
[18:44:03] ottomata: i suppose that would work too
[18:44:06] ebernhardson: not sure, i think maybe
[18:44:13] you'd need to explicitly check if partition exists somehow
[18:44:17] as a separate oozie step
[18:44:21] you can
[18:44:25] show partitions tableName
[18:44:26] in hive
[18:44:27] but
[18:44:28] hm
[18:44:46] ebernhardson: are your oozie jobs launching on existence of partitions after load?
[18:45:16] ottomata: ooh, yea i suppose i could add a dataset requirement
[18:45:27] ottomata: it currently only depends on wmf_raw.cirrussearchrequestset and wmf.webrequest partitions
[18:46:41] ebernhardson: ya that would probably help, if you need partitions to exist before job runs
[18:46:55] probably better to make explicit dataset dependency rather than failing if not exists
[18:53:00] +1 for that approach ebernhardson :)
[18:56:29] what creates wmf_raw.mediawiki_project_namespace_map ? I don't see it in analytics/refinery so not sure what to set for instance variables in oozie. It looks like i need roughly month-1, but not quite exactly
[18:57:12] for example oct was created nov 3, sept was created oct 3, aug was created sept 13
[18:59:01] ebernhardson: wmf_raw.mediawiki_project_namespace_map data gets created from a cron running
[18:59:26] refinery/bin/wmf_raw.mediawiki_project_namespace_map
[18:59:47] joal: ahh ok, thanks!
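The thread settled on an explicit Oozie dataset dependency. The fallback that was mentioned was a separate step that runs `show partitions tableName` in Hive and fails if the partition is missing. The sketch below covers only the parsing half of that fallback. It assumes you already captured the command's output, e.g. from `hive -e "SHOW PARTITIONS wmf_raw.mediawiki_project_namespace_map"`. Hive prints one partition per line as `key1=value1/key2=value2`. The function name is hypothetical. Hive URL-escapes special characters in partition values, and this sketch does not unescape them.

```python
from typing import Dict


def partition_exists(show_partitions_output: str, spec: Dict[str, str]) -> bool:
    """Return True if some line of SHOW PARTITIONS output matches every key in `spec`.

    Each non-empty line is expected to look like `key1=value1/key2=value2`.
    Keys missing from `spec` are ignored, so {"year": "2017"} matches any month.
    """
    for line in show_partitions_output.splitlines():
        line = line.strip()
        if not line:
            continue
        parts = dict(kv.split("=", 1) for kv in line.split("/"))
        if all(parts.get(key) == value for key, value in spec.items()):
            return True
    return False
```

A wrapper step could exit non-zero when this returns False, which makes the Oozie action fail instead of producing 0-row partitions. A coordinator dataset dependency avoids the extra step entirely, which is why it was preferred above.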
[19:00:13] This cron is supposed to run every month, adding a new snapshot - it runs the 3rd of the month, to prevent having big uniques job in parallel with big sqoop jobs
[19:00:18] ebernhardson: --^
[19:00:36] I think for aug we had an issue and reissue it manually :)
[19:01:20] also ebernhardson - Once data is downloaded, we have an oozie job rebuilding the hive table (refinery/oozie/mediawiki/history/load)
[19:01:59] ok i'll figure out how to get oozie + hive to use the new snapshot after the 3rd
[19:02:12] ebernhardson: The above job just run REPAIR on all tables we update (sqooped + namespace_map) and flags them with a _PARTITIONED for later oozie dependencies
[19:02:51] You have refinery/oozie/mediawiki/history/datasets_raw.xml to help with that :)
[19:02:55] ebernhardson: --^
[19:03:46] hey analytics
[19:03:47] I was checking out the status of wikistats 2.0, beautiful work
[19:03:50] ooh, yes! somehow my grep missed that one
[19:03:56] one thing I noticed, IDK if you’re working on it:
[19:04:27] timestamps in the human-readable table view should probably be formatted
[19:04:59] e.g. Unique Devices, Monthly, enwiki – gives
[19:05:00] Month Total
[19:05:01] "2016-01-01T00:00:00.000Z" 661172290
[19:07:20] Hi DarTar, thanks for the feedback :)
[19:07:38] hey joal
[19:07:50] love the overall UX and usability
[19:07:54] DarTar: Would you mind creating a task about that ? fdans and milimetric currently work the UI, so it could be picked up pretty fast :)
[19:08:01] will do
[19:08:07] DarTar: Some effort has been put onto that, yes ;)
[19:08:34] the other thing I noticed is the metric definition, currently linking out to full-text on wiki. Any plans to have tooltips?
[19:08:54] DarTar: I have no clue :)
[19:09:20] joal: I’ll file tasks, you guys can close them as invalid as appropriate
[19:09:35] mforns: nuria_, what happens if i merge the mariadb el whitelist change?
[19:09:42] is anything else needed?
[19:09:46] it's totally safe to merge?
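The timing described above fits a simple rule: the snapshot for month M is added by a cron on the 3rd of month M+1 (October's appeared on Nov 3, September's on Oct 3). Under that rule, the latest snapshot a job can rely on is last month's from the 3rd onward, and the month before that until then. The sketch below computes that value. It assumes the run finishes on the 3rd and that snapshots are named `YYYY-MM`. Both are assumptions to check against `datasets_raw.xml`. The August case (created Sept 13 after a manual rerun) shows the cron can slip, so this complements the dataset dependency rather than replacing it.

```python
from datetime import date


def expected_snapshot(run_date: date, publish_day: int = 3) -> str:
    """Latest monthly snapshot (YYYY-MM) expected to exist on `run_date`.

    Assumes the snapshot for month M is published on day `publish_day` of month M+1.
    """
    year, month = run_date.year, run_date.month
    # Last month's snapshot is available once the publish day has arrived;
    # before that, the newest one is from two months back.
    month -= 1 if run_date.day >= publish_day else 2
    while month < 1:  # wrap across the year boundary
        month += 12
        year -= 1
    return f"{year:04d}-{month:02d}"
```

For example, a job running on 2017-11-06 would read snapshot `2017-10`, while one running on 2017-11-02 would still read `2017-09`.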
[19:09:48] ottomata, no
[19:09:49] I'm pretty sure they'll be given some thoughts DarTar :) Many thanks for that !
[19:10:20] ottomata, yes, it means the next time we execute the purging script, the fields that were removed from the white-list will be also purged!
[19:10:34] ok cool
[19:10:37] merging then
[19:10:40] :]
[19:13:15] joal: is #Analytics-wikistats the right tag?
[19:13:50] DarTar: #Analytics, and #analytics-wikistats please :)
[19:14:12] joal: cool
[19:15:02] DarTar: probably not tooltips for the near term
[19:15:22] DarTar (cc joal): analytics-anything automatically tags analytics, btw, so you can tag intuitively
[19:15:32] DarTar: but yes to date formatting for humans
[19:15:44] Oh I didn't know !!! Thanks milimetric :)
[19:15:52] nuria_ / DarTar we do have a tooltip task that I'd like to get to sooner than later: https://phabricator.wikimedia.org/T177950
[19:17:10] Analytics, Analytics-Wikistats: Format timestamps as human-readable datetimes in Wikistats 2.0 table view - https://phabricator.wikimedia.org/T179971#3742368 (DarTar)
[19:18:58] Analytics, Analytics-Wikistats: Add tooltips for metric definitions in Wikistats 2.0 - https://phabricator.wikimedia.org/T179973#3742396 (DarTar)
[19:19:12] milimetric: oh cool
[19:19:29] I just posted a phab task :)
[19:19:51] ottomata: let me know if you want my very not helpful self helping you test EL changes in beta
[19:19:59] ottomata: after my stellar performance yesterday
[19:25:45] Analytics, Analytics-Wikistats: How to link/cite to Wikistats 2.0 - https://phabricator.wikimedia.org/T179975#3742432 (DarTar)
[19:26:06] alright, done filing wikistats suggestions. ttyl folks
[19:26:25] Analytics, Analytics-Wikistats: How to link to/cite Wikistats 2.0 - https://phabricator.wikimedia.org/T179975#3742445 (DarTar)
[19:26:46] milimetric: I think I have encountered a new edge case for deletions :)
[19:26:50] milimetric: wanna see?
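T179971 asks for human-readable dates in the Wikistats 2.0 table view. For a monthly metric, that means showing "January 2016" instead of the raw `"2016-01-01T00:00:00.000Z"` DarTar pasted. Wikistats 2.0 itself is a JavaScript/webpack front end. The Python sketch below only illustrates the conversion. The function name and the choice of a month-and-year label are assumptions for monthly granularity. `%B` follows the process locale and yields English month names in the default C locale.

```python
from datetime import datetime, timezone


def humanize_month(ts: str) -> str:
    """Turn an ISO-8601 UTC timestamp such as '2016-01-01T00:00:00.000Z' into 'January 2016'."""
    # %f accepts the 3-digit millisecond field; the trailing 'Z' is matched literally.
    dt = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)
    return dt.strftime("%B %Y")
```

Daily metrics would want a day-level format (e.g. `%Y-%m-%d`), so the format string should follow the metric's granularity rather than being hard-coded.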
[19:27:49] on my way joal
[19:30:57] Analytics, Analytics-Wikistats: Beta: How to link to/cite Wikistats 2.0 - https://phabricator.wikimedia.org/T179975#3742449 (Nuria)
[19:54:46] Analytics-Kanban, Performance-Team, Patch-For-Review: Explore NavigationTiming by faceted properties - EventLogging refine - https://phabricator.wikimedia.org/T166414#3742488 (mforns) We encountered a couple difficulties in the way Pivot works versus the nature of NavigationTiming measures: - Navigat...
[19:58:18] Analytics-Kanban: Create scala-spark job to ingest simple data sets from Hive-EventLogging to Druid to Pivot - https://phabricator.wikimedia.org/T179976#3742504 (mforns)
[20:10:58] back is killing me, gonna go see a chiropractor
[20:11:00] oof, can't focus
[20:24:57] Hello! Do we have the application data in hadoop? I was running a query in commonswiki database and used `INSTR()` on a large table. It took more than a week to run and the query got killed finally... :(
[20:27:42] what does application data mean?
[20:29:09] like the data we have in mysql for enwiki, commonswiki, etc, which contains tables like user, page, logging, etc
[20:29:58] https://upload.wikimedia.org/wikipedia/commons/9/94/MediaWiki_1.28.0_database_schema.svg
[20:30:31] Analytics-EventLogging, Analytics-Kanban: Timestamp format in Hive-refined EventLogging tables is incompatible with MySQL version - https://phabricator.wikimedia.org/T179540#3742635 (Ottomata) I talked with Timo and with the A-team, and we decided it'd be best to remove `timestamp` altogether in favor of...
[20:31:06] chelsyx: https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits
[20:32:16] But not all tables are there, right? I need the tables like image, categorylinks...
[20:36:41] chelsyx: no, not all tables are sqooped
[20:37:02] chelsyx: the ones that are are the ones needed to rebuild history
[20:37:17] chelsyx: and image is not one of them
[20:37:36] chelsyx: you can sqoop data though to your user space
[20:38:36] chelsyx: tables sqooped for mw are here: https://github.com/wikimedia/analytics-refinery/blob/master/bin/sqoop-mediawiki-tables
[20:39:10] chelsyx: but sqooping might not be the solution here, what data are you interested in?
[20:41:14] fdans, milimetric : i am going to split off js loading and css loading in webpack ok?
[20:42:36] nuria_: I need to count the number of files by number of categories on commons wiki. This is the query that got killed: https://phabricator.wikimedia.org/P6283
[20:44:32] chelsyx: and this data is not on dumps at all (no idea, just asking)
[20:46:08] nuria_: I'm not sure if it is in dumps... I had a hard time parsing dumps data... sqooping tables to my user space looks like a good solution for me
[20:46:46] nuria_: Thanks!
[20:46:47] chelsyx: sqooping is also not free of hardships so you know
[20:47:02] chelsyx: so i would not rule out dumps
[20:47:45] nuria_: Got it. I will try. Thanks! :)
[21:03:26] Analytics-EventLogging, Analytics-Kanban, Patch-For-Review: Resolve EventCapsule / MySQL / Hive schema discrepancies - https://phabricator.wikimedia.org/T179625#3742753 (Ottomata)
[21:03:34] Analytics-EventLogging, Analytics-Kanban, Patch-For-Review: Resolve EventCapsule / MySQL / Hive schema discrepancies - https://phabricator.wikimedia.org/T179625#3731162 (Ottomata)
[21:10:46] nuria_: kinda want to deploy https://gerrit.wikimedia.org/r/#/c/388255/ tomorrow morning.
[21:10:50] got a +1 for me :D
[21:10:51] ?
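The goal stated above is "count the number of files by number of categories". That is a two-level grouping: first categories per file, then files per category count. It needs no string matching at all. The killed query (P6283) isn't reproduced here, so this is not a rewrite of it. It is a streaming sketch that works on `(cl_from, cl_to)` pairs from MediaWiki's `categorylinks` table (e.g. rows with `cl_type = 'file'` from a dump or a sqooped copy). It also takes the page ids of all files, so files with zero categories are counted too. The function name and input shapes are assumptions.

```python
from collections import Counter
from typing import Iterable, Tuple


def files_by_category_count(links: Iterable[Tuple[int, str]],
                            file_ids: Iterable[int]) -> Counter:
    """Histogram mapping number-of-categories -> number-of-files.

    `links` are (cl_from, cl_to) pairs: page id of the member, category name.
    `file_ids` lists every file page id, so uncategorized files land in bucket 0.
    """
    per_file = Counter()
    for page_id, _category in links:
        per_file[page_id] += 1
    histogram = Counter()
    for fid in file_ids:
        # Counter returns 0 for ids with no category links.
        histogram[per_file[fid]] += 1
    return histogram
```

Grouping by the integer page id avoids scanning text columns, which is usually what makes `INSTR()`-style filters on large tables so slow. In SQL the same shape is a `GROUP BY cl_from` nested inside a `GROUP BY` on the per-file count.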
[21:20:29] ottomata: sounds good, it being a no-op
[21:20:45] ottomata: besides my mostly failed attempts to look at beta i did not do anything
[21:21:03] it's running in beta now, i'm going to deploy one more version there that updates the capsule scid
[21:21:05] with a new capsule version
[21:21:12] ottomata: I checked there were no errors in mysql consumer
[21:21:13] that version will be backwards compatible too
[21:21:20] ottomata: but did not looked at records on db
[21:21:23] ottomata: did you?
[21:21:41] i did yesterday, but not today
[21:21:53] actually yeah, i'm looking at one right now :)
[21:26:38] ottomata: ok , is UA string like it was before ( i imgine so or inserts will fail)
[21:26:42] *imagine
[21:27:16] fdans: do you have a .jscsrc for wikistats you want to share?
[21:28:34] nuria_: it is currently string like before.
[21:28:50] i'm (trying?) to make userAgent in capsule schema a union type of ['object', 'string']
[21:28:52] ottomata: right, just triple checking
[21:29:18] ottomata: that seems like it should not work, does it?
[21:29:35] yeah it does, wiki not letting me save schema atm...
[21:29:40] Value [ "object", "string" ] not in enum for property Schema -> Properties (properties) -> userAgent -> Type (type)
[21:29:42] whatever that means!
[21:30:06] schema is valid though.
[21:37:22] yargh, nuria_ eventlogging extension validates that the schemas conform to a schema
[21:37:29] which is not just the draft 03 jsonschema
[21:37:53] https://github.com/wikimedia/mediawiki-extensions-EventLogging/blob/master/schemas/schemaschema.json
[21:39:16] rats, i think i gotta use type: any
[21:40:24] ottomata: so many things , i actually think that any seems better here
[21:40:57] ottomata: cause object, string and null seems a bit a random bag of types
[21:52:06] ok
[21:52:08] fdans: so once started npm run dev should pick up changes and deploy them to local dist-dev directory right?
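The error above arises because JSON Schema draft 03 permits `type` to be a list (a union) or `"any"`. EventLogging's own schemaschema.json, however, only accepts a single type name from an enum. The sketch below shows how much the two options differ in what they accept for `userAgent`. It is a hand-rolled check of draft-03 `type` semantics for illustration, not the EventLogging or `jsonschema` validator. The dict used as a parsed user agent is a hypothetical example.

```python
from typing import Any, List, Union

# Draft-03 primitive type names mapped to Python types (subset, for illustration).
_JSON_TYPES = {
    "string": (str,),
    "object": (dict,),
    "array": (list,),
    "boolean": (bool,),
    "null": (type(None),),
    "number": (int, float),
    "integer": (int,),
}


def matches_type(value: Any, schema_type: Union[str, List[str]]) -> bool:
    """Check `value` against a draft-03 style `type`: one name, a union list, or 'any'."""
    names = schema_type if isinstance(schema_type, list) else [schema_type]
    for name in names:
        if name == "any":
            return True
        if name in ("number", "integer") and isinstance(value, bool):
            continue  # Python bools are ints, but JSON booleans are not numbers
        if isinstance(value, _JSON_TYPES[name]):
            return True
    return False
```

The union `["object", "string"]` still rejects `null`, while `"any"` accepts every value. Switching to `any` therefore trades validation strictness for compatibility with the schemaschema.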
[21:52:09] new capsule version
[21:52:09] https://meta.wikimedia.org/wiki/Schema:EventCapsule
[21:52:26] https://meta.wikimedia.org/w/index.php?title=Schema%3AEventCapsule&type=revision&diff=17397982&oldid=16619630
[21:52:46] ottomata: ok, if jsonschema swallows that...
[21:53:04] nuria_: yes, it should regenerate bundle.js
[21:53:21] also, EL in beta is running with latest patch and latest schema
[21:53:33] fdans: i bet it does it when you change js files but not the webpack config
[21:53:50] that’s right!
[21:54:19] fdans: ya, i see
[23:30:56] Analytics-Kanban, Patch-For-Review: Remove AppInstallIId from EventLogging purging white-list - https://phabricator.wikimedia.org/T178174#3743156 (JMinor) We don't consent to this change. Before you merge a change which will irrevocably delete data that could be vital to the long term understanding of ou...