[01:59:36] 10Analytics-Kanban: Create small sample mediawiki-history table in MariaDB - https://phabricator.wikimedia.org/T165309#3826835 (10Milimetric) Made this in milimetric.mediawiki_history_sample, just to see the size. I did 2 days of everything and all of history for etwiki. It's about 7 million rows, 2.7 GB. Loo...
[04:58:53] 10Analytics-Data-Quality, 10Reading-analysis: Number of nlwiki (biography) articles getting consistently ~70 hits per day for the past months - https://phabricator.wikimedia.org/T180621#3826894 (10Tbayer) 05Open>03Resolved I ran more queries and confirmed that almost all traffic to those four articles (on...
[05:19:36] 10Analytics-Data-Quality, 10Reading-analysis: Number of nlwiki (biography) articles getting consistently ~70 hits per day for the past months - https://phabricator.wikimedia.org/T180621#3826904 (10Effeietsanders) Thanks. I don't think we have contact yet, but I'll reach out. Seems an interesting party to have...
[09:33:26] Heya team
[09:39:00] 10Analytics, 10Research, 10WMDE-Analytics-Engineering, 10User-Addshore: dbstore1002 (analytics store) enwiki lag due to blocking query - https://phabricator.wikimedia.org/T175790#3827242 (10jcrespo) I think this could be still happening once a week, probably due to some cron job.
[09:39:12] hello!
[09:49:35] elukey: I think I have understood why the streaming failed the other days
[09:49:43] elukey: it's a tricky one, but makes sense
[09:49:43] wooowww
[09:50:08] elukey: spark side - nothing to do with tranquility or druid - we're safe on the idea of the realtime metrics covering our back :)
[09:52:21] Also elukey - all new jobs for mediawiki-history succeeded - I need to document the data quality research I did, and I think we're good to announce :)
[09:52:55] \o/
[09:57:00] elukey: If you want, I think we can think about moving forward with the netflow stream
[09:57:47] sure, if it won't take a huge amount of time it's fine by me
[09:58:31] elukey: the streaming part shouldn't be too hard
[09:58:55] elukey: then there is the batch one, which without being difficult is a bit more work (camus + oozie)
[09:59:36] joal: since it's not urgent, shall we do it next quarter?
[09:59:41] sure
[10:43:04] (03CR) 10Fdans: [C: 032] Add .gitreview [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/395536 (owner: 10Hashar)
[10:45:36] hi teammm!
[10:46:01] wololooooo mforns
[10:46:10] :]
[11:09:09] Hi mforns and fdans :)
[11:09:22] heya :]
[11:16:13] (03CR) 10Hashar: "> we were unaware that our current CI infrastructure had Firefox and Chromium installed" [analytics/wikistats2] - 10https://gerrit.wikimedia.org/r/395537 (owner: 10Hashar)
[11:41:01] * elukey lunch!
[12:01:32] 10Analytics, 10Research, 10WMDE-Analytics-Engineering, 10User-Addshore: dbstore1002 (analytics store) enwiki lag due to blocking query - https://phabricator.wikimedia.org/T175790#3827600 (10Addshore) >>! In T175790#3827242, @jcrespo wrote: > I think this could be still happening once a week, probably due t...
[12:08:37] 10Analytics, 10ChangeProp, 10EventBus, 10MediaWiki-JobQueue, and 4 others: Migrate htmlCacheUpdate job to Kafka - https://phabricator.wikimedia.org/T182023#3827651 (10mobrovac) Switching all small non-WP projects should be a non-brainer, so I'd vote for switching them + `cebwiki` and `ruwiki`. This should...
[12:34:14] joal, yt?
[12:34:17] yup
[12:34:21] What's up?
[12:34:59] joal, I got results for monthly stdev in the bycountry data set, but it surprises me that the bigger the time span, the higher the stdev curve...
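For readers who want to see the shape of the number mforns is describing: below is a minimal sketch of a per-(project, country) monthly roll-up plus a spread measure, run through hive from the shell. The table and columns used here (wmf.pageview_hourly, country_code, view_count) are stand-ins for illustration only; the actual study dataset and queries are not shown in this log.

    # Hedged sketch: monthly totals per (project, country), then the spread across months.
    # wmf.pageview_hourly and its columns are assumptions used purely for illustration.
    hive -e "
      SELECT project, country_code, STDDEV_POP(monthly_views) AS spread
      FROM (
        SELECT project, country_code, year, month, SUM(view_count) AS monthly_views
        FROM wmf.pageview_hourly
        WHERE agent_type = 'user' AND year = 2017
        GROUP BY project, country_code, year, month
      ) t
      GROUP BY project, country_code;
    "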
[12:35:07] intuitively it should be lower
[12:35:30] do you mind looking at my queries to see if they make sense?
[12:35:38] sure mforns
[12:36:15] joal, bc?
[12:36:20] sure
[12:36:24] omw
[12:46:53] mforns: hello. Do you know whether analytics/mediawiki-storage is still being used? :)
[12:47:08] npm install yells at me because of bower
[12:47:09] hashar, I think so
[12:47:10] warning bower@1.8.2: ...psst! Your project can stop working at any moment because its dependencies can change. Prevent this by migrating to Yarn: https://bower.io/blog/2017/how-to-migrate-away-from-bower/
[12:47:11]
[12:47:11] :D
[12:56:31] Hey mforns - sorry for the disruption
[12:56:38] joal, np
[12:57:00] mforns: As I briefly said - if results are the same for monthly as they were for daily, maybe there's no point
[12:57:18] I'd expect that k would be similar, but that we would lose less data
[12:57:35] aha
[12:57:41] joal, maybe...
[12:58:09] if we kept adding more and more data to the study, the variability would at some point cease to grow
[12:58:29] but there's so much variability that we didn't reach that with either daily or monthly
[12:59:06] !log disabled druid middlemanager on druid1002 to drain+restart with new logging config
[12:59:07] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[12:59:42] I mean, the data set is sparse enough that adding more data still adds new buckets, adding more variability
[13:00:42] If we kept incrementing granularity, the data set would at some point not be sparse, and then the stddev would cease growing?
[13:05:55] 10Analytics-Kanban, 10Operations, 10ops-eqiad, 10Patch-For-Review: kafka1018 fails to boot - https://phabricator.wikimedia.org/T181518#3827811 (10elukey) Updated procedure after a chat with @Volans: 1) shutdown notebook1002 2) replace it in DNS and puppet with kafka1023 (caveat: wmf-auto-reimage will need...
[13:06:35] mforns: I don't think I follow - what you mean is that, when adding more months, more (project, country) pairs show up, therefore adding variability?
[13:06:49] yes
[13:12:40] Oh by the way mforns - noticed something on AQS/wikistats - Not sure what to do
[13:12:52] what happened?
[13:13:18] mforns: the activity-levels we use don't match the ones wikistats1 uses: we do 1..4, 5..24, 25..99, 100..
[13:13:32] Wikistats does: 1..5, 6..100, 101..
[13:13:37] :(
[13:13:45] I just noticed why writing the doc
[13:13:50] aha
[13:13:54] s/why/while/
[13:14:26] Since we've not yet launched, I think we can easily change, but it involves many changes: AQS, restbase, front-end
[13:14:42] :/
[13:15:37] if it was for free, what would you do? copy wks1?
[13:16:03] good question mforns
[13:16:21] actually, not sure what the best partition is
[13:16:38] maybe we can check the distribution of editors and decide which one is the best
[13:16:38] neither am I
[13:17:03] mforns: It's a small detail, but it's a diff
[13:17:08] I can try to query that, now that I'm in stats mode
[13:17:10] fdans: do you have an opinion? --^
[13:17:26] mforns: don't even bother, I have it in druid already
[13:17:31] joal, oh ok!
[13:17:38] reading...
[13:20:32] joal: in terms of wikistats 2 usability, I would prefer not to increase the number of activity levels
[13:21:04] because we need to make as many requests as activity levels when breakdowns are on
[13:21:27] fdans: I hear that, question is: should we move from 1..4, 5..24, 25..99, 100.. to 1..5, 6..25, 26..100, 101..
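To make the buckets joal is weighing concrete: the activity level is part of the URL path in the new AQS editors endpoints. The endpoint shape and parameter values below follow how the API was later documented, so treat them as an assumption at this point in the log; a rough sketch of querying each of the current buckets:

    # The four activity-level buckets currently exposed (1..4, 5..24, 25..99, 100..):
    for level in 1..4-edits 5..24-edits 25..99-edits 100..-edits; do
      curl -s "https://wikimedia.org/api/rest_v1/metrics/editors/aggregate/en.wikipedia.org/all-editor-types/all-page-types/${level}/monthly/20171101/20171201"
      echo
    done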
[13:21:43] oh sorry I misunderstood
[13:22:55] joal: hmmm I think we _should_, but we don't have to do it _now_
[13:23:05] hm
[13:23:19] If we decide to do it, I'd go for it NOW
[13:23:25] before the announcement
[13:23:35] Let's ask milimetric
[13:23:39] and possibly nuria_
[13:24:25] ok
[13:24:50] but wouldn't that not match wikistats for the 5+?
[13:25:11] milimetric: I observe small diffs
[13:25:13] That's probably the most important thing to have, 5+, 100+
[13:25:30] milimetric: we completely missed it
[13:25:45] ok, starting to patch AQS and restbase
[13:25:59] No, I mean changing the ranges would make 5+ impossible to calculate, right?
[13:26:13] But as is, you can do it
[13:26:24] milimetric: 5+ is the sum of the 3 others
[13:26:33] Right
[13:26:42] milimetric: and actually it should be named: 6+
[13:26:56] because it's >5
[13:26:57] I'm getting confused
[13:27:12] No, it's 5 or more, by wikistats convention
[13:27:17] milimetric: problem is at the boundaries
[13:28:25] oh, milimetric - thank you for having me check this again - I got confused by the >5 notation
[13:28:34] You're right - sorry for the noise a-team :(
[13:28:56] we can chat in a bit when I'm off baby duty :)
[13:29:05] np milimetric - thanks for the fast answer!
[13:29:16] fdans: sorry again
[13:29:32] nono :)
[13:38:24] (03CR) 10Addshore: Record metrics for Wikidata task priorities (via color) (031 comment) [analytics/wmde/scripts] - 10https://gerrit.wikimedia.org/r/396065 (owner: 10Thiemo Mättig (WMDE))
[13:38:57] (03CR) 10Addshore: [C: 04-1] "Added a comment to PS1, I'll merge this as soon as it has some sort of check / whitelist for the metrics that it submits." [analytics/wmde/scripts] - 10https://gerrit.wikimedia.org/r/396065 (owner: 10Thiemo Mättig (WMDE))
[13:40:37] 10Analytics-Kanban, 10DBA, 10Operations, 10Patch-For-Review, 10User-Elukey: Decommission old dbstore hosts (db1046, db1047) - https://phabricator.wikimedia.org/T156844#3827865 (10Marostegui) 05Open>03Resolved As per our chat, closing this. Thanks for all the hard work you've put to make this happen!
[13:45:59] (03CR) 10Thiemo Mättig (WMDE): "I disagree, or I did not get your point. What problem are you trying to solve?" (031 comment) [analytics/wmde/scripts] - 10https://gerrit.wikimedia.org/r/396065 (owner: 10Thiemo Mättig (WMDE))
[13:48:52] (03CR) 10Addshore: Record metrics for Wikidata task priorities (via color) (031 comment) [analytics/wmde/scripts] - 10https://gerrit.wikimedia.org/r/396065 (owner: 10Thiemo Mättig (WMDE))
[13:56:03] (03CR) 10Thiemo Mättig (WMDE): Record metrics for Wikidata task priorities (via color) (031 comment) [analytics/wmde/scripts] - 10https://gerrit.wikimedia.org/r/396065 (owner: 10Thiemo Mättig (WMDE))
[13:59:36] (03PS3) 10Thiemo Mättig (WMDE): Record metrics for Wikidata task priorities (via color) [analytics/wmde/scripts] - 10https://gerrit.wikimedia.org/r/396065
[14:04:28] ok, reporting for duty
[14:04:46] is everything settled, then, joal?
[14:07:51] !log disable druid middlemanager on druid1003 to drain + restart to pick up new logging settings
[14:07:52] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:08:03] joal: https://grafana.wikimedia.org/dashboard/db/prometheus-druid?panelId=41&fullscreen&orgId=1&from=now-1h&to=now :)
[14:08:28] in ~1h we should have the three redundant streams
[14:11:48] so cool elukey :)
[14:26:34] fdans, can you please pass me the link to your spreadsheet about the correlation of pageviews with edits in the bycountry data set? I will start writing some docs, if that's ok with you
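In the exchange that follows, mforns and fdans discuss redoing the views-edits correlation on monthly, per-project pageview data pulled from Cassandra rather than from a new query. For reference, the same numbers are exposed publicly through the AQS pageviews API; a minimal sketch, where en.wikipedia and the date range are just examples:

    # Monthly pageview totals for one project, user traffic only.
    curl -s "https://wikimedia.org/api/rest_v1/metrics/pageviews/aggregate/en.wikipedia/all-access/user/monthly/2017010100/2017120100"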
[14:26:50] mforns: sure
[14:27:02] :]
[14:27:04] * elukey hates hw raid controllers
[14:28:00] mforns: just sent it to your email
[14:28:21] elukey, hate leads to suffering and to the dark side
[14:28:33] thanks fdans!
[14:30:17] mforns: I blame kafka
[14:30:30] hiiiiii
[14:30:46] hehe
[14:30:52] hey ottomata :]
[14:36:24] a-team, sorry for all the json-refine spam
[14:36:39] i should figure out what to do about that. since the code looks back over the past N hours
[14:36:42] ottomata: looked at it and assumed I'd wait for you :)
[14:36:46] it will retry the same erroneous hour over and over again
[14:36:51] yeah, I just filter it now, until it settles down
[14:36:55] aye
[14:37:05] yeah, no need for yall to respond (yet) :)
[14:37:13] especially since it's just this popups things
[14:37:14] thing
[14:38:33] taking a break a-team
[14:51:07] fdans, what do you think of repeating the views-edits correlation experiment on top of monthly data, now that we know we want to go for monthly? this way we can get a more proper K, no?
[14:52:07] mforns: that makes sense
[14:52:24] ok, I can do that
[14:52:26] but we don't even need to run a query, we can just get pageview data by country from cassandra
[14:52:33] ottomata: o/
[14:52:33] sorry not by country
[14:52:38] by project mforns
[14:53:02] fdans, makes sense!
[14:53:38] fdans, you mean querying cassandra?
[14:53:54] or the API?
[14:54:05] cassandra
[14:54:18] since we're not using country data for this study
[14:54:29] yep
[14:54:49] ok, will do, and also the edit side
[14:55:15] fdans, will put that on the second tab of your sheet
[14:55:26] awesome
[14:58:58] (03CR) 10Addshore: "Meh, I wrote a comment in gerrit and then lost it.." [analytics/wmde/scripts] - 10https://gerrit.wikimedia.org/r/396065 (owner: 10Thiemo Mättig (WMDE))
[15:02:53] ottomata or elukey: I always have problems remembering/finding docs on getting to logs for hadoop jobs. I usually end up here: https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Administration#Logs
[15:03:03] is there another place it's described, so I can link it from here?
[15:04:00] just searched, milimetric, don't see one
[15:04:02] it should be
[15:04:08] yarn logs -applicationId
[15:04:18] but probably not on the Administration page
[15:04:45] yeah I was about to say that
[15:04:53] thanks ottomata, I'll just mention that there, and we can move it wherever
[15:04:59] ok
[15:05:09] mention that it won't work until the job is finished
[15:05:10] or dies
[15:07:00] k, https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Administration#Logs
[15:10:27] milimetric: do you know if it is possible to upgrade a dependency B to version X.Z when the package A that depends on it specifically pins B==X.Y?
[15:10:35] with python pip/setup.py whatever?
[15:10:51] probably not, right? not without patching dependency A to say B==X.Z?
[15:12:03] yeah, if A said maybe B~= or B^= or something, but if it's ==, I think it has to be exact
[15:12:35] yeah
[15:12:46] if it's useful though, it seems like something they'd accept upstream
[15:12:57] at least changing it to >= or something
[15:14:14] ottomata: how do I get the applicationId for https://yarn.wikimedia.org/cluster/app/application_1512469367986_17163?
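To round out the yarn logs pointers above (15:03-15:07): a short sketch of pulling the logs for the application milimetric links, usable once the application has finished or died. The second form quotes the tip that appears further down in the log (15:56); APP_OWNER is a placeholder for whoever launched the job.

    # Works once the application has finished (or died):
    yarn logs -applicationId application_1512469367986_17163

    # If you are not the application owner (see the tip at 15:56 below):
    sudo -u hdfs yarn logs --applicationId application_1512469367986_17163 --appOwner APP_OWNER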
[15:14:30] oh, doh application_1512469367986_17163
[15:17:45] :)
[15:17:59] hmm, milimetric ya maybe
[15:18:10] it's superset: the flask-appbuilder folks merged my patch to auto-create accounts
[15:18:15] wanted to deploy with updated flask-appbuilder
[15:18:18] buuut, superset uses ==
[15:18:52] gotcha, yeah, superset is pretty active, you can probably get them to merge a bump of the == pin to your new version
[15:18:58] hmm yeah maybe..
[15:19:07] k will try
[15:28:17] (03CR) 10Thiemo Mättig (WMDE): [C: 04-1] "Thanks a lot for the insight. I found this very helpful. I can as well implement the actual mapping back from colors to priorities then. I" [analytics/wmde/scripts] - 10https://gerrit.wikimedia.org/r/396065 (owner: 10Thiemo Mättig (WMDE))
[15:39:24] fdans: did you see https://gerrit.wikimedia.org/r/#/c/396469/ and https://gerrit.wikimedia.org/r/#/c/396537/ ?
[15:41:36] ottomata: vk should be ready to be configured with TLS
[15:42:44] (03PS4) 10Thiemo Mättig (WMDE): Record metrics for Wikidata task priorities (via color) [analytics/wmde/scripts] - 10https://gerrit.wikimedia.org/r/396065
[15:45:32] milimetric: not yet, will get to them in a lil bit
[15:48:27] 10Analytics-Kanban, 10Analytics-Wikistats: Privacy pageview threshold for map report - https://phabricator.wikimedia.org/T181508#3828301 (10mforns) a:03mforns
[15:50:53] coOOOOll elukey let's do it! a separate vk instance ya!
[15:50:54] ?
[15:50:59] i should show you how to make a cert :)
[15:51:01] and deploy it
[15:51:03] orr actually
[15:51:06] maybe you can figure it out?
[15:51:18] https://wikitech.wikimedia.org/wiki/Cergen
[15:52:19] ottomata: I just checked the puppet code, a new profile for the vk 'webrequest-tls' instance should be enough.. then we just include it on cache::misc :)
[15:52:35] I can definitely try to figure it out, maybe after the meetings
[15:52:42] today I have been busy with other things :(
[15:54:45] k ya, i have tons of meetings today
[15:54:55] now for the next 3.5 hours :(
[15:56:00] hopefully tomorrow we'll start pushing TLS traffic to jumbo :)
[15:56:05] ayyye :)
[15:56:07] milimetric: something else to add to your doc about hadoop job logs: if you're not the app owner, go for sudo -u hdfs yarn logs --applicationId APP_ID --appOwner APP_OWNER
[15:56:15] that would be so great! we might make that goal after all!
[15:56:24] Sounds super cool elukey !
[15:57:02] ;)
[16:00:36] elukey: standup! ;)
[16:00:37] ping elukey
[16:03:28] (03CR) 10Milimetric: [V: 032 C: 032] Correct SQL comments about moderation checks [analytics/limn-flow-data] - 10https://gerrit.wikimedia.org/r/396558 (owner: 10Catrope)
[16:13:41] 10Analytics-Kanban: Wikistats Beta: split webpack bundle - https://phabricator.wikimedia.org/T182601#3828410 (10Nuria)
[16:13:52] 10Analytics-Kanban: Wikistats Beta: split webpack bundle - https://phabricator.wikimedia.org/T182601#3828419 (10Nuria)
[16:14:09] 10Analytics-Kanban: Wikistats Beta: split webpack bundle - https://phabricator.wikimedia.org/T182601#3828410 (10Nuria) a:03Nuria
[16:26:34] milimetric, I updated RU's docs for re-runs
[16:27:22] (03CR) 10Milimetric: "@Catrope, do you want to delete the old data and re-run all history with this new logic? There's a way to do that in reportupdater, docum" [analytics/limn-flow-data] - 10https://gerrit.wikimedia.org/r/396566 (owner: 10Catrope)
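Referring back to the flask-appbuilder pinning question above (15:10-15:19): pip's behaviour with exact pins versus ranges can be checked directly. A small sketch follows; the version numbers are made up for illustration and are not the versions actually under discussion.

    # An exact pin only ever resolves to that one version:
    #   flask-appbuilder==1.9.6
    # A range or compatible-release specifier leaves room to pick up a newer release:
    #   flask-appbuilder>=1.9.6,<2.0
    #   flask-appbuilder~=1.9.6

    # Installing a newer version alongside an exact pin elsewhere leaves the environment
    # inconsistent, so the pin in superset's requirements has to change upstream (or be patched):
    pip install 'flask-appbuilder~=1.9.6'
    pip check   # reports dependency conflicts between installed packages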
[16:30:04] 10Analytics, 10Analytics-Cluster, 10Operations: stat1004 - /mnt/hdfs is not accessible - https://phabricator.wikimedia.org/T182342#3828452 (10Ottomata) 05Open>03Resolved a:03Ottomata Followed https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Administration#Fixing_HDFS_mount_at_/mnt/h...
[16:34:10] 10Analytics-Kanban: Load test druid backend via siege - https://phabricator.wikimedia.org/T182603#3828472 (10Nuria)
[16:34:26] 10Analytics-Kanban: Load test druid backend via siege - https://phabricator.wikimedia.org/T182603#3828484 (10Nuria) https://wikitech.wikimedia.org/wiki/Analytics/Systems/AQS/Scaling/LoadTesting#Cassandra_Info
[16:34:45] 10Analytics-Kanban: Load test druid backend via siege - https://phabricator.wikimedia.org/T182603#3828485 (10Nuria)
[16:51:03] 10Analytics, 10Discovery, 10Discovery-Analysis, 10Discovery-Search: UDF for language detection - https://phabricator.wikimedia.org/T182352#3821234 (10fdans) How about tagging each request with their language using this library?
[16:54:37] 10Analytics, 10Discovery, 10Discovery-Analysis, 10Discovery-Search: UDF for language detection - https://phabricator.wikimedia.org/T182352#3828540 (10Milimetric) The javascript bindings open an interesting possibility. We can use them to consume and tag a kafka topic, publishing to another topic. We alre...
[16:55:33] 10Analytics-Cluster, 10Analytics-Kanban, 10Patch-For-Review, 10User-Elukey: Enable more accurate smaps based RSS tracking by yarn nodemanager - https://phabricator.wikimedia.org/T182276#3828542 (10fdans) a:03elukey
[17:00:23] 10Analytics-EventLogging, 10Analytics-Kanban, 10Tracking: Update client-side event validator to support (at least) draft 3 of JSON Schema - https://phabricator.wikimedia.org/T182094#3828556 (10fdans)
[17:01:35] 10Analytics, 10Analytics-EventLogging: Unit tests for Event Logging - https://phabricator.wikimedia.org/T86543#3828562 (10fdans) 05Open>03Resolved a:03fdans There are unit tests in EL
[17:03:40] 10Analytics-Kanban, 10Analytics-Wikimetrics, 10Software-Licensing: Add a license file to wikimetrics - https://phabricator.wikimedia.org/T60753#3828566 (10fdans) a:03Milimetric
[17:05:48] o/ nuria_
[17:06:07] I have a member of Wikimedia Austria who wants to work on some statistics of readership
[17:06:13] Would need to do the whole NDA dance.
[17:06:26] The difficulty is that she doesn't have much experience with data analysis.
[17:06:53] Does analytics have capacity to take someone on like this? Or maybe should she file tasks for the analysis she is looking for?
[17:06:59] sorry, in a meeting
[17:07:08] No worries. Can wait until later :)
[17:07:13] sorry halfak, we are at 150% with legal work
[17:07:19] and more to come
[17:07:31] halfak: it will be best to file a task
[17:07:33] Understood. I'm looking for some sort of official response either way.
[17:07:40] halfak: there are custom datasets (1-offs)
[17:07:45] that might be of help
[17:07:55] OK will ask her to describe the data she is looking for..
[17:07:57] Thanks!
[17:09:00] 10Analytics-EventLogging, 10Analytics-Kanban, 10Tracking: Use draft 4 of JSON Schema specification - https://phabricator.wikimedia.org/T46809#3828582 (10fdans)
[17:12:52] halfak: a phabricator task would be best
[17:13:02] (ec) [17:14:47] Maybe https://phabricator.wikimedia.org/T143819 ? [17:15:31] 10Analytics, 10Analytics-EventLogging, 10MediaWiki-API, 10Easy, and 2 others: ApiJsonSchema implements ApiBase::getCustomPrinter for no good reason - https://phabricator.wikimedia.org/T91454#3828608 (10fdans) @Anomie this task is pretty old, is this still going on? [17:28:11] 10Analytics, 10EventBus, 10MediaWiki-API, 10MediaWiki-JobQueue, and 3 others: Handling of structured data input in MediaWiki APIs - https://phabricator.wikimedia.org/T182475#3824471 (10Anomie) > {T56035}: batching thumbnail URL fetches would require submitting filename/height/params triplets. There's no n... [17:31:58] 10Analytics, 10Analytics-EventLogging, 10MediaWiki-API, 10Easy, and 2 others: ApiJsonSchema implements ApiBase::getCustomPrinter for no good reason - https://phabricator.wikimedia.org/T91454#3828654 (10Anomie) The problem seems to be that it has to be very carefully deployed, and the people who can do that... [17:32:14] halfak: ya, give me a sec [17:32:39] halfak: https://phabricator.wikimedia.org/T144714 [17:32:44] milimetric, fdans, nuria_ : doc for review - https://wikitech.wikimedia.org/wiki/Analytics/AQS/Wikistats/Data_Quality [17:33:05] Gone for diner [17:33:21] joal: reading [17:34:46] 10Analytics, 10Analytics-EventLogging, 10MediaWiki-API, 10Easy, and 2 others: ApiJsonSchema implements ApiBase::getCustomPrinter for no good reason - https://phabricator.wikimedia.org/T91454#3828666 (10Anomie) Or the patch at https://gerrit.wikimedia.org/r/#/c/264494/ could be split into two parts: one to... [17:36:44] 10Analytics, 10Research: Make HTML dumps available - https://phabricator.wikimedia.org/T182351#3828671 (10leila) @bd808 thanks for the note. I'll take this offline to discuss who I should reach out to as it's a bit unclear now. :) [17:41:15] Hallo. Is this supposed to work on terbium? [17:41:19] mysql -pblablabla -u research_prod -h db1047.eqiad.wmnet -vvv -e "select blablabla" [17:41:32] (replace "blalblabla" with real password and query) [17:41:48] aharoni: db1047 is long gone, db1108 is the new one (analytics-slave.eqiad.wmnet) [17:44:26] elukey: thanks, seems to work. Just curious: how long is "long"? I'm pretty sure this command worked last week. [17:45:40] aharoni: I'd say two weeks ago but I am not 100% sure [17:45:58] :) [18:01:07] 10Analytics, 10Research: Make HTML dumps available - https://phabricator.wikimedia.org/T182351#3828761 (10leila) @ArielGlenn actually, I overlooked this. Are you the person in charge of this kind of request? :) [18:08:01] 10Analytics, 10Research: Make HTML dumps available - https://phabricator.wikimedia.org/T182351#3828787 (10ArielGlenn) This is meant to be (I think) the dumps of content from RESTBase as stored by parsoid. I know I can't get to this before at least February and maybe later than that. I have a script that was... 
[18:12:50] (03CR) 10Catrope: "@Milimetric, yes that would be great" [analytics/limn-flow-data] - 10https://gerrit.wikimedia.org/r/396566 (owner: 10Catrope)
[18:13:49] k, RoanKattouw, feel free to self-merge that, the sql looks valid and I don't know enough about the logic to comment
[18:14:05] OK will do
[18:14:07] and when you're done, I can back up the old stuff and re-run with the new logic
[18:14:30] (03CR) 10Catrope: [C: 032] "Self-merging because Milimetric told me to" [analytics/limn-flow-data] - 10https://gerrit.wikimedia.org/r/396566 (owner: 10Catrope)
[18:14:58] (03CR) 10Milimetric: [V: 032] Fix definition of "active board", "active topic" [analytics/limn-flow-data] - 10https://gerrit.wikimedia.org/r/396566 (owner: 10Catrope)
[18:15:14] hah thanks sorry, I'm not used to repos where humans are allowed to V+2
[18:15:19] yeah, I figured
[18:49:34] Hey fdans - any comment for me?
[19:14:46] milimetric: Let me know when the reports have (re)generated, because I also need to edit the meta page to expose the new report that I made
[19:15:04] Is this something you have to do manually, or do we just wait for a cronjob to run?
[19:16:05] RoanKattouw: I have to kick it off and then we wait for it to run (might take a while, not sure how fast that sql is)
[19:16:11] OK
[19:16:44] No rush at all, just wanna know when my PM and I should expect to have the new data
[19:20:16] 10Analytics, 10ChangeProp, 10EventBus, 10MediaWiki-JobQueue, and 4 others: Migrate htmlCacheUpdate job to Kafka - https://phabricator.wikimedia.org/T182023#3829073 (10mobrovac) >>! In T182023#3827651, @mobrovac wrote: > Switching all small non-WP projects should be a non-brainer, so I'd vote for switching...
[19:22:56] * elukey off!
[19:33:16] ok, RoanKattouw, the jobs are running, and I saved your old outputs to a folder called rerun-backup, so you can examine differences between the two outputs like this:
[19:33:19] https://www.irccloud.com/pastebin/ZLgvnNFH/
[19:34:23] RoanKattouw: I'll let you know when the job is done, seems to be relatively quick
[19:46:23] k, RoanKattouw, all done, all data is fresh, but it may still take a while for it to rsync to the public folder (no more than an hour though)
[19:46:54] you can see it and the differences on stat1006, and the new created-topics report is finished too
[19:58:20] milimetric: added the last part "Commons and wikidata" to the doc about data quality
[19:59:10] nuria_: If you have a minute, will you merge the clickstream patch? It could be a double-christmas-rainbow between wks2 and clickstream :)
[19:59:17] cool, thanks joal, I am kind of overloaded with things right now, may not get to it today
[19:59:25] no prob
[19:59:29] Thanks milimetric :)
[20:09:10] Thanks! Will take a look
[20:09:29] 10Analytics-Kanban, 10Operations, 10Ops-Access-Requests, 10Patch-For-Review: Get access to geowiki data - https://phabricator.wikimedia.org/T182027#3829238 (10Ottomata) @tbayer you should be able to sudo -u stats on stat1006 now, which will let you view `sudo -u stats ls /srv/geowiki`, etc.
[20:22:45] milimetric: The data looks like I expected, it's on the dashboard already, and my edit to the meta config page to add created-topics worked immediately. Thanks for all your help~
[20:22:46] * !
[20:26:16] cool, np
[20:28:54] 10Analytics-Kanban, 10Operations, 10Ops-Access-Requests, 10Patch-For-Review: Get access to geowiki data - https://phabricator.wikimedia.org/T182027#3829266 (10Tbayer) 05Open>03Resolved It works, thanks!
[20:37:22] halfak: just curious, what kind of readership stats? (perhaps we can point her to existing analysis work)
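The irccloud pastebin above (19:33) is not preserved in this log, so the exact command milimetric shared is unknown. Purely as an illustration, comparing the backed-up and regenerated reportupdater outputs on stat1006 could look something like the following; the directory paths are hypothetical, not the real ones.

    # Hypothetical paths -- the real locations are wherever this reportupdater job writes its output.
    diff -ru /srv/reportupdater/output/rerun-backup/ /srv/reportupdater/output/current/ | less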
[20:37:57] -._o_.-
[20:44:45] (03PS1) 10Jforrester: Revert "Adding renamed tables to sql union statements" [analytics/limn-edit-data] - 10https://gerrit.wikimedia.org/r/397625
[20:44:47] (03PS1) 10Jforrester: Update for new Edit schema version 17520312 [analytics/limn-edit-data] - 10https://gerrit.wikimedia.org/r/397626
[20:44:49] (03PS1) 10Jforrester: Drop support for the Edit_11448630 schema revision [analytics/limn-edit-data] - 10https://gerrit.wikimedia.org/r/397627
[20:44:51] (03PS1) 10Jforrester: Swap archaïc code to use current tables [analytics/limn-edit-data] - 10https://gerrit.wikimedia.org/r/397628
[21:00:15] joal: yes, looking at it now
[21:05:08] gotta go to the doctor, will be working more later
[21:08:14] 10Analytics, 10Phabricator: Create phabricator space for tickets with legal restrictions - https://phabricator.wikimedia.org/T174675#3829340 (10Nuria) We could use this one @Aklapper for upcoming work with NSA lawsuit, could we get it done ? (cc @Ottomata )
[21:13:50] 10Analytics: eqiad: hadoop expansion - FY 2017 / 2018 - https://phabricator.wikimedia.org/T182628#3829350 (10Ottomata)
[21:14:04] 10Analytics, 10Analytics-Cluster, 10Operations, 10hardware-requests: eqiad: hadoop expansion - FY 2017 / 2018 - https://phabricator.wikimedia.org/T182628#3829362 (10Ottomata)
[21:14:48] 10Analytics, 10Analytics-Cluster, 10Operations, 10hardware-requests: eqiad (8): hadoop expansion - FY 2017 / 2018 - https://phabricator.wikimedia.org/T182628#3829369 (10Ottomata)
[21:15:02] 10Analytics, 10Analytics-Cluster, 10Operations, 10hardware-requests: eqiad: (8) Hadoop expansion - FY 2017 / 2018 - https://phabricator.wikimedia.org/T182628#3829350 (10Ottomata)
[21:15:31] 10Analytics, 10Analytics-Cluster, 10Operations, 10hardware-requests: eqiad: (8) Hadoop expansion - FY 2017 / 2018 - https://phabricator.wikimedia.org/T182628#3829350 (10Ottomata)
[21:32:55] 10Analytics, 10Phabricator: Create phabricator space for tickets with legal restrictions - https://phabricator.wikimedia.org/T174675#3829409 (10Nuria) Also ping @ggellerman
[21:34:48] 10Analytics, 10Phabricator: Create phabricator space for tickets with legal restrictions - https://phabricator.wikimedia.org/T174675#3829415 (10Aklapper) a:03Aklapper
[21:35:00] 10Analytics, 10Phabricator: Create phabricator space for tickets with legal restrictions - https://phabricator.wikimedia.org/T174675#3569915 (10Aklapper) 05stalled>03Open p:05Triage>03Normal
[21:59:24] 10Analytics-Kanban, 10Patch-For-Review: Productionize Superset - https://phabricator.wikimedia.org/T166689#3829518 (10Ottomata) Ok, things are pretty decent over at superset.wikimedia.org. Still to do: - Update superset/flask-appbuilder once we get new releases from them. This will allow auto-account creati...
[22:00:09] 10Analytics-Kanban, 10Patch-For-Review: Productionize Superset - https://phabricator.wikimedia.org/T166689#3829520 (10Ottomata) Oh, btw, I added DB connections for analytics-slave / log db, and analytics-store, with passwords stored in puppet, not in superset meta db. You can now query those MySQL databases f...
[22:13:17] Hello! I was trying to sqoop the image, categorylinks, and page_props tables from commonswiki, and I was running `sudo -u hdfs sqoop list-tables --password-file '/user/hdfs/mysql-analytics-research-client-pw.txt' --username research --connect jdbc:mysql://analytics-store.eqiad.wmnet/log`, following https://wikitech.wikimedia.org/wiki/Analytics/Systems/EventLogging/Administration#Dumping_data_via_sqoop_from_eventlogging_to_hdfs
[22:14:19] But my account doesn't seem to be able to connect -- I was asked for a password when running `sudo`, and my LDAP password doesn't work
[22:40:28] 10Analytics, 10Phabricator: Create phabricator space for tickets with legal restrictions - https://phabricator.wikimedia.org/T174675#3829613 (10ggellerman) @Nuria Do you want me to be added to the space or to make sure the ACL gets set up? Thanks!
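On the sqoop question at 22:13: the wikitech example being followed lists tables on the log database; importing specific tables from commonswiki would instead point the JDBC URL at that database and use sqoop import. A hedged sketch, which does not address the sudo permission problem mentioned above; the target directory and mapper count are arbitrary, and it assumes the research credentials can read commonswiki on analytics-store.

    # One import per table; repeat for categorylinks and page_props.
    sudo -u hdfs sqoop import \
      --connect jdbc:mysql://analytics-store.eqiad.wmnet/commonswiki \
      --username research \
      --password-file /user/hdfs/mysql-analytics-research-client-pw.txt \
      --table image \
      --target-dir /tmp/commonswiki/image \
      --num-mappers 1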