[00:15:26] 10Quarry, 10Patch-For-Review: Excel does not recognize Quarry CSV output as UTF-8 - https://phabricator.wikimedia.org/T76126#3484849 (10IKhitron) @zhuyifei1999, thank you very much! I don't have Excel today, so I asked somebody to check. He is absolutely happy.
[00:21:10] 10Quarry: Quarry's indentation function is not completely functional - https://phabricator.wikimedia.org/T101424#1338373 (10zhuyifei1999) https://github.com/codemirror/CodeMirror/issues/1942 suggests using `CodeMirror.extendMode("sql", {electricChars: ")"});`
[00:31:23] 10Quarry: Add an option to export result in Wikilist - https://phabricator.wikimedia.org/T137268#3484860 (10zhuyifei1999) >>! In T137268#2808805, @Dvorapa wrote: > I know this method, but it is super complicated to do it this way, then export results e.g. in excel format, then get everything from excel format in...
[00:31:36] 10Quarry: Add an option to export result in Wikilist - https://phabricator.wikimedia.org/T137268#3484861 (10zhuyifei1999) p:05Triage>03Low
[00:37:50] 10Quarry: Quarry runs thousands times slower in last months - https://phabricator.wikimedia.org/T160188#3091603 (10zhuyifei1999) This task is essentially "Invalid" unless some clear steps-to-reproduce are provided.
[00:40:06] 10Quarry: Investigate redash.io (open source query and report system) - https://phabricator.wikimedia.org/T131651#3484867 (10zhuyifei1999)
[00:40:08] 10Quarry, 10Cloud-Services: Consider moving Quarry to be an installation of Redash - https://phabricator.wikimedia.org/T169452#3484870 (10zhuyifei1999)
[00:48:53] 10Quarry: Quarry runs thousands times slower in last months - https://phabricator.wikimedia.org/T160188#3484872 (10IKhitron) How can I do this? I know that it happens, but can't bring you the same run before and after without a time machine.
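On the Excel/UTF-8 ticket above (T76126): Excel only auto-detects UTF-8 in a CSV file when it starts with a byte-order mark. The log doesn't say which fix landed, but the usual approach can be sketched with Python's `utf-8-sig` codec (the function name here is hypothetical, not Quarry's actual code):

```python
import csv

def write_csv_for_excel(rows, path):
    # "utf-8-sig" prepends the UTF-8 byte-order mark (EF BB BF),
    # which is what Excel keys on to auto-detect UTF-8 instead of
    # falling back to a legacy 8-bit codepage.
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        csv.writer(f).writerows(rows)

# The encoding itself, demonstrated in memory:
data = "café,naïve\n".encode("utf-8-sig")
```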
[00:56:21] 10Quarry: Quarry's indentation function is not completely functional - https://phabricator.wikimedia.org/T101424#3484877 (10Huji) 05Open>03Resolved a:03Huji
[01:00:58] 10Quarry: Quarry should store the query runtime along with the results - https://phabricator.wikimedia.org/T172082#3484883 (10Huji)
[01:04:02] 10Quarry: Quarry should store the query runtime along with the results - https://phabricator.wikimedia.org/T172082#3484904 (10zhuyifei1999)
[01:04:04] 10Quarry: Include query execution time - https://phabricator.wikimedia.org/T126888#3484907 (10zhuyifei1999)
[01:07:27] 10Quarry: Include query execution time - https://phabricator.wikimedia.org/T126888#3484909 (10Huji)
[01:09:19] 10Quarry: Quarry's indentation function is not completely functional - https://phabricator.wikimedia.org/T101424#3484910 (10Huji) 05Resolved>03Open @zhuyifei1999 unsure if it should take effect in minutes or not, but I just checked now and it didn't solve the problem on https://quarry.wmflabs.org/ yet. Re-op...
[01:24:34] 10Quarry: Some querries cannot be 'unstarred' - https://phabricator.wikimedia.org/T165169#3258930 (10zhuyifei1999) Issues: * The table `star` allows duplicates * On duplicate, [[https://github.com/wikimedia/analytics-quarry-web/blob/7dd8c60973fd03692491877fea2bec8f9acb2987/quarry/web/app.py#L159|the button will...
[01:51:06] 10Quarry: Quarry's indentation function is not completely functional - https://phabricator.wikimedia.org/T101424#3484937 (10zhuyifei1999) Works for me. Have you cleared your browser cache?
[01:58:32] 10Quarry: Quarry's indentation function is not completely functional - https://phabricator.wikimedia.org/T101424#3484938 (10zhuyifei1999) My test is: press `(`, enter, `)`
[03:07:04] 10Quarry: Some long queries give no results - https://phabricator.wikimedia.org/T109016#1537760 (10zhuyifei1999) Probably because result loading takes time.
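On the star/unstar ticket above (T165169): the comment notes that the `star` table allows duplicate rows, which is why unstarring misbehaves once a pair is starred twice. One way to rule out that whole class of bug at the schema level — a sketch with hypothetical column names, not Quarry's actual schema or fix — is a UNIQUE constraint plus an idempotent insert:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical columns; the real Quarry schema may differ.
# The UNIQUE constraint is the point: a (user, query) pair can only
# be starred once, so a single DELETE always fully unstars it.
conn.execute("""
    CREATE TABLE star (
        user_id  INTEGER NOT NULL,
        query_id INTEGER NOT NULL,
        UNIQUE (user_id, query_id)
    )
""")
# INSERT OR IGNORE makes repeated "star" clicks idempotent
# instead of piling up duplicate rows.
conn.execute("INSERT OR IGNORE INTO star VALUES (1, 42)")
conn.execute("INSERT OR IGNORE INTO star VALUES (1, 42)")
count = conn.execute("SELECT COUNT(*) FROM star").fetchone()[0]
```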
[03:51:24] 10Quarry: Quarry query in unknown state - https://phabricator.wikimedia.org/T170464#3432141 (10zhuyifei1999) The logs are unfortunately lost :(
[09:44:35] 10Quarry: Gigantic query results cause a SIGKILL and the query status do not update - https://phabricator.wikimedia.org/T172086#3485418 (10zhuyifei1999) Oops, sorry, forgot to add the tags
[10:24:05] 10Quarry: Gigantic query results cause a SIGKILL and the query status do not update - https://phabricator.wikimedia.org/T172086#3485554 (10zhuyifei1999) Recent OOMs: ``` zhuyifei1999@quarry-runner-01:~$ zcat /var/log/messages*.gz | cat /var/log/messages* - | grep oom | grep python2.7 Jul 21 05:11:05 quarry-runne...
[10:28:07] hmm E_TOOMANYCHANNELS at least until I get bouncer set up properly...
[10:32:20] 10Quarry: Gigantic query results cause a SIGKILL and the query status do not update - https://phabricator.wikimedia.org/T172086#3485563 (10zhuyifei1999) Uh, ``` zhuyifei1999@quarry-runner-01:~$ zcat /var/log/messages*.gz | cat /var/log/messages* - | grep oom | grep invoked Jul 27 12:34:04 quarry-runner-01 kernel...
[10:36:17] 10Quarry: Quarry query in unknown state - https://phabricator.wikimedia.org/T170464#3485566 (10zhuyifei1999) Probably not an OOM as in T172086. ``` MariaDB [quarry]> select * from query where id = 18832; +-------+---------+-------------------+---------------+---------------------+-----------+-----------+--------...
[10:41:03] 10Quarry: Quarry query in unknown state - https://phabricator.wikimedia.org/T170464#3485572 (10zhuyifei1999) Ok, I forked the query to https://quarry.wmflabs.org/query/20623 and reproduced the issue: ``` zhuyifei1999@quarry-runner-01:~$ grep 33b3c878-4f00-46a1-ad8c-a39ef615872f /var/log/syslog -B 1 -A 11 Jul 31...
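The `zcat /var/log/messages*.gz | cat /var/log/messages* -` pipeline above concatenates the rotated (gzipped) and current syslog files before grepping for OOM lines. The same merge-then-filter idea can be sketched in Python — the helper names are illustrative, not anyone's actual tooling:

```python
import gzip
import glob

def read_all_logs(pattern="/var/log/messages*"):
    """Yield lines from rotated (gzipped) and current log files,
    mirroring `zcat messages*.gz | cat messages* -`."""
    for path in sorted(glob.glob(pattern)):
        opener = gzip.open if path.endswith(".gz") else open
        with opener(path, "rt", errors="replace") as f:
            yield from f

def oom_lines(lines, needle="python2.7"):
    # Equivalent of `... | grep oom | grep python2.7`
    return [line for line in lines if "oom" in line and needle in line]

# Synthetic sample standing in for real syslog content:
sample = [
    "Jul 21 05:11:05 quarry-runner-01 kernel: python2.7 invoked oom-killer\n",
    "Jul 21 05:11:06 quarry-runner-01 kernel: unrelated message\n",
]
hits = oom_lines(sample)
```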
[10:43:23] 10Quarry: Quarry cannot store results with `table_a.column_name` and `table_b.column_name` in the same result - https://phabricator.wikimedia.org/T170464#3485577 (10zhuyifei1999)
[10:51:14] 10Quarry: Quarry cannot store results with `table_a.column_name` and `table_b.column_name` in the same result - https://phabricator.wikimedia.org/T170464#3485627 (10zhuyifei1999) a:05awight>03zhuyifei1999 Reproduced on vagrant with: ```lang=sql SELECT query.id, query_revision.id FROM query, query_revision WH...
[11:08:34] 10Quarry: Quarry cannot store results with `table_a.column_name` and `table_b.column_name` in the same result - https://phabricator.wikimedia.org/T170464#3485676 (10zhuyifei1999) ``` MariaDB [quarry]> SELECT query.id, query_revision.id -> FROM query, query_revision -> WHERE query.id = 1 -> AND query_...
[11:26:13] 10Quarry: Quarry cannot store results with identical column names - https://phabricator.wikimedia.org/T170464#3485732 (10zhuyifei1999)
[11:44:46] 10Quarry: Quarry cannot store results with identical column names - https://phabricator.wikimedia.org/T170464#3485792 (10zhuyifei1999) a:05zhuyifei1999>03None Some background for anyone who can cleanly resolve this: Results are stored in SQLite tables.
[12:40:33] 10Quarry: Quarry's indentation function is not completely functional - https://phabricator.wikimedia.org/T101424#3485978 (10Huji) 05Open>03Resolved It was a cache issue. Good catch! (No pun intended).
[12:41:57] 10Quarry: Make a Quarry automatically refresh on a set time interval - https://phabricator.wikimedia.org/T141698#3485985 (10zhuyifei1999)
[12:42:00] 10Quarry: Recurring queries - https://phabricator.wikimedia.org/T101835#3485988 (10zhuyifei1999)
[13:06:22] tizianop!
[13:06:26] I have data for you :)
[13:06:44] \o/
[13:07:06] thank you!
[13:07:17] No problem :) So, it looks like /srv directories don't get sync'd between the stat machines.
[13:07:40] I have the data on stat1006. Would that work for you or should I copy it to 1005?
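T170464 above boils down to SQLite refusing duplicate column names when Quarry materializes a result set as a table: `SELECT query.id, query_revision.id` produces two result columns both named `id`. A minimal reproduction of the failure, plus one possible mitigation (suffixing repeated headers before `CREATE TABLE` — a hypothetical fix, not necessarily what Quarry adopted):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The failure: two result columns both named "id".
error = ""
try:
    conn.execute('CREATE TABLE results ("id", "id")')
except sqlite3.OperationalError as e:
    error = str(e)  # SQLite reports "duplicate column name: id"

# One possible mitigation: make headers unique before creating the table.
def dedupe(headers):
    seen = {}
    out = []
    for h in headers:
        n = seen.get(h, 0)
        seen[h] = n + 1
        out.append(h if n == 0 else "%s_%d" % (h, n))
    return out

cols = dedupe(["id", "id"])
conn.execute(
    "CREATE TABLE results2 (%s)" % ", ".join('"%s"' % c for c in cols)
)
```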
[13:08:53] I've never used the internal infrastructure, let me check if I can connect
[13:12:03] halfak how big is the dataset?
[13:12:31] 601M on top of the 3.5G of the old dataset
[13:12:41] Merging them will give you an up-to-date dataset
[13:12:46] so ~4GB
[13:13:25] I just asked, it would be better on 1005
[13:14:25] OK. Working on it.
[13:14:38] We really need a better way to xfer data between the machines.
[13:15:15] In the meantime, you'll probably want to wget https://analytics.wikimedia.org/datasets/archive/public-datasets/all/wp10/20160801/enwiki-20160801.monthly_scores.tsv.bz2 to wherever you want it.
[13:15:34] I'll get the patch dataset in a place you can pull from.
[13:16:16] One note is that the 20160801 will contain articles that have been deleted since then. However, all new articles (even the ones that were undeleted) will show up in the patch dataset.
[13:18:29] actually, could you maybe copy the new data to a publicly available server? we realized that copying data from WMF to EPFL might give us VPN problems...
[13:18:47] That URL is publicly available.
[13:18:53] The next one will also be :)
[13:19:00] ok great
[13:19:33] so you're copying the new data to the same kind of URL now as the old one?
[13:20:33] Yup
[13:20:56] Not sure how long it will take the rsync to fire.
[13:21:00] great, thanks!
[13:21:01] It's on autopilot
[13:21:10] You'll see the dataset here: https://analytics.wikimedia.org/datasets/archive/public-datasets/all/wp10/20170701/
[13:22:15] The rows you want will have timestamp of "20170201000000", I guess.
[13:23:45] ok
[13:35:32] 10Quarry: Include query execution time - https://phabricator.wikimedia.org/T126888#3486103 (10zhuyifei1999) See also T77941
[13:43:56] 10Quarry: Add SHOW EXPLAIN support to Quarry - https://phabricator.wikimedia.org/T146483#3486149 (10zhuyifei1999) a:03zhuyifei1999 I'll try the button implementation, and show the button when the status is "running", and not store the results.
Sometimes when you show explain too early the query plans of subque...
[14:04:54] tizianop, https://analytics.wikimedia.org/datasets/archive/public-datasets/all/wp10/20170701/enwiki-20160801-20170701.monthly_scores.tsv.bz2
[14:04:59] Looks like that is available now.
[14:05:57] perfect, thanks!
[14:45:08] halfak: thanks for the quality scores.
[14:45:16] hth!
[14:45:34] I'll have that figshare entry updated shortly so you can cite that.
[15:18:10] leila & tizianop, BTW, https://figshare.com/articles/Monthly_Wikipedia_article_quality_predictions/3859800 has been updated with the new data.
[15:18:16] Note the new DOI for the data is https://doi.org/10.6084/m9.figshare.3859800.v4
[15:32:41] halfak: DOI acknowledged.
[15:47:37] halfak: do you have word count (some measure of article length) as part of your features? (I can't spot it in the data, but I'm guessing you have this data somewhere)
[15:48:13] https://ores.wikimedia.org/v3/scores/enwiki/781154471/wp10?features
[15:48:17] yes
[15:52:22] I see, content_chars, halfak?
[15:52:38] and do you have this dataset sitting somewhere, or shall we hit the API? :)
[15:53:04] Oh... Um.. If you do a lot of requests for this data, then you'll probably knock ORES over.
[15:53:12] When you request features, you skip the cache
[15:53:20] I see.
[15:53:23] let's not do that. ;)
[15:53:25] How many revisions do you want?
[15:53:39] A good proxy for this is the rev_len field in the revision table
[15:53:53] if the data is sitting somewhere, I'll just make a copy of it, halfak.
[15:54:13] halfak: but that's revision length which is not the full article, right?
[15:54:14] "the data"???
[15:54:32] It is the full article as of revision -- which is why it is a good proxy
[15:54:35] "the data" you feed into the model/algorithm to do the prediction.
[15:54:45] You want my training set?
[15:54:46] oh I see, re revision.
[15:55:42] well, training set is a subset of the data I'm referring to.
You have a training set, and you have the complement of the training set, which has the rest of the data points your algorithm will predict classes for. I'm looking for the full set perhaps.
[15:55:50] rev_len may well do it though.
[15:56:12] full set == all revisions
[15:56:16] right.
[15:56:45] (and I'm assuming all count(revisions) == count(articles))
[16:08:12] HaeB: question about webrequest logs. do we register section requests as part of webrequest logs? (from mobile or desktop platforms)
[16:36:00] 10Quarry, 10Patch-For-Review: Add SHOW EXPLAIN support to Quarry - https://phabricator.wikimedia.org/T146483#3486857 (10zhuyifei1999) 05Open>03Resolved Should be fixed. After cleaning up browser cache there should be an "Explain" button just after "This query is currently executing...".
[16:37:53] answered in #wikimedia-analytics (see also https://meta.wikimedia.org/wiki/Research:Which_parts_of_an_article_do_readers_read)
[16:57:51] leila, sec. I'll see what will work for you.
[16:57:55] what lang(s)?
[18:50:39] 10Quarry: Weird race condition makes query stuck in queued forever - https://phabricator.wikimedia.org/T172143#3487252 (10zhuyifei1999)
[19:06:28] 10Quarry, 10Data-Services: Long-running Quarry query (querry?) produces strangely incorrect results - https://phabricator.wikimedia.org/T135087#3487373 (10zhuyifei1999)
[19:47:20] OMG OMG OMG
[19:47:26] You can "explain" inside of quarry now
[19:47:28] OMG
[19:47:29] OMG
[19:47:32] :D!!!!
[19:48:14] I'm filled with surprise and delight.
[19:57:05] halfak: that was for en. (sorry, had to run around and just saw your message)
[19:57:39] No worries. I think that the page table has a length field that will cover the most recent revision.
[19:58:03] got it, halfak. /me looks into the page table.
[19:58:05] https://quarry.wmflabs.org/query/20643
[19:58:20] page_latest == most recent revision_id
[19:58:31] page_len == byte length of the most recent revision
[19:58:49] halfak: is page_len excluding templates?
[19:58:50] In the case of this example, the length is 180KB
[19:58:55] It is.
[19:59:03] ok, great. thanks, halfak
[19:59:20] It doesn't account for encoding at all. Just length in bytes. If you're comparing within-wiki, it's a great proxy for word and char length
[19:59:41] yeah, that's what we're doing for now. great.
[19:59:44] But across wikis it gets weird because ASCII-range chars are mostly one byte and non-ASCII gets into multi-byte chars fast
[19:59:49] kk awesome
[20:31:45] I have a bunch of Wikipedia edits. What's the easiest way to find out how many of them were reverted?
[20:53:52] FWIW, the channel topic has a bad URL to the IRC logs, on Safari it needs to use URL encoding: https://wm-bot.wmflabs.org/logs/%23wikimedia-research/
[20:57:04] hehe
[20:57:28] thanks awight
[20:57:36] likewise!
[20:58:10] o/ AbbeyRipstra
[20:58:18] Would it be OK if we skipped the 1:1 today?
[20:58:25] I have some work I really want to push on
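halfak's byte-vs-character caveat about `page_len` above is UTF-8's variable-width encoding at work, and is easy to see directly in Python:

```python
# page_len / rev_len count UTF-8 bytes, not characters.
ascii_text = "article"   # ASCII: one byte per character
latin_text = "café"      # é takes two bytes in UTF-8
cjk_text = "百科事典"     # CJK characters take three bytes each

ascii_bytes = len(ascii_text.encode("utf-8"))  # 7 chars, 7 bytes
latin_bytes = len(latin_text.encode("utf-8"))  # 4 chars, 5 bytes
cjk_bytes = len(cjk_text.encode("utf-8"))      # 4 chars, 12 bytes
```

Within one wiki the chars-per-byte ratio is roughly stable, so byte length is a fine proxy; across wikis with different scripts the ratio shifts by up to 3x, which is exactly the "gets weird" in the log.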