
Description

CommemorativeCoin-Assistant is a repository of helper scripts for subscribing to commemorative coins through banks’ websites. The scripts speed up the subscription process, in particular by automatically filling in the required information, including name, phone number and ID card number.

Files

├── BankofChina.js: script for Bank of China (AKA: BOC)
├── ChinaConstructionBank.js: script for China Construction Bank (AKA: CCB)
└── README.md: instructions (this document)

Environment

  • Google Chrome or Firefox
  • Tampermonkey (required for running the script on CCB’s website)

HOW-TO

CCB

  1. Edit ChinaConstructionBank.js: replace the example info with the subscribers’ information and modify the exchange date (a hypothetical sketch of this edit follows the list below).

  2. Create a new script in Tampermonkey and paste the edited copy into it.

  3. Open CCB’s subscription website; the script will run automatically.

  4. Click a name on the right and that subscriber’s credential number and mobile number will be inserted into the corresponding fields automatically.
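
A hypothetical sketch of the step-1 edit is shown below. The actual layout of ChinaConstructionBank.js is not reproduced here, so the identifiers subscribers and exchangeDate are illustrative only and may differ from the script’s real variable names.

// Hypothetical example only -- the real ChinaConstructionBank.js may use
// different variable names and a different structure.
const subscribers = [
    { name: "张三", idCard: "110101199001011234", phone: "13800000000" },  // placeholder data
    { name: "李四", idCard: "110101199202022345", phone: "13900000000" }   // placeholder data
];

// The exchange (pick-up) date shown on the subscription form.
const exchangeDate = "2024-01-20";  // placeholder date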

BOC

  1. Edit BankofChina.js: replace the example info with the subscribers’ information, then collapse all of the code onto a single line (see the bookmarklet sketch after this list).

  2. Open Chrome or Firefox, create a page bookmark on the bookmarks bar, paste the one-line script into the URL field (Chrome) or Location field (Firefox), and save.

  3. Open BOC’s subscription website and click the bookmark on the bookmarks bar. And …

  4. Click a name on the right and that subscriber’s credential number and mobile number will be inserted into the corresponding fields automatically.
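
The one-line script from step 1 runs as a bookmarklet, so it must be prefixed with javascript: when pasted into the bookmark’s URL/Location field. The pattern below is a minimal sketch; the alert body is only a placeholder for the collapsed BankofChina.js code.

javascript:(function(){ /* paste the one-line BankofChina.js code here */ alert("bookmarklet ran"); })();

When collapsing the script onto one line, remove or convert any // comments into /* */ form, otherwise everything after the first comment is commented out.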

Attention: according to the risk-control rules, it is best not to use ONE phone number for more than 5 subscriptions.

Since Weibo does not provide a batch-delete feature, a simple Tampermonkey script can be run on your personal Weibo page to delete the posts on the page one by one. To do so, configure the following code in the Tampermonkey extension.

'use strict';

// Inject jQuery so the selectors below are available on the page.
var s = document.createElement("script");
s.setAttribute("src", "https://lib.sinaapp.com/js/jquery/2.0.3/jquery-2.0.3.min.js");
s.onload = function(){
    // A manually refreshed Weibo page loads at most 40 posts by default.
    for(var i = 0; i <= 40; i++){
        // Run once every 800 ms; shorter intervals greatly increase the error rate and can even throw errors.
        setTimeout(function(){
            $('a[action-type="fl_menu"]')[0].click();   // open the post's drop-down menu
            $('a[title="删除此条微博"]')[0].click();     // click "delete this post"
            $('a[action-type="ok"]')[0].click();        // confirm the deletion dialog
        }, 800*i);
    }
};
document.head.appendChild(s);

While editing, make sure the matching URL is configured correctly: the @match field should match the URL of your personal Weibo page. When the script is finished, save it with Ctrl + S.
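
For reference, a minimal Tampermonkey metadata block might look like the one below; the @match URL is only a placeholder and must be replaced with the URL of your own Weibo profile page.

// ==UserScript==
// @name         weibo-batch-delete
// @namespace    http://tampermonkey.net/
// @version      0.1
// @description  Delete the Weibo posts on the current page one by one
// @match        https://weibo.com/u/1234567890*
// @grant        none
// ==/UserScript==

The script body shown above goes below this header in the script editor.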

Then switch the script to enabled in the Tampermonkey dashboard. Go back to your personal Weibo page and refresh it to start the batch deletion. To stop, close the page, or disable Tampermonkey and reload the page.

Repo link: rnn_lstm_from_scratch via Nicklas Hansen

How to build RNNs and LSTMs from scratch

Originally developed by me (Nicklas Hansen), Peter Christensen and Alexander Johansen as educational material for the graduate deep learning course at the Technical University of Denmark (DTU). You can access the full course material here.


In this lab we will introduce different ways of learning from sequential data.
As an example, we will train a neural network to do language modelling, i.e. predict the next token in a sentence. In the context of natural language processing, a token could be a character or a word, but note that the concepts introduced here apply to all kinds of sequential data, e.g. protein sequences, weather measurements, audio signals or monetary transaction history, to name a few.

To really get a grasp of what is going on inside the recurrent neural networks that we are about to teach you, we will carry out a substantial part of this exercise in NumPy rather than PyTorch. Once you get a hold of it, we will proceed to the PyTorch implementation.

In this notebook we will show you:

  • How to represent categorical variables in networks (a small NumPy sketch of one-hot encoding follows this list)
  • How to build a recurrent neural network (RNN) from scratch
  • How to build an LSTM network from scratch
  • How to build an LSTM network in PyTorch
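
The notebook contains the full implementation; the snippet below is only a minimal sketch of the first point, assuming the four-token vocabulary (a, b, EOS, UNK) used in this lab, showing how a categorical token can be represented as a one-hot vector in NumPy.

import numpy as np

# Assumed vocabulary for this lab: two ordinary tokens plus EOS and UNK.
vocab = ['a', 'b', 'EOS', 'UNK']
token_to_idx = {token: i for i, token in enumerate(vocab)}

def one_hot(token):
    """Return a one-hot vector of length len(vocab) for a single token."""
    vec = np.zeros(len(vocab))
    vec[token_to_idx.get(token, token_to_idx['UNK'])] = 1.0
    return vec

print(one_hot('a'))    # [1. 0. 0. 0.]
print(one_hot('EOS'))  # [0. 0. 1. 0.]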

Dataset

For this exercise we will create a simple dataset that we can learn from. We generate sequences of the form:

a a a a b b b b EOS, a a b b EOS, a a a a a b b b b b EOS

where EOS is a special character denoting the end of a sequence. The task is to predict the next token $t_n$, i.e. a, b, EOS or the unknown token UNK, given the sequence of tokens $\{ t_1, t_2, \dots, t_{n-1} \}$, and we are to process the sequence one token at a time. As such, the network will need to learn that e.g. five b's and an EOS token will follow five a's.
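
The repo ships its own data generator; the following is only a minimal sketch, assuming sequences of n a's followed by n b's and an EOS marker as described above, with n drawn at random per sequence.

import numpy as np

def generate_sequence(min_len=1, max_len=10):
    """Generate one sequence of the form a^n b^n EOS with n chosen at random."""
    n = np.random.randint(min_len, max_len + 1)
    return ['a'] * n + ['b'] * n + ['EOS']

# A small dataset of such sequences.
dataset = [generate_sequence() for _ in range(100)]
print(dataset[0])  # e.g. ['a', 'a', 'a', 'b', 'b', 'b', 'EOS']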

Repo link: NER_pytorch via keep-steady

NER_pytorch

Named Entity Recognition on the CoNLL dataset using BiLSTM+CRF, implemented with PyTorch

paper

  • Neural Architectures for Named Entity Recognition

  • End-to-End Sequence labeling via BLSTM-CNN-CRF

code

This code starts from https://github.com/ZhixiuYe/NER-pytorch and is customized to use the latest PyTorch version (1.1.0).

To visualize the results in Jupyter notebooks, the .py files were converted into .ipynb files.

The F1 score on the CoNLL test data is 91.3%.

CoNLL performance

F1: 91.3%

0. prepare data

To get the pre-trained GloVe word embedding vectors, run prepare_data.ipynb.
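
prepare_data.ipynb is part of the repo; the snippet below is only a minimal sketch, assuming the standard glove.6B.100d.txt file, of how such pre-trained vectors are typically read into a dictionary.

import numpy as np

# Assumption: glove.6B.100d.txt has one token per line followed by 100 floats.
embeddings = {}
with open('glove.6B.100d.txt', encoding='utf-8') as f:
    for line in f:
        parts = line.rstrip().split(' ')
        embeddings[parts[0]] = np.asarray(parts[1:], dtype='float32')

print(len(embeddings), embeddings['the'].shape)  # roughly 400k tokens, (100,)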

1. train

150 epochs are enough (about 24 hours on one P100 GPU); epoch 51 gives the best F1 score. Training is monitored with visdom.

model shape

1) word embedding with GloVe (100d) + character embedding with CNN (25d)
2) BiLSTM 1 layer + Highway
3) Linear 400d -> 19d with tanh

BiLSTM_CRF(
  (char_embeds): Embedding(85, 25)
  (char_cnn3): Conv2d(1, 25, kernel_size=(3, 25), stride=(1, 1), padding=(2, 0))
  (word_embeds): Embedding(400176, 100)
  (dropout): Dropout(p=0.5)
  (lstm): LSTM(125, 200, bidirectional=True)
  (hw_trans): Linear(in_features=25, out_features=25, bias=True)
  (hw_gate): Linear(in_features=25, out_features=25, bias=True)
  (h2_h1): Linear(in_features=400, out_features=200, bias=True)
  (tanh): Tanh()
  (hidden2tag): Linear(in_features=400, out_features=19, bias=True)
)
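
The LSTM input size of 125 in the printout comes from concatenating the 100d GloVe word embedding with the 25d character-level CNN feature for each token. A minimal sketch of that concatenation, using hypothetical tensors word_emb and char_feat:

import torch

# Hypothetical shapes for a sentence of 9 tokens (batch dimension omitted):
word_emb = torch.randn(9, 100)   # GloVe word embeddings
char_feat = torch.randn(9, 25)   # per-word character-CNN features

lstm_input = torch.cat([word_emb, char_feat], dim=1)  # (9, 125), matches LSTM(125, 200)
print(lstm_input.shape)  # torch.Size([9, 125])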

run 1. train.ipynb

2. evaluation

run 2. evaluation.ipynb

data

https://www.clips.uantwerpen.be/conll2003/ner/

The CoNLL-2003 shared task data files contain four columns separated by a single space. Each word has been put on a separate line and there is an empty line after each sentence. The first item on each line is a word, the second a part-of-speech (POS) tag, the third a syntactic chunk tag and the fourth the named entity tag. The chunk tags and the named entity tags have the format I-TYPE which means that the word is inside a phrase of type TYPE. Only if two phrases of the same type immediately follow each other, the first word of the second phrase will have tag B-TYPE to show that it starts a new phrase. A word with tag O is not part of a phrase. Here is an example:

    word     | POS | Syntactic chunk tag | named entity tag
    U.N.       NNP   I-NP                 I-ORG 
    official   NN    I-NP                 O 
    Ekeus      NNP   I-NP                 I-PER 
    heads      VBZ   I-VP                 O 
    for        IN    I-PP                 O 
    Baghdad    NNP   I-NP                 I-LOC 
    .          .     O                    O
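
As a minimal illustration of the format described above (not code from this repo), the sketch below reads such a file into sentences of (word, POS, chunk, NER) tuples; the path eng.train is a hypothetical example.

def read_conll(path):
    """Parse a CoNLL-2003 file into a list of sentences of (word, pos, chunk, ner) tuples."""
    sentences, current = [], []
    with open(path, encoding='utf-8') as f:
        for line in f:
            line = line.strip()
            if line.startswith('-DOCSTART-'):  # document boundary marker, skip
                continue
            if not line:                       # empty line marks the end of a sentence
                if current:
                    sentences.append(current)
                    current = []
                continue
            word, pos, chunk, ner = line.split()
            current.append((word, pos, chunk, ner))
    if current:
        sentences.append(current)
    return sentences

# Example (hypothetical file path):
# sentences = read_conll('eng.train')
# print(sentences[0][:3])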