<!DOCTYPE HTML>
<!--
Stellar by HTML5 UP
html5up.net | @ajlkn
Free for personal and commercial use under the CCA 3.0 license (html5up.net/license)
-->
<html lang="en">
<head>
<title>Fintech Key-Phrase (albert-jin)</title>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1, user-scalable=no" />
<link rel="stylesheet" href="assets/css/main.css" />
<link rel="stylesheet" href="assets/css/jquery.toast.css" />
<noscript><link rel="stylesheet" href="assets/css/noscript.css" /></noscript>
<!-- <link rel="stylesheet" href="https://unpkg.com/element-ui/lib/theme-chalk/index.css">-->
<!--<script src="https://unpkg.com/element-ui/lib/index.js"></script>-->
</head>
<body class="is-preload">
<!-- Wrapper -->
<div id="wrapper">
<!--https://github.com/albert-jin-->
<!-- Header -->
<header id="header" class="alt">
<span class="logo"><img style="height: 200px;width: 400px" src="images/cow.svg" alt="Fintech Key-Phrase logo" /></span>
<h1>Fintech Key-Phrase</h1>
<p>A Chinese financial &amp; high-tech dataset for accelerating domain-specific information retrieval<br />
built by <a href="https://orcid.org/0000-0002-3651-0702">@biaozhao</a> &amp; <a href="https://orcid.org/0000-0002-6656-6061">@weiqiangjin</a><br />
<a href="https://ror.org/017zhmm22">XI'AN JIAO TONG UNIVERSITY</a> <a href="http://en.xjtu.edu.cn/">(XJTU)</a>
<br />GitHub: <a href="https://github.com/albert-jin/Fintech-Key-Phrase">https://github.com/albert-jin/Fintech-Key-Phrase</a>.</p>
</header>
<!-- Nav -->
<nav id="nav">
<ul>
<li><a href="#intro" class="active">Introduction</a></li>
<li><a href="#first">Model Prediction</a></li>
<li><a href="#second">Application Interface</a></li>
<li><a href="#cta">Summary</a></li>
</ul>
</nav>
<!-- Main -->
<div id="main">
<!-- Introduction -->
<section id="intro" class="main">
<div class="spotlight">
<div class="content">
<header class="major">
<h2>Fintech Key-Phrase</h2>
</header>
<p>We propose a new information retrieval dataset, the Chinese Financial &amp; High-tech Dataset (<b>Fintech Key-Phrase</b>), derived from publicly released Chinese Management’s Discussion and Analysis (CMD&amp;A) reports. With more than 1.2K human-annotated instances, <b>Fintech Key-Phrase</b> is, to the best of our knowledge, the largest reliable Chinese benchmark for the expression-level information extraction task.</p>
<span ><img style="height: 628px;width: 1266px" src="images/dataset_screenshot.PNG" alt="Screenshot of the Fintech Key-Phrase dataset" /></span>
<header class="major">
<h2>Dataset Statistics</h2>
</header>
<p>We collected more than 16,600 high-tech CMD&amp;A annual reports, covering 2,692 different companies. The documents contain about 11,171 words on average; the shortest and the longest contain 115 and 32,006 words, respectively.</p>
<p>The figure below shows how the documents are distributed by length and by release date.</p>
<span ><img style="height: 465px;width: 830px" src="images/corpus_stats.PNG" alt="Distribution of document lengths and release dates" /></span>
<header class="major">
<h2>Train & Test Set Split</h2>
</header>
<p>The training set contains more than 35,884 annotated Fin-tech domain key-phrases, of which 11,434 are unique after removing duplicates. The test set contains more than 1,769 Fin-tech key-phrases, of which 1,439 are unique after removing duplicates.</p>
<p>The figure below shows the number of key-phrases in each length interval.</p>
<span ><img style="height: 465px;width: 830px" src="images/phrase_len_stats.PNG" alt="Key-phrase counts by length interval" /></span>
<p>The statistics show that most Fin-tech domain key-phrases have a length between 1 and 6, with only a small share longer than 7. This indicates that the key-phrases financial experts are interested in are generally short and simple.</p>
<ul class="actions">
<li><a href="intro_details.html" class="button">Learn More</a></li>
</ul>
</div>
</div>
</section>
<!-- First Section -->
<section id="first" class="main special">
<header class="major">
<h2>Real-time Model Prediction</h2>
</header>
<ul class="features">
<li id="li1">
<span class="icon solid major style4 fa-code"></span>
<h3>BERT-Linear</h3>
</li>
<li id="li2">
<span class="icon major style5 fa-copy"></span>
<h3>BERT-CRF</h3>
</li>
<li id="li3">
<span class="icon major style3 fa-gem"></span>
<h3>BERT-BiLSTM-CRF</h3>
</li>
<li id="li4">
<span class="icon solid major style1 fa-code"></span>
<h3>RoBERTa-Linear</h3>
</li>
<li id="li5">
<span class="icon major style2 fa-copy"></span>
<h3>RoBERTa-CRF</h3>
</li>
<li id="li6">
<span class="icon major style6 fa-gem"></span>
<h3>RoBERTa-BiLSTM-CRF</h3>
</li>
</ul>
</section>
<!-- Second Section -->
<section id="second" class="main special">
<header class="major">
<h2>Released Application Programming Interface (API)</h2>
<p>An introduction to the released API.</p>
<br/>
<span ><img style="height: 494px;width: 932px" src="images/api_intros.PNG" alt="Introduction to the released API" /></span>
<p>We hope you enjoy it!</p>
</header>
<p class="content">.</p>
<ul class="statistics">
<li class="style1">
<span class="icon solid fa-code-branch"></span>
<strong>10,0 </strong> Max Threading
</li>
<li class="style2">
<span class="icon fa-folder-open"></span>
<strong>10,0 GB</strong> Max Capacity
</li>
<li class="style3">
<span class="icon solid fa-signal"></span>
<strong>2048,0 KB</strong> Max Bandwidth
</li>
<li class="style4">
<span class="icon solid fa-laptop"></span>
<strong>10,0</strong> Max Connections
</li>
<li class="style5">
<span class="icon fa-gem"></span>
<strong>498,0</strong> PageView Counts
</li>
</ul>
</section>
<!-- Get Started -->
<section id="cta" class="main special">
<header class="major">
<h2>Conclusion and Perspective</h2>
<p style="text-align: left">1) We present a new dataset, Chinese Financial &amp; High-tech Based Key-Phrase (<b>Fintech Key-Phrase</b>), which can be regarded as the newest expression-level information extraction benchmark for the Chinese financial &amp; high-tech domain. <br />2) We conduct comprehensive experiments with several <b>SOTA</b> approaches (the six models shown in the "<b>Real-time Model Prediction</b>" section). The experiments provide solid baselines for future information extraction research on our dataset. <br />3) On this website, we release the trained SOTA models and the corresponding APIs for extracting key-phrases in the Chinese financial &amp; high-tech domain.</p>
</header>
</section>
</div>
<!-- Footer -->
<footer id="footer">
<section>
<h2>Fintech Key-Phrase</h2>
<p><b>Fintech Key-Phrase</b> is a human-annotated Chinese key-phrase dataset for the financial and high-technology fields. It contains more than 12K paragraphs together with their annotated domain-specific key-phrases. We hope this dataset facilitates research and exploration at the intersection of Chinese finance and high technology.</p>
<ul class="actions">
<li><a href="intro_details.html" class="button">Learn More</a></li>
<li><a href="https://github.com/albert-jin" class="icon brands fa-github alt"><span class="label">GitHub</span></a></li>
</ul>
</section>
<section>
<h2>Authors Info</h2>
<dl class="alt">
<dt>Affiliation</dt>
<dd>School of Information and Communications Engineering</dd>
<dt>College</dt>
<dd>Xi'an Jiaotong University</dd>
<dt>Address</dt>
<dd>Innovation Harbor, Xi'an, Shaanxi, China</dd>
<dt>Phone</dt>
<dd>(+86) 130-4061-7148</dd>
<dt>Email</dt>
<dd><a href="#">[email protected]</a></dd>
<dt>Postcode</dt>
<dd>710049</dd>
</dl>
</section>
<p class="copyright">© Website "<b>Fintech Key-Phrase</b>". Design: <a href="https://html5up.net">HTML5 UP</a>.</p>
</footer>
</div>
<!-- Scripts -->
<script src="assets/js/jquery.min.js"></script>
<script src="assets/js/jquery.scrollex.min.js"></script>
<script src="assets/js/jquery.scrolly.min.js"></script>
<script src="assets/js/browser.min.js"></script>
<script src="assets/js/breakpoints.min.js"></script>
<script src="assets/js/util.js"></script>
<script src="assets/js/jquery.toast.js"></script>
<script src="assets/js/jquery.blockUI.min.js"></script>
<script src="assets/js/main.js"></script>
<script>
// Each model card in the "Real-time Model Prediction" section opens its demo page.
$(function () {
  var modelPages = {
    '#li1': 'model_prediction_bl.html',
    '#li2': 'model_prediction_bc.html',
    '#li3': 'model_prediction_blc.html',
    '#li4': 'model_prediction_rbl.html',
    '#li5': 'model_prediction_rbc.html',
    '#li6': 'model_prediction_rblc.html'
  };
  $.each(modelPages, function (selector, page) {
    $(selector).click(function () {
      window.location.href = page;
    });
  });
});
</script>
<script>
// Set to true to ask for confirmation before storing a cookie larger than 4 KB.
var caution = false;

// Store a cookie; expires is a Date, the remaining arguments are optional.
function setCookie(name, value, expires, path, domain, secure) {
  var pair = name + "=" + encodeURIComponent(value);
  var curCookie = pair +
    (expires ? "; expires=" + expires.toUTCString() : "") +
    (path ? "; path=" + path : "") +
    (domain ? "; domain=" + domain : "") +
    (secure ? "; secure" : "");
  if (!caution || pair.length <= 4000 || confirm("Cookie exceeds 4KB and will be cut!")) {
    document.cookie = curCookie;
  }
}

// Return the decoded value of the named cookie, or null if it is not set.
function getCookie(name) {
  var prefix = name + "=";
  var cookies = document.cookie ? document.cookie.split("; ") : [];
  for (var i = 0; i < cookies.length; i++) {
    // Match whole cookie names only, so "counter" does not match "xcounter".
    if (cookies[i].indexOf(prefix) === 0) {
      return decodeURIComponent(cookies[i].substring(prefix.length));
    }
  }
  return null;
}

// Delete a cookie by setting an expiry date in the past.
function deleteCookie(name, path, domain) {
  if (getCookie(name)) {
    document.cookie = name + "=" +
      (path ? "; path=" + path : "") +
      (domain ? "; domain=" + domain : "") +
      "; expires=Thu, 01 Jan 1970 00:00:01 GMT";
  }
}

// The counter lives in this browser's cookie, so it counts visits from this browser only.
var expiry = new Date(Date.now() + 365 * 24 * 60 * 60 * 1000);
var visits = parseInt(getCookie("counter"), 10);
visits = isNaN(visits) ? 1 : visits + 1;
setCookie("counter", visits, expiry);
setTimeout(function () {
  $.toast("You are visitor number " + visits + "!", {"delay": 3000});
}, 3000);
</script>
</body>
</html>