202303

mysql

MYSQL解决 The total number of locks exceeds the lock table size 问题
https://blog.csdn.net/dingweiye123/article/details/80994576

show variables like "%tmp%";
SET GLOBAL  tmp_table_size =1024*1024*1024;

数据库血缘分析
https://github.com/reata/sqllineage

SQLLineage: 基于Python的SQL血缘分析工具
https://reata.github.io/blog/sqllineage-a-sql-lineage-analysis-tool/

SQLLineage v1.3:迈向字段血缘
https://reata.github.io/blog/sqllineage-towards-column-lineage/

Find the size of each index in a MySQL table
https://dba.stackexchange.com/questions/49656/find-the-size-of-each-index-in-a-mysql-table

How to estimate/predict data size and index size of a table in MySQL
https://dba.stackexchange.com/questions/46069/how-to-estimate-predict-data-size-and-index-size-of-a-table-in-mysql

SELECT
    CONCAT(FORMAT(DAT/POWER(1024,pw1),2),' ',SUBSTR(units,pw1*2+1,2)) DATSIZE,
    CONCAT(FORMAT(NDX/POWER(1024,pw2),2),' ',SUBSTR(units,pw2*2+1,2)) NDXSIZE,
    CONCAT(FORMAT(TBL/POWER(1024,pw3),2),' ',SUBSTR(units,pw3*2+1,2)) TBLSIZE
FROM
(
    SELECT DAT,NDX,TBL,IF(px>4,4,px) pw1,IF(py>4,4,py) pw2,IF(pz>4,4,pz) pw3
    FROM
    (
        SELECT data_length DAT,index_length NDX,data_length+index_length TBL,
        FLOOR(LOG(IF(data_length=0,1,data_length))/LOG(1024)) px,
        FLOOR(LOG(IF(index_length=0,1,index_length))/LOG(1024)) py,
        FLOOR(LOG(data_length+index_length)/LOG(1024)) pz
        FROM information_schema.tables
        WHERE table_schema='mydb'
        AND table_name='mytable'
    ) AA
) A,(SELECT 'B KBMBGBTB' units) B;

select database_name, table_name, index_name, stat_value*@@innodb_page_size
from mysql.innodb_index_stats where stat_name='size';

mysql 索引对 null 的处理

一个表 mytable 总共 100w 条数据,主键是 int unsigned 类型,索引列是 CHAR(50) 类型,下面是创建索引的语句。

create index ix_mytable_json_xxx on mytable ((JSON_VALUE(json, '$.xxx' RETURNING CHAR(50))));

其中 12572 条数据的 json->>‘$.xxx’ 有值,其余都是 NULL, 现在看 ix_mytable_json_xxx 索引占用空间 13 MB。

select database_name, table_name, index_name, stat_value*@@innodb_page_size/1024/1024 size_M
from mysql.innodb_index_stats
where stat_name='size' and  database_name='test' and table_name='mytable' and index_nam='ix_mytable_json_xxx';

假设每行数据在索引里占用 50+4 的话,需要 1000000*(50+4)/1024/1024=51.49 MB,这远远大于 13 MB,说明 mysql 的索引里是对 NULL有优化的。

data

Using pyarrow how do you append to parquet file?
https://9to5answer.com/using-pyarrow-how-do-you-append-to-parquet-file

Unfortunately, this cannot append to an existing .parquet file (see my answer that can). Reason: Once .close() is called, the file cannot be appended to, and before .close() is called, the .parquet file is not valid (will throw an exception due to a corrupted file as it’s missing its binary footer). The answer from @Contango solves this.

https://github.com/cldellow/csv2parquet/issues/13

The nature of the Parquet format is such that you can’t really update an existing file. (Wellllll, maybe it’s technically possible to do surgery and add new row groups and then update the footer. It’d require using relatively low-level APIs in the parquet library, though.)

linix

持续检查某个进程的 CPU 和内存占用

$ while true; do echo -n `date +'%H:%M:%S '`" ";ps -eo pid,%cpu,%mem,cmd | grep 'mysqld'| grep -v grep; sleep 1;done
03:22:55  3226830 39.6  4.0 /usr/sbin/mysqld
03:22:56  3226830 39.6  4.0 /usr/sbin/mysqld
03:22:57  3226830 39.6  4.0 /usr/sbin/mysqld
03:22:58  3226830 39.6  4.0 /usr/sbin/mysqld
03:22:59  3226830 39.6  4.0 /usr/sbin/mysqld
03:23:00  3226830 39.6  4.0 /usr/sbin/mysqld

java

Spring boot开启定时任务的三种方式
https://blog.csdn.net/qianlixiaomage/article/details/106599951

  • 第一种也就是最简单的一种:基于注解 (@Scheduled)的方式;
  • 第二种:基于接口 (SchedulingConfigurer);
  • 第三种:基于注解设定多线程定时任务。

Convert String to Hex in Java
https://www.delftstack.com/howto/java/string-to-hex-java/

// method 1
StringBuilder stringBuilder = new StringBuilder();

char[] charArray = str.toCharArray();

for (char c : charArray) {
    String charToHex = Integer.toHexString(c);
    stringBuilder.append(charToHex);
}

System.out.println("Converted Hex from String: "+stringBuilder.toString());

// method 2
import org.apache.commons.codec.binary.Hex;
import java.nio.charset.StandardCharsets;

byte[] byteArray = str.getBytes(StandardCharsets.UTF_8);
String convertedResult = Hex.encodeHexString(byteArray);

// method 3
byte[] getBytesFromString = str.getBytes(StandardCharsets.UTF_8);
BigInteger bigInteger = new BigInteger(1, getBytesFromString);

String convertedResult = String.format("%x", bigInteger);

System.out.println("Converted Hex from String: " + convertedResult);

linux

How to record reverse proxy upstream server serving request in Nginx log?
https://stackoverflow.com/questions/18627469/how-to-record-reverse-proxy-upstream-server-serving-request-in-nginx-log

log_format upstreamlog '[$time_local] $remote_addr - $remote_user - $server_name $host to: $upstream_addr: $request $status upstream_response_time $upstream_response_time msec $msec request_time $request_time';
access_log /var/log/nginx/access.log upstreamlog;

Experiments about a better locate using grep
https://a3nm.net/blog/better_locate.html

Approach	Time/query (s)  	Index size (MB)  
mlocate	5.5	300
GNU locate	4.6	75
grep	1.7	890
LC_ALL grep	0.7	890
LC_ALL grep (parallel)	0.3	890
LC_ALL grep (parallel) gzip  	1.3	105
LC_ALL grep (parallel) pigz  	1.0	105
LC_ALL grep (parallel) lz4	0.7	160

javascript

知识图谱项目前端可视化图论库——Cytoscape.js简介 DAG
https://www.cnblogs.com/zhilin/p/12074688.html

  • Cytoscape.js 是一个用原生JS编写的开源图论(又名网络)库。你可以使用Cytoscape.js进行图形分析和可视化。
  • Cytoscape.js 支持许多不同的图论用例。它支持有向图,无向图,混合图,循环,多图,复合图(一种超图),等等。

https://ourcodeworld.com/articles/read/159/top-5-best-free-diagrams-javascript-libraries
https://hackernoon.com/my-top-13-javascript-diagram-libraries-g2a53z6u

什么是Black-Litterman资产配置模型?
https://zhuanlan.zhihu.com/p/542122376

  • Black-Litterman 模型是一个关于投资组合配置(portfolio allocation)的贝叶斯模型(Bayesian model),它的初衷在于马科维茨(Markowitz)提出的现代投资组合理论(modern portfolio theory)在实际操作中的问题。
  • 在现代投资组合理论中,投资组合的最优权重是通过一个均值-方差最优化(mean-variance optimization)问题的求解得到的,而最优化问题中的资产预期收益率(expected returns)和协方差(covariances)都是给定的,它们不一定是对于预期收益率的合理估计。
  • 尤其在投资者对特定资产的未来表现持某些观点时,马科维茨均值-方差最优化给出的权重是不合理的。
  • 为解决此问题,Black-Litterman模型使用贝叶斯方法将投资者的观点纳入到对预期收益率的估计中。Black-Litterman模型有一个重要假设:所有资产收益率都遵循同样的概率分布(probability distribution)。

python

How could I use requests in asyncio?
https://stackoverflow.com/questions/22190403/how-could-i-use-requests-in-asyncio

import asyncio
import requests

async def main():
    loop = asyncio.get_event_loop()
    future1 = loop.run_in_executor(None, requests.get, 'http://www.google.com')
    future2 = loop.run_in_executor(None, requests.get, 'http://www.google.co.uk')
    response1 = await future1
    response2 = await future2
    print(response1.text)
    print(response2.text)

loop = asyncio.get_event_loop()
loop.run_until_complete(main())


import asyncio
import aiohttp

@asyncio.coroutine
def do_request():
    proxy_url = 'http://localhost:8118'  # your proxy address
    response = yield from aiohttp.request(
        'GET', 'http://google.com',
        proxy=proxy_url,
    )
    return response

loop = asyncio.get_event_loop()
loop.run_until_complete(do_request())


import aiohttp
import asyncio

async def fetch(session, url):
    async with session.get(url) as response:
        return await response.text()

async def main():
    urls = [
            'http://python.org',
            'https://google.com',
            'http://yifei.me'
        ]
    tasks = []
    async with aiohttp.ClientSession() as session:
        for url in urls:
            tasks.append(fetch(session, url))
        htmls = await asyncio.gather(*tasks)
        for html in htmls:
            print(html[:100])

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())

浅度测评:requests、aiohttp、httpx 我应该用哪一个?
https://zhuanlan.zhihu.com/p/103711201

Make a Python asyncio call from a Flask route
https://stackoverflow.com/questions/47841985/make-a-python-asyncio-call-from-a-flask-route

How can I make python request module asyncio and aiohttp
https://stackoverflow.com/questions/75004160/how-can-i-make-python-request-module-asyncio-and-aiohttp

Making 1 million requests with python-aiohttp
https://pawelmhm.github.io/asyncio/python/aiohttp/2016/04/22/asyncio-aiohttp.html

AI

https://zhuanlan.zhihu.com/p/613155165

参考搜索引擎中“先检索再重排”的思路,针对文档问答设计“先检索再整合“的方案,整体思路如下:

  • 首先准备好文档,并整理为纯文本的格式。把每个文档切成若干个小的chunks
  • 调用文本转向量的接口,将每个chunk转为一个向量,并存入向量数据库
    • 文本转向量可以使用 openai embedding
    • 也可以使用其他方案,如 fasttext/simbert 等
  • 当用户发来一个问题的时候,将问题同样转为向量,并检索向量数据库,得到相关性最高的一个或几个chunk
  • 将问题和chunk合并重写为一个新的请求发给 openai api

sd 模型 webui
colab运行stable diffusion以及本地部署方法
https://zhuanlan.zhihu.com/p/561546984
https://blog.csdn.net/Kerry_Blue/article/details/128778005

国风
https://civitai.com/images/249208?modelId=10415&postId=119862&id=10415&slug=3-guofeng3
https://civitai.com/images/249210?modelId=10415&postId=119862&id=10415&slug=3-guofeng3

other

How to calculate cumulative sum / running total of a column in Excel?
https://www.extendoffice.com/documents/excel/5280-excel-calculate-cumulative-sum.html

=SUM($B$2:B2)
=IF(B2="","",SUM($B$2:B2))

条件 GPT 经验
https://c8xa9g10gh.feishu.cn/docx/SFnzdD0LqoM8kIxGVsrcAlRhn1d#SFnzdD0LqoM8kIxGVsrcAlRhn1d