202408

Running multiple commands with xargs

cd /data/temp && find . -type f -mmin +1 | xargs -I {} sh -c 'rsync -avP {} bak-data-server:/home/bakdata/ && rm -f {}'

https://www.youtube.com/watch?v=JgKbe5tcgZ0

New feature explained | An introduction to MySQL 8.0's enhanced logical backup and restore tools
https://zhuanlan.zhihu.com/p/267294729

Backup utilities that ship with the MySQL Shell UTIL object:

  • Shipped with MySQL 8.0.21 (the latest release at the time); they come with built-in multi-threaded dump and multi-threaded restore and can directly replace mysqldump/mysqlpump. A minimal invocation sketch follows this list.
  • dump_instance/dumpInstance: multi-threaded dump of an entire MySQL instance
  • dump_schemas/dumpSchemas: multi-threaded dump of a single database (schema)
  • load_dump/loadDump: multi-threaded restore of the dump sets produced by the two tools above
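
A minimal sketch of invoking these utilities from MySQL Shell's Python mode (mysqlsh --py); the paths and thread counts below are placeholders, not from the original note:

# run inside mysqlsh --py while connected to the source instance
util.dump_instance("/backup/full", {"threads": 8})            # dump the whole instance
util.dump_schemas(["mydb"], "/backup/mydb", {"threads": 8})   # dump a single schema
# on the target instance (loadDump requires local_infile = ON)
util.load_dump("/backup/full", {"threads": 8})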

MySQL Shell user guide
https://www.jianshu.com/p/954a011c36f7

Install mysql-shell; don't install it via snap

snap list
sudo snap remove mysql-shell
wget https://dev.mysql.com/get/mysql-apt-config_0.8.32-1_all.deb
sudo dpkg -i mysql-apt-config_0.8.32-1_all.deb
sudo apt-get update
sudo apt-get install mysql-shell

MySQL Core Modules Revealed | Issue 28 | When are locks released?
https://mp.weixin.qq.com/s/egrrGrjpgX5tMw_xtWY9fw

  • For select, update and delete statements, every record that gets scanned is row-locked, regardless of whether the WHERE condition hits an index and regardless of whether it is an equality or a range query.
  • Unlike update and delete, select only applies the logic above when it actually needs to lock (i.e. locking reads).
  • Under the REPEATABLE-READ and SERIALIZABLE isolation levels,
    • any lock taken, whether a table lock or a row lock, is held until the transaction's commit or rollback is about to complete (manually acquired table locks excepted).
  • Under the READ-UNCOMMITTED and READ-COMMITTED isolation levels,
    • if a scanned record turns out not to match the WHERE condition, its row lock is released promptly.

PG: check table size
SELECT pg_size_pretty(pg_total_relation_size('public.xxx')) AS total_size,
pg_size_pretty(pg_relation_size('public.xxx')) AS table_size,
pg_size_pretty(pg_indexes_size('public.xxx')) AS index_size;

+------------+------------+------------+
| total_size | table_size | index_size |
+------------+------------+------------+
| 2827 GB    | 1188 GB    | 133 GB     |
+------------+------------+------------+

Strip accents (transliterate to ASCII)

>>> import unidecode
>>> unidecode.unidecode('Ольга Павлова')
"Ol'ga Pavlova"
>>> unidecode.unidecode('Эмиль Проказов')
"Emil' Prokazov"

Master logic-gate device selection in one article: which domestic replacements are strongest?
https://zhuanlan.zhihu.com/p/658295483

Characteristics of the 74-series logic families

  • (1) 74: the original series; still in use, but gradually being phased out;
  • (2) 74H: an improved, high-speed TTL version of 74 with a NAND-gate propagation delay of about 10 ns, but its static power consumption is high; gradually being phased out;
  • (3) 74S: the high-speed Schottky TTL family, using anti-saturation Schottky diodes; fast, but with few part types; not mainstream;
  • (4) 74LS: low-power Schottky, slightly faster than the HC family; the main TTL product line. It runs on 5 V, offers a very wide and cost-effective selection of parts, and is widely used in small and medium-scale circuits. An open LS input floats high, LS inputs need no pull-up or pull-down resistors, and LS outputs pull down strongly but pull up weakly.
  • (5) 74HC: the high-speed CMOS standard logic family, with roughly the same operating speed as 74LS plus CMOS's inherent low power consumption and wide supply range (2 V to 6 V). HC inputs have very high impedance, so an open input has an undefined level; unused inputs therefore need pull-up or pull-down resistors (typically 1 kΩ to 10 kΩ) to define their idle state;
  • (6) 74HCT: inputs and outputs compatible with the LS family, but with lower power consumption;
  • (7) 74AS: the successor to 74S, with delays improved to about 1.5 ns, hence also known as the "Advanced Schottky" family;
  • (8) 74ALS: the "Advanced Low-power Schottky" family, successor to 74LS; operates up to 50 MHz with a typical delay of 4 ns and typical power of 1 mW, but at a higher price;
  • (9) Others: 74F uses a high-speed Schottky process, and 74AC is a buffered CMOS logic family.

https://www.bilibili.com/video/BV12e411L7QB/?spm_id_from=333.788&vd_source=fefee58242775b49fd927c748741105d

Scheduling strategy for importing 8,000 files into the database concurrently, using multiple processes (a minimal sketch follows the list):
1. Start several processes; each one handles a slice of the file list, with start_file_id and end_file_id passed as command-line arguments.
2. While a file is being processed, its progress is saved in {file_id}.progress.
3. A process bulk-inserts after every 100 lines read, and writes the line number to the progress file once the insert succeeds.
4. If a process is stopped and restarted, it reads the progress for the file_id it is working on and skips (continues past) the lines already processed.
5. After finishing a file, rename {file_id}.progress to {file_id}.done.
6. When a process picks up a file, it skips it if the corresponding {file_id}.done file exists.
7. No locks, no shared state; everything can be restarted at any time or rerun from scratch without missing or duplicating anything.
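
A minimal sketch of the per-file bookkeeping described above, in Python (the bulk_insert helper and the progress/ directory layout are assumptions; the real INSERT statement and error handling are omitted):

import os

BATCH = 100  # step 3: one bulk insert per 100 lines

def bulk_insert(rows):
    # hypothetical stand-in for the real bulk INSERT into the database
    pass

def import_file(file_id, path, progress_dir="progress"):
    done = os.path.join(progress_dir, f"{file_id}.done")
    prog = os.path.join(progress_dir, f"{file_id}.progress")
    if os.path.exists(done):                      # step 6: file already finished, skip it
        return
    start = int(open(prog).read()) if os.path.exists(prog) else 0   # step 4: resume point
    batch, last = [], 0
    with open(path) as f:
        for lineno, line in enumerate(f, 1):
            if lineno <= start:                   # skip lines imported before the restart
                continue
            batch.append(line.rstrip("\n"))
            last = lineno
            if len(batch) == BATCH:
                bulk_insert(batch)                # insert first ...
                batch = []
                with open(prog, "w") as p:        # ... then persist the line number (step 3)
                    p.write(str(lineno))
    if batch:
        bulk_insert(batch)
    with open(prog, "w") as p:
        p.write(str(max(last, start)))
    os.rename(prog, done)                         # step 5: mark the whole file as done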

Check progress

for f in  progress/*.progress;do echo $f,`cat $f`; done;
ls -l progress/*.done | wc -l

Adding an index failed: the disk holding tmp is full

mysql> alter table x add index ix_id(id);
ERROR 1030 (HY000): Got error 100 - 'InnoDB error' from storage engine

Safe production-environment data synchronization and querying based on MySQL and Otter
https://baijiahao.baidu.com/s?id=1789028486868642783&wfr=spider&for=pc

  • 3. Cross-datacenter replication (one of Otter's biggest highlights; it can solve the internationalization problem of replicating data from domestic sites to overseas ones for local users, and within one country it enables multi-datacenter disaster recovery)
  • 4. Bidirectional sync (the hardest scenario in data synchronization; Otter handles it well with two features, a loop-avoidance algorithm and a data-consistency algorithm, which guarantee eventual consistency in an active-active ("dual-A") datacenter setup)
    • 1) Loop-avoidance algorithm (a general solution that supports most relational databases)
    • 2) Data-consistency algorithm (guarantees eventual consistency in the active-active setup; a highlight)

ADHD Test

  • how often do you have trouble wrapping up the final details of a project, once the challenging parts have been done?
  • how often do you have difficulty getting things in order when you have to do a task that requires organization?
  • how often do you have problems remembering appointments or obligations?
  • when you have a task that requires a lot of thought, how often do you avoid or delay getting started?
  • how often do you fidget or squirm with your hands or feet when you have to sit down for a long time?
  • how often do you feel overly active and compelled to do things, like you were driven by a motor?
  • how often do you make careless mistakes when you have to work on a boring or difficult project?
  • how often do you have difficulty keeping your attention when you are doing boring or repetitive work?
  • how often do you have difficulty concentrating on what people say to you, even when they are speaking to you directly?
  • how often do you misplace or have difficulty finding things at home or at work?
  • how often are you distracted by activity or noise around you?
  • how often do you leave your seat in meetings or other situations in which you are expected to remain seated?
  • how often do you feel restless or fidgety?
  • how often do you have difficulty unwinding and relaxing when you have time to yourself?
  • how often do you find yourself talking too much when you are in social situations?
  • when you're in a conversation, how often do you find yourself finishing the sentences of the people you are talking to, before they can finish them themselves?
  • how often do you have difficulty waiting your turn in situations when turn taking is required?
  • how often do you interrupt others when they are busy?

Deep Tech, Explained
From AI to quantum computing and bioengineering, deep tech is being used to solve tomorrow’s problems today.
https://builtin.com/artificial-intelligence/deep-tech

Deep tech, or deep technology, describes the work of companies developing innovations that surpass technological benchmarks and push the boundaries of current technology.

While deep tech companies are often involved in fields like artificial intelligence, biotechnology and quantum computing, the category also includes companies operating in agriculture, aerospace, green energy, mobility and more. Some have become household names, like Moderna, Tesla and Impossible Foods.

[Rethinking IT] Building a home network with Unifi and MikroTik gear
https://sspai.com/post/81572

https://github.com/rcoh/angle-grinder
Angle-grinder allows you to parse, aggregate, sum, average, min/max, percentile, and sort your data. You can see it, live-updating, in your terminal. Angle grinder is designed for when, for whatever reason, you don’t have your data in graphite/honeycomb/kibana/sumologic/splunk/etc. but still want to be able to do sophisticated analytics.

Deduplicating the name column in the database

# Export the column to deduplicate to disk, sorted
mysql> select name INTO OUTFILE '/data/mysqltmp/name.csv' from t order by name;

# Use uniq -c to pick out the duplicated names (awk '$1==2' keeps those seen exactly twice); the data is already sorted, so no separate sort is needed
$ pv name.csv | uniq -c| awk '$1==2{print $2,$1}' > duplicate_name.txt
29.1GiB 0:01:40 [ 297MiB/s] [==============================================================================================>] 100%

The first 20 hours – how to learn anything | Josh Kaufman | TEDxCSU
https://www.youtube.com/watch?v=5MgBikgcWnY

agrind

 tail -f /var/log/nginx/access.log | agrind '* | parse "* * * * * * \"*,*\" * * \"* * *\"" as time,status,request_time,_,_,_,ip,_,_,_,method,url,http_version|count,p90(request_time) by url'

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 609028 ubuntu    20   0 5102012   1.8g   4620 S 100.0  11.6  28:43.72 agrind

Data compression in Mysql
https://medium.com/datadenys/data-compression-in-mysql-6a0668af8f08

Is Black Myth Wukong Worth Buying
https://www.youtube.com/watch?v=yFF8FWBKSzI

Optimizing a backup script with Zstandard (zstd) compression for low-load website data backups
https://www.yunloc.com/2438.html

Didi's experience putting the ZSTD compression algorithm into production
https://t.it168.com/article_6816820.html

ZSTD (Zstandard) is built on FSE (Finite State Entropy) coding and delivers excellent compression and decompression speed. Its implementation is highly optimized: it exploits hardware parallelism through SIMD instruction sets, and the encoding process relies heavily on bit-shift operations to switch states, which further increases throughput. ZSTD supports dictionary compression; by referencing matches in a dictionary it greatly reduces the space needed for repeated data and improves the compression ratio. It also offers a multi-level compression strategy that applies different algorithms at different compression levels, so speed and compression ratio can be balanced flexibly for different workloads.

https://python-zstandard.readthedocs.io/en/latest/concepts.html
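
As a small illustration of the python-zstandard API linked above, a sketch that compresses and decompresses a file stream-to-stream (file names and the compression level are arbitrary):

import zstandard as zstd

# compress stream-to-stream; level 3 is the library default, higher levels trade speed for ratio
cctx = zstd.ZstdCompressor(level=3, threads=-1)   # threads=-1 uses all logical CPU cores
with open("backup.tar", "rb") as src, open("backup.tar.zst", "wb") as dst:
    cctx.copy_stream(src, dst)

# decompress it back
dctx = zstd.ZstdDecompressor()
with open("backup.tar.zst", "rb") as src, open("backup.restored.tar", "wb") as dst:
    dctx.copy_stream(src, dst)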

Counting & Timing
https://code.flickr.net/2008/10/27/counting-timing/

https://docs.vivgrid.com/quick-start

Learn OpenAI's Function Calling feature in one article
https://zhuanlan.zhihu.com/p/641239259

https://cookbook.openai.com/examples/how_to_call_functions_with_chat_models
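
The gist of Function Calling, sketched with the openai Python SDK (the get_weather tool, model name and prompt are made-up placeholders; the model only decides which function to call and with what arguments, and your own code executes it):

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# describe the callable function to the model as a JSON-schema "tool"
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What's the weather in Berlin?"}],
    tools=tools,
)

# the response carries the chosen function name and its JSON-encoded arguments
call = resp.choices[0].message.tool_calls[0]
print(call.function.name, call.function.arguments)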

One person, 13 years, 70 startup projects: a legend of independent development
https://mp.weixin.qq.com/s/T1MGR6yTh_3Z9nU3U9G-Yw

https://cookbook.openai.com/examples/structured_outputs_intro

question = "how can I solve 8x + 7 = -23"

{
   "steps": [
      {
         "explanation": "First, we need to isolate the term with x on one side of the equation, so we subtract 7 from both sides.",
         "output": "8x + 7 - 7 = -23 - 7"
      },
      {
         "explanation": "Simplify both sides. The 7 and -7 on the left-hand side cancel each other out, leaving 8x, and -23 - 7 simplifies to -30.",
         "output": "8x = -30"
      },
      {
         "explanation": "To solve for x, divide both sides of the equation by 8, which is the coefficient of x.",
         "output": "8x/8 = -30/8"
      },
      {
         "explanation": "Simplify -30/8 by reducing the fraction. Divide both numerator and denominator by their greatest common divisor, which is 2.",
         "output": "x = -15/4 or x = -3.75"
      }
   ],
   "final_answer": "x = -3.75"
}
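
A sketch of how output shaped like the JSON above can be requested through the openai Python SDK's structured-output helper together with pydantic (class names and the model string are assumptions based on the linked cookbook, not part of the original note):

from pydantic import BaseModel
from openai import OpenAI

class Step(BaseModel):
    explanation: str
    output: str

class MathResponse(BaseModel):
    steps: list[Step]
    final_answer: str

client = OpenAI()
completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",      # structured outputs need a model that supports them
    messages=[
        {"role": "system", "content": "You are a helpful math tutor."},
        {"role": "user", "content": "how can I solve 8x + 7 = -23"},
    ],
    response_format=MathResponse,   # the SDK turns the pydantic model into a JSON schema
)
print(completion.choices[0].message.parsed)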

Gu Long: the most dissolute of men, the most chaste of novels
https://m.thepaper.cn/baijiahao_9272512

https://github.com/rany2/edge-tts

Programmable Music
http://overtone.github.io/
Music production tips: make good use of DAW software and MIDI signals
https://www.sohu.com/a/682533469_121124710

https://b23.tv/MxZntYv

A world-famous music notation app
https://musescore.org/zh-hans

MuseScore is an open source and free music notation software.
https://github.com/musescore/MuseScore

https://learn.microsoft.com/en-us/azure/ai-services/speech-service/rest-text-to-speech?tabs=streaming
ogg-16khz-16bit-mono-opus
ogg-24khz-16bit-mono-opus
ogg-48khz-16bit-mono-opus
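
A hedged sketch of requesting one of these Opus output formats from the Speech REST endpoint documented at the link above (region, key and voice name are placeholders):

import requests

region = "eastus"                      # placeholder region
key = "YOUR_SPEECH_KEY"                # placeholder subscription key
url = f"https://{region}.tts.speech.microsoft.com/cognitiveservices/v1"

ssml = """<speak version='1.0' xml:lang='en-US'>
  <voice name='en-US-JennyNeural'>Hello from Azure TTS.</voice>
</speak>"""

resp = requests.post(
    url,
    headers={
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "application/ssml+xml",
        "X-Microsoft-OutputFormat": "ogg-24khz-16bit-mono-opus",
        "User-Agent": "tts-notes",
    },
    data=ssml.encode("utf-8"),
)
resp.raise_for_status()
with open("hello.ogg", "wb") as f:
    f.write(resp.content)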

Audio codec: Opus
https://www.jianshu.com/p/be8d40b61171
Supported features include:

  • Bitrates from 6 kb/s to 510 kb/s
  • Sampling rates from 8 kHz (narrowband) to 48 kHz (fullband)
  • Frame sizes from 2.5 ms to 60 ms
  • Support for constant bitrate (CBR) and variable bitrate (VBR)
  • Audio bandwidth from narrowband to fullband
  • Support for speech and music
  • Support for mono and stereo
  • Support for up to 255 channels (multistream frames)
  • Dynamically adjustable bitrate, audio bandwidth and frame size
  • Good loss robustness and packet-loss concealment
  • Floating-point and fixed-point implementations

After switching from Flask to FastAPI, it took off!
https://blog.csdn.net/m0_59596937/article/details/136383593

Nuxt.js is a very powerful framework, but not every kind of page needs SSR. With SSR the server has to render every request, which is demanding on server resources; under high-concurrency traffic it consumes a lot of server CPU. Choose the stack for each page according to the business requirements.

1. SSR
If the first screen shows dynamic data, first-paint speed matters, and the search-engine ranking matters a lot, consider SSR.

2. SSG
If the first-screen data is not strongly tied to the user, the content is relatively stable, and SEO is still needed, consider pre-rendering with SSG (Static Site Generation). Nuxt can do SSG as well, avoiding the server-side rendering load.

3. CSR
If the site is an internal product that doesn't need SEO, or the project is small and first-paint performance isn't a big concern, or it is a richly interactive, dynamic application, then plain Vue is the wisest choice. User experience can be improved by shrinking the bundle and with techniques such as skeleton screens or loading indicators.

Original article: https://blog.csdn.net/m0_55119483/article/details/131491881

binlog expiration in days

mysql> show variables like '%expire%';
+--------------------------------+---------+
| Variable_name                  | Value   |
+--------------------------------+---------+
| binlog_expire_logs_auto_purge  | ON      |
| binlog_expire_logs_seconds     | 2592000 |
| disconnect_on_expired_password | ON      |
| expire_logs_days               | 0       |
+--------------------------------+---------+
4 rows in set (0.01 sec)

mysql> select 2592000/60/60/24;
+------------------+
| 2592000/60/60/24 |
+------------------+
|  30.000000000000 |
+------------------+
1 row in set (0.00 sec)

set global binlog_expire_logs_seconds=4*24*60*60;
show binary logs;
purge binary logs to 'mysql80-bin.000164';
purge binary logs before '2021-12-20 22:00:00';

Migrating a Vue project to Nuxt
https://zhuanlan.zhihu.com/p/664801054

Since addition is based on group theory, why isn't group theory taught first in primary school? - ReversedT's answer - Zhihu
https://www.zhihu.com/question/639591755/answer/3366904561

https://cn.vuejs.org/guide/reusability/composables#what-is-a-composable

On reactive and functional programming
https://blog.csdn.net/jjclove/article/details/124388096

Dump MySQL to Parquet, then query it with DuckDB

# install duckdb and csv2parquet
wget https://github.com/duckdb/duckdb/releases/download/v1.1.0/duckdb_cli-linux-amd64.zip && unzip duckdb*.zip
cargo install csv2parquet
# dump data from mysql to parquet (mysql -B output is tab-separated, so pass a literal tab as the delimiter)
mysql -h mysql-server -u readonly -p db -Bne 'select * from t limit 1000000;' | pv -p --timer --rate --bytes | csv2parquet -d $'\t' /dev/stdin test.parquet
# query the parquet file with duckdb
duckdb -s 'select count(*) from "test.parquet"'

curl <FILE_URL> | csv2parquet /dev/stdin /dev/stdout | aws s3 cp - <S3_DESTINATION>
csv2parquet --header false test.csv  test.parquet

[Python] Incrementally using ParquetWriter keeps data in memory (eventually running out of RAM for large datasets)
https://github.com/apache/arrow/issues/26073

This post is for you. It’s for everybody who plans to write some kind of HTTP service in Go. You may also find this useful if you’re learning Go, as lots of the examples follow good practices. Experienced gophers might also pick up some nice patterns.
https://grafana.com/blog/2024/02/09/how-i-write-http-services-in-go-after-13-years/

Instructors’ Guide to Raft
https://thesquareplanet.com/blog/instructors-guide-to-raft/

Students’ Guide to Raft
https://thesquareplanet.com/blog/students-guide-to-raft/

MySQL: send query output to a file

pager cat >/tmp/test.txt 
nopager

The 50 cognitive biases Elon Musk thinks every child should know
https://36kr.com/p/1745363984445063

https://github.com/slimtoolkit/slim
Don’t change anything in your container image and minify it by up to 30x (and for compiled languages even more) making it secure too! (free and open source)

Python environment management showdown: pip, Conda, Pyenv, Rye, Virtualenv, PDM, Poetry and other tools
https://new.qq.com/rain/a/20240303A055WM00

uv: a blazing-fast pip replacement
https://zhuanlan.zhihu.com/p/689976933

On Linux, how hard it is to recover a deleted .log file depends on several factors, including the filesystem type and whether the data has been overwritten since deletion. Some possible approaches:

  1. Use extundelete
    If the filesystem is ext3/ext4, you can try the extundelete tool.

    • First unmount the partition (if possible), then run:

      extundelete /dev/sdX --restore-file /path/to/your/file.log
    • This will try to recover the file and save it under the current directory.

  2. Use photorec
    photorec is a more general recovery tool that can try to recover many file types.

    • Install testdisk, which includes photorec:

      sudo apt install testdisk
    • Then run photorec and follow the prompts to pick the partition and the file types to recover.

  3. Check your backups
    If you take regular backups, restore the file from a backup.

  4. Check log rotation
    If the system uses log rotation (e.g. logrotate), older copies of the log may have been kept somewhere; check /var/log for related rotated files.

Note that new writes after deletion can overwrite the original data and lower the chance of recovery, so act as quickly as possible.

VS Code CPU usage too high | search.follow false & exclude node_modules
https://www.cnblogs.com/dhjy123/p/11906142.html

python bloom filter
from pybloom_live import BloomFilter

# Create a Bloom filter with a maximum capacity and a false-positive rate
bloom = BloomFilter(capacity=1000, error_rate=0.1)

# Add elements
bloom.add("apple")
bloom.add("banana")

# Membership checks
print("apple" in bloom)   # True
print("banana" in bloom)  # True
print("cherry" in bloom)  # False here; a Bloom filter can give false positives (True for an element never added) but never false negatives

Split a file into chunks by line count

split --additional-suffix=.csv -d --numeric-suffixes=0 --suffix-length=3 -l 200000 abc.csv abc_ 

sed: print lines n through m with their original line numbers
sed -n '4388879,4388883{=;p}' a.txt

_csv.Error: line contains NUL

sed -i 's/\x0//g' scileads-phone-numbers-with-source.all.filtered.csv
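
Besides rewriting the file with sed, the NUL bytes can also be stripped on the fly while reading on the Python side (a sketch; the file name is a placeholder):

import csv

with open("input.csv", newline="") as f:
    # csv.reader accepts any iterable of strings, so filter out NUL characters as we go
    reader = csv.reader(line.replace("\0", "") for line in f)
    for row in reader:
        pass  # process the row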

Extract one table's operations from the binlog

mysqlbinlog --database=your_database /path/to/binlog.000001 > /path/to/output.sql
# Oracle's stock mysqlbinlog only filters by schema (--database); it has no per-table option, so to isolate a single table, post-filter the decoded output (for example, grep the --verbose row-event output for the table name).