@A1kmm

A1kmm@lemmy.amxl.com · 18 days ago

When people say Local AI, they mean things like the Free / Open Source Ollama (https://github.com/ollama/ollama/): you can read its source code and check that it doesn’t contain anything that phones home, and you control completely when and if you upgrade it. If you don’t like something in the code base, you can also fork it and start your own version. The actual models (e.g. Mistral is a popular one) used with Ollama are commonly distributed in GGUF format (the successor to GGML), which doesn’t even carry executable code - only metadata and massive multi-dimensional arrays of numbers (tensors) that represent the parameters of the LLM.

Not trusting that the output is correct is reasonable. But when it comes to trusting FOSS software not to spy on you, it is no different from trusting any other FOSS software (e.g. the Linux kernel, etc…). There is some risk of an xz-style attack on a code base, but I don’t think the risks are materially different for ‘AI’ compared to any other software.

A1kmm@lemmy.amxl.com · 27 days ago

Blockchain is great for when you need global consensus on the ordering of events (e.g. Alice gave all her 5 ETH to Bob first, so a later transaction to give 5 ETH to Charlie is invalid). It is an unnecessarily expensive solution just for archival, since it necessitates storing the data on every node forever.
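The ordering point can be shown with a toy ledger. This is only a sketch of the idea, not how Ethereum validates transactions; the function name `apply_in_order` and the tuple layout are made up for illustration.

```python
def apply_in_order(balances, transactions):
    """Apply (sender, recipient, amount) transfers strictly in list order.

    Returns the final balances and, per transaction, whether it was accepted.
    A transfer is rejected if the sender cannot cover it at that point.
    """
    balances = dict(balances)  # work on a copy, leave the input untouched
    accepted = []
    for sender, recipient, amount in transactions:
        if balances.get(sender, 0) >= amount:
            balances[sender] -= amount
            balances[recipient] = balances.get(recipient, 0) + amount
            accepted.append(True)
        else:
            accepted.append(False)
    return balances, accepted


start = {"Alice": 5}
txs = [("Alice", "Bob", 5), ("Alice", "Charlie", 5)]

final, accepted = apply_in_order(start, txs)
final_reversed, accepted_reversed = apply_in_order(start, list(reversed(txs)))
```

With the original order Bob ends up with the 5 ETH and the transfer to Charlie is rejected; with the order reversed it is the other way round. Every node therefore has to agree on the order, and that agreement is what the chain provides.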

Ethereum charges ‘gas’ fees per transaction, which helps ensure it doesn’t collapse under the weight of excess usage. Blocks have transaction limits, and transactions have size limits. It currently works out at about US$7,500 per MB of block data (which is stored forever, and replicated to every node in the network). The Internet Archive apparently has ~50 PB of data, which would cost US$371 trillion to put onto Ethereum (in practice, attempting this would push up the price of ETH further, and if they succeeded, most nodes would not be able to keep up with the network). Really, this is just telling us that blockchain is not appropriate for that use case, and the designers of real world blockchains have created mechanisms to make it financially unviable to attempt at that scale, because it would effectively destroy the ability to operate nodes.
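As a rough check of that arithmetic, here is a small sketch using the rounded figures from the comment (about US$7,500 per MB and ~50 PB) and assuming decimal units. Neither number is live data, and the function name is just illustrative.

```python
# Back-of-the-envelope cost of putting the Internet Archive on Ethereum.
# Both inputs are the rounded figures quoted in the comment, not live data.
COST_PER_MB_USD = 7_500   # approximate cost per MB of block data
ARCHIVE_SIZE_PB = 50      # approximate size of the Internet Archive

MB_PER_PB = 10**9         # assumption: decimal units, 1 PB = 1,000,000,000 MB


def storage_cost_usd(size_pb, cost_per_mb):
    """Total cost in USD of storing size_pb petabytes at cost_per_mb per MB."""
    return size_pb * MB_PER_PB * cost_per_mb


total = storage_cost_usd(ARCHIVE_SIZE_PB, COST_PER_MB_USD)
print(f"US${total / 1e12:,.0f} trillion")  # prints "US$375 trillion"
```

The rounded inputs give about US$375 trillion, close to the US$371 trillion quoted above, which presumably came from a slightly lower unrounded per-MB price.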

The only real reason to use an existing blockchain anyway would be on the theory that you could argue it is too big to fail due to legitimate business use cases, and that it is too hard to remove censorship-resistant data from it. However, if it came to be used mostly for censorship-resistant data sharing, with transactions in the minority, I doubt that this would stop authorities going after node operators and so on.

The real problems that an archival project faces are:

- The cost of storing and retrieving large amounts of data. That could be decentralised using a solution where not all data is stored on a chain - for example, IPFS.
- The problem of curating data and deciding what is worth archiving, and what is a true-to-source archive vs fake copy. This probably requires either a centralised trusted party, or maybe a voting system.
- The problem of censorship. Anonymity and opaqueness about what is on a particular node can help - but they might in some cases undermine the other goals of archival.
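To make the "not all data is stored on a chain" idea concrete, here is a toy sketch of content addressing: the bulky data lives in an off-chain store keyed by its hash, and only the short hash would need to go on a ledger. This is an assumption-laden simplification. `OffChainStore` and `ledger` are hypothetical names, and real IPFS uses its own content identifier (CID) format and a peer-to-peer network rather than a bare SHA-256 hex digest and an in-memory dict.

```python
import hashlib


class OffChainStore:
    """Toy content-addressed store standing in for something like IPFS."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        """Store data and return its address, the SHA-256 hex digest."""
        digest = hashlib.sha256(data).hexdigest()
        self._blobs[digest] = data
        return digest

    def get(self, digest: str) -> bytes:
        """Fetch data by address and verify it still matches that address."""
        data = self._blobs[digest]
        if hashlib.sha256(data).hexdigest() != digest:
            raise ValueError("stored data does not match its address")
        return data


store = OffChainStore()
ledger = []  # stands in for the chain: holds only small, fixed-size digests

snapshot = b"<html>archived page</html>"
address = store.put(snapshot)
ledger.append(address)

retrieved = store.get(ledger[0])
```

Because the address is derived from the content, anyone holding the digest can detect a tampered copy, but that only proves the copy matches what was archived. It does not settle whether the archived version was true to the source, which is the curation problem in the list above.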

A1kmm@lemmy.amxl.com · 27 days ago

> This is absolutely because they pulled the emergency library stunt, and they were loud as hell about it. They literally broke the law and shouted about it.

I think that you are right as to why the publishers picked them specifically to go after in the first place. I don’t think they should have done the “emergency library”.

That said, the publishers’ arguments show they have an anti-library agenda that goes beyond just the emergency library.

> Libraries are allowed to scan/digitize books they own physically. They are only allowed to lend out as many as they physically own though. Archive knew this and allowed infinite “lend outs”. They even openly acknowledged that this was against the law in their announcement post when they did this.

The trouble is that the publishers are not just going after them for infinite lend-outs. The publishers are arguing that they shouldn’t be allowed to lend out any digital copies of a book they’ve scanned from a physical copy, even if they lock away the corresponding number of physical copies.

Worse, they got a court to agree with them on that, which is where the appeal comes in.

The publishers want it to be that physical copies can only be lent out as physical copies, and for digital copies the libraries have to purchase a subscription for a set number of library patrons and concurrent borrows, specifically for digital lending, and with a finite life. This is all about growing publisher revenue. The publishers are not stopping at saying the number of digital copies lent must be less than or equal to the number of physical copies, and are going after archive.org for their entire digital library programme.

A1kmm@lemmy.amxl.com · 28 days ago

No

On economic policy I am quite far left - I support a low Gini coefficient, achieved through a mixed economy, but with state provided options (with no ‘think of the businesses’ pricing strategy) for the essentials and state owned options for natural monopolies / utilities / media.

But on social policy, I support social liberties and democracy. I believe the government should intervene, with force if needed, to protect the rights of others from interference by others (including rights to bodily safety and autonomy, not to be discriminated against, the right to a clean and healthy environment, and the right not to be exploited or misled by profiteers) and to redistribute wealth from those with a surplus to those in need / to fund the legitimate functions of the state. Outside of that, people should have social and political liberties.

I consider being a ‘tankie’ to require both the leftist aspect (✅) and the authoritarian aspect (❌), so I don’t meet the definition.

A1kmm@lemmy.amxl.com · 1 month ago

The best option is to run the models locally. You’ll need a good enough GPU - I have an RTX 3060 with 12 GB of VRAM, which is enough to do a lot of local AI work.

I use Ollama, and my favourite model to use with it is Mistral-7b-Instruct. It’s a 7 billion parameter model optimised for instruction following, and it is usable with 4-bit quantisation, so the model takes about 4 GB of storage.
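The storage figure follows from simple arithmetic, sketched below. The helper name is illustrative, and the estimate ignores per-block scale factors and metadata, which is presumably why the real file comes out nearer 4 GB than the raw 3.5 GB.

```python
def quantised_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Raw weight storage in GB (decimal), ignoring quantisation overhead."""
    return n_params * bits_per_weight / 8 / 1e9


four_bit = quantised_size_gb(7e9, 4)     # 4-bit quantised weights
half_prec = quantised_size_gb(7e9, 16)   # unquantised 16-bit weights

print(f"4-bit:  {four_bit:.1f} GB")
print(f"16-bit: {half_prec:.1f} GB")
```

At 16 bits per weight the same model would need about 14 GB for the weights alone, which would not fit in 12 GB of VRAM, while the 4-bit version fits with room to spare.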

You can run it from the command line rather than a web interface - run the container for the server, and then something like docker exec -it ollama ollama run mistral, giving a command line interface. The model performs pretty well; not quite as well on some tasks as GPT-4, but also not brain-damaged from attempts to censor it.
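If you would rather script it than type into the command line interface, the Ollama server also exposes an HTTP API. The following is a minimal sketch, assuming the server's default port 11434 is reachable on localhost (for a container, that the port has been published) and that the `mistral` model has already been pulled. The helper names `build_request` and `ask` are made up for this example.

```python
import json
import urllib.request

# Assumption: Ollama is listening on its default port on this machine.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "mistral") -> urllib.request.Request:
    """Build a non-streaming generate request for the Ollama HTTP API."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str, model: str = "mistral") -> str:
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]
```

With the server running, something like `ask("Summarise this paragraph: ...")` should return the model's reply as a string, and nothing leaves your machine.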

By default it keeps a local history, but you can turn that off.

A1kmm@lemmy.amxl.com · 2 months ago

My grandparents had a lot of antiques, some of which they probably inherited. My grandfather was particularly proud of his wind-up clock (which was an antique even back then). I disassembled it to find out how it worked, but couldn’t figure out how to reassemble it (and my granddad couldn’t either).