If one of their goals is to sell premium access for LLM training, this kind of gibberish would undermine that. If you can't guarantee the data source is coherent, it will affect the quality of the resulting model.
I think a better approach is to migrate the comments to a new platform or to create new, higher-quality content. Could the solution to this problem be a guide that goes into more detail?