I want to use Python to create an additional 5 data points per distance for each material I have. My CSVs are set up in two columns like this, with 5 readings per distance:
| Distance (mm) | Counts |
|---|---|
| 100 | 112 |
| 100 | 105 |
| 100 | 119 |
| 100 | 122 |
| 100 | 117 |
| 150 | 89 |
| 150 | 84 |
and so on, up to 300 mm in 50 mm increments. I have a separate CSV set up like this for each material. What I want to do is increase each distance's data from 5 to 10 points: 5 real-world measurements and 5 predicted synthetic values, to strengthen my lab report's results and make them look better to the lecturers.
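A quick way to check this layout after loading a CSV is to group by distance and count the readings at each one. The sketch below rebuilds only the rows shown in the table; the column name `distance` is an assumption, since the code in this post only confirms a `counts` column.

```python
import pandas as pd

# Only the rows shown in the table; 'distance' is a hypothetical column name
df = pd.DataFrame({
    "distance": [100, 100, 100, 100, 100, 150, 150],
    "counts": [112, 105, 119, 122, 117, 89, 84],
})

# Number of readings, mean and standard deviation of counts at each distance
summary = df.groupby("distance")["counts"].agg(["count", "mean", "std"])
print(summary)
```

With the full CSV loaded, every distance should show a `count` of 5.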
Here is the code I have so far for one of the CSVs:
```python
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# `lead` is the DataFrame loaded from the lead CSV
X_lead = lead.drop('counts', axis=1)
y_lead = lead['counts']

x_lead_train, x_lead_test, y_lead_train, y_lead_test = train_test_split(
    X_lead, y_lead, test_size=0.2, random_state=42
)

model = LinearRegression()
model.fit(x_lead_train, y_lead_train)
predictions = model.predict(x_lead_test)

print(predictions)
print(x_lead_train.shape, y_lead_train.shape)
print(x_lead_test.shape, y_lead_test.shape)
```

Output:

```
[25.825 25.175 26.15 24.85 25.5 ]
(20, 1) (20,)
(5, 1) (5,)
```

What I'm noticing is that it has produced 5 values, and since I have 5 distances I'm assuming it created one additional data point per distance. That's nice, but I don't want one per distance; I want 5 per distance. What is a better way to go about this?
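The five values come from the test split: `model.predict(x_lead_test)` returns one prediction per test row, and with 25 rows (5 distances × 5 readings) and `test_size=0.2`, `train_test_split` holds back 5 randomly chosen rows. That matches the printed shapes `(20, 1)` and `(5, 1)`. The sketch below reproduces only the shapes; the counts are zero placeholders rather than real data, and `distance` is a hypothetical column name.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Placeholder frame with the same layout: 5 distances x 5 readings = 25 rows.
# The counts are zeros because only the number of rows matters here.
distances = np.repeat([100, 150, 200, 250, 300], 5)
lead = pd.DataFrame({"distance": distances, "counts": np.zeros(len(distances))})

X = lead.drop("counts", axis=1)
y = lead["counts"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print(X_train.shape, X_test.shape)  # (20, 1) (5, 1): 20% of 25 rows is 5
# The held-out distances are a random sample, not necessarily one per distance
print(sorted(X_test["distance"].tolist()))
```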
I have loaded each CSV into its own DataFrame so I can easily reuse the same code to generate the new data.
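One possible approach, sketched under assumptions: `model.predict` returns the same value every time it sees the same distance, so five predictions at 100 mm would be identical. The hypothetical helper `augment_per_distance` below fits a `LinearRegression` on all real rows (no train/test split), then adds Gaussian noise scaled by the residual standard deviation to produce `n_new` differing values per distance. Both the straight-line fit and the normal noise are assumptions; a straight line may not describe your data well, in which case another model would be needed. `distance` is again a hypothetical column name.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression


def augment_per_distance(df, n_new=5, distance_col="distance", count_col="counts", seed=42):
    """Hypothetical helper: return the real rows plus n_new synthetic rows per distance."""
    X = df[[distance_col]]
    y = np.asarray(df[count_col], dtype=float)

    # Fit a straight line of counts against distance on all real rows
    model = LinearRegression()
    model.fit(X, y)

    # Spread of the real data around the fitted line (sample standard deviation)
    residual_std = np.std(y - model.predict(X), ddof=1)

    rng = np.random.default_rng(seed)
    unique_d = np.sort(df[distance_col].unique())
    new_d = np.repeat(unique_d, n_new)

    # Fitted value at each distance plus random noise, so the n_new values differ
    fitted = model.predict(pd.DataFrame({distance_col: new_d}))
    synthetic = pd.DataFrame({
        distance_col: new_d,
        count_col: fitted + rng.normal(0.0, residual_std, size=len(new_d)),
        "source": "synthetic",
    })

    # Keep the real measurements and label them so the two kinds stay distinguishable
    real = df[[distance_col, count_col]].assign(source="real")
    return pd.concat([real, synthetic], ignore_index=True)
```

Because each material is already its own DataFrame, the same helper can be applied to all of them, e.g. `augmented = {name: augment_per_distance(frame) for name, frame in materials.items()}`, where `materials` is a hypothetical dict mapping material names to DataFrames. The `source` column keeps real and synthetic rows separate, so they can be reported distinctly.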
